NoSQL Rocks? We try to understand
At Gioorgi.com we were never SQL fans. In 2000 we thought SQL was boring, mostly because relational algebra can be a bit dull. Then we found this book written by one of the fathers of SQL. Years ago Google and then Facebook came out with incredible new ideas for improved, super-fast scalability, which eventually turned into the "NoSQL" mantra.
But is NoSQL a mature technology, or is it only a path traced by the social companies out there? Let's explore together...
After Facebook's Cassandra and Google's BigTable, all the mumbo-jumbo magic of the heavy EJB specification suddenly started to fade. Java Spring tried to convince us that EJBs are too complicated... but is it true? The EJB specification (born at the end of the '90s) aims to offer:
- scalability and fault tolerance
- stateless lifecycle (with stateless EJBs)
- stateful lifecycle (with stateful EJBs)
- transaction-managed lifecycle (entity EJBs). In the later specifications this eventually came to include ORM tools like Hibernate.
EJB implementations were driven by big vendors with a lot of expertise in transaction monitors: complex software able to scale and manage transactions. Typically such vendors (IBM, Tibco, etc.) were handling banking transactions, so in that context consistency was a major concern.
Then... the father of Java (Gosling) was hired by Google :) Is something changing?
Yes, it is: in recent years eBay, Amazon, and Facebook have needed to scale as far as they possibly can. The Google search engine is another story, because initially it only served read-only queries, so it needed to scale but under a friendlier set of constraints. (In the last year Google started updating its index in real time, and it now faces similar issues.)
In this article we will try to explore the NoSQL mantra. So let's start. Your comments are welcome! What is your experience? Keep in mind that NoSQL is not a standard; it is more of a "tag", a descriptive label. NoSQL databases all differ from each other, so we had some difficulty gathering information about them.
Focusing on databases, the first thing to consider is Brewer's CAP theorem:
The CAP theorem, also known as Brewer's theorem, states that it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:[1][2]
- Consistency (all nodes see the same data at the same time)
- Availability (node failures do not prevent survivors from continuing to operate)
- Partition tolerance (the system continues to operate despite arbitrary message loss)
According to the theorem, a distributed system can satisfy any two of these guarantees at the same time, but not all three.[3]
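To make the trade-off concrete, here is a toy sketch in Python. It is not how any real database works: the class names, the "CP"/"AP" mode strings and the `partitioned` flag are all invented for illustration. It only shows that once the two nodes cannot talk to each other, the store must either refuse the write (keeping the nodes consistent) or accept it (staying available, but leaving the second node stale).

```python
class Replica:
    """One node holding a single value (hypothetical, for illustration only)."""

    def __init__(self, name):
        self.name = name
        self.value = None


class TwoNodeStore:
    """Two replicas, a and b. Writes arrive at a and are copied to b."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = Replica("a")
        self.b = Replica("b")
        self.partitioned = False  # True means the link between a and b is cut

    def write(self, value):
        if self.partitioned and self.mode == "CP":
            # Prefer consistency: refuse the write rather than let a and b diverge.
            raise RuntimeError("unavailable: node b cannot be reached")
        self.a.value = value
        if not self.partitioned:
            self.b.value = value  # normal case: the copy reaches b

    def read_from_b(self):
        # In AP mode this may return stale data while the link is cut.
        return self.b.value


if __name__ == "__main__":
    for mode in ("CP", "AP"):
        store = TwoNodeStore(mode)
        store.write("v1")
        store.partitioned = True
        try:
            store.write("v2")
            outcome = "write accepted"
        except RuntimeError as err:
            outcome = f"write refused ({err})"
        print(mode, outcome, "| a =", store.a.value, "| b =", store.read_from_b())
```

In "CP" mode both nodes still agree on the old value but the store has stopped accepting writes; in "AP" mode the write goes through and node b keeps answering with the old value until the link comes back.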
Normally relational databases try to take the first two. The simplest implementation is master/slave replication, in which the slave can take over if the master fails. But if the cable between master and slave is cut by a wererabbit, your infrastructure is broken: your slave can no longer be synced, and if your master then goes down you are in trouble, even with the slave still working... a truly distributed system would avoid this.
Apache Cassandra takes the last two, giving up guaranteed Consistency. Google's BigTable and MongoDB give up guaranteed Availability. Anyway, some NoSQL databases let you "tune" this choice: for instance, on Cassandra you can say how many replicas in the cluster you want to be consistent. This is quite important. BigTable could lose Availability of some data, but only for a very short time! (BigTable's documentation is a bit confusing, so I will avoid a deeper analysis.)
Take a look also at this other comparison, which is a bit biased in favor of MongoDB.
So the conclusion is: wait and see, NoSQL is not yet mature.
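The "tuning" idea can be sketched with a bit of counting. The usual rule of thumb for this family of systems, which is not spelled out above, is that with N replicas, a write acknowledged by W of them and a read that asks R of them will always overlap on at least one replica when R + W > N. The function name and the replica numbering below are made up for illustration; Cassandra itself expresses the choice through per-request consistency levels such as ONE, QUORUM and ALL. The sketch checks the rule by brute force instead of trusting the formula.

```python
from itertools import combinations


def read_always_sees_latest_write(n, w, r):
    """Return True if every possible set of r replicas that answers a read
    shares at least one replica with every possible set of w replicas
    that acknowledged the latest write (n replicas in total)."""
    replicas = range(n)
    for write_set in combinations(replicas, w):
        for read_set in combinations(replicas, r):
            if not set(write_set) & set(read_set):
                return False  # found a read that can miss the write entirely
    return True


if __name__ == "__main__":
    n = 3  # assumed replication factor for the example
    for w, r in [(1, 1), (2, 2), (3, 1), (1, 3)]:
        ok = read_always_sees_latest_write(n, w, r)
        print(f"N={n} W={w} R={r} -> overlap guaranteed: {ok}")
```

With three replicas, writing to one and reading from one is fast but can return old data, while requiring two of three on both sides buys the overlap at the cost of waiting for more nodes.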