NoSQL Rocks? We try to understand

At Gioorgi.com we were never SQL fans. In 2000 we thought SQL was boring, mostly because SQL's algebra can be a bit dull. Then we found this book written by one of the fathers of SQL. Years ago Google and then Facebook came out with incredible new ideas for improved, super-fast scalability, which eventually turned into the "NoSQL" mantra.

But is NoSQL a mature technology, or is it only a path traced by the social companies out there? Let's explore together...

After Facebook's Cassandra and Google's BigTable, all the mumbo-jumbo magic of the heavy EJB specification suddenly started to fade. Java Spring tried to convince us that EJBs are too complicated... but is it true? The EJB specification (born at the end of the '90s) aims to offer:

  • scalability and fault tolerance
  • stateless lifecycle (with stateless EJBs)
  • stateful lifecycle (with stateful EJBs)
  • transaction-managed lifecycle (entity EJBs). In the latest specifications this eventually came to include ORM tools like Hibernate and so on.

Some of this stuff performs very well and is heavily used. Stateless EJBs seem to be the most used here at Gioorgi.com when our customers ask us for an EJB implementation.

EJB implementations were driven by big vendors with a lot of expertise in transaction monitors: complex software able to scale and manage transactions. Typically such vendors (IBM, Tibco, etc.) were managing banking transactions, so in that context consistency was a major concern.

Then... the father of Java (Gosling) was hired by Google :) Is something changing?

Yes, it is: in recent years eBay, Amazon, and Facebook have needed to scale at the maximum rate. Google's search engine is another story because initially it served read-only queries, so it needs to scale but with an easier set of constraints. (In the last year Google started updating its index in real time, and now faces similar issues.)

In this article, we will try to explore the NoSQL mantra. So let's start. Your comments are welcome! What is your experience? Keep in mind that NoSQL is not a standard; it is more like a "tag", a descriptive label. NoSQL databases all differ from each other, so we faced some difficulties gathering information out there.

Focusing on databases, the first thing to consider is Brewer's CAP theorem:

The CAP theorem, also known as Brewer's theorem, states that it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:[1][2]

  • Consistency (all nodes see the same data at the same time)
  • Availability (node failures do not prevent survivors from continuing to operate)
  • Partition tolerance (the system continues to operate despite arbitrary message loss)

According to the theorem, a distributed system can satisfy any two of these guarantees at the same time, but not all three.[3]

Normally relational databases try to take the first two. The easiest implementation is master/slave replication, in which the slave can take over if the master fails. But if the cable between master and slave is cut by a wererabbit, your infrastructure is broken: your slave can no longer be synced, and if your master goes down you are in trouble even with the slave still working... a truly distributed system would avoid this. Apache Cassandra takes the last two, giving up 100% consistency. Google's BigTable and MongoDB give up 100% availability.
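The wererabbit scenario can be sketched in a few lines of Java. This is only a toy model under our own assumptions: the class `ReplicationSketch`, its `Node` type and the `linkUp` flag are hypothetical names invented for illustration, not the API of any real database.

```java
/** Toy model of master/slave replication with a cuttable link (illustrative only). */
final class ReplicationSketch {

    /** A node holding a single value; hypothetical, for illustration. */
    static final class Node {
        String value;
        boolean alive = true;
    }

    final Node master = new Node();
    final Node slave = new Node();
    boolean linkUp = true; // the "cable" between master and slave

    /** Writes go to the master and reach the slave only while the link is up. */
    void write(String v) {
        if (!master.alive) {
            throw new IllegalStateException("master is down");
        }
        master.value = v;
        if (linkUp && slave.alive) {
            slave.value = v;
        }
    }

    /** Reads prefer the master and fall back to the slave on failover. */
    String read() {
        if (master.alive) {
            return master.value;
        }
        if (slave.alive) {
            return slave.value;
        }
        throw new IllegalStateException("no node available");
    }

    public static void main(String[] args) {
        ReplicationSketch db = new ReplicationSketch();
        db.write("balance=100");   // replicated to the slave
        db.linkUp = false;         // the cable is cut
        db.write("balance=50");    // the slave never sees this write
        db.master.alive = false;   // now the master goes down
        System.out.println(db.read()); // prints "balance=100": stale data
    }
}
```

The slave is still up and answering, but it answers with old data: the setup was consistent and available only as long as nobody cut the cable.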

Anyway, some NoSQL databases can "tune" this choice: for instance, on Cassandra you can say how many replicas in the cluster you want to be consistent. This is quite important. BigTable could lose availability of some data, but for a very short time! (BigTable's documentation is a bit confusing, so I will avoid a deeper analysis.)
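To give an idea of what this tuning means, here is a minimal sketch of the well-known replica-overlap rule: with N replicas, a read that contacts R of them is guaranteed to see the latest write acknowledged by W of them only when R + W > N. The class `QuorumSketch` and its `Level` enum are our own hypothetical names, loosely modeled on levels such as ONE, QUORUM and ALL; this is not Cassandra's actual client API.

```java
/** Sketch of the replica-overlap rule behind tunable consistency (illustrative only). */
final class QuorumSketch {

    /** Hypothetical consistency levels, loosely modeled on ONE / QUORUM / ALL. */
    enum Level {
        ONE, QUORUM, ALL;

        /** Number of replicas that must answer, given a replication factor n. */
        int replicas(int n) {
            switch (this) {
                case ONE:
                    return 1;
                case QUORUM:
                    return n / 2 + 1; // strict majority
                default:
                    return n;
            }
        }
    }

    /** A read is guaranteed to overlap the latest write when R + W > N. */
    static boolean readsSeeLatestWrite(int n, Level write, Level read) {
        return write.replicas(n) + read.replicas(n) > n;
    }

    public static void main(String[] args) {
        int n = 3; // assumed replication factor
        for (Level w : Level.values()) {
            for (Level r : Level.values()) {
                System.out.printf("N=%d write=%s read=%s overlap=%b%n",
                        n, w, r, readsSeeLatestWrite(n, w, r));
            }
        }
    }
}
```

With three replicas, writing and reading at ONE is fast but may return stale data, while QUORUM on both sides always overlaps on at least one replica. The price is that more nodes must be reachable for each request, which is exactly the trade-off the CAP theorem talks about.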

Is NoSQL the best fit for your next project? This question is very difficult to answer, partly because there are not yet best practices and/or patterns to follow and explore, and partly because a lot of these databases do not seem to pay the bills yet.

Take a look also at this other comparison, which is a bit biased in favor of MongoDB.

So the conclusion is: stay and wait, NoSQL is not yet mature.