Help Daitan find a SQL solution for Code Zauker

In my code ramblings during the development of Code Zauker, I ended up studying NoSQL databases a bit.

Code Zauker started out using Redis, because Redis is a very bold memory-based NoSQL db.
Redis also supports complex data types like sorted sets, lists and so on, which was very useful.
However, I needed a very fast way to compute statistics on the data collected by Code Zauker, both to inspect it and to discover new features I’d like to add.
Worse, my current budget puts a hard limit on the RAM available to a Redis instance. So I started thinking a lot about a Redis drop-in replacement for Code Zauker.

As this very smart article pointed out, “SQL pays a lot of attention to transactional guaranties, schemas, and referential integrity”.

Drop referential integrity

But these days no human pushes SQL to the db by hand: Rails or PHP web apps are so easy to write that the end user will go through a web 2.0 interface with on-the-fly validation, limit checking and so on.

In this scenario, transactional guarantees and referential integrity can be part of the problem, not of the solution.

Large data sets can contain referential integrity holes (missing user emails, for instance), but you still need to cope with them. Trying to keep them out of the db is a losing strategy, and the customer will be unhappy. Customers are human, and humans cope well with small errors in a large, mostly consistent database. In my experience, they will not complain even if 10% of the data is trash…

Do not drop grouping and aggregate functions

SQL, on the other hand, has a lot of operators for aggregated reporting. You can group data, intersect result sets and so on. Some NoSQL servers (like MongoDB) offer similar features, but they still miss the point.

So in an ideal world, I would be happy with a loose-schema SQL database that can still do some grouping and reporting on data.
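To make the idea concrete, here is a minimal sketch in Python with SQLite. It is only an illustration, not something Code Zauker does: the `facts` table, its columns and the helper names are all made up for the example. The table has no keys and no constraints, so "holes" are simply stored, yet `GROUP BY` still works for reporting.

```python
import sqlite3

def build_loose_store(rows):
    """Load (doc, attr, value) rows into an in-memory table without constraints."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: no primary key, no foreign keys, no NOT NULL.
    # Any attribute name can be stored, and missing values are allowed.
    conn.execute("CREATE TABLE facts (doc TEXT, attr TEXT, value)")
    conn.executemany("INSERT INTO facts VALUES (?, ?, ?)", rows)
    return conn

def count_by_value(conn, attr):
    """Report how many rows carry each value of the given attribute."""
    cur = conn.execute(
        "SELECT value, COUNT(*) FROM facts "
        "WHERE attr = ? GROUP BY value ORDER BY value",
        (attr,),
    )
    return cur.fetchall()

rows = [
    ("a.rb", "lang", "ruby"),
    ("b.rb", "lang", "ruby"),
    ("c.py", "lang", "python"),
    ("c.py", "owner", None),  # a "hole": the owner is unknown but still stored
]
conn = build_loose_store(rows)
report = count_by_value(conn, "lang")
```

The trade-off is the usual one for this kind of attribute/value layout: the schema is loose, but every report has to filter on `attr` first, so an index on that column would probably be needed for large data sets.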

I am not stupid: these needs are very hard to satisfy in the same data structure, so NoSQL chooses the “unstructured normalized” path.

I am studying a Redis drop-in replacement for Code Zauker with the above features in mind, i.e. fast access plus grouping.

For Code Zauker, I need:

  • a very low memory footprint (because I cannot run big Redis instances on my hosting)
  • an acceptable intersect operation on trigram sets.
  • a well-compressed data store. Redis is very good at compressing its data, and my experimental SQLite drop-in replacement showed that the SQLite “ingenuous db” is rather suboptimal in this respect.
    Still, the SQL drop-in replacement performed well with the “:memory:” db.
  • Sharding and data partitioning would be a plus, because I plan to run Code Zauker on a huge set of code bases, and I’d like to have different indexes on different machines/environments, with a light federation engine.
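As a starting point for discussion, the trigram intersect requirement could be sketched on SQLite’s “:memory:” db like this. This is not the schema my experiment used: the `postings` table and the function names are assumptions made up for the example. Instead of intersecting one set per trigram as Redis does, it keeps one (trigram, path) row per posting and asks for the paths that match every trigram of the search term.

```python
import sqlite3

def trigrams(text):
    """Return the set of 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}

def build_index(files):
    """Index a {path: content} dict into an in-memory postings table."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: the primary key guarantees one row per pair,
    # which the COUNT(*) trick in candidates() relies on.
    conn.execute(
        "CREATE TABLE postings (trigram TEXT, path TEXT, "
        "PRIMARY KEY (trigram, path))"
    )
    for path, content in files.items():
        conn.executemany(
            "INSERT INTO postings VALUES (?, ?)",
            [(t, path) for t in trigrams(content)],
        )
    return conn

def candidates(conn, term):
    """Return paths containing every trigram of term (the set intersection)."""
    grams = sorted(trigrams(term))
    if not grams:
        return []  # terms shorter than 3 characters have no trigrams
    placeholders = ",".join("?" * len(grams))
    sql = (
        f"SELECT path FROM postings WHERE trigram IN ({placeholders}) "
        "GROUP BY path HAVING COUNT(*) = ? ORDER BY path"
    )
    return [row[0] for row in conn.execute(sql, grams + [len(grams)])]

files = {
    "a.rb": "def hello_world",
    "b.rb": "puts 'hello'",
    "c.rb": "world peace",
}
index = build_index(files)
hello_hits = candidates(index, "hello")
world_hits = candidates(index, "world")
```

The result is only a candidate list: a file can contain all the trigrams without containing the term itself, so each hit still has to be checked against the real file content. How well this scales in memory and on disk compared with Redis is exactly the open question.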

So do you have a solution? Please help me (the Daitan) find a brave idea!

PS: Some smart guys have been using MySQL as a NoSQL database since at least 2010…
