Postgres Takes It All
PostgreSQL is becoming a catch-all solution for simple scenarios, and this is a trend, not an accident. A new article full of evidence reinforces the ideas I wrote down in 2019 about Kafka vs PostgreSQL: let's dig into it.
In the last 15 years, the IT world has produced some very nice solutions to address the ‘Facebook effect’ challenge. A few examples to explain what I mean.
Back in 2009…
Friends
When you have a graph of friends of friends, a relational database can have a hard time extracting all the information you need. If your database has 1 billion people as records (like Facebook used to), your problem seems intractable. NoSQL databases were born.
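To make it concrete, here is a sketch of what such a traversal looks like in plain SQL, assuming a hypothetical friendship(person_id, friend_id) edge table (made up for illustration, not any real Facebook schema). At a billion rows, the recursive join below gets very expensive very fast.

```sql
-- Hypothetical edge table: friendship(person_id, friend_id), one row per direction
WITH RECURSIVE reachable(person_id, depth) AS (
    SELECT 42::bigint, 0                      -- start from person 42
    UNION                                     -- UNION (not ALL) prunes duplicate paths
    SELECT f.friend_id, r.depth + 1
    FROM   reachable r
    JOIN   friendship f ON f.person_id = r.person_id
    WHERE  r.depth < 2                        -- stop at two hops
)
SELECT DISTINCT person_id
FROM   reachable
WHERE  depth = 2;                             -- naive friends-of-friends: does not
                                              -- exclude direct friends or person 42
```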
We are the dot in the dot-com bubble
You have a big shop like Ikea or Amazon: you sell heterogeneous stuff, from plastic bags to reindeer meatballs, passing through desks and chairs. In these scenarios a big t_product table is challenging (Oracle's per-table column limit is 1000). Sure, every one of these things has a price. You can meta-describe them, but believe me, you are entering meta-dimension madness: for instance, Reddit chose to have just two tables to model everything, a very bold (and crazy) move, but okay… it was 2012…
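As an illustration of that madness, here is a minimal sketch in the spirit of the "two tables for everything" idea (hypothetical table and column names, not Reddit's actual schema):

```sql
-- One table for things, one for their attributes as key/value rows
CREATE TABLE thing (
    id   bigserial PRIMARY KEY,
    kind text NOT NULL                 -- 'product', 'user', 'order', ...
);

CREATE TABLE thing_data (
    thing_id bigint NOT NULL REFERENCES thing(id),
    key      text   NOT NULL,          -- 'price', 'colour', 'weight', ...
    value    text   NOT NULL,          -- everything becomes a string
    PRIMARY KEY (thing_id, key)
);

-- Even "products cheaper than 10" turns into a cast-and-join exercise
SELECT t.id
FROM   thing t
JOIN   thing_data d ON d.thing_id = t.id AND d.key = 'price'
WHERE  t.kind = 'product'
  AND  d.value::numeric < 10;
```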
During this phase MongoDB emerged.
The castle of Logs
If you have a social application (LinkedIn), you end up managing a lot of streamed data (from cellphones, websites, advertisement systems, etc.) that you need to process in very different ways. A big trading system has similar needs. Sometimes you want to replay the data to reprocess it from a different angle. Kafka emerged.
In this time frame databases were just databases, and Oracle was the only vendor starting to provide additional features on top of them. PostgreSQL was known to be painfully slow, but less buggy than ten years before.
Fast forward to 2025
Let me address the challenge from another point of view.
You can rent a 64 CPU + 128 GB RAM machine on AWS for 14,304€ per year. This machine has the same CPU count and double the memory of a Sun E10000. Fineco used three of them in 2000 to run its online trading platform. Now it costs less than a 15-day vacation in Europe (or Japan).
PostgreSQL nowadays offers (a quick SQL sketch follows the list):
- JSON management via native type support (json/jsonb)
- Full text search (with stemming included)
- Parallel query execution (parallel SELECT)
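Here is a minimal sketch of the first two points, assuming a hypothetical t_product table (names and columns are made up for illustration):

```sql
-- Hypothetical catalogue table mixing relational and semi-structured data
CREATE TABLE t_product (
    id         bigserial PRIMARY KEY,
    name       text NOT NULL,
    price_eur  numeric(10,2),
    attributes jsonb,        -- free-form, per-product attributes
    search_doc tsvector GENERATED ALWAYS AS (to_tsvector('english', name)) STORED
);

-- JSON: products whose attributes declare colour = red
SELECT name, price_eur
FROM   t_product
WHERE  attributes @> '{"colour": "red"}';

-- Full text search with stemming: 'meatball' also matches 'meatballs'
SELECT name
FROM   t_product
WHERE  search_doc @@ to_tsquery('english', 'meatball');
```

Parallel SELECT asks nothing of you: on big scans the planner fires up parallel workers on its own.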
Cloud providers offer managed PostgreSQL installations with stronger performance, too.
Why a dumb architecture is better than a bad architecture
I am often called to set up new projects (which, 99% of the time, are tech refreshes of legacy ones, for sure). These systems interact with other big systems.
When I ask for a volume estimation, things get incredibly complex. Business stakeholders tend to be a bit optimistic about these values. Also, the data is not always congruent. On the last system I worked on, about 40% of the input transactions were not valid (i.e. they hit the limits or were malformed), so the declared volume was 40% greater than the output sent to other systems (!).
In these scenarios, finding the correct sizing of the system upfront is challenging and often impossible.
On some projects, it makes little sense to propose MongoDB or Kafka without strong evidence (in the form of volumes, scope, provisioning cost tradeoffs, etc.).
Things change…
I want to stress that after the first set of deployments, if your business starts to become successful, things will change very fast. As an example, the first installation of Facebook required just a 1000 USD rented machine and used MySQL.
Last but not least, it seems Big Data is quite dead, for reasons not linked to IT but to other forces (like the legal risk of keeping too much data lying around). These facts bring up another odd situation: a fat MySQL/MariaDB installation can often be on par with a MongoDB cluster, simply because you have too little data to justify MongoDB's super-powers.
The complexity of managing a fault-tolerant Kafka installation is often underestimated: a replicated active-passive database is far simpler to set up.
Conclusion
It becomes crucial to analyze your use case. Modern databases like PostgreSQL (or Oracle) are ideal for exploring different architectural setups: you can implement small queues, structure key/value data in JSON columns, add full text search functionality, and so on.
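For instance, a small work queue needs nothing more than a table and row locking. A minimal sketch, assuming a hypothetical job_queue table (SKIP LOCKED has been available since PostgreSQL 9.5):

```sql
-- Hypothetical job queue table
CREATE TABLE job_queue (
    id         bigserial PRIMARY KEY,
    payload    jsonb NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now(),
    done_at    timestamptz
);

-- Each worker claims one pending job inside a transaction;
-- SKIP LOCKED makes concurrent workers skip rows already claimed by others.
BEGIN;
SELECT id, payload
FROM   job_queue
WHERE  done_at IS NULL
ORDER  BY created_at
LIMIT  1
FOR UPDATE SKIP LOCKED;

-- ...process the payload, then mark the row returned by the SELECT above...
UPDATE job_queue SET done_at = now() WHERE id = 1;  -- 1 is a placeholder id
COMMIT;
```

It will not replace Kafka at LinkedIn scale, but for moderate volumes it covers the "small queue" case with plain SQL.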
I was amazed at how simple it is to update the search index of a single document in SQLite compared to an index update on an Elasticsearch cluster. It is a simple task, but with SQLite you need to know nothing: it is just an update!
And yes, SQLite has minimal full text indexing capabilities too :)
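A minimal sketch using SQLite's FTS5 module (table and column names are made up): the virtual table is the index, so refreshing a document really is just an UPDATE.

```sql
-- The FTS5 virtual table is the full text index itself
CREATE VIRTUAL TABLE docs USING fts5(title, body);

INSERT INTO docs (title, body) VALUES ('recipe', 'swedish reindeer meatballs');

-- Re-indexing a single document is just an UPDATE
UPDATE docs SET body = 'vegetarian meatballs and gravy' WHERE rowid = 1;

-- Full text query
SELECT title FROM docs WHERE docs MATCH 'meatballs';
```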
After you embrace the “PostgreSQL takes it all” mantra, you can evolve your infrastructure by adding components like RabbitMQ, Kafka, Elasticsearch, vector databases, etc. in a second phase (which, believe me, could be in a far future called “NEVER” :)