A lot of distributed database tutorials start with the statement “disk reads are slow, writes are fast”, which is true to some extent and is used to justify denormalizing data and adding memory caches.
However, the solution seems to be the OPPOSITE (emphasis mine):
[…]disk performance is that the throughput of hard drives has been diverging from the latency of a disk seek for the last decade.[…]
To compensate for this performance divergence, modern operating systems have become increasingly aggressive in their use of main memory for disk caching.[…]
Furthermore, we are building on top of the JVM, and anyone who has spent any time with Java memory usage knows two things:
- The memory overhead of objects is very high, often doubling the size of the data stored (or worse).
- Java garbage collection becomes increasingly fiddly and slow as the in-heap data increases.
As a result of these factors using the filesystem and relying on pagecache is superior to maintaining an in-memory cache or other structure—we at least double the available cache by having automatic access to all free memory, and likely double again by storing a compact byte structure rather than individual objects. Doing so will result in a cache of up to 28-30GB on a 32GB machine without GC penalties. Furthermore, this cache will stay warm even if the service is restarted, whereas the in-process cache will need to be rebuilt in memory (which for a 10GB cache may take 10 minutes) or else it will need to start with a completely cold cache (which likely means terrible initial performance). This also greatly simplifies the code as all logic for maintaining coherency between the cache and filesystem is now in the OS, which tends to do so more efficiently and more correctly than one-off in-process attempts. If your disk usage favors linear reads then read-ahead is effectively pre-populating this cache with useful data on each disk read.
Source: Apache Kafka
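To make the idea concrete, here is a minimal sketch of the pattern the quote describes: store records as compact length-prefixed bytes in an append-only file, and read them back with a linear scan, keeping no application-level cache and leaving caching to the OS pagecache. This is not Kafka's code or its on-disk format; the 4-byte length prefix and the names `append_records`, `read_records`, `demo` and `segment.log` are assumptions made up for illustration.

```python
import os
import struct
import tempfile

# Hypothetical framing: each record is a 4-byte big-endian length + payload.
_LEN = struct.Struct(">I")


def append_records(path, records):
    """Append compact, length-prefixed byte records to the end of the file."""
    with open(path, "ab") as f:
        for payload in records:
            f.write(_LEN.pack(len(payload)))
            f.write(payload)
    # No application-level cache is updated here: the written pages sit in
    # the OS pagecache and the kernel decides when to flush them to disk.


def read_records(path):
    """Read all records with one linear scan of the file."""
    out = []
    with open(path, "rb") as f:
        while True:
            header = f.read(_LEN.size)
            if len(header) < _LEN.size:
                break  # end of file (or a truncated trailing header)
            (size,) = _LEN.unpack(header)
            out.append(f.read(size))
    return out


def demo():
    """Write two batches to a temporary log file and read everything back."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "segment.log")
        append_records(path, [b"alpha", b"beta"])
        append_records(path, [b"gamma"])
        return read_records(path)
```

Calling `demo()` returns the three payloads in the order they were appended. The point of the design is what is missing: there is no dictionary or object graph holding records in the process, so there is nothing for a garbage collector to walk and nothing to rebuild after a restart. Repeated reads of recently written or recently read data are likely to be served from memory by the OS anyway, and a sequential scan like `read_records` is the access pattern that read-ahead favors.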