One of the more interesting patches for the Linux kernel lately has been Wu Fengguang's adaptive readahead patchset, currently at version 12. Talking about its performance benefits, Wu says: "besides file servers and desktops, it is recently found to benefit postgresql databases a lot."
So I decided to run a simple benchmark to see what difference adaptive readahead would make in my case. The idea was to run a very simple query (a random select) against a PostgreSQL database and see how it performs over time, while the memory is being primed with data from disk.
As a test kernel I used 2.6.17-rc5, patched with Wu's latest patchset. Because I used an unsupported release candidate kernel, I had some conflicts while applying the patch, but they were quite easy to resolve. Before every run I rebooted the computer to make sure the file cache was empty, so we have a fresh start every time. Immediately after the reboot I fired up the attached simple Perl script. Its only task was to query the database in a random fashion. The test database was a single table holding data on about a million and a half phone subscribers, queried by number. Every ten seconds the script printed the average number of queries per second achieved in the last ten-second period.
The idea was to monitor this over time to see how fast the kernel is able to pull the database data into main memory, a test on which even a simple readahead algorithm should do well. There was enough physical memory to cache the whole table and its indexes, so at the end of the benchmark, when we reached full speed, there was no disk I/O: all data was cached in memory and we were in fact measuring CPU speed (a Pentium M 1.5GHz scaled down to 600MHz, if you need to know).
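To make the measurement loop concrete, here is a minimal sketch of the same idea in Python. It is not the attached Perl script. The function name run_benchmark, its parameters, and the query callable are all assumptions made for illustration; in a real run, query would perform one select by subscriber number through whatever database driver you use.

```python
import random
import time


def run_benchmark(query, max_key, duration, interval=10.0,
                  clock=time.monotonic, rng=random):
    """Call query() with random keys and report queries/second per interval.

    query    -- hypothetical callable that performs one lookup by key
    max_key  -- keys are drawn uniformly from 1..max_key
    duration -- total run time in seconds
    Returns a list of (elapsed_seconds, queries_per_second) tuples.
    """
    samples = []
    start = clock()
    window_start = start
    count = 0
    while True:
        query(rng.randint(1, max_key))
        count += 1
        now = clock()
        if now - window_start >= interval:
            # Average throughput over the window that just ended.
            qps = count / (now - window_start)
            samples.append((now - start, qps))
            print(f"{now - start:8.1f}s  {qps:10.1f} queries/s")
            window_start = now
            count = 0
        if now - start >= duration:
            break
    return samples
```

Passing the clock in as a parameter is only a convenience: it lets the loop be exercised without waiting in real time. On a cold cache the early windows should show low numbers that climb as more of the table ends up in memory.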
I must admit that I didn't pick this test by chance. I had noticed before that PostgreSQL was very slow in this kind of test, at least compared to other databases. It would always spend much more time pulling data into memory because, as is now well known, PostgreSQL doesn't implement any readahead algorithm of its own and instead relies on the kernel to do the magic.
In the picture below you can see the difference between the run on the default (unpatched) kernel (red line) and the run with the adaptive readahead patchset applied (green line).
I think the graph speaks for itself. It took around 6 minutes to prime the memory on the standard kernel, while with adaptive readahead compiled in the database was fully cached after only 2 minutes. That amounts to a 3x speedup.
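For anyone who wants to read the priming time off their own numbers rather than off a graph, one possible approach is sketched below. The helper time_to_warm and its 95% threshold are my assumptions, not something used for the graph above; the only figures taken from the text are the two priming times of about 6 and 2 minutes.

```python
def time_to_warm(samples, fraction=0.95):
    """Return the elapsed time of the first sample whose throughput is at
    least fraction * peak throughput, or None if samples is empty.

    samples -- list of (elapsed_seconds, queries_per_second) tuples
    """
    if not samples:
        return None
    peak = max(qps for _, qps in samples)
    for elapsed, qps in samples:
        if qps >= fraction * peak:
            return elapsed
    return None


# Approximate priming times quoted in the text, in seconds.
baseline_s = 6 * 60
patched_s = 2 * 60
speedup = baseline_s / patched_s
print(f"speedup: {speedup:.1f}x")
```

The threshold is a judgment call: a lower fraction reports an earlier warm-up time, so the same value should be used for both runs being compared.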
This test was so simple that we shouldn't draw any far-reaching conclusions. Yes, Wu has done a good job, and in cases like this the patch will surely help get data from disk cached in memory faster. The hard part is testing the myriad of other setups, and especially the behavior of the patch when memory is scarce. No doubt Wu is busy testing at the moment, and I hope other people will join in and report what they find.