With a little bit of torture, and some fun along the way, find out how fast your hard disk drive really is.
1-terabyte hard disk drives are slowly coming to market, so I suppose we can't complain that we don't have enough space to store the ever-increasing amount of our precious data. But it's also a well-known fact that although disk storage capacities are improving at an impressive rate, disk performance is improving at a much slower one. Unfortunately, a larger disk doesn't always mean a faster disk. What follows is an explanation of two techniques for measuring disk performance in Linux.
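As a rough sketch of what such measurements can look like (the techniques explained later may differ in detail), here is a quick pass using hdparm's built-in timing test and a timed dd transfer; the device and file names are examples you should adjust:

```shell
#!/bin/sh
# Buffered and cached read timings with hdparm (needs root and a real
# block device; replace /dev/sda with your disk):
#   hdparm -t /dev/sda    # buffered reads, i.e. actual disk throughput
#   hdparm -T /dev/sda    # cached reads, for comparing the memory/cache path

# A simple sequential-write test with dd: write 64 MiB to a scratch file.
# conv=fdatasync makes dd flush data to disk before reporting the rate,
# so the number reflects the disk rather than the page cache.
dd if=/dev/zero of=/tmp/disktest.img bs=1M count=64 conv=fdatasync 2>&1 | tail -n 1
rm -f /tmp/disktest.img
```

Run the dd test a few times and average the results; a single run can be skewed by whatever else the system is doing.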
The upcoming 2.6.20 Linux kernel brings a nice virtualization framework for all the virtualization fans out there. It's called KVM, short for Kernel-based Virtual Machine. Not only is it user-friendly, but it is also high-performance and very stable, even though it hasn't been officially released yet. This article tries to explain how it all works, in theory and in practice, together with some simple benchmarks.
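KVM ships as kernel modules plus a character device. A quick way to check whether it is loaded and usable on your system is to look for /dev/kvm (module names as in the 2.6.20-era tree):

```shell
#!/bin/sh
# Load the module matching your CPU first (as root):
#   modprobe kvm-intel   # Intel VT
#   modprobe kvm-amd     # AMD-V
# When KVM is loaded, it exposes the /dev/kvm character device,
# which the userspace side opens to create and run virtual machines:
if [ -c /dev/kvm ]; then
    echo "KVM ready: /dev/kvm is present"
else
    echo "KVM not available: /dev/kvm is missing"
fi
```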
A little bit of theory
There are several approaches to virtualization today. One of them is so-called paravirtualization, where the guest OS must be slightly modified in order to run virtualized. The other method is called "full virtualization", where the guest OS can run as it is, unmodified. It has been said that full virtualization trades performance for compatibility, because it's harder to achieve good performance without the guest OS assisting in the virtualization process. On the other hand, recent processor developments tend to narrow that gap. The latest processors from both Intel (VT) and AMD (AMD-V) have hardware support for virtualization, which tends to make paravirtualization unnecessary. This is exactly what KVM is all about: by adding virtualization capabilities to a standard Linux kernel, we can enjoy all the fine-tuning work that has gone (and is going) into the kernel, and bring that benefit into a virtualized environment.
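To see whether your own CPU offers this hardware assist, check the flags line in /proc/cpuinfo: "vmx" marks Intel VT and "svm" marks AMD-V. A minimal check:

```shell
#!/bin/sh
# "vmx" = Intel VT, "svm" = AMD-V, as exposed in the flags line
# of /proc/cpuinfo on Linux.
if grep -qE '^flags.*\b(vmx|svm)\b' /proc/cpuinfo 2>/dev/null; then
    echo "hardware virtualization: supported"
else
    echo "hardware virtualization: not detected"
fi
```

Note that some BIOSes ship with the feature disabled even when the CPU supports it, so you may also need to enable it in the BIOS setup.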
Seeing that development of the ext3 file system's successor has started, and that Andrew Morton has released an -mm patch containing the ext4 file system, I decided to run some simple benchmarks, even at this early stage of development.
Because the -mm patch also contains Hans Reiser's reiser4 file system, I decided to run the benchmarks against it too, for good measure. Let me remind you once again that both ext4 and reiser4 are still in development, while ext3 has been in production for many years, so take all the results below with a grain of salt.
PostgreSQL is a powerful, open source relational database system. It has more than 15 years of active development and a proven architecture that has earned it a strong reputation for reliability, data integrity, and correctness.
One of PostgreSQL's most sophisticated features is so-called Multi-Version Concurrency Control (MVCC), a standard technique for avoiding conflicts between reads and writes of the same object in a database. MVCC guarantees that each transaction sees a consistent view of the database by reading non-current data for objects modified by concurrent transactions. Thanks to MVCC, PostgreSQL offers great scalability, a robust hot backup tool, and many other nice features comparable to those of the most advanced commercial databases.
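As an illustration, MVCC means a reader never blocks behind an uncommitted writer: the reader is simply served the last committed version of the row. A sketch of two concurrent psql sessions (table and values are hypothetical):

```
-- Session B (writer)                -- Session A (reader)
BEGIN;
UPDATE accounts SET balance = 50
  WHERE id = 1;
-- not yet committed
                                     SELECT balance FROM accounts
                                       WHERE id = 1;
                                     -- returns the old value (e.g. 100)
                                     -- immediately, without blocking
COMMIT;
```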
One of the more interesting patches for the Linux kernel lately has been Wu Fengguang's adaptive readahead patchset, currently at version 12. Speaking about its performance benefits, Wu says: "besides file servers and desktops, it is recently found to benefit postgresql databases a lot."
So I decided to do a simple benchmark to see what difference adaptive readahead would make in my case. The idea was to run a very simple database query (a random select) against the PostgreSQL database and see how it performs over time (while memory is being primed with data from disk).
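A skeleton of such a benchmark loop might look like the following: repeat batches of the same random-select query and print the elapsed time per batch, so you can watch latency fall as the page cache warms up. The psql invocation (database and table names are made up) is commented out so the skeleton itself runs anywhere:

```shell
#!/bin/sh
# Time several batches of identical random-select queries. Early batches
# are dominated by disk reads; later ones are served from the page cache.
for batch in 1 2 3 4 5; do
    start=$(date +%s)
    # for i in $(seq 1 1000); do
    #     psql -d testdb -Atq -c \
    #       "SELECT * FROM accounts WHERE aid = (random()*100000)::int;" >/dev/null
    # done
    end=$(date +%s)
    echo "batch $batch took $((end - start)) s"
done
```

Dropping the page cache between runs (or rebooting) is what makes the "cold" and "warm" numbers comparable across kernels with and without the patch.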