Can you handle the pressure? Making Linux bulletproof under load

Operating under memory pressure has been a persistent problem for Linux customers. Despite significant work done in the 2.6 kernel to improve its handling of memory, it is still easy to make the Linux kernel slow to a crawl or lock up completely under load.

One of the fundamental sources of memory pressure is filesystem pagecache usage, along with the buffer_head entries that control those pages. Another problem area is the inode and dentry cache entries in the slab cache. Linux struggles to keep either of these under control. Userspace processes are another obvious source of memory usage. They are partially handled by the OOM killer subsystem, which has often been accused of making poor decisions about which process to kill.

This paper takes a closer look at various scenarios that cause memory pressure, the way the VM currently handles them, and what we have done to keep the system from falling apart. It also discusses the future work needed for further improvement, which may require careful redesign of subsystems.

This paper will try to describe the basics of memory reclaim in a way that is comprehensible. To achieve that, some minor details have been glossed over; for the full gore, see the code. The intent is to start with an overview that gives the reader some hope of understanding the basic concepts and precepts.

As with any complex system, it is critical to have a broad, high-level overview of how the system works before attempting to change anything within it. Hopefully this paper will provide that skeleton understanding and allow readers to proceed to the code details themselves. This paper covers Linux 2.6.11.

...
