Evaluating Linux Kernel Crash Dumping Mechanisms

Several kernel crash dump capturing solutions have been available for Linux for some time, and one of them, kdump, has even made it into the mainline kernel.

But merely having such a feature does not necessarily imply that a dump can be obtained reliably under all conditions. The LKDTT (Linux Kernel Dump Test Tool) project was created to evaluate crash dumping mechanisms in terms of success rate, accuracy and completeness.

A major goal of LKDTT is maximizing the coverage of the tests. For this purpose, LKDTT forces the system to crash by artificially recreating crash scenarios (panic, hang, exception, stack overflow, etc.), taking into account the hardware conditions (such as ongoing DMA or interrupt state) and the load of the system, the latter being key for the significance and reproducibility of the tests.

Using LKDTT, the author was able to establish the superior reliability of the kexec-based approach to crash dumping, although several deficiencies in kdump were revealed too. Since the final goal is to have the best crash dumping mechanism possible, this paper also addresses how these problems were identified and solved. Finally, possible applications of kdump beyond crash dumping are introduced.

...
