Linux Kernel Dump Test Tool

Testing Linux kernel crash dump capturing capabilities


News

DTT hosted at Sourceforge

2005-11-08: The DTT project's home now resides at Sourceforge.

First release of DTT

2005-11-08: Several kernel crash dump capturing mechanisms have been available for Linux for some time now (diskdump, LKCD, mkdump, kdump, etc.), but having all these funky features does not imply that a crash dump can be reliably obtained under any conditions.

DTT (Dump Test Tool) is a test suite that evaluates the reliability of kernel crash dump capturing mechanisms for Linux by precisely recreating crash scenarios that take into account both the state of HW and the load conditions of the system.

Documentation

Overview

Introduction

The goal of DTT (Dump Test Tool) is to provide a reliable estimate of the success rate of crash dump capture and, when a dump is obtained, to evaluate its integrity.
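As a minimal sketch of what such an estimate could look like, the success rate over a series of induced crashes might be computed as shown below. The record layout, the function names and the choice of a Wilson score interval are assumptions made for illustration; they are not taken from DTT itself.

```python
import math
from dataclasses import dataclass


@dataclass
class Trial:
    """Outcome of one induced crash (hypothetical record layout)."""
    dump_captured: bool  # a dump was produced at all
    dump_intact: bool    # the dump passed an integrity check


def success_rate(trials):
    """Fraction of trials that produced a captured, intact dump."""
    if not trials:
        raise ValueError("no trials recorded")
    good = sum(1 for t in trials if t.dump_captured and t.dump_intact)
    return good / len(trials)


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)
```

With only a handful of trials the interval is wide, which is one reason to repeat each crash scenario many times before quoting a success rate.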

Shortcomings of current testing methods

Specifically, DTT is a set of user-space and kernel-space tools that force the system to die by artificially creating crash scenarios. Unlike existing test suites, however, DTT takes into account the state of HW and the load conditions of the system. When these factors are ignored, test coverage is reduced so badly that the results become meaningless. Thus, DTT tries to reproduce given HW and load conditions (ongoing DMA, specific I/O rates, etc.) before causing the system to crash.

Implementation

In DTT, system failures are induced at predefined "crash points", which are implemented as kernel hooks (a list of essential crash points is provided). In addition, the whole testing process can be controlled and dynamically reconfigured by means of the user-space tools provided in the DTT bundle.
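The idea of named crash points that can be armed and reconfigured at run time can be modelled in user space as follows. This is only an illustrative sketch: the real crash points are kernel hooks, and the class, the method names and the crash point names used here are hypothetical, not DTT's interface.

```python
class CrashTriggered(Exception):
    """Stands in for the real system crash in this user-space model."""


class CrashPoints:
    """Registry of named crash points that can be reconfigured at run time."""

    def __init__(self, names):
        # name -> remaining hits before firing, or None when disarmed
        self._countdown = {name: None for name in names}

    def arm(self, name, after_hits=1):
        """Arm a crash point so that it fires on its Nth hit."""
        if name not in self._countdown:
            raise KeyError(f"unknown crash point: {name}")
        if after_hits < 1:
            raise ValueError("after_hits must be >= 1")
        self._countdown[name] = after_hits

    def disarm(self, name):
        """Disarm a crash point; later hits become no-ops."""
        if name not in self._countdown:
            raise KeyError(f"unknown crash point: {name}")
        self._countdown[name] = None

    def hit(self, name):
        """Called from the instrumented code path (the 'hook')."""
        remaining = self._countdown[name]
        if remaining is None:
            return  # disarmed: fall through without crashing
        remaining -= 1
        if remaining == 0:
            self._countdown[name] = None
            raise CrashTriggered(name)
        self._countdown[name] = remaining
```

A tester would arm one point, for example `points.arm("io_submit", after_hits=2)` (a made-up name), and let the workload run until the instrumented path is hit for the second time.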

One of the main problems that arise when testing crash dumping solutions is that artfully inserting crash points in the kernel does not always suffice to recreate certain scenarios. Some execution paths are rarely trodden and the kernel has to be lured to take the right wrong way. Besides, the tester might want HW to be in a particular state: ongoing DMA and/or I/O, certain memory and CPU usage levels, etc. This is where the set of auxiliary tools included in DTT comes into play, inducing the desired additional conditions.
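One way to picture "desired additional conditions" is as a set of thresholds that must hold at the moment the crash is triggered. The sketch below is an assumption about how such a check could be expressed; the field names are hypothetical, and how the values would actually be measured or induced is outside its scope.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoadTarget:
    """Conditions wanted at crash time (hypothetical fields)."""
    min_cpu_percent: float = 0.0
    min_mem_percent: float = 0.0
    min_io_mb_per_s: float = 0.0
    dma_in_flight: bool = False  # True means DMA must be ongoing


def conditions_met(target, sample):
    """Return True when a measured sample satisfies every target.

    `sample` is a dict with the keys cpu_percent, mem_percent,
    io_mb_per_s and dma_in_flight.
    """
    return (
        sample["cpu_percent"] >= target.min_cpu_percent
        and sample["mem_percent"] >= target.min_mem_percent
        and sample["io_mb_per_s"] >= target.min_io_mb_per_s
        and (sample["dma_in_flight"] or not target.dma_in_flight)
    )
```

A test run would keep driving load until `conditions_met` returns True and only then let the system crash, so that the dump is taken under the state the tester asked for.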

Finally, the automation of the testing process is under way.

Conclusion

The critical role crash dumping solutions play in enterprise systems requires proper testing, which is something the current testing methods cannot achieve. Using DTT, I have found many deficiencies in mkdump, kdump and other similar projects, and my aim is to share this information with the whole community so that these projects can be improved and compared fairly.

Fernando Luis Vázquez Cao @ NTT Data Intellilink