2005-11-08: The DTT project's home now resides at SourceForge.
2005-11-08: Several kernel crash dump capturing mechanisms have been available for Linux for some time now (diskdump, LKCD, mkdump, kdump, etc.), but having all these funky features does not imply that a crash dump can be reliably obtained under any conditions.
DTT (Dump Test Tool) is a test suite that evaluates the reliability of kernel crash dump capturing mechanisms for Linux by precisely recreating crash scenarios that take into account both the state of HW and the load conditions of the system.
The goal of DTT is to provide a reliable estimate of the success rate of capturing crash dumps and, if a dump was obtained, to evaluate its integrity.
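As a rough illustration of what such an estimate could look like, the sketch below summarizes a batch of test runs. It is not taken from DTT: the TrialResult record, its field names and the summarize helper are hypothetical, and the Wilson score interval is simply one common way to attach an uncertainty range to a success rate measured over a small number of trials.

```python
import math
from dataclasses import dataclass


@dataclass
class TrialResult:
    """One crash test run (hypothetical record, not a DTT data structure)."""
    crash_point: str     # label of the crash scenario that was exercised
    dump_captured: bool  # did the dump mechanism produce a dump at all?
    dump_valid: bool     # did the captured dump pass an integrity check?


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if trials == 0:
        return (0.0, 1.0)  # no data: the rate could be anything
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


def summarize(results):
    """Report the capture rate and, among captured dumps, the integrity rate."""
    trials = len(results)
    captured = sum(1 for r in results if r.dump_captured)
    valid = sum(1 for r in results if r.dump_captured and r.dump_valid)
    return {
        "trials": trials,
        "captured": captured,
        "valid": valid,
        "capture_rate": captured / trials if trials else 0.0,
        "capture_interval": wilson_interval(captured, trials),
        "valid_rate": valid / captured if captured else 0.0,
    }
```

With only a handful of runs per scenario the interval is wide, which is a reminder that a single successful dump says little about reliability.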
Specifically, DTT is a set of user-space and kernel-space tools that force the system to die by artificially creating crash scenarios. But unlike existing test suites, DTT takes into account the state of HW and the load conditions of the system. When these factors are ignored, test coverage is reduced so badly that the results become meaningless. Thus, DTT tries to reproduce given HW and load conditions (ongoing DMA, specific I/O rates, etc.) before causing the system to crash.
In DTT, system failures are induced at predefined "crash points", which are implemented as kernel hooks (a list of essential crash points is provided). In addition, the whole testing process can be controlled and dynamically reconfigured by means of the user-space tools provided in the DTT bundle.
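The idea of named crash points that can be armed and disarmed at run time can be modelled in a few lines of user-space code. The sketch below is only a conceptual model, not DTT's implementation: the CrashPointRegistry class, the CrashPointTriggered exception and the crash point name used in the example are all hypothetical, and a real kernel hook would bring the system down where this model merely raises an exception.

```python
class CrashPointTriggered(Exception):
    """Stands in for the induced system failure in this user-space model."""


class CrashPointRegistry:
    """Hypothetical registry of crash points that a control tool can reconfigure."""

    def __init__(self):
        # Maps crash point name -> number of hits remaining before it fires.
        self._armed = {}

    def arm(self, name, after_hits=1):
        """Arm a crash point so that it fires on its after_hits-th hit."""
        if after_hits < 1:
            raise ValueError("after_hits must be at least 1")
        self._armed[name] = after_hits

    def disarm(self, name):
        """Disarm a crash point; disarming an unarmed point is a no-op."""
        self._armed.pop(name, None)

    def is_armed(self, name):
        return name in self._armed

    def hit(self, name):
        """Called from the instrumented code path each time it is reached."""
        remaining = self._armed.get(name)
        if remaining is None:
            return  # not armed: the hook costs one dictionary lookup
        remaining -= 1
        if remaining > 0:
            self._armed[name] = remaining
            return
        del self._armed[name]
        raise CrashPointTriggered(name)


if __name__ == "__main__":
    registry = CrashPointRegistry()
    registry.arm("example_irq_path", after_hits=3)  # hypothetical name
    for attempt in range(1, 4):
        try:
            registry.hit("example_irq_path")
            print(f"hit {attempt}: survived")
        except CrashPointTriggered as exc:
            print(f"hit {attempt}: crash induced at {exc}")
```

Delaying the failure until the n-th hit is one simple way to let the system reach a given load state before the crash is induced.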
One of the main problems that arises when testing crash dumping solutions is that artfully inserting crash points in the kernel does not always suffice to recreate certain scenarios. Some execution paths are rarely trodden and the kernel has to be lured to take the right wrong way. Besides, the tester might want HW to be in a particular state: ongoing DMA and/or I/O, certain memory and CPU usage levels, etc. This is when the set of auxiliary tools included in DTT comes into play to induce the desired additional conditions.
Finally, the automation of the testing process is under way.
The critical role crash dumping solutions play in enterprise systems requires proper testing, which is something the current testing methods cannot achieve. Using DTT, I have found many deficiencies in mkdump, kdump and other similar projects, and my purpose is to share this information with the whole community so that these projects can be improved and compared fairly.