Beyond core dumps: advanced bug detection and failure prediction

Student: Not assigned yet
Owner: Cristiano Giuffrida c.giuffrida@few.vu.nl
SVN branch name: N/A

Abstract

The Minix 3 reliability architecture offers proactive and reactive failure detection and recovery for device drivers. While it is often possible to recover from a transient failure with no significant service disruption, it is also desirable to detect and fix the originating bug as soon as possible. The traditional approach to collecting information about the nature of a failure at runtime is to produce a core dump of the target process after the crash. However, the information obtained this way is often insufficient to identify the source of the problem and reproduce the bug right away, especially in the common case of transient failures. The goal of this project is to leverage the existing reliability architecture to monitor the behavior of a buggy process and gather more insightful information about a failure in real time. The data collected over time shall then be processed to provide a more detailed picture of the failure, aid in diagnosing the originating bug, and improve the existing techniques used for proactive failure recovery.