Fault Injection

Student: Anton Kuijsten
Mentors: Cristiano Giuffrida, Arun Thomas
git repository: minix-fault

Abstract

The goal of this project is to implement a new compiler-based fault injection tool which can be used for reliability testing on MINIX 3. This should improve on the existing Software Implemented Fault Injection (SWIFI) tool included in the MINIX distribution, which is based on tracing and can only be used for a limited number of OS components. The tool will be implemented as an LLVM transformation pass. At link time, it will be able to inject various fault types, each with its own probability. Fault injection can be limited to a selection of functions. At run time, a probability can be set to manage global fault occurrence, and statistics on fault occurrences can be dumped.

Current Status

Done | Item | Percentage Complete | Comments
     | basic fault injection functionality (midterm) | 0% |
     | added statistics and dynamically adjustable fault occurrence probabilities | 0% |

Status Reports

Week 1 (30-05)

Loading of a pass in llvm-2.9 works, when installed from pkgsrc. Binary packages for llvm-2.9 and llvm-3.1 can't load passes dynamically, probably because they're not built with dynamic library support. Also, llvm-3.1 from pkgsrc produces an error during the build process. A simple hello world pass can be compiled in isolation from the llvm source code. Only the llvm headers need to be available. In the future, it might be a good idea to include these in the installation of a binary llvm package.

I'll look into a fast way to integrate pass usage into the buildsystem (compiler driver or changes to makefiles). Later this summer, the llvm gold linker might become available on Minix, after which a better and more permanent solution can be applied.

Week 2 - 3 (10-06)

Llvm pass loading is integrated into the buildsystem of the minix-fault git repository. A compiler driver from another llvm project is added to the source tree in commands/llvmdrv. Passes are added to lib/libllvm. Make rules are added to share/mk/minix.llvm.mk.

A basic block cloning function is added in the fault injector pass. The cloned blocks are not yet reachable (there is no branch instruction from the first basic block to the first cloned block), but branch instructions between basic blocks seem to be mapped correctly. This pass is not tested much, but the llvm assembly output looks good at first glance.

By setting environment variables when calling make, the passes can be activated. LLVM_CONFIG=TEST for the hello world pass, LLVM_CONFIG=FAULT for the fault injector pass, and LLVM_DEBUG=info for more information on the commands that are executed by the llvm compiler driver (llvmdrv).

For compiling the passes, you need to install clang 2.9 from pkgsrc (not from pkgin). Also, the llvm header files from pkgsrc need to be copied to the source tree. See lib/libllvm/README for instructions.

As of now, some subdirectories of /usr/src don't build correctly when a pass is used. Therefore, only build servers/, drivers/ and lib/ with LLVM_CONFIG=[…]. Also, servers/inet, servers/vm, drivers/acpi and drivers/random get build errors that are not yet solved. Use these commands to build and install the libraries and the working servers and drivers with the hello world pass:

The bytecode input and output from a pass can be converted into readable llvm assembly with:

llvm-dis /usr/src/servers/vfs/main.bcc -o /usr/src/servers/vfs/main.bcc.ll && llvm-dis /usr/src/servers/vfs/main.BCC -o /usr/src/servers/vfs/main.BCC.ll

Week 4

The pass is now better integrated into the build system. Now, everything can be built with:

If lib/, servers/, and drivers/ were already built, they have to be cleaned first. Otherwise, they will be skipped, because the targets seem to be up to date.

Week 5

A library is added to lib/libllvm/faultlib (the other subdirectories of lib/libllvm are llvm passes). This library is linked into the instrumented binaries of services. It contains code and a variable to disable/enable fault execution, and (in the future) to print out statistics.

Also, a command line tool is added to commands/faultinjector. The tool can be used to disable/enable fault execution, and to run a test function in faultlib.

A command line argument is added to the pass, so that a comma separated list of functions that have to be instrumented can be specified.
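The matching of a function name against such a comma separated list could look roughly like the sketch below. It is written in plain C for illustration only and is not the code of the pass itself; the helper name function_selected is hypothetical.

```c
#include <string.h>

/*
 * Hypothetical helper: return 1 if `name` appears as a complete entry in
 * the comma separated `list`, and 0 otherwise. Partial matches (for
 * example "dev" against "dev_io") are rejected.
 */
int function_selected(const char *list, const char *name)
{
    size_t name_len = strlen(name);
    const char *entry = list;

    if (name_len == 0)
        return 0;

    while (*entry != '\0') {
        const char *comma = strchr(entry, ',');
        size_t entry_len = comma ? (size_t)(comma - entry) : strlen(entry);

        if (entry_len == name_len && strncmp(entry, name, name_len) == 0)
            return 1;
        if (comma == NULL)
            break;              /* last entry has been checked */
        entry = comma + 1;      /* continue after the comma */
    }
    return 0;
}
```

With the list do_link,dev_io,map_driver, a lookup of dev_io would select the function, while a lookup of dev would not.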

Currently, no fault is injected. Instead, a printf("cloned\n") statement is added to each cloned basic block.

To test all new features:

Week 6

All calls to control fault execution are now routed through PM, because the userspace tool can't send messages directly to e.g. MFS.

Also, the command line tool now accepts service names (labels) instead of endpoint numbers. Usage is now: faultinjector <label> (on|off|test)
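As an illustration of the usage line above, the action argument could be parsed along the following lines. This is only a sketch: the enum and function names are hypothetical, and the message that the real tool sends through PM is not shown.

```c
#include <string.h>

/* Hypothetical action codes for the (on|off|test) argument. */
enum fi_action {
    FI_INVALID = -1,
    FI_OFF = 0,
    FI_ON = 1,
    FI_TEST = 2
};

/* Map the textual action argument to an action code. */
enum fi_action parse_action(const char *arg)
{
    if (arg == NULL)
        return FI_INVALID;
    if (strcmp(arg, "on") == 0)
        return FI_ON;
    if (strcmp(arg, "off") == 0)
        return FI_OFF;
    if (strcmp(arg, "test") == 0)
        return FI_TEST;
    return FI_INVALID;
}

/*
 * Validate "faultinjector <label> (on|off|test)".
 * Returns 0 and fills in label/action on success, -1 on a usage error.
 */
int parse_request(int argc, char *argv[],
                  const char **label, enum fi_action *action)
{
    if (argc != 3 || argv[1][0] == '\0')
        return -1;
    *label = argv[1];
    *action = parse_action(argv[2]);
    return (*action == FI_INVALID) ? -1 : 0;
}
```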

Week 8

We will start with implementing the fault types from the swifi tools (overview will follow).

Swifi tries to emulate programming errors by binary rewriting and modifying data. The injected faults often resemble errors in assembly code, such as forgetting to load data into a register. This should emulate C source errors, such as initializing function arguments.

Faults that are injected by llvm will resemble C source errors more directly. For example, we will be able to simply modify function arguments, instead of removing register load instructions.

Week 9

Working on instrumentation with different fault types.

Week 10-11

Break to finish thesis and go to conference

Design

The pass will be executed at link time. The pass can be configured with the set of functions that have to be instrumented (by name). For each selected function, the instructions are duplicated, and faults are injected into the duplicate set of instructions. At the entry point of the function, a single global probability decides whether the original or fault-injected set of instructions is executed. This probability is initialized to 0, and can be changed at run time. Furthermore, the fault type can be configured with a set of probabilities. These probabilities determine the link time injection of each fault type for each individual instruction.

Instructions are duplicated by cloning all basic blocks in a function, and the branch instructions connecting those blocks. A new first basic block is created, which has to decide if execution is branched to the original first basic block, or the fault injected first basic block, based on probability. This decision can be made by a C function in the system library, which the pass can inline into the first basic block.
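At the C level, the effect of the new first basic block can be pictured as in the sketch below. All names (fault_probability, fault_set_probability, fault_should_execute, add_instrumented) are hypothetical, and the use of rand() as the random source is an assumption; the pass itself works on LLVM basic blocks rather than on C source.

```c
#include <stdlib.h>

/* Global probability; initialized to 0 so that faults never run by default. */
static double fault_probability = 0.0;

/* Change the global probability at run time; values are clamped to [0, 1]. */
void fault_set_probability(double p)
{
    if (p < 0.0)
        p = 0.0;
    if (p > 1.0)
        p = 1.0;
    fault_probability = p;
}

/* Return nonzero when the fault-injected clone should be executed. */
int fault_should_execute(void)
{
    /* rand() / (RAND_MAX + 1.0) lies in [0, 1). */
    double r = rand() / ((double)RAND_MAX + 1.0);
    return r < fault_probability;
}

/* Conceptual view of one instrumented function. */
int add_instrumented(int a, int b)
{
    if (fault_should_execute())
        return a - b;   /* cloned blocks, here with a changed operator */
    return a + b;       /* original blocks */
}
```

Because the random value is always below 1, a probability of 1 always selects the fault-injected blocks, and a probability of 0 never does.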

A source file will be added to the system library. This file contains the probability variable and function described above. Also, it contains functions that implement the system calls that change the probability variable and print out statistics.
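The statistics part of such a source file might be organized as in the following sketch. The enum, the counter array and the function names are assumptions made for illustration; the system call plumbing that would invoke them is left out.

```c
#include <stdio.h>

/* Hypothetical identifiers for the fault types listed in this proposal. */
enum fault_type {
    FAULT_SWAP_OPERANDS,
    FAULT_CHANGE_OPERATOR,
    FAULT_SWAP_BRANCHES,
    FAULT_NTYPES
};

static const char *fault_names[FAULT_NTYPES] = {
    "swap_operands",
    "change_operator",
    "swap_branches"
};

/* One counter per fault type, incremented when a fault is executed. */
static unsigned long fault_counts[FAULT_NTYPES];

/* Record one executed fault of the given type; invalid types are ignored. */
void fault_record(enum fault_type t)
{
    if ((unsigned)t < (unsigned)FAULT_NTYPES)
        fault_counts[t]++;
}

/* Print all counters to `out` and return the total number of faults. */
unsigned long fault_print_stats(FILE *out)
{
    unsigned long total = 0;
    int i;

    for (i = 0; i < FAULT_NTYPES; i++) {
        fprintf(out, "%s: %lu\n", fault_names[i], fault_counts[i]);
        total += fault_counts[i];
    }
    return total;
}
```

Calling fault_print_stats(stdout) from the service would match the idea of printing statistics to standard output.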

Supported fault types

- swapping operands
- changing operators in assignments and branching conditions
- swapping branches
- Other faults from the swifi tool and literature. These will be determined together with the mentor after a study.
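To make the first three fault types concrete, the sketch below shows a small function next to hand-written variants that mimic what each fault would do to it. The function names are hypothetical, and the real pass would apply these changes to LLVM instructions, not to C source.

```c
/* Original code: the difference a - b, clamped to 0 from below. */
int clamp_sub(int a, int b)
{
    int d = a - b;
    if (d > 0)
        return d;
    else
        return 0;
}

/* Swapped operands: a - b becomes b - a. */
int clamp_sub_swapped_operands(int a, int b)
{
    int d = b - a;
    if (d > 0)
        return d;
    else
        return 0;
}

/* Changed operator in an assignment: '-' becomes '+'. */
int clamp_sub_changed_operator(int a, int b)
{
    int d = a + b;
    if (d > 0)
        return d;
    else
        return 0;
}

/* Swapped branches: the two targets of the condition are exchanged. */
int clamp_sub_swapped_branches(int a, int b)
{
    int d = a - b;
    if (d > 0)
        return 0;
    else
        return d;
}
```

For the arguments 3 and 5, the four functions return 0, 2, 8 and -2 respectively, so each fault is visible in the result.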

User-interface

Compilation will be configured and executed by calling a script. This is the most convenient way to pass configuration parameters. Probability changes are passed to a running service with a new system call and user tool. The output of statistics can be triggered by a signal or a system call, and will be printed to standard output by the service.

Minix integration

Dynamic library support will probably be working for Clang when GSoC starts. That will be needed to load the link-time pass as a module. Clang does not have good support for applying passes with a simple command line argument. Therefore, we may have to modify the build system makefiles to apply passes between compiler commands. If that proves to be too complex, we can use a script that acts as a compiler driver, so that the makefiles don't need many changes. Alternatively, a patch is available that gives Clang the ability to load passes with command line arguments (the downside is that the patch will have to be maintained in the minix clang package).

This is an example of how the build script would be called to inject faults into vfs, for the functions do_link, dev_io and map_driver, with an operand swapping probability of 0.2, a branch swapping probability of 0.6, and an unspecified operator swapping probability which will be set to a default value:

build-fault-injection.sh servers/vfs -functions do_link,dev_io,map_driver -p 0.4 -p_swap_operands 0.2 -p_swap_branches 0.6

Testing

Most of the faults can be tested by setting the fault probability to 1, and printing debug output. For example, the printed outcome of a mathematical operation can indicate swapped operands. Debug output that shows which branches are executed can indicate swapped branches. Fault probability can be tested by calling a function many times in a loop, and checking whether the actual fault ratio approaches the configured probability. Different test scenarios can be implemented in the hello driver.
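The probability test described above could be sketched as follows. The function name measure_fault_ratio is hypothetical, and rand() stands in for whatever random source the fault library ends up using.

```c
#include <stdlib.h>

/*
 * Run the fault decision `iterations` times with probability `p` and
 * return the observed ratio of fault executions.
 */
double measure_fault_ratio(double p, long iterations)
{
    long faults = 0;
    long i;

    if (iterations <= 0)
        return 0.0;

    for (i = 0; i < iterations; i++) {
        /* Uniform value in [0, 1); a fault is executed when r < p. */
        double r = rand() / ((double)RAND_MAX + 1.0);
        if (r < p)
            faults++;
    }
    return (double)faults / (double)iterations;
}
```

With a large number of iterations, the returned ratio should lie close to p. A test can accept the result when it falls within a small tolerance around the configured probability.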

Schedule

Pre-Coding Period (Apr 23 - May 20)

Week 1 (May 21 - May 27)

Get a dummy pass working on Clang in Minix. Get the build system to use the pass, possibly by using a compiler driver script, or a patch to Clang.

Week 2 (May 28 - Jun 3)

Study basic block cloning. All basic blocks of a function have to be cloned, so that each basic block gets a fault-injected clone. At the start of the function, it is decided whether the original or the fault-injected set of basic blocks is executed.

Week 3 (Jun 4 - Jun 10)

Implement basic block cloning.

Week 4 (Jun 11 - Jun 17)

Study the fault types that are supported by the swifi tool, and look into the literature. With the mentor, determine the set of fault types that will be implemented.

Week 5 (Jun 18 - Jun 24)

Study the fault types that are supported by the swifi tool, and look into the literature. With the mentor, determine the set of fault types that will be implemented.

Week 6 (Jun 25 - Jul 1)

Implement instrumentation for fault types.

Week 7 (Jul 2 - Jul 8) - MIDTERM

Implement instrumentation for fault types. Not all fault types are supported yet, but basic fault injection with a few fault types is working.

Week 8 (Jul 9 - Jul 15)

Implement instrumentation for fault types.

Week 9 (Jul 16 - Jul 22)

Add a pass option to select which functions to instrument.

Week 10 (Jul 23 - Jul 29)

Implement compile-time adjustment of probabilities for each fault type per basic block.

Week 11 (Jul 30 - Aug 5)

Implement fault counters to dump statistics and dynamically adjust probability of execution for each fault type.

Week 12 (Aug 6 - Aug 12)

Implement fault counters to dump statistics and dynamically adjust probability of execution for each fault type.

Week 13 (Aug 13 - Aug 19) - FINAL

Finish everything. Submit to Google.

Resources