February 21, 2023

Fuzz Testing

What Is Fuzz Testing?

Fuzz testing consists of exercising component methods using data that is more unusual than the hand-crafted cases test writers tend to provide, with a goal of exposing possible crashes, memory problems, and a variety of other ill behavior.

Fuzz testing may also be coverage-driven, with code compiled in a way that provides feedback to the test driver, such that fuzz data becomes tailored to exercising all code paths in the program.

We support fuzz testing using the clang compiler. Fuzz testing requires a specially written fuzz testing function to be present in the test driver, and is then requested by adding fuzz to the specified ufid. When fuzz testing, it is helpful to also specify a sanitizer option in the ufid, such as asan (the address sanitizer), so that more errors are detected.

Writing a Fuzz Test

A fuzz test is simply a function with C linkage and a special name, LLVMFuzzerTestOneInput. The fuzz testing system calls this function repeatedly, supplying different data each time, and the function is responsible for invoking the methods to be tested using this data. The fuzz testing library supplies its own custom main() to perform these calls, so a fuzz test cannot have its own main().

Within the BDE system, where we do want to co-locate a fuzz test within the ordinary test driver, we use a macro to rename main when building for fuzz testing. The build system defines BDE_ACTIVATE_FUZZ_TESTING in fuzz testing builds to enable this.

A fuzz test is expected to attempt to crash on the first failure detected. This might be a “natural” crash, perhaps because the program indirects through bad pointers, or a deliberate crash via an unhandled exception or a call to abort. In BDE fuzz tests, a deliberate crash is invoked through the assertion system, as seen below in the call to BSLS_ASSERT_INVOKE. The fuzz testing infrastructure intercepts such crash attempts, saves the problematic input, reports the failure, and exits.

A BDE test driver adapted for fuzz testing will include code similar to the following template, just before main(). The BDE code base has several components that have already been modified this way. Please see those, e.g., ball_patternutil.t.cpp, for complete examples.

A Fuzz Testing Template

The following is an empty example template for a fuzz testing function.

// ============================================================================
//                              FUZZ TESTING
// ----------------------------------------------------------------------------
//                              Overview
//                              --------
// The following function, 'LLVMFuzzerTestOneInput', is the entry point for the
// clang fuzz testing facility.  See {http://bburl/BDEFuzzTesting} for details
// on how to build and run with fuzz testing enabled.
//-----------------------------------------------------------------------------

#ifdef BDE_ACTIVATE_FUZZ_TESTING
#define main test_driver_main
#endif

extern "C"
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
    // Use the specified 'data' array of 'size' bytes as input to methods of
    // this component and return zero.
{
    const char *FUZZ   = reinterpret_cast<const char *>(data);
    int         LENGTH = static_cast<int>(size);
    int         test   = 0;

    if (LENGTH > 0) {
        // Use first fuzz byte to select the test case.
        test = (*FUZZ++ & 0xFF) % 100;
        --LENGTH;
    }

    switch (test) { case 0:  // Zero is always the leading case.
      case N: {
        // --------------------------------------------------------------------
        // TESTING 'myFunction'
        //
        // Plan:
        //   Describe how 'myFunction' will be fuzz tested.
        //
        // Testing:
        //   static void myFunction(arg1 value, ...);
        // --------------------------------------------------------------------

        // ...  Test myFunction using ASSERT or in other ways ...
      } break;
      // ... other cases ...
      default: {
      } break;
    }

    if (testStatus > 0) {
        BSLS_ASSERT_INVOKE("FUZZ TEST FAILURES");
    }

    return 0;
}

Generating Fuzz Test Inputs

In BDE testing methodology, there are often table-driven tests where the author has generated interesting test data by hand, and calls methods with that data, perhaps varying some other parameter along the way. It might look something like the following.

const char *DATA[] = {
    "Hello",
    "World!",
    "",
    "------------------------------------------------------------",
    "123 123 123 123 123",
};
size_t NUM_DATA = sizeof(DATA) / sizeof(*DATA);

const uint8_t LIMITS[] = { 0, 1, 2, 3, 11, 21, 255 };
size_t NUM_LIMITS = sizeof(LIMITS) / sizeof(*LIMITS);

for (size_t i = 0; i < NUM_DATA; ++i) {
    for (size_t j = 0; j < NUM_LIMITS; ++j) {
        int result = obj.method(DATA[i], strlen(DATA[i]), LIMITS[j]);
        ASSERTV(0 == result);
    }
}

In fuzz testing, we generally don’t want to do this. The intent of fuzz testing is to have “surprising” inputs, so we want to use the fuzz data as much as we can, in order to eliminate hidden assumptions in the test data that might prevent errors from being noticed. So, if we are writing a fuzz test with the intent of paralleling the normal test above, we might write it like this.

// ...
switch (test) {
  case 1: {
    uint8_t limit = 0;
    if (LENGTH > 0) {
        limit = *FUZZ++ & 0xFF;
        --LENGTH;
    }

    int result = obj.method(FUZZ, strnlen(FUZZ, LENGTH), limit);
    ASSERTV(0 == result);
  } break;
  // ...

Rather than keeping tables of strings and limits, we allow the fuzz data to supply both a limit and a string, and we only test a single input rather than looping through a set of cases. The fuzz testing infrastructure will do the looping for us, and it will come up with combinations of strings and limits that we might not see in the hand-written data, and that we might miss if we used the fuzz data only for the string but not for the limit.
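The carving-up of a single fuzz buffer into several parameters can be sketched as a small standalone function. Here checkWithinLimit stands in for the hypothetical obj.method above; the point is that both the limit and the string are derived from the same fuzz bytes:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical method under test: returns 0 when 'text' fits within 'limit'.
static int checkWithinLimit(const char *text, std::size_t len,
                            std::uint8_t limit)
{
    (void)text;
    return len <= limit ? 0 : 1;
}

// Split one fuzz buffer into both a 'limit' parameter and string data, so
// that the fuzzer controls every input dimension.
static int fuzzOneInput(const std::uint8_t *data, std::size_t size)
{
    const char *FUZZ   = reinterpret_cast<const char *>(data);
    std::size_t LENGTH = size;

    std::uint8_t limit = 0;
    if (LENGTH > 0) {                       // first byte supplies the limit
        limit = static_cast<std::uint8_t>(*FUZZ++);
        --LENGTH;
    }

    // Remaining bytes are the string; stop at an embedded NUL, as
    // 'strnlen' would, without assuming the buffer is NUL-terminated.
    std::size_t textLen = 0;
    while (textLen < LENGTH && FUZZ[textLen] != '\0') {
        ++textLen;
    }

    int rc = checkWithinLimit(FUZZ, textLen, limit);
    assert(rc == 0 || rc == 1);
    return 0;
}
```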

What Does a Fuzz Test Test?

Fuzz testing involves a variety of approaches depending on the nature of the methods to be tested. It is up to the author of the fuzz test to decide which approaches are appropriate for the tests being conducted. Given the fuzz test skeleton above, fuzz tests may include the usual invocations of ASSERTV and related test macros, and any failure will result in the test driver aborting and thus notifying the fuzz testing machinery that the supplied input has caused a failure.

  • Acceptance Testing Functions with Wide Contracts:

    Functions with wide contracts claim to accept any input. Thus, the fuzz test may simply invoke such methods with the supplied data. The purpose of such a test is to verify that the method does not crash or cause any detectable undefined behavior, but not to check that the function produces the correct result.

    obj.wideFun(FUZZ, LENGTH);
    
  • Acceptance Testing Functions with Narrow Contracts:

    Functions with narrow contracts claim to accept only a limited set of inputs.

    • Valid Input:

      The fuzz test may examine the supplied data and call the method to be tested only if the data falls within the contract. If the data is valid for the contract, the test again simply verifies that the method does not crash or cause detectable undefined behavior.

      if (LENGTH > 5 && FUZZ[0] == 'A' && FUZZ[1] == '(') {
          obj.narrowFun(FUZZ, LENGTH);
      }
      
    • Invalid Input:

      The fuzz test may choose to invoke methods with data that the narrow contract prohibits to determine whether such out-of-contract data is caught and handled by the method, especially when built in safe contract modes. Here, the test uses the ASSERT_SAFE_PASS/FAIL macros to verify that the called method detects out-of-contract data and calls the failure handler, or processes in-contract data and does not invoke the handler. If there is a crash or other detectable undefined behavior, that too will be caught in either case. Once again, we are not testing if the result of the method is correct.

      #ifdef BDE_BUILD_TARGET_EXC
      if (LENGTH > 5 && FUZZ[0] == 'A' && FUZZ[1] == '(') {
          bsls::AssertTestHandlerGuard g;
          ASSERT_SAFE_PASS(obj.narrowFun(FUZZ, LENGTH));
      }
      else {
          bsls::AssertTestHandlerGuard g;
          ASSERT_SAFE_FAIL(obj.narrowFun(FUZZ, LENGTH));
      }
      #endif
      

    For a more generalized and robust approach to fuzzing functions with preconditions, see Fuzzing Functions with Preconditions.

  • Comprehensive Correctness Testing:

    Within any of the above approaches related to contract scope, the fuzz test may also choose to verify not only that the called functions do not crash, but also that they correctly process their input. In this context, the value of correctness testing depends on the ability to provide an independently written “oracle” function that determines whether the input is correct and what the results of the method should be. This is not always feasible, since such determination (e.g., well-formedness of XML or JSON) may sometimes be as complex and prone to error as the component under test itself.

    bool allNumeric = true;
    for (int i = 0; allNumeric && i < LENGTH; ++i) {
        allNumeric = '0' <= FUZZ[i] && FUZZ[i] <= '9';
    }
    bool result = obj.checkAllNumeric(FUZZ, LENGTH);
    ASSERTV(allNumeric, result, allNumeric == result);
    

Generating Function Input from Fuzz Data

The two components bslim_fuzzdataview and bslim_fuzzutil can simplify the creation of function input from raw fuzz data. FuzzDataView provides a view to a non-modifiable buffer of fuzz data obtained from a fuzz testing harness such as LLVM’s libFuzzer. A FuzzDataView object is then passed as an argument to the functions of FuzzUtil, which create fundamental and standard library types from the fuzz data.

For example, imagine we are fuzzing a parser and want to use fuzz data to populate a configuration object:

switch (test) {
  case 1: {
    typedef bslim::FuzzUtil FuzzUtil;

    bslim::FuzzDataView fuzzData(
                        reinterpret_cast<const bsl::uint8_t *>(FUZZ), LENGTH);
    Options options;
    options.setMaxDepth(FuzzUtil::consumeNumberInRange<int>(&fuzzData, 1, 128));
    options.setSkipUnknownElements(FuzzUtil::consumeBool(&fuzzData));
    options.setValidateSchema(FuzzUtil::consumeBool(&fuzzData));

    Obj mX;
    mX.parse(fuzzData.data(), fuzzData.length(), options);
  } break;
  // ...

Additional fuzz utilities may be created at higher levels to simplify the process of creating higher level types. For instance, bdlt_fuzzutil builds upon bslim_fuzzdataview and bslim_fuzzutil to create date values for testing functions that accept dates as parameters.
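In the same spirit, a higher-level "consume" helper might map raw fuzz bytes onto a structurally valid date. The sketch below is hypothetical and standalone (the real bdlt_fuzzutil consumes from a bslim::FuzzDataView and produces bdlt::Date values); it only illustrates the pattern of consuming a fixed number of bytes and clamping them into valid ranges:

```cpp
#include <cassert>
#include <cstdint>

struct SimpleDate { int year; int month; int day; };  // stand-in for a date type

// Hypothetical helper in the spirit of 'bdlt_fuzzutil': consume up to four
// bytes from the front of the fuzz buffer and map them onto a valid date,
// advancing the buffer pointer past the consumed bytes.
static SimpleDate consumeDate(const std::uint8_t **data, std::size_t *size)
{
    SimpleDate d = { 1, 1, 1 };          // default when fuzz data runs out
    if (*size >= 4) {
        d.year  = 1 + (((*data)[0] << 8 | (*data)[1]) % 9999);  // 1..9999
        d.month = 1 + (*data)[2] % 12;                          // 1..12
        d.day   = 1 + (*data)[3] % 28;   // 1..28: valid in every month
        *data += 4;
        *size -= 4;
    }
    return d;
}
```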

Fuzzing Functions with Preconditions

When fuzzing a function with preconditions (i.e., a function with a “narrow contract”), naively supplying fuzz data as input will frequently invoke the function out of contract. This has two problems, the first theoretical and the second practical. Theoretically, calling a function out of contract has undefined behavior, so any “errors” the fuzzer reports for such input are unimportant; out-of-contract input is an uninteresting input space for the fuzzer to explore. Practically, functions with preconditions typically enforce them at run time, in appropriate build modes, using BSLS_ASSERT, which will catch these precondition violations, report them as errors, and cause the fuzz test to end prematurely. To enable functions whose preconditions are enforced with BSLS_ASSERT to be tested effectively, we have introduced a new component, bsls_preconditions, which provides the macros BSLS_PRECONDITIONS_BEGIN and BSLS_PRECONDITIONS_END. These macros allow a developer to demarcate the uses of BSLS_ASSERT that enforce preconditions for the function under test.

For example:

double mySqrt(double x)
    // Return the square root of the specified 'x'.  The behavior is
    // undefined unless 'x >= 0'.
{
    BSLS_PRECONDITIONS_BEGIN();
    BSLS_ASSERT(0 <= x);
    BSLS_PRECONDITIONS_END();
    return sqrt(x);
}

BSLS_PRECONDITIONS_BEGIN and BSLS_PRECONDITIONS_END are needed to demarcate the preconditions for the function being tested, because when we fuzz test a function, we want to ignore failures only from the preconditions for that function under test, but report errors from any other BSLS_ASSERT failures. We refer to preconditions that fail in the function under test as “top-level” preconditions. Applying BSLS_PRECONDITIONS_BEGIN and BSLS_PRECONDITIONS_END allows us to treat these top-level failures differently from any other failures that are detected. Notably, in most build modes – currently any non-fuzzing build – these two macros expand to nothing, adding no overhead to non-fuzz-related function invocations.
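The mechanism can be illustrated with a conceptual, standalone sketch. This is not the real bsls implementation; the MY_* macros, the depth counter, and handleViolation() are all invented here to show the idea: a violation raised while inside the outermost BEGIN/END region is a "top-level" precondition failure (ignorable out-of-contract fuzz input), while any other violation is a real error:

```cpp
#include <cassert>

// Conceptual sketch only (not the real 'bsls_preconditions' machinery):
// a nesting counter lets the violation handler tell top-level precondition
// failures, which the fuzz driver ignores, from any other failures, which
// must be reported as errors.
static int  preconditionDepth    = 0;     // ++ on BEGIN, -- on END
static bool sawTopLevelViolation = false;
static bool sawOtherViolation    = false;

#define MY_PRECONDITIONS_BEGIN() (++preconditionDepth)
#define MY_PRECONDITIONS_END()   (--preconditionDepth)

static void handleViolation()
{
    if (preconditionDepth == 1) {
        sawTopLevelViolation = true;  // ignore: out-of-contract fuzz input
    }
    else {
        sawOtherViolation = true;     // report: a deeper check failed
    }
}

#define MY_ASSERT(cond) ((cond) ? (void)0 : handleViolation())

static double mySqrtChecked(double x)
    // Stand-in for 'mySqrt' above; returns 'x' in place of the real work.
{
    MY_PRECONDITIONS_BEGIN();
    MY_ASSERT(0 <= x);                // top-level when called directly
    MY_PRECONDITIONS_END();
    return x;
}
```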

The function under test must be invoked with BSLS_FUZZTEST_EVALUATE, found in bsls_fuzztest.h. BSLS_FUZZTEST_EVALUATE identifies the function being tested to the BSLS_PRECONDITIONS_* macros in order to differentiate between BSLS_ASSERT failures for top-level preconditions, which should be ignored, and other failures, which should be reported as errors.

Prior to invoking BSLS_FUZZTEST_EVALUATE we must create a FuzzTestHandlerGuard like so:

bsls::FuzzTestHandlerGuard hg;

BSLS_FUZZTEST_EVALUATE(mySqrt(input));

If a function in one component delegates its implementation and precondition checks to a different component, we want to ignore any top-level precondition failures even though they are generated by this other component. To ignore top-level precondition failures originating in another component, we use BSLS_FUZZTEST_EVALUATE_RAW; however, like the non-RAW version, this ignores only top-level precondition failures.

Building and Running Fuzz Tests

BDE libraries and test drivers can be built and linked to enable fuzz testing using clang compilers. It is best to use the most recent version of the compiler available, as the fuzz testing system is frequently updated.

When using the cmake system to build fuzz tests, the test drivers should be built, but not automatically run. The main() routine supplied by the fuzz testing library takes different arguments than the normal test driver arguments.

When the executable is run, the main() function in the fuzz testing library will repeatedly invoke LLVMFuzzerTestOneInput with a variety of data. Once the program detects an error and aborts, the clang fuzz testing machinery will save the supplied data that caused the crash in a file named crash-... for further examination. The fuzz test may choose to print out verbose testing information, but note that the normal command-line arguments that control verbosity do not work due to the custom main(), and the default fuzz testing output is itself quite verbose.

First, set up the build environment. In this example, we are requesting a 64-bit fuzz testing build with address sanitizer included, and that version 13 of the clang compiler be used. We request safe mode to enable all of the contract assertions, and optimization in the hope of exposing more possible bad behavior.

$ eval `bbs_build_env -u opt_dbg_safe_64_asan_fuzz_cpp17 -p clang-13`

Then configure and build the fuzz test.

$ bbs_build configure build --targets=ball_patternutil.t --tests=build

Finally, run the fuzz test. When not invoked with command-line arguments, a fuzz testing test driver will run forever or until it crashes. There are a variety of arguments that control the behavior of the test driver, described here. In particular, the argument -max_total_time=N will limit the running time to N seconds, and -help=1 will display all available options.

$ ./_build/*/ball_patternutil.t -max_total_time=120

If a fuzz test stops due to hitting a specified limit, it exits with a normal status (0). If it stops due to a detected error causing a crash, it exits with a failed status (1). Thus, for automated testing, the test can be run with its output redirected to a discarding device and a time limit specified, checking the exit status once it’s done.

Fuzz testing may also be run incrementally, with initial inputs specified. If the test driver is supplied with one or more directories on the command line, it treats files in those directories as the initial input corpus for fuzz testing, and will mutate those inputs to derive further test cases, writing interesting ones back to the first directory. Providing such a set of initial inputs can be useful when correct input is highly structured, such that the fuzz testing procedure may take a long time to find its way there if left unguided. (Although in that case, we suggest that a better, or at least alternate, option is to write test cases that generate structured input using the fuzz data as a base.) The corpus directory may start off empty, in which case fuzz testing will generate and save its data from scratch.

Interpreting Fuzz Test Results

For comprehensive details on the output produced by fuzz testing, see the documentation here.

The fuzz tester writes output describing what it’s doing as it does it, which is generally not useful or interesting. On failure (that is, when the test machinery intercepts an attempt to crash), depending on the nature of the crash and the sanitizers that are built into the program, the fuzz test will write additional output to the standard error channel describing what it believes to be the problem, and whatever data it can provide as to its location. It will write the fuzz data that caused the problem to a file named crash-....

Here is some sample output for a one-line fuzz test that treats the fuzz data as a pointer and tries to indirect it, which causes an immediate failure.

extern "C" int LLVMFuzzerTestOneInput(int **f) { return **f == 0; }
INFO: Seed: 1428378131
INFO: Loaded 1 modules   (1 inline 8-bit counters): 1 [0x78d128, 0x78d129),
INFO: Loaded 1 PC tables (1 PCs): 1 [0x560bc0,0x560bd0),
INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
=================================================================
==194626==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000000050 at pc 0x000000539e25 bp 0x7ffcae0dc970 sp 0x7ffcae0dc968
READ of size 8 at 0x602000000050 thread T0
    #0 0x539e24  (./ft.t+0x539e24)
    #1 0x440131  (./ft.t+0x440131)
    #2 0x446c91  (./ft.t+0x446c91)
    #3 0x448936  (./ft.t+0x448936)
    #4 0x4309d5  (./ft.t+0x4309d5)
    #5 0x41f4c2  (./ft.t+0x41f4c2)
    #6 0x3dcc01ed1c  (/lib64/libc.so.6+0x3dcc01ed1c)
    #7 0x41f574  (./ft.t+0x41f574)

0x602000000051 is located 0 bytes to the right of 1-byte region [0x602000000050,0x602000000051)
allocated by thread T0 here:
    #0 0x5366b8  (./ft.t+0x5366b8)
    #1 0x44003b  (./ft.t+0x44003b)
    #2 0x446c91  (./ft.t+0x446c91)
    #3 0x448936  (./ft.t+0x448936)
    #4 0x4309d5  (./ft.t+0x4309d5)
    #5 0x41f4c2  (./ft.t+0x41f4c2)
    #6 0x3dcc01ed1c  (/lib64/libc.so.6+0x3dcc01ed1c)

SUMMARY: AddressSanitizer: heap-buffer-overflow (./ft.t+0x539e24)
Shadow bytes around the buggy address:
  0x0c047fff7fb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c047fff7fc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c047fff7fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c047fff7fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c047fff7ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0c047fff8000: fa fa 00 fa fa fa 00 fa fa fa[01]fa fa fa fa fa
  0x0c047fff8010: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff8020: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff8030: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff8040: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff8050: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==194626==ABORTING
MS: 0 ; base unit: 0000000000000000000000000000000000000000


artifact_prefix='./'; Test unit written to ./crash-da39a3ee5e6b4b0d3255bfef95601890afd80709
Base64:

Debugging Failed Fuzz Tests

Generally speaking, once a problem is detected, testing needs to fall back to ordinary debugging; fuzz testing tells you that a problem exists with a specified input, and it is then up to you to locate the problem. Depending on the nature of the problem, there may be output from the test program that will provide clues. In the sample output above, we see that a memory overflow has been detected, and the program provides stack traces for where the memory was allocated, where the overflow happened, and the contents of memory around the problematic area. Near the end, we see that the test program has written the bad input to a file named crash-da39a3ee5e6b4b0d3255bfef95601890afd80709.

The test program can be rerun supplying that file as a command-line argument. When this is done, only the contents of that file are supplied as input data to the fuzz testing subroutine, making it easy to repeat the failure.

The sanitizer infrastructure provides some support for debugging; see, for example, AddressSanitizerAndDebugger. There is a well-known program location, __sanitizer::Die, that is called after the program prints its report and before it exits; setting a breakpoint there allows for tracing back to where the error occurred. A debugging session for the above failure might begin as follows:

$ gdb ./ft.t
(gdb) break __sanitizer::Die
(gdb) run crash-da39a3ee5e6b4b0d3255bfef95601890afd80709
...
Thread 1 "ft.t" hit Breakpoint 1, __sanitizer::Die ()
...
(gdb) where
...
#4  0x0000000000539e25 in LLVMFuzzerTestOneInput (f=0x7fffffffc830)
at ft.t.cpp:1
...