C++ property-based testing

There are several C++ test frameworks for property-based testing, and a clear winner has yet to emerge. Here we will be using the RapidCheck framework, which comes pre-installed in your user sessions on the practicals platform.

To demonstrate how RapidCheck works, we will reimplement a few classic linear algebra primitives using std::vector<float>, hopefully gaining along the way some appreciation of how much work is saved by using a linear algebra library like Armadillo instead of rolling our own code.

A first example

Let’s start with a simple vector sum implementation:

#include <cstddef>  // for size_t
#include <vector>

std::vector<float> vector_sum(const std::vector<float>& vec,
                              const std::vector<float>& vec2) {
    std::vector<float> output;
    for (size_t i = 0; i < vec.size(); ++i) {
        output.push_back(vec.at(i) + vec2.at(i));
    }
    return output;
}

We can write a property-based test for it using RapidCheck as follows:

#include <rapidcheck.h>

int main() {
    rc::check(
        "vector_sum sums vectors",
        [](const std::vector<float>& vec, const std::vector<float>& vec2) {
            std::vector<float> result = vector_sum(vec, vec2);
            for (size_t i = 0; i < result.size(); ++i) {
                RC_ASSERT(result[i] == vec[i] + vec2[i]);
            }
        }
    );
}
If the syntax used for the second parameter of the rc::check function surprises you, click this text to learn more about it.

This is a lambda expression, a shorthand for quickly defining a function on the spot, available since the 2011 version of the C++ standard (C++11). In this particular case, you could have achieved the same result by defining a separate function like this…

void vector_sum_sums_vectors(const std::vector<float>& vec,
                             const std::vector<float>& vec2) {
    std::vector<float> result = vector_sum(vec, vec2);
    for (size_t i = 0; i < result.size(); ++i) {
        RC_ASSERT(result[i] == vec[i] + vec2[i]);
    }
}

int main() {
    rc::check("vector_sum sums vectors", vector_sum_sums_vectors);
}

…but when using frameworks like RapidCheck, which rely on you defining a lot of single-use functions, it is nicer to be able to define them right at the only point where they are used, rather than produce spaghetti code where the reader must constantly jump from one place in the code to another in order to understand what’s going on.

In any case, if we compile and run the test defined above, it fails:

$ make -j$(nproc) && ./sum.bin
Using configuration: seed=15308558600823943072

- vector_sum sums vectors
Falsifiable after 5 tests and 6 shrinks

std::tuple<std::vector<float>, std::vector<float>>:
([0], [])

Exception thrown with message:
vector::_M_range_check: __n (which is 0) >= this->size() (which is 0)
Some of your RapidCheck properties had failures. To reproduce these, run with:
RC_PARAMS="reproduce=BchdlNGdvJ3XzVXbgMXdtNHI2V2Y09mczB6rcFH8sLH1g+KXxBP7yRNovyV

What this error message tells us, in a fairly roundabout way, is that if our test is passed one single-element std::vector and one empty std::vector, an exception is thrown because an std::vector is indexed outside of its valid range.

If you cannot guess where that’s coming from, you could locate the point of failure using a debugger like GDB, or by adding printouts at strategic places in the code. Whichever way you go, you will eventually figure out that the problem lies inside the vector_sum() function:

std::vector<float> vector_sum(const std::vector<float>& vec,
                              const std::vector<float>& vec2) {
    std::vector<float> output;
    // ERROR: This loop has vec.size() iterations...
    for (size_t i = 0; i < vec.size(); ++i) {
        // ...then it tries to index both vec and vec2 with the corresponding
        // values of i. But what if vec2 has fewer elements than vec?
        output.push_back(vec.at(i) + vec2.at(i));
    }
    return output;
}

Basically, as currently written, the code assumes that the two input vectors, vec and vec2, have the same dimensionality, which may not be the case.

Such assumptions are really easy to make when writing code on a bad day, and they are hard to catch because once an assumption is ingrained in your brain, any hand-written test you come up with is likely to use inputs that follow that same assumption. Here, however, it only took a bit of randomness to expose it. Such is the power of property-based testing.

If you are surprised that this situation leads to an exception being thrown, click this text for more information on what's going on.

The only reason this out-of-bounds indexing problem was reliably reported with an exception is that I chose to use the less common at() method, which performs a bounds check, rather than the more usual [] indexing operator, which does not. Without this precaution, the program would exhibit undefined behavior, a very problematic situation where program behavior becomes unpredictable because the compiler applies code transformations under the assumption that undefined behavior never happens. A small snippet at the end of this aside contrasts the two indexing styles.

As a result of these incorrect code transformations, a program that exhibits undefined behavior can do basically anything, including but not limited to…

  • Crashing with an unclear error message such as “Segmentation fault”.
  • Reading invalid data from a memory location it’s not supposed to access, possibly leaking secret user data like passwords to the outside world in the process.
  • Changing memory locations it’s not supposed to write to, leading other parts of the program to misbehave in bizarre ways later on.
  • Executing unexpected instructions injected by an attacker trying to get into your system.
  • Not doing the expected work at all, perhaps jumping to an unexpected stream of instructions somewhere else in the program or perhaps just terminating without having done anything.
  • Or seemingly working as expected today, only to fail tomorrow in a weird way.

To make matters worse, the behavior is not fixed: it may change depending on which compiler is used, which compiler flags are set, which operating system is used, or the particular state of the environment (e.g. values of environment variables, other processes running on the same system) at the time the program runs.

Because of this variability, undefined behavior can be hard to detect, make sense of, and correct. This is why there is a trend in modern programming languages to make undefined behavior harder to trigger, by removing it from all commonly used primitives and only allowing for its possibility in uncommon, specialized situations.

Unfortunately, legacy programming languages like C++ and Fortran cannot follow this trend as it would break too much existing code. This is a good argument for using more modern programming languages like Python or Rust whenever your project allows for it.
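
To make the at() versus [] distinction discussed above concrete, here is a small standalone snippet, separate from the practicals code, contrasting the two indexing styles:

#include <iostream>
#include <stdexcept>
#include <vector>

int main() {
    std::vector<float> v{1.0f};

    // at() checks the index and reports errors with a std::out_of_range exception
    try {
        v.at(5);  // out of bounds: throws instead of touching invalid memory
    } catch (const std::out_of_range& e) {
        std::cout << "v.at(5) threw: " << e.what() << "\n";
    }

    // operator[] performs no such check: evaluating v[5] would be undefined
    // behavior, which may crash, return garbage, or appear to work.
    // float y = v[5];
}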

And now that this point is clarified, let us go back to fixing our test.

Now that we found an unexpected input configuration, we need to think about how we want to handle it. Passing in two vectors of different dimensionality looks like a usage error from the caller of the function, so perhaps just crashing with a more self-explanatory error message is enough?

#include <stdexcept>

std::vector<float> vector_sum(const std::vector<float>& vec,
                              const std::vector<float>& vec2) {
    // Detect inhomogeneous input configurations
    if (vec.size() != vec2.size()) {
        throw std::runtime_error(
            "vector_sum(): input vectors don't have the same dimension"
        );
    }

    std::vector<float> output;
    for (size_t i = 0; i < vec.size(); ++i) {
        output.push_back(vec.at(i) + vec2.at(i));
    }
    return output;
}

With this change, our test would fail with a nicer error message, but still fail. We also need to teach the test that throwing an std::runtime_error is the expected behavior when vectors of different lengths are passed to vector_sum(). And while we are at it, we will also add a check that ensures the output has the same length as the inputs:

int main() {
    rc::check(
        "vector_sum sums vectors",
        [](const std::vector<float>& vec, const std::vector<float>& vec2) {
            std::vector<float> result;
            try {
                result = vector_sum(vec, vec2);
            } catch(std::runtime_error&) {
                // We only expect an exception if the input lengths differ
                RC_ASSERT(vec.size() != vec2.size());
                // In that case, the test can't continue: there is no result
                return;
            }

            // If no exception was thrown, the input lengths match, and the output should have the same length
            RC_ASSERT(vec.size() == vec2.size());
            RC_ASSERT(vec.size() == result.size());

            // Rest is as before
            for (size_t i = 0; i < result.size(); ++i) {
                RC_ASSERT(result[i] == vec[i] + vec2[i]);
            }
        }
    );
}

With that, our test now passes:

$ make -j$(nproc) && ./sum.bin
Using configuration: seed=8215553341845074688

- vector_sum sums vectors
OK, passed 100 tests

Does this mean that our function is correct? Not necessarily in an absolute sense. All it means is that after trying 100 random input configurations, our property-based test harness was not able to find an input configuration that fails the test.

There is, however, an input configuration that RapidCheck did not try, which fails this test. Can you find it? Click this text for the answer!

As Hypothesis, a Python cousin of RapidCheck, correctly points out, this vector_sum() version does not pass the test if one of the inputs contains the value NaN, shorthand for Not a Number.

NaN is a special floating-point value (or, more accurately, a set of special floating-point values) that is emitted by floating-point operations when they are asked to perform a meaningless computation, such as dividing 0 by 0 or computing the square root of a negative number.

Because it is just another value in the output data, numerical codebases rarely check for its presence, which means that it commonly ends up in output files. Good numerical libraries must therefore be prepared to handle the presence of NaN in their inputs. Unfortunately, NaN has very nasty properties as far as mathematics go, one of which is that x == x is false if x is NaN, which commonly breaks sorting algorithms and tests based on floating-point equality.
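
To see this behavior for yourself, here is a tiny standalone snippet, separate from the practicals code, that produces a NaN and checks how it compares to itself:

#include <cassert>
#include <cmath>

int main() {
    // The square root of a negative number is meaningless, so it yields NaN
    float nan = std::sqrt(-1.0f);

    // NaN compares unequal to everything, including itself
    assert(!(nan == nan));
    assert(nan != nan);

    // std::isnan() is the reliable way to detect it
    assert(std::isnan(nan));
}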

Here, we could account for the presence of NaN in the test either by moving to bitwise equality, or by ignoring test cases whose inputs contain non-finite floating-point values and documenting that such inputs are not supported by the function.
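
As an illustration of the second option, here is a minimal sketch of how the test could skip such inputs. The all_finite helper is our own invention, not something provided by RapidCheck, and RapidCheck may also offer ways to discard these test cases outright:

#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical helper: true if every element of vec is a finite number
// (i.e. neither NaN nor an infinity)
bool all_finite(const std::vector<float>& vec) {
    return std::all_of(vec.begin(), vec.end(),
                       [](float x) { return std::isfinite(x); });
}

// Inside the property, before any assertion on the results:
//     if (!all_finite(vec) || !all_finite(vec2)) {
//         return;  // silently skip inputs we chose not to support
//     }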

The moral of this story is that it is best not to perform property-based testing with a general-purpose random generator that only produces “normal” numbers, but instead to use a specialized generator that gives high probability to the edge cases that programmers are unlikely to think about.

Unfortunately, RapidCheck disappoints in this respect, showing the relative immaturity of property-based testing in the C++ ecosystem.
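
If you need something like this in practice, one possible workaround, sketched here under the assumption that RapidCheck's oneOf and element generator combinators behave as their names suggest, is to hand-roll a float generator that deliberately mixes in special values:

#include <rapidcheck.h>
#include <limits>

// Sketch of a float generator that mixes ordinary values with edge cases.
// oneOf picks one of the given generators at random; the exact mix of
// ordinary and special values here is an arbitrary choice, not a RapidCheck
// recommendation.
rc::Gen<float> tricky_float_generator() {
    return rc::gen::oneOf(
        // Ordinary floats, as produced by the default generator
        rc::gen::arbitrary<float>(),
        // Special values that programmers tend to forget about
        rc::gen::element(std::numeric_limits<float>::quiet_NaN(),
                         std::numeric_limits<float>::infinity(),
                         -std::numeric_limits<float>::infinity(),
                         -0.0f)
    );
}

You should then be able to build whole vectors from it with RapidCheck's container generator and pick values inside a property with the * operator, as done in the exercise skeleton below.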

So, what can we make of all this?

  • Testing is not just about finding bugs in the function being tested. It’s also about forming a better understanding of the problem we’re facing, and making conscious choices about what inputs we want to handle and how. A test failure indicates that the test assertions do not accurately reflect how the function being tested works, but this mismatch can be a problem in either the function or the test code.
  • Property-based testing is very effective at forcing us to think about uncomfortable edge cases that our brain would rather ignore. And by freeing us from the inefficient manual generation of interesting test inputs, property-based testing lets us focus on more interesting questions about how our function should behave, which is what testing should be about.
  • Property-based testing is not a panacea, however. In this example, it was attention to detail that led me to notice that, as initially formulated, the test did not check that the output was as long as the inputs. Similarly, if a function has a very narrow edge case that is only triggered by very specific inputs, random search is unlikely to hit that configuration. In such cases, you may need to complement property-based testing with manual testing on handpicked inputs, using a framework like Catch.

Exercise

You will find the above property-based testing example in the robust-programming/tests subdirectory of your home directory on the practicals server, where it is called sum.cpp. You can compile it with make -j$(nproc). Alongside it, you will find an incomplete dot.cpp property-based test. The goal of this exercise is for you to complete it.

Write a function that computes the dot product of two vectors, defined as lists of floating-point numbers as above, then write some tests that assert that it does so correctly. To avoid duplicating the full dot product computation in the test (which is unlikely to help you find any bugs, since the test and the function being tested would then share any misunderstanding on your side), try the following strategy:

  1. Start by generating a vector where only one component is not zero, and compute the dot product of that vector with itself. The result is the squared norm of the vector, which in this scenario should be very easy to predict. The following test skeleton is provided to help you get started more quickly, feel free to check out the RapidCheck documentation to learn more about the various functions that are being used.

    #include <rapidcheck.h>
    #include <cstddef>  // for size_t
    
    // Data needed to build a vector where exactly one component is not zero
    struct SimpleVectorConfig {
        // Value of the nonzero component
        float component;
    
        // Index of the nonzero component
        size_t idx;
    
        // Length of the vector
        size_t length;
    };
    
    // RapidCheck generator that produces valid SimpleVectorConfigs
    rc::Gen<SimpleVectorConfig> simple_vector_generator() {
        // Good values of "idx" depend on "length", so we must pick "length" first.
        // "mapcat" is how we express this idea of random generation where some
        // generated quantities depend on others in RapidCheck.
        return rc::gen::mapcat(
            // Don't let length grow too big to avoid slow tests / RAM exhaustion.
            rc::gen::inRange<size_t>(1, 1024),
            [](size_t length) {
                // RapidCheck syntax for building a generator of structs with
                // per-member configuration.
                return rc::gen::build<SimpleVectorConfig>(
                    // "component" value simply uses the standard float generator.
                    rc::gen::set(&SimpleVectorConfig::component),
                    // "idx" values must be between 0 and "length", right-exclusive.
                    rc::gen::set(&SimpleVectorConfig::idx,
                                 rc::gen::inRange<size_t>(0, length)),
                    // "length" generator re-emits the value already generated above.
                    rc::gen::set(&SimpleVectorConfig::length,
                                 rc::gen::just(length))
                );
            }
        );
    }
    
    int main() {
        rc::check(
            "single-component squared norm",
            [&] {
                SimpleVectorConfig vector_config = *simple_vector_generator();
                // TODO: Replace with dot product test
                RC_ASSERT(vector_config.length >= 1);
                RC_ASSERT(vector_config.idx < vector_config.length);
            }
        );
    }
    
    
  2. Now generate two vectors of this sort and compute their dot product. Again, the result should be easy to predict without writing down the full dot product computation in your test.

  3. Now generate one arbitrary vector, as done in the vector addition example above, and compute its dot product with itself. This gives you its squared norm, as before. Test that this squared norm is greater than or equal to the square of the vector's largest-magnitude component.

  4. Finally, generate two arbitrary vectors, and check that their dot product follows the Cauchy-Schwarz inequality: pow(dot(v1, v2), 2) <= dot(v1, v1) * dot(v2, v2).

Note that floating-point rounding errors differ between different ways of performing the same computation, so you may need to use approximate comparisons in some of your tests. The following approximate comparison algorithm is suggested:

#include <algorithm>  // for std::max
#include <cmath>      // for std::abs

bool is_close(float value,
              float reference,
              float rel_tol = 1e-6,
              float abs_tol = 1e-8) {
    return std::abs(value - reference) < std::max(rel_tol * std::abs(reference),
                                                  abs_tol);
}