Introduction
This website hosts the instructions for the practical exercises of the “Robust Programming” computing training. Please use the arrows or the menu on the left to navigate between pages.
Goal
The aim of this training is to discover some of the software engineering concepts and methodologies and to practise using selected tools in order to write code that is better tested, better documented and easier to understand and maintain. We will cover:
- automatic rule checks
- documentation
- tests
- automation of workflows
Languages
For most practicals, Python and C++ versions of each exercise are provided. Unless otherwise stated, it is not expected that you complete both exercises (though you can if you are fast enough). The intent is rather that you pick the language you are most comfortable with, so that you can optimally focus on the new concepts introduced by this course.
For some exercises, you will find two tabs, one for each language:

```python
print("Hello world")
```

```cpp
#include <iostream>

int main() {
    std::cout << "Hello World";
    return 0;
}
```
Virtual machine
We deployed virtual machines in the cloud at VirtualData. To ensure that everyone works with the same environment during the practicals, it is strongly suggested that you do them on the Linux server to which you have been given access. All instructions given in this document will assume that you are doing so, and may not work as expected on another computer. Simply connect to the virtual machine using `ssh`:

```shell
ssh <username>@<IP>
```

`<username>`, `<IP>`, and the corresponding password will be given during the lecture. More information on the setup can be found in the platform section.
Platform
We deployed virtual machines in the cloud at VirtualData. To ensure that everyone observes the same system performance phenomena during the practicals, it is strongly suggested that you do them on the Linux server to which you have been given access. All instructions given in this document will assume that you are doing so, and may not work as expected on another computer.
Network connection
At least one of the academic institutions with which you work likely provides you with access to eduroam. Be sure to set it up and try it out in advance: you may need help from your local computing service as the configuration procedure is institution-specific. Here’s the documentation for IJCLab members.
In case you experience last-minute issues with eduroam, you will alternatively be able to use our public Wi-Fi network. You will need to be physically at IJCLab to set up this one, so no advance preparation is possible in this configuration.
Server configuration
The practical servers are configured in the following manner:
- Each server is running an AlmaLinux OS (Alma 9).
- The username to connect with is `student`, and you should use the password given at the start of the lecture.
- All necessary tools to compile and execute scripts are installed and accessible on the server. If this is not the case, please ask one of the teachers.
- When you connect to the server, you are automatically put in a Python virtual environment (called `robprog`). This environment contains all the packages necessary to perform the work. If you deactivated it, just log out from the server and reconnect, or execute `source /opt/.venv/bin/activate`.
Finally, keep in mind that:

- You should not use the server for purposes other than this lecture.
- The server will be available until the 12th of April, then it will be reallocated to other work and all data will be removed. This should give you enough time to finish the second homework, but please remember to `scp` any file you care about away from the server!
SSH shortcuts
At the beginning of the course, you have been provided with the IP address of the server that you will be working on, as well as your user name on it. However, IP addresses are unwieldy to remember and user names are boring to specify on every connection, so you may want to add something like the following to your SSH config file (`~/.ssh/config` on Linux and macOS)…

```
# Robust programming training
Host robust-programming
    User <username>
    HostName <server IP address>
```

…where you replace the `<bits between brackets>` with the matching information from the registration e-mail that you have received. Once that is done, `ssh robust-programming` will connect you directly to the practicals platform, with the right user name, and directly ask you for your password.
Source code
First of all, you should download the source code for the practicals. It is located inside of this archive, which you can download and unpack in your home directory using this script:

```bash
#!/bin/bash
FOLDER_NAME="robust-programming"
BASE_URL="https://informatique-des-deux-infinis.pages.in2p3.fr/pheniics"

cd ~

# Keep a copy of the current folder if it exists
if test -d ${FOLDER_NAME}; then
    SUFFIX=$(date +"%Y%m%d_%s")
    BACKUP=${FOLDER_NAME}_${SUFFIX}
    echo "Found existing ${FOLDER_NAME} folder"
    echo "Keeping a copy under ${BACKUP} and redownloading it..."
    mv ${FOLDER_NAME} ${BACKUP}
fi

curl ${BASE_URL}/${FOLDER_NAME}/${FOLDER_NAME}.tar.gz | tar -xz
```
If, for some reason, you get one of these files into an undesirable state (e.g. accidentally delete it), you can get back a fresh copy in its initial state by re-running the script. It will save your current work, and download the `robust-programming` directory again, in which you will find the original source files.

Eventually, you will end up with a `~/robust-programming` directory containing the source code for all practicals. For demonstrations, this will be the final source code at the end of the demonstration. You can compile C++ programs using the provided Makefile:

```shell
make -j$(nproc)
```

Some Makefiles also have additional targets, such as `make test` for running the test binaries (if any) or `make doc` for building the Doxygen documentation. Please feel free to check the contents of the Makefiles for more info about what they do.
Local file access
One drawback of using an SSH-based remote configuration is that it seemingly forces you into using archaic terminal-based code editors like `vi` or `emacs`, instead of more modern alternatives like Sublime Text or Visual Studio Code. Thankfully, there are ways to mount a remote directory that you have access to via SSH as a local directory on your laptop, so that you can manipulate files in it locally with your preferred text editor.

- For a general way to do this, please check out this `sshfs` tutorial that Hadrien wrote as part of the documentation of `srv-calcul-ambulant`, another compute server that he administrates. Translating these instructions to a different server should be straightforward (you only need to change the server name), but do get back to us if you encounter any issue.
- If you are using Visual Studio Code specifically, it has its own support for operating over SSH connections, which you may prefer over `sshfs`.
Of course, this access is only good for editing source code, and you should not try to build programs or execute them on your local machine: you may not have the required prerequisites installed, and even if you do, your local system is probably not as well tuned for reliable and reproducible performance benchmarking as the remote one.
Automatic rule checkers
The best way to get a team of developers to agree on some common rules is to have a tool that automatically enforces the rules. Failing that, the second best option is to have a tool that automatically detects rule violations, and then lets the programmer address them.
C++ and Python both have a long history, a large user base, and unfortunate design choices that make it easy for developers to accidentally write incorrect code. As a result, many tools exist to automatically detect clear-cut problems, as well as suspicious code constructs that may or may not be a problem, but will likely make the code harder to understand and maintain in any case. Finally, another common target of automation is the enforcement of a consistent coding style across the entire codebase, which makes code easier to read.
All of these tools, and the kind of problems that they detect and correct, are heavily language-dependent. As a result, tools for each language will be introduced in a separate section. Please choose the language whose tooling you want to study:
Linting your Python code
In this section, we will learn how to use linters, formatters, and pre-commit hooks to improve the quality of your code and automate the process of checking for errors and enforcing coding standards.
The Style Guide for Python Code is called PEP 8 (or see this stylized presentation). Rules are proposed by the community of users, and subject to change. There are hundreds of rules as of 2024, and it would not be feasible (or even desirable) to check them manually each time we write code.
Linters: the flake8 example
A linter is a tool that analyzes your code for potential errors, bugs, and stylistic issues.
It can help you catch mistakes early in the development process and
ensure that your code is consistent and maintainable.
There are many linters available for Python, but one of the most popular is flake8. `flake8` is a command-line tool that runs several other linters, including `pyflakes` and `pycodestyle`, and reports any errors or warnings that it finds.

You can run `flake8` from the command line to analyze your code. For example, to check all Python files in the current directory and its subdirectories, you can use the command `flake8 .` (see the documentation for more options/configuration). `flake8` will report any errors or warnings that it finds, along with the line number and a brief description of the issue. Fix the issues reported by `flake8` and run it again to ensure that your code is free of errors and warnings.
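As an illustration, here is a hypothetical snippet containing a few issues that `flake8` would typically flag (the rule codes in the comments are standard pyflakes/pycodestyle codes):

```python
import os  # F401: 'os' imported but unused


def add(a,b):  # E231: missing whitespace after ','
    result=a+b  # E225: missing whitespace around operator ('=')
    return result


print(add(2, 3))  # → 5
```

The code runs fine despite these findings; linters report style and correctness smells, not runtime errors.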
Exercise: run `flake8` on the module `~/robust-programming/tests/bad_quality_flake8.py` and fix the errors:

```shell
flake8 ~/robust-programming/tests/bad_quality_flake8.py
```

Note that we installed `flake8` but also some of its extensions to ensure wider rule coverage:

```
flake8==7.0.0
flake8-docstrings==1.7.0
pep8-naming==0.13.3
```
Formatters: the black example
Complying with a coding style can be a tedious task, which a linter won't help you with. A formatter can automate this process by reformatting your code to comply with a specific coding style.

One of the most popular formatters for Python is black. `black` reformats your entire file in place to comply with the black coding style (a strict subset of PEP 8). It is a great tool to enforce consistent formatting across your code base.
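As a small illustration of what `black` does, consider this dictionary literal (a hypothetical example; the exact output can vary slightly across black versions):

```python
# Before black:
#   x = { 'a':37,'b':42 }
# After black (double quotes, normalized spacing):
x = {"a": 37, "b": 42}

print(x["a"] + x["b"])  # → 79
```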
Exercise: first run `black` on the module `~/robust-programming/tests/bad_quality_black.py` with the options `--check` and `--diff`:

```shell
black --check --diff ~/robust-programming/tests/bad_quality_black.py
```

`--check` runs black without overwriting the file, and `--diff` shows you what would have been changed: `+` lines are additions, and `-` lines are deletions. Once you understand the changes better, you can directly apply `black` to reformat the file:

```shell
black ~/robust-programming/tests/bad_quality_black.py
```
Linters vs formatters
Should you use a linter or a formatter? In practice, both! They are usually very complementary, and your code will only be better for using both. Note however that they can sometimes contradict each other; see e.g. the `black` FAQ for ways to remedy this problem.
When to run linters and formatters?
You can run your favourite linter and formatter on your local machine each time you make changes, but let's face it: it will become annoying very quickly, and there is a high chance you will forget to do it. Instead, you can automate the rule checks using pre-commit hooks and continuous integration jobs.
Pre-commit hooks
Pre-commit hooks are scripts installed in your `.git/hooks` directory that are run before each commit. They can be used to automate the process of checking for errors and enforcing coding standards, and can help you catch mistakes early in the development process.

In Python, the pre-commit package can be used to manage pre-commit hooks. It allows you to define a set of hooks in a configuration file, and then run them automatically before each commit.
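For instance, a minimal `.pre-commit-config.yaml` combining the tools above could look like this (a sketch; pin the `rev` fields to the versions you actually want to use):

```yaml
repos:
  - repo: https://github.com/pycqa/flake8
    rev: 7.0.0
    hooks:
      - id: flake8
  - repo: https://github.com/psf/black
    rev: 24.2.0
    hooks:
      - id: black
```

After running `pre-commit install` once in your repository, both tools will run automatically on the files staged for each `git commit`.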
Continuous integration
Pre-commit hooks are great, but they assume that you have your development environment installed on your local machine, which is not always the case. So instead of enforcing checks before committing the code, you can perform the checks after each push to the repository. This is called continuous integration, and you will learn more about it at the end of this lecture (see continuous integration).
C++ automatic rule checkers
Compiler warnings
One advantage of using an ahead-of-time compiler, as C++ programmers normally do, is that the compiler can check some aspects of program correctness without even running the program. Unfortunately, for historical and cultural reasons, many of these checks are disabled by default.
Compiler warnings, also known as “diagnostics” or “lints”, usually point at code which is legal according to the C++ standard, but invokes treacherous language corner cases that are unlikely to match programmer intent/comprehension. Code which triggers compiler warnings should normally be reviewed carefully for correctness, and ultimately rewritten into a more readable/correct form that does not trigger the warning.
In this section, we will explore some useful options of the GCC and clang compilers that you may want to enable to let them detect more issues with your code. We will focus on the behavior of GCC, which is largely imitated by clang. But if you are using another compiler, such as Microsoft Visual C++, you may want to check its documentation.
-Wall
Contrary to what its name suggests, this incorrectly named GCC and clang compiler option does not enable all error checking features. It does, however, enable a set of compiler warnings that the GCC and clang developers consider to provide a good trade-off between protection and false positives.
Click here for some examples of common problems that you can detect with `-Wall`:

- Comparison of a `char*` pointer to a hardcoded string: The following code will compare the memory addresses of two strings, not their contents as the developer likely intended, a classic mistake in C and old-style C++.

```cpp
void charptr_eq_str(const char* s) {
    if (s == "reference string") { /* ... */ }
}
```
- Statically out-of-bounds array accesses: A C++ program that performs reads or writes beyond the end of an array invokes undefined behavior. This is the worst kind of error that a C++ programmer can make, because a compiler is allowed to assume that it does not happen when optimizing code, and thus unpredictable chaos will ensue at execution time. Using analysis that is carried out during the code optimization process (i.e. only when optimizations are enabled), GCC can detect this error in easy cases where all relevant parameters are known at compile time, as in the following example:

```cpp
float numbers[42];
for (int i = 1; i <= 42; ++i) { // Whoops! This is C++, not Fortran/Julia
    numbers[i] = i;
}
```
- Misleading indentation: One unfortunate design decision of C++ is that control flow operations like `if` and `while` do not start a new code block. As a result, it is very easy to accidentally write code like the following, which `-Wall` will rightly report as suspicious:

```cpp
if (condition)
    foo(); // Runs only if condition is true
    bar(); // Runs always, no matter if condition is true or false
```
- Missing enum variant in switch: One problem with enums and switch statements is that it is very easy to start from correct usage…

```cpp
enum class Color {
    Red,
    Green,
    Blue
};

// ... much later, possibly in a different source file ...

void print_color(Color color) {
    switch (color) {
        case Color::Red:
            std::cout << "Red" << std::endl;
            break;
        case Color::Green:
            std::cout << "Green" << std::endl;
            break;
        case Color::Blue:
            std::cout << "Blue" << std::endl;
    }
}
```
…and later break the code by adding new enum variants without updating switch statements:

```cpp
enum class Color {
    Red,
    Green,
    Blue,
    Pink // This is new and support must be added to the switch statement
};
```

With `-Wall`, GCC will correctly point out that the switch statement must be updated.
-Wextra
As previously mentioned, the `-Wall` option only enables some of the available compiler warnings. More warnings can be enabled by additionally using the `-Wextra` option, which historically enabled warnings that the compiler authors considered to provide a less favorable true vs. false positive compromise. Most programmers, however, would consider the warnings enabled by `-Wextra` nowadays to point at code patterns that are still highly objectionable and worth eliminating.
Click here for some examples of common problems that you can detect with `-Wextra`:

- Implicit switch fallthrough: Starting from the incorrect switch statement example from the `-Wall` section, this would be one incorrect way to fix it:

```cpp
void print_more_colors(Color color) {
    switch (color) {
        case Color::Red:
            std::cout << "Red" << std::endl;
            break;
        case Color::Green:
            std::cout << "Green" << std::endl;
            break;
        case Color::Blue:
            std::cout << "Blue" << std::endl;
        case Color::Pink: // This breaks the former Color::Blue case
            std::cout << "Pink" << std::endl;
    }
}
```
This is incorrect because C++ switch statements infamously fall through by default, which means that if the switch statement is invoked with `color` set to `Color::Blue`, the program will continue executing code for other cases (in this case `Color::Pink`), and thus output both “Blue” and “Pink” in succession. With `-Wextra`, GCC will rightly point out that a `break;` statement should be inserted before the start of the `Color::Pink` case.
- Missing field initializations: If you have a struct with multiple fields such as…

```cpp
struct MultiField {
    int a;
    float b;
    int c;
};
```

…then C++ allows only some of the fields to be initialized during construction:

```cpp
MultiField make_multi_field() {
    return { 12, 3.4 };
}
```
When you do this, only the first struct fields receive the values you wrote, and the remaining ones are value-initialized (zeroed) rather than set to a meaningful value. This surprising behavior commonly happens as a result of adding new struct fields without updating code that instantiates the struct, and the silently zeroed fields can easily hide logic errors. GCC will therefore warn about this pattern in `-Wextra` mode.
- Signed/unsigned integer type confusion: The C++ community has had many heated debates about where signed vs. unsigned integer types should be used. One thing remains constant, however: programmers often pick an inappropriate integer type for their application, with disastrous results. Here are two examples that are detected by GCC in `-Wextra` mode:

```cpp
void suspicious_int_conditions(int s, unsigned u) {
    // You probably meant to use a signed integer here
    if (u < 0) { /* Never executed */ }
    // The signed integer is reinterpreted as unsigned, with negative numbers
    // mapped into positive ones. This is likely to result in surprising outcomes.
    while (s < u) { /* ... */ }
}
```
Hopefully, you will agree that there is little point in using only `-Wall` without `-Wextra` these days, and that both options should normally be used together.
Other correctness-oriented warnings
At this point, it will probably not surprise you that the -Wall -Wextra
combination still does not enable the full error-checking power of GCC and
clang. Among the many other available
options,
you may specifically want to enable…
- `-Wshadow`: detects situations where a local variable has the same name as a variable from an outer scope (e.g. a global variable), which is well-defined by the language (the local variable takes precedence) but can easily confuse a reader who is aware of the existence of the other variable.
- `-Wnon-virtual-dtor`: detects situations where a class has virtual methods, and is thus meant to be used via a pointer-to-base-class, but does not have the virtual destructor that this usage normally requires for correct resource cleanup.
- `-Woverloaded-virtual`: detects situations where code creates an overload of a virtual method, rather than an override, which is usually a mistake coming from an incorrect or outdated copy of the virtual method’s signature.
- `-Wconversion`: detects risks of loss of information caused by implicit conversions (e.g. trying to store a `size_t` into a `char` will normally lead to truncation).
- `-Wduplicated-cond`: detects the `if (cond) { /*...*/ } else if (cond) { /*...*/ }` duplicate condition pattern, which usually comes from copy-and-paste errors.
- `-Wduplicated-branches`: detects the `if (cond1) { stuff(); } else { stuff(); }` duplicate branch pattern, which also normally comes from copy-and-paste errors.
- `-Wnull-dereference`: detects simple attempts to dereference a null pointer, which is undefined behavior and thus strictly forbidden in any C++ program.
- `-Wdouble-promotion`: detects situations where a computation was seemingly intended to be carried out in single precision, but is actually carried out in (potentially slower) double precision due to the use of double-precision literals.
-Wpedantic aka -pedantic

While both `-Wall` and `-Wextra` are mainly about correctness, this option is mainly about portability. By default, GCC and clang both provide a number of “GNU” extensions to the C and C++ standards which are not available in other compilers, including but far from limited to…

- Mathematical constants like `M_PI` in `<cmath>`/`<math.h>`.
- Support for the `void main()` main function signature.
- C-style arrays whose length is `0` or determined at runtime.
- `__restrict__` manual alias analysis optimizations.
- Hardware-agnostic SIMD support.
Many of these extensions are extremely widely used and could be considered bugfixes for shortcomings of the C++ standard. Nonetheless, they are not part of the standard, not available in other compilers, and a program that uses them is technically not a true C++ program.
The synonymous `-pedantic` and `-Wpedantic` command line options make the compiler warn about the use of some of these extensions. If you care in any capacity about the portability of your code to compilers other than GCC and clang, including the Microsoft, NVidia and Intel compilers, you should consider using these compiler options in order to reduce the risk of accidentally relying on GCC- or clang-specific features.
Beware, however, that `-Wpedantic` does not detect the usage of all compiler extensions, and is therefore not a substitute for running test builds with all the compilers you care about in your project’s continuous integration infrastructure.
-Weverything
This clang-specific command line option is what most programmers agree `-Wall` should have been: it enables every single clang warning. It is commonly combined with a set of `-Wno-xyz` options (e.g. `-Wno-shadow`), which disable specific compiler warnings that the developers themselves (not the compiler authors) deemed to have an unfavorable cost/benefit tradeoff for their project.
-Werror
Because most compiler warnings point at code constructs that are suspicious, but not illegal according to the language rules, warnings do not normally cause compilation to fail.
For everyday development builds, this is undesirable: it commonly leads to warnings being ignored by developers, or to scripts carrying on with running tests or other steps on a build that does not even pass the compiler’s internal checks, with the outcome of those checks buried deep inside old build logs.
In these situations, the `-Werror` compiler option, which turns warnings into hard errors that fail the build, can be used to ensure that the build fails promptly when warnings are present, without burying the warnings below thousands of lines of unrelated script output.
However, -Werror
should not be made a mandatory part of every non-development
build of the project, because it means that a single compiler update that adds a
new warning is all it takes to break the build for everyone, including mere
users of the project who are not working on it and would not want to write
patches to fix the warnings.
Exercise
The `analyzers/mdbook_snippets.cpp` file contains all the source code examples of this chapter, and can be compiled using the `Makefile` in the same directory using the `make` command.

Try to modify the Makefile’s first line (the `CXXFLAGS` variable definition) to enable some of the warnings discussed in this chapter. Then rebuild using this new configuration (you may need to run `make clean` before `make`), and see how the compiler starts to report the various issues that were discussed previously.
clang-format
Modern programming languages allow a single program to be written in many different ways, with different amounts of spaces, line feeds, parentheses, and so on. This creates a conundrum:
- On one hand, programmers who learn/work on their own tend to develop, over time, a distinctive way to lay out code in the source file that they are most comfortable with.
- On the other hand, programs which are written by many different programmers are a lot more readable when everyone agrees to follow a common style, than when each individual developer goes and lays out the code in their own preferred/signature style.
The universally accepted solution to this problem is to standardize on a common coding style whose rules are simple enough to be easily machine-enforced. Tools can then be used to automatically translate the entire project’s code to this style, both at adoption time and whenever new code gets integrated into the common repository.
It is inevitable that the simple rules enforced by the tool will, from time to time, produce results that everyone agrees look bad. To some extent, this can be addressed by making the rules more complex, which is why automatic code formatters tend to have a bewildering number of configuration options.
One should not forget, however, that attempts to improve code formatting often translate into hours of team arguments and configuration trial and error with near zero useful code production. The best is the enemy of the good, and staying as close as possible to the default tool configuration, however perfectible it may be, has the major advantage of quickly giving you the main benefit of automatic code formatting (standardizing the project’s coding style with minimal manual effort) without paying too many costs in return (endless team arguments about ideal code style).
In C++, the generally agreed-upon standard for automatic code formatting is the `clang-format` utility. It internally uses the clang compiler, but can be used on any project, even if the build is done using GCC or MSVC, as long as clang still manages to make basic sense of the source code. It is very easy to use: just point it at a source file with the `-i` flag for in-place reformatting, and it will adjust the formatting according to the default style:

```shell
clang-format -i source.cpp
```
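If you do need to deviate from the defaults, the configuration goes in a `.clang-format` file at the root of your project. A minimal sketch, staying close to a built-in style (the option values shown are purely illustrative):

```yaml
# .clang-format: start from a built-in style and tweak sparingly
BasedOnStyle: LLVM
IndentWidth: 4
ColumnLimit: 100
```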
With a bit of extra work, you can also configure your code editor to automatically reformat source files whenever they are saved, or configure version control systems like git so that each modified source file is reformatted before creating a commit. Finally, as a last resort, your project’s continuous integration should check that the source files are formatted correctly before code is integrated into the project’s master branch. We’ll talk more about continuous integration later in this course.
Exercise
Because it is used as part of a static site generation pipeline and must match the indentation expectations of the underlying HTML generation tool, the `mdbook_snippets.cpp` file has a very unusual style that you may find hard to read.

Try to automatically reformat it using `clang-format`, then visually inspect the result using your favorite text editor. Are there any formatter decisions that you find yourself disagreeing with?
Dynamic analysis
`clang-format` and compiler warnings are two examples of static program analysis: they can analyze and transform source code without running the underlying program, and reach conclusions that are valid no matter which input values the program is called with, or which operating system and hardware are used to execute the program. Ultimately, however, static analysis will fail to find many classes of problems in your code, for two key reasons.
First, static analysis is hard. When working on static analysis tools, it is surprisingly easy to encounter edge cases where a precise analysis algorithm, one which tries to always return the correct result, would take an extremely long time to run or fail to reach a conclusion at all.
Second, some information that is critical to program correctness is not known until runtime. For example, if a C++ program indexes an array with an integer that is specified by the user through the command line interface or API, whether that array access is in bounds (correct well-defined behavior) or out of bounds (incorrect undefined behavior) cannot be determined at compile time. A static analyzer could pessimistically flag this code pattern as potentially flawed, and some do. But this would inevitably have false positives, which would reduce programmer trust in the analyzer.
The solution to both of these problems is to stop relying on static analysis alone, and instead complement it with dynamic analysis, which analyzes the runtime behavior of the program during the execution of one or more specific test jobs, in a specific environment and with specific inputs.
Of course, one general drawback of dynamic analysis is that it will only find errors in that particular execution configuration. If your program is susceptible to out-of-bounds array accesses, but your test workloads do not trigger any such access, then dynamic analysis will not report the problem. In other words, dynamic analysis is only as good as your test suite is.
Valgrind
Valgrind uses a form of dynamic analysis called dynamic binary instrumentation. You can think of it as running your program in a kind of x86 CPU emulator (technically a JIT) that supports injection of arbitrary checks every time an instruction is executed, a basic block is entered/exited, etc.
One advantage of this approach is that it can be applied to any program without any preparation (although availability of debug info will greatly improve error messages). The drawbacks are that…
- Some information from the source code (e.g. various categories of undefined behavior) is missing in the final binary, which limits the amount of problems that can be detected.
- Instrumentation must be shoehorned into an existing binary, without invalidating code and data pointers or modifying existing control flow. As a result, it cannot be implemented very efficiently, which commonly results in execution being slowed down by ~30x. You will want to account for this by running valgrind-instrumented programs on smaller workloads.
To keep the implementation simple, valgrind also serializes the execution of all program threads onto a single OS thread. This means that parallel programs will not get any speedup from the availability of multiple CPU cores, resulting in an even greater valgrind-induced slowdown.
For program correctness checking purposes, the most useful valgrind tool is Memcheck, which is conveniently the one that is enabled by default when you run valgrind without any extra options. Memcheck checks for various memory usage errors, including…
- Out-of-bounds accesses aka “buffer overflow”.
- Use of uninitialized memory, which is undefined behavior.
- Use-after-free of heap-allocated data.
- Invalid frees, aka trying to free a pointer that was not previously allocated by the matching allocator (or was already freed before).
- Memory leaks, aka allocating data and never freeing it.
To reduce overhead and false positives, some checks are disabled by default. See the command-line options documentation for more info about these and how they can be enabled when needed.
The easiest way to run your program through memcheck is to first make sure that
your program is built with debug symbols (enabled by GCC/clang option -g
) and
then run it through valgrind using this sort of command line invocation:
# Simple version: just run "your_program" under valgrind's watch.
valgrind ./your_program
# More complex version, with extra arguments to both valgrind and your program
#
# To make the command more readable, using -- to separate valgrind's options
# from the command line that valgrind is monitoring is recommended.
valgrind --leak-check=full --track-origins=yes -- ./your_program arg1 arg2
Sanitizers
Some problems cannot be found by binary instrumentation like valgrind, either because the relevant information is not present in final binaries or because the slowdown brought by the dynamic instrumentation is so great that getting to the point of program execution where the error is triggered would take an excessive amount of time and machine resources.
In those situations, better results can be obtained by using a more invasive form of dynamic analysis, where the compiler injects supplementary error-checking code during program compilation. In C++, this is most commonly done using the Sanitizer tool family:
- AddressSanitizer and MemorySanitizer cover memory errors similar to those found by valgrind’s Memcheck, but at a much lower runtime performance cost.
- ThreadSanitizer and UndefinedBehaviorSanitizer cover many more cases of undefined behavior that Valgrind cannot reliably detect because un-instrumented binaries do not contain enough information about what the original source code was doing.
- Other sanitizers are available, but they cover quite specialized failure modes and require more involved setup, so they are only used in situations where the high cost of failure justifies extra effort (e.g. security-critical software that processes untrusted user input).
In exchange for detecting more bugs and having a smaller impact on program performance than valgrind, sanitizers bring several limitations of their own:
- Due to their deep integration with compiler internals, they work best when used with the compiler and libc/STL implementation that they were developed against, typically clang/libc++. Other compilers like GCC may try to support them too, but will often do so less reliably (expect hard-to-debug crashes and some missed error detection).
- Undefined behavior is a program-wide property: it may not happen directly in the code that you wrote, but instead in libc or other libraries that your program uses, as a result of an invalid API request. Therefore, complete error detection requires recompiling every single dependency of the program, down to libc, with sanitizer instrumentation. Source-based package managers like Spack may help with this, but in general, achieving good sanitizer coverage in larger software projects is best done with help from the project’s computing infrastructure experts.
To use sanitizers, just pass the right `-fsanitize=xyz` option to your compiler during the build, as documented in your compiler’s manual.
Beware that some sanitizers are incompatible with each other, and even when they can be combined, doing so increases runtime resource usage. You may therefore need multiple full-stack software builds to cover all the sanitizers that you are interested in, so the use of a dedicated, highly specced build & test server is recommended.
It’s also worth noting that by default sanitizers don’t always stop the program when an error is detected, which can result in duplicated error reporting and misses in automatic test runs. You can enforce fail-on-error mode on operating systems where it is not the default using environment variables, e.g.
export ASAN_OPTIONS=abort_on_error=1:halt_on_error=1
export UBSAN_OPTIONS=abort_on_error=1:halt_on_error=1
Exercise
As previously discussed, the ability of GCC’s warnings to detect undefined behavior, like incorrect memory usage, is unfortunately very limited.
In particular, the out-of-bounds array access that the `fortran_iteration()` function performs is not detected when compiler optimizations are not enabled.
Experiment with valgrind and sanitizers, and see if you can find a tool configuration that succeeds at detecting this out-of-bounds memory access.
Conclusion
Where tools won’t help
It is easy to get carried away and add lots and lots of automated tools to your software build and test infrastructure, in the hope of finding more and more errors, but keep in mind that…
- Each tool you add slows down your automated test routine and requires additional long-term maintenance effort. Because of this, you should normally only use one tool per category of error that you want to detect. Prefer a few good tools over many bad ones.
- Tools are mainly focused on helping you follow the language rules and avoid suspicious constructs (areas of the language that are known to be hard to use, or useless code that suggests a typo has been made). Tools will not normally find more advanced logic errors, such as incorrect math, hardcoded reduced-precision math constants (e.g. `#define PI 3.14`), or use of exact comparison for validation of floating-point results.
- To detect this sort of error and make the most of dynamic analysis tools, you will want a good test suite that covers as much of your code as possible, in a way that is as fine-grained as possible. We will explore this further in the remainder of this course.
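To see why exact floating-point comparison is one of those logic errors that no tool will flag for you, here is a small illustrative Python sketch (the specific values and tolerance are our own, not part of the course material):

```python
import math

# Two mathematically equal expressions differ after rounding...
a = 0.1 + 0.2
print(a == 0.3)              # False: the sum is actually 0.30000000000000004
print(math.isclose(a, 0.3))  # True: compare with a tolerance instead

# ...and a hardcoded constant like 3.14 is far less precise than the
# floating-point type allows:
print(abs(math.pi - 3.14))   # about 1.6e-3, enormous compared to double precision
```

The compiler and valgrind are both perfectly happy with such code: only a test with well-chosen expected values will catch it.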
Want more tools anyway?
Here are some other tools that you may want to experiment with, which may find extra issues beyond the ones covered by the tools we discussed here.
- Static analysis:
- Dynamic analysis:
Documentation
It is important to document how software works, because it allows people to understand how it works a lot faster than by trying to make sense of the source code. We’re not only talking about other people here, by the way. When you get back to working on some software project after a break of several months, you will also thank yourself for having written good documentation.
Software can be provided with many different flavors of documentation, including…
- API reference: Summarizes what individual data types are meant to store, what functions take as input and emit as output, what the error conditions are…
- Tutorials/Examples: Introduces a new user to the software, what it does, and how it’s used in some concrete, hopefully common scenarios.
- Architecture/Design overview: In larger software, explains how the various sub-parts fit together and why they were designed that way.
- Contribution guide: In open-source projects, tells newcomers about how to best submit new contributions to the software.
- Data management plan: In applications that are meant to store user data for extended periods of time, explains how data gets there, how long it’s kept in the system, how it’s protected from system failures…
Documentation is not even necessarily a written document that’s separate from the source code. Some aspects of the source code, such as the name and type of code entities, can also provide basic documentation to the reader of the code. For example, compare the following two functions:
# Would you rather use this function?
def do_stuff(x, y):
    return x[y]

# ...or this function?
def get_phone_number(
    phone_book: dict[Name, PhoneNumber],
    full_name: Name
) -> PhoneNumber:
    return phone_book[full_name]
// Would you rather use this function?
auto do_stuff(auto&& x, auto&& y) {
    return x[y];
}

// ...or this function? (note the use of at(): std::map's operator[] cannot
// be called on a const map)
PhoneNumber get_phone_number(const std::map<Name, PhoneNumber>& phone_book,
                             const Name& full_name) {
    return phone_book.at(full_name);
}
Even though `do_stuff` and `get_phone_number` can be used to do the same thing, you will hopefully agree that `get_phone_number` is a lot easier to understand, because it is more specific about what it does. There is a more general lesson to be learned here: your code should only be as generic as it needs to be for the task at hand, because specific programs are easier to understand, document, test and maintain than generic ones. Please resist the temptation to write code which solves problems that you do not have!
In this practical, we will cover the two most common forms of documentation, namely API documentation and tutorials. API documentation tools are language-specific so you will need to pick which language you want to study here:
Python API documentation
Docstrings
A docstring is a string literal that occurs as the first statement in a module, function, class, or method definition.
Such a docstring becomes the `__doc__` special attribute of that object.
>>> import numpy as np
>>> print(np.sum.__doc__)
Sum of array elements over a given axis.
Parameters
----------
a : array_like
Elements to sum.
axis : None or int or tuple of ints, optional
Axis or axes along which a sum is performed. The default,
axis=None, will sum all of the elements of the input array. If
axis is negative it counts from the last to the first axis.
.. versionadded:: 1.7.0
If axis is a tuple of ints, a sum is performed on all of the axes
specified in the tuple instead of a single axis or all the axes as
before.
dtype : dtype, optional
The type of the returned array and of the accumulator in which the
elements are summed. The dtype of `a` is used by default unless `a`
has an integer dtype of less precision than the default platform
integer. In that case, if `a` is signed then the platform integer
is used while if `a` is unsigned then an unsigned integer of the
same precision as the platform integer is used.
out : ndarray, optional
Alternative output array in which to place the result. It must have
the same shape as the expected output, but the type of the output
values will be cast if necessary.
keepdims : bool, optional
If this is set to True, the axes which are reduced are left
in the result as dimensions with size one. With this option,
the result will broadcast correctly against the input array.
If the default value is passed, then `keepdims` will not be
passed through to the `sum` method of sub-classes of
`ndarray`, however any non-default value will be. If the
sub-class' method does not implement `keepdims` any
exceptions will be raised.
initial : scalar, optional
Starting value for the sum. See `~numpy.ufunc.reduce` for details.
.. versionadded:: 1.15.0
where : array_like of bool, optional
Elements to include in the sum. See `~numpy.ufunc.reduce` for details.
.. versionadded:: 1.17.0
Returns
-------
sum_along_axis : ndarray
An array with the same shape as `a`, with the specified
axis removed. If `a` is a 0-d array, or if `axis` is None, a scalar
is returned. If an output array is specified, a reference to
`out` is returned.
See Also
--------
ndarray.sum : Equivalent method.
add.reduce : Equivalent functionality of `add`.
cumsum : Cumulative sum of array elements.
trapz : Integration of array values using the composite trapezoidal rule.
mean, average
Notes
-----
Arithmetic is modular when using integer types, and no error is
raised on overflow.
The sum of an empty array is the neutral element 0:
>>> np.sum([])
0.0
For floating point numbers the numerical precision of sum (and
``np.add.reduce``) is in general limited by directly adding each number
individually to the result causing rounding errors in every step.
However, often numpy will use a numerically better approach (partial
pairwise summation) leading to improved precision in many use-cases.
This improved precision is always provided when no ``axis`` is given.
When ``axis`` is given, it will depend on which axis is summed.
Technically, to provide the best speed possible, the improved precision
is only used when the summation is along the fast axis in memory.
Note that the exact precision may vary depending on other parameters.
In contrast to NumPy, Python's ``math.fsum`` function uses a slower but
more precise approach to summation.
Especially when summing a large number of lower precision floating point
numbers, such as ``float32``, numerical errors can become significant.
In such cases it can be advisable to use `dtype="float64"` to use a higher
precision for the output.
Examples
--------
>>> np.sum([0.5, 1.5])
2.0
>>> np.sum([0.5, 0.7, 0.2, 1.5], dtype=np.int32)
1
>>> np.sum([[0, 1], [0, 5]])
6
>>> np.sum([[0, 1], [0, 5]], axis=0)
array([0, 6])
>>> np.sum([[0, 1], [0, 5]], axis=1)
array([1, 5])
>>> np.sum([[0, 1], [np.nan, 5]], where=[False, True], axis=1)
array([1., 5.])
If the accumulator is too small, overflow occurs:
>>> np.ones(128, dtype=np.int8).sum(dtype=np.int8)
-128
You can also start the sum with a value other than zero:
>>> np.sum([10], initial=5)
15
All modules should normally have docstrings, and all functions and classes exported by a module should also have docstrings.
Public methods (including the `__init__` constructor) should also have docstrings.
A package may be documented in the module docstring of the `__init__.py` file in the package directory.
String literals occurring elsewhere in Python code may also act as documentation.
They are not recognized by the Python bytecode compiler and are not accessible as runtime object attributes (i.e. not assigned to `__doc__`), but two types of extra docstrings may be extracted by software tools:
- String literals occurring immediately after a simple assignment at the top level of a module, class, or `__init__` method are called “attribute docstrings”.
- String literals occurring immediately after another docstring are called “additional docstrings”.
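As a minimal sketch of an attribute docstring (the `Config` class and its field are hypothetical, chosen for illustration):

```python
class Config:
    """Runtime configuration."""

    timeout: int = 30
    """Maximum time to wait for a response, in seconds.

    This is an attribute docstring: the interpreter simply discards this
    string expression, but documentation tools such as Sphinx can extract it.
    """

# The class docstring is available at runtime...
print(Config.__doc__)   # Runtime configuration.
# ...while the attribute docstring is not assigned to any __doc__ attribute.
print(Config.timeout)   # 30
```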
More information at https://peps.python.org/pep-0257/.
Google vs NumPy vs others
How should the docstrings be formatted? There are many variants that you can choose, and the two main popular styles are:
- NumPy: https://numpydoc.readthedocs.io/en/latest/format.html#docstring-standard
- Google: https://google.github.io/styleguide/pyguide.html
The main difference between the two styles is that Google uses indentation to separate sections, whereas NumPy uses underlines.
Doctest
A doctest is a way to embed test cases within the documentation of a function or module in Python.
The tests are written in the form of interactive Python sessions,
and they are used to make sure that the code examples in the documentation are accurate.
See, for example, the Examples section of the `np.sum` docstring shown above.
Let’s take a simple Python function that calculates the area of a rectangle:
import sys
import doctest

def area(width, height):
    """Calculate the area of a rectangle.

    >>> area(5, 10)
    50
    >>> area(2, 3)
    6
    """
    return width * height

if __name__ == "__main__":
    # Exit with the number of failed doctests (0 means success).
    sys.exit(doctest.testmod().failed)
The tests are executed using:
python mymodule.py
This will run all of the doctests in the current module and report any failures. Alternatively you can run it using the doctest module from the command line:
python -m doctest mymodule.py
Exercise
Add a docstring to this function, documenting its arguments and its return value. Do not hesitate to add the missing type hints to the function definition as well. You can also add any useful notes (see the example above), and finally attach a doctest to it:
def vector_sum(vec, vec2):
    output = []
    for i in range(len(vec)):
        output.append(vec[i] + vec2[i])
    return output
Use generators!
Writing documentation is a time-consuming task that we easily skip if we do not get some help. Typically, we recommend using an extension for your favourite code editor to automatically generate template docstrings (e.g. autoDocstring in VSCode and VSCodium), or using an integrated chatbot based on large language models (e.g. Microsoft Copilot). In the latter case, please be sure to check the output carefully, as it may seem sensible at first glance but turn out to be completely wrong on closer examination.
C++ API documentation
Introduction
In C++, the de-facto standard for API documentation is Doxygen. Like many API documentation tools, it works by parsing the program’s source code, looking for specially formatted code comments that precede the declaration of the code entity (type, function…) that is being documented.
// The following code comment is parsed by Doxygen as a function description,
// since it comes right before a function declaration.
//
// Notice how the \param directive can be used to document individual function
// parameters. You should not overuse this functionality: keep it for situations
// where the function documentation is not, on its own, complete enough for
// users to infer parameter semantics.
/*!
* Compute the squared Euclidean norm of `vec`.
*
* In linear algebra, the norm of a vector quantifies how far away from the
* origin the vector extends in the underlying vector space. Among possible
* norms, one will most commonly use the Euclidean norm or 2-norm, which is
* the square root of the sum of squared vector coordinates.
*
* In low-dimensional spaces, computing the square root is much more expensive
* than other parts of the norm computation. Therefore, it is recommended to use
* the squared norm over the actual Euclidean norm whenever your use case allows
* for it.
*
* \param[in] vec Coordinates of a vector in an n-dimensional Euclidean space.
*/
double norm2(const std::vector<double>& vec);
This is a good design choice for an API documentation system because it brings the documentation as close as possible to the code entity that is being documented, which reduces the odds that as the code evolves, developers will forget to update associated API documentation.
However, this also means that Doxygen is a poor fit for documentation which is not as closely related to the source code, such as user tutorials, as using it for this purpose would drown the source code in heaps of long comments which are not strongly related to the program’s main job. We will later discuss another documentation tool, mdBook, which is better suited to this sort of documentation.
Setting up Doxygen
First of all, you need to install the `doxygen` command line tool. For the purpose of this training, we have done it for you on the virtual machines on which the practicals are run. But in most Linux distributions, this is most easily done by installing the `doxygen` package from your distribution’s standard package manager. Other operating systems may require finding an alternate source of pre-built binaries, or even building Doxygen from source.
After installing Doxygen, the next step will be to create a configuration file for your project. This is most easily done using the `doxygen -g` command, which will create a `Doxyfile` in the current directory that contains all configuration options, pre-set to mostly reasonable default values. The configuration file uses a simple `KEY = VALUE` format, supplemented by comments that explain what each configuration option does.
Options that you may want to adjust away from the defaults include…
- `EXTRACT_PACKAGE = YES`: This ensures that every source file gets documented, even in the absence of a file-wide documentation comment. This avoids the puzzling situation where you added documentation to some code entity but it doesn’t appear in the Doxygen output. In programming languages with a source/header dichotomy like C++, this setting is best complemented by removing every source file extension from the `FILE_PATTERNS` list.
- `QT_AUTOBRIEF = YES`: This ensures that the first sentence of an entity’s documentation is interpreted as a summary, which is a good practice in general, and better than the visually noisy alternative of needing an explicit `\brief` tag inside each and every doc comment.
- `SHOW_INCLUDE_FILES = NO`: This suppresses the default Doxygen behavior of listing every header included by a source file, which is usually more noisy than informative.
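Applied to the generated configuration file, these adjustments would look something like this inside the `Doxyfile` (a sketch, assuming the default layout produced by `doxygen -g`):

```
# Document every file, even in the absence of a file-wide comment
EXTRACT_PACKAGE    = YES

# Treat the first sentence of each doc comment as the brief description
QT_AUTOBRIEF       = YES

# Do not list every header included by each source file
SHOW_INCLUDE_FILES = NO
```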
Once Doxygen is configured, running `doxygen` in your source directory will produce, among other things, an `html` directory containing a set of HTML pages with your new source code documentation. It is advised to add this directory, along with other automatically generated Doxygen output, to your project’s `.gitignore` version control exclusion rules.
Another optional step which you may want to take is to set up your build system so that documentation can be built together with the source code in a single command. This ensures that the documentation is regularly built in your everyday development workflow, and errors in the documentation are therefore noticed quickly.
The `apidoc/` subdirectory of the source code examples demonstrates a full setup that uses the aforementioned configuration, based on the GNU Make build system. Running `make` in this directory (or the parent directory) will automatically rebuild both the provided C++ programs and the associated Doxygen documentation.
Writing the API documentation
A complete guide to Doxygen comments would be out of scope for this quick tutorial; we will refer you to the official manual for that. But here is a complete header file that demonstrates some key features of Doxygen that you may be interested in when starting out:
#pragma once
// It is a good idea to start a header file with a Doxygen comment that explains
// what it is about, what common theme unifies all functionality implemented by
// the underlying code module.
//
// If you can't find a common theme that can be summarized in a single short
// sentence, your code module is probably trying to do too many things and would
// benefit from being split into several simpler code modules.
//
// Notice the following Doxygen syntax points:
//
// - Comments are flagged for Doxygen consumption using a marker character, in
// this course we are using the Qt style of Doxygen comment where the marker
// character is an exclamation mark.
// - We need to clarify that we are documenting the entire source file, rather
// than some specific code object, using the \file directive.
// - Thanks to the QT_AUTOBRIEF = YES configuration option, the first sentence
// of a Doxygen comment is automatically treated as a summarized description
// of the entity that is being documented.
/*!
* \file
*
* This file demonstrates basic usage of Doxygen for API documentation purpose.
*
* We will use it to demonstrate notions such as file-wide documentation,
* function documentation, type documentation, and so on.
*/
#include <vector>
// The following code comment is parsed by Doxygen as a function description,
// since it comes right before a function declaration.
//
// Notice how the \param directive can be used to document individual function
// parameters. You should not overuse this functionality: keep it for situations
// where the function documentation is not, on its own, complete enough for
// users to infer parameter semantics.
/*!
* Compute the squared Euclidean norm of `vec`.
*
* In linear algebra, the norm of a vector quantifies how far away from the
* origin the vector extends in the underlying vector space. Among possible
* norms, one will most commonly use the Euclidean norm or 2-norm, which is
* the square root of the sum of squared vector coordinates.
*
* In low-dimensional spaces, computing the square root is much more expensive
* than other parts of the norm computation. Therefore, it is recommended to use
* the squared norm over the actual Euclidean norm whenever your use case allows
* for it.
*
* \param[in] vec Coordinates of a vector in an n-dimensional Euclidean space.
*/
double norm2(const std::vector<double>& vec);
// Doxygen can also document much more complex code entities, such as class
// hierarchies.
/*! This is the base class of all evil
*
* It has no destructor because evil cannot be destroyed, only displaced.
*/
class EvilBase {
 public:
  /*! Automatically set up the evilness */
  EvilBase() = default;

  /*! Query the current level of evilness */
  int evilness() const { return m_evilness; }

 protected:
  /*! Manually configure the level of evilness (for experts only) */
  explicit EvilBase(int evilness) : m_evilness(evilness) {}

 private:
  /*! Current level of evilness */
  int m_evilness = 42;
};
/*! Travel management system powered by the souls of unborn children
*
* Hand washing is strongly advised after prolonged contact with instances of
* this class. In case of accidental ingestion, seek immediate medical
* assistance.
*/
class Notilus : public EvilBase {
 public:
  /*! Release the proverbial Kraken */
  Notilus() : EvilBase(666) {}
};
Exercise
Pick any C++ codebase (either one of the source directories supplied as part of this course, other than `apidoc` obviously, or possibly a simple program of your own) and try to set up Doxygen and write some documentation for some of the public entities in the source code.
Conclusion
As you can see, writing API documentation is not terribly difficult, and can be a significant asset to people getting familiar with your program if done right.
However, there is more to documentation than API docs. For example, while it is technically possible to use Doxygen to document higher-level architecture or write tutorials, as done by libraries like hwloc or Eigen, the quirks of Doxygen make this needlessly difficult. Nicer toolchains exist for standalone documentation, as we are going to see in the next chapter on mdbook.
Tutorial-style docs with mdBook
Introduction
API documentation tools are useful when documenting code. But if the document that you are writing is mainly structured prose with a small-ish amount of code in it, like the text that you are currently reading or user tutorials in general, there are nicer alternatives.
Static Site Generators (SSGs) that generate HTML from Markdown have gained tremendous popularity for this application in recent years because…
- Starting from a plain text format makes it much easier to use version control systems.
- Targeting HTML lets you leverage the huge ecosystem of web tools and hosting providers.
- Among the various plain text formats that compile to HTML, Markdown is by far the most popular option, and arguably one of the easiest to learn.
Unfortunately, Markdown by itself (including its standardized CommonMark dialect) does not specify enough functionality to ease the writing of complex, multi-page documents. Hence we often need to tie ourselves to a specific renderer that extends Markdown enough to make it useful. In this practical, we will use mdBook for this purpose.
Getting started with mdBook
There are many ways to install mdBook on a machine. For the sake of simplicity, mdBook is already installed on the virtual machines that you are using.
Initialisation of the book
The `mdbook init` command will create a new directory containing an empty book for you to get started. Give it the name of the directory that you want to create:
mdbook init my-first-book
It will ask a few questions before generating the book. After answering the questions, you can change the current directory into the new book, and inspect the directory:
$ tree -L 2 .
.
├── book
├── book.toml
└── src
├── chapter_1.md
└── SUMMARY.md
2 directories, 3 files
The file `book.toml` contains settings describing how to build your book. It is written in the TOML markup language. The default settings are usually good enough to get you started. When you are interested in exploring more features and options that mdBook provides, check out the Configuration chapter for more details.
The folder `src` contains the sources of your book, written in Markdown. You can add as many files as you want. Note that the `SUMMARY.md` file contains the definition of the table of contents. You can have many files in `src`, but only those listed in `SUMMARY.md` will actually appear in the rendered book.
Finally, the folder `book`, initially empty, will contain the rendered book.
Rendering a book
There are several ways to render a book, but one of the easiest is to use the `serve` command, which will build your book and start a local web server. As you are on a virtual machine, you have to specify the hostname, as well as a port that is open:
mdbook serve --hostname <IP> --port <PORT>
where `<IP>` is the IP address of your machine, and `<PORT>` is set to 24000 here (we opened this port for you). Just follow the URL displayed in the terminal to read your book. You can leave the server running while you edit the content of the book: mdBook will automatically rebuild the output and refresh your web browser. Note that the `book` folder now contains many more files (recreated each time you make modifications in the `src` folder).
Publishing a book
Once you have written your book, you may want to host it somewhere for others to view. The first step is to build the output of the book. This can be done with the `mdbook build` command in the same directory where the `book.toml` file is located:
mdbook build
This will update the directory named `book`, which contains the HTML content of your book. You can then place this directory on any web server to host it.
For more information about publishing and deploying, check out the Continuous Integration section from the mdBook documentation, and we will play with a concrete example at the end of this training (Continuous integration).
Exercise: build a logbook
Build a new book, whose sections follow the sections of this training. In each section, you will write your findings, notes and feedback about this training :-)
Testing your code: the basics
Testing code… a very vast subject. For some it is a fuzzy concept, borderline boring; for others it is the heart of programming. But everyone agrees that testing code correctly is not an easy task.
In most languages, there are libraries to facilitate the writing of tests; still, it remains up to you to make the tests relevant and effective enough to understand the limits and flaws of your implementation and design. And then there is the age-old question: have I tested my code enough?
It might not be an existential question if you’re writing a little library for yourself only (though…), but very quickly the question becomes legitimate. Keep in mind that most big software projects started as small projects that grew unexpectedly. Here is how the creator of Linux described the first public version of the now-ubiquitous operating system:
I’m doing a (free) operating system (just a hobby, won’t be big and professional like gnu) for 386(486) AT clones.
In this practical, we will cover two common tools to automate the execution of tests. As these tools are language-specific you will need to pick which language you want to study here:
Basic testing with pytest
In Python, there are libraries to facilitate the writing of tests. You have seen in the documentation section the use of `doctest` to write simple tests attached to the documentation of a method or a class. While `doctest` provides a first layer of tests, called unit tests, it is not well suited to more complex tests outside the scope of a single object.
The other widely used test framework in Python is pytest. `pytest` is a powerful tool for discovering and running tests, and it has a number of useful features, such as the ability to rerun only the tests that failed during the last run, and support for running tests in parallel.
Basic usage
You will find at `~/robust-programming/tests/test_addition.py` a simple test for the `add` function:
def add(a: int, b: int):
    """Add two integers

    Parameters
    ----------
    a: int
        First integer
    b: int
        Second integer

    Returns
    -------
    out: int
        Sum of `a` and `b`

    Examples
    --------
    >>> add(2, 3)
    5
    """
    return a + b

def test_addition():
    assert add(2, 3) == 5
    assert add(-2, 3) == 1
    assert add(2, -3) == -1
    assert add(0, 0) == 0
In this script, we define the function together with its corresponding test suite, although in practice you will decouple the code and the tests into different files (or even directories) and import the necessary modules in the test files. Just execute this test and observe the result:
pytest ~/robust-programming/tests/test_addition.py
============================ test session starts =============================
platform linux -- Python 3.9.12, pytest-7.1.0, pluggy-1.0.0
Using --randomly-seed=1793550101
rootdir: /home/peloton/codes/robust-programming/practicals
plugins: asdf-2.14.3, typeguard-4.1.5, anyio-3.5.0, cov-4.0.0, factoryboy-2.5.1, rerunfailures-13.0, randomly-3.15.0
collected 1 item
code/tests/test_addition.py . [100%]
============================= 1 passed in 0.16s ==============================
There are several pieces of information:
- `platform`: information on which platform (`linux` here) the tests ran, and the versions of Python and pytest used. Report this information, for example in bug reports or merge requests, to ease the reproducibility of results.
- `seed`: the initial seed used when running the tests (useful if your program generates random numbers, to allow test runs to be reproduced).
- `plugins`: additional plugins loaded (they might not be used).
- `collected items`: the number of tests run. Notice that we check 4 assertions, but the number of collected items is 1: pytest reports the number of test functions.
- `passed items`: a summary of successes and failures. In this case, our test passes.
Exercise: Change one value of a test to fail it on purpose, re-run pytest
, and observe the report.
Use with doctest
You can also run doctest
within pytest
. Just add the option --doctest-modules
when running the test:
pytest --doctest-modules ~/robust-programming/tests/test_addition.py
Note that in this case, pytest
collected 2 items (our manually defined test plus the doctest attached to the function add
).
Parametrize tests
Our current way of defining tests is not very compact (series of copy/paste with different values). You can instead parametrize your test by templating the operation to perform, and specifying the list of values (in/out) to test.
Exercise: Rewrite the test by using the pytest.mark.parametrize decorator.
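As a hint, here is a minimal sketch of the decorator's syntax on an unrelated, hypothetical `multiply` function (so as not to give away the exercise):

```python
import pytest

def multiply(a: int, b: int) -> int:
    return a * b

# One test function, several (a, b, expected) combinations: pytest generates
# one test item per tuple, so a failure pinpoints the offending values.
@pytest.mark.parametrize("a, b, expected", [
    (2, 3, 6),
    (-2, 3, -6),
    (0, 5, 0),
])
def test_multiply(a, b, expected):
    assert multiply(a, b) == expected
```

Running pytest on such a file collects one item per parameter tuple, so the report shows exactly which combination fails.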
Handle exceptions
Being aware of the limitations of your code is as important as knowing what your code does under normal circumstances. Hence it is a good practice to identify and test exceptions raised by your code.
The pytest.raises
function allows you to test that a specific
exception is raised when certain code is run. This can be useful
for verifying that your code is handling errors and exceptions as expected.
Exercise: Implement a test to check that the function add
raises a TypeError
if one of the two arguments is a string.
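For reference, `pytest.raises` is used as a context manager; here is a minimal sketch on a built-in function, unrelated to the exercise:

```python
import pytest

def test_int_rejects_non_numeric_strings():
    # The test passes only if the code inside the block raises ValueError;
    # if no exception (or a different one) is raised, the test fails.
    with pytest.raises(ValueError):
        int("not a number")
```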
Coverage
One of the advantages of using a tool such as pytest is the ability to compute the coverage of our test suite, that is, the percentage of our code that is exercised by the tests. To enable the coverage report, you can run pytest with the options `--cov` and `--cov-report`:
pytest --cov=. --cov-report=term ~/robust-programming/tests/test_addition.py
---------- coverage: platform linux, python 3.9.12-final-0 -----------
Name Stmts Miss Cover
-------------------------------------------------
code/tests/test_addition.py 7 0 100%
-------------------------------------------------
TOTAL 7 0 100%
Covered in this context means being hit during the execution of the test suite – which does not necessarily mean it is tested meaningfully.
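To illustrate with a hypothetical example: the test below drives every line of `absolute()`, so coverage reports 100%, yet it would not catch the deliberate bug because it asserts nothing:

```python
def absolute(x: int) -> int:
    # Deliberate bug: the negative branch should return -x, not x
    if x < 0:
        return x
    return x

def test_absolute():
    # Both branches are executed, so coverage reports 100%...
    absolute(5)
    absolute(-5)
    # ...but without any assert, the bug above goes unnoticed
```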
Note that if you want to inspect in detail which lines of your code are covered or not, you will prefer the argument `--cov-report=html`. pytest will then create a folder `htmlcov` containing the source of a web page with the line-by-line coverage report for each module.
Free practice
Write tests for your code, and execute them with pytest
. You might want to explore other functionalities of pytest
such as mock.
Basic testing with Catch2
Introduction
The easiest way to test software is to write a bunch of assertions. These are simple statements which assess that given certain predetermined inputs, your code should behave in a certain way (produce a certain result, throw an exception…).
It is certainly possible to write basic assertions by hand…
// Assume this is the code we want to test
int triple(int x) {
    return 3 * x;
}

// Somewhere else in the source tree, we might write this test program:
#include <iostream>
#include <sstream>
#include <stdexcept>

int main() {
    const int result1 = triple(0);
    if (result1 != 0) {
        std::ostringstream s;
        s << "expected triple(0) to be 0, but got " << result1;
        throw std::runtime_error(s.str());
    }
    const int result2 = triple(1);
    if (result2 != 3) {
        std::ostringstream s;
        s << "expected triple(1) to be 3, but got " << result2;
        throw std::runtime_error(s.str());
    }
    const int result3 = triple(2);
    if (result3 != 6) {
        std::ostringstream s;
        s << "expected triple(2) to be 6, but got " << result3;
        throw std::runtime_error(s.str());
    }
    std::cout << "All tests completed successfully" << std::endl;
}
…but as you can see, this gets verbose quickly, and we have not even started introducing other commonly desired test harness features like…
- Being able to run all tests even if some of them fail, and report on all failing tests at the end.
- Being able to selectively run some tests, without running the other ones.
Therefore, you will soon want to introduce some abstractions that reduce the amount of code you need to write for each test. Thankfully, you do not need to write this code yourself, as there are many high-quality test frameworks available for C++. In this practical, we will use Catch2.
Using it, the above test program becomes…
#include <catch2/catch_test_macros.hpp>
#include <catch2/catch_session.hpp>

TEST_CASE("triple() multiplies a number by 3") {
    REQUIRE(triple(0) == 0);
    REQUIRE(triple(1) == 3);
    REQUIRE(triple(2) == 6);
}

int main(int argc, char* argv[]) {
    return Catch::Session().run(argc, argv);
}
…which as you can see is a lot more readable, focused on the task at hand, and yet more featureful (you get the aforementioned test harness features for free, among others).
Basic Catch2 usage
In Catch2, as in many other test frameworks, testing is done using test cases, which are code blocks described by a string and containing one or more assertions. If one of the assertions fails, you will automatically get an error message telling you which test case(s) failed and why:
Randomness seeded to: 3094418497
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
test_triple_catch.bin is a Catch2 v3.5.3 host application.
Run with -? for options
-------------------------------------------------------------------------------
triple() multiplies a number by 3
-------------------------------------------------------------------------------
test_triple_catch.cpp:7
...............................................................................
test_triple_catch.cpp:9: FAILED:
REQUIRE( triple(1) == 3 )
with expansion:
2 == 3
===============================================================================
test cases: 1 | 1 failed
assertions: 2 | 1 passed | 1 failed
Test cases terminate on the first failed assertion, so if you want to test multiple mathematically unrelated properties, it is a good practice to test them using separate test cases. But for any nontrivial test, doing this naively means duplicating common setup code. The standard way to avoid this in Catch2 is to use sections:
#include <catch2/catch_test_macros.hpp>
#include <catch2/catch_session.hpp>
#include <vector>

TEST_CASE("std::vector size/capacity matches expectations") {
    // Common setup
    std::vector<int> v(42);
    const size_t initial_capacity = v.capacity();

    // Common expectations
    REQUIRE(v.size() == 42);
    REQUIRE(initial_capacity >= 42);

    // Section that tests push_back
    SECTION("adding an element increases the size, may affect capacity") {
        v.push_back(123);
        REQUIRE(v.size() == 43);
        if (initial_capacity >= 43) {
            REQUIRE(v.capacity() == initial_capacity);
        } else {
            REQUIRE(v.capacity() >= 43);
        }
    }

    // Section that tests pop_back
    SECTION("removing an element decreases the size, capacity is unaffected") {
        v.pop_back();
        REQUIRE(v.size() == 41);
        REQUIRE(v.capacity() == initial_capacity);
    }

    // Section that tests clear
    SECTION("clearing the vector zeroes the size, capacity is unaffected") {
        v.clear();
        REQUIRE(v.size() == 0);
        REQUIRE(v.capacity() == initial_capacity);
    }
}

int main(int argc, char* argv[]) {
    return Catch::Session().run(argc, argv);
}
From this code, Catch2 will generate three test cases that begin by performing the same initial setup (and initial assertion checking), and then go on to each test how a different method of std::vector
affects the vector’s size and capacity.
For more information on the functionality provided by Catch2 and the way it is used, please refer to the library’s reference documentation.
Exercise
Set up Catch2 on one of the example programs provided as part of this course, or a program of your own, and write some test cases with assertions. If you are not using the provided Makefiles, bear in mind that you will need to add -lCatch2
to your compiler flags in order to link to the Catch2 library.
Property-based testing
A time-honored technique to avoid introducing bugs while modifying software is to set up automated tests which assert that the code still does what it is supposed to do (e.g. produces expected results). But it is not so easy to get this right:
- A test is most useful when it targets a small software component (e.g. single function, class…) because when that fails, it gives you a precise indication of where the problem is, which saves you debugging time. But that means you need to write a lot of tests, which is tedious.
- Because writing tests is tedious, manually written tests tend to be very unimaginative: they always test against the same inputs, do not try many different configurations… and that increases the odds that bugs which only affect more exotic edge cases will be missed.
- The above problem is magnified by the fact that in most scientific projects, the same person writes the code being tested and the test that exercises it: what edge case you missed while writing the code, you are likely to also miss when writing the associated test.
Thankfully, there is a way to make fine-grained tests a little easier to write and much more imaginative, and that is to write tests that operate over random inputs, and assert properties that should be true of any output. This is called property-based testing.
Defining a property
The concept of property-based testing is borrowed from functional programming. To summarize, the idea is no longer to predict the output for a particular input, but to determine and test the properties of our function. But what do we mean by property? Mathematically, a property p over a set X is usually defined as a function p: X → {true, false}. In practice, however, suitable properties are not always obvious to find.
Let’s try to reason on a simple example that takes two numbers and adds them:
def add_numbers(a: int, b: int) -> int:
return a + b
What particular property(ies) characterizes the add_numbers
function?
Looking at the function, we see that it is based around the addition operator.
Unlike subtraction, addition is commutative, so add_numbers(a, b) == add_numbers(b, a)
.
But we could have introduced a bug by writing the operator `*` instead of `+`, and the test would still pass…
So what other property distinguishes addition from multiplication?
The neutral element! add_numbers(a, 0) == a
is not satisfied for multiplication.
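These two properties can already be checked by hand over random inputs, without any framework; a minimal sketch:

```python
import random

def add_numbers(a: int, b: int) -> int:
    return a + b

for _ in range(1000):
    a = random.randint(-10**6, 10**6)
    b = random.randint(-10**6, 10**6)
    # Commutativity: the order of the arguments does not matter
    assert add_numbers(a, b) == add_numbers(b, a)
    # Neutral element: adding 0 leaves the input unchanged
    # (a buggy version using * instead of + would fail here, since a * 0 == 0)
    assert add_numbers(a, 0) == a
```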
We have therefore isolated at least two properties which allow us to test our function (its implementation) without having to resort to explicit input/output values (well, almost; see later). Finding properties is the real difficulty of this type of test. Here are some properties that often come up in our scientific developments (the list is endless!):
- Invariants (e.g. the size of a collection when applying a `map`, or the contents of a collection when sorting it)
- Inverses (invertible functions)
- Idempotence (applying the function twice gives the same result as applying it once)
- Properties of internal composition laws: associativity, commutativity…
- Structural induction (solve a smaller problem first)
- Hard to prove, easy to verify (think of a maze)
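A few of these properties can be illustrated on standard Python built-ins (a sketch, not tied to any particular framework):

```python
data = [3, 1, 2, 1]

# Invariant: applying a map preserves the size of a collection
assert len(list(map(lambda x: 2 * x, data))) == len(data)

# Invariant: sorting preserves the contents of a collection
assert sorted(data).count(1) == data.count(1)

# Idempotence: sorting an already-sorted list changes nothing
assert sorted(sorted(data)) == sorted(data)

# Inverse: decoding an encoded string returns the original
assert "property".encode("utf-8").decode("utf-8") == "property"
```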
A less trivial example
Let us assume, for example, that you are writing a C++ routine that takes an integer as input and decomposes it into prime factors. It could have the following signature:
std::vector<unsigned> prime_factors(unsigned input);
What can we tell about such a function without looking at the implementation?
- All output integers should be prime numbers.
- The product of all output integers should be equal to `input`.
So we can write a test that, given random unsigned integers, passes them through
prime_factors
and asserts that the result has these two properties:
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <vector>

void assert(bool expectation, std::string message) {
    if (!expectation) {
        throw std::runtime_error(message);
    }
}

bool is_prime(unsigned number) {
    for (unsigned i = 2; i <= number / 2; ++i) {
        if (number % i == 0) {
            return false;
        }
    }
    return true;
}

int main() {
    for (int i = 0; i < 100; ++i) {
        // NOTE: This does not cover all possible values of "unsigned", in the
        // exercises we will use a better randomness source that does.
        unsigned input = rand();
        std::vector<unsigned> factors = prime_factors(input);
        unsigned product = 1;
        for (const unsigned factor: factors) {
            assert(is_prime(factor), "Returned factor is not a prime number");
            product *= factor;
        }
        assert(product == input, "Factor product is not original input");
    }
}
And that, in a nutshell, is the idea of property-based testing. Of course, real-world property-based testing frameworks have more features, such as printing out the inputs that made the test fail, or searching for the simplest input that makes the test fail. But the basic principle is simple: unless you are explicitly testing for a known tricky edge case, write your tests to operate over random inputs and assert things that should be true of any output.
Now that you are hopefully convinced that this is a testing technique worth learning, please choose in which programming language you would like to experiment with it:
Python property-based testing
In Python, one of the most established frameworks for property-based testing is Hypothesis. It has been pre-installed on your user sessions of the practicals platform, along with the pytest framework which provides a convenient way to run your tests.
To demonstrate how Hypothesis works, we will reimplement a few classical linear algebra primitives using Python lists of floating-point numbers, as well as a few other popular methods, hopefully gaining along the way some appreciation of all the work that is saved when we use NumPy instead of rolling our own code.
A first example
Let’s start with a simple vector sum implementation (that you documented earlier!):
def vector_sum(vec, vec2):
    output = []
    for i in range(len(vec)):
        output.append(vec[i] + vec2[i])
    return output
We can write a property-based test for it using Hypothesis as follows:
from hypothesis import given
from hypothesis.strategies import floats, lists

@given(lists(floats()), lists(floats()))
def test_sum(vec, vec2):
    result = vector_sum(vec, vec2)
    for i in range(len(result)):
        assert result[i] == vec[i] + vec2[i]
Notice how we design the test as a function that takes certain inputs, whose
nature is specified via the @given
decorator.
In any case, if we execute the test defined above, it fails:
$ pytest sum.py
============================= test session starts ==============================
platform linux -- Python 3.9.16, pytest-7.2.2, pluggy-1.0.0
rootdir: /tp/homes/tmp_user_1/tests
plugins: hypothesis-6.70.0
collected 1 item
sum.py F [100%]
=================================== FAILURES ===================================
___________________________________ test_sum ___________________________________
@given(lists(floats()), lists(floats()))
> def test_sum(vec, vec2):
sum.py:11:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
sum.py:12: in test_sum
result = vector_sum(vec, vec2)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
vec = [0.0], vec2 = []
def vector_sum(vec, vec2):
output = []
for i in range(len(vec)):
> output.append(vec[i] + vec2[i])
E IndexError: list index out of range
E Falsifying example: test_sum(
E vec=[0.0],
E vec2=[],
E )
sum.py:7: IndexError
=========================== short test summary info ============================
FAILED sum.py::test_sum - IndexError: list index out of range
============================== 1 failed in 1.06s ===============================
What the failure message tells us is that if it is called with one list of length 1 and one empty list, our `vector_sum()` crashes at the line `output.append(vec[i] + vec2[i])`. That is because, as currently written, the code assumes that both input lists have the same length (i.e. both input vectors have the same dimensionality), which is not true of an arbitrary pair of vectors.
Such assumptions are really easy to make when writing code on a bad day, and they are hard to find because as soon as the assumption is ingrained in your brain, any hand-written test you write is likely to use an input that follows the same assumption. But here, it only took a bit of randomness to expose it. Such is the power of property-based testing.
Now that we found an unexpected input configuration, we need to think about how we want to handle it. Passing in two vectors of different dimensionality looks like a usage error from the caller of the function, so perhaps just crashing with a more self-explanatory error message is enough?
def vector_sum(vec, vec2):
    assert len(vec) == len(vec2), "Can't sum vectors of different lengths"
    output = []
    for i in range(len(vec)):
        output.append(vec[i] + vec2[i])
    return output
With this change, our test would fail with a nicer error message, but still fail. We also need to teach the test that failing with an `AssertionError` is the expected behavior when vectors of different lengths are passed in. And while we're at it, we will also add a check that ensures the output has the same length as the inputs:
@given(lists(floats()), lists(floats()))
def test_sum(vec, vec2):
    try:
        result = vector_sum(vec, vec2)
    except AssertionError:
        # We only expect an assertion failure if the lengths differ
        assert len(vec) != len(vec2)
        # In that case, the test can't continue: there is no result
        return
    # If the call succeeds, then all input and output lengths should match
    assert len(vec) == len(vec2)
    assert len(vec) == len(result)
    # Value check is as before
    for i in range(len(result)):
        assert result[i] == vec[i] + vec2[i]
Gimme gimme gimme a NaN
And with these changes, we get… a rather different test failure:
$ pytest sum.py
============================= test session starts ==============================
platform linux -- Python 3.9.16, pytest-7.2.2, pluggy-1.0.0
rootdir: /tp/homes/tmp_user_1/tests
plugins: hypothesis-6.70.0
collected 1 item
sum.py F [100%]
=================================== FAILURES ===================================
___________________________________ test_sum ___________________________________
@given(lists(floats()), lists(floats()))
> def test_sum(vec, vec2):
sum.py:12:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
vec = [0.0], vec2 = [nan]
@given(lists(floats()), lists(floats()))
def test_sum(vec, vec2):
try:
result = vector_sum(vec, vec2)
except AssertionError:
# We only expect an assertion failure if the lengths differ
assert len(vec) != len(vec2)
# In that case, the test can't continue: there is no result
return
# If the test passes, then all input and output lengths should match
assert len(vec) == len(vec2)
assert len(vec) == len(result)
# Value check is as before
for i in range(len(result)):
> assert result[i] == vec[i] + vec2[i]
E assert nan == (0.0 + nan)
E Falsifying example: test_sum(
E vec=[0.0],
E vec2=[nan],
E )
sum.py:27: AssertionError
=========================== short test summary info ============================
FAILED sum.py::test_sum - assert nan == (0.0 + nan)
============================== 1 failed in 1.67s ===============================
Here, the problem is that floating-point math is complicated (a topic for
another course). If we claim to
accept any list of floating-point numbers as input, then we must account not
just for “normal” numbers, but also for special numbers that the IEEE-754
standard defines, such as infinities and NaN (Not a Number). Alas, the latter
does not honor normal math identities like x == x
.
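This surprising behavior is easy to check from a Python prompt:

```python
import math

nan = float("nan")

# NaN compares unequal to everything, including itself
assert nan != nan
assert not (nan == nan)

# Arithmetic involving NaN propagates it, so equality checks on results fail too
assert math.isnan(0.0 + nan)

# The reliable way to detect NaN is math.isnan(), not ==
assert math.isnan(nan)
```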
Now we need to think about what we want, as there are several reasonable ways to handle this:
- We could check inputs and fail with an error if they cannot be summed with reasonable results. This would be appropriate in projects where we care a lot about not performing meaningless computations (and eventually emitting meaningless results), but checking inputs may be costly if we expect high-dimensional input vectors.
- We could warn users that they cannot expect sane results if the function is fed with special IEEE-754 numbers. In that case, rather than modifying the code of the `vector_sum()` function, we would modify its documentation to clarify the situation, then modify `test_sum()` to ignore results on pathological inputs.
Since we have already shown an example of the first strategy (error-checking in the function being tested), we will now show an example of the second strategy:
import math

from hypothesis import given
from hypothesis.strategies import floats, lists

# Will produce unexpected results on special IEEE-754 numbers: inf, NaN...
def vector_sum(vec, vec2):
    assert len(vec) == len(vec2), "Can't sum vectors of different lengths"
    output = []
    for i in range(len(vec)):
        output.append(vec[i] + vec2[i])
    return output

@given(lists(floats()), lists(floats()))
def test_sum(vec, vec2):
    try:
        result = vector_sum(vec, vec2)
    except AssertionError:
        assert len(vec) != len(vec2)
        return
    assert len(vec) == len(vec2)
    assert len(vec) == len(result)
    for i in range(len(result)):
        # Do not check results on non-finite numbers (NaN and inf)
        if math.isfinite(vec[i]) and math.isfinite(vec2[i]):
            assert result[i] == vec[i] + vec2[i]
After this change, the test passes, showing that the vector_sum()
function now
behaves in the way that the test expects it to:
$ pytest sum.py
============================= test session starts ==============================
platform linux -- Python 3.9.16, pytest-7.2.2, pluggy-1.0.0
rootdir: /tp/homes/tmp_user_1/tests
plugins: hypothesis-6.70.0
collected 1 item
sum.py . [100%]
============================== 1 passed in 1.18s ===============================
So, what can we make of all this?
- Testing is not just about finding bugs in the function being tested. It’s also about forming a better understanding of the problem we’re facing, and making conscious choices about what inputs we want to handle and how. A test failure indicates that the test assertions do not accurately reflect how the function being tested works, but this mismatch can be a problem in either the function or the test code.
- Property-based testing is very effective at forcing us to think about uncomfortable edge cases that our brain would rather ignore, like special floating-point numbers.
- By freeing us from inefficient human generation of interesting test inputs, property-based testing lets us focus on more interesting questions about how our function should behave, which is what testing should be about.
- Property-based testing is not a panacea, however. In this example, it is attention to detail that led us to notice that, as initially formulated, the test did not check that the output was as long as the inputs. Similarly, if a function has a very narrow edge case that is only triggered by very specific inputs, random search is unlikely to hit that configuration. In this case, you may need to hint the algorithm, or outright force it to go through known-tricky inputs.
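For instance, Hypothesis provides the `@example` decorator, which guarantees that listed known-tricky inputs are tried on every run, in addition to the random ones; a hypothetical sketch:

```python
from hypothesis import example, given
from hypothesis.strategies import integers

def absolute(x: int) -> int:
    return -x if x < 0 else x

# @given draws random integers; @example ensures the listed inputs
# are also tried on every single run, not just if randomness finds them.
@given(integers())
@example(0)         # the boundary between the two branches
@example(-(2**63))  # a large magnitude that random search could miss
def test_absolute_is_nonnegative(x):
    assert absolute(x) >= 0
```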
Exercise
You will find the above property-based testing example in the ~/robust-programming/tests
subdirectory of your home directory on the practicals server, where it is called sum.py
.
Alongside it, you will find incomplete files for the following three exercises on property-based testing:
- Sorting a list: `sort_list.py`
- Dot product: `dot.py`
- Condorcet method: `condorcet.py`
The goal of these exercises is for you to complete these files. You will probably not have the time to do all of them, so choose based on your level.
Beginner: sorting a list
Let’s suppose we write a function that takes a list of integers as input, and returns a sorted version of the list (yes, I know there are ready-made functions for that…):
def sort_list(alist: list) -> list:
    """ sort the input list in a naive way """
    if len(alist) <= 1:
        return alist
    out = []
    alist_tmp = alist.copy()
    while alist_tmp:
        out.append(min(alist_tmp))
        alist_tmp.pop(alist_tmp.index(min(alist_tmp)))
    return out
Exercise: find and implement properties that characterise the sort function.
Medium: dot product of two vectors
Write a function that computes the dot product of two vectors, defined as lists of floating point numbers as above, then write some tests that assert that it does so correctly. To avoid duplicating the full dot product computation in the test (which is unlikely to help you find any bugs, since the test and the function being tested would share any misunderstanding from your side), try the following strategy:
- Start by generating a vector where only one component is not zero, and compute the dot product of that vector with itself. The result is the squared norm of the vector, which in this scenario should be very easy to predict. You may find the following test skeleton useful; feel free to check out the Hypothesis documentation to learn more about the various functions that are being used in the `@given` decorator.

  from hypothesis import given
  from hypothesis.strategies import floats, integers, just, tuples

  max_len = 1024

  @given(
      integers(min_value = 1, max_value = max_len).flatmap(
          lambda length: tuples(
              floats(),
              integers(min_value = 0, max_value = length - 1),
              just(length)
          )
      )
  )
  def test_norm2_easy(params):
      coord, idx, length = params
      assert length >= 1
      assert idx < length
- Now generate two vectors of this sort and compute their dot product. Again, the result should be easy to predict without writing down the full dot product computation in your test. You may want to extract the logic for generating one vector into a reusable building block to avoid copy-and-paste, which you can do as follows:

  easy_vector_params = integers(min_value = 1, max_value = max_len).flatmap(
      lambda length: tuples(
          floats(),
          integers(min_value = 0, max_value = length - 1),
          just(length)
      )
  )

  @given(easy_vector_params, easy_vector_params)
  def test_dot_easy(params1, params2):
      pass
-
Now generate one arbitrary vector, as done in the vector addition example above, and compute its dot product with itself. This gives you its squared norm, as before. Test that this squared norm is greater than the square of the largest vector component.
-
Finally, generate two arbitrary vectors, and check that their dot product follows the Cauchy-Schwarz inequality:
pow(dot(v1, v2), 2) <= dot(v1, v1) * dot(v2, v2)
.
Note that floating-point rounding errors differ between multiple ways to do the
same computation, so you may need to use approximate comparisons like
math.isclose()
in some of your tests.
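A quick illustration of why exact comparison can fail on two mathematically equal quantities computed differently:

```python
import math

a = 0.1 + 0.2
b = 0.3

# Exact comparison fails because the two computations round differently...
assert a != b

# ...but approximate comparison succeeds within the default relative tolerance
assert math.isclose(a, b)
```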
Hard: intransitivity of the Condorcet method
Condorcet’s method is a voting system in which each voter draws up an ordered list of candidates, and the only winner, if there is one, is the candidate who, compared in turn with each of the other candidates, each time proves to be the preferred candidate.
The curiosity of this system is that there is no guarantee that there will be a candidate who satisfies the victory criterion. In other words, this system is not transitive: if A is better than B, and B is better than C, A is not necessarily better than C. This is Condorcet’s paradox, and we propose to demonstrate it with Hypothesis.
Exercise: Write a test function which claims that Condorcet’s system is transitive, and show that this test does not work.
Free practice
You probably have plenty of code of your own, so why not try implementing property-based tests in your existing test suites?
Hypothesis, numpy & pandas
You can integrate strategies for NumPy and Pandas into Hypothesis quite naturally: https://hypothesis.readthedocs.io/en/latest/numpy.html
Hypothesis & pytest
There is a Hypothesis module for pytest. Everything is fairly transparent (as soon as you have the two libraries installed, it works by itself), and it allows you to insert your property-based tests naturally into your existing test suites:
- https://hypothesis.readthedocs.io/en/latest/quickstart.html#running-tests
- https://hypothesis.readthedocs.io/en/latest/details.html?highlight=pytest#the-hypothesis-pytest-plugin
Health checks
Hypothesis not only exposes tools for carrying out tests, but also offers features for detecting poor programming practices that can hinder the proper maintenance and execution of code:
- Tests that require overly complex data and slow down execution.
- Tests that do not explore the parameter space sufficiently.
- Faulty recursion strategies (i.e. that diverge too much).
- Tests that have no chance of finishing before the end of the world.
See https://hypothesis.readthedocs.io/en/latest/healthchecks.html for more information.
C++ property-based testing
There are several C++ test frameworks for property-based testing, and a clear winner has yet to emerge. Here we will be using the RapidCheck framework, which comes pre-installed on your user sessions of the practicals platform.
To demonstrate how RapidCheck works, we will reimplement a few classical linear
algebra primitives using std::vector<float>
, hopefully getting some
appreciation of all the work that is saved when we use a linear algebra library
like Armadillo instead of rolling our own code
along the way.
A first example
Let’s start with a simple vector sum implementation:
#include <stdlib.h>
#include <vector>

std::vector<float> vector_sum(const std::vector<float>& vec,
                              const std::vector<float>& vec2) {
    std::vector<float> output;
    for (size_t i = 0; i < vec.size(); ++i) {
        output.push_back(vec.at(i) + vec2.at(i));
    }
    return output;
}
We can write a property-based test for it using RapidCheck as follows:
#include <rapidcheck.h>

int main() {
    rc::check(
        "vector_sum sums vectors",
        [](const std::vector<float>& vec, const std::vector<float>& vec2) {
            std::vector<float> result = vector_sum(vec, vec2);
            for (size_t i = 0; i < result.size(); ++i) {
                RC_ASSERT(result[i] == vec[i] + vec2[i]);
            }
        }
    );
}
If the syntax used for the second parameter of the rc::check
function
surprises you, click this text to learn more about it.
This is a lambda expression, a shorthand to quickly define a function that is available since version 2011 of the C++ standard. In this particular case, you could have achieved the same result by defining a separate function like this…
void vector_sum_sums_vectors(const std::vector<float>& vec,
                             const std::vector<float>& vec2) {
    std::vector<float> result = vector_sum(vec, vec2);
    for (size_t i = 0; i < result.size(); ++i) {
        RC_ASSERT(result[i] == vec[i] + vec2[i]);
    }
}

int main() {
    rc::check("vector_sum sums vectors", vector_sum_sums_vectors);
}
…but when using frameworks like RapidCheck that rely on you defining a lot of single-use functions, it is nicer to be able to define them right at the only point where they are being used, rather than produce spaghetti code where the reader must constantly jump from one place of the code to the other in order to understand what’s going on.
In any case, if we compile and run the test defined above, it fails:
$ make -j$(nproc) && ./sum.bin
Using configuration: seed=15308558600823943072
- vector_sum sums vectors
Falsifiable after 5 tests and 6 shrinks
std::tuple<std::vector<float>, std::vector<float>>:
([0], [])
Exception thrown with message:
vector::_M_range_check: __n (which is 0) >= this->size() (which is 0)
Some of your RapidCheck properties had failures. To reproduce these, run with:
RC_PARAMS="reproduce=BchdlNGdvJ3XzVXbgMXdtNHI2V2Y09mczB6rcFH8sLH1g+KXxBP7yRNovyV
What this error message tells us, in a fairly roundabout way, is that if our
test is passed one single-element std::vector
and one empty std::vector
, an
exception is thrown because an std::vector
is indexed outside of its useful
range.
If you cannot guess where that’s coming from, you could locate the point of failure using a debugger like GDB or by adding printouts at strategic places in the code. Whichever way you go, what you will eventually figure out is that the problem lies inside the vector_sum() function:
std::vector<float> vector_sum(const std::vector<float>& vec,
const std::vector<float>& vec2) {
std::vector<float> output;
// ERROR: This loop has vec.size() iterations...
for (size_t i = 0; i < vec.size(); ++i) {
// ...then it tries to index both vec and vec2 with the corresponding
    // values of i. But what if vec2 has fewer elements than vec?
output.push_back(vec.at(i) + vec2.at(i));
}
return output;
}
Basically, as currently written, the code assumes that the two input vectors,
vec
and vec2
, have the same dimensionality, which may not be the case.
Such assumptions are really easy to make when writing code on a bad day, and they are hard to find because as soon as the assumption is ingrained in your brain, any hand-written test you write is likely to use an input that follows the same assumption. But here, it only took a bit of randomness to expose it. Such is the power of property-based testing.
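To make this failure mode concrete, here is a small Python sketch of the same mistake (this vector_sum is a hypothetical Python port of the function above, written with the same hidden assumption):

```python
def vector_sum(vec, vec2):
    # Same hidden assumption as the C++ version above: both inputs
    # are implicitly expected to have the same length.
    output = []
    for i in range(len(vec)):
        output.append(vec[i] + vec2[i])  # IndexError if vec2 is shorter
    return output

# A hand-written test that shares the assumption happily passes...
assert vector_sum([1.0, 2.0], [3.0, 4.0]) == [4.0, 6.0]

# ...but the ([0], []) configuration found by RapidCheck blows up.
try:
    vector_sum([0.0], [])
    caught = False
except IndexError:
    caught = True
assert caught
```

Notice how the hand-written test above looks perfectly reasonable, yet never challenges the length assumption, while the first mismatched input immediately exposes it.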
If you are surprised that this situation leads to an exception being thrown, click this text for more information on what's going on.
The only reason why this out-of-bounds indexing problem was reliably reported with an exception is that I chose to use the less common at() method, rather than the more usual [] indexing operator. Without this precaution, the program would exhibit undefined behavior, a very problematic situation where program behavior becomes unpredictable because the compiler applies code transformations under the assumption that undefined behavior never happens. As a result of these incorrect code transformations, a program that exhibits undefined behavior can do basically anything, including but not limited to…
- Crashing with an unclear error message such as “Segmentation fault”.
- Reading invalid data from a memory location it’s not supposed to access, possibly leaking secret user data like passwords to the outside world in the process.
- Changing memory locations it’s not supposed to write to, leading other parts of the program to misbehave in bizarre ways later on.
- Executing unexpected instructions injected by an attacker trying to get into your system.
- Not doing the expected work at all, perhaps jumping to an unexpected stream of instructions somewhere else in the program or perhaps just terminating without having done anything.
- Or seemingly working as expected today, only to fail tomorrow in a weird way.
To make matters worse, the behavior is not fixed but may change depending on which compiler is used, which compiler flags are set, which operating system is used, or the particular state of the environment (e.g. values of environment variables, state of other processes running on the same system) at the time the program is running.
Because of this variability, undefined behavior can be hard to detect, make sense of, and correct. This is why there is a trend in modern programming languages to make undefined behavior harder to trigger, by removing it from all commonly used primitives and only allowing for its possibility in uncommon and specialized situations.
Unfortunately, legacy programming languages like C++ and Fortran cannot follow this trend as it would break too much existing code. This is a good argument for using more modern programming languages like Python or Rust whenever your project allows for it.
And now that this point is clarified, let us go back to fixing our test.
Now that we found an unexpected input configuration, we need to think about how we want to handle it. Passing in two vectors of different dimensionality looks like a usage error from the caller of the function, so perhaps just crashing with a more self-explanatory error message is enough?
#include <stdexcept>
std::vector<float> vector_sum(const std::vector<float>& vec,
const std::vector<float>& vec2) {
// Detect inhomogeneous input configurations
if (vec.size() != vec2.size()) {
throw std::runtime_error(
"vector_sum(): input vectors don't have the same dimension"
);
}
std::vector<float> output;
for (size_t i = 0; i < vec.size(); ++i) {
output.push_back(vec.at(i) + vec2.at(i));
}
return output;
}
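For comparison, the same defensive check can be sketched in Python, using ValueError as the nearest analogue of std::runtime_error (this vector_sum is a hypothetical port, not part of the practicals' code):

```python
def vector_sum(vec, vec2):
    # Detect inhomogeneous input configurations early, with a
    # self-explanatory error message, as in the C++ version.
    if len(vec) != len(vec2):
        raise ValueError(
            "vector_sum(): input vectors don't have the same dimension"
        )
    return [a + b for a, b in zip(vec, vec2)]

assert vector_sum([1.0, 2.0], [3.0, 4.0]) == [4.0, 6.0]

# Mismatched lengths now fail loudly and predictably.
try:
    vector_sum([0.0], [])
    raised = False
except ValueError:
    raised = True
assert raised
```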
With this change, our test would fail with a nicer error message, but still
fail. We also need to teach the test that throwing an std::runtime_error
is
the expected behavior when vectors of different lengths are passed to
vector_sum()
. And while we are at it, we will also add a check that ensures
the output has the same length as the inputs:
int main() {
rc::check(
"vector_sum sums vectors",
[](const std::vector<float>& vec, const std::vector<float> &vec2) {
std::vector<float> result;
try {
result = vector_sum(vec, vec2);
} catch(std::runtime_error&) {
        // We only expect an exception if the lengths differ
RC_ASSERT(vec.size() != vec2.size());
// In that case, the test can't continue: there is no result
return;
}
// If the test passes, then all input and output lengths should match
RC_ASSERT(vec.size() == vec2.size());
RC_ASSERT(vec.size() == result.size());
// Rest is as before
for (size_t i = 0; i < result.size(); ++i) {
RC_ASSERT(result[i] == vec[i] + vec2[i]);
}
}
);
}
With that, our test now passes:
$ make -j$(nproc) && ./sum.bin
Using configuration: seed=8215553341845074688
- vector_sum sums vectors
OK, passed 100 tests
Does this mean that our function is correct? Not necessarily in an absolute sense. All it means is that after trying 100 random input configurations, our property-based test harness was not able to find an input configuration that fails the test.
There is, however, an input configuration that RapidCheck did not try, which fails this test. Can you find it? Click this text for the answer!
As Hypothesis, a Python cousin of RapidCheck, correctly points out, this vector_sum() version does not pass the test if one of the inputs contains the value NaN, shorthand for Not a Number. NaN is a special floating-point value (or, more accurately, a set of special floating-point values) that is emitted by floating-point functions when asked to perform a meaningless computation, such as dividing 0 by 0 or computing the square root of a negative number.
Because it is just a value in the output dataset, numerical codebases rarely check for its presence, which means that it commonly ends up in output files. Good numerical libraries must therefore be prepared to handle the presence of NaN in their input. Unfortunately, NaN has very nasty properties as far as mathematics go, one of which being that x == x is false if x is NaN, which commonly breaks sorting algorithms and tests based on floating-point equality.
Here, we could account for the presence of NaN in the test either by moving to bitwise equality, or by ignoring results where inputs contain non-finite floating-point values and documenting that these are not supported.
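The counter-intuitive behavior of NaN is easy to demonstrate with a few lines of Python (any language with IEEE 754 floats behaves the same way):

```python
import math

nan = float("nan")

# NaN comes out of meaningless computations, like infinity minus infinity
assert math.isnan(math.inf - math.inf)

# NaN compares unequal to everything, including itself...
assert not (nan == nan)
assert nan != nan

# ...so naive equality-based assertions silently break, and robust
# numerical tests must detect NaN explicitly instead:
assert math.isnan(nan)
```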
The moral of this story is that it is best not to perform property-based testing using a general-purpose random generator that only produces “normal” numbers, but instead use a specialized generator that gives high probability to edge cases that programmers are unlikely to think about.
Unfortunately, RapidCheck disappoints in this respect, showing the relative immaturity of property-based testing in the C++ ecosystem.
So, what can we make of all this?
- Testing is not just about finding bugs in the function being tested. It’s also about forming a better understanding of the problem we’re facing, and making conscious choices about what inputs we want to handle and how. A test failure indicates that the test assertions do not accurately reflect how the function being tested works, but this mismatch can be a problem in either the function or the test code.
- Property-based testing is very effective at forcing us to think about uncomfortable edge cases that our brain would rather ignore. And by freeing us from inefficient manual generation of interesting test inputs, property-based testing lets us focus on more interesting questions about how our function should behave, which is what testing should be about.
- Property-based testing is not a panacea, however. In this example, it is attention to detail that led me to notice that, as initially formulated, the test did not check that the output was as long as the inputs. Similarly, if a function has a very narrow edge case that is only triggered by very specific inputs, random search is unlikely to hit that configuration. In this case, you may need to complement property-based testing with manual tests on handpicked inputs, using a framework like Catch.
Exercise
You will find the above property-based testing example in the
robust-programming/tests
subdirectory of your home directory on the practicals
server, where it is called sum.cpp
. You can compile it with make -j$(nproc)
.
Alongside it, you will find an incomplete dot.cpp
property-based test. The
goal of this exercise is for you to fill it.
Write a function that computes the dot product of two vectors, defined as lists of floating point numbers as above, then write some tests that assert that it does so correctly. To avoid duplicating the full dot product computation in the test (which is unlikely to help you find any bugs, since the test and the function being tested would share any misunderstanding from your side), try the following strategy:
- Start by generating a vector where only one component is not zero, and compute the dot product of that vector with itself. The result is the squared norm of the vector, which in this scenario should be very easy to predict. The following test skeleton is provided to help you get started more quickly; feel free to check out the RapidCheck documentation to learn more about the various functions that are being used.
#include <rapidcheck.h>
#include <stdlib.h>

// Data needed to build a vector where exactly one component is not zero
struct SimpleVectorConfig {
  // Value of the nonzero component
  float component;
  // Index of the nonzero component
  size_t idx;
  // Length of the vector
  size_t length;
};

// RapidCheck generator that produces valid SimpleVectorConfigs
rc::Gen<SimpleVectorConfig> simple_vector_generator() {
  // Good values of "idx" depend on "length", so we must pick "length" first.
  // "mapcat" is how we express this idea of random generation where some
  // generated quantities depend on others in RapidCheck.
  return rc::gen::mapcat(
    // Don't let length grow too big to avoid slow tests / RAM exhaustion.
    rc::gen::inRange<size_t>(1, 1024),
    [](size_t length) {
      // RapidCheck syntax for building a generator of structs with
      // per-member configuration.
      return rc::gen::build<SimpleVectorConfig>(
        // "component" value simply uses the standard float generator.
        rc::gen::set(&SimpleVectorConfig::component),
        // "index" values must be between 0 and "length", right-exclusive.
        rc::gen::set(&SimpleVectorConfig::idx,
                     rc::gen::inRange<size_t>(0, length)),
        // "length" generator re-emits the value already generated above.
        rc::gen::set(&SimpleVectorConfig::length, rc::gen::just(length))
      );
    }
  );
}

int main() {
  rc::check(
    "single-component squared norm",
    [&] {
      SimpleVectorConfig vector_config = *simple_vector_generator();

      // TODO: Replace with dot product test
      RC_ASSERT(vector_config.length >= 1);
      RC_ASSERT(vector_config.idx < vector_config.length);
    }
  );
}
- Now generate two vectors of this sort and compute their dot product. Again, the result should be easy to predict without writing down the full dot product computation in your test.
- Now generate one arbitrary vector, as done in the vector addition example above, and compute its dot product with itself. This gives you its squared norm, as before. Test that this squared norm is greater than the square of the largest vector component.
- Finally, generate two arbitrary vectors, and check that their dot product follows the Cauchy-Schwarz inequality: pow(dot(v1, v2), 2) <= dot(v1, v1) * dot(v2, v2).
Note that floating-point rounding errors differ between multiple ways to do the same computation, so you may need to use approximate comparisons in some of your tests. The following approximate comparison algorithm is suggested:
#include <cmath>
bool is_close(float value,
float reference,
float rel_tol = 1e-6,
float abs_tol = 1e-8) {
return std::abs(value - reference) < std::max(rel_tol * std::abs(reference),
abs_tol);
}
Outro
A bit of demystification
In property-based tests, unlike example-based (oracle) tests, we do not specify any particular input values: only the type of the input data and the property to be tested. The framework (Hypothesis/RapidCheck) takes care of actually executing the test. Magic, you might say.
Almost magical. There is still some smoke and mirrors in most cases. If we take our function, we suspect that we cannot test the whole set of floats… Under the hood, there is therefore a random number generator, which stops after a given (configurable, but finite) number of iterations. So basically, I could have generated a set of random integers, written my traditional oracle test, and put a good old for loop in front of it:
import sys
import numpy as np
def add_numbers(a: int, b: int) -> int:
return a + b
n_iteration = 1000
data_in = np.random.randint(-sys.maxsize - 1, sys.maxsize, (n_iteration, 2))
for a, b in data_in:
assert(add_numbers(a, b) == add_numbers(b, a))
Of course, this is possible for simple cases, and there are fortunately a large number of cases where property-based tests do not really have simple equivalents.
Finally, what should I choose for my tests?
Oracle tests or property tests? Our answer is: both! If you are looking for some motivations for using property-based testing, here are a few that are quite relevant:
- Property-based tests are generally more generic than oracle tests.
- They expose a more concise description of the requirements and expected behavior of our code than oracle tests.
- As a result, a property-based test can replace a large number of oracle tests.
- By delegating input generation to specialized tools, property-based testing is more efficient at finding implementation problems, in particular unhandled values…
- Property-based tests require more thinking effort than oracle tests, and therefore they allow for better introspection.
- As a result, code design is often better.
But it is not always easy or relevant to find properties, so we can still rely on oracle tests. It is the combination of both that will make your code even more robust :-) To go further, we invite you to read this quite interesting blog post (based on F# - but the idea is the same).
To finish, and to meditate: programming is not just a matter of lines of code, it is above all conceiving a design that makes it possible to fulfill objectives.
Other forms of randomized testing
As you could see in this chapter, randomizing your tests is a powerful technique to make sure that you do not forget about algorithmic edge cases. Other variations of the concept that you may want to learn about include…
- Fuzz testing aka fuzzing: This is a close cousin of property-based testing which is commonly used in security-critical code, such as complex file format parsers that operate on untrusted data from the Internet. In fuzzing, a specialized test harness constantly feeds your code with randomly mutated versions of a correct input, where mutations are informed by real-time tracking of changes in source code coverage. The goal of the test harness is to make your code crash. If a crash does happen, it indicates that your code is not yet fully robust in the face of unexpected inputs, such as those that could be generated by an attacker.
- Mutation testing: A good test suite should catch involuntary code changes that alter program results. Mutation testing verifies this by flipping the idea of property-based testing around: it randomly modifies the program under test, and checks that these modifications cause the test suite to fail. If the test suite does not fail, it suggests that some aspects of your program are not yet well covered by the test suite, since random changes to program behavior go undetected.
Continuous integration
Let’s take a breath. In this lecture, you learned about how to perform automatic rule checks, how to document your code, and how to write and execute tests.
There is an adage that we should not write more than 10 lines of code per day. It does not seem like much, but if at each step you have to manually perform all quality checks, write and deploy the corresponding documentation, and design and run new tests to make sure the newly introduced lines do not break the code, you would probably write even fewer than 10 lines a day.
GitLab CI/CD
What if at each modification, your code were shipped somewhere that has the correct environment to check and execute it, reporting any errors? This would mean that instead of manually performing all checks, you would outsource most of the work to a remote server that will do the boring work on your behalf, leaving you the luxury to focus on more interesting tasks. Crazy? Possible!
In this section, we will learn how to automate a workflow using the GitLab CI/CD (continuous integration/continuous deployment). A workflow can be: checking the code is correctly written (linting), running a test suite, deploying a package on a registry, deploying an online documentation, …
Before starting
Connect to the IN2P3 GitLab (https://gitlab.in2p3.fr) and create a new repository. If you have not yet done so, generate an ssh key and add it to your account.
Then clone your newly created repository, and create a new empty file called .gitlab-ci.yml
at the root of your repository. Also add the robust-programming
folder.
Your repository should look like:
(robprog) -bash-5.1$ ls -al
drwxr-xr-x. 4 student users 109 Mar 27 12:18 .
drwx------. 10 student users 4096 Mar 27 20:09 ..
drwxr-xr-x. 8 student users 4096 Mar 27 16:33 .git
-rw-r--r--. 1 student users 894 Mar 27 12:18 .gitlab-ci.yml
drwxr-xr-x. 8 student users 154 Mar 27 16:33 robust-programming
Your first pipeline
A CI pipeline is composed of a set of jobs that must all successfully run to completion. Jobs may be independent or depend on each other. The easiest form of dependency management in GitLab CI is pipeline stages: jobs in a pipeline stage can run in parallel, but each stage must run to completion before the next one starts. Let’s implement a first job to apply rule checking to our code. Add the following lines to the file:
default:
image: almalinux:9
stages:
- check
- render
- deploy
The first block specifies the operating system image that we want to use on the remote server. You can use any image name known to e.g. DockerHub, and more. The second block defines the different stages of our pipeline (which we will have to define next).
Let’s implement the check
stage. Define a new block that contains:
Lint Python:
image: python:3.9
stage: check
before_script:
- |
python -m venv .venv
source .venv/bin/activate
pip install hypothesis flake8
script:
- flake8 practicals/code/tests/sum.py
Lint C++:
stage: check
before_script:
- |
dnf update -y
dnf install -y gcc-c++
dnf install -y clang libubsan libasan clang-tools-extra valgrind
script:
- cd robust-programming/analyzers
- make
- clang-format -i mdbook_snippets.cpp
Lint Python (or Lint C++) is an (arbitrary) job name attached to the check stage.
In the case of the Python job, we specify another image than the default one
(you can override any defaults within a job).
The section before_script
is where we install any dependencies.
The section script
is the actual execution of checks. More information on available keywords can be found online.
Save your work, commit, and push to your repository.
Then go to the GitLab interface, select on the left bar: Build
> Pipelines
and observe the progression of your pipeline!
Exercise
Implement the other jobs and stages:
- check: add a job to run tests and report any failures.
- render: build your documentation written with mdbook in a previous section.
- deploy: deploy your documentation online using GitLab Pages.