Max Bernstein wrote a blog post [1] on using CReduce [2, 3] for languages other than C. This inspired me to have a stab at applying CReduce to a tool I work with daily at work, VerCors.
CReduce is a tool that minimizes C files that trigger a compiler bug. It does this by iteratively changing the input file and checking if the problem is still there. CReduce does not include any magic routines that can automatically detect compiler bugs. Instead, besides an input file, you also have to supply a wrapper script that calls the compiler. The script should set the exit status to 0 if the bug is still there.
When CReduce has exhausted all its tricks for further reducing a file, and none of those yield a smaller file, CReduce is finished. This obviously does not mean CReduce found the smallest file that reproduces the bug. It is just one possible file that still triggers the bug, one that is hopefully smaller than its input.
The fun part here is that CReduce does not just support minimizing C
programs. With the --not-c
flag you turn off CReduce’s
C-specific file reduction passes. According to Max, you don’t
necessarily need this flag, it just makes file reduction slower than it
needs to be for non-C languages. A soft requirement that is often
mentioned in the context of tools like CReduce is that to get the most
use out of them, the time required to check for the bug should be low.
Since many altered versions of the original input file need to be
tested, obviously this requirement is crucial for reasonably quick
convergence to a small input file. Unfortunately, VerCors does not
fulfill this requirement: verifying a small file from a cold start (cold
JVM, no verification cache) usually takes between 10 and 20 seconds. I
will ignore this limitation for now, and discuss some ideas for how we
can improve this at the end.
We have a tool for our input file (VerCors) and a tool to minimize input files (CReduce). Now we just need a bug to trigger with the input file! Luckily I ran into a small one the other day with the C frontend while Alexander and I were participating in this year’s VerifyThis competition. Essentially, the bug concerns the handling of whitespace in verification annotations.
Consider the following C function called f
:
This function does exactly nothing, from the perspective of a C
compiler. From the perspective of VerCors, the function f
contains a verification annotation on line two. This annotation
states that, on line two, the expression k < n
should be
true. By putting assertions in such verification annotations (often also
called “ghost code” [4]), we can make hidden invariants in
the code explicit, without changing the behaviour of the program for the
C compiler.
In this particular case, the assertion does not actually hold. Do you know why?1
The actual bug occurs once the full capabilities of verification annotations are utilized and combined with regular C syntax. The following code snippet illustrates the problem:
What we intended to write here was that k is in the set of all numbers
between 0 (inclusive) and n (exclusive). Within verification
annotations, VerCors supports a limited set of mathematical operators,
such as the set contains operator \in
and set range
comprehension (with the second bound being exclusive) for sets using the
{a .. b}
syntax.
When we give this snippet to VerCors, we actually get the following error:
[INFO] Start: VerCors (at 21:05:24)
======================================
At vercors-interpreted-3290997030366011973.i
--------------------------------------
7 # 1 "<stdin>" 2
8 void f(int k, int n) {
[-----
9 //@ assert k \in {0.. n};
-----]
10 }
11
--------------------------------------
At t.c
--------------------------------------
1 void f(int k, int n) {
[-----------------------------
2 //@ assert k \in {0.. n};
-----------------------------]
3 }
4
--------------------------------------
Could not find field named n.
======================================
[INFO] Done: VerCors (at 21:05:26, duration: 00:00:01)
As you might guess, adding a space between 0
and
..
fixes the problem. To make the problem extra explicit,
the code snippet can be rephrased as follows, while keeping the bug:
I haven’t done any deep debugging, but I think the problem is related
to that C allows omitting the fractional part of a decimal number.
Essentially, “0.
” is allowed syntax for the fractional
number 0. It seems like this interpretation takes precedence over the
..
of the range syntax in the language allowed in the
verification annotations.
I think, ultimately, this set range comprehension syntax should not be present in the C frontend of VerCors. However, as the specification language allowed within verification annotations is essentially fully shared between all VerCors frontends, it’s a tricky matter to start adding exceptions for one of the frontends.
Anyhow, armed with an example that triggered the bug, finally I could let CReduce loose on it to see if it could reduce it down to a MWE.2
For this I only needed one more ingredient: a wrapper script that would run VerCors on the example file and exits with a 0 status if the bug was still present. CReduce then uses this wrapper script to detect if it minimized the bug away or not. This is the script I used:
#!/usr/bin/env bash
cat /dev/null | \
vercors -q linkedListCReduce.c --skip-backend | \
grep "Could not find field named k."
It is a straightforward script. Essentially, VerCors is executed on
the input file, skipping backend verification passes because we are not
interested in verifying the input. Then, the VerCors output is searched
for the line “Could not find field named k.
”. This
indirectly indicates whether or not the 0..
form is still
present in the output.
There’s only one tidbit that’s a bit strange: the
cat /dev/null
part. We had to add that because somehow
CReduce hangs when I leave VerCors’ stdin open (thanks for figuring that
one out, Alexander!). The weird part is that in the initial “sanity
check” run of CReduce, the wrapper script without cat
works
just fine. It’s only when it starts actually reducing the input file
when something strange happens to stdin
. Probably it’s on
our side, since the VerCors --watch
mode listens to
stdin
. Still, it’s strange this hanging is only triggered
after CReduce’s sanity check.
After that was all set up, we could finally start CReduce. I used the following invocation, some initial output included:
> time creduce --save-temps --n 8 --timeout 30 --not-c ./interesting.bash linkedListCReduce.c
===< 11437 >===
running 8 interestingness tests in parallel
===< pass_blank :: 0 >===
(1.0 %, 2035 bytes)
(1.9 %, 2015 bytes)
===< pass_lines :: 0 >===
(1.0 %, 2035 bytes)
(Interestingness test killed by timeout at 30 seconds.)
(6.1 %, 1929 bytes)
(10.0 %, 1849 bytes)
...
Initially, file size goes down quite a bit, but then it plateaus. I don’t think it got much above 80%. This is the smaller example CReduce ends up with:
struct Node {
struct Node *next;
};
typedef struct Node Node_t;
//@ ghost typedef seq<Node_t*> Nodes;
struct List {
Nodes ns;
};
typedef struct List List_t;
/*@ yields int k;
ensures l->ns[0.. k] == \old(l>ns);
*/
void removeBetter(List_t *l, Node_t *toremove);
This is pretty small I would say: reasonably good for an automated procedure that requires 0 domain knowledge! The only downside is that it took over 30 minutes to get to this point:
________________________________________________________
Executed in 38,19 mins fish external
usr time 380,89 mins 805,00 micros 380,89 mins
sys time 15,78 mins 167,00 micros 15,78 mins
Obviously, that one’s on VerCors…
As a sanity check, I also ran CReduce with the following wrapper script:
Essentially, the script just checks if the 0..
pattern
is still there in the input file. This leads to the following output in
about two seconds:
ns0..
So, clearly CReduce is capable if reducing the file even further, it’s just that VerCors as an oracle is a bit too restrictive and slow to make it effective in practice.
--not-c
flag really necessary?You might consider not using the --not-c
flag for this
experiment. Clearly the input file is written in C, right? So why not?
Our case is more nuanced: the input file is actually the VerCors dialect
of C, in which verification annotations are significant. For example, it
would be reasonable for CReduce to remove all comments from the input
file in an attempt to reduce it, but that is exactly what would make our
bug disappear. I haven’t actually tested if CReduce does that, though; I
would expect so.
To make using CReduce with VerCors practical, first we’d have to
speed up the process. There’s already a flag in VerCors to speed up
subsequent verification runs: the --watch
flag. This
essentially keeps a JVM running that reruns VerCors whenever one of the
input files changes. Hooking this up to CReduce would be some
engineering work, but it sounds feasible.
Finally, on a related note, we have a student, Wander Nauta, working on applying fuzzing to VerCors and related verification tools. It’s roughly related to what I describe in this post, except he uses modern tools and his work is way more in depth compared to this quick post. He should graduate soon, check out his homepage or http://essay.utwente.nl for his thesis if you’re interested.