Unit Propagation - CReduce, meet VerCors

Max Bernstein wrote a blog post [1] on using CReduce [2, 3] for languages other than C. This inspired me to have a stab at applying CReduce to a tool I work with daily at work, VerCors.

CReduce is a tool that minimizes C files that trigger a compiler bug. It does this by iteratively changing the input file and checking if the problem is still there. CReduce does not include any magic routines that can automatically detect compiler bugs. Instead, besides an input file, you also have to supply a wrapper script that calls the compiler. The script should set the exit status to 0 if the bug is still there.

When CReduce has exhausted all its tricks for further reducing a file, and none of those yield a smaller file, CReduce is finished. This obviously does not mean CReduce found the smallest file that reproduces the bug. It is just one possible file that still triggers the bug, one that is hopefully smaller than its input.

The fun part here is that CReduce does not just support minimizing C programs. With the --not-c flag you turn off CReduce’s C-specific file reduction passes. According to Max, you don’t necessarily need this flag, it just makes file reduction slower than it needs to be for non-C languages. A soft requirement that is often mentioned in the context of tools like CReduce is that to get the most use out of them, the time required to check for the bug should be low. Since many altered versions of the original input file need to be tested, obviously this requirement is crucial for reasonably quick convergence to a small input file. Unfortunately, VerCors does not fulfill this requirement: verifying a small file from a cold start (cold JVM, no verification cache) usually takes between 10 and 20 seconds. I will ignore this limitation for now, and discuss some ideas for how we can improve this at the end.

The bug

We have a tool for our input file (VerCors) and a tool to minimize input files (CReduce). Now we just need a bug to trigger with the input file! Luckily I ran into a small one the other day with the C frontend while Alexander and I were participating in this year’s VerifyThis competition. Essentially, the bug concerns the handling of whitespace in verification annotations.

Verification annotations?

void f(int k, int n) {
  //@ assert k < n;
}

This function does exactly nothing, from the perspective of a C compiler. From the perspective of VerCors, the function f contains a verification annotation on line two. This annotation states that, on line two, the expression k < n should be true. By putting assertions in such verification annotations (often also called “ghost code” [4]), we can make hidden invariants in the code explicit, without changing the behaviour of the program for the C compiler.

In this particular case, the assertion does not actually hold. Do you know why?¹

The actual bug

The actual bug occurs once the full capabilities of verification annotations are utilized and combined with regular C syntax. The following code snippet illustrates the problem:

void f(int k, int n) {
  //@ assert k \in {0.. n};
}

What we intended to write here was that k is in the set of all numbers between 0 (inclusive) and n (exclusive). Within verification annotations, VerCors supports a limited set of mathematical operators, such as the set contains operator \in and set range comprehension (with the second bound being exclusive) for sets using the {a .. b} syntax.

[INFO] Start: VerCors (at 21:05:24)
======================================
At vercors-interpreted-3290997030366011973.i
--------------------------------------
    7  # 1 "<stdin>" 2
    8  void f(int k, int n) {
                          [-----
    9    //@ assert k \in {0.. n};
                           -----]
   10  }
   11
--------------------------------------
At t.c
--------------------------------------
    1  void f(int k, int n) {
      [-----------------------------
    2    //@ assert k \in {0.. n};
       -----------------------------]
    3  }
    4
--------------------------------------
Could not find field named n.
======================================
[INFO] Done: VerCors (at 21:05:26, duration: 00:00:01)

As you might guess, adding a space between 0 and .. fixes the problem. To make the problem extra explicit, the code snippet can be rephrased as follows, while keeping the bug:

void f(int k, int n) {
  //@ assert k \in {(0.). n};
}

I haven’t done any deep debugging, but I think the problem is related to that C allows omitting the fractional part of a decimal number. Essentially, “0.” is allowed syntax for the fractional number 0. It seems like this interpretation takes precedence over the .. of the range syntax in the language allowed in the verification annotations.

I think, ultimately, this set range comprehension syntax should not be present in the C frontend of VerCors. However, as the specification language allowed within verification annotations is essentially fully shared between all VerCors frontends, it’s a tricky matter to start adding exceptions for one of the frontends.

Using CReduce

Anyhow, armed with an example that triggered the bug, finally I could let CReduce loose on it to see if it could reduce it down to a MWE.²

The wrapper script

For this I only needed one more ingredient: a wrapper script that would run VerCors on the example file and exits with a 0 status if the bug was still present. CReduce then uses this wrapper script to detect if it minimized the bug away or not. This is the script I used:

#!/usr/bin/env bash

cat /dev/null | \
  vercors -q linkedListCReduce.c --skip-backend | \
  grep "Could not find field named k."

It is a straightforward script. Essentially, VerCors is executed on the input file, skipping backend verification passes because we are not interested in verifying the input. Then, the VerCors output is searched for the line “Could not find field named k.”. This indirectly indicates whether or not the 0.. form is still present in the output.

There’s only one tidbit that’s a bit strange: the cat /dev/null part. We had to add that because somehow CReduce hangs when I leave VerCors’ stdin open (thanks for figuring that one out, Alexander!). The weird part is that in the initial “sanity check” run of CReduce, the wrapper script without cat works just fine. It’s only when it starts actually reducing the input file when something strange happens to stdin. Probably it’s on our side, since the VerCors --watch mode listens to stdin. Still, it’s strange this hanging is only triggered after CReduce’s sanity check.

CReduce result

After that was all set up, we could finally start CReduce. I used the following invocation, some initial output included:

Initially, file size goes down quite a bit, but then it plateaus. I don’t think it got much above 80%. This is the smaller example CReduce ends up with:

struct Node {
  struct Node *next;
};
typedef struct Node Node_t;
//@ ghost typedef seq<Node_t*> Nodes;
struct List {
  Nodes ns;
};
typedef struct List List_t;
/*@ yields int k;
   ensures l->ns[0.. k] == \old(l>ns);
   */
void removeBetter(List_t *l, Node_t *toremove);

This is pretty small I would say: reasonably good for an automated procedure that requires 0 domain knowledge! The only downside is that it took over 30 minutes to get to this point:

#!/usr/bin/env bash

cat linkedListCReduce.c | grep "0\\.\\."

Essentially, the script just checks if the 0.. pattern is still there in the input file. This leads to the following output in about two seconds:

So, clearly CReduce is capable if reducing the file even further, it’s just that VerCors as an oracle is a bit too restrictive and slow to make it effective in practice.

Is the --not-c flag really necessary?

You might consider not using the --not-c flag for this experiment. Clearly the input file is written in C, right? So why not? Our case is more nuanced: the input file is actually the VerCors dialect of C, in which verification annotations are significant. For example, it would be reasonable for CReduce to remove all comments from the input file in an attempt to reduce it, but that is exactly what would make our bug disappear. I haven’t actually tested if CReduce does that, though; I would expect so.

Closing thoughts

To make using CReduce with VerCors practical, first we’d have to speed up the process. There’s already a flag in VerCors to speed up subsequent verification runs: the --watch flag. This essentially keeps a JVM running that reruns VerCors whenever one of the input files changes. Hooking this up to CReduce would be some engineering work, but it sounds feasible.

Finally, on a related note, we have a student, Wander Nauta, working on applying fuzzing to VerCors and related verification tools. It’s roughly related to what I describe in this post, except he uses modern tools and his work is way more in depth compared to this quick post. He should graduate soon, check out his homepage or http://essay.utwente.nl for his thesis if you’re interested.

References

[1]

“You can use C-Reduce for any language.” Accessed: May 19, 2025. [Online]. Available: https://bernsteinbear.com/blog/creduce/

[2]

“CReduce home page.” Accessed: Mar. 31, 2022. [Online]. Available: https://web.archive.org/web/20220331045711/https://embed.cs.utah.edu/creduce/

[3]

J. Regehr, Y. Chen, P. Cuoq, E. Eide, C. Ellison, and X. Yang, “Test-case reduction for C compiler bugs,” in PLDI, 2012. doi: 10.1145/2254064.2254104.

[4]

J.-C. Filliâtre, L. Gondelman, and A. Paskevich, “The spirit of ghost code,” in CAV, 2014. doi: 10.1007/978-3-319-08867-9_1.