A new constant expression interpreter for Clang

23 de novembro de 2022Timm Baeder7 minutos (tempo de leitura)

In this blog post, I explain the upstream work I have been doing on a new constant expression interpreter for Clang.

Constant expressions in C and C++

One of the many techniques modern compilers use to improve the runtime performance of the compiled code is to perform computations at compile time instead of runtime, if possible. Nobody expects to actually see two integer literals being added at runtime if the compiler can easily do it at compile time.

Since C++11, there is a language construct to ensure something is a constant expression, the constexpr keyword. This keyword can be used for variables and functions and makes it possible to require an expression to be evalueable at compile-time. constexpr has been extended with every new version of the C++ standard (and in later versions, we have consteval and constinit as well). And now, even the next version of C is getting similar functionality.

The skeleton of a new interpreter was added by Nandor Licker in 2019 via D64146. However, no work has happened on it since then.

I have been working on improving this new experimental constant interpreter for the last few months. This blog post is meant as a high-level overview of the functionality, and details the progress I've made together with all the helpful reviewers. The individual commits can be found here.

High-level functionality overview

Constant expressions need to be evaluated at compile time for a variety of reasons. Some of them are obvious, for example, if a value is necessary to be known at compile time (think a constant array size). Some aren't as obvious, e.g., the usual test of whether or not something is actually a constant expression is to evaluate it and see if that works.

In Clang's AST classes, there are a few simple accessors to this functionality in the Expr class, which represents all expression types. The member functions all start with Evaluate*, like EvaluateAsInt().

So, for example, the following C++ program is valid, and the array size can be computed at compile time because the getSize() function is constexpr:

c++
constexpr int getSize() {
  return 10;
}
constexpr int intArr[getSize()] = {};
static_assert(intArr[2] == 0);

The constexpr keyword ensures that getSize() can be evaluated at compile time. Without it, the example does not compile. The static_assert() here is what really makes it impossible for either getSize() or intArr to not be constexpr—a static assertion must be evaluated at compile time. Without the static assertion and the constexpr keywords, the example could still work if it was at block scope. In that case, it would result in a variable-length array.

The current constant expression interpreter is located in clang/lib/AST/ExprConstant.cpp. It implements different AST visitors for each expression type it supports, e.g., one for an int result and one for a float result (and a few more). Whenever it evaluates an expression, it decides what visitor to use based on the type of the expression.

The new approach

The new constant interpreter is located in clang/lib/AST/Interp/ . For simple expressions, its approach is quite similar to that of the old one, i.e., it will walk the AST, directly compute a value from that and return it. However, it does all the work in only one AST visitor (or, more accurately, two: one for bytecode generation and one for direct evaluation).

For functions, however, it outputs bytecode. This bytecode can later be interpreted multiple times, and the AST doesn't need to be walked again. The transformation from AST to bytecode results in a representation of the same operations that are better suited for repeated interpretation. As an example, here is the generated bytecode for the above getSize():

getSize 0x615000003500:
frame size: 0
arg size:   0
rvo:        0
this arg:   0
       0    ConstSint32         10
       8    RetSint32                    
      12    NoRet

The output is as simple as expected: Push a constant 10 on the stack and return that. The additional NoRet instruction is just an "end of code" marker. The numbers to the left of the instructions are offset into the char buffer that's used to store the function bytecode.

As mentioned above, the interpreter manages data using its own stack. Instructions like ConstSint32 push a value on the stack and do nothing else. Other instructions like Add or Mul will use the values on the stack, compute a new value and push that onto the stack.s

There are numerous bytecode instructions already, e.g., for local variable management, integer arithmetic, function calls, casts, array access, etc. They are managed via an Opcodes.td file, and code is generated to handle things like disassembly, type safety and argument passing. Explaining all the different bytecode instructions would be boring, so instead, here's a short explanation of how a function like

c++
constexpr int inc(int a) {
  return a + 1;
}
static_assert(inc(5) == 6);

will be compiled and interpreted (both at compile time of the actual program). The program in question is only a tiny bit more complex than the getSize() function we've already seen, but I think the differences are still interesting. The generated bytecode for the inc() function is:

inc 0x615000003500:
frame size: 0
arg size:   8
rvo:        0
this arg:   0
       0    GetParamSint32      0
       8    ConstSint32         1
      16    AddSint32                    
      20    RetSint32                    
      24    NoRet

We do not generate bytecode for the static_assert() calling the function, since that's just a one-off expression. It will, however, call the inc() function, which has bytecode generated for it.

Since inc() has one parameter, the caller will have to put the value on the stack, where the GetParamSint32 instruction will then read it. The 0 parameter to this instruction is simply an offset into the argument stack frame of this function. After interpreting the GetParamSint32 instruction, the stack contents are [5].

The next instruction is ConstSint32 and simply puts a 1 on the stack, so we end up with [5, 1], In this case, the value next to the instruction is the literal value it will push to the stack.

The AddSint32 instruction will pop the topmost two values from the stack, add them and put the result back on the stack. So after it completed, we end up with a stack of [6]. The Add family of instructions may generally print diagnostics if the addition results in an integer overflow, for example.

The last instruction we interpret is the RetSint32, which, as the name implies, will return a signed 32-bit integer. It will pop that integer from the top of the stack, convert it to an APValue (which is what Clang uses to represent the computed value) and return it. Like the other instructions, this will not inspect the stack for what type to return. This is a decision we make when generating the bytecode, and we generate the Ret instruction for the correct value type.

Meaningless performance numbers

I have not spent any time optimizing things, apart from keeping performance in mind when working on the interpreter. So in a few cases, I've added special bytecode instructions to do common operations, like integer increments or decrements.

Making performance comparisons or running benchmarks is rather hard to do usually since real-world applications always need just one more feature to implement before I can even run them.

And since the current interpreter has far more features than the new one (and probably a lot more correct), the comparison isn't fair anyway.

However, for this program:

c++
constexpr int f() {
  int m = 0;
  for (int i = 0; i < 100'000; i++) {
    m = m + 1;
  }
  return m;
}
static_assert(f() == 100'000);

The new interpreter takes 0.9s on my system while the current one takes 25.2s. Which, again, is meaningless, but now you have a number.

Note that the optimization level the example is built with does not affect the performance of the constant expression interpreter or the generated bytecode. How fast the interpreter does its job is only affected by the optimizations that were applied when building the host compiler. For the numbers above, I've used the same unoptimized debug builds of Clang.

Testing and current status

You can test the new interpreter by compiling any example program with -fexperimental-new-constant-interpreter, for example, here on Godbolt.

Because both interpreters coexist in the Clang code base at the moment, it's rather easy to write test cases that exercise new functionality in the new interpreter and pass -fexperimental-new-constant-interpreter. And that is what all the test cases in clang/test/AST/Interp/ do. As an additional precaution, they also compare the results with those of the current interpreter.

However, I have increasingly started to enable the new interpreter in existing test cases in clang/test/SemaCXX. Going forward, I hope to do that more and get lots of testing from existing test cases for free.

As for the current status, there is no public checklist, but again, you can check what works using a recent Clang build and -fexperimental-new-constant-interpreter. However, I have around 20 local commits at any point in time, and those are just waiting to be reviewed and pushed.

In terms of implemented or missing features, the new interpreter is currently a mixed bag. It can use struct and class types with multiple base types, but cannot perform a simple integer increment via the ++ operator. It can do while, do-while and for loops, but it doesn't understand any floating-point values. Multi-dimensional arrays work, but simple bit shift operations do not. And then, there are all the FIXME comments in the newly added tests.

So, lots of work and interesting things ahead!

Acknowledgments

I'd like to thank all the usual reviewers that criticize and improve my patches: Aaron Ballman, Erich Keane, Tom Honermann and Shafik Yaghmour. All the feedback I've received on the "Introduce support for floating-point numbers" has been incredibly valuable, so thanks to all the floating-point specialists, especially Serge Pavlov.

And, of course, I would've never started to look into this if Nandor Licker hadn't laid all the excellent groundwork.