OptOut – Compiler Undefined Behavior Optimizations

April 24, 2020

Research by: Eyal Itkin, Gili Yankovitch

Introduction

During 35C3, Gili Yankovitch (@cytingale) and I attended a great talk called: “Memsad – Why Clearing Memory is Hard” (https://media.ccc.de/v/35c3-9788-memsad). In his talk, Ilja van Sprundel presented the difficulties programmers face when trying to wipe a memory area that may contain secrets. This is due to the fact that most of the time, the calls to memset() have an undefined meaning according to the C standard, and therefore they are optimized out during compilation.

Intrigued by this gap between the programmers’ expectations and the compiler’s behavior, we asked if there are additional optimizations like these, beyond the scope of wiping memory? We were both quite familiar with the C standard, and already knew that many C programmers don’t usually follow each and every part of the standard. This led us to suspect that this extended approach would find some interesting results, and so we began our research.

In this blog post, we describe the versions of the tools and open sources we studied. Note – As our research took place approximately a year ago, some of these may not be fully up to date.

Undefined Behavior in C/C++

The C/C++ programming languages seem simple and quite straightforward to most common/embedded developers. Unfortunately, most programmers are not familiar with the in-depth details of the C standard, nor the C++ one. This is a common cause for many security vulnerabilities that can be found in the dark corners of the code. In a previous blog post from 2016, we gave a few examples of Integer-Overflow cases which are flagged as “undefined” in the standard. You can read more about it here.

As seen in this online cpp reference, the standard specifies a list of code classes, one of which is the “Undefined Behavior” class:

There are no restrictions on the behavior of the program. Examples of undefined behavior are memory accesses outside of array bounds, signed integer overflow, null pointer dereference, …, etc. Compilers are not required to diagnose undefined behavior (although many simple situations are diagnosed), and the compiled program is not required to do anything meaningful.

The reference also includes some code examples for a few such undefined behavior (UB) cases.

But what does it all mean? In theory, if compilers detect a code snippet which is undefined, they can do whatever they like: “the compiled program is not required to do anything meaningful.” In practice, compiler writers are relatively conservative, and they only apply code optimizations if the optimization will preserve the true meaning of the code in all defined cases. While compilers won’t actively search for a UB case and change the entire program to the efficient “return 0;” function, they will apply an optimization if it makes sense in all of the standard-defined cases.

One example of such an optimization is found in the following code snippet shown in Figure 1:

Figure 1: Signed integer-overflow UB-based optimizations.

On the left, we see 3 code checks with the signed addition of two operands, and on the right we see the matching assembly instructions, as compiled by Clang x86-64 version 3.9.0 with optimizations flags “-o2”. This example was generated by the always useful Compiler Explorer.

What can we learn about each of the 3 checks?

  1. The first check was removed: a + 100 < a is semantically equivalent to 100 < 0, which always evaluates to false.
  2. The second check was also changed to: b < 0.
  3. Only the third check wasn’t modified by the compiler’s optimizations.

The compiler’s optimization tried to eliminate identical operands from both sides of the comparison, as such an optimization preserves the condition being checked. The only case in which the condition will in fact be changed is if the original addition operation will overflow (be bigger than 2GB). However, signed integer overflow is undefined by the standard, and so the compiler can ignore this edge case and continue on with the optimization.

Back in 2007, a similar optimization led to a wide discussion in regard to GCC’s optimizations. We highly recommend that our audience read the entire discussion.

Now that we understand the nature of UB-based compiler optimizations, it is time to recreate the results from the original research, so we can try and expand them to cover all UB-based optimizations.

Starting to play with GCC

Compiling our own version of GCC wasn’t an easy feat, but we finally got it to work by following this excellent guide. We chose GCC (version 8.2.0) instead of Clang, because the original research introduced a small patch for GCC to print out a warning each time the compiler removes a call to memset(), and it’s easier to expand an existing patch than to recreate everything from scratch.

Figure 2 shows the simple patch for catching opt-out calls to memset():

Figure 2: GCC patch to warn about optimized out calls to memset().

Less than 10 lines of code, and that’s it. Sadly, catching every UB-based optimization demanded way more code changes than this original patch.

After a few hours of toying around with the compiler, we found a neat debug trick: You can run GCC and tell it to dump the code tree after each pass. Just bear in mind that there are tens of dozens of such passes, so going through them isn’t very easy. To test in which pass these optimizations occur, we wrote 3 simple test programs in C, each with a unique integer-overflow UB:

void ub_signed_int_overflow_case_1(int length, int capacity)
{
    if (capacity < 0 || length < 0)    
    {
        printf("Negative\n");
        exit(4);
    }
    /* allocate the buffer */
    while( length > capacity )
    {                         
        capacity *= 2;
        /* EI-DBG: Should get optimized out */
        if (capacity < 0)      
        {                       
            printf("Overflowed\n");
            exit(2);              
        }                                             
    }
    ...
}

Figure 3: MRuby-inspired test for signed multiplication-based integer-overflow UB.

void ub_signed_int_overflow_case_2(int length)
{
    int capacity = 120;
    /* EI-DBG: Should get optimized out */
    if (length + capacity < length)
    {
        printf("Overflowed\n");
    } else
    {
        printf("OK!\n");
    }
}

Figure 4: Classic signed integer-overflow UB in addition.

void ub_signed_int_overflow_case_3(int length)
{
    int capacity = 120;
    if(capacity < 0)
    {
        printf("Negative\n");
        return;
    }
    if(length < 0)
    {
        printf("Negative\n");
        return;
    }
    /* EI-DBG: Should get optimized out */
    if (length + capacity < 0)
    {
        printf("Overflowed\n");
    } else
    {
        printf("OK!\n");
    }
}

Figure 5: Constant-propagation + signed addition-based integer-overflow UB.

Soon enough, we found that the interesting lines are changed in passes that are related to “constant propagation” and “constant folding.”

Initially, we thought that UBSan might be useful in prompting our undefined-behavior tests. However, it turns out that most of the optimizations happen before it kicks into action, and it only reports dynamic violations in the code that survived the optimizations. Not exactly useful.

Debugging GCC’s behavior was way tougher than we initially anticipated, but after we sprayed the code with multiple debug prints, we zoomed in on fold_binary_loc in file fold-const.c. It turns out that the documentation inside the code was misleading, and in fact most of the optimizations happen inside generic_simplify. As if that wasn’t enough, this logic is contained in a generated file that is generated from the patterns inside match.pd.

Now that we (think that we) understood where all of the optimizations happen, we placed plenty of print messages on UB-based decisions throughout the code, and near optimization based decisions. We then wrapped our modified GCC with a Python script to keep track of our messages, and tell us if a code line was optimized because of a UB-based logic in that same line. Surprisingly, we had to fix a few bugs in the line tracking inside GCC, as we initially encountered a few bugs in our script which we traced back to GCC’s code. Time for the results.

Results

After we finalized our GCC patch, we tried to use it to compile a wide variety of open sources, and sadly, most of them were free of UB-based warnings. These are the warnings that we did find:

Libpng 1.2.29: Found a NULL deref near an optimized out check for NULL.

  • The check was optimized out because of the possible deref before it. Bingo.
  • This bug was already fixed in the latest version of the library.

Libtiff Up until 4.0.10 – CVE-2019-14973: Multiple integer overflow checks are optimized out.

  • During our research, these 3 bugs were still valid, but we couldn’t find an actual way to exploit them to something more than a DoS.
  • While writing this blog post, we found this commit from a few months ago. It addressed exactly our results, now labeled CVE-2019-14973, and even mentions it was caused by undefined behavior in C.

All 3 bugs that we found are similar, and look like this:

tmsize_t bytes = nmemb * elem_size;

/*
 * XXX: Check for integer overflow.
 */
if (nmemb && elem_size && bytes / elem_size == nmemb)
    cp = _TIFFrealloc(buffer, bytes);

Figure 6: Integer-overflow check as found in the source of libtiff.

The output of our script, regarding the condition line:

tif_aux.c:70 – overflow based pattern simplification
tif_aux.c:70 – simplification due to constant (variables) propagation (2)
tif_aux.c:70 – gimple_simplified to: if (1 != 0)
tif_aux.c:70 – Folding predicate 1 != 0 to true
tif_aux.c:70 – propagate known values: always true, if (1 != 0)
tif_aux.c:70 – Folded into: if (1 != 0)
tif_aux.c:70 – gimple_simplified to: if (_3 != 0)

In short, it identified the overflow check bytes / elem_size == nmemb as always true and notified us that it folded it out, to the code that can be seen in Figure 7.

if (nmemb && elem_size)
    cp = _TIFFrealloc(buffer, bytes);

Figure 7: Actual code after the compiler’s optimizations.

The reason for this optimization is surprising: tmsize_t is signed. For some unknown reason, the authors of the library decided to give their basic used type the confusing name of tmsize_t, even though, in contrast to the well known type size_t, it is not unsigned.

Conclusions

Our first conclusion is obvious when you think of it: the latest compiler versions contain more optimizations and produce more efficient code. GCC 5.4 failed to optimize the multiplication in libtiff, but GCC 8.2 optimized it out like a charm. Our second conclusion, however, is more interesting.

Although we were quite optimistic when we started this research, we soon realized that our expectations weren’t correlated to the results we received in practice. While we don’t know why there are way more calls to memset() that get optimized out in comparison to other undefined behavior cases, we can still speculate:

Guess #1: It is possible that programmers are more aware of other UB-cases when compared to optimized out calls to memory wiping. Based on our previous experience in code audit and vulnerability research, this isn’t very likely. However, we do know that open sources tend to get compiled with various compilation flags that instruct the compiler to treat signed integer overflow just like unsigned integer overflow. This is one solution for handling code that wasn’t written according to the standard.

Guess #2: Fuzzers. Many open sources were fuzzed to death, and if compilers would introduce an optimization that would “break” the code, fuzzers will find this gap and report it. As fuzzers don’t usually care about memory wiping, this explains why such optimizations went widely unnoticed up until Ilja’s talk in 35C3.

The second guess seems way more likely, and it also means that although useful, our patch for GCC won’t give us a valuable advantage in comparison to the research tools already used.

It was a good lead to research, but sadly for us, it seems that fuzzers killed such bug classes before we reached them. On a more optimistic tone, since fuzzers test the binary level and not the code level, they can’t be fooled by the original intent of the programmer; they test the actual code as produced by the compiler, after it performed all of the optimizations rounds.

When combined with our first conclusion, we advise researchers and programmers alike to fuzz their programs after they compile them with the most updated version of the compiler. It seems that simply upgrading the compiler is enough to find results based on the recent optimizations that this compiler now supports.