An optimizing compiler tries to improve one or more attributes of an executable program at the expense of others. Usually the goal is to improve performance or reduce code size at the expense of compilation time and the ability to debug the program at a later stage. Most modern compilers support some form of optimization. Code optimized for performance is the usual preference. Where space is a constraint, as in embedded systems, developers may instead prefer code optimized for size.

Code optimization is both an art and a science. Different compilers use different techniques for optimizing code. Let us discuss a few of them with examples:

  • Copy propagation:
    Consider this code segment:
    A = B
    C = 2.0 + A

    The compiler may change this code to:
    A = B
    C = 2.0 + B

    This removes the dependency of the second instruction on the first, so the CPU can run both instructions in parallel.

  • Constant folding (evaluating constant expressions at compile time):
    Consider this code segment:
    Const A = 1.7320
    Const B = 1.4140
    C = A + B

    The compiler may change this code to:
    Const A = 1.7320
    Const B = 1.4140
    Const C = 3.1460

    This avoids an extra calculation at runtime.

  • Dead code removal:
    The compiler searches for pieces of code that have no effect and removes them during compilation, for example variables that are calculated but never used.
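
The three transformations above can be sketched in C. This is a hand-written illustration, not actual compiler output: the function names are hypothetical, and each "after" function only approximates what an optimizer might produce from the matching "before" function.

```c
/* Hand-written illustration; all function names are hypothetical. */

/* Copy propagation: the use of the copy `a` is replaced by `b`. */
double copy_before(double b)
{
    double a = b;
    return 2.0 + a;
}

double copy_after(double b)
{
    return 2.0 + b;          /* no longer depends on the copy */
}

/* Constant folding: the compiler can compute the sum itself. */
double fold_before(void)
{
    const double a = 1.7320;
    const double b = 1.4140;
    return a + b;
}

double fold_after(void)
{
    return 3.1460;           /* sum written by hand for this sketch */
}

/* Dead code removal: `unused` is calculated but never read. */
int dead_before(int x)
{
    int unused = x * 2;      /* has no effect on the result */
    return x + 1;
}

int dead_after(int x)
{
    return x + 1;
}
```

In each pair the observable result is the same, which is what allows the compiler to make the substitution.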

Optimizations can, however, expose security flaws. Some of these flaws are present in the source code but are hidden by non-optimizing compilers. Let’s see a few examples:

CVE-2009-1897 kernel: tun/tap: Fix crashes if open() /dev/net/tun and then poll() it

The TUN/TAP driver provides a virtual network device which performs packet tunneling; it's useful in a number of situations, including virtualization, virtual private networks, and more. In normal usage of the TUN driver, a program will open /dev/net/tun, then make an ioctl() call to set up the network endpoints.

The TUN device supports the poll() system call. The beginning of the function implementing this functionality (in 2.6.30) looks like this:

   static unsigned int tun_chr_poll(struct file *file, poll_table * wait)
   {
             struct tun_file *tfile = file->private_data;
             struct tun_struct *tun = __tun_get(tfile);
             struct sock *sk = tun->sk;   /* line added by the patch */
             unsigned int mask = 0;
             if (!tun)
                 return POLLERR;

Herbert Xu noticed a problem where a lack of packet accounting could let a hostile application pin down large amounts of kernel memory and generally degrade system performance.

The line that initializes sk from tun->sk was added by Herbert's patch; that is where things begin to go wrong. The code dereferences the pointer tun before checking whether it is NULL, which happens only later in the code.

GCC will, by default, optimize out the NULL test (the “if (!tun)” check). The reasoning is that, since the pointer has already been dereferenced (and has not been changed), it cannot be NULL, so there is no point in checking it. This logic makes perfect sense, except in the kernel, where NULL might actually be a valid pointer. The default SELinux policy allowed mapping the zero page, which turned this bug into a privilege escalation flaw. This was later corrected by preventing processes running as unconfined_t from mapping low memory.

The Linux kernel uses GCC’s -fno-delete-null-pointer-checks to disable such optimization.
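
The ordering problem can be reduced to a small, self-contained sketch. The structures and the POLL_ERROR value below are hypothetical stand-ins, not the real kernel definitions. The point is only to contrast dereferencing before the test with dereferencing after it, which is one way to avoid relying on a compiler flag:

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures. */
struct sock { int id; };
struct tun_struct { struct sock *sk; };

#define POLL_ERROR 0x8u   /* placeholder value for this sketch */

/* Problematic ordering: tun is dereferenced before the NULL test,
 * so the compiler may assume tun != NULL and delete the test.
 * Calling this with NULL is undefined behaviour. */
unsigned int poll_unsafe(struct tun_struct *tun)
{
    struct sock *sk = tun->sk;

    if (!tun)
        return POLL_ERROR;
    return sk ? 1u : 0u;
}

/* Safer ordering: test first, dereference afterwards. */
unsigned int poll_checked(struct tun_struct *tun)
{
    struct sock *sk;

    if (!tun)
        return POLL_ERROR;
    sk = tun->sk;
    return sk ? 1u : 0u;
}
```

With the check placed first, the compiler has no earlier dereference from which to conclude that the pointer is non-NULL, so the test cannot be removed on those grounds.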

Issues caused by dead store removal

Applications often need to read sensitive data from users (like passwords), files (like cryptographic keys), or the network. Memory used for this sensitive data needs to be properly scrubbed by overwriting its contents; otherwise the data may remain in memory where an attacker can recover it.

Attackers typically exploit this type of vulnerability by using a core dump or runtime mechanism to access the memory used by a particular application and recover the secret information. Once an attacker has access to the secret information, it is relatively straightforward to further exploit the system and possibly compromise other resources with which the application interacts. 

Optimizing compilers may remove memory-overwriting code when the overwritten memory is not used later in the program. This leaves sensitive information in memory after its use.

Consider the following code, in which a password is read from the user, some processing is performed with it, and the code then attempts to scrub the password from memory:

(example taken from the OWASP website)

void GetData(char *MFAddr) {
        char pwd[64];
        if (GetPasswordFromUser(pwd, sizeof(pwd))) {
                if (ConnectToMainframe(MFAddr, pwd)) {
                        // Interaction with mainframe
                }
        }
        memset(pwd, 0, sizeof(pwd));
}

If the above code is compiled with optimization enabled, the call to memset may be removed as a dead store, because the buffer is not used after it is overwritten. Since the buffer holds sensitive data, the password may be exposed to attack if it is left resident in memory.

GCC has the -fno-dse flag to disable this optimization. However, a better choice is to add a subsequent “use” of the memory, which prevents the store from being removed. Using a volatile asm statement that references the password should also ensure that the instructions are not deleted by GCC.
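
The volatile asm approach can be sketched as follows. The helper name scrub is hypothetical, and the code relies on GCC-style extended inline assembly, so it is an assumption that the compiler in use supports it (GCC and Clang do). It should be read as an illustration of the idea rather than a guaranteed-safe primitive:

```c
#include <string.h>

/* Hypothetical helper: zero a buffer in a way the optimizer
 * should not treat as a dead store. GCC/Clang specific. */
void scrub(void *buf, size_t len)
{
    memset(buf, 0, len);

    /* Empty asm statement that takes buf as an input and declares
     * a memory clobber. The compiler must assume the asm may read
     * the buffer, so the preceding stores count as "used". */
    __asm__ __volatile__("" : : "r"(buf) : "memory");
}
```

In the GetData example, the final memset(pwd, 0, sizeof(pwd)) would be replaced by scrub(pwd, sizeof(pwd)).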

Lastly, a possible solution to the above problem is suggested by the glibc project. However, this was added only in glibc-2.29 and is currently not available in any supported version of Red Hat Enterprise Linux.

Division by zero

The x86 processor raises an exception when it encounters a division by zero, whereas PowerPC and MIPS silently ignore it. In C, division by zero is undefined behaviour, and therefore the compiler can assume that the divisor is always non-zero.

Consider the following kernel code:

if (!msize)
        msize = 1 / msize; /* provoke a signal */

When compiling with GCC, this code behaves as intended on x86, but not on PowerPC, because no exception is generated there. When compiling with Clang, the result is even more surprising. Clang assumes that the divisor msize must be non-zero, on any system, since otherwise the division is undefined. Combined with this assumption, the zero check !msize becomes always false, since msize cannot be both zero and non-zero. The compiler determines that the whole block of code is unreachable and removes it, which has the unexpected effect of removing the programmer’s intended guard against msize being zero.
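
A more portable way to express that intention is to handle the zero case explicitly instead of relying on a hardware trap. The following is a minimal sketch; the helper name checked_div and its return convention are assumptions made for illustration and do not come from the kernel:

```c
#include <limits.h>

/* Hypothetical helper: returns 0 and stores the quotient in *out,
 * or returns -1 when the division would be undefined in C. */
int checked_div(int num, int den, int *out)
{
    if (den == 0)
        return -1;               /* division by zero */
    if (num == INT_MIN && den == -1)
        return -1;               /* quotient would overflow int */

    *out = num / den;            /* well defined at this point */
    return 0;
}
```

Because the division is reached only when it is well defined, the compiler has no undefined behaviour from which to infer that the check is redundant.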

Consider another example, this time from PostgreSQL code:

if(arg2 == 0)
        ereport(ERROR, (errcode(ERRCODE_DIVISION_BY_ZERO),
                         errmsg("division by zero")));
/* No overflow is possible */
PG_RETURN_INT32((int32) arg1 / arg2);

When arg2 is zero, the code calls an error reporting function that never returns to the caller, thereby guarding against the possible division by zero. However, the programmer failed to inform the compiler that the call to ereport(ERROR, ...) does not return. From the compiler's point of view, this implies that the division will always execute. Combined with the assumption that the divisor must be non-zero, on some platforms (e.g., Alpha, S/390, and SPARC) GCC moves the division before the zero check arg2 == 0, causing division by zero.
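
One way to give the compiler the missing information is to mark the error function as non-returning. The sketch below uses the C11 _Noreturn specifier. The names report_error and int4_div are hypothetical and do not reflect PostgreSQL's actual code:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical error reporter. _Noreturn (C11) tells the compiler
 * that control never comes back to the caller. */
_Noreturn void report_error(const char *msg)
{
    fprintf(stderr, "ERROR: %s\n", msg);
    exit(EXIT_FAILURE);
}

/* Hypothetical division routine. The INT_MIN / -1 overflow case
 * is deliberately not handled in this sketch. */
int int4_div(int arg1, int arg2)
{
    if (arg2 == 0)
        report_error("division by zero");

    /* Reached only when arg2 != 0, and the compiler now knows it. */
    return arg1 / arg2;
}
```

With the annotation in place, the compiler can see that the division is not executed on the path where arg2 is zero, so it has no grounds to hoist the division above the check.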

GCC compiler and optimizations

GCC provides several optimization levels, along with options to control specific behaviours explicitly. The default optimization level is zero, which provides no optimization at all. This can be explicitly specified with the option -O0. Note that -O on its own is equivalent to -O1, not -O0.

The purpose of level 1 optimization (-O1) is to produce an optimized binary in a short interval of time. The second level (-O2) performs all other supported optimizations for the given architecture that do not involve a space-speed trade-off, striking a balance between the two objectives. The third and highest level (-O3) enables even more optimizations, putting emphasis on speed over size. This includes the optimizations enabled at -O2 and rename-register. The inline-functions optimization is also enabled here, which can increase performance but can also drastically increase the size of the object, depending upon the functions that are inlined. The exact set of optimizations enabled at each level varies between GCC versions.

GCC also allows individual optimization features to be turned on or off via the command line.

A comprehensive list of various optimization features and their uses is available on the GCC website. Also note that in certain cases disabling optimization may make security warnings less effective and also disable source fortification. 


While code optimization is a useful feature of the modern compiler, in some cases it may have certain unwanted side effects. Developers need to understand and be mindful of how their code is being compiled, especially for sections which deal with sensitive data and/or critical sections of code.

About the author

Huzaifa Sidhpurwala is a Principal Product Security Engineer with Red Hat and part of a number of upstream security groups such as Mozilla, LibreOffice, Python, PHP and others. He speaks about security issues at open source conferences, and has been a Fedora contributor for more than 10 years.
