Issue #2 December 2004
From Source to Binary: The Inner Workings of GCC
by Diego Novillo
- The Compilation Process
- Internal Organization of GCC
- Overcoming Architectural Limitations
- Tree SSA
- About the Author
Computers are very complex and, at the same time, very dumb objects. They need to be told in excruciating detail exactly what to do. Even worse, they only understand sequences of bits (1s and 0s), so even a simple operation like adding two integer numbers may require a sequence of about 48 bits on a modern Pentium processor. Imagine what it would take to implement an operating system.
Needless to say, we don't usually program computers using strings of bits. Instead, we write all the instructions in a language we can understand and then have it translated into the appropriate sequence of 1s and 0s that the computer can understand. The concept of translation in the computer world is pervasive. Everything you tell the computer to do needs to go through a translation process. Sometimes this translation is done on-the-fly (for instance, Perl scripts and spreadsheets) in a process called interpretation. Other times the translation is done once before the program starts (for instance, C or C++ applications) in a process called compilation. There are also hybrid approaches that are becoming popular which mix interpretation and compilation in a process called just-in-time compilation (JIT).
In this article, we explore how compilers work internally by examining GCC, the GNU Compiler Collection, which has become one of the most popular compilers in the UNIX world. We will begin by describing the anatomy of a generic compiler and continue with an overview of GCC's internal organization. Finally, we will describe recent architectural advances in GCC's core infrastructure that are paving the way for more powerful features that were difficult or impossible to implement in earlier versions of GCC.
Note that this is intended to be a very high-level overview. Compilers are fairly complex software systems (GCC consists of about 2.1 million lines of code and has been in development for over 15 years) and we will gloss over many of the gory details. If you are interested in GCC or compilers in general, please refer to the links listed in the section called “Bibliography”.
The Compilation Process
A compiler takes a program written in one language (source code) and transforms it into a semantically equivalent program in another language (object code). The best way of thinking about a compiler is that of a pipeline that gradually converts the program into different intermediate shapes until it reaches the final object code.
There are three major components to this pipeline. The Front End, which reads and validates the syntax of the input program. The Middle End, which analyzes and transforms the program into a more efficient form. Finally, the Back End is responsible for doing the final mapping into object code.
Computers are not very good at dealing with free form text, so the front end takes away all the spacing, comments and other formatting and converts the original program into a more concise form called intermediate representation (IR). These representations are nothing more than internal data structures that are much easier to manipulate than the original stream of text.
In general, compilers manipulate more than a single representation for the program. Initially, the Front End parses the input program and builds a data structure called an Abstract Syntax Tree (AST), which represents each statement in a hierarchical structure. For instance, given the statement x = y - z * 3, its corresponding AST is shown in Figure 1, “Abstract Syntax Tree for x = y - z * 3”.
The Front End is responsible for validating the syntactical structure of the input program, emitting most diagnostics about language conformance, creating internal data structures for data types and variables declared in the program, debugging information like file names and line numbers, and building the initial AST representation.
With the intermediate representation built by the front end, the middle end proceeds to analyze and transform the program. All the transformations done here and in the back end usually have two goals: (a) make the object code run as fast as possible (performance optimizations), or (b) make the object code take as little space as possible (space optimizations). These optimizations are typically machine and target independent.
The Front End knows very little about what the program actually does. Optimization becomes possible once the compiler understands the flow of control in the program (control-flow analysis) and how data is transformed as the program executes (data-flow analysis). With that understanding, the compiler can apply many different transformations to improve the runtime performance of the code. The following are a few of the most popular optimization techniques used in standard optimizing compilers:
- Algebraic simplifications
Expressions are simplified using algebraic properties of their operators and operands. For instance, i + 1 - i is converted to 1. Other properties like associativity, commutativity, and distributivity are also used to simplify expressions.
- Constant folding
Expressions for which all operands are constant can be evaluated at compile time and replaced with their values. For instance, the statement a = 4 + 3 - 8 can be replaced with a = -1. This optimization yields best results when combined with constant propagation (see below).
- Redundancy elimination
There are several techniques that deal with the elimination of
redundant computations. Some of the more common ones include:
- Loop-invariant code motion
- Computations inside loops that produce the same result for every iteration are moved outside the loop.
- Common sub-expression elimination
- If an expression is computed more than once on a specific execution path and its operands are never modified, the repeated computations are replaced with the result computed in the first one.
- Partial redundancy elimination
- A computation is partially redundant if some execution path computes the expression more than once. This optimization adds and removes computations from execution paths to minimize the number of redundant computations in the program. It encompasses the effects of loop-invariant code motion and common sub-expression elimination.
A final translation phase produces machine code for the target architecture. At this stage, the compiler needs to have very detailed knowledge about the hardware where the program will run. This means that the intermediate representation of the program is usually converted into a form resembling machine language. In this form, it is possible to apply transformations that take advantage of the target architecture. For instance:
- Register allocation
- Tries to maximize the number of program variables that are assigned to hardware registers instead of memory (registers are orders of magnitude faster than main memory).
- Code scheduling
- Takes advantage of the super-scalar features of modern processors to rearrange instructions so that multiple instructions are in different stages of execution simultaneously.
Even after final code generation, the resulting code is typically not ready to run as-is. Some compilers (GCC among them) generate assembly code which is then fed into the assembler for object code generation. After object code has been generated, the linker is responsible for collecting all the different object files and libraries needed to build the final executable.
Internal Organization of GCC
GCC is designed around two different IRs. One, called tree, is essentially the abstract syntax tree representation we discussed previously. The other, called RTL (Register Transfer Language), is used by GCC to do optimizations and code generation. The following diagram shows an overview of the compilation process in GCC.
Notice how the Front and Middle Ends are somewhat fused together. This is because GCC does not operate on trees all that much. Trees are mostly used as a stepping stone in the generation of RTL. Until recently, the only operation done on trees was function inlining; almost everything else was relegated to the RTL optimizer. In a way, this makes sense: from the diagram, you will notice that each language Front End generates its own version of trees. The C parser generates C trees, the C++ parser generates C++ trees, and so on. Each version is different in its own way, so analyses and optimizations on trees would require N different implementations, one for each Front End. That is hardly scalable, so all the optimization work is done in RTL.
RTL can be thought of as an assembly language for a machine with an infinite number of registers. Being a low-level representation, RTL works well for optimizations that are close to the target (for example, register allocation, delay slot optimizations, peepholes, etc). Furthermore, GCC offers a fairly flexible way for developers to describe their target machine when porting GCC to a new architecture. In fact, when it was initially implemented, GCC would generate RTL almost immediately, so it's only natural that most of the compiler is implemented around it.
However, some analyses and transformations need higher-level information about the program that is impossible (or very difficult) to obtain from RTL (for example, array references, data types, references to program variables, control flow structures). In general, too many target features are exposed in RTL. For instance, if the target cannot handle more than 32-bit quantities in a register, the RTL representation splits values bigger than 32 bits into multiple values. Also, things like function calling conventions are exposed in detail in RTL, so a single function call may span multiple RTL instructions. All these details make it difficult to implement analyses and optimizations that don't really need to concern themselves with target details. Over time, some of these transformations have been implemented in RTL, but the end result is code that is excessively convoluted, hard to maintain, and error prone.
Overcoming Architectural Limitations
Since the existing optimization framework in GCC was not flexible enough to implement several improvements we had in mind, Red Hat sponsored a project to overhaul GCC's architecture. We considered several approaches and settled on the idea of starting with the tree representation. Trees contain detailed information about data types, variables, and control structures of the original program. Optimizing the tree representation in GCC is appealing because (a) it facilitates the implementation of new analyses and optimizations closer to the source and (b) it simplifies the work of the RTL optimizers, potentially speeding up the compilation process or improving the generated code.
This new infrastructure represents one of the biggest changes GCC has experienced in recent years. It took approximately 3 years to complete the basic functionality with the contributions of more than 30 developers from the GCC community. The framework is based on a compiler formalism called Static Single Assignment (SSA), which is a representation used internally by the compiler to keep track of how data flows through the program. This information is essential for the compiler to guide its optimization decisions. Given that the framework was implemented on top of the Tree data structure and uses SSA for data flow analysis, we named it Tree SSA.
Tree SSA

Although GCC parse trees provide very detailed information about the original program, they are not suitable for optimization for two main reasons:
- Lack of a common representation
- There is no single tree representation shared by all the front ends. Notice in Figure 2, “Existing Compilation Process in GCC” how every front end generates its own flavor of trees. A tree optimizer would have to be implemented once for every Front End supported by GCC. This would be a maintenance nightmare and would create a significant burden on every new Front End.
- Structural complexity
- The parse trees generated by the Front End are an almost exact representation of the original input program. As such, expressions can be combined in almost infinite ways (if you've ever seen an obfuscated C program, you know what I mean). This flexibility is very convenient for humans because of its expressive power. However, compilers, being the simple-minded automata they are, cannot easily cope with them.

To overcome these limitations, we have introduced two new tree-based intermediate representations called GENERIC and GIMPLE. GENERIC addresses the lack of a common tree representation among the various front ends. GIMPLE solves the complexity problems, facilitating the discovery of data and control flow in the program.
GENERIC and GIMPLE
Although some Front Ends share the same tree representation, there is no single representation used by all GCC Front Ends. Every Front End is responsible for translating its own parse trees directly into RTL. To address this problem, we have introduced a new representation, named GENERIC, that is merely a superset of all the existing tree representations in GCC. Instead of generating RTL, every Front End is now responsible for converting its own parse trees into GENERIC trees. Since GENERIC trees are language-independent, all the semantics of the input language must be explicitly described by each Front End. Some of these semantics are translated in the "genericize" pass, while others are translated in the "gimplification" pass.
The conversion to GENERIC removes language dependencies from the program representation. However, GENERIC trees are still structurally complex, so the next step is to break them down into more manageable pieces. This new representation, called GIMPLE, is lexically identical to GENERIC but imposes structural restrictions on statements. For instance, no expression can contain more than three operands, arguments to a function call can only be variables (that is, function calls cannot be nested), the predicate of an if statement can only contain a single comparison, and so on. This form was derived from the SIMPLE representation used by McGill University's McCAT compiler.
The program in the section called “GENERIC and GIMPLE” illustrates the differences between GENERIC and GIMPLE. Notice how the two versions are the same program, but the GIMPLE form of the program is simpler (albeit longer). This is the key property that makes things easier for the optimizer: it can make a variety of simplifying assumptions that, in the end, make for a compiler that is easier to maintain and improve.
GENERIC:

    if (foo (a + b, c))
      c = b++ / a
    endif
    return c

GIMPLE:

    t1 = a + b
    t2 = foo (t1, c)
    if (t2 != 0)
      t3 = b
      b = b + 1
      c = t3 / a
    endif
    return c
Static Single Assignment (SSA)
The Static Single Assignment form is a representation that is laid on top of GIMPLE and exposes very explicitly the flow of data within the program. The central idea is that of versioning. Every time a variable X is assigned a new value, the compiler creates a new version of X, and the next time that variable X is used, the compiler looks up the latest version of X and uses that. Notice that this representation is completely internal to the compiler; it does not show up in the generated code, nor can it be observed with a debugger.
Take, for instance, the program in Figure 5, “Static Single Assignment Form of a Program”. The left side is the original GIMPLE program and the right side shows the same program in SSA form.
GIMPLE program        SSA form

    1  a = 3            a_1 = 3
    2  b = 9            b_2 = 9
    3  c = a + b        c_3 = a_1 + b_2
    4  a = b + 21       a_4 = b_2 + 21
    5  d = a + c        d_5 = a_4 + c_3
    6  return d         return d_5
Notice that every assignment generates a new version number for the variable being modified, and every time a variable is used inside an expression, it always uses the latest version. So, the use of a in line 3 is modified to a_1, while the use of a in line 5 uses a_4.
So, why is this advantageous to the optimizer? Consider, for instance, a very common optimization called constant propagation. In this optimization, the compiler tries to compute at compile time as many expressions as possible using constant values. Since the program is in SSA form and all the variables have version numbers, the compiler can simply build arrays indexed by version number to keep track of constant values. Suppose we had an array CST for this purpose. In this case, CST[1] would hold 3 and CST[2] would hold 9. So, when the compiler examines the statement in line 3, c_3 = a_1 + b_2, it can deduce that c_3 must always be 12 (that is, it stores 12 in CST[3]). After visiting all the statements in this fashion, we end up with
    1  a_1 = 3
    2  b_2 = 9
    3  c_3 = 12
    4  a_4 = 30
    5  d_5 = 42
    6  return 42
And now, after having propagated all the constants, we find that none of the statements in lines 1 through 5 are really useful at all. This program merely computes and returns the value 42.
Now, what happens when it is not possible to determine what the latest version of a variable is? Real programs are seldom straight line code, there are loops and conditionals that alter the flow of control. For instance, in the following program it may be impossible for the compiler to determine what branch of the conditional will be taken at runtime:
    x = foo ()
    if (x > 10)
      a = 92
    else
      a = 23
    endif
    return a
When the compiler tries to convert the program above into SSA form, it
does not know which of the two versions for variable
a to use. To solve this ambiguity,
the compiler creates a third version for variable
a that is the
merge of the two conflicting versions. This
merge operation is known as a PHI function:
    x_1 = foo ()
    if (x_1 > 10)
      a_2 = 92
    else
      a_3 = 23
    endif
    a_4 = PHI (a_2, a_3)
    return a_4
Since the compiler does not know in advance which of a_2 or a_3 will be returned, it creates a_4 as the merge of the two. This indicates to the optimizers that a_4 will be one of the other two versions, but we don't know which.
Though largely invisible to the end user, a compiler is one of the most important components in the system. As the popularity of Linux grows, so does the need for a strong compiler. Recognizing this importance, we started the Tree SSA project to modernize GCC's internals.
Tree SSA provides a new optimization framework that makes it possible for GCC to implement high-level analyses and optimizations. The initial implementation will be available with the release of GCC 4.0. Though the framework is still in active development, the next release will include several new features, including pointer and array bound checking (-fmudflap), support for Fortran 90, auto-vectorization, various high-level loop transformations, and most traditional scalar optimization passes.
By basing all the analyses and transformations on widely known published algorithms, we aim to improve our ability to maintain and add new features to GCC. Furthermore, the use of standard techniques will encourage external participation from groups in the compiler community that are not necessarily familiar with GCC.
Bibliography
 R. Cytron, J. Ferrante, B. Rosen, M. Wegman, and K. Zadeck. Efficiently Computing Static Single Assignment Form and the Control Dependence Graph. ACM Transactions on Programming Languages and Systems, 13(4): 451-490, October 1991.
 L. Hendren, C. Donawa, M. Emami, G. Gao, Justiani, B. Sridharan, "Designing the McCAT Compiler Based on a Family of Structured Intermediate Representations", Proceedings of the 5th International Workshop on Languages and Compilers for Parallel Computing, Lecture Notes in Computer Science, no. 457, Springer-Verlag.
 Robert Morgan. Building an Optimizing Compiler, Butterworth-Heinemann, 1998.