|
3. Compiling Java
The core tool of Java implementation is the
compiler. This is jc1, a new gcc front-end (see
Bibliography for GCC). This has similar structure
as existing front-ends (such as cc1plus, the C++
front-end), and shares most of the code with them. The most unusual
aspect of jc1 is that its "parser" reads
either Java source files or Java bytecode files. (The first
release will only directly support bytecodes; parsing Java source will
be done by invoking Sun's javac. A
future version will
provide an integrated Java parser, largely for the sake of compilation
speed.) In any case, it is important that jc1 can read
bytecodes, for the following reasons:
- It is the natural way to get declarations of external classes; in this respect, a Java bytecode file is like a C++ pre-compiled header file.
- It is needed so we can support code produced from other tools that produce Java bytecodes (such as the Kawa Scheme-to-Java-bytecode compiler; see Bibliography for Kawa).
- Some libraries are (unfortunately) distributed as Java bytecodes without source.
Much of the work of the compiler is the same whether we are
compiling from source code or from byte codes. For example emitting
code to handle method invocation is the same either way. When we
compile from source, we need a parser, semantic analysis, and
error-checking. On the other hand, when parsing from bytecodes, we
need to extract higher-level information from the lower-level
bytecodes, which we will discuss in 3.1
Transforming bytecodes.
3.1 Transforming bytecodes
This section describes how jc1 works.
The executable content of a bytecode file contains a vector of
bytecode instructions for each (non-native) method. The bytecodes are
instructions for an abstract machine with some local variable
registers and a stack used for expression evaluation. (The first few
local variables are initialized with the actual parameters.) Each
local variable or stack "slot" is a word big enough for
either a 32-bit integer, a float, or an object reference
(pointer). (Two slots are used for 64-bits doubles and longs.) The
slots are not typed; i.e., at one point, a slot might contain
an integer value, and, at another point, the same slot might contain
an object reference. However, you cannot store an integer in a slot,
and then retrieve the same bits re-interpreted as an object
reference. Moreover, at any given program point, each slot has a
unique type can be determined using static data flow. (The type may be
"unassigned", in which case you are not allowed to read the
slot's value.) These restrictions are part of Java security model,
and are enforced by the Java bytecode verifier. We do a similar
analysis in jc1, which lets us know for
every program
point, the current stack pointer, and the type of every local variable
and stack slot.
Internally gcc uses two main
representations: The tree
representation is at the level of an abstract syntax tree, and is used
to represent high-level (fully-typed) expressions, declarations, and
types. The rtl (Register Transform
Language) form is used
to represent instructions, instruction patterns, and machine-level
calculations in general. Constant folding is done using
trees, but otherwise most optimizations
are done at the
rtl level.
The basic strategy for translating Java stack-oriented bytecodes is
that we create a dummy local variable for each Java local variable or
stack slot. These are mapped to gcc "virtual
registers," and standard gcc register
allocation later
assigns each virtual register to a hard register or a stack
location. This makes it easy to map each opcode into
tree-nodes or rtl to manipulate the virtual
registers.
As an example, consider how to compile iadd, which adds
the top two ints on the stack. For illustration, assume the stack
pointer is 3, and virtual registers 50, 51, and 52 are associated with
stack slots 0, 1, and 2. The code generated by jc1 is the
following:
reg51 := vreg51 + vreg52
Note that the stack exists only at compile-time. There is no stack,
stack pointer, or stack operations in the emitted code.
This simple model has some problems, compared to conventional
compilers:
- We would like to do constant folding, which is done at the tree level. However, tree nodes are typed.
- The simple-minded approach uses lots of virtual registers, and the code generated is very bad. Running the optimizer (with the -O flag) fixes the generated code, but you still get a lot of useless stack slots. It would be nice not to have to run the optimizer, and if you do, not to make unnecessary work for it.
- The rtl representation is semi-typed, since it distinguishes the various machine "modes" such as pointer, and different-sized integers and floats. This causes problems because a given Java stack slot may have different types at different points in the code.
The last problem we solve by using a separate virtual register for
each machine mode. For example, for local variable slot 5 we might use
vreg40 when it contains an integer, and vreg41
when it points an object reference. This is safe, because the Java
verifier does not allow storing an integer in an object slot and later
reusing the same value as an object reference or float.
The other two problems we solve by modeling the stack as a stack of
tree nodes, and not storing the results in their
"home" virtual registers unless we have to. Thus jc1actually
does the following for iadd:
tree arg1 = pop_value (int_type_node);
tree arg2 = pop_value (int_type_node);
push_value (fold (build (PLUS_EXPR, int_type_node, arg1,
arg2)));
The build function is the standard gcc function for
creating tree-nodes, while fold is the standard function
for doing constant folding. The functions pop_value and
push_value are specific to jc1, and keep track
of which stack location corresponds to which tree node. No
code is actually generated yet.
This works for straight-line code (i.e. within a basic block). When
we come to a branch or a branch target, we have to flush the stack of
tree nodes, and make sure the value of each stack slot gets
saved in its "home" virtual register. The stack is usually
empty at branches and branch targets, so this does not happen very
often. Otherwise, we only emit actual rtl instructions to
evaluate the expressions when we get to a side-effecting operation
(such as a store or a method call).
Since we only allocate a virtual register when we need one, we are
now using fewer virtual registers, which leads to better code. We also
get the benefits of gcc constant folding, plus the existing
machinery for selecting the right instructions for addition and other
expressions.
The result is that we end up generating code using the same
mechanisms that the gcc C and C++ front-ends do, and
therefore we can expect similar code quality.
3.2 Class meta-data
Compiling the executable content is only part of the problem. The
Java run-time also needs an extensive data structure that describes
each class with its fields and methods. This is the
"meta-data" or "reflective" data for the
class. The compiler has to somehow make it possible for the run-time
system to correctly initialize the data structures before the compiled
classes can be used.
If we are compiling from bytecodes, the compiler can just pass
through the meta-data as they are encoded in the class file. (If the
run-time supports dynamically loading new classes, it already knows
how to read meta-data from a class file.)
It is inconvenient if the meta-data and the compiled code are in
different files. The run-time should be able to create its
representation of the meta-data without having to go beyond its
address space. For example reading in the meta-data from an external
file may cause consistency problems, and it may not even be possible
for embedded systems.
A possible solution is to emit something like:
static const char FooClassData[] = "\xCa\xfe\xba\xbe...";
static {
LoadClassFromByteArray(FooClassData);
Patch up method to point to code;
The code marked static is compiled into
a dummy function
that is executed at program startup. This can be handled using
whatever mechanism is used to execute C++ static initializers. This
dummy function reads the meta-data in external format from
FooClassData, creates the internal
representation, and
enters the class into the class table. It then patches up the method
descriptors so that they point to the compiled code.
This works, but it is rather wasteful in terms of memory and
startup time. We need space for both the external representation (in
FooClassData) and the internal representation, and we have
to spend time creating the latter from the former. It is much better
if we can have the compiler directly create the internal
representation. If we use initialized static data, we can have the
compiler statically allocate and initialize the internal data
structures. This means the actual initialization needed at run is very
little most of it is just entering the meta-data into a global
symbol table.
Consider the following example class:
public class Foo extends Bar {
public int a;
public int f(int j) { return a+j; }
};
That class compiles into something like the following:
int Foo_f (Foo* this, int j)
{ return this->a + j; }
struct Method Foo_methods[1] = {{
/* name: */ "f";
/* code: */ (Code) &Foo_f;
/* access: */ PUBLIC,
/* etc */ ...
}};
struct Class Foo_class = {
/* name: */ "Foo",
/* num_methods: */ 1,
/* methods: */ Foo_methods,
/* super: */ &Bar_class,
/* num_fields: */ 1,
/* etc */ ...
};
static {
RegisterClass (&Foo_class, "Foo");
}
Thus startup is fast, and does not require any dynamic allocation.
3.3 Static references
A class may make references to static
fields and methods of another class. If we can assume that the other
class will also be jc1-compiled, then
jc1 can emit a direct reference to the
external static field or method (just like
a C++ compiler would). That is, a call to a static method can be compiled as a direct
function call. If you want to make a static call from a pre-compiled
class to a known class that you do not know is pre-compiled,
you can try using extra indirection and a "trampoline" stub
that jumps to the correct method. (Not that this feature is very
useful: it makes little sense to pre-compile a class without also
pre-compiling the other classes that it statically depends on.)
A related problem has to do with string constants. The language
specification requires that string literals that are equal will
compile into the same String object at run-time. This complicates
separate compilation, and it makes it difficult to statically allocate
the strings (as in done for C), unless you have a truly unusual
linker. So, we compile references to string literals as indirect
references to a pointer that gets initialized at class initialization
time.
3.4 Linking
The jc1 program creates standard
assembler files that are processed by the standard unmodified GNU
assembler. The resulting object files can be placed in a dynamic or
static library, or linked (together with the run-time system) into an
executable, using a standard linker. The only linker extension we
really need is support for static initializers, but we already have
that (since gcc supports C++).
While we do not need any linker modification, there are some
that may be desirable. Here are some ideas.
Java needs a lot of names at run-time for classes,
files, local variables, methods, type signatures, and source
files. These are represented in the constant pool of .class files as CONSTANT_Utf8 values. Compilation removes the
need for many of these names (because it resolves many symbolic field
and method references). However, names are still needed for the
meta-data. Two different class files may generate two references the
same name. It may be desirable to combine them to save space (and to
speed up equality tests). That requires linker support. The compiler
could place these names in a special section of the object file, and
the linker could then combine identical names.
The run-time maintains a global table of all loaded classes, so it
can find a class given its name. When most of the classes are
statically compiled and allocated, it is reasonable to pre-compute
(the static part of) the global class table. One idea is that the
linker could build a perfect hash table of the classes in a library or
program.
|