Compiling Java for Embedded Systems


< Prev Contents Next >

3. Compiling Java
The core tool of Java implementation is the compiler. This is jc1, a new gcc front-end (see Bibliography for GCC). This has similar structure as existing front-ends (such as cc1plus, the C++ front-end), and shares most of the code with them. The most unusual aspect of jc1 is that its "parser" reads either Java source files or Java bytecode files. (The first release will only directly support bytecodes; parsing Java source will be done by invoking Sun's javac. A future version will provide an integrated Java parser, largely for the sake of compilation speed.) In any case, it is important that jc1 can read bytecodes, for the following reasons:

  • It is the natural way to get declarations of external classes; in this respect, a Java bytecode file is like a C++ pre-compiled header file.
  • It is needed so we can support code produced from other tools that produce Java bytecodes (such as the Kawa Scheme-to-Java-bytecode compiler; see Bibliography for Kawa).
  • Some libraries are (unfortunately) distributed as Java bytecodes without source.

Much of the work of the compiler is the same whether we are compiling from source code or from byte codes. For example emitting code to handle method invocation is the same either way. When we compile from source, we need a parser, semantic analysis, and error-checking. On the other hand, when parsing from bytecodes, we need to extract higher-level information from the lower-level bytecodes, which we will discuss in 3.1 Transforming bytecodes.

3.1 Transforming bytecodes
This section describes how jc1 works.

The executable content of a bytecode file contains a vector of bytecode instructions for each (non-native) method. The bytecodes are instructions for an abstract machine with some local variable registers and a stack used for expression evaluation. (The first few local variables are initialized with the actual parameters.) Each local variable or stack "slot" is a word big enough for either a 32-bit integer, a float, or an object reference (pointer). (Two slots are used for 64-bits doubles and longs.) The slots are not typed; i.e., at one point, a slot might contain an integer value, and, at another point, the same slot might contain an object reference. However, you cannot store an integer in a slot, and then retrieve the same bits re-interpreted as an object reference. Moreover, at any given program point, each slot has a unique type can be determined using static data flow. (The type may be "unassigned", in which case you are not allowed to read the slot's value.) These restrictions are part of Java security model, and are enforced by the Java bytecode verifier. We do a similar analysis in jc1, which lets us know for every program point, the current stack pointer, and the type of every local variable and stack slot.

Internally gcc uses two main representations: The tree representation is at the level of an abstract syntax tree, and is used to represent high-level (fully-typed) expressions, declarations, and types. The rtl (Register Transform Language) form is used to represent instructions, instruction patterns, and machine-level calculations in general. Constant folding is done using trees, but otherwise most optimizations are done at the rtl level.

The basic strategy for translating Java stack-oriented bytecodes is that we create a dummy local variable for each Java local variable or stack slot. These are mapped to gcc "virtual registers," and standard gcc register allocation later assigns each virtual register to a hard register or a stack location. This makes it easy to map each opcode into tree-nodes or rtl to manipulate the virtual registers.

As an example, consider how to compile iadd, which adds the top two ints on the stack. For illustration, assume the stack pointer is 3, and virtual registers 50, 51, and 52 are associated with stack slots 0, 1, and 2. The code generated by jc1 is the following: reg51 := vreg51 + vreg52

Note that the stack exists only at compile-time. There is no stack, stack pointer, or stack operations in the emitted code.

This simple model has some problems, compared to conventional compilers:

  • We would like to do constant folding, which is done at the tree level. However, tree nodes are typed.
  • The simple-minded approach uses lots of virtual registers, and the code generated is very bad. Running the optimizer (with the -O flag) fixes the generated code, but you still get a lot of useless stack slots. It would be nice not to have to run the optimizer, and if you do, not to make unnecessary work for it.
  • The rtl representation is semi-typed, since it distinguishes the various machine "modes" such as pointer, and different-sized integers and floats. This causes problems because a given Java stack slot may have different types at different points in the code.

The last problem we solve by using a separate virtual register for each machine mode. For example, for local variable slot 5 we might use vreg40 when it contains an integer, and vreg41 when it points an object reference. This is safe, because the Java verifier does not allow storing an integer in an object slot and later reusing the same value as an object reference or float.

The other two problems we solve by modeling the stack as a stack of tree nodes, and not storing the results in their "home" virtual registers unless we have to. Thus jc1actually does the following for iadd:

tree arg1 = pop_value (int_type_node);
tree arg2 = pop_value (int_type_node);
push_value (fold (build (PLUS_EXPR, int_type_node, arg1, arg2)));

The build function is the standard gcc function for creating tree-nodes, while fold is the standard function for doing constant folding. The functions pop_value and push_value are specific to jc1, and keep track of which stack location corresponds to which tree node. No code is actually generated yet.

This works for straight-line code (i.e. within a basic block). When we come to a branch or a branch target, we have to flush the stack of tree nodes, and make sure the value of each stack slot gets saved in its "home" virtual register. The stack is usually empty at branches and branch targets, so this does not happen very often. Otherwise, we only emit actual rtl instructions to evaluate the expressions when we get to a side-effecting operation (such as a store or a method call).

Since we only allocate a virtual register when we need one, we are now using fewer virtual registers, which leads to better code. We also get the benefits of gcc constant folding, plus the existing machinery for selecting the right instructions for addition and other expressions.

The result is that we end up generating code using the same mechanisms that the gcc C and C++ front-ends do, and therefore we can expect similar code quality.

3.2 Class meta-data
Compiling the executable content is only part of the problem. The Java run-time also needs an extensive data structure that describes each class with its fields and methods. This is the "meta-data" or "reflective" data for the class. The compiler has to somehow make it possible for the run-time system to correctly initialize the data structures before the compiled classes can be used.

If we are compiling from bytecodes, the compiler can just pass through the meta-data as they are encoded in the class file. (If the run-time supports dynamically loading new classes, it already knows how to read meta-data from a class file.)

It is inconvenient if the meta-data and the compiled code are in different files. The run-time should be able to create its representation of the meta-data without having to go beyond its address space. For example reading in the meta-data from an external file may cause consistency problems, and it may not even be possible for embedded systems.

A possible solution is to emit something like:

static const char FooClassData[] = "\xCa\xfe\xba\xbe...";
static {
LoadClassFromByteArray(FooClassData);
Patch up method to point to code;

The code marked static is compiled into a dummy function that is executed at program startup. This can be handled using whatever mechanism is used to execute C++ static initializers. This dummy function reads the meta-data in external format from FooClassData, creates the internal representation, and enters the class into the class table. It then patches up the method descriptors so that they point to the compiled code.

This works, but it is rather wasteful in terms of memory and startup time. We need space for both the external representation (in FooClassData) and the internal representation, and we have to spend time creating the latter from the former. It is much better if we can have the compiler directly create the internal representation. If we use initialized static data, we can have the compiler statically allocate and initialize the internal data structures. This means the actual initialization needed at run is very little most of it is just entering the meta-data into a global symbol table.

Consider the following example class:

public class Foo extends Bar {
    public int a;
    public int f(int j) { return a+j; }
};

That class compiles into something like the following:

int Foo_f (Foo* this, int j)
{ return this->a + j; }

struct Method Foo_methods[1] = {{
    /* name: */ "f";
    /* code: */ (Code) &Foo_f;
    /* access: */ PUBLIC,
    /* etc */ ...
}};

struct Class Foo_class = {
     /* name: */ "Foo",
     /* num_methods: */ 1,
     /* methods: */ Foo_methods,
     /* super: */ &Bar_class,
     /* num_fields: */ 1,
     /* etc */ ...
};

static {
     RegisterClass (&Foo_class, "Foo");
}

Thus startup is fast, and does not require any dynamic allocation.

3.3 Static references
A class may make references to static fields and methods of another class. If we can assume that the other class will also be jc1-compiled, then jc1 can emit a direct reference to the external static field or method (just like a C++ compiler would). That is, a call to a static method can be compiled as a direct function call. If you want to make a static call from a pre-compiled class to a known class that you do not know is pre-compiled, you can try using extra indirection and a "trampoline" stub that jumps to the correct method. (Not that this feature is very useful: it makes little sense to pre-compile a class without also pre-compiling the other classes that it statically depends on.)

A related problem has to do with string constants. The language specification requires that string literals that are equal will compile into the same String object at run-time. This complicates separate compilation, and it makes it difficult to statically allocate the strings (as in done for C), unless you have a truly unusual linker. So, we compile references to string literals as indirect references to a pointer that gets initialized at class initialization time.

3.4 Linking
The jc1 program creates standard assembler files that are processed by the standard unmodified GNU assembler. The resulting object files can be placed in a dynamic or static library, or linked (together with the run-time system) into an executable, using a standard linker. The only linker extension we really need is support for static initializers, but we already have that (since gcc supports C++).

While we do not need any linker modification, there are some that may be desirable. Here are some ideas.

Java needs a lot of names at run-time for classes, files, local variables, methods, type signatures, and source files. These are represented in the constant pool of .class files as CONSTANT_Utf8 values. Compilation removes the need for many of these names (because it resolves many symbolic field and method references). However, names are still needed for the meta-data. Two different class files may generate two references the same name. It may be desirable to combine them to save space (and to speed up equality tests). That requires linker support. The compiler could place these names in a special section of the object file, and the linker could then combine identical names.

The run-time maintains a global table of all loaded classes, so it can find a class given its name. When most of the classes are statically compiled and allocated, it is reasonable to pre-compute (the static part of) the global class table. One idea is that the linker could build a perfect hash table of the classes in a library or program.


< Prev Contents Next >