No Silver Bullet - Garbage Collection for Java in Embedded Systems


< Prev Contents Next >

The Kaffe conservative garbage collector

During the development phase of the Cygnus GNUPro compiler for the Java language, we've been experimenting with Kaffe, a free VM mainly written by Tim Wilkinson. It features an implementation of the classic three colors conservative scan algorithm. The source code, available from ftp.transvirtual.com, has been widely ported to several platforms and OSs.

We chose Kaffe because it is, by large, the most popular open source VM implementation available on the Internet. Access to the source code and the availability of a JIT made it a convenient initial target for our static native Java compiler. Its conservative garbage collector is able to work in a non cooperative environment and collects objects generated by the bytecode interpreter, the JIT or Java code compiled natively by our GNUPro compiler.

The three colors model
The three colors model first introduced by Dijkstra in [6] is useful to describe incremental or non-incremental tracing garbage collectors. It assigns a color to a heap cell depending on the degree of completion of its collection. At the start of collection, all cells are white. If the cell and its immediate descendants have been visited, it is colored black and won't be reclaimed. A cell entirely visited with the exception of its immediate descendants is colored grey. If the mutator can run in parallel with the collector and changes any pointer on a black cell, it has to change its color back to grey. Consequently, the collector must reconsider a grey cell since its collection was incomplete or changed by the action of a concurrently running mutator.

Under the three colors model, a collector runs until all reachable cells are blackened. White (unvisited) cells are garbage and hence reclaimable.

The source code
The source code for the virtual machine found under kaffevm/ features two files that are relevant to the garbage collection. gc-mem.c implements the GC aware memory allocator and some low level GC operations. gc-incremental.c implements the conservative collector.

Overall architecture
Kaffe distinguishes several allocation needs. On one hand, it has to deal with Java objects: classes, arrays and class instances. On the other hand, it has to allocate memory for different internal data structures that are required by the VM and its components (like the JIT) to function properly. All these pieces of memory need to be garbaged collected and/or finalized differently.

For example, the memory allocated for a class instance may contain references to other objects and is scanned (walked) conservatively, but also may or may not require finalization. Note that the actual implementation that entirely and conservatively walks a class instance memory isn't the most efficient. It could be optimized by the use of the object's class information to distinguish, in the object's memory, references to other objects from non collectable data.

On the other hand, the data structure representing a java.lang.Class object contains a lot of fields. It turns out that only some of them need to be walked (sometimes optionally), and classes are never finalized. Some other data structures, like thread locks or global paths are never garbage collected. Finally, live threads native stacks need to be scanned conservatively, because they may contain references objects.

To deal with these requirements, one of the bookkeeping information Kaffe ties to a garbage collected chunk of memory is a pointer to a particular GC function set. A GC function set is a data structure featuring two function pointers: the first one is the collection routine and defines how the allocated chunk of memory should be walked. The second one specifies an object's finalization routine, or if not a function pointer, it can also be a flag used to mark the memory chunk's special properties, tagging normal, fixed or root objects.

This data structure forms the typedef gcFuncs and several elements of that type are statically defined in gc-incremental.c.

When the garbage collector has to operate on a memory chunk, it retrieves its private GC function set and applies the function that is required to walk or finalize it. The table below summarizes how some of the memory allocations serving particular purposes are linked to a dedicated GC function set:

Allocating garbage collected memory
The main function used to allocate garbage collected memory is gc_malloc(). It allocates a memory block guaranteed to be big enough to fit the requested size by calling gc_heap_malloc(), which belongs to the low level memory allocator. It then ties to that block a pointer to an instance of the gcFuncs data structure. For example, to allocate 52 bytes of garbage collected memory that will represent an object's instance which needs to be finalized, one can write:

void *newObject = gc_malloc (52, &gcFinalizeObject);

The GC function set is also used to determine whether the allocated memory is a GC root in which case it is linked to the rootObjects list. For the allocation of a collectable object, gc_malloc() also sets its color to WHITE and inserts the whole garbage collected page it belongs to in the white list.

The function gc_malloc_fixed() is a short cut to the invocation of gc_malloc() with the gcFixed GC function set as a second argument. There are other functions that can attach or reattach a random piece of memory to the garbage collector. In that case, an indirect root is created and collected separately.

Execution of the garbage collector and finalizer
Created during the initialization phase of Kaffe, the conservative collector runs as a separate daemon thread set to run the gcMan() function. At the same time, the finalizer thread is created with a hook to the finaliserMan() function.

Low level memory allocation
The low level memory allocator has two different behaviors depending on the size of the object is has to allocate. It will try to fit as many small objects of the same size as possible within a piece of memory the size of a page. This page will contain a local gc_block header and a computed amount of blocks of a rounded up size. This number is computed so that each block has somewhere in a special section of the page two private entries. The first one is a pointer to an instance of the gcFuncs data structure. The second private entry is an eight bit unsigned integer which is used as a state flag and also retains color information. A page of garbage collected memory intended to store small objects is organized as:

  • The size of this entire block is typically the system's page size. There are N
    gcFuncs pointers `F', N
    bytes of flag information `f', and N
    `Data' blocks of a given size. For example, on a typical 32 bit system with a page size of 4KB, a gc_block of 48 bytes and 4 bytes pointers, a garbage collected block of memory will be able to store 32 'Data' blocks of 120 bytes, using up to 4048 bytes. On that kind of system object sizes range from 8 bytes to 4032 bytes. They're all multiple of the size of a pointer.

The low level memory allocation function is gc_heap_malloc() which, when called for the first time, initializes the heap by invoking gc_heap_initialize(). This function sets the sztable[] table, whose entries will be later used to round the size of a small allocations (below MAX_SMALL_OBJECT_SIZE bytes) and tell which free list of garbage collected pages should be processed to get a new chunk of that size.

If the size of the block to allocate is found to be big, then gc_large_block() is called to allocates a round number of pages sufficient enough to contain the object and its private collector information. This piece of memory will contain a gc_block header. The state of the memory chunk is set to NORMAL, and its color set to FREE.

Otherwise, in the case of a small block, sztable[] is addressed using the size of the allocation to figure a rounded up size and a free list index. Then, the allocation of the normalized chunk takes a number of steps:

  1. gc_small_block() is called once but may fail. For example, if gc_heap_malloc() is called for the first time, the pointer to the primary free list (gc_prim_freelist) is null and needs to be allocated. Or the system might also be running out of memory. For all those cases, execution continues with the steps 2 and further. Otherwise, the execution goes directly to step 3.
  2. After having eventually performed some garbage collection (if and only if there is a GC thread and some heap already allocated -- which is not the case during the first invocation), gc_small_block() is called again with the same original size. If this fails, the size of the chunk to allocate is changed to the value of the heap allocation size (ALLOC_HEAPSIZE, typically 1MB) and some memory is allocated from the system, invoking gc_system_alloc(). This will get pagealloc() called which in its turn relies, depending on how Kaffe was built, on sbrk(), memalign() or malloc() to perform the real allocation. This block is then inserted in the gc_objecthash hash table and then made a free list available entry.
  3. The routine allocates an object of the rounded-up size. The free list is processed to find a fit for a chunk of the size of gc_pgsize which is usually set to match the size of a system's page. This chunk of memory will be prepared to receive a given amount of small object of the rounded up size. Within that page, blocks of the normalized size are chained into a free list whose head pointer is maintained in the free field of the local gc_block header. Each of these blocks has its state tagged NORMAL and its color set to FREE.
  4. The next available entry of the rounded-up size is extracted from the allocated block, using the free list information contained in its header. Its state is reset to normal and it content zeroed and returned.

Things might take place slightly differently on an embedded system. The sztable[] can computed by advance and stored in ROM. pagealloc() might end up doing something different to obtain a new memory block, depending on the memory management routines supported by the embedded device.

Conservatively guessing the nature of a pointer
A conservative GC has to determine if an address is a valid garbage collected memory location. The function gc_heap_isobject() provides this functionality. Its implementation successively looks for several properties featured by a memory location pointing to an object. By doing so, it reduces the chances of a misidentification that would lead to the unintentional retention of a garbage object and cause a memory leak.

gc_heap_isobject() first tries to retrieve the gc_block associated with the memory location. It then computes the memory location's hash table index, from which it gets an entry in the gc_objecthash array. This linked list of allocated blocks is walked until an address equal to the gc_block of the processed memory location is found. The final test makes sure that the examined address fits within this allocated piece of collectable memory and that it has a color. If one of the test fails during the process, the pointer is not considered to be a piece of garbage collected memory, and the function returns false. Otherwise, true is returned.

Collection routines
There are several functions of the walk family that are used to collect single memory locations or blocks of memory. Some of them are used to specify the collector parts of the GC function sets, while others are used internally.

walkMemory() first colors a memory block to black and inserts the page it belongs to in the black list. It then applies the collector function of this chunk of memory to its entirety.

walkBlock() processes each grey memory chunks of a garbage collected page, setting their color back to black and individually applying their private GC functions.

walkConservative() calls scanConservative(), which chops a memory chunk into pointers and calls markObject() on each one if it's a GC heap objects. This is a function that basically traces references to other objects within an object. In this process, it considers every word found inside an object's memory as a pointer to an other live object. Since it may or may not be the case, it calls gc_heap_isobject() to filter pointers to valid objects, discarding all other address values: pointers to some other memory locations, good-looking integers and any other random array of bits.

walkNull() doesn't perform any collection. It's the collector attached to fixed objects, which are objects that are never collected, like the GC thread.

walkIndirectRoot() is used to collect the indirect roots that are created when a random piece of memory is attached to the garbage collector. It simply calls its own collector for the amount of memory this block represents.

The garbage collection process
There are several possible invocations of the garbage collector. gcMan(), whose code is periodically executed by the GC thread, is an endless loop. After having made other thread waiting upon completion, its calls the function startGC(). This processes all root objects, set their color to white and calls markObject() on them. markObject() first makes sure that the object belongs to garbage collected memory by calling gc_heap_isobject(). The object is then turned grey and the page it belongs to is moved to the grey list.

startGC() ends by calling walkLiveThreads(). This function processes the list of live threads object and calls walkMemory() on each of them. It turns out that for live threads, the private GC routine is walkThread() which calls markObject() on private thread info and then applies walkConservative() on the thread's stack, looking for object's references to be collected.

gcMan() continues by processing the list of grey allocated pages that it eventually obtained from the startGC() pass, and calls the function walkBlock() on them.

The white (garbage) list is processed and objects that need a finalization are marked calling markObject(). Next, potentially new grey objects are processed the same way grey objects were previously processed.

We're reaching the end of the collection process. finishGC() is called to process garbage (white) objects. Objects that need to be finalized are moved to the finalise list and those which don't require finalization but are white are returned to the local free storage of the page they belong to, calling gc_heap_free(). The finalizer thread is then woken up. Finally, black objects are moved back to the white list and marked accordingly.

Another way of invoking the garbage collector is by calling the invokeGC() which basically wakes up the GC thread. This is used to implement native functions like System.gc() or Runtime.gc().

Finalization
The finalizer thread, periodically allowed to run by the GC thread, walks the finalise list and processes each entries (which are pages of garbage collected memory) as follows: the page is first moved to the white list. Then, elements of the page marked to be finalized are individually processed by their own finalizer function.

Incremental garbage collection with Kaffe
The collector is designed for incremental garbage collection by implementing the three colors model; and some code is already in place to handle it. Several problems have to be solved, including the crucial issue of the implementation of software write barriers.


< Prev Contents Next >