The Kaffe conservative garbage collector
During the development phase of the Cygnus GNUPro compiler for the
Java language, we've been experimenting with Kaffe, a free VM mainly
written by Tim Wilkinson. It features an implementation of the
classic three colors conservative scan algorithm. The source code,
available from ftp.transvirtual.com, has been widely ported
to several platforms and OSs.
We chose Kaffe because it is, by large, the most popular open source
VM implementation available on the Internet. Access to the source code
and the availability of a JIT made it a convenient initial target for
our static native Java compiler. Its conservative garbage collector is
able to work in a non cooperative environment and collects objects
generated by the bytecode interpreter, the JIT or Java code compiled
natively by our GNUPro compiler.
The three colors model
The three colors model first introduced by Dijkstra in [6] is useful
to describe incremental or non-incremental tracing garbage
collectors. It assigns a color to a heap cell depending on the degree
of completion of its collection. At the start of collection, all cells
are white. If the cell and its immediate descendants have been
visited, it is colored black and won't be reclaimed. A cell entirely
visited with the exception of its immediate descendants is colored
grey. If the mutator can run in parallel with the collector and
changes any pointer on a black cell, it has to change its color back
to grey. Consequently, the collector must reconsider a grey cell since
its collection was incomplete or changed by the action of a
concurrently running mutator.
Under the three colors model, a collector runs until all reachable
cells are blackened. White (unvisited) cells are garbage and hence
reclaimable.
The source code
The source code for the virtual machine found under kaffevm/
features two files that are relevant to the garbage collection.
gc-mem.c implements the GC aware memory allocator and some
low level GC operations. gc-incremental.c implements the
conservative collector.
Overall architecture
Kaffe distinguishes several allocation needs. On one hand, it has to
deal with Java objects: classes, arrays and class instances. On the
other hand, it has to allocate memory for different internal data
structures that are required by the VM and its components (like the
JIT) to function properly. All these pieces of memory need to be
garbaged collected and/or finalized differently.
For example, the memory allocated for a class instance may contain
references to other objects and is scanned (walked) conservatively,
but also may or may not require finalization. Note that the actual
implementation that entirely and conservatively walks a class instance
memory isn't the most efficient. It could be optimized by the use of
the object's class information to distinguish, in the object's memory,
references to other objects from non collectable data.
On the other hand, the data structure representing a
java.lang.Class object contains a lot of fields. It turns out
that only some of them need to be walked (sometimes optionally), and
classes are never finalized. Some other data structures, like thread
locks or global paths are never garbage collected. Finally, live
threads native stacks need to be scanned conservatively, because they
may contain references objects.
To deal with these requirements, one of the bookkeeping information
Kaffe ties to a garbage collected chunk of memory is a pointer to a
particular GC function set. A GC function set is a data structure
featuring two function pointers: the first one is the collection
routine and defines how the allocated chunk of memory should be
walked. The second one specifies an object's finalization routine, or
if not a function pointer, it can also be a flag used to mark the
memory chunk's special properties, tagging normal, fixed or root
objects.
This data structure forms the typedef gcFuncs and several
elements of that type are statically defined in
gc-incremental.c.
When the garbage collector has to operate on a memory chunk, it
retrieves its private GC function set and applies the function that is
required to walk or finalize it. The table below summarizes how some
of the memory allocations serving particular purposes are linked to a
dedicated GC function set:
Allocating garbage collected memory
The main function used to allocate garbage collected memory is
gc_malloc(). It allocates a memory block guaranteed to be big
enough to fit the requested size by calling gc_heap_malloc(),
which belongs to the low level memory allocator. It then ties to that
block a pointer to an instance of the gcFuncs data
structure. For example, to allocate 52 bytes of garbage collected
memory that will represent an object's instance which needs to be
finalized, one can write:
void *newObject = gc_malloc (52, &gcFinalizeObject);
The GC function set is also used to determine whether the allocated
memory is a GC root in which case it is linked to the
rootObjects list. For the allocation of a collectable object,
gc_malloc() also sets its color to WHITE and inserts the
whole garbage collected page it belongs to in the white list.
The function gc_malloc_fixed() is a short cut to the
invocation of gc_malloc() with the gcFixed GC
function set as a second argument. There are other functions that can
attach or reattach a random piece of memory to the garbage
collector. In that case, an indirect root is created and collected
separately.
Execution of the garbage collector and finalizer
Created during the initialization phase of Kaffe, the conservative
collector runs as a separate daemon thread set to run the
gcMan() function. At the same time, the finalizer thread is
created with a hook to the finaliserMan() function.
Low level memory allocation
The low level memory allocator has two different behaviors depending
on the size of the object is has to allocate. It will try to fit as
many small objects of the same size as possible within a piece of
memory the size of a page. This page will contain a local
gc_block header and a computed amount of blocks of a rounded
up size. This number is computed so that each block has somewhere in a
special section of the page two private entries. The first one is a
pointer to an instance of the gcFuncs data structure. The
second private entry is an eight bit unsigned integer which is used as
a state flag and also retains color information. A page of garbage
collected memory intended to store small objects is organized as:
- The size of this entire block is typically the system's page
size. There are N
gcFuncs pointers `F',
N bytes of flag information `f', and N
`Data' blocks of a given size. For
example, on a typical 32 bit system with a page size of 4KB, a
gc_block of 48 bytes and 4 bytes pointers, a garbage
collected block of memory will be able to store 32 'Data' blocks
of 120 bytes, using up to 4048 bytes. On that kind of system
object sizes range from 8 bytes to 4032 bytes. They're all
multiple of the size of a pointer.
The low level memory allocation function is gc_heap_malloc()
which, when called for the first time, initializes the heap by
invoking gc_heap_initialize(). This function sets the
sztable[] table, whose entries will be later used to round
the size of a small allocations (below MAX_SMALL_OBJECT_SIZE
bytes) and tell which free list of garbage collected pages should be
processed to get a new chunk of that size.
If the size of the block to allocate is found to be big, then
gc_large_block() is called to allocates a round number of
pages sufficient enough to contain the object and its private
collector information. This piece of memory will contain a
gc_block header. The state of the memory chunk is set to
NORMAL, and its color set to FREE.
Otherwise, in the case of a small block, sztable[] is
addressed using the size of the allocation to figure a rounded up size
and a free list index. Then, the allocation of the normalized chunk
takes a number of steps:
- gc_small_block() is called once but may fail. For
example, if gc_heap_malloc() is called for the first
time, the pointer to the primary free list
(gc_prim_freelist) is null and needs to be allocated. Or
the system might also be running out of memory. For all those
cases, execution continues with the steps 2 and
further. Otherwise, the execution goes directly to step 3.
- After having eventually performed some garbage collection (if and only
if there is a GC thread and some heap already allocated -- which is not
the case during the first invocation), gc_small_block() is
called again with the same original size. If this fails, the size
of the chunk to allocate is changed to the value of the heap
allocation size (ALLOC_HEAPSIZE, typically 1MB) and
some memory is allocated from the system, invoking
gc_system_alloc(). This will get pagealloc()
called which in its turn relies, depending on how Kaffe was
built, on sbrk(), memalign() or
malloc() to perform the real allocation. This block is
then inserted in the gc_objecthash hash table and then
made a free list available entry.
- The routine allocates an object of the rounded-up size. The free
list is processed to find a fit for a chunk of the
size of gc_pgsize which is usually set to match the size of a
system's page. This chunk of memory will be prepared to receive a
given amount of small object of the rounded up size. Within that
page, blocks of the normalized size are chained into a
free list whose head pointer is maintained in the free
field of the local gc_block header. Each of these blocks
has its state tagged NORMAL and its color set to FREE.
- The next available entry of the rounded-up size is extracted from
the allocated block, using the free list information contained in
its header. Its state is reset to normal and it content zeroed
and returned.
Things might take place slightly differently on an embedded
system. The sztable[] can computed by advance and stored in
ROM. pagealloc() might end up doing something different to
obtain a new memory block, depending on the memory management routines
supported by the embedded device.
Conservatively guessing the nature of a pointer
A conservative GC has to determine if an address is a valid garbage
collected memory location. The function gc_heap_isobject()
provides this functionality. Its implementation successively looks
for several properties featured by a memory location pointing to an
object. By doing so, it reduces the chances of a misidentification
that would lead to the unintentional retention of a garbage object and
cause a memory leak.
gc_heap_isobject() first tries to retrieve the
gc_block associated with the memory location. It then
computes the memory location's hash table index, from which it gets an
entry in the gc_objecthash array. This linked list of
allocated blocks is walked until an address equal to the
gc_block of the processed memory location is found. The
final test makes sure that the examined address fits within this
allocated piece of collectable memory and that it has a color. If one
of the test fails during the process, the pointer is not considered to
be a piece of garbage collected memory, and the function returns
false. Otherwise, true is returned.
Collection routines
There are several functions of the walk family that are used
to collect single memory locations or blocks of memory. Some of them
are used to specify the collector parts of the GC function sets, while
others are used internally.
walkMemory() first colors a memory block to black and inserts
the page it belongs to in the black list. It then applies the
collector function of this chunk of memory to its entirety.
walkBlock() processes each grey memory chunks of a garbage
collected page, setting their color back to black and individually
applying their private GC functions.
walkConservative() calls scanConservative(), which
chops a memory chunk into pointers and calls markObject() on
each one if it's a GC heap objects. This is a function that basically
traces references to other objects within an object. In this process,
it considers every word found inside an object's memory as a pointer
to an other live object. Since it may or may not be the case, it calls
gc_heap_isobject() to filter pointers to valid objects,
discarding all other address values: pointers to some other memory
locations, good-looking integers and any other random array of bits.
walkNull() doesn't perform any collection. It's the collector
attached to fixed objects, which are objects that are never collected,
like the GC thread.
walkIndirectRoot() is used to collect the indirect roots that
are created when a random piece of memory is attached to the garbage
collector. It simply calls its own collector for the amount of memory
this block represents.
The garbage collection process
There are several possible invocations of the garbage collector.
gcMan(), whose code is periodically executed by the GC
thread, is an endless loop. After having made other thread waiting
upon completion, its calls the function startGC(). This
processes all root objects, set their color to white and calls
markObject() on them. markObject() first makes sure
that the object belongs to garbage collected memory by calling
gc_heap_isobject(). The object is then turned grey and the
page it belongs to is moved to the grey list.
startGC() ends by calling walkLiveThreads(). This
function processes the list of live threads object and calls
walkMemory() on each of them. It turns out that for live
threads, the private GC routine is walkThread() which calls
markObject() on private thread info and then applies
walkConservative() on the thread's stack, looking for
object's references to be collected.
gcMan() continues by processing the list of grey allocated
pages that it eventually obtained from the startGC() pass,
and calls the function walkBlock() on them.
The white (garbage) list is processed and objects that need a
finalization are marked calling markObject(). Next,
potentially new grey objects are processed the same way grey objects
were previously processed.
We're reaching the end of the collection process. finishGC()
is called to process garbage (white) objects. Objects that need to be
finalized are moved to the finalise list and those which
don't require finalization but are white are returned to the local
free storage of the page they belong to, calling
gc_heap_free(). The finalizer thread is then woken
up. Finally, black objects are moved back to the white list and marked
accordingly.
Another way of invoking the garbage collector is by calling the
invokeGC() which basically wakes up the GC thread. This is
used to implement native functions like System.gc() or
Runtime.gc().
Finalization
The finalizer thread, periodically allowed to run by the GC thread,
walks the finalise list and processes each entries (which are
pages of garbage collected memory) as follows: the page is first moved
to the white list. Then, elements of the page marked to be finalized
are individually processed by their own finalizer function.
Incremental garbage collection with Kaffe
The collector is designed for incremental garbage collection by
implementing the three colors model; and some code is already in place
to handle it. Several problems have to be solved, including the
crucial issue of the implementation of software write barriers.
|