Internal Structure of GDB
GDB is quite a large body of code. Not counting shared libraries such as BFD, GDB totals about 400,000 lines of C code. To
be sure, much of this code is specific to particular hosts or targets, and after 13 years of development, some of the code
relates only to long-vanished machines. Even so, there is a lot there.
The following diagram illustrates the major components of GDB:
Internally, GDB can be usefully thought of as having two sides.
The symbol side is primarily responsible for symbol tables, expression handling, language support, source display, and other activities that involve symbolic data.
The target side manipulates the process or target system being debugged. It mainly works with numerical data, bits and bytes. (In this context, "target side" still means host-based code.)
Most actual debugging activities involve elements of both target and symbol manipulation. For example, to display the value of
var1+b[idx], GDB will use its symbolic information to look up the addresses of var1, b, and idx; then use the target side to
collect the contents of memory at those addresses; then invoke the evaluator to do the final calculation. However, the
separation is clean enough that the knowledgeable user can work with either side separately. So for instance it is possible to
do some analysis of a program without ever connecting to a target system. It is also possible to connect to a target system,
download a program, set breakpoints, and run it, all without having any symbolic information at all! While these seem like
oddball cases, experienced developers know that they do occur from time to time, and it's very useful to have a debugger that
can handle them.
The Symbol Side
BFD
On the symbol side, GDB uses the Binary File Descriptor (BFD) library to read executables and symbol files. BFD was one
of Cygnus' first major development projects, and was a key step in making GNU tools useful for cross development. BFD is a
portable universal object file library, capable of reading a.out, COFF, ELF, and other formats. Most importantly, the library is
structured as a collection of format descriptor objects (the "BFD"s), which all coexist and can be selected at runtime. Not only
is this useful for activities like format conversion, but it means that BFD can identify the type and architecture of an object file
automatically. In turn, this allows GDB to choose the right symbol reader, so you don't have to think about it.
Since it is a general library that is also used by the assembler and linker, BFD handles only the basic object file structure:
sections, global linker symbols, relocations, and so forth. GDB itself handles the reading and analysis of debug information.
Debug Symbol Reading
GDB has two layers of symbol reader. The lower layer is usually simple; it calls BFD functions to get the linker symbols and
makes them into GDB symbols. These symbols will prove useful if debug info is missing, since they can be used in
backtraces at least.
The lower-level symbol reader also detects the presence of specific kinds of debug symbol info, and invokes the appropriate
upper-level reader. Although the levels are somewhat uncoupled, a full MxN matrix is not really possible. For instance, there is
an embedding of "stabs" (dbx) debug info in COFF files, and so the COFF reader needs to look for a .stabs section and
invoke the stabs reader if found. However, there is no defined way to embed DWARF 2 debug info in a.out, and therefore the
a.out reader has no mechanism to call the DWARF 2 reader.
The actual symbol tables are not too unusual in structure. Basically, they're a collection of interconnected nodes. Symbols
consist of types, functions, and variables. The set of attributes is common to all supported languages, although some
attributes will only appear with, for example, Java.
GDB handles line numbers with a separate line table that matches line numbers with addresses. It builds an inverse lookup
table to speed up the mapping of arbitrary addresses to source file and line number.
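The address-to-line direction can be sketched as a binary search over a table sorted by address. This is only an illustration of the idea, not GDB's actual data structure; the names line_entry and find_line_for_pc are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical line table entry: the first address generated for a line. */
struct line_entry {
    unsigned long addr;  /* start address of the code for this line */
    int line;            /* source line number */
};

/* Return the line whose code contains pc, or 0 if pc precedes the table.
   The table must be sorted by ascending addr. */
int find_line_for_pc(const struct line_entry *table, size_t n, unsigned long pc)
{
    size_t lo = 0, hi = n;   /* search window is [lo, hi) */

    assert(table != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (table[mid].addr <= pc)
            lo = mid + 1;    /* entry starts at or before pc: look right */
        else
            hi = mid;        /* entry starts after pc: look left */
    }
    /* lo is now the number of entries with addr <= pc. */
    return lo == 0 ? 0 : table[lo - 1].line;
}
```

With entries at 0x1000 (line 10) and 0x1008 (line 11), any pc from 0x1008 up to the next entry maps to line 11.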
Language Support
Language support is straightforward in concept, though complicated to do completely. The centerpiece is an expression
parser for the language. In most cases, this is handled with a yacc grammar, although the CHILL language has some
complexities that make it easier to parse correctly with straight C code. The result of parsing is a generic expression object,
although it may include elements used only by a single language. Language support also requires language-specific display
routines for both types and values resulting from expression evaluation.
All of this is connected to the rest of GDB through the creation of an object that records attributes of the languages, plus
inclusion in various case statements.
Target Side
The target side of GDB manipulates actual hardware or a simulation of it. As a special case, it also manipulates corefiles or
dumpfiles, which are a program's state recorded into a file. (While corefiles are more common for native than embedded
environments, a number of the more sophisticated embedded developers, such as Cisco and Network Appliance, have
defined dumpfiles that GDB can be made to read.)
Target Vector
Most of GDB's target manipulation passes through an abstraction known as the target vector. A target vector is similar in
concept to a C++ class, with about 30-40 methods. The set of methods ranges from the obvious, like target_fetch_registers,
which uploads values of the machine's registers into GDB, to the obscure, like target_terminal_ours_for_output, which relates
to terminal control for native debugging. Each target vector implements a specific kind of debugging capability, so for instance
the simulator target vector methods simply make subroutine calls to the built-in simulator library, while the standard protocol
methods send packets like $g#67 out through a serial or TCP port, and wait for a response.
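In C, the "class with methods" idea is commonly expressed as a structure of function pointers. The following is a much-reduced sketch under that assumption; the structure layout, the toy simulator backend, and every name in it are hypothetical and are not GDB's real definitions.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_REGS 4   /* illustrative register count */

/* Hypothetical, much-reduced target vector: a table of function pointers. */
struct target_vector {
    const char *shortname;                         /* name given to the "target" command */
    void (*fetch_registers)(unsigned long *regs);  /* upload registers from the target */
    void (*resume)(int step);                      /* start the target running */
};

/* A toy "simulator" backend that serves canned register values. */
static unsigned long sim_regs[NUM_REGS] = { 1, 2, 3, 4 };
static int sim_last_step = -1;

static void sim_fetch_registers(unsigned long *regs)
{
    for (size_t i = 0; i < NUM_REGS; i++)
        regs[i] = sim_regs[i];
}

static void sim_resume(int step)
{
    sim_last_step = step;   /* a real backend would run the simulator here */
}

static const struct target_vector sim_target = {
    "sim", sim_fetch_registers, sim_resume
};

/* Generic code calls through whichever vector is currently selected. */
static const struct target_vector *current_target = &sim_target;

void fetch_all_registers(unsigned long *regs)
{
    assert(current_target != NULL);
    current_target->fetch_registers(regs);
}
```

A remote-protocol backend would fill in the same slots with functions that send packets instead of calling a library, and the generic code would not need to change.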
At present, GDB includes about 70 different target vectors, of which perhaps 10 are for native debugging, leaving some 60
embedded protocols. The embedded protocols include:
Builtin simulator (sim)
Standard GDB protocol (remote)
MIPS PMON protocol (mips, pmon, ddb, lsi)
ARM RDI/ADP protocol (rdi)
SDS protocol for PowerPC (sds)
A29K UDI protocol (udi)
ROM monitors; PPCBug, ROM68K, etc (ppcbug, rom68k, dink32, etc)
VxWorks protocol (vx)
Macraigor wiggler for PowerPC (ocd wiggler)
Hitachi E7000 emulator (e7000)
EST emulator (est)
NetROM emulator (nrom)
To use one of these, just say target name port, where name is one of the names listed in parentheses above, and port is
a serial port like com1 or a TCP port like cygnus.com:1234. Insight also provides a target connection dialog that allows you to
set this up interactively.
There are additional protocols included in versions of GDB not distributed by Cygnus or the FSF. For instance, Intel has an
HDI backend that is part of GDB960.
In general, each protocol implementation is straightforward, and only becomes complex if the target system is complex in
some way. This is a common way for people to extend GDB; just clone an existing target vector, modify as needed, register it
in the list of target vectors, and it's added.
Architecture Description
Another element of the target side is the target architecture definition. The definition is a set of macros that are defined
differently for each architecture. They range from definitions like TARGET_BYTE_ORDER, which is either a constant like
BIG_ENDIAN or a variable for bi-endian chips, to FRAME_CHAIN_VALID, which detects the outermost (bottom) frame in the
stack.
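To illustrate why byte order has to be part of the architecture description, here is a small sketch of turning raw target bytes into a host integer. It takes the byte order as a runtime parameter, in the spirit of the bi-endian case; the enum and the function unpack_unsigned are hypothetical names, not GDB's.

```c
#include <assert.h>
#include <stddef.h>

enum byte_order { ORDER_BIG, ORDER_LITTLE };   /* hypothetical stand-in */

/* Assemble len target bytes into a host integer, honoring the target's byte order. */
unsigned long unpack_unsigned(const unsigned char *buf, size_t len,
                              enum byte_order order)
{
    unsigned long value = 0;
    size_t i;

    assert(len <= sizeof value);
    for (i = 0; i < len; i++) {
        /* Always consume the most significant byte first:
           that is buf[0] for big-endian, buf[len-1] for little-endian. */
        size_t idx = (order == ORDER_BIG) ? i : len - 1 - i;
        value = (value << 8) | buf[idx];
    }
    return value;
}
```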
Architecture descriptions are generally tricky to write, mostly because the stack analysis and unwinding code is intimately
connected to the calling convention defined by the compiler. This has become even more of an issue recently, since
compilers for embedded targets are experimenting with improving runtime performance by changing the calling conventions.
Also, architecture variants such as Thumb and MIPS16 require different calling conventions. Thus, the debugger's analysis
code becomes much more complicated, and requires closer collaboration with compiler developers.
Fortunately for GDB users, Cygnus works with nearly all semiconductor vendors to create GDB ports while the chips are still in
development. By the time a chip is announced, the architecture description is done, along with a simulator, allowing people to
try out the architecture before they get an actual device!
Debugger Algorithms
The target side includes a set of algorithms for standard types of operations. For instance there is a generic_load function that
downloads a program by doing a sequence of memory writes. While many target vectors use this function for their load
methods, others have a special load method, perhaps involving UDP over Ethernet, or something else unique to that target
vector.
Another example of a generic algorithm is the memory breakpoint code, which works by writing traps or illegal instructions on
top of the program, then restoring the real instruction before restarting the program.
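The save, overwrite, and restore cycle can be sketched as follows. Target memory is modeled here as a plain byte array, whereas a real debugger would go through the target vector's memory read and write methods. The one-byte trap opcode 0xCC is an assumption borrowed from x86, and the structure and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TRAP_OPCODE 0xCC   /* assumption: a one-byte trap, as on x86 (int3) */

/* Hypothetical breakpoint record: where it is and what it overwrote. */
struct breakpoint {
    size_t addr;            /* offset into the modeled target memory */
    unsigned char saved;    /* original instruction byte */
    int inserted;           /* nonzero while the trap is in place */
};

/* Write the trap over the program, remembering the original byte. */
void insert_breakpoint(unsigned char *mem, struct breakpoint *bp)
{
    assert(!bp->inserted);
    bp->saved = mem[bp->addr];
    mem[bp->addr] = TRAP_OPCODE;
    bp->inserted = 1;
}

/* Put the real instruction back before the program is restarted. */
void remove_breakpoint(unsigned char *mem, struct breakpoint *bp)
{
    assert(bp->inserted);
    mem[bp->addr] = bp->saved;
    bp->inserted = 0;
}
```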
These algorithms generally only require attention if a new architecture requires a new capability. For instance, the Mitsubishi
D10V is a Harvard architecture chip, and its split instruction and data spaces required changes to GDB's basic assumption of
a flat uniform address space.
Execution Control
The most important algorithm in GDB's target side is the one that does execution control. The key function is proceed(), which
is used for both single-stepping and continuing execution. proceed() then calls two functions: resume(), which actually sends
the packet to the target telling it to execute, and wait_for_inferior(), which waits for the target to come back and tell it what
happened. Here is a bit of pseudo-code illustrating how wait_for_inferior works:
wait_for_inferior()
{
  while (1)
    wait for target program to send signal
    if signal was trap && breakpoint was set at PC
      restore original instruction
      return
    if was single-stepping && PC is at next source line
      return
    else
      step next instruction
}
In actual practice, wait_for_inferior has many tricky cases to deal with, such as thread-specific breakpoints, hardware
watchpoints, architectures that have to use breakpoints to implement stepping, and so forth. As of March 1999, the while(1)
loop is 1800 lines long and replete with gotos that were added in the past to solve logic problems.
Remote Debugging Protocol
GDB's standard remote debugging protocol is widely used. Since there aren't any official statistics, it's hard to know how
common it is, but we hear about many creative uses, and perhaps as many as 50% of the embedded systems running the
Internet (routers for instance) support GDB, at least in the prototype stage. By default, the protocol is ASCII, although Cygnus
recently added a binary download option. While ASCII may seem primitive, it is very reliable across a broad range of
connections; in using the new binary option, we have discovered that many communication paths are still not 8-bit clean!
The target-side code is usually known as the GDB debugging stub, or just stub for short.
The basic format of a packet is $ data # checksum, where data is an ASCII string and checksum is the sum of the bytes of
data modulo 256, written as two hexadecimal digits. Numbers are always in hexadecimal. Upon successful receipt of a packet,
the receiver must return a + (plus), otherwise a - (minus).
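As a sketch of the framing, the two-digit checksum is the sum of the data bytes modulo 256, so the one-character packet g (ASCII 0x67) becomes $g#67. The helper names packet_checksum and frame_packet below are hypothetical; a real stub would also need to handle the +/- acknowledgment and retransmission.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Sum of the payload bytes, modulo 256. */
unsigned char packet_checksum(const char *data)
{
    unsigned char sum = 0;

    while (*data)
        sum += (unsigned char)*data++;   /* unsigned char arithmetic wraps at 256 */
    return sum;
}

/* Frame data as "$data#xx" into out.
   Returns 0 on success, -1 if out is too small to hold the packet. */
int frame_packet(const char *data, char *out, size_t outlen)
{
    int n;

    assert(data != NULL && out != NULL);
    n = snprintf(out, outlen, "$%s#%02x", data, (unsigned)packet_checksum(data));
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

Calling frame_packet("g", buf, sizeof buf) leaves "$g#67" in buf, matching the packet mentioned earlier.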
A stub must understand these nine types of packets:
g                    read all registers
G                    write all registers
maddr,length         read length bytes of memory at address addr
Maddr,length:data    write length bytes data to memory at address addr
caddr                continue at address addr
saddr                step one instruction at address addr
Csig,addr            continue with signal sig at address addr
Ssig,addr            step one instruction with signal sig at address addr
?                    get reason for stopping
There are about another 10 types of optional packets; these are for thread support, detaching, querying the target, resetting the
target, and so forth.
The stub's response depends on the packet type. In many cases, the stub need only respond with OK, while in the case of
register and memory reads, it should return a string of hex digits with all the data run together. In the case of stepping and
continuing, the stub should come back with the signal that caused the program to stop. The signals mimic Unix signals,
although for embedded targets they are just agreed-upon numbers. For instance, GDB declares traps to be signal 5, so if the
target program hits a breakpoint trap, the stub will come back with S05. Other possible return packet types include Odata, for
output data from the program, and Xsig, to indicate that the program was terminated by signal sig.
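On the GDB side, a stop reply such as S05 has to be decoded back into a signal number. A minimal sketch of that decoding, assuming the reply has already been stripped of its $ and #checksum framing, might look like this; hex_value and parse_stop_reply are hypothetical names.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    c = tolower(c);
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;
}

/* Parse a stop reply of the exact form "Sxx" (two hex digits).
   Returns the signal number, or -1 if the reply is malformed. */
int parse_stop_reply(const char *reply)
{
    int hi, lo;

    assert(reply != NULL);
    /* Short-circuit evaluation keeps us from reading past the terminator. */
    if (reply[0] != 'S' || reply[1] == '\0' || reply[2] == '\0' || reply[3] != '\0')
        return -1;
    hi = hex_value((unsigned char)reply[1]);
    lo = hex_value((unsigned char)reply[2]);
    if (hi < 0 || lo < 0)
        return -1;
    return hi * 16 + lo;
}
```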