GDB: An Open Source Debugger for Embedded Development

Internal Structure of GDB
GDB is quite a large body of code. Not counting shared libraries such as BFD, GDB totals about 400,000 lines of C code. To be sure, much of this code is specific to particular hosts or targets, and after 13 years of development, some of the code relates only to long-vanished machines. Even so, there is a lot there.

The following diagram illustrates the major components of GDB:

Internally, GDB can be usefully thought of as having two sides.

The symbol side is primarily responsible for symbol tables, expression handling, language support, source display, and other activities that involve symbolic data.

The target side manipulates the process or target system being debugged. It mainly works with numerical data, bits and bytes. (In this context, "target side" still means host-based code.)

Most actual debugging activities involve elements of both target and symbol manipulation. For example, to display the value of var1+b[idx], GDB will use its symbolic information to look up the addresses of var1, b, and idx; then use the target side to collect the contents of memory at those addresses; then invoke the evaluator to do the final calculation. However, the separation is clean enough that the knowledgeable user can work with either side separately. So for instance it is possible to do some analysis of a program without ever connecting to a target system. It is also possible to connect to a target system, download a program, set breakpoints, and run it, all without having any symbolic information at all! While these seem like oddball cases, experienced developers know that they do occur from time to time, and it's very useful to have a debugger that can handle them.
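As a rough illustration of that division of labor, the sketch below evaluates var1+b[idx] against a made-up symbol table and memory image. The names SYMBOLS, MEMORY, lookup, and read_memory, along with every address and value, are invented for this example; GDB's real data structures are far richer.

```python
import struct

# Hypothetical symbol table: name -> (address, element size in bytes).
SYMBOLS = {"var1": (0x1000, 4), "idx": (0x1004, 4), "b": (0x2000, 4)}

# Hypothetical target memory image holding little-endian 32-bit integers.
MEMORY = bytearray(0x3000)
struct.pack_into("<i", MEMORY, 0x1000, 7)             # var1 = 7
struct.pack_into("<i", MEMORY, 0x1004, 2)             # idx = 2
struct.pack_into("<3i", MEMORY, 0x2000, 10, 20, 30)   # b = {10, 20, 30}


def lookup(name):
    """Symbol side: map a name to its address and element size."""
    return SYMBOLS[name]


def read_memory(addr, length):
    """Target side: fetch raw bytes; it knows nothing about symbols."""
    return bytes(MEMORY[addr:addr + length])


def read_int(addr):
    """Interpret four bytes of target memory as a signed integer."""
    return struct.unpack("<i", read_memory(addr, 4))[0]


def evaluate_var1_plus_b_idx():
    """Evaluator: combine both sides to compute var1 + b[idx]."""
    var1_addr, _ = lookup("var1")
    idx_addr, _ = lookup("idx")
    b_addr, elem_size = lookup("b")
    idx = read_int(idx_addr)
    return read_int(var1_addr) + read_int(b_addr + idx * elem_size)
```

Note that lookup never touches MEMORY and read_memory never touches SYMBOLS, which is what allows either side to be used on its own.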

The Symbol Side

BFD
On the symbol side, GDB uses the Binary File Descriptor (BFD) library to read executables and symbol files. BFD was one of Cygnus' first major development projects, and was a key step in making GNU tools useful for cross development. BFD is a portable universal object file library, capable of reading a.out, COFF, ELF, and other formats. Most importantly, the library is structured as a collection of format descriptor objects (the "BFD"s), which all coexist and can be selected at runtime. Not only is this useful for activities like format conversion, but it means that BFD can identify the type and architecture of an object file automatically. In turn, this allows GDB to choose the right symbol reader, so you don't have to think about it.

Since it is a general library that is also used by the assembler and linker, BFD handles only the basic object file structure: sections, global linker symbols, relocations, and so forth. GDB itself handles the reading and analysis of debug information.
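The idea of coexisting format descriptors selected at runtime can be sketched in a few lines. This is only an illustration of the technique, not BFD's API: the table layout and function names are assumptions, the "toyfmt" format and its magic number are invented, and only the ELF magic number is real.

```python
# Table of format descriptors that all coexist. Each entry pairs a format
# name with a predicate over the first bytes of the file.
# The ELF magic number is real; "toyfmt" and its magic are invented.
FORMAT_DESCRIPTORS = [
    ("elf", lambda head: head[:4] == b"\x7fELF"),
    ("toyfmt", lambda head: head[:4] == b"TOY1"),
]

# Hypothetical mapping from a detected format to a symbol reader name.
SYMBOL_READERS = {"elf": "elf reader", "toyfmt": "toy reader"}


def identify_format(head):
    """Try every descriptor in turn and return the first format that matches."""
    for name, matches in FORMAT_DESCRIPTORS:
        if matches(head):
            return name
    return None


def choose_symbol_reader(head):
    """Pick a symbol reader from the detected format, or None if unrecognized."""
    return SYMBOL_READERS.get(identify_format(head))
```

Because the caller passes only the leading bytes of the file, it never has to name the format itself, which is the convenience the text describes.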

Debug Symbol Reading
GDB has two layers of symbol reader. The lower layer is usually simple; it calls BFD functions to get the linker symbols and makes them into GDB symbols. These symbols will prove useful if debug info is missing, since they can be used in backtraces at least.

The lower-level symbol reader also detects the presence of specific kinds of debug symbol info, and invokes the appropriate upper-level reader. Although the levels are somewhat uncoupled, a full MxN matrix is not really possible. For instance, there is an embedding of "stabs" (dbx) debug info in COFF files, and so the COFF reader needs to look for a .stabs section and invoke the stabs reader if found. However, there is no defined way to embed DWARF 2 debug info in a.out, and therefore the a.out reader has no mechanism to call the DWARF 2 reader.

The actual symbol tables are not too unusual in structure. Basically, they're a collection of interconnected nodes. Symbols consist of types, functions, and variables. The set of attributes is common to all supported languages, although some attributes will only appear with, for example, Java.

GDB handles line numbers with a separate line table that matches line numbers with addresses. It builds an inverse lookup table to speed up the mapping of arbitrary addresses to source file and line number.
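One plausible shape for such an inverse table is a list sorted by address and searched by bisection. In the sketch below, only the pairing of line 10 with address 0x4004058 comes from the transcript later in this paper; the other entries and all of the names are invented.

```python
import bisect

# Hypothetical line table as a compiler might emit it: (line, start address).
LINE_TABLE = [(8, 0x4004040), (9, 0x400404c), (10, 0x4004058), (11, 0x4004060)]

# Inverse table sorted by address, built once so lookups can use bisection.
BY_ADDRESS = sorted((addr, line) for line, addr in LINE_TABLE)
ADDRESSES = [addr for addr, _ in BY_ADDRESS]


def line_for_address(pc):
    """Return the source line whose code contains pc, or None if pc is too low."""
    i = bisect.bisect_right(ADDRESSES, pc) - 1
    if i < 0:
        return None
    return BY_ADDRESS[i][1]
```

An arbitrary address in the middle of a line's code, such as 0x400405a, resolves to the same line as the address where that line starts.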

Language Support
Language support is straightforward in concept, though complicated to do completely. The centerpiece is an expression parser for the language. In most cases, this is handled with a yacc grammar, although the CHILL language has some complexities that make it easier to parse correctly with straight C code. The result of parsing is a generic expression object, although it may include elements used only by a single language. Language support also requires language-specific display routines for both types and values resulting from expression evaluation.

All of this is connected to the rest of GDB through the creation of an object that records attributes of the language, plus inclusion in various case statements.

The Target Side
The target side of GDB manipulates actual hardware or a simulation of it. As a special case, it also manipulates corefiles or dumpfiles, which are a program's state recorded into a file. (While corefiles are more common for native than embedded environments, a number of the more sophisticated embedded developers, such as Cisco and Network Appliance, have defined dumpfiles that GDB can be made to read.)

Target Vector
Most of GDB's target manipulation passes through an abstraction known as the target vector. A target vector is similar in concept to a C++ class, with about 30-40 methods. The set of methods ranges from the obvious, like target_fetch_registers, which uploads values of the machine's registers into GDB, to the obscure, like target_terminal_ours_for_output, which relates to terminal control for native debugging. Each target vector implements a specific kind of debugging capability, so for instance the simulator target vector methods simply make subroutine calls to the built-in simulator library, while the standard protocol methods send packets like $g#67 out through a serial or TCP port, and wait for a response.
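The class analogy can be made concrete with a small sketch. It is not GDB's code: the class and method names are simplified stand-ins, only two of the many methods are shown, the transport is reduced to a caller-supplied send function, and registers are assumed to be 32 bits wide and big-endian.

```python
class TargetVector:
    """Method table that each kind of target fills in (illustrative names)."""
    name = "none"

    def fetch_registers(self):
        raise NotImplementedError

    def read_memory(self, addr, length):
        raise NotImplementedError


class SimTarget(TargetVector):
    """Simulator-style target: methods are plain calls into local state."""
    name = "sim"

    def __init__(self):
        self.regs = [0] * 16
        self.mem = bytearray(256)

    def fetch_registers(self):
        return list(self.regs)

    def read_memory(self, addr, length):
        return bytes(self.mem[addr:addr + length])


class RemoteTarget(TargetVector):
    """Protocol-style target: methods turn into packets on some connection."""
    name = "remote"

    def __init__(self, send):
        self.send = send  # callable: packet text in, reply payload out

    def fetch_registers(self):
        reply = self.send("$g#67")
        # Assumes 32-bit big-endian registers, eight hex digits each.
        return [int(reply[i:i + 8], 16) for i in range(0, len(reply), 8)]

    def read_memory(self, addr, length):
        data = "m%x,%x" % (addr, length)
        checksum = sum(data.encode("ascii")) % 256
        return bytes.fromhex(self.send("$%s#%02x" % (data, checksum)))


# Registry keyed by the name a user would give to the target command.
TARGETS = {cls.name: cls for cls in (SimTarget, RemoteTarget)}
```

Code that calls fetch_registers or read_memory does not need to know which of the two implementations it is talking to.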

At present, GDB includes about 70 different target vectors, of which perhaps 10 are for native debugging, leaving some 60 embedded protocols. The embedded protocols include:

Builtin simulator (sim)
Standard GDB protocol (remote)
MIPS PMON protocol (mips, pmon, ddb, lsi)
ARM RDI/ADP protocol (rdi)
SDS protocol for PowerPC (sds)
A29K UDI protocol (udi)
ROM monitors; PPCBug, ROM68K, etc (ppcbug, rom68k, dink32, etc)
VxWorks protocol (vx)
Macraigor wiggler for PowerPC (ocd wiggler)
Hitachi E7000 emulator (e7000)
EST emulator (est)
NetROM emulator (nrom)

To use one of these, just say target name port, where name is one of the names listed in parentheses above, and port is a serial port like com1 or a TCP port like cygnus.com:1234. Insight also provides a target connection dialog that allows you to set this up interactively.

There are additional protocols included in versions of GDB not distributed by Cygnus or the FSF. For instance, Intel has an HDI backend that is part of GDB960.

In general, each protocol implementation is straightforward, and only becomes complex if the target system is complex in some way. This is a common way for people to extend GDB; just clone an existing target vector, modify as needed, register it in the list of target vectors, and it's added.

Architecture Description
Another element of the target side is the target architecture definition. The definition is a set of macros that are defined differently for each architecture. They range from definitions like TARGET_BYTE_ORDER, which is either a constant like BIG_ENDIAN or a variable for bi-endian chips, to FRAME_CHAIN_VALID, which detects the outermost (bottom) frame in the stack.

Architecture descriptions are generally tricky to write, mostly because the stack analysis and unwinding code is intimately connected to the calling convention defined by the compiler. This has become even more of an issue recently, since compilers for embedded targets are experimenting with improving runtime performance by changing the calling conventions. Also, architecture variants such as Thumb and MIPS16 require different calling conventions. Thus, the debugger's analysis code becomes much more complicated, and requires closer collaboration with compiler developers.

Fortunately for GDB users, Cygnus works with nearly all semiconductor vendors to create GDB ports while the chips are still in development. By the time a chip is announced, the architecture description is done, along with a simulator, allowing people to try out the architecture before they get an actual device!

Debugger Algorithms
The target side includes a set of algorithms for standard types of operations. For instance there is a generic_load function that downloads a program by doing a sequence of memory writes. While many target vectors use this function for their load methods, others have a special load method, perhaps involving UDP over Ethernet, or something else unique to that target vector.
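A download expressed as a sequence of memory writes might look roughly like the following. The function name is borrowed from the text, but the signature, the section representation, and the default chunk size are assumptions made for this sketch (0x67 happens to be the length of the first write in the transcript later in this paper).

```python
def generic_load(sections, write_memory, chunk_size=0x67):
    """Download sections through repeated memory writes.

    sections is an iterable of (load address, bytes) pairs, and
    write_memory(addr, data) is whatever write method the target provides.
    Returns the total number of bytes written.
    """
    total = 0
    for load_addr, data in sections:
        for offset in range(0, len(data), chunk_size):
            chunk = data[offset:offset + chunk_size]
            write_memory(load_addr + offset, chunk)
            total += len(chunk)
    return total
```

Since the only thing this needs from the target is a memory-write method, any target vector that can write memory gets a working download for free.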

Another example of a generic algorithm is the memory breakpoint code, which works by writing traps or illegal instructions on top of the program, then restoring the real instruction before restarting the program.
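The save, overwrite, and restore cycle can be sketched as follows. The class names are invented, and memory is modeled as a small byte array; the address 0x4004058, the original instruction 61f3, and the trap c320 are taken from the SH-2 transcript later in this paper.

```python
class ToyMemory:
    """Invented stand-in for target memory starting at a base address."""

    def __init__(self, base, contents):
        self.base = base
        self.data = bytearray(contents)

    def read(self, addr, length):
        offset = addr - self.base
        return bytes(self.data[offset:offset + length])

    def write(self, addr, payload):
        offset = addr - self.base
        self.data[offset:offset + len(payload)] = payload


class BreakpointTable:
    """Memory breakpoints: remember the real instruction, plant a trap."""

    def __init__(self, read_memory, write_memory, trap):
        self.read_memory = read_memory
        self.write_memory = write_memory
        self.trap = trap
        self.saved = {}  # address -> original instruction bytes

    def insert(self, addr):
        self.saved[addr] = self.read_memory(addr, len(self.trap))
        self.write_memory(addr, self.trap)

    def remove(self, addr):
        self.write_memory(addr, self.saved.pop(addr))
```

The table talks to memory only through the two callables it is given, so the same logic would work over any target that can read and write memory.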

These algorithms generally only require attention if a new architecture requires a new capability. For instance, the Mitsubishi D10V is a Harvard architecture chip, and its split instruction and data spaces required changes to GDB's basic assumption of a flat uniform address space.

Execution Control
The most important algorithm in GDB's target side is the one that does execution control. The key function is proceed(), which is used for both single-stepping and continuing execution. proceed then calls two functions: resume(), which actually sends the packet to the target telling it to execute, and wait_for_inferior(), which waits for the target to come back and tell it what happened. Here is a bit of pseudo-code illustrating how wait_for_inferior works:

wait_for_inferior()
{
  while (1)
    wait for target program to send signal
    if signal was trap && breakpoint was set at PC
      restore original instruction
      return
    if was single-stepping && PC is at next source line
      return
    else
      step next instruction
}

In actual practice, wait_for_inferior has many tricky cases to deal with, such as thread-specific breakpoints, hardware watchpoints, architectures that have to use breakpoints to implement stepping, and so forth. As of March 1999, the while(1) loop is 1800 lines long and replete with gotos that were added in the past to solve logic problems.
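For readers who want something executable, the simple case in the pseudo-code can be modeled against a toy target. Everything in this sketch is invented for illustration (the ToyInferior class, its fixed 2-byte instructions, the address-to-line dictionary), and it handles none of the tricky cases just mentioned.

```python
SIGTRAP = 5  # the remote protocol section of this paper gives traps as signal 5


class ToyInferior:
    """Invented stand-in for a target: every instruction is 2 bytes long."""

    def __init__(self, pc, line_table):
        self.pc = pc
        self.line_table = line_table  # address -> source line number
        self.breakpoints = set()

    def line_at(self, addr):
        return self.line_table.get(addr)

    def step_instruction(self):
        """Resume for one instruction and report the stop signal."""
        self.pc += 2
        return SIGTRAP


def step_source_line(target):
    """Step instructions until a breakpoint or a new source line is reached."""
    start_line = target.line_at(target.pc)
    while True:
        signal = target.step_instruction()
        if signal == SIGTRAP and target.pc in target.breakpoints:
            return "breakpoint"
        if target.line_at(target.pc) != start_line:
            return "new line"
        # Still inside the same source line: step the next instruction.
```

A source-level step is thus a loop of instruction-level steps, which is why the real function has to re-examine the target's state after every stop.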

Remote Debugging Protocol
GDB's standard remote debugging protocol is widely used. Since there aren't any official statistics, it's hard to know how common it is, but we hear about many creative uses, and perhaps as many as 50% of the embedded systems running the Internet (routers for instance) support GDB, at least in the prototype stage. By default, the protocol is ASCII, although Cygnus recently added a binary download option. While ASCII may seem primitive, it is very reliable across a broad range of connections; in using the new binary option, we have discovered that many communication paths are still not 8-bit clean!

The target-side code is usually known as the GDB debugging stub, or just stub for short.

The basic format of a packet is $ data # checksum, where data is an ASCII string and checksum is the sum of the characters in data, modulo 256, written as two hexadecimal digits. Numbers are always in hexadecimal. Upon successful receipt of a packet, the receiver must return a + (plus); otherwise it returns a - (minus).
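The framing is simple enough to sketch directly. The helper names below are invented; the checksum is computed as the sum of the data characters modulo 256, which reproduces the checksums of the packets shown in the transcript later in this paper (for example $g#67 and $Hc-1#09).

```python
def make_packet(data):
    """Wrap data as $data#checksum with a two-digit hexadecimal checksum."""
    checksum = sum(map(ord, data)) % 256
    return "$%s#%02x" % (data, checksum)


def parse_packet(packet):
    """Check framing and checksum of a received packet.

    Returns (data, "+") when the packet is good, or (None, "-") when the
    receiver should ask for retransmission.
    """
    if len(packet) < 4 or not packet.startswith("$") or packet[-3] != "#":
        return None, "-"
    data, received = packet[1:-3], packet[-2:]
    try:
        good = int(received, 16) == sum(map(ord, data)) % 256
    except ValueError:
        good = False  # checksum field was not hexadecimal
    return (data, "+") if good else (None, "-")
```

A packet with an empty data field, $#00, passes these checks and yields an empty string.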

A stub must understand these nine types of packets:

g                     read all registers
G                     write all registers
maddr,length          read length bytes of memory at address addr
Maddr,length:data     write length bytes data to memory at address addr
caddr                 continue at address addr
saddr                 step one instruction at address addr
Csig,addr             continue with signal sig at address addr
Ssig,addr             step one instruction with signal sig at address addr
?                     get reason for stopping

There are about another 10 types of optional packets; these are for thread support, detaching, querying the target, resetting the target, and so forth.

The stub's response depends on the packet type. In many cases, the stub need only respond with OK, while in the case of register and memory reads, it should return a string of hex digits with all the data run together. In the case of stepping and continuing, the stub should come back with the signal that caused the program to stop. The signals mimic Unix signals, although for embedded targets they are just agreed-upon numbers. For instance, GDB declares traps to be signal 5, so if the target program hits a breakpoint trap, the stub will come back with S05. Other possible return packet types include Odata, for output data from the program, and Xsig, to indicate that the program exited.
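To make the request and response pairing concrete, here is a toy dispatcher for a few of the packet types. It is a sketch under loud assumptions, not the GDB stub: the ToyStub class is invented, registers are assumed to be 32 bits wide, memory is a byte array addressed from zero, c and s do not run anything and simply report a trap, and unsupported or malformed packets get an empty response.

```python
class ToyStub:
    """Invented stub over an in-memory machine; handles g, m, M, ?, c, s."""

    def __init__(self, registers, memory):
        self.registers = registers  # list of 32-bit register values
        self.memory = memory        # bytearray, addressed from 0
        self.last_signal = 5        # pretend the program stopped on a trap

    def handle(self, data):
        """Map the data field of one request packet to a reply data field."""
        kind, args = data[:1], data[1:]
        if kind == "g":
            # All registers as hex digits run together.
            return "".join("%08x" % value for value in self.registers)
        if kind == "m":
            addr, length = (int(field, 16) for field in args.split(","))
            return self.memory[addr:addr + length].hex()
        if kind == "M":
            header, payload = args.split(":")
            addr, length = (int(field, 16) for field in header.split(","))
            new_bytes = bytes.fromhex(payload)
            if len(new_bytes) != length:
                return ""  # this sketch treats a bad length as unsupported
            self.memory[addr:addr + length] = new_bytes
            return "OK"
        if kind in ("?", "c", "s"):
            # A real stub would resume the program for c and s.
            return "S%02x" % self.last_signal
        return ""  # empty response for packet types this stub lacks
```

The empty reply for unknown packets matches what the board in the transcript below does with the thread-related packets.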

The following transcript uses the remotedebug flag and command-line GDB to illustrate the packet traffic associated with a simple debugging session on Hitachi's eval board for the SH-2. This board's CMON monitor has a GDB stub built into it.

% sh-hms-gdb -nw a.out
GNU gdb 4.17-gnupro-98r2
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
This version of GDB is supported for customers of Cygnus Solutions.
Type "show warranty" for details.
This GDB was configured as "--host=sparc-sun-sunos4.1 --target=sh-hms"...
(gdb) set remotedebug 1                     Connect to board's serial port via
(gdb) target remote hellcab:1010            TCP port 1010 of portmaster "hellcab"
Remote debugging using hellcab:1010
Sending packet: $Hc-1#09...Ack              GDB attempts to set current thread
Packet received:                            empty response, no threads
Sending packet: $qOffsets#4b...Ack          ditto for executable offsets
Packet received:
Sending packet: $?#3f...Ack                 find out the target's current state
Packet received: S05
Sending packet: $Hg0#df...Ack               try to fiddle with threads more
Packet received:                            continue to ignore thread fiddling
Sending packet: $g#67...Ack                 get all the registers
Packet received: 000000030000789404000940000000000000000100000000e3fefcd60c05c9e3040005180000 12c004000000000011e00000006400000000207f0d9e04000ad004000156000079b8e46838d700000000808a 159100000000000003f10000000000000000000000000000000000000000
Sending packet: $m0,2#fb...Ack              bogus stack probably, but try
Packet received: 0000                       to decipher anyway
Sending packet: $m0,2#fb...Ack
Packet received: 0000
0x4000156 in ?? ()
(gdb) load
Loading section .text, size 0x12f0 lma 0x4004000
Sending packet: $M4004000,67:df07d008d108e20020227004310389fbd006400b200b6403d005400b200b 00090403ff0004005a5004005a8804004040040041a00009000900090009000900092f862f962fe64f227ffc6 ef3d80f480b0009d10fe204212261f3d80e480b00092e02d80b61f3d9#79...Ack
Packet received: OK
[... many more M packets ...]
(gdb) break 10                              set a breakpoint
Breakpoint 1 at 0x4004058: file hello.c, line 10.
(gdb) continue                              and run to it
Continuing.
Sending packet: $m4004058,2#30...Ack
Packet received: 61f3
Sending packet: $M4004058,2:c320#42...Ack   install the breakpoint
Packet received: OK
Sending packet: $Hc0#db...Ack
Packet received:
Sending packet: $c#63...Ack                 let the program loose!
Packet received: S05                        it got a trap
Sending packet: $g#67...Ack                 collect all the registers again
Packet received:
0000000004005a700000000400000000040040c000000000e3fefcd60c05c9e304004140000012c00400000 0000011e000000064000000000403feec0403feec0400405804004052e46838d700000000808a159100000000000003f0000 0000000000000000000000000000000000000
Sending packet: $m4004040,2#27...Ack        collect lots of the stack
Packet received: 2f86
Sending packet: $m4004042,2#29...Ack
Packet received: 2f96
Sending packet: $m4004044,2#2b...Ack
Packet received: 2fe6
[... more reading of stack memory ...]
Sending packet: $m403fef0,4#c5...Ack
Packet received: 04004016
Sending packet: $M4004058,2:61f3#4a...Ack   restore the instruction
Packet received: OK                         at the breakpoint
Breakpoint 1, main () at hello.c:10
10 b = foo();
(gdb)
