GDB: An Open Source Debugger for Embedded Development
Internal Structure of GDB
The following diagram illustrates the major components of GDB:

Internally, GDB can be usefully thought of as having two sides. The symbol side is primarily responsible for symbol tables, expression handling, language support, source display, and other activities that involve symbolic data. The target side manipulates the process or target system being debugged. It mainly works with numerical data, bits and bytes. (In this context, "target side" still means host-based code.)

Most actual debugging activities involve elements of both target and symbol manipulation. For example, to display the value of var1+b[idx], GDB will use its symbolic information to look up the addresses of var1, b, and idx; then use the target side to collect the contents of memory at those addresses; then invoke the evaluator to do the final calculation.

However, the separation is clean enough that the knowledgeable user can work with either side separately. So for instance it is possible to do some analysis of a program without ever connecting to a target system. It is also possible to connect to a target system, download a program, set breakpoints, and run it, all without having any symbolic information at all! While these seem like oddball cases, experienced developers know that they do occur from time to time, and it's very useful to have a debugger that can handle them.
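The var1+b[idx] example can be sketched as a toy model of the two sides working together. This is not GDB code: the symbol table, the memory image, the addresses, and the names SYMBOLS, MEMORY, read_int, and evaluate are all hypothetical, and every variable is assumed to be a 4-byte little-endian integer.

```python
import struct

# Symbol side (hypothetical): names mapped to addresses, as debug info would supply.
SYMBOLS = {"var1": 0x1000, "idx": 0x1004, "b": 0x1008}

# Target side (hypothetical): a flat memory image that starts at BASE.
# Layout: var1 = 7, idx = 2, b = [10, 20, 30, 40].
BASE = 0x1000
MEMORY = bytearray(struct.pack("<6i", 7, 2, 10, 20, 30, 40))

def read_int(addr):
    """Target side: fetch one 4-byte little-endian integer from memory."""
    return struct.unpack_from("<i", MEMORY, addr - BASE)[0]

def evaluate():
    """Evaluator: compute var1 + b[idx] from symbol addresses and raw memory."""
    var1 = read_int(SYMBOLS["var1"])           # symbol lookup, then memory read
    idx = read_int(SYMBOLS["idx"])
    element = read_int(SYMBOLS["b"] + 4 * idx)  # address arithmetic for b[idx]
    return var1 + element

print(evaluate())
```

Only read_int touches the target; everything else works from symbolic information, which is the separation described above.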
The Symbol Side
BFD
Since it is a general library that is also used by the assembler and linker, BFD handles only the basic object file structure: sections, global linker symbols, relocations, and so forth. GDB itself handles the reading and analysis of debug information.
Debug Symbol Reading
The lower-level symbol reader also detects the presence of specific kinds of debug symbol info, and invokes the appropriate upper-level reader. Although the levels are somewhat uncoupled, a full MxN matrix is not really possible. For instance, there is an embedding of "stabs" (dbx) debug info in COFF files, and so the COFF reader needs to look for a .stabs section and invoke the stabs reader if found. However, there is no defined way to embed DWARF 2 debug info in a.out, and therefore the a.out reader has no mechanism to call the DWARF 2 reader.

The actual symbol tables are not too unusual in structure. Basically, they're a collection of interconnected nodes. Symbols consist of types, functions, and variables. The set of attributes is common to all supported languages, although some attributes will only appear with, for example, Java.

GDB handles line numbers with a separate line table that matches line numbers with addresses. It builds an inverse lookup table to speed up the mapping of arbitrary addresses to source file and line number.
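The inverse lookup from address to line number can be sketched as a sorted table plus a binary search. This is only an illustration of the idea, not GDB's actual data structure; the table contents and the names LINE_TABLE, BY_ADDR, and line_for_address are hypothetical.

```python
import bisect

# Hypothetical line table: (source line, starting address) pairs.
LINE_TABLE = [(10, 0x400), (11, 0x408), (12, 0x410), (14, 0x41C)]

# Inverse table: the same entries sorted by address, built once.
BY_ADDR = sorted((addr, line) for line, addr in LINE_TABLE)
ADDRS = [addr for addr, _ in BY_ADDR]

def line_for_address(pc):
    """Return the line whose code contains pc, or None if pc precedes the table."""
    i = bisect.bisect_right(ADDRS, pc) - 1  # last entry starting at or before pc
    if i < 0:
        return None
    # Simplification: the last entry is treated as extending without an upper bound.
    return BY_ADDR[i][1]

print(line_for_address(0x40C))
```

An address in the middle of an instruction sequence maps to the line whose entry starts at or before it, which is what makes arbitrary addresses cheap to resolve.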
Language Support
All of this is connected to the rest of GDB through the creation of an object that records attributes of the languages, plus inclusion in various case statements.
Target Side
Target Vector
At present, GDB includes about 70 different target vectors, of which perhaps 10 are for native debugging, leaving some 60 embedded protocols. The embedded protocols include:

  Builtin simulator (sim)
  Standard GDB protocol (remote)
  MIPS PMON protocol (mips, pmon, ddb, lsi)
  ARM RDI/ADP protocol (rdi)
  SDS protocol for PowerPC (sds)
  A29K UDI protocol (udi)
  ROM monitors: PPCBug, ROM68K, etc. (ppcbug, rom68k, dink32, etc.)
  VxWorks protocol (vx)
  Macraigor wiggler for PowerPC (ocd wiggler)
  Hitachi E7000 emulator (e7000)
  EST emulator (est)
  NetROM emulator (nrom)

To use one of these, just say target name port, where name is one of the names listed in parentheses above, and port is a serial port like com1 or a TCP port like cygnus.com:1234. Insight also provides a target connection dialog that allows you to set this up interactively.

There are additional protocols included in versions of GDB not distributed by Cygnus or the FSF. For instance, Intel has an HDI backend that is part of GDB960.

In general, each protocol implementation is straightforward, and only becomes complex if the target system is complex in some way. This is a common way for people to extend GDB: clone an existing target vector, modify it as needed, and register it in the list of target vectors.
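The clone, modify, and register pattern can be sketched as follows. GDB's real target vectors are C structures with many more operations; this model keeps only two, and the names TargetVector, register_target, target_command, and the "mymon" target are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Callable

@dataclass
class TargetVector:
    """A named bundle of operations for one kind of target (simplified)."""
    name: str
    open_port: Callable[[str], str]
    read_memory: Callable[[int, int], bytes]

TARGETS = {}  # the list of registered target vectors, keyed by name

def register_target(vec):
    TARGETS[vec.name] = vec

def sim_open(port):
    return "sim connected on " + port

def sim_read_memory(addr, length):
    return bytes(length)  # toy simulator: memory reads back as zeros

register_target(TargetVector("sim", sim_open, sim_read_memory))

# Extending: clone an existing vector, override what differs, register it.
def mymon_open(port):
    return "mymon connected on " + port

register_target(replace(TARGETS["sim"], name="mymon", open_port=mymon_open))

def target_command(name, port):
    """Model of 'target name port': look up the vector and open the connection."""
    return TARGETS[name].open_port(port)

print(target_command("mymon", "com1"))
```

The cloned vector inherits read_memory unchanged, which mirrors why a new protocol usually only needs the pieces that differ from an existing one.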
Architecture Description
Architecture descriptions are generally tricky to write, mostly because the stack analysis and unwinding code is intimately connected to the calling convention defined by the compiler. This has become even more of an issue recently, since compilers for embedded targets are experimenting with improving runtime performance by changing the calling conventions. Also, architecture variants such as Thumb and MIPS16 require different calling conventions. Thus, the debugger's analysis code becomes much more complicated, and requires closer collaboration with compiler developers.

Fortunately for GDB users, Cygnus works with nearly all semiconductor vendors to create GDB ports while the chips are still in development. By the time a chip is announced, the architecture description is done, along with a simulator, allowing people to try out the architecture before they get an actual device!
Debugger Algorithms
Another example of a generic algorithm is the memory breakpoint code, which works by writing traps or illegal instructions on top of the program, then restoring the real instruction before restarting the program.

These algorithms generally only require attention if a new architecture requires a new capability. For instance, the Mitsubishi D10V is a Harvard architecture chip, and its split instruction and data spaces required changes to GDB's basic assumption of a flat uniform address space.
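The memory breakpoint algorithm can be sketched in a few lines. This is a minimal model, not GDB's implementation: target memory is a bytearray, the trap is assumed to be a single byte with the hypothetical value 0xCC, and the Breakpoints class is invented for illustration.

```python
TRAP = 0xCC  # assumption: a one-byte trap opcode; real values depend on the architecture

class Breakpoints:
    """Tracks the original instruction bytes hidden under each inserted trap."""

    def __init__(self, memory):
        self.memory = memory  # stand-in for target memory
        self.saved = {}       # address -> original byte

    def insert(self, addr):
        """Save the real instruction byte, then write the trap on top of it."""
        if addr not in self.saved:
            self.saved[addr] = self.memory[addr]
            self.memory[addr] = TRAP

    def remove(self, addr):
        """Put the real instruction back so the program can execute it."""
        self.memory[addr] = self.saved.pop(addr)

program = bytearray([0x90, 0x55, 0xC3])
bps = Breakpoints(program)
bps.insert(1)
print(hex(program[1]))  # the trap is now in place
bps.remove(1)
print(hex(program[1]))  # the original byte is back
```

Keeping the saved byte on the debugger side is what lets the program image be restored exactly before it is restarted.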
Execution Control
wait_for_inferior()
In actual practice, wait_for_inferior has many tricky cases to deal with, such as thread-specific breakpoints, hardware watchpoints, architectures that have to use breakpoints to implement stepping, and so forth. As of March 1999, the while(1) loop is 1800 lines long and replete with gotos that were added in the past to solve logic problems.
Remote Debugging Protocol
The target-side code is usually known as the GDB debugging stub, or just stub for short.

The basic format of a packet is $ data # checksum, where data is an ASCII string and checksum is a simple two-digit checksum of data. Numbers are always in hexadecimal. Upon successful receipt of a packet, the receiver must return a + (plus), otherwise a - (minus).

A stub must understand these nine types of packets:

  g                    read all registers
  G                    write all registers
  maddr,length         read length bytes of memory at address addr
  Maddr,length:data    write length bytes data to memory at address addr
  caddr                continue at address addr
  saddr                step one instruction at address addr
  Csig,addr            continue with signal sig at address addr
  Ssig,addr            step one instruction with signal sig at address addr
  ?                    get reason for stopping

There are about another 10 types of optional packets; these are for thread support, detaching, querying the target, resetting the target, and so forth.

The stub's response depends on the packet type. In many cases, the stub need only respond with OK, while in the case of register and memory reads, it should return a string of hex digits with all the data run together. In the case of stepping and continuing, the stub should come back with the signal that caused the program to stop. The signals mimic Unix signals, although for embedded targets they are just agreed-upon numbers. For instance, GDB declares traps to be signal 5, so if the target program hits a breakpoint trap, the stub will come back with S05. Other possible return packet types include Odata, for output data from the program, and Xsig, to indicate that the program exited.

The following transcript uses the remotedebug flag and command-line GDB to illustrate the packet traffic associated with a simple debugging session on Hitachi's eval board for the SH-2. This board's CMON monitor has a GDB stub built into it.
% sh-hms-gdb -nw a.out
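The $ data # checksum framing and the +/- acknowledgement described in this section can be sketched as follows. The sketch assumes the checksum is the sum of the data characters modulo 256, written as two hex digits, which is how the GDB remote protocol is normally documented; the function names are hypothetical and no real serial or TCP traffic is involved.

```python
def checksum(data):
    """Sum of the ASCII codes of data, modulo 256 (assumed checksum rule)."""
    return sum(data.encode("ascii")) % 256

def make_packet(data):
    """Frame data as $data#xx, where xx is the checksum in two hex digits."""
    return "$%s#%02x" % (data, checksum(data))

def receive_packet(packet):
    """Return (data, '+') for a well-formed packet, otherwise (None, '-')."""
    if len(packet) < 4 or packet[0] != "$" or packet[-3] != "#":
        return None, "-"
    data, digits = packet[1:-3], packet[-2:]
    try:
        ok = int(digits, 16) == checksum(data)
    except ValueError:  # the checksum field was not hexadecimal
        ok = False
    return (data, "+") if ok else (None, "-")

print(make_packet("g"))           # request to read all registers
print(make_packet("S05"))         # stub's reply after a breakpoint trap
print(receive_packet("$OK#9a"))   # a valid reply is acknowledged with +
```

A receiver that answers - is asking for the packet to be sent again, so a stub only needs these two helpers plus a dispatch on the first character of data to handle the nine required packet types.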