September 28, 2006

Frysk: Debugging in real time

by Rick Moseley

This article is the first in a three-part series about the Frysk monitoring/debugging tool. Future articles will focus on the monitoring side of Frysk and the debugging side of Frysk, while this article is a general overview of Frysk.

The challenge

Around the beginning of 2005, the Red Hat Customer Advisory Board (CAB) met. These major Red Hat customers were asked what new tools they would like to have developed to help them create, test, and maintain their applications. After much discussion, it was determined that a new type of debugger would be a much-welcomed tool.

One of the major complaints that customers have about current debugging tools is that they have to be employed after the fact. That is, with the tools currently available, an application has to fail completely, and then a developer has to restart the application with a debugger attached before it is possible to find out why it failed. This is fine if the application fails in a predictable way, but, unfortunately, this is usually not the case. The CAB members believed that the ideal debugging tool would be one that could be activated at the point when the application failed, rather than requiring the developer to recreate the conditions that led to the failure.

Another major CAB complaint is that current debugging tools--which were originally developed many years ago--have failed to keep up with the modern CPU and compiler technologies that languages such as Java and C++ rely on. Capabilities these languages offer, such as creating multi-threaded applications, have not been addressed by older generations of debugging tools. Frysk has been designed from the outset to encompass all of the latest CPU/compiler technologies.

From these customers' needs were born the initial requirements for a "new and improved" debugging and monitoring tool: a debugger that constantly monitors an application, detects when it fails, and attaches itself at the point where a developer can extract useful information.

The response

In response to the CAB's request, in May of 2005 Red Hat put together a new development team. Their purpose? To further refine the requirements and to start down the path to creating this new state-of-the-art debugging and monitoring tool. From the beginning it was obvious that if this new tool (now known as Frysk) could be designed and developed thoughtfully, it could be used in a wide variety of ways for monitoring and/or debugging applications.

From the CAB comments and suggestions, it was obvious that this tool would appeal to two separate and distinct types of users: in addition to the debugging functionality for developers, the monitoring part of Frysk would be used by system administrators to keep them informed of the health of their systems.

Frysk overall design

Of course the main goal in the initial design phase was to create the tool the CAB customers wanted. But early on the development team saw that the possibilities for Frysk were virtually endless. A few use cases were put together that provided the springboard for developing a full set of requirements.

The use cases depict scenarios of how a system administrator, a developer, and a student might use Frysk to more efficiently perform their everyday duties. Each scenario highlights unique areas of the Frysk design that are unavailable in the current open-source toolsets. As can be seen from the diversity of the three types of people depicted in the use cases, the Frysk design requires true versatility.

One thing that became obvious from the use cases was that Frysk had to be broken into two pieces to address the two different types of users for which Frysk was targeted: a monitoring part and a source window part. These two pieces, while distinctly different, can be joined at the right places to form a fully-functional and cohesive tool.

There are four well-known basic principles of debuggers, enumerated in Jonathan B. Rosenberg's "How Debuggers Work: Algorithms, Data Structures, and Architectures":

"First, the Heisenberg principle that says the debugger must intrude on the debuggee in a minimal way. Second, at all costs, the debugger must be truthful so the programmer always trusts it. Third, the debugger's most important role is the presentation of the content information so the user always knows where he is and how he got there in the debuggee. And fourth, unfortunately, the debugger you have is almost always behind technologically where you need it to be."

The Frysk design steadfastly adheres to the first three principles while attempting to mitigate the last.

Observer concept

Frysk uses an observer model to monitor the behavior of processes. This observer concept is based on the Java "Observable" class. The theory of operation is that when an observer is attached to a process or a thread within a process (henceforth referred to simply as a process), it reports back to the process that initiated the observer (in our case, Frysk) that a specific, user-specified behavior has occurred.

Currently the following observers can be attached to a process:

  • Exec--Monitor a process for a call to the "exec" function.
  • Fork--Monitor a process for a call to the "fork" function.
  • Task Clone--Monitor a process for a clone operation.
  • Task Terminating--Monitor a process for any activity that would cause it to exit the CPU queue.
  • Syscall--Monitor a process for any system call.
  • Custom--An observer based on one of the above observers with filtering logic.
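
The observer model described above can be illustrated with a minimal Java sketch. Every name in it (EventKind, TaskEvent, TaskObserver, MonitoredTask, ObserverSketch) is hypothetical and chosen only for illustration; none of it is the Frysk API, and the events are simulated rather than reported by the kernel.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; this is an illustration, not the Frysk API.
enum EventKind { EXEC, FORK, CLONE, TERMINATING, SYSCALL }

// One behavior reported by a monitored task.
final class TaskEvent {
    final int pid;
    final EventKind kind;
    final String detail;

    TaskEvent(int pid, EventKind kind, String detail) {
        this.pid = pid;
        this.kind = kind;
        this.detail = detail;
    }
}

// Callback interface: the monitoring tool implements this to be told
// when a behavior occurs in the task it is watching.
interface TaskObserver {
    void update(TaskEvent event);
}

// Stand-in for a monitored process. A real tool would receive these
// events from the operating system; here they are emitted by hand.
final class MonitoredTask {
    private final int pid;
    private final List<TaskObserver> observers = new ArrayList<>();

    MonitoredTask(int pid) {
        this.pid = pid;
    }

    void attach(TaskObserver observer) {
        observers.add(observer);
    }

    // Notify every attached observer of the event, in attachment order.
    void emit(EventKind kind, String detail) {
        TaskEvent event = new TaskEvent(pid, kind, detail);
        for (TaskObserver observer : observers) {
            observer.update(event);
        }
    }
}

class ObserverSketch {
    // Attaches a fork observer and a terminating observer, simulates
    // three events, and returns what the observers recorded.
    static List<String> run() {
        List<String> log = new ArrayList<>();
        MonitoredTask task = new MonitoredTask(1234);

        task.attach(e -> {
            if (e.kind == EventKind.FORK) {
                log.add("fork in pid " + e.pid);
            }
        });
        task.attach(e -> {
            if (e.kind == EventKind.TERMINATING) {
                log.add("pid " + e.pid + " terminating: " + e.detail);
            }
        });

        task.emit(EventKind.SYSCALL, "read");          // no observer reacts
        task.emit(EventKind.FORK, "");                 // fork observer fires
        task.emit(EventKind.TERMINATING, "status 1");  // terminating observer fires
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

In this sketch each observer decides for itself which kind of event it cares about, which mirrors the idea of attaching only the observers relevant to the behavior under investigation.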

How Frysk observers work

Once a user determines some process or group of processes is not behaving properly, Frysk can be used to attach one or more observers to the suspicious processes. For example, say that a process keeps exiting the CPU queue unexpectedly. The user would attach a Task Terminating observer to monitor that process and catch it when it tries to exit. When the Task Terminating observer "fires," Frysk could be programmed to activate a source window where the user could then do a backtrace and find out how the process got to its current state.

Another scenario where the Frysk observers would be useful is to attach a Custom observer based on the Syscall observer. If just a Syscall observer is attached, that observer fires every time a system call is made. Defining a Custom observer based on the Syscall observer allows the user to filter the system calls down to a user-specified system call or a set of system calls. The user can then specify what action to take when the Custom observer fires, including showing the source code of the process or just logging the event.
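
As a rough sketch of the filtering idea, the following Java example wraps a syscall callback with a user-specified set of system call names and a user-specified action. The names SyscallObserver, FilteredSyscallObserver, and CustomObserverSketch are assumptions made for this example, not Frysk classes, and the stream of system calls is simulated.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical names throughout; this is an illustration, not the Frysk API.
interface SyscallObserver {
    void syscallEntered(int pid, String name);
}

// A "custom" observer: it sees every system call, but runs the
// user-specified action only for the calls the user asked about.
final class FilteredSyscallObserver implements SyscallObserver {
    private final Set<String> wanted;
    private final Consumer<String> action;

    FilteredSyscallObserver(Set<String> wanted, Consumer<String> action) {
        this.wanted = wanted;
        this.action = action;
    }

    @Override
    public void syscallEntered(int pid, String name) {
        if (wanted.contains(name)) {
            action.accept("pid " + pid + " called " + name);
        }
    }
}

class CustomObserverSketch {
    // Filters a simulated stream of system calls down to "open" and
    // "unlink", with logging as the chosen action.
    static List<String> run() {
        List<String> log = new ArrayList<>();
        SyscallObserver observer = new FilteredSyscallObserver(
                new HashSet<>(Arrays.asList("open", "unlink")),
                log::add);

        for (String name : new String[] {"read", "open", "write", "unlink", "read"}) {
            observer.syscallEntered(4321, name);
        }
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

Passing the action in as a Consumer keeps the filter separate from the response, so the same filter could just as easily log the event or trigger some other reaction.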

As can be seen from the above examples, the Frysk observers are a powerful tool that allows developers and systems administrators quite a bit of flexibility. The possibilities are virtually limitless in how observers can be configured to diagnose almost any system or programming problem.

Frysk user interface design

Frysk will provide both a command-line interface and a graphical user interface (UI). While debugging monolithic applications lends itself very well to a command-line debugger interface, today's multi-threaded applications demand a more visual interface. As Frysk is designed to be both a monitoring and debugging tool, being able to display data in a concise, coherent fashion demands such a graphical interface.

What language to use?

Early in the design of Frysk it was decided that a current, state-of-the-art language must be used to design and implement the UI part. Research suggested that Java would be the language of choice for writing graphics-intensive applications, and its wide support and the number of open source libraries written for Java confirmed this.

Another factor that figured heavily into the decision to use Java was the availability of the gcj compiler. To keep Frysk totally independent from other outside packages (JVMs) that we have no control over, all of the Frysk Java code is compiled using the latest version of gcj. (This has been possible thanks to the support of the gcj development team of Andrew Haley, Tom Tromey, and Bryce McKinlay and their rapid response to requests for help from the Frysk team. They have been indispensable in resolving the problems that occurred during the early days of Frysk development.)

Which graphics package to use for the UI?

After a long debate, taking into consideration where the current Red Hat desktop group was focusing its efforts, it was decided to write the Frysk UI in Java. Open source graphics would be provided by Gnome interfacing through the Java-Gnome (JG) libraries. It was an important consideration that whatever package was used, it had to integrate smoothly and seamlessly with the current Red Hat desktop environment. Gnome would most easily let us do this.

While this was a choice fraught with peril from the very beginning, mainly because of the relative immaturity of the JG libraries, it has drawn the Frysk development team closer to the Gnome community through our constant interaction on IRC channels and mailing lists. In fact, three members of the Frysk team have been granted CVS check-in capabilities for JG. This allows us not only to fix any problems found during Frysk development, but also to place these fixes back into the JG CVS head (upstream, if you will) so that future releases of JG will include them. Active participation in the Gnome community, we feel, will provide many benefits down the road for Frysk.

Also, using Java-Gnome gives the Frysk team access to a lot of compatible software written to supplement Gnome. For example, Cairo, a powerful two-dimensional graphics library that makes it easy to draw well-rendered custom widgets, is being used on the monitoring side of Frysk to graph user-specified events. The following screenshot shows one of the uses of Cairo in the Status pane, where observer events are being plotted.

Frysk monitoring window screenshot.

Source window

A significant amount of thought has gone into the design of the Frysk source window. There are some debuggers available in the open source community that have a lot of very nice features. The Frysk source window functionality encompasses most of those--including keyword highlighting, variable highlighting, current line highlighting, various search functions, and many other features.

The challenge for the Frysk team was to develop a source window that incorporates most of those debuggers' capabilities and then goes a step beyond. Because C++ was the first language that Frysk would debug, it was decided that one way to do something a little different was in the way source code containing macros is displayed. The key here was to allow the user to choose whether or not to show the source code of the macro in the window. The following screenshot shows an example of how Frysk has implemented this feature.

Frysk source window screenshot

In the above example there are two macros defined, bar and foobar, with the foobar macro being called from inside of the bar macro. There is inline code--denoted by the "i" in the column with the line numbers. The user can toggle whether or not the inline code is shown by left-clicking on the line number containing the "i." This is one of the ways Frysk is faithful to the third principle of debuggers: "the presentation of content information so the user always knows where he is and how he got there."

The source window is in its infancy from a development standpoint. To date, most of the time has been spent on developing the underpinnings for it. Java bindings had to be written and tested for the libelf, libdwarf, and libunwind libraries. With these tasks complete, more work can be applied to the more visible parts of the source window.

What the Frysk future holds

Frysk is a new project, but the future looks bright. As Frysk grows in sophistication, more focus will be placed on handling more complex, distributed systems--whether they are many processors running many applications or multiple processors running a single application. Frysk's data visualization techniques will need to be improved so that hundreds or thousands of processors can be viewed in a coherent way.

How to get involved with Frysk

If you want to become involved with Frysk by working on code, submitting ideas for enhancement requests, or just monitoring the development, please visit the Frysk website. There you can subscribe to the Frysk mailing list or find out where the Frysk developers hang out on a public IRC channel.

More Frysk articles planned

Right now, the plan is to have two more articles on Frysk to go into more detail on the ongoing design and implementation of the two major parts of Frysk, the monitoring side and the debugging side. Look for them in future editions of Red Hat Magazine.

Acknowledgments

Many thanks to the Frysk development team, including Andrew Cagney, Stan Cox, Michael Cvet, Adam Jocksch, Chris Moller, Tim Moore, Phil Muldoon, Nurdin Premji, Sami Wagiaalla, Mark Wielaard, and Elena Zannoni. Thanks to Len DiMaggio for his work on testing Frysk, especially with the Dogtail scripts he wrote for the UI part of Frysk. (Please see Len's highly informative article regarding this effort from the August 2006 issue.)

Also, special thanks to Mike Behm for his help with this article.