apport/breakpad and fedora

Peter Jones pjones at redhat.com
Tue Jun 24 15:20:53 UTC 2008


Colin Walters wrote:
> 2008/6/23 Will Woods <wwoods at redhat.com>:

>> If I remember right, the reason for this part of the discussion was:
>>
>> 1) Linking everything on the system to breakpad is a bit nasty.
>> 2) Apport doesn't need to be linked in, but it runs *after* the process
>> gets dumped by the kernel. At which point it's slightly different from
>> when it actually crashed.
> 
> 
> Yeah, sounds right.
> 
> pjones' idea was to have a system service that would receive
>> notification of segfaults and use utrace to stop the process and
>> generate a (breakpad-style report).
>>
> 
> He was thinking of hooking it into kerneloops, right?

This was really just my "easiest first-pass way to implement it"; I 
expect we can replace this part with something better if we need to, and 
it may or may not be necessary.

> Though isn't there a race between when we get the kernel notification and
> when the service stops it and inspects?  Not my area of expertise really,
> just thinking out loud.

If we're /not/ changing any kernel APIs, we'd want to do several things, 
conditional on the feature being enabled.  A mostly complete list follows:

1) make /var/cache/cores/ a tmpfs mount
2) set kernel.core_pattern to something like "/var/cache/cores/core.%p"
3) do something along the lines of setfacl to limit access
4) "ulimit -c $SOMETHING_NONZERO" for everything.
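To make steps 2 and 4 a bit more concrete, here is a rough Python sketch. 
It is only an illustration, not part of any existing tool: 
expand_core_pattern() and enable_core_dumps() are made-up names, the 
expansion handles only %p and %%, and raising the limit this way affects 
only the calling process and its children, not "everything".

```python
import resource

CORE_DIR = "/var/cache/cores"          # directory from step 1
CORE_PATTERN = CORE_DIR + "/core.%p"   # pattern from step 2


def expand_core_pattern(pattern, pid):
    """Expand %p (PID) and %% in a core_pattern-style template.

    Hypothetical helper: any other specifier is left untouched here.
    """
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "%" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            if nxt == "p":
                out.append(str(pid))
            elif nxt == "%":
                out.append("%")
            else:
                out.append(ch + nxt)  # not handled in this sketch
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def enable_core_dumps():
    """Per-process analogue of step 4: raise the soft core limit to the hard limit."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


if __name__ == "__main__":
    print(expand_core_pattern(CORE_PATTERN, 1234))
    print(enable_core_dumps())
```

A collector would use something like expand_core_pattern() to know where 
to look for the core of a given PID.  Note that if the hard limit is 
already 0, the soft limit stays at 0 and no core is written.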

If we were to change kernel APIs, my initial thought is a utrace plugin 
that suspends the task instead of delivering the segfault, and gives us 
a notification on a file descriptor we're ppoll()ing on.  Then we'd go 
examine the process's memory and collect a trace.  This also has the 
advantage that it means no shared writable space and no spinning up the 
disk to write the core out.  Also, on the whole it requires fewer 
different parts of the system to be set up right.
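
The userspace side of that could look roughly like the sketch below.  It 
is written under heavy assumptions: no such utrace plugin exists yet, so 
the notification format (one native 4-byte integer PID per crash) and the 
name wait_for_crash() are invented for illustration, and Python's 
select.poll() stands in for ppoll().

```python
import os
import select
import struct


def wait_for_crash(fd, timeout_ms=1000):
    """Block until the (hypothetical) notification fd is readable.

    Returns the PID of the suspended task, or None on timeout.
    Assumed record format: one native-endian 4-byte signed integer.
    """
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    for ready_fd, mask in poller.poll(timeout_ms):
        if mask & select.POLLIN:
            data = os.read(ready_fd, 4)
            if len(data) == 4:
                return struct.unpack("=i", data)[0]
    return None


if __name__ == "__main__":
    # A pipe stands in for the kernel-provided descriptor.
    r, w = os.pipe()
    os.write(w, struct.pack("=i", 4242))
    print(wait_for_crash(r))
    os.close(r)
    os.close(w)
```

Once a PID comes back, the service would examine the still-suspended 
process's memory and collect the trace; that part is left out here.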

>> It would make the 'debuginfo-install' message go away, because (if DAV +
>> FUSE does the right thing) you'll have all the debuginfo you need, in
>> the right place - mounted as a FUSE filesystem.
> 
> Ah, ok.

FWIW, the debuginfo server I'm working on is at 
http://git.fedorahosted.org/git/?p=littlebottom.git;a=summary .  It's 
still very much in its infancy, and I can use all the help I can get. 
I'll gladly add you to the group if you want to help out ;)

>>> My 2¢ - Link in breakpad, create http://crash.fedoraproject.org
>>> running Socorro.
>> Link it into what? Everything, via LD_PRELOAD? Or just GNOME stuff? I
>> thought bug-buddy already used breakpad?

IMNSHO, LD_PRELOAD is just a plain bad idea here (and nearly everywhere 
else).  There are also plenty of places where we want tracebacks, but 
the upstream maintainers won't like the patches, and we don't want to be 
carrying patches.  Not to mention patching everything is a herculean task.

I really think that if we're going to succeed, we've got to plan on /not/ 
changing most executables.

> I'm personally most interested in the desktop apps because, well we desktop
> developers are masochists and code complex user-facing code in C/C++, and
> not surprisingly they crash =)

The same is true of the rest of the system; I think our solution needs 
to work for everything (well, everything compiled, though the 
reporting/statistics infrastructure need not be even that specific).

> So right now...hm, actually this is weird, I can't get any Fedora-compiled
> program to spawn bug-buddy at all right now.  I get it for some local custom
> code, but not for anything in /usr/bin.  I see libgnomebreakpad is linked
> into the process.

Another point against the "link in a magic library" approach.  If the 
crashing executable has to do the work to spawn the reporting tool, 
it'll *never* be reliable.

>>  Longer term investigate utrace system service instead of having apps
>>> link to breakpad (this gets us non-desktop system crashes without
>>> having to universally LD_PRELOAD or whatever).
>> Yeah, I don't think we need to solve this until we've got the
>> proof-of-concept stack: a couple of choice apps sending Breakpad reports
>> (with debuginfo fetched from littlebottom) to our own Socorro instance.

I think we're all in agreement here.

-- 
   Peter



