
Re: apport/breakpad and fedora



Colin Walters wrote:
2008/6/23 Will Woods <wwoods redhat com>:

If I remember right, the reason for this part of the discussion was:

1) Linking everything on the system to breakpad is a bit nasty.
2) Apport doesn't need to be linked in, but it runs *after* the process
gets dumped by the kernel. At which point it's slightly different from
when it actually crashed.


Yeah, sounds right.

pjones' idea was to have a system service that would receive
notification of segfaults and use utrace to stop the process and
generate a (breakpad-style) report.


He was thinking of hooking it into kerneloops, right?

This was really just my "easiest first-pass way to implement it"; I expect we can replace this part with something better if we need to, and it may or may not be necessary.

Though isn't there a race between when we get the kernel notification and
when the service stops it and inspects?  Not my area of expertise really,
just thinking out loud.

If we're /not/ changing any kernel APIs, we'd want to do several things, conditional on the feature being enabled. A mostly inclusive list follows:

1) make /var/cache/cores/ a tmpfs mount
2) set kernel.core_pattern to something like "/var/cache/cores/core.%p"
3) do something along the lines of setfacl to limit access
4) "ulimit -c $SOMETHING_NONZERO" for everything.
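
For illustration only, here is a rough Python sketch of those four steps. Steps 1-3 need root, so the sketch just builds the commands as argv lists rather than running them; step 4 is shown for the current process only. The group name "crashd", the tmpfs size, the 1733 mode, and the function names are all assumptions of mine, not anything we've settled on.

```python
import resource

CORE_DIR = "/var/cache/cores"          # directory from step 1
CORE_PATTERN = CORE_DIR + "/core.%p"   # step 2; the kernel expands %p to the PID


def setup_commands(group="crashd", size="512m"):
    """Return root-level commands for steps 1-3 as argv lists (not executed).

    `group` and `size` are hypothetical placeholders.
    """
    return [
        # 1) tmpfs mount; mode 1733 (assumed) lets any user write a core
        #    but not list the directory
        ["mount", "-t", "tmpfs", "-o", f"size={size},mode=1733", "tmpfs", CORE_DIR],
        # 2) point the kernel at the new location
        ["sysctl", "-w", f"kernel.core_pattern={CORE_PATTERN}"],
        # 3) let only the (hypothetical) collector group read the directory
        ["setfacl", "-m", f"g:{group}:rx", CORE_DIR],
    ]


def enable_core_dumps():
    """Step 4 for this process: raise the soft core limit to the hard limit."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


if __name__ == "__main__":
    for argv in setup_commands():
        print(" ".join(argv))
    print("core limit (soft, hard):", enable_core_dumps())
```

Doing step 4 "for everything" would really mean setting the limit early in boot so every process inherits it; the function above only covers the calling process and its children.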

If we were to change kernel APIs, my initial thought is a utrace plugin that suspends the task instead of delivering the segfault, and gives us a notification on a file descriptor we're ppoll()ing on. Then we'd go examine the process's memory and collect a trace. This also has the advantage that it means no shared writable space and no spinning up the disk to write the core out. Also, on the whole it requires fewer different parts of the system to be set up right.
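
To make the "notification on a file descriptor" part concrete, here is a minimal sketch of the userspace side. The utrace plugin and its fd do not exist, so an ordinary pipe stands in for it, and the wire format (one native 4-byte int carrying the PID) is purely my assumption. Python's select.poll() is used in place of ppoll(), which Python doesn't expose.

```python
import os
import select
import struct


def wait_for_crash(fd, timeout_ms=1000):
    """Wait until fd is readable and return the PID it carries, or None on timeout.

    The 4-byte native-int message format is a hypothetical stand-in.
    """
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    if not poller.poll(timeout_ms):
        return None                      # nothing crashed within the timeout
    data = os.read(fd, 4)
    if len(data) < 4:
        return None                      # short read or writer closed
    (pid,) = struct.unpack("=i", data)
    return pid


def demo():
    """Simulate one notification by writing our own PID into a pipe."""
    r, w = os.pipe()
    try:
        os.write(w, struct.pack("=i", os.getpid()))
        return wait_for_crash(r)
    finally:
        os.close(r)
        os.close(w)


if __name__ == "__main__":
    print("notified about pid", demo())
```

In the real thing, the service would go on to inspect the suspended task's memory and collect the trace once it has the PID; that part is left out here because it depends entirely on what the kernel side ends up looking like.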

It would make the 'debuginfo-install' message go away, because (if DAV +
FUSE does the right thing) you'll have all the debuginfo you need, in
the right place - mounted as a FUSE filesystem.

Ah, ok.

FWIW, the debuginfo server I'm working on is at http://git.fedorahosted.org/git/?p=littlebottom.git;a=summary . It's still very much in its infancy, and I can use all the help I can get. I'll gladly add you to the group if you want to help out ;)

My 2¢ - Link in breakpad, create http://crash.fedoraproject.org
running Socorro.
Link it into what? Everything, via LD_PRELOAD? Or just GNOME stuff? I
thought bug-buddy already used breakpad?

IMNSHO, LD_PRELOAD is just a plain bad idea here (and nearly everywhere else). There are also plenty of places where we want tracebacks, but the upstream maintainers won't like the patches, and we don't want to be carrying patches. Not to mention patching everything is a herculean task.

I really think that if we're going to succeed, we've got to plan on /not/ changing most executables.

I'm personally most interested in the desktop apps because, well, we desktop
developers are masochists and code complex user-facing code in C/C++, and
not surprisingly they crash =)

The same is true of the rest of the system; I think our solution needs to work for everything (well, everything compiled, though the reporting/statistics infrastructure need not be even that specific.)

So right now...hm, actually this is weird, I can't get any Fedora-compiled
program to spawn bug-buddy at all right now.  I get it for some local custom
code, but not for anything in /usr/bin.  I see libgnomebreakpad is linked
into the process.

Another point against the "link in a magic library" approach. If the crashing executable has to do the work to spawn the reporting tool, it'll *never* be reliable.

 Longer term investigate utrace system service instead of having apps
link to breakpad (this gets us non-desktop system crashes without
having to universally LD_PRELOAD or whatever).
Yeah, I don't think we need to solve this until we've got the
proof-of-concept stack: a couple of choice apps sending Breakpad reports
(with debuginfo fetched from littlebottom) to our own Socorro instance.

I think we're all in agreement here.

--
  Peter

