[Crash-utility] Miscellaneous fixes/enhancements to crash 4.0-2.10
Dave Anderson
anderson at redhat.com
Wed Nov 9 22:36:50 UTC 2005
Castor Fu wrote:
> On Wed, 9 Nov 2005, Dave Anderson wrote:
>
> > Castor Fu wrote:
> >
> >> Hi guys:
> >>
> >> We're trying to bring in a newer version of crash, and have
> >> some changes which we've made since going to version 4.0-2.10
> >> which I'll outline below, and have attached a patch:
> >>
> >> 1. Makefile, help.c, extensions.c, new file kw.c
> >> Add a keyword expansion mechanism to the 'extend' command.
> >> We did this primarily to let us load different extensions
> >> based on the dump file itself or the version of crash.
> >>
> >> +"\n The named <shared-object> is subjected to a simple keyword expansion",
> >> +" scheme with the following rules:",
> >> +" $(crash_version) version of crash",
> >> +" $(corestr-<varname>) string named <varname> taken from the core file",
> >> +" $(builddir) build directory for the kernel",
> >
> > I still have no idea what this does? Can you expand upon how this works,
> > what does it do, what does it looks like, can it be ignored, etc? Couldn't
> > this kind of thing can be self-contained in each extension module?
> > It just seems to be way too "implementation-specific" for lack of a
> > better term?
>
> Consider a set of extension commands which rely on some data structure
> which tends to change from version to version of say, a device driver.
> The extension code could, with a great deal more work, be written
> to extract structure offsets, etc. from the debugging information,
> but it's much easier to just '#include' the header file.
>
> This way, one could have something like
>
> extend /usr/lib/crash/fc$(corestr-fc_version).o
>
> in one's .crashrc, and the appropriate extensions could be loaded without
> the person analyzing the crash having to figure out which version it came from
> right away.
>
Wow. You guys have gone way beyond whatever I could have
imagined for extensions!
I'm going to have to let this one stew for a while...
>
> [ Along these lines, I'm trying to look at a RHEL diskdump and I must
> be missing the point about how one pulls together the modules to
> look at the dump. I have the rpm kernel-debuginfo-2.6.9-22.EL loaded, and all the
> kernel modules have .debug appended on the filename so 'mod -S' doesn't work. ]
Not sure why it doesn't work for you. The module.ko files come with the base
kernel package. The debuginfo package just supplements them with debug data.
>
>
> One could also implement this with an expect wrapper which would give one much
> more generality, but then everyone is going to end up with their own little
> wrapper....
>
> ...
>
> >> 6. task.c
> >> Be more defensive about the value of tc->processor
> >>
> >
> > Why is it necessary to do a verify_task() again -- it had to
> > have gone through verify_task() in order to get here, right?
> > What exactly did you see that required this?
>
> We've seen dumps where the task structures get corrupted by
> other bad pointers, and crash tended to just give up the ghost
> because of dereferencing tc->processor which would get
> set to some ridiculous value.
>
> I think I've seen errors at both these spots, but one or the
> other may have been from searching through the code for tc->processor.
>
Ok -- I figured that it was due to some unnatural dumpfile.
Anyway, it's harmless, and if it works in those cases, that's fine.
>
> >
> > --- crash-4.0-2.10/task.c 2005-11-07 07:44:06 -08:00
> > +++ crash-4.0-2.10-new/task.c 2005-11-07 18:57:19 -08:00
> > @@ -2942,7 +2942,7 @@
> > fprintf(fp, "(SYSRQ)");
> > else if (machdep->flags & INIT)
> > fprintf(fp, "(INIT)");
> > - else if (kt->cpu_flags[tc->processor] & NMI)
> > + else if (verify_task(tc, 2) && kt->cpu_flags[tc->processor] & NMI)
> > fprintf(fp, "(NMI)");
> > else if (tc->task == tt->panic_task)
> > fprintf(fp, "(PANIC)");
> >
> > This I'm presuming is related to the above, but again, what did you
> > see that requires this?:
> >
> > @@ -4337,7 +4340,8 @@
> >
> > tc = FIRST_CONTEXT();
> > for (i = 0; i < RUNNING_TASKS(); i++, tc++) {
> > - if (task_has_cpu(tc->task, NULL)) {
> > + if (task_has_cpu(tc->task, NULL)
> > + && tc->processor >= 0 && tc->processor < NR_CPUS) {
> > tt->panic_threads[tc->processor] = tc->task;
> > found++;
> > }
> >
> >
> >> Also, try to set the panic string correctly in when decoding dump.
> >
> > This is LKCD specific anyway, right?
>
> I don't think so. in get_panicmsg() it looks like there's code which
> is trying to get the panicmsg from either tt_panicmsg or LKCD.
>
> This string will get overwritten by looking through the msgbuf...
>
Yeah -- I should have mentioned that the get_panicmsg() code is only
satisfied by ancient MCLX-style dumpfiles. I'm pretty sure nobody
still has any of those...
So it is LKCD only, which is fine.
>
> >> 7. x86.c
> >>
> >> Be a little more liberal deciding that the frame pointer has been included.
> >> gcc 3.x can re-schedule code so that instead of
> >>
> >> push bp
> >> mov esp, bp
> >>
> >> we sometimes see
> >>
> >> push bp
> >> xor eax,eax
> >> move esp, bp
> >>
> >> so detect both sequences.
> >
> > This looks reasonable. Didn't think anybody built kernels that way anymore...
>
> You mean nobody builds with frame pointers anymore? I guess we do because it's easier
> to debug. Having 'bt' pull out the arguments on the stack is nice. If we can get
> similar functionality without frame pointers, we'd be happy to chuck them...
>
> -castor
Thanks,
Dave
More information about the Crash-utility
mailing list