[Crash-utility] Miscellaneous fixes/enhancements to crash 4.0-2.10

Wed Nov 9 22:36:50 UTC 2005

Castor Fu wrote:

> On Wed, 9 Nov 2005, Dave Anderson wrote:
>
> > Castor Fu wrote:
> >
> >> Hi guys:
> >>
> >> We're trying to bring in a newer version of crash, and have
> >> some changes which we've made since going to version 4.0-2.10
> >> which I'll outline below, and have attached a patch:
> >>
> >> 1. Makefile,  help.c, extensions.c, new file kw.c
> >>       Add a keyword expansion mechanism to the 'extend' command.
> >>      We did this primarily to let us load different extensions
> >>      based on the dump file itself or the version of crash.
> >>
> >>      +"\n  The named <shared-object> is subjected to a simple keyword expansion",
> >>      +"  scheme with the following rules:",
> >>      +"    $(crash_version)       version of crash",
> >>      +"    $(corestr-<varname>)   string named <varname> taken from the core file",
> >>      +"    $(builddir)            build directory for the kernel",
> >
> > I still have no idea what this does?  Can you expand upon how this works,
> > what does it do, what does it looks like, can it be ignored, etc?  Couldn't
> > this kind of thing can be self-contained in each extension module?
> > It just seems to be way too "implementation-specific" for lack of a
> > better term?
>
> Consider a set of extension commands which rely on some data structure
> which tends to change from version to version of say, a device driver.
> The extension code could, with a great deal more work, be written
> to extract structure offsets, etc. from the debugging information,
> but it's much easier to just '#include' the header file.
>
> This way, one could have something like
>
>      extend /usr/lib/crash/fc$(corestr-fc_version).o
>
> in one's .crashrc, and the appropriate extensions could be loaded without
> the person analyzing the crash having to figure out which version it came from
> right away.
>

Wow.  You guys have gone way beyond whatever I could have
imagined for extensions!

I'm going to have to let this one stew for a while...

>
> [ Along these lines, I'm trying to look at a RHEL diskdump and I must
> be missing the point about how one  pulls together the modules to
> look at the dump.  I have the rpm kernel-debuginfo-2.6.9-22.EL loaded, and all the
> kernel modules have .debug appended on the filename so 'mod -S' doesn't work.  ]

Not sure why it doesn't work for you.  The module.ko files come with the base
kernel package.  The debuginfo package just supplements them with debug data.

>
>
> One could also implement this with an expect wrapper which would give one much
> more generality, but then everyone is going to end up with their own little
> wrapper....
>
> ...
>
> >> 6.  task.c
> >>      Be more defensive about the value of tc->processor
> >>
> >
> > Why is it necessary to do a verify_task() again -- it had to
> > have gone through verify_task() in order to get here, right?
> > What exactly did you see that required this?
>
> We've seen dumps where the task structures get corrupted by
> other bad pointers, and crash tended to just give up the ghost
> because of dereferencing tc->processor which would get
> set to some ridiculous value.
>
> I think I've seen errors at both these spots, but one or the
> other may have been from searching through the code for tc->processor.
>

Ok -- I figured that it was due to some unnatural dumpfile.
Anyway, it's harmless, and if it works in those cases, that's fine.

>
> >
> > --- crash-4.0-2.10/task.c       2005-11-07 07:44:06 -08:00
> > +++ crash-4.0-2.10-new/task.c   2005-11-07 18:57:19 -08:00
> > @@ -2942,7 +2942,7 @@
> >                        fprintf(fp, "(SYSRQ)");
> >                else if (machdep->flags & INIT)
> >                        fprintf(fp, "(INIT)");
> > -               else if (kt->cpu_flags[tc->processor] & NMI)
> > +               else if (verify_task(tc, 2) && kt->cpu_flags[tc->processor] & NMI)
> >                        fprintf(fp, "(NMI)");
> >                else if (tc->task == tt->panic_task)
> >                        fprintf(fp, "(PANIC)");
> >
> > This I'm presuming is related to the above, but again, what did you
> > see that requires this?:
> >
> > @@ -4337,7 +4340,8 @@
> >
> >         tc = FIRST_CONTEXT();
> >         for (i = 0; i < RUNNING_TASKS(); i++, tc++) {
> > -               if (task_has_cpu(tc->task, NULL)) {
> > +               if (task_has_cpu(tc->task, NULL)
> > +                    && tc->processor >= 0 && tc->processor < NR_CPUS) {
> >                        tt->panic_threads[tc->processor] = tc->task;
> >                        found++;
> >                }
> >
> >
> >>      Also, try to set the panic string correctly in when decoding dump.
> >
> > This is LKCD specific anyway, right?
>
> I don't think so.  in get_panicmsg() it looks like there's code which
> is trying to get the panicmsg from either tt_panicmsg or LKCD.
>
> This string will get overwritten by looking through the msgbuf...
>

Yeah -- I should have mentioned that the get_panicmsg() code is only
satisfied by ancient MCLX-style dumpfiles.  I'm pretty sure nobody
still has any of those...

So it is LKCD only, which is fine.

>
> >> 7.  x86.c
> >>
> >>      Be a little more liberal deciding that the frame pointer has been included.
> >>      gcc 3.x can re-schedule code so that instead of
> >>
> >>          push bp
> >>          mov esp, bp
> >>
> >>      we sometimes see
> >>
> >>          push bp
> >>          xor eax,eax
> >>          move esp, bp
> >>
> >>      so detect both sequences.
> >
> > This looks reasonable.  Didn't think anybody built kernels that way anymore...
>
> You mean nobody builds with frame pointers anymore?  I guess we do because it's easier
> to debug.  Having 'bt' pull out the arguments on the stack is nice.   If we can get
> similar functionality without frame pointers, we'd be happy to chuck them...
>
>      -castor

Thanks,
  Dave