[Crash-utility] Miscellaneous fixes/enhancements to crash 4.0-2.10

Wed Nov 9 22:15:14 UTC 2005

On Wed, 9 Nov 2005, Dave Anderson wrote:

> Castor Fu wrote:
>
>> Hi guys:
>>
>> We're trying to bring in a newer version of crash, and have
>> some changes which we've made since going to version 4.0-2.10
>> which I'll outline below, and have attached a patch:
>>
>> 1. Makefile,  help.c, extensions.c, new file kw.c
>>       Add a keyword expansion mechanism to the 'extend' command.
>>      We did this primarily to let us load different extensions
>>      based on the dump file itself or the version of crash.
>>
>>      +"\n  The named <shared-object> is subjected to a simple keyword expansion",
>>      +"  scheme with the following rules:",
>>      +"    $(crash_version)       version of crash",
>>      +"    $(corestr-<varname>)   string named <varname> taken from the core file",
>>      +"    $(builddir)            build directory for the kernel",
>
> I still have no idea what this does?  Can you expand upon how this works,
> what does it do, what does it looks like, can it be ignored, etc?  Couldn't
> this kind of thing can be self-contained in each extension module?
> It just seems to be way too "implementation-specific" for lack of a
> better term?

Consider a set of extension commands which rely on some data structure
which tends to change from version to version of say, a device driver.
The extension code could, with a great deal more work, be written
to extract structure offsets, etc. from the debugging information,
but it's much easier to just '#include' the header file.

This way, one could have something like

     extend /usr/lib/crash/fc$(corestr-fc_version).o

in one's .crashrc, and the appropriate extensions could be loaded without
the person analyzing the crash having to figure out which version it came from
right away.

[ Along these lines, I'm trying to look at a RHEL diskdump and I must
be missing the point about how one  pulls together the modules to
look at the dump.  I have the rpm kernel-debuginfo-2.6.9-22.EL loaded, and all the 
kernel modules have .debug appended on the filename so 'mod -S' doesn't work.  ]

One could also implement this with an expect wrapper which would give one much
more generality, but then everyone is going to end up with their own little
wrapper....

...

>> 6.  task.c
>>      Be more defensive about the value of tc->processor
>>
>
> Why is it necessary to do a verify_task() again -- it had to
> have gone through verify_task() in order to get here, right?
> What exactly did you see that required this?

We've seen dumps where the task structures get corrupted by
other bad pointers, and crash tended to just give up the ghost
because of dereferencing tc->processor which would get
set to some ridiculous value.

I think I've seen errors at both these spots, but one or the
other may have been from searching through the code for tc->processor.

>
> --- crash-4.0-2.10/task.c       2005-11-07 07:44:06 -08:00
> +++ crash-4.0-2.10-new/task.c   2005-11-07 18:57:19 -08:00
> @@ -2942,7 +2942,7 @@
>                        fprintf(fp, "(SYSRQ)");
>                else if (machdep->flags & INIT)
>                        fprintf(fp, "(INIT)");
> -               else if (kt->cpu_flags[tc->processor] & NMI)
> +               else if (verify_task(tc, 2) && kt->cpu_flags[tc->processor] & NMI)
>                        fprintf(fp, "(NMI)");
>                else if (tc->task == tt->panic_task)
>                        fprintf(fp, "(PANIC)");
>
> This I'm presuming is related to the above, but again, what did you
> see that requires this?:
>
> @@ -4337,7 +4340,8 @@
>
>         tc = FIRST_CONTEXT();
>         for (i = 0; i < RUNNING_TASKS(); i++, tc++) {
> -               if (task_has_cpu(tc->task, NULL)) {
> +               if (task_has_cpu(tc->task, NULL)
> +                    && tc->processor >= 0 && tc->processor < NR_CPUS) {
>                        tt->panic_threads[tc->processor] = tc->task;
>                        found++;
>                }
>
>
>>      Also, try to set the panic string correctly in when decoding dump.
>
> This is LKCD specific anyway, right?

I don't think so.  in get_panicmsg() it looks like there's code which
is trying to get the panicmsg from either tt_panicmsg or LKCD.

This string will get overwritten by looking through the msgbuf...

>> 7.  x86.c
>>
>>      Be a little more liberal deciding that the frame pointer has been included.
>>      gcc 3.x can re-schedule code so that instead of
>>
>>          push bp
>>          mov esp, bp
>>
>>      we sometimes see
>>
>>          push bp
>>          xor eax,eax
>>          move esp, bp
>>
>>      so detect both sequences.
>
> This looks reasonable.  Didn't think anybody built kernels that way anymore...

You mean nobody builds with frame pointers anymore?  I guess we do because it's easier
to debug.  Having 'bt' pull out the arguments on the stack is nice.   If we can get 
similar functionality without frame pointers, we'd be happy to chuck them...

     -castor