[Crash-utility] Crash faults when determining panic task

Lawrence, Joe Joe.Lawrence at stratus.com
Wed Sep 28 20:35:56 UTC 2011


I have a vmcore generated on RHEL6.1 that newer versions of crash have
trouble analyzing (5.1.1-2.el6 seems to work ok). 

I can provide additional binary files if needed, just let me know what
convention best suits the list (ftp, private email attachment, etc.)


Crash Version:        OS:               Result:
crash 5.1.8           Debian wheezy     faults
crash 5.1.7-1.el6     RHEL6.2 Alpha     faults
crash 5.1.1-2.el6     RHEL6.1           ok


Kernel:
2.6.32-131.0.15.el6.exp10.bz16586.x86_64    (2.6.32-131.0.15 + a fix for
Red Hat bz-707268)


Interesting warnings when starting crash:
WARNING: sparsemem: invalid section number: 137438888923
WARNING: sparsemem: invalid section number: 137438888923
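
As a sanity check on that number, here is a minimal sketch of a range
test. The two constants are assumptions for x86_64 on a 2.6.32-era
kernel, not values read from this vmcore, and section_nr_in_range() is
a hypothetical helper, not a crash function. 137438888923 is
0x1FFFFF03DB, just under 2^37, so it is far beyond any plausible
section count and looks more like a garbage or non-physical address
shifted down than a real section number.

```c
#include <stdint.h>

/* Assumed x86_64 values for a 2.6.32-era kernel; not taken from the vmcore. */
#define SECTION_SIZE_BITS 27
#define MAX_PHYSMEM_BITS  46

/*
 * Hypothetical helper: return 1 if 'nr' could index a sparsemem section
 * given the assumed physical address width, 0 otherwise.
 */
static int section_nr_in_range(uint64_t nr)
{
        uint64_t nr_sections = 1ULL << (MAX_PHYSMEM_BITS - SECTION_SIZE_BITS);

        return nr < nr_sections;
}
```

With these assumptions the limit is 2^19 sections, so
section_nr_in_range(137438888923ULL) returns 0. Using 44 instead of 46
for MAX_PHYSMEM_BITS would not change that result.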


First fault, null pointer dereference:

please wait... (determining panic task)         
Program received signal SIGSEGV, Segmentation fault.
x86_64_get_dumpfile_stack_frame (rsp=0x7fffffffcc58, rip=0x7fffffffcc50,
    bt_in=0x7fffffffcce0) at x86_64.c:4183
4183                    ur_rip = ULONG(user_regs +
(gdb) p user_regs
$1 = 0x0


Workaround, check that bt->machdep is not NULL:

diff -Nupr crash-5.1.8/x86_64.c crash-5.1.8.new/x86_64.c
--- crash-5.1.8/x86_64.c        2011-09-16 15:01:12.000000000 -0400
+++ crash-5.1.8.new/x86_64.c    2011-09-28 14:12:45.347188571 -0400
@@ -4178,7 +4178,7 @@ x86_64_get_dumpfile_stack_frame(struct b
                                goto skip_stage;
                        }
                }
-       } else if (ELF_NOTES_VALID()) {
+       } else if (ELF_NOTES_VALID() && bt->machdep) {
                user_regs = bt->machdep;
                ur_rip = ULONG(user_regs +
                        OFFSET(user_regs_struct_rip));


Second fault, a curiously large n_descsz in elf note header:

please wait... (determining panic task)         
Program received signal SIGSEGV, Segmentation fault.
get_regs_from_note (note=0xd26472 "\b", ip=0x7fffffffc4e0, sp=0x7fffffffc4e8)
    at netdump.c:2221
2221            *sp = ULONG(user_regs + offset_sp);
(gdb) p *(Elf64_Nhdr *)note
$1 = {n_namesz = 8, n_descsz = 3438804992, n_type = 8}


Workaround, do not attempt reading registers from elf notes (this chunk
of code was not present in crash 5.1.1):

diff -Nupr crash-5.1.8/netdump.c crash-5.1.8.new/netdump.c
--- crash-5.1.8/netdump.c       2011-09-16 15:01:12.000000000 -0400
+++ crash-5.1.8.new/netdump.c   2011-09-28 14:14:43.687183734 -0400
@@ -2286,7 +2286,7 @@ get_netdump_regs_x86_64(struct bt_info *
 
                bt->machdep = (void *)user_regs;
        }
-
+#if 0
        if (ELF_NOTES_VALID() && 
            (bt->flags & BT_DUMPFILE_SEARCH) && DISKDUMP_DUMPFILE() && 
            (note = (Elf64_Nhdr *)
@@ -2305,7 +2305,7 @@ get_netdump_regs_x86_64(struct bt_info *
 
                bt->machdep = (void *)user_regs;
        }
-
+#endif
         machdep->get_stack_frame(bt, ripp, rspp);
 }
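
A less drastic alternative to the #if 0 might be to validate the note
header before reading registers from it. The following is only a
sketch under assumptions: note_is_sane() and note_align4() are
hypothetical helpers, not existing crash functions, and the caller is
assumed to know how many bytes of note data are available ('avail')
and the minimum descriptor size it needs ('min_desc').

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Round up to the 4-byte alignment used for note name and descriptor. */
static uint64_t note_align4(uint64_t n)
{
        return (n + 3) & ~(uint64_t)3;
}

/*
 * Hypothetical helper, not part of crash: return 1 if the note at 'note'
 * fits entirely within 'avail' bytes and its descriptor holds at least
 * 'min_desc' bytes, 0 otherwise.
 */
static int note_is_sane(const Elf64_Nhdr *note, size_t avail, size_t min_desc)
{
        uint64_t total;

        if (note == NULL || avail < sizeof(*note))
                return 0;

        /* 64-bit arithmetic so large 32-bit sizes cannot wrap around. */
        total = sizeof(*note) + note_align4(note->n_namesz) +
                note_align4(note->n_descsz);

        if (total > avail)
                return 0;

        return note->n_descsz >= min_desc;
}
```

A header like the one above (n_descsz = 3438804992) would be rejected
for any realistic 'avail', so the code could fall back to the older
stack-search path instead of dereferencing a bad pointer.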


Given the warning messages at the beginning of the process, I'm not sure
if I'm dealing with a corrupted or incomplete vmcore image.  Let me know
what additional info could be useful if this seems worth debugging
further.

Thanks,

-- Joe Lawrence