[Crash-utility] crash can not read ia64 lkcd v9 dump

Fri Dec 8 19:31:04 UTC 2006

Bernhard Walle wrote:

> Hello,
>
> * Dave Anderson <anderson at redhat.com> [2006-12-08 19:37]:
> > I wish I could help you out, but w/respect to anything associated with
> > LKCD, I'm only a receptor of patches from the LKCD developers on the
> > list, and I personally don't do any work with them at all.
> >
> > That whole ia64-specific lkcd_fix_mem.c file came from Troy Heber for
> > ia64 LKCD dumpfile support (troy.heber at hp.com).  Troy's an active
> > contributor on this list, and may have a quick answer -- I'm afraid I
> > have no idea what it does...
>
> Anyway, thanks for the information!
>
> > Yes I agree (presuming that eventually the list turns into 64-bit
> > symbol values).  But I don't see any attachment other than your pgp
> > signature?
>
> Sorry, I just forgot it. Here it is.
>
> And yes, the values are larger in the end. And also our
> /boot/System.map on IA64 has zero-prefixes. The map.4 file is
> generated by lkcd, and they simply don't use the zero prefixes. Which
> should not matter, IMO. So I vote for applying the attached patch.
>

Ah, Ok -- that makes sense now...

And your patch is sane -- and queued for the next release.
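
As a rough illustration of why the missing zero prefixes should not
matter: the sketch below assumes a reader that converts the address
column of each map line to an integer before comparing. It is not the
code from the patch or from crash itself; parse_map_line() and the
"__crc_example" lines are made up for the example.

```python
def parse_map_line(line):
    """Split one System.map-style line into (address, type, name)."""
    addr_text, sym_type, name = line.split()
    # int(..., 16) ignores leading zeros, so a zero-padded and an
    # unpadded spelling of the same address yield the same value.
    return int(addr_text, 16), sym_type, name


# Hypothetical pair of lines: same symbol, with and without padding.
padded = parse_map_line("00000000deadbeef A __crc_example")
unpadded = parse_map_line("deadbeef A __crc_example")
assert padded == unpadded
```

A reader that compares the address column as fixed-width text would
treat the two lines as different, which is the case a patch like this
has to cover.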

>
> > I'd never seen those types of __crc_ absolutes, probably because
> > they don't show up in our kernels.  The closest 2.6.5-era ia64
> > Red Hat kernel (2.6.5-1.358) System.map starts out like this:
> >
> > a00000010010c5a0 T I_BDEV
> > a0000001005bd140 r __ksymtab_I_BDEV
> > a0000001005c6ca8 r __kstrtab_I_BDEV
> > a00000010030ff60 T QUIRK_LIST
> > a0000001005c0950 r __ksymtab_QUIRK_LIST
> > a0000001005cb698 r __kstrtab_QUIRK_LIST
> > a000000100714ff8 S ROOT_DEV
> > a0000001005bb020 r __ksymtab_ROOT_DEV
> > a0000001005c4248 r __kstrtab_ROOT_DEV
> > a00000010030fd00 T SELECT_DRIVE
> > a0000001005c0920 r __ksymtab_SELECT_DRIVE
> > a0000001005cb660 r __kstrtab_SELECT_DRIVE
> > a00000010030fde0 T SELECT_INTERRUPT
> >
> > Whatever... maybe a different build CONFIG or something?
>
> Hm ..., I also don't understand why the /boot/System.map of the same
> kernel isn't identical to the map.4 file generated by lkcd. In fact,
> map.4 is missing symbols and gdb fails to load. But even with the
> /boot/System.map of the right kernel, it doesn't work (i.e. backtrace
> is complete garbage).
>

I'm guessing that the backtraces of the active tasks are bogus,
but all of the sleeping tasks' backtraces are OK?  If the LKCD
dump operation does *not* force the panic task and the other
currently-active tasks to drop a switch_stack on their stacks,
you won't get a usable backtrace for them.  The panic task and
active tasks in the netdump, diskdump and kdump facilities all run
through an unw_init_running() as part of their shutdown procedures,
and each cpu stores the address of its switch_stack in its
current->thread.ksp field.  Then the ia64 backtrace needs no
special handling between active and non-active tasks.

I would have thought that LKCD would do the same kind
of thing?  If the LKCD facility *does* do that, then
it's a matter of finding the location of the switch_stack
on the kernel stack.

BTW, worst case, you can get a rough idea of what's going
on by using "bt -t", which dumps all of the kernel text addresses
found from just above the task_struct to the end of the stack.

For example, here's an "echo c > /proc/sysrq-trigger" kdump,
where you get a clear backtrace:

crash> bt
PID: 3235   TASK: e0000040484a0000  CPU: 0   COMMAND: "bash"
 #0 [BSP:e0000040484a13e8] machine_kexec at a000000100058a10
 #1 [BSP:e0000040484a13c8] crash_kexec at a0000001000cbea0
 #2 [BSP:e0000040484a13a0] sysrq_handle_crashdump at a0000001003a0680
 #3 [BSP:e0000040484a1350] __handle_sysrq at a00000010039fec0
 #4 [BSP:e0000040484a1320] write_sysrq_trigger at a0000001001e3390
 #5 [BSP:e0000040484a12d0] vfs_write at a000000100156800
 #6 [BSP:e0000040484a1258] sys_write at a000000100157350
 #7 [BSP:e0000040484a1258] ia64_ret_from_syscall at a00000010000c560
  EFRAME: e0000040484a7e40
      B0: 2000000000152820      CR_IIP: a000000000010620
 CR_IPSR: 00001213085a6010      CR_IFS: 0000000000000008
  AR_PFS: c000000000000008      AR_RSC: 000000000000000f
 AR_UNAT: 0000000000000000     AR_RNAT: 0000000000000000
  AR_CCV: 0000000000000000     AR_FPSR: 0009804c8a70033f
  LOADRS: 0000000001b80000 AR_BSPSTORE: 60000fff7fffc380
      B6: 2000000000218cc0          B7: a000000000010640
      PR: 0000000000590a41          R1: 2000000000290238
      R2: c000000000001fc7          R3: 60000ffffe76b6f0
      R8: 0000000000000001          R9: 0000000000000000
     R10: 0000000000000000         R11: c000000000000512
     R12: 60000ffffe76b6d0         R13: 200000000004f790
     R14: 0000000000000063         R15: 0000000000000403
     R16: 60000000000641ff         R17: 60000ffffe76b6b0
     R18: 0000000000000000         R19: 6000000000064210
     R20: 0000000000000001         R21: 6000000000030063
     R22: 2000000000636e79         R23: 6000000000064200
     R24: 0000000000000010         R25: 0000000000000000
     R26: c000000000000004         R27: 000000000000000f
     R28: a000000000010620         R29: 00001213085a6010
     R30: 0000000000000008         R31: 00000000005a0a41
      F6: 000000000000000000000     F7: 000000000000000000000
      F8: 000000000000000000000     F9: 000000000000000000000
     F10: 000000000000000000000    F11: 000000000000000000000
 #8 [BSP:e0000040484a1258] __kernel_syscall_via_break at a000000000010620
crash>

But if I do a "bt -t" on the same task, because the ia64 BSP area
is just above the task_struct, you can see the trace in kind of a
"reverse order":

crash> bt -t
PID: 3235   TASK: e0000040484a0000  CPU: 0   COMMAND: "bash"
              START: machine_kexec at a000000100058a10
  [e0000040484a12b8] ia64_ret_from_syscall at a00000010000c560
  [e0000040484a1308] sys_write at a000000100157350
  [e0000040484a1338] vfs_write at a000000100156800
  [e0000040484a1388] write_sysrq_trigger at a0000001001e3390
  [e0000040484a13b0] __handle_sysrq at a00000010039fec0
  [e0000040484a13d0] sysrq_handle_crashdump at a0000001003a0680
  [e0000040484a13e0] __handle_sysrq at a00000010039fe70
  [e0000040484a13f0] crash_kexec at a0000001000cbea0
  [e0000040484a1420] machine_kexec at a000000100058a10
  [e0000040484a1450] unw_init_running at a00000010000cdb0
  [e0000040484a1488] ia64_machine_kexec at a000000100058c60
  [e0000040484a14a8] ia64_handle_irq at a000000100011cd0
  [e0000040484a14d8] ia64_handle_irq at a000000100011c50
  [e0000040484a14f8] __do_IRQ at a0000001000e4120
  [e0000040484a1508] irq_exit at a000000100087220
  [e0000040484a1538] iosapic_end_level_irq at a00000010004f730
  [e0000040484a1550] __do_IRQ at a0000001000e4070
  [e0000040484a1580] do_softirq at a000000100087150
  [e0000040484a15c0] __do_softirq at a000000100086f90
  [e0000040484a1620] net_rx_action at a00000010051efc0
  [e0000040484a1630] e1000_check_options at a000000200965e18
  [e0000040484a16d0] e1000_clean at a00000020093e8e0
  [e0000040484a16e0] e1000_check_options at a000000200965e18
  [e0000040484a16f0] ip_rcv at a000000100568e40
  [e0000040484a1710] e1000_clean_rx_irq at a000000200944cd0
  [e0000040484a1770] __do_softirq at a000000100086f90
  [e0000040484a17c8] net_rx_action at a00000010051efc0
  [e0000040484a17d8] e1000_check_options at a000000200965e18
  [e0000040484a1880] e1000_clean at a00000020093e8e0
  [e0000040484a1890] e1000_check_options at a000000200965e18
  [e0000040484a18a0] ip_rcv at a000000100568e40
  [e0000040484a18c0] e1000_clean_rx_irq at a000000200944cd0
  [e0000040484a18f8] sync_buffer at a00000010015cb40
  [e0000040484a1910] io_schedule at a000000100620bd0
  [e0000040484a1940] __delayacct_blkio_start at a0000001000ebc70
  [e0000040484a19b8] io_schedule at a000000100620c00
  [e0000040484a78d0] machine_kexec at a000000100058a10
  [e0000040484a7c10] machine_kexec at a000000100058a10
  [e0000040484a7ca0] schedule at a00000010061f7c0
crash>
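
The kind of scan that "bt -t" performs can be sketched roughly as
follows. This is only an illustration, not crash's implementation:
it assumes 8-byte little-endian stack slots and a single contiguous
kernel text range, and scan_for_text_addresses(), the stack base and
the range limits are all made-up names and values.

```python
import struct


def scan_for_text_addresses(stack_bytes, stack_base, text_start, text_end):
    """Return (slot_address, value) for every 8-byte stack word whose
    value lies inside the half-open range [text_start, text_end)."""
    hits = []
    for offset in range(0, len(stack_bytes) - 7, 8):
        # Assumption: slots are 64-bit little-endian words.
        (value,) = struct.unpack_from("<Q", stack_bytes, offset)
        if text_start <= value < text_end:
            hits.append((stack_base + offset, value))
    return hits


# Hypothetical four-slot stack: two words look like text addresses.
words = [0, 0xA000000100058A10, 0x1234, 0xA00000010000C560]
stack = struct.pack("<4Q", *words)
found = scan_for_text_addresses(
    stack,
    stack_base=0xE0000040484A12B0,   # made-up slot address
    text_start=0xA000000100000000,   # assumed start of kernel text
    text_end=0xA000000100700000,     # assumed end of kernel text
)
for slot, value in found:
    print("[%x] %x" % (slot, value))
```

Because every word that merely looks like a text address is reported,
stale values left over from earlier calls show up as well, which is
why the listing above contains entries that are not part of the live
call chain.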

But -- since ia64 is the only processor for which you can
get real, dependable backtraces, it would be nice if they
could work for LKCD dumpfiles.

Dave

>
> But first I'll fix the header format which _is_ different in crash and
> our SLES9 kernel (and lkcdutils), and if it then doesn't work I'll
> come back to the system maps.
>
> Thanks for your help!
>
> Regards,
>   Bernhard