[Crash-utility] Re: [RFC] Crash patch for DWARF CFI based unwind support

Thu Oct 26 15:46:17 UTC 2006

Rachita Kothiyal wrote:

>
> > Hi Rachita,
> >
> > I've only been able to test this on a live system that has __start_unwind
> > and __end_unwind symbols, so I don't know what a backtrace with an
> > in-kernel exception frame, or a backtrace with a transition to the x86_64
> > IRQ stack or x86_64 exception stacks, would look like.  If you have
> > an example, I'd be interested in seeing how they get handled.
>
> Hi Dave
>
> I was trying to use LKDTM to create various scenarios for crash dump.
> To start with is the case of panic() in an interrupt context. Here I
> am inducing a panic in handle_IRQ_event(), where I am registering a
> jprobe. jp_handle_irq_event() is the jprobe handler which in turn calls
> lkdtm_handler(). Running crash on the dump gives the following:
>
> crash> bt
> PID: 3898   TASK: ffff81022e988e20  CPU: 0   COMMAND: "slapd"
>  #0 [ffffffff8064bcf8] crash_kexec at ffffffff80152211
>  #1 [ffffffff8064bd40] machine_kexec at ffffffff8011a739
>  #2 [ffffffff8064bd80] crash_kexec at ffffffff8015222d
>  #3 [ffffffff8064be08] crash_kexec at ffffffff80152211
>  #4 [ffffffff8064be30] bust_spinlocks at ffffffff8011fd6d
>  #5 [ffffffff8064be40] panic at ffffffff80131410
>  #6 [ffffffff8064beb0] cdrom_pc_intr at ffffffff802ebe68
>  #7 [ffffffff8064bef0] ide_intr at ffffffff802df26f
>  #8 [ffffffff8064bf30] lkdtm_handler at ffffffff8800230d
>  #9 [ffffffff8064bf40] jp_handle_irq_event at ffffffff880023e8
> #10 [ffffffff8064bf50] __do_IRQ at ffffffff801544f4
> #11 [ffffffff8064bf58] __do_softirq at ffffffff80136b8f
> #12 [ffffffff8064bf90] do_IRQ at ffffffff8010bda1
> --- <IRQ stack> ---
> #13 [ffff810229fd5f80] ret_from_intr at ffffffff80109b95
>     [exception RIP: unknown or invalid address]
>     RIP: 0000000000000000  RSP: 0000000000000000  RFLAGS: 00000000
>     RAX: ffffffffffffffff  RBX: 00002afe35608c98  RCX: 00002afe359f7be4
>     RDX: 0000000000000033  RSI: 0000000000000202  RDI: 00007fff754bfbe0
>     RBP: 000000000000000a   R8: 000055555590bca0   R9: 0000000000000000
>     R10: 00002afe35608c98  R11: 0000000000000001  R12: 0000000000000000
>     R13: 0000000000000000  R14: 00002afe3597b1e0  R15: 000055555590b760
>     ORIG_RAX: 000000000000002b  CS: 0000  SS: 0000
> bt: WARNING: possibly bogus exception frame
>     RIP: 00002afe359f7be4  RSP: 00007fff754bfbe0  RFLAGS: 00000202
>     RAX: 00002afe35608c98  RBX: 000055555590b760  RCX: 0000000000000001
>     RDX: 00002afe35608c98  RSI: 0000000000000000  RDI: 000055555590bca0
>     RBP: ffffffff80109c0b   R8: 000000000000000a   R9: 0000000000000000
>     R10: 0000000000000000  R11: 00002afe3597b1e0  R12: 000055555590b760
>     R13: 00007fff754bfd38  R14: 0000000000000001  R15: 000055555590b760
>     ORIG_RAX: ffffffffffffffff  CS: 0033  SS: 002b
> crash> set unwind on
> unwind: on
> crash> bt
> PID: 3898   TASK: ffff81022e988e20  CPU: 0   COMMAND: "slapd"
>  #0 [ffffffff8064bd88] crash_kexec at ffffffff80152211
>  #1 [ffffffff8064be48] panic at ffffffff80131410
>  #2 [ffffffff8064bf38] lkdtm_handler at ffffffff8800230d
> --- <IRQ stack> ---
>  #3 [ffff810229fd5f80] ret_from_intr at ffffffff80109b95
>     [exception RIP: unknown or invalid address]
>     RIP: 0000000000000000  RSP: 0000000000000000  RFLAGS: 00000000
>     RAX: ffffffffffffffff  RBX: 00002afe35608c98  RCX: 00002afe359f7be4
>     RDX: 0000000000000033  RSI: 0000000000000202  RDI: 00007fff754bfbe0
>     RBP: 000000000000000a   R8: 000055555590bca0   R9: 0000000000000000
>     R10: 00002afe35608c98  R11: 0000000000000001  R12: 0000000000000000
>     R13: 0000000000000000  R14: 00002afe3597b1e0  R15: 000055555590b760
>     ORIG_RAX: 000000000000002b  CS: 0000  SS: 0000
> bt: WARNING: possibly bogus exception frame
>  #4 [ffff810229fd5f80] common_interrupt at ffffffff80109b95
>     RIP: 00002afe359f7be4  RSP: 00007fff754bfbe0  RFLAGS: 00000202
>     RAX: 00002afe35608c98  RBX: 000055555590b760  RCX: 0000000000000001
>     RDX: 00002afe35608c98  RSI: 0000000000000000  RDI: 000055555590bca0
>     RBP: ffffffff80109c0b   R8: 000000000000000a   R9: 0000000000000000
>     R10: 0000000000000000  R11: 00002afe3597b1e0  R12: 000055555590b760
>     R13: 00007fff754bfd38  R14: 0000000000000001  R15: 000055555590b760
>     ORIG_RAX: ffffffffffffffff  CS: 0033  SS: 002b
> crash>
>
> Comments?
>

Not really.  I don't understand why IRQ exception frame contents
look the way they do.  The crash utility's "irq_eframe" value,
which gets assigned from a fixed location at the top of the
interrupt stack, is obviously pointing to a bogus exception frame.
There might be something different in newer kernels, although I
don't see it in 2.6.18?

If you do a "help -m", the very last line shows the base address
of each per-cpu interrupt stack, as in:

crash> help -m
...
                          ibase[cpus]:
   ffffffff80453d00 0000010037e28000 0000010037e60000 0000010037e94000
crash>

Each of the interrupt stacks are 16k long, so you can
look at them with a "rd" with a count of 2048.  The linkage
between the interrupt stack and the stack from where it came
is supposed to be at the first word just below the top 64 words
on that stack.

So if I take cpu 2's interrupt stack from above, which is at
10037e60000, and dump it using "rd -s", I see this:

crash> rd -s 10037e60000 2048
...
     10037e63f90:  0000007fbffff4b0 0000007fbffff410
     10037e63fa0:  0000000000000046 0000000000000046
     10037e63fb0:  apic_timer_interrupt+0x85 000001003741df58
     10037e63fc0:  0000000000000000 0000000000000000
     10037e63fd0:  0000000000000000 0000000000000000
     10037e63fe0:  0000000000000000 0000000000000000
     10037e63ff0:  0000000000000000 0000000000000000
crash>

The linkage to the previous stack is at IRQ stack address
10037e63fb8, which points back to a process stack address of
1003741df58.  That process stack address is also a pointer
to the exception frame that was laid down there:

crash> pt_regs 000001003741df58
struct pt_regs {
  r15 = 0x0,
  r14 = 0x1003741df58,
  r13 = 0x7fbfffe840,
  r12 = 0xa8,
  rbp = 0x17dbfc0,
  rbx = 0xffffffff801103ce,
  r11 = 0x246,
  r10 = 0x31d072f928,
  r9 = 0x4587d9,
  r8 = 0x3741df58,
  rax = 0xffffffffffffffda,
  rcx = 0xffffffffffffffff,
  rdx = 0xa8,
  rsi = 0x17dbfc0,
  rdi = 0x4,
  orig_rax = 0x0,
  rip = 0x31d05b85b2,
  cs = 0x33,
  eflags = 0x246,
  rsp = 0x7fbfffe808,
  ss = 0x2b
}

So the last time an interrupt occurred on this
cpu, a program was running in user space (cs 0x33)
when a timer interrupt occurred.

If it had occurred in kernel mode, a kernel mode
exception frame should be there,  So for example,
if I take cpu 3's interrupt stack from above, it
shows this:

crash> rd -s 0000010037e94000 2048
...
     10037e97f90:  0000000000000000 0000000000000000   ................
     10037e97fa0:  0000000000000046 0000000000000046   F.......F.......
     10037e97fb0:  apic_timer_interrupt+0x85 0000010037e83e98   .........>.7....
     10037e97fc0:  0000000000000000 0000000000000000   ................
     10037e97fd0:  0000000000000000 0000000000000000   ................
     10037e97fe0:  0000000000000000 0000000000000000   ................
     10037e97ff0:  0000000000000000 0000000000000000   ................
crash>

And in this case, the timer interrupt occurred while running
in kernel mode (cs 0x10):

crash> pt_regs 0000010037e83e98
struct pt_regs {
  r15 = 0x0,
  r14 = 0x0,
  r13 = 0x0,
  r12 = 0x0,
  rbp = 0x1,
  rbx = 0x0,
  r11 = 0x5,
  r10 = 0x0,
  r9 = 0x8,
  r8 = 0x10037e82000,
  rax = 0x0,
  rcx = 0x0,
  rdx = 0x0,
  rsi = 0x1003f047030,
  rdi = 0x1000102a7e0,
  orig_rax = 0xffffffffffffffef,
  rip = 0xffffffff8010e84c,
  cs = 0x10,
  eflags = 0x246,
  rsp = 0x10037e83f48,
  ss = 0x18
}
crash>

The rip above shows that the cpu was in idle:

crash> sym 0xffffffff8010e84c
ffffffff8010e84c (t) mwait_idle+0x56  include/asm/thread_info.h: 63
crash>

So I cannot explain what's going on in your kernel,
where it looks like that basic stack-linkage assumption
is being violated, or has changed somehow?  Maybe kdump
or jprobes has done something?  I don't know what else to
tell you.

>
> In the stacktrace with 'unwind on', I was expecting to see jp_handle_irq_event
> appear too (as frame 3)..Could my using a module to register the probe be the
> reason ?
>

I don't know.  Maybe since the jprobes procedure mucks with
the text, it somehow alter the unwind mechanism?

Helpless,
  Dave

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/crash-utility/attachments/20061026/9b917e3b/attachment.htm>