[Crash-utility] Re: Problem with using crash 4.0-2.21 on ppc

Haren Myneni haren at us.ibm.com
Fri Feb 24 02:51:03 UTC 2006


Dave Anderson wrote:

> Haren Myneni wrote:
>
>> Rachita Kothiyal wrote:
>>
>> >On Thu, Feb 23, 2006 at 09:49:37AM -0500, Dave Anderson wrote:
>> >
>> >
>> >>Ok, then I guess I'll take that as a thumbs-up.
>> >>
>> >>Waiting on Rachita's go-ahead...
>> >>
>> >>
>> >
>> >Dave,
>> >
>> >After the application of the patch (posted by Haren)
>> >on crash-4.0-2.21, I am now able to open the dump using crash
>> >for analysis.
>> >
>> >The following may be unrelated to the present discussion, but
>> >it is an observation:
>> >
>> >When I do 'bt -a' I get the following error on one of the cpus:
>> >
>> >PID: 2871   TASK: c000000161d05800  CPU: 4   COMMAND: "klogd"
>> >bt: invalid kernel virtual address: ff807a50  type: "Regs NIP value"
>> >
>> >
>> Rachita,
>>     As I mentioned before, this task should be running in user space.
>> You should notice the similar kind of stack trace even using GDB. Better
>> to give proper error message here.
>>  
>>
> Is ff807a50 typically a legitimate user-space stack address
> in ppc64 user VM?  You could probably run the address
> through IN_TASK_VMA(), and if it is a valid user-space
> stack address, just indicate that the process was running
> in user-space.

Yes, ff807a50 is a user-space address; kernel addresses on PPC64 start at 
c000000000000000. Not only should we print a message saying that the task 
was "running in user space", but 'bt -a' should also go on to display the 
traces for the other active tasks. That is a bug too. I will send the fix 
ASAP.
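
As a rough illustration of the check being discussed (a sketch only, not 
the crash source and not the fix I will send): the function names below 
are hypothetical, and the plain comparison against the kernel base is a 
stand-in for running the address through IN_TASK_VMA() as Dave suggests. 
The only fact it relies on is that PPC64 kernel addresses start at 
c000000000000000.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumption: ppc64 kernel virtual addresses start at this value. */
#define PPC64_KERNEL_BASE 0xc000000000000000ULL

/* Hypothetical helper: nonzero if addr lies below the kernel base. */
static int is_user_address(uint64_t addr)
{
    return addr < PPC64_KERNEL_BASE;
}

/*
 * Hypothetical helper: format a one-line description of a saved NIP
 * into buf instead of failing with "invalid kernel virtual address".
 */
static const char *describe_nip(uint64_t nip, char *buf, size_t len)
{
    if (is_user_address(nip))
        snprintf(buf, len, "NIP %llx: task was running in user space",
                 (unsigned long long)nip);
    else
        snprintf(buf, len, "NIP %llx: kernel address",
                 (unsigned long long)nip);
    return buf;
}
```

With a check like this, the klogd task on CPU 4 would be reported as 
running in user space, and the backtrace loop could continue with the 
remaining CPUs.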

> Now I understand why you (ppc64) dump the register set
> first, because all the other processor types would show
> a stack trace emanating from user-space down into the
> reception of the IP interrupt issued by the panicking
> processor.
>
Yes, displaying regs is already part of stack trace on other archs.

>>  
>> About your other issue: I could not reproduce it.
>>
>> crash 4.0-2.21
>> Copyright (C) 2002, 2003, 2004, 2005, 2006  Red Hat, Inc.
>> Copyright (C) 2004, 2005, 2006  IBM Corporation
>> Copyright (C) 1999-2006  Hewlett-Packard Co
>> Copyright (C) 2005  Fujitsu Limited
>> Copyright (C) 2005  NEC Corporation
>> Copyright (C) 1999, 2002  Silicon Graphics, Inc.
>> Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
>> This program is free software, covered by the GNU General Public License,
>> and you are welcome to change it and/or distribute copies of it under
>> certain conditions.  Enter "help copying" to see the conditions.
>> This program has absolutely no warranty.  Enter "help warranty" for details.
>>
>> GNU gdb 6.1
>> Copyright 2004 Free Software Foundation, Inc.
>> GDB is free software, covered by the GNU General Public License, and you are
>> welcome to change it and/or distribute copies of it under certain conditions.
>> Type "show copying" to see the conditions.
>> There is absolutely no warranty for GDB.  Type "show warranty" for details.
>> This GDB was configured as "powerpc64-unknown-linux-gnu"...
>>
>> crash: pglist_data.node_mem_map structure member does not exist.
>> crash: certain memory-related commands will fail or display invalid data
>>
>>       KERNEL: /home/hbabu/2616-rc2-k1/vmlinux
>>     DUMPFILE: /home/vmcore_2616_rc2_0207
>>         CPUS: 2
>>         DATE: Tue Feb  7 16:56:08 2006
>>       UPTIME: 00:00:09
>> LOAD AVERAGE: 0.05, 0.24, 0.12
>>        TASKS: 57
>>     NODENAME: elm3a135
>>      RELEASE: 2.6.16-rc2-kexec-k1
>>      VERSION: #6 SMP Tue Feb 7 16:46:10 PST 2006
>>      MACHINE: ppc64  (unknown Mhz)
>>       MEMORY: 2.9 GB
>>        PANIC: "SysRq : Trigger a crashdump"
>>          PID: 11076
>>      COMMAND: "kpanic"
>>         TASK: c00000000bc6d800  [THREAD_INFO: c0000000ac504000]
>>          CPU: 1
>>        STATE: TASK_RUNNING (SYSRQ)
>>
>> crash> bt
>> PID: 11076  TASK: c00000000bc6d800  CPU: 1   COMMAND: "kpanic"
>>
>>  R0:  0000000000000000    R1:  c0000000ac507970    R2:  c00000000077a4a0
>>  R3:  c0000000ac5079e0    R4:  0000000000000000    R5:  0000000000000000
>>  R6:  756d700d0a657220    R7:  6120637261736864    R8:  0000000000000000
>>  R9:  c0000000007b0fa0    R10: 0000000000000000    R11: c0000000007b0fa8
>>  R12: 8000000000001032    R13: c0000000005a5d80    R14: 0000000000000000
>>  R15: 0000000000000000    R16: 00000000100bbf08    R17: 00000000100bbeb8
>>  R18: 0000000010070000    R19: 0000000000000000    R20: 0000000010046720
>>  R21: 000000000000001f    R22: 00000000100040e8    R23: 0000000010004d74
>>  R24: 8000000000009032    R25: 0000000000000000    R26: 0000000000000000
>>  R27: 0000000000000063    R28: 0000000000000009    R29: 0000000000000000
>>  R30: c0000000005e1560    R31: c0000000b96dd000
>>  NIP: c0000000000777a8    MSR: 8000000000001032    OR3: c0000000ac7202f8
>>  CTR: c000000000278b04    LR:  c000000000278b18    XER: 0000000000000000
>>  CCR: c0000000ac507b90    MQ:  0000000000000000    DAR: 0000000000000063
>>  DSISR: 0000000000000009     Syscall Result: 0000000000000000
>>  NIP [c0000000000777a8] .crash_kexec
>>  LR  [c000000000278b18] .sysrq_handle_crashdump
>>
>>  #0 [c0000000ac507970] .crash_kexec at c0000000000777d0
>>  #1 [c0000000ac507b50] .sysrq_handle_crashdump at c000000000278b18
>>  #2 [c0000000ac507bd0] .__handle_sysrq at c0000000002789c0
>>  #3 [c0000000ac507c80] .write_sysrq_trigger at c000000000105478
>>  #4 [c0000000ac507d00] .vfs_write at c0000000000b72ec
>>  #5 [c0000000ac507d90] .sys_write at c0000000000b74c4
>>  #6 [c0000000ac507e30] syscall_exit at c0000000000086f8
>>  syscall  [c01] exception frame:
>>  R0:  0000000000000004    R1:  00000000ffd109d0    R2:  000000004001ee60
>>  R3:  0000000000000001    R4:  000000001004f4a8    R5:  0000000000000002
>>  R6:  000000001004f3a8    R7:  0000000000000011    R8:  000000001004f530
>>  R9:  0000000000000000    R10: 0000000000000000    R11: 0000000000000000
>>  R12: 0000000000000000    R13: 000000001004c9d8
>>  NIP: 000000000ff691e8    MSR: 000000000200f032    OR3: 0000000000000001
>>  CTR: 00000000100040ec    LR:  000000001000432c    XER: 0000000020000000
>>  CCR: 0000000048008448    MQ:  c00000000077a4a0    DAR: 00000000100040ec
>>  DSISR: 0000000040000000     Syscall Result: 0000000000000000
>>
>> crash> set -c 0
>>     PID: 0
>> COMMAND: "swapper"
>>    TASK: c0000000005a5050  (1 of 2)  [THREAD_INFO: c000000000558000]
>>     CPU: 0
>>   STATE: TASK_RUNNING (ACTIVE)
>> crash> bt
>> PID: 0      TASK: c0000000005a5050  CPU: 0   COMMAND: "swapper"
>>
>>  R0:  0000000000000000    R1:  c00000000055bd80    R2:  c00000000077a4a0
>>  R3:  0000000000000000    R4:  c0000000005a5350    R5:  0000000000000002
>>  R6:  0000000024004042    R7:  0000000000000000    R8:  c00000000055ba00
>>  R9:  c0000000005a4e88    R10: 0000008000000000    R11: 00003fef00100649
>>  R12: 0000000028004028    R13: c0000000005a5b80
>>  NIP: c000000000018648    MSR: 8000000000009032    OR3: 0000000000000000
>>  CTR: 0000000000000000    LR:  c0000000000186b8    XER: 0000000020000000
>>  CCR: 0000000044004042    MQ:  c0000000005a5050    DAR: c0000000b780b780
>>  DSISR: c0000000000186b8     Syscall Result: 0000000000000000
>>  NIP [c000000000018648] .default_idle
>>
>>  #0 [c00000000055bd80] .default_idle at c0000000000186b8
>>  #1 [c00000000055be00] .cpu_idle at c0000000000184f4
>>  #2 [c00000000055be70] .rest_init at c0000000000092f4
>>  #3 [c00000000055bef0] .start_kernel at c000000000502760
>>  #4 [c00000000055bf90] .hmt_init at c000000000008574
>> crash> set -c 1
>>     PID: 11076
>> COMMAND: "kpanic"
>>    TASK: c00000000bc6d800  [THREAD_INFO: c0000000ac504000]
>>     CPU: 1
>>   STATE: TASK_RUNNING (SYSRQ)
>> crash> bt
>> PID: 11076  TASK: c00000000bc6d800  CPU: 1   COMMAND: "kpanic"
>>
>>  R0:  0000000000000000    R1:  c0000000ac507970    R2:  c00000000077a4a0
>>  R3:  c0000000ac5079e0    R4:  0000000000000000    R5:  0000000000000000
>>  R6:  756d700d0a657220    R7:  6120637261736864    R8:  0000000000000000
>>  R9:  c0000000007b0fa0    R10: 0000000000000000    R11: c0000000007b0fa8
>>  R12: 8000000000001032    R13: c0000000005a5d80    R14: 0000000000000000
>>  R15: 0000000000000000    R16: 00000000100bbf08    R17: 00000000100bbeb8
>>  R18: 0000000010070000    R19: 0000000000000000    R20: 0000000010046720
>>  R21: 000000000000001f    R22: 00000000100040e8    R23: 0000000010004d74
>>  R24: 8000000000009032    R25: 0000000000000000    R26: 0000000000000000
>>  R27: 0000000000000063    R28: 0000000000000009    R29: 0000000000000000
>>  R30: c0000000005e1560    R31: c0000000b96dd000
>>  NIP: c0000000000777a8    MSR: 8000000000001032    OR3: c0000000ac7202f8
>>  CTR: c000000000278b04    LR:  c000000000278b18    XER: 0000000000000000
>>  CCR: c0000000ac507b90    MQ:  0000000000000000    DAR: 0000000000000063
>>  DSISR: 0000000000000009     Syscall Result: 0000000000000000
>>  NIP [c0000000000777a8] .crash_kexec
>>  LR  [c000000000278b18] .sysrq_handle_crashdump
>>
>>  #0 [c0000000ac507970] .crash_kexec at c0000000000777d0
>>  #1 [c0000000ac507b50] .sysrq_handle_crashdump at c000000000278b18
>>  #2 [c0000000ac507bd0] .__handle_sysrq at c0000000002789c0
>>  #3 [c0000000ac507c80] .write_sysrq_trigger at c000000000105478
>>  #4 [c0000000ac507d00] .vfs_write at c0000000000b72ec
>>  #5 [c0000000ac507d90] .sys_write at c0000000000b74c4
>>  #6 [c0000000ac507e30] syscall_exit at c0000000000086f8
>>  syscall  [c01] exception frame:
>>  R0:  0000000000000004    R1:  00000000ffd109d0    R2:  000000004001ee60
>>  R3:  0000000000000001    R4:  000000001004f4a8    R5:  0000000000000002
>>  R6:  000000001004f3a8    R7:  0000000000000011    R8:  000000001004f530
>>  R9:  0000000000000000    R10: 0000000000000000    R11: 0000000000000000
>>  R12: 0000000000000000    R13: 000000001004c9d8
>>  NIP: 000000000ff691e8    MSR: 000000000200f032    OR3: 0000000000000001
>>  CTR: 00000000100040ec    LR:  000000001000432c    XER: 0000000020000000
>>  CCR: 0000000048008448    MQ:  c00000000077a4a0    DAR: 00000000100040ec
>>  DSISR: 0000000040000000     Syscall Result: 0000000000000000
>>
>> crash>
>>
>> Probably, this issue is showing up on your system (has 8 CPUS) since my
>> system is having only 2 CPUs. We need to investigate.
>>  
>>
> That's all I could think of as well.  Rachita also didn't mention
> whether he could do "set <task|pid>" of that same task, and then
> get a backtrace?  But a crash-gdb backtrace would be helpful.

It looks like her system has 16 CPUs (I believe with SMT). I also 
checked whether enabling SMT would cause a problem with 
paca[cpu#].dataoffset. From what I know so far, paca[] entries are 
created for SMT threads as well.

>>  
>> Dave, I tested very few commands on PPC64 vmcore. Where as Rachita is
>> doing more testing. We might see some bugs which I have not encountered.
>> We will get back to you with patches as we find bugs.
>>  
>
> That's understood and not a problem -- especially on kernels
> that are beyond the RHEL4 era.  Do you want me to go ahead
> and put out a new release with your paca fix?

Sure, if you have other fixes, or fixes for other archs. Otherwise, can 
we wait until early next week? I am wondering what is causing Rachita's 
issue. Is it related to the same paca.dataoffset patch? I just want to 
make sure.

BTW, I ran the crash tool on a RHEL4 vmcore (not the most recent RHEL4 
update) to see whether I was breaking backward compatibility. A small fix 
is needed, which I somehow overlooked. Sorry. That is probably the reason 
I saved one RHEL4 vmcore and the corresponding vmlinux.debug.

Thanks
Haren


> Dave
>  
>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: crash-bt-back-compat.patch
Type: text/x-patch
Size: 411 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/crash-utility/attachments/20060223/4ba5fc11/attachment.bin>

