[Crash-utility] [ANNOUNCE][RFC] gcore extension module: user-mode process core dump

Mon Jan 31 00:09:57 UTC 2011

From: Dave Anderson <anderson at redhat.com>
Subject: Re: [ANNOUNCE][RFC] gcore extension module: user-mode process core dump
Date: Fri, 28 Jan 2011 09:31:50 -0500 (EST)

> 
> 
> ----- Original Message -----
> 
>> Also, I have a question about the fact that gcore hung during the
>> process of gathering note information.
>> 
>> I attempted to reproduce the bug on 2.6.35.10-74.fc14.x86_64 with
>> crash-5.0.6-2.fc14.x86_64 and crash-5.1.1, but it has not been
>> reproduced yet: gcore worked well for both crash versions.
>> 
>> I then retried using 2.6.34-2.fc14.x86_64, but it failed to boot in
>> the same environment as 2.6.35.10-74.fc14.x86_64.
>> 
>> So my question is: in what kind of environment did you see the
>> hang? I need to set up the same environment as yours. In the
>> Fedora 14 Alpha, the kernel version was already 2.6.35
>> according to the release notes:
>> 
>> http://fedoraproject.org/wiki/Fedora_14_Alpha_release_notes#Linux_Kernel_2.6.35
>> 
>> Also, it would be helpful if you could show me a backtrace taken
>> while gcore is hanging.
> 
> I retested it with the latest gcore.tar.bz2 using the same fc14 dumpfile
> and it works OK.  
> 

That's good news. This confirms that the cause is in restore_frame_pointer().

> I did re-verify that it hangs with the older version:
> 
> # ls -l /root/gcore.tar.bz2 gcore.tar.bz2
> -rw-r--r-- 1 root root 28666 Jan 24 11:05 /root/gcore.tar.bz2  <- hangs
> -rw-r--r-- 1 root root 29266 Jan 27 10:15 gcore.tar.bz2        <- works OK
> # 
> 
> (gdb) bt
> #0  0x0000003e838cd6a0 in __lseek_nocancel () from /lib64/libc.so.6
> #1  0x0000000000534fd8 in read_netdump (fd=-1, bufptr=0x7fffeb5977e0, cnt=8, addr=18446612134417074248, paddr=2102855752)
>     at netdump.c:526
> #2  0x000000000053b663 in read_kdump (fd=-1, bufptr=0x7fffeb5977e0, cnt=8, addr=18446612134417074248, paddr=2102855752)
>     at netdump.c:2553
> #3  0x000000000046bc1b in readmem (addr=18446612134417074248, memtype=1, buffer=0x7fffeb5977e0, size=8, 
>     type=0x2b95faf6d370 "restore_frame_pointer: resume rbp", error_handle=5) at memory.c:1849
> #4  0x00002b95faf6980c in restore_frame_pointer () from ./extensions/gcore.so
> #5  0x00002b95faf6a196 in restore_rest () from ./extensions/gcore.so
> #6  0x00002b95faf69d51 in genregs_get () from ./extensions/gcore.so
> #7  0x00002b95faf6585c in fill_thread_core_info () from ./extensions/gcore.so
> #8  0x00002b95faf65ccc in fill_note_info () from ./extensions/gcore.so
> #9  0x00002b95faf64755 in gcore_coredump () from ./extensions/gcore.so
> #10 0x00002b95faf6a95e in do_gcore () from ./extensions/gcore.so
> #11 0x00002b95faf6a7f9 in cmd_gcore () from ./extensions/gcore.so
> #12 0x0000000000454631 in exec_command () at main.c:674
> #13 0x00000000004544de in main_loop () at main.c:633
> #14 0x0000000000578b39 in captured_command_loop (data=0x3) at ./main.c:226
> #15 0x0000000000577cfb in catch_errors (func=0x578b30 <captured_command_loop>, func_args=0x0, errstring=0x82092c "", 
>     mask=<value optimized out>) at exceptions.c:520
> #16 0x0000000000579286 in captured_main (data=<value optimized out>) at ./main.c:924
> #17 0x0000000000577cfb in catch_errors (func=0x578b70 <captured_main>, func_args=0x7fffeb597f70, errstring=0x82092c "", 
>     mask=<value optimized out>) at exceptions.c:520
> #18 0x00000000005788d4 in gdb_main (args=0x7d56fb40) at ./main.c:939
> #19 0x0000000000578916 in gdb_main_entry (argc=<value optimized out>, argv=0x7d56fb40) at ./main.c:959
> #20 0x00000000004d2b7d in gdb_main_loop (argc=2, argv=0x7fffeb598478) at gdb_interface.c:78
> #21 0x0000000000454281 in main (argc=3, argv=0x7fffeb598478) at main.c:547
> (gdb)

Thanks for giving me a backtrace. It helps a lot.

It looks to me as if restore_frame_pointer() is looping here while
tracing the chain of frame pointers on the stack.

My guess is that the frame pointer values form a loop on the kernel
stack. Perhaps some of the frame pointers in the series are broken?
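
To illustrate the kind of guard that would avoid such a loop, here is
a minimal sketch in C. It is not the gcore code: walk_frame_pointers(),
the fp_walk_status codes and the in-memory copy of the stack are all
hypothetical names made up for this example. I am also assuming an
x86_64 stack that grows downward, so each saved rbp must be strictly
higher than the frame that holds it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes for this sketch; not part of crash or gcore. */
enum fp_walk_status { FP_WALK_OK, FP_WALK_BROKEN };

/*
 * Follow a chain of saved frame pointers inside a copy of a stack.
 *
 *   stack      - words copied from the stack being examined
 *   nwords     - number of words in that copy
 *   stack_base - address that stack[0] had in the original stack
 *   rbp        - frame pointer to start from
 *   last       - out: last valid frame pointer that was visited
 *
 * Assumption: the stack grows downward, so a sane chain only moves
 * toward higher addresses. Any saved value that stays inside the
 * stack but does not move upward is treated as a broken chain, which
 * also guarantees that the walk terminates.
 */
static enum fp_walk_status
walk_frame_pointers(const uint64_t *stack, size_t nwords,
                    uint64_t stack_base, uint64_t rbp, uint64_t *last)
{
    uint64_t stack_top = stack_base + nwords * sizeof(uint64_t);

    /* The starting point itself must be an aligned slot in the stack. */
    if (rbp < stack_base || rbp >= stack_top ||
        rbp % sizeof(uint64_t) != 0)
        return FP_WALK_BROKEN;

    for (;;) {
        uint64_t next = stack[(rbp - stack_base) / sizeof(uint64_t)];

        *last = rbp;

        /* The chain left this stack: treat that as the normal end. */
        if (next < stack_base || next >= stack_top)
            return FP_WALK_OK;

        /* Going backwards, standing still, or misaligned: broken. */
        if (next <= rbp || next % sizeof(uint64_t) != 0)
            return FP_WALK_BROKEN;

        rbp = next;
    }
}
```

With a check like this, a chain such as 0x1008 -> 0x1018 -> 0x1008 is
reported as broken instead of being followed forever.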

> 
> If you're still interested, I can make the vmlinux/vmcore available to you.

Yes, I'm still interested. Could you provide them to me? I need to
figure out the exact state of the kernel stack that leads to this
behaviour of restore_frame_pointer().

Thanks,
HATAYAMA Daisuke