[Crash-utility] [ANNOUNCE][RFC] gcore extension module: user-mode process core dump
Dave Anderson
anderson at redhat.com
Mon Jan 24 19:27:39 UTC 2011
----- Original Message -----
> The gcore extension module provides a means to create an ELF core dump for
> a user-mode process contained within a crash kernel dump. I designed
> it to behave like the kernel's ELF core dumper.
>
> For previous discussion, see:
> https://www.redhat.com/archives/crash-utility/2010-August/msg00001.html

A few observations...

I'll fix unwind_x86_64.h to prevent this build warning:

# make extensions
...
gcc -Wall -I.. -I./libgcore -fPIC -DX86_64 -c -o libgcore/gcore_x86.o libgcore/gcore_x86.c
In file included from libgcore/gcore_x86.c:19:
../unwind_x86_64.h:61:1: warning: "offsetof" redefined
In file included from libgcore/gcore_x86.c:17:
../defs.h:60:1: warning: this is the location of the previous definition
...

But the gcore.mk file should fail gracefully when built on unsupported
architectures. It ends up spewing ~200 lines of error messages when
attempted, for example, on a ppc64 machine:

# make extensions
gcc -m64 -Wall -I.. -I./libgcore -fPIC -DPPC64 -c -o libgcore/gcore_coredump.o libgcore/gcore_coredump.c
In file included from libgcore/gcore_coredump.c:17:
./libgcore/gcore_defs.h:355:1: warning: "ELF_NGREG" redefined
In file included from /usr/include/asm/sigcontext.h:13,
from /usr/include/bits/sigcontext.h:28,
from /usr/include/signal.h:339,
from ../defs.h:38,
from libgcore/gcore_coredump.c:16:
/usr/include/asm/elf.h:92:1: warning: this is the location of the previous definition
In file included from libgcore/gcore_coredump.c:17:
./libgcore/gcore_defs.h:356: error: invalid application of ‘sizeof’ to incomplete type ‘struct user_regs_struct’
./libgcore/gcore_defs.h:356: error: conflicting types for ‘elf_gregset_t’
/usr/include/asm/elf.h:124: note: previous declaration of ‘elf_gregset_t’ was here
./libgcore/gcore_defs.h:490: error: conflicting types for ‘__kernel_old_uid_t’
/usr/include/asm/posix_types.h:28: note: previous declaration of ‘__kernel_old_uid_t’ was here
./libgcore/gcore_defs.h:491: error: conflicting types for ‘__kernel_old_gid_t’
/usr/include/asm/posix_types.h:29: note: previous declaration of ‘__kernel_old_gid_t’ was here
libgcore/gcore_coredump.c:25: error: expected ‘)’ before ‘*’ token
libgcore/gcore_coredump.c:33: error: expected declaration specifiers or ‘...’ before ‘Elf_Ehdr’
... [ cut ] ...
./libgcore/gcore_defs.h:490: error: conflicting types for ‘__kernel_old_uid_t’
/usr/include/asm/posix_types.h:28: note: previous declaration of ‘__kernel_old_uid_t’ was here
./libgcore/gcore_defs.h:491: error: conflicting types for ‘__kernel_old_gid_t’
/usr/include/asm/posix_types.h:29: note: previous declaration of ‘__kernel_old_gid_t’ was here
make[3]: [gcore.so] Error 1 (ignored)
#
Your documentation implies that the command would only work on
certain kernel versions:
> Compared with the previous version, this release:
> - supports more kernel versions, and
> - collects register values more accurately (but still not perfect).
>
> Support Range
> =============
>
> |----------------+----------------------------------------------|
> | ARCH           | X86, X86_64                                  |
> |----------------+----------------------------------------------|
> | Kernel Version | RHEL4.8, RHEL5.5, RHEL6.0 and Vanilla 2.6.36 |
> |----------------+----------------------------------------------|

But, for example, on a 2.6.34-2.fc14 kernel (presumably unsupported),
it seems to work OK on some tasks, but not so well on others.
Here, the "less" command can be dumped OK:

crash> sys | grep RELEASE
RELEASE: 2.6.34-2.fc14.x86_64
crash> ps
... [ cut ] ...
> 2080 1490 0 ffff880079ed2480 RU 7.6 289900 159684 crash
2084 1 0 ffff880077a7a480 IN 0.1 248592 1936 rsyslogd
2090 2080 5 ffff880079ed4900 IN 0.0 105432 828 less
crash> gcore -v0 2090
Saved core.2090.less
crash>
But with the same (full) 2.6.34-2.fc14 dumpfile, it can't seem to handle
dumping the crash utility itself, and just hangs:
crash> swap
FILENAME TYPE SIZE USED PCT PRIORITY
/dev/dm-1 PARTITION 18579452k 0k 0% -1
crash> ps
... [ cut ] ...
> 2080 1490 0 ffff880079ed2480 RU 7.6 289900 159684 crash
2084 1 0 ffff880077a7a480 IN 0.1 248592 1936 rsyslogd
2090 2080 5 ffff880079ed4900 IN 0.0 105432 828 less
crash> gcore -v1 2080
gcore: Restoring the thread group ...
gcore: done.
gcore: Retrieving note information ...
< hangs forever >
...

I would have thought that it would either work-for-all or work-for-none
with respect to a particular kernel version?

In any case, if it's going to fail, perhaps there should be some mechanism
in place that would prevent it from hanging, and instead print a message
that the kernel version is not supported? Or if a particular data structure
is different than the "supported" versions, it should fail immediately?
Just a thought...
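
One way to get the fail-fast behaviour suggested above would be an up-front support check that runs before any dump work starts. The following is only a sketch in Python (the extension itself is C). The `SUPPORTED_BASE_VERSIONS` table, the `base_version()` and `check_supported()` helpers, and the wording of the message are all hypothetical, and matching on the base version of the RELEASE string is an assumption rather than how gcore actually decides:

```python
import re

# Hypothetical table: upstream base versions of the kernels named in the
# support range (RHEL4 = 2.6.9, RHEL5 = 2.6.18, RHEL6 = 2.6.32, vanilla 2.6.36).
SUPPORTED_BASE_VERSIONS = {(2, 6, 9), (2, 6, 18), (2, 6, 32), (2, 6, 36)}


def base_version(release):
    """Extract (major, minor, patch) from a RELEASE string such as
    '2.6.34-2.fc14.x86_64'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognized kernel release: %r" % release)
    return tuple(int(part) for part in match.groups())


def check_supported(release):
    """Return None if the kernel is supported, otherwise an error message
    to print before any dump work begins (hypothetical helper)."""
    if base_version(release) in SUPPORTED_BASE_VERSIONS:
        return None
    return "gcore: kernel version %s is not supported" % release


print(check_supported("2.6.34-2.fc14.x86_64"))
```

A version table alone would not catch a changed data structure within a "supported" version, so a real check would probably also need to verify the structure members the dumper relies on.
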
Also I note that "gcore -v7" fails -- shouldn't it be accepted as an argument?
crash> gcore -v7 2080
gcore: invalid vlevel: 7.
crash>
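
Since the help text lists 7 as the combination of progress (1), library error (2), and page fault (4), the vlevel reads as a bitmask, so presumably any value from 0 to 7 should be accepted. A minimal Python sketch of that validation follows. The constant names and `parse_vlevel()` are hypothetical and not taken from the gcore source:

```python
# Bit values taken from the vlevel table in the help text; names are hypothetical.
VERBOSE_PROGRESS = 0x1
VERBOSE_LIBRARY_ERROR = 0x2
VERBOSE_PAGE_FAULT = 0x4
VERBOSE_ALL = VERBOSE_PROGRESS | VERBOSE_LIBRARY_ERROR | VERBOSE_PAGE_FAULT


def parse_vlevel(text):
    """Parse a -v argument, accepting any combination of the three bits."""
    value = int(text, 0)  # accepts decimal as well as 0x.. and 0o.. forms
    if value < 0 or value & ~VERBOSE_ALL:
        raise ValueError("invalid vlevel: %s" % text)
    return value


print(parse_vlevel("7") == VERBOSE_ALL)
```

Rejecting only values with bits outside the mask means 3, 5, 6, and 7 are all valid alongside the single-bit levels.
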
Thanks,
Dave
> TODO
> ====
>
> I still have remaining tasks to do:
> - Improvement on register collection for active tasks
> - Improvement on callee-saved register collection on x86_64
> - Support core dump for tasks running in x86_32 compatibility mode
>
> Usage
> =====
>
> 1) Expand the source files under the extensions directory.
>
> Arrange the attached source files as shown below:
>
> ./extensions/gcore.c
> ./extensions/gcore.mk
> ./extensions/libgcore/gcore_coredump.c
> ./extensions/libgcore/gcore_coredump_table.c
> ./extensions/libgcore/gcore_defs.h
> ./extensions/libgcore/gcore_dumpfilter.c
> ./extensions/libgcore/gcore_global_data.c
> ./extensions/libgcore/gcore_regset.c
> ./extensions/libgcore/gcore_verbose.c
> ./extensions/libgcore/gcore_x86.c
>
> 2) Type ``make extensions''; ``gcore.so'' is then generated under the
> extensions directory.
>
> 3) Type ``extend gcore.so'' to load the gcore extension module.
>
> See the help message for actual usage; it is attached at the end of
> this mail.
>
> 4) Type ``extend -u gcore.so'' to unload the gcore extension module.
>
> Help Message
> ============
>
> NAME
> gcore - retrieve a process image as a core dump
>
> SYNOPSIS
> gcore
> gcore [-v vlevel] [-f filter] [pid | taskp]*
> This command retrieves a process image as a core dump.
>
> DESCRIPTION
>
> -v Display verbose information according to vlevel:
>
>       progress  library error  page fault
>   ---------------------------------------
>   0
>   1       x
>   2                   x
>   4                                x  (default)
>   7       x           x            x
>
> -f Specify the kinds of memory to be written into core dumps, according to
> the bitwise filter flag:
>
>       AP  AS  FP  FS  ELF HP  HS
>   ------------------------------
>   0
>   1   x
>   2       x
>   4           x
>   8               x
>  16           x       x
>  32                       x
>  64                           x
> 127   x   x   x   x   x   x   x
>
> AP Anonymous Private Memory
> AS Anonymous Shared Memory
> FP File-Backed Private Memory
> FS File-Backed Shared Memory
> ELF ELF header pages in file-backed private memory areas
> HP Hugetlb Private Memory
> HS Hugetlb Shared Memory
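
The filter value reads as a bitmask over the seven memory kinds in the legend above. A minimal Python sketch of decoding it, assuming each kind maps to the power of two shown in the table; `FILTER_BITS` and `decode_filter()` are hypothetical names, not part of gcore:

```python
# Bit values from the filter table, in column order (names from the legend).
FILTER_BITS = [
    (1, "AP"),
    (2, "AS"),
    (4, "FP"),
    (8, "FS"),
    (16, "ELF"),
    (32, "HP"),
    (64, "HS"),
]


def decode_filter(value):
    """Return the names of the memory kinds whose bits are set in value."""
    if not 0 <= value <= 127:
        raise ValueError("invalid filter: %d" % value)
    return [name for bit, name in FILTER_BITS if value & bit]


# 21 = 16 + 4 + 1, the value passed to -f in the examples.
print(decode_filter(21))
```
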
>
> If no pid or taskp is specified, gcore tries to retrieve the process
> image of the current task context.
>
> The file name of a generated core dump is core.<pid>, where pid is the
> PID of the specified process.
>
> For a multi-threaded process, gcore generates a core dump containing
> information for all threads, similar to the behaviour of the ELF core
> dumper in the Linux kernel.
>
> Note that PID means different things in crash and in Linux: the ps
> command in the crash utility displays the LWP, while the ps command in
> Linux displays the thread group ID, that is, the PID of the thread
> group leader.
>
> gcore provides a core dump filtering facility that lets users select
> which kinds of memory maps are included in the resulting core dump.
> There are 7 kinds of memory maps in total, and you can configure them
> with the set command. For more detailed information, please see the
> help command message.
>
> EXAMPLES
> Specify the process you want to retrieve as a core dump. Here, assume
> the process has PID 12345.
>
> crash> gcore 12345
> Saved core.12345
> crash>
>
> Next, specify the process by TASK address. Here, assume the task is
> located at address f9d78000 and has PID 32323.
>
> crash> gcore f9d78000
> Saved core.32323
> crash>
>
> If multiple arguments are given, gcore dumps the processes in the
> order the arguments are given.
>
> crash> gcore 5217 ffff880136d72040 23299 24459 ffff880136420040
> Saved core.5217
> Saved core.1130
> Saved core.1130
> Saved core.24459
> Saved core.30102
> crash>
>
> If no argument is given, gcore tries to retrieve the process of the
> current task context.
>
> crash> set
> PID: 54321
> COMMAND: "bash"
> TASK: e0000040f80c0000
> CPU: 0
> STATE: TASK_INTERRUPTIBLE
> crash> gcore
> Saved core.54321
>
> When a multi-threaded process is specified, the generated core file
> name uses the thread group leader's PID; here it is assumed to be 12340.
>
> crash> gcore 12345
> Saved core.12340
>
> Specifying the same option twice is not allowed.
>
> crash> gcore -v 1 1234 -v 1
> Usage: gcore
> gcore [-v vlevel] [-f filter] [pid | taskp]*
> gcore -d
> Enter "help gcore" for details.
>
> The -v and -f options may be given in any order.
>
> crash> gcore -v 2 5201 -f 21 ffff880126ff9520 5205
> Saved core.5174
> Saved core.5217
> Saved core.5167
> crash> gcore 5201 ffff880126ff9520 -f 21 5205 -v 2
> Saved core.5174
> Saved core.5217
> Saved core.5167
>
> Signed-off-by: HATAYAMA Daisuke <d.hatayama at jp.fujitsu.com>