[Crash-utility] [RFC] gcore subcommand: a process coredump feature
S.Iguchi
iguchi.sg at ncos.nec.co.jp
Thu Aug 5 02:02:33 UTC 2010
Hi,
From: HATAYAMA Daisuke <d.hatayama at jp.fujitsu.com>
Subject: Re: [Crash-utility] [RFC] gcore subcommand: a process coredump feature
Date: Tue, 03 Aug 2010 15:17:00 +0900 (Tokyo Standard Time)
> Hello Iguchi-san,
>
> Thanks for your comments.
>
> From: "S.Iguchi" <iguchi.sg at ncos.nec.co.jp>
> Subject: Re: [Crash-utility] [RFC] gcore subcommand: a process coredump feature
> Date: Tue, 03 Aug 2010 13:10:09 +0900 (JST)
>
> > Hi, Hatayama-san
> >
> > I have an extension with mostly the same purpose as your patch.
> > But your patch is great, because it supports the latest kernel and
> > also dump filter masking.
> >
> > My current extension file is attached.
> > Yes, my code is quite buggy and ugly, and it does not support the latest
> > kernel as well as yours does.
> > (sigh ... I did not know about fill_vma_cache(), so I do "vm -p" every time before dumping.)
> >
> > BTW, I have some comments.
> > I'd like to add the features below to yours.
> > Or, if you will add them yourself, I would be happy. :)
> >
> > - support i386
> > - support elf32 binary on x86-64
> > - support old kernel (before 2.6.17)
> >
> > As Dave said, if your patch is committed as an extension,
> > I could submit some patches to it.
> >
> > How about this?
>
> As I wrote in my first post, I plan to support RHEL4,
> RHEL5 and RHEL6 on i386, x86_64 and IA64, and the latest upstream
> kernel, too. The following table shows the corresponding community
> kernel versions.
>
>   RHEL4    RHEL5    RHEL6    upstream
>   ------------------------------------
>   2.6.9    2.6.18   2.6.32   2.6.35
>
> So that would probably be enough for your first and third requests.
>
Ugh, I didn't check RHEL4 ... sorry.
Thank you for your explanation.
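
To check the table above against my requests, here is a rough sketch in C.
It is only an illustration, not code from gcore.patch or from my extension:
the KVER() macro, struct target_kernel and targets_older_than() are
hypothetical names I made up for this mail. The version numbers are the
ones from the table.

```c
#include <stddef.h>

/* Hypothetical helper (not from gcore.patch): pack a kernel version
 * into a single integer so that versions compare numerically. */
#define KVER(major, minor, patch) \
    (((unsigned int)(major) << 16) | \
     ((unsigned int)(minor) << 8)  | \
      (unsigned int)(patch))

struct target_kernel {
    const char *name;      /* label from the table above */
    unsigned int version;  /* packed with KVER() */
};

/* Rows taken from the table above. */
static const struct target_kernel targets[] = {
    { "RHEL4",    KVER(2, 6, 9)  },
    { "RHEL5",    KVER(2, 6, 18) },
    { "RHEL6",    KVER(2, 6, 32) },
    { "upstream", KVER(2, 6, 35) },
};

#define NR_TARGETS (sizeof(targets) / sizeof(targets[0]))

/* Return how many planned targets are older than `version`. */
static size_t targets_older_than(unsigned int version)
{
    size_t i, n = 0;

    for (i = 0; i < NR_TARGETS; i++)
        if (targets[i].version < version)
            n++;
    return n;
}
```

With this, targets_older_than(KVER(2, 6, 17)) counts only RHEL4 (2.6.9),
so at least one planned target falls in the "before 2.6.17" range I asked
about.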
> On the other hand, I've not planned to support ia32 emulation over
> both x86_64 and ia64.
>
OK.
Supporting ia32 emulation on x86-64 is enough for me ...
If your extension is applied, I'll think about it.
Thanks.
Regards,
Seigo Iguchi
> >
> > Best regards,
> > Seigo Iguchi
> >
> >
> > From: HATAYAMA Daisuke <d.hatayama at jp.fujitsu.com>
> > Subject: [Crash-utility] [RFC] gcore subcommand: a process coredump feature
> > Date: Mon, 02 Aug 2010 18:00:02 +0900 (Tokyo Standard Time)
> >
> >> Hello,
> >>
> >> For some weeks I've been developing a gcore subcommand for the crash
> >> utility. It provides a process coredump feature for kernel crash dumps,
> >> which is strongly demanded by users who want to investigate user-space
> >> applications contained in a kernel crash dump.
> >>
> >> I've now finished a prototype version of gcore and identified the
> >> issues that most need to be addressed. Could you give me any
> >> comments and suggestions on this work?
> >>
> >>
> >> Motivation
> >> ==========
> >>
> >> It's a relatively familiar technique in a cluster system for a
> >> running node to trigger the kernel crash dump mechanism when it
> >> detects a critical error, so that the server that detected the
> >> error stops as soon as possible. Consequently, the resulting kernel
> >> crash dump contains a process image of the erroneous user
> >> application. In that case, developers are interested in user
> >> space rather than kernel space.
> >>
> >> Another merit of gcore is that it allows us to use several
> >> userland debugging tools, such as GDB and binutils, to
> >> analyze user-space memory.
> >>
> >>
> >> Current Status
> >> ==============
> >>
> >> I have confirmed that the prototype version runs on the following configuration:
> >>
> >> Linux Kernel Version: 2.6.34
> >> Supporting Architecture: x86_64
> >> Crash Version: 5.0.5
> >> Dump Format: ELF
> >>
> >> I'm planning to widen the range of support as follows:
> >>
> >> Linux Kernel Version: Any
> >> Supporting Architecture: i386, x86_64 and IA64
> >> Dump Format: Any
> >>
> >>
> >> Issues
> >> ======
> >>
> >> Currently, I have the issues below.
> >>
> >> 1) Retrieval of appropriate register values
> >>
> >> The prototype version retrieves register values from the _wrong_
> >> location: the top of the kernel stack, where register values are
> >> saved at every preemptive context switch. The register values that
> >> should be included here are instead the ones saved at the
> >> user-to-kernel context switch on any interrupt event.
> >>
> >> I've yet to implement this. Specifically, I still need to do the
> >> following tasks.
> >>
> >> (1) list all entry paths from user-space to kernel-space execution.
> >>
> >> (2) divide the entries according to where and how the register
> >> values from user-space context are saved.
> >>
> >> (3) write a program that retrieves the saved register values from
> >> the appropriate locations, which are determined by means of (1) and (2).
> >>
> >> Ideally, I think it would be best if the crash library provided a
> >> means of retrieving this kind of register values, that is, ones saved
> >> on various stack frames. Is there any plan to do so?
> >>
> >>
> >> 2) Getting the signal number for a task that was in the middle of a
> >> core dump when the kernel crashed
> >>
> >> If a target task is in the middle of a core dump, it's better to know
> >> the signal number in order to know why the task was about to be core
> >> dumped.
> >>
> >> Unfortunately, I have no choice but to backtrace the kernel stack to
> >> retrieve the signal number saved there as an argument of, for example,
> >> do_coredump().
> >>
> >>
> >> 3) Kernel version compatibility
> >>
> >> crash's policy is that the latest crash package supports all kernel
> >> versions. On the other hand, the prototype is based on kernel 2.6.34.
> >> This means more kernel versions need to be supported.
> >>
> >> Well, the question is: which versions do I really need to test in
> >> addition to the latest upstream kernel? I think it's practically
> >> enough to support RHEL4, RHEL5 and RHEL6.
> >>
> >>
> >> Build Instructions
> >> ==================
> >>
> >> $ tar xf crash-5.0.5.tar.gz
> >> $ cd crash-5.0.5/
> >> $ patch -p 1 < gcore.patch
> >> $ make
> >>
> >>
> >> Usage
> >> =====
> >>
> >> Use the help subcommand of the crash utility: ``help gcore''.
> >>
> >>
> >> Attached File
> >> =============
> >>
> >> * gcore.patch
> >>
> >> A patch implementing the gcore subcommand for crash-5.0.5.
> >>
> >> The diffstat output is as follows.
> >>
> >> $ diffstat gcore.patch
> >>  Makefile      |   10 +-
> >>  defs.h        |   15 +
> >>  gcore.c       | 1858 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>  gcore.h       |  639 ++++++++++++++++++++
> >>  global_data.c |    3 +
> >>  help.c        |   28 +
> >>  netdump.c     |   27 +
> >>  tools.c       |   37 ++
> >>  8 files changed, 2615 insertions(+), 2 deletions(-)
> >>
> >> --
> >> HATAYAMA Daisuke
> >> d.hatayama at jp.fujitsu.com
>