
capturing a core file from a non-privileged daemon




All-

I have a system where a daemon (started as root) periodically forks
non-privileged workers, and those workers sometimes try to dump core.
I would like to capture those core files for debugging, but so far I
have been unable to find a way for the daemon to actually generate the
core file.  I'm hoping someone can point out what I'm missing.

The system is RHEL 4.8, x86_64, currently running 2.6.9-89.0.23.ELsmp.
Running "dmesg", I see dozens of these per day:

# dmesg
lpd[12642]: segfault at 000000000000000c rip 0000000000a73cce rsp 00000000ffffcea0 error 4
lpd[21006]: segfault at 000000000000000c rip 0000000000a73cce rsp 00000000ffffcea0 error 4
lpd[16944]: segfault at 0000000036383675 rip 0000000000a73cce rsp 00000000ffffcea0 error 4
lpd[19501]: segfault at 0000000036383675 rip 0000000000a73cce rsp 00000000ffffcea0 error 4
lpd[11300]: segfault at 000000000000000c rip 0000000000a73cce rsp 00000000ffffcea0 error 4
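
To get a quick picture of how often each faulting address shows up, the
log lines can be tallied.  This is only a minimal sketch: the function
name count_segfaults is made up for illustration, and it assumes the
kernel log format shown above, where the faulting address is the word
right after "at".

```shell
# Hypothetical helper: tally segfault reports by faulting address.
# Usage: dmesg | count_segfaults lpd
# Reads kernel log text on stdin and prints "<count> <address>" per
# address, most frequent first.  The process name argument is required.
count_segfaults() {
    awk -v name="$1" '
        # Keep only lines such as "lpd[12642]: segfault at ...".
        index($1, name "[") == 1 && / segfault at / {
            # The faulting address is the word that follows "at".
            for (i = 1; i < NF; i++)
                if ($i == "at") { seen[$(i + 1)]++; break }
        }
        END { for (addr in seen) print seen[addr], addr }
    ' | sort -k1,1nr -k2,2
}
```

Wrapped continuation lines (the "rsp" value and "error 4") are ignored,
because they do not start with the process name.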


The daemon is a slightly older version of the LPRng lpd.  Its model is to
start as root but switch to a non-privileged user (lp) and then fork workers
as needed for queue processing.  The daemon is locally-compiled and is
not stripped.

It's being started with the following line in /etc/init.d/lpd:

	daemon /usr/local/sbin/lpd

Because the daemon shell function defaults to setting "ulimit -c 0", I've
added the following two lines to the startup script, to override that
default behavior:

DAEMON_COREFILE_LIMIT=unlimited
export DAEMON_COREFILE_LIMIT

If I check the /proc/<pid>/limits file for either the master lpd process
or any of the worker processes, I can see that the core file limit is
"unlimited":

$ ps -ef | grep -i lpd
lp       16689     1  1 Jul11 ?        01:17:39 lpd Waiting
lp       16690 16689  0 Jul11 ?        00:04:58 lpd LOG2

# cat /proc/16689/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            10485760             unlimited            bytes
Max core file size        unlimited            unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             16383                16383                processes
Max open files            1024                 1024                 files
Max locked memory         32768                32768                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       1024                 1024                 signals
Max msgqueue size         819200               819200               bytes

# cat /proc/16690/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            10485760             unlimited            bytes
Max core file size        unlimited            unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             16383                16383                processes
Max open files            1024                 1024                 files
Max locked memory         32768                32768                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       1024                 1024                 signals
Max msgqueue size         819200               819200               bytes
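
Since the workers are short-lived, pulling out just the core file line
is quicker than reading the whole table.  A minimal sketch, assuming
the usual one-limit-per-line layout of /proc/<pid>/limits; the function
name core_limit and its optional second argument are made up for
illustration.

```shell
# Hypothetical helper: print the soft and hard core file size limits
# of a process.
# Usage: core_limit 16689
# An optional second argument replaces /proc as the directory that
# holds the per-PID subdirectories (handy for saved copies).
core_limit() {
    # Fields 5 and 6 of the "Max core file size" row are the soft and
    # hard limits.
    awk '/^Max core file size/ { print $5, $6 }' "${2:-/proc}/$1/limits"
}
```

For the two processes above this would be expected to print
"unlimited unlimited" in both cases.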


So, it doesn't appear that it's a problem with "ulimit"...

Because the worker processes are non-privileged and the main daemon
process has / as its CWD, it's potentially a permissions problem.  To
get around that, I set kernel.core_pattern so that core files would go
into /tmp:

# sysctl -a | egrep -i 'kernel.core'
kernel.core_pattern = /tmp/core.%p.%e.%s.%t
kernel.core_uses_pid = 1
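
For reference, this is the file name that pattern should produce, so
it's clear what to look for in /tmp.  A minimal sketch: the function
name expand_core_pattern is made up, it handles only the four
specifiers used here (%p PID, %e executable name, %s signal number,
%t time of the dump in seconds since the epoch), and it assumes the
executable name contains no "/" or "&".

```shell
# Hypothetical helper: expand %p, %e, %s and %t in a core pattern.
# Arguments: pattern, PID, executable name, signal number, dump time
# (seconds since the epoch).
expand_core_pattern() {
    printf '%s\n' "$1" |
        sed -e "s/%p/$2/g" -e "s/%e/$3/g" -e "s/%s/$4/g" -e "s/%t/$5/g"
}
```

For example, "expand_core_pattern /tmp/core.%p.%e.%s.%t 12642 lpd 11
1310400000" gives /tmp/core.12642.lpd.11.1310400000, where 11 is
SIGSEGV and the timestamp is an arbitrary illustrative value.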


After doing that, still no joy.  After some web searching, I was
even desperate enough to try setting the "kernel.suid_dumpable" parameter
mentioned here:

	http://wiki.zimbra.com/index.php?title=Enabling_Core_Files

even though the "lpd" process is not setuid; it just starts as root.
That too made no difference.
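
For anyone checking the same setting, here is what the values mean.
A minimal sketch: the function name describe_suid_dumpable is made up,
and the meanings are taken from the mainline kernel documentation for
fs.suid_dumpable.  That the RHEL 4 "kernel.suid_dumpable" backport
behaves identically is an assumption.

```shell
# Hypothetical helper: describe a suid_dumpable value.
# Usage: describe_suid_dumpable "$(sysctl -n kernel.suid_dumpable)"
# Meanings follow the mainline fs.suid_dumpable documentation; the
# RHEL 4 parameter is assumed to match.
describe_suid_dumpable() {
    case "$1" in
        0) echo "default: processes that changed privilege levels are not dumped" ;;
        1) echo "debug: all processes dump core when possible" ;;
        2) echo "suidsafe: processes that would not normally be dumped are dumped, readable by root only" ;;
        *) echo "unknown value: $1"; return 1 ;;
    esac
}
```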

On the off chance that the kernel.core_pattern wasn't being honored, I
even went so far as to briefly try changing ownership (to "lp") and
permissions (775) on /, to give the daemon permission to dump core in /.
That also made no difference, so it's been undone.

I've also tried using "systemtap" to install a segfault probe that
watches for segfaults from processes named "lpd".  That works, but
unfortunately systemtap on RHEL4 cannot do user-level tracing, which
is what I need.

Anyone have any ideas on what I've missed?  To be able to debug what's
going on with the worker processes, I really need to get my hands on some
of the core files.  I'm comfortable with both gdb and strace/ltrace, but
if at all possible I want to avoid attaching to the main daemon and using
one of those tools to gather a huge volume of data while waiting for one
of the forked children to segfault.  Capturing a core file would be a
much better way to start the debugging process.

Thanks,

Tim
--
Tim Mooney                                             Tim Mooney ndsu edu
Enterprise Computing & Infrastructure                  701-231-1076 (Voice)
Room 242-J6, IACC Building                             701-231-8541 (Fax)
North Dakota State University, Fargo, ND 58105-5164

