
RE: capturing a core file from a non-privileged daemon



I wonder if it is trying to write the core file to the
root directory and failing?
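
One way to check this, as a rough sketch: work out which directory the
kernel would use for the core file and look at its ownership and mode.
The core_dir helper below is hypothetical (my own name, not a standard
tool), and it assumes that a relative kernel.core_pattern is a bare file
name such as "core", in which case the core lands in the crashing
process's current directory.

```shell
#!/bin/sh
# Hypothetical helper: print the directory where a core file would be
# written, given a core_pattern value and the process's working directory.
core_dir() {
    pattern=$1
    cwd=$2
    case "$pattern" in
        /*) dirname "$pattern" ;;    # absolute pattern: the cwd is not used
        *)  printf '%s\n' "$cwd" ;;  # bare file name (assumed): core goes in cwd
    esac
}

# Example against a live worker, run as root (the PID is a placeholder):
#   pid=16690
#   dir=$(core_dir "$(cat /proc/sys/kernel/core_pattern)" \
#                  "$(readlink /proc/$pid/cwd)")
#   ls -ld "$dir"    # check whether user "lp" could create a file here
```

With the core_pattern quoted below, this reports /tmp, so the daemon's
CWD of / would not seem to be the directory that matters.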



> -----Original Message-----
> From: Tim Mooney [mailto:Tim Mooney ndsu edu]
> Sent: Thursday, July 15, 2010 2:42 PM
> To: redhat-sysadmin-list redhat com
> Subject: capturing a core file from a non-privileged daemon
> 
> 
> All-
> 
> I have a system where a daemon (started as root) periodically forks
> non-privileged workers, and those workers sometimes try to dump core.
> I would like to capture those core files for debugging, but so far I
> have been unable to find a way for the daemon to actually generate the
> core file.  I'm hoping someone can point out what I'm missing.
> 
> The system is RHEL 4.8, x86_64, currently running 2.6.9-89.0.23.ELsmp.
> Running "dmesg", I see dozens of these per day:
> 
> #dmesg
> lpd[12642]: segfault at 000000000000000c rip 0000000000a73cce rsp 00000000ffffcea0 error 4
> lpd[21006]: segfault at 000000000000000c rip 0000000000a73cce rsp 00000000ffffcea0 error 4
> lpd[16944]: segfault at 0000000036383675 rip 0000000000a73cce rsp 00000000ffffcea0 error 4
> lpd[19501]: segfault at 0000000036383675 rip 0000000000a73cce rsp 00000000ffffcea0 error 4
> lpd[11300]: segfault at 000000000000000c rip 0000000000a73cce rsp 00000000ffffcea0 error 4
> 
> 
> The daemon is a slightly older version of the LPRng lpd.  Its model is to
> start as root but switch to a non-privileged user (lp) and then fork
> workers as needed for queue processing.  The daemon is locally-compiled
> and is not stripped.
> 
> It's being started with the following line in /etc/init.d/lpd:
> 
>  	daemon /usr/local/sbin/lpd
> 
> Because the daemon shell function defaults to setting "ulimit -c 0", I've
> added the following two lines to the startup script, to override that
> default behavior:
> 
> DAEMON_COREFILE_LIMIT=unlimited
> export DAEMON_COREFILE_LIMIT
> 
> If I check the /proc/<pid>/limits file for either the master lpd process
> or any of the worker processes, I can see that the core file limit is
> "unlimited":
> 
> $ ps -ef | grep -i lpd
> lp       16689     1  1 Jul11 ?        01:17:39 lpd Waiting
> lp       16690 16689  0 Jul11 ?        00:04:58 lpd LOG2
> 
> #cat /proc/16689/limits
> Limit                     Soft Limit           Hard Limit           Units
> Max cpu time              unlimited            unlimited            seconds
> Max file size             unlimited            unlimited            bytes
> Max data size             unlimited            unlimited            bytes
> Max stack size            10485760             unlimited            bytes
> Max core file size        unlimited            unlimited            bytes
> Max resident set          unlimited            unlimited            bytes
> Max processes             16383                16383                processes
> Max open files            1024                 1024                 files
> Max locked memory         32768                32768                bytes
> Max address space         unlimited            unlimited            bytes
> Max file locks            unlimited            unlimited            locks
> Max pending signals       1024                 1024                 signals
> Max msgqueue size         819200               819200               bytes
> 
> #cat /proc/16690/limits
> Limit                     Soft Limit           Hard Limit           Units
> Max cpu time              unlimited            unlimited            seconds
> Max file size             unlimited            unlimited            bytes
> Max data size             unlimited            unlimited            bytes
> Max stack size            10485760             unlimited            bytes
> Max core file size        unlimited            unlimited            bytes
> Max resident set          unlimited            unlimited            bytes
> Max processes             16383                16383                processes
> Max open files            1024                 1024                 files
> Max locked memory         32768                32768                bytes
> Max address space         unlimited            unlimited            bytes
> Max file locks            unlimited            unlimited            locks
> Max pending signals       1024                 1024                 signals
> Max msgqueue size         819200               819200               bytes
> 
> 
> So, it doesn't appear that it's a problem with "ulimit"...
> 
> Because the worker processes are non-privileged and the main daemon
> process has / as its CWD, it's potentially a permissions problem.  To
> get around that, I set kernel.core_pattern so that core files would go
> into /tmp:
> 
> #sysctl -a | egrep -i 'kernel.core'
> kernel.core_pattern = /tmp/core.%p.%e.%s.%t
> kernel.core_uses_pid = 1
> 
> 
> After doing that, still no joy.  After some web searching, I was
> even desperate enough to try setting the "kernel.suid_dumpable"
> parameter mentioned here:
> 
>  	http://wiki.zimbra.com/index.php?title=Enabling_Core_Files
> 
> even though the "lpd" process is not setuid, it just starts as root.
> That too made no difference.
> 
> On the off chance that the kernel.core_pattern wasn't being honored, I
> even went so far as to briefly try changing ownership (to "lp") and
> permissions (775) on /, to give the daemon permission to dump core in /.
> That also made no difference, so it's been undone.
> 
> I've also tried using "systemtap" to install a segfault probe that just
> watches for segfaults from processes named "lpd".  That works, but
> unfortunately systemtap on RHEL4 cannot do user-level tracing, which
> is what I need.
> 
> Anyone have any ideas on what I've missed?  To be able to debug what's
> going on with the worker daemons, I really need to get my hands on some
> of the core files.  I'm comfortable with both gdb and strace/ltrace, but
> if at all possible I want to avoid attaching to the main daemon and just
> using one of those tools to gather a huge volume of data just waiting
> for one of the forked children to segfault.  Capturing a core file would
> be a much better way to start the debugging process.
> 
> Thanks,
> 
> Tim
> --
> Tim Mooney
> Tim Mooney ndsu edu
> Enterprise Computing & Infrastructure                  701-231-1076 (Voice)
> Room 242-J6, IACC Building                             701-231-8541 (Fax)
> North Dakota State University, Fargo, ND 58105-5164
> 

