[Mod_nss-list] Crashing apache processes
Lars Skovlund
LARSSKOV at dk.ibm.com
Fri Nov 1 15:35:24 UTC 2013
Hi Rob
I'm a little confused as to why you're asking me about Opencryptoki. Are
there known issues with it?
No, we are talking to a hardware PKCS#11 implementation based on the IBM
4765. Opencryptoki can also drive the 4765, but only through the standard
(CCA) firmware and not the specific PKCS#11 code that I use.
With your description below, I was able to find the problem in this
snippet of mod_nss code:
    if (chdir(mc->pCertificateDatabase) != 0) {
        ap_log_error(APLOG_MARK, APLOG_ERR, 0, base_server,
            "Unable to change directory to %s", mc->pCertificateDatabase);
        if (mc->nInitCount == 1)
            nss_die();
        else
            return; /* returns without initializing NSS or signaling failure to the caller */
    }

    rv = NSS_Initialize(mc->pCertificateDatabase, mc->pDBPrefix,
                        mc->pDBPrefix, "secmod.db", NSS_INIT_READONLY);
On my test machine, the apache user had no access to the NSS database
directory (it was in /root/y4nss), and I had missed this in the logs.
According to the code above, if that chdir fails, nss_init_SSLLibrary
returns with no indication of error and without calling NSS_Initialize.
The very next line of nss_init_Child then makes an NSS call
(PK11_ListCerts, per the backtrace below), which segfaults because NSS is
not initialized in that process. So this could arguably be considered an
NSS bug, but mod_nss should also handle the situation more gracefully on
its end.
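A minimal sketch of the more graceful handling described above, written in plain C without NSS so it compiles on its own. The names init_ssl_library, child_init and the nss_ready flag are hypothetical stand-ins, not mod_nss's actual code; a real fix would also check the return value of NSS_Initialize(), and could use NSS_IsInitialized() from nss.h as the guard instead of a private flag.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical flag standing in for NSS's internal state; real code could
 * ask NSS directly with NSS_IsInitialized(). */
static int nss_ready = 0;

/* Hypothetical variant of the chdir() check in the snippet: it reports
 * failure to its caller instead of returning silently.
 * Returns 0 on success, -1 on error. */
static int init_ssl_library(const char *cert_db_dir)
{
    if (chdir(cert_db_dir) != 0) {
        fprintf(stderr, "Unable to change directory to %s: %s\n",
                cert_db_dir, strerror(errno));
        return -1;
    }
    /* Real code would call NSS_Initialize() here and return -1 if it fails. */
    nss_ready = 1;
    return 0;
}

/* Hypothetical child-init hook: it makes no NSS calls unless initialization
 * succeeded, so an unreadable database yields a log line, not a segfault. */
static int child_init(const char *cert_db_dir)
{
    if (init_ssl_library(cert_db_dir) != 0 || !nss_ready) {
        fprintf(stderr, "NSS is not initialized; skipping NSS calls in this child\n");
        return -1;
    }
    /* From here on it would be safe to call e.g. PK11_ListCerts(). */
    return 0;
}
```

With a directory the process cannot enter, child_init() returns -1 after logging the chdir error, so the caller can disable SSL or exit cleanly rather than reaching an NSS call with no trust domain.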
Now I at least know what caused it. Thanks, Rob!
Best regards
Lars
Rob Crittenden <rcritten at redhat.com> wrote on 11/01/2013 01:59:36 PM:
> From: Rob Crittenden <rcritten at redhat.com>
> Date: 11/01/2013 01:59 PM
> To: Lars Skovlund/Denmark/IBM at IBMDK, mod_nss-list at redhat.com
> Subject: Re: [Mod_nss-list] Crashing apache processes
>
> Lars Skovlund wrote:
> > Hello list,
> >
> > As part of a customer case I'm working on, I've been trying to set up
> > the combination of Apache, mod_nss and our own PKCS#11 provider.
> > I've gotten the Apache server to start, but the tasks spawned by Apache
> > are dying left and right:
> >
> > [Thu Oct 31 15:48:15 2013] [notice] child pid 6649 exit signal Segmentation fault (11)
> > [Thu Oct 31 15:48:15 2013] [notice] child pid 6650 exit signal Segmentation fault (11)
> > [Thu Oct 31 15:48:16 2013] [notice] child pid 6645 exit signal Segmentation fault (11), possible coredump in /tmp
> > [Thu Oct 31 15:48:16 2013] [notice] child pid 6646 exit signal Segmentation fault (11), possible coredump in /tmp
> > [Thu Oct 31 15:48:16 2013] [notice] child pid 6647 exit signal Segmentation fault (11), possible coredump in /tmp
> > [Thu Oct 31 15:48:16 2013] [notice] child pid 6648 exit signal Segmentation fault (11), possible coredump in /tmp
> > [Thu Oct 31 15:48:16 2013] [notice] child pid 6651 exit signal Segmentation fault (11)
> > [Thu Oct 31 15:48:16 2013] [notice] child pid 6653 exit signal Segmentation fault (11)
> > [Thu Oct 31 15:48:18 2013] [notice] child pid 6668 exit signal Segmentation fault (11), possible coredump in /tmp
> >
> > and so on. My investigation points toward NSS being shut down
> > prematurely (by the main process?) while the incipient worker processes
> > continue to call it (heavily edited for brevity):
> >
> > [root@cccclab4 ~]# gdb --args httpd -X -D FOREGROUND
> > GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6)
> > [...]
> > (gdb) break NSS_Initialize
> > Breakpoint 1 (NSS_Initialize) pending.
> > (gdb) run
> > Starting program: /usr/sbin/httpd -X -D FOREGROUND
> > Breakpoint 1, NSS_Initialize (configdir=0x7ffff82bf0a0 "/root/y4nss/",
> > certPrefix=0x0, keyPrefix=0x0, secmodName=0x7fffed87c9ae "secmod.db",
> > flags=1) at nssinit.c:817
> > (gdb) watch g_default_trust_domain
> > Hardware watchpoint 2: g_default_trust_domain
> > (gdb) cont
> > Continuing.
> > Hardware watchpoint 2: g_default_trust_domain
> >
> > Old value = (NSSTrustDomain *) 0x0
> > New value = (NSSTrustDomain *) 0x7ffff8426be0
> > STAN_LoadDefaultNSS3TrustDomain () at pki3hack.c:153
> > 153 return PR_SUCCESS;
> > (gdb) cont
> > Continuing.
> > Please enter password for "TEST" token:
> > [Thread 0x7fffe3c37700 (LWP 10913) exited]
> > (gdb) cont
> > Hardware watchpoint 2: g_default_trust_domain
> >
> > Old value = (NSSTrustDomain *) 0x7ffff8426be0
> > New value = (NSSTrustDomain *) 0x0
> > 0x00007ffff34a1828 in STAN_Shutdown () at pki3hack.c:212
> > 212 g_default_trust_domain = NULL;
> > (gdb) cont
> > Continuing.
> > [New Thread 0x7fffe3c37700 (LWP 10916)]
> >
> > Program received signal SIGSEGV, Segmentation fault.
> > nssTrustDomain_GetCertsFromCache (td=0x0, certListOpt=0x7ffff836a7e0)
> > at tdcache.c:1127
> > 1127 PZ_Lock(td->cache->lock);
> > (gdb) print td
> > $1 = (NSSTrustDomain *) 0x0
> > (gdb) bt
> > #0 nssTrustDomain_GetCertsFromCache (td=0x0, certListOpt=0x7ffff836a7e0)
> >     at tdcache.c:1127
> > #1 0x00007ffff349b727 in NSSTrustDomain_TraverseCertificates (td=0x0,
> >     callback=0x7ffff34623e0 <pk11ListCertCallback>, arg=0x7fffffffdf20)
> >     at trustdomain.c:1015
> > #2 0x00007ffff34622fc in PK11_ListCerts (type=PK11CertListUser, pwarg=0x0)
> >     at pk11cert.c:2509
> > #3 0x00007fffed872772 in nss_init_Child (p=0x7ffff83d4c08,
> >     base_server=0x7ffff8212880) at nss_engine_init.c:1370
> > #4 0x00007ffff7fd6b0c in ap_run_child_init (pchild=0x7ffff83d4c08,
> >     s=0x7ffff8212880) at /usr/src/debug/httpd-2.2.15/server/config.c:155
> > #5 0x00007ffff7fea725 in child_main (child_num_arg=<value optimized out>)
> >     at /usr/src/debug/httpd-2.2.15/server/mpm/prefork/prefork.c:518
> > #6 0x00007ffff7feac46 in make_child (s=0x7ffff8212880, slot=0)
> >     at /usr/src/debug/httpd-2.2.15/server/mpm/prefork/prefork.c:707
> > #7 0x00007ffff7feb293 in ap_mpm_run (_pconf=<value optimized out>,
> >     plog=<value optimized out>, s=<value optimized out>)
> >     at /usr/src/debug/httpd-2.2.15/server/mpm/prefork/prefork.c:983
> > #8 0x00007ffff7fc2900 in main (argc=4, argv=0x7fffffffe408)
> >     at /usr/src/debug/httpd-2.2.15/server/main.c:760
> >
> > The NULL pointer that is dereferenced in the final steps comes from the
> > global variable I watched. Is this a known bug, or are there
> > configuration problems that are known to cause this?
> >
> > The versions of Apache and NSS I am using are these (they are the
> > versions I got from our local RHN node):
> >
> > [root@cccclab4 ~]# rpm -qa nss httpd
> > httpd-2.2.15-29.el6_4.x86_64
> > nss-3.14.0.0-12.el6.x86_64
> > [root@cccclab4 ~]#
> >
> > Any help you can give is greatly appreciated.
>
> Ok, so Apache makes things difficult for us. It loads and reloads the
> modules a couple of times during startup.
>
> During the initial start, stdin/stdout are still open and things are
> launched as root. From an Apache perspective, this is just a sanity
> startup to get the list of configuration options available in the
> module. We take this opportunity to prompt for any token passwords that
> are needed.
>
> Then Apache unloads the module. We have to shut down NSS when this happens.
>
> Then it restarts things, perhaps in multiple forked children. In each
> one we initialize NSS and apply the configuration.
>
> Is this the opencryptoki module?
>
> rob
>
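A toy model of the startup sequence Rob describes, assuming a single global pointer like the g_default_trust_domain that Lars watched in gdb. Everything in it (struct trust_domain, lib_init, lib_shutdown, list_certs, simulate_startup) is hypothetical and only mimics the order of events: the pointer is set during the root-run first pass, cleared when the module is unloaded, and stays NULL in the child if re-initialization fails silently.

```c
#include <stdlib.h>

/* Hypothetical model of NSS's global trust domain; the real variable,
 * g_default_trust_domain, is internal to NSS. */
struct trust_domain { int cert_count; };
static struct trust_domain *g_trust_domain = NULL;

/* Sets the global, or fails if the database is unreachable
 * (e.g. chdir() to it failed). Returns 0 on success, -1 on error. */
static int lib_init(int db_accessible)
{
    if (!db_accessible)
        return -1;
    g_trust_domain = malloc(sizeof *g_trust_domain);
    if (g_trust_domain == NULL)
        return -1;
    g_trust_domain->cert_count = 1;
    return 0;
}

/* Clears the global, as the watchpoint showed STAN_Shutdown() doing. */
static void lib_shutdown(void)
{
    free(g_trust_domain);
    g_trust_domain = NULL;
}

/* Returns the certificate count, or -1 at the point where the real
 * code dereferences a NULL trust domain and crashes. */
static int list_certs(void)
{
    if (g_trust_domain == NULL)
        return -1;
    return g_trust_domain->cert_count;
}

/* Runs the modeled sequence and returns what list_certs() sees in the child. */
static int simulate_startup(int child_can_read_db)
{
    lib_init(1);                  /* first pass: runs as root, prompts for token passwords */
    lib_shutdown();               /* module unloaded: NSS must be shut down */
    lib_init(child_can_read_db);  /* child re-init; result ignored, mirroring the silent return */
    int seen = list_certs();
    lib_shutdown();
    return seen;
}
```

simulate_startup(0) returns -1 here only because list_certs() checks for NULL; the backtrace shows nssTrustDomain_GetCertsFromCache() dereferencing td without such a check, which is why the real child segfaulted at tdcache.c:1127.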
Unless Otherwise Stated Above:
IBM Danmark ApS
Nymøllevej 91
2800 Kongens Lyngby, Denmark
CVR no.: 65305216