[Mod_nss-list] Crashing apache processes

Fri Nov 1 15:35:24 UTC 2013

Hi Rob

I'm a little confused as to why you're asking me about Opencryptoki. Are
there known issues with it?
No, we are talking to a hardware PKCS#11 implementation based on the IBM
4765. Opencryptoki can also drive that card, but it does so through the
standard (CCA) firmware and not the specific PKCS#11 code that I use.

With your description below, I was able to find the problem in this 
snippet of mod_nss code:

    if (chdir(mc->pCertificateDatabase) != 0) {
        ap_log_error(APLOG_MARK, APLOG_ERR, 0, base_server,
            "Unable to change directory to %s", mc->pCertificateDatabase);
        if (mc->nInitCount == 1)
            nss_die();
        else
            return;
    }
    rv = NSS_Initialize(mc->pCertificateDatabase, mc->pDBPrefix,
                        mc->pDBPrefix, "secmod.db", NSS_INIT_READONLY);

On my test machine, the apache user had no access to the NSS database
directory (it was in /root/y4nss), and I had missed this in the logs.
According to the above, if that chdir fails and nInitCount is not 1,
nss_init_SSLLibrary returns with no indication of error and without
calling NSS_Initialize. The very next line of nss_init_Child then goes on
to make an NSS call, which fails with a segfault. So this could be argued
to be an NSS bug, but mod_nss should arguably also handle the situation
more gracefully on its end.
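
As a rough illustration of what handling it more gracefully could look
like, here is a minimal standalone sketch. It is not mod_nss code: the
names init_status, g_library_ready, init_library_sketch and
child_init_sketch are made up for this example, and the real NSS calls are
only indicated in comments. The idea is simply that the init step reports
the chdir failure to its caller, and the child checks the result before
touching the library.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical status codes for this sketch; not part of mod_nss. */
typedef enum { INIT_OK = 0, INIT_ERR_CHDIR = 1 } init_status;

/* Hypothetical stand-in for "the crypto library has been initialized". */
static int g_library_ready = 0;

/* Models the init step, but returns the chdir failure instead of
 * silently returning to the caller. */
static init_status init_library_sketch(const char *db_dir)
{
    assert(db_dir != NULL);
    g_library_ready = 0;

    if (chdir(db_dir) != 0) {
        perror("chdir");           /* log the reason, e.g. EACCES or ENOENT */
        return INIT_ERR_CHDIR;     /* the caller can now see the failure */
    }

    /* The real code would call NSS_Initialize() at this point. */
    g_library_ready = 1;
    return INIT_OK;
}

/* Models the per-child init: returns 0 on success, -1 if the library was
 * never initialized and no further library calls may be made. */
static int child_init_sketch(const char *db_dir)
{
    if (init_library_sketch(db_dir) != INIT_OK || !g_library_ready) {
        fprintf(stderr, "library not initialized, skipping cert lookup\n");
        return -1;
    }

    /* The real code would go on to list certificates here. */
    return 0;
}
```

With something along these lines, an unreadable database directory would
show up as a clean failure in the child rather than as a NULL dereference
inside the library.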

Now I at least know what caused it. Thanks, Rob!

Best regards

Lars

Rob Crittenden <rcritten at redhat.com> wrote on 11/01/2013 01:59:36 PM:

> To: Lars Skovlund/Denmark/IBM at IBMDK, mod_nss-list at redhat.com
> Subject: Re: [Mod_nss-list] Crashing apache processes
> 
> Lars Skovlund wrote:
> > Hello list,
> >
> > As part of a customer case I'm working on, I've been trying to set up
> > the combination of Apache, mod_nss and our own PKCS#11 provider.
> > I've gotten the Apache server to start, but the tasks spawned by Apache
> > are dying left and right:
> >
> > [Thu Oct 31 15:48:15 2013] [notice] child pid 6649 exit signal
> > Segmentation fault (11)
> > [Thu Oct 31 15:48:15 2013] [notice] child pid 6650 exit signal
> > Segmentation fault (11)
> > [Thu Oct 31 15:48:16 2013] [notice] child pid 6645 exit signal
> > Segmentation fault (11), possible coredump in /tmp
> > [Thu Oct 31 15:48:16 2013] [notice] child pid 6646 exit signal
> > Segmentation fault (11), possible coredump in /tmp
> > [Thu Oct 31 15:48:16 2013] [notice] child pid 6647 exit signal
> > Segmentation fault (11), possible coredump in /tmp
> > [Thu Oct 31 15:48:16 2013] [notice] child pid 6648 exit signal
> > Segmentation fault (11), possible coredump in /tmp
> > [Thu Oct 31 15:48:16 2013] [notice] child pid 6651 exit signal
> > Segmentation fault (11)
> > [Thu Oct 31 15:48:16 2013] [notice] child pid 6653 exit signal
> > Segmentation fault (11)
> > [Thu Oct 31 15:48:18 2013] [notice] child pid 6668 exit signal
> > Segmentation fault (11), possible coredump in /tmp
> >
> > and so on. My investigation points toward NSS being shut down
> > prematurely (by the main process?) while the incipient worker processes
> > continue to call it (heavily edited for brevity):
> >
> > [root@cccclab4 ~]# gdb --args httpd -X -D FOREGROUND
> > GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6)
> > [...]
> > (gdb) break NSS_Initialize
> > Breakpoint 1 (NSS_Initialize) pending.
> > (gdb) run
> > Starting program: /usr/sbin/httpd -X -D FOREGROUND
> > Breakpoint 1, NSS_Initialize (configdir=0x7ffff82bf0a0 "/root/y4nss/",
> >      certPrefix=0x0, keyPrefix=0x0, secmodName=0x7fffed87c9ae "secmod.db",
> >      flags=1) at nssinit.c:817
> > (gdb) watch g_default_trust_domain
> > Hardware watchpoint 2: g_default_trust_domain
> > (gdb) cont
> > Continuing.
> > Hardware watchpoint 2: g_default_trust_domain
> >
> > Old value = (NSSTrustDomain *) 0x0
> > New value = (NSSTrustDomain *) 0x7ffff8426be0
> > STAN_LoadDefaultNSS3TrustDomain () at pki3hack.c:153
> > 153    return PR_SUCCESS;
> > (gdb) cont
> > Continuing.
> > Please enter password for "TEST" token:
> > [Thread 0x7fffe3c37700 (LWP 10913) exited]
> > (gdb) cont
> > Hardware watchpoint 2: g_default_trust_domain
> >
> > Old value = (NSSTrustDomain *) 0x7ffff8426be0
> > New value = (NSSTrustDomain *) 0x0
> > 0x00007ffff34a1828 in STAN_Shutdown () at pki3hack.c:212
> > 212            g_default_trust_domain = NULL;
> > (gdb) cont
> > Continuing.
> > [New Thread 0x7fffe3c37700 (LWP 10916)]
> >
> > Program received signal SIGSEGV, Segmentation fault.
> > nssTrustDomain_GetCertsFromCache (td=0x0, certListOpt=0x7ffff836a7e0)
> >      at tdcache.c:1127
> > 1127    PZ_Lock(td->cache->lock);
> > (gdb) print td
> > $1 = (NSSTrustDomain *) 0x0
> > (gdb) bt
> > #0  nssTrustDomain_GetCertsFromCache (td=0x0, certListOpt=0x7ffff836a7e0)
> >      at tdcache.c:1127
> > #1  0x00007ffff349b727 in NSSTrustDomain_TraverseCertificates (td=0x0,
> >      callback=0x7ffff34623e0 <pk11ListCertCallback>, arg=0x7fffffffdf20)
> >      at trustdomain.c:1015
> > #2  0x00007ffff34622fc in PK11_ListCerts (type=PK11CertListUser, pwarg=0x0)
> >      at pk11cert.c:2509
> > #3  0x00007fffed872772 in nss_init_Child (p=0x7ffff83d4c08,
> >      base_server=0x7ffff8212880) at nss_engine_init.c:1370
> > #4  0x00007ffff7fd6b0c in ap_run_child_init (pchild=0x7ffff83d4c08,
> >      s=0x7ffff8212880) at /usr/src/debug/httpd-2.2.15/server/config.c:155
> > #5  0x00007ffff7fea725 in child_main (child_num_arg=<value optimized out>)
> >      at /usr/src/debug/httpd-2.2.15/server/mpm/prefork/prefork.c:518
> > #6  0x00007ffff7feac46 in make_child (s=0x7ffff8212880, slot=0)
> >      at /usr/src/debug/httpd-2.2.15/server/mpm/prefork/prefork.c:707
> > #7  0x00007ffff7feb293 in ap_mpm_run (_pconf=<value optimized out>,
> >      plog=<value optimized out>, s=<value optimized out>)
> >      at /usr/src/debug/httpd-2.2.15/server/mpm/prefork/prefork.c:983
> > #8  0x00007ffff7fc2900 in main (argc=4, argv=0x7fffffffe408)
> >      at /usr/src/debug/httpd-2.2.15/server/main.c:760
> >
> > The NULL pointer that is dereferenced in the final steps comes from the
> > global variable I watched. Is this a known bug, or are there
> > configuration problems that are known to cause this?
> > The versions of Apache and nss I am using are these (they are the
> > versions I got from our local RHN node):
> >
> > [root@cccclab4 ~]# rpm -qa nss httpd
> > httpd-2.2.15-29.el6_4.x86_64
> > nss-3.14.0.0-12.el6.x86_64
> > [root@cccclab4 ~]#
> >
> > Any help you can give is greatly appreciated.
> 
> Ok, so Apache makes things difficult for us. It loads and reloads the 
> modules a couple of times during startup.
> 
> During the initial start stdout/stdin are still open and things are 
> launched as root. This, from an Apache perspective, is just a sanity 
> startup to get the list of configuration options available in the 
> module. We take this opportunity to prompt for any token passwords that 
> are needed.
> 
> Then Apache unloads the module. We have to shut down NSS when this happens.
> 
> Then it restarts things, perhaps in multiple forked children. In each 
> one we initialize NSS and apply the configuration.
> 
> Is this the opencryptoki module?
> 
> rob
> 
