[Linux-cluster] Re: nanny segfault problem

Christopher Barry Christopher.Barry at qlogic.com
Tue Nov 13 22:53:20 UTC 2007


On Tue, 2007-11-13 at 15:14 -0500, Christopher Barry wrote:
> The script got stripped by my gateway, so it's attached here as a text file.
> 
> 
> On Tue, 2007-11-13 at 15:05 -0500, Christopher Barry wrote:
> > Greetings All,
> > 
> > Running RHEL4U5.
> > 
> > I have a bunch of services on my cluster w/ access via redundant
> > directors.
> > 
> > I've created a generic service checking script, which I'm specifying in
> > lvs.cf's 'send_program' config parameter.
> > 
> > The script is attached to this post; see it for how it works with the
> > symlinks described below.
> > 
> > I create symlinks to the script for every service I want to check, with
> > their name containing the port to hit, as in:
> > /sbin/lvs-<port>.sh
> > 
> > so the symlink name to check ssh availability, for instance, is:
> > /sbin/lvs-22.sh
> > 
> > The script works fine, and returns the first contiguous block of
> > [[:alnum:]] text data from the connection attempt for use with the
> > expect line of lvs.cf.
> > 
> > 
> > The problem is, when nanny is spawned by pulse, all of the nanny
> > processes segfault.
> > 
> > > Nov 13 14:40:44 kop-sds-dir-01 lvs[17740]: create_monitor for ssh_access/kop-sds-01 running as pid 17749
> > > Nov 13 14:40:44 kop-sds-dir-01 nanny[17749]: making 10.32.12.11:22 available
> > > Nov 13 14:40:44 kop-sds-dir-01 kernel: nanny[17749]: segfault at 000000000000006c rip 000000335e570810 rsp 0000007fbfffe978 error 4
> > 
> > This occurs almost instantly for every nanny process.
> > 
> > Can anyone venture a guess as to what is happening?
> > 
> > see my lvs.cf here:
> > http://nanny-error.pastebin.com/m592f7911
> > 
> > 
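For readers without the attachment, here is a minimal sketch of the technique described above: derive the port from the symlink name and print the first contiguous `[[:alnum:]]` block of the service's reply. This is a hypothetical reconstruction, not the original script. It assumes that lvs.cf passes the real server's address as the first argument (for example `send_program = "/sbin/lvs-22.sh %h"`), that bash was built with `/dev/tcp` support, and that the service sends a greeting line on connect, as sshd does.

```shell
#!/bin/bash
# HYPOTHETICAL sketch of a generic LVS send_program check, not the original
# attached script. Invoke it through a symlink named lvs-<port>.sh, passing the
# real server's address as $1.

# Derive the port number from a name such as "/sbin/lvs-22.sh".
port_from_name() {
    local name=${1##*/}   # strip directory part
    name=${name#lvs-}     # strip leading "lvs-"
    name=${name%.sh}      # strip trailing ".sh"
    case $name in
        ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric ports
    esac
    printf '%s\n' "$name"
}

# Print the first contiguous run of [[:alnum:]] characters read from stdin.
first_alnum_block() {
    grep -oE '[[:alnum:]]+' | head -n 1
}

main() {
    local host=$1 port line
    if [ -z "$host" ]; then
        echo "usage: $0 <host>" >&2
        exit 2
    fi
    port=$(port_from_name "$0") || { echo "cannot derive port from $0" >&2; exit 2; }
    # Open a TCP connection on fd 3 via bash's /dev/tcp pseudo-device.
    exec 3<>"/dev/tcp/$host/$port" || exit 1
    # Read one line of the service's greeting, giving up after 5 seconds.
    IFS= read -r -t 5 line <&3
    exec 3<&-
    printf '%s\n' "$line" | first_alnum_block
}

# Only run the check when invoked through an lvs-<port>.sh name.
case ${0##*/} in
    lvs-*.sh) main "$@" ;;
esac
```

With a symlink such as `/sbin/lvs-22.sh` pointing at this file (the target file name is up to you), nanny would run `/sbin/lvs-22.sh 10.32.12.11`. Against an OpenSSH banner like `SSH-2.0-OpenSSH_4.3`, the script prints `SSH`, which an `expect = "SSH"` line can then match.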

All,

More interesting developments:
If I start pulse with:

# pulse -v --nodaemon

everything (kinda) works.

# pulse -v

does not work at all, however.

Something differs between daemon and non-daemon mode beyond simply
backgrounding the process.

I suspected a permissions issue, but I had already changed the mode of
my script to 4755.