[Linux-cluster] Re: iptables protection wrapper; nfsexport.sh vs ip.sh racing

Axel Thimm Axel.Thimm at ATrpms.net
Mon Aug 29 23:35:23 UTC 2005


On Mon, Aug 29, 2005 at 02:41:19PM -0400, Lon Hohberger wrote:
> On Tue, 2005-08-23 at 00:52 +0200, Axel Thimm wrote:
> > The typical NFS cluster setups seem to fail for Gigabit NFS/tcp. Some
> > clients that are busy during the relocation of services either bail
> > out with RPC garbage, or set the filesystem to EACCES, or time out for
> > 17 min.
> > 
> > This has to do with some racing/timing in the NFS vs ip setup/teardown
> > procedure. Protecting the service startup/shutdown with an iptables
> > rule is a good workaround to fix this.
> > 
> > But what is the proper way to integrate this workaround? I could set up
> > new resource agents, one with start=1 and another with start=6 to
> > start/stop dropping packets. Or I could modify the current resource
> > agents to allow for child entities and wrap one script around the
> > service and one in the inner element.
> > 
> > I could probably also hack ip.sh to introduce some delay, to make sure
> > the NFS services are really up/down before proceeding. Or maybe fix
> > the true evil by making nfsexport.sh wait for NFS startup/stop
> > completion (how?)?
> 
> Traditionally, we start the NFS daemons as a service to people who
> forget to start them before starting rgmanager.
> 
> I.e.  Red Hat / Fedora Core users are supposed to do this prior to
> configuring NFS services in rgmanager:
> 
>    chkconfig --level 345 nfslock on
>    chkconfig --level 345 nfs on
> 
> It's really an attempt to work around a configuration problem -- and
> nothing more.

The above happens with nfs already running on all nodes. The race seems
to be between the exportfs commands and the ip setup/teardown.

It is easy to reproduce (>=50%) if the client connects over Gigabit
and is in the middle of a write while the service is moved. We saw this
in two different setups. If you throttle the network bandwidth to <=
20MB/sec you don't trigger the bug, so it really seems to be a race
condition.
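
For illustration, here is a minimal sketch of the iptables protection
wrapper mentioned in the quoted text. It is not the actual rgmanager
integration. The address in SERVICE_IP, the DRY_RUN switch and the
function names are hypothetical; only the iptables -I/-D INPUT ... -j DROP
calls are standard. By default it only prints what it would run.

```shell
#!/bin/sh
# Sketch only: drop packets addressed to the service IP while the
# exports and the address are being set up or torn down.

SERVICE_IP="${SERVICE_IP:-192.0.2.10}"  # hypothetical floating service address
DRY_RUN="${DRY_RUN:-1}"                 # 1 = print commands instead of running them

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Insert a rule that silently drops incoming packets for the service IP.
block_service_ip() {
    run iptables -I INPUT -d "$SERVICE_IP" -j DROP
}

# Remove the rule again once the exports and the address are consistent.
unblock_service_ip() {
    run iptables -D INPUT -d "$SERVICE_IP" -j DROP
}

# Run one command with the service IP blocked, preserving its exit status.
guarded() {
    block_service_ip
    run "$@"
    rc=$?
    unblock_service_ip
    return "$rc"
}
```

The idea is to call block_service_ip before the exportfs and ip steps of a
start or stop, and unblock_service_ip once they have finished. Because the
packets are dropped rather than rejected, TCP clients keep retransmitting
during the window instead of talking to a half-configured service.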
-- 
Axel.Thimm at ATrpms.net