[Linux-cluster] NFS Serving Issues

Colin Simpson Colin.Simpson at iongeo.com
Wed Aug 17 19:01:28 UTC 2011


Hi Alan, 

Thanks for getting back.

On Wed, 2011-08-17 at 12:00 +0100, Alan Brown wrote:
> Colin Simpson wrote:
> 
> > , when the service is stopped I get a "Stale NFS file handle" from
> > mounted filesystems accessing the NFS mount point at those times,
> > i.e. if I have a copy going I get on the service being disabled:
> 
> That's normal if an NFS server mount is unexported or nfsd shuts down.
> 
> It _should_ (but doesn't always) clear when NFS resumes.

It does clear the "stale NFS file handle" when the service fails over,
but that's not really the issue for me. My concern is that a "stale NFS
file handle" is liable to upset applications on the clients (they may
simply quit), whereas the hang merely suspends client apps accessing the
mount point until the service fails over, which seems the better
behaviour.

> > Why is there a behaviour disparity? Which is correct?
> 
> They're both correct - and both wrong. :)

But it seems that a subtle change to the NFS service layout in the
config, i.e. the IP resource containing the NFS export and client vs.
the IP sitting at the top level (at the same level as the NFS export),
results in NFS behaving like a hard mount vs. a soft mount, even though
I'm mounting hard in both cases from the clients (the two layouts are
sketched below).

Maybe I'm confused; it just seems pretty unclear why the behaviour
should differ. The config fragment you gave behaves properly for me
(IMHO): the clients hang until the service fails over, exactly like my
first case where the IP resource contains the NFS export.
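
To make that concrete, here is a minimal sketch of the two layouts I
mean (not Alan's actual fragment; the resource names, device paths and
address below are invented for illustration, assuming a standard
rgmanager <service> block):

    <!-- Case 1: IP resource contains the export; clients hang until
         failover (invented names/paths/address) -->
    <service name="nfssvc">
      <fs name="data" device="/dev/vg0/data" mountpoint="/export/data"
          fstype="ext4">
        <ip address="192.168.1.50">
          <nfsexport name="exports">
            <nfsclient name="all" target="*" options="rw"/>
          </nfsexport>
        </ip>
      </fs>
    </service>

    <!-- Case 2: IP at the top level, beside the filesystem/export;
         clients get "Stale NFS file handle" when the service stops -->
    <service name="nfssvc">
      <ip address="192.168.1.50"/>
      <fs name="data" device="/dev/vg0/data" mountpoint="/export/data"
          fstype="ext4">
        <nfsexport name="exports">
          <nfsclient name="all" target="*" options="rw"/>
        </nfsexport>
      </fs>
    </service>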

> on server side, restarting the nfslock service is usually sufficient
> to get umount to work (it's safe, clients are told to reacquire their
> locks)
>
> What's your backend? GFS?
> 
> > But more seriously I can't easily shut the cluster down cleanly when
> > told to by a UPS on power outage. Shutting down the node will be
> > unable to be performed cleanly as a resource is open (so will be
> > liable to fencing).
> 
> If the filesystem is GFS, fencing is about the only reliable way of
> leaving the cluster.
> 

The backend I'm trying is ext4 (failing over the mount). I had tried
manually stopping nfsd and nfslock (even though the cluster seems to
drop the locks anyway, judging by the output it writes to the messages
file), and sadly that made no difference; the umount still fails, even
after leaving it for ages.

The failure to umount only seems to occur when there is a large amount
of continuous activity against the NFS export (e.g. a large file copy
in flight when the service fails or is stopped). I wonder if this
hanging isn't unexpected with NFS, given the "self_fence" option
provided in the fs resource?
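
For reference, this is the sort of thing I mean; a minimal sketch of an
fs resource with self_fence enabled (attribute values are invented), so
that a node which cannot unmount reboots itself instead of wedging the
service:

    <!-- force_unmount kills processes using the mount; self_fence
         reboots the node if the unmount still fails -->
    <fs name="data" device="/dev/vg0/data" mountpoint="/export/data"
        fstype="ext4" force_unmount="1" self_fence="1"/>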

Thanks again

Colin
