
Re: [Linux-cluster] NFS Serving Issues

Colin Simpson wrote:

> ...when the service is stopped I get a "Stale NFS file handle" from
> mounted filesystems accessing the NFS mount point at those times, i.e.
> if I have a copy going, I get on the service being disabled:

That's normal if an NFS export is unexported or nfsd shuts down.

It _should_ (but doesn't always) clear when NFS resumes.

The only way around this is to define an IP address for the NFS service/export pair and make that IP the final dependency in the service (i.e. the IP is the last resource to come up and the first to go down):

<service autostart="1" domain="msslap-pref" name="MSSLAU-X41" recovery="restart">
        <clusterfs ref="/stage/peace12">
                <nfsexport ref="msslau-x41-exports">
                        <nfsclient ref="/stage/peace12--alphac"/>
                        <nfsclient ref="/stage/peace12--127/8"/>
                        <nfsclient ref="/stage/peace12--linuxt"/>
                        <nfsclient ref="/stage/peace12--plasmawriter"/>
                        <nfsclient ref="/stage/peace12--webserver"/>
                        <ip ref=""/>
                </nfsexport>
        </clusterfs>
</service>



> The above format of cluster.conf, having the "ip ref" contain the rest of
> the things, is as per the "Deploying Highly Available NFS on Red Hat
> Enterprise Linux 6" document.
>
> But if I don't enclose the nfs and fs things in the ip, the clients hang
> until the services restart, i.e.

Which is normal NFS client behaviour. (The cluster sends out a batch of gratuitous ARPs when services change host so that the IP-to-MAC mapping is updated more quickly.)
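For reference, rgmanager does this itself when an IP resource moves; a manual equivalent would look something like the following (the interface name and address here are made-up examples):

```shell
# Announce that 10.0.0.50 now lives on this host's eth0.
# -U = send unsolicited (gratuitous) ARP, -c 3 = three packets.
# Interface and IP are placeholders - use your service IP.
arping -U -I eth0 -c 3 10.0.0.50
```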

> BTW, is it best practice to use one nfsexport per nfsclient, or is one
> nfsexport resource enough cluster-wide?

"It depends"

If all NFS will come off one host then one resource is enough.

If NFS might run off several hosts then you'll need one resource per export.
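A sketch of the multi-host layout, with invented names throughout: each service carries its own nfsexport (and its own IP), so the exports can fail over and relocate independently:

```
<service autostart="1" domain="domainA" name="nfs-svc-a" recovery="relocate">
        <clusterfs ref="/stage/volA">
                <nfsexport ref="volA-exports">
                        <nfsclient ref="volA--clients"/>
                        <ip ref="10.0.0.51"/>
                </nfsexport>
        </clusterfs>
</service>
<service autostart="1" domain="domainB" name="nfs-svc-b" recovery="relocate">
        <clusterfs ref="/stage/volB">
                <nfsexport ref="volB-exports">
                        <nfsclient ref="volB--clients"/>
                        <ip ref="10.0.0.52"/>
                </nfsexport>
        </clusterfs>
</service>
```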

> Why is there a behaviour disparity? Which is correct?

They're both correct - and both wrong. :)

> Question 2: I have the old case on either of the above where I can't
> unmount the exported file system when I stop the service (so I can't
> migrate it). Not unless I halt the file server hosting the file share or
> force fence it. I just get the old:
>
> # umount /mnt/home
> umount: /mnt/home: device is busy.
>         (In some cases useful info about processes that use
>          the device is found by lsof(8) or fuser(1))

On the client side, umount -l will help in these cases.

On the server side, restarting the nfslock service is usually sufficient to get umount to work. (It's safe; clients are told to reacquire their locks.)
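As a command sketch (the mount point is just an example path):

```shell
# Client side: lazy unmount detaches the stale mount immediately
# and finishes the cleanup once the mount is no longer busy.
umount -l /mnt/home

# Server side: bounce the lock manager so it drops its hold on the
# exported filesystem; clients are asked to reclaim their locks.
service nfslock restart
```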

> Of course nothing is shown in lsof or fuser. This is annoying for a
> number of reasons. One is that I can't readily perform basic load
> balancing by migrating NFS services to their correct nodes (as I can't
> migrate a service without halting a node).

What's your backend? GFS?

> But more seriously I can't easily shut the cluster down cleanly when
> told to by a UPS on power outage. Shutting down the node will be unable
> to be performed cleanly as a resource is open (so will be liable to

If the filesystem is GFS, fencing is about the only reliable way of leaving the cluster.

However: if you shut down nfsd _and_ nfslock, you should be able to unmount the FSes cleanly.
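That ordering, as a sketch (again with an example mount point):

```shell
# Stop the NFS daemons first, then the lock manager, and only
# then unmount - otherwise the filesystem stays busy.
service nfs stop
service nfslock stop
umount /mnt/home    # example mount point
```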
