
Re: [Linux-cluster] How to set up NFS HA service



I think my first attempt at a reply ended up in the bit bucket because of a WLAN problem while I was saving it to the drafts folder. Sigh...

Lon Hohberger wrote:

On Tue, 2005-04-19 at 15:08 +0200, birger wrote:

Known bug/feature:

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=151669

You can change this behavior if you want to by adding <child type=...>
to service.sh's "special" element in the XML meta-data.

I thought about trying just that, but believed it couldn't be that simple... :-D



I'm also a bit puzzled about why the file systems don't get unmounted when I disable all services.


They're GFS.  Add force_unmount="1" to the <fs> elements if you want
them to be umounted.  GFS is nice because you *don't* have to umount
it.

That was exactly why I wanted to mount the gfs file systems outside the service. I am very happy with this unexpected behaviour. I want the file systems to be there. :-)


I was afraid they didn't unmount because of some problem.

FYI, NFS services on traditional file systems don't cleanly stop right
now due to an EBUSY from the kernel during umount.  Someone's looking
into it on the NFS side (apparently, not all the refs are getting cleared
if a node has an NFS mount ref and we unexport the FS, or something).

I saw a very similar problem some years ago on Solaris with Veritas FirstWatch. fuser and lsof came up empty, but the file system was still busy when I tried to umount. I found a workaround: restart statd and lockd, then umount. It seems they had their paws in the file system somehow.
Since FirstWatch was mostly a bunch of sh scripts, it was easy to modify the NFS umount code to do this.
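
A minimal sketch of that kind of workaround, for illustration only. This is not the original FirstWatch code: the function name umount_with_lock_restart, the variables UMOUNT_CMD and RESTART_LOCKING_CMD, and the default restart command (a Red Hat style "service nfslock restart") are all assumptions, and the way statd and lockd get restarted differs between platforms.

```shell
#!/bin/sh
# Hypothetical sketch: try a plain umount first; if that fails,
# restart the NFS locking daemons and try once more.

# Command used to restart statd/lockd. The default is an assumption
# (Red Hat style init script); override it for your platform.
: "${RESTART_LOCKING_CMD:=service nfslock restart}"
# Command used to unmount; overridable so the logic can be dry-run.
: "${UMOUNT_CMD:=umount}"

umount_with_lock_restart() {
    mountpoint=$1

    # First attempt: the normal case.
    if $UMOUNT_CMD "$mountpoint"; then
        return 0
    fi

    # Still busy: restart statd and lockd, then retry exactly once.
    echo "umount of $mountpoint failed, restarting statd/lockd" >&2
    $RESTART_LOCKING_CMD || return 1
    $UMOUNT_CMD "$mountpoint"
}
```

It would be called as "umount_with_lock_restart /some/mountpoint" from the stop path of the service script. Retrying only once keeps a genuinely busy file system from looping forever.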


Regarding lockd, I think my solution is valid given these two constraints:
- The cluster nodes should not be NFS clients (and thanks to GFS I don't need that)
- There should only be one NFS service running on any cluster node. And I only have one NFS service.


When I set the name for statd to the name of the service IP address and relocate the status dir to a cluster disk, a takeover should behave just like a server reboot, shouldn't it?
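
To make the idea concrete, here is a rough sketch of how such a statd invocation could be assembled. It assumes the nfs-utils rpc.statd options -n (name to use as the local hostname) and -P (state directory path); the helper name statd_command, the service name and the directory are made-up examples, not values from my setup. The function only prints the command line so it can be inspected before being wired into an init script.

```shell
#!/bin/sh
# Hypothetical sketch: build the rpc.statd command line for an NFS service
# whose lock state should follow the service from node to node.

statd_command() {
    service_name=$1   # name that resolves to the service IP address
    state_dir=$2      # status directory on shared cluster storage

    # -n: name statd presents as the local hostname (assumed nfs-utils option)
    # -P: directory where statd keeps its state (assumed nfs-utils option)
    printf 'rpc.statd -n %s -P %s\n' "$service_name" "$state_dir"
}

# Example values only; substitute your own service name and path.
statd_command nfs-service.example.com /cluster/nfs/statd
```

With both the name and the state directory tied to the service rather than to the node, clients should see the same identity and the same monitor list after a relocation.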

Apr 19 14:42:58 server1 clurgmgrd[7498]: <notice> Service nfssvc started
Apr 19 14:43:56 server1 clurgmgrd[7498]: <notice> status on nfsclient "nis-hosts-ro" returned 1 (generic error)
Apr 19 14:43:56 server1 clurgmgrd[7498]: <notice> status on nfsclient "nis-hosts" returned 1 (generic error)
Apr 19 14:44:56 server1 clurgmgrd[7498]: <notice> status on nfsclient "nis-hosts-ro" returned 1 (generic error)
Apr 19 14:44:56 server1 clurgmgrd[7498]: <notice> status on nfsclient "nis-hosts" returned 1 (generic error)


Hmm, that's odd, it could be a bug in the status phase which is related
to NIS exports.  Does this only happen after a failover, or does it
happen all the time?

My cluster currently has only one node (even though I have defined two), so this is not happening after a failover. I have to get the first node production-ready and migrate everything over first, then make the old file server the second cluster node.


I'll have a look around and see if I can find a solution.

--
birger

