[Linux-cluster] How to set up NFS HA service
birger
birger at birger.sh
Tue Apr 19 18:47:49 UTC 2005
I think my first attempt at an answer ended up in the bit bucket because of a
WLAN problem while I was saving it to the drafts folder. Sigh...
Lon Hohberger wrote:
> On Tue, 2005-04-19 at 15:08 +0200, birger wrote:
>
> Known bug/feature:
>
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=151669
>
> You can change this behavior if you wanted to by adding <child type=...>
> to service.sh's "special" element in the XML meta-data.
I thought about trying just that, but believed it couldn't be that simple... :-D
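If I do try it, I imagine the metadata would end up looking something like this. This is just a guess at the shape based on how rgmanager resource agents declare child resources; the type names and start/stop levels here are made up, not taken from the shipped service.sh:

```xml
<!-- Hypothetical sketch of service.sh's "special" element with
     explicit child ordering; values are guesses, not the real ones -->
<special tag="rgmanager">
    <child type="fs" start="1" stop="3"/>
    <child type="nfsexport" start="2" stop="2"/>
    <child type="ip" start="3" stop="1"/>
</special>
```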
>>I'm also a bit puzzled about why the file systems don't get unmounted
>>when I disable all services.
>
>
> They're GFS. Add force_unmount="1" to the <fs> elements if you want
> them to be umounted. GFS is nice because you *don't* have to umount
> it.
That was exactly why I wanted to mount the GFS file systems outside the
service. I am very happy with this unexpected behaviour. I want the file
systems to be there. :-)
I was just afraid they weren't unmounting because of some problem.
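For my own notes, the force_unmount attribute Lon mentions would presumably sit on the <fs> element in cluster.conf roughly like this (all names, devices and paths here are made up for illustration):

```xml
<!-- Hypothetical example: resource name, device and mountpoint are invented -->
<fs name="nfsdata" device="/dev/vg0/nfsdata"
    mountpoint="/export/data" fstype="ext3" force_unmount="1"/>
```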
> FYI, NFS services on traditional file systems don't cleanly stop right
> now due to an EBUSY during umount from the kernel. Someone's looking in
> to it on the NFS side (apparently, not all the refs are getting cleared
> if a node has an NFS mount ref and we unexport the FS, or something).
I saw a very similar problem some years ago on Solaris with Veritas
FirstWatch. fuser and lsof came up empty, but the file system was still busy
when I tried to umount it. I found a workaround: restart statd and lockd,
then umount. It seems they had their paws in the file system somehow.
Since FirstWatch was mostly a bunch of sh scripts, it was easy to modify
its NFS umount code to do this.
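From memory, the modification amounted to something like the sketch below. This is a rough reconstruction, not the actual FirstWatch code; the init script path is a guess and varies by platform:

```shell
# Hypothetical reconstruction of the old workaround: if umount fails
# (typically EBUSY), bounce the NFS lock/status daemons to drop their
# references, then retry. /etc/init.d/nfslock is an assumed path.
force_nfs_umount() {
    mnt="$1"
    umount "$mnt" 2>/dev/null && return 0
    # umount failed; restart statd/lockd and try once more
    /etc/init.d/nfslock restart
    umount "$mnt"
}
```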
Regarding lockd, I think my solution is valid given two constraints:
- The cluster nodes must not be NFS clients (and thanks to GFS I don't
need that).
- There must only be one NFS service running on any cluster node, and I
only have one NFS service.
When I set statd's hostname to the name of the service IP address and
relocate the status directory to a cluster disk, a takeover should behave
just like a server reboot, shouldn't it?
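Concretely, I mean something like this, using rpc.statd's -n (name) and -P (state directory) options; the hostname and path below are invented for the example:

```shell
# Hypothetical sketch: run statd under the service's name with its state
# directory on shared storage, so lock reclaim after a takeover looks to
# clients like an ordinary server reboot. Name and path are made up.
start_cluster_statd() {
    svc_name="nfs-svc.example.com"    # hostname of the service IP address
    state_dir="/gfs/nfsstate/statd"   # status dir on a cluster disk
    rpc.statd -n "$svc_name" -P "$state_dir"
}
```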
>>Apr 19 14:42:58 server1 clurgmgrd[7498]: <notice> Service nfssvc started
>>Apr 19 14:43:56 server1 clurgmgrd[7498]: <notice> status on nfsclient "nis-hosts-ro" returned 1 (generic error)
>>Apr 19 14:43:56 server1 clurgmgrd[7498]: <notice> status on nfsclient "nis-hosts" returned 1 (generic error)
>>Apr 19 14:44:56 server1 clurgmgrd[7498]: <notice> status on nfsclient "nis-hosts-ro" returned 1 (generic error)
>>Apr 19 14:44:56 server1 clurgmgrd[7498]: <notice> status on nfsclient "nis-hosts" returned 1 (generic error)
>
>
> Hmm, that's odd, it could be a bug in the status phase which is related
> to NIS exports. Does this only happen after a failover, or does it
> happen all the time?
My cluster only has one node so far (even though I have defined two). I have
to get the first node production-ready and migrate everything over first,
then make the old file server the second cluster node.
I'll have a look around and see if I can find a solution.
--
birger