
[Linux-cluster] RHEL 5.3 NFSv4 cluster



Hi,

Is there an up-to-date document detailing the configuration of an NFSv4 cluster service on a 2-node RHEL 5.3 Cluster Suite setup? Most of the information I can find is from 2006/2007 and states that these features are in a state of flux and could change soon.

My current configuration is two nodes running RHEL 5.3 (kernel 2.6.18-128.1.14.el5PAE) with SAN-attached shared storage and GFS2 file systems.

I read the documents at:

http://wiki.linux-nfs.org/wiki/index.php/NFS_Recovery_and_Client_Migration

http://www.howtoforge.com/high_availability_nfs_drbd_heartbeat

I also read the NFS cluster cookbook and Red Hat’s NFS cluster example. The first two are fairly old, and the latter two seem fairly basic and don’t address issues such as:

  1. Is it still recommended to configure /var/lib/nfs/v4recovery on a shared file system between nodes?
  2. Do I need to set the “fsid=” parameter for every export in /etc/exports, with a unique value for each? (I currently have fsid set only for the NFS root.)
  3. Should I set all of the RPC services in /etc/sysconfig/nfs to listen on a dedicated port?
  4. Can I leave the NFS service running on both nodes at the same time and just fail over the IP address, or should I add the nfs service script to the cluster config to start/stop it as part of the service?
  5. The NFS Recovery and Client Migration doc above mentions that lock migration is not handled yet and that there needs to be a way to release locks and leases during failover. Has this been addressed somehow? Does stopping/starting the NFS service accomplish this?
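
To make questions 2 and 3 concrete, the settings in question look roughly like the sketch below. It is illustrative only and not my actual configuration: the export paths, client network, and fsid values are placeholders, and the port numbers are simply example values, not a recommendation.

```shell
# /etc/sysconfig/nfs (question 3): pin the RPC helper services to fixed ports.
# The variable names are the ones read by the RHEL 5 init scripts;
# the port numbers are example values only.
RQUOTAD_PORT=875
LOCKD_TCPPORT=32803
LOCKD_UDPPORT=32769
MOUNTD_PORT=892
STATD_PORT=662
STATD_OUTGOING_PORT=2020

# /etc/exports (question 2), shown as comments because it is not shell syntax.
# Hypothetical paths and network; fsid=0 marks the NFSv4 root, and the
# question is whether every other export also needs its own unique fsid:
#   /exports        192.168.0.0/24(rw,sync,fsid=0)
#   /exports/data   192.168.0.0/24(rw,sync,nohide,fsid=1)
#   /exports/home   192.168.0.0/24(rw,sync,nohide,fsid=2)
```

Presumably whatever is chosen would have to be identical on both nodes so that clients see the same ports and file system IDs after a failover.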

 

Also, when I mount my NFS shares using the cluster’s virtual IP address or name, I get errors in my NFS server’s logs about timed-out callbacks:

Jun 25 15:00:12 node2 kernel: nfs4_cb: server <CLIENT1 IP ADDRESS> not responding, timed out

Jun 25 17:07:37 node2 kernel: nfs4_cb: server <CLIENT2 IP ADDRESS> not responding, timed out

If I mount the file system using the cluster node’s static address/name, these errors don’t appear, but that is undesirable because clients mounted that way would not follow the service to the other node on failover.

Thanks,

Eric

