[Linux-cluster] Cluster NFS causes kernel bug

Fri Nov 2 19:40:42 UTC 2007

On Wed, 23 Oct 2007, Gordon wrote:

2) Thanks for the report on NFSv3/UDP.  From my reading that sounded
like something to avoid, but maybe I need to try it anyway.  How
reliable has it been?  Do the clients reconnect most times?

In your case, NFS over TCP is likely to have been the major cause of
your problems. UDP can fail over much more transparently, because it
has no connection state to expire.

You could also try tweaking your timeout, retry, and hard vs. soft
failure modes on NFS. 
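
As a rough illustration of the knobs mentioned above, here is a minimal
Python sketch that assembles a client-side mount option string. The
option names (nfsvers, proto, hard/soft, timeo, retrans) are the ones
documented in nfs(5); the helper name and the values are only
assumptions for illustration, not tested recommendations.

```python
def nfs_mount_options(proto="udp", hard=True, timeo=7, retrans=3, nfsvers=3):
    """Build an NFS mount option string from the tunables in nfs(5).

    timeo is expressed in tenths of a second; retrans is the number of
    retries before the client escalates the timeout. The values used
    here are placeholders, not recommendations.
    """
    if proto not in ("udp", "tcp"):
        raise ValueError("proto must be 'udp' or 'tcp'")
    opts = [
        f"nfsvers={nfsvers}",
        f"proto={proto}",
        "hard" if hard else "soft",  # hard: retry indefinitely; soft: return an error
        f"timeo={timeo}",
        f"retrans={retrans}",
    ]
    return ",".join(opts)


# Example: a hard mount over UDP with the illustrative values above.
print(nfs_mount_options())
```

The resulting string would be passed to mount with -o, or placed in the
options column of /etc/fstab, on each client.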

Gordan
------------------------------------------------------------------------

I found the major stability issue.  

Using a managed IP alone is *BAD*; using managed NFS is OK.  I had the
NFS service running on all the nodes and just failed the IP over.  It
does not matter whether you use UDP or TCP; the managed IP is flaky.

Using managed NFS over TCP was still a bit unstable but not nearly as
bad as the managed IP.

So what I have settled on for my testbed:
64bit AMD Opteron
CentOS 5.0
SAN with brocade switch and storage arrays
GFS1
Managed NFS
NFS over UDP

I will be doing stability testing over the next couple of weeks and
will post my findings.  If the stability is good, my testbed goes live
before the end of the year.

Thanks for all the hard work on RHCS :)
Tim