
Re: [Linux-cluster] RHEL/CentOS-6 HA NFS Configuration Question



Those running HA NFS should be aware of the following two NFSD open leaks.

The first is the nfs4_open_downgrade leak:
http://marc.info/?l=linux-nfs&m=131077202109185&w=2
https://bugzilla.redhat.com/show_bug.cgi?id=714153

Red Hat supposedly fixed this, but I never saw the erratum go by. While we
waited for them to fix it, we moved to an upstream kernel and got bitten
by this one:

http://marc.info/?l=linux-nfs&m=131077202109185&w=2

NFSD open leaks will cause your filesystems to fail to unmount, even after
waiting through your lease time.  The device's open count
(dmsetup info <device>) stays non-zero even though the filesystem
is unexported and the kernel nfsds are stopped.
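
A minimal sketch of that check, assuming the usual "Open count:" line in
the output of dmsetup info <device>. The function names are hypothetical,
and running dmsetup normally requires root:

```python
import re
import subprocess


def parse_open_count(dmsetup_output: str) -> int:
    """Return the value of the 'Open count:' line from `dmsetup info` output.

    Assumes the plain (non-columnar) output format, which prints one
    'Key: value' pair per line.
    """
    match = re.search(r"^Open count:[ \t]*(\d+)[ \t]*$", dmsetup_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Open count' line found in dmsetup output")
    return int(match.group(1))


def device_open_count(device: str) -> int:
    """Run `dmsetup info <device>` and return the device's open count."""
    result = subprocess.run(
        ["dmsetup", "info", device],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_open_count(result.stdout)
```

Calling device_open_count() on the device after unexporting the filesystem
and stopping the nfsds should return 0; a non-zero value at that point is
consistent with a leaked open.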

We've been running our NFS4 HA cluster for a few months now on
a 3.2.5 kernel, and failover/recovery works well.

Ben

On May 16, 2012, at 2:19 PM, Colin Simpson wrote:

> This is interesting.
> 
> We very often see the filesystems fail to umount on busy clustered NFS
> servers.
> 
> What is the nature of the "real fix"?
> 
> I like the idea of NFSD fully being in user space, so killing it would
> definitely free the fs.
> 
> Alan Brown (who's on this list) recently posted to a RH BZ that he was
> one of the people who moved it into kernel space for performance reasons
> in the past (that are no longer relevant):
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=580863#c9
> 
> , but I doubt this is the fix you have in mind.
> 
> Colin
> 
> On Tue, 2012-05-15 at 20:21 +0200, Fabio M. Di Nitto wrote:
>> This solves different issues at startup, relocation and recovery
>> 
>> Also note that there is a known limitation in nfsd (both rhel5/6) that
>> could cause problems under some conditions in your current
>> configuration. A permanent fix is being worked on atm.
>> 
>> Without going into extreme detail: you might have 2 of those services
>> running on the same node, and attempting to relocate one of them can fail
>> because the fs cannot be unmounted. This is due to nfsd holding a lock (at
>> kernel level) on the FS. Changing the config to the suggested one masks the
>> problem pretty well, but more testing for a real fix is in progress.
>> 
>> Fabio
>> 
>> --
>> Linux-cluster mailing list
>> Linux-cluster redhat com
>> https://www.redhat.com/mailman/listinfo/linux-cluster


