
Re: [Linux-cluster] GFS2 + NFS crash BUG: Unable to handle kernel NULL pointer dereference



Hi,

On Tue, 2011-07-12 at 19:52 +0100, Colin Simpson wrote:
> I just ask this as I have a cluster where we wish to share project
> directories and home dirs and have them accessible by Linux clients via
> NFS and PCs via Samba. As I say, the locking cross OS doesn't matter.
> 
If it doesn't matter, then you can set the "localflocks" mount option on
GFS2, and each mount will act as if it were a single-node filesystem as
far as locking is concerned. From a support perspective, that config
(active/active NFS and Samba) is not supported by Red Hat because we
don't test it, since generally you do need locking in order to make it
safe with respect to accesses between NFS and Samba.
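
As a rough way to check which GFS2 mounts on a node currently have that
option set, something like the following sketch could be used. It assumes
the option shows up in the options column of /proc/mounts; the function
name and the sample device and mount point names are made up for
illustration.

```python
def gfs2_mounts_with_localflocks(mounts_text):
    """Map each gfs2 mount point to True if 'localflocks' is among its options.

    mounts_text is text in /proc/mounts format:
    device, mount point, fs type, options, dump, pass.
    """
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Skip malformed lines and anything that is not a gfs2 mount.
        if len(fields) < 4 or fields[2] != "gfs2":
            continue
        options = fields[3].split(",")
        result[fields[1]] = "localflocks" in options
    return result


# Hypothetical sample input; on a real node, read /proc/mounts instead.
sample = (
    "/dev/mapper/vg-projects /srv/projects gfs2 rw,relatime,localflocks 0 0\n"
    "/dev/mapper/vg-home /srv/home gfs2 rw,relatime 0 0\n"
)
for mount_point, is_local in gfs2_mounts_with_localflocks(sample).items():
    print(mount_point, "localflocks" if is_local else "cluster-wide locking")
```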

It is something where we'd like to expand our support in future though,
and the more requests we receive the better idea we get of exactly what
use cases people require, and thus where to spend our time.

> And using 2.6.32-71.24.1.el6.x86_64 kernel we are seeing the kernel
> often panicking (every week or so) on one node. Could this be the cause?
> 
It shouldn't be. If the setup you've described causes problems, then
they will be in terms of coherency between the NFS and Samba exports. If
you've got a panic, then that's something else entirely.

> It's hard to catch, as the fencing has so far stopped me from getting a
> good core (and the crashkernel param changed in 6.1, and the new param
> doesn't play with the old kernel). Plus I guess I need to see if
> it happens on the latest kernels, but they are worse for me due to
> BZ#712139. I guess the first thing I'll get from support is try the
> latest hotfix kernel (which I can only get once I've tested the test
> kernel). Also plus long fence intervals aren't great to capture. 
> 
Do you not get messages in syslog? That's the first thing to look at.
Getting a core is helpful, but often not essential in kernel debugging.
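
A minimal sketch of pulling such messages out of a saved log, assuming
the kernel messages have reached syslog before the node was fenced. The
marker strings are common kernel oops/panic prefixes, not an exhaustive
list, and the function name is made up for illustration.

```python
import re

# Common prefixes of kernel oops/panic reports (not exhaustive).
OOPS_MARKERS = re.compile(
    r"BUG: unable to handle kernel|Kernel panic|Oops:|general protection fault",
    re.IGNORECASE,
)


def find_oops_reports(lines, context=20):
    """Return a list of reports.

    Each report is a marker line plus up to `context` lines that follow it,
    which is usually where the register dump and call trace appear.
    """
    lines = list(lines)
    reports = []
    i = 0
    while i < len(lines):
        if OOPS_MARKERS.search(lines[i]):
            reports.append(lines[i : i + 1 + context])
            # Jump past this report so its lines are not matched again.
            i += 1 + context
        else:
            i += 1
    return reports
```

It can be fed the lines of whichever file syslog writes kernel messages
to on the node (often /var/log/messages, but that depends on the syslog
configuration).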

> So is it time for me to look at going back to ext4 for an HA file
> server?
> 
> Can anyone from RH tell me if I'm wasting my time even trying this on
> GFS2 (that I will get instability and kernel crashes)? 
> 
> Really unfortunate if so, as I really like the setup when it's
> working.....
> 
> Also, after a node crashes some GFS mounts aren't too happy; they take a
> long time to mount back on the original failed node. The filesystems are
> dirty when we fsck them, with lots of
> 
> Ondisk and fsck bitmaps differ at block 109405952 (0x6856700) 
> Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
> Metadata type is 0 (free)
> 
> Some differences in free space etc
> 
> Can anyone from RH tell me if I'm wasting my time even trying this on
> GFS2 (that I will get GFS2 instability and kernel crashes)? 
> 
I suspect that it will not work exactly as you expect due to potential
coherency issues, but you still should not be getting kernel crashes
either way.

Steve.

> Thanks
> 
> Colin
> 
> On Tue, 2011-07-12 at 02:29 +0100, Colin Simpson wrote:
> > OK, so my question is: is there any other reason, apart from the risk
> > of individual file corruption from locking being incompatible between
> > local/samba vs NFS, that may lead to issues? I.e. we aren't really
> > interested in locking working between NFS and local/samba access, just
> > that it works consistently in NFS when accessing files that way (with
> > a single node server) and locally/samba when accessing files that way.
> > 
> > I mean I'm thinking of, for example, I have a build that generates
> > source code via NFS then some time later a PC comes in via Samba and
> > accesses these files for building on that environment. The two systems
> > aren't requiring locking to work cross platform/protocol, just need to
> > be exported to the two systems. But locking on each one separately is
> > useful.
> > 
> > If there are, and we should be doing all access via NFS on NFS-exported
> > filesystems, one issue that also springs to mind is commercial backup
> > systems that support GFS2 but don't support backing up via NFS.
> > 
> > Is there anything else I should know about GFS2 limitations?
> > Is there a book "GFS: The Missing Manual"? :)
> > 
> > Thanks
> > 
> > Colin
> > 
> > On Mon, 2011-07-11 at 13:05 +0100, J. Bruce Fields wrote:
> > > On Mon, Jul 11, 2011 at 11:43:58AM +0100, Steven Whitehouse wrote:
> > > > Hi,
> > > >
> > > > On Mon, 2011-07-11 at 09:30 +0100, Alan Brown wrote:
> > > > > On 08/07/11 22:09, J. Bruce Fields wrote:
> > > > >
> > > > > > > With default mount options, the linux NFS client (like most NFS
> > > > > > > clients) assumes that a file has at most one writer at a time.
> > > > > > > (Applications that need to do write-sharing over NFS need to use
> > > > > > > file locking.)
> > > > > >
> > > > > > The problem is that file locking on V3 isn't passed back down to the
> > > > > > filesystem - hence the issues with nfs vs samba (or local disk
> > > > > > access(*)) on the same server.
> > >
> > > > The NFS server *does* acquire locks on the exported filesystem (and
> > > > does it the same way for v2, v3, and v4).
> > > >
> > > > For local filesystems (ext3, xfs, btrfs), this is sufficient.
> > > >
> > > > For exports of cluster filesystems like gfs2, there are more
> > > > complicated problems that, as Steve says, will require some work to fix.
> > >
> > > Samba is a more complicated issue due to the imperfect match between
> > > Windows and Linux locking semantics, but depending on how it's
> > > configured Samba will also acquire locks on the exported filesystem.
> > >
> > > --b.
> > >
> > > --
> > > Linux-cluster mailing list
> > > Linux-cluster redhat com
> > > https://www.redhat.com/mailman/listinfo/linux-cluster
> > >
> > >
> > 
> 
> 