[Linux-cluster] GFS2 + NFS crash BUG: Unable to handle kernel NULL pointer dereference

Colin Simpson Colin.Simpson at iongeo.com
Wed Jul 13 16:41:33 UTC 2011


Hi

I'd like to do single-node NFS (active/passive failover), but we are
currently running CTDB Samba on both nodes for this same directory.
Would that work with "localflocks", and/or would it be supported in such
a config?
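
For reference, I'm assuming the mount change itself would just be adding
"localflocks" to the GFS2 options in fstab, something like the line
below (the device and mount point are placeholders for ours):

/dev/my_vg/projects_lv  /projects  gfs2  defaults,localflocks,noatime  0 0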

I'm thinking the clustered Samba would also have to go in such a config,
since "localflocks" means no cross-node lock coherency.
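
The CTDB side here is just the stock clustered Samba config, roughly
like the sketch below (trimmed to the clustering bits):

# /etc/samba/smb.conf on both nodes, alongside a running ctdbd
[global]
        clustering = yes
        idmap backend = tdb2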

Sadly the messages file says nothing at all, apart from one node
reporting that the other node isn't responding and fencing it. There are
kdump disks on the nodes, but the RHEL 6.1 update changed the kernel
parameter to crashkernel=auto, and that doesn't work with the 6.0 kernel
(which we are currently running due to another bug in 6.1's latest
kernel). I seem to remember using "crashkernel=512M-2G:64M,2G-:128M"
with the older kernels, but I can't remember for sure. Maybe I should
try that again, but the only way I know to get a kdump is to set a large
fence delay.
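
In case it's useful, on RHEL 6 with legacy GRUB that would mean putting
the range syntax back on the kernel line of /boot/grub/grub.conf (the
kernel version and root device below are illustrative, not ours):

kernel /vmlinuz-2.6.32-71.el6.x86_64 ro root=/dev/mapper/vg_root crashkernel=512M-2G:64M,2G-:128M

and then checking after a reboot that the reservation took and kdump is
armed:

cat /proc/cmdline
service kdump status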

Thanks

Colin

On Wed, 2011-07-13 at 16:53 +0100, Steven Whitehouse wrote:
> Hi,
> 
> On Tue, 2011-07-12 at 19:52 +0100, Colin Simpson wrote:
> > I just ask this as I have a cluster where we wish to share project
> > directories and home dirs and have them accessible by Linux clients
> > via NFS and by PCs via Samba. As I say, the cross-OS locking doesn't
> > matter.
> >
> If it doesn't matter, then you can set the "localflocks" mount option
> on GFS2 and each mount will act as if it were a single-node filesystem
> so far as locking is concerned. From a support perspective that config
> (active/active NFS and Samba) is not supported (by Red Hat), because we
> don't test it, because generally you do need locking in order to make
> it safe wrt accesses between NFS/Samba.
> 
> It is something where we'd like to expand our support in future
> though, and the more requests we receive the better idea we get of
> exactly what use cases people require, and thus where to spend our
> time.
> 
> > And using the 2.6.32-71.24.1.el6.x86_64 kernel we are seeing the
> > kernel panic often (every week or so) on one node. Could this be the
> > cause?
> >
> It shouldn't be. If the setup you've described causes problems then
> they will be in terms of coherency between the NFS and Samba exports;
> if you've got a panic then that's something else entirely.
> 
> > It's hard to catch, as the fencing has so far stopped me getting a
> > good core (and the crashkernel param changed in 6.1, and the new
> > param doesn't play well with the old kernel). Plus I guess I need to
> > see if it happens on the latest kernels, but they are worse for me
> > due to BZ#712139. I guess the first thing I'll get from support is
> > "try the latest hotfix kernel" (which I can only get once I've
> > tested the test kernel). Also, long fence intervals aren't great for
> > capturing cores.
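
(The fence-delay trick I mean, for anyone else trying to capture a
kdump, is just raising post_fail_delay in cluster.conf so the surviving
node waits before fencing, e.g.

<fence_daemon post_fail_delay="120" post_join_delay="30"/>

with the values above purely illustrative; ugly, but it gives kdump time
to write the core before the node is power-cycled.)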
> >
> Do you not get messages in syslog? That's the first thing to look at;
> getting a core is helpful, but often not essential in kernel
> debugging.
> 
> > So is it time for me to look at going back to ext4 for an HA file
> > server?
> >
> > Can anyone from RH tell me if I'm wasting my time even trying this
> > on GFS2 (i.e. that I will get instability and kernel crashes)?
> >
> > Really unfortunate if so, as I really like the setup when it's
> > working.....
> >
> > Also, after a node crashes, some GFS2 mounts aren't too happy; they
> > take a long time to mount back on the original failed node. The
> > filesystems are dirty when we fsck them, with lots of:
> >
> > Ondisk and fsck bitmaps differ at block 109405952 (0x6856700)
> > Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
> > Metadata type is 0 (free)
> >
> > and some differences in free space etc.
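
(For the record, those messages come from a repair run with the
filesystem unmounted on every node, roughly:

fsck.gfs2 -y /dev/my_vg/projects_lv

where -y answers yes to all repairs and the LV path is a placeholder for
ours.)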
> >
> > Can anyone from RH tell me if I'm wasting my time even trying this
> > on GFS2 (i.e. that I will get GFS2 instability and kernel crashes)?
> >
> I suspect that it will not work exactly as you expect, due to
> potential coherency issues, but you still should not be getting kernel
> crashes either way.
> 
> Steve.
> 
> > Thanks
> >
> > Colin
> >
> > On Tue, 2011-07-12 at 02:29 +0100, Colin Simpson wrote:
> > > OK, so my question is: is there any other reason, apart from the
> > > risk of individual file corruption from locking being incompatible
> > > between local/samba vs NFS, that may lead to issues? I.e. we aren't
> > > really interested in locking working between NFS and local/samba
> > > access, just that it works consistently via NFS when accessing
> > > files that way (with a single-node server) and locally/via Samba
> > > when accessing files that way.
> > >
> > > I mean I'm thinking of, for example: I have a build that
> > > generates source code via NFS, then some time later a PC comes in
> > > via Samba and accesses those files for building in that
> > > environment. The two systems don't require locking to work across
> > > platforms/protocols; the filesystem just needs to be exported to
> > > the two systems. But locking on each one separately is useful.
> > >
> > > If there are, and all access should be via NFS on NFS-exported
> > > filesystems, one issue that also springs to mind is commercial
> > > backup systems that support GFS2 but don't support backing up via
> > > NFS.
> > >
> > > Is there anything else I should know about GFS2 limitations?
> > > Is there a book "GFS: The Missing Manual"? :)
> > >
> > > Thanks
> > >
> > > Colin
> > >
> > > On Mon, 2011-07-11 at 13:05 +0100, J. Bruce Fields wrote:
> > > > On Mon, Jul 11, 2011 at 11:43:58AM +0100, Steven Whitehouse
> > > > wrote:
> > > > > Hi,
> > > > >
> > > > > On Mon, 2011-07-11 at 09:30 +0100, Alan Brown wrote:
> > > > > > On 08/07/11 22:09, J. Bruce Fields wrote:
> > > > > >
> > > > > > > With default mount options, the linux NFS client (like most
> > > > > > > NFS clients) assumes that a file has at most one writer at
> > > > > > > a time. (Applications that need to do write-sharing over
> > > > > > > NFS need to use file locking.)
> > > > > >
> > > > > > The problem is that file locking on V3 isn't passed back down
> > > > > > to the filesystem - hence the issues with nfs vs samba (or
> > > > > > local disk access(*)) on the same server.
> > > >
> > > > The NFS server *does* acquire locks on the exported filesystem
> > > > (and does it the same way for v2, v3, and v4).
> > > >
> > > > For local filesystems (ext3, xfs, btrfs), this is sufficient.
> > > >
> > > > For exports of cluster filesystems like gfs2, there are more
> > > > complicated problems that, as Steve says, will require some work
> > > > to fix.
> > > >
> > > > Samba is a more complicated issue due to the imperfect match
> > > > between Windows and Linux locking semantics, but depending on how
> > > > it's configured, Samba will also acquire locks on the exported
> > > > filesystem.
> > > >
> > > > --b.
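
(A quick way to see Bruce's point, for anyone following along: while an
NFS client application holds a POSIX lock on an exported file, a
matching lock shows up in /proc/locks on the server, held on behalf of
lockd/nfsd, e.g.

cat /proc/locks
1: POSIX  ADVISORY  WRITE 3104 fd:02:131594 0 EOF

The output line above is illustrative, not captured from a real run.
With "localflocks" on GFS2 that server-side lock stays local to the node
that took it, which is exactly the coherency gap being discussed.)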