[Linux-cluster] dlm caused a kernel panic

Patrick Caulfield pcaulfie at redhat.com
Wed Dec 14 08:53:49 UTC 2005


Jeff Dinisco wrote:
> I'm running FC4 (2.6.13-1.1532_FC4smp), dlm-1.0.0-3 and GFS-6.1.0-3.  I
> have a 3 node cluster.  The df command has always been very slow to
> return output on my gfs mounted filesystems.  Series of events...
> 
> 16:20:00 - node01 was out of the cluster, node02 and node03 were active
> with 2 gfs filesystems mounted
> 16:22:10 - after joining the cluster, both filesystems were successfully
> mounted
> 16:22:37 - a df command was attempted by a monitoring script
> 16:22:54 - I executed /etc/init.d/gfs stop and it failed because 1 of
> the filesystems was busy and could not be umounted (the above df command
> may have been the cause, it ended up hanging)
> 
> 16:22:55 - node02 and node03 panicked and were not properly fenced

If only one node was left in the cluster, it would not fence the other two
because it doesn't have quorum. It can't be sure that it hasn't just been
cut off from the other two nodes, which might still be working fine.

> Dec 13 16:22:56 node02 kernel: ------------[ cut here ]------------
> Dec 13 16:22:56 node02 kernel: kernel BUG at
> /usr/src/build/627959-i686/BUILD/smp/src/lockqueue.c:1007!

I can reproduce this under very heavy lock load, but I'm not yet sure what's
causing it. The "flood" tool I checked in to STABLE yesterday is almost
guaranteed to trigger it.

-- 

patrick



