[Linux-cluster] GFS 6.0 node without quorum tries to fence

Michael Conrad Tadpol Tilstra mtilstra at redhat.com
Tue Aug 3 16:12:47 UTC 2004


On Tue, Aug 03, 2004 at 10:49:26AM -0500, Derek Anderson wrote:
> Bernd,
> 
> Please see http://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=128635.  There 
> are some outstanding fence issues here.

Except that bug is against cman, and this is about gulm.

> On Tuesday 03 August 2004 06:55, Schumacher, Bernd wrote:
> > Hi,
> > I have three nodes oben, mitte and unten.
> >
> > Test:
> > I have disabled eth0 on mitte, so that mitte will be excluded.
> >
> > Result:
> > Oben and unten are trying to fence mitte and build a new cluster. OK!
> > But mitte tries to fence oben and unten. PROBLEM!

Actually not a problem, just not what you expected.  Hopefully I can
explain why.  (You have a netsplit.  Neither side knows what the other
is doing, and each must assume that the other is dead and that it is
right.)

> > Why can this happen? Mitte knows that it can not build a cluster. See
> > Logfile from mitte: "Have 1, need 2"

So looking at what you gave below, mitte was master.  (I am guessing
this from the "Core lost slave quorum" part of the message below.)
Although it knows that it doesn't have quorum, it is still going to try
to be the Master.  It does not know "that it can not build a cluster."
The only thing it knows right now about the other nodes is that they
failed to send heartbeats.  Therefore they must have left the cluster
abnormally.  Therefore it must fence them.
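
The "Have 1, need 2" line in the log is consistent with a plain majority
rule over three nodes.  A minimal sketch of that arithmetic, assuming
quorum means strictly more than half of the configured nodes (the
function names are made up for illustration; this is not gulm's code):

```python
def quorum_needed(total_nodes: int) -> int:
    """Smallest node count that is a strict majority (assumed rule)."""
    return total_nodes // 2 + 1


def has_quorum(visible_nodes: int, total_nodes: int) -> bool:
    """True if the nodes we can still see form a majority."""
    return visible_nodes >= quorum_needed(total_nodes)


# After the split, mitte sees only itself out of three nodes.
print(f"Have 1, need {quorum_needed(3)}")
print("mitte has quorum:", has_quorum(1, 3))
```

Note that lacking quorum says nothing about whether the other nodes are
still running, which is why mitte still concludes it has to fence them.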

The other two nodes see that mitte has failed to reply to heartbeats.
Therefore it must have left the cluster abnormally.  Therefore it must
be fenced.

Both sides of the netsplit are trying to resolve things to regain the
cluster.  From an outsider's viewpoint (which you and I have; the nodes
do not), we can see that mitte's attempts are futile: oben and unten
will get control of the cluster.  But the nodes cannot see this.
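
To make the two viewpoints concrete, here is a toy model of the split.
It is only a sketch: the majority rule and the names (`NODES`,
`local_view`) are assumptions for illustration, not how gulm is
implemented.  Each node can judge only by the heartbeats it still hears.

```python
NODES = ["oben", "mitte", "unten"]


def local_view(me, heard_from):
    """What one node concludes from the heartbeats it still receives."""
    alive = {me} | set(heard_from)
    missing = sorted(n for n in NODES if n not in alive)
    return {
        "have": len(alive),
        "need": len(NODES) // 2 + 1,  # assumed simple-majority quorum
        "wants_to_fence": missing,    # silent peers are presumed dead
    }


# eth0 is down on mitte: it hears nobody, the other two hear each other.
views = {
    "mitte": local_view("mitte", []),
    "oben": local_view("oben", ["unten"]),
    "unten": local_view("unten", ["oben"]),
}
for name, view in views.items():
    print(name, view)
```

In this model mitte wants to fence oben and unten while having 1 of the
2 nodes it needs; oben and unten each want to fence mitte and have 2 of
2.  Nothing in mitte's local view tells it that the other side is the
one that will win.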

This is what makes netsplits kind of ugly.  

(Using ifdown to test cluster stuff causes extra confusion, in my
opinion, because you are actually creating a netsplit case, not a
simpler node-down case.  The power switch is nice for this.)


I hope that made some sense.

-- 
Michael Conrad Tadpol Tilstra
Blood is thicker than water, and much tastier.