
Re: [Linux-cluster] GFS 6.0 node without quorum tries to fence



On Tue, Aug 03, 2004 at 10:49:26AM -0500, Derek Anderson wrote:
> Bernd,
> 
> Please see http://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=128635.  There 
> are some outstanding fence issues here.

Except that bug is in cman, and this is about gulm.

> On Tuesday 03 August 2004 06:55, Schumacher, Bernd wrote:
> > Hi,
> > I have three nodes oben, mitte and unten.
> >
> > Test:
> > I have disabled eth0 on mitte, so that mitte will be excluded.
> >
> > Result:
> > Oben and unten are trying to fence mitte and build a new cluster. OK!
> > But mitte tries to fence oben and unten. PROBLEM!

Actually, it's not a problem, just not what you expected. Hopefully I can
explain why. (You have a netsplit: neither side knows what the other
is doing, so each must assume that the other is dead and that it is right.)

> > Why can this happen? Mitte knows that it can not build a cluster. See
> > Logfile from mitte: "Have 1, need 2"

Looking at what you gave below, mitte was the master. (I'm guessing this
from the "Core lost slave quorum" part of the message below.) Although it
knows that it doesn't have quorum, it is still going to try to be the Master.
It does not know "that it can not build a cluster." The only thing it
knows right now about the other nodes is that they failed to send
heartbeats. Therefore they must have left the cluster abnormally, and
therefore it must fence them.
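
To make that chain of reasoning concrete, here is a minimal Python sketch of
the decision rule described above. It is a hypothetical model, not gulm's
actual code: the function name nodes_to_fence, the 15-second timeout, and the
timestamps are all invented for illustration. The point it shows is that the
only evidence a node has is heartbeat silence; quorum never enters into the
decision.

```python
# Hypothetical model of the fencing reasoning; NOT gulm's real implementation.
# Node names come from the thread; the timeout value is an assumption.

HEARTBEAT_TIMEOUT = 15.0  # seconds; illustrative value only


def nodes_to_fence(me, last_heartbeat, now, timeout=HEARTBEAT_TIMEOUT):
    """Return the sorted list of peers that `me` would fence.

    `last_heartbeat` maps each peer name to the time its last heartbeat
    arrived. A peer silent for longer than `timeout` is assumed to have
    left the cluster abnormally, so it gets fenced. Nothing here checks
    quorum or whether the peer is really dead: silence is the only evidence.
    """
    return sorted(
        peer
        for peer, seen in last_heartbeat.items()
        if peer != me and now - seen > timeout
    )


# mitte's view after its eth0 goes down at t=100: it hears nobody.
mitte_view = {"oben": 100.0, "unten": 100.0}
print(nodes_to_fence("mitte", mitte_view, now=130.0))

# oben's view at the same moment: unten is still heard, mitte is silent.
oben_view = {"unten": 129.0, "mitte": 100.0}
print(nodes_to_fence("oben", oben_view, now=130.0))
```

Run as-is, mitte's list comes out as oben and unten, while oben's list
contains only mitte: both sides apply the same rule and reach opposite
conclusions.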

The other two nodes see that mitte has failed to reply to heartbeats.
Therefore it must have left the cluster abnormally, and therefore it must
be fenced.

Both sides of the netsplit are trying to resolve things to regain the
cluster. From an outsider's viewpoint (which you and I have, and the nodes
do not), we can see that mitte's attempts are futile: oben and unten
will get control of the cluster. But the node itself cannot see this.
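
The outsider's view of this netsplit can be sketched as follows. This is
again a hypothetical model, not gulm code: it assumes a simple majority
quorum (floor(n/2) + 1), which is consistent with the "Have 1, need 2" line
in mitte's log, and the helper names are invented for illustration.

```python
# Hypothetical "outsider's view" of the netsplit; NOT gulm's real logic.
# Assumes simple majority quorum, consistent with "Have 1, need 2".

CLUSTER = ["oben", "mitte", "unten"]


def quorum_needed(cluster_size):
    """Votes required for a simple majority."""
    return cluster_size // 2 + 1


def describe_partition(members, cluster=CLUSTER):
    """Summarize what one side of the split has and what it will try to do."""
    have = len(members)
    need = quorum_needed(len(cluster))
    # Every node this side can no longer hear looks dead to it.
    others = sorted(n for n in cluster if n not in members)
    return {
        "members": sorted(members),
        "have": have,
        "need": need,
        "quorate": have >= need,
        "would_fence": others,
    }


for side in ({"mitte"}, {"oben", "unten"}):
    info = describe_partition(side)
    print(f"{','.join(info['members'])}: have {info['have']}, "
          f"need {info['need']}, quorate={info['quorate']}, "
          f"would fence {info['would_fence']}")
```

Each side builds its fence list from its own partial information. Only by
putting the two rows side by side, which no single node can do, do you see
that mitte's side is the one without quorum.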

This is what makes netsplits kind of ugly.  

(Using ifdown to test cluster behavior causes extra confusion, in my
opinion, because you are actually creating a netsplit case, not a
simpler node-down case. The power switch is nice for this.)


I hope that made some sense.

-- 
Michael Conrad Tadpol Tilstra
Blood is thicker than water, and much tastier.


