[Linux-cluster] Network hiccup + power-fencing = both nodes go down (redhat cluster 4)

Tue Jan 17 14:34:09 UTC 2006

Am I seeing actual problems: I was earlier :)  I had the network blip,
then both boxes fenced each other after it recovered.  By blip, I'm
guessing it was more than just a blip - more than 21 seconds.  I think
it was long enough that they both tried to fence the other, so that
process was running when the network came back online, and then they did
their thing.
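
To make the timing argument concrete, here is a minimal sketch in Python.
It is only a toy model, not how cman works internally: it assumes each
node declares its peer dead once the outage lasts longer than that node's
deadnode_timeout, and that a node which has declared its peer dead goes
on to fence it.  The function name and the node names are made up for
illustration; the 21 and 120 second figures are the ones from this
thread.

```python
def nodes_starting_fence(outage_seconds, timeouts):
    """Return the nodes that would declare their peer dead (toy model).

    timeouts maps a hypothetical node name to its deadnode_timeout in
    seconds. A node is assumed to start fencing its peer once the
    outage has lasted longer than its own timeout.
    """
    return sorted(name for name, limit in timeouts.items()
                  if outage_seconds > limit)


# A 30 second outage with both timeouts at 21: both nodes try to fence.
print(nodes_starting_fence(30, {"node1": 21, "node2": 21}))

# The same outage with both timeouts raised to 120: nobody fences.
print(nodes_starting_fence(30, {"node1": 120, "node2": 120}))
```

In this model a longer timeout only helps for outages shorter than the
timeout; a partition that outlasts 120 seconds would still end with both
nodes trying to fence each other.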

I don't mind a 2nd round of cycling on the bad box, really.  I don't
like it, but I care more about the one that's up and running with my
database on it than the one that failed.

Thank you sir for all your help.
Jeff

-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Patrick Caulfield
Sent: Tuesday, January 17, 2006 9:03 AM
To: linux clustering
Subject: Re: [Linux-cluster] Network hiccup + power-fencing = both nodes
go down (redhat cluster 4)

Jeff Harr wrote:
> Thanks Patrick.  I have upped my deadnode_timeouts to 120 each.  
> 
> My worry though is the box somehow rebooting and joining faster than the
> other can wait its 120 seconds and take over the cluster.  Is there
> another timeout value that I can tweak to keep the original, crashed
> node from rebooting and joining too quickly?  Unfortunately, when the
> boxes crash they seem to come right back up and not stay dead.  I think
> this might be ILO behavior, but not sure.  I know when I shutdown -hy
> now, they stay down, and when the power-fencing takes place they stay
> down too, but not for crashes.
> 

If the crashed node tries to join while the other node thinks it's still
in the cluster then it will get rejected and its join should fail. Of
course the other node will still think it's alive but won't be able to
talk to it because it doesn't have any services running.

When the remaining node notices it has gone then it should fence it (and
cause another power cycle!). So things should be OK.
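
The sequence described above can be sketched as a small timeline in
Python.  This is a simplified illustration under stated assumptions, not
cman's actual logic: it assumes the surviving node notices the loss
exactly deadnode_timeout seconds after the crash, and that the crashed
node makes a single join attempt as soon as it has rebooted.  The
function name and event strings are hypothetical.

```python
def crash_timeline(reboot_seconds, deadnode_timeout):
    """List (time, event) pairs seen by the surviving node (toy model).

    reboot_seconds: how long after the crash the failed node is back up
    and attempts to join. deadnode_timeout: seconds after the crash at
    which the survivor is assumed to notice the node has gone.
    """
    events = []
    if reboot_seconds < deadnode_timeout:
        # The survivor still counts the node as a member, so the join
        # attempt is rejected.
        events.append((reboot_seconds, "join rejected"))
    # Once the timeout expires the survivor fences the node, which
    # causes another power cycle.
    events.append((deadnode_timeout, "survivor fences crashed node"))
    return events


# Node is back after 60 seconds, timeout is 120 seconds.
for when, what in crash_timeline(60, 120):
    print(when, what)
```

With a fast reboot and a long timeout, the model shows the rejected join
followed later by the fence, which is the extra power cycle mentioned
above.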

Are you seeing actual problems?
-- 

patrick
