
[Linux-cluster] Network hiccup + power-fencing = both nodes go down (redhat cluster 4)

Hi all, it has been a while since I posted anything.  Once again, I’d appreciate anything anyone has to say regarding this latest issue.  Basically, we have a situation where both nodes suddenly become unable to reach each other due to a “network hiccup”, and each begins trying to fence the other (power fencing).  Then the network suddenly returns and they turn each other off.  My need: make Red Hat Cluster robust enough not to do this.  It could be that my configurations are wrong, so I’m including them (attached).


My idea/solution: I THINK I could increase post_fail_delay to a number higher than 0, making a node wait to see if things “come back up” before it fences.  Perhaps I make one node wait something like 2 minutes for the other one to come back, and have the other node wait zero seconds, thus ensuring that the two nodes never act at the same time?
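
To make the idea concrete, here is a rough sketch of the setting I have in mind, assuming it belongs on the fence_daemon element of cluster.conf.  The 120-second value is only an illustration of the “wait 2 minutes” idea and is not copied from my attached configs; post_join_delay is shown with what I believe is its usual value of 3:

```xml
<!-- Fragment of cluster.conf, not a complete file. -->
<!-- post_fail_delay: seconds fenced waits after a node is seen to fail -->
<!-- before fencing it (0 means fence immediately).                     -->
<!-- The value 120 is illustrative only.                                -->
<fence_daemon post_fail_delay="120" post_join_delay="3"/>
```

The other node would keep post_fail_delay="0" under this idea, which is the part I am least sure is a sane thing to do.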


Some small proof that the dual-reboot happened:

I know that each box fenced the other and “succeeded”, and my iLO event logs show both servers being powered off.


Thanks a lot,



Attachment: cluster_db2.conf
Description: cluster_db2.conf

Attachment: cluster_db1.conf
Description: cluster_db1.conf
