Hi all, it has been a while since I posted anything. Once again, I’d appreciate any input on this latest issue. Basically, both nodes suddenly become unable to reach each other because of a “network hiccup”, and each begins trying to fence the other (power fencing). Then the network returns and they power each other off. My need: make Red Hat Cluster robust enough not to do this. It could be that my configuration is wrong, so I’m including it (attached).
My idea/solution: I THINK I could increase post_fail_delay from 0 to a higher number, so that a node waits to see whether things “come back up” before it fences. Perhaps I make one node wait about 2 minutes for the other to come back, and have the other node wait zero seconds, thus ensuring that the two nodes never act at the same time?
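To make the timing idea concrete, here is a toy Python model of it. It is only a sketch and not how the fence daemon actually behaves: the function name `simulate_outage`, the node names, and the notion of a separate delay per node are all hypothetical. It assumes that a node gives up on fencing if its peer is reachable again before its delay expires, and that fence requests cannot get through during the outage, so every pending request succeeds at the moment the network returns.

```python
def simulate_outage(outage_seconds, delays):
    """Toy model: return the set of nodes powered off after one network outage.

    delays maps node name -> seconds that node waits after losing contact
    before it starts trying to fence its peer (hypothetical per-node delay).
    """
    # A node whose delay has not expired when the network returns sees its
    # peer again and never starts fencing.
    fencers = {node for node, delay in delays.items() if delay < outage_seconds}
    # Assumption: fence requests cannot reach the peer during the outage,
    # so every pending request succeeds together when the network returns.
    return {peer for node in fencers for peer in delays if peer != node}


if __name__ == "__main__":
    scenarios = (
        {"node1": 0, "node2": 0},      # both nodes fence immediately
        {"node1": 0, "node2": 120},    # staggered delays
        {"node1": 120, "node2": 120},  # both nodes wait
    )
    for delays in scenarios:
        print(delays, "->", sorted(simulate_outage(30, delays)))
```

Under these assumptions, the staggered delays avoid the dual power-off only while the outage is shorter than the longer delay, and a short hiccup fences nobody only when both nodes wait.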
Some small proof that the dual-reboot happened:
I know that both boxes fenced each other and “succeeded”, and my iLO event logs show both servers being powered off.
Thanks a lot,