
Re: [Linux-cluster] fence start-up issue



Eric Ritchie wrote:
I sometimes run into an issue when a node in my 2-node cluster is rebooting and hangs on fenced. It seems it can't communicate with the other node, and after the post_join_delay it fences the other node. This happened again today, and when the second node rebooted after the fence, the two nodes were in a split-brain configuration. I saw in the cluster FAQ, in the cman section, question 6, that the cluster communication network should be the same network as the fencing device. I think this may be my problem, but I don't understand why. I'm using HP iLO for fencing, and I set up cross-connect cables for the cluster communication between the 2 nodes. Why would having cluster communication and fencing on different networks be an issue?

Thanks for your time


Having distinct heartbeat and fencing networks creates the possibility of a race condition, which you seem to be running into.

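Cluster communication may not have stabilized within the post_join_delay time frame for any number of reasons, including a network outage. If fencing and cluster communication shared one path, fencing from the node that is starting up would fail in this case as well, because the path to the fence device would be the same as the path to the other cluster member.

With the two separated, fencing can succeed while cluster communication fails, so the starting node fences a peer it simply could not see.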

My recommendation is for cluster communication and iLO reachability to go through the same NIC on the host.
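
To make the difference between the two layouts concrete, here is a toy model in Python. It is only a sketch of the reasoning above, not how fenced is implemented: the function names, the outcome strings, and the simplification that a booting node either sees its peer or does not are all assumptions made for illustration.

```python
# Toy model of a node starting up in a 2-node cluster.
# Hypothetical names throughout; this does not mirror fenced internals.

def startup_outcome(cluster_link_up, fence_link_up):
    """What the booting node does once post_join_delay has expired."""
    if cluster_link_up:
        # The peer was seen in time, so there is nothing to fence.
        return "joins cluster"
    # The peer was not seen, so the node tries to fence it.
    if fence_link_up:
        return "fences peer"
    return "fence attempt fails"


def scenario(shared_path, cluster_link_up, fence_link_up=True):
    """Apply the network layout, then evaluate the startup outcome."""
    if shared_path:
        # Same NIC for both: if cluster traffic cannot get through,
        # the fence device cannot be reached either.
        fence_link_up = cluster_link_up
    return startup_outcome(cluster_link_up, fence_link_up)


if __name__ == "__main__":
    for shared in (True, False):
        layout = "shared path" if shared else "separate paths"
        print(layout, "->", scenario(shared, cluster_link_up=False))
```

With the cluster link down, the separate-path layout ends in a fence of the peer, while the shared-path layout cannot reach the fence device at all, which is the point of the recommendation.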

-regards
Subhendu

--
Subhendu Ghosh
Solutions Architect
Red Hat

