[Linux-cluster] fence start-up issue

Subhendu Ghosh sghosh at redhat.com
Fri Sep 12 23:39:33 UTC 2008


Eric Ritchie wrote:
>    I sometimes run into an issue when a node in my 2-node cluster is 
> rebooting and hangs on fenced. It seems it can't communicate with the 
> other node and after the post_join_delay, it fences the other node. This 
> happened again today, and when the second node rebooted after the fence, 
> they were in a split-brain configuration.
>    I saw in the cluster faq, in the cman section, question 6 that the 
> cluster communication network should be the same network as the fencing 
> device. I think this may be my problem but I don't understand why. I'm 
> using HP iLO for fencing and I set up cross-connect cables for the
> cluster communication between the 2 nodes. Why would having cluster 
> communication and fencing on different networks be an issue?
> 
> Thanks for your time
> 

Having distinct heartbeat and fencing networks creates the possibility of a race
condition, which you seem to be running into.

Cluster communication may not have stabilized within post_join_delay for any
number of reasons, including a network outage. If cluster communication and
the fence device shared the same path, fencing from the node that is starting
up would also fail in this case, because the fence device would be just as
unreachable as the other cluster member.

With the two separated, fencing can succeed while cluster communication fails,
so the booting node fences a peer it simply cannot hear.
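The race can be illustrated with a toy model in Python. It is not fenced's
actual logic: `Links`, `outcome`, `shared` and `separate` are hypothetical
names, and the model assumes the heartbeat link is the only thing that has
failed while the node boots and waits out post_join_delay.

```python
from dataclasses import dataclass


@dataclass
class Links:
    heartbeat_up: bool   # can the booting node reach its peer on the cluster network?
    fence_path_up: bool  # can the booting node reach the peer's iLO?


def outcome(links: Links) -> str:
    """Toy model of what a booting node does once post_join_delay expires."""
    if links.heartbeat_up:
        return "joins cluster"        # peer is visible, so no fencing is needed
    if links.fence_path_up:
        return "fences peer"          # peer is invisible but its iLO is reachable
    return "fence attempt fails"      # neither reachable, so it cannot fence the peer


def shared(link_up: bool) -> Links:
    # Heartbeat and iLO traffic use the same NIC, so they fail together.
    return Links(heartbeat_up=link_up, fence_path_up=link_up)


def separate(heartbeat_up: bool) -> Links:
    # Crossover cable for heartbeat; iLO sits on another network that stays up.
    return Links(heartbeat_up=heartbeat_up, fence_path_up=True)


for name, build in (("shared", shared), ("separate", separate)):
    print(name, "->", outcome(build(False)))
```

With the heartbeat link down, the shared layout cannot fence at all, while the
separate layout fences a peer that may be perfectly healthy.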

My recommendation would be to route cluster communication and iLO reachability
through the same NIC on the host.

-regards
Subhendu

-- 
Subhendu Ghosh
Solutions Architect
Red Hat
