[Linux-cluster] Problems with Cluster


We have configured an active-active two-node cluster accessing shared
storage. We are using Red Hat Cluster Suite on RHEL 4 Update 4.
The nodes are HP servers with iLOs as the fence devices. I am facing
certain problems with the setup and would appreciate suggestions.

While booting up the systems, one node often forces the other
one to reboot for no apparent reason. How do I prevent that?

We want failover to happen when the power supply to either of the
nodes fails. To test this scenario, we removed the power cables from
one of the nodes. However, the failover did not happen. The logs
showed that the surviving node could not connect to the fence device
(the iLO in this case) of the dead node, since the iLO was powered off
along with the server, so fencing could not take place. Does this mean
that we cannot have failover when one of the nodes loses power? Is
there a way to do it? How is the cluster supposed to react when the
iLO itself is powered off?

Another failover test was conducted by removing the network
connectivity on one of the nodes. Failover happened smoothly and
the dead node was rebooted by the fence device. However, when the dead
node was reconnected to the network, it fenced and rebooted the
surviving node. How do we deal with this?



