[Linux-cluster] Re: Node2 kills node1 when it is booting ...

Stewart Walters stewart at epits.com.au
Tue Jan 27 09:48:43 UTC 2009


carlopmart wrote:
> carlopmart wrote:
>> Hi all,
>>
>>  I need to set up another RHCS cluster today with two nodes. But every
>> time I start the second node, node1 returns this error:
>>
>> cman killed by node 2 because we rejoined the cluster without a full 
>> restart
>>
>>  .. and cman stops on node1. Why?? I didn't find any solution under 
>> http://sources.redhat.com/cluster/wiki/FAQ/
>>
>>  My nodes are rhel5.3
>>
>>  Many thanks.
>>
>
> Please, I need your help ... Any ideas???
>

Sounds like node1 fenced node2, and node2 hasn't been rebooted since 
being fenced. Either that, or node2 uses manual fencing and you haven't 
yet manually acknowledged that it was rebooted.
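
If you're not sure whether manual fencing is configured at all, a rough
check along these lines may help. It is only a sketch: the function name
is made up for illustration, and it assumes your config is in
/etc/cluster/cluster.conf with the fence device declared as
agent="fence_manual".

```shell
# Hypothetical helper: succeeds (exit status 0) if the given cluster.conf
# declares a fence device that uses the fence_manual agent.
# Assumption: the device is declared with agent="fence_manual" and the
# config lives at /etc/cluster/cluster.conf unless a path is passed in.
uses_manual_fencing() {
    conf="${1:-/etc/cluster/cluster.conf}"
    grep -q 'agent="fence_manual"' "$conf"
}
```

Run it on either node as: uses_manual_fencing && echo "manual fencing in use"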

Check /var/log/messages on node1; I'm pretty sure you'll see a reference
there to node2 having been fenced.

You'll probably also see somewhere in the logs on node1 that it detected
node2 did not leave the cluster after being fenced and that, as a
result, node1 stopped itself to prevent data corruption (the message
will be something like that anyway).
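
To pull the relevant lines out of the log, something like the following
might do. It is a sketch only: the function name is invented, and the
search patterns are my guess at what to look for, not the exact wording
of the RHCS messages.

```shell
# Hypothetical helper: print every log line that mentions fencing or cman.
# The patterns are deliberately broad guesses; narrow them once you have
# seen what your own logs actually contain.
find_fence_messages() {
    logfile="${1:-/var/log/messages}"
    grep -i -E 'fence|cman' "$logfile"
}
```

On node1, run: find_fence_messages | tail -n 50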

If you are using manual fencing on node2, then after you reboot it you
need to run "fence_ack_manual -n <node2>" from node1.  Do this only
after you've restarted node2, but before cman starts back up on it
during that boot.  At that point node1 will stop trying to fence node2
and both nodes should be able to join the cluster successfully.
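
As a sketch of that acknowledgement step, here is a small wrapper with
a dry-run mode so you can see the command before running it. The wrapper
name and the --dry-run switch are made up for illustration. As far as I
know the tool is installed as fence_ack_manual on the RHEL 5 cluster
packages; adjust the name if your install differs.

```shell
# Hypothetical wrapper around the manual-fence acknowledgement.
# Run it on node1, only after node2 has really been restarted and
# before cman starts on node2.
# Assumption: the acknowledgement tool is fence_ack_manual and takes
# the node name via -n.
ack_manual_fence() {
    dry_run=0
    if [ "${1:-}" = "--dry-run" ]; then
        dry_run=1
        shift
    fi
    node="${1:-}"
    if [ -z "$node" ]; then
        echo "usage: ack_manual_fence [--dry-run] <nodename>" >&2
        return 2
    fi
    if [ "$dry_run" -eq 1 ]; then
        # Only show what would be run.
        echo "fence_ack_manual -n $node"
        return 0
    fi
    fence_ack_manual -n "$node"
}
```

For example, "ack_manual_fence --dry-run node2" only prints the command,
while "ack_manual_fence node2" actually runs it.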

Manual fencing is evil :-)

Try to avoid it if you can - as you'll get this scenario on your cluster 
every time a node is fenced.  This is the reason why Red Hat write in 
their documentation numerous times that manual fencing is not supported 
in Production clusters (it's almost as if they're trying to tell us 
something...). ;-)

Also, you mentioned that the solution was not found in the FAQ.  While
it might not refer to these specific symptoms, I'm pretty sure the FAQ,
the man pages for fence_manual and the RHCS documentation from Red Hat
all cover the requirement to manually acknowledge nodes that use manual
fencing.  If you do in fact employ manual fencing in your cluster, you
might want to go over this documentation again.

If you don't use manual fencing, please accept my apologies for 
expressing my general distaste for manual fencing instead of actually 
helping you!! :-)

Kind Regards,

Stewart



