[Linux-cluster] Re: Node2 kills node1 when it is booting ...

carlopmart carlopmart at gmail.com
Tue Jan 27 10:19:07 UTC 2009


Stewart Walters wrote:
> carlopmart wrote:
>> carlopmart wrote:
>>> Hi all,
>>>
>>>  I need to set up another RHCS cluster today with two nodes. But every 
>>> time I start the second node, node1 returns this error:
>>>
>>> cman killed by node 2 because we rejoined the cluster without a full 
>>> restart
>>>
>>>  ... and cman stops on node1. Why? I didn't find any solution at 
>>> http://sources.redhat.com/cluster/wiki/FAQ/
>>>
>>>  My nodes run RHEL 5.3.
>>>
>>>  Many thanks.
>>>
>>
>> Please, I need your help ... Any ideas???
>>
> 
> Sounds like node1 fenced node2, and node2 hasn't been rebooted since 
> being fenced. Either that, or node2 uses manual fencing and you haven't 
> yet manually acknowledged that it was rebooted.
> 
> Check your logs in /var/log/messages on node1; I'm pretty sure you'll 
> see a reference there that node2 has been fenced.
> 
> You'll probably also see somewhere in the logs on node1 that it 
> detected node2 did not leave the cluster after being fenced, and as a 
> result node1 has decided to stop itself to prevent data corruption 
> (the message will be something like that anyway).
> 
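
To make the log check above concrete, here is a minimal Python sketch that
filters log lines for fencing-related messages. The function name
find_fencing_lines and the keyword list are assumptions for illustration;
the only message text taken from this thread is the "cman killed by node"
error, and the real log wording may differ between releases.

```python
import re

# Keywords are a guess based on this thread, not an official list of
# cman/fenced log messages. Adjust them to what your logs actually contain.
PATTERN = re.compile(r"fenc|killed by node|rejoined the cluster", re.IGNORECASE)


def find_fencing_lines(lines):
    """Return the lines that mention fencing or a cman kill, newline stripped."""
    return [line.rstrip("\n") for line in lines if PATTERN.search(line)]


def scan_log(path="/var/log/messages"):
    """Read a log file and return its fencing-related lines."""
    # errors="replace" keeps the scan going if the log has odd bytes in it.
    with open(path, errors="replace") as log:
        return find_fencing_lines(log)
```

Calling scan_log() on node1 (reading /var/log/messages usually needs root)
should return any lines that match, which can then be compared against the
time node2 was started.
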
> If you are using manual fencing on node2, after you reboot it you need 
> to run "fence_ack_manual -n <node2>" from node1.  Do this only after 
> you've restarted node2 but before cman starts back up on it in the next 
> boot sequence.  At this point node1 will stop fencing node2 and both 
> nodes should be able to join the cluster successfully.
> 
> Manual fencing is evil :-)
> 
> Try to avoid it if you can - as you'll get this scenario on your cluster 
> every time a node is fenced.  This is the reason why Red Hat write in 
> their documentation numerous times that manual fencing is not supported 
> in Production clusters (it's almost as if they're trying to tell us 
> something...). ;-)
> 
> Also, you mentioned that the solution was not found in the FAQ.  While 
> it might not include a reference to this specific symptom, I'm pretty 
> sure the FAQ, the man pages for fence_manual and the RHCS documentation 
> from Red Hat all cover the requirement to manually acknowledge nodes 
> that use manual fencing.  If you do in fact employ manual fencing in 
> your cluster, you might want to go over this documentation again.
> 
> If you don't use manual fencing, please accept my apologies for 
> expressing my general distaste for manual fencing instead of actually 
> helping you!! :-)
> 
> Kind Regards,
> 
> Stewart
> 

Many thanks for your help, Stewart, but I don't use manual fencing as the fence 
device in this cluster. I am using gnbd for this.

I have attached my cluster.conf.

-- 
CL Martinez
carlopmart {at} gmail {d0t} com
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cluster.conf
Type: text/xml
Size: 1675 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20090127/4040c3d2/attachment.xml>