
Re: [Linux-cluster] Re: Node2 kills node1 when it is booting ...



Stewart Walters wrote:
carlopmart wrote:
Stewart Walters wrote:
carlopmart wrote:
carlopmart wrote:
Hi all,

I need to set up another RHCS cluster with two nodes today. But every time I start the second node, node1 returns this error:

cman killed by node 2 because we rejoined the cluster without a full restart

... and cman stops on node1. Why? I couldn't find a solution in the FAQ at http://sources.redhat.com/cluster/wiki/FAQ/

 My nodes run RHEL 5.3.

 Many thanks.


Please, I need your help ... Any ideas???


Sounds like node1 fenced node2, and node2 hasn't been rebooted since being fenced. Either that, or node2 uses manual fencing and you haven't yet manually acknowledged that it was rebooted.

Check your logs in /var/log/messages on node1, I'm pretty sure you'll see a reference there that node2 has been fenced.

You'll probably also see somewhere in the logs on node1, that it detected node2 did not leave the cluster after being fenced, and as a result node1 itself has decided to stop itself to prevent data corruption (the message will be something like that anyway).
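As a rough way to do the log check described above, here is a minimal Python sketch that filters syslog lines mentioning both fencing and a given node name. The function names (find_fence_lines, scan_file) and the loose "fenc" keyword match are assumptions made for illustration; the exact wording that fenced and cman write to the log is not reproduced here, so treat any hits as a starting point for reading the surrounding lines.

```python
import re
from typing import Iterable, List

# Loose, case-insensitive match for "fence", "fenced", "fencing".
# This is a guess at useful keywords, not the exact daemon message text.
FENCE_HINT = re.compile(r"fenc", re.IGNORECASE)


def find_fence_lines(lines: Iterable[str], node: str) -> List[str]:
    """Return the lines that mention fencing and the given node name."""
    node_pat = re.compile(r"\b" + re.escape(node) + r"\b", re.IGNORECASE)
    return [
        line.rstrip("\n")
        for line in lines
        if FENCE_HINT.search(line) and node_pat.search(line)
    ]


def scan_file(path: str, node: str) -> List[str]:
    """Hypothetical convenience wrapper: scan one log file for a node."""
    with open(path, errors="replace") as handle:
        return find_fence_lines(handle, node)
```

Run on node1 as root, something like scan_file("/var/log/messages", "node2") would list the candidate lines; the node name has to match however the node appears in your logs.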

If you are using manual fencing on node2, after you reboot it you need to run "fence_ack_manual -n <node2>" from node1. Do this only after you've restarted node2 but before cman starts back up on it in the next boot sequence. At this point node1 will stop fencing node2 and both nodes should be able to join the cluster successfully.

Manual fencing is evil :-)

Try to avoid it if you can - as you'll get this scenario on your cluster every time a node is fenced. This is the reason why Red Hat write in their documentation numerous times that manual fencing is not supported in Production clusters (it's almost as if they're trying to tell us something...). ;-)

Also, you mentioned that the solution was not found in the FAQ. While it might not refer to these specific symptoms, I'm pretty sure the FAQ, the man pages for fence_manual and the RHCS documentation from Red Hat all cover the requirement to manually acknowledge nodes that use manual fencing. If you do in fact employ manual fencing in your cluster, you might want to go over this documentation again.

If you don't use manual fencing, please accept my apologies for expressing my general distaste for manual fencing instead of actually helping you!! :-)

Kind Regards,

Stewart

--
Linux-cluster mailing list
Linux-cluster redhat com
https://www.redhat.com/mailman/listinfo/linux-cluster


Many thanks for your help, Stewart, but I don't use manual fencing as the fence device in this cluster. I am using gnbd for this.

I post my cluster.conf

------------------------------------------------------------------------

Silly question then, have you actually restarted (i.e. actually rebooted) the cluster node1?

Regards,

Stewart


Yes, and then it works. But when I need to do an ordered shutdown (node1 first), the fenced daemon on node2 doesn't stop ...



--
CL Martinez
carlopmart {at} gmail {d0t} com

