
Re: [Linux-cluster] RHEL 6 two-node cluster - nodes killing each other's cman

On 07/27/2012 01:44 AM, DIMITROV, TANIO wrote:
I'm testing a RHEL 6.2 cluster using CMAN.
It is a two-node cluster with no shared data. If connectivity between the nodes is lost, each node continues working stand-alone, which is OK (no shared data, manual fencing). But when the connection comes back up, the nodes kill each other's cman instances:

Jul 26 13:58:05.000 node1 corosync[15771]: cman killed by node 2 because we were killed by cman_tool or other application
Jul 26 13:58:05.000 node1 gfs_controld[15900]: cluster is down, exiting
Jul 26 13:58:05.000 node1 gfs_controld[15900]: daemon cpg_dispatch error 2
Jul 26 13:58:05.000 node1 dlm_controld[15848]: cluster is down, exiting

Can this be avoided somehow?

Thanks in advance!

The error you see is the result of two clusters with existing state trying to merge. Both nodes have previously been in a quorate cluster, so each holds existing cluster state. At this time, CMAN and the other cluster tools do not support merging cluster states, which is why you hit this problem. The solution is to implement fencing: once a node is fenced and rebooted, it starts with no state (i.e. not dirty) and can successfully join the existing node (which has state, i.e. dirty).
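
As a rough illustration of the rule above, here is a toy model in Python. It is only a sketch of the reasoning, not CMAN's actual code; the names Node, can_join and fence_and_reboot are made up for this example and do not correspond to any real cluster API.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Toy stand-in for a cluster node (hypothetical, not a CMAN structure)."""
    name: str
    dirty: bool = False  # True once the node has been part of a quorate cluster


def can_join(joiner: Node, existing: Node) -> bool:
    """Two nodes can form one cluster only if at most one of them has state.

    Merging two sets of existing cluster state is not supported, so a
    dirty node meeting another dirty node is rejected.
    """
    return not (joiner.dirty and existing.dirty)


def fence_and_reboot(node: Node) -> Node:
    """Model fencing plus reboot: the node comes back with no cluster state."""
    return Node(name=node.name, dirty=False)


# After the network split, both nodes kept running and both hold state.
node1 = Node("node1", dirty=True)
node2 = Node("node2", dirty=True)

print(can_join(node2, node1))                    # False: both sides are dirty
print(can_join(fence_and_reboot(node2), node1))  # True: node2 rejoins clean
```

The first case is the situation in your logs; the second is what fencing buys you.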

While it is possible to run clusters without fencing, the cluster software is designed with fencing in mind, and when fencing doesn't trigger you can end up with strange behavior like what you've experienced. On some occasions, both nodes will kill each other and you'll lose both cluster nodes. If this is really a critical system, I highly recommend fencing.


Ryan Mitchell
Red Hat Global Support Services
