[Linux-cluster] Cman doesn't realize the failed node

Hakan VELIOGLU veliogluh at itu.edu.tr
Wed Nov 12 11:17:00 UTC 2008


Hi,

I am testing and trying to understand the cluster environment. I've
built a two-node cluster without any services (Red Hat EL 5.2
x64). I started the cman and rgmanager services successfully and then
suddenly powered off one node. After this I expected the other node
to notice the failure and take over all the resources; however, the
running node doesn't notice the failure. The "cman_tool nodes" and
"clustat" commands both report the failed node as active and online.
What am I missing? Why doesn't cman detect the failure?

[root@cl1 ~]# cat /etc/cluster/cluster.conf
<?xml version="1.0" ?>
<cluster alias="kume" config_version="54" name="kume">
         <totem token="1000" hold="100"/>
         <fence_daemon post_fail_delay="0" post_join_delay="3"/>
         <clusternodes>
                 <clusternode name="cl2.cc.itu.edu.tr" nodeid="1" votes="1">
                         <fence/>
                 </clusternode>
                 <clusternode name="cl1.cc.itu.edu.tr" nodeid="2" votes="1">
                         <fence/>
                 </clusternode>
         </clusternodes>
         <cman expected_votes="1" two_node="1"/>
         <fencedevices/>
         <rm>
                 <failoverdomains>
                         <failoverdomain name="domain" ordered="1" restricted="1">
                                 <failoverdomainnode name="cl2.cc.itu.edu.tr" priority="1"/>
                                 <failoverdomainnode name="cl1.cc.itu.edu.tr" priority="2"/>
                         </failoverdomain>
                 </failoverdomains>
                 <resources/>
                 <service autostart="0" domain="domain" name="veritabani" recovery="restart"/>
         </rm>
</cluster>
[root@cl1 ~]#
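To make the membership and fencing settings easier to compare at a glance, here is a minimal Python sketch (standard library only; the helper name "summarize_cluster_conf" is hypothetical). It parses a trimmed copy of the cluster.conf above and reports the votes, the totem token, and how many fence methods each node has. It does not query the running cluster. It only restates what the file says: two_node="1" with expected_votes="1", a 1000 ms token, an empty <fencedevices/>, and an empty <fence/> on both nodes.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the cluster.conf quoted above (the <rm> section is omitted
# because this sketch only looks at membership, votes and fencing).
CLUSTER_CONF = """<?xml version="1.0" ?>
<cluster alias="kume" config_version="54" name="kume">
  <totem token="1000" hold="100"/>
  <fence_daemon post_fail_delay="0" post_join_delay="3"/>
  <clusternodes>
    <clusternode name="cl2.cc.itu.edu.tr" nodeid="1" votes="1">
      <fence/>
    </clusternode>
    <clusternode name="cl1.cc.itu.edu.tr" nodeid="2" votes="1">
      <fence/>
    </clusternode>
  </clusternodes>
  <cman expected_votes="1" two_node="1"/>
  <fencedevices/>
</cluster>
"""


def summarize_cluster_conf(conf_text):
    """Hypothetical helper: report membership and fencing settings found in
    a cluster.conf string. It only parses XML; it does not talk to cman."""
    root = ET.fromstring(conf_text)
    totem = root.find("totem")
    cman = root.find("cman")
    # Count the <method> entries under each node's <fence>; an empty
    # <fence/> element yields 0.
    methods_per_node = {}
    for node in root.iter("clusternode"):
        fence = node.find("fence")
        methods = fence.findall("method") if fence is not None else []
        methods_per_node[node.get("name")] = len(methods)
    return {
        "cluster": root.get("name"),
        "config_version": int(root.get("config_version")),
        "token_ms": int(totem.get("token")) if totem is not None else None,
        "two_node": cman is not None and cman.get("two_node") == "1",
        "expected_votes": int(cman.get("expected_votes")) if cman is not None else None,
        "fence_devices": len(root.findall("fencedevices/fencedevice")),
        "fence_methods_per_node": methods_per_node,
    }


if __name__ == "__main__":
    # Print one "key: value" pair per line.
    for key, value in summarize_cluster_conf(CLUSTER_CONF).items():
        print(f"{key}: {value}")
```

Running it prints one key/value pair per line; for this file it reports "fence_devices: 0" and zero fence methods for both nodes.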


When the node goes down, TOTEM repeatedly logs messages like these:
Nov 12 13:12:57 cl1 openais[5809]: [TOTEM] The consensus timeout expired.
Nov 12 13:12:57 cl1 openais[5809]: [TOTEM] entering GATHER state from 3.
Nov 12 13:13:03 cl1 openais[5809]: [TOTEM] The consensus timeout expired.
Nov 12 13:13:03 cl1 openais[5809]: [TOTEM] entering GATHER state from 3.
Nov 12 13:13:09 cl1 openais[5809]: [TOTEM] The consensus timeout expired.
Nov 12 13:13:09 cl1 openais[5809]: [TOTEM] entering GATHER state from 3.
Nov 12 13:13:14 cl1 openais[5809]: [TOTEM] The consensus timeout expired.
Nov 12 13:13:14 cl1 openais[5809]: [TOTEM] entering GATHER state from 3.
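The pattern in these lines is easier to see as intervals. Below is a small Python sketch (the function name "consensus_timeout_gaps" is hypothetical) that extracts the syslog timestamps of the "consensus timeout expired" messages and returns the gaps between them. Syslog lines carry no year, so the sketch assumes 2008 from the date of this post. Timestamps have one-second resolution, so the gaps are approximate. For the lines quoted above, the surviving node times out and re-enters GATHER roughly every 5 to 6 seconds.

```python
from datetime import datetime

# The TOTEM lines quoted above, copied verbatim.
LOG = """\
Nov 12 13:12:57 cl1 openais[5809]: [TOTEM] The consensus timeout expired.
Nov 12 13:12:57 cl1 openais[5809]: [TOTEM] entering GATHER state from 3.
Nov 12 13:13:03 cl1 openais[5809]: [TOTEM] The consensus timeout expired.
Nov 12 13:13:03 cl1 openais[5809]: [TOTEM] entering GATHER state from 3.
Nov 12 13:13:09 cl1 openais[5809]: [TOTEM] The consensus timeout expired.
Nov 12 13:13:09 cl1 openais[5809]: [TOTEM] entering GATHER state from 3.
Nov 12 13:13:14 cl1 openais[5809]: [TOTEM] The consensus timeout expired.
Nov 12 13:13:14 cl1 openais[5809]: [TOTEM] entering GATHER state from 3.
"""


def consensus_timeout_gaps(log_text, year=2008):
    """Return the seconds between successive 'consensus timeout expired'
    lines. The year is an assumption, since syslog timestamps omit it."""
    stamps = []
    for line in log_text.splitlines():
        if "The consensus timeout expired" in line:
            # The syslog timestamp is the first 15 characters, e.g. "Nov 12 13:12:57".
            stamps.append(datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S"))
    return [(later - earlier).total_seconds() for earlier, later in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    print(consensus_timeout_gaps(LOG))  # [6.0, 6.0, 5.0]
```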



Hakan