[Linux-cluster] Two node cluster, start CMAN fence the other node

Alex Re are at gmx.es
Thu Apr 15 17:05:01 UTC 2010


Good afternoon,
I'm trying to form my first two-node cluster, using iLO fence devices, 
and I need some help because I can't find what I've missed.
My main problem is that running "service cman start" on one node reboots 
the other node, so I can never form the two-node cluster.
Both nodea and nodeb are on the same VLAN and can ping each other. Both 
are running:

[root@nodea ~]# uname -a
Linux nodea 2.6.18-164.15.1.el5 #1 SMP Wed Mar 17 11:30:06 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
[root@nodea ~]# rpm -qa | grep cman
cman-2.0.115-1.el5_4.9

[root@nodea ~]# cat /etc/cluster/cluster.conf (nodeb has the same file)
<?xml version="1.0" ?>
<cluster alias="VCluster" config_version="5" name="VCluster">
  <fence_daemon post_fail_delay="0" post_join_delay="25"/>
  <clusternodes>
    <clusternode name="nodea" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="nodeaILO"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="nodeb" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="nodebILO"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <cman expected_votes="1" two_node="1"/>
  <fencedevices>
    <fencedevice agent="fence_ilo" hostname="nodeacn" login="user" name="nodeaILO" passwd="hp"/>
    <fencedevice agent="fence_ilo" hostname="nodebcn" login="user" name="nodebILO" passwd="hp"/>
  </fencedevices>
  <rm>
    <failoverdomains/>
    <resources/>
  </rm>
</cluster>
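
As a sanity check on the file above, the cross-references in it can be verified mechanically: every <device> named under a node should be declared under <fencedevices>, and a two-node cluster should carry the two_node/expected_votes pair on <cman>. The following is only a minimal Python sketch of such a check, not part of the cluster tooling. The function name check_cluster_conf and the trimmed SAMPLE string are illustrative assumptions, and the check covers internal consistency only, not whether the iLO hosts are reachable.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the cluster.conf above (illustrative; the <rm> section is omitted).
SAMPLE = """<cluster alias="VCluster" config_version="5" name="VCluster">
  <clusternodes>
    <clusternode name="nodea" nodeid="1" votes="1">
      <fence><method name="1"><device name="nodeaILO"/></method></fence>
    </clusternode>
    <clusternode name="nodeb" nodeid="2" votes="1">
      <fence><method name="1"><device name="nodebILO"/></method></fence>
    </clusternode>
  </clusternodes>
  <cman expected_votes="1" two_node="1"/>
  <fencedevices>
    <fencedevice agent="fence_ilo" hostname="nodeacn" login="user" name="nodeaILO" passwd="hp"/>
    <fencedevice agent="fence_ilo" hostname="nodebcn" login="user" name="nodebILO" passwd="hp"/>
  </fencedevices>
</cluster>"""


def check_cluster_conf(xml_text):
    """Return a list of consistency problems found in a cluster.conf string."""
    root = ET.fromstring(xml_text)
    problems = []

    # Names of every fence device declared under <fencedevices>.
    declared = {fd.get("name") for fd in root.findall("./fencedevices/fencedevice")}

    nodes = root.findall("./clusternodes/clusternode")
    for node in nodes:
        devices = node.findall("./fence/method/device")
        if not devices:
            problems.append("node %s has no fence device" % node.get("name"))
        for dev in devices:
            if dev.get("name") not in declared:
                problems.append("node %s references undeclared fence device %s"
                                % (node.get("name"), dev.get("name")))

    # A two-node cluster needs the two_node/expected_votes pair on <cman>.
    cman = root.find("./cman")
    if len(nodes) == 2:
        if cman is None or cman.get("two_node") != "1" or cman.get("expected_votes") != "1":
            problems.append('two nodes defined but <cman> lacks two_node="1" expected_votes="1"')

    return problems


if __name__ == "__main__":
    print(check_cluster_conf(SAMPLE))
```

On the trimmed sample the function returns an empty list, which would suggest the fence device references and two-node settings are at least self-consistent; it says nothing about network or multicast connectivity between the nodes.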

When I start the cman service on nodea, it hangs for some time at the 
"Starting fencing..." step, and after the configured 25 seconds 
(post_join_delay) it fences nodeb and reboots it.
[root@nodea ~]# service cman start
Starting cluster:
    Loading modules... done
    Mounting configfs... done
    Starting ccsd... done
    Starting cman... done
    Starting daemons... done
    Starting fencing... done
                                                            [  OK  ]

"nodeb" gets rebooted:
[root@nodeb ~]#
Broadcast message from root (Thu Apr 15 18:42:24 2010):

The system is going down for system halt NOW!

In the syslog I can only find:
Apr 15 18:40:59 nodea ccsd[16930]: Initial status:: Quorate
Apr 15 18:40:59 nodea openais[16936]: [CLM  ] Members Left:
Apr 15 18:40:59 nodea openais[16936]: [CLM  ] Members Joined:
Apr 15 18:40:59 nodea openais[16936]: [CLM  ] CLM CONFIGURATION CHANGE
Apr 15 18:41:00 nodea openais[16936]: [CLM  ] New Configuration:
Apr 15 18:41:00 nodea openais[16936]: [CLM  ]     r(0) ip(10.192.16.42)
Apr 15 18:41:00 nodea openais[16936]: [CLM  ] Members Left:
Apr 15 18:41:00 nodea openais[16936]: [CLM  ] Members Joined:
Apr 15 18:41:00 nodea openais[16936]: [CLM  ]     r(0) ip(10.192.16.42)
Apr 15 18:41:00 nodea openais[16936]: [SYNC ] This node is within the primary component and will provide service.
Apr 15 18:41:00 nodea openais[16936]: [TOTEM] entering OPERATIONAL state.
Apr 15 18:41:00 nodea openais[16936]: [CMAN ] quorum regained, resuming activity
Apr 15 18:41:00 nodea openais[16936]: [CLM  ] got nodejoin message 10.192.16.42
Apr 15 18:42:11 nodea fenced[16955]: nodeb not a cluster member after 25 sec post_join_delay
Apr 15 18:42:11 nodea fenced[16955]: fencing node "nodeb"
Apr 15 18:42:23 nodea fenced[16955]: fence "nodeb" success
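
The membership lines in the log above can be summarised mechanically. As a rough sketch (the helper name member_ips and the regular expression are assumptions based on the openais line layout shown here, not an openais API), the following Python collects every address listed in a [CLM  ] membership line. On the nodea excerpt it finds only 10.192.16.42, presumably nodea's own address, so no second member appears before fenced gives up on nodeb.

```python
import re

# Matches membership entries such as:
#   ... openais[16936]: [CLM  ]     r(0) ip(10.192.16.42)
CLM_IP = re.compile(r"\[CLM\s*\]\s+r\(0\) ip\(([\d.]+)\)")

# A few lines copied from the nodea syslog excerpt above.
LOG = """\
Apr 15 18:41:00 nodea openais[16936]: [CLM  ] New Configuration:
Apr 15 18:41:00 nodea openais[16936]: [CLM  ]     r(0) ip(10.192.16.42)
Apr 15 18:41:00 nodea openais[16936]: [CLM  ] Members Left:
Apr 15 18:41:00 nodea openais[16936]: [CLM  ] Members Joined:
Apr 15 18:41:00 nodea openais[16936]: [CLM  ]     r(0) ip(10.192.16.42)
"""


def member_ips(log_text):
    """Return the set of IP addresses that appear in CLM membership lines."""
    return set(CLM_IP.findall(log_text))


if __name__ == "__main__":
    print(sorted(member_ips(LOG)))
```

If the two nodes were seeing each other's cluster traffic, a second address would be expected in this set; a single address on each side is consistent with each node forming its own one-member cluster.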

[root@nodea ~]# clustat
Cluster Status for VCluster @ Thu Apr 15 18:55:23 2010
Member Status: Quorate

  Member Name                             ID   Status
  ------ ----                             ---- ------
  nodea                                      1 Online, Local
  nodeb                                      2 Offline

Then, when nodeb comes back up, I try to start cman there to join the 
cluster, but this time it fences nodea:
[root@nodeb ~]# clustat
Could not connect to CMAN: No such file or directory
[root@nodeb ~]# service cman start
Starting cluster:
    Loading modules... done
    Mounting configfs... done
    Starting ccsd... done
    Starting cman... done
    Starting qdiskd... done
    Starting daemons... done
    Starting fencing... (wait for 25secs again) done
                                                            [  OK  ]
"nodea" gets rebooted:
[root@nodea ~]#
Broadcast message from root (Thu Apr 15 18:58:40 2010):

The system is going down for system halt NOW!

Apr 15 18:57:31 nodeb openais[11789]: [CLM  ] Members Joined:
Apr 15 18:57:31 nodeb openais[11789]: [CLM  ]     r(0) ip(10.192.16.44)
Apr 15 18:57:31 nodeb openais[11789]: [SYNC ] This node is within the primary component and will provide service.
Apr 15 18:57:31 nodeb openais[11789]: [TOTEM] entering OPERATIONAL state.
Apr 15 18:57:31 nodeb openais[11789]: [CMAN ] quorum regained, resuming activity
Apr 15 18:57:31 nodeb openais[11789]: [CLM  ] got nodejoin message 10.192.16.44
Apr 15 18:57:34 nodeb qdiskd[10323]: <info> Quorum Daemon Initializing
Apr 15 18:57:34 nodeb qdiskd[10323]: <crit> Initialization failed
Apr 15 18:58:42 nodeb fenced[11816]: nodea not a cluster member after 25 sec post_join_delay
Apr 15 18:58:42 nodeb fenced[11816]: fencing node "nodea"
Apr 15 18:58:54 nodeb fenced[11816]: fence "nodea" success

So I can't get the two nodes to join the same cluster.
I guess I'm missing something in the cluster.conf file, but I can't find 
what I'm doing wrong.

Thanks for any help!

Alex Re