[Linux-cluster] Two node cluster, start CMAN fence the other node

Jeff Sturm jeff.sturm at eprize.com
Thu Apr 15 18:34:52 UTC 2010


For two-node clusters there's a convenient workaround: a crossover cable.

You'll need a spare Ethernet port but that's easier than getting certain
switches to do multicast correctly.  (At least in my experience.)
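
Whichever path carries the cluster traffic (switch or crossover link), a
quick way to see whether multicast passes between the two nodes is a small
probe like the one below. This is only a sketch: the group 239.192.0.99,
the port 50405 and the script name are made-up test values, not what cman
itself uses. A local firewall that drops the port would make the probe
fail in the same way a misbehaving switch would, so rule that out as well.

```python
import socket
import sys

# Hypothetical test values; pick a group and port that are free on your
# network rather than the ones the cluster itself is using.
GROUP = "239.192.0.99"
PORT = 50405


def build_mreq(group, local_ip="0.0.0.0"):
    """Pack an ip_mreq: multicast group address, then local interface address."""
    return socket.inet_aton(group) + socket.inet_aton(local_ip)


def receive(local_ip="0.0.0.0", timeout=30.0):
    """Join GROUP and wait for a single datagram, or give up after timeout."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", PORT))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    build_mreq(GROUP, local_ip))
    sock.settimeout(timeout)
    try:
        data, sender = sock.recvfrom(1024)
        print("received %r from %s" % (data, sender[0]))
    except socket.timeout:
        print("nothing received within %.0f s" % timeout)
    finally:
        sock.close()


def send(local_ip="0.0.0.0", message=b"multicast probe"):
    """Send one datagram to GROUP with TTL 1 (stays on the local segment)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
    if local_ip != "0.0.0.0":
        # Pin outgoing multicast to the interface that owns local_ip.
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_IF,
                        socket.inet_aton(local_ip))
    sock.sendto(message, (GROUP, PORT))
    sock.close()


if __name__ == "__main__":
    mode = sys.argv[1] if len(sys.argv) > 1 else ""
    local_ip = sys.argv[2] if len(sys.argv) > 2 else "0.0.0.0"
    if mode == "recv":
        receive(local_ip)
    elif mode == "send":
        send(local_ip)
    else:
        print("usage: mcast_probe.py recv|send [local_ip]")
```

Start it with "recv" on one node, then run it with "send" on the other.
Passing the node's own address as the second argument pins the probe to
that interface, which is useful once a second (crossover) port is in play.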

From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of
Jason_Henderson at Mitel.com
Sent: Thursday, April 15, 2010 1:44 PM
To: linux clustering
Cc: linux-cluster at redhat.com; linux-cluster-bounces at redhat.com
Subject: Re: [Linux-cluster] Two node cluster, start CMAN fence the other node

Most likely the multicast packet communication between the 2 nodes is
not getting through your network. 
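
The syslog excerpts quoted below are consistent with this: in each node's
CLM membership lines only one address appears (10.192.16.42 on nodea,
10.192.16.44 on nodeb), so each node seems to have formed a cluster by
itself and then fenced the peer it never heard from. A rough way to run
the same check over longer logs is sketched here; the regular expression
and the helper name are illustrative, and the sample text is copied from
the quoted logs.

```python
import re

# Matches openais CLM membership lines such as:
#   ... openais[16936]: [CLM  ]     r(0) ip(10.192.16.42)
MEMBER_RE = re.compile(r"\[CLM\s*\]\s+r\(0\)\s+ip\(([\d.]+)\)")


def member_addresses(log_text):
    """Return the set of member IPs that openais reported in CLM lines."""
    return set(MEMBER_RE.findall(log_text))


NODEA_LOG = """\
Apr 15 18:41:00 nodea openais[16936]: [CLM  ] New Configuration:
Apr 15 18:41:00 nodea openais[16936]: [CLM  ]     r(0) ip(10.192.16.42)
Apr 15 18:41:00 nodea openais[16936]: [CLM  ] Members Joined:
Apr 15 18:41:00 nodea openais[16936]: [CLM  ]     r(0) ip(10.192.16.42)
"""

NODEB_LOG = """\
Apr 15 18:57:31 nodeb openais[11789]: [CLM  ] Members Joined:
Apr 15 18:57:31 nodeb openais[11789]: [CLM  ]     r(0) ip(10.192.16.44)
"""

seen_by_a = member_addresses(NODEA_LOG)
seen_by_b = member_addresses(NODEB_LOG)
print("nodea saw: %s" % sorted(seen_by_a))
print("nodeb saw: %s" % sorted(seen_by_b))

# If the two membership views share no address, the nodes never saw each
# other's totem traffic, which points at the network rather than cluster.conf.
if not (seen_by_a & seen_by_b):
    print("no common member: the nodes never saw each other")
```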

linux-cluster-bounces at redhat.com wrote on 04/15/2010 01:05:01 PM:

> Good afternoon,
> I'm trying to form my first two-node cluster, using iLO fence
> devices. I need some help because I can't find what I've missed.
> My main problem is that "service cman start" reboots the other
> node, so I can't form the two-node cluster.
> I'm using the following on both nodea and nodeb (they are on the
> same VLAN and can ping each other):
> 
> [root at nodea ~]# uname -a
> Linux nodea 2.6.18-164.15.1.el5 #1 SMP Wed Mar 17 11:30:06 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
> [root at nodea ~]# rpm -qa |grep cman
> cman-2.0.115-1.el5_4.9
> 
> [root at nodea ~]# cat /etc/cluster/cluster.conf    (nodeb has the same file)
> <?xml version="1.0" ?>
> <cluster alias="VCluster" config_version="5" name="VCluster">
>     <fence_daemon post_fail_delay="0" post_join_delay="25"/>
>     <clusternodes>
>         <clusternode name="nodea" nodeid="1" votes="1">
>             <fence>
>                 <method name="1">
>                     <device name="nodeaILO"/>
>                 </method>
>             </fence>
>         </clusternode>
>         <clusternode name="nodeb" nodeid="2" votes="1">
>             <fence>
>                 <method name="1">
>                     <device name="nodebILO"/>
>                 </method>
>             </fence>
>         </clusternode>
>     </clusternodes>
>     <cman expected_votes="1" two_node="1"/>
>     <fencedevices>
>         <fencedevice agent="fence_ilo" hostname="nodeacn" login="user" name="nodeaILO" passwd="hp"/>
>         <fencedevice agent="fence_ilo" hostname="nodebcn" login="user" name="nodebILO" passwd="hp"/>
>     </fencedevices>
>     <rm>
>         <failoverdomains/>
>         <resources/>
>     </rm>
> </cluster>
> 
> When I start the cman service, it hangs for some time at the
> "Starting fencing..." step, and after the configured 25 seconds it
> fences nodeb and reboots it.
> [root at nodea ~]# service cman start
> Starting cluster: 
>    Loading modules... done
>    Mounting configfs... done
>    Starting ccsd... done
>    Starting cman... done
>    Starting daemons... done
>    Starting fencing... done
>                                                            [  OK  ]
> 
> "nodeb" gets rebooted:
> [root at nodeb ~]# 
> Broadcast message from root (Thu Apr 15 18:42:24 2010):
> 
> The system is going down for system halt NOW!
> 
> In the syslog I can only find:
> Apr 15 18:40:59 nodea ccsd[16930]: Initial status:: Quorate
> Apr 15 18:40:59 nodea openais[16936]: [CLM  ] Members Left:
> Apr 15 18:40:59 nodea openais[16936]: [CLM  ] Members Joined:
> Apr 15 18:40:59 nodea openais[16936]: [CLM  ] CLM CONFIGURATION CHANGE
> Apr 15 18:41:00 nodea openais[16936]: [CLM  ] New Configuration:
> Apr 15 18:41:00 nodea openais[16936]: [CLM  ]     r(0) ip(10.192.16.42)
> Apr 15 18:41:00 nodea openais[16936]: [CLM  ] Members Left:
> Apr 15 18:41:00 nodea openais[16936]: [CLM  ] Members Joined:
> Apr 15 18:41:00 nodea openais[16936]: [CLM  ]     r(0) ip(10.192.16.42)
> Apr 15 18:41:00 nodea openais[16936]: [SYNC ] This node is within the primary component and will provide service.
> Apr 15 18:41:00 nodea openais[16936]: [TOTEM] entering OPERATIONAL state.
> Apr 15 18:41:00 nodea openais[16936]: [CMAN ] quorum regained, resuming activity
> Apr 15 18:41:00 nodea openais[16936]: [CLM  ] got nodejoin message 10.192.16.42
> Apr 15 18:42:11 nodea fenced[16955]: nodeb not a cluster member after 25 sec post_join_delay
> Apr 15 18:42:11 nodea fenced[16955]: fencing node "nodeb"
> Apr 15 18:42:23 nodea fenced[16955]: fence "nodeb" success
> 
> [root at nodea ~]# clustat
> Cluster Status for VCluster @ Thu Apr 15 18:55:23 2010
> Member Status: Quorate
> 
>  Member Name                             ID   Status
>  ------ ----                             ---- ------
>  nodea                                      1 Online, Local
>  nodeb                                      2 Offline
> 
> Then, when nodeb comes back up, I try to start cman there to join the
> cluster... but this time it fences "nodea":
> [root at nodeb ~]# clustat
> Could not connect to CMAN: No such file or directory
> [root at nodeb ~]# service cman start
> Starting cluster: 
>    Loading modules... done
>    Mounting configfs... done
>    Starting ccsd... done
>    Starting cman... done
>    Starting qdiskd... done
>    Starting daemons... done
>    Starting fencing... (wait for 25secs again) done
>                                                            [  OK  ]
> "nodea" gets rebooted:
> [root at nodea ~]# 
> Broadcast message from root (Thu Apr 15 18:58:40 2010):
> 
> The system is going down for system halt NOW!
> 
> Apr 15 18:57:31 nodeb openais[11789]: [CLM  ] Members Joined:
> Apr 15 18:57:31 nodeb openais[11789]: [CLM  ]     r(0) ip(10.192.16.44)
> Apr 15 18:57:31 nodeb openais[11789]: [SYNC ] This node is within the primary component and will provide service.
> Apr 15 18:57:31 nodeb openais[11789]: [TOTEM] entering OPERATIONAL state.
> Apr 15 18:57:31 nodeb openais[11789]: [CMAN ] quorum regained, resuming activity
> Apr 15 18:57:31 nodeb openais[11789]: [CLM  ] got nodejoin message 10.192.16.44
> Apr 15 18:57:34 nodeb qdiskd[10323]: <info> Quorum Daemon Initializing
> Apr 15 18:57:34 nodeb qdiskd[10323]: <crit> Initialization failed
> Apr 15 18:58:42 nodeb fenced[11816]: nodea not a cluster member after 25 sec post_join_delay
> Apr 15 18:58:42 nodeb fenced[11816]: fencing node "nodea"
> Apr 15 18:58:54 nodeb fenced[11816]: fence "nodea" success
> 
> So I can't get the two nodes to join the cluster.
> I guess I'm missing something in the cluster.conf file? I can't
> find what I'm doing wrong.
> 
> Thanks for any help!
> 
> Alex Re
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
