[Linux-cluster] Why does my cluster stop working when one node goes down?

gordan at bobich.net
Wed Apr 2 15:16:16 UTC 2008


Replace:

<cman expected_votes="1">
</cman>

with

<cman two_node="1" expected_votes="1"/>

in cluster.conf.
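
With only two nodes and no quorum disk, cman needs the two_node flag so
that the single surviving node still counts as quorate instead of
blocking all activity. As a rough sketch based on the cluster.conf you
posted below (the config_version value here is just an example; bump it
to the next number whenever you edit the file), the top of the file
would then look like:

<?xml version="1.0"?>
<cluster name="mycluster" config_version="3">

  <!-- two_node="1" lets one remaining node keep quorum;
       it is only valid together with expected_votes="1" -->
  <cman two_node="1" expected_votes="1"/>

  <fence_daemon post_join_delay="60"/>

  <!-- clusternodes and fencedevices sections stay as they are -->
</cluster>

After restarting cman on both nodes, "cman_tool status" should report
the 2node flag, and the cluster should stay quorate when one node
drops out.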

Gordan

On Wed, 2 Apr 2008, Tiago Cruz wrote:

> Hello guys,
>
> I have one cluster with two machines, running RHEL 5.1 x86_64.
> The storage device was imported using GNBD and formatted with GFS, to be
> mounted on both nodes:
>
> [root at teste-spo-la-v1 ~]# gnbd_import -v -l
> Device name : cluster
> ----------------------
>    Minor # : 0
> sysfs name : /block/gnbd0
>     Server : gnbdserv
>       Port : 14567
>      State : Open Connected Clear
>   Readonly : No
>    Sectors : 20971520
>
> # gfs2_mkfs -p lock_dlm -t mycluster:export1 -j 2 /dev/gnbd/cluster
> # mount /dev/gnbd/cluster /mnt/
>
> Everything works gracefully, until one node goes down (shutdown, network
> stop, xm destroy...)
>
>
> teste-spo-la-v1 clurgmgrd[3557]: <emerg> #1: Quorum Dissolved
> Apr  2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] entering GATHER state from 0.
> Apr  2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] Creating commit token because I am the rep.
> Apr  2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] Saving state aru 46 high seq received 46
> Apr  2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] Storing new sequence id for ring 4c
> Apr  2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] entering COMMIT state.
> Apr  2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] entering RECOVERY state.
> Apr  2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] position [0] member 10.25.0.251:
> Apr  2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] previous ring seq 72 rep 10.25.0.251
> Apr  2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] aru 46 high delivered 46 received flag 1
> Apr  2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] Did not need to originate any messages in recovery.
> Apr  2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] Sending initial ORF token
> Apr  2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM  ] CLM CONFIGURATION CHANGE
> Apr  2 12:00:07 teste-spo-la-v1 kernel: dlm: closing connection to node 3
> Apr  2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM  ] New Configuration:
> Apr  2 12:00:07 teste-spo-la-v1 clurgmgrd[3557]: <emerg> #1: Quorum Dissolved
> Apr  2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM  ] 	r(0) ip(10.25.0.251)
> Apr  2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM  ] Members Left:
> Apr  2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM  ] 	r(0) ip(10.25.0.252)
> Apr  2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM  ] Members Joined:
> Apr  2 12:00:07 teste-spo-la-v1 openais[1545]: [CMAN ] quorum lost, blocking activity
> Apr  2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM  ] CLM CONFIGURATION CHANGE
> Apr  2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM  ] New Configuration:
> Apr  2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM  ] 	r(0) ip(10.25.0.251)
> Apr  2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM  ] Members Left:
> Apr  2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM  ] Members Joined:
> Apr  2 12:00:07 teste-spo-la-v1 openais[1545]: [SYNC ] This node is within the primary component and will provide service.
> Apr  2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] entering OPERATIONAL state.
> Apr  2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM  ] got nodejoin message 10.25.0.251
> Apr  2 12:00:07 teste-spo-la-v1 openais[1545]: [CPG  ] got joinlist message from node 2
> Apr  2 12:00:12 teste-spo-la-v1 ccsd[1539]: Cluster is not quorate.  Refusing connection.
> Apr  2 12:00:12 teste-spo-la-v1 ccsd[1539]: Error while processing connect: Connection refused
> Apr  2 12:00:16 teste-spo-la-v1 ccsd[1539]: Cluster is not quorate.  Refusing connection.
> Apr  2 12:00:17 teste-spo-la-v1 ccsd[1539]: Error while processing connect: Connection refused
> Apr  2 12:00:22 teste-spo-la-v1 ccsd[1539]: Cluster is not quorate.  Refusing connection.
>
>
> So then, my GFS mount point breaks: the terminal freezes when I try to
> access the directory "/mnt" and only comes back once the second node
> has rejoined the cluster.
>
>
> Follow the cluster.conf:
>
> <?xml version="1.0"?>
> <cluster name="mycluster" config_version="2">
>
> <cman expected_votes="1">
> </cman>
>
> <fence_daemon post_join_delay="60">
> </fence_daemon>
>
> <clusternodes>
> <clusternode name="node1.mycluster.com" nodeid="2">
> 	<fence>
> 		<method name="single">
> 			<device name="gnbd" ipaddr="10.25.0.251"/>
> 		</method>
> 	</fence>
> </clusternode>
> <clusternode name="node2.mycluster.com" nodeid="3">
> 	<fence>
> 		<method name="single">
> 			<device name="gnbd" ipaddr="10.25.0.252"/>
> 		</method>
> 	</fence>
> </clusternode>
> </clusternodes>
>
> <fencedevices>
> 	<fencedevice name="gnbd" agent="fence_gnbd"/>
> </fencedevices>
> </cluster>
>
>
> Thanks!
>
> -- 
> Tiago Cruz
> http://everlinux.com
> Linux User #282636
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>



