[Linux-cluster] DLM locks with 1 node on 2 node cluster

Mon Aug 28 18:58:32 UTC 2006

(Resending, since I forgot to include linux-cluster at redhat.com.)
I am using manual fencing together with gnbd fencing. Here is the tail of
/var/log/messages:

Aug 28 14:17:06 bof227 fenced[2497]: bof226 not a cluster member after 0 sec post_fail_delay
Aug 28 14:17:06 bof227 kernel: CMAN: removing node bof226 from the cluster : Missed too many heartbeats
Aug 28 14:17:06 bof227 fenced[2497]: fencing node "bof226"
Aug 28 14:17:06 bof227 fence_manual: Node bof226 needs to be reset before recovery can procede.  Waiting for bof226 to rejoin the cluster or for manual acknowledgement that it has been reset (i.e. fence_ack_manual -n bof226)

************************ cluster.conf
<?xml version="1.0"?>
<cluster config_version="84" name="MZ_CLUSTER">
	<fence_daemon post_fail_delay="0" post_join_delay="3"/>
	<clusternodes>
		<clusternode name="bof227" votes="1">
			<fence>
				<method name="1">
					<device name="device_MF_227" nodename="bof227"/>
					<device name="gnbd_server_bof226" nodename="bof227"/>
				</method>
			</fence>
		</clusternode>
		<clusternode name="bof226" votes="1">
			<fence>
				<method name="1">
					<device name="device_MF_226" nodename="bof226"/>
					<device name="gnbd_server_bof227" nodename="bof226"/>
				</method>
			</fence>
		</clusternode>
	</clusternodes>
	<cman expected_votes="1" two_node="1"/>
	<fencedevices>
		<fencedevice agent="fence_manual" name="device_MF_226"/>
		<fencedevice agent="fence_manual" name="device_MF_227"/>
		<fencedevice agent="fence_gnbd" name="gnbd_server_bof226" servers="bof226"/>
		<fencedevice agent="fence_gnbd" name="gnbd_server_bof227" servers="bof227"/>
	</fencedevices>
	<rm>
		<failoverdomains>
			<failoverdomain name="FD_PREF_BOF226" ordered="1" restricted="1">
				<failoverdomainnode name="bof226" priority="1"/>
				<failoverdomainnode name="bof227" priority="2"/>
			</failoverdomain>
			<failoverdomain name="FD_PREF_BOF_227" ordered="1" restricted="1">
				<failoverdomainnode name="bof227" priority="1"/>
				<failoverdomainnode name="bof226" priority="2"/>
			</failoverdomain>
		</failoverdomains>
		<resources/>
	</rm>
</cluster> 
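
To make the fencing setup easier to read at a glance, here is a rough Python sketch that lists, for each node, which fence devices its method references and which agent each device maps to. It is only a reading aid: the function name fence_agents_by_node is made up for illustration, the embedded XML is a trimmed copy of the cluster.conf above (fencing-related elements only), and it says nothing about how fenced actually invokes the agents.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the cluster.conf above: only the fencing-related elements.
CLUSTER_CONF = """\
<cluster config_version="84" name="MZ_CLUSTER">
  <clusternodes>
    <clusternode name="bof227" votes="1">
      <fence>
        <method name="1">
          <device name="device_MF_227" nodename="bof227"/>
          <device name="gnbd_server_bof226" nodename="bof227"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="bof226" votes="1">
      <fence>
        <method name="1">
          <device name="device_MF_226" nodename="bof226"/>
          <device name="gnbd_server_bof227" nodename="bof226"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_manual" name="device_MF_226"/>
    <fencedevice agent="fence_manual" name="device_MF_227"/>
    <fencedevice agent="fence_gnbd" name="gnbd_server_bof226" servers="bof226"/>
    <fencedevice agent="fence_gnbd" name="gnbd_server_bof227" servers="bof227"/>
  </fencedevices>
</cluster>
"""


def fence_agents_by_node(conf_text):
    """Hypothetical helper: map node name -> [(method, device, agent), ...] in file order."""
    root = ET.fromstring(conf_text)
    # Look up table from fence device name to the agent that implements it.
    agents = {fd.get("name"): fd.get("agent") for fd in root.iter("fencedevice")}
    result = {}
    for node in root.iter("clusternode"):
        entries = []
        for method in node.iter("method"):
            for dev in method.iter("device"):
                name = dev.get("name")
                # agents.get() yields None if a device has no matching <fencedevice>.
                entries.append((method.get("name"), name, agents.get(name)))
        result[node.get("name")] = entries
    return result


if __name__ == "__main__":
    for node, entries in fence_agents_by_node(CLUSTER_CONF).items():
        for method, device, agent in entries:
            print(f"{node}: method {method} -> {device} ({agent})")
```

For both nodes the fence_manual device is the first one listed in method "1", followed by a fence_gnbd device, which matches the fence_manual message in the log above.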

-----Original Message-----
From: David Teigland [mailto:teigland at redhat.com] 
Sent: Monday, August 28, 2006 2:36 PM
To: Zelikov, Mikhail
Cc: linux-cluster at redhat.com
Subject: Re: [Linux-cluster] DLM locks with 1 node on 2 node cluster

It's trying to fence the failed node and won't continue with recovery until
that's done.  What fencing method are you using in cluster.conf?
Are there any fencing error messages in /var/log/messages?  What does your
cluster.conf look like?

Dave