
[Linux-cluster] Hang on start fence_tool join with qdisk





Hi,

I have a problem with my cluster running RHEL5 plus updates from http://people.redhat.com/lhh/rhel5-test/. It is a 2-node cluster with a shared quorum disk. qdiskd is running, but when I start the cman service, it hangs at "Starting fencing".
In my logs I see messages about quorum being regained:

Jul 21 15:50:18 arf-web1 qdiskd[7326]: <info> Assuming master role
Jul 21 15:50:19 arf-web1 ccsd[8188]: Cluster is not quorate. Refusing connection.
Jul 21 15:50:19 arf-web1 ccsd[8188]: Error while processing connect: Connection refused
Jul 21 15:50:19 arf-web1 openais[8200]: [CMAN ] quorum regained, resuming activity
Jul 21 15:50:20 arf-web1 clurgmgrd[7746]: <notice> Quorum formed, starting
Jul 21 15:50:20 arf-web1 kernel: dlm: no local IP address has been set
Jul 21 15:50:20 arf-web1 kernel: dlm: cannot start dlm lowcomms -12


After a few minutes the "Starting fencing" step finished, but my services are still not running, and group_tool shows that joining the fence domain has not completed.

[root arf-web1 ~]# group_tool
type             level name     id       state
fence            0     default  00010002 JOIN_START_WAIT
[2]

When I run commands such as cman_tool or clustat, I get no output: they hang while connecting to the socket /var/run/cman_client (although I can interrupt them with Ctrl-C).
[root arf-web1 ~]# strace cman_tool status
execve("/usr/sbin/cman_tool", ["cman_tool", "status"], [/* 21 vars */]) = 0
<skip>
socket(PF_FILE, SOCK_STREAM, 0)         = 3
fcntl(3, F_SETFD, FD_CLOEXEC)           = 0
connect(3, {sa_family=AF_FILE, path="/var/run/cman_client"}, 110 <unfinished ...>

[root arf-web1 ~]# strace clustat
execve("/usr/sbin/clustat", ["clustat"], [/* 21 vars */]) = 0
socket(PF_FILE, SOCK_STREAM, 0)         = 3
fcntl(3, F_SETFD, FD_CLOEXEC)           = 0
connect(3, {sa_family=AF_FILE, path="/var/run/cman_client"}, 110 <unfinished ...>
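
The hang shown in the strace output can also be checked without strace. The following is only a rough sketch in Python, not part of the cluster tools: the function name probe_unix_socket and the 3-second timeout are made up for illustration, and the only thing taken from the traces above is the socket path. It attempts the same connect() but gives up instead of blocking.

```python
import socket


def probe_unix_socket(path, timeout=3.0):
    """Try to connect to a Unix stream socket without blocking forever.

    Returns a short status string. probe_unix_socket is a hypothetical
    helper for illustration; it is not part of cman or rgmanager.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(path)
        return "connected"
    except socket.timeout:
        # The connect did not complete within `timeout` seconds.
        return "timeout"
    except BlockingIOError:
        # With a timeout set, Python uses a non-blocking connect. On Linux
        # this typically fails at once with EAGAIN when the listener is not
        # accepting and its backlog is full, which is the non-blocking
        # counterpart of the hang seen under strace.
        return "backlog full"
    except FileNotFoundError:
        # The socket file does not exist at all.
        return "missing"
    except ConnectionRefusedError:
        # The socket file exists but nothing is listening on it.
        return "refused"
    finally:
        sock.close()


if __name__ == "__main__":
    # Path taken from the strace output above.
    print(probe_unix_socket("/var/run/cman_client"))
```

A result other than "connected" would suggest the daemon behind the socket is not accepting connections, which matches what cman_tool and clustat show here.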

What can I do to resolve this?


Thanks in advance,
Eugene



--
Eugene Melnichuk
Lead Engineer
email: doc umc ua <mailto:doc umc ua>
mob: +380503304043
pbx: +380501105731
CJSC Ukrainian Mobile Communications
49/2 Pobedy ave., room 4.26, 03680, Kyiv, Ukraine


<?xml version="1.0"?>
<cluster alias="arf-web" config_version="19" name="arf-web">
	<fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="12"/>
	<clusternodes>
		<clusternode name="arf-web2.mts.com.ua" nodeid="1" votes="1">
			<fence>
				<method name="1">
					<device blade="4" name="kv-484-is-bld3kvm.mts.com.ua"/>
				</method>
			</fence>
		</clusternode>
		<clusternode name="arf-web1.mts.com.ua" nodeid="2" votes="1">
			<fence>
				<method name="1">
					<device blade="3" name="kv-484-is-bld3kvm.mts.com.ua"/>
				</method>
			</fence>
		</clusternode>
	</clusternodes>
	<cman expected_votes="2" two_node="0"/>
	<fencedevices>
		<fencedevice agent="fence_bladecenter" ipaddr="172.20.225.100" login="arfwebfence" name="kv-484-is-bld3kvm.mts.com.ua" passwd="*****"/>
	</fencedevices>
	<rm>
		<failoverdomains>
			<failoverdomain name="arf-web1" ordered="0" restricted="1">
				<failoverdomainnode name="arf-web1.mts.com.ua" priority="1"/>
			</failoverdomain>
			<failoverdomain name="arf-web2" ordered="0" restricted="1">
				<failoverdomainnode name="arf-web2.mts.com.ua" priority="1"/>
			</failoverdomain>
		</failoverdomains>
		<resources>
			<clusterfs device="/dev/mapper/arf.web.log-lvol0" force_unmount="0" fsid="6208" fstype="gfs" mountpoint="/opt/web.log" name="arf.web.log"/>
			<clusterfs device="/dev/mapper/arf.web.root-lvol0" force_unmount="0" fsid="49796" fstype="gfs" mountpoint="/opt/web.root" name="arf.web.root"/>
			<script file="/opt/web.root/arf-web/rc.httpd-arf-web1" name="rc.httpd-arf-web1"/>
			<script file="/opt/web.root/arf-web/rc.httpd-arf-web2" name="rc.httpd-arf-web2"/>
		</resources>
		<service autostart="1" domain="arf-web1" exclusive="0" name="arf-web1" recovery="restart">
			<clusterfs ref="arf.web.root">
				<clusterfs ref="arf.web.log">
					<script ref="rc.httpd-arf-web1"/>
				</clusterfs>
			</clusterfs>
		</service>
		<service autostart="1" domain="arf-web2" exclusive="0" name="arf-web2" recovery="restart">
			<clusterfs ref="arf.web.root">
				<clusterfs ref="arf.web.log">
					<script ref="rc.httpd-arf-web2"/>
				</clusterfs>
			</clusterfs>
		</service>
	</rm>
	<quorumd label="arf.web.quorum" interval="1" min_score="1" tko="10" votes="1" status_file="/tmp/quorum">
		<heuristic interval="2" program="ping 172.25.39.1 -c1 -t1" score="1"/>
	</quorumd>
</cluster>
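
For reference, the vote arithmetic implied by the configuration above can be sketched as follows. This is a minimal Python sketch under stated assumptions, not output from any cluster tool: it assumes the commonly documented cman rule that quorum is expected_votes / 2 + 1 (integer division), it reads only the static file (the values cman uses at runtime are not checked here), and the function name vote_summary and the trimmed CLUSTER_CONF string are introduced only for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the cluster.conf above, keeping only vote-related elements.
CLUSTER_CONF = """<?xml version="1.0"?>
<cluster alias="arf-web" config_version="19" name="arf-web">
  <clusternodes>
    <clusternode name="arf-web2.mts.com.ua" nodeid="1" votes="1"/>
    <clusternode name="arf-web1.mts.com.ua" nodeid="2" votes="1"/>
  </clusternodes>
  <cman expected_votes="2" two_node="0"/>
  <quorumd label="arf.web.quorum" interval="1" min_score="1" tko="10" votes="1"/>
</cluster>
"""


def vote_summary(conf_xml):
    """Summarise configured votes (hypothetical helper, for illustration)."""
    root = ET.fromstring(conf_xml)
    node_votes = sum(int(n.get("votes", "1")) for n in root.iter("clusternode"))
    qdisk = root.find("quorumd")
    qdisk_votes = int(qdisk.get("votes", "0")) if qdisk is not None else 0
    expected = int(root.find("cman").get("expected_votes"))
    # Assumption: quorum threshold is expected_votes / 2 + 1.
    quorum = expected // 2 + 1
    return {
        "node_votes": node_votes,
        "qdisk_votes": qdisk_votes,
        "total_votes": node_votes + qdisk_votes,
        "expected_votes": expected,
        "quorum": quorum,
    }


if __name__ == "__main__":
    s = vote_summary(CLUSTER_CONF)
    print(s)
    # One node alone versus one node plus the quorum disk.
    print("one node alone quorate:", 1 >= s["quorum"])
    print("one node + qdisk quorate:", 1 + s["qdisk_votes"] >= s["quorum"])
```

Under that assumption a single node is not quorate on its own but becomes quorate once the quorum disk vote is counted, which would be consistent with the "quorum regained" message appearing right after qdiskd assumes the master role. Note also that the configured total (3 votes) is larger than expected_votes (2).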

