[Linux-cluster] fence in xen
Rakovec Jost
Jost.Rakovec@snt.si
Sat Sep 11 16:36:44 UTC 2010
Hi list!
I have a question about fence_xvm.
The situation is:
One physical server with Xen --> dom0 with two domUs. The cluster works fine between the domUs: reboot, relocate, and so on.
I'm using Red Hat 5.5.
The problem is fencing from dom0 with "fence_xvm -H oelcl2": the domU is destroyed, but when it boots back up it cannot rejoin the cluster. The domU takes a very long time to boot --> FENCED_START_TIMEOUT=300
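A debug run of the agent from dom0 looks roughly like this (a sketch; I'm assuming -d can be repeated to raise verbosity):

# on dom0: fence node2 by hand with debug output; -H takes the domU
# name exactly as it appears in "xm list"
fence_xvm -ddd -H oelcl2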
On the console, after node2 is back up, I get:
node2:
INFO: task clurgmgrd:2127 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
clurgmgrd D 0000000000000010 0 2127 2126 (NOTLB)
ffff88006f08dda8 0000000000000286 ffff88007cc0b810 0000000000000000
0000000000000003 ffff880072009860 ffff880072f6b0c0 00000000000455ec
ffff880072009a48 ffffffff802649d7
Call Trace:
[<ffffffff802649d7>] _read_lock_irq+0x9/0x19
[<ffffffff8021420e>] filemap_nopage+0x193/0x360
[<ffffffff80263a7e>] __mutex_lock_slowpath+0x60/0x9b
[<ffffffff80263ac8>] .text.lock.mutex+0xf/0x14
[<ffffffff88424b64>] :dlm:dlm_new_lockspace+0x2c/0x860
[<ffffffff80222b08>] __up_read+0x19/0x7f
[<ffffffff802d0abb>] __kmalloc+0x8f/0x9f
[<ffffffff8842b6fa>] :dlm:device_write+0x438/0x5e5
[<ffffffff80217377>] vfs_write+0xce/0x174
[<ffffffff80217bc4>] sys_write+0x45/0x6e
[<ffffffff802602f9>] tracesys+0xab/0xb6
During boot on node2:
Starting clvmd: dlm: Using TCP for communications
clvmd startup timed out
[FAILED]
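While clvmd hangs like this, the fence and DLM group state can be inspected on the nodes (a sketch; both commands come with the RHEL 5 cluster suite):

# on either domU: show membership and state of the fence domain and
# the DLM lockspaces -- a group stuck in a wait state points at fencing
cman_tool services
group_tool ls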
node2:
[root@oelcl2 init.d]# clustat
Cluster Status for cluster1 @ Sat Sep 11 18:11:21 2010
Member Status: Quorate

 Member Name          ID   Status
 ------ ----          ---- ------
 oelcl1               1    Online
 oelcl2               2    Online, Local

[root@oelcl2 init.d]#
On the first node:
[root@oelcl1 ~]# clustat
Cluster Status for cluster1 @ Sat Sep 11 18:12:07 2010
Member Status: Quorate

 Member Name          ID   Status
 ------ ----          ---- ------
 oelcl1               1    Online, Local, rgmanager
 oelcl2               2    Online, rgmanager

 Service Name         Owner (Last)   State
 ------- ----         ----- ------   -----
 service:webby        oelcl1         started

[root@oelcl1 ~]#
And then I have to destroy both domUs and create them again to get node2 working.
I followed the how-tos at https://access.redhat.com/kb/docs/DOC-5937 and http://sources.redhat.com/cluster/wiki/VMClusterCookbook.
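The key setup from those documents was roughly the following, from memory, so treat the paths as a sketch:

# on dom0: generate the shared fence_xvm key and copy it to both
# domUs so fence_xvm and fence_xvmd can authenticate each other
dd if=/dev/urandom of=/etc/cluster/fence_xvm.key bs=4096 count=1
scp /etc/cluster/fence_xvm.key oelcl1:/etc/cluster/
scp /etc/cluster/fence_xvm.key oelcl2:/etc/cluster/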
The cluster config on dom0:
<?xml version="1.0"?>
<cluster alias="vmcluster" config_version="1" name="vmcluster">
    <clusternodes>
        <clusternode name="vm5" nodeid="1" votes="1"/>
    </clusternodes>
    <cman/>
    <fencedevices/>
    <rm/>
    <fence_xvmd/>
</cluster>
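To see what fence_xvmd does when the fence request comes in, it can be run in the foreground with debugging (a sketch; I'm assuming -f means foreground here, and -I xenbr0 matches how the daemon is started on my dom0):

# on dom0: stop the backgrounded daemon, then run it by hand and
# watch the fence request arrive
killall fence_xvmd
fence_xvmd -f -dd -I xenbr0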
The cluster config on the domUs:
<?xml version="1.0"?>
<cluster alias="cluster1" config_version="49" name="cluster1">
    <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="4"/>
    <clusternodes>
        <clusternode name="oelcl1.name.com" nodeid="1" votes="1">
            <fence>
                <method name="1">
                    <device domain="oelcl1" name="xenfence1"/>
                </method>
            </fence>
        </clusternode>
        <clusternode name="oelcl2.name.com" nodeid="2" votes="1">
            <fence>
                <method name="1">
                    <device domain="oelcl2" name="xenfence1"/>
                </method>
            </fence>
        </clusternode>
    </clusternodes>
    <cman expected_votes="1" two_node="1"/>
    <fencedevices>
        <fencedevice agent="fence_xvm" name="xenfence1"/>
    </fencedevices>
    <rm>
        <failoverdomains>
            <failoverdomain name="prefer_node1" nofailback="0" ordered="1" restricted="1">
                <failoverdomainnode name="oelcl1.name.com" priority="1"/>
                <failoverdomainnode name="oelcl2.name.com" priority="2"/>
            </failoverdomain>
        </failoverdomains>
        <resources>
            <ip address="xx.xx.xx.xx" monitor_link="1"/>
            <fs device="/dev/xvdb1" force_fsck="0" force_unmount="0" fsid="8669" fstype="ext3" mountpoint="/var/www/html" name="docroot" self_fence="0"/>
            <script file="/etc/init.d/httpd" name="apache_s"/>
        </resources>
        <service autostart="1" domain="prefer_node1" exclusive="0" name="webby" recovery="relocate">
            <ip ref="xx.xx.xx.xx"/>
            <fs ref="docroot"/>
            <script ref="apache_s"/>
        </service>
    </rm>
</cluster>
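One thing worth double-checking: the domain= values in this config have to match the guest names exactly as dom0 reports them, e.g.:

# on dom0: these names must match the domain= attributes above
xm list | egrep 'oelcl1|oelcl2'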
The fence processes on dom0:
[root@vm5 cluster]# ps -ef |grep fenc
root 18690 1 0 17:40 ? 00:00:00 /sbin/fenced
root 18720 1 0 17:40 ? 00:00:00 /sbin/fence_xvmd -I xenbr0
root 22633 14524 0 18:21 pts/3 00:00:00 grep fenc
[root@vm5 cluster]#
And on a domU:
[root@oelcl1 ~]# ps -ef|grep fen
root 1523 1 0 17:41 ? 00:00:00 /sbin/fenced
root 13695 2902 0 18:22 pts/0 00:00:00 grep fen
[root@oelcl1 ~]#
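It should also be possible to confirm on the bridge that the fencing multicast actually flows between the domUs and dom0 (a sketch; 225.0.0.12 port 1229 is the fence_xvm default as far as I know):

# on dom0: watch for fence_xvm multicast while running the agent on a domU
tcpdump -n -i xenbr0 host 225.0.0.12 and port 1229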
Does anybody have an idea why the fencing doesn't work?
thx
br
jost