
[Linux-cluster] GFS2 volumes hanging on 1 of 3 cluster nodes



Hello,

I've set up a 3-node cluster and seem to be having problems with some of my GFS2 mounts. All servers have two GFS2 mounts on iSCSI LUNs: /var/lib/libvirt/sanlock and /etc/libvirt/qemu.

/dev/mapper/iscsi_cluster_qemu on /etc/libvirt/qemu type gfs2 (rw,relatime,hostdata=jid=0)
/dev/mapper/iscsi_cluster_sanlock on /var/lib/libvirt/sanlock type gfs2 (rw,relatime,hostdata=jid=0)

Currently, on vm01-test, I cannot access /var/lib/libvirt/sanlock; the command hangs and cannot be interrupted:

root vm01-test:~# ls /var/lib/libvirt/sanlock
^C^C^C
^C
^C

The same command works fine on vm02-test (and vm03-test):

root vm02-test:~# ls /var/lib/libvirt/sanlock/
42f8374d2c9513171301d94ab3f4c921 e193ecac416d5d6a4b7433ca80e201c5 f97ab2f33af3dc0f3fc38a9921aa3711 __LIBVIRT__DISKS__

I have tried rebooting the whole cluster, rebooting individual nodes, restarting cman, and so on, but it never fully recovers. If it's not happening on vm01, it happens on one of the other nodes. Both GFS2 volumes have been stuck like this on one of the three nodes.

I've included as much info as possible to help you get to the bottom of this; if I have forgotten something, please let me know! I would really like to know what I'm missing here.


The cluster contains the following components:

cman 3.1.7-0ubuntu2.1
gfs2-cluster 3.1.3-0ubuntu1
corosync 1.4.2-2
lvm2 2.02.95-4ppa1
sanlock 2.2-1
libvirt-bin 0.9.13-1ppa1
rgmanager 3.1.7-0ubuntu2.1


The main configuration for the cluster is as follows:

<cluster name="kvm" config_version="11">
	<logging debug="on"/>
        <clusternodes>
        <clusternode name="vm01-test" nodeid="1">
		<fence>
			<method name="apc">
				<device name="apc01" port="1" action="off"/>
				<device name="apc02" port="1" action="off"/>
				<device name="apc01" port="1" action="on"/>
				<device name="apc02" port="1" action="on"/>
			</method>
		</fence>
        </clusternode>
        <clusternode name="vm02-test" nodeid="2">
		<fence>
			<method name="apc">
				<device name="apc01" port="8" action="off"/>
				<device name="apc02" port="8" action="off"/>
				<device name="apc01" port="8" action="on"/>
				<device name="apc02" port="8" action="on"/>
			</method>
                </fence>
        </clusternode>
        <clusternode name="vm03-test" nodeid="3">
		<fence>
			<method name="apc">
				<device name="apc01" port="2" action="off"/>
				<device name="apc02" port="2" action="off"/>
				<device name="apc01" port="2" action="on"/>
				<device name="apc02" port="2" action="on"/>
			</method>
                </fence>
        </clusternode>
        </clusternodes>
	<fencedevices>
		<fencedevice agent="fence_apc" ipaddr="apc01" secure="on" login="device" name="apc01" passwd="xxx"/>
		<fencedevice agent="fence_apc" ipaddr="apc02" secure="on" login="device" name="apc02" passwd="xxx"/>
	</fencedevices>
	<rm log_level="5">
		<failoverdomains>
<failoverdomain name="any_node" nofailback="1" ordered="0" restricted="0"/>
		</failoverdomains>
		<vm domain="any_node" max_restarts="2" migrate="live" name="cloudstack" path="/etc/libvirt/qemu/" recovery="restart" restart_expire_time="600"/>
		<vm domain="any_node" max_restarts="2" migrate="live" name="test" path="/etc/libvirt/qemu/" recovery="restart" restart_expire_time="600"/>
	</rm>
	<totem rrp_mode="none" secauth="off"/>
	<quorumd device="/dev/mapper/iscsi_cluster_quorum"></quorumd>
</cluster>



Output from various commands:

root vm01-test:~#  dlm_tool ls
dlm lockspaces
name          rgmanager
id            0x5231f3eb
flags         0x00000000
change        member 3 joined 1 remove 0 failed 0 seq 1,1
members       1 2 3

name          sanlock
id            0x3c282c0a
flags         0x00000008 fs_reg
change        member 3 joined 1 remove 0 failed 0 seq 3,3
members       1 2 3

name          qemu
id            0xb061106c
flags         0x00000008 fs_reg
change        member 3 joined 1 remove 0 failed 0 seq 5,5
members       1 2 3

name          clvmd
id            0x4104eefa
flags         0x00000000
change        member 1 joined 1 remove 0 failed 0 seq 1,1
members       1

root vm02-test:~# dlm_tool ls
dlm lockspaces
name          clvmd
id            0x4104eefa
flags         0x00000000
change        member 2 joined 1 remove 0 failed 0 seq 1,1
members       1 2

name          qemu
id            0xb061106c
flags         0x00000008 fs_reg
change        member 3 joined 1 remove 0 failed 0 seq 1,1
members       1 2 3

name          rgmanager
id            0x5231f3eb
flags         0x00000000
change        member 3 joined 1 remove 0 failed 0 seq 3,3
members       1 2 3

name          sanlock
id            0x3c282c0a
flags         0x00000008 fs_reg
change        member 3 joined 1 remove 0 failed 0 seq 2,2
members       1 2 3

root vm02-test:~# clustat
Cluster Status for kvm @ Wed Aug 15 17:19:24 2012
Member Status: Quorate

 Member Name                         ID   Status
 ------ ----                         ---- ------
 vm01-test                              1 Online
 vm02-test                              2 Online, Local, rgmanager
 vm03-test                              3 Online, rgmanager
 /dev/mapper/iscsi_cluster_quorum       0 Online, Quorum Disk

 Service Name            Owner (Last)            State
 ------- ----            ----- ------            -----
 vm:cloudstack           (vm03-test)             stopped
 vm:test                 (vm02-test)             disabled

root vm02-test:~# sanlock client status
daemon 806a79ee-ef22-4296-abf4-5f2d531063a1.vm02-test
p -1 listener
p -1 status
s __LIBVIRT__DISKS__:2:/var/lib/libvirt/sanlock/__LIBVIRT__DISKS__:0


root vm01-test:~# cman_tool status
Version: 6.2.0
Config Version: 11
Cluster Name: kvm
Cluster Id: 773
Cluster Member: Yes
Cluster Generation: 1220
Membership state: Cluster-Member
Nodes: 3
Expected votes: 3
Quorum device votes: 2
Total votes: 5
Node votes: 1
Quorum: 3
Active subsystems: 9
Flags:
Ports Bound: 0 11 178
Node name: vm01-test
Node ID: 1
Multicast addresses: 239.192.3.8
Node addresses: 10.254.128.240


root vm02-test:~# cman_tool status
Version: 6.2.0
Config Version: 11
Cluster Name: kvm
Cluster Id: 773
Cluster Member: Yes
Cluster Generation: 1220
Membership state: Cluster-Member
Nodes: 3
Expected votes: 3
Quorum device votes: 2
Total votes: 5
Node votes: 1
Quorum: 3
Active subsystems: 9
Flags:
Ports Bound: 0 11 177 178
Node name: vm02-test
Node ID: 2
Multicast addresses: 239.192.3.8
Node addresses: 10.254.128.65


root vm01-test:~# ps aux | grep gfs
root 11686 0.0 0.0 140080 1996 ?     Ssl  13:20 0:00 /usr/sbin/gfs_controld
root 14172 0.0 0.0      0    0 ?     S<   13:21 0:00 [gfs_recovery]
root 14183 0.0 0.0      0    0 ?     S    13:21 0:00 [gfs2_logd]
root 14184 0.0 0.0      0    0 ?     S    13:21 0:00 [gfs2_quotad]
root 14388 0.0 0.0      0    0 ?     S    13:21 0:00 [gfs2_logd]
root 14389 0.0 0.0      0    0 ?     S    13:21 0:00 [gfs2_quotad]
root 14621 0.0 0.0   4316  540 ?     D    13:25 0:00 /sbin/mount.gfs2 /dev/mapper/iscsi_cluster_sanlock /var/lib/libvirt/sanlock -o rw
root 20438 0.0 0.0   9380  944 pts/7 S+   17:25 0:00 grep --color=auto gfs
root vm01-test:~# ps aux | grep dlm
root  7430 0.0 0.0      0    0 ?     S<   12:23 0:00 [user_dlm]
root 11606 0.0 0.0 223096 2076 ?     Ssl  13:20 0:00 dlm_controld
root 13614 0.0 0.0      0    0 ?     S    13:20 0:00 [dlm_scand]
root 13615 0.0 0.0      0    0 ?     S<   13:20 0:00 [dlm_recv]
root 13616 0.0 0.0      0    0 ?     S<   13:20 0:00 [dlm_send]
root 13617 0.0 0.0      0    0 ?     S    13:20 0:00 [dlm_recoverd]
root 14174 0.0 0.0      0    0 ?     S<   13:21 0:00 [dlm_callback]
root 14175 0.0 0.0      0    0 ?     S    13:21 0:00 [dlm_recoverd]
root 14382 0.0 0.0      0    0 ?     S<   13:21 0:00 [dlm_callback]
root 14383 0.0 0.0      0    0 ?     S    13:21 0:00 [dlm_recoverd]
root 15525 0.0 0.0      0    0 ?     S    13:35 0:00 [dlm_recoverd]
root 20442 0.0 0.0   9380  940 pts/7 S+   17:25 0:00 grep --color=auto dlm

root vm02-test:~# ps aux | grep gfs
root  8433 0.0 0.0 140080 2016 ?     Ssl  13:31 0:00 /usr/sbin/gfs_controld
root  8465 0.0 0.0      0    0 ?     S<   13:31 0:00 [gfs_recovery]
root  8493 0.0 0.0      0    0 ?     S    13:31 0:00 [gfs2_logd]
root  8494 0.0 0.0      0    0 ?     S    13:31 0:00 [gfs2_quotad]
root  9860 0.0 0.0      0    0 ?     S    13:34 0:00 [gfs2_logd]
root  9861 0.0 0.0      0    0 ?     S    13:34 0:00 [gfs2_quotad]
root 12818 0.0 0.0   9380  940 pts/0 S+   17:25 0:00 grep --color=auto gfs
root vm02-test:~# ps aux | grep dlm
root  8012 0.0 0.0 223096 2064 ?     Ssl  12:04 0:00 dlm_controld
root  8467 0.0 0.0      0    0 ?     S    13:31 0:00 [dlm_scand]
root  8468 0.0 0.0      0    0 ?     S<   13:31 0:00 [dlm_recv]
root  8469 0.0 0.0      0    0 ?     S<   13:31 0:00 [dlm_send]
root  8485 0.0 0.0      0    0 ?     S<   13:31 0:00 [dlm_callback]
root  8486 0.0 0.0      0    0 ?     S    13:31 0:00 [dlm_recoverd]
root  8560 0.0 0.0      0    0 ?     S    13:31 0:00 [dlm_recoverd]
root  9851 0.0 0.0      0    0 ?     S<   13:34 0:00 [dlm_callback]
root  9852 0.0 0.0      0    0 ?     S    13:34 0:00 [dlm_recoverd]
root 12603 0.0 0.0      0    0 ?     S    17:18 0:00 [dlm_recoverd]
root 12820 0.0 0.0   9380  940 pts/0 S+   17:25 0:00 grep --color=auto dlm


root vm02-test:~# gfs2_tool journals /var/lib/libvirt/sanlock
journal2 - 8MB
journal3 - 8MB
journal1 - 8MB
journal0 - 8MB
4 journal(s) found.
root vm02-test:~# gfs2_tool journals /etc/libvirt/qemu
journal2 - 8MB
journal3 - 8MB
journal1 - 8MB
journal0 - 8MB
4 journal(s) found.

# lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 12.04 LTS
Release:	12.04
Codename:	precise



