[Linux-cluster] dlm problem

Constantin Daniel VULTUR costi at recognos.ro
Wed Mar 29 07:01:47 UTC 2006


Hi guys,

I ran into a problem while using GFS with DLM on Fedora Core 4.
Everything works fine (the boot process, mounting the GFS volume,
reading the journals) until I try to read files from that volume.
From that point on, GFS hangs.
I looked in the system logs but could not find anything there.
The only difference I can find compared with my other cluster, which is
based on Ubuntu, is the content of
/proc/cluster/dlm_debug

On FC4:
[root@cluster01 ~]# cat /proc/cluster/dlm_debug
6 finished
clvmd move flags 1,0,0 ids 15,15,15
clvmd move flags 0,1,0 ids 15,20,15
clvmd move use event 20
clvmd recover event 20
clvmd add node 5
clvmd total nodes 5
clvmd rebuild resource directory
clvmd rebuilt 1 resources
clvmd purge requests
clvmd purged 0 requests
clvmd mark waiting requests
clvmd marked 0 requests
clvmd recover event 20 done
clvmd move flags 0,0,1 ids 15,20,20
clvmd process held requests
clvmd processed 0 requests
clvmd resend marked requests
clvmd resent 0 requests
clvmd recover event 20 finished
data move flags 1,0,0 ids 16,16,16
data move flags 0,1,0 ids 16,21,16
data move use event 21
data recover event 21
data add node 5
data total nodes 4
data rebuild resource directory
data rebuilt 8 resources
data purge requests
data purged 0 requests
data mark waiting requests
data marked 0 requests
data recover event 21 done
data move flags 0,0,1 ids 16,21,21
data process held requests
data processed 0 requests
data resend marked requests
data resent 0 requests
data recover event 21 finished
[root@cluster01 ~]#
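
In the dump above, the clvmd lockspace reports "total nodes 5" while the
data lockspace reports "total nodes 4". To compare this quickly between
nodes, a small helper along these lines might be useful. It is only a
sketch: the function name dlm_total_nodes is made up for illustration, and
it assumes every relevant line has the form "<lockspace> total nodes <N>",
as in the output shown here.

```shell
# Print "<lockspace> <count>" for every "total nodes" line in a dlm_debug dump.
# Assumption: lines look like "<lockspace> total nodes <N>".
# Reads the files given as arguments, or standard input if none are given.
dlm_total_nodes() {
    awk '$2 == "total" && $3 == "nodes" { print $1, $4 }' "$@"
}

# Usage on a cluster node:
#   dlm_total_nodes /proc/cluster/dlm_debug
```

Running it on each node and comparing the results should show whether all
lockspaces agree on the number of members.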


And on Ubuntu:
root@web1:~# cat /proc/cluster/dlm_debug
        5
data (8066) req reply einval 660c01e6 fr 5 r 5        5
data (8066) req reply einval 660c01e6 fr 5 r 5        5
data (8066) req reply einval 65fb0158 fr 5 r 5        5
data (8066) req reply einval 65fb0158 fr 5 r 5        5
data (8066) req reply einval 65fb0158 fr 5 r 5        5
data (8066) req reply einval 65fb0158 fr 5 r 5        5
data (8066) req reply einval 65fb0158 fr 5 r 5        5
data (8066) req reply einval 65fb0158 fr 5 r 5        5
data (8066) req reply einval 65fb0158 fr 5 r 5        5
data (8066) req reply einval 65fb0158 fr 5 r 5        5
data (8066) req reply einval 65fb0158 fr 5 r 5        5
data (8066) req reply einval 65fb0158 fr 5 r 5        5
data (8066) req reply einval 63b4013a fr 5 r 5        5
data (8066) req reply einval 63b4013a fr 5 r 5        5
data send einval to 7
data send einval to 7
data send einval to 4
data send einval to 4
data send einval to 4
root@web1:~#


So I suspected a DLM locking problem and started to look for clues:
[root@cluster01 ~]# lsmod | grep dlm
lock_dlm               42084  1
lock_harness            4392  2 lock_dlm,gfs
dlm                   118220  5 lock_dlm
cman                  130208  21 lock_dlm,dlm

[root@cluster01 ~]# dmesg | grep dlm
dlm: no version for "struct_module" found: kernel tainted.
GFS: Trying to join cluster "lock_dlm", "cluster:data"

The packages that are installed are:
[root@cluster01 ~]# rpm -qa | grep dlm
dlm-kernel-2.6.11.5-20050601.152643.FC4.17
dlm-kernheaders-2.6.11.5-20050601.152643.FC4.17
dlm-devel-1.0.0-3
dlm-1.0.0-3


And the machine is
[root@cluster01 ~]# uname -a
Linux cluster01 2.6.14-1.1653_FC4 #1 Tue Dec 13 21:32:09 EST 2005 i686 i686 i386 GNU/Linux
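
The dlm-kernel package name mentions 2.6.11.5 while uname reports
2.6.14-1.1653_FC4. That may or may not mean the modules were built for a
different kernel, so one possible check is to compare the running kernel
release with the vermagic string of the dlm module. The sketch below
assumes modinfo supports the -F option and that the first word of the
vermagic field is the kernel release; the function name vermagic_matches
is made up for illustration.

```shell
# Succeeds if the first word of a module's vermagic string equals the
# given kernel release.
# Usage: vermagic_matches RELEASE VERMAGIC
vermagic_matches() {
    release=$1
    vermagic=$2
    # Strip everything after the first space, leaving only the release part.
    [ "${vermagic%% *}" = "$release" ]
}

# Usage on a cluster node:
#   vermagic_matches "$(uname -r)" "$(modinfo -F vermagic dlm)" \
#       && echo "dlm matches running kernel" \
#       || echo "dlm was built for a different kernel"
```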

Right now I don't know for sure what the problem is. I hope that
someone can explain to me what I did wrong.

Thanks in advance for your help.

Costi.







