[Linux-cluster] Re: UPDATE qdiskd locks cluster

Stepan Kadlec skadlec at gk-software.com
Mon Nov 24 13:14:30 UTC 2008


I still can't get qdisk working. About 5-10 seconds after qdiskd starts,
it hangs waiting for some communication.

Some more debugging of this issue:

The strace of qdiskd when it hangs:

lseek(6, 65536, SEEK_SET)               = 65536
read(6, "\36\273\336\0`\224\213P\2265\v\3P\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
select(4, [3], NULL, NULL, {0, 0})      = 0 (Timeout)
writev(3, [{"NAMC\3\0\0\20\30\0\0\0\267\0\0\200\0\0\0\0", 20}, {"\1\0\0\0", 4}], 2) = 24
recvfrom(3,
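
The end of this strace shows a synchronous request/reply exchange: a zero-timeout select() finds nothing pending on fd 3, writev() sends 24 bytes as a 20-byte header plus a 4-byte payload, and recvfrom() then blocks with no timeout. Below is a minimal sketch of that pattern in C, assuming a generic Unix stream socket rather than cman's real wire protocol. request_reply() and demo() are hypothetical names, not libcman API, and the message bytes are dummies. The sketch adds a poll() timeout so that a silent peer surfaces as ETIMEDOUT instead of an indefinite hang.

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper (not part of libcman): send one request as two
 * iovecs, header then payload, like the writev() in the strace, then
 * wait at most timeout_ms for a reply. A plain blocking recv() would
 * wait forever if the peer never answers; poll() bounds the wait. */
static ssize_t request_reply(int fd,
                             const void *hdr, size_t hdrlen,
                             const void *payload, size_t paylen,
                             void *reply, size_t replylen,
                             int timeout_ms)
{
    struct iovec iov[2] = {
        { .iov_base = (void *)hdr,     .iov_len = hdrlen },
        { .iov_base = (void *)payload, .iov_len = paylen },
    };
    if (writev(fd, iov, 2) != (ssize_t)(hdrlen + paylen))
        return -1;

    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int rc = poll(&pfd, 1, timeout_ms);
    if (rc < 0)
        return -1;              /* poll() failed; errno is already set */
    if (rc == 0) {
        errno = ETIMEDOUT;      /* peer stayed silent for timeout_ms */
        return -1;
    }
    return recv(fd, reply, replylen, 0);
}

/* Demo on a local socketpair: sv[0] plays the client, sv[1] the daemon.
 * When answer is nonzero, a 4-byte reply is queued on sv[1] before the
 * request is sent, which keeps the demo single-threaded. Returns the
 * number of reply bytes received, or -1 with errno == ETIMEDOUT. */
static ssize_t demo(int answer)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    char hdr[20] = {0};              /* dummy 20-byte header */
    char payload[4] = {1, 0, 0, 0};  /* dummy 4-byte payload */
    char reply[16];

    if (answer && write(sv[1], "ok!", 4) != 4) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    ssize_t n = request_reply(sv[0], hdr, sizeof hdr, payload, sizeof payload,
                              reply, sizeof reply, 100);
    int saved = errno;               /* keep errno across close() */
    close(sv[0]);
    close(sv[1]);
    errno = saved;
    return n;
}
```

With a reply queued, demo(1) returns 4; with no reply, demo(0) returns -1 after roughly 100 ms with errno set to ETIMEDOUT. This only shows where a caller could stop waiting; it does not explain why the other end never replies.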

And the gdb stack trace in the hung state:

#0  0x00007f2e642adc75 in recv () from /lib64/libpthread.so.0
#1  0x00007f2e643bba81 in cman_dispatch (handle=0x50e010, flags=26) at /usr/src/packages/BUILD/cluster-2.03.09/cman/lib/libcman.c:501
#2  0x00007f2e643bbc7b in info_call (h=0x50e010, msgtype=<value optimized out>, inbuf=<value optimized out>, inlen=<value optimized out>, outbuf=0x0, outlen=0)
     at /usr/src/packages/BUILD/cluster-2.03.09/cman/lib/libcman.c:59
#3  0x00007f2e643bc07a in cman_poll_quorum_device (handle=0x6, isavailable=1) at /usr/src/packages/BUILD/cluster-2.03.09/cman/lib/libcman.c:1016
#4  0x0000000000406d6f in quorum_loop (ctx=0x7fff6c5d5a10, ni=0x7fff6c5d5110, max=16) at /usr/src/packages/BUILD/cluster-2.03.09/cman/qdisk/main.c:985
#5  0x0000000000407b50 in main (argc=<value optimized out>, argv=0x7fff6c5d6508) at /usr/src/packages/BUILD/cluster-2.03.09/cman/qdisk/main.c:1540

What could be wrong?
Thanks, Stepan


Stepan Kadlec wrote:
> Hi,
> I am running cluster 2.03.08.
> After adding the qdisk feature to a two-node cluster, it somehow locks up
> the entire cluster. Without qdisk it runs OK.
> 
> Initialization log:
> 
> Nov 21 15:15:07 xen01 ccsd[15178]: Starting ccsd 2.03.08:
> Nov 21 15:15:07 xen01 ccsd[15178]:  Built: Nov 18 2008 14:18:19
> Nov 21 15:15:07 xen01 ccsd[15178]:  Copyright (C) Red Hat, Inc.  2004-2008  All rights reserved.
> Nov 21 15:15:07 xen01 ccsd[15178]:   IP Protocol:: IPv4 only
> Nov 21 15:15:07 xen01 ccsd[15178]: /etc/cluster/cluster.conf (cluster name = xen, version = 1) found.
> Nov 21 15:15:10 xen01 ccsd[15178]: Initial status:: Inquorate
> Nov 21 15:15:22 xen01 qdiskd[15202]: <debug> 0 heuristics loaded
> Nov 21 15:15:22 xen01 qdiskd[15202]: <debug> Quorum Daemon: 0 heuristics, 1 interval, 10 tko, 1 votes
> Nov 21 15:15:22 xen01 qdiskd[15202]: <debug> Run Flags: 00000031
> Nov 21 15:15:22 xen01 qdiskd[15202]: <info> Quorum Partition: /dev/disk/by-id/scsi-360a9800068706952464a4b544c704271-part2 Label: xen
> Nov 21 15:15:22 xen01 qdiskd[15203]: <info> Quorum Daemon Initializing
> Nov 21 15:15:22 xen01 qdiskd[15203]: <debug> I/O Size: 512  Page Size: 4096
> Nov 21 15:15:22 xen01 qdiskd[15203]: <debug> Permanently setting score to 1/1
> Nov 21 15:15:22 xen01 kernel: dlm: closing connection to node 2
> Nov 21 15:15:22 xen01 kernel: dlm: closing connection to node 1
> Nov 21 15:15:25 xen01 qdiskd[15203]: <debug> Node 2 is UP
> Nov 21 15:15:32 xen01 qdiskd[15203]: <info> Initial score 1/1
> Nov 21 15:15:32 xen01 qdiskd[15203]: <info> Initialization complete
> Nov 21 15:15:32 xen01 qdiskd[15203]: <notice> Score sufficient for master operation (1/1; required=1); upgrading
> Nov 21 15:15:34 xen01 qdiskd[15203]: <debug> Making bid for master
> Nov 21 15:15:38 xen01 qdiskd[15203]: <info> Assuming master role
> 
> After this, all cluster tools just hang (cman_tool nodes, clustat, ...)
> 
> and the cluster processes are in a locked state:
> 
> 13124 ?        Ssl    0:00 /sbin/ccsd -4
> 13129 ?        SLl    0:00 aisexec
> 13154 ?        Ss     0:00 /sbin/groupd
> 13157 ?        SLs    0:00 /sbin/qdiskd -Q
> 13162 ?        Ss     0:00 /sbin/fenced
> 13167 ?        Ss     0:00 /sbin/dlm_controld
> 
> Any ideas how to fix that?
> Thanks, Stepan.
> 

-- 
Eurosoftware s.r.o.
skadlec at gk-software.com
+420 379 307 379
+420 724 554 104