
[Linux-cluster] Re: UPDATE qdiskd locks cluster



I still can't get qdisk working - 5-10 seconds after qdiskd starts, it blocks waiting for some communication.

some more debugging of this issue:

the strace of qdiskd when getting locked:

lseek(6, 65536, SEEK_SET)               = 65536
read(6, "\36\273\336\0`\224\213P\2265\v\3P\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
select(4, [3], NULL, NULL, {0, 0})      = 0 (Timeout)
writev(3, [{"NAMC\3\0\0\20\30\0\0\0\267\0\0\200\0\0\0\0", 20}, {"\1\0\0\0", 4}], 2) = 24
recvfrom(3,
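
As a reading aid for the strace above, here is a minimal sketch that decodes the 24 bytes qdiskd writes to fd 3 just before it blocks in recvfrom(). It assumes the first buffer is a header of five little-endian 32-bit fields; the field names (magic, version, length, command, flags) are my guess at the layout and are not taken from this thread.

```python
import struct

# Bytes copied verbatim from the two writev() buffers in the strace.
header = b"NAMC\3\0\0\20\30\0\0\0\267\0\0\200\0\0\0\0"
payload = b"\1\0\0\0"

# ASSUMPTION: the header is five little-endian uint32 fields.
# The field names are hypothetical labels for this sketch.
magic, version, length, command, flags = struct.unpack("<5I", header)

# ASSUMPTION: the payload is a single little-endian 32-bit integer.
(arg,) = struct.unpack("<i", payload)

print(f"magic   = {magic:#010x} ({header[:4][::-1].decode('ascii')})")
print(f"version = {version:#010x}")
print(f"length  = {length} (header + payload = {len(header) + len(payload)})")
print(f"command = {command:#010x}")
print(f"flags   = {flags:#010x}")
print(f"payload = {arg}")
```

If the assumed layout is right, the length field (24) matches the writev() return value and the payload (1) matches isavailable=1 in the cman_poll_quorum_device frame of the backtrace below, so the request itself looks well formed and qdiskd is simply waiting for a reply that never arrives.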

and the gdb stack trace in the locked state:

#0  0x00007f2e642adc75 in recv () from /lib64/libpthread.so.0
#1  0x00007f2e643bba81 in cman_dispatch (handle=0x50e010, flags=26) at /usr/src/packages/BUILD/cluster-2.03.09/cman/lib/libcman.c:501
#2  0x00007f2e643bbc7b in info_call (h=0x50e010, msgtype=<value optimized out>, inbuf=<value optimized out>, inlen=<value optimized out>, outbuf=0x0, outlen=0)
    at /usr/src/packages/BUILD/cluster-2.03.09/cman/lib/libcman.c:59
#3  0x00007f2e643bc07a in cman_poll_quorum_device (handle=0x6, isavailable=1) at /usr/src/packages/BUILD/cluster-2.03.09/cman/lib/libcman.c:1016
#4  0x0000000000406d6f in quorum_loop (ctx=0x7fff6c5d5a10, ni=0x7fff6c5d5110, max=16) at /usr/src/packages/BUILD/cluster-2.03.09/cman/qdisk/main.c:985
#5  0x0000000000407b50 in main (argc=<value optimized out>, argv=0x7fff6c5d6508) at /usr/src/packages/BUILD/cluster-2.03.09/cman/qdisk/main.c:1540

what could be wrong?
thanks stepan


Stepan Kadlec wrote:
hi,
I am running cluster 2.03.08.
after adding the qdisk feature to a two-node cluster, it somehow locks the entire cluster. without qdisk it runs ok.

initialization log:

Nov 21 15:15:07 xen01 ccsd[15178]: Starting ccsd 2.03.08:
Nov 21 15:15:07 xen01 ccsd[15178]:  Built: Nov 18 2008 14:18:19
Nov 21 15:15:07 xen01 ccsd[15178]: Copyright (C) Red Hat, Inc. 2004-2008 All rights reserved.
Nov 21 15:15:07 xen01 ccsd[15178]:   IP Protocol:: IPv4 only
Nov 21 15:15:07 xen01 ccsd[15178]: /etc/cluster/cluster.conf (cluster name = xen, version = 1) found.
Nov 21 15:15:10 xen01 ccsd[15178]: Initial status:: Inquorate
Nov 21 15:15:22 xen01 qdiskd[15202]: <debug> 0 heuristics loaded
Nov 21 15:15:22 xen01 qdiskd[15202]: <debug> Quorum Daemon: 0 heuristics, 1 interval, 10 tko, 1 votes
Nov 21 15:15:22 xen01 qdiskd[15202]: <debug> Run Flags: 00000031
Nov 21 15:15:22 xen01 qdiskd[15202]: <info> Quorum Partition: /dev/disk/by-id/scsi-360a9800068706952464a4b544c704271-part2 Label: xen
Nov 21 15:15:22 xen01 qdiskd[15203]: <info> Quorum Daemon Initializing
Nov 21 15:15:22 xen01 qdiskd[15203]: <debug> I/O Size: 512  Page Size: 4096
Nov 21 15:15:22 xen01 qdiskd[15203]: <debug> Permanently setting score to 1/1
Nov 21 15:15:22 xen01 kernel: dlm: closing connection to node 2
Nov 21 15:15:22 xen01 kernel: dlm: closing connection to node 1
Nov 21 15:15:25 xen01 qdiskd[15203]: <debug> Node 2 is UP
Nov 21 15:15:32 xen01 qdiskd[15203]: <info> Initial score 1/1
Nov 21 15:15:32 xen01 qdiskd[15203]: <info> Initialization complete
Nov 21 15:15:32 xen01 qdiskd[15203]: <notice> Score sufficient for master operation (1/1; required=1); upgrading
Nov 21 15:15:34 xen01 qdiskd[15203]: <debug> Making bid for master
Nov 21 15:15:38 xen01 qdiskd[15203]: <info> Assuming master role

after this, all cluster tools just hang - cman_tool nodes, clustat, ...

and the cluster processes are in a locked state:

13124 ?        Ssl    0:00 /sbin/ccsd -4
13129 ?        SLl    0:00 aisexec
13154 ?        Ss     0:00 /sbin/groupd
13157 ?        SLs    0:00 /sbin/qdiskd -Q
13162 ?        Ss     0:00 /sbin/fenced
13167 ?        Ss     0:00 /sbin/dlm_controld
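
To help read the STAT column in the listing above, here is a small sketch that expands each flag using the meanings given under "PROCESS STATE CODES" in ps(1). The helper name decode_stat is hypothetical, made up for this example.

```python
# Meanings as documented in the "PROCESS STATE CODES" section of ps(1).
STATE_CODES = {
    "D": "uninterruptible sleep (usually I/O)",
    "R": "running or runnable",
    "S": "interruptible sleep (waiting for an event)",
    "T": "stopped",
    "Z": "zombie",
    "<": "high priority",
    "N": "low priority",
    "L": "has pages locked into memory",
    "s": "session leader",
    "l": "multi-threaded",
    "+": "in the foreground process group",
}

def decode_stat(stat):
    """Expand a ps STAT string such as 'SLs' into a list of descriptions."""
    return [STATE_CODES.get(ch, f"unknown flag {ch!r}") for ch in stat]

# Process names and STAT values copied from the listing above.
processes = [
    ("ccsd", "Ssl"),
    ("aisexec", "SLl"),
    ("groupd", "Ss"),
    ("qdiskd", "SLs"),
    ("fenced", "Ss"),
    ("dlm_controld", "Ss"),
]

for name, stat in processes:
    print(f"{name:13s} {stat:4s} {'; '.join(decode_stat(stat))}")
```

Read this way, every daemon is in interruptible sleep (S) rather than uninterruptible sleep (D), and the L on aisexec and qdiskd only means their pages are locked into memory. That fits a process waiting on a socket reply, as in the recv() frame of the backtrace, rather than one stuck in disk I/O.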

any ideas how to fix that?
thanks stepan.


--
Eurosoftware s.r.o.
skadlec gk-software com
+420 379 307 379
+420 724 554 104

