
[Linux-cluster] Clearing a glock



We've got two nodes of a three-node gfs2 cluster that seem to be in some sort of deadlock. We're seeing a number of gfs2-related stack traces in dmesg:

INFO: task igtcpython.sh:22945 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
igtcpython.sh D ffff81011cabd218 0 22945 22874 (NOTLB)
 ffff8101f9fedcc8 0000000000000086 ffff8101f9fede38 ffffffff8000a604
 ffff8102b257f778 0000000000000006 ffff8105819d37a0 ffff8107a7f93820
 0004084c69951298 00000000000365c6 ffff8105819d3988 000000042d07a810
Call Trace:
 [<ffffffff8000a604>] __link_path_walk+0xdf8/0xf42
 [<ffffffff8002c9e4>] mntput_no_expire+0x19/0x89
 [<ffffffff8000ea46>] link_path_walk+0xa6/0xb2
 [<ffffffff887d6ee7>] :gfs2:just_schedule+0x0/0xe
 [<ffffffff887d6ef0>] :gfs2:just_schedule+0x9/0xe
 [<ffffffff80063a16>] __wait_on_bit+0x40/0x6e
 [<ffffffff887d6ee7>] :gfs2:just_schedule+0x0/0xe
 [<ffffffff80063ab0>] out_of_line_wait_on_bit+0x6c/0x78
 [<ffffffff800a0aec>] wake_bit_function+0x0/0x23
 [<ffffffff887d6ee2>] :gfs2:gfs2_glock_wait+0x2b/0x30
 [<ffffffff887e679e>] :gfs2:gfs2_permission+0x83/0xd5
 [<ffffffff887e6796>] :gfs2:gfs2_permission+0x7b/0xd5
 [<ffffffff8000d918>] permission+0x81/0xc8
 [<ffffffff8003c0c0>] open_exec+0x60/0xc0
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0
 [<ffffffff8003ed4d>] do_execve+0x46/0x1f7
 [<ffffffff8005516d>] sys_execve+0x36/0x4c
 [<ffffffff8005d4d3>] stub_execve+0x67/0xb0

and gfs2_hangalyzer is saying:

./gfs2_hangalyzer -n wilkins-pi -a
wilkins-pi: UsrLocal: G:  s:UN n:2/a5b67f f:l t:SH d:EX/0 l:0 a:0 r:58
wilkins-pi: UsrLocal: H: s:SH f:W e:0 p:21810 [python] gfs2_readpage+0x61/0x199 [gfs2]
wilkins-pi: UsrLocal: H: s:SH f:W e:0 p:21809 [python] gfs2_readpage+0x61/0x199 [gfs2]
wilkins-pi: UsrLocal: H: s:SH f:W e:0 p:1897 [python] gfs2_readpage+0x61/0x199 [gfs2]
wilkins-pi: UsrLocal: H: s:SH f:aW e:0 p:6307 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal: H: s:SH f:aW e:0 p:10436 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal: H: s:SH f:aW e:0 p:11000 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal: H: s:SH f:aW e:0 p:11003 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal: H: s:SH f:aW e:0 p:12140 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal: H: s:SH f:aW e:0 p:21499 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal: H: s:SH f:aW e:0 p:32601 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal: H: s:SH f:aW e:0 p:653 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal: H: s:SH f:aW e:0 p:10078 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal: H: s:SH f:aW e:0 p:14436 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal: H: s:SH f:aW e:0 p:7500 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal: H: s:SH f:aW e:0 p:22815 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal: H: s:SH f:aW e:0 p:26056 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal: H: s:SH f:aW e:0 p:26062 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal: H: s:SH f:aW e:0 p:26122 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal: H: s:SH f:aW e:0 p:26124 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal: H: s:SH f:aW e:0 p:26128 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal: H: s:SH f:aW e:0 p:31825 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal: H: s:SH f:aW e:0 p:32125 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal: H: s:SH f:aW e:0 p:2441 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal: H: s:SH f:aW e:0 p:2444 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal: H: s:SH f:aW e:0 p:21792 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal: H: s:SH f:aW e:0 p:21793 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal: H: s:SH f:aW e:0 p:21794 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal: H: s:SH f:aW e:0 p:31941 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal: H: s:SH f:aW e:0 p:31970 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal: H: s:SH f:aW e:0 p:8584 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal: H: s:SH f:aW e:0 p:8590 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal: H: s:SH f:aW e:0 p:8821 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal: H: s:SH f:aW e:0 p:8822 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal: H: s:SH f:aW e:0 p:9487 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal: H: s:SH f:aW e:0 p:9488 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal: H: s:SH f:aW e:0 p:9489 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal: H: s:SH f:aW e:0 p:9490 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal: H: s:SH f:aW e:0 p:18878 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal: H: s:SH f:aW e:0 p:325 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal: H: s:SH f:aW e:0 p:3139 [ls] gfs2_getattr+0x7d/0xc4 [gfs2]
wilkins-pi: UsrLocal: H: s:SH f:aW e:0 p:3144 [ls] gfs2_getattr+0x7d/0xc4 [gfs2]
wilkins-pi: UsrLocal: H: s:SH f:aW e:0 p:7450 [ls] gfs2_getattr+0x7d/0xc4 [gfs2]
wilkins-pi: UsrLocal: H: s:SH f:aW e:0 p:31741 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal: H: s:SH f:aW e:0 p:4982 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal: H: s:SH f:aW e:0 p:18258 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal: H: s:SH f:aW e:0 p:18262 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal: H: s:SH f:aW e:0 p:18263 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal: H: s:SH f:aW e:0 p:18265 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal: H: s:SH f:aW e:0 p:18269 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal: H: s:SH f:aW e:0 p:20039 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal: H: s:SH f:aW e:0 p:20042 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal: H: s:SH f:aW e:0 p:20043 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal: H: s:SH f:aW e:0 p:20044 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal: H: s:SH f:aW e:0 p:20046 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal: H: s:SH f:aW e:0 p:20763 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
lkb_id N RemoteID pid exflg lkbflgs stat gr rq waiting n ln resource name
wilkins-pi: UsrLocal: 26513a1 3 1385218 21810 0 0 wait -1 3 0 3 24 " 2 a5b67f"



There is 1 glock with waiters.
wilkins-pi.compbio.ucsf.edu, pid 21810 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 21809 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 1897 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 6307 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 10436 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 11000 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 11003 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 12140 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 21499 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 32601 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 653 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 10078 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 14436 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 7500 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 22815 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 26056 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 26062 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 26122 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 26124 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 26128 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 31825 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 32125 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 2441 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 2444 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 21792 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 21793 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 21794 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 31941 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 31970 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 8584 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 8590 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 8821 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 8822 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 9487 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 9488 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 9489 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 9490 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 18878 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 325 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 3139 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 3144 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 7450 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 31741 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 4982 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 18258 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 18262 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 18263 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 18265 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 18269 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 20039 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 20042 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 20043 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 20044 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 20046 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 20763 is waiting for glock 2/a5b67f, but no holder was found.
The dlm has granted lkb " 2 a5b67f" to pid 391344724
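
With this many entries, a quick tally of the H: (holder) records in the hangalyzer output makes it easier to see which commands and kernel functions are stuck. Below is a rough sketch, not part of gfs2_hangalyzer: the function name summarize_waiters and the regular expression are assumptions based only on the field layout visible in the output above (s: state, f: flags, e: error, p: pid, then the command and the kernel function). It treats a "W" in the f: field as "still waiting", which matches how these entries are reported here.

```python
import re
from collections import Counter

# Matches one holder record such as:
#   H: s:SH f:aW e:0 p:6307 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
# \s+ is used between fields so that records wrapped across lines still match.
HOLDER_RE = re.compile(
    r"H:\s+s:(?P<state>\w+)\s+f:(?P<flags>\w*)\s+e:(?P<err>-?\d+)\s+"
    r"p:(?P<pid>\d+)\s+\[(?P<cmd>[^\]]+)\]\s+(?P<func>\w+)\+"
)


def summarize_waiters(dump: str) -> Counter:
    """Count holder records flagged 'W', grouped by (command, kernel function)."""
    counts = Counter()
    for match in HOLDER_RE.finditer(dump):
        if "W" in match.group("flags"):
            counts[(match.group("cmd"), match.group("func"))] += 1
    return counts


if __name__ == "__main__":
    # Three records copied from the output above, one of them wrapped mid-record.
    sample = (
        "wilkins-pi: UsrLocal: H: s:SH f:W e:0 p:21810 [python] "
        "gfs2_readpage+0x61/0x199 [gfs2] "
        "wilkins-pi: UsrLocal: H: s:SH f:aW e:0 p:31741 \n"
        "[igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2] "
        "wilkins-pi: UsrLocal: H: s:SH f:aW e:0 p:3139 [ls] "
        "gfs2_getattr+0x7d/0xc4 [gfs2]"
    )
    for (cmd, func), n in summarize_waiters(sample).most_common():
        print(f"{n:4d}  {cmd}  {func}")
```

Feeding it the saved hangalyzer output (for example, read from a file) gives one line per command/function pair with the number of waiting pids.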

Clearly, I've got a hung lock of some sort. Is there any way to clear the glock to free up all of these processes? I really hate to reboot the cluster to clear this up since it's only affecting one pipeline...

Thanks in advance!

-- scooter

