[Linux-cluster] Clearing a glock

Scooter Morris scooter at cgl.ucsf.edu
Mon Jul 26 23:54:59 UTC 2010


  We've got two nodes of a three-node gfs2 cluster that seem to be in 
some sort of deadlock.  We're seeing a number of gfs2-related stack 
traces in dmesg:

INFO: task igtcpython.sh:22945 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
igtcpython.sh D ffff81011cabd218     0 22945  22874                     
(NOTLB)
  ffff8101f9fedcc8 0000000000000086 ffff8101f9fede38 ffffffff8000a604
  ffff8102b257f778 0000000000000006 ffff8105819d37a0 ffff8107a7f93820
  0004084c69951298 00000000000365c6 ffff8105819d3988 000000042d07a810
Call Trace:
  [<ffffffff8000a604>] __link_path_walk+0xdf8/0xf42
  [<ffffffff8002c9e4>] mntput_no_expire+0x19/0x89
  [<ffffffff8000ea46>] link_path_walk+0xa6/0xb2
  [<ffffffff887d6ee7>] :gfs2:just_schedule+0x0/0xe
  [<ffffffff887d6ef0>] :gfs2:just_schedule+0x9/0xe
  [<ffffffff80063a16>] __wait_on_bit+0x40/0x6e
  [<ffffffff887d6ee7>] :gfs2:just_schedule+0x0/0xe
  [<ffffffff80063ab0>] out_of_line_wait_on_bit+0x6c/0x78
  [<ffffffff800a0aec>] wake_bit_function+0x0/0x23
  [<ffffffff887d6ee2>] :gfs2:gfs2_glock_wait+0x2b/0x30
  [<ffffffff887e679e>] :gfs2:gfs2_permission+0x83/0xd5
  [<ffffffff887e6796>] :gfs2:gfs2_permission+0x7b/0xd5
  [<ffffffff8000d918>] permission+0x81/0xc8
  [<ffffffff8003c0c0>] open_exec+0x60/0xc0
  [<ffffffff8005d28d>] tracesys+0xd5/0xe0
  [<ffffffff8003ed4d>] do_execve+0x46/0x1f7
  [<ffffffff8005516d>] sys_execve+0x36/0x4c
  [<ffffffff8005d4d3>] stub_execve+0x67/0xb0

and gfs2_hangalyzer is saying:

./gfs2_hangalyzer -n wilkins-pi -a
wilkins-pi: UsrLocal: G:  s:UN n:2/a5b67f f:l t:SH d:EX/0 l:0 a:0 r:58
wilkins-pi: UsrLocal:  H: s:SH f:W e:0 p:21810 [python] 
gfs2_readpage+0x61/0x199 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:W e:0 p:21809 [python] 
gfs2_readpage+0x61/0x199 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:W e:0 p:1897 [python] 
gfs2_readpage+0x61/0x199 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:6307 [igtcpython.sh] 
gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:10436 [igtcpython.sh] 
gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:11000 [igtcpython.sh] 
gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:11003 [igtcpython.sh] 
gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:12140 [igtcpython.sh] 
gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:21499 [igtcpython.sh] 
gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:32601 [igtcpython.sh] 
gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:653 [igtcpython.sh] 
gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:10078 [igtcpython.sh] 
gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:14436 [igtcpython.sh] 
gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:7500 [igtcpython.sh] 
gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:22815 [igtcpython.sh] 
gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:26056 [igtcpython.sh] 
gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:26062 [igtcpython.sh] 
gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:26122 [igtcpython.sh] 
gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:26124 [igtcpython.sh] 
gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:26128 [igtcpython.sh] 
gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:31825 [igtcpython.sh] 
gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:32125 [igtcpython.sh] 
gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:2441 [igtcpython.sh] 
gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:2444 [igtcpython.sh] 
gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:21792 [igtcpython.sh] 
gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:21793 [igtcpython.sh] 
gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:21794 [igtcpython.sh] 
gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:31941 [igtcpython.sh] 
gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:31970 [igtcpython.sh] 
gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:8584 [igtcpython.sh] 
gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:8590 [igtcpython.sh] 
gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:8821 [igtcpython.sh] 
gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:8822 [igtcpython.sh] 
gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:9487 [igtcpython.sh] 
gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:9488 [igtcpython.sh] 
gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:9489 [igtcpython.sh] 
gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:9490 [igtcpython.sh] 
gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:18878 [igtcpython.sh] 
gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:325 [igtcpython.sh] 
gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:3139 [ls] 
gfs2_getattr+0x7d/0xc4 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:3144 [ls] 
gfs2_getattr+0x7d/0xc4 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:7450 [ls] 
gfs2_getattr+0x7d/0xc4 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:31741 [igtcpython.sh] 
gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:4982 [igtcpython.sh] 
gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:18258 [igtcpython.sh] 
gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:18262 [igtcpython.sh] 
gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:18263 [igtcpython.sh] 
gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:18265 [igtcpython.sh] 
gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:18269 [igtcpython.sh] 
gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:20039 [igtcpython.sh] 
gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:20042 [igtcpython.sh] 
gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:20043 [igtcpython.sh] 
gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:20044 [igtcpython.sh] 
gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:20046 [igtcpython.sh] 
gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:20763 [igtcpython.sh] 
gfs2_permission+0x7b/0xd5 [gfs2]
                         lkb_id N RemoteID  pid exflg lkbflgs stat gr 
rq    waiting n ln             resource name
wilkins-pi: UsrLocal:  26513a1 3  1385218 21810     0       0 wait -1  
3          0 3 24 "       2          a5b67f"



There is 1 glock with waiters.
wilkins-pi.compbio.ucsf.edu, pid 21810 is waiting for glock 2/a5b67f, 
but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 21809 is waiting for glock 2/a5b67f, 
but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 1897 is waiting for glock 2/a5b67f, but 
no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 6307 is waiting for glock 2/a5b67f, but 
no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 10436 is waiting for glock 2/a5b67f, 
but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 11000 is waiting for glock 2/a5b67f, 
but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 11003 is waiting for glock 2/a5b67f, 
but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 12140 is waiting for glock 2/a5b67f, 
but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 21499 is waiting for glock 2/a5b67f, 
but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 32601 is waiting for glock 2/a5b67f, 
but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 653 is waiting for glock 2/a5b67f, but 
no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 10078 is waiting for glock 2/a5b67f, 
but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 14436 is waiting for glock 2/a5b67f, 
but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 7500 is waiting for glock 2/a5b67f, but 
no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 22815 is waiting for glock 2/a5b67f, 
but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 26056 is waiting for glock 2/a5b67f, 
but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 26062 is waiting for glock 2/a5b67f, 
but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 26122 is waiting for glock 2/a5b67f, 
but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 26124 is waiting for glock 2/a5b67f, 
but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 26128 is waiting for glock 2/a5b67f, 
but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 31825 is waiting for glock 2/a5b67f, 
but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 32125 is waiting for glock 2/a5b67f, 
but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 2441 is waiting for glock 2/a5b67f, but 
no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 2444 is waiting for glock 2/a5b67f, but 
no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 21792 is waiting for glock 2/a5b67f, 
but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 21793 is waiting for glock 2/a5b67f, 
but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 21794 is waiting for glock 2/a5b67f, 
but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 31941 is waiting for glock 2/a5b67f, 
but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 31970 is waiting for glock 2/a5b67f, 
but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 8584 is waiting for glock 2/a5b67f, but 
no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 8590 is waiting for glock 2/a5b67f, but 
no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 8821 is waiting for glock 2/a5b67f, but 
no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 8822 is waiting for glock 2/a5b67f, but 
no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 9487 is waiting for glock 2/a5b67f, but 
no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 9488 is waiting for glock 2/a5b67f, but 
no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 9489 is waiting for glock 2/a5b67f, but 
no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 9490 is waiting for glock 2/a5b67f, but 
no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 18878 is waiting for glock 2/a5b67f, 
but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 325 is waiting for glock 2/a5b67f, but 
no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 3139 is waiting for glock 2/a5b67f, but 
no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 3144 is waiting for glock 2/a5b67f, but 
no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 7450 is waiting for glock 2/a5b67f, but 
no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 31741 is waiting for glock 2/a5b67f, 
but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 4982 is waiting for glock 2/a5b67f, but 
no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 18258 is waiting for glock 2/a5b67f, 
but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 18262 is waiting for glock 2/a5b67f, 
but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 18263 is waiting for glock 2/a5b67f, 
but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 18265 is waiting for glock 2/a5b67f, 
but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 18269 is waiting for glock 2/a5b67f, 
but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 20039 is waiting for glock 2/a5b67f, 
but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 20042 is waiting for glock 2/a5b67f, 
but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 20043 is waiting for glock 2/a5b67f, 
but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 20044 is waiting for glock 2/a5b67f, 
but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 20046 is waiting for glock 2/a5b67f, 
but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 20763 is waiting for glock 2/a5b67f, 
but no holder was found.
          The dlm has granted lkb "       2          a5b67f" to pid 
391344724
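
For anyone who would rather not read all of the holder lines, here is a 
minimal Python sketch (my own helper, not part of gfs2_hangalyzer) that 
tallies the waiting holders in the dump above by command and kernel 
function.  It assumes the "H:" line layout shown above, that the mailer 
has wrapped the function name onto the following line, and that a "W" in 
the f: field marks a holder that is still waiting.  The function name 
summarize_waiters is just illustrative.

```python
import re
from collections import Counter

# Matches one holder ("H:") record.  The \s+ after the [command] field
# also matches a newline, so records wrapped onto two lines still parse.
HOLDER_RE = re.compile(
    r"H: s:(?P<state>\S+) f:(?P<flags>\S+) e:\d+ p:(?P<pid>\d+) "
    r"\[(?P<comm>[^\]]+)\]\s+(?P<func>\w+)\+"
)


def summarize_waiters(text):
    """Count waiting holders per (command, kernel function) pair."""
    counts = Counter()
    for match in HOLDER_RE.finditer(text):
        # Assumption: a "W" in the f: field means the holder is waiting.
        if "W" in match.group("flags"):
            counts[(match.group("comm"), match.group("func"))] += 1
    return counts


# Three records copied from the dump above, wrapped as the mailer left them.
SAMPLE = (
    "wilkins-pi: UsrLocal:  H: s:SH f:W e:0 p:21810 [python] \n"
    "gfs2_readpage+0x61/0x199 [gfs2]\n"
    "wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:6307 [igtcpython.sh] \n"
    "gfs2_permission+0x7b/0xd5 [gfs2]\n"
    "wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:3139 [ls] \n"
    "gfs2_getattr+0x7d/0xc4 [gfs2]\n"
)

for (comm, func), n in summarize_waiters(SAMPLE).most_common():
    print(n, comm, func)
```

Feeding it the saved hangalyzer output instead of SAMPLE should give one 
line per command/function pair, which makes it easier to see that 
everything is queued on the same glock from only a few call sites.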

Clearly, I've got a hung lock of some sort.  Is there any way to clear 
the glock to free up all of these processes?  I really hate to reboot 
the cluster to clear this up since it's only affecting one pipeline.

Thanks in advance!

-- scooter



