
Re: [Linux-cluster] Clearing a glock




Maybe a bit off topic, but IMO Red Hat should really consider changing
the name of this process.
Thought I somehow crossed mailing lists and was about to read a post
about a jammed polymer-framed 9mm pistol. 

D

-----Original Message-----
From: linux-cluster-bounces redhat com
[mailto:linux-cluster-bounces redhat com] On Behalf Of Scooter Morris
Sent: Monday, July 26, 2010 6:55 PM
To: linux clustering
Subject: [Linux-cluster] Clearing a glock


We've got two nodes of a three-node gfs2 cluster that seem to be in
some sort of deadlock.  We're seeing a number of gfs2-related stack
traces in dmesg:

INFO: task igtcpython.sh:22945 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
igtcpython.sh D ffff81011cabd218     0 22945  22874                     (NOTLB)
  ffff8101f9fedcc8 0000000000000086 ffff8101f9fede38 ffffffff8000a604
  ffff8102b257f778 0000000000000006 ffff8105819d37a0 ffff8107a7f93820
  0004084c69951298 00000000000365c6 ffff8105819d3988 000000042d07a810
Call Trace:
  [<ffffffff8000a604>] __link_path_walk+0xdf8/0xf42
  [<ffffffff8002c9e4>] mntput_no_expire+0x19/0x89
  [<ffffffff8000ea46>] link_path_walk+0xa6/0xb2
  [<ffffffff887d6ee7>] :gfs2:just_schedule+0x0/0xe
  [<ffffffff887d6ef0>] :gfs2:just_schedule+0x9/0xe
  [<ffffffff80063a16>] __wait_on_bit+0x40/0x6e
  [<ffffffff887d6ee7>] :gfs2:just_schedule+0x0/0xe
  [<ffffffff80063ab0>] out_of_line_wait_on_bit+0x6c/0x78
  [<ffffffff800a0aec>] wake_bit_function+0x0/0x23
  [<ffffffff887d6ee2>] :gfs2:gfs2_glock_wait+0x2b/0x30
  [<ffffffff887e679e>] :gfs2:gfs2_permission+0x83/0xd5
  [<ffffffff887e6796>] :gfs2:gfs2_permission+0x7b/0xd5
  [<ffffffff8000d918>] permission+0x81/0xc8
  [<ffffffff8003c0c0>] open_exec+0x60/0xc0
  [<ffffffff8005d28d>] tracesys+0xd5/0xe0
  [<ffffffff8003ed4d>] do_execve+0x46/0x1f7
  [<ffffffff8005516d>] sys_execve+0x36/0x4c
  [<ffffffff8005d4d3>] stub_execve+0x67/0xb0

and gfs2_hangalyzer is saying:

./gfs2_hangalyzer -n wilkins-pi -a
wilkins-pi: UsrLocal: G:  s:UN n:2/a5b67f f:l t:SH d:EX/0 l:0 a:0 r:58
wilkins-pi: UsrLocal:  H: s:SH f:W e:0 p:21810 [python] gfs2_readpage+0x61/0x199 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:W e:0 p:21809 [python] gfs2_readpage+0x61/0x199 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:W e:0 p:1897 [python] gfs2_readpage+0x61/0x199 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:6307 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:10436 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:11000 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:11003 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:12140 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:21499 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:32601 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:653 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:10078 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:14436 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:7500 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:22815 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:26056 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:26062 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:26122 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:26124 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:26128 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:31825 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:32125 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:2441 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:2444 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:21792 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:21793 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:21794 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:31941 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:31970 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:8584 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:8590 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:8821 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:8822 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:9487 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:9488 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:9489 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:9490 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:18878 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:325 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:3139 [ls] gfs2_getattr+0x7d/0xc4 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:3144 [ls] gfs2_getattr+0x7d/0xc4 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:7450 [ls] gfs2_getattr+0x7d/0xc4 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:31741 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:4982 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:18258 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:18262 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:18263 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:18265 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:18269 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:20039 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:20042 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:20043 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:20044 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:20046 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:20763 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
                         lkb_id N RemoteID  pid exflg lkbflgs stat gr rq    waiting n ln             resource name
wilkins-pi: UsrLocal:  26513a1 3  1385218 21810     0       0 wait -1  3          0 3 24 "       2          a5b67f"



There is 1 glock with waiters.
wilkins-pi.compbio.ucsf.edu, pid 21810 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 21809 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 1897 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 6307 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 10436 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 11000 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 11003 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 12140 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 21499 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 32601 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 653 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 10078 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 14436 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 7500 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 22815 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 26056 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 26062 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 26122 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 26124 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 26128 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 31825 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 32125 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 2441 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 2444 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 21792 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 21793 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 21794 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 31941 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 31970 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 8584 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 8590 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 8821 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 8822 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 9487 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 9488 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 9489 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 9490 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 18878 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 325 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 3139 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 3144 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 7450 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 31741 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 4982 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 18258 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 18262 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 18263 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 18265 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 18269 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 20039 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 20042 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 20043 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 20044 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 20046 is waiting for glock 2/a5b67f, but no holder was found.
wilkins-pi.compbio.ucsf.edu, pid 20763 is waiting for glock 2/a5b67f, but no holder was found.
          The dlm has granted lkb "       2          a5b67f" to pid 391344724
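
For anyone who wants to boil output like the above down to something
shorter, here is a minimal Python sketch that counts waiters per glock
and per command name. The regular expressions are inferred only from
the lines quoted in this message, not from the gfs2_hangalyzer source,
and the reading of the holder flags (W for a waiting request, H for a
granted holder) is an assumption. The summarize() name and the sample
text are purely illustrative.

```python
import re
from collections import Counter

# Patterns inferred from the output quoted in this thread (an assumption,
# not taken from the gfs2_hangalyzer source).
GLOCK_RE = re.compile(r"G:\s+s:(\S+)\s+n:(\S+)")
HOLDER_RE = re.compile(r"H:\s+s:(\S+)\s+f:(\S+)\s+e:\d+\s+p:(\d+)\s+\[([^\]]+)\]")


def summarize(text):
    """Map each glock number to its state, holder count and waiter counts."""
    summary = {}
    current = None
    for line in text.splitlines():
        glock = GLOCK_RE.search(line)
        if glock:
            current = summary.setdefault(
                glock.group(2),
                {"state": glock.group(1), "holders": 0, "waiters": Counter()},
            )
            continue
        holder = HOLDER_RE.search(line)
        if holder and current is not None:
            flags, command = holder.group(2), holder.group(4)
            if "H" in flags:  # assumed: H marks a granted holder
                current["holders"] += 1
            if "W" in flags:  # assumed: W marks a waiting request
                current["waiters"][command] += 1
    return summary


# A few lines copied from the output above, used as sample input.
sample = """\
wilkins-pi: UsrLocal: G:  s:UN n:2/a5b67f f:l t:SH d:EX/0 l:0 a:0 r:58
wilkins-pi: UsrLocal:  H: s:SH f:W e:0 p:21810 [python] gfs2_readpage+0x61/0x199 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:6307 [igtcpython.sh] gfs2_permission+0x7b/0xd5 [gfs2]
wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:3139 [ls] gfs2_getattr+0x7d/0xc4 [gfs2]
"""

if __name__ == "__main__":
    for glock, info in summarize(sample).items():
        print(glock, info["state"], "holders:", info["holders"],
              "waiters:", dict(info["waiters"]))
```

Because the holder pattern stops at the bracketed command name, it
should work whether or not the mailer has wrapped the function name
onto the next line.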

Clearly, I've got a hung lock of some sort.  Is there any way to clear
the glock to free up all of these processes?  I really hate to reboot
the cluster to clear this up since it's only affecting one pipeline.

Thanks in advance!

-- scooter

--
Linux-cluster mailing list
Linux-cluster redhat com
https://www.redhat.com/mailman/listinfo/linux-cluster
