[Linux-cluster] gfs deadlock situation

Mark Hlawatschek hlawatschek at atix.de
Thu Feb 15 20:59:49 UTC 2007


Hi Wendy,

I created a bugzilla for this:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=228916

Thanks,

Mark

> > Is there a way to resolve the deadlock without rebooting a server ?
> >
> I don't think there is a way to work around this hang other than a reboot.
> I will look into this with Dave (who owns the DLM code in the group).
>
> > On Wednesday 14 February 2007 20:41, Wendy Cheng wrote:
> >> Mark Hlawatschek wrote:
> >>> On Tuesday 13 February 2007 20:56, Wendy Cheng wrote:
> >>>> Wendy Cheng wrote:
> >>>>> So it is removing a file. It has obtained the directory lock and is
> >>>>> waiting for the file lock. It looks to me like the DLM (LM_CB_ASYNC)
> >>>>> callback never occurs. Do you have any abnormal messages in your
> >>>>> /var/log/messages file ? Dave, how do we dump the locks from the DLM
> >>>>> side to see what DLM is thinking ?
> >>>>
> >>>> shell> cman_tool services    /* find your lock space name */
> >>>> shell> echo "lock-space-name-found-above" > /proc/cluster/dlm_locks
> >>>> shell> cat /proc/cluster/dlm_locks
> >>>>
> >>>> Then try to find the lock (type 2, hex of 1114343 = 0x1100e7) - cut
> >>>> and paste the contents of that file here.
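A quick way to get the hex form of the glock number to search for in the dump - a minimal sketch, assuming a POSIX shell with printf (the grep pattern is an assumption based on the resource-name format shown later in this thread):

```shell
# Convert the glock/inode number to the hex string that appears in the
# DLM resource name (e.g. " 2 1100e7" for lock type 2, inode 1114343).
ino=1114343
hex=$(printf '%x' "$ino")
echo "$hex"
# Then, on a cluster node:
#   grep -A6 " 2 $hex" /proc/cluster/dlm_locks
```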
> >>>
> >>> syslog seems to be ok.
> >>> note, that the process 5856 is running on node1
> >>
> >> What I was looking for is a lock (type=2, lock number=0x1100e7
> >> (=1114343)) - that's the lock that hangs process id=5856. Since pid=5856
> >> also holds an exclusive lock on the directory, nobody can access that
> >> directory.
> >>
> >> Apparently from the GFS end, node 2 thinks 0x1100e7 is "unlocked" and
> >> node 1 is waiting for it. The only thing that can get node 1 out of this
> >> wait state is DLM's callback. If DLM doesn't have any record of this
> >> lock, pid=5856 will wait forever. Are you sure this is the complete DLM
> >> output ? This lock somehow disappeared from DLM and I have no idea how
> >> we got into this state. If the files are too large, could you tar them
> >> up and email them over ? I would like to see the complete glock and dlm
> >> lock dumps from both nodes (4 files here). If possible, add the
> >> following two outputs as well (6 files total):
> >>
> >> shell> cd /tmp               /* on both nodes */
> >> shell> script                /* this generates a file called typescript
> >>                                 in /tmp */
> >> shell> crash
> >> crash> foreach bt            /* keep hitting the space bar until this
> >>                                 command runs through */
> >> crash> quit
> >> shell> <cntl><d>             /* this closes out the typescript file */
> >> shell> mv typescript nodex_crash  /* x=1, 2 based on node1 or node2 */
> >>
> >> Tar these 6 files (glock_dump_1, glock_dump_2, dlm_dump_1, dlm_dump_2,
> >> node1_crash, node2_crash) and email them to wcheng at redhat.com
> >>
> >> Thank you for your help.
> >>
> >> -- Wendy
> >>
> >>> Here's the dlm output:
> >>>
> >>> node1:
> >>> Resource 0000010001218088 (parent 0000000000000000). Name (len=24) "
> >>>  2 1100e7"
> >>> Local Copy, Master is node 2
> >>> Granted Queue
> >>> Conversion Queue
> >>> Waiting Queue
> >>> 5eb00178 PR (EX) Master:     3eeb0117  LQ: 0,0x5
> >>> [...]
> >>> Resource 00000100f56f0618 (parent 0000000000000000). Name (len=24) "
> >>>  5 1100e7"
> >>> Local Copy, Master is node 2
> >>> Granted Queue
> >>> 5bc20257 PR Master:     3d9703e0
> >>> Conversion Queue
> >>> Waiting Queue
> >>>
> >>> node2:
> >>> Resource 00000107e462c8c8 (parent 0000000000000000). Name (len=24) "
> >>>  2 1100e7"
> >>> Master Copy
> >>> Granted Queue
> >>> 3eeb0117 PR Remote:   1 5eb00178
> >>> Conversion Queue
> >>> Waiting Queue
> >>> [...]
> >>> Resource 000001079f7e81d8 (parent 0000000000000000). Name (len=24) "
> >>>  5 1100e7"
> >>> Master Copy
> >>> Granted Queue
> >>> 3d9703e0 PR Remote:   1 5bc20257
> >>> 3e500091 PR
> >>> Conversion Queue
> >>> Waiting Queue
> >>>
> >>> Thanks for your help,
> >>>
> >>> Mark
> >>>
> >>>>>>>> we have the following deadlock situation:
> >>>>>>>>
> >>>>>>>> 2 node cluster consisting of node1 and node2.
> >>>>>>>> /usr/local is placed on a GFS filesystem mounted on both nodes.
> >>>>>>>> Lockmanager is dlm.
> >>>>>>>> We are using RHEL4u4
> >>>>>>>>
> >>>>>>>> An strace of "ls -l /usr/local/swadmin/mnx/xml" ends up hanging in
> >>>>>>>> lstat("/usr/local/swadmin/mnx/xml",
> >>>>>>>>
> >>>>>>>> This happens on both cluster nodes.
> >>>>>>>>
> >>>>>>>> All processes trying to access the directory
> >>>>>>>> /usr/local/swadmin/mnx/xml
> >>>>>>>> are in "Waiting for IO (D)" state. I.e. system load is at about
> >>>>>>>> 400 ;-)
> >>>>>>>>
> >>>>>>>> Any ideas ?
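The load of ~400 mentioned above comes from tasks piling up in uninterruptible sleep. A minimal sketch for listing them, assuming standard ps/awk (the d_state helper name is made up here for illustration):

```shell
# Filter ps output for tasks whose state field starts with D
# (uninterruptible sleep) - the state the hung GFS callers are in.
d_state() { awk '$2 ~ /^D/'; }

# Typical use on a node:
#   ps -eo pid,stat,comm | d_state
# Demo with canned ps-style lines:
printf '5856 D java\n20187 D java\n1 S init\n' | d_state
```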
> >>>>>>>
> >>>>>>> Quickly browsing this, it looks to me like the process with
> >>>>>>> pid=5856 got stuck. That process held the exclusive lock on the
> >>>>>>> file or directory (inode number 627732 - probably
> >>>>>>> /usr/local/swadmin/mnx/xml), so everyone else was waiting for it.
> >>>>>>> The faulty process was apparently in the middle of obtaining
> >>>>>>> another exclusive lock (and almost got it). We need to know where
> >>>>>>> pid=5856 was stuck at that time. If this occurs again, could you
> >>>>>>> use "crash" to backtrace that process and show us the output ?
> >>>>>>
> >>>>>> Here's the crash output:
> >>>>>>
> >>>>>> crash> bt 5856
> >>>>>> PID: 5856   TASK: 10bd26427f0       CPU: 0   COMMAND: "java"
> >>>>>>  #0 [10bd20cfbc8] schedule at ffffffff8030a1d1
> >>>>>>  #1 [10bd20cfca0] wait_for_completion at ffffffff8030a415
> >>>>>>  #2 [10bd20cfd20] glock_wait_internal at ffffffffa018574e
> >>>>>>  #3 [10bd20cfd60] gfs_glock_nq_m at ffffffffa01860ce
> >>>>>>  #4 [10bd20cfda0] gfs_unlink at ffffffffa019ce41
> >>>>>>  #5 [10bd20cfea0] vfs_unlink at ffffffff801889fa
> >>>>>>  #6 [10bd20cfed0] sys_unlink at ffffffff80188b19
> >>>>>>  #7 [10bd20cff30] filp_close at ffffffff80178e48
> >>>>>>  #8 [10bd20cff50] error_exit at ffffffff80110d91
> >>>>>>     RIP: 0000002a9593f649  RSP: 0000007fbfffbca0  RFLAGS: 00010206
> >>>>>>     RAX: 0000000000000057  RBX: ffffffff8011026a  RCX:
> >>>>>> 0000002a9cc9c870 RDX: 0000002ae5989000  RSI: 0000002a962fa3a8  RDI:
> >>>>>> 0000002ae5989000 RBP: 0000000000000000   R8: 0000002a9630abb0   R9:
> >>>>>> 0000000000000ffc R10: 0000002a9630abc0  R11: 0000000000000206  R12:
> >>>>>> 0000000040115700 R13: 0000002ae23294b0  R14: 0000007fbfffc300  R15:
> >>>>>> 0000002ae5989000 ORIG_RAX: 0000000000000057  CS: 0033  SS: 002b
> >>>>>>
> >>>>>>>> An analysis of the lock dump with decipher_lockstate_dump and
> >>>>>>>> parse_lockdump shows the following output (the whole file is too
> >>>>>>>> large for the mailing list):
> >>>>>>>>
> >>>>>>>> Entries:  101939
> >>>>>>>> Glocks:  60112
> >>>>>>>> PIDs:  751
> >>>>>>>>
> >>>>>>>> 4 chain:
> >>>>>>>> lockdump.node1.dec Glock (inode[2], 1114343)
> >>>>>>>>   gl_flags = lock[1]
> >>>>>>>>   gl_count = 5
> >>>>>>>>   gl_state = shared[3]
> >>>>>>>>   req_gh = yes
> >>>>>>>>   req_bh = yes
> >>>>>>>>   lvb_count = 0
> >>>>>>>>   object = yes
> >>>>>>>>   new_le = no
> >>>>>>>>   incore_le = no
> >>>>>>>>   reclaim = no
> >>>>>>>>   aspace = 1
> >>>>>>>>   ail_bufs = no
> >>>>>>>>   Request
> >>>>>>>>     owner = 5856
> >>>>>>>>     gh_state = exclusive[1]
> >>>>>>>>     gh_flags = try[0] local_excl[5] async[6]
> >>>>>>>>     error = 0
> >>>>>>>>     gh_iflags = promote[1]
> >>>>>>>>   Waiter3
> >>>>>>>>     owner = 5856
> >>>>>>>>     gh_state = exclusive[1]
> >>>>>>>>     gh_flags = try[0] local_excl[5] async[6]
> >>>>>>>>     error = 0
> >>>>>>>>     gh_iflags = promote[1]
> >>>>>>>>   Inode: busy
> >>>>>>>> lockdump.node2.dec Glock (inode[2], 1114343)
> >>>>>>>>   gl_flags =
> >>>>>>>>   gl_count = 2
> >>>>>>>>   gl_state = unlocked[0]
> >>>>>>>>   req_gh = no
> >>>>>>>>   req_bh = no
> >>>>>>>>   lvb_count = 0
> >>>>>>>>   object = yes
> >>>>>>>>   new_le = no
> >>>>>>>>   incore_le = no
> >>>>>>>>   reclaim = no
> >>>>>>>>   aspace = 0
> >>>>>>>>   ail_bufs = no
> >>>>>>>>   Inode:
> >>>>>>>>     num = 1114343/1114343
> >>>>>>>>     type = regular[1]
> >>>>>>>>     i_count = 1
> >>>>>>>>     i_flags =
> >>>>>>>>     vnode = yes
> >>>>>>>> lockdump.node1.dec Glock (inode[2], 627732)
> >>>>>>>>   gl_flags = dirty[5]
> >>>>>>>>   gl_count = 379
> >>>>>>>>   gl_state = exclusive[1]
> >>>>>>>>   req_gh = no
> >>>>>>>>   req_bh = no
> >>>>>>>>   lvb_count = 0
> >>>>>>>>   object = yes
> >>>>>>>>   new_le = no
> >>>>>>>>   incore_le = no
> >>>>>>>>   reclaim = no
> >>>>>>>>   aspace = 58
> >>>>>>>>   ail_bufs = no
> >>>>>>>>   Holder
> >>>>>>>>     owner = 5856
> >>>>>>>>     gh_state = exclusive[1]
> >>>>>>>>     gh_flags = try[0] local_excl[5] async[6]
> >>>>>>>>     error = 0
> >>>>>>>>     gh_iflags = promote[1] holder[6] first[7]
> >>>>>>>>   Waiter2
> >>>>>>>>     owner = none[-1]
> >>>>>>>>     gh_state = shared[3]
> >>>>>>>>     gh_flags = try[0]
> >>>>>>>>     error = 0
> >>>>>>>>     gh_iflags = demote[2] alloced[4] dealloc[5]
> >>>>>>>>   Waiter3
> >>>>>>>>     owner = 32753
> >>>>>>>>     gh_state = shared[3]
> >>>>>>>>     gh_flags = any[3]
> >>>>>>>>     error = 0
> >>>>>>>>     gh_iflags = promote[1]
> >>>>>>>>   [...loads of Waiter3 entries...]
> >>>>>>>>   Waiter3
> >>>>>>>>     owner = 4566
> >>>>>>>>     gh_state = shared[3]
> >>>>>>>>     gh_flags = any[3]
> >>>>>>>>     error = 0
> >>>>>>>>     gh_iflags = promote[1]
> >>>>>>>>   Inode: busy
> >>>>>>>> lockdump.node2.dec Glock (inode[2], 627732)
> >>>>>>>>   gl_flags = lock[1]
> >>>>>>>>   gl_count = 375
> >>>>>>>>   gl_state = unlocked[0]
> >>>>>>>>   req_gh = yes
> >>>>>>>>   req_bh = yes
> >>>>>>>>   lvb_count = 0
> >>>>>>>>   object = yes
> >>>>>>>>   new_le = no
> >>>>>>>>   incore_le = no
> >>>>>>>>   reclaim = no
> >>>>>>>>   aspace = 0
> >>>>>>>>   ail_bufs = no
> >>>>>>>>   Request
> >>>>>>>>     owner = 20187
> >>>>>>>>     gh_state = shared[3]
> >>>>>>>>     gh_flags = any[3]
> >>>>>>>>     error = 0
> >>>>>>>>     gh_iflags = promote[1]
> >>>>>>>>   Waiter3
> >>>>>>>>     owner = 20187
> >>>>>>>>     gh_state = shared[3]
> >>>>>>>>     gh_flags = any[3]
> >>>>>>>>     error = 0
> >>>>>>>>     gh_iflags = promote[1]
> >>>>>>>>   [...loads of Waiter3 entries...]
> >>>>>>>>   Waiter3
> >>>>>>>>     owner = 10460
> >>>>>>>>     gh_state = shared[3]
> >>>>>>>>     gh_flags = any[3]
> >>>>>>>>     error = 0
> >>>>>>>>     gh_iflags = promote[1]
> >>>>>>>>   Inode: busy
> >>>>>>>> 2 requests
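For dumps this large, a single glock's record can be pulled out of the decoded file with a small awk filter - a sketch, assuming the "Glock (inode[2], N)" header format shown above (the file name lockdump.node1.dec is taken from the thread):

```shell
# Print one glock record: start printing at the matching header and stop
# at the next "Glock (" header. 1114343 is the hung lock in this thread.
awk '/Glock \(inode\[2\], 1114343\)/ {p=1}
     p && /Glock \(/ && !/1114343/   {p=0}
     p' lockdump.node1.dec
```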
> >>>>>>>
> >>>>>>> --
> >>
> >> --
> >> Linux-cluster mailing list
> >> Linux-cluster at redhat.com
> >> https://www.redhat.com/mailman/listinfo/linux-cluster

-- 
Gruss / Regards,

** Visit us at CeBIT 2007 in Hannover/Germany **
** in Hall 5, Booth G48/2  (15.-21. of March) **

Dipl.-Ing. Mark Hlawatschek
http://www.atix.de/
http://www.open-sharedroot.org/

**
ATIX - Ges. fuer Informationstechnologie und Consulting mbH
Einsteinstr. 10 - 85716 Unterschleissheim - Germany



