[Linux-cluster] GFS 2 node hang in rm test

Daniel McNeil daniel at osdl.org
Tue Dec 7 00:13:50 UTC 2004


On Mon, 2004-12-06 at 11:45, Ken Preslan wrote:
> On Fri, Dec 03, 2004 at 03:08:00PM -0800, Daniel McNeil wrote:
> > Is there any documentation on what these fields are?
> 
> Ben Cahill is doing a good job adding comments.  So, you might want
> to look in cluster/gfs-kernels/src/gfs/incore.h and 
> cluster/gfs-kernels/src/gfs/glock.h.
> 
> > What is the difference between Waiter2 and Waiter3?
> 
> The waiter2 list is for requests to drop the glock.  The waiter3 list
> is for processes waiting to acquire the glock.
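
So, if I follow, each glock conceptually carries separate queues,
roughly like this (a simplified sketch on my part; field names are
illustrative rather than verbatim from incore.h):

	struct example_glock {
		struct list_head gl_holders;	/* granted holders ("Holder" entries) */
		struct list_head gl_waiters2;	/* demote requests: callbacks asking us to drop the glock */
		struct list_head gl_waiters3;	/* promote requests: local processes waiting to acquire it */
	};

i.e. a Waiter2 entry is a pending drop and a Waiter3 entry is a
pending acquire.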
> 
> > Shouldn't the cl030 know (bast) that the updatedb on cl031
> > is trying to go shared->exclusive?
> 
> It does, that's why there is something on the waiter2 list.
> 
> > What does the gfs_tool/parse_lockdump script do?
> 
> parse_lockdump takes the output of decipher_lockstate_dump from multiple
> machines and looks for groups of locks and processes that might be
> deadlocking together.  The interesting one it found for your situation is:
> 
> 2 chain:
> 30.ld Glock (inode[2], 39860) 
>   gl_flags =  
>   gl_count = 6 
>   gl_state = shared[3] 
>   lvb_count = 0 
>   object = yes 
>   aspace = 2 
>   reclaim = no 
>   Holder 
>     owner = 28723 
>     gh_state = shared[3] 
>     gh_flags = atime[9]  
>     error = 0 
>     gh_iflags = promote[1] holder[6] first[7]  
>   Waiter2 
>     owner = none[-1] 
>     gh_state = unlocked[0] 
>     gh_flags = try[0]  
>     error = 0 
>     gh_iflags = demote[2] alloced[4] dealloc[5]  
>   Waiter3 
>     owner = 29693 
>     gh_state = shared[3] 
>     gh_flags = any[3]  
>     error = 0 
>     gh_iflags = promote[1]  
>   Inode: busy 
> 31.ld Glock (inode[2], 39860) 
>   gl_flags = lock[1]  
>   gl_count = 5 
>   gl_state = shared[3] 
>   lvb_count = 0 
>   object = yes 
>   aspace = 1 
>   reclaim = no 
>   Request 
>     owner = 29537 
>     gh_state = exclusive[1] 
>     gh_flags = local_excl[5] atime[9]  
>     error = 0 
>     gh_iflags = promote[1]  
>   Waiter3 
>     owner = 29537 
>     gh_state = exclusive[1] 
>     gh_flags = local_excl[5] atime[9]  
>     error = 0 
>     gh_iflags = promote[1]  
>   Inode: busy 
> 1 requests
> 
> It shows machine 31 is asking for the lock exclusively to perform an
> atime update.  Machine 30 has received the callback asking for the
> lock to be released, but it can't because process 28723 is actively
> using the lock.
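
In other words (illustrative code only, not verbatim GFS), the queued
Waiter2 demote can't be serviced until the holders list drains:

	/* Illustrative: a queued demote request can only run once every
	 * granted holder (here, pid 28723) has released the glock. */
	static int demote_can_run(struct example_glock *gl)
	{
		return list_empty(&gl->gl_holders);
	}

So node 30 can't answer node 31's callback until 28723 lets go of the
directory glock.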
> 
> Looking at the backtrace for process 28723, we see:
> 
> rm            D 00000008     0 28723  28711                     (NOTLB)
> caa0ab60 00000082 caa0ab50 00000008 00000001 00000000 00000008 00000000 
>        c1addb80 00000246 00002f57 caa0ab6c c0105d5e ddb18f90 00000000 caa0ab48 
>        c170df60 00000000 00003c53 44ddcfe7 00002f57 ea82cd70 ea82ced0 f8b80260 
> Call Trace:
>  [<c03d409c>] rwsem_down_write_failed+0x9c/0x18e
>  [<f8b7a259>] .text.lock.locking+0x72/0x1c9 [dlm]
>  [<f8b78879>] dlm_lock+0x319/0x3b0 [dlm]
>  [<f8af3eda>] do_dlm_lock+0x11a/0x200 [lock_dlm]
>  [<f8af407c>] lm_dlm_lock+0x6c/0x80 [lock_dlm]
>  [<f8b3b7b0>] gfs_glock_xmote_th+0xc0/0x200 [gfs]
>  [<f8b3c749>] gfs_glock_prefetch+0xf9/0x1e0 [gfs]
>  [<f8b3cfba>] gfs_glock_prefetch_num+0x6a/0xc0 [gfs]
>  [<f8b5268b>] filldir_reg_func+0xbb/0x180 [gfs]
>  [<f8b323a4>] do_filldir_main+0x1b4/0x240 [gfs]
>  [<f8b324f8>] do_filldir_single+0xc8/0x110 [gfs]
>  [<f8b3359f>] dir_e_read+0x31f/0x360 [gfs]
>  [<f8b33f5c>] gfs_dir_read+0x3c/0x90 [gfs]
>  [<f8b527f5>] readdir_reg+0xa5/0xc0 [gfs]
>  [<f8b52b7b>] gfs_readdir+0x6b/0x90 [gfs]
>  [<c01723ab>] vfs_readdir+0x9b/0xc0
>  [<c0172840>] sys_getdents64+0x70/0xaf
>  [<c010537d>] sysenter_past_esp+0x52/0x71
> 
> So, inode 39860 is a directory.  That process acquired the lock on the
> directory to perform a readdir.  While it does the readdir, it sends out
> prefetch requests for the locks in the directory.  Those requests
> are asynchronous tries, so they shouldn't ever block for long.
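
As I read "asynchronous try": the prefetch is supposed to be strictly
best-effort and never sleep; in generic kernel terms, the trylock
pattern rather than the blocking one (illustrative only, no GFS or
DLM internals assumed here):

	static void example_prefetch(struct rw_semaphore *sem)
	{
		if (!down_read_trylock(sem))
			return;		/* busy: give up, don't wait */
		/* ... read ahead ... */
		up_read(sem);
	}

Which makes the rwsem_down_write_failed in the backtrace above all
the more surprising.
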
> The prefetch is for this glock:
> 
> 1 chain:
> 30.ld Glock (inode[2], 39883) 
>   gl_flags = lock[1] prefetch[3] 
>   gl_count = 2 
>   gl_state = unlocked[0] 
>   lvb_count = 0 
>   object = no 
>   aspace = 0 
>   reclaim = no 
> 0 requests
> 
> So, the question is:  Why is the DLM stuck on that request?


Looking at the stack trace above and disassembling dlm.ko,
dlm_lock+0x319 appears to be the call to dlm_lock_stage1(), and
dlm_lock_stage1() looks like it is sleeping on
	 down_write(&rsb->res_lock)

So now I have to find who is holding the res_lock.
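
If I'm reading it right, the "try" semantics only exist up in the
glock layer; by the time we are in dlm_lock_stage1() the rsb rwsem is
taken unconditionally, so the prefetch sleeps as soon as anyone else
holds it for write.  Roughly (my paraphrase, not the actual dlm
source):

	static int example_stage1(struct dlm_rsb *rsb)
	{
		/* unconditional: this is where rwsem_down_write_failed
		 * shows up in the backtrace when the rwsem is contended */
		down_write(&rsb->res_lock);
		/* ... build and queue the lock request ... */
		up_write(&rsb->res_lock);
		return 0;
	}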

Any ideas on how to debug this?

It is reproducible.  My test script was on its 17th time through
the test loop when it hit this.

I hit it again over the weekend after only 4 times through the loop,
but this time on a 3 node test.  Unfortunately, when I got the stack
trace on the 3rd node, the other nodes thought it had died and kicked
it out of the cluster.  The other nodes then got stuck spinning in
the kernel, along with 70 'df' processes that cron had been
periodically firing off.

I'll do some more poking around on this one and see if I can get more
info out on the next hang as well.

Daniel



