
[Linux-cluster] RE: GFS2 subdirectory hang

On Thu, 2009-08-27 at 09:25 -0500, Johnson, Eric wrote:
>> I have a 32-bit RHEL 5.3 Cluster Suite setup of two nodes with GFS2 file
>> systems on FC-attached SAN. I have run into this issue twice now, where
>> attempts to access a certain directory within one of the GFS2 file
>> systems never return. Other directories and paths within that file
>> system work just fine.
>> 
>> The first time it happened, I had to crash the node to get it to release
>> the FS, then unmount it on both nodes, fsck it, remount it, and it was
>> fine. It has happened again (different path, different file system). A
>> simple "ls" in the directory (which has maybe 20 files in it) leaves the
>> process in an uninterruptible sleep state. I left it all night and it
>> never returned.
>> 
>> I'm not sure what other info would be useful on this, but this is what I
>> see in the gfs2_tool lockdump output for the ls PID on that node:
>> 
>> G:  s:UN n:2/bf1df f:l t:SH d:EX/0 l:0 a:0 r:4
>>  H: s:SH f:aW e:0 p:9938 [ls] gfs2_lookup+0x44/0x90 [gfs2]
>              ^ The W flag indicates that this is waiting for a glock
>
>Currently the glock is in the UN (unlocked) state, and it's trying to get
>an SH (shared) lock. The next step in the investigation is to look for
>the same glock number 2/bf1df on the other nodes, and see what is
>holding that lock. This particular node will hang until the lock is
>released on whichever other node is holding it.
>
>If there is nothing on any other node apparently holding that lock in
>the glock dumps, then looking at the dlm lock dumps would be the next step.
>
>Steve.
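
The cross-node check described above can be scripted once each node's lockdump has been saved as text. The Python sketch below is not part of any GFS2 tooling: the helper names find_glock, locate and waiting_pids are hypothetical, and it assumes the dump layout shown in this thread (a "G:" line followed by indented "H:" and "I:" lines). The only flag it interprets is "W" in a holder's f: field, meaning "waiting for the glock", as stated above.

```python
import re

# Picks the "n:<type>/<number>" field out of a glock ("G:") line.
GLOCK_NUM = re.compile(r"\bn:(\S+)")
# Picks the flags (f:) and PID (p:) out of a holder ("H:") line.
HOLDER = re.compile(r"^H:.*?\bf:(\S*).*?\bp:(\d+)")


def find_glock(dump_text, glock_id):
    """Return the G: line plus its indented sub-lines for glock_id, or None.

    Assumes each glock starts with a "G:" line and its H:/I: lines follow
    until the next "G:" line or a blank line.
    """
    lines = dump_text.splitlines()
    for i, line in enumerate(lines):
        stripped = line.strip()
        if not stripped.startswith("G:"):
            continue
        match = GLOCK_NUM.search(stripped)
        if match and match.group(1) == glock_id:
            block = [stripped]
            for follow in lines[i + 1:]:
                follow = follow.strip()
                if not follow or follow.startswith("G:"):
                    break
                block.append(follow)
            return block
    return None


def locate(dumps, glock_id):
    """Search a {node_name: dump_text} mapping; keep nodes that list the glock."""
    found = {}
    for node, text in dumps.items():
        block = find_glock(text, glock_id)
        if block is not None:
            found[node] = block
    return found


def waiting_pids(block):
    """PIDs of holders whose flags contain W (waiting for the glock)."""
    pids = []
    for line in block[1:]:
        match = HOLDER.match(line)
        if match and "W" in match.group(1):
            pids.append(int(match.group(2)))
    return pids
```

For example, locate({"node1": text1, "node2": text2}, "2/bf1df") returns the matching block from each node's dump. A node whose G: line shows the glock held (s:EX or s:SH) rather than s:UN is the likely holder to examine next.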

Thanks for the response, Steve. I found this reference to that lock on
the other node:

G:  s:EX n:2/bf1df f:dy t:EX d:SH/0 l:0 a:0 r:4
 I: n:1155192/782815 t:8 f:0x00000010

I'm having trouble finding documentation that describes what each of
these fields means. There's no obvious process ID here, and all I can
determine is that it's an exclusive lock.
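
Since the field meanings are the open question, a small decoder can at least split a G: line into its key:value pairs. This Python sketch is hypothetical (parse_glock_line, glock_number and describe are not GFS2 tools), and the labels in FIELD_LABELS are an assumption based on our reading of the glock dump format, not something confirmed in this thread; l: and a: are deliberately left undecoded. One cross-check needs no documentation: the glock number is printed in hex, and 0xbf1df is 782815 in decimal, the same as the second number on the I: line.

```python
# ASSUMPTION: labels reflect our reading of the GFS2 glock dump format;
# verify them against the documentation for your kernel version.
FIELD_LABELS = {
    "s": "current state",
    "n": "glock type/number (number in hex)",
    "f": "glock flags",
    "t": "target state",
    "d": "demote state/time",
    "r": "reference count",
}
STATES = {"UN": "unlocked", "SH": "shared", "DF": "deferred", "EX": "exclusive"}


def parse_glock_line(line):
    """Split a 'G:' dump line into a {key: value} dict, keeping every key."""
    body = line.strip()
    if not body.startswith("G:"):
        raise ValueError("not a glock (G:) line")
    fields = {}
    for token in body[2:].split():
        key, sep, value = token.partition(":")
        if sep:
            fields[key] = value
    return fields


def glock_number(fields):
    """Return (type, number) with the hex glock number converted to an int."""
    gtype, _, num = fields["n"].partition("/")
    return int(gtype), int(num, 16)


def describe(line):
    """One human-readable line per field; keys without a label pass through."""
    out = []
    for key, value in parse_glock_line(line).items():
        label = FIELD_LABELS.get(key, "(not decoded)")
        if key in ("s", "t"):
            value = f"{value} ({STATES.get(value, 'unknown')})"
        out.append(f"{key}: {label} = {value}")
    return out
```

Applied to the line above, describe("G:  s:EX n:2/bf1df f:dy t:EX d:SH/0 l:0 a:0 r:4") starts with "s: current state = EX (exclusive)", and glock_number(parse_glock_line(...)) returns (2, 782815).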

Eric

