[Linux-cluster] umount hang
Daniel McNeil
daniel@osdl.org
Mon Nov 22 20:44:07 UTC 2004
I left some automated tests running over the weekend and
ran into a umount hang.
A single GFS file system was mounted on 2 nodes of a 3 node
cluster. The test had just removed 2 subdirectories - one
from each node. The test was then unmounting the file system
from one node when the umount hung.
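For reference, the test scenario boils down to roughly the following shell steps (a sketch only: the device, mount point, and directory names are placeholders for my test setup, and the commands assume a working cman/dlm/GFS cluster):

```shell
# Rough repro sketch -- assumes a GFS file system (lock_dlm locking)
# already created on a shared device; paths and names are examples.

# On both cl030 and cl031: mount the shared GFS file system.
mount -t gfs /dev/shared/stripefs /mnt/stripefs

# The test removes one subdirectory from each node:
rm -rf /mnt/stripefs/dir_a    # run on cl030
rm -rf /mnt/stripefs/dir_b    # run on cl031

# Then one node unmounts; this is the umount that hung on cl030:
umount /mnt/stripefs
```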
Here's a stack trace from the hung umount (node cl030):
umount D 00000008 0 14345 14339 (NOTLB)
db259e04 00000086 db259df4 00000008 00000001 00000000 00000008 db259dc8
eda96dc0 f15d0750 c044aac0 db259000 db259de4 c01196d1 f7cf0b90 450fa673
c170df60 00000000 00049d65 44bb3183 0002dfe0 f15d0750 f15d08b0 c170df60
Call Trace:
[<c03d39d4>] wait_for_completion+0xa4/0xe0
[<f8aba97e>] kcl_leave_service+0xfe/0x180 [cman]
[<f8b06756>] release_lockspace+0x2d6/0x2f0 [dlm]
[<f8a9010c>] release_gdlm+0x1c/0x30 [lock_dlm]
[<f8a903f4>] lm_dlm_unmount+0x24/0x50 [lock_dlm]
[<f881e496>] lm_unmount+0x46/0xac [lock_harness]
[<f8b8089f>] gfs_put_super+0x30f/0x3c0 [gfs]
[<c01654fa>] generic_shutdown_super+0x18a/0x1a0
[<c016608d>] kill_block_super+0x1d/0x40
[<c01652a1>] deactivate_super+0x81/0xa0
[<c017c6cc>] sys_umount+0x3c/0xa0
[<c017c749>] sys_oldumount+0x19/0x20
[<c010537d>] sysenter_past_esp+0x52/0x71
[root@cl030 proc]# cat /proc/cluster/services
Service          Name          GID  LID  State  Code
Fence Domain:    "default"       1    2  run    -
[3 1 2]

DLM Lock Space:  "stripefs"    222  275  run    S-13,210,1
[1 3]
Cat'ing /proc/cluster/services on the 2nd node (cl031) hangs.
[root@cl031 root]# cat /proc/cluster/services
From the 2nd node (cl031), here are some stack traces that
might be interesting:
cman_serviced D 00000008 0 3818 6 12593 665 (L-TLB)
ebc23edc 00000046 ebc23ecc 00000008 00000001 00000010 00000008 00000002
f7726dc0 00000000 00000000 f5a4b230 00000000 00000010 00000010 ebc23f24
c170df60 00000000 000005a8 d42bcdab 0002e201 eb5119f0 eb511b50 ebc23f08
Call Trace:
[<c03d409c>] rwsem_down_write_failed+0x9c/0x18e
[<f8b06acb>] .text.lock.lockspace+0x4e/0x63 [dlm]
[<f8a8daa2>] process_leave_stop+0x32/0x80 [cman]
[<f8a8dcf2>] process_one_uevent+0xc2/0x100 [cman]
[<f8a8e798>] process_membership+0xc8/0xca [cman]
[<f8a8bf65>] serviced+0x165/0x1d0 [cman]
[<c013426a>] kthread+0xba/0xc0
[<c0103325>] kernel_thread_helper+0x5/0x10
Stack trace from the hung cat /proc/cluster/services:
cat D 00000008 0 22151 1 13435 (NOTLB)
c1f7ae90 00000086 c1f7ae7c 00000008 00000002 000000d0 00000008 c1f7ae74
eb0acdc0 00000001 00000246 00000000 e20c4670 f474f1d0 00000000 c17168c0
c1715f60 00000001 00159c05 bad07454 0003aa83 e20c4670 e20c47d0 00000000
Call Trace:
[<c03d2b03>] __down+0x93/0xf0
[<c03d2c93>] __down_failed+0xb/0x14
[<f8a9053c>] .text.lock.sm_misc+0x2d/0x41 [cman]
[<f8a90144>] sm_seq_next+0x34/0x50 [cman]
[<c017e629>] seq_read+0x159/0x2b0
[<c015e49f>] vfs_read+0xaf/0x120
[<c015e74b>] sys_read+0x4b/0x80
[<c010537d>] sysenter_past_esp+0x52/0x71
The full stack traces are available here:
http://developer.osdl.org/daniel/gfs_umount_hang/
I'm running 2.6.9 and CVS code from Nov 9th.
Any ideas?
Daniel