[Linux-cluster] mount hang in kcl_join_service

I have not been able to get my tests to run for more than
1 day in the last several tries.  This time my test hung
during mount in kcl_join_service().  My test does a mount and umount
several times in each test run.  This time it hung on the
22nd test run.  It looks like it was starting a 3-node test
where a gfs file system is mounted on all 3 nodes and then
each node does a umount/mount, 1 node at a time.  So it should have
done a umount on cl031 and then hung on the mount on cl031,
with cl030 and cl032 still having the gfs file system mounted.
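
For anyone trying to reproduce this, the sequence described above can be
sketched roughly as follows.  This is only an illustration, not the actual
test harness: the mount point /mnt/gfs, the ssh transport, and the 300 second
timeout are all assumptions, and the helper names are made up.

```python
import subprocess

# Node names taken from the report; everything else here is hypothetical.
NODES = ["cl030", "cl031", "cl032"]


def rolling_remount(nodes, run, timeout=300):
    """Umount and then mount on one node at a time.

    `run(node, action, timeout)` performs the action and raises
    subprocess.TimeoutExpired if it does not finish in time.
    Returns the first (node, action) that timed out, or None.
    """
    for node in nodes:
        for action in ("umount", "mount"):
            try:
                run(node, action, timeout)
            except subprocess.TimeoutExpired:
                return (node, action)
    return None


def ssh_run(node, action, timeout, mountpoint="/mnt/gfs"):
    # Assumes passwordless ssh and an fstab entry for the mount point,
    # so that "mount /mnt/gfs" and "umount /mnt/gfs" work on their own.
    subprocess.run(["ssh", node, action, mountpoint], check=True, timeout=timeout)
```

Calling rolling_remount(NODES, ssh_run) would stop at the first step that
does not return.  Note that the timeout only ends the local ssh client; a
mount stuck in D state on the remote node stays stuck.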

The mount stack trace is:
mount         D C170EF9C     0  1557      1         12111  3932 (NOTLB)
f09b9c30 00000086 f1f4c580 c170ef9c 00002ca3 c2c39ae0 00000008 00000000
       e947e548 7b01b78b 00002ca3 f09b9c10 f1f4c580 00000000 c170f8c0 c170ef60
       00000000 00003ba8 7b01b9ea 00002ca3 c2c39ae0 c2c39c4c 00000000 00002ca3
Call Trace:
 [<c03ce814>] wait_for_completion+0xa4/0xe0
 [<f8ab6164>] kcl_join_service+0x154/0x180 [cman]
 [<f8890fff>] init_mountgroup+0x6f/0xc0 [lock_dlm]
 [<f88934b1>] lm_dlm_mount+0xa1/0xf0 [lock_dlm]
 [<f8812300>] lm_mount+0x140/0x230 [lock_harness]
 [<f9017f4d>] gfs_lm_mount+0x1fd/0x390 [gfs]
 [<f9024276>] fill_super+0x596/0x14c0 [gfs]
 [<f902533f>] gfs_get_sb+0x15f/0x1b0 [gfs]
 [<c0166ae8>] do_kern_mount+0x58/0xe0
 [<c017ce08>] do_new_mount+0x98/0xe0
 [<c017d4b5>] do_mount+0x165/0x1b0
 [<c017d8c7>] sys_mount+0x97/0x100
 [<c010323d>] sysenter_past_esp+0x52/0x75
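
When comparing traces collected from several nodes, it can help to reduce
each one to its function and module names.  A minimal sketch, assuming the
frames follow the "[<addr>] func+0xoff/0xsize [module]" layout shown above
(the name parse_frames is made up for this example):

```python
import re

# One frame: "[<addr>] func+0xoff/0xsize" with an optional "[module]" tag.
FRAME = re.compile(
    r"\[<([0-9a-f]+)>\]\s+(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?"
)


def parse_frames(trace):
    """Return (function, module) pairs in call-trace order.

    Frames without a [module] tag are reported as "kernel".
    """
    return [(m.group(2), m.group(3) or "kernel") for m in FRAME.finditer(trace)]
```

Applied to the trace above, the first two frames come out as
wait_for_completion in the core kernel and kcl_join_service in cman.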

A bunch of info is available here:

The bad news is that dumping a stack trace to a serial console
causes nodes to be kicked out of the cluster, so some of the
info shows nodes being kicked out.

Any ideas on how to figure this out?

