[Linux-cluster] umount hung single node
David Teigland
teigland at redhat.com
Wed Mar 16 02:28:17 UTC 2005
> I upgraded to the latest cvs and I hit the same problem again.
>
> umount is hung:
> root 24099 24093 0 Mar14 ? 00:00:02 umount /gfs_stripe5
>
> and dlm_astd is spinning:
> 23895 root 20 -5 0 0 0 R 99.9 0.0 1479:34 dlm_astd
>
> Any ideas? Is there any debug info that would be useful?
Try 'cat /proc/cluster/dlm_stats' to see if any of those values are
changing over the span of a few seconds; if so it'll be helpful to
see which are changing (especially the AST numbers).
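
One way to spot which values are moving is to snapshot the file twice and diff the two copies. This is only a rough sketch: stats_changed is a made-up helper name, the 5 second default is arbitrary, and it assumes nothing about the layout of dlm_stats beyond it being line-oriented text.

```shell
#!/bin/sh
# Hypothetical helper (not part of any cluster tool): print the lines of a
# stats file that differ between two snapshots taken a few seconds apart.
#   $1 = file to sample (defaults to /proc/cluster/dlm_stats)
#   $2 = seconds to wait between snapshots (defaults to 5)
stats_changed() {
    file=${1:-/proc/cluster/dlm_stats}
    wait_secs=${2:-5}
    before=$(mktemp) || return 1
    after=$(mktemp) || { rm -f "$before"; return 1; }

    cat "$file" > "$before"
    sleep "$wait_secs"
    cat "$file" > "$after"

    # Keep only the changed lines: "<" is the old value, ">" the new one.
    diff "$before" "$after" | grep '^[<>]'

    rm -f "$before" "$after"
}
```

Calling stats_changed with no arguments samples /proc/cluster/dlm_stats; no output means nothing changed during the interval.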
The other standard stuff might also help:

  echo <lockspace name> >> /proc/cluster/dlm_locks
  cat /proc/cluster/dlm_locks > dlm_locks.txt
  cat /proc/cluster/dlm_debug > dlm_debug.txt

I'm at a real loss for a good way to see what's happening, though.
The attached patch may at least tell us which loop it's stuck in.
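
With the patch applied, the two printk strings should be enough to tell the loops apart in the kernel log. A small sketch for tallying them, assuming the printk output reaches dmesg on your setup (count_stuck_msgs is just an illustrative name):

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and count the two
# messages the debug patch prints.
#   outer = "ast for stuck"      (the for (;;) loop counter, over 20000)
#   inner = "ast foreach stuck"  (the list_for_each_entry counter, over 10000)
count_stuck_msgs() {
    awk '
        /ast foreach stuck/ { inner++; next }
        /ast for stuck/     { outer++ }
        END { printf "outer=%d inner=%d\n", outer, inner }
    '
}
```

Run it as "dmesg | count_stuck_msgs". The inner message also carries the lkb id, the ast flags and the resource name, so the raw lines are worth saving as well.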
--
Dave Teigland <teigland at redhat.com>
-------------- next part --------------
Index: ast.c
===================================================================
RCS file: /cvs/cluster/cluster/dlm-kernel/src/ast.c,v
retrieving revision 1.24
diff -u -r1.24 ast.c
--- ast.c 11 Mar 2005 08:15:59 -0000 1.24
+++ ast.c 16 Mar 2005 02:21:09 -0000
@@ -199,13 +199,21 @@
 	void (*bast) (long param, int mode);
 	long astparam;
 	uint16_t flags = 0, found;
+	uint32_t debug, debug2 = 0;
 	for (;;) {
+		if (++debug2 > 20000)
+			printk("ast for stuck\n");
+		debug = 0;
 		found = FALSE;
 		down(&ast_queue_lock);
 		list_for_each_entry(lkb, &ast_queue, lkb_astqueue) {
 			rsb = lkb->lkb_resource;
 			ls = rsb->res_ls;
+			if (++debug > 10000)
+				printk("ast foreach stuck lkb %x %x rsb %s\n",
+				       lkb->lkb_id, lkb->lkb_astflags,
+				       rsb->res_name);
 			/* don't deliver ast's for locks in lockspaces
 			   being recovered */