[Linux-cluster] GFS1: node get withdrawn intermittent

Thu Feb 8 18:02:50 UTC 2007

Interesting. While testing GFS with a small journal size and resource group
size, I hit the same issue:

Feb  7 17:01:42 cfs1 kernel: GFS: fsid=cisco:gfs2.2: fatal: assertion "x <= length" failed
Feb  7 17:01:42 cfs1 kernel: GFS: fsid=cisco:gfs2.2:   function = blkalloc_internal
Feb  7 17:01:42 cfs1 kernel: GFS: fsid=cisco:gfs2.2:   file = /download/gfs/cluster.cvs-rhel4/gfs-kernel/src/gfs/rgrp.c, line = 1458
Feb  7 17:01:42 cfs1 kernel: GFS: fsid=cisco:gfs2.2:   time = 1170896502
Feb  7 17:01:42 cfs1 kernel: GFS: fsid=cisco:gfs2.2: about to withdraw from the cluster
Feb  7 17:01:42 cfs1 kernel: GFS: fsid=cisco:gfs2.2: waiting for outstanding I/O
Feb  7 17:01:42 cfs1 kernel: GFS: fsid=cisco:gfs2.2: telling LM to withdraw

This happened on a 3-node GFS cluster on a 512 MB device.

$ gfs_mkfs -t cisco:gfs2 -p lock_dlm -j 3 -J 8 -r 16 -X /dev/hda12
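As a rough sanity check on how small these resource groups are, here is a quick estimate of the RG count from the gfs_mkfs arguments. It is only a sketch under stated assumptions: -J and -r are taken to be in MiB, the device is taken as 512 MiB, the block size is the default 4096 bytes (no -b was given), and the superblock, resource index and any rounding gfs_mkfs does on its own are ignored, so the real layout may differ a little.

```python
# Rough estimate only. Assumes -J and -r are in MiB, a 512 MiB device, and the
# default 4096-byte block size (no -b was given). Ignores the superblock,
# resource index and any rounding gfs_mkfs itself performs.

def estimate_rgrps(device_mib, journals, journal_mib, rgrp_mib):
    """Approximate resource group count after carving out the journals."""
    usable_mib = device_mib - journals * journal_mib
    return usable_mib // rgrp_mib


def blocks_per_rgrp(rgrp_mib, block_size=4096):
    """Filesystem blocks covered by one resource group of rgrp_mib MiB."""
    return rgrp_mib * 1024 * 1024 // block_size


# gfs_mkfs ... -j 3 -J 8 -r 16 on a 512 MiB device
print(estimate_rgrps(512, 3, 8, 16))  # 30
print(blocks_per_rgrp(16))            # 4096
```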

I was using bonnie++ to create about 10K files of 1 KB each from each of the 3
nodes simultaneously.
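A back-of-the-envelope footprint for this workload, assuming the default 4096-byte block size and that each 1 KB file fits in a single block together with its inode ("stuffed"), so every file needs at least one block. This is a lower bound; directory and other metadata blocks are ignored.

```python
import math

# Assumptions (not measured): default 4096-byte blocks, and each 1 KiB file
# fits "stuffed" in its dinode block, so it needs at least one block.
BLOCK_SIZE = 4096
NODES = 3
FILES_PER_NODE = 10_000
FILE_SIZE = 1024
RGRP_BLOCKS = 16 * 1024 * 1024 // BLOCK_SIZE  # -r 16 -> 4096 blocks per RG

payload_mib = NODES * FILES_PER_NODE * FILE_SIZE / 2**20
per_node_blocks = FILES_PER_NODE               # lower bound: 1 block per file
total_blocks = NODES * per_node_blocks

print(payload_mib)                               # 29.296875
print(total_blocks * BLOCK_SIZE / 2**20)         # 117.1875
print(math.ceil(per_node_blocks / RGRP_BLOCKS))  # 3
```

Under those assumptions the files alone need at least about 117 MiB on disk, and each node needs at least three 16 MB RGs' worth of blocks.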

Looking at the code in rgrp.c, it seems related to a failure to find a
particular block in a resource group. Could this be due to the very small RG
size I'm using (16 MB)?
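A simplified model may make the question more concrete. It assumes (from a reading of rgrp.c that this sketch does not verify) that blkalloc_internal scans the RG's bitmap blocks for a free block, starting at a goal position and wrapping around once, so the "x <= length" assertion fires only when that scan finds nothing even though the RG was picked as having free space. This is a hypothetical Python illustration; `search_rgrp`, `alloc_block` and `free_count` are invented names, not GFS code.

```python
# Simplified, hypothetical model of a wrap-around free-block search over the
# bitmap blocks of one resource group. Not the rgrp.c implementation.
FREE, USED = 0, 1


def search_rgrp(bitmaps, start_buf, goal):
    """Return (buf, index) of the first FREE entry, starting at
    bitmaps[start_buf][goal] and wrapping around; None if nothing is free."""
    length = len(bitmaps)
    buf = start_buf
    # length + 1 passes ("x <= length"): the extra pass re-scans the part of
    # the starting bitmap block that lies before `goal`.
    for _x in range(length + 1):
        states = bitmaps[buf]
        for i in range(goal, len(states)):
            if states[i] == FREE:
                return buf, i
        buf = (buf + 1) % length  # next bitmap block, wrapping to the first
        goal = 0                  # later passes scan whole blocks
    return None


def alloc_block(bitmaps, free_count, start_buf, goal):
    """Allocate one block; free_count stands in for the RG header's counter."""
    if free_count <= 0:
        raise ValueError("resource group is full; caller should pick another")
    hit = search_rgrp(bitmaps, start_buf, goal)
    if hit is None:
        # Counter claims free space but the bitmaps disagree: in this model,
        # this is the situation the "x <= length" assertion reports.
        raise RuntimeError('assertion "x <= length" failed')
    buf, i = hit
    bitmaps[buf][i] = USED
    return buf, i


rg = [[USED, FREE, USED], [USED, USED, USED]]
print(alloc_block(rg, 1, start_buf=0, goal=2))  # (0, 1), found on the wrap-around pass
try:
    alloc_block(rg, 1, start_buf=0, goal=2)     # counter still says 1 free
except RuntimeError as err:
    print(err)                                  # assertion "x <= length" failed
```

In the second call the counter still claims a free block while the bitmaps are full, which is the kind of inconsistency the assertion would catch in this model.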

Thanks,
Sridharan

> -----Original Message-----
> From: linux-cluster-bounces at redhat.com 
> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of 
> rh-cluster at menole.net
> Sent: Thursday, February 08, 2007 3:35 AM
> To: linux-cluster at redhat.com
> Subject: [Linux-cluster] GFS1: node get withdrawn intermittent
> 
> Hi,
> 
> For some days now, one node of my 6-node GFS1 cluster has been getting
> withdrawn. Yesterday I rebooted all nodes, and now the problem has moved
> to another node.
> 
> The kernel messages are the same every time:
> 
> GFS: fsid=epsilon:amal.1: fatal: assertion "x <= length" failed
> GFS: fsid=epsilon:amal.1:   function = blkalloc_internal
> GFS: fsid=epsilon:amal.1:   file = /build/buildd/linux-modules-extra-2.6-2.6.17/debian/build/build_amd64_none_amd64_redhat-cluster/gfs/gfs/rgrp.c, line = 1458
> GFS: fsid=epsilon:amal.1:   time = 1170922910
> GFS: fsid=epsilon:amal.1: about to withdraw from the cluster
> GFS: fsid=epsilon:amal.1: waiting for outstanding I/O
> GFS: fsid=epsilon:amal.1: telling LM to withdraw
> lock_dlm: withdraw abandoned memory
> GFS: fsid=epsilon:amal.1: withdrawn
> 
> `gfs_tool df` says:
> /home:
>   SB lock proto = "lock_dlm"
>   SB lock table = "epsilon:affaire"
>   SB ondisk format = 1309
>   SB multihost format = 1401
>   Block size = 4096
>   Journals = 12
>   Resource Groups = 1166
>   Mounted lock proto = "lock_dlm"
>   Mounted lock table = "epsilon:amal"
>   Mounted host data = ""
>   Journal number = 0
>   Lock module flags =
>   Local flocks = FALSE
>   Local caching = FALSE
>   Oopses OK = FALSE
> 
>   Type           Total          Used           Free           use%
>   ------------------------------------------------------------------------
>   inodes         731726         731726         0              100%
>   metadata       329491         4392           325099         1%
>   data           75336111       4646188        70689923       6%
> 
> 
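For comparison with the 16 MB resource groups in the test above, the `gfs_tool df` figures give a rough average RG size for this filesystem. This assumes the Total column counts 4096-byte filesystem blocks, and the result is approximate, since any blocks not reported in the three rows are ignored.

```python
BLOCK_SIZE = 4096        # "Block size = 4096" from gfs_tool df
RESOURCE_GROUPS = 1166   # "Resource Groups = 1166"

# "Total" column of the df table, assumed to be counts of filesystem blocks
totals = {"inodes": 731_726, "metadata": 329_491, "data": 75_336_111}

total_blocks = sum(totals.values())
avg_rgrp_mib = total_blocks / RESOURCE_GROUPS * BLOCK_SIZE / 2**20

print(total_blocks)         # 76397328
print(round(avg_rgrp_mib))  # 256
```

That is roughly 256 MiB per RG, about 16 times the `-r 16` used in the 512 MB test.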
> System:
> 6 dual AMD Opteron nodes
> Kernel 2.6.17-2-amd64
> 32-bit userland
> Storage via QLogic Fibre Channel (qla2xxx), without serious problems
> No LVM
> 
> 
> Kind Regards,
> 
> menole
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>