[Linux-cluster] corrupted gfs filesystem

Fri Dec 9 16:20:52 UTC 2005

Nope, the fs was unmounted on both nodes.  I ran it from node01 after I
was unable to mount it and had to reboot the node because the mount
command hung the system.  Here is the latest output from gfs_fsck:

Initializing fsck
fs_compute_bitstructs:  # of blks in rgrp do not equal # of blks represented in bitmap.
        bi_start = 134230407
        bi_len   = 17
        GFS_NBBY = 4
        ri_data  = 8
Unable to fill in resource group information.
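
For reference, the numbers in that message can be checked by hand.  The
sketch below assumes the check in fs_compute_bitstructs compares
(bi_start + bi_len) * GFS_NBBY against ri_data; that form is a guess
based on the fields it prints, not something confirmed from the gfs_fsck
source.  The variable names mirror the output, and blocks_in_bitmap and
consistent are made up for illustration.

```python
# Values copied from the gfs_fsck output above.
bi_start = 134230407  # byte offset of the last bitmap segment
bi_len = 17           # length of that segment in bytes
GFS_NBBY = 4          # blocks described per bitmap byte
ri_data = 8           # data blocks the rgrp descriptor claims to hold

# ASSUMPTION: the consistency check has this form. It is inferred from
# the printed fields, not taken from the gfs_fsck source.
blocks_in_bitmap = (bi_start + bi_len) * GFS_NBBY
consistent = blocks_in_bitmap == ri_data

print("blocks represented in bitmap:", blocks_in_bitmap)
print("blocks claimed by rgrp:      ", ri_data)
print("consistent:", consistent)
```

If that assumption holds, the bitmap would describe roughly 537 million
blocks for a resource group that claims only 8, which suggests the
descriptor fields themselves are garbage and not just slightly off.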

The only thing that has changed is that I tried to mount it a second
time, again couldn't kill mount, and was forced to reboot.

-----Original Message-----
From: David Teigland [mailto:teigland at redhat.com] 
Sent: Friday, December 09, 2005 11:08 AM
To: Jeff Dinisco
Cc: linux-cluster at redhat.com
Subject: Re: [Linux-cluster] corrupted gfs filesystem

On Thu, Dec 08, 2005 at 02:01:50PM -0800, Jeff Dinisco wrote:
> I'm testing gfs 6.1 (lock dlm) in a 2 node cluster on FC4.  I took both
> nodes out of the cluster manually, then added node01 back in.  As
> expected, it fenced node02.  Fencing was done by shutting down a network
> port on a switch so iscsi could not access the storage devices.
> However, the device files still existed.
> 
> Just to see how the cluster would react, I started up ccsd, cman, and
> fenced on node02.  It joined the cluster w/ out issue.  Even though I
> knew iscsi was unable to get to the storage devices, I started the gfs
> init script which attempted to mount the filesystem.  Looks like it
> trashed it.  

But node02 couldn't reach the storage, so how could it trash it?  If node02
_could_ reach the storage, it would have just mounted the fs normally.

> Output from gfs_fsck...

When and where did you run fsck?  Not while either node had the fs
mounted, I trust.

Dave

> 
> # gfs_fsck /dev/iscsi/laxrifa01/lun0
> Initializing fsck
> Buffer #150609096 (1 of 5) is neither GFS_METATYPE_RB nor GFS_METATYPE_RG.
> Resource group is corrupted.
> Unable to read in rgrp descriptor.
> Unable to fill in resource group information.
> 
> Is this expected behavior or is it possible that I'm missing something
> in my configuration that allowed this to happen?  Thanks.