[Linux-cluster] gfs_fsck ...

Robert Peterson rpeterso at redhat.com
Fri Dec 22 15:17:17 UTC 2006


Ivan Pantovic wrote:
>
> Hi everyone,
>
> after a power outage I have this:
>
>> GFS: fsid=mail:mailbox.1: fatal: invalid metadata block
>> GFS: fsid=mail:mailbox.1:   bh = 28377265 (magic)
>> GFS: fsid=mail:mailbox.1:   function = gfs_rgrp_read
>> GFS: fsid=mail:mailbox.1:   file = 
>> /var/tmp/portage/gfs-kernel-1.03.00/work/cluster-1.03.00/gfs-kernel/src/gfs/rgrp.c, 
>> line = 830
>> GFS: fsid=mail:mailbox.1:   time = 1166797337
>> GFS: fsid=mail:mailbox.1: about to withdraw from the cluster
>> GFS: fsid=mail:mailbox.1: waiting for outstanding I/O
>> GFS: fsid=mail:mailbox.1: telling LM to withdraw
>
> and after that ...
>
>> clu3 ~ # gfs_fsck -n /dev/vg/mailbox
>> Initializing fsck
>> Block #28377265 (0x1b100b1) (4 of 5) is neither GFS_METATYPE_RB nor 
>> GFS_METATYPE_RG.
>> Resource group or index is corrupted.
>> Unable to read in rgrp descriptor.
>> number_of_rgs = 2576.
>> Block #28377265 (0x1b100b1) (4 of 5) is neither GFS_METATYPE_RB nor 
>> GFS_METATYPE_RG.
>> Block #136509413 (0x822f7e5) (2 of 5) is neither GFS_METATYPE_RB nor 
>> GFS_METATYPE_RG.
>> Block #144504564 (0x89cf6f4) (5 of 5) is neither GFS_METATYPE_RB nor 
>> GFS_METATYPE_RG.
>> Block #162788548 (0x9b3f4c4) (3 of 5) is neither GFS_METATYPE_RB nor 
>> GFS_METATYPE_RG.
>> Starting pass1
>> Block #28377265 (0x1b100b1) (4 of 5) is neither GFS_METATYPE_RB nor 
>> GFS_METATYPE_RG.
>> Resource group or index is corrupted.
>
> the partition is 600 GB ... the other one (home) was smaller and fsck
> did the trick...
>
> I had the same errors in the kernel log...
>
> this was on the home partition ...
>
>> GFS: fsid=mail:home.2: fatal: invalid metadata block
>> GFS: fsid=mail:home.2:   bh = 1122 (magic)
>> GFS: fsid=mail:home.2:   function = gfs_get_meta_buffer
>> GFS: fsid=mail:home.2:   file = 
>> /var/tmp/portage/gfs-kernel-1.03.00/work/cluster-1.03.00/gfs-kernel/src/gfs/dio.c, 
>> line = 1223
>> GFS: fsid=mail:home.2:   time = 1166792982
>> GFS: fsid=mail:home.2: about to withdraw from the cluster
>> GFS: fsid=mail:home.2: waiting for outstanding I/O
>> GFS: fsid=mail:home.2: telling LM to withdraw
>> lock_dlm: withdraw abandoned memory
>> GFS: fsid=mail:home.2: withdrawn
>
> but fsck on that partition was a success ...
>
>> clu2 gfs_fsck #  ./gfs_fsck -y /dev/vg/home
>> Initializing fsck
>> Clearing journals (this may take a while).....
>> Journals cleared.
>> Starting pass1
>> Pass1 complete
>> Starting pass1b
>> Pass1b complete
>> Starting pass1c
>> Pass1c complete
>> Starting pass2
>> Found directory entry '_5' in 1116 to something not a file or directory!
>> Directory entry '_5' cleared
>> Entries is 41 - should be 40 for 1116
>> Pass2 complete
>> Starting pass3
>> Pass3 complete
>> Starting pass4
>> Link count inconsistent for inode 1116 - 41 40
>> Link count updated for inode 1116
>> Pass4 complete
>> Starting pass5
>> ondisk and fsck bitmaps differ at block 1122
>> Succeeded.
>> RG #1 free count inconsistent: is 62941 should be 63068
>> RG #1 used inode count inconsistent: is 1578 should be 1577
>> RG #1 free meta count inconsistent: is 155 should be 29
>> Resource group counts updated
>> Converting 129 unused metadata blocks to free data blocks...
>> Converting 128 unused metadata blocks to free data blocks...
>> Converting 17 unused metadata blocks to free data blocks...
>> Converting 22 unused metadata blocks to free data blocks...
>> Converting 129 unused metadata blocks to free data blocks...
>> Converting 128 unused metadata blocks to free data blocks...
>> Converting 126 unused metadata blocks to free data blocks...
>> Converting 129 unused metadata blocks to free data blocks...
>> Converting 129 unused metadata blocks to free data blocks...
>> Converting 129 unused metadata blocks to free data blocks...
>> Pass5 complete
>> Writing changes to disk
>
> sadly, on the mailbox partition there is no hope ... fsck just barfs in pass 1.
>
> is there a way to repair or salvage data from it?
>
Hi Ivan,

Messages like:

Block #28377265 (0x1b100b1) (4 of 5) is neither GFS_METATYPE_RB nor 
GFS_METATYPE_RG.

indicate that one of the resource groups, AKA "RG" (an internal GFS
on-disk data structure, not to be confused with rgmanager's "resource
groups"), was somehow damaged, and that's not good.
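
If you're curious, you can also peek at one of the suspect blocks yourself
and see whether it even starts with the GFS metadata magic number
(0x01161970).  A rough sketch, assuming the default 4096-byte block size
(adjust bs if you built the file system with a different -b value), and
with the file system unmounted:

  # dump the first bytes of block 28377265 from the mailbox LV;
  # a healthy GFS metadata block begins with the magic 0x01161970,
  # followed by the metadata type field
  dd if=/dev/vg/mailbox bs=4096 skip=28377265 count=1 2>/dev/null \
      | hexdump -C | head -4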

The first question, to get you back into production again, is:
What version of the cluster software are you running?  The latest version
of gfs_fsck (e.g. RHEL4U4 or the RHEL4 branch of cvs) can usually repair
most damaged RGs and RG indexes.  Old versions cannot.  I think that
RHEL4U3 might have these changes as well.  I hope that updating your
nodes to the latest gfs_fsck will fix your file system and repair the damage.
If the latest version of gfs_fsck cannot repair it, you may want to run
gfs_fsck -vv for about half a minute, redirecting the output to a file, and
send the file to me or post it in a new bugzilla.  Perhaps I can figure out
why the damage isn't being repaired.  Please try the latest version first,
though.
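
For example, something along these lines (the device name comes from your
output above, the output file name is just an example, and -n keeps the
run read-only while you gather the diagnostics):

  # let it run for roughly 30 seconds, then interrupt it with ctrl-c
  gfs_fsck -n -vv /dev/vg/mailbox > /tmp/gfs_fsck_mailbox.out 2>&1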

The second question is how you got the corruption in the first place.
In theory, a power outage shouldn't cause this kind of damage unless the
underlying hardware was somehow damaged and is no longer able to read
those blocks, or unless someone was doing something "bad" on your SAN
(like running gfs_fsck while the file system was still mounted).  Barring
those hardware and procedural problems, the journals should have been
replayed the next time the file system was mounted, and that should have
kept your file system sane and happy, without any RG damage.  If you can
tell me how to corrupt a file system in this way, I'd be excited to hear how.
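
One quick way to rule out unreadable sectors is simply to try reading the
suspect blocks back from the device (block numbers taken from your fsck
output above; 4096-byte blocks assumed, and do this with the file system
unmounted).  If the hardware can't read them, dd will report an I/O error:

  for b in 28377265 136509413 144504564 162788548; do
      echo "block $b:"
      dd if=/dev/vg/mailbox bs=4096 skip=$b count=1 of=/dev/null
  done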

BTW, before we release any versions of GFS or fixes to it, we normally
put it through many days of "living hell" recovery tests we affectionately
call "revolver" (because various combinations of cluster nodes get "shot"
and GFS is forced to pick up the pieces).  So this kind of damage is
extremely rare; I've only seen it two or three times before.

Regards,

Bob Peterson
Red Hat Cluster Suite



