[Linux-cluster] Unformatting a GFS cluster disk

Bob Peterson rpeterso at redhat.com
Thu Mar 27 14:45:39 UTC 2008


Hi Damon,

On Thu, 2008-03-27 at 11:25 +0000, DRand at amnesty.org wrote:
> # gfs2_edit -p rindex /san2/sda.backup |more
> -----------
> RG #1
>  ri_addr               0                   0x0
>  ri_length             0                   0x0
>  ri_data0              0                   0x0
>  ri_data               0                   0x0
>  ri_bitbytes           65827               0x10123

First of all, Wendy is right.  If you can use hardware snaps
to restore the data, you'll be much better off.  If that's not
possible for some reason, keep reading.

The version of gfs2_edit you're using has a bug where it doesn't
decode the gfs1 rg structures properly.  I've got a patch to
gfs2_edit that makes this work, but it hasn't been committed to
our git tree yet; I just haven't gotten to it.  If you want, I can
email it to you or commit it to the head branch (a rhel5 branch
requires a bugzilla record and paperwork), but the head branch is
designed to compile against an upstream kernel, not the RHEL5
kernel.  The patch isn't critical for what you're doing, but
without it, -p rindex and -p rgs won't work properly for gfs1.

> updated:
> Block #65827    (0x10123)     of 39321600 (0x2580000)  (rsrc grp hdr)
> (p.1 of 6)
> 10123000 01161970 00000002 00000000 00000000 [...p............]
> 10123010 000000C8 00000000 00000000 0000FFF8 [................]
> 10123020 0000FFFF 00000000 00000000 00000000 [................]
> 10123030 00000000 00000000 00000000 00000000 [................]
> 10123040 00000000 00000000 00000000 00000000 [................]
> 10123050 00000000 00000000 00000000 00000000 [................]
> 10123060 00000000 00000000 00000000 00000000 [................]
> 10123070 00000000 00000000 00000000 00000000 [................]
> 10123080 FFFFFFFF FFFFFFFF 00000000 00000000 [................]
> 10123090 00000000 00000000 00000000 00000000 [................]
> 101230A0 00000000 00000000 00000000 00000000 [................] 

You're close, but I would set ALL of the bitmap to 0xff, whereas
you've only set the first 8 bytes.  Each byte represents 4 blocks
on disk, so my guess is that there weren't any disk inodes in the
32 blocks you made "appear".  Since no disk inodes were found among
them, gfs_fsck decided they were incorrectly marked "in use" and
changed them back for you.  Had it found a disk inode to associate
them with, gfs_fsck would have tried to recover them.

All "data" blocks will eventually be marked 01, so a group of data
blocks will appear as "55555555".  Metadata such as disk inodes
will be marked 11, so a group of metadata will appear as "ffffffff".
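
To make that encoding concrete, here's a rough sketch (my own
illustration, not code from the gfs tools) of how one bitmap byte
packs four blocks at two bits each; the state names and the
low-bits-first ordering within the byte are assumptions:

#include <stdio.h>

/* Decode one GFS-style bitmap byte: four blocks, two bits each. */
static const char *state_name(unsigned bits)
{
    switch (bits) {
    case 0: return "free";
    case 1: return "data, in use";       /* 01 */
    case 2: return "free metadata";      /* assumption */
    case 3: return "metadata, in use";   /* 11 */
    }
    return "?";
}

int main(void)
{
    unsigned char examples[] = { 0x55, 0xff };  /* 01010101, 11111111 */

    for (int i = 0; i < 2; i++) {
        printf("byte 0x%02x:\n", examples[i]);
        for (int blk = 0; blk < 4; blk++) {   /* low-order bits first */
            unsigned bits = (examples[i] >> (blk * 2)) & 3;
            printf("  block %d: %s\n", blk, state_name(bits));
        }
    }
    return 0;
}

So four data blocks in a row give 01 01 01 01 = 0x55, and four
metadata blocks in a row give 11 11 11 11 = 0xff.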

By setting the whole bitmap to "ff" you're essentially telling
gfs_fsck that every block is metadata, so it will scan every block
looking for disk inodes and directory entries.  When it determines
that a block is not really metadata, it will later fix the bitmap
so the block is marked as data again.  But before it does that, it
should find any disk inodes among them and gather information about
which blocks belong to each disk inode.  In that way, gfs_fsck
should restore the files.  Since you'll never know where those disk
inodes really are, you just want to tell gfs_fsck to check them all.
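
If you'd rather do that edit outside of gfs2_edit, it boils down to
overwriting the bitmap bytes on the device with 0xff.  Here's a
rough sketch; the offset and length are placeholders you'd have to
compute from your own rindex (ri_addr, ri_bitbytes) and block size,
and it ignores the fact that a real rg bitmap is split across
ri_length blocks, each behind its own header.  Only ever do this to
a copy of the device:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Fill 'nbytes' of bitmap at byte offset 'off' with 0xff. */
static int fill_bitmap(const char *dev, off_t off, size_t nbytes)
{
    int fd = open(dev, O_RDWR);
    if (fd < 0) { perror("open"); return -1; }

    unsigned char buf[4096];
    memset(buf, 0xff, sizeof(buf));

    while (nbytes > 0) {
        size_t chunk = nbytes < sizeof(buf) ? nbytes : sizeof(buf);
        ssize_t n = pwrite(fd, buf, chunk, off);
        if (n <= 0) { perror("pwrite"); close(fd); return -1; }
        off += n;
        nbytes -= (size_t)n;
    }
    return close(fd);
}

int main(void)
{
    /* Placeholders only; compute the real offset and length from
     * your own rindex before running anything like this. */
    return fill_bitmap("/san2/sda.backup", 0 /* off */, 0 /* nbytes */);
}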

If your big tar/zip file was in the root directory of the file
system, you'll just have to do it that way.  If the big tar/zip
file was in a directory, you might be able to use the "f" key to
page through the blocks until you find a directory, then page down
through the directory to see if your big file is there.  That might
help you locate the disk inode, but depending on how big your file
system is, it could take a very long time, and then you'd still
have to mark that particular block (for the disk inode) as "in use"
in the bitmap.  You'd have to do the math to figure out which byte
of which bitmap corresponds to that block (a sketch of that math
follows below).  It's easier to just let fsck try them all.
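
For what it's worth, the math looks roughly like this; a sketch
that assumes 4 blocks per bitmap byte with the first block in the
low-order bits, and that ignores the split of the bitmap across the
rg's header blocks:

#include <stdint.h>
#include <stdio.h>

/* Locate the two bitmap bits for 'block'.  'ri_data0' is the first
 * data block of the resource group containing it (from the rindex). */
static void locate_bits(uint64_t block, uint64_t ri_data0)
{
    uint64_t rel   = block - ri_data0;  /* block # within this rg */
    uint64_t byte  = rel / 4;           /* 4 blocks per bitmap byte */
    unsigned shift = (rel % 4) * 2;     /* 2 bits per block */

    printf("block %llu: bitmap byte %llu, bits %u-%u\n",
           (unsigned long long)block, (unsigned long long)byte,
           shift, shift + 1);
    /* Marking it "metadata, in use" (11) means OR-ing 0x3 << shift
     * into that byte. */
}

int main(void)
{
    locate_bits(70000, 65828);  /* hypothetical numbers */
    return 0;
}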
 
Regards,

Bob Peterson
Red Hat GFS




