[Linux-cluster] GFS Problem: invalid metadata block

Robert Peterson rpeterso at redhat.com
Tue Oct 10 20:09:45 UTC 2006


Matt Eagleson wrote:
> Hello,
>
> I have been evaluating a GFS cluster as an NFS solution and have 
> unfortunately run in to a serious problem which I cannot explain.  
> Both of the GFS filesystems I am exporting became corrupt and unusable.
>
> The system is Redhat AS4 with 2.6.9-42.0.2.ELsmp.  I cannot find 
> anything unusual on the host or the SAN at the time of this error.  
> Nobody was logged in to the nodes.
>
> Can anyone help me understand what is happening here?
>
> Here are the logs:

Hi Matt,

These errors indicate file system corruption on your SAN.  The "bh =" is the
block number where the error was detected.  Two of the errors were found in
GFS resource group data ("RG"), which are areas on disk that track which
blocks on the SAN are allocated and which aren't.  (Not to be confused with
the Resource Groups in rgmanager, which are something completely different.)
The third error is in a block usually reserved for the quota file inode.

Corruption in the RG information is extremely rare, and may indicate a
hardware problem with your SAN.  The fact that both nodes detected problems
in different areas is an indication that the problem might be in the SAN
itself rather than in the motherboards, fibre channel cards, or memory of
the nodes, although that's still not guaranteed.  Many things can cause
data corruption.

I recommend you:

1. Verify the hardware is working properly in all respects.  One way to do
    this is to make a backup of the raw data to another device and verify
    the copy against the original without GFS or any of the cluster software
    in the mix.  For example, unmount the file system from all nodes in the
    cluster, then do something like
    "dd if=/dev/my_vg/lvol0 of=/mnt/backup/sanbackup" followed by
    "diff /dev/my_vg/lvol0 /mnt/backup/sanbackup"  (assuming, of course, that
    /dev/my_vg/lvol0 is the logical volume your GFS partition lives on, and
    /mnt/backup/ is some scratch area big enough to hold that much data).
    The idea here is simply to test that reading from the SAN gives you
    the same data twice.  If that works successfully on one node, try it on
    the other node.  (A sketch of this sequence follows the list below.)
2. Once you verify the hardware is working properly, run gfs_fsck on it.
    The latest version of gfs_fsck can repair most GFS RG corruption.
3. If gfs_fsck repairs the file system successfully, back it up.
4. You may want to do a similar test, only writing data to the SAN, then
    reading it back and verifying the results.  Obviously this will destroy
    the data on your SAN unless you are careful, so if this is a production
    machine, please take measures to protect the data before trying anything
    like this.  (See the write-test sketch further below.)
5. If you can read and write to the SAN reliably from both nodes without
    GFS, then try using it again with GFS and see if the problem comes back.
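
As a rough illustration of steps 1 and 2 above, here is a minimal sketch of
the copy-and-compare sequence, assuming /dev/my_vg/lvol0 is your GFS logical
volume, /mnt/gfs is its mount point, and /mnt/backup is scratch space on a
different device big enough to hold the copy (all of those names are
placeholders for whatever you actually use):

    # On one node, with the GFS file system unmounted on ALL nodes:
    umount /mnt/gfs                    # repeat on every node

    # Copy the raw device to scratch space (this reads the SAN once).
    dd if=/dev/my_vg/lvol0 of=/mnt/backup/sanbackup bs=1M

    # Read the SAN a second time and compare it with the copy.
    # cmp handles binary data a little more gracefully than diff here.
    cmp /dev/my_vg/lvol0 /mnt/backup/sanbackup && echo "reads match"

    # Repeat the same copy-and-compare from the other node.  Once the
    # hardware checks out, run the file system checker (still unmounted):
    gfs_fsck /dev/my_vg/lvol0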

Perhaps someone else (the SAN manufacturer?) can recommend hardware
tests you can run to verify the data integrity.
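
If you go on to the destructive write test in step 4, something along these
lines would do it.  Again, the device and paths are just examples, and the
commands below overwrite the GFS file system, so only run them after the
data has been backed up or written off:

    # DESTRUCTIVE: this overwrites the start of the GFS device.
    # /mnt/backup must NOT live on the device under test.

    # Write a known 1 GB pattern to the SAN from one node...
    dd if=/dev/urandom of=/mnt/backup/pattern bs=1M count=1024
    dd if=/mnt/backup/pattern of=/dev/my_vg/lvol0 bs=1M

    # ...then read it back and compare, on this node and again on the other.
    dd if=/dev/my_vg/lvol0 of=/mnt/backup/readback bs=1M count=1024
    cmp /mnt/backup/pattern /mnt/backup/readback && echo "write/read OK"

    # This only exercises the first 1 GB; increase the count or repeat at
    # different offsets (dd seek=/skip=) to cover more of the device.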

I realize these kinds of tests take a long time to do, but if it's a
hardware problem, you really need to know.  There's an outside chance the
problem is somewhere in the GFS core, but I've personally only seen this
type of corruption once or twice, so I think it's unlikely.  If you can
recreate this kind of corruption with some kind of test, please let us
know how.

Regards,

Bob Peterson
Red Hat Cluster Suite



