[Linux-cluster] Re: gfs_fsck problems (not doing gfs_get_meta_buffer)

Stephen Willey stephen.willey at framestore-cfc.com
Mon May 15 10:46:22 UTC 2006


Having looked into this a bit, it appears that gfs_fsck doesn't like
large drives.

It works fine on a 137GB drive but fails immediately, with the symptoms
below, on a 10TB RAID.
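
(For reference, the exact sizes can be read in bytes with the standard
util-linux/LVM2 tools - nothing GFS-specific here, and the device names
are the ones from the quoted commands below:)

# underlying physical devices
blockdev --getsize64 /dev/sda
blockdev --getsize64 /dev/sdb
# the logical volume gfs_fsck is run against
lvdisplay /dev/gfs_vg/gfs_lv | grep "LV Size"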

Is it still the case that GFS is not scalable to very large filesystems?
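
If it really is just a size threshold, it might be reproducible without
tying up the RAID by using a sparse loop-backed file. This is only a
sketch - I haven't verified that loop devices this large work on a 2.6.9
kernel, the file name is arbitrary, and lock_nolock is used purely so
the throwaway filesystem doesn't need the cluster infrastructure:

# create a sparse 10TB backing file (the host fs must support files this big)
dd if=/dev/zero of=/tmp/gfs_big.img bs=1M count=0 seek=10485760
losetup /dev/loop0 /tmp/gfs_big.img

# one journal is enough for an fsck-only test
gfs_mkfs -p lock_nolock -j 1 /dev/loop0
gfs_fsck -nvv /dev/loop0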

Stephen



Stephen Willey wrote:
> gfs_fsck seems to break my filesystem!
> 
> Here's the sequence of events (everything acts as expected unless I
> state otherwise):
> 
> pvcreate /dev/sda; pvcreate /dev/sdb
> vgcreate gfs_vg /dev/sda /dev/sdb
> vgdisplay
> lvcreate -l 4171379 gfs_vg -n gfs_lv (the extent count taken from
> vgdisplay)
> vgchange -aly
> gfs_mkfs -p lock_dlm -t mycluster:gfs1 -j 8 /dev/gfs_vg/gfs_lv
> 
> mount -t gfs /dev/gfs_vg/gfs_lv /mnt/disk2
> df -h /mnt/disk2
> cd /mnt/disk2
> touch 1 2 3 4 5 6 7 8 9 10
> ls -lh
> 
> cd ..
> umount /mnt/disk2
> gfs_fsck -nvv /dev/gfs_vg/gfs_lv (output below - note that -n makes
> this a read-only check)
> 
> Initializing fsck
> Initializing lists...
> Initializing special inodes...
> (file.c:45)     readi:  Offset (640) is >= the file size (640).
> (super.c:208)   8 journals found.
> ATTENTION -- not doing gfs_get_meta_buffer...
> 
> mount -t gfs /dev/gfs_vg/gfs_lv /mnt/disk2
> cd /mnt/disk2 (successful)
> ls -lh (successful)
> 
> cd ..
> umount /mnt/disk2
> gfs_fsck -vv /dev/gfs_vg/gfs_lv (output below)
> 
> Initializing fsck
> Initializing lists...
> (bio.c:140)     Writing to 65536 - 16 4096
> Initializing special inodes...
> (file.c:45)     readi:  Offset (640) is >= the file size (640).
> (super.c:208)   8 journals found.
> ATTENTION -- not doing gfs_get_meta_buffer...
> 
> mount -t gfs /dev/gfs_vg/gfs_lv /mnt/disk2 (output below)
> 
> mount: No such file or directory
> 
> The syslog shows:
> 
> Lock_Harness 2.6.9-34.R5.2 (built May 11 2006 14:15:58) installed
> May 11 15:12:43 gfstest1 kernel: GFS 2.6.9-34.R5.2 (built May 11 2006
> 14:16:10) installed
> May 11 15:12:43 gfstest1 kernel: GFS: Trying to join cluster "fsck_dlm",
> "mycluster:gfs1"
> May 11 15:12:43 gfstest1 kernel: lock_harness:  can't find protocol fsck_dlm
> May 11 15:12:43 gfstest1 kernel: GFS: can't mount proto = fsck_dlm,
> table = mycluster:gfs1, hostdata =
> May 11 15:12:43 gfstest1 mount: mount: No such file or directory
> May 11 15:12:43 gfstest1 gfs: Mounting GFS filesystems:  failed
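
The "can't find protocol fsck_dlm" line suggests gfs_fsck rewrote the
lock protocol field in the superblock and never put it back. The current
value can be read back with gfs_tool by giving the field name without a
new value (assuming the same syntax as the write form used further
down):

gfs_tool sb /dev/gfs_vg/gfs_lv proto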
> 
> If I use the following to change the lock protocol back to lock_dlm,
> I can mount it again:
> 
> gfs_tool sb /dev/gfs_vg/gfs_lv proto lock_dlm
> 
> but shortly afterwards I sometimes get I/O errors on the device that
> stop me from cd'ing into the mount point or running ls or df on it.
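
When that happens it would be worth checking whether the kernel is
reporting block-layer errors at the same time, to separate a device
problem from filesystem corruption - e.g.:

dmesg | tail -n 50
grep -i "I/O error" /var/log/messages | tail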
> 
> fsck isn't supposed to break clean filesystems, so does anyone have
> any ideas?
> 
> FYI - none of the other machines in the cluster had the filesystem
> mounted at any point during this exercise.
> 
> Stephen
> 



