[Linux-cluster] GFS volume hangs on 3 nodes after gfs_grow
Bob Peterson
rpeterso at redhat.com
Thu Sep 25 22:49:04 UTC 2008
----- "Alan A" <alan.zg at gmail.com> wrote:
| Hello Bob, and thanks for the reply.
|
| gfs-utils-0.1.17-1.el5
(snip)
| Let me know if you need any additional information. What would be
| suggested path to recovery. I tried gfs_fsck but I get:
| Initializing fsck
| Unable to open device: /lvm_test2
Hi Alan,
The good news is that gfs-utils-0.1.17-1.el5 has the latest fixes for
the gfs_grow program (as of the time of this post), so in theory
it should not have caused any corruption.
The gfs_fsck fix for repairing that kind of corruption does not appear
until gfs-utils-0.1.18-1.el5, but since gfs_grow should not have
corrupted the file system, you shouldn't need the repair code.
Are you sure you're specifying the right device? I would expect
something more like /dev/test2_vg/lvm_test2 rather than just /lvm_test2,
which doesn't look like a valid device path.
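As a quick sanity check before running gfs_fsck again, something like the
following Python sketch can tell you whether a path exists and is a block
device at all. The function name describe_device is made up for this
example, and /dev/test2_vg/lvm_test2 is only my guess at your volume path,
so substitute your real one.

```python
import os
import stat


def describe_device(path):
    """Classify a path: missing, unreadable, block device, or something else.

    Hypothetical helper for illustration; not part of gfs-utils.
    """
    try:
        # os.stat follows symlinks, so an LVM path that links to a
        # device-mapper node is classified by its target.
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        return "missing"
    except PermissionError:
        return "permission denied"
    if stat.S_ISBLK(mode):
        return "block device"
    return "not a block device"


if __name__ == "__main__":
    # Both paths are examples taken from this thread; the second is a guess.
    for candidate in ("/lvm_test2", "/dev/test2_vg/lvm_test2"):
        print(candidate, "->", describe_device(candidate))
```

If the path you passed to gfs_fsck comes back as "missing" or "not a block
device", the problem is most likely just the device name.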
If you did specify the correct device, and I'm just not understanding,
it looks to me like gfs_fsck can't open the SCSI device. That likely
means the device is locked up, perhaps by SCSI fencing or something,
which is why I suggest looking in "dmesg" for kernel problems.
If you got a message in dmesg that looks something like this:
lock_dlm: yourfs: gdlm_lock 2,17 err=-16 cur=3 req=5 lkf=3044 flags=80
then you might be a victim of this bug:
https://bugzilla.redhat.com/show_bug.cgi?id=438268
(I apologise in advance if you can't view this record; that's out
of my control). There is a fix for that bug in the kernel's dlm code
that's scheduled to go into 5.3, but so far it has only been released
as source code.
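If you have a lot of dmesg output to wade through, a small Python sketch
like this one can pick out messages of that shape. The regular expression
is modelled only on the single sample line above, so treat the field
layout as an assumption; the function name find_gdlm_errors is made up
for this example. On Linux, err=-16 corresponds to EBUSY.

```python
import errno
import re

# Pattern modelled on the one sample message in this post (an assumption:
# other kernels may format the line differently).
GDLM_RE = re.compile(
    r"lock_dlm: (?P<fs>\S+): gdlm_lock (?P<lock>[0-9a-fA-F]+,[0-9a-fA-F]+) "
    r"err=(?P<err>-?\d+) cur=(?P<cur>\d+) req=(?P<req>\d+) "
    r"lkf=(?P<lkf>[0-9a-fA-F]+) flags=(?P<flags>[0-9a-fA-F]+)"
)


def find_gdlm_errors(text):
    """Return one dict per gdlm_lock error line found in the given text."""
    hits = []
    for match in GDLM_RE.finditer(text):
        err = int(match.group("err"))
        hits.append({
            "fs": match.group("fs"),
            "lock": match.group("lock"),
            "err": err,
            # Kernel errors are negative errno values; look up the name.
            "errname": errno.errorcode.get(-err, "unknown"),
        })
    return hits


if __name__ == "__main__":
    sample = ("lock_dlm: yourfs: gdlm_lock 2,17 err=-16 cur=3 req=5 "
              "lkf=3044 flags=80")
    for hit in find_gdlm_errors(sample):
        print(hit)
```

To use it on a real node, save the output of dmesg to a file and pass
the file's contents to find_gdlm_errors.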
I don't know much about SCSI fencing or SCSI reservations, but
maybe rebooting the whole cluster will free things up.
Regards,
Bob Peterson
Red Hat Clustering & GFS