[Linux-cluster] GFS volume hangs on 3 nodes after gfs_grow

Fri Sep 26 14:24:36 UTC 2008

Hi Bob. Thanks for the reply once again.

Yes, lvm_test2 is the mount point, but in my explanation of the problem I
used /lvm2 for clarity. I hadn't rebooted the node yet, and now that I
try, the reboot hangs. I will have to bring a crash cart to the data center
and check it out sometime today. That is why I haven't been able to check
dmesg for potential DLM problems.

I am rebooting the whole cluster (not the desired outcome, but I have to bring
it back to a stable state), and will post the dmesg output afterwards.
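
Once the nodes are back up, one quick way to look for the lock_dlm error
pattern Bob quotes below (gdlm_lock ... err=-16; on Linux, errno 16 is
EBUSY) is to filter the dmesg output. Below is a minimal sketch of a
hypothetical helper script (not part of gfs-utils). It assumes Python 3 is
available somewhere you can pipe the log to, and its regex is modeled only
on that single example line, so other lock_dlm messages may not match.

```python
import re
import sys

# Hypothetical pattern, modeled only on the example line quoted in this
# thread; other lock_dlm message formats may not match.
GDLM_ERR = re.compile(
    r"lock_dlm: (?P<fs>[^:\s]+): gdlm_lock (?P<type>\d+),(?P<num>\d+) err=(?P<err>-?\d+)"
)


def find_gdlm_errors(lines, err=-16):
    """Return (fs_name, lock_type, lock_number) for each gdlm_lock line with this err."""
    hits = []
    for line in lines:
        m = GDLM_ERR.search(line)
        if m and int(m.group("err")) == err:
            hits.append((m.group("fs"), int(m.group("type")), int(m.group("num"))))
    return hits


if __name__ == "__main__":
    # Expects kernel log text piped on stdin; does nothing when run interactively.
    if not sys.stdin.isatty():
        for fs, ltype, lnum in find_gdlm_errors(sys.stdin):
            print(f"{fs}: gdlm_lock {ltype},{lnum} failed with err=-16 (EBUSY)")
```

Usage would be something like `dmesg | python3 scan_gdlm.py` (the script
name is made up). Any hits only suggest the bug Bob links; they don't
confirm it.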

On Thu, Sep 25, 2008 at 5:49 PM, Bob Peterson <rpeterso at redhat.com> wrote:

> ----- "Alan A" <alan.zg at gmail.com> wrote:
> | Hello Bob, and thanks for the reply.
> |
> | gfs-utils-0.1.17-1.el5
> (snip)
> | Let me know if you need any additional information. What would be the
> | suggested path to recovery? I tried gfs_fsck but I get:
> | Initializing fsck
> | Unable to open device: /lvm_test2
>
> Hi Alan,
>
> The good news is that gfs-utils-0.1.17-1.el5 has the latest fixes for
> the gfs_grow program (as of the time of this post), so in theory
> it should not have caused any corruption.
>
> The fix to gfs_fsck for repairing that kind of corruption does not appear
> until gfs-utils-0.1.18-1.el5, but in theory, the gfs_grow should not
> corrupt the file system, so you shouldn't need the repair code.
>
> Are you sure you're specifying the right device?  I would expect
> something more like /dev/test2_vg/lvm_test2 rather than just /lvm_test2.
> So that doesn't look like you specified a valid device.
>
> If you did specify the correct device, and I'm just not understanding,
> it looks to me like gfs_fsck can't open the SCSI device.  That likely
> means the device is locked up, perhaps by SCSI fencing or something,
> which is why I suggest looking in "dmesg" for kernel problems.
>
> If you got a message in dmesg that looks something like this:
>
> lock_dlm: yourfs: gdlm_lock 2,17 err=-16 cur=3 req=5 lkf=3044 flags=80
>
> then you might be a victim of this bug:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=438268
>
> (I apologise in advance if you can't view this record; that's out
> of my control).  There is a fix for that bug in the kernel's dlm code
> that's scheduled to go into 5.3, but it has only been released in
> the source code so far.
>
> I don't know much about SCSI fencing or SCSI reservations, but
> maybe booting the whole cluster will free things up.
>
> Regards,
>
> Bob Peterson
> Red Hat Clustering & GFS
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>

-- 
Alan A.