
RE: [Linux-cluster] gfs_grow




> -----Original Message-----
> From: linux-cluster-bounces redhat com [mailto:linux-cluster-bounces redhat com] On Behalf Of Bob Peterson
> Sent: 28 August 2007 15:31
> To: linux clustering
> Subject: Re: [Linux-cluster] gfs_grow
> 
> On Tue, 2007-08-28 at 10:08 +0100, Ben Yarwood wrote:
> > I am using a 3 Node cluster using RHEL4U4.
> >
> > I ran a gfs_grow yesterday on one of our file systems but stupidly missed a process that was using
> > the same file system.  The grow process hung, and when I got it to exit, the file system was
> > reporting the larger size but no extra space had appeared.  Basically my file system grew from
> > 14TB to 15TB and my usage also grew from 13TB to 14TB.
> >
> > Does anyone know if it's possible to get this space back?  I know I could probably do a gfs_fsck,
> > but given the size of the file system, this would take a few days according to some previous
> > reports.
> >
> > Thanks
> > Ben
> 
> Hi Ben,
> 
> The fact that there was a process using the file system shouldn't have
> been a problem and gfs_grow should have been able to work around it.
> It would have been interesting to see where gfs_grow was "hung" but it's
> too late for that now.  My guess is that you killed gfs_grow before it
> was able to update the resource group index properly.
> 
> In RHEL4U4 there is a feature in gfs_fsck to change and repair damaged
> RGs and RG indexes.  Things get tricky for the code once the file system
> has been extended though, so although you probably don't want to hear
> this, you should probably make a backup of your data first, just to be
> safe.
> 
> Running gfs_fsck will take a while on a file system that big, but it
> depends on the speed of your hardware.  I'd expect it to take less than
> a day to complete.  If you can't afford the down time, it might be
> helpful to know that the RG repair is done before any of the passes, so
> in theory you could probably try to use it to repair the RGs and then
> kill the gfs_fsck.  Newer versions of gfs_fsck will catch <ctrl-c>
> interrupts and give you options to skip around parts, but I don't think
> that's in RHEL4U4 (I think it got into RHEL4.5).
> 
> So I guess my recommendation is:
> 
> 1. Make a backup of your data
> 2. Wait until most people have gone home for the day
> 3. Unmount the file system from ALL nodes.
> 4. Run gfs_fsck.
> 5. Watch the gfs_fsck output for messages about finding and fixing
>    RG damage just so you know it did something.
> 6. Let gfs_fsck run overnight.
> 7. If you need the file system back and it's still running by morning,
>    you could kill it manually.  It would be better to let it run, but
>    it shouldn't do any harm to kill it prematurely if necessary.
> 8. Remount the file system and see if df shows the correct values.
> 
> I hope this helps.
> 
> Regards,
> 
> Bob Peterson
> Red Hat Cluster Suite
> 
> 
> --
> Linux-cluster mailing list
> Linux-cluster redhat com
> https://www.redhat.com/mailman/listinfo/linux-cluster
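
Editor's sketch, not part of the original message: step 8 above says to remount and check whether df shows the correct values. One rough way to make that check concrete is to compare the size and usage figures from before and after the grow. The snippet below is only an illustration in Python; the names Usage, missing_space and current_usage are hypothetical, and TB is assumed to be a decimal terabyte because the thread does not say which unit df was reporting.

```python
import shutil
from typing import NamedTuple

TB = 10 ** 12  # assumption: decimal terabytes; the thread does not specify the unit


class Usage(NamedTuple):
    """Total and used bytes for a file system, as df would report them."""
    total: int
    used: int

    @property
    def free(self) -> int:
        return self.total - self.used


def missing_space(before: Usage, after: Usage) -> int:
    """Bytes of newly added capacity that did not show up as free space."""
    size_growth = after.total - before.total
    free_growth = after.free - before.free
    return size_growth - free_growth


def current_usage(mount_point: str) -> Usage:
    """Read the live figures for a mounted file system."""
    du = shutil.disk_usage(mount_point)
    return Usage(total=du.total, used=du.used)


# Figures reported in the thread: size 14TB -> 15TB, usage 13TB -> 14TB.
before = Usage(total=14 * TB, used=13 * TB)
after = Usage(total=15 * TB, used=14 * TB)
print(missing_space(before, after) // TB)
```

With the figures Ben reported, the whole added terabyte counts as missing. After a successful repair, calling current_usage on the remounted file system and comparing it with the pre-grow figures should give a value near zero, allowing for any data written in the meantime.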




