
Re: [Linux-cluster] GFS Tunables



Wendy,

We have searched high and low for an alternative to file-to-file backups, especially block-level backups.  The only product we've found that "supports" GFS is BakBone Replicator.  I made my first attempt at installing it late last week, and the experience was worrisome.  The Replicator service inserts a kernel module, which by itself is livable; but in our particular case, we found that while this module is loaded, the error codes the kernel returns for things like nonexistent files change.  If the kernel module turns out to be the root cause of that behavior (we're still investigating), it's unworkable.

As for LVM snapshotting ... I am under the impression that those features are unavailable in GFS (and are slated for GFS2, which is not "production ready" yet?).  It has certainly occurred to me to try that feature, if only it were available.  Am I misinformed?  Perhaps I need some more education on how exactly LVM mirroring would help me.  I am *attempting* to approximate a traditional backup scheme, at least on this particular filesystem.  Am I correct in believing that I could snapshot a volume (assuming the feature is available) and run a traditional backup (using, say, rdiff-backup) in less time than it takes now, when I run it straight off a live GFS volume?

--
Brandon

On Thu, Oct 16, 2008 at 10:50 AM, Wendy Cheng <s wendy cheng gmail com> wrote:
Brandon Young wrote:
Hi all,

I currently have a GFS deployment consisting of eight servers and several GFS volumes.  One of my GFS servers is a dedicated backup server with a second replica SAN attached to it through a second HBA.  My approach to backups has been to use tools such as rsync and rdiff-backup, run nightly.  I am having a particular problem with one or two of my filesystems taking a *very* long time to back up.  For example, I have /home living on GFS.  Day-to-day performance is acceptable, but backups are hideously slow.  Every night, I kick off an rdiff-backup of /home from my backup server, which dumps the backup onto an XFS filesystem on the replica SAN.  This backup can take days in some cases.

It's not only GFS: getdents() has been more than annoying on many
filesystems when the number of entries in a directory is high - but, yes,
GFS is particularly bloody slow at directory reads. Red Hat's POSIX and
LIBC folks have contributed efforts toward new standardized light-weight
directory operations. Unfortunately I lost track of their progress ...
On the other hand, integrating these new calls into GFS would take time
anyway (even if they were available), so it is unlikely to meet your
needs. There were also a few experimental GFS patches, but none of them
made it into the production code.

Unless other GFS folks can give you more ideas, I think your best bet at
this moment is to think "outside" the box. That is, don't do
file-to-file backups if at all possible. Check out other block-level backup
strategies. Are Linux LVM mirroring and/or snapshots workable for you?
Does your SAN vendor provide embedded features (e.g. a NetApp SAN box
offers Snapshot, SnapMirror, SyncMirror, etc.)?

-- Wendy


We have done some investigating and found that getdents(2) calls (which return the list of filenames present in a directory) appear to be spectacularly slow on GFS, irrespective of the size of the directory in question.  In particular, with 'strace -r', I'm seeing a rate below 100 filenames per second.  The filesystem /home has at least 10 million files in it, which, doing the math, means 29.5 hours just for the getdents calls needed to scan them, which is more than a third of wall-clock time.  And that's before we even start stat'ing.
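The same rate can also be measured without strace. The following is a minimal sketch, assuming Linux and a libc that exposes SYS_getdents64 through syscall(2); count_entries() and scan_hours() are hypothetical helper names, and the 32 KiB per-call buffer is an arbitrary choice, not a GFS requirement. count_entries() lists one directory with raw getdents64 calls, counting names and syscalls and timing the loop; scan_hours() redoes the back-of-envelope estimate (at exactly 100 names/s, 10 million files take about 27.8 hours, so the 29.5-hour figure corresponds to a measured rate of roughly 94 names/s).

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Record layout returned by getdents64(2), as described in the man page.
   Named differently from any libc type to avoid clashes. */
struct linux_dirent64_rec {
    uint64_t d_ino;
    int64_t d_off;
    unsigned short d_reclen;
    unsigned char d_type;
    char d_name[];
};

/* Hypothetical helper: lists one directory with raw getdents64 calls.
   Returns the number of names (excluding "." and ".."), stores the number
   of syscalls made in *calls and the elapsed time in *seconds.
   Returns -1 if the directory cannot be opened or read. */
long count_entries(const char *path, long *calls, double *seconds)
{
    uint64_t storage[4096];          /* 32 KiB, 8-byte aligned buffer */
    char *buf = (char *)storage;
    struct timespec t0, t1;
    long entries = 0;

    int fd = open(path, O_RDONLY | O_DIRECTORY);
    if (fd < 0)
        return -1;

    *calls = 0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (;;) {
        long n = syscall(SYS_getdents64, fd, buf, sizeof storage);
        if (n < 0) {                 /* read error */
            close(fd);
            return -1;
        }
        (*calls)++;
        if (n == 0)                  /* end of directory */
            break;
        for (long off = 0; off < n;) {
            struct linux_dirent64_rec *d =
                (struct linux_dirent64_rec *)(buf + off);
            if (strcmp(d->d_name, ".") != 0 && strcmp(d->d_name, "..") != 0)
                entries++;
            off += d->d_reclen;      /* advance to the next record */
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    close(fd);

    *seconds = (double)(t1.tv_sec - t0.tv_sec)
             + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    return entries;
}

/* Hypothetical helper: hours needed to enumerate `files` names
   at `rate` names per second. */
double scan_hours(double files, double rate)
{
    return files / rate / 3600.0;
}
```

For example, dividing the count returned for a directory under /home by the measured seconds gives names per second, and passing 10e6 and that rate to scan_hours() extrapolates to the whole filesystem. Comparing the syscall count with the entry count also shows how many names each getdents64 call returns.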

I googled around a bit and can't find any discussion of slow getdents calls under GFS.  Is there any chance we have some tunable turned on or off that might be causing this?  I'm not even sure which tunables to consider tweaking.  This seems awfully slow, even with sub-optimal locking.  Is there perhaps some tunable I can try tweaking to improve the situation?  Any insights would be much appreciated.

--
Brandon
------------------------------------------------------------------------

--
Linux-cluster mailing list
Linux-cluster redhat com
https://www.redhat.com/mailman/listinfo/linux-cluster


