Opinions on new Fedora Core 2 install with LVM 2 and snapshots?

Bill Rugolsky Jr. brugolsky at telemetry-investments.com
Mon Jul 26 13:58:37 UTC 2004


On Fri, Jul 23, 2004 at 04:27:20PM -0400, Bryan J. Smith wrote:
> I have a client with an age-old RHL 6.2 system (specs below).
> I'm considering replacing the storage array (specs below)
> and moving the system to Fedora Core 2 with LVM 2.
> 
> - LVM2+Snapshots, close to what NetApp had 4 years ago?
> 
> I'd really like to take advantage of snapshots, both for
> backup and accidental file deletion purposes.  They are used
> to NetApp filers, with the ability to restore files by
> mounting the snapshot filesystem.  But cost is everything
> now.  How good is LVM2 at this in comparison to where NetApp
> was 4 years ago?
 
There are fundamental differences between what a NetApp filer does
and what LVM2 snapshots provide.  In particular, with LVM2 snapshots,
kcopyd has to copy each chunk from your filesystem LV to the snapshot LV
the first time that chunk is overwritten.  Device Mapper is much more
sensible and efficient at this than LVM1, but the overhead is still
non-trivial, and it ends up generating a lot of mixed read/write
traffic.  We are currently using
NFS/Ext3/LVM2/MD on a 2.6.8-rc1 kernel as our backup NFS server, and
initial testing with snapshots under load uncovered some performance
problems that I need to track down. [Snapshots and mirroring were
only recently added to the Device Mapper code in the Linus kernel tree.]
Either grab the most recent kernel from kernel.org, or an FC3 development
kernel, and test extensively.
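
To make the copy-on-first-write cost concrete, here is a toy model in
Python.  It is only a sketch of the general technique, not the Device
Mapper code: the class names (Origin, Snapshot) and the "copies" counter
are made up for illustration, and blocks are just list entries.

```python
class Snapshot:
    """Toy snapshot: holds only the blocks that changed on the origin."""

    def __init__(self, origin):
        self.origin = origin
        self.saved = {}  # block index -> contents at snapshot time

    def read(self, index):
        # Unchanged blocks are still read from the origin volume.
        return self.saved.get(index, self.origin.blocks[index])


class Origin:
    """Toy origin volume with copy-on-first-write snapshots."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.snapshots = []
        self.copies = 0  # extra block copies caused by snapshots

    def snapshot(self):
        snap = Snapshot(self)
        self.snapshots.append(snap)
        return snap

    def write(self, index, data):
        # Before overwriting, preserve the old block in every snapshot
        # that has not saved it yet.  This is the extra read+write.
        for snap in self.snapshots:
            if index not in snap.saved:
                snap.saved[index] = self.blocks[index]
                self.copies += 1
        self.blocks[index] = data
```

In this model every first write to a block costs one extra copy per
active snapshot, while later writes to the same block cost nothing
extra, which is roughly why the overhead depends so much on the write
pattern.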

The NetApp WAFL filesystem encapsulates all meta-data in a tree structure,
and uses persistent copy-on-write multi-rooted trees.  When writing, it
places data wherever it is convenient (i.e., in the free space), and then
adjusts block pointers up toward the root of the tree.  Every few seconds
it checkpoints its state (i.e., takes a snapshot).  [The NetApp also uses
NVRAM to hold state that hasn't been flushed to disk.]  When one wants
to save a snapshot, the filesystem tags it and maintains its allocation
data, instead of releasing stale blocks back into the free pool.
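
The idea of writing new blocks elsewhere and re-pointing the path up to
the root can be sketched with a small persistent tree in Python.  This
is a toy under stated assumptions, not WAFL's on-disk layout: Node,
build, write and read are hypothetical names, and the block count must
be a power of two to keep the code short.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    data: Optional[str] = None  # set only on leaves


def build(blocks):
    """Build a balanced tree whose leaves hold the given blocks."""
    nodes = [Node(data=b) for b in blocks]
    while len(nodes) > 1:
        nodes = [Node(left=nodes[i], right=nodes[i + 1])
                 for i in range(0, len(nodes), 2)]
    return nodes[0]


def write(root, index, data, size):
    """Return a new root with one leaf replaced; the old root is untouched."""
    if size == 1:
        return Node(data=data)  # new block goes to "free space"
    half = size // 2
    if index < half:
        return Node(left=write(root.left, index, data, half), right=root.right)
    return Node(left=root.left, right=write(root.right, index - half, data, half))


def read(root, index, size):
    """Follow pointers from the given root down to one leaf."""
    while size > 1:
        half = size // 2
        if index < half:
            root = root.left
        else:
            root, index = root.right, index - half
        size = half
    return root.data
```

Keeping a snapshot here means nothing more than holding on to an old
root: after root1 = write(root0, 2, "C", 4), reading through root0 still
returns the old block, and all untouched subtrees are shared between the
two roots.  No existing block is copied at write time.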

For more info on the NetApp filer filesystem, see the original whitepaper:

	http://www.netapp.com/tech_library/3002.html

Based on what I've read of Reiser4, the design should allow a similar
level of functionality to be incorporated at some point.  Unfortunately,
it is not done yet.

To summarize: LVM2 will do what you want (modulo some tuning and
perhaps bug fixes), but it is not a NetApp.

> With that said, which is better for LVM2, Ext3 or XFS?
> I've always been a closet fan of XFS on Linux with all its
> inherent capabilities, but if Ext3 is better for LVM2 in
> FC2, then I want to stick with Ext3.
 
IIRC, XFS journals only metadata, not file data, whereas Ext3 can
journal both (data=journal).  So while XFS may be much faster than
Ext3, you need to consider data integrity.

> I'm also not against using something other than LVM2 if it
> is better for XFS, as long as it is GPL (I wasn't aware
> anything was other than LVM2, so let me know if I'm
> mistaken).
 
I haven't been following EVMS development, but you might want
to look into its current state to find out whether it offers
any functionality that you need (e.g., bad-block handling).

> [ Yes, I know, I'll need to build the 3w-9xxx driver as
> it wasn't included until later FC2 kernel releases.  I'll
> use a "helper ATA disk" to install FC2 and then install
> a newer kernel with the 3w-9xxx driver.  I figured I might
> need to do this for LVM2 anyway (unless the FC2 installer
> has LVM2 all integrated?  I didn't think it did?) ]

LVM2 installs work fine.

Some things you might want to do:

    1. Script some infrastructure to monitor snapshot space usage.

    2. Cron a job to snapshot and fsck the filesystem, so any
       filesystem problems are revealed early.

    3. If using Ext3 with data journaling, specify a large journal when
       creating the filesystem (e.g., mke2fs -j -J size=400 ..., where
       the size is given in megabytes).

    4. Tune the filesystem and VM variables: flush time, readahead, etc.

    5. Test whether an external journal in the form of an NVRAM card
       or additional disks would improve performance.  (You can try
       a ramdisk for test purposes.)
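
For item 1, a minimal sketch of the monitoring piece in Python.  The
assumptions: that your lvs build accepts --noheadings, --separator and
the snap_percent report field (check "lvs -o help" or the man page on
your system), and that 80% is a sensible warning level.  The function
names and the threshold are made up for illustration.

```python
import subprocess

# Assumed lvs invocation; verify the field name on your LVM2 version.
LVS_CMD = ["lvs", "--noheadings", "--separator", "|",
           "-o", "lv_name,snap_percent"]


def parse_lvs(output):
    """Map snapshot LV name -> percent of snapshot space used."""
    usage = {}
    for line in output.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) != 2 or not fields[1]:
            continue  # blank line, or an LV that is not a snapshot
        usage[fields[0]] = float(fields[1])
    return usage


def over_threshold(usage, limit=80.0):
    """Return the names of snapshots at or above the limit, sorted."""
    return sorted(name for name, pct in usage.items() if pct >= limit)


def check(limit=80.0):
    """Run lvs (needs root) and return the snapshots that need attention."""
    out = subprocess.run(LVS_CMD, capture_output=True, text=True,
                         check=True).stdout
    return over_threshold(parse_lvs(out), limit)
```

Calling check() from cron and mailing any non-empty result would cover
the basic case; a snapshot that fills up becomes invalid, so it is worth
warning well before 100%.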

Regards,

	Bill Rugolsky




