Mirror Mirror

Rick Stevens rstevens at vitalstream.com
Thu Apr 14 21:35:29 UTC 2005


Scott Mertens wrote:
> Scott Mertens wrote:
> 
>>With these SATA drives as cheap as they are I am thinking about just 
>>purchasing another 200 GIG HD and setting up a mirror in Linux.
>>
>>
>>My thinking is that if one drive fails, I can simply break the 
>>mirror, switch SATA hard drive cables - re-boot and away I go.
>>
>>I am using
>>Red Hat Enterprise Linux ES release 4 (Nahant)
>>
>>Can anyone tell me if this is possible, or do I have my head way up my 
>>???  Is this something Linux can do out of the box, or are there some 
>>add-ons I need to accomplish it?
> 
> 
>>It is certainly doable by either hardware or software.  You're talking
>>about RAID-1 (disk mirroring).  Most SATA controllers are capable of doing
>>it, but usually need the help of the OS to accomplish it.
>>
>>Linux has software RAID (called "md" for "multiple devices").  grub has
>>support for booting software RAID volumes.  I'd recommend you go off and
>>check out the Software RAID HOWTO at:
> 
> 
> Thanks Rick, good advice.  I'm wondering if this might be a better way, as
> far as getting back up and running quicker and easier than say rsync. Drive
> space is pretty inexpensive.
> 
> Any thoughts on ease of use/setup of either method?

rsync will synchronize files, but it won't synchronize your MBR.  If
your primary drive croaks, you won't have a bootable system on your
backup drive unless you install a boot loader on it separately.  rsync
was really intended to distribute things like FTP repositories, websites
and things like that.  You've heard of FTP mirrors, right?  Many of
those are kept in sync with rsync.
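
To make the MBR point concrete, here is a minimal Python sketch.  It
assumes the classic PC layout where sector 0 of the disk is 512 bytes
and ends in the boot signature 0x55 0xAA.  The "disks" are just byte
strings built for the example, and the function names are made up for
illustration.  It only models the idea that a file-level copy never
touches sector 0.

```python
SECTOR_SIZE = 512
BOOT_SIGNATURE = b"\x55\xaa"  # last two bytes of a classic MBR sector


def has_boot_signature(sector0: bytes) -> bool:
    """Return True if a 512-byte sector ends with the MBR boot signature."""
    return len(sector0) == SECTOR_SIZE and sector0[510:512] == BOOT_SIGNATURE


def make_fake_disk() -> bytes:
    """Build a tiny pretend disk: sector 0 is the MBR, the rest holds file data."""
    mbr = bytes(510) + BOOT_SIGNATURE
    file_area = b"contents of some file".ljust(SECTOR_SIZE, b"\0")
    return mbr + file_area


# Pretend source disk, plus a "backup disk" that only received the files.
source_disk = make_fake_disk()
backup_disk = bytes(SECTOR_SIZE) + source_disk[SECTOR_SIZE:]  # sector 0 left blank

print(has_boot_signature(source_disk[:SECTOR_SIZE]))  # checks the source's sector 0
print(has_boot_signature(backup_disk[:SECTOR_SIZE]))  # checks the backup's sector 0
```

The file data is identical on both, but only the source carries a boot
sector in this model.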

RAID-1 is about the cheapest fault-tolerant disk setup around.  Every
time the primary drive gets written to, the same data is written to the
mirror drive.
If the primary goes teats up (pardon the phrase), the system fails over
to the mirror drive and your system stays up.
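
The mirroring and failover behavior described above can be sketched as
a toy model in Python.  This is not how md or any controller is
implemented; the Mirror class and its methods are invented for the
example, and the "drives" are plain lists in memory.

```python
class Mirror:
    """Toy RAID-1: two in-memory 'drives' that receive identical writes."""

    def __init__(self, n_blocks: int) -> None:
        self.drives = [[b""] * n_blocks, [b""] * n_blocks]
        self.failed = [False, False]

    def write(self, block: int, data: bytes) -> None:
        # Every write goes to each drive that is still working.
        for i, drive in enumerate(self.drives):
            if not self.failed[i]:
                drive[block] = data

    def read(self, block: int) -> bytes:
        # Read from the first working drive; fail only if both are gone.
        for i, drive in enumerate(self.drives):
            if not self.failed[i]:
                return drive[block]
        raise IOError("both drives have failed")

    def fail(self, index: int) -> None:
        # Mark a drive as dead; it receives no further reads or writes.
        self.failed[index] = True


array = Mirror(n_blocks=4)
array.write(0, b"payroll")
array.fail(0)                 # the primary dies
print(array.read(0))          # still served, now from the mirror
array.write(1, b"new data")   # writes continue on the surviving drive
```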

When you replace the failed drive, however, you have to tell the system
to rebuild the array so the new drive reflects the operating drive.
This takes time to do and the system will run slower while it occurs.
Some systems won't even let you make the RAID array writable until the
rebuild completes.  In other words, the machine will continue to run
despite a disk failure.  However, when you replace the drive and bring
the machine back up, it may not come up fully until the RAID has been
rebuilt.
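
As a rough illustration of what a rebuild amounts to, the Python sketch
below copies every block from the surviving drive to the replacement.
The rebuild function and the block contents are hypothetical; a real
array does this at the device level, which is why it takes a while and
slows the system down.

```python
def rebuild(survivor: list, replacement: list) -> int:
    """Copy every block from the surviving drive onto the new one.

    Returns the number of blocks copied.
    """
    copied = 0
    for block, data in enumerate(survivor):
        replacement[block] = data
        copied += 1
    return copied


survivor = [b"boot", b"etc", b"home", b"var"]   # the drive that kept running
replacement = [b""] * len(survivor)             # blank drive just installed

in_sync_before = survivor == replacement
blocks_copied = rebuild(survivor, replacement)
in_sync_after = survivor == replacement

print(in_sync_before, blocks_copied, in_sync_after)
```

Until the copy finishes, the array has only one good copy of the data,
so a second failure during the rebuild would lose it.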

There are inherent dangers in any software RAID.  If something corrupts
the RAID software (bad RAM, some malignant application, a CPU overheat,
etc.), then the RAID is vulnerable.  In our operations, critical systems
use hardware RAID, but you need a special controller to do it.  The SATA
controllers on most motherboards, while claiming to be hardware RAID,
are in reality somewhat akin to "winmodems" in that the hardware is
capable of doing a lot of the RAID-1 stuff but needs some assistance
from the operating system to do the whole thing.

Understand that what we do is WAY above what the average person does.
Because of the nature of our business, we use multiple redundancy.  All
of our critical systems are load balanced (there are at least two machines
that do the job) and all of them run hardware SCSI RAID controllers and
run them in a RAID-5 format (minimum of four disks--three for the RAID
and a hot spare).
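
For anyone wondering how a three-disk RAID-5 survives a lost drive, the
usual scheme is XOR parity: each stripe holds two data blocks plus a
parity block, and any one of the three can be recomputed from the other
two.  The Python sketch below shows a single stripe only.  The block
contents are made up, and real controllers also rotate which disk holds
the parity, which is left out here.

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


# One stripe across a three-disk array: two data blocks and one parity block.
data_a = b"ABCD"
data_b = b"wxyz"
parity = xor_blocks(data_a, data_b)

# Suppose the disk holding data_b dies.  Its contents can be recomputed from
# the two survivors, which is what a rebuild writes to the hot spare.
recovered_b = xor_blocks(data_a, parity)
print(recovered_b == data_b)
```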

If we lose a drive, the controller brings up the hot spare and rebuilds
the RAID on-the-fly. The system stays up--even while the RAID is being
rebuilt.  We can then replace the failed drive and the new drive becomes
the hot spare.  Note that the Adaptec 2100S (a fine hardware RAID
controller) costs about $400.  A 73GB SCSI drive probably costs about
$300, so one of our systems has about $1600 invested in its disk array.
However, barring catastrophic issues, they're pretty robust.  I've yet
to have one of the 120 or so of those systems go down in five years due
to disk problems.  CPUs, yes.  Power supplies, yes.  Memory, yes.
Disks, no.
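
The $1600 figure is just the controller plus four drives at the
approximate prices quoted above.  Spelled out in Python (the variable
names are only for illustration):

```python
# Prices are the approximate figures quoted in the post.
controller_cost = 400   # Adaptec 2100S
drive_cost = 300        # one 73GB SCSI drive
drive_count = 4         # three in the RAID-5 set plus one hot spare

array_cost = controller_cost + drive_count * drive_cost
print(array_cost)
```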

If a machine dies because of something else (power supply, etc.), the
load balancer routes traffic to the surviving machine that does the same
job.  It takes a lot to kill off a service we offer!  BTW, this sort of
thing is technically called "RAIM" (redundant array of inexpensive
machines).  We call it "LOCH" (lots of cheap hardware).
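
The load balancer side of this can be pictured with a very small Python
sketch.  The machine names and the pick_backend function are
hypothetical, and a real load balancer also spreads traffic and runs
its own health checks; this only shows requests moving to the surviving
machine.

```python
def pick_backend(backends: dict) -> str:
    """Return the name of the first backend marked healthy."""
    for name, healthy in backends.items():
        if healthy:
            return name
    raise RuntimeError("no healthy backend left")


# Hypothetical pair of machines doing the same job.
backends = {"web1": True, "web2": True}
first_choice = pick_backend(backends)

backends["web1"] = False          # web1 loses its power supply
after_failure = pick_backend(backends)
print(first_choice, after_failure)
```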

All that being said, for what you (and the majority of home users) want
to do, a software RAID-1 is peachy and will work very well.
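
If you do go with software RAID-1, it is worth checking the array's
health now and then, since a mirror that has quietly lost a drive still
runs.  On Linux the md driver reports its state in /proc/mdstat.  The
Python sketch below parses a sample of that text; the sample is
illustrative rather than captured from a real machine, the
degraded_arrays function is made up for the example, and the exact
layout can vary between kernel versions.

```python
import re

# Illustrative /proc/mdstat text for a two-disk RAID-1 that has lost a member.
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md0 : active raid1 sda1[0]
      195358336 blocks [2/1] [U_]

unused devices: <none>
"""

# Matches the "[expected/active] [UU]" status, e.g. "[2/1] [U_]".
STATUS = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def degraded_arrays(mdstat_text: str) -> list:
    """Return names of md arrays with fewer active members than expected."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        name = re.match(r"^(md\d+)\s*:", line)
        if name:
            current = name.group(1)
            continue
        status = STATUS.search(line)
        if status and current:
            expected, active = int(status.group(1)), int(status.group(2))
            if active < expected:
                degraded.append(current)
    return degraded


print(degraded_arrays(SAMPLE_MDSTAT))
```

On a live system the same function could be fed the contents of
/proc/mdstat instead of the sample string.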

NOTE:  A RAID system is NOT a replacement for devising (and using) a
good backup strategy.  RAID helps your machine survive hardware faults.
Backups will save your sanity--even if it's something as stupid as "Oh,
sh*t!  I didn't mean to delete THAT file!"
----------------------------------------------------------------------
- Rick Stevens, Senior Systems Engineer     rstevens at vitalstream.com -
- VitalStream, Inc.                       http://www.vitalstream.com -
-                                                                    -
-      On a scale of 1 to 10 I'd say...  oh, somewhere in there.     -
----------------------------------------------------------------------




More information about the Redhat-install-list mailing list