
Re: RAID5 gets a bad rap

Gordon Messmer wrote:
Philip A. Prindeville wrote:

If you're *not* a database weenie, and you're doing usual manly things with your filesystem (like lots of compiles, for instance), you're typically not going to be modifying files in place at all.

That's not quite it. RAID 5 performance suffers because every write requires that the entire block that's being written be read from every drive in the array, parity calculated, and then the data and parity written out. For each block written, the array has to do N reads plus two writes.

No. Even in the worst case it would read N-2 blocks (every drive except the one receiving the new data block and the one holding parity), then do two writes (the new data and the new parity). But normally, writing sequential data, you can wait until you have enough data for an entire stripe, read nothing, and write once to each drive. You should be able to do this in parallel, but unlike RAID0 I've never measured it happening. Tuning "stripe_cache_size" and creating the filesystem with the "stride=" option will help.
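The I/O counts above can be tallied with a small sketch. It models three ways to update one RAID 5 stripe: a full-stripe write, the reconstruct-write path (read the other data chunks, as in the N-2 figure above), and read-modify-write (read the old data and old parity). The function names and the `/dev/mdX` device are hypothetical. The Linux md driver is understood to choose between the two partial-stripe paths by a comparable cost comparison, but this sketch does not reproduce its actual heuristics. The second helper shows the usual arithmetic behind mke2fs's `stride=` (RAID chunk size divided by filesystem block size) and `stripe_width=` (stride times the number of data drives) extended options.

```python
# Hypothetical helpers for illustration, not the Linux md driver's code.

def raid5_write_ios(n_drives: int, chunks_written: int) -> dict:
    """Count disk I/Os to update `chunks_written` data chunks of one stripe.

    Assumes n_drives drives, one of which holds the stripe's parity,
    so each stripe has n_drives - 1 data chunks.
    """
    data_chunks = n_drives - 1
    if n_drives < 3 or not 1 <= chunks_written <= data_chunks:
        raise ValueError("need n_drives >= 3 and 1 <= chunks_written <= n_drives - 1")
    if chunks_written == data_chunks:
        # Full-stripe write: parity comes from the new data alone, no reads.
        return {"strategy": "full-stripe", "reads": 0, "writes": n_drives}
    # Either partial path writes the new data chunks plus the new parity.
    writes = chunks_written + 1
    # Reconstruct-write: read the data chunks that are NOT being replaced.
    rcw_reads = data_chunks - chunks_written
    # Read-modify-write: read old copies of the replaced chunks plus old parity.
    rmw_reads = chunks_written + 1
    if rcw_reads < rmw_reads:
        return {"strategy": "reconstruct-write", "reads": rcw_reads, "writes": writes}
    return {"strategy": "read-modify-write", "reads": rmw_reads, "writes": writes}


def ext_stride(chunk_kib: int, block_kib: int, n_drives: int) -> tuple[int, int]:
    """Return (stride, stripe_width), in filesystem blocks, for a RAID 5 array."""
    stride = chunk_kib // block_kib         # filesystem blocks per RAID chunk
    stripe_width = stride * (n_drives - 1)  # filesystem blocks per stripe of data
    return stride, stripe_width


if __name__ == "__main__":
    # Six drives: five data chunks plus one parity chunk per stripe.
    for k in range(1, 6):
        print(k, raid5_write_ios(6, k))
    # Example: 64 KiB chunks, 4 KiB filesystem blocks, 4 drives.
    stride, width = ext_stride(64, 4, 4)
    print(f"mke2fs -E stride={stride},stripe_width={width} /dev/mdX")
```

For a single-chunk update, reconstruct-write needs N-2 reads and read-modify-write needs two, so on arrays wider than four drives read-modify-write is the cheaper partial-stripe path. Check `man mke2fs` for your e2fsprogs version; older releases may accept only `stride=`.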

It doesn't matter whether you're writing new files or modifying existing files, because all of this happens at the block level. It's especially bad on journalled filesystems, where writing to a file will update the file's blocks, plus the filesystem's journal blocks, and finally the filesystem's own blocks.

No again. You read the parity block and the old data block, XOR the old data and then the new data into the parity block, and write the new data and the new parity.
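The read-modify-write update above works because XOR is its own inverse: XORing the old data into the parity removes that block's contribution, and XORing in the new data adds the new one, so the other data blocks never need to be read. A minimal sketch, with byte strings standing in for disk blocks; the function names are hypothetical and this is not md's implementation.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length blocks byte by byte."""
    if len(a) != len(b):
        raise ValueError("blocks must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def full_parity(data_blocks: list[bytes]) -> bytes:
    """Parity of a whole stripe: the XOR of all its data blocks."""
    parity = bytes(len(data_blocks[0]))  # start from an all-zero block
    for block in data_blocks:
        parity = xor_bytes(parity, block)
    return parity


def rmw_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """Read-modify-write: remove old data's contribution, then add new data's."""
    return xor_bytes(xor_bytes(old_parity, old_data), new_data)


if __name__ == "__main__":
    stripe = [b"\x01\x02", b"\x10\x20", b"\x0f\xf0"]
    parity = full_parity(stripe)

    # Update the middle block using only the old data and the old parity.
    new_block = b"\xaa\x55"
    new_parity = rmw_parity(parity, stripe[1], new_block)
    stripe[1] = new_block

    # The incrementally updated parity matches a full recomputation.
    assert new_parity == full_parity(stripe)
    print("parity ok:", new_parity.hex())
```

The same identity is what lets a degraded array rebuild a missing block: XOR the surviving data blocks with the parity.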

So is it just the database-heads that are maligning RAID5, or are there other performance issues I don't know about?

Most of your comments don't reflect the way RAID 5 actually functions in any way.

Because my empirical experience has always been that when writing large files, RAID5 performs on par with RAID0.

If that was the case, the system on which you were testing was probably limited by other factors. A RAID 0 disk array will be much faster than a RAID 5 array.

RAID 5 tends to be most appropriate when you're trying to get as much disk space as you can with the lowest cost, you won't be running multiple simultaneous jobs on the same disk array, and when you'll be collecting data at a rate that's relatively low. Usually, that's backups. Your network is probably slower than your disk array (unless the array is very large -- array speed decreases with array size), so streaming data in over the network to your disk array won't bog it down. Virtually any interactive workload will benefit from a better disk configuration.
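The network-versus-array point above is simple arithmetic: a link rated in megabits per second carries at most one eighth that many megabytes per second. A minimal sketch under stated assumptions: the link speed and the array's sustained write rate are hypothetical inputs you would measure on your own hardware, and the conversion ignores protocol overhead, so real transfers will be somewhat slower.

```python
def link_mb_per_s(link_mbit: float) -> float:
    """Raw line rate in megabytes/s (8 bits per byte, no protocol overhead)."""
    return link_mbit / 8


def network_is_bottleneck(link_mbit: float, array_write_mb_s: float) -> bool:
    """True if the link, not the array, limits streaming ingest.

    array_write_mb_s is a hypothetical, measured sustained write rate.
    """
    return link_mb_per_s(link_mbit) < array_write_mb_s


if __name__ == "__main__":
    # Example inputs only: a gigabit link and an assumed 200 MB/s array.
    print(link_mb_per_s(1000))               # raw gigabit ceiling in MB/s
    print(network_is_bottleneck(1000, 200))  # is the link the slower side?
```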

Bill Davidsen <davidsen tmr com>
  "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot
