FC4 kernel performance

Wed Jun 22 14:23:22 UTC 2005

On 6/22/05, Chris Adams <cmadams at hiwaay.net> wrote:
> Just FYI: this is not a problem exclusive to Linux software RAID.  I
> have seen similar behavior out of LSI MegaRAID cards as well (and I
> think other hardware RAID controllers work in a similar fashion).
> 
> Most things consider a bad sector a sign of a bad drive.  On today's
> drives, where bad sectors are remapped internally to the drive, by the
> time you see a bad sector, the drive has remapped a bunch of sectors
> (and may be out of spare space).

Well, we need to distinguish between hard read errors and hard write errors.

Because drives relocate bad sectors when writing, a hard write error is
a sign of a failing drive (although with SMART it shouldn't be your
first clue). A hard read error, by contrast, can simply mean that a
sector has just become unrecoverable; the drive can't relocate it
because the data is already lost. SMART reports such sectors as
pending relocation, and they are only relocated once they are
rewritten. After the write, the drive works fine.
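A toy model may make the read/write asymmetry concrete. This is a minimal sketch, not real drive firmware: the class `SimulatedDrive`, its spare-sector pool, and the `ReadError`/`WriteError` exceptions are hypothetical names chosen for illustration. A read of a degraded sector fails and only marks it pending; a later write relocates it to a spare; a write error appears only once the spares run out.

```python
class ReadError(Exception):
    """Unrecoverable read: the sector's contents are already lost."""


class WriteError(Exception):
    """Write failed and no spare sector was left to relocate it to."""


class SimulatedDrive:
    """Hypothetical toy drive that relocates bad sectors only on write.

    Real firmware behavior varies by vendor; this only models the idea.
    """

    def __init__(self, spare_sectors):
        self.data = {}              # logical sector -> last written contents
        self.weak = set()           # sectors whose media has gone bad
        self.pending = set()        # unreadable sectors awaiting a rewrite
        self.remapped = set()       # sectors already moved to a spare
        self.spares_left = spare_sectors

    def _is_bad(self, lba):
        return lba in self.weak and lba not in self.remapped

    def read(self, lba):
        if self._is_bad(lba):
            # The data is gone, so there is nothing to relocate yet;
            # the drive only records the sector as pending.
            self.pending.add(lba)
            raise ReadError(lba)
        return self.data.get(lba, b"")

    def write(self, lba, payload):
        if self._is_bad(lba):
            if self.spares_left == 0:
                raise WriteError(lba)       # out of spares: a failing drive
            self.remapped.add(lba)          # new data lands on a spare sector
            self.spares_left -= 1
            self.pending.discard(lba)
        self.data[lba] = payload


drive = SimulatedDrive(spare_sectors=1)
drive.write(7, b"hello")
drive.weak.add(7)                   # sector 7 degrades after being written
try:
    drive.read(7)
except ReadError:
    print("read error, pending:", drive.pending)   # read error, pending: {7}
drive.write(7, b"hello")            # rewriting relocates the sector
print(drive.read(7), drive.pending)                # b'hello' set()
```

Note that the rewrite only helps if something can supply the lost contents; on a single drive that means restoring from elsewhere, while in an array it can come from redundancy.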

I've found that if I run a weekly SMART extended self-test on my large
ATA disks, I don't get pending sectors as often, and when I do, I can
track them down and write something there before the RAID code finds
them. I'm not sure whether the drive is just detecting weak sectors
and rewriting them or whether it simply relocates them (the SMART
counters don't indicate anything). Still, pending sectors do happen
from time to time, and often enough that on an older 6-disk RAID 5 a
resync always kicks out two disks unless I'm careful to make sure that
there are no pending sectors.

This could be addressed by having the RAID code rewrite any unreadable
sector, using data reconstructed from the rest of the array, instead of
kicking the disk out. I suspect that's what the 3ware cards do, but I
don't have any real evidence of that; they do seem much less likely to
kick out drives than the software RAID.
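The rewrite-on-read-error idea can be sketched for a single RAID-5-style stripe, where every member block (data or parity) equals the XOR of all the others. This is a minimal illustration under stated assumptions, not how md or any controller is actually implemented: `Disk`, `UnreadableSector`, `xor_blocks`, and `read_stripe_member` are hypothetical names, and a write to an unreadable sector is assumed to succeed because the drive relocates it.

```python
from functools import reduce


class UnreadableSector(Exception):
    pass


class Disk:
    """Hypothetical array member: sector contents plus a set of unreadable
    sectors. A write clears the error, modeling drive-side relocation."""

    def __init__(self):
        self.sectors = {}
        self.unreadable = set()
        self.failed = False

    def read(self, lba):
        if lba in self.unreadable:
            raise UnreadableSector(lba)
        return self.sectors[lba]

    def write(self, lba, data):
        self.unreadable.discard(lba)
        self.sectors[lba] = data


def xor_blocks(blocks):
    """Bytewise XOR of equal-length blocks (RAID-5 parity arithmetic)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def read_stripe_member(disks, idx, lba, rewrite=True):
    """Read member idx of the stripe at lba. On a read error, rebuild the
    block from the other members; then either write it back (so the drive
    can relocate the sector) or mark the disk failed (the kick-out case)."""
    try:
        return disks[idx].read(lba)
    except UnreadableSector:
        others = [d.read(lba) for i, d in enumerate(disks) if i != idx]
        rebuilt = xor_blocks(others)
        if rewrite:
            disks[idx].write(lba, rebuilt)
        else:
            disks[idx].failed = True
        return rebuilt


disks = [Disk() for _ in range(4)]
data = [b"ab", b"cd", b"ef"]
for disk, block in zip(disks, data + [xor_blocks(data)]):
    disk.write(0, block)
disks[1].unreadable.add(0)
print(read_stripe_member(disks, 1, 0))          # b'cd'
print(disks[1].unreadable, disks[1].failed)     # set() False
```

If a second member of the same stripe is also unreadable, the reconstruction itself raises; that is the double-failure case that pending sectors turn into during a resync.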

> Some type of "journalling RAID" would be a possible solution (and would
> also allow for much faster re-syncs on unclean shutdown, as only the
> last written blocks would need updating).

Right, that's useful for many reasons; it's easier for an uncleanly
shut down software RAID to end up desynced than one on a hardware
controller (though I'm not entirely sure why, since we're pretty
aggressive about flushing and setting a synced flag). But this will
be less than completely trivial to implement, especially since the
journal should be persistent, which probably means a storage format
change.
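The faster-resync idea from the quoted text can be sketched as a persistent dirty-region log, similar in spirit to a write-intent bitmap: mark a region dirty durably before writing to it, clear it once all writes to it finish, and after an unclean shutdown resync only the regions still marked. This is a minimal sketch with a hypothetical on-disk format (one byte per region in a separate file) and a hypothetical `DirtyRegionLog` class; a real implementation would store the log on the member disks, which is where the storage format change comes in.

```python
import os
import tempfile


class DirtyRegionLog:
    """Hypothetical persistent dirty-region log for an array.

    One byte per fixed-size region: 1 means a write may have been in flight
    there at crash time, 0 means the region is known to be in sync.
    """

    def __init__(self, path, num_regions, region_size):
        self.path = path
        self.region_size = region_size
        if os.path.exists(path):
            with open(path, "rb") as f:     # reopening after a shutdown
                self.bits = bytearray(f.read())
        else:
            self.bits = bytearray(num_regions)
            self._flush()
        self.inflight = [0] * len(self.bits)  # in-memory write counters

    def _flush(self):
        # Write a temp file, fsync it, then atomically rename it over the
        # log so a crash never leaves a half-written log (making the rename
        # itself durable would also need a directory fsync, omitted here).
        tmp = self.path + ".tmp"
        with open(tmp, "wb") as f:
            f.write(self.bits)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)

    def begin_write(self, offset):
        r = offset // self.region_size
        self.inflight[r] += 1
        if not self.bits[r]:
            self.bits[r] = 1
            self._flush()       # must be durable before the data write starts

    def end_write(self, offset):
        r = offset // self.region_size
        self.inflight[r] -= 1
        if self.inflight[r] == 0:
            self.bits[r] = 0    # every write to this region has completed
            self._flush()

    def regions_to_resync(self):
        return [r for r, bit in enumerate(self.bits) if bit]


path = os.path.join(tempfile.mkdtemp(), "array.dirty")
log = DirtyRegionLog(path, num_regions=1024, region_size=64 * 1024)
log.begin_write(200_000)      # region 3
log.begin_write(5_000_000)    # region 76
log.end_write(200_000)
# Simulate a crash: reopen without ever calling end_write for region 76.
log = DirtyRegionLog(path, num_regions=1024, region_size=64 * 1024)
print(log.regions_to_resync())    # [76]
```

Only the first write into a clean region pays for a synchronous log update, which is why coarse regions keep the overhead low; clearing bits could also be deferred, since a stale 1 only costs an unnecessary resync of that region.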