Errors with Soft RAID 5 Array, no disks off line

Bruno Wolff III bruno at wolff.to
Mon Jan 1 08:57:52 UTC 2007


On Sun, Dec 31, 2006 at 21:11:02 -0500,
  vamythguy <vamythguy at gmail.com> wrote:
> 
> 1) Do the different messages imply they are dying in different ways?

You may want to keep using the drive with only 3 bad sectors, but you should
strongly consider replacing the one with 48 as soon as you can.

> 2) Is there any way to salvage either drive, maybe by quarantining the bad
> sectors or something?

If you write over the bad sectors, the drive will remap them to spare sectors.
The remap only happens when the drive either manages to read a bad sector
successfully or has that sector overwritten.

In theory you might not have lost any data, but because one drive is
offline you can't be sure you know what value belongs in the bad sectors
of the drive that is still in the array.
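
To see why the offline drive matters, here is a minimal sketch of RAID 5
parity arithmetic in shell. The byte values are made up for illustration; a
real array does the same XOR over whole chunks. With one block of a stripe
missing, the survivors plus parity give it back. With two missing (one on the
offline drive, one in a bad sector), only their XOR is known.

```shell
#!/bin/sh
# Three data blocks of one stripe (one byte each, arbitrary example values).
a=$((0x5a))
b=$((0x3c))
c=$((0xf0))

# RAID 5 parity is the XOR of the data blocks in the stripe.
parity=$(( a ^ b ^ c ))

# One block unreadable (say b): XOR of the survivors and parity recovers it.
recovered_b=$(( a ^ c ^ parity ))
printf 'recovered b = 0x%02x\n' "$recovered_b"   # prints: recovered b = 0x3c

# Two blocks unreadable (b and c): all we can compute is b XOR c,
# which is not enough to tell what either block contained.
b_xor_c=$(( a ^ parity ))
printf 'b XOR c     = 0x%02x\n' "$b_xor_c"
```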

You should first try to back up what you have now. Then you can try to figure
out which files are possibly corrupt
(see: http://smartmontools.sourceforge.net/BadBlockHowTo.txt) and decide
if it is worth trying to fix them. You may be able to use the data on the
failed drive to fix the files.

Once you have recovered everything you can, you should run badblocks
on the drives (using a livecd is probably easiest) to try to find other
bad blocks. Then run smartctl -t long on the drives to see how bad things
are. A few reallocated blocks don't necessarily warrant tossing a drive.
You need to weigh your budget against the cost of replacing the data you
might lose and the increased likelihood of failure suggested by a drive having
reallocated sectors.
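
The checks above might look roughly like the following sketch. The device
name /dev/sdX is a placeholder, and the run wrapper and check_drive function
are just illustrative helpers: by default the script only prints the commands,
and it runs them only when DRY_RUN=0 is set. badblocks is used in its default
read-only mode so that it does not overwrite anything.

```shell
#!/bin/sh
# Sketch only: /dev/sdX is a placeholder; nothing is executed unless DRY_RUN=0.
DRY_RUN=${DRY_RUN:-1}

# Print a command, and execute it only when dry-run mode is switched off.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = 0 ]; then "$@"; fi
}

check_drive() {
    dev=$1
    run badblocks -sv "$dev"         # read-only surface scan, with progress
    run smartctl -t long "$dev"      # start the drive's extended self-test
    run smartctl -l selftest "$dev"  # self-test log; rerun after the test ends
    run smartctl -A "$dev"           # attributes, e.g. Reallocated_Sector_Ct
}

check_drive "${1:-/dev/sdX}"
```

The long self-test runs inside the drive and can take hours, so the self-test
log will only show the result if you look at it again later.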

For the future, you want to be running smartd so that you are warned about
bad sectors while only one drive has them. That lets you repair the array
before two drives have errors.
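
As a starting point, a one-line /etc/smartd.conf along these lines should do.
The mail recipient (root) and the self-test schedule are assumptions to adjust
for your setup, and mail delivery has to work on the machine for the warnings
to reach you.

```
# Monitor every drive smartd can find, track all attributes (-a),
# mail warnings to root (-m), and run a long self-test each Sunday
# in the hour starting at 03:00 (-s).
DEVICESCAN -a -m root -s L/../../7/03
```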



