RAID 5 Multiple Hard-drives failure

jdow jdow at earthlink.net
Wed Mar 15 01:00:25 UTC 2006


From: "Reuben D. Budiardja" <techlist at pathfinder.phys.utk.edu>

> On Tuesday 14 March 2006 08:37, Mariano López Reta wrote:
>> On Tue, 2006-03-14 at 08:21 -0500, Reuben D. Budiardja wrote:
>> > Having said all that, my question is: is there anything other than a
>> > real hard-drive problem that could cause something like this?
>> > In other words, could the problem be in the controller, motherboard, etc.,
>> > rather than in the hard drives themselves? Or does Maxtor just make bad
>> > drives? Or can consumer-level hard drives simply not be used for this
>> > kind of work?
>> >
>> From my experience in systems administration (almost 20 years now),
>> this kind of failure may be due to one of these issues:
>>
>> - electrical problems: I have seen many drive failures due to
>> inappropriate grounding of the electrical installation, lightning coming
>> through the electrical network, etc. In my case this is the most important
>> (and most underestimated) cause of failure.
>
> OK. I'll try to check for this, but I am not sure there's really anything I
> can do about it, since this machine is on a university campus using the
> university's electrical installation. The machine is already on a UPS (with
> proper shutdown in the event of an electrical outage, etc.).
>
>> - controller (in this case, motherboard) failure: sometimes it is not the
>> drive that fails but the controller, rendering the disks useless.
>
> I am suspecting this too. But one thing I am not sure about: could a bad
> controller render a drive physically damaged / bad as well? My controller is
> on a PCI card; is there any way to determine whether it is bad, or whether
> the motherboard is? The system drive that plugs directly into the motherboard
> (not part of the RAID array) has never had any problem.

Yes. I had a defective Promise board that for some reason would render
two drives bad at precisely the same location. I thought it was a Maxtor
problem. Its problem-report software showed the failure. I reformatted
the drive and the failure disappeared, but I got two drives replaced
anyway. (I had a spare I was able to put in to keep the array barely
functional while the replacements were cross-shipped.) Not long after that,
two drives showed the exact same error at the exact same location. So I ran
that way for some months until I started to have other problems. I ended up
cross-shipping boards with Promise. The new board came up fine with
absolutely no traceable error reports. I did use PowerMax to reformat the
drives that had "gone bad". They're in there perking along just fine.
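The telling clue above is that two drives went "bad" at precisely the same location. Independent media defects are unlikely to land on the same block of two different disks, so a shared location hints at a common component (controller, cable, firmware) rather than the drives themselves. A minimal sketch of that cross-check, assuming you already have per-drive lists of bad block numbers in the same units (for example from a vendor diagnostic or `badblocks` output); the drive names and block numbers below are hypothetical.

```python
from collections import defaultdict

def shared_bad_blocks(reports):
    """Map each bad block number to the drives reporting it; keep only blocks seen on 2+ drives."""
    drives_by_block = defaultdict(set)
    for drive, blocks in reports.items():
        for block in blocks:
            drives_by_block[block].add(drive)
    return {block: sorted(drives)
            for block, drives in drives_by_block.items()
            if len(drives) > 1}

# Hypothetical bad-block reports, invented for illustration only.
reports = {
    "sdb": [1048576, 2000000],
    "sdc": [1048576],
    "sdd": [73],
}
print(shared_bad_blocks(reports))  # {1048576: ['sdb', 'sdc']}
```

A match is a hint, not proof: swapping the suspect controller (as was done here) is still the real test.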

So yes, controllers can cause problems, and they can be devilish to track
down. (Like the initial Adaptec 2940's defect that would randomly overwrite
one byte on SCSI transfers about once per megabyte written. Boy howdy, that
one was tough to track down. New Adaptec firmware fixed it.)
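A defect like that never shows up as a drive error; it only shows up as data that differs from what was written. One common way to catch it is to write a known pattern through the suspect controller, read it back, and compare. The sketch below simulates this in memory: `flaky_copy` is a hypothetical model of "one byte overwritten per megabyte", not the actual Adaptec bug, and the 64 KiB comparison block size is an arbitrary assumption. On real hardware you would write a file through the controller and compare it (or its checksum) against the original.

```python
import random

BLOCK = 64 * 1024  # compare in 64 KiB blocks (an arbitrary choice)
MIB = 1 << 20

def flaky_copy(data, rng, every=MIB):
    """Hypothetical model of the defect: overwrite one random byte in every `every` bytes."""
    out = bytearray(data)
    for start in range(0, len(out), every):
        end = min(start + every, len(out))
        pos = rng.randrange(start, end)
        out[pos] ^= 0xFF  # flip every bit so the byte is guaranteed to differ
    return bytes(out)

def corrupted_offsets(written, read_back):
    """Return the byte offsets where the read-back data differs from what was written."""
    if len(written) != len(read_back):
        raise ValueError("length mismatch")
    bad = []
    for start in range(0, len(written), BLOCK):
        a = written[start:start + BLOCK]
        b = read_back[start:start + BLOCK]
        if a != b:  # cheap block-level check first, byte-level scan only on mismatch
            bad.extend(start + i for i, (x, y) in enumerate(zip(a, b)) if x != y)
    return bad

# Known test pattern: 4 MiB of repeating 0x00..0xFF.
pattern = bytes(range(256)) * (4 * MIB // 256)
rng = random.Random(2940)  # fixed seed so the run is repeatable
read_back = flaky_copy(pattern, rng)
bad = corrupted_offsets(pattern, read_back)
print(len(bad), "corrupted byte(s) at offsets", bad)
```

With one corrupted byte per MiB, the 4 MiB pattern yields 4 mismatched offsets, one in each MiB. Because such corruption is rare and random, a real test needs to move far more data than this to have a good chance of triggering it.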

{^_^} 




More information about the fedora-list mailing list