disk problems or false alarm??

Guolin Cheng guolin at alexa.com
Fri Apr 30 22:18:09 UTC 2004


Hi, jludwig,

 Thanks.

 All our Linux boxes run at UDMA 5, 24 hours a day, 365 days a year, and
most of the failed hard drives have been running for about 3 years, so at
first glance I assume they are simply dying.

 So my solution for the failed hard drives is to back up the data files
and restore them to a new hard disk of the same size when the first disk
error messages appear, then return the failed hard drives to Maxtor.

 Since we have thousands of hard drives, it seems impractical to run a
low-level format on the failed drives; that seems to be the job of the
vendor (Maxtor) rather than of end users.

 Thanks a lot again for your helpful analysis and suggestions!

 --Guolin Cheng

   


-----Original Message-----
From: jludwig [mailto:wralphie at comcast.net] 
Sent: Friday, April 30, 2004 2:40 PM
To: For users of Fedora Core releases
Subject: RE: disk problems or false alarm??

On Fri, 2004-04-30 at 15:01, Guolin Cheng wrote:
> Hi, jludwig,
> 
>  Thanks for your helpful information.
> 
>  Since I'm running Linux, I assume there are no viruses. That leaves
> several questions:
> 
> 1, How can I know whether all the spare sectors are in use and the disk
> will lose data, or whether it is just the beginning of disk failure?
> 

There is no real way to know if you are using spare sectors (even new
drives use a few since perfect media is rare) since this is part of the
hard drive system's firmware and happens automatically.

> 2, How can I tell that a hard drive is starting to die as early as
> possible?

Run the smartd daemon: "chkconfig smartd on" enables it at boot, and
"service smartd start" starts it right away.
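
As a rough illustration of what SMART monitoring looks at, here is a
minimal Python sketch that parses an attribute table in the layout printed
by "smartctl -A /dev/hda" and flags non-zero sector counters. The sample
table, the list of watched attribute names, and the function names are
assumptions made for this example, not output from any drive in this
thread; which attributes a drive reports varies by vendor.

```python
import re

# Attribute names treated as warning signs; this list is an assumption.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector",
           "Offline_Uncorrectable")


def parse_smart_attributes(text):
    """Return {attribute_name: raw_value} for each row of the table."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have 10 columns.
        if len(fields) >= 10 and fields[0].isdigit():
            match = re.match(r"\d+", fields[9])
            if match:
                attrs[fields[1]] = int(match.group())
    return attrs


def suspicious(attrs):
    """Return the watched attributes whose raw value is above zero."""
    return {name: attrs[name] for name in WATCHED if attrs.get(name, 0) > 0}


# Illustrative sample only; real values come from smartctl on a live drive.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   253   253   063    Pre-fail  Always       -       12
  9 Power_On_Hours          0x0032   200   200   000    Old_age   Always       -       26280
197 Current_Pending_Sector  0x0008   253   253   000    Old_age   Offline      -       0
"""

if __name__ == "__main__":
    print(suspicious(parse_smart_attributes(SAMPLE)))
```

On a real machine the text would come from running smartctl as root, and
smartd can be configured to mail a warning when such attributes change.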

> 3, How do I identify the malfunctioning hard drive? Should I idle the
> machine and test the hard drives one by one to figure it out? Mostly it
> is the drive reporting the failure that has failed, but I remember for
> sure that, in a few cases, one of the other hard drives had failed instead.

The only way to really check a hard drive is multiple passes of a 100%
read/write of each sector. Needless to say, the drive must be taken out of
service, and all data on it is lost.
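
A minimal Python sketch of such a destructive write/read-back test follows.
It is only an illustration under stated assumptions: the block size, the
four bit patterns, and the function names are choices made for this
example, and the read-back may be served from the operating system's cache
rather than the platters unless the cache is bypassed or dropped. Existing
tools such as "badblocks -w" perform this kind of test against a real
device.

```python
import os

BLOCK = 4096                          # assumed block size in bytes
PATTERNS = (0xAA, 0x55, 0xFF, 0x00)   # assumed test patterns


def verify_pass(path, size, pattern):
    """Overwrite `size` bytes of `path` with `pattern`, then read back.

    Returns the offsets of blocks that failed to read or did not match.
    DESTROYS whatever was stored in that range.
    """
    block = bytes([pattern]) * BLOCK
    bad = []
    with open(path, "r+b") as f:
        for off in range(0, size, BLOCK):
            f.seek(off)
            f.write(block[:min(BLOCK, size - off)])
        f.flush()
        os.fsync(f.fileno())          # push the writes out to the device
        for off in range(0, size, BLOCK):
            n = min(BLOCK, size - off)
            try:
                f.seek(off)
                ok = f.read(n) == block[:n]
            except OSError:           # an I/O error counts as a bad block
                ok = False
            if not ok:
                bad.append(off)
    return bad


def full_test(path, size):
    """Run one pass per pattern; return sorted offsets of all bad blocks."""
    bad = set()
    for pattern in PATTERNS:
        bad.update(verify_pass(path, size, pattern))
    return sorted(bad)
```

Try it on a scratch file first, for example full_test("/tmp/scratch.img",
10000) on a file of that size; pointing it at a device node wipes the drive.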

> 
> 4, Should I replace a hard drive when I first see this kind of disk
> error message, in case data begins to be lost?

When you see this, it usually indicates that a drive has used up all its
spares. When you do see this:
1) Back up your data.
2) Watch for another R/W failure.
3) Depending on the nature of the drive and system, have a new drive
ready.
4) Don't assume the drive has failed or lost sectors. I have had drives
that were "thrown out" when all that was really needed was a factory
"low level format", which rechecks all sectors. (This is not a true low
level format, which can only be done at the factory or other facility
with the proper equipment.)
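
With thousands of drives, "watch for another R/W failure" is easier if
error lines are counted per device. The Python sketch below does that for
lines in the style the kernel's IDE driver writes to the system log. The
sample lines, host name, regular expression, and function name are all
assumptions made for illustration; they are not taken from the logs
discussed in this thread, and real messages may be worded differently.

```python
import re
from collections import Counter

# Matches a device name such as hda or sdb followed by text containing
# the word "error", e.g. "hdc: dma_intr: error=0x40 { ... }".
ERROR_RE = re.compile(r"\b((?:hd|sd)[a-z]): .*\berror\b", re.IGNORECASE)


def errors_per_drive(lines):
    """Count log lines that mention an error, keyed by device name."""
    counts = Counter()
    for line in lines:
        match = ERROR_RE.search(line)
        if match:
            counts[match.group(1).lower()] += 1
    return counts


# Illustrative sample lines only.
SAMPLE_LINES = [
    "Apr 30 14:02:11 box1 kernel: hdc: dma_intr: status=0x51 "
    "{ DriveReady SeekComplete Error }",
    "Apr 30 14:02:11 box1 kernel: hdc: dma_intr: error=0x40 "
    "{ UncorrectableError }, LBAsect=1234567, sector=1234504",
    "Apr 30 14:05:40 box1 kernel: hda: dma_intr: status=0x51 "
    "{ DriveReady SeekComplete Error }",
    "Apr 30 14:06:00 box1 kernel: hda: 156301488 sectors (80026 MB)",
]

if __name__ == "__main__":
    for drive, count in errors_per_drive(SAMPLE_LINES).most_common():
        print(drive, count)
```

A drive whose count keeps growing after the first backup is the one to
swap out first.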

>  Thanks a LOT...
> 
> --Guolin Cheng
> 
Snip

-- 
jludwig <wralphie at comcast.net>

