
Re: [Fwd: F7: Howto monitoring a Hardware sata raid controller]



Tim:
>> As far as I can tell, you can get SMART to check drives, but it offers
>> nothing to fix them up.

Tony Nelson:
> SMART's automatic offline test will scan the drive's surface every 4
> hours, correcting soft errors and remapping any bad sectors that could
> still be read.

Well, the manual off-line test didn't do anything helpful about
correcting errors when I tried it.  It discovered some, but that was
all.

> However, the main goal of SMART is to give about a day's warning of
> impending drive failure, so if SMART has anything to say it's usually
> rather late to do much repair, but rather time to make one last
> backup.

I tend to agree.  However, I've had SMART complain about a drive that
nothing else had a problem with.  I've had a dodgy drive that SMART
didn't warn about.  I've had a dodgy drive that SMART did warn about,
yet it also said the drive passed the health checks.  :-\  I don't have
much faith in it anymore.

I was given a few duff drives a while back.  Things like that are
always useful to experiment with.  I did find that using dd if=/dev/zero
of=/dev/hdb to write to all sectors on the drive caused the drive to
sort itself out.  Afterwards it passed the SMART check with flying
colours.
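
A rough sketch of that sequence, for illustration only.  It just builds
the command lines and runs nothing, because the dd step wipes the whole
drive.  The function name is hypothetical, and the two smartctl steps
(-t long to start a self-test, -H for the overall health verdict) are an
assumption about how one might re-check the drive afterwards, not
necessarily the exact check used here.

```python
def zero_and_check_commands(device):
    """Return the commands for a zero-fill followed by SMART checks.

    Nothing is executed here: the dd step irreversibly wipes `device`.
    """
    if not device.startswith("/dev/"):
        raise ValueError("expected a block device path such as /dev/hdb")
    return [
        # Write zeros to every sector, as in the dd command above.
        ["dd", "if=/dev/zero", f"of={device}"],
        # Assumption: start a long self-test.  This only kicks it off;
        # the test itself runs inside the drive and can take hours.
        ["smartctl", "-t", "long", device],
        # Assumption: ask for the overall health verdict.
        ["smartctl", "-H", device],
    ]


for argv in zero_and_check_commands("/dev/hdb"):
    print(" ".join(argv))
```

Printing the commands first and then running them by hand leaves one
last chance to check the device name.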

The drives have since been reliable in two test boxes running FC4, 6
& 7 over several months.  Before the zeroing, they were continuously
getting errors like the ones below, and (as expected) mangling file
contents.

  hdb: drive_cmd: error=0x04 { DriveStat ...:  17 Time(s)
  hdb: drive_cmd: status=0x51 { DriveReady SeekComplete Error } ...:  17 Time(s)
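
For anyone wondering how the names in braces relate to the hex value,
here is a small sketch that decodes the status byte.  The bit names are
the ones I believe the old Linux IDE driver printed, so treat the table
as an assumption; decode_status is a made-up helper name.

```python
# ATA status register bits, with the names assumed to match what the
# old Linux IDE driver printed in its log messages.
STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]


def decode_status(value):
    """Return the names of the bits that are set in a status byte."""
    return [name for mask, name in STATUS_BITS if value & mask]


print(decode_status(0x51))  # ['DriveReady', 'SeekComplete', 'Error']
```

0x51 is 0x40 + 0x10 + 0x01, which matches the three names in the second
log line.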

The first zeroing resulted in slightly fewer sectors *working* than
were tried.  The second zeroing used the same number as *worked* the
first time, i.e. bad blocks had been mapped out, the drive was (now)
slightly smaller, and that portion of the drive was okay for use.
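
One way to make that comparison less error-prone is to pull the counts
out of dd's closing summary.  This is only a sketch: it assumes the
usual "N+M records in" / "N+M records out" lines that dd prints on
stderr, the helper names are hypothetical, and the sample figures are
invented rather than taken from these drives.

```python
import re

# Matches dd's summary lines, e.g. "1000+0 records in".
RECORDS = re.compile(r"^(\d+)\+(\d+) records (in|out)$", re.MULTILINE)


def dd_record_counts(stderr_text):
    """Return {'in': n, 'out': n}, counting full plus partial records."""
    counts = {}
    for full, partial, direction in RECORDS.findall(stderr_text):
        counts[direction] = int(full) + int(partial)
    return counts


def shortfall(stderr_text):
    """Records read from the input minus records actually written."""
    counts = dd_record_counts(stderr_text)
    return counts["in"] - counts["out"]


# Invented sample output, not real figures from these drives.
sample = """\
1000+0 records in
998+0 records out
"""
print(dd_record_counts(sample))  # {'in': 1000, 'out': 998}
print(shortfall(sample))         # 2
```

With no bs= given, dd works in 512-byte records, so on drives like these
one record should correspond to one sector.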

Before someone points out the obvious, these are test/play boxes.  I'd
be loath to trust doing this for drives with data I wanted to keep.

-- 
[tim bigblack ~]$ rm -rfd /*^H^H^H^H^H^H^H^H^H^Huname -ipr
2.6.21-1.3228.fc7 i686 i386

Using FC 4, 5, 6 & 7, plus CentOS 5.  Today, it's FC7.

Don't send private replies to my address, the mailbox is ignored.
I read messages from the public lists.



