Catastrophic disk failure, where was smartd?

Thu Mar 27 17:14:58 UTC 2008

Les Mikesell wrote:
> Roger Heflin wrote:
>>
>>>> The big issue is that most of the smart implementations don't scan 
>>>> the disk for bad blocks. In my experience several years ago, with 
>>>> 1000+ disks in service, the #1 failure was bad blocks, and smart 
>>>> did little to catch that.  The #2 failure was failure to spin up 
>>>> at all, but this seemed to be confined to certain batches.
>>>
>>> Isn't that what the long surface scan test is supposed to do?
>>>
>>
>> Probably.   I started using a dd test before disks, Linux, and other 
>> OSes supported smart.   It works on any disk (or array) whether smart 
>> works or not.
> 
> That only catches 'hard' errors.  Modern drives have spare sectors and 
> the ability to remap soft errors internally, up to a point, before the 
> OS knows anything about them.  If the OS (or dd) sees an error, it means 
> you've used up the spares or the internal retries weren't able to fix 
> it.  The smart interface is supposed to let you know how far along you are 
> in using up the internal correction and how often soft errors are hidden 
> by the retries.  It seems good in theory, and if it predicts the drive 
> is going bad you should probably believe it.  But, I think a lot of 
> drives fail faster than the internal corrections can handle so you often 
> don't get any warning.
> 
Yes, but dd will force the drive electronics to read the partially bad 
sector(s), and those same drive electronics will remap a sector if they 
decide it is bad enough.  You may not see an error, but the dd will have 
made the drive check the sector.  If you get a hard error then yes, you 
are in trouble.
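
For illustration, here is a minimal sketch of that kind of read pass in 
Python, roughly what a plain dd from the device to /dev/null does.  The 
function name read_scan and the 1 MiB block size are my own choices for 
the example, not part of any tool, and it assumes a Unix-like system 
where os.pread is available.  Blocks already in the page cache may be 
served from memory rather than from the disk, so treat this as a sketch 
rather than a thorough surface test.

```python
import os


def read_scan(path, block_size=1024 * 1024):
    """Read every block of `path` once and discard the data.

    Returns (bytes_read, bad_offsets), where bad_offsets holds the start
    offset of each block the OS reported as unreadable (a hard error).
    Soft errors that the drive fixes or remaps internally are not visible
    here; the read just gives the drive a reason to look at each sector.
    """
    bytes_read = 0
    bad_offsets = []
    fd = os.open(path, os.O_RDONLY)
    try:
        # Seeking to the end gives the size of a regular file or, on
        # Linux, of a block device.
        size = os.lseek(fd, 0, os.SEEK_END)
        offset = 0
        while offset < size:
            try:
                chunk = os.pread(fd, block_size, offset)
            except OSError:
                # Hard error: note the block and carry on with the next one.
                bad_offsets.append(offset)
                offset += block_size
                continue
            if not chunk:
                break  # unexpected end of data
            bytes_read += len(chunk)
            offset += len(chunk)
    finally:
        os.close(fd)
    return bytes_read, bad_offsets
```

Calling read_scan on a device node (as root) returns the number of bytes 
read and a list of offsets that gave hard errors.  An empty list only 
means the OS saw no errors; it says nothing about how many sectors the 
drive remapped along the way.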

                              Roger