SATA - System Freezes

Roger Heflin rogerheflin at gmail.com
Fri Jun 20 17:07:16 UTC 2008


Henry Ritzlmayr wrote:
> On Thursday, 19.06.2008 at 09:52 -0600, Robin Laing wrote:
>> Henry Ritzlmayr wrote:
>>> On Tuesday, 17.06.2008 at 13:25 -0400, Jorge Fábregas wrote:
>>>> Hello Everyone,
>>>>
>>>> I'm running Fedora 8 and my system freezes (for about 20 to 40 seconds) a 
>>>> couple of times a day. When it does I see this in /var/log/messages:
>>>>
>>>> ------------------------------- cut here -------------------------------------
>>>>
>>>> kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
>>>> kernel: ata3.00: cmd ca/00:50:67:85:03/00:00:00:00:00/e0 tag 0 dma 40960 out
>>>> kernel:          res 40/00:00:76:6c:03/84:00:10:00:00/e0 Emask 0x4 (timeout)
>>>> kernel: ata3.00: status: { DRDY }
>>>> kernel: ata3: port is slow to respond, please be patient (Status 0xd0)
>>>> kernel: ata3: device not ready (errno=-16), forcing hardreset
>>>> kernel: ata3: soft resetting link
>>>> kernel: ata3.00: configured for UDMA/33
>>>> kernel: ata3: EH complete
>>>> kernel: sd 2:0:0:0: [sdc] 321672960 512-byte hardware sectors (164697 MB)
>>>> kernel: sd 2:0:0:0: [sdc] Write Protect is off
>>>> kernel: sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't 
>>>> support DPO or FUA
>>>>
>>>> ------------------------------- cut here -------------------------------------
>>>>
>>>> /dev/sdc is my main drive. The only thing I can think of is that this drive 
>>>> is actually a PATA drive connected to the SATA controller on the motherboard 
>>>> through a "SATA-TO-IDE Adapter" attached to the drive. Perhaps the converter 
>>>> is faulty, or could this be a known issue with libata?  Has anyone had the 
>>>> same problem?
>>>>
>>>> Thanks,
>>>> Jorge
>>> Many months ago I had the exact same output. Lots of Google voodoo and
>>> trial and error solved it. My issue was that on one outlet of the power
>>> supply there were too many (3) drives connected. After recabling, it all
>>> went away. Others claimed that they got rid of the problem by refitting
>>> the SATA cables.
>>>
>>> Henry
>>>
>> Henry,
>>
>> I was just about to suggest checking the power supply.  I had a power 
>> supply that wouldn't supply enough voltage on the 5V rail.  My system 
>> would freeze.  Turned out to be a known fault with that brand of 
>> power supplies.
>>
>> Took two power supplies to find out that it was a known fault.  Argh. 
>> Warranties are useless on some products.  I also learned that the sensor 
>> voltages were not accurate in the BIOS in comparison to a digital 
>> voltmeter on the actual power cable.
>>
>> -- 
>> Robin Laing
> 
> What I didn't like (and still don't) is the fact that there is no indication
> that this could be even slightly related to the power supply. As stated above,
> it was more trial and error that solved this issue. Hopefully this also
> solves the issue for the OP.
> 
> Question to the devs - can you think of any way the kernel output could be
> a bit more informative, or don't you get enough information from the
> hardware for such an issue? I also checked SMART for unusual power
> cycle counts but to no avail. 
> 
> Henry
> 
The problem with power supplies is that they often don't fail outright. If the 
voltage only sags, some operations fail and others don't, and most components 
never notice that the supply was low for too long. A device may misbehave only 
for the short period of low voltage and be fine the next second, or if it does 
fail completely the OS may still be able to reset it and bring it back up, but 
from the HW's point of view there was never a complete power failure. None of 
the normal voltage monitoring devices sample the supply voltages continuously 
and verify that they stayed good the entire time; they only check when someone 
looks. All that really matters is that for one very short period the voltage 
was too low, and that was enough to upset some device and cause trouble.
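
To illustrate the sampling point, here is a rough Python sketch of a logger 
that remembers the lowest and highest reading it has seen instead of only the 
current one. Everything named here is an assumption for illustration: the 
`RailWatcher` class, the `read_millivolts` callable (however you get a reading 
on your board), and the 5% tolerance. Even a loop like this only catches dips 
that last longer than its polling interval, and it is only as good as the 
sensor behind it.

```python
import time
from typing import Callable, Optional


class RailWatcher:
    """Track the min/max voltage seen on one rail across many samples."""

    def __init__(self, nominal_mv: float, tolerance: float = 0.05):
        # tolerance is an assumed fraction of nominal, e.g. 0.05 for +/-5%
        self.low_limit = nominal_mv * (1.0 - tolerance)
        self.high_limit = nominal_mv * (1.0 + tolerance)
        self.min_seen: Optional[float] = None
        self.max_seen: Optional[float] = None
        self.violations = 0
        self.samples = 0

    def record(self, mv: float) -> bool:
        """Record one sample; return True if it was outside the limits."""
        self.samples += 1
        if self.min_seen is None or mv < self.min_seen:
            self.min_seen = mv
        if self.max_seen is None or mv > self.max_seen:
            self.max_seen = mv
        bad = mv < self.low_limit or mv > self.high_limit
        if bad:
            self.violations += 1
        return bad


def watch(read_millivolts: Callable[[], float], watcher: RailWatcher,
          interval_s: float = 0.01, count: int = 1000) -> RailWatcher:
    """Poll a (hypothetical) sensor reader `count` times and keep the extremes."""
    for _ in range(count):
        watcher.record(read_millivolts())
        time.sleep(interval_s)
    return watcher
```

The point of keeping `min_seen` is that a single bad sample stays visible 
afterwards, which a check done only "when someone looks" would miss.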

I have seen a 110V AC outage cause a remote controlled power switch to switch 
off all of its relays while the internal computer running those relays still 
reported them all as on. It did not reboot, had no feedback on the relay 
positions, and so had no idea the relays internal to it were switched off. 
Obviously in this case the relays were more sensitive to voltage issues than 
the computer running them. That is likely a design issue: you really want 
either the computer to go off first, or the computer to have actual feedback 
on the relay positions so it knows something went wrong.

I have also seen a power supply that was undersized on a certain voltage take 
the ethernet offline. The kernel reported that the ethernet was screwed up, but 
had no idea why and was unable to reset it, so a reboot was required to get 
ethernet back again. Other than the ethernet going offline nothing else looked 
wrong with the machines, no other failures could be found, and absolutely 
nothing indicated that there were any voltage issues.
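
Since the logs in cases like these say what failed but not why, about the only 
thing to do with them is count events and see whether the count drops after 
recabling or swapping the supply. A minimal sketch in Python, assuming the 
message format matches the ata3 lines quoted at the top of this thread; the 
function name and the regular expression are mine, not anything from libata.

```python
import re
from collections import Counter
from typing import Iterable

# Matches e.g. "kernel: ata3.00: exception Emask 0x0 SAct 0x0 ..."
# Assumption: the port name is "ata<N>", optionally followed by ".<dev>".
_EXCEPTION_RE = re.compile(r"\b(ata\d+)(?:\.\d+)?: exception Emask")


def count_ata_exceptions(lines: Iterable[str]) -> Counter:
    """Count libata 'exception' log lines per ATA port."""
    counts: Counter = Counter()
    for line in lines:
        match = _EXCEPTION_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Log path as mentioned in the original report; reading it usually needs root.
    with open("/var/log/messages", errors="replace") as log:
        for port, n in sorted(count_ata_exceptions(log).items()):
            print(f"{port}: {n} exception(s)")
```

This will not tell you the cause, only whether a hardware change made the 
resets less frequent.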

                             Roger



