Kernel bug or disk failure

Sam Varshavchik mrsam at courier-mta.com
Sun Jul 13 14:51:14 UTC 2008


Chris Snook writes:

> Sam Varshavchik wrote:
>> Every other week or so, I get a disk kicked out of my RAID, with this:
>> 
>> Jul  6 04:05:38 commodore kernel: (scsi1:A:0:0): scsi1: device overrun 
>> (status 10) on 0:0:0
>> Jul  6 04:05:38 commodore kernel: Unexpected busfree in DT Data-in 
>> phase, 1 SCBs aborted, PRGMCNT == 0x22f
>> Jul  6 04:05:38 commodore kernel: >>>>>>>>>>>>>>>>>> Dump Card State 
>> Begins <<<<<<<<<<<<<<<<<
>> Jul  6 04:05:38 commodore kernel: scsi1: Dumping Card State at program 
>> address 0x22d Mode 0x22
>> Jul  6 04:05:38 commodore kernel: Card was paused
>> 
>> … followed by a rather dry dump of the HBA's registers. This is aic79xx.
>> 
>> This does not look like a disk error to me. I re-add the drive into the 
>> array, and rebuild with no downtime. SMART shows 0 in the defect list on 
>> this drive, and over the disk's lifetime 0 uncorrectable reads and 1 
>> uncorrectable write -- but this kernel barf has already happened 4-5 times 
>> now, and it's getting rather annoying.
>> 
> 
> Looks more like a controller problem than a drive problem.  Do you have a spare 
> HBA to test?

No, but I have one on order now. I reseated the cable, but that didn't help:
the card dumped its state again about 12 hours later. That time it was
apparently non-fatal, because the RAID did not degrade.
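
To track how often the card dumps while waiting for the spare HBA, something
like the following rough sketch could tally the relevant kernel messages per
day. It assumes each syslog entry is on a single line in the format quoted
above; the function name count_hba_events and the marker list are made up for
illustration and are not part of any existing tool.

```python
import re
from collections import Counter

# Syslog prefix as seen in the quoted log, e.g.
# "Jul  6 04:05:38 commodore kernel: <message>"
LINE_RE = re.compile(
    r"^(\w{3})\s+(\d{1,2})\s+(\d{2}:\d{2}:\d{2})\s+(\S+)\s+kernel:\s+(.*)$"
)

# Substrings taken from the messages quoted in this thread (an assumption
# about which ones matter; adjust to taste).
MARKERS = ("Dump Card State", "device overrun", "Unexpected busfree")


def count_hba_events(lines):
    """Return a Counter keyed by (month, day, marker) for matching lines."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # not a kernel syslog line in the expected format
        month, day, _time, _host, msg = m.groups()
        for marker in MARKERS:
            if marker in msg:
                counts[(month, int(day), marker)] += 1
    return counts
```

Assuming the log lives in /var/log/messages, it could be fed in with
count_hba_events(open("/var/log/messages")) and the resulting counts compared
before and after swapping the controller.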
