Reproducible Filesystem Corruption on FC4 (Long)

John Wendel jwendel10 at comcast.net
Thu Jun 30 04:40:37 UTC 2005


Tom Sightler wrote:
> On Wed, 2005-06-29 at 20:19 -0300, Ben Steeves wrote: 
> 
>>On 6/29/05, Tom Sightler <ttsig at tuxyturvy.com> wrote:
>>
>>>I decided to reinstall and try again.  This time, immediately after the
>>>install I ran fsck and found no errors.  I copied my directories from my
>>>backup again, and the corruption also returned.  I repeated again; this
>>>time I booted with ide=nodma before restoring my backup, which caused the
>>>restore to take so long that I wasn't sure it would ever finish.  I did
>>>not get corruption, but the system was far too slow to use with this
>>>option.
>>
>>This really, really sounds like a hardware problem.  I would check
>>your /var/log/messages and see what smartctl has to say about your
>>drives.  I'd also check the status of the drivers for your USB
>>controller chipset, since if it is a software bug, that's probably
>>where the problem lies.
> 
> 
> I would agree that it sounds that way, but I simply don't think this is
> the case.  For one thing, if it were a hardware problem, the system
> wouldn't work with CentOS 4 or FC3 either, but both of those install and
> run fine.  I use this system 12-14 hours a day with CentOS 4 and have
> never experienced a single glitch.
> 
> There were absolutely no errors in /var/log/messages or in dmesg with
> regard to the hardware.  Everything appeared to be working 100%
> correctly; it just silently corrupted the data, time and time again.
> 
> I reinstalled CentOS 4, performed the identical steps, and everything
> works perfectly.  I can also install FC3 and perform the steps without
> issues.  With FC4, however, I get silent corruption every time I restore
> my data from the USB device.
> 
> I suppose it's possible that there is some issue with reading from the
> USB drive.  I found some notes claiming that recent improvements in the
> usb-storage driver push the hardware harder and can sometimes expose USB
> chipset problems that were previously hidden.  I could possibly buy
> this, but even if the source drive is corrupt, that shouldn't corrupt
> the drive you're writing to, and in my case it's the internal IDE drive
> that's being corrupted.  I can absolutely hammer this drive for days
> with CentOS 4 without even a slight glitch and zero corruption.
> 
> I'm going to try again tonight by installing FC4 and then replacing the
> kernel before doing the restore; that should give me a good clue.
> 
> Thanks,
> Tom
> 
> 

If you're in the mood for kernel swapping, I suspect that the kernel 
developers would like to know if this bug exists in the latest 
kernel.org kernel (2.6.13-rc1, I think). I think the USB maintainer 
(Greg KH) just committed a giant load of patches, so it would be a good 
test.

If you can reproduce the bug with the latest kernel, report it to 
"greg at kroah.com" and "linux-kernel at vger.kernel.org". They need to know 
the details of the USB host controller and the interface chip in the 
external disk enclosure.
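
Since the corruption is silent, it may help to have an objective check
for "reproduced" before reporting. The following is only a rough sketch,
not anything from the original reports: it hashes every file under the
backup and compares it with the restored copy. The function names
(file_digest, compare_trees) and the two directory arguments are made up
for illustration, and it assumes both trees are mounted and readable.

```python
import hashlib
import os
import sys


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_trees(source_root, restored_root):
    """Compare each regular file under source_root with its restored copy.

    Returns (missing, mismatched): sorted lists of paths relative to
    source_root. Symlinks and special files are skipped.
    """
    missing, mismatched = [], []
    for dirpath, _dirnames, filenames in os.walk(source_root):
        for name in filenames:
            src = os.path.join(dirpath, name)
            if os.path.islink(src) or not os.path.isfile(src):
                continue
            rel = os.path.relpath(src, source_root)
            dst = os.path.join(restored_root, rel)
            if not os.path.isfile(dst):
                missing.append(rel)
            elif file_digest(src) != file_digest(dst):
                mismatched.append(rel)
    return sorted(missing), sorted(mismatched)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: script.py /path/to/backup /path/to/restored
    missing, mismatched = compare_trees(sys.argv[1], sys.argv[2])
    for rel in missing:
        print("MISSING    " + rel)
    for rel in mismatched:
        print("MISMATCH   " + rel)
    sys.exit(1 if (missing or mismatched) else 0)
```

Running the same comparison under a kernel that works and one that does
not gives a list of affected files to include in the report. Note that
recently written data may be served from the page cache, so it is
probably worth unmounting and remounting (or rebooting) before comparing.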

Thanks for helping make Linux better!

John



