
Re: Problems with ext3 fs



On Thu, 2002-04-25 at 13:44, Andreas Dilger wrote:
> On Apr 25, 2002  09:14 -0400, Darrell Michaud wrote:
> > There are three data partitions on each drive, each of which is
> > mirrored. (/boot, /, and /home). Over time, depending on overall disk
> > use and NOT on use on a particular filesystem, the / filesystem becomes
> > corrupt. Strangely enough, I can run bonnie, dd tests, copies, etc. all
> > day on the /home and /boot ext3 filesystems and they have never become
> > corrupt; only the / partition does.
> >
> > I have a lot of data points for this behavior.. 8 of these machines, all
> > identical in configuration, exhibit the same symptoms.
> 
> Hmm, that does seem ominous.  Did you ever notice if your corruption is
> actually _related_ to the use of e2fsck on the root partition and/or
> crashes, or does it get corrupted even after normal usage?
> 
> Have you checked the l-k archives for any possible DMA/IDE problems on
> your chipset?  Have you tried booting with "ide=nodma" as a kernel
> option to see if that helps?
> 

I did not spend too much time dumbing down the DMA, mostly because the
consequences of that, even if it worked, would be unacceptable. 

I did try using a "normal" partition instead of the software raid,
however, and that has resolved the corruption completely. I'm still
hoping that there's a workaround that would allow the use of software
raid with these drives on this chipset (i860) with ext3, but for the 
time being a non-raid setup is ok.

I don't *think* that the corruption was related to the use of fsck,
because I only started checking the other 7 systems after extensively
testing one. There's always the chance that it could be a cruel truth,
however :P

> > What's aggravating my problem is that for some reason the root
> > filesystem is only fsck'd on boot when a power-off event occurs.
> 
> Well, e2fsck _should_ run all the time, but it will normally report
> a clean filesystem and continue.  If there was a crash it will normally
> report something like "journal recovered" and then clean filesystem.
> It may be that your startup scripts are too "nice" and hide the output
> from fsck for you.

Yes, you're right. e2fsck does run all the time and "recover the
journal" without mention of any problems. However, if I boot from a
different raid/ext3-aware boot/root source and perform a full, manual
fsck on the root filesystem it will tell me that there are errors that
need to be repaired. The number and severity of the errors grow over
time, but they are never detected by the "journal" fsck.

> 
> > If I manually set needs_** with hdparm it is ignored (or possibly reset
> > upon a clean shutdown). I posted these symptoms last month in hopes that
> > someone had seen them before. I got some hints to check my /etc/fstab
> > file to make sure / gets fsck'd, but that was ok.
> 
> hdparm?  You can tell e2fsck to run a full fsck on each boot in several
> ways:
> 1) create a /forcefsck file (you may have to do this on each boot)
> 2) use "tune2fs -c 1 <dev>" to force an fsck each mount
> 3) create a file /fsckoptions with "-f" in it
> 4) create a file /etc/sysconfig/autofsck with "AUTOFSCK_DEF_CHECK=yes" in it

You're right.. I meant to say tune2fs and not hdparm. There's a
RedHat-ism that gives you the option of performing a manual, full fsck
instead of simply recovering the journal. However, I've only been able
to get this feature to activate on an unclean shutdown regardless of
whether or not the filesystem is flagged to need recovery. I'll try the
options that you mention above and see if that does the trick.
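
For anyone following along, here is a rough sketch of the four quoted
suggestions as one shell function. The function name and its two
arguments (a root directory and a device) are placeholders of mine, not
anything from the init scripts, and as noted in the quoted reply the
/fsckoptions and /etc/sysconfig/autofsck files may only be honored on
RedHat. The tune2fs command is only printed, not run, because it
changes the superblock.

```shell
#!/bin/sh
# Sketch only: setup_force_fsck, "root" and "dev" are hypothetical names.
# Usage: setup_force_fsck <root-dir> <device>
setup_force_fsck() {
    root="${1%/}"   # strip a trailing slash so "/" becomes an empty prefix
    dev="$2"

    # 1) Flag file asking for a full check (may need recreating each boot).
    touch "$root/forcefsck"

    # 3) Extra options for fsck; "-f" forces a check even if marked clean.
    echo "-f" > "$root/fsckoptions"

    # 4) Default answer for the boot-time fsck prompt (may be RedHat specific).
    mkdir -p "$root/etc/sysconfig"
    echo "AUTOFSCK_DEF_CHECK=yes" > "$root/etc/sysconfig/autofsck"

    # 2) Maximum mount count of 1 forces a check on every mount.
    #    Printed instead of executed so nothing touches the device by accident.
    echo "tune2fs -c 1 $dev"
}
```

Calling it as "setup_force_fsck / <dev>" as root would place the files
on the live system; pointing it at a scratch directory first is a safe
way to see what it writes.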

> 
> The #3 and #4 options may be RedHat specific.
> 
> > > md1 : active raid5 ide/host2/bus1/target0/lun0/part1[2] 
> 
> Both of you are using MD RAID.  Is there a possibility to disable MD
> raid on the root device and see if this fixes things?  This is obviously
> a lot easier to do on the mirrored root filesystem (change back to using
> one of the raw devices instead of the MD device, and disable MD for that
> device, including RAID autostart where you need to change the partition
> type).

This was going to be my next troubleshooting step.. Wish me luck :P


> 
> Cheers, Andreas
> --
> Andreas Dilger
> http://www-mddsp.enel.ucalgary.ca/People/adilger/
> http://sourceforge.net/projects/ext2resize/





