ext3 fs errors 3T fs

Sun Jan 22 19:25:25 UTC 2006

On Jan 20, 2006  23:07 -0800, Dennis Williams wrote:
> After the fsck finished this evening there were no final statements
> referring to problems.  I remounted the filesystem without any errors.
> After noticing that there were a number of files missing, I started to
> attempt to recover from the lost+found directory.  I was repeatedly able
> to get the filesystem to error and remount read-only when find
> traversed a specific directory in lost+found.  This is the error message I
> received from /var/log/messages:
> 
> Jan 21 16:00:26 terrorbytes kernel: EXT3-fs error (device md0):
> ext3_readdir: bad entry in directory #73117155: directory entry across
> blocks - offset=0, inode=0, rec_len=8196, name_len=84
> Jan 21 16:00:26 terrorbytes kernel: Aborting journal on device md0.
> Jan 21 16:00:26 terrorbytes kernel: ext3_abort called.
> Jan 21 16:00:26 terrorbytes kernel: EXT3-fs abort (device md0):
> ext3_journal_start: Detected aborted journal
> Jan 21 16:00:26 terrorbytes kernel: Remounting filesystem read-only
> 
> 1) Can someone explain what this means, and/or why it might happen?
> 2) Why might this condition exist even after a successful fsck?
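
On the first question, the message comes from the sanity check the kernel
runs on every directory entry it reads.  The sketch below is a simplified
Python paraphrase of that check, not the kernel source: the function names
are made up for illustration, and the 4096-byte block size is an
assumption (the thread does not state it).  The entry values and the inode
count are taken from the log line and the e2fsck summary quoted in this
thread.

```python
def dir_rec_len(name_len):
    """Minimum record length for a name: 8-byte header plus name, rounded up to 4."""
    return (name_len + 8 + 3) & ~3


def check_dir_entry(offset, inode, rec_len, name_len, block_size, inodes_count):
    """Return an error string for a bad entry, or None if the entry looks sane."""
    if rec_len < dir_rec_len(1):
        return "rec_len is smaller than minimal"
    if rec_len % 4 != 0:
        return "rec_len % 4 != 0"
    if rec_len < dir_rec_len(name_len):
        return "rec_len is too small for name_len"
    if offset + rec_len > block_size:
        # The entry claims to extend past the end of its directory block.
        return "directory entry across blocks"
    if inode > inodes_count:
        return "inode out of bounds"
    return None


# Values from the kernel log above; block_size=4096 is an ASSUMPTION.
result = check_dir_entry(offset=0, inode=0, rec_len=8196, name_len=84,
                         block_size=4096, inodes_count=403685856)
print(result)
```

A rec_len of 8196 at offset 0 cannot fit in a single block of 4096 bytes
(or even 8192), so the block being read most likely does not contain real
directory data at all.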

In case it wasn't clear before (I thought it was), you are having problems
because this fs is > 2TB.  Why, I'm not sure: it may relate to LVM/MD,
it may be the block layer, or it may be an ext3 bug.  The fact that it
happens at 2TB makes it look like a bug in the block layer or lower.
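
To illustrate why 2TB is a suspicious boundary: with 512-byte sectors, a
32-bit sector number can address exactly 2 TiB.  The arithmetic below is
only a sketch of that reasoning.  It assumes 4 KiB filesystem blocks (not
stated in this thread) and uses a hypothetical truncated_sector() helper
to show what a layer that keeps only 32 bits would do; nothing here
confirms that this is the actual bug.

```python
SECTOR_SIZE = 512        # bytes per sector, the traditional block-layer unit
FS_BLOCK_SIZE = 4096     # ASSUMPTION: 4 KiB ext3 blocks
FS_BLOCKS = 805797888    # total block count from the e2fsck summary in this thread

# Largest device size addressable with a 32-bit sector number.
limit_bytes = (2 ** 32) * SECTOR_SIZE
fs_bytes = FS_BLOCKS * FS_BLOCK_SIZE


def truncated_sector(sector):
    """Sector number as seen by a hypothetical layer that keeps only 32 bits."""
    return sector & 0xFFFFFFFF


first_sector_past_limit = limit_bytes // SECTOR_SIZE

print("32-bit sector limit (TiB):", limit_bytes / 2 ** 40)
print("filesystem size (TiB):", round(fs_bytes / 2 ** 40, 2))
print("first sector past the limit wraps to:",
      truncated_sector(first_sector_past_limit))
```

If sector numbers were truncated like this somewhere in the stack, I/O
beyond the 2 TiB mark would land near the start of the device, which
would corrupt data that fsck had just repaired.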

I would start by making a backup if you haven't already.

I think debugging it would be easiest if you had a backup and were
willing to overwrite the device with a test pattern.
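
One possible shape for such a test pattern, as a minimal sketch: stamp
every block with its own block number, then read everything back and
report blocks that hold some other block's stamp.  The function names and
the 4096-byte block size are assumptions for illustration, and the demo
runs against an in-memory buffer rather than a real device.  Writing this
to /dev/md0 would destroy all data on it.

```python
import io
import struct

BLOCK_SIZE = 4096  # ASSUMPTION: illustrative block size


def make_block(index, block_size=BLOCK_SIZE):
    """Build one block filled with its own 64-bit little-endian block index."""
    return struct.pack("<Q", index) * (block_size // 8)


def write_pattern(dev, n_blocks, block_size=BLOCK_SIZE):
    """Overwrite the first n_blocks of dev with self-identifying blocks."""
    dev.seek(0)
    for i in range(n_blocks):
        dev.write(make_block(i, block_size))


def verify_pattern(dev, n_blocks, block_size=BLOCK_SIZE):
    """Return (block index, index found there) for every mismatching block."""
    bad = []
    dev.seek(0)
    for i in range(n_blocks):
        data = dev.read(block_size)
        if data != make_block(i, block_size):
            found = struct.unpack_from("<Q", data)[0] if len(data) >= 8 else None
            bad.append((i, found))
    return bad


# Demo on an in-memory stand-in for the device.
dev = io.BytesIO()
write_pattern(dev, 8)
clean = verify_pattern(dev, 8)

# Simulate a misdirected write: block 6's data lands on block 2.
dev.seek(2 * BLOCK_SIZE)
dev.write(make_block(6))
aliased = verify_pattern(dev, 8)

print(clean)
print(aliased)
```

On a real device, the "found" index tells you where the misplaced data
was supposed to go, and the difference between the two block numbers
hints at which layer is wrapping addresses.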

If you can isolate the corruption to a single file or dir, you may get some
insight into the problem by running filefrag on it (or "stat {path}" in
debugfs).

> I am planning on running a fsck yet again.

That won't prevent the problems from recurring.

> 
> Sincerely,
> Dennis Williams
> 
> On Fri, 20 Jan 2006, Dennis Williams wrote:
> 
> >
> > > > The system has now been correcting errors for the past 12 hours.  I hope
> > > > when it finishes, it will mount without complaints.
> > >
> > > Never believe fsck here. It may spend several DAYS checking a heavily
> > > corrupted filesystem. For me (corrupted 120 GB ext3 partition),
> > > "fsck.ext3 -y" ran for 3 days before I interrupted it. In manual mode,
> > > avoiding 'duplicate inode clone' and answering yes to 'delete file',
> > > it took only 30 minutes.
> > >
> >
> > Just out of morbid curiosity what does 'duplicate inode clone' mean?  And
> > how does the fs get in that state?
> >
> > The fsck finished this morning with the following final statements:
> >
> > /dev/md0: ***** FILE SYSTEM WAS MODIFIED *****
> >
> > /dev/md0: ********** WARNING: Filesystem still has errors **********
> >
> > /dev/md0: 1472505/403685856 files (10.3% non-contiguous),
> > 673983041/805797888 blocks
> >
> > 1) Why would the fs still have errors?  Is it correct to assume that
> > running fsck again is the answer? (I hope so)
> >
> > 2) What does the last line of this message mean?
> >
> > I did notice that the fs mounted correctly after this with the following
> > errors in /var/log/messages:
> >
> > Jan 21 02:09:48 terrorbytes kernel: kjournald starting.  Commit interval 5
> > seconds
> > Jan 21 02:09:48 terrorbytes kernel: EXT3-fs warning (device md0):
> > ext3_clear_journal_err: Filesystem error recorded from previous mount: IO
> > failure
> > Jan 21 02:09:48 terrorbytes kernel: EXT3-fs warning (device md0):
> > ext3_clear_journal_err: Marking fs in need of filesystem check.
> > Jan 21 02:09:48 terrorbytes kernel: EXT3-fs warning: mounting unchecked
> > fs, running e2fsck is recommended
> > Jan 21 02:09:48 terrorbytes kernel: EXT3 FS on md0, internal journal
> > Jan 21 02:09:48 terrorbytes kernel: EXT3-fs: mounted filesystem with
> > ordered data mode.
> >
> > After unmounting the filesystem, I ran a standard fsck again:
> > terrorbytes:~ # e2fsck /dev/md0
> > e2fsck 1.34 (25-Jul-2003)
> > /dev/md0 contains a file system with errors, check forced.
> > Pass 1: Checking inodes, blocks, and sizes
> >
> > Thank you to everyone who has responded to my posts with their
> > suggestions.
> >
> > Sincerely,
> > Dennison Williams
> >
> > _______________________________________________
> > Ext3-users mailing list
> > Ext3-users at redhat.com
> > https://www.redhat.com/mailman/listinfo/ext3-users
> >
> 

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.