
Re: possible ext3 bug?



On Sun, Oct 14, 2001 at 10:11:06PM +0200, Michael Renner wrote:
> hiya!
> 
> i'm currently running 2.4.12-ac1 w/o any further ext3 patches.
> 
> i've got a dir with some fairly big files (like 15mb/file) in it and while
> sfv-checking these i got the following errors:
> 
> Oct 14 21:47:59 srck trottelkunde attempt to access beyond end of device
> Oct 14 21:47:59 srck trottelkunde 16:42: rw=0, want=600889688, limit=12289725
> after deleting the whole directory (which was very stupid, no chance for
> debugging at all) i got the following errors:
> 
> Oct 14 21:50:50 srck trottelkunde EXT3-fs error (device ide1(22,66)):
> ext3_free_blocks: Freeing blocks not in datazone - block = 1223964245, count = 1
> 
> Imho this seems to be an ext3 bug because there where no further errors
> regarding a disk-access problem

What these errors all point to is that an indirect block (more
likely), or possibly part of an inode table, got corrupted.  *Why*
that happened is a different question; it could have been caused by a
kernel bug in the buffer/page cache code, by a hardware problem (e.g.,
a disk block written to the wrong place, perhaps because of a bad
SCSI/IDE cable, or an actual hard drive failure causing garbage to be
written or read), or by a bug in ext3 itself.
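
To make the reasoning concrete, here is a rough sketch of the range
check behind both messages.  It assumes that "limit=" is the device
size in 1 KiB units and that the filesystem block size is at least
1 KiB; the function names are made up for illustration and are not
the kernel's own.

```python
# Values copied from the log lines quoted above.
DEVICE_LIMIT_KIB = 12289725   # "limit=" (assumed: device size in 1 KiB units)
WANTED_KIB = 600889688        # "want="  (assumed: same units as the limit)
FREED_BLOCK = 1223964245      # "block =" from the ext3_free_blocks message


def max_possible_blocks(device_kib, block_size_bytes=1024):
    """Upper bound on the number of filesystem blocks that fit on the device."""
    return device_kib * 1024 // block_size_bytes


def is_plausible_block(block, device_kib, block_size_bytes=1024):
    """True if `block` could be a valid block number on a device this size."""
    return 0 <= block < max_possible_blocks(device_kib, block_size_bytes)


def request_fits(want_kib, limit_kib):
    """True if a request ending at `want_kib` stays inside the device."""
    return want_kib <= limit_kib
```

With the logged values, request_fits(WANTED_KIB, DEVICE_LIMIT_KIB) and
is_plausible_block(FREED_BLOCK, DEVICE_LIMIT_KIB) are both false, even
with the smallest block size.  Block numbers that far out of range are
what you would expect if the block pointers were read from garbage.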

The real trick is figuring out how to reproduce the problem.  I
haven't seen other reports of this happening on recent kernels, so
this isn't a known problem.

The first thing to do is to unmount the filesystem and run e2fsck -f
on the device to make sure no other parts of the filesystem have been
corrupted.  After that, the main task is to push the filesystem hard,
see whether the problem recurs, and look for a pattern.
Another thing to try would be to drop back to an older kernel that's a
bit more trustworthy.  I'd suggest 2.4.9 plus ext3-0.9.6 --- that
version has been very stable for me and others.  If it still happens
there, that would point toward a hardware problem.
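
One possible way to push the filesystem hard is sketched below: write
a batch of large files, record a CRC32 for each (the same kind of
checksum an sfv check uses), then read them back and compare.  This is
only an illustration; the function names, file names and sizes are
assumptions, not an established test procedure.

```python
import os
import zlib


def write_files(directory, count, size_bytes, chunk=1 << 20):
    """Write `count` files of random data; return {path: crc32}."""
    checksums = {}
    for i in range(count):
        path = os.path.join(directory, f"stress_{i:04d}.bin")
        crc = 0
        remaining = size_bytes
        with open(path, "wb") as f:
            while remaining > 0:
                data = os.urandom(min(chunk, remaining))
                f.write(data)
                crc = zlib.crc32(data, crc)   # running CRC32 of what was written
                remaining -= len(data)
            f.flush()
            os.fsync(f.fileno())              # force the data out to the disk
        checksums[path] = crc
    return checksums


def verify_files(checksums, chunk=1 << 20):
    """Re-read every file; return the paths whose CRC32 no longer matches."""
    bad = []
    for path, expected in checksums.items():
        crc = 0
        with open(path, "rb") as f:
            while True:
                data = f.read(chunk)
                if not data:
                    break
                crc = zlib.crc32(data, crc)
        if crc != expected:
            bad.append(path)
    return bad
```

For example, write_files("/mnt/test", 50, 15 * 1024 * 1024) followed
by verify_files() on the result would roughly mimic the original
workload (the mount point is hypothetical).  Note that re-reads may be
served from the page cache, so it helps to write more data than the
machine has RAM, or to unmount and remount before verifying.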

						- Ted




