
Re: ext3 bug?



Theodore Ts'o wrote:

On Fri, Jan 16, 2004 at 02:37:06PM +0100, Simon Vogl wrote:


Sorry about this - yes, they are I/O errors. (Interestingly, I normally use LANG=C;
I don't know what reset it - some kind of Debian magic, I suppose.)



I/O errors are returned by the filesystem under a number of different conditions; in this case, it's likely they were simply caused by filesystem inconsistencies. How the filesystem became inconsistent is a different story.



The log files did not show anything; I didn't even get anything on the system console :(



In general, if userspace sees an "I/O Error" returned up to it, the kernel would have logged something; either a hardware I/O error, or a filesystem inconsistency warning. If you didn't see something, then either the relevant log files got corrupted and so were lost, or you aren't looking at the right logs, and/or the console has been configured to suppress certain (fairly important) levels of log messages. Basically, there really should have been *some* kind of log entries given the symptoms you described.

The usual cause for this kind of really massive inconsistency is a
hardware fault: either garbage is being written into the inode table,
or the block/sector address sent to the controller is getting
corrupted, so the data is being written to the wrong place, or memory
is getting corrupted and then being written out to disk.

It's possible that this might be caused by a filesystem bug, of
course, but I'm not aware of any other reports that match your report
at this point, and ext3 is fairly widely used.

So my first suggestion is to make sure that logging is working correctly,
since the fact that you didn't see any logs, especially on the
console, is highly suspect.  The Linux kernel is pretty verbose when
it's unhappy, and from what you described, the kernel should have been
extremely upset.  :-)
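
As an aside on the "console has been configured to suppress" point: on Linux the console threshold is the first of the four numbers in /proc/sys/kernel/printk, and a kernel message reaches the console only if its level is numerically lower than that threshold. The following is a minimal Python sketch for checking this, assuming the usual four-field layout of that file; the function names are made up for illustration.

```python
from pathlib import Path

# Kernel log levels, index == numeric level (0 is the most severe).
KERN_LEVELS = ["emerg", "alert", "crit", "err",
               "warning", "notice", "info", "debug"]

FIELD_NAMES = ("console_loglevel", "default_message_loglevel",
               "minimum_console_loglevel", "default_console_loglevel")


def parse_printk(text):
    """Parse the contents of /proc/sys/kernel/printk into a dict."""
    fields = [int(f) for f in text.split()]
    if len(fields) != len(FIELD_NAMES):
        raise ValueError("expected 4 integer fields, got %d" % len(fields))
    return dict(zip(FIELD_NAMES, fields))


def suppressed_levels(console_loglevel):
    """Return the level names that will NOT be printed on the console.

    A message is shown only if its level is < console_loglevel.
    """
    return [name for num, name in enumerate(KERN_LEVELS)
            if num >= console_loglevel]


if __name__ == "__main__":
    path = Path("/proc/sys/kernel/printk")
    if path.exists():
        settings = parse_printk(path.read_text())
        print(settings)
        print("suppressed on console:",
              suppressed_levels(settings["console_loglevel"]))
```

If the suppressed list includes "crit" or "err", the console could plausibly have stayed silent even while the kernel was complaining; the log files written by syslogd/klogd are a separate matter.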



Right, I thought so, too. On the other hand, the partition is mounted with the 'on error remount readonly'
option, which could be why I did not find anything... Nevertheless, I have the disk image, which I will look
at as soon as I get the machine restored (and I have another one that I can try to kill; maybe I can
reproduce it >:)
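
(For anyone following along: the 'on error remount readonly' behaviour is the ext2/ext3 mount option errors=remount-ro. Below is a rough Python sketch that pulls the errors= setting out of an fstab-style line; the device name in the example is hypothetical and the function name is made up. Note that the default error behaviour can also be stored in the superblock with tune2fs -e, in which case it will not appear in the options field at all.)

```python
def errors_behaviour(mount_line):
    """Return the value of the errors= option in an fstab/mounts-style
    line (e.g. 'remount-ro', 'continue', 'panic'), or None if absent.

    Expected fields: device, mount point, fs type, options, [dump, pass].
    """
    fields = mount_line.split()
    if len(fields) < 4:
        raise ValueError("not a mount table line: %r" % mount_line)
    for option in fields[3].split(","):
        if option.startswith("errors="):
            return option[len("errors="):]
    return None


if __name__ == "__main__":
    # Hypothetical example entry, not taken from the affected machine.
    line = "/dev/hda1 / ext3 defaults,errors=remount-ro 0 1"
    print(errors_behaviour(line))
```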


I am almost certain, btw, that it is a software problem - my workstation had the same problem some
weeks ago, but I did not think much about it - I simply ran fsck and cloned it from the master machine
once more. Since then, it has run without errors.


btw, does anyone know how I can get rid of those pesky files that stay around with broken bits in them? I can't
even remove those inodes as root - I moved the directories containing them into isolation...


Thanks for the help,
Simon

--
_______________________________________________________________________
Dr. Simon Vogl
Institut für Pervasive Computing, Johannes Kepler Universität Linz
Altenberger Straße 69, A-4040 Linz, Austria

Tel: +43 732 2468-8517, Fax: +43 732 2468-8426
mailto: vogl soft uni-linz ac at,  http://www.soft.uni-linz.ac.at/




