Frequent metadata corruption with ext3 + hard power-off

Mats Ahlgren mats_a at MIT.EDU
Sun Mar 18 01:42:17 UTC 2007


Hello.

I'm having serious issues with ext3; any insight would be greatly appreciated:


_____ Overview:

I believe ext3 is supposed to be recoverable in the case of a power failure by 
replaying the log.

However, on two separate computers (running different operating systems, too), 
this has been anything but the case.


_____ Specifics:

Sometimes, my kernel will hard-freeze and I'll have to do a hard reboot. When 
this happens, sometimes fsck will insist on running and find some orphaned 
inodes, which it will proceed to put in the /lost+found directory.

This is unacceptable: The last time this happened, random files in my 
operating system were plucked from the file system and stuffed in lost+found, 
corrupting the OS and forcing a reinstall. Another time, files I had moved 
a minute before the crash (a final project) were orphaned and put in 
lost+found, effectively destroying the project.

Why should a lost+found folder even be necessary when the file hierarchy is 
guaranteed to be consistent?


In response to these problems, I changed the ext3 journaling mode to "journal" 
rather than "ordered" (frankly it seems deeply disturbing that "ordered" is 
the default). Since then, I've once had to hard-reboot and yet again found 
files in the /lost+found folder.
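
For reference, here is a minimal sketch of how the active journaling mode 
could be double-checked after a change like this. It assumes the kernel 
lists the data= option for ext3 mounts in /proc/mounts, which not every 
kernel does (particularly for the default mode), so a missing value is 
reported as None rather than guessed. The function name ext3_data_modes is 
only illustrative.

```python
def ext3_data_modes(mounts_text):
    """Map each ext3 mount point to its data= mode, or None if not listed.

    mounts_text is expected to be in /proc/mounts format:
    device, mount point, fs type, comma-separated options, two numbers.
    """
    modes = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Skip malformed lines and anything that is not ext3.
        if len(fields) < 4 or fields[2] != "ext3":
            continue
        mount_point = fields[1]
        mode = None
        for opt in fields[3].split(","):
            if opt.startswith("data="):
                mode = opt[len("data="):]
        modes[mount_point] = mode
    return modes


# Example input in /proc/mounts format (made-up devices and mount points).
sample = (
    "/dev/md0 / ext3 rw,data=journal 0 0\n"
    "/dev/sda1 /boot ext3 rw 0 0\n"
    "proc /proc proc rw 0 0\n"
)
print(ext3_data_modes(sample))
```

On a live system the same function can be given the contents of 
/proc/mounts. A None result only means the mode was not shown, not that 
journaling is off.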

Might anyone know why ext3 is not fulfilling its promise of an 
always-consistent file system?


_____ Other interacting issues:

I'm running RAID1 (mirroring) on one computer, but I've had the same issues on 
another computer without RAID.

(In response to "you shouldn't hard-reboot your computer": I realize that most 
computers are not meant to be hard-rebooted, but I don't have a sysrq key and 
xmodmapping it has been difficult. I also realize that kernels shouldn't 
crash, but what's a person to do if the computer doesn't respond to 
ctrl-alt-f1 and doesn't leave any messages in the logs...)

(In response to "maybe your drive is defective": This is not a problem with a 
defective drive; I've tried multiple drives.)

(In response to "you should back up your data": Periodic backups clearly help, 
but it's ridiculous to restore a system from backup every week because a 
hard-freeze corrupted your filesystem...)


Any insight would be greatly appreciated. These problems have been making me 
look for other file systems (such as zfs, which unfortunately I can't use to 
boot; or reiser4, which also makes a filesystem-is-always-consistent 
guarantee); I would prefer to use ext3, but I've never had these sorts of 
problems with old Mac OS, OS X, or Windows.


Thank you,
Mats



