Whitepaper: Red Hat's New Journaling File System: ext3

January 1, 2010

Michael K. Johnson
johnsonm@redhat.com

In Red Hat Linux 7.2, Red Hat provides its first officially supported journaling file system: ext3. The ext3 file system is a set of incremental enhancements to the robust ext2 file system that provide several advantages. This paper summarizes some of those advantages (first in general terms and then more specifically), explains what Red Hat has done to test the ext3 file system, and (for advanced users only) touches on tuning.


Why can you trust ext3?

Here are some of the things Red Hat has done to ensure that ext3 is safe for handling user data.

  • We have performed extensive stress testing under a large set of configurations. This has involved many thousands of hours of "contrived" load testing on a wide variety of hardware and file system configurations, as well as many use case tests.

  • We have audited ext3 for multiple failure conditions, including memory allocation failures occurring at any point. We test for these repeatedly and often (every time the code changes) by forcing false errors and then checking file system consistency.

  • We audited and tested ext3 for poor interactions with the VM subsystem. A journaling file system puts extra stress on the VM subsystem, and in the process of this auditing and testing we found and fixed bugs both in ext3 and in the VM subsystem itself. After thousands of hours of this testing, we are extremely confident in the robustness of the ext3 file system.

  • We have run an extensive beta program lasting more than a year, starting with ext3 on the 2.2 kernel series and then moving forward to the 2.4 kernel series. Even before the official beta program, ext3 was put into production use in some circumstances; it has been in production use on some widely accessed servers, including the rpmfind.net servers, for more than two years.

  • We have arranged for the user to choose to check file system consistency after an unclean system shutdown, even if the file system is marked "clean", in order to deal with potential hardware-generated corruption. Hardware failures, and most particularly real power failures or brownouts, can cause "garbage" data to be written practically anywhere on disk. Hitting the reset button is not likely to trigger this kind of problem, but a true power failure associated with events like lightning strikes or trees falling on power lines tends to involve spikes and brownouts that can damage data en route to disk. IDE systems tend to be somewhat more susceptible to this kind of problem than SCSI systems, in part because IDE disks tend to implement looser caching algorithms.

  • This feature is implemented using the /.autofsck file: if the root user removes that file during normal operation, the system will offer the choice to check file system consistency at boot time. If /.autofsck is missing and the user elects to force the check, the effect is the same as if the /forcefsck file existed.
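The kind of consistency verification described in the auditing bullet above can be sketched on a scratch file system image. This is only an illustration, not Red Hat's actual test harness: the image path and size are made up, and the sketch assumes the e2fsprogs tools (mke2fs, e2fsck) are installed.

```shell
# Create a scratch ext3 image and run a forced consistency check on it.
dd if=/dev/zero of=/tmp/ext3-test.img bs=1M count=64 2>/dev/null
mke2fs -q -F -j /tmp/ext3-test.img    # -j creates an ext3 journal; -F allows a plain file
e2fsck -f -n /tmp/ext3-test.img       # -f forces a full check even if marked clean; -n is read-only
```

A harness built along these lines can repeat the check after each injected failure to confirm that the file system remains consistent.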
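The two boot-time check controls described above amount to managing two flag files. In the sketch below, the ROOT variable is an illustrative convenience (not part of the actual mechanism) so the commands can be tried in a scratch directory; on a real system these files live directly in /.

```shell
# Sketch of the boot-time consistency-check controls described above.
ROOT=${ROOT:-.}

# Removing /.autofsck during normal operation makes the system offer a
# file system consistency check at the next boot:
rm -f "$ROOT/.autofsck"

# Creating /forcefsck forces the consistency checks at the next boot:
touch "$ROOT/forcefsck"
```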