OT - Journaling File Systems?

Fri Jul 2 16:38:01 UTC 2004

On Fri, Jul 02, 2004 at 11:22:20AM -0500, Edwards, Scott (MED, Kelly IT Resouces) wrote:
> The ext3 have almost a perfect record with the write cache off:  I have
> run over 300 cycles on the two drives and only had two corrupted lines
> in the output files.  So out of 600 total cycles on the two drives there
> were only two lines with bad data, I think that is a pretty good record.

Unless you are doing data journalling or some kind of userspace
transactions, you wouldn't expect file contents to be perfect. Data
journalling has a big performance cost.

> I just can't understand what is happening, it makes no sense to me that 
> one file system would be almost perfect and three would fail so 
> dramatically.  I am going to re-run the tests on all 4 file systems to
> verify that it is repeatable.

Your expectations seem at odds with what journalling provides. A journalled
fs can be recovered by log replay. It doesn't guarantee that user data is
recovered precisely. It guarantees that user data is recovered to those
points where it was committed.

Thus repeating

		open file O_APPEND
		write stuff
		close it

doesn't guarantee "stuff" will always be committed - it just
guarantees that the fs will be structurally sound.

		open file O_APPEND
		write stuff
		fsync
		close

OTOH, that sequence says that after the fsync has returned you can be
sure the data you wrote before it *will* still be there.
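
For anyone who wants to try the difference, here is a rough sketch of the
two sequences in Python. The function name append_record and the durable
flag are made up for illustration; only the os.open / os.write / os.fsync /
os.close calls are real. It assumes the drive honours the flush, which is
why the write cache setting matters for tests like yours.

```python
import os


def append_record(path, data, durable=True):
    """Append bytes to path. With durable=True, fsync before closing.

    append_record and durable are hypothetical names for this sketch.
    """
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        # os.write may write fewer bytes than asked, so loop until done.
        offset = 0
        while offset < len(data):
            offset += os.write(fd, data[offset:])
        if durable:
            # Blocks until the kernel reports the data has reached the device.
            os.fsync(fd)
    finally:
        os.close(fd)
```

With durable=False this is the first sequence: after a crash the fs will be
consistent but the last records may be missing or damaged. With durable=True
it is the second: once the call returns, the record should survive a crash.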

Ext3 data journalling journals file data as well as metadata, which is a
bit slower but can be appropriate for some applications (and for big NFS
servers it often actually turns out to be faster because of the NFS commit
behaviour).