Ext3: Why data=journal is better than data=ordered when data needs to be read from and written to disk at the same time

Sun Mar 27 04:52:21 UTC 2011

On Sat, Mar 26, 2011 at 10:44 PM, Ted Ts'o <tytso at mit.edu> wrote:
> On Sat, Mar 26, 2011 at 08:25:23PM -0400, Jidong Xiao wrote:
>>
>> But my question is: why could data=journal outperform data=ordered?
>> In data=journal mode, you have to write both the data and metadata
>> blocks into the journal, but in data=ordered mode, you only have to
>> write the metadata blocks into the journal. If, in certain cases, the
>> former mode can avoid seeks, then the same behavior should apply to
>> the latter mode. So it's really odd that the former mode can
>> outperform the latter.
>
> When executing an fsync(), in data=ordered mode you have to write the
> data blocks to their permanent locations on disk and wait for them to
> be written before the metadata is committed to the journal.  This will
> generally require extra seeks.  In data=journal mode, the data blocks
> can be written directly into the journal without needing to seek.
>
> Of course eventually the data and metadata blocks will need to be
> written to their permanent locations before the journal space can be
> reused.  But for short bursty write patterns, the fsync() latency will
> be much smaller in data=journal mode.
>

Thank you, Ted, that is really helpful!

So the difference is:
data=ordered mode: fsync() returns only after the metadata blocks
have been written into the journal and the data blocks have been
written to their permanent locations on disk.
data=journal mode: fsync() returns once the metadata and data have been
written into the journal. Because the journal is contiguous, writing to
it needs no seeking, so fsync() returns more quickly.
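
To see the fsync() latency difference directly, one could time a short
burst of writes followed by fsync() on the filesystem under test, once
mounted with data=ordered and once with data=journal. Below is a minimal
sketch, assuming GNU coreutils (dd's conv=fsync, date's %N); the function
name time_burst and the file name fsync-test-file are hypothetical. Note
that ext3 refuses to change the data mode on a remount, so the filesystem
has to be unmounted and mounted again with the other data= option.

```shell
#!/bin/sh
# time_burst FILE COUNT
# Writes COUNT blocks of 16 KiB from /dev/zero to FILE; conv=fsync makes
# GNU dd call fsync() before exiting. Prints the elapsed wall-clock time
# in milliseconds, then removes FILE.
# Assumes GNU coreutils (dd conv=fsync, date +%N).
time_burst() {
    file=$1
    count=$2
    start=$(date +%s%N)        # nanoseconds since the epoch
    dd if=/dev/zero of="$file" bs=16384 count="$count" conv=fsync 2>/dev/null || return 1
    end=$(date +%s%N)
    rm -f "$file"
    echo $(( (end - start) / 1000000 ))
}

# Hypothetical usage: five 1 MiB bursts (64 x 16 KiB), run from a
# directory on the ext3 filesystem under test.
for i in 1 2 3 4 5; do
    echo "burst $i: $(time_burst ./fsync-test-file 64) ms"
done
```

Running the loop under each mount mode gives two sets of latencies to
compare; short bursts like these are the case Ted describes, where
data=journal should return from fsync() sooner.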

Now suppose we read from and write to the disk simultaneously, as in the
following example:

First, write data to the filesystem as quickly as possible:

Rapid writing

while true
do
	dd if=/dev/zero of=largefile bs=16384 count=131072
done

While data is being written to the test filesystem, read 16 MB of data
from the same filesystem on the same disk, timing the result:

Reading a 16 MB file

time cat 16-meg-file > /dev/null
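
The two steps above can be combined into one script so the read is
always timed against a running writer. This is a minimal sketch under
stated assumptions, not the original setup: the function names writer
and read_under_write and the 5-second settle delay are hypothetical
choices, GNU date (%N) is assumed, and `time` is replaced by date
arithmetic so the result can be captured and compared.

```shell
#!/bin/sh
# Hypothetical driver: run the rapid-writing loop in the background,
# time one read of READFILE, then stop the writer. Run it from a
# directory on the ext3 filesystem under test. For a cold-cache read,
# drop the page cache first as root:
#   sync; echo 3 > /proc/sys/vm/drop_caches

# Rapid-writing loop from the example above. The TERM trap also kills
# the dd that is currently running, so no writer is left behind.
writer() {
    count=$1
    dd_pid=
    trap 'kill "$dd_pid" 2>/dev/null; rm -f largefile; exit 0' TERM
    while true; do
        dd if=/dev/zero of=largefile bs=16384 count="$count" 2>/dev/null &
        dd_pid=$!
        wait "$dd_pid"         # returns early when TERM arrives
    done
}

# read_under_write READFILE [COUNT] [SETTLE]
# Prints how many milliseconds `cat READFILE > /dev/null` took while the
# writer was running. COUNT defaults to 131072 (2 GiB per dd pass, as in
# the loop above); SETTLE is the number of seconds the writer runs
# before the read starts (5 is an arbitrary choice).
read_under_write() {
    readfile=$1
    writer "${2:-131072}" >/dev/null 2>&1 &
    writer_pid=$!
    sleep "${3:-5}"
    start=$(date +%s%N)
    cat "$readfile" > /dev/null
    end=$(date +%s%N)
    kill "$writer_pid"
    wait "$writer_pid" 2>/dev/null
    echo $(( (end - start) / 1000000 ))
}
```

For example, `read_under_write 16-meg-file` run once with the filesystem
mounted data=ordered and once with data=journal (dropping caches before
each run) gives the two read times the question is about.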

In this case, if we run the experiment once in data=journal mode and
once in data=ordered mode, then, since write latency is much smaller in
data=journal mode, the disk can spend more of its time on the read
operation, so the read should also finish earlier than it does in
data=ordered mode. Am I understanding this correctly?

Regards
Jidong