
Re: Will barrier support for ext3 be enabled by default for FC3 or errata kernel



On Tue, 2004-08-24 at 09:31, Kenneth Porter wrote:
> What is barrier support?

ok it's like this:
nowadays storage (think "disks" but also "raid arrays") can handle more
than one outstanding IO at a time. For example, for SCSI it's not
uncommon to have 250 IOs outstanding to one disk at a time. This allows
the disk to optimize the IO, for example by doing commands that are
sequential on the platters as one big operation.
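
To make the "one big operation" idea concrete, here is a rough sketch of
what such merging could look like. It is only a toy model: the function
name and the (start_sector, length) request format are made up for
illustration, and real disk firmware is certainly more involved.

```python
def merge_sequential(requests):
    """Merge queued requests that are back-to-back on the platter.

    requests is a list of (start_sector, length) tuples, a hypothetical
    format chosen for this sketch. Returns the merged runs in sector order.
    """
    merged = []
    for start, length in sorted(requests):
        if merged and merged[-1][0] + merged[-1][1] == start:
            # This request begins exactly where the previous run ends,
            # so extend the run instead of issuing a separate operation.
            prev_start, prev_len = merged[-1]
            merged[-1] = (prev_start, prev_len + length)
        else:
            merged.append((start, length))
    return merged


# Four requests submitted in this order by the OS...
queue = [(100, 8), (500, 8), (108, 8), (116, 8)]
# ...end up as two operations on the disk.
print(merge_sequential(queue))  # [(100, 24), (500, 8)]
```

Note that the request for sector 500 was submitted second but ends up
last, which is exactly the kind of reordering this is about.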

The result of these optimisations is that IOs can and will happen in a
different order than the operating system submitted them. This would
be a problem for journalling filesystems because, for example, the
journal update that gets submitted before the actual metadata update can
in practice happen AFTER the actual metadata update. If there is a power
failure, this would lead to a broken journal/metadata combination
(normally you recover from a metadata problem by replaying the
journal... except the journal is out of date due to the reorder).
Currently, journalling filesystems deal with this by waiting for the
journal IO to complete entirely before submitting the metadata IO, i.e.
if it's already on the platter there is no way to reorder ;)
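
A small sketch of why the waiting helps, assuming a toy model in which
the disk may finish its queued writes in any order and power can fail at
any point. The names states_after_crash and unsafe are invented for this
example and do not correspond to anything in the kernel.

```python
from itertools import permutations


def states_after_crash(queued):
    """Every set of writes that could be on the platter after a power
    failure, if the disk is free to process `queued` in any order."""
    outcomes = set()
    for order in permutations(queued):
        for done in range(len(order) + 1):
            outcomes.add(frozenset(order[:done]))
    return outcomes


def unsafe(outcomes):
    """True if some outcome has the metadata on disk without the journal."""
    return any("metadata" in s and "journal" not in s for s in outcomes)


# Both writes queued together: the disk may do the metadata first.
both = states_after_crash(["journal", "metadata"])

# Wait for the journal write to complete, then submit the metadata:
# the journal is on the platter in every possible outcome.
waited = {s | {"journal"} for s in states_after_crash(["metadata"])}

print(unsafe(both), unsafe(waited))  # True False
```

The price of the safe variant is that the filesystem sits idle until the
journal write has finished.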

Barriers (or in SCSI terms "ordered tags") basically tell the disk
"don't reorder past THIS point". In principle this removes the need
for the journalling filesystem to wait on the first IO before submitting
the second IO (the metadata one).
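
As a rough illustration of the "don't reorder past THIS point" rule, the
sketch below lists the completion orders a disk would be allowed to pick
for a given queue. The BARRIER marker and legal_orders are hypothetical
names for this toy model, not a real driver or SCSI interface.

```python
from itertools import permutations, product

BARRIER = "BARRIER"  # hypothetical marker standing in for an ordered tag


def legal_orders(queue):
    """All completion orders allowed when writes may be reordered freely
    between barriers but never moved across one."""
    segments, current = [], []
    for item in queue:
        if item == BARRIER:
            segments.append(current)
            current = []
        else:
            current.append(item)
    segments.append(current)

    # Any order is fine inside a segment; the segments themselves stay
    # in submission order.
    per_segment = [list(permutations(seg)) for seg in segments]
    return [sum(combo, ()) for combo in product(*per_segment)]


# Without a barrier the disk may pick either order.
print(legal_orders(["journal", "metadata"]))
# With a barrier, both writes can be queued at once and only one order
# remains.
print(legal_orders(["journal", BARRIER, "metadata"]))
```

In this model the filesystem can submit both writes immediately and still
get the journal on the platter first, provided the disk really honors the
barrier.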

In practice it's largely unknown how well actual disks honor this...
and if they do, it's mostly unknown how expensive it is for the disks
to process such a tag, so it's unclear whether there is an actual
performance gain.

At least now there's the infrastructure so that people can play with it;
enabling it by default from the start sounds like a bad idea. Of course
we'll be looking at this, but it's a longer term project. 


