barrier and commit options?
Ric Wheeler
rwheeler at redhat.com
Sat Jan 31 12:45:06 UTC 2009
Theodore Tso wrote:
>>>> - If I remember the details correctly, Chris Mason has demonstrated a
>>>> 50% chance of corrupting directory entries in ext3, for example.
>>>>
>
> Chris Mason has a script which forces the system to be under a lot of
> memory pressure, and in that scenario, it is highly likely that
> without barriers, there will be filesystem corruptions if the system
> is abruptly turned off while his script is running.
>
> Andrew Morton has been resistant to making barrier=1 the default
> for ext3 because (as I understand it) he does not believe this is an
> adequate real-world example, and there is a real performance hit to
> running with barriers.
>
>
>>>> If you have a battery-backed write cache (say, in a high-end array),
>>>> barriers can be ignored since the storage can effectively make that
>>>> write cache non-volatile, but otherwise, this is pretty key for
>>>> anyone wanting to maintain data integrity.
>>>>
>>>>
>>> That's what I'm getting at: array controllers with a battery-backed
>>> write cache (BBWC). We disable the write cache on the physical
>>> disks and provide no mechanism to re-enable the cache except in
>>> some SATA configurations.
>>>
>
> Well, we still need the barrier on the block I/O elevator side to
> make sure that requests don't get reordered in the block layer. But
> what you're saying is that once the write is posted to the array, it
> is guaranteed that it is on "stable storage" (even if it is BBWC) such
> that if someone hits the Big Red Switch at the exit to the data
> center, and power is forcibly cut from the entire data center in case
> of a fire, the battery will still keep the cache alive, at least until
> the sprinklers go off, anyway, right? :-)
>
Yes, true....
> In that case, I suspect the right thing for the cciss array to do is
> to ignore the barrier, but not to return an error. If you return an
> error and refuse the write-with-barrier operation (which is what the
> cciss driver seems to be doing starting in 2.6.29-rcX), ext4 will
> retry the write without the barrier, at which point we are vulnerable
> to the block layer reordering things at the I/O scheduler layer. In
> effect, you're claiming that every single write to cciss is implicitly
> a "barrier write" in that once it is received by the device, it is
> guaranteed not to be lost even if the power to the entire system is
> forcibly removed.
>
> - Ted
>
>
>
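To make the reordering risk concrete, here is a minimal Python sketch (a toy model, not kernel code) of an elevator that sorts requests by sector but never moves one across a barrier. The `Req` class, the helper names, and the sector numbers are all hypothetical, and the model ignores timing and the cache-flush half of a real barrier; it only shows what can happen to ordering once a refused barrier write is retried as a plain write.

```python
from dataclasses import dataclass


@dataclass
class Req:
    """Hypothetical request model; not the kernel's struct request."""
    sector: int
    label: str
    barrier: bool = False


def elevator_dispatch(queue):
    """Toy elevator: sort by sector, but never move a request across a barrier.

    Requests between two barriers may be reordered freely; a barrier request
    is dispatched after everything queued before it and before everything
    queued after it.
    """
    out, run = [], []
    for req in queue:
        if req.barrier:
            out.extend(sorted(run, key=lambda r: r.sector))
            out.append(req)
            run = []
        else:
            run.append(req)
    out.extend(sorted(run, key=lambda r: r.sector))
    return [r.label for r in out]


def commit_transaction(driver_rejects_barrier):
    """Queue two journal blocks and a commit block (made-up sector numbers).

    If the driver refuses the barrier, model the filesystem's fallback by
    resubmitting the commit block as an ordinary, reorderable write.
    """
    queue = [
        Req(4096, "journal-1"),
        Req(4104, "journal-2"),
        # Commit block at a low sector, as if the journal had wrapped around.
        Req(96, "commit", barrier=not driver_rejects_barrier),
    ]
    return elevator_dispatch(queue)


print(commit_transaction(driver_rejects_barrier=False))
print(commit_transaction(driver_rejects_barrier=True))
```

With the barrier honored this prints `['journal-1', 'journal-2', 'commit']`; with the barrier dropped it prints `['commit', 'journal-1', 'journal-2']`, i.e. a sector-sorting elevator is free to send the commit block ahead of the journal blocks it describes.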
Aren't barriers still tied to the state of the write cache on the target
drive? In other words, if the write cache is off, we disable barriers
automatically. I think this happens for SCSI in sd_revalidate_disk().
In this case, it sounds like we have tangled the need to flush a drive's
write cache with the need to not reorder I/O in the elevator code.
Ric
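The separation described above can be sketched as a small decision function: keeping requests in order is always required, while flushing depends only on whether a completed write can still be lost on power failure. This is a hypothetical Python model; `Ordering` and `barrier_mode` are illustrative names, not kernel identifiers, and the actual SCSI logic in sd_revalidate_disk() may differ in detail.

```python
from enum import Enum


class Ordering(Enum):
    """Hypothetical names for what a barrier must do on a given device."""
    DRAIN = "keep order in the elevator; no cache flush needed"
    DRAIN_FLUSH = "keep order and flush the volatile write cache"


def barrier_mode(write_cache_enabled, cache_is_nonvolatile=False):
    """Pick how to service a barrier request (illustrative model only).

    Reordering in the block layer must be prevented in every case; only the
    flush depends on whether the device's write cache is volatile.
    """
    if write_cache_enabled and not cache_is_nonvolatile:
        return Ordering.DRAIN_FLUSH
    # Write cache off, or battery-backed (BBWC): ordering only, and the
    # barrier should complete successfully instead of returning an error.
    return Ordering.DRAIN


print(barrier_mode(True))
print(barrier_mode(False))
print(barrier_mode(True, cache_is_nonvolatile=True))
```

Here `barrier_mode(True)` returns `Ordering.DRAIN_FLUSH`, while a disabled or battery-backed cache yields `Ordering.DRAIN`: the barrier still constrains the elevator but needs no flush, which is the behavior suggested for the cciss case.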