
Re: [PATCH] storage: only fallocate when allocation matches capacity



On 9/3/20 5:18 AM, Christian Ehrhardt wrote:
> Even if my fix lands, we are back to square one and would need
> virt-manager to submit a different XML.
> Remember: my target here would be to come back to preallocation=metadata
> as it was before for image creations from virt-manager.

Why is that your goal?

If this is simply because OpenZFS doesn't support fallocate(mode=0),
that has (finally!) been resolved for the next release:
https://github.com/openzfs/zfs/commit/f734301d2267cbb33eaffbca195fc93f1dae7b74

ZFS will "fake" the fallocate() request: it checks that there is enough
free space at the moment, which is about all it can do. It cannot
actually reserve the space, mostly because it is a copy-on-write
filesystem. Even if the application writes zeros, ZFS will simply throw
them away (assuming you are using compression, which everyone should
be).

> On the libvirt side allocation>capacity sounds like being wrong anyway.
> And if that is so we have these possible conditions:
> - capacity==allocation now and before my change falloc
> - capacity>allocation now and before my change metadata
> - capacity<allocation before my change falloc, afterwards metadata
> (but this one seems invalid anyway)
> 
> So I wonder are we really back at me asking Cole to let virt-manager
> request things differently which is how this started about a year ago?

Setting aside cases of semi-allocation (capacity > allocation != 0) and
overprovisioning (allocation > capacity), I assume the common cases are
thin provisioning (allocation == 0) and thick provisioning (capacity ==
allocation).
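
For illustration, here is how those two common cases look in a libvirt
storage volume definition (a sketch; the volume names are placeholders,
but capacity/allocation are the standard libvirt volume XML elements):

```xml
<!-- Thick provisioning: allocation == capacity -->
<volume>
  <name>guest-thick.raw</name>
  <capacity unit="GiB">10</capacity>
  <allocation unit="GiB">10</allocation>
  <target>
    <format type="raw"/>
  </target>
</volume>

<!-- Thin provisioning: allocation == 0 -->
<volume>
  <name>guest-thin.raw</name>
  <capacity unit="GiB">10</capacity>
  <allocation unit="GiB">0</allocation>
  <target>
    <format type="raw"/>
  </target>
</volume>
```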

virt-manager (at least in the way I use it) asks explicitly for the
allocation and capacity. If virt-manager is properly conveying (and I'd
assume it is) the user's capacity and allocation choices from the GUI to
libvirt, then virt-manager is working correctly in my view and should be
left alone.

I believe the main goal for thick provisioning is to reserve the space
as best as possible, because ENOSPC underneath a virtual machine is bad.
Secondary goals would be allocating the space relatively contiguously
for performance and accounting for the space immediately to help the
administrator keep track of usage.

If the filesystem supports fallocate(), using it accomplishes all of
these goals in a very performant way. If the filesystem does not support
fallocate(), then the application can either write zeros or do nothing.
Writing zeros is slow, but achieves the goals to the extent possible.
Not writing zeros is fast, but does not reserve/account for the space;
though, depending on the filesystem, that might not be possible anyway.

I think the question fundamentally comes down to: how strongly do you
interpret a "thick provisioning" request? Do you do everything in your
power to honor it (which would mean writing zeros*), or do you treat it
as a hint that you follow only when it is fast to do so?

If it's a demand, then try fallocate() but fall back to writing zeros
(glibc's posix_fallocate() does exactly this). If it's a hint, then
only ever call fallocate().
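
A minimal sketch of the "demand" behavior in C (the helper name
thick_allocate is hypothetical; it uses the Linux-specific fallocate(2)
and, on filesystems that don't support it, falls back to writing zeros,
roughly what glibc's posix_fallocate() does):

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: treat a thick-provisioning request as a demand.
 * Try fallocate(mode=0) first; if the filesystem does not support it,
 * fall back to writing zeros. Returns 0 on success, -1 on error. */
static int thick_allocate(int fd, off_t len)
{
    if (fallocate(fd, 0, 0, len) == 0)
        return 0;                     /* fast path: space reserved */
    if (errno != EOPNOTSUPP && errno != ENOSYS)
        return -1;                    /* real failure, e.g. ENOSPC */

    /* Slow path: write zeros chunk by chunk. */
    char buf[65536];
    memset(buf, 0, sizeof(buf));
    for (off_t off = 0; off < len; ) {
        size_t n = sizeof(buf);
        if ((off_t)n > len - off)
            n = (size_t)(len - off);
        ssize_t w = pwrite(fd, buf, n, off);
        if (w < 0) {
            if (errno == EINTR)
                continue;             /* retry interrupted write */
            return -1;
        }
        off += w;
    }
    return 0;
}
```

Note that on ZFS with compression, the slow path still reserves nothing
durable, which is exactly the point made above.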

I think it is reasonable to treat it as a demand and write zeros if
fallocate() fails. If it is too slow, the admin will notice and can make
the decision to (in the future) stop requesting thick provisioning and
just request thin provisioning.

In the ZFS case, why is the admin requesting thick provisioning anyway?


* One could go further and defeat compression by writing random data.
  But that seems extreme, so I'm going to ignore that.

-- 
Richard

