[libvirt] [PATCH v2 2/4] qemu: caps: add QEMU_CAPS_QCOW2_L2_CACHE_SIZE

Wed Dec 12 15:08:38 UTC 2018

On 12.12.2018 17:28, John Ferlan wrote:
> 
> 
> On 12/12/18 5:59 AM, Nikolay Shirokovskiy wrote:
>>
>>
>> On 11.12.2018 19:41, John Ferlan wrote:
>>>
>>>
>>> On 12/11/18 6:04 AM, Nikolay Shirokovskiy wrote:
>>>>
>>>>
>>>> On 10.12.2018 19:58, John Ferlan wrote:
>>>>>
>>>>>
>>>>> On 11/8/18 8:02 AM, Nikolay Shirokovskiy wrote:
>>>>>> For qemu capable of setting l2-cache-size for qcow2 images
>>>>>> to INT64_MAX and semantics of upper limit on l2 cache
>>>>>> size. We can only check this by qemu version (3.1.0) now.
>>>>>>
>>>>>> Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy at virtuozzo.com>
>>>>>> ---
>>>>>>  src/qemu/qemu_capabilities.c | 5 +++++
>>>>>>  src/qemu/qemu_capabilities.h | 1 +
>>>>>>  2 files changed, 6 insertions(+)
>>>>>>
>>>>>
>>>>> This patch needs to be updated to top of tree.
>>>>>
>>>>>> diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c
>>>>>> index 2ca5af3..49a3b60 100644
>>>>>> --- a/src/qemu/qemu_capabilities.c
>>>>>> +++ b/src/qemu/qemu_capabilities.c
>>>>>> @@ -509,6 +509,7 @@ VIR_ENUM_IMPL(virQEMUCaps, QEMU_CAPS_LAST,
>>>>>>                "vfio-pci.display",
>>>>>>                "blockdev",
>>>>>>                "vfio-ap",
>>>>>> +              "qcow2.l2-cache-size",
>>>>>
>>>>> When you do update, you will need to fix the comma-less entry for
>>>>> "egl-headless.rendernode" as well, unless someone else gets to it first.
>>>>>
>>>>>>      );
>>>>>>  
>>>>>>  
>>>>>> @@ -4117,6 +4118,10 @@ virQEMUCapsInitQMPMonitor(virQEMUCapsPtr qemuCaps,
>>>>>>          virQEMUCapsSet(qemuCaps, QEMU_CAPS_MACHINE_PSERIES_MAX_CPU_COMPAT);
>>>>>>      }
>>>>>>  
>>>>>> +    /* l2-cache-size before 3001000 does not accept INT64_MAX */
>>>>>> +    if (qemuCaps->version >= 3001000)
>>>>>> +        virQEMUCapsSet(qemuCaps, QEMU_CAPS_QCOW2_L2_CACHE_SIZE);
>>>>>> +
>>>>>
>>>>> Not a fan of this.
>>>>>
>>>>> The 'l2-cache-size' was supported since QEMU 2.2. A "newer" algorithm
>>>>> for cache adjustment has been supported since QEMU 2.12 (and there's
>>>>> l2-cache-entry-size that "could be used" to know that). Then there's
>>>>> some unreferenced commit indicating usage of INT64_MAX. Tracking that
>>>>> down, I find qemu.git commit 6c1c8d5d from Max which enforces MAX_INT.
>>>>
>>>> This is a failure rather than enforcement. And AFAIU code that limit cache
>>>> to appropriate maximum when INT64_MAX is given in read_cache_sizes:
>>>>
>>>> *l2_cache_size = MIN(max_l2_cache, l2_cache_max_setting);
>>>>
>>>> only appeared after release of 3.0 in 
>>>>
>>>> commit b749562d9822d14ef69c9eaa5f85903010b86c30
>>>> Author: Leonid Bloch <lbloch at janustech.com>
>>>> Date:   Wed Sep 26 19:04:43 2018 +0300
>>>>
>>>>     qcow2: Assign the L2 cache relatively to the image size
>>>>
>>>>
>>>> ie setting cache size to INT64_MAX before 3.1 will fail. In
>>>> other words in earlier versions there were no value to specify that
>>>> cache size need to be set to maximum. You can calculate this value
>>>> yourself knowing disk size and disk cluster size and set it but
>>>> this is not convinient.
>>>>
>>>
>>> So prior to this patch the max could be 32 MB? And now it's calculate-able?
>>
>> 32 MB is default value on Linux in 3.1 (the patch set it it to 1 MB but later
>> patch 80668d0f increases to 32 MB). 
>>
>> Before the patch one can set greater values but can not set INT64_MAX.
>> The exact value that can be set depends on cluster size due to the check
>> againts INT_MAX you already mentioned. So one can set up to INT_MAX * cluster_size,
>> but cluster_size is unknown to us (?). Given minium cluster size is 512 bytes, we
>> can set INT_MAX * 512 (1 TB) always. This will cover 64 TB disks for cluster
>> size 512 bytes and 8192 PB for default cluster size. So we should refuse 
>> to start with policy 'maximum' and disk size greater then 64 TB. 
>> Or may be we can just check cluster size at start up. How do you think?
> 
> I'm not sure I'd go through the trouble of determining the size. If
> we're not changing the default cluster_size, then using 512 "seems"
> reasonable (whether others feel the same way who knows yet). I'm not
> even sure if we already get that information - feel free to research and
> add though.
> 
> I'm not sure I would "refuse" to start - think in terms of what is the
> downside to not being able to have a large enough value - well that
> seems to be I/O performance.  If we can get "better" I/O performance by
> providing "something" more than "default" for those environments, then I
> would think that's better than doing nothing or failing to start.
> 
> I'm not sure how much "effort" you want to put into the window of
> support prior to 3.1 where "maximum" is true maximum.  It could be
> simple enough to document that for prior to 3.1 the maximum value is
> limited to XXX. After 3.1 it's essentially unlimited since libvirt can
> provide the maximum integer value to QEMU and QEMU then decides the
> maximum up to that value.  There's ways to wordsmith it - I'm pretty
> sure it's been done before, but I don't have an example readily
> available (search docs for size, maximum, or default)...
> 
>>
>> By the way the patch works only for -blockdev configuration which is
>> available since QEMU 2.10 AFAIK. So we could purse to support 2.10, 2.11,
>> 2.12 and 3.0 version in principle.
> 
> Ah true, the --blockdev option is required.
> 
> I've put the "artificial" point in time of 2.12 because that when I
> recall Berto making adjustments to the cache size algorithms and when
> the @l2-cache-entry-size appeared in qapi/block-core.json. I think prior
> to those changes the algorithm was even more difficult to decipher.

Ok, I'll check what is possible to support without too much efforts.

Hmm, looks like @l2-cache-size is available from much earlier versions...

# @l2-cache-size:         the maximum size of the L2 table cache in
#                         bytes (since 2.2)

Nikolay

> 
>>
>>>
>>> Do you think it would be "worthwhile" to add logic that knows QEMU 2.12
>>> and 3.0 could still support using l2_cache_size, but with a lower value
>>> say 32MB? Then if we find 3.1 installed then we can supply INT_MAX for
>>> the value?  Oh, and if the "maximum" attribute isn't set, then the
>>> default of 0 is used. Thus being able to use "Z" instead of "I" for patch 3?
>>
>> In default case l2-cache-size just does not get set.
>>
> 
> Yes on earlier support and I think using l2-cache-entry-size will give
> us a point in time to support from. The max size chosen is up to you - I
> think if we document what we set to prior to QEMU 3.1, then we'll be good.
> 
> Adding a second capability that would be (yuck) version based because we
> know a bug was fixed is going to have to be acceptable. Then, the logic
> to "set" the value would be something like:
> 
>     if (domain maximum provided) {
>         if (some 3.1 capability is set)
>             max = INT64_MAX;
>         else if (at least 2.12 capability is set)
>             max = "whatever smaller value you choose"
>         else
>             error this qemu doesn't support setting...
>     }
> 
>>>
>>> BTW:
>>> One issue with providing a numerical QEMU version for something is that
>>> someone could backport that patch (and it's dependencies) into some
>>> maintenance branch or even downstream repo that won't report as 3.1, but
>>> would have the patch. But since libvirt would only accept 3.1 or later,
>>> it wouldn't allow usage.
>>
>> Does not look critical. The functionality just won't work from start
>> and when they start to investigate why in the result they can backport 
>> appropriate qemu patches as well and fix libvirt caps code for example.
> 
> True, not critical, but it's one of the reasons numerical version
> checking is disliked. Similarly providing in docs that something is
> supported as of QEMU maj.min will get "dinged" with the reasoning of
> what if someone backports...  and I have a similar response to yours for
> that.
> 
> John
> 
> 
>>
>> Nikolay
>>
>>>
>>>
>>>>>
>>>>> Still does that really matter? Let QEMU pick their own max and don't
>>>>> have libvirt be the arbiter of that (like I noted in my 1/4 review). My
>>>>> reasoning is that there's been quite a few "adjustments" to the
>>>>> algorithms along the way. Those adjustments are one of the concerns
>>>>> voiced in the past why making any "semi-permanent" change to the value
>>>>> in some libvirt XML format has been NACKed historically. THus by placing
>>>>> boundaries that are true today we limit ourselves for the future.
>>>>
>>>> IIUC setting INT64_MAX is ok in this respect. It is not real value for cache
>>>> but rather order for QEMU to pick up one.
>>>>
>>>
>>> Right providing some unrealistically large value would appear to work.
>>> What that value "could be" for 2.12 and 3.0.0 which do support some
>>> level of alteration and would be 'nice' to include in the mix, but not
>>> required I suppose. And yes, I do have some downstream representation in
>>> mind by thinking about this - it's in beta now ;-).
>>>
>>> John
>>>
>>>>>
>>>>> BTW: If 3.1 was the "base" from which you want to work, then adjustments
>>>>> to the tests/qemucapabilitiesdata/*3_1*{.replies|.xml } files would be
>>>>> required as evidenced by your patch 4. The *.xml file would need to have
>>>>> the correct <version>3001000</version> setting which should allow patch4
>>>>> to be merged into patch3.
>>>>
>>>> Yeah, but 3.1 is not yet released and I need blockdev also as
>>>> patch only makes difference in latter case.
>>>>
>>>> Nikolay
>>>>
>>>>>
>>>>> John
>>>>>
>>>>>>      if (virQEMUCapsProbeQMPCommands(qemuCaps, mon) < 0)
>>>>>>          goto cleanup;
>>>>>>  
>>>>>> diff --git a/src/qemu/qemu_capabilities.h b/src/qemu/qemu_capabilities.h
>>>>>> index 6bb9a2c..bb2ac17 100644
>>>>>> --- a/src/qemu/qemu_capabilities.h
>>>>>> +++ b/src/qemu/qemu_capabilities.h
>>>>>> @@ -493,6 +493,7 @@ typedef enum { /* virQEMUCapsFlags grouping marker for syntax-check */
>>>>>>      QEMU_CAPS_VFIO_PCI_DISPLAY, /* -device vfio-pci.display */
>>>>>>      QEMU_CAPS_BLOCKDEV, /* -blockdev and blockdev-add are supported */
>>>>>>      QEMU_CAPS_DEVICE_VFIO_AP, /* -device vfio-ap */
>>>>>> +    QEMU_CAPS_QCOW2_L2_CACHE_SIZE, /* -blockdev supports l2-cache-size with INT64_MAX value */
>>>>>>  
>>>>>>      QEMU_CAPS_LAST /* this must always be the last item */
>>>>>>  } virQEMUCapsFlags;
>>>>>>