[libvirt] [PATCH v2 2/4] qemu: caps: add QEMU_CAPS_QCOW2_L2_CACHE_SIZE

Wed Dec 12 10:59:56 UTC 2018

On 11.12.2018 19:41, John Ferlan wrote:
> 
> 
> On 12/11/18 6:04 AM, Nikolay Shirokovskiy wrote:
>>
>>
>> On 10.12.2018 19:58, John Ferlan wrote:
>>>
>>>
>>> On 11/8/18 8:02 AM, Nikolay Shirokovskiy wrote:
>>>> For qemu capable of setting l2-cache-size for qcow2 images
>>>> to INT64_MAX and semantics of upper limit on l2 cache
>>>> size. We can only check this by qemu version (3.1.0) now.
>>>>
>>>> Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy at virtuozzo.com>
>>>> ---
>>>>  src/qemu/qemu_capabilities.c | 5 +++++
>>>>  src/qemu/qemu_capabilities.h | 1 +
>>>>  2 files changed, 6 insertions(+)
>>>>
>>>
>>> This patch needs to be updated to top of tree.
>>>
>>>> diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c
>>>> index 2ca5af3..49a3b60 100644
>>>> --- a/src/qemu/qemu_capabilities.c
>>>> +++ b/src/qemu/qemu_capabilities.c
>>>> @@ -509,6 +509,7 @@ VIR_ENUM_IMPL(virQEMUCaps, QEMU_CAPS_LAST,
>>>>                "vfio-pci.display",
>>>>                "blockdev",
>>>>                "vfio-ap",
>>>> +              "qcow2.l2-cache-size",
>>>
>>> When you do update, you will need to fix the comma-less entry for
>>> "egl-headless.rendernode" as well, unless someone else gets to it first.
>>>
>>>>      );
>>>>  
>>>>  
>>>> @@ -4117,6 +4118,10 @@ virQEMUCapsInitQMPMonitor(virQEMUCapsPtr qemuCaps,
>>>>          virQEMUCapsSet(qemuCaps, QEMU_CAPS_MACHINE_PSERIES_MAX_CPU_COMPAT);
>>>>      }
>>>>  
>>>> +    /* l2-cache-size before 3001000 does not accept INT64_MAX */
>>>> +    if (qemuCaps->version >= 3001000)
>>>> +        virQEMUCapsSet(qemuCaps, QEMU_CAPS_QCOW2_L2_CACHE_SIZE);
>>>> +
>>>
>>> Not a fan of this.
>>>
>>> The 'l2-cache-size' was supported since QEMU 2.2. A "newer" algorithm
>>> for cache adjustment has been supported since QEMU 2.12 (and there's
>>> l2-cache-entry-size that "could be used" to know that). Then there's
>>> some unreferenced commit indicating usage of INT64_MAX. Tracking that
>>> down, I find qemu.git commit 6c1c8d5d from Max which enforces MAX_INT.
>>
>> This is a failure rather than enforcement. And AFAIU code that limit cache
>> to appropriate maximum when INT64_MAX is given in read_cache_sizes:
>>
>> *l2_cache_size = MIN(max_l2_cache, l2_cache_max_setting);
>>
>> only appeared after release of 3.0 in 
>>
>> commit b749562d9822d14ef69c9eaa5f85903010b86c30
>> Author: Leonid Bloch <lbloch at janustech.com>
>> Date:   Wed Sep 26 19:04:43 2018 +0300
>>
>>     qcow2: Assign the L2 cache relatively to the image size
>>
>>
>> ie setting cache size to INT64_MAX before 3.1 will fail. In
>> other words in earlier versions there were no value to specify that
>> cache size need to be set to maximum. You can calculate this value
>> yourself knowing disk size and disk cluster size and set it but
>> this is not convinient.
>>
> 
> So prior to this patch the max could be 32 MB? And now it's calculate-able?

32 MB is default value on Linux in 3.1 (the patch set it it to 1 MB but later
patch 80668d0f increases to 32 MB). 

Before the patch one can set greater values but can not set INT64_MAX.
The exact value that can be set depends on cluster size due to the check
againts INT_MAX you already mentioned. So one can set up to INT_MAX * cluster_size,
but cluster_size is unknown to us (?). Given minium cluster size is 512 bytes, we
can set INT_MAX * 512 (1 TB) always. This will cover 64 TB disks for cluster
size 512 bytes and 8192 PB for default cluster size. So we should refuse 
to start with policy 'maximum' and disk size greater then 64 TB. 
Or may be we can just check cluster size at start up. How do you think?

By the way the patch works only for -blockdev configuration which is
available since QEMU 2.10 AFAIK. So we could purse to support 2.10, 2.11,
2.12 and 3.0 version in principle.

> 
> Do you think it would be "worthwhile" to add logic that knows QEMU 2.12
> and 3.0 could still support using l2_cache_size, but with a lower value
> say 32MB? Then if we find 3.1 installed then we can supply INT_MAX for
> the value?  Oh, and if the "maximum" attribute isn't set, then the
> default of 0 is used. Thus being able to use "Z" instead of "I" for patch 3?

In default case l2-cache-size just does not get set.

> 
> BTW:
> One issue with providing a numerical QEMU version for something is that
> someone could backport that patch (and it's dependencies) into some
> maintenance branch or even downstream repo that won't report as 3.1, but
> would have the patch. But since libvirt would only accept 3.1 or later,
> it wouldn't allow usage.

Does not look critical. The functionality just won't work from start
and when they start to investigate why in the result they can backport 
appropriate qemu patches as well and fix libvirt caps code for example.

Nikolay

> 
> 
>>>
>>> Still does that really matter? Let QEMU pick their own max and don't
>>> have libvirt be the arbiter of that (like I noted in my 1/4 review). My
>>> reasoning is that there's been quite a few "adjustments" to the
>>> algorithms along the way. Those adjustments are one of the concerns
>>> voiced in the past why making any "semi-permanent" change to the value
>>> in some libvirt XML format has been NACKed historically. THus by placing
>>> boundaries that are true today we limit ourselves for the future.
>>
>> IIUC setting INT64_MAX is ok in this respect. It is not real value for cache
>> but rather order for QEMU to pick up one.
>>
> 
> Right providing some unrealistically large value would appear to work.
> What that value "could be" for 2.12 and 3.0.0 which do support some
> level of alteration and would be 'nice' to include in the mix, but not
> required I suppose. And yes, I do have some downstream representation in
> mind by thinking about this - it's in beta now ;-).
> 
> John
> 
>>>
>>> BTW: If 3.1 was the "base" from which you want to work, then adjustments
>>> to the tests/qemucapabilitiesdata/*3_1*{.replies|.xml } files would be
>>> required as evidenced by your patch 4. The *.xml file would need to have
>>> the correct <version>3001000</version> setting which should allow patch4
>>> to be merged into patch3.
>>
>> Yeah, but 3.1 is not yet released and I need blockdev also as
>> patch only makes difference in latter case.
>>
>> Nikolay
>>
>>>
>>> John
>>>
>>>>      if (virQEMUCapsProbeQMPCommands(qemuCaps, mon) < 0)
>>>>          goto cleanup;
>>>>  
>>>> diff --git a/src/qemu/qemu_capabilities.h b/src/qemu/qemu_capabilities.h
>>>> index 6bb9a2c..bb2ac17 100644
>>>> --- a/src/qemu/qemu_capabilities.h
>>>> +++ b/src/qemu/qemu_capabilities.h
>>>> @@ -493,6 +493,7 @@ typedef enum { /* virQEMUCapsFlags grouping marker for syntax-check */
>>>>      QEMU_CAPS_VFIO_PCI_DISPLAY, /* -device vfio-pci.display */
>>>>      QEMU_CAPS_BLOCKDEV, /* -blockdev and blockdev-add are supported */
>>>>      QEMU_CAPS_DEVICE_VFIO_AP, /* -device vfio-ap */
>>>> +    QEMU_CAPS_QCOW2_L2_CACHE_SIZE, /* -blockdev supports l2-cache-size with INT64_MAX value */
>>>>  
>>>>      QEMU_CAPS_LAST /* this must always be the last item */
>>>>  } virQEMUCapsFlags;
>>>>