[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]

Re: [libvirt] [PATCH v2 1/4] xml: add disk driver metadata_cache_size option




On 10.12.2018 19:56, John Ferlan wrote:
> 
> On 11/8/18 8:02 AM, Nikolay Shirokovskiy wrote:
>> Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy virtuozzo com>
>> ---
>>  docs/formatdomain.html.in                          |  8 ++++
>>  docs/schemas/domaincommon.rng                      | 11 +++++
>>  src/conf/domain_conf.c                             | 17 ++++++++
>>  src/conf/domain_conf.h                             |  9 ++++
>>  .../qemuxml2argvdata/disk-metadata_cache_size.xml  | 42 +++++++++++++++++++
>>  .../disk-metadata_cache_size.xml                   | 48 ++++++++++++++++++++++
>>  tests/qemuxml2xmltest.c                            |  2 +
>>  7 files changed, 137 insertions(+)
>>  create mode 100644 tests/qemuxml2argvdata/disk-metadata_cache_size.xml
>>  create mode 100644 tests/qemuxml2xmloutdata/disk-metadata_cache_size.xml
>>
> 
> <sigh> looks like a forgotten thread...  It seems reviewer bandwidth is
> limited and won't get much better during the last couple weeks of the
> month when many if not most Red Hat employees are out to due company
> shutdown period.
> 
> You need to add a few words to the commit message to describe what's
> being changed.  Oh and you'll also need a "last" patch to docs/news.xml
> to describe the new feature.
> 
>> diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
>> index 269741a..1d186ab 100644
>> --- a/docs/formatdomain.html.in
>> +++ b/docs/formatdomain.html.in
>> @@ -3556,6 +3556,14 @@
>>              virt queues for virtio-blk. (<span class="since">Since 3.9.0</span>)
>>            </li>
>>            <li>
>> +            The optional <code>metadata_cache_size</code> attribute specifies
>> +            metadata cache size policy, possible values are "default" and "maximum".
> 
> s/policy, possible/policy. Possible/
> 
>> +            "default" leaves setting cache size to hypervisor, "maximum" makes
> 
> s/"default"/Using "default"/
> 
> s/to hypervisor,/to the hypervisor./
> 
> s/"maximum"/Using "maximum"
> 
>> +            cache size large enough to keep all metadata, this will help if workload
> 
> 
> "Using maximum assigns the largest value possible for the cache size.
> This ensures the entire disk cache remains in memory for faster I/O at
> the expense of utilizing more memory."
> 
> [Editorial comment: it's really not 100% clear what all the tradeoffs
> someone is making here. The phrase "large enough" just sticks out, but
> since you use INT64_MAX in patch 3, I suppose that *is* the maximum.
> Still in some way indicating that this allows QEMU to grow the cache and
> keep everything in memory, but has the side effect that disks configured
> this way will cause guest memory requirements to grow albeit limited by
> the QEMU algorithms. It seems from my read the max is 32MB, so perhaps
> not a huge deal, but could be useful to note. Whether QEMU shrinks the
> cache when not in use wasn't 100% clear to me.]
> 
> It's too bad it's not possible to utilize some dynamic value via
> qcow2_update_options_prepare. If dynamic adjustment were possible, then
> saving the value in the XML wouldn't be necessary - we could just allow
> dynamic adjustment similar to what I did for of a couple of IOThread
> params (see commit 11ceedcda and the patches before it).
> 
>> +            needs access to whole disk all the time. (<span class="since">Since
>> +            4.10.0, QEMU 3.1</span>)
> 
> This will be at least 5.0.0 now.
> 
>> +          </li>
>> +          <li>
>>            For virtio disks,
>>            <a href="#elementsVirtio">Virtio-specific options</a> can also be
>>            set. (<span class="since">Since 3.5.0</span>)
>> diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng
>> index b9ac5df..3e406fc 100644
>> --- a/docs/schemas/domaincommon.rng
>> +++ b/docs/schemas/domaincommon.rng
>> @@ -1990,6 +1990,9 @@
>>          <ref name="detect_zeroes"/>
>>        </optional>
>>        <optional>
>> +        <ref name="metadata_cache_size"/>
>> +      </optional>
>> +      <optional>
>>          <attribute name='queues'>
>>            <ref name="positiveInteger"/>
>>          </attribute>
>> @@ -2090,6 +2093,14 @@
>>        </choice>
>>      </attribute>
>>    </define>
>> +  <define name="metadata_cache_size">
>> +    <attribute name='metadata_cache_size'>
>> +      <choice>
>> +        <value>default</value>
>> +        <value>maximum</value>
> 
> I didn't go back and read the previous reviews nor was I present for the
> KVM Forum discussion on this, but default really means "minimum" I
> think. Even that just doesn't feel "right". There's 3 QEMU knobs, but
> this changes 1 knob allowing one direction. And yes, changing 3 knobs is
> also confusing. I "assume" that it's been determined that this one knob
> has the greatest affect on I/O performance...
> 
> Reading https://github.com/qemu/qemu/blob/master/docs/qcow2-cache.txt
> some phrases stick out:
> 
>   1. "setting the right cache sizes is not a straightforward operation".
>   2. "In order to choose the cache sizes we need to know how they relate
> to the amount of allocated space."
>   3. The limit of the l2 cache size is "32 MB by default on Linux
> platforms (enough for full coverage of 256 GB images, with the default
> cluster size)".
>   4. "QEMU will not use more memory than needed to hold all of the
> image's L2 tables, regardless of this max. value."
> 
> So at least providing INT_MAX won't hurt, although given the math in
> qcow2_update_options_prepare:
> 
>     l2_cache_size /= l2_cache_entry_size;
>     if (l2_cache_size < MIN_L2_CACHE_SIZE) {
>         l2_cache_size = MIN_L2_CACHE_SIZE;
>     }
>     if (l2_cache_size > INT_MAX) {
> 
> wouldn't that mean we could pass up to "l2_cache_size *
> l2_cache_entry_size"?

You mean up to INT_MAX / l2_cache_entry_size? No, because there
is limitation also that comes from read_cache_sizes so cache
will be limited to maximum needs of disk size.

> 
> Still using minimum/maximum for what amounts to a boolean operation is I
> think unnecessary. Rather than a list of values, wouldn't having
> something like "max_metadata_cache='yes'" be just as effective?

Well introducing metadata_cache_size enum is what it was agreed on KVM Forum...

> 
> Alternatively, reading through the above document and thinking about how
> "useful" it could be to use a more specific value, why not go with
> "metadata_cache_size=N", where N is in MB and describes the size of the
> cache to be used. Being in MB ascribes to the need to be a power of 2
> and quite frankly keeps a reasonable range assigned to the value. I'd
> find it hard to fathom a "large enough" disk would improve that much I/O
> performance in KB adjustments, but I haven't done any analysis of that
> theory either.

There a lot of discussion of this cache parameter in this list, bugzilla
(sorry for not providing links) that boils down to one does not know
how to set it properly, so having option that is in MB is hard to use. On
the other hand there are cases (synthetic tests, large database) when default
cache size is clearly not big enough so in course of discussion
it was decided to have besides default cache size also cache size that
cover whole disk so IO performance will not degrade whatever guest will do.
Thus this 'maximum' set. 'default' is just an explicit default (implicit
is omitted metadata_cache_size option of course).

Also having enum instead of bool is a bit future proof )

Nikolay

> 
> I think the above document also gives a "fair" example that could be
> "reworded" into the libvirt docs in order to help "size" things based on
> the default QEMU values we're not allowing to be changed and then
> showing the math what it means to have a 1MB cache, a 2MB cache, an 8MB,
> etc. vis-a-vis the size of the disk for which the cache is being
> created. For example, for a disk of up to 8G, the "default" cache would
> be XXX and setting a value of YYY would help I/O.  Similarly for a 16GB,
> 64GB, etc. Essentially a bit of a guide - although that's starting to
> border on what should go into the wiki instead.
> 
>> +      </choice>
>> +    </attribute>
>> +  </define>
>>    <define name="controller">
>>      <element name="controller">
>>        <optional>
>> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
>> index 237540b..b2185f8 100644
>> --- a/src/conf/domain_conf.c
>> +++ b/src/conf/domain_conf.c
>> @@ -885,6 +885,11 @@ VIR_ENUM_IMPL(virDomainDiskDetectZeroes, VIR_DOMAIN_DISK_DETECT_ZEROES_LAST,
>>                "on",
>>                "unmap")
>>  
>> +VIR_ENUM_IMPL(virDomainDiskMetadataCacheSize,
>> +              VIR_DOMAIN_DISK_METADATA_CACHE_SIZE_LAST,
>> +              "default",
>> +              "maximum")
>> +
>>  VIR_ENUM_IMPL(virDomainDiskMirrorState, VIR_DOMAIN_DISK_MIRROR_STATE_LAST,
>>                "none",
>>                "yes",
>> @@ -9375,6 +9380,14 @@ virDomainDiskDefDriverParseXML(virDomainDiskDefPtr def,
>>      }
>>      VIR_FREE(tmp);
>>  
>> +    if ((tmp = virXMLPropString(cur, "metadata_cache_size")) &&
>> +        (def->metadata_cache_size = virDomainDiskMetadataCacheSizeTypeFromString(tmp)) < 0) {
>> +        virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
>> +                       _("unknown driver metadata_cache_size value '%s'"), tmp);
>> +        goto cleanup;
>> +    }
>> +    VIR_FREE(tmp);
>> +
> 
> This would just use the YesNo type logic instead or read in a numeric
> value.  BTW: Not sure I'd bother with worrying about checking the QEMU
> maximum as that could change and then we're stuck with a lower maximum
> check. Just pass along the value to QEMU and let it fail.
> 
>>      ret = 0;
>>  
>>   cleanup:
>> @@ -23902,6 +23915,10 @@ virDomainDiskDefFormatDriver(virBufferPtr buf,
>>      if (disk->queues)
>>          virBufferAsprintf(&driverBuf, " queues='%u'", disk->queues);
>>  
>> +    if (disk->metadata_cache_size)
>> +        virBufferAsprintf(&driverBuf, " metadata_cache_size='%s'",
>> +                          virDomainDiskMetadataCacheSizeTypeToString(disk->metadata_cache_size));
>> +
> 
> My personal crusade... metadata_cache_size >
> VIR_DOMAIN_DISK_METADATA_CACHE_SIZE_DEFAULT (yes, > 0).
> 
> Of course this all depends on your final solution - a boolean would only
> be written if set and an int value only written if > 0 (since 0 would be
> the "optional" viewpoint).
> 
>>      virDomainVirtioOptionsFormat(&driverBuf, disk->virtio);
>>  
>>      ret = virXMLFormatElement(buf, "driver", &driverBuf, NULL);
>> diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h
>> index e30a4b2..b155058 100644
>> --- a/src/conf/domain_conf.h
>> +++ b/src/conf/domain_conf.h
>> @@ -568,6 +568,13 @@ typedef enum {
>>      VIR_DOMAIN_DISK_DETECT_ZEROES_LAST
>>  } virDomainDiskDetectZeroes;
>>  
>> +typedef enum {
>> +    VIR_DOMAIN_DISK_METADATA_CACHE_SIZE_DEFAULT = 0,
>> +    VIR_DOMAIN_DISK_METADATA_CACHE_SIZE_MAXIMUM,
>> +
>> +    VIR_DOMAIN_DISK_METADATA_CACHE_SIZE_LAST
>> +} virDomainDiskMetadataCacheSize;
>> +
>>  typedef struct _virDomainBlockIoTuneInfo virDomainBlockIoTuneInfo;
>>  struct _virDomainBlockIoTuneInfo {
>>      unsigned long long total_bytes_sec;
>> @@ -672,6 +679,7 @@ struct _virDomainDiskDef {
>>      int discard; /* enum virDomainDiskDiscard */
>>      unsigned int iothread; /* unused = 0, > 0 specific thread # */
>>      int detect_zeroes; /* enum virDomainDiskDetectZeroes */
>> +    int metadata_cache_size; /* enum virDomainDiskMetadataCacheSize */
>>      char *domain_name; /* backend domain name */
>>      unsigned int queues;
>>      virDomainVirtioOptionsPtr virtio;
>> @@ -3388,6 +3396,7 @@ VIR_ENUM_DECL(virDomainDeviceSGIO)
>>  VIR_ENUM_DECL(virDomainDiskTray)
>>  VIR_ENUM_DECL(virDomainDiskDiscard)
>>  VIR_ENUM_DECL(virDomainDiskDetectZeroes)
>> +VIR_ENUM_DECL(virDomainDiskMetadataCacheSize)
>>  VIR_ENUM_DECL(virDomainDiskMirrorState)
>>  VIR_ENUM_DECL(virDomainController)
>>  VIR_ENUM_DECL(virDomainControllerModelPCI)
>> diff --git a/tests/qemuxml2argvdata/disk-metadata_cache_size.xml b/tests/qemuxml2argvdata/disk-metadata_cache_size.xml
>> new file mode 100644
>> index 0000000..8ac2599
>> --- /dev/null
>> +++ b/tests/qemuxml2argvdata/disk-metadata_cache_size.xml
>> @@ -0,0 +1,42 @@
>> +<domain type='qemu'>
>> +  <name>test</name>
>> +  <uuid>468404ad-d49c-40f2-9e14-02294f9c1be3</uuid>
>> +  <memory unit='KiB'>1048576</memory>
>> +  <currentMemory unit='KiB'>1048576</currentMemory>
>> +  <vcpu placement='static'>1</vcpu>
>> +  <os>
>> +    <type arch='x86_64' machine='pc-0.13'>hvm</type>
> 
> Nothing more recent than pc-0.13 ;-) for this copy-pasta.
> 
>> +    <boot dev='cdrom'/>
>> +    <boot dev='hd'/>
>> +    <bootmenu enable='yes'/>
>> +  </os>
>> +  <clock offset='utc'/>
>> +  <on_poweroff>destroy</on_poweroff>
>> +  <on_reboot>restart</on_reboot>
>> +  <on_crash>restart</on_crash>
>> +  <devices>
>> +    <emulator>/usr/bin/qemu-system-x86_64</emulator>
>> +    <disk type='file' device='disk'>
>> +      <driver name='qemu' type='qcow2' metadata_cache_size='maximum'/>
>> +      <source file='/var/lib/libvirt/images/f14.img'/>
>> +      <target dev='vda' bus='virtio'/>
>> +      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
>> +    </disk>
>> +    <disk type='file' device='cdrom'>
>> +      <driver name='qemu' type='raw'/>
>> +      <source file='/var/lib/libvirt/Fedora-14-x86_64-Live-KDE.iso'/>
>> +      <target dev='hdc' bus='ide'/>
>> +      <readonly/>
>> +      <address type='drive' controller='0' bus='1' target='0' unit='0'/>
>> +    </disk>
> 
> This second disk isn't necessary
> 
> John
> 
>> +    <controller type='usb' index='0'/>
>> +    <controller type='virtio-serial' index='0'>
>> +      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
>> +    </controller>
>> +    <controller type='ide' index='0'/>
>> +    <controller type='pci' index='0' model='pci-root'/>
>> +    <input type='mouse' bus='ps2'/>
>> +    <input type='keyboard' bus='ps2'/>
>> +    <memballoon model='virtio'/>
>> +  </devices>
>> +</domain>
>> diff --git a/tests/qemuxml2xmloutdata/disk-metadata_cache_size.xml b/tests/qemuxml2xmloutdata/disk-metadata_cache_size.xml
>> new file mode 100644
>> index 0000000..5fed22b
>> --- /dev/null
>> +++ b/tests/qemuxml2xmloutdata/disk-metadata_cache_size.xml
>> @@ -0,0 +1,48 @@
>> +<domain type='qemu'>
>> +  <name>test</name>
>> +  <uuid>468404ad-d49c-40f2-9e14-02294f9c1be3</uuid>
>> +  <memory unit='KiB'>1048576</memory>
>> +  <currentMemory unit='KiB'>1048576</currentMemory>
>> +  <vcpu placement='static'>1</vcpu>
>> +  <os>
>> +    <type arch='x86_64' machine='pc-0.13'>hvm</type>
>> +    <boot dev='cdrom'/>
>> +    <boot dev='hd'/>
>> +    <bootmenu enable='yes'/>
>> +  </os>
>> +  <clock offset='utc'/>
>> +  <on_poweroff>destroy</on_poweroff>
>> +  <on_reboot>restart</on_reboot>
>> +  <on_crash>restart</on_crash>
>> +  <devices>
>> +    <emulator>/usr/bin/qemu-system-x86_64</emulator>
>> +    <disk type='file' device='disk'>
>> +      <driver name='qemu' type='qcow2' metadata_cache_size='maximum'/>
>> +      <source file='/var/lib/libvirt/images/f14.img'/>
>> +      <target dev='vda' bus='virtio'/>
>> +      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
>> +    </disk>
>> +    <disk type='file' device='cdrom'>
>> +      <driver name='qemu' type='raw'/>
>> +      <source file='/var/lib/libvirt/Fedora-14-x86_64-Live-KDE.iso'/>
>> +      <target dev='hdc' bus='ide'/>
>> +      <readonly/>
>> +      <address type='drive' controller='0' bus='1' target='0' unit='0'/>
>> +    </disk>
>> +    <controller type='usb' index='0'>
>> +      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
>> +    </controller>
>> +    <controller type='virtio-serial' index='0'>
>> +      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
>> +    </controller>
>> +    <controller type='ide' index='0'>
>> +      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
>> +    </controller>
>> +    <controller type='pci' index='0' model='pci-root'/>
>> +    <input type='mouse' bus='ps2'/>
>> +    <input type='keyboard' bus='ps2'/>
>> +    <memballoon model='virtio'>
>> +      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
>> +    </memballoon>
>> +  </devices>
>> +</domain>
>> diff --git a/tests/qemuxml2xmltest.c b/tests/qemuxml2xmltest.c
>> index 89640f6..c44e0fe 100644
>> --- a/tests/qemuxml2xmltest.c
>> +++ b/tests/qemuxml2xmltest.c
>> @@ -1235,6 +1235,8 @@ mymain(void)
>>      DO_TEST("riscv64-virt",
>>              QEMU_CAPS_DEVICE_VIRTIO_MMIO);
>>  
>> +    DO_TEST("disk-metadata_cache_size", NONE);
>> +
>>      if (getenv("LIBVIRT_SKIP_CLEANUP") == NULL)
>>          virFileDeleteTree(fakerootdir);
>>  
>>


[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]