[libvirt] [RFC PATCH 01/17] qemu: setup shared memory without explicit numa configuration
Marc-André Lureau
marcandre.lureau at redhat.com
Thu Aug 16 10:31:56 UTC 2018
Hi
On Thu, Aug 16, 2018 at 4:35 AM, John Ferlan <jferlan at redhat.com> wrote:
>
>
> On 07/13/2018 09:28 AM, marcandre.lureau at redhat.com wrote:
>> From: Marc-André Lureau <marcandre.lureau at redhat.com>
>>
>> When a domain is configured with 'shared' memory backing:
>>
>> <memoryBacking>
>> <access mode='shared'/>
>> </memoryBacking>
>>
>> But no explicit NUMA configuration, let's configure a shared memory
>> backend associated with default -numa.
>>
>> Signed-off-by: Marc-André Lureau <marcandre.lureau at redhat.com>
>> ---
>> src/qemu/qemu_command.c | 100 ++++++++++++------
>> .../fd-memory-no-numa-topology.args | 4 +
>> 2 files changed, 73 insertions(+), 31 deletions(-)
>>
>
> NUMA, memory backends, and hugepages - not in my wheelhouse of
> knowledge. Hopefully Michal and/or Pavel will take a look!
>
> Is it possible someone may not want this type of thing to happen? Is
I assume someone that sets 'shared' memory mode may consider this as a bug fix.
> there an upside or downside to this? What happens "today" when not
You get non-shared memory
> generated? And of course, what about migration concerns about
> unconditionally doing this for some target migration?
True, this will break migration though if the target uses
numa/memory-backend-file.
What do you suggest?
>
>> diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c
>> index 44ae8dcef7..f1235099b2 100644
>> --- a/src/qemu/qemu_command.c
>> +++ b/src/qemu/qemu_command.c
>> @@ -3254,26 +3254,21 @@ qemuBuildMemoryBackendProps(virJSONValuePtr *backendProps,
>>
>>
>> static int
>> -qemuBuildMemoryCellBackendStr(virDomainDefPtr def,
>> - virQEMUDriverConfigPtr cfg,
>> - size_t cell,
>> - qemuDomainObjPrivatePtr priv,
>> - virBufferPtr buf)
>> +qemuBuildMemoryBackendStr(virDomainDefPtr def,
>> + virQEMUDriverConfigPtr cfg,
>> + const char *alias,
>
> So the one "concern" I'd have here is some time in the future the @mem
> gets allocated and handled like a real device eventually calling
> virDomainDeviceInfoClear and that'd be a problem for the passed const
> char * string. Some future person's problem I guess!
>
>> + int targetNode,
>> + unsigned long long memsize,
>> + qemuDomainObjPrivatePtr priv,
>> + virBufferPtr buf)
>
> As much as a long name is a pain, is this more of a :
>
> qemuBuildMemorySharedDefaultBackendStr
Why?
>
>> {
>> virJSONValuePtr props = NULL;
>> - char *alias = NULL;
>> - int ret = -1;
>> - int rc;
>> virDomainMemoryDef mem = { 0 };
>> - unsigned long long memsize = virDomainNumaGetNodeMemorySize(def->numa,
>> - cell);
>> -
>> - if (virAsprintf(&alias, "ram-node%zu", cell) < 0)
>> - goto cleanup;
>> + int rc, ret = -1;
>>
>> mem.size = memsize;
>> - mem.targetNode = cell;
>> - mem.info.alias = alias;
>> + mem.targetNode = targetNode;
>> + mem.info.alias = (char *)alias;
>>
>> if ((rc = qemuBuildMemoryBackendProps(&props, alias, cfg, priv->qemuCaps,
>> def, &mem, priv->autoNodeset, false)) < 0)> @@ -3284,9 +3279,30 @@ qemuBuildMemoryCellBackendStr(virDomainDefPtr def,
>>
>> ret = rc;
>>
>> +cleanup:
>
> Fails 'make check syntax-check' :
>
> maint.mk: Top-level labels should be indented by one space
> make: *** [cfg.mk:898: sc_require_space_before_label] Error 1
argh, fixed
>
>> + virJSONValueFree(props);
>> + return ret;
>> +}
>> +
>> +
>> +static int
>> +qemuBuildMemoryCellBackendStr(virDomainDefPtr def,
>> + virQEMUDriverConfigPtr cfg,
>> + size_t cell,
>> + qemuDomainObjPrivatePtr priv,
>> + virBufferPtr buf)
>> +{
>> + char *alias = NULL;
>> + int ret = -1;
>> + unsigned long long memsize = virDomainNumaGetNodeMemorySize(def->numa, cell);
>> +
>> + if (virAsprintf(&alias, "ram-node%zu", cell) < 0)
>> + goto cleanup;
>> +
>> + ret = qemuBuildMemoryBackendStr(def, cfg, alias, cell, memsize, priv, buf);
>> +
>> cleanup:
>> VIR_FREE(alias);
>> - virJSONValueFree(props);
>>
>> return ret;
>> }
>> @@ -7590,6 +7606,17 @@ qemuBuildNumaArgStr(virQEMUDriverConfigPtr cfg,
>> size_t ncells = virDomainNumaGetNodeCount(def->numa);
>> const long system_page_size = virGetSystemPageSizeKB();
>> bool numa_distances = false;
>> + bool implicit = false;
>> +
>> + if (ncells == 0) {
>> + if (def->mem.access == VIR_DOMAIN_MEMORY_ACCESS_SHARED) {
>> + ncells = 1;
>
> Well, that's cheating for the subsequent for loop ;-)
>
>> + implicit = true;
>> + } else {
>> + ret = 0;
>> + goto cleanup;
>> + }
>> + }
>
> So if ncells == 0 && def->mem.access != VIR_DOMAIN_MEMORY_ACCESS_SHARED,
> then we return 0 without doing the subsequent code? Is that expected? Is
> there something done later that may be necessary, needed, or assumed.
No, before the patch, virDomainNumaGetNodeCount() is checked before
calling qemuBuildNumaArgStr(). Now it is handled inside in case
ncells==0 && def->mem.access != VIR_DOMAIN_MEMORY_ACCESS_SHARED.
>>
>> if (virDomainNumatuneHasPerNodeBinding(def->numa) &&
>> !(virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_RAM) ||
>> @@ -7645,14 +7672,22 @@ qemuBuildNumaArgStr(virQEMUDriverConfigPtr cfg,
>> if (virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_RAM) ||
>> virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_FILE)) {
>>
>> - if ((rc = qemuBuildMemoryCellBackendStr(def, cfg, i, priv,
>> - &nodeBackends[i])) < 0)
>> +
>> + if (implicit)
>> + rc = qemuBuildMemoryBackendStr(def, cfg, "ram-node", -1,
>
> Using "ram-node" where other devices use "ram-node%zu" could easily be
> missed - maybe "default" or "implicit" as a prefix or postfix to make it
> stand out a bit more? It's not a big deal though.
>
>> + def->mem.total_memory,
>> + priv, &nodeBackends[i]);
>> + else
>> + rc = qemuBuildMemoryCellBackendStr(def, cfg, i,
>> + priv, &nodeBackends[i]);
>> + if (rc < 0)
>> goto cleanup;
>>
>> if (rc == 0)
>> needBackend = true;
>> } else {
>> - if (virDomainNumaGetNodeMemoryAccessMode(def->numa, i)) {
>> + if (implicit ||
>> + virDomainNumaGetNodeMemoryAccessMode(def->numa, i)) {
>> virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
>> _("Shared memory mapping is not supported "
>> "with this QEMU"));
>> @@ -7667,15 +7702,18 @@ qemuBuildNumaArgStr(virQEMUDriverConfigPtr cfg,
>>
>> for (i = 0; i < ncells; i++) {
>> VIR_FREE(cpumask);
>> - if (!(cpumask = virBitmapFormat(virDomainNumaGetNodeCpumask(def->numa, i))))
>> - goto cleanup;
>>
>> - if (strchr(cpumask, ',') &&
>> - !virQEMUCapsGet(qemuCaps, QEMU_CAPS_NUMA)) {
>> - virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
>> - _("disjoint NUMA cpu ranges are not supported "
>> - "with this QEMU"));
>> - goto cleanup;
>> + if (!implicit) {
>> + if (!(cpumask = virBitmapFormat(virDomainNumaGetNodeCpumask(def->numa, i))))
>> + goto cleanup;
>> +
>> + if (strchr(cpumask, ',') &&
>> + !virQEMUCapsGet(qemuCaps, QEMU_CAPS_NUMA)) {
>> + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
>> + _("disjoint NUMA cpu ranges are not supported "
>> + "with this QEMU"));
>> + goto cleanup;
>> + }
>> }
>>
>> if (needBackend) {
>> @@ -7694,7 +7732,8 @@ qemuBuildNumaArgStr(virQEMUDriverConfigPtr cfg,
>> }
>>
>> if (needBackend)
>> - virBufferAsprintf(&buf, ",memdev=ram-node%zu", i);
>> + virBufferAsprintf(&buf, implicit ?
>> + ",memdev=ram-node" : ",memdev=ram-node%zu", i);
>> else
>> virBufferAsprintf(&buf, ",mem=%llu",
>> virDomainNumaGetNodeMemorySize(def->numa, i) / 1024);
>> @@ -7717,7 +7756,7 @@ qemuBuildNumaArgStr(virQEMUDriverConfigPtr cfg,
>> break;
>> }
>>
>> - if (numa_distances) {
>> + if (!implicit && numa_distances) {
>> if (!virQEMUCapsGet(qemuCaps, QEMU_CAPS_NUMA_DIST)) {
>> virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
>> _("setting NUMA distances is not "
>> @@ -10303,8 +10342,7 @@ qemuBuildCommandLine(virQEMUDriverPtr driver,
>> if (qemuBuildIOThreadCommandLine(cmd, def) < 0)
>> goto error;
>>
>> - if (virDomainNumaGetNodeCount(def->numa) &&
>> - qemuBuildNumaArgStr(cfg, def, cmd, priv) < 0)
>> + if (qemuBuildNumaArgStr(cfg, def, cmd, priv) < 0)
>> goto error;
>>
>> if (qemuBuildMemoryDeviceCommandLine(cmd, cfg, def, priv) < 0)
>> diff --git a/tests/qemuxml2argvdata/fd-memory-no-numa-topology.args b/tests/qemuxml2argvdata/fd-memory-no-numa-topology.args
>> index bd88daaa3b..400fb39cc6 100644
>> --- a/tests/qemuxml2argvdata/fd-memory-no-numa-topology.args
>> +++ b/tests/qemuxml2argvdata/fd-memory-no-numa-topology.args
>> @@ -11,6 +11,10 @@ QEMU_AUDIO_DRV=none \
>> -m 14336 \
>> -mem-prealloc \
>> -smp 8,sockets=8,cores=1,threads=1 \
>> +-object memory-backend-file,id=ram-node,\
>> +mem-path=/var/lib/libvirt/qemu/ram/libvirt/qemu/-1-instance-00000092/ram-node,\
>> +share=yes,size=15032385536 \
>
> Curious does that perhaps rather large file for mem-path get created? If
> so, wow! Seems to me something like this would need to be documented or
> as noted earlier be configurable.
You mean during tests? no, it's created by qemu afaik.
I am not a libvirt expert either, and I would rather have no file be
created when you want shared memory. That's what the next patches
propose with memfd support.
>
> John
>
>> +-numa node,nodeid=0,memdev=ram-node \
>> -uuid 126f2720-6f8e-45ab-a886-ec9277079a67 \
>> -display none \
>> -no-user-config \
>>
thanks
More information about the libvir-list
mailing list