[libvirt] [RFC PATCH v2 1/4] numa: describe siblings distances within cells
Daniel P. Berrange
berrange at redhat.com
Wed Jun 28 14:21:29 UTC 2017
On Wed, Jun 28, 2017 at 03:56:36PM +0200, Wim Ten Have wrote:
> From: Wim ten Have <wim.ten.have at oracle.com>
>
> Add libvirtd NUMA cell domain administration functionality to
> describe underlying cell id sibling distances in full fashion
> when configuring HVM guests.
>
> [below is an example of a 4 node setup]
>
> <cpu>
> <numa>
> <cell id='0' cpus='0' memory='2097152' unit='KiB'>
> <distances>
> <sibling id='0' value='10'/>
> <sibling id='1' value='21'/>
> <sibling id='2' value='31'/>
> <sibling id='3' value='41'/>
> </distances>
> </cell>
> <cell id='1' cpus='1' memory='2097152' unit='KiB'>
> <distances>
> <sibling id='0' value='21'/>
> <sibling id='1' value='10'/>
> <sibling id='2' value='31'/>
> <sibling id='3' value='41'/>
> </distances>
> </cell>
> <cell id='2' cpus='2' memory='2097152' unit='KiB'>
> <distances>
> <sibling id='0' value='31'/>
> <sibling id='1' value='21'/>
> <sibling id='2' value='10'/>
> <sibling id='3' value='21'/>
> </distances>
> <cell id='3' cpus='3' memory='2097152' unit='KiB'>
> <distances>
> <sibling id='0' value='41'/>
> <sibling id='1' value='31'/>
> <sibling id='2' value='21'/>
> <sibling id='3' value='10'/>
> </distances>
> </cell>
> </numa>
> </cpu>
>
> Changes under this commit concern all those that require adding
> the valid data structures, virDomainNuma* functional routines and the
> XML doc schema enhancements to enforce appropriate administration.
>
> These changes are accompanied with topic related documentation under
> docs/formatdomain.html within the "CPU model and topology" extending the
> "Guest NUMA topology" paragraph.
>
> For terminology we refer to sockets as "nodes" where access to each
> others' distinct resources such as memory make them "siblings" with a
> designated "distance" between them. A specific design is described under
> the ACPI (Advanced Configuration and Power Interface Specification)
> within the chapter explaining the system's SLIT (System Locality Distance
> Information Table).
>
> These patches extend core libvirt's XML description of a virtual machine's
> hardware to include NUMA distance information for sibling nodes, which
> is then passed to Xen guests via libxl. Recently qemu landed support for
> constructing the SLIT since commit 0f203430dd ("numa: Allow setting NUMA
> distance for different NUMA nodes"), hence these core libvirt extensions
> can also help other drivers in supporting this feature.
>
> These changes alter the docs/schemas/cputypes.rng enforcing
> domain administration to follow the syntax below per numa cell id.
>
> These changes also alter docs/schemas/basictypes.rng to add
> "numaDistanceValue" which is an "unsignedInt" with a minimum value
> of 10 as 0-9 are reserved values and can not be used as System
> Locality Distance Information Table data.
>
> Signed-off-by: Wim ten Have <wim.ten.have at oracle.com>
> ---
> docs/formatdomain.html.in | 64 ++++++++++-
> docs/schemas/basictypes.rng | 8 ++
> docs/schemas/cputypes.rng | 18 +++
> src/conf/cpu_conf.c | 2 +-
> src/conf/numa_conf.c | 260 +++++++++++++++++++++++++++++++++++++++++++-
> src/conf/numa_conf.h | 25 ++++-
> src/libvirt_private.syms | 6 +
> 7 files changed, 376 insertions(+), 7 deletions(-)
>
> diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
> index 36bea67..b9736e3 100644
> --- a/docs/formatdomain.html.in
> +++ b/docs/formatdomain.html.in
> @@ -1510,7 +1510,69 @@
> </p>
>
> <p>
> - This guest NUMA specification is currently available only for QEMU/KVM.
> + This guest NUMA specification is currently available only for
> + QEMU/KVM and Xen. Whereas Xen driver also allows for a distinct
> + description of NUMA arranged <code>sibling</code> <code>cell</code>
> + <code>distances</code> <span class="since">Since 3.6.0</span>.
> + </p>
> +
> + <p>
> + Under NUMA h/w architecture, distinct resources such as memory
> + create a designated distance between <code>cell</code> and
> + <code>siblings</code> that now can be described with the help of
> + <code>distances</code>. A detailed describtion can be found within
> + the ACPI (Advanced Configuration and Power Interface Specification)
> + within the chapter explaining the system's SLIT (System Locality
> + Distance Information Table).
> + </p>
> +
> +<pre>
> +...
> +<cpu>
> + ...
> + <numa>
> + <cell id='0' cpus='0,4-7' memory='512000' unit='KiB'>
> + <distances>
> + <sibling id='0' value='10'/>
> + <sibling id='1' value='21'/>
> + <sibling id='2' value='31'/>
> + <sibling id='3' value='41'/>
> + </distances>
> + </cell>
> + <cell id='1' cpus='1,8-10,12-15' memory='512000' unit='KiB' memAccess='shared'>
> + <distances>
> + <sibling id='0' value='21'/>
> + <sibling id='1' value='10'/>
> + <sibling id='2' value='21'/>
> + <sibling id='3' value='31'/>
> + </distances>
> + </cell>
> + <cell id='2' cpus='2,11' memory='512000' unit='KiB' memAccess='shared'>
> + <distances>
> + <sibling id='0' value='31'/>
> + <sibling id='1' value='21'/>
> + <sibling id='2' value='10'/>
> + <sibling id='3' value='21'/>
> + </distances>
> + </cell>
> + <cell id='3' cpus='3' memory='512000' unit='KiB'>
> + <distances>
> + <sibling id='0' value='41'/>
> + <sibling id='1' value='31'/>
> + <sibling id='2' value='21'/>
> + <sibling id='3' value='10'/>
> + </distances>
> + </cell>
> + </numa>
> + ...
> +</cpu>
> +...</pre>
> +
> + <p>
> + Under Xen driver, if no <code>distances</code> are given to describe
> + the SLIT data between different cells, it will default to a scheme
> + using 10 for local and 21 for other cells. Which is the assumption
> + of guest OS when no SLIT is specified.
> </p>
We need to specify what happens if partial data is given too. In particular
if you give a distance for A -> B, and not B -> A, then we should assume
symmetry.
Rather than having the defaults be duplicated in drivers, we should have
a helper API in domain_conf.c that can be used to query the distance
between 2 nodes that can be called by drivers. This can then apply
defaults in a consistent manner.
> @@ -66,6 +68,12 @@ struct _virDomainNuma {
> virBitmapPtr nodeset; /* host memory nodes where this guest node resides */
> virDomainNumatuneMemMode mode; /* memory mode selection */
> virDomainMemoryAccess memAccess; /* shared memory access configuration */
> +
> + struct _virDomainNumaDistance {
> + unsigned int value; /* locality value for node i*j */
Strictly speaking distances have an orderng, ie this is 'node i -> j'
in that specific direction. 'node j -> i' can be different.
> + unsigned int cellid;
Needs extra indent - 4 spaces per indent level
> + } *distances; /* remote node distances */
> + size_t ndistances;
> } *mem_nodes; /* guest node configuration */
> size_t nmem_nodes;
>
> @@ -686,6 +694,95 @@ virDomainNumatuneNodesetIsAvailable(virDomainNumaPtr numatune,
> }
> + for (i = 0; i < ndistances; i++) {
> + unsigned int sibling_id = i, sibling_value;
> +
> + /* siblings are in order of parsing or explicitly numbered */
> + if ((tmp = virXMLPropString(nodes[i], "id"))) {
> + if (virStrToLong_uip(tmp, NULL, 10, &sibling_id) < 0) {
> + virReportError(VIR_ERR_XML_ERROR,
> + _("Invalid 'id' attribute in NUMA distances for sibling: '%s'"),
> + tmp);
> + goto cleanup;
> + }
> +
> + if (sibling_id >= ndistances) {
> + virReportError(VIR_ERR_XML_ERROR, "%s",
> + _("Exactly one 'sibling' element per NUMA distance "
> + "is allowed, non-contiguous ranges or ranges not "
> + "starting from 0 are not allowed"));
We should allow non-contiguous ranges, starting from any node id. ie users should
be free to provide sparse information, relying on defaults for the rest.
> + goto cleanup;
> + }
> + }
> + VIR_FREE(tmp);
> +
> + /* We need a locality value */
> + if (!(tmp = virXMLPropString(nodes[i], "value"))) {
> + virReportError(VIR_ERR_XML_ERROR,
> + _("Missing 'value' attribute in NUMA distances for sibling id: '%d'"),
> + sibling_id);
> + goto cleanup;
> + }
> +
> + /* It needs to be applicable */
> + if (virStrToLong_uip(tmp, NULL, 10, &sibling_value) < 0) {
> + virReportError(VIR_ERR_XML_ERROR,
> + _("Invalid 'value' attribute in NUMA distances for sibling id: '%d'"),
> + sibling_id);
> + goto cleanup;
> + }
> + VIR_FREE(tmp);
> +
> + distances[sibling_id].cellid = sibling_id;
> + distances[sibling_id].value = sibling_value;
> + }
> +
> + def->mem_nodes[cur_cell].distances = distances;
> + def->mem_nodes[cur_cell].ndistances = ndistances;
> +
> + ret = 0;
> +
> + cleanup:
> + if (ret)
> + VIR_FREE(distances);
> + VIR_FREE(nodes);
> + VIR_FREE(tmp);
> +
> + return ret;
> +}
> +
> int
> virDomainNumaDefCPUParseXML(virDomainNumaPtr def,
> xmlXPathContextPtr ctxt)
> @@ -788,6 +885,14 @@ virDomainNumaDefCPUParseXML(virDomainNumaPtr def,
> def->mem_nodes[cur_cell].memAccess = rc;
> VIR_FREE(tmp);
> }
> +
> + /* Parse NUMA distances info */
> + if (virDomainNumaDefNodeDistanceParseXML(def, ctxt, cur_cell) < 0) {
> + virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
> + _("NUMA cell %d has incorrect 'distances' configured"),
> + cur_cell);
> + goto cleanup;
> + }
This method already reported an detailed error message, which is being
overwritten here with a useless generic message.
> +size_t
> +virDomainNumaSetNodeCount(virDomainNumaPtr numa, size_t nmem_nodes)
> +{
> + if (!numa || !nmem_nodes)
> + return 0;
This error condition doesn't report any error message...
> +
> + if (numa->mem_nodes) {
> + virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
> + _("Cannot alter an existing nmem_nodes set"));
> + return 0;
> + }
...but this does...
> +
> + if (VIR_ALLOC_N(numa->mem_nodes, nmem_nodes) < 0)
> + return 0;
...and so does this.
Callers need to have consistent error reporting for all scenarios
> +
> + numa->nmem_nodes = nmem_nodes;
> +
> + return numa->nmem_nodes;
> +}
> +
> +
> +size_t
> +virDomainNumaGetNodeDistance(virDomainNumaPtr numa,
> + size_t node,
> + size_t cellid)
> +{
> + virDomainNumaDistancePtr distances;
> +
> + if (!numa)
> + return 0;
Just mark 'numa' parameters as ATTRIBUTE_NONNULL and make
sure callers don't run this method if numa is not
configured.
> +
> + distances = numa->mem_nodes[node].distances;
You've not bounds-checked 'node' vs mem_nodes size.
> + if (!numa->mem_nodes[node].ndistances || !distances)
> + return 0;
This is where we should plug in defaults for partially
specified data.
> +
> + return distances[cellid].value;
> +}
What are calling semantics here ? I think we should
> +
> +
> +size_t
> +virDomainNumaSetNodeDistance(virDomainNumaPtr numa,
> + size_t node,
> + size_t cellid,
> + unsigned int value)
> +{
> + virDomainNumaDistancePtr distances;
> +
> + /*
> + * Advanced Configuration and Power Interface
> + * Specification version 6.1. Chapter 5.2.17
> + * System Locality Distance Information Table
> + * ... Distance values of 0-9 are reserved.
> + */
> + if (!numa || value < 10)
> + return 0;
ATTRIBUTE_NONNULL instead, and report an error for
values < 10
> +
> + distances = numa->mem_nodes[node].distances;
> +
> + if (numa->mem_nodes[node].ndistances > 0 &&
> + distances[cellid].value)
> + return 0;
I'm not sure what this check is doing - it seems to be
refusing to let you change the value if one is already
set. I don't see the point of that, but if it is needed
then we should report errors
> +
> + distances[cellid].cellid = cellid;
> + distances[cellid].value = value;
> +
> + return distances[cellid].value;
> +}
> +
> +
> +size_t
> +virDomainNumaGetNodeDistanceCount(virDomainNumaPtr numa,
> + size_t node)
> +{
> + if (!numa)
> + return 0;
ATTRIBUTE_NONNULL instead
> +
> + return numa->mem_nodes[node].ndistances;
> +}
> +
> +
> +size_t
> +virDomainNumaSetNodeDistanceCount(virDomainNumaPtr numa,
> + size_t node,
> + size_t ndistances)
> +{
> + virDomainNumaDistancePtr distances;
> +
> + if (!numa || !ndistances)
> + return 0;
ATTRIBUTE_NONNULL for first.
For the second we should presumably free any existing
distance data
> +
> + distances = numa->mem_nodes[node].distances;
> + if (distances) {
> + virReportError(VIR_ERR_INTERNAL_ERROR,
> + _("Cannot alter an existing nmem_nodes distances set for node: %zu"),
> + node);
> + return 0;
> + }
> +
> + if (VIR_ALLOC_N(distances, ndistances) < 0)
> + return 0;
> +
> + numa->mem_nodes[node].distances = distances;
> + numa->mem_nodes[node].ndistances = ndistances;
> +
> + return numa->mem_nodes[node].ndistances;
> +}
> +
> +
> virBitmapPtr
> virDomainNumaGetNodeCpumask(virDomainNumaPtr numa,
> size_t node)
> @@ -937,6 +1175,20 @@ virDomainNumaGetNodeCpumask(virDomainNumaPtr numa,
> }
>
>
> +virBitmapPtr
> +virDomainNumaSetNodeCpumask(virDomainNumaPtr numa,
> + size_t node,
> + virBitmapPtr cpumask)
> +{
> + if (!numa || !cpumask)
> + return NULL;
> +
> + numa->mem_nodes[node].cpumask = cpumask;
> +
> + return numa->mem_nodes[node].cpumask;
> +}
> +
Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
More information about the libvir-list
mailing list