[libvirt] [RFC PATCH v2 1/4] numa: describe siblings distances within cells

Daniel P. Berrange berrange at redhat.com
Wed Jun 28 14:21:29 UTC 2017


On Wed, Jun 28, 2017 at 03:56:36PM +0200, Wim Ten Have wrote:
> From: Wim ten Have <wim.ten.have at oracle.com>
> 
> Add libvirtd NUMA cell domain administration functionality to
> describe underlying cell id sibling distances in full fashion
> when configuring HVM guests.
> 
> [below is an example of a 4 node setup]
> 
>   <cpu>
>     <numa>
>       <cell id='0' cpus='0' memory='2097152' unit='KiB'>
>         <distances>
>           <sibling id='0' value='10'/>
>           <sibling id='1' value='21'/>
>           <sibling id='2' value='31'/>
>           <sibling id='3' value='41'/>
>         </distances>
>       </cell>
>       <cell id='1' cpus='1' memory='2097152' unit='KiB'>
>         <distances>
>           <sibling id='0' value='21'/>
>           <sibling id='1' value='10'/>
>           <sibling id='2' value='31'/>
>           <sibling id='3' value='41'/>
>         </distances>
>       </cell>
>       <cell id='2' cpus='2' memory='2097152' unit='KiB'>
>         <distances>
>           <sibling id='0' value='31'/>
>           <sibling id='1' value='21'/>
>           <sibling id='2' value='10'/>
>           <sibling id='3' value='21'/>
>         </distances>
>       <cell id='3' cpus='3' memory='2097152' unit='KiB'>
>         <distances>
>           <sibling id='0' value='41'/>
>           <sibling id='1' value='31'/>
>           <sibling id='2' value='21'/>
>           <sibling id='3' value='10'/>
>         </distances>
>       </cell>
>     </numa>
>   </cpu>
> 
> Changes under this commit concern all those that require adding
> the valid data structures, virDomainNuma* functional routines and the
> XML doc schema enhancements to enforce appropriate administration.
> 
> These changes are accompanied with topic related documentation under
> docs/formatdomain.html within the "CPU model and topology" extending the
> "Guest NUMA topology" paragraph.
> 
> For terminology we refer to sockets as "nodes" where access to each
> others' distinct resources such as memory make them "siblings" with a
> designated "distance" between them.  A specific design is described under
> the ACPI (Advanced Configuration and Power Interface Specification)
> within the chapter explaining the system's SLIT (System Locality Distance
> Information Table).
> 
> These patches extend core libvirt's XML description of a virtual machine's
> hardware to include NUMA distance information for sibling nodes, which
> is then passed to Xen guests via libxl. Recently qemu landed support for
> constructing the SLIT since commit 0f203430dd ("numa: Allow setting NUMA
> distance for different NUMA nodes"), hence these core libvirt extensions
> can also help other drivers in supporting this feature.
> 
> These changes alter the docs/schemas/cputypes.rng enforcing
> domain administration to follow the syntax below per numa cell id.
> 
> These changes also alter docs/schemas/basictypes.rng to add
> "numaDistanceValue" which is an "unsignedInt" with a minimum value
> of 10 as 0-9 are reserved values and can not be used as System
> Locality Distance Information Table data.
> 
> Signed-off-by: Wim ten Have <wim.ten.have at oracle.com>
> ---
>  docs/formatdomain.html.in   |  64 ++++++++++-
>  docs/schemas/basictypes.rng |   8 ++
>  docs/schemas/cputypes.rng   |  18 +++
>  src/conf/cpu_conf.c         |   2 +-
>  src/conf/numa_conf.c        | 260 +++++++++++++++++++++++++++++++++++++++++++-
>  src/conf/numa_conf.h        |  25 ++++-
>  src/libvirt_private.syms    |   6 +
>  7 files changed, 376 insertions(+), 7 deletions(-)
> 
> diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
> index 36bea67..b9736e3 100644
> --- a/docs/formatdomain.html.in
> +++ b/docs/formatdomain.html.in
> @@ -1510,7 +1510,69 @@
>      </p>
>  
>      <p>
> -      This guest NUMA specification is currently available only for QEMU/KVM.
> +      This guest NUMA specification is currently available only for
> +      QEMU/KVM and Xen.  Whereas Xen driver also allows for a distinct
> +      description of NUMA arranged <code>sibling</code> <code>cell</code>
> +      <code>distances</code> <span class="since">Since 3.6.0</span>.
> +    </p>
> +
> +    <p>
> +      Under NUMA h/w architecture, distinct resources such as memory
> +      create a designated distance between <code>cell</code> and
> +      <code>siblings</code> that now can be described with the help of
> +      <code>distances</code>. A detailed describtion can be found within
> +      the ACPI (Advanced Configuration and Power Interface Specification)
> +      within the chapter explaining the system's SLIT (System Locality
> +      Distance Information Table).
> +    </p>
> +
> +<pre>
> +...
> +<cpu>
> +  ...
> +  <numa>
> +    <cell id='0' cpus='0,4-7' memory='512000' unit='KiB'>
> +      <distances>
> +        <sibling id='0' value='10'/>
> +        <sibling id='1' value='21'/>
> +        <sibling id='2' value='31'/>
> +        <sibling id='3' value='41'/>
> +      </distances>
> +    </cell>
> +    <cell id='1' cpus='1,8-10,12-15' memory='512000' unit='KiB' memAccess='shared'>
> +      <distances>
> +        <sibling id='0' value='21'/>
> +        <sibling id='1' value='10'/>
> +        <sibling id='2' value='21'/>
> +        <sibling id='3' value='31'/>
> +      </distances>
> +    </cell>
> +    <cell id='2' cpus='2,11' memory='512000' unit='KiB' memAccess='shared'>
> +      <distances>
> +        <sibling id='0' value='31'/>
> +        <sibling id='1' value='21'/>
> +        <sibling id='2' value='10'/>
> +        <sibling id='3' value='21'/>
> +      </distances>
> +    </cell>
> +    <cell id='3' cpus='3' memory='512000' unit='KiB'>
> +      <distances>
> +        <sibling id='0' value='41'/>
> +        <sibling id='1' value='31'/>
> +        <sibling id='2' value='21'/>
> +        <sibling id='3' value='10'/>
> +      </distances>
> +    </cell>
> +  </numa>
> +  ...
> +</cpu>
> +...</pre>
> +
> +    <p>
> +      Under Xen driver, if no <code>distances</code> are given to describe
> +      the SLIT data between different cells, it will default to a scheme
> +      using 10 for local and 21 for other cells.  Which is the assumption
> +      of guest OS when no SLIT is specified.
>      </p>

We need to specify what happens if partial data is given too. In particular
if you give a distance for  A -> B, and not B -> A, then we should assume
symmetry.

Rather than having the defaults be duplicated in drivers, we should have
a helper API in domain_conf.c that can be used to query the distance
between 2 nodes that can be called by drivers. This can then apply
defaults in a consistent manner.


> @@ -66,6 +68,12 @@ struct _virDomainNuma {
>          virBitmapPtr nodeset;   /* host memory nodes where this guest node resides */
>          virDomainNumatuneMemMode mode;  /* memory mode selection */
>          virDomainMemoryAccess memAccess; /* shared memory access configuration */
> +
> +        struct _virDomainNumaDistance {
> +          unsigned int value;    /* locality value for node i*j */

Strictly speaking distances have an orderng, ie this is 'node i -> j'
in that specific direction. 'node j -> i' can be different.

> +          unsigned int cellid;

Needs extra indent - 4 spaces per indent level

> +        } *distances;           /* remote node distances */
> +        size_t ndistances;
>      } *mem_nodes;           /* guest node configuration */
>      size_t nmem_nodes;
>  
> @@ -686,6 +694,95 @@ virDomainNumatuneNodesetIsAvailable(virDomainNumaPtr numatune,
>  }

> +    for (i = 0; i < ndistances; i++) {
> +        unsigned int sibling_id = i, sibling_value;
> +
> +        /* siblings are in order of parsing or explicitly numbered */
> +        if ((tmp = virXMLPropString(nodes[i], "id"))) {
> +            if (virStrToLong_uip(tmp, NULL, 10, &sibling_id) < 0) {
> +                virReportError(VIR_ERR_XML_ERROR,
> +                               _("Invalid 'id' attribute in NUMA distances for sibling: '%s'"),
> +                               tmp);
> +                goto cleanup;
> +            }
> +
> +            if (sibling_id >= ndistances) {
> +                virReportError(VIR_ERR_XML_ERROR, "%s",
> +                               _("Exactly one 'sibling' element per NUMA distance "
> +                                 "is allowed, non-contiguous ranges or ranges not "
> +                                 "starting from 0 are not allowed"));

We should allow non-contiguous ranges, starting from any node id. ie users should
be free to provide sparse information, relying on defaults for the rest.

> +                goto cleanup;
> +            }
> +        }
> +        VIR_FREE(tmp);
> +
> +        /* We need a locality value */
> +        if (!(tmp = virXMLPropString(nodes[i], "value"))) {
> +            virReportError(VIR_ERR_XML_ERROR,
> +                           _("Missing 'value' attribute in NUMA distances for sibling id: '%d'"),
> +                           sibling_id);
> +            goto cleanup;
> +        }
> +
> +        /* It needs to be applicable */
> +        if (virStrToLong_uip(tmp, NULL, 10, &sibling_value) < 0) {
> +            virReportError(VIR_ERR_XML_ERROR,
> +                           _("Invalid 'value' attribute in NUMA distances for sibling id: '%d'"),
> +                           sibling_id);
> +            goto cleanup;
> +        }
> +        VIR_FREE(tmp);
> +
> +        distances[sibling_id].cellid = sibling_id;
> +        distances[sibling_id].value = sibling_value;
> +    }
> +
> +    def->mem_nodes[cur_cell].distances = distances;
> +    def->mem_nodes[cur_cell].ndistances = ndistances;
> +
> +    ret = 0;
> +
> + cleanup:
> +    if (ret)
> +        VIR_FREE(distances);
> +    VIR_FREE(nodes);
> +    VIR_FREE(tmp);
> +
> +    return ret;
> +}
> +
>  int
>  virDomainNumaDefCPUParseXML(virDomainNumaPtr def,
>                              xmlXPathContextPtr ctxt)
> @@ -788,6 +885,14 @@ virDomainNumaDefCPUParseXML(virDomainNumaPtr def,
>              def->mem_nodes[cur_cell].memAccess = rc;
>              VIR_FREE(tmp);
>          }
> +
> +        /* Parse NUMA distances info */
> +        if (virDomainNumaDefNodeDistanceParseXML(def, ctxt, cur_cell) < 0) {
> +                virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
> +                          _("NUMA cell %d has incorrect 'distances' configured"),
> +                          cur_cell);
> +                goto cleanup;
> +        }

This method already reported an detailed error message, which is being
overwritten here with a useless generic message.


> +size_t
> +virDomainNumaSetNodeCount(virDomainNumaPtr numa, size_t nmem_nodes)
> +{
> +    if (!numa || !nmem_nodes)
> +        return 0;

This error condition doesn't report any error message...

> +
> +    if (numa->mem_nodes) {
> +        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
> +                       _("Cannot alter an existing nmem_nodes set"));
> +        return 0;
> +    }

...but this does...

> +
> +    if (VIR_ALLOC_N(numa->mem_nodes, nmem_nodes) < 0)
> +        return 0;

...and so does this.

Callers need to have consistent error reporting for all scenarios

> +
> +    numa->nmem_nodes = nmem_nodes;
> +
> +    return numa->nmem_nodes;
> +}



> +
> +
> +size_t
> +virDomainNumaGetNodeDistance(virDomainNumaPtr numa,
> +                             size_t node,
> +                             size_t cellid)
> +{
> +    virDomainNumaDistancePtr distances;
> +
> +    if (!numa)
> +        return 0;

Just mark 'numa' parameters as ATTRIBUTE_NONNULL and make
sure callers don't run this method if numa is not
configured.

> +
> +    distances = numa->mem_nodes[node].distances;

You've not bounds-checked 'node' vs mem_nodes size.

> +    if (!numa->mem_nodes[node].ndistances || !distances)
> +        return 0;

This is where we should plug in defaults for partially
specified data.

> +
> +    return distances[cellid].value;
> +}

What are calling semantics here ?  I think we should

> +
> +
> +size_t
> +virDomainNumaSetNodeDistance(virDomainNumaPtr numa,
> +                             size_t node,
> +                             size_t cellid,
> +                             unsigned int value)
> +{
> +    virDomainNumaDistancePtr distances;
> +
> +    /*
> +     * Advanced Configuration and Power Interface
> +     * Specification version 6.1. Chapter 5.2.17
> +     * System Locality Distance Information Table
> +     * ... Distance values of 0-9 are reserved.
> +     */
> +    if (!numa || value < 10)
> +        return 0;

ATTRIBUTE_NONNULL instead, and report an error for
values < 10


> +
> +    distances = numa->mem_nodes[node].distances;
> +
> +    if (numa->mem_nodes[node].ndistances > 0 &&
> +        distances[cellid].value)
> +        return 0;

I'm not sure what this check is doing - it seems to be
refusing to let you change the value if one is already
set. I don't see the point of that, but if it is needed
then we should report errors

> +
> +    distances[cellid].cellid = cellid;
> +    distances[cellid].value = value;
> +
> +    return distances[cellid].value;
> +}
> +
> +
> +size_t
> +virDomainNumaGetNodeDistanceCount(virDomainNumaPtr numa,
> +                                  size_t node)
> +{
> +    if (!numa)
> +        return 0;

ATTRIBUTE_NONNULL instead

> +
> +    return numa->mem_nodes[node].ndistances;
> +}
> +
> +
> +size_t
> +virDomainNumaSetNodeDistanceCount(virDomainNumaPtr numa,
> +                                  size_t node,
> +                                  size_t ndistances)
> +{
> +    virDomainNumaDistancePtr distances;
> +
> +    if (!numa || !ndistances)
> +        return 0;

ATTRIBUTE_NONNULL for first.

For the second we should presumably free any existing
distance data

> +
> +    distances = numa->mem_nodes[node].distances;
> +    if (distances) {
> +        virReportError(VIR_ERR_INTERNAL_ERROR,
> +                       _("Cannot alter an existing nmem_nodes distances set for node: %zu"),
> +                       node);
> +        return 0;
> +    }
> +
> +    if (VIR_ALLOC_N(distances, ndistances) < 0)
> +        return 0;
> +
> +    numa->mem_nodes[node].distances = distances;
> +    numa->mem_nodes[node].ndistances = ndistances;
> +
> +    return numa->mem_nodes[node].ndistances;
> +}
> +
> +
>  virBitmapPtr
>  virDomainNumaGetNodeCpumask(virDomainNumaPtr numa,
>                              size_t node)
> @@ -937,6 +1175,20 @@ virDomainNumaGetNodeCpumask(virDomainNumaPtr numa,
>  }
>  
>  
> +virBitmapPtr
> +virDomainNumaSetNodeCpumask(virDomainNumaPtr numa,
> +                            size_t node,
> +                            virBitmapPtr cpumask)
> +{
> +    if (!numa || !cpumask)
> +        return NULL;
> +
> +    numa->mem_nodes[node].cpumask = cpumask;
> +
> +    return numa->mem_nodes[node].cpumask;
> +}
> +
Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




More information about the libvir-list mailing list