[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]

Re: [libvirt] [PATCH 1/3] XML definitions for guest NUMA

On 11/06/2011 06:57 AM, Bharata B Rao wrote:
XML definitions for guest NUMA and parsing routines.

From: Bharata B Rao<bharata linux vnet ibm com>

This patch adds XML definitions for guest NUMA specification and contains
routines to parse the same. The guest NUMA specification looks like this:

         <topology sockets='2' cores='4' threads='2'/>
                 <cell cpus='0-7' mems='512000'/>
                 <cell cpus='8-15' mems='512000'/>

Signed-off-by: Bharata B Rao<bharata linux vnet ibm com>

+      Guest NUMA topology can be specifed using<code>numa</code>  element.
+<span class="since">Since X.X.X</span>

Let's just put 0.9.8 here. It's easier at feature freeze time to grep and replace a concrete 0.9.8 into a different numbering scheme (0.10.0, 1.0.0, ?) if we decide on something different than 0.9.8, than it is to remember to also search for X.X.X.

+  ...
+    ...
+&lt;cell cpus='0-3' mems='512000'/&gt;
+&lt;cell cpus='4-7' mems='512000'/&gt;

I understand 'cpus' (a valid word meaning multiple cpu units), but 'mems' seems odd; I think it would be better naming this attribute 'memory' to match our <memory> element at the top level. Just because qemu's command line names the option mems= doesn't mean we should be stuck making our XML inconsistent.

+    ...
+  ...</pre>
+      Each<code>cell</code>  element specifies a NUMA cell or a NUMA node.
+<code>cpus</code>  specifies the CPU or range of CPUs that are part of
+      the node.<code>mems</code>  specifies the node memory in kilobytes
+      (i.e. blocks of 1024 bytes). Each cell or node is assigned cellid
+      or nodeid in the increasing order starting from 0.

I agree with doing things in 1024-byte blocks [1], since <memory> and <currentMemory> are also in that unit.

[1] 1024-byte blocks is technically kibibytes, not kilobytes; but you're copying from existing text, so at least we're consistent, not to mention fitting right in with the wikipedia complaint that KiB has had slow adoption by the computer industry: https://secure.wikimedia.org/wikipedia/en/wiki/Kibibyte :)

+      This guest NUMA specification translates to<code>-numa</code>  command
+      line option for QEMU/KVM. For the above example, the following QEMU
+      command line option is generated:
+<code>-numa node,nodeid=0,cpus=0-3,mems=512000 -numa node,nodeid=1,cpus=4-7,mems=512000</code>

This paragraph is not necessary. We don't need to give one hypervisor-specific example of how the XML is translated; it is sufficient to simply document the XML semantics in a way that can be implemented by any number of hypervisors.

+<define name="numaCell">
+<element name="cell">
+<attribute name="cpus">
+<ref name="Cellcpus"/>

Typically, ref names start with a lower case letter.

+<attribute name="mems">
+<ref name="Cellmems"/>

Is it possible for these attributes to be optional? That is, on the qemu line, can I specify cpus= but not mems=, or mems= but not cpus=? If so, then the attributes need to be optional in the schema, and the code behave with sane defaults when one of the two attributes is not present.

@@ -2745,4 +2767,14 @@
        <param name="pattern">[a-zA-Z0-9_\.:]+</param>
+<define name="Cellcpus">
+<data type="string">
+<param name="pattern">([0-9]+(-[0-9]+)?|\^[0-9]+)(,([0-9]+(-[0-9]+)?|\^[0-9]+))*</param>

This looks like a repeat of <define name="cpuset">; if so, let's reuse that define instead of making a new one (and if not, why are we introducing yet another syntax?).

+<define name="Cellmems">
+<data type="unsignedInt">
+<param name="pattern">[0-9]+</param>

Likewise, this looks like a repeat of <define name="memoryKB">, so lets reuse that.

@@ -109,6 +114,19 @@ no_memory:
      return NULL;

+static int
+virCPUDefNumaCPUs(virCPUDefPtr def)
+    int i, j, ncpus = 0;
+    for (i = 0; i<  def->ncells; i++) {
+        for (j = 0; j<  VIR_DOMAIN_CPUMASK_LEN; j++) {
+            if (def->cells[i].cpumask[j])
+                ncpus++;
+        }
+    }

Can this loop be made any faster by using count_one_bits?

+    return ncpus;

  virCPUDefParseXML(const xmlNodePtr node,
@@ -289,6 +307,50 @@ virCPUDefParseXML(const xmlNodePtr node,
          def->features[i].policy = policy;

+    if (virXPathNode("./numa[1]", ctxt)) {
+        VIR_FREE(nodes);
+        n = virXPathNodeSet("./numa[1]/cell", ctxt,&nodes);
+        if (n<  0 || n == 0) {

This looks a bit funny, compared to if (n <= 0).

+        for (i = 0 ; i<  n ; i++) {
+            char *cpus;
+            int cpumasklen = VIR_DOMAIN_CPUMASK_LEN;
+            unsigned long ul;
+            int ret;
+            def->cells[i].cellid = i;
+            cpus = virXMLPropString(nodes[i], "cpus");
+            if (VIR_ALLOC_N(def->cells[i].cpumask, cpumasklen)<  0)
+                goto no_memory;
+            if (virDomainCpuSetParse((const char **)&cpus,
+                                 0, def->cells[i].cpumask,
+                                 cpumasklen)<  0)

Does this behave properly if the cpus=... attribute was missing?

+                goto error;
+            ret = virXPathULong("string(./numa[1]/cell/@mems)",
+                            ctxt,&ul);
+            if (ret<  0) {
+                virCPUReportError(VIR_ERR_INTERNAL_ERROR,
+                    "%s", _("Missing 'mems' attribute in NUMA topology"));
+                goto error;

Especially since you checked for a missing mems= counterpart? And back to my earlier question of whether both attributes are really mandatory, or whether just one in isolation can make sense.

+++ b/src/conf/cpu_conf.h
@@ -67,6 +67,14 @@ struct _virCPUFeatureDef {
      int policy;         /* enum virCPUFeaturePolicy */

+typedef struct _virCellDef virCellDef;
+typedef virCellDef *virCellDefPtr;
+struct _virCellDef {
+   int cellid;
+   char *cpumask;	/* CPUs that are part of this node */
+   unsigned int mem;	/* Node memory */

I'd write this as /* Node memory in kB */, to make it clear what scale is in use.

I saw the code for parsing the XML, but what about the code for generating the xml during 'virsh dumpxml' (aka virDomainDefFormat)? The patch is incomplete without that.

Eric Blake   eblake redhat com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org

[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]