
Re: [libvirt] [PATCH] numad: Set memory policy according to the advisory nodeset from numad

On 05/08/2012 01:06, Eric Blake wrote:
On 05/07/2012 02:34 AM, Osier Yang wrote:
Though numad manages the memory allocation of tasks dynamically, it
wants the management application (libvirt) to pre-set the memory
policy according to the advisory nodeset returned from querying numad
(just like the pre-binding of the CPU nodeset for the domain process),
so that performance can benefit much more from it.


This patch introduces a new XML attribute 'placement'; the value 'auto'
indicates that the memory policy should be set with the advisory nodeset
from numad, and it defaults to 'static'. e.g.

     <memory placement='auto' mode='interleave'/>

Just like the current "numatune" support, the 'auto' NUMA memory policy
setting uses libnuma's API too.

So, to fully drive numad, one needs to specify placement='auto' for
both "<vcpu>" and "<numatune><memory .../></numatune>". It's a bit
inconvenient, but makes sense from a semantic point of view.

An alternative is not to introduce the new XML attribute, and instead
pre-set the memory policy implicitly with "<vcpu placement='auto'>4</vcpu>",
but IMHO that implies too much, and I'd rather not go that way unless
the new attribute is rejected.

I think we need a hybrid approach.

Not sure, honestly, :-)

Existing users wrote XML that used just <vcpu placement='auto'> but
expected full numad support. Normally, numad is most useful if it
controls _both_ memory (<numatune>) and vcpu placement together, but it
is feasible to control either one in isolation.


Therefore, I think what we need to do is specify that on input, we have
four situations:

1. /domain/vcpu placement is missing
    /domain/numatune/memory placement is missing
    We default to mode='static' in both places, and avoid numad

2. /domain/vcpu placement is present
    /domain/numatune/memory placement is missing
    We copy vcpu placement over to numa placement (and if
placement='auto', that means we use numad for everything)

3. /domain/vcpu placement is missing
    /domain/numatune/memory placement is present
    We copy numa placement over to vcpu placement (and if
placement='auto', that means we use numad for everything)

4. /domain/vcpu placement is present
    /domain/numatune/memory placement is present
    We use exactly what the user specifies (and if only one of the two
features is placement='auto', then the other feature avoids numad)

Modes 3 and 4 would be the new modes made possible by the XML added in
this patch.  Mode 1 is the default for XML in use by guests before numad
integration, and mode 2 is the XML for guests attempting to use numad in
the 0.9.11 release; I'm okay changing the semantics of mode 2 to be
easier to use (because you can use mode 4 if you really want the
unusual semantics of numad vcpu placement but not memory placement).

Then on output, we always output the placement attribute in both <vcpu>
and <numatune>, as in mode 4 (yeah, that probably means touching a lot
of tests, but that's life when we add new XML).

It sounds clear, but I'm not sure whether it could make things chaotic,
as it implies a lot in the XML parsing, and the XML will probably be
used by other drivers in the future. But anyway, I'll go forward this
way and see how it works out.

I'm still a bit torn on whether this must go in before 0.9.12, since
we're past rc1.  But since it looks like this is more of an oversight
(our numad usage in 0.9.11 is not very useful, since it missed memory
placement), this can be argued to be a bug fix rather than a new
feature, even though it is adding XML, so I will probably be okay with a
v2 patch going in before the final 0.9.12 release.

On to the actual code of v1:

diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
index e1fe0c4..01b3124 100644
--- a/docs/formatdomain.html.in
+++ b/docs/formatdomain.html.in
@@ -580,9 +580,14 @@
         The optional <code>memory</code> element specifies how to allocate memory
         for the domain process on a NUMA host. It contains two attributes,

s/two attributes,/several optional attributes./

         attribute <code>mode</code> is either 'interleave', 'strict',


-        or 'preferred',
-        attribute <code>nodeset</code> specifies the NUMA nodes, it leads same
-        syntax with attribute <code>cpuset</code> of element <code>vcpu</code>.
+        or 'preferred', attribute <code>nodeset</code> specifies the NUMA nodes,

s/'preferred', attribute/'preferred'.  Attribute/

+        it leads same syntax with attribute <code>cpuset</code> of element

s/it leads same syntax with/using the same syntax as/

+        <code>vcpu</code>, the optional attribute <code>placement</code> can be

s/</code>, the/</code>.  The/

+        used to indicate the memory placement mode for domain process, its value
+        can be either "static" or "auto", defaults to "static". "auto" indicates

Given my above comments, this would default to the same placement as
used in <vcpu>.

+        the domain process will only allocate memory from the advisory nodeset
+        returned from querying numad, and the value of attribute <code>nodeset</code>
+        will be ignored if it's specified.
          <span class='since'>Since 0.9.3</span>

Mention that attribute placement is since 0.9.12.


diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng
index 8419ccc..9b509f1 100644
--- a/docs/schemas/domaincommon.rng
+++ b/docs/schemas/domaincommon.rng
@@ -562,16 +562,35 @@
          <element name="numatune">
              <element name="memory">
-            <attribute name="mode">
-              ...
-            </attribute>
-            <attribute name="nodeset">
-              <ref name="cpuset"/>
-            </attribute>
+            <choice>
+              <group>
+                <attribute name="mode"> ... </attribute>
+                <attribute name='placement'> ... </attribute>
+              </group>
+              <group>
+                <attribute name="mode"> ... </attribute>
+                <attribute name="nodeset">
+                  <ref name="cpuset"/>
+                </attribute>
+              </group>
+            </choice>

It looks like you are requiring mode='...' in both choices, so the real
choice is whether you permit placement or nodeset.  However, this RNG
will forbid placement='static' nodeset='0-3', so you didn't get it quite
right.  I almost think you want:

<attribute name='mode'>
  ...
</attribute>
<choice>
  <group>
    <optional>
      <attribute name='placement'>
        <value>static</value>
      </attribute>
    </optional>
    <optional>
      <attribute name='nodeset'>
        <ref name='cpuset'/>
      </attribute>
    </optional>
  </group>
  <attribute name='placement'>
    <value>auto</value>
  </attribute>
</choice>

which says that both placement and nodeset can be missing (by my new
semantics, that would mean that placement defaults to <vcpu>'s,
otherwise to static); or that nodeset can be present (placement would
be implied as static); or that placement can explicitly be set to
static.  Meanwhile, it says that placement='auto' cannot be mixed with
nodeset='...' (is that right, or do we output nodeset even with numad
automatic placement to show the user what they actually have at the
current moment?).

No, we intended to show the cpuset for "vcpu", but it was removed
later, as it doesn't make much sense when numad is scheduling the
resources.

+++ b/libvirt.spec.in
@@ -454,6 +454,7 @@ BuildRequires: scrub

  %if %{with_numad}
  BuildRequires: numad
+BuildRequires: numactl-devel

This should be split into an independent fix.  Furthermore, it needs to
be gated - in F16, 'numad' provided the development tools that were
split off into 'numactl-devel' for F17.  So this really needs to be a
versioned check:

%if %{with_numad}
%if 0%{?fedora} >= 17
BuildRequires: numactl-devel
BuildRequires: numad


@@ -8055,13 +8052,40 @@ static virDomainDefPtr virDomainDefParseXML(virCapsPtr caps,
                                                   _("Unsupported NUMA memory "
                                                     "tuning mode '%s'"),
-                                                 tmp);
+                                                   tmp);

Spurious indentation change.  The old spacing was correct.

Ah, yes.

+                            /* "nodeset" leads same syntax with "cpuset". */

s/leads same syntax with/uses the same syntax as/


  struct _virDomainTimerCatchupDef {
@@ -1504,6 +1511,7 @@ struct _virDomainNumatuneDef {
      struct {
          char *nodemask;
          int mode;
+        int placement_mode;

Add a comment mentioning which enum values are valid for assignment to
this variable.


Thanks for the review!

