[Libvir] CPU pinning of domains at creation time

Ryan Harper ryanh at us.ibm.com
Thu Oct 11 15:45:44 UTC 2007


* Daniel Veillard <veillard at redhat.com> [2007-10-11 08:01]:
>   There are a few things I gathered on this issue. This affects 
> NUMA setups, where basically, if a domain must be placed on a given cell,
> it is not good to let the hypervisor place it first with its own heuristics
> and then migrate it to a different set of CPUs later; it is better to
> instruct the hypervisor to start said domain on the given set.
>    - For Xen it is possible to instruct the hypervisor by passing
>      (cpus '2,3') in the SExpr, where the argument is the list of
>      physical processors allowed

A bit more detail here just FYI:

Xen takes the cpu list and converts that into an affinity bitmap that is
then applied to each vcpu allocated to the guest.
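As a rough illustration (not libvirt code), this is what that amounts to
at the libxc level: one bitmap applied in a loop over the domain's vcpus.
The xc_vcpu_setaffinity() prototype below is only approximate for the
Xen 3.x era and is declared here just to keep the sketch self-contained:

#include <stdint.h>

/* Approximate Xen 3.x prototype; normally from <xenctrl.h>. */
extern int xc_vcpu_setaffinity(int xc_handle, uint32_t domid,
                               int vcpu, uint64_t cpumap);

/* Apply one affinity bitmap to every vcpu of the domain. */
int pin_all_vcpus(int xc_handle, uint32_t domid, int nvcpus, uint64_t cpumap)
{
    for (int vcpu = 0; vcpu < nvcpus; vcpu++)
        if (xc_vcpu_setaffinity(xc_handle, domid, vcpu, cpumap) < 0)
            return -1;
    return 0;
}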

>    - For KVM I think the standard way would be to select the 
>      cpuset using sched_setaffinity() between the fork of the 
>      current process and the exec of the qemu process

Yep.  
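For reference, a minimal sketch of that fork/exec sequence (the qemu
binary name and arguments are placeholders, and the cpuset is hardcoded
here to pcpus 2-3):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return 1;
    }
    if (pid == 0) {                 /* child, between fork and exec */
        cpu_set_t mask;
        CPU_ZERO(&mask);
        CPU_SET(2, &mask);          /* allow physical cpu 2 */
        CPU_SET(3, &mask);          /* allow physical cpu 3 */
        /* pid 0 == this thread; the mask is inherited across exec */
        if (sched_setaffinity(0, sizeof(mask), &mask) < 0) {
            perror("sched_setaffinity");
            _exit(1);
        }
        execlp("qemu", "qemu", "-m", "512", (char *)NULL);
        _exit(1);                   /* only reached if exec failed */
    }
    return 0;
}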

>    - there is no need (from a NUMA perspective) to do fine-grained
>      allocation at that point; as long as the domain can be restricted
>      to a given cell at startup, virDomainPinVcpu() can then be used
>      later, if needed, to do more precise pinning in order to optimize
>      placement

kvm-46 added user-space allocated memory, which means that we can use
libnuma/numactl to set the appropriate node.
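A minimal sketch with libnuma (again run pre-exec in the forked child;
error handling kept minimal):

#include <numa.h>
#include <stdio.h>

/* Restrict the current process, and hence the qemu it will exec, to
 * the given node: run only on its cpus and prefer its memory.  Since
 * guest RAM is now plain user-space memory, the guest's allocations
 * follow this policy. */
int bind_to_node(int node)
{
    if (numa_available() < 0) {
        fprintf(stderr, "no NUMA support on this host\n");
        return -1;
    }
    if (numa_run_on_node(node) < 0)   /* schedule only on this node */
        return -1;
    numa_set_preferred(node);         /* allocate memory from it */
    return 0;
}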

>    - to be able to instruct the hypervisor at creation time adding the
>      information in the domain XML description looks the more natural way
>      (another option would be to force to use virDomainDefineXML, add a
>       call using the resulting virDomainPtr to define the set, and 
>       then virDomainCreate would be used to do the actual start)
>      + the good point of having this embedded in the XML is that
>        we still have all the information about the domain settings in
>        the XML if we want to restart it later
>      + the bad point is that we need to fetch and carry this extra
>        information when doing XML dumps so as not to lose it, for example
>        when manipulating the domain to add or remove devices
>    - extracting a cpuset can still be a heavy operation; for example,
>      when using xend one needs one RPC per vcpu in the domain, the cpuset
>      being constructed by logically OR'ing all the cpumaps used by the
>      vcpus of the domain (though in most cases this will be the full
>      map after the first CPU, at which point the iteration can stop)

Yeah, that might be a decent patch to xend - build up an array of
affinity masks, one per vcpu, returned in a single call.
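On the libvirt side the OR'ing itself is straightforward.  A sketch
using virDomainGetVcpus() and the cpumap macros (error handling kept
minimal; nvcpus and npcpus are assumed to be known already):

#include <libvirt/libvirt.h>
#include <stdlib.h>

/* Returns a malloc'ed bitmap of VIR_CPU_MAPLEN(npcpus) bytes holding
 * the union of all per-vcpu cpumaps, or NULL on error. */
unsigned char *domain_cpuset(virDomainPtr dom, int nvcpus, int npcpus)
{
    int maplen = VIR_CPU_MAPLEN(npcpus);
    virVcpuInfoPtr info = malloc(nvcpus * sizeof(*info));
    unsigned char *maps = calloc(nvcpus, maplen);
    unsigned char *cpuset = calloc(1, maplen);

    if (!info || !maps || !cpuset ||
        virDomainGetVcpus(dom, info, nvcpus, maps, maplen) < 0) {
        free(cpuset);
        cpuset = NULL;
    } else {
        for (int i = 0; i < nvcpus; i++)
            for (int b = 0; b < maplen; b++)
                cpuset[b] |= maps[i * maplen + b];  /* logical OR */
    }
    free(info);
    free(maps);
    return cpuset;
}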

>    - for the mapping at the XML level I suggest using a simple extension
>      of <vcpu>n</vcpu>, extending it to
>      <vcpu cpuset='2,3'>n</vcpu>
>      with a limited syntax which is just the comma-separated list of
>      allowed CPU numbers (the attribute would only be emitted when the
>      code actually detects such a cpuset is in effect; in general it
>      won't be added)

I think we should support the same cpuset notation that Xen supports,
which means including ranges (1-4) and negation (^1).  These two
features make describing large ranges much more compact.
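So e.g. <vcpu cpuset='0-7,^3'>4</vcpu> would mean pcpus 0-7 except 3.
A sketch of a parser for that notation (limited to 64 cpus here for
brevity, with minimal error checking):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse a Xen-style cpuset string: comma-separated entries, each a
 * single cpu, a range "a-b", or a negated cpu/range "^n" / "^a-b". */
uint64_t parse_cpuset(const char *str)
{
    uint64_t map = 0;
    char *s = (char *)str;

    while (*s) {
        int neg = (*s == '^');
        if (neg)
            s++;
        long a = strtol(s, &s, 10);
        long b = a;
        if (*s == '-')
            b = strtol(s + 1, &s, 10);
        for (long c = a; c <= b && c < 64; c++) {
            if (neg)
                map &= ~(1ULL << c);    /* negation removes the cpu */
            else
                map |= 1ULL << c;
        }
        if (*s == ',')
            s++;
    }
    return map;
}

int main(void)
{
    printf("%#llx\n", (unsigned long long)parse_cpuset("0-7,^3")); /* 0xf7 */
    return 0;
}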

> 
> Internally, implementing this should not be too hard; I would probably
> refactor some of the existing parsing code and provide functions to get
> the cpuset and the number of physical processors.
> 
>   Does this sound okay?

Yeah, I think this covers everything we'd need.


-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253   T/L: 678-9253
ryanh at us.ibm.com



