[libvirt] [RFC] Support for CPUID masking v2

Daniel P. Berrange berrange at redhat.com
Tue Sep 22 11:44:02 UTC 2009


On Fri, Sep 04, 2009 at 04:58:25PM +0200, Jiri Denemark wrote:
> Firstly, CPU topology and all (actually all that libvirt knows about) CPU
> features have to be advertised in host capabilities:
> 
>     <host>
>         <cpu>
>             ...
>             <features>
>                 <feature>NAME</feature>
>             </features>
>             <topology>
>                 <sockets>NUMBER_OF_SOCKETS</sockets>
>                 <cores>CORES_PER_SOCKET</cores>
>                 <threads>THREADS_PER_CORE</threads>
>             </topology>
>         </cpu>
>         ...
>     </host>

FWIW, we already have the host topology sockets/core/threads exposed
in the virNodeInfo API / struct, though I don't see any harm in having
it in the node capabilities XML too, particularly since we put NUMA
topology in there.

> I'm not 100% sure we should represent CPU features as <feature>NAME</feature>
> especially because some features are currently advertised as <NAME/>. However,
> extending the XML schema every time a new feature is introduced doesn't look like
> a good idea at all. The problem is we can't get rid of <NAME/>-style features,
> which would result in redundancy:
> 
>     <features>
>         <vmx/>
>         <feature>vmx</feature>
>     </features>
> 
> But I think it's better than changing the schema to add new features.

I think we need more than just the features in the capabilities XML
though. eg, if an application wants to configure a guest with a
CPU model of 'core2duo' then it needs to know whether the host CPU
is at least a 'core2duo' or a superset.

In essence I think the host capabilities XML needs to be more closely
aligned with your proposed guest XML, specifically including a base
CPU model name, along with any additional features beyond the basic
set provided by that model.

Which brings me neatly to the next question:

The host capabilities XML for some random machine says the host CPU is
a 'core2duo' + 'ssse3' + '3dnow'.

There is a guest to be run with an XML config requesting 'pentium3' +
'ssse3' as a minimum requirement.

Now pretend you are not a human who knows pentium3 is a subset of
core2duo. How do we know whether it is possible to run the guest
on that host?

We could say that we'll make 'virDomainCreate' just throw an error
when you try to start a guest (or incoming migration, etc), but if
we have a data center of many hosts, apps won't want to just try
to start a guest on each host. They'll want some way to figure out
equivalence between CPU + feature sets. 

Perhaps this suggests we want a

    virConnectCompareCPU(conn, "<guest cpu xml fragment>")

which returns 0 if the host CPU is not compatible (ie it is a subset
of the requested CPU), 1 if it is identical, or 2 if it is a superset.
If we further declare that the host capabilities CPU model follows the
same schema as the guest XML CPU model, we can use this same API to
test 2 separate hosts for equivalence and thus figure out the lowest
common denominator between a set of hosts, and also thus which guests
can run on that set of hosts.
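
To make the intended usage concrete, here's a sketch of how an app
might drive that call when picking a host. Note virConnectCompareCPU()
and its 0/1/2 return codes are purely the proposal above, not an
existing API, and pickHost() is a made-up helper:

    /* Hypothetical sketch: virConnectCompareCPU() is only the API
     * proposed above - 0 = incompatible, 1 = identical, 2 = superset */
    static virConnectPtr
    pickHost(virConnectPtr *hosts, size_t nhosts, const char *cpuxml)
    {
        size_t i;

        for (i = 0; i < nhosts; i++) {
            /* > 0 means the host CPU is identical to, or a superset
             * of, the requested CPU, so the guest can run there */
            if (virConnectCompareCPU(hosts[i], cpuxml) > 0)
                return hosts[i];
        }
        return NULL;  /* no compatible host in this set */
    }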

For x86, this would require the libvirt internal drivers to have an
XML -> CPUID convertor, but then we already need one of those if
we're to implement this stuff for the Xen and VMWare drivers,
so I don't see this as too bad.

We also of course need a cpuid -> xml convertor to populate
the host capabilities XML. 

For all this I'm thinking we should have some basic external data
files which map named CPUs to sets of CPUID features, and named flags
to CPUID bits. Populate these with the set of CPUs QEMU knows about
for now, and then we can extend them later simply by dropping in new
data files.
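
As a strawman for what such data files would encode, expressed here as
the C tables they would be parsed into (the leaf-1 ECX/EDX bit
positions are the real x86 ones; the model feature lists are heavily
abbreviated for illustration):

    /* Named flag -> CPUID bit (leaf 1); what a flags data file defines */
    struct flagBit {
        const char *name;
        int reg;               /* 0 = EDX, 1 = ECX */
        int bit;
    };

    static const struct flagBit flagBits[] = {
        { "sse2",  0, 26 },    /* leaf 1 EDX bit 26 */
        { "sse3",  1,  0 },    /* leaf 1 ECX bit 0  */
        { "vmx",   1,  5 },    /* leaf 1 ECX bit 5  */
        { "ssse3", 1,  9 },    /* leaf 1 ECX bit 9  */
    };

    /* Named CPU -> baseline features; what a models data file defines */
    struct cpuModel {
        const char *name;
        const char *features[8];   /* NULL terminated, abbreviated! */
    };

    static const struct cpuModel cpuModels[] = {
        { "pentium3", { "mmx", "sse", NULL } },
        { "core2duo", { "mmx", "sse", "sse2", "sse3", "ssse3", NULL } },
    };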

Back to your question about duplication:

>     <features>
>         <vmx/>
>         <feature>vmx</feature>
>     </features>

Just ignore the fact that we have vmx, pae + svm features
defined for now. Focus on determining what XML schema we want
to use consistently across host + guest for describing a CPU
model + features. Once that's determined, we'll just fill in
the legacy vmx/pae/svm features based off the data for the new
format and recommend in the docs not to use the old style.

> Secondly, drivers which support detailed CPU specification have to advertise
> it in guest capabilities. In case <features> are meant to be hypervisor
> features, then it could look like:
> 
>     <guest>
>         ...
>         <features>
>             <cpu/>
>         </features>
>     </guest>
> 
> But if they are meant to be CPU features, we need to come up with something
> else:
> 
>     <guest>
>         ...
>         <cpu_selection/>
>     </guest>
> 
> I'm not sure how to deal with the named CPUs suggested by Dan. Either we need
> to come up with a global set of named CPUs and document what they mean or let
> drivers specify their own named CPUs and advertise them through guest
> capabilities:
> 
>     <guest>
>         ...
>         <cpu model="NAME">
>             <feature>NAME</feature>
>             ...
>         </cpu>
>     </guest>
> 
> The former approach would make matching named CPUs with those defined by a
> hypervisor (such as qemu) quite hard. The latter could bring the need for
> hardcoding features provided by specific CPU models or, in case we decide not
> to provide a list of features for each CPU model, it can complicate
> transferring a domain from one hypervisor to another.

As mentioned above I think we want to define a set of named CPU models
that can be used across all drivers. For non-x86 we can just follow the
standard CPU model names in QEMU. For x86, since there are so many
possible models with new ones appearing all the time, I think we should
define a set of CPU models starting with those in QEMU, but provide a
way to add new models via data files defining the CPUID mapping.
Internally to libvirt we'll need bi-directional CPUID <-> Model+feature
convertors to allow good support in all our drivers.

Model+feature -> CPUID is easy - that's just a lookup.

CPUID -> Model+feature is harder. We'd need to iterate over all known
models, computing for each one the extra features needed on top of that
model's baseline, and then pick the model that results in the fewest
named features, which will probably be the newest CPU model. This will
ensure the XML is always the most concise.
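
One possible shape for that search, assuming the model baselines are
held as leaf-1 ECX/EDX masks (all names here are hypothetical, not
existing libvirt code):

    #include <stdint.h>
    #include <stddef.h>

    struct model {
        const char *name;
        uint32_t ecx, edx;          /* baseline CPUID leaf-1 bits */
    };

    /* Pick the model whose baseline leaves the fewest leftover bits
     * to spell out as individual <feature> elements */
    static const struct model *
    bestModel(const struct model *models, size_t n,
              uint32_t hostEcx, uint32_t hostEdx)
    {
        const struct model *best = NULL;
        int fewest = -1;
        size_t i;

        for (i = 0; i < n; i++) {
            int extra;

            /* the model baseline must be a subset of the host CPUID */
            if ((models[i].ecx & ~hostEcx) || (models[i].edx & ~hostEdx))
                continue;

            /* leftover host bits become named <feature> elements */
            extra = __builtin_popcount(hostEcx & ~models[i].ecx) +
                    __builtin_popcount(hostEdx & ~models[i].edx);
            if (fewest < 0 || extra < fewest) {
                fewest = extra;
                best = &models[i];
            }
        }
        return best;
    }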

> And finally, CPU may be configured in domain XML configuration:
> 
> <domain>
>     ...
>     <cpu model="NAME">
>         <topology>
>             <sockets>NUMBER_OF_SOCKETS</sockets>
>             <cores>CORES_PER_SOCKET</cores>
>             <threads>THREADS_PER_CORE</threads>
>         </topology>

This bit about topology looks just fine.

>         <feature name="NAME" mode="set|check" value="on|off"/>
>     </cpu>
> </domain>
> 
> Mode 'check' checks the physical CPU for the feature and refuses to start
> the domain if it doesn't match. The VCPU feature is set to the same value.
> Mode 'set' just sets the VCPU feature.

The <feature> bit is probably a little too verbose for my liking.

   <feature name='ssse3' policy='XXX'/>

With 'policy' allowing one of:

  - 'force'    - set to '1', even if host doesn't have it
  - 'require'  - set to '1', fail if host doesn't have it
  - 'optional' - set to '1', only if host has it
  - 'disable'  - set to '0', even if host has it
  - 'forbid'   - set to '0', fail if host has it
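
Spelling those rules out in code form, a sketch of how a driver might
apply one policy to a single feature bit (names hypothetical):

    /* Hypothetical sketch of the five policies applied to one CPUID
     * feature bit. Returns the bit value for the guest, or -1 if the
     * guest must refuse to start on this host. */
    enum featurePolicy { FORCE, REQUIRE, OPTIONAL, DISABLE, FORBID };

    static int
    applyPolicy(enum featurePolicy policy, int hostHasIt)
    {
        switch (policy) {
        case FORCE:    return 1;                  /* even if host lacks it */
        case REQUIRE:  return hostHasIt ? 1 : -1; /* fail if host lacks it */
        case OPTIONAL: return hostHasIt ? 1 : 0;  /* take it if available  */
        case DISABLE:  return 0;                  /* even if host has it   */
        case FORBID:   return hostHasIt ? -1 : 0; /* fail if host has it   */
        }
        return -1;  /* unreachable */
    }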

'force' is unlikely to be used but it's there for completeness since Xen
and VMWare allow it. 'forbid' is for cases where you disable the feature
in CPUID but a guest may still try to use it anyway and you don't want
it to succeed - with 'disable' the guest could still try to use the
feature if the host supported it, even though it is masked out in CPUID.


The final complication is the 'optional' policy. If we set a feature to
'optional' and we boot the guest on a host that has this feature,
then when trying to migrate it in essence becomes a 'require' feature,
since you can't take it away from a running guest.

> Final note: <topology> could also be called <cpu_topology> to avoid confusion
> with NUMA <topology>, which is used in host capabilities. However, I prefer
> <cpu><topology>...</topology></cpu> over
> <cpu><cpu_topology>...</cpu_topology></cpu>.

<cpu_topology> is redundant naming - the context, within a <cpu> tag, is
more than sufficient to distinguish it from the host capabilities NUMA
topology when using <topology>.



Finally, throughout this discussion I'm assuming that for non-x86 archs
we'll merely use the named CPU model and not bother about any features
or flags beyond this - just strict equivalence... until someone who
cares enough about those archs complains.

Daniel
-- 
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|