[libvirt] [PATCH V2] Expose all CPU features in host definition

Mon Jul 1 15:43:58 UTC 2013

On Mon, Jul 01, 2013 at 09:44:52AM +0100, Daniel P. Berrange wrote:
> On Fri, Jun 28, 2013 at 02:26:02PM -0600, Don Dugger wrote:
> > On Fri, Jun 28, 2013 at 09:03:55PM +0100, Daniel P. Berrange wrote:
> > > On Fri, Jun 28, 2013 at 11:51:32AM -0600, Don Dugger wrote:
> > > > On Fri, Jun 28, 2013 at 10:24:48AM +0100, Daniel P. Berrange wrote:
> > > > > On Thu, Jun 27, 2013 at 10:35:58AM -0600, Don Dugger wrote:
> > > > > > On Mon, Jun 17, 2013 at 10:27:36AM +0200, Jiri Denemark wrote:
> > > > > > > On Fri, Jun 14, 2013 at 12:32:35 -0600, Don Dugger wrote:
> > > > > > > > On Fri, Jun 14, 2013 at 03:06:52PM +0200, Jiri Denemark wrote:
> > > > > > > > > I was just trying to say that it doesn't provide anything more than
> > > > > > > > > "it's supported by the host CPU", which gives mostly no value in the
> > > > > > > > > context of libvirt. Can you explain more what the use case is in which a
> > > > > > > > > > virt client would need to know what specific features are supported by
> > > > > > > > > host CPU? I feel like we should avoid people from being under the
> > > > > > > > > impression that they can actually use the CPU capabilities for checking
> > > > > > > > > whether a host can run guests that require specific CPU features.
> > > > > > > > 
> > > > > > > > The specific use case I'm trying to address is a cloud environment where,
> > > > > > > > with hundreds of hosts, you might want to schedule an instance to a host
> > > > > > > > that supports a particular HW acceleration, like AES/NI.  I agree, what
> > > > > > > > I `really` want is an API that shows the capabilities of a specific guest
> > > > > > > > that could be created on the host but, absent that API, at least knowing
> > > > > > > > that a host doesn't support the feature is more information than I can get
> > > > > > > > right now.
> > > > > > > 
> > > > > > > Hmm, fair enough. Let's wait a few days for Daniel to return from
> > > > > > > vacation in case he wants to express his opinion here.
> > > > > > 
> > > > > > So, any progress here?
> > > > > 
> > > > > I believe the cloud use case above is approaching the problem in the wrong
> > > > > way. We designed our APIs such that apps should never need to write logic
> > > > > for comparing CPU features directly. If an application has a set of CPU
> > > > > features it requires from the host, then it should use a libvirt API to
> > > > > query whether those features are available, not try to write the CPU
> > > > > > feature comparison logic itself.
> > > > > 
> > > > > > You can already pretty much do this with the virConnectCompareCPU()
> > > > > method, by passing an XML document which specifies the AES/NI feature
> > > > > flag that you want to check for support of. Then libvirt will tell
> > > > > you whether the host CPU can support it. It is entirely possible to
> > > > > make use of this capability as is in OpenStack.
> > > > 
> > > > I don't think this would work with the way scheduling in OpenStack works.
> > > > The problem is that the OpenStack scheduler doesn't want to query each node
> > > > in the system on every schedule request (with 100s if not 1,000s of compute
> > > > nodes this would not be practical).  Instead the scheduler maintains info
> > > > about all of the compute nodes and, when a request comes in, the scheduler
> > > > identifies the best compute node for the request and then causes the VM
> > > > > to be started on that node.  A priori the scheduler doesn't even know which
> > > > CPU features users are interested in, that information only becomes available
> > > > when a schedule request comes in so trying to do a `virConnectCompareCPU()'
> > > > call at that point in time is too late.
> > > 
> > > I think your model for user interaction is wrong here. I don't believe
> > > that OpenStack should be directly exposing the ability for a user to
> > > explicitly request individual CPU flags for individual VMs. This is
> > > too risky from a cloud administrator POV, because it could result in
> > > a user monopolizing a small subset of machines in the guest with
> > > particular features.  Instead an administrator should be defining
> > > new flavours with specific CPU feature characteristics. The user can
> > 
> > That's the exact mechanism that is being proposed, the ability to define
> > a flavor that specifies capabilities that are required.  The issue is
> > that the flavor is defined independently from the scheduler, it's only
> > when a schedule request is made that the flavor is presented to the scheduler
> > and then the scheduler needs to identify which of 1,000s of nodes can
> > satisfy that flavor.
> 
> Every 60 seconds or so, every node issues an update indicating what its
> capabilities are. In that update the nodes could do the CPU compatibility
> checks and then include the list of which flavours they are capable of
> executing in their capability update, so that it is then available to
> the scheduler when needed.

This doesn't work for multiple reasons.

  1.  Ultimately, I want to remove the periodic capability update completely.
      The better technique is to update compute node state only when the
      state changes; periodic updates are just extra overhead.
  2.  There's no concept of which nodes support which flavors so this is
      a completely new infrastructure that would have to be added to the
      scheduler.
  3.  There's no easy way for the compute node to know which flavors it
      supports.  It doesn't know which filters are enabled in the scheduler
      so it doesn't know which clauses of a flavor actually apply (ignoring
      that the compute node would now have to duplicate the filtering
      mechanism from the scheduler even if it knew which filters were
      enabled).

virConnectGetCapabilities() already returns a list of CPU features; all
my patch does is have it explicitly return the complete set of features,
which I think is the right thing to do and certainly simplifies my specific
use case.
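
To illustrate the use case, here is a minimal sketch of how a consumer
such as a scheduler might check a cached copy of the capabilities XML for a
required feature. The sample XML is a hypothetical, trimmed-down document
(real output contains much more), and the helper names `host_cpu_features`
and `host_supports` are made up for this example. It assumes the feature
list in the XML is complete, which is what the patch is meant to provide.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down capabilities XML. In real use this string
# would come from virConnectGetCapabilities(), e.g. via libvirt-python's
# conn.getCapabilities(); the values below are invented for illustration.
SAMPLE_CAPS = """
<capabilities>
  <host>
    <cpu>
      <arch>x86_64</arch>
      <model>SandyBridge</model>
      <vendor>Intel</vendor>
      <feature name='aes'/>
      <feature name='vmx'/>
    </cpu>
  </host>
</capabilities>
"""


def host_cpu_features(caps_xml):
    """Return the set of feature names listed under <host><cpu>."""
    root = ET.fromstring(caps_xml)
    return {f.get("name") for f in root.findall("./host/cpu/feature")}


def host_supports(caps_xml, required):
    """True if every feature name in `required` is listed for the host CPU."""
    return set(required) <= host_cpu_features(caps_xml)


if __name__ == "__main__":
    print(sorted(host_cpu_features(SAMPLE_CAPS)))
    print(host_supports(SAMPLE_CAPS, ["aes"]))
```

Because the check runs against XML the node has already reported, it needs
no per-request query to the compute node. It only tells you what the host
CPU offers, not what a particular guest on that host could actually use.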

> 
> > > then choose the flavour with the CPU characteristics. In this way the
> > > system can know ahead of time what flavours are possible on what
> > > host, and you don't need to query all nodes at scheduling time.
> > 
> > Note I am not proposing that we make a query at schedule time.  The
> > system is already set up to have the compute nodes send configuration
> > info to the scheduler, all I want to do is send complete info (e.g. all
> > of the CPU features).
> 
> 
> Regards,
> Daniel
> -- 
> |: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org              -o-             http://virt-manager.org :|
> |: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
> |: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|
> 

-- 
Don Dugger
"Censeo Toto nos in Kansa esse decisse." - D. Gale
n0ano at n0ano.com
Ph: 303/443-3786