[libvirt] [RFC PATCH 0/2] nodeinfo: PPC64: Fix topology and siblings info on capabilities and nodeinfo

David Gibson dgibson at redhat.com
Thu Jun 16 23:42:34 UTC 2016


On Fri, 10 Jun 2016 17:52:47 +0200
Andrea Bolognani <abologna at redhat.com> wrote:

> On Tue, 2016-05-31 at 16:08 +1000, David Gibson wrote:
> > > QEMU fails with errors like
> > > 
> > >   qemu-kvm: Cannot support more than 8 threads on PPC with KVM
> > >   qemu-kvm: Cannot support more than 1 threads on PPC with TCG
> > > 
> > > depending on the guest type.  
> > 
> > Note that in a sense the two errors come about for different reasons.
> > On Power, to a much greater degree than x86, threads on the same core
> > have observably different behaviour from threads on different cores.
> > Because of that, there's no reasonable way for KVM to present more
> > guest threads-per-core than there are host threads-per-core.
> > 
> > The limit of 1 thread on TCG is simply because no-one's ever bothered
> > to implement SMT emulation in qemu.  
> 
> That just means that in the future we might have to expose something
> other than a hardcoded '1' as the guest thread limit for TCG guests;
> the interface would remain valid AFAICT.

Right, that's kind of my point.
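
(As an aside for anyone reproducing this: the host-side numbers those
limits come from can be checked with the ppc64_cpu tool from powerpc-utils.
The commands exist as shown, but the output lines and the qemu invocation
below are only an illustrative sketch, not copied from a real machine.)

  # Threads per core the host is currently running with
  $ ppc64_cpu --smt
  SMT=8

  # How many subcores each physical core is currently split into
  $ ppc64_cpu --subcores-per-core
  Subcores per core: 1

  # A KVM guest can then use at most that many threads per guest core,
  # e.g. with 8-way SMT and no subcore split:
  $ qemu-system-ppc64 -machine pseries,accel=kvm \
        -smp 16,sockets=1,cores=2,threads=8 ...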

> > > physical_core_id would be 32 for all of the above - it would
> > > just be the very value of core_id the kernel reads from the
> > > hardware and reports through sysfs.
> > > 
> > > The tricky bit is that, when subcores are in use, core_id and
> > > physical_core_id would not match. They will always match on
> > > architectures that lack the concept of subcores, though.  
> > 
> > Yeah, I'm still not terribly convinced that we should even be
> > presenting physical core info instead of *just* logical core info.  If
> > you care that much about physical core topology, you probably
> > shouldn't be running your system in subcore mode.  
> 
> Me neither. We could leave it out initially, and add it later
> if it turns out to be useful, I guess.

I think that's a good idea.
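
(For reference, the values being discussed are the ones the kernel already
exposes through the standard sysfs topology files; a rough way to eyeball
them is sketched below.  The CPU number 32 and the outputs are made up to
match Andrea's example.)

  # core_id for every online CPU.  Per the discussion above, when a core
  # is split into subcores they all keep reporting the physical core's id
  # here (32 in the example).
  for f in /sys/devices/system/cpu/cpu[0-9]*/topology/core_id; do
      echo "$f: $(cat "$f")"
  done

  # Threads the kernel currently considers siblings of a given CPU
  cat /sys/devices/system/cpu/cpu32/topology/thread_siblings_list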

> > > > > The optimal guest topology in this case would be
> > > > >  
> > > > >    <vcpu placement='static' cpuset='4'>4</vcpu>
> > > > >    <cpu>
> > > > >      <topology sockets='1' cores='1' threads='4'/>
> > > > >    </cpu>    
> > > >  
> > > > So when we pin to logical CPU #4, is ppc KVM smart enough to see that it's a
> > > > subcore thread, and will it then make use of the offline threads in the same subcore?
> > > > Or does libvirt do anything fancy to facilitate this case?    
> > > 
> > > My understanding is that libvirt shouldn't have to do anything
> > > to pass the hint to kvm, but David will have the authoritative
> > > answer here.  
> > 
> > Um.. I'm not totally certain.  It will be one of two things:
> >    a) you just bind the guest thread to the representative host thread
> >    b) you bind the guest thread to a cpumask with all of the host
> >       threads on the relevant (sub)core - including the offline host
> >       threads
> > 
> > I'll try to figure out which one it is.
> 
> I played with this a bit: I created a guest with
> 
>   <vcpu placement='static' cpuset='0,8'>8</vcpu>
>   <cpu>
>     <topology sockets='1' cores='2' threads='4'/>
>   </cpu>
> 
> and then, inside the guest, I used cgroups to pin a bunch
> of busy loops to specific vCPUs.
> 
> As long as all the load (8+ busy loops) was distributed
> only across vCPUs 0-3, one of the host threads remained idle.
> As soon as the first of the jobs was moved to vCPUs 4-7, the
> other host thread immediately jumped to 100%.
> 
> This seems to indicate that QEMU / KVM are actually smart
> enough to schedule guest threads on the corresponding host
> threads. I think :)

Uh.. yes.  Guest threads on the same guest core will always be
scheduled together on a physical (sub)core.  In fact, it *has* to be
done this way because recent processors contain the msgsnd / msgrcv
instructions which directly send interrupts from one thread to
another.  Those instructions are not HV privileged, so they can be
invoked directly by the guest and their behaviour can't be virtualized.

This is one of the ways, mentioned above, in which threads on the same
core are observably different, from the guest's point of view, from
threads on different cores.
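
(For anyone wanting to repeat Andrea's experiment, something along these
lines inside the guest should be enough.  It assumes a cgroup v1 cpuset
controller mounted at /sys/fs/cgroup/cpuset; the group name "busy0" is
arbitrary.)

  # Pin one busy loop to vCPU 0 inside the guest
  mkdir /sys/fs/cgroup/cpuset/busy0
  echo 0 > /sys/fs/cgroup/cpuset/busy0/cpuset.cpus
  echo 0 > /sys/fs/cgroup/cpuset/busy0/cpuset.mems
  ( while :; do :; done ) &
  echo $! > /sys/fs/cgroup/cpuset/busy0/tasks

  # Repeat with cpuset.cpus set to 1, 2, 3, ... and watch the per-thread
  # load on the host (e.g. top with the per-CPU view) as the loops move
  # from vCPUs 0-3 to vCPUs 4-7.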

> On the other hand, when I changed the guest to distribute the
> 8 vCPUs among 2 sockets with 4 cores each instead, the second
> host thread would start running as soon as I started the
> second busy loop.

Right, and likewise a single physical (sub)core can never simultaneously
run threads from multiple guests (or from a guest and the host).
Otherwise msgsnd above, as well as some other things, would allow one
guest to interfere with another, breaking isolation.

This is the reason that having multiple threads active in the host
while also running guests would be almost impossibly difficult.
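
(In libvirt terms, the two candidate bindings from the exchange further up
would look roughly like the sketch below, for vCPU 0 of a guest whose
representative host thread is CPU 4.  The domain name and CPU numbers are
only examples, and which of the two KVM actually wants - or whether offline
host threads can appear in the mask at all - is still the open question
above.)

  # (a) bind the guest thread to the representative host thread only
  virsh vcpupin guest 0 4

  # (b) bind it to a mask covering every thread of the host (sub)core,
  #     offline threads included
  virsh vcpupin guest 0 4-7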

> > > We won't know whether the proposal is actually sensible until
> > > David weighs in, but I'm adding Martin back in the loop so
> > > we can maybe give us the oVirt angle in the meantime.  
> > 
> > TBH, I'm not really sure what you want from me.  Most of the questions
> > seem to be libvirt design decisions which are independent of the layers
> > below.  
> 
> I mostly need you to sanity check my proposals and point out
> any incorrect / dubious claims, just like you did above :)
> 
> The design of features like this one can have pretty
> significant consequences for the interactions between the
> various layers, and when the choices are not straightforward
> I think it's better to gather as much feedback as possible
> from across the stack before moving forward with an
> implementation.
> 
> -- 
> Andrea Bolognani
> Software Engineer - Virtualization Team


-- 
David Gibson <dgibson at redhat.com>
Senior Software Engineer, Virtualization, Red Hat