[libvirt] [Qemu-devel] q35 machine type and libvirt.

Alex Williamson alex.williamson at redhat.com
Wed Feb 6 20:15:05 UTC 2013


On Wed, 2013-02-06 at 14:13 -0500, Laine Stump wrote:
> Now that qemu is getting the q35 machine type, libvirt needs to
> support it.
> 
> As far as I understand, from libvirt's point of view, q35 is just
> another x86_64 system, but with a different set of implicit devices,
> and possibly some extra rules limiting which devices can be plugged
> into which bus/slot on the guest. That means that in order to support
> it, we will need to recognize when a q35-based machine type is being
> created, and auto-add all the implicit devices to libvirt's config
> model for that domain, then pay attention to the extra rules when
> assigning addresses for all the user-added devices.
> 
> We already add implicit controllers/devices for pc-based machine
> types; as a matter of fact, currently, libvirt improperly assumes (for
> the purposes of adding implicit devices) that *every* virtual machine
> is based on the "pc" machine type (or rather it just doesn't pay
> attention), so it always adds all the implicit devices for a pc
> machine type for every domain. This of course is already incorrect for
> many (probably all?) non-x86 machine types, even before we add q35
> into the mix. To fix this, it might be reasonable (and arguably, it's
> necessary to fix the problem in a backward-compatible manner) to just
> set up a table of machinetype ==> implicit device lists, look up the
> machine type in this table, and add the devices needed for that
> machine type. This goes against libvirt's longstanding view of
> machinetype as being an opaque value that it merely passes through to
> qemu, but it's manageable for the existing machine types (even
> including q35), since it's a finite set. But it starts to be a pain to
> maintain when you think about future additions - yet another case
> where new functionality in qemu will require an update to libvirt
> before it can be fully used via libvirt.
> 
> In the long term, it would be very useful / more easily maintainable
> to have a qemu status command available via QMP which would return the
> list of implicit devices (and their PCI addresses) for any requested
> machine type. It would be necessary that this command be callable
> multiple times within a single execution of qemu, giving it a
> different machinetype each time. This way libvirt could first query
> the list of available machinetypes in this particular qemu binary,
> then request the list of implicit devices for each machine type
> (libvirt runs each available qemu binary *once* the first time it's
> requested, and caches all such capabilities information so that it
> doesn't need to re-run qemu again and again). My limited understanding
> of qemu's code is that qemu itself doesn't have a table of this
> information as data, but instead has lines of code that are executed
> to create it, thus making it impractical to provide the list of
> devices for a machinetype without actually instantiating a machine of
> that type. What's the feasibility of adding such a capability (and in
> the process likely making the list of implicit devices in qemu itself
> table/data driven rather than constructed with lines of code)?
> 
> More questions:
> 
> 1) It seems that the exact list of devices for the basic q35 machine
> type hasn't been settled on yet, is that correct?

I think what we have currently is just a stepping stone to a base
configuration.  At a minimum, we're missing the PCI bridge attached to
the ICH, which is where I think libvirt should attach non-chipset
component devices.  Next would be PCIe root ports where emulated and
assigned PCIe devices could be attached.

> 2) Are there other issues aside from implicit controller devices I
> need to consider for q35? For example, are there any devices that (as
> I recall is the case for some devices on "pc") may or may not be
> present, but if they are present they are always at a particular PCI
> address (meaning that address must be reserved)? I've also just
> learned that certain types of PCIe devices must be plugged into
> certain locations on the guest bus? ("root complex" devices - is there
> a good source of background info to learn the meaning of terms like
> that, and the rules of engagement? libvirt will need to know/follow
> these rules.)

The GMCH (Graphics & Memory Controller Hub) defines:

00.0 - Host bridge
01.0 - x16 root port for external graphics
02.0,1 - integrated graphics device (IGD)
03.0,1,2,3 - management engine subsystem

And the ICH defines:

19.0 - Embedded ethernet (e1000e)
1a.* - UHCI/EHCI
1b.0 - HDA audio
1c.* - PCIe root ports
1d.* - UHCI/EHCI
1e.0 - PCI Bridge
1f.0 - ISA Bridge
1f.2,5 - SATA
1f.3 - SMBUS

Personally, I think these slots should be reserved for only the spec
defined devices, and I'm not all that keen on using the remaining slots
for anything else.  Users should of course be allowed to put anything
anywhere, but libvirt auto-placement should follow some rules.
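
To make that auto-placement rule concrete, here is a rough sketch in
Python.  It is not libvirt code: the device-kind strings and the function
name are made up for illustration, and it only works at slot
granularity (functions are ignored).  Only the slot numbers come from
the lists above.

```python
# Slots on the root bus that the GMCH/ICH specs assign to a device.
# Kind names are hypothetical labels, not libvirt or qemu identifiers.
SPEC_SLOTS = {
    0x00: {"host-bridge"},
    0x01: {"root-port"},            # x16 root port for external graphics
    0x02: {"igd"},
    0x03: {"management-engine"},
    0x19: {"ethernet"},
    0x1a: {"usb"},                  # UHCI/EHCI
    0x1b: {"hda"},
    0x1c: {"root-port"},
    0x1d: {"usb"},                  # UHCI/EHCI
    0x1e: {"pci-bridge"},
    0x1f: {"isa-bridge", "sata", "smbus"},
}


def placement_ok(slot, kind, user_specified=False):
    """Decide whether a device of `kind` may go in `slot` on the root bus.

    User-specified addresses are always honored.  Auto-placement only
    uses a slot when the spec defines that slot for that kind of device;
    unlisted slots are left empty rather than filled automatically.
    """
    if not 0 <= slot <= 0x1f:
        raise ValueError("PCI slot must be in the range 0x00-0x1f")
    if user_specified:
        return True
    return kind in SPEC_SLOTS.get(slot, set())
```

With this, placement_ok(0x1b, "hda") is accepted, while an automatic
request for any slot not in the table is refused and would fall through
to another bus.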

All of the above sit on what we now call bus pcie.0.  This is a root
complex, which implies that all of the endpoints are root complex
integrated endpoints.  Being an integrated endpoint restricts aspects of
the device.  I've already found out the hard way that Windows actually
cares about this and will ignore PCI assigned devices of type "Endpoint"
when attached to the root complex bus.  (Endpoint, root complex, etc. are
defined in the PCIe spec; the above slot use is defined in the
respective chipset specs.)

What I'd like to see is to implement the PCI-bridge at 1e.0 to expose a
complete, virgin PCI bus.  libvirt should use that as the default
location for any PCI device that's not a chipset component.  We might be
able to get away with installing our e1000 at 19.0, but otherwise I'm
thinking that the list only includes uhci/ehci, hda, ahci, and the
chipset components themselves (smbus, isa, root ports, etc...).  We
don't have "IGD", so our graphics should go on the PCI bus and the PCI
bridge should include functioning VGA enable bits.  Maybe QXL wants to
make itself a PCIe device, in which case it should be attached behind a
PCIe root port at slot 01.0.  Secondary PCIe graphics attach to root
ports behind 1c.*.  This is the same framework within which real hardware
has to work.

Assigned devices get interesting due to the PCIe type.  We've never had
any problems attaching PCIe devices to PCI buses on PIIX (but it may be
holding back our ability to support graphics passthrough), so assigned
devices can probably be attached to the PCI bus.  More appropriate would
be to attach "Endpoints" behind root ports and "Integrated Endpoints" to
the root complex.  I've got some code that will mangle the PCIe type to
match its location in the topology, but it needs more work.  That should
help make things more flexible.
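
The "more appropriate" placement described above could be sketched
roughly as follows.  This is an illustration only: apart from pcie.0,
the attachment labels are invented placeholders, and the enum is not
taken from qemu or libvirt.

```python
from enum import Enum


class PcieType(Enum):
    CONVENTIONAL = "conventional PCI"
    ENDPOINT = "Endpoint"
    INTEGRATED_ENDPOINT = "Integrated Endpoint"


ROOT_COMPLEX = "pcie.0"               # bus name used in this thread
BEHIND_ROOT_PORT = "root-port"        # placeholder label (assumption)
PCI_BUS = "pci-bridge-1e.0"           # placeholder label (assumption)


def preferred_attachment(pcie_type):
    """Return where a device of the given PCIe type would ideally attach."""
    if pcie_type is PcieType.INTEGRATED_ENDPOINT:
        # Integrated endpoints belong on the root complex itself.
        return ROOT_COMPLEX
    if pcie_type is PcieType.ENDPOINT:
        # Plain endpoints belong behind a root port, not on the root bus.
        return BEHIND_ROOT_PORT
    # Everything else defaults to the PCI bus behind the bridge at 1e.0.
    return PCI_BUS
```

Attaching an "Endpoint" to the PCI bus is still expected to work, as it
has on PIIX; the function only expresses the preferred location.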


> 3) What new types of devices/controllers must be supported for a
> properly functioning q35 machine?

AHCI, bridges, root ports (we can skip these w/o PCIe devices, but for
hotplug we might want them fully populated - otherwise everything gets
hotplugged to the PCI bus).  Thanks,

Alex



