[libvirt] [RFC PATCH 00/16] Introduce vGPU mdev framework to libvirt

Erik Skultety eskultet at redhat.com
Tue Feb 7 16:26:51 UTC 2017


On Mon, Feb 06, 2017 at 09:33:14AM -0700, Alex Williamson wrote:
> On Mon,  6 Feb 2017 13:19:42 +0100
> Erik Skultety <eskultet at redhat.com> wrote:
> 
> > Finally. It's here. This is the initial suggestion on how libvirt might
> > interact with the mdev framework, currently only focussing on the non-managed
> > devices, i.e. those pre-created by the user, since that will be revisited once
> > we have all settled on how the XML should look, given we might not want to use
> > the sysfs path directly as an attribute in the domain XML. My proposal on the
> > XML is the following:
> > 
> > <hostdev mode='subsystem' type='mdev'>
> >     <source>
> >         <!-- this is the host's physical device address -->
> >         <address domain='0x0000' bus='0x00' slot='0x00' function='0x00'/>
> >         <uuid>vGPU_UUID</uuid>
> >     </source>
> >     <!-- target PCI address can be omitted to assign it automatically -->
> > </hostdev>
> > 
> > So the mediated device is identified by the physical parent device visible on
> > the host and a UUID, which allows us to construct the sysfs path ourselves
> > and then put it on QEMU's command line.
> 
> Based on your test code, I think you're creating something like this:
> 
> -device vfio-pci,sysfsdev=/sys/class/mdev_bus/0000:00:03.0/53764d0e-85a0-42b4-af5c-2046b460b1dc
> 
> That would explain the need for the parent device address, but that's
> an entirely self-inflicted requirement.  For managed="no" scenarios,
> we shouldn't need the parent, we can get to the mdev device
> via /sys/bus/mdev/devices/53764d0e-85a0-42b4-af5c-2046b460b1dc.  So it

True, for managed="no" this path would be a nice optimization.
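
To illustrate the point, here is a minimal Python sketch (not part of this
series) of building the sysfs path and the QEMU argument from the UUID alone.
The directory and the sysfsdev= option are the ones quoted above; the function
names are made up for the example.

```python
import uuid

# Directory quoted in the thread for reaching an mdev by UUID alone.
MDEV_DEVICES_DIR = "/sys/bus/mdev/devices"


def mdev_sysfs_path(mdev_uuid, devices_dir=MDEV_DEVICES_DIR):
    """Return the sysfs path of a mediated device, given only its UUID.

    Hypothetical helper: uuid.UUID() raises ValueError on malformed input,
    and str() gives the canonical lowercase, hyphenated form.
    """
    canonical = str(uuid.UUID(mdev_uuid))
    return "{}/{}".format(devices_dir, canonical)


def qemu_device_args(mdev_uuid):
    """Return the QEMU arguments for a vfio-pci device backed by this mdev."""
    return ["-device", "vfio-pci,sysfsdev=" + mdev_sysfs_path(mdev_uuid)]


if __name__ == "__main__":
    print(" ".join(qemu_device_args("53764d0e-85a0-42b4-af5c-2046b460b1dc")))
```

No parent address is involved anywhere in that construction.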

> seems that the UUID should be the only required source element for
> managed="no".
> 
> For managed="yes", it seems like the parent device is still an optional

The reason I went with the parent address element (and purposely neglected the
sample mtty driver) was that I assumed any modern mdev-capable HW would be
accessible through the PCI bus on the host. Also, I wanted to explicitly hint
libvirt as much as possible which parent device a vGPU device instance should
be created on in case there is more than one of them, rather than scanning
sysfs for a suitable parent which actually supports the given vGPU type.
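
For reference, the scanning alternative could look roughly like the Python
sketch below. It assumes the kernel's mdev sysfs layout, where each parent
under /sys/class/mdev_bus exposes mdev_supported_types/<type>/available_instances.
The function name and the min_instances parameter are hypothetical, and this is
not what the series implements.

```python
import os


def find_parents(type_name, min_instances=1, mdev_bus_dir="/sys/class/mdev_bus"):
    """List parent devices that can still create an mdev of the given type.

    Hypothetical helper: a parent qualifies if its available_instances file
    for type_name exists and holds a count of at least min_instances.
    """
    try:
        candidates = sorted(os.listdir(mdev_bus_dir))
    except OSError:
        # No mdev-capable parents registered (or no mdev support at all).
        return []

    parents = []
    for parent in candidates:
        avail = os.path.join(mdev_bus_dir, parent, "mdev_supported_types",
                             type_name, "available_instances")
        try:
            with open(avail) as f:
                count = int(f.read().strip())
        except (OSError, ValueError):
            # This parent does not support the type, or the file is unreadable.
            continue
        if count >= min_instances:
            parents.append(parent)
    return parents
```

Note that this only answers "which parents could host the type"; it says
nothing about which one management would prefer, which is why I would rather
take an explicit hint.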

> field.  The most important thing that libvirt needs to know when
> creating a mdev device for a VM is the mdev type name.  The parent
> device should be an optional field to help higher level management
> tools deal with placement of the device for locality or load balancing.
> Also, we can't assume that the parent device is a PCI device, the
> sample mtty driver already breaks this assumption.

Since we have to account for non-PCI devices and we still need to let management
hint libvirt about the parent for load balancing and the like, I've come up
with the following adjustments/ideas on how to reflect that in the XML:
- still use the address element but use it with the 'type' attribute [1] (still
  breaks the sample mtty driver though) while making the element truly optional
  if I'm going to be outvoted in favor of scanning the directory for a suitable
  parent device on our own, rather than requiring the user to provide that

- providing either an attribute or a standalone element for the parent device
  name, like a string version of the PCI address or whatever form the parent
  device comes in (doesn't break the mtty driver but I don't quite like this)

- providing a path element/attribute to sysfs pointing to the parent device
  which I'm afraid is what Daniel is not in favor of libvirt doing

So, this is what I've come up with so far in terms of hinting libvirt about the
parent device. Do you have any input on this, or maybe some more ideas on how we
should identify the parent device?

> 
> Also, grep'ing through the patches, I don't see that the "device_api"

Yep, this was also on purpose since, as you write below, right now the only
functioning mdev devices we have to work with are vfio-pci capable only, so
with this RFC I wanted to gather some feedback on whether I'm moving in the
right direction in the first place. So yeah, I thought this could be added at
any point later.
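
When it does get added, the check itself should be small. A rough Python
sketch, assuming the device_api file is reachable through the device's
mdev_type link in sysfs and contains the string "vfio-pci" for such devices
(the function names are hypothetical):

```python
import os


def mdev_device_api(mdev_uuid, devices_dir="/sys/bus/mdev/devices"):
    """Return the API string an mdev exports, or None if it cannot be read."""
    path = os.path.join(devices_dir, mdev_uuid, "mdev_type", "device_api")
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


def is_vfio_pci(mdev_uuid, devices_dir="/sys/bus/mdev/devices"):
    """True only if the mdev explicitly reports the vfio-pci API."""
    return mdev_device_api(mdev_uuid, devices_dir) == "vfio-pci"
```

A device whose device_api is missing or different would then be rejected
before it ever reaches the QEMU vfio-pci driver.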

[1] http://libvirt.org/formatdomain.html#elementsAddress

Erik

> file is being used to test that the mdev device actually exports the
> vfio-pci API before making use of it with the QEMU vfio-pci driver.  We
> don't yet have any examples to the contrary, but non vfio-pci mdev
> devices are in development.  Just like we can't assume the parent
> device type, we can't assume the API of an mdev device to the user.
> Thanks,
> 
> Alex
> 
> > A few remarks if you actually happen to have a machine to test this on:
> > - right now the mediated devices are one-time use only, i.e. they have to be
> > recreated before every machine boot
> > - I wouldn't recommend assigning multiple vGPUs to a single domain
> > 
> > Once this series is sorted out, we can then continue with 'managed=yes' where,
> > as Laine pointed out [1], we need to figure out how exactly the management
> > layer should hint libvirt about which vGPU type should be used for device
> > instantiation.
> > 
> > [1] https://www.redhat.com/archives/libvir-list/2017-January/msg00287.html  
> > 



