[libvirt] Hot plug multi function PCI devices

Ziviani . jrziviani at gmail.com
Fri Jan 8 20:09:18 UTC 2016


You're right, I took the cases I'm aware of for granted and forgot to
consider all the other possible scenarios. Your hints here were very
important in clearing up my misunderstandings. I'll certainly read the
sources you pointed me to.

Thank you!

On Fri, Jan 8, 2016 at 4:41 PM, Laine Stump <laine at laine.org> wrote:

> On 01/06/2016 08:58 PM, Ziviani . wrote:
>
>
> On Wed, Jan 6, 2016 at 4:43 PM, Laine Stump <laine at laine.org> wrote:
>
>> On 12/23/2015 11:01 AM, Ziviani . wrote:
>>
>> Hi Laine,
>>
>> This (hot plugging all functions at once) is something I was thinking
>> about. What if we could create an XML file passing the IOMMU group instead
>> of only one function at a time, would it be feasible?
>> I could start working on a proof of concept if the community thinks it's
>> a valid path.
>>
>> Do you know who is currently working on it? I could offer some help if
>> they need it.
>>
>>
>> (Please reply inline rather than top-posting. It makes it much easier to
>> follow the context of the conversation.)
>>
>> What do you mean by "passing the IOMMU group"? Do you mean *just* the
>> iommu group, excluding the information about the devices? This doesn't seem
>> like a good idea, since afaik the iommu group number is something just
>> conjured up by the kernel at boot time, and isn't necessarily predictable
>> or stable between host reboots. Also, it wouldn't allow for assigning only
>> some of the devices/functions in a group while leaving others inactive.
>>
>
> My first idea was doing something like this:
> % virsh nodedev-dumpxml pci_0000_00_16_3
> <device>
>   <name>pci_0000_00_16_3</name>
> [snip]
>     <iommuGroup number='4'>
>       <address domain='0x0000' bus='0x00' slot='0x16' function='0x0'/>
>       <address domain='0x0000' bus='0x00' slot='0x16' function='0x3'/>
>     </iommuGroup>
>   </capability>
> </device>
>
> If a user wants to attach pci_0000_00_16_3, I'd find all devices
> belonging to the same iommu group and attach every one. A very poor
> pseudo-code would be like:
>
> slot = get_available_guest_slot();
> iommu_group = device_to_be_attached().get_iommu();
> for (device : iommu_group.devices()) {
>   (1st iteration) device_add vfio-pci,host=00:16.0,addr=slot.0,multifunction=on
>   (2nd iteration) device_add vfio-pci,host=00:16.3,addr=slot.3,multifunction=on
> }
>
> So, in this case, we could accept either the device to be attached or
> simply its current iommu group#.
>
>
> But "iommu group" is not the same thing as "all functions on a single
> device". Although in some cases they might be the same, that isn't
> necessarily true - one iommu group could span several devices, and there
> could be devices in the group that the user wasn't expecting and that could
> cause unexpected disastrous results (the most commonly used example is if
> the controller for the host's main disk happens to be in the same iommu
> group as some device that you're trying to assign).
>
> Also, you're making the assumption that only physical hardware devices
> assigned with vfio can/should be put onto multiple functions of a single
> guest slot, but that isn't true. It's also okay (and at times desirable) to
> put multiple emulated devices into different functions of the same slot.
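
To make the distinction above concrete, here is a minimal Python sketch (not
libvirt code). It assumes PCI addresses written as domain:bus:slot.function
strings; the helper names and the second example group are made up for
illustration. It only shows that "devices in one iommu group" and "functions
of one slot" can be different sets.

```python
from collections import defaultdict

def parse_pci_address(addr):
    """Split 'dddd:bb:ss.f' into integer (domain, bus, slot, function)."""
    domain, bus, rest = addr.split(":")
    slot, function = rest.split(".")
    return int(domain, 16), int(bus, 16), int(slot, 16), int(function, 16)

def functions_by_slot(addresses):
    """Map each (domain, bus, slot) to the sorted list of its functions."""
    slots = defaultdict(list)
    for addr in addresses:
        domain, bus, slot, function = parse_pci_address(addr)
        slots[(domain, bus, slot)].append(function)
    return {key: sorted(funcs) for key, funcs in slots.items()}

# The iommu group from the nodedev-dumpxml output quoted earlier: one slot.
group_4 = ["0000:00:16.0", "0000:00:16.3"]
# A hypothetical group that spans two slots (e.g. an unrelated controller).
mixed_group = ["0000:00:1f.0", "0000:00:1f.2", "0000:00:16.0"]

print(functions_by_slot(group_4))
print(functions_by_slot(mixed_group))
```

With the second group, attaching "everything in the group" would pull in a
second device that the user never asked for.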
>
>
>
>> I think there are two reasonable possibilities:
>>
>> 1) Follow the apparent path of qemu - accept separate attach calls, one
>> for each function, and use the attach of function 0 as the "action" button
>> that causes all the functions to be attached.
>>
>> 2) Enhance the attach API to accept multiple <hostdev> elements in the
>> XML for a single call, and do "whatever is proper for the current
>> hypervisor" to attach them.
>>
>
> I think my first idea has more to do with your 1st option. But I like the
> second one: the user specifies all devices in the XML, then we assert there
> is no missing function,
>
>
> Why do you assert that there is no missing function? Again, while this
> *can* be used to assign all of the functions of a single multi-function
> host device to functions of a single guest slot, that isn't the only use.
> You can also assign *some* of the functions of a single device, or a
> collection of emulated devices (or possibly even a mixture of emulated and
> assigned devices, although I'm not sure what vfio would think about that -
> it may be prohibited for very good reasons).
>
> then we go attaching one by one (with this other poor pseudo-code):
>
> slot = get_available_guest_slot();
> for (device : devices_parsed_from_xml()) {
>   (1st iteration) device_add vfio-pci,host=00:16.0,addr=slot.0,multifunction=on
>   (2nd iteration) device_add vfio-pci,host=00:16.3,addr=slot.3,multifunction=on
> }
>
>
>>
>> As for detach, it's really only possible to detach *all* functions, and
>> it would take more bookkeeping to allow marking each function for
>> removal and then removing the device when all functions had been marked, so
>> maybe we only allow detach of function 0, and that will always detach
>> everything? (not sure, that's just an idea).
>>
>
> I think we can let users detach any one of them. We could get the slot and
> start detaching all functions from that slot, again another poor example:
>
> device = device_to_be_detached();
> for (uint function = 0; function < device.len_slot(); ++function)
>     detach(device.slot[function]->addr);
>
>
> My understanding is that there is no way to inform the guest OS that a
> single function of a device has been detached. The only thing you can do is
> tell it that the entire device has been unplugged from the slot.
>
>
>
>>
>> As far as I know, nobody is currently working on anything like this for
>> libvirt, so this is your chance to get your hands dirty!
>>
>
> Awesome! :)
>
>
>>
>> (It just occurred to me that method (1) of multifunction attach
>> outlined above will also need similar extra bookkeeping, just as the "mark
>> each function for removal" detach method would, and this extra bookkeeping
>> would need to survive a restart of libvirtd in the middle of a series of
>> attach/detach calls, making it more complicated, so maybe the 2nd method
>> would be better. I'd love to hear opinions though.)
>>
>
> Because it's possible to retrieve the functions belonging to a slot I
> think we can avoid such bookkeeping (of course, my idea can be totally
> wrong) :D
>
> (qemu) info pci
> ...
>   Bus  0, device   6, function 0:
>     Class 1920: PCI device 8086:9c3a
>       IRQ 11.
>       BAR0: 64 bit memory at 0x40000000 [0x4000001f].
>       id ""
>   Bus  0, device   6, function 3:
>     Serial port: PCI device 8086:9c3d
>       IRQ 6.
>       BAR0: I/O at 0x1000 [0x1007].
>       BAR1: 32 bit memory at 0x40001000 [0x40001fff].
>       id ""
>
> But based on my code above, the function device_to_be_detached() could
> return the struct with slot[functions] based on this qemu info.
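
A minimal Python sketch of the idea above, assuming the monitor output keeps
the "Bus N, device N, function N:" header shape shown in the sample (the
function name and the regular expression are my own, not part of qemu or
libvirt). It only reports what qemu currently lists for each slot; it does
not know about requests that are still in flight.

```python
import re
from collections import defaultdict

# Matches header lines such as "  Bus  0, device   6, function 3:".
INFO_PCI_HEADER = re.compile(r"Bus\s+(\d+), device\s+(\d+), function\s+(\d+):")

def functions_per_slot(info_pci_text):
    """Map each (bus, device) pair to the functions listed for it."""
    slots = defaultdict(list)
    for match in INFO_PCI_HEADER.finditer(info_pci_text):
        bus, device, function = (int(group) for group in match.groups())
        slots[(bus, device)].append(function)
    return dict(slots)

sample = """\
  Bus  0, device   6, function 0:
    Class 1920: PCI device 8086:9c3a
  Bus  0, device   6, function 3:
    Serial port: PCI device 8086:9c3d
"""

print(functions_per_slot(sample))
```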
>
>
> It's not that simple. You need to keep track of which devices you've told
> qemu to detach that qemu hasn't yet informed you were successfully
> detached. Also, if we allow it in steps (libvirt accepts attach/detach for
> multiple functions followed by a "make it so!" command), the info about
> pending attach/detach sets would need to be maintained.
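
As a rough illustration of the bookkeeping described above, here is a Python
sketch of a tracker for detach requests that qemu has not yet confirmed. The
class, its method names and the use of JSON are all hypothetical; this is not
how libvirt stores its state, only a picture of what has to survive a restart.

```python
import json

class PendingDetach:
    """Track functions asked to detach but not yet confirmed as gone."""

    def __init__(self, pending=None):
        # Keys are slot names such as "00:08"; values are sets of functions.
        self.pending = {slot: set(funcs) for slot, funcs in (pending or {}).items()}

    def request(self, slot, function):
        """Record that a detach was requested for slot.function."""
        self.pending.setdefault(slot, set()).add(function)

    def confirm(self, slot, function):
        """Record that the hypervisor reported slot.function as removed."""
        funcs = self.pending.get(slot, set())
        funcs.discard(function)
        if not funcs:
            self.pending.pop(slot, None)

    def dumps(self):
        """Serialize the pending set so it can be reloaded after a restart."""
        data = {slot: sorted(funcs) for slot, funcs in self.pending.items()}
        return json.dumps(data, sort_keys=True)

    @classmethod
    def loads(cls, text):
        """Rebuild a tracker from the output of dumps()."""
        return cls(json.loads(text))

tracker = PendingDetach()
tracker.request("00:08", 0)
tracker.request("00:08", 3)
restored = PendingDetach.loads(tracker.dumps())  # simulated daemon restart
restored.confirm("00:08", 0)
print(restored.pending)
```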
>
> You should probably spend some time looking at src/qemu/qemu_hotplug.c,
> src/util/virhostdev.c, and virpci.c before jumping to a lot of conclusions
> :-)
>
>
>
> Thank you for your time and advice. I'm starting to look into it and will
> let you know the progress. My irc nickname is #ziviani.
>
>
>
>
>>
>>
>>
>> Thank you :)
>>
>> On Mon, Dec 21, 2015 at 3:53 PM, Laine Stump <laine at laine.org> wrote:
>>
>>> On 12/21/2015 08:29 AM, Ziviani . wrote:
>>>
>>> Hello list!
>>>
>>> I'm new here and interested in hot-plugging multi-function PCI devices.
>>> Basically I'd like to know why Libvirt does not support it. I've been
>>> through the archives and found this thread:
>>>
>>> https://www.redhat.com/archives/libvir-list/2011-May/msg00457.html
>>>
>>> But Qemu seems to handle it correctly:
>>> virsh qemu-monitor-command --hmp fedora-23 'device_add vfio-pci,host=00:16.0,addr=08.0'
>>> virsh qemu-monitor-command --hmp fedora-23 'device_add vfio-pci,host=00:16.3,addr=08.3'
>>>
>>> GUEST:
>>> # lspci
>>> (snip)
>>> 00:08.0 Communication controller: Intel Corporation 8 Series HECI #0
>>> (rev 04)
>>> 00:08.3 Serial controller: Intel Corporation 8 Series HECI KT (rev 04)
>>>
>>> However, using Libvirt:
>>>
>>> % virsh attach-device fedora-23 pci_0000_00_16_0.xml --live
>>> Device attached successfully
>>>
>>> % virsh attach-device fedora-23 pci_0000_00_16_3.xml --live
>>> error: Failed to attach device from pci_0000_00_16_3.xml
>>> error: internal error: Only PCI device addresses with function=0 are
>>> supported
>>>
>>> I made some changes on domain_addr.c[1] for testing and it worked.
>>>
>>> [1] https://gist.github.com/jrziviani/1da184c7fd0b413e0426
>>>
>>> % virsh attach-device fedora-23 pci_0000_00_16_3.xml --live
>>> Device attached successfully
>>>
>>> GUEST:
>>> # lspci
>>> (snip)
>>> 00:08.0 Communication controller: Intel Corporation 8 Series HECI #0
>>> (rev 04)
>>> 00:08.3 Serial controller: Intel Corporation 8 Series HECI KT (rev 04)
>>>
>>> So is there more to it that I'm not aware of?
>>>
>>>
>>> You're relying on behavior in the guest OS for which there is no
>>> standard (and which, by definition, doesn't work on real hardware, so no
>>> guest OS will be expecting it; a friend more familiar with this has told me
>>> that probably qemu is sending an (acpi?) "device check" to the guest for
>>> each function that is added, and in your case it's apparently "doing the
>>> right thing" in response to that). But just because it is successful in
>>> this one case doesn't mean that it will be successful in all situations;
>>> likely it won't be. So while the qemu monitor takes the laissez-faire
>>> approach of allowing you to try it and letting you pick up the pieces when
>>> it fails, libvirt prevents it because it is bound to fail, and thus not
>>> supportable.
>>>
>>> There has recently been some work in qemu to "save up" any requests to
>>> attach devices with function > 0, then present them all to the guest at
>>> once when function 0 is attached. This is the only standard way to handle
>>> hotplug of multiple functions in a slot. Hot unplug can only happen for all
>>> functions in the slot at once. I'm not sure of the current status of that
>>> work, but once it is in and stable, libvirt will support it.
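
The ordering described above (functions > 0 first, function 0 last as the
trigger) can be sketched in Python as below. This is only an illustration
under assumptions: the helper name is hypothetical, the command strings copy
the device_add form used elsewhere in this thread, and requiring function 0
reflects the description above rather than any confirmed libvirt behavior.

```python
def device_add_commands(host_functions, guest_slot):
    """Build HMP device_add strings for one guest slot, function 0 last.

    host_functions maps a guest function number to a host address 'bb:ss.f'.
    guest_slot is the guest slot number as an integer.
    """
    if 0 not in host_functions:
        raise ValueError("function 0 is needed to trigger the hotplug")
    # Functions > 0 go first in ascending order; function 0 closes the series.
    order = sorted(f for f in host_functions if f != 0) + [0]
    return [
        f"device_add vfio-pci,host={host_functions[f]},"
        f"addr={guest_slot:02x}.{f:x},multifunction=on"
        for f in order
    ]

for command in device_add_commands({0: "00:16.0", 3: "00:16.3"}, 8):
    print(command)
```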
>>>
>>>
>>> Thank you!
>>>
>>>
>>> --
>>> libvir-list mailing list
>>> libvir-list at redhat.com
>>> https://www.redhat.com/mailman/listinfo/libvir-list
>>>
>>>
>>>
>>
>>
>
>