
[libvirt] RFC: managing "pci passthrough" usage of sriov VFs via a new network forward type

For some reason beyond my comprehension, the designers of SRIOV ethernet cards decided that each virtual function (VF) of the card (each VF corresponds to an ethernet device, e.g. "eth10") should be given a new, different, random MAC address every time the hardware is rebooted. Normally, udev keeps a persistent table that associates each known MAC address with an ethernet device name: whenever an ethernet device with a previously unknown MAC address is found, a new device name is allocated ("eth11", etc.) and the newly found MAC address is associated with that name. When an ethernet device is an SRIOV VF, though, udev doesn't persist its MAC address. So at each boot a device is found with a new MAC address, but because the device name from the previous boot was never recorded in the table, that name is still unused, and as if by magic the device ends up with the same name even though its MAC address has changed.

When this device is assigned to a guest via PCI passthrough, however, the guest has no way of knowing that it's actually an SRIOV VF, so the guest's udev does persist the MAC address. On the first boot of host+guest, the guest will see that it has, e.g., MAC address 11:22:33:44:55:66, and udev will add an entry to its persistent table remembering that 11:22:33:44:55:66="eth0". If the host reboots, though, the VF will get a new MAC address; when the guest boots, it will see that new MAC address (e.g. "66:55:44:33:22:11") and conclude that it has a different card, so it will create a new device (and a new udev entry, 66:55:44:33:22:11="eth1"). This repeats each time the host reboots, with the obvious undesired consequences (for example, network configuration written for "eth0" no longer applies to the device the guest actually has).
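The naming behaviour described in the two paragraphs above can be illustrated with a deliberately simplified Python model. `ToyUdevNaming` is hypothetical and is not how udev's rules are actually implemented; it only captures the two properties described here: a MAC already in the persistent table keeps its recorded name, and an unknown MAC gets the lowest-numbered name not already claimed. The `persist_new` flag models the difference between the host (VF MACs not recorded) and the guest (every MAC recorded). The MAC addresses are the examples from the text plus one made-up third value, and for comparison the last scenario shows a guest whose VF is given the same MAC before every boot.

```python
class ToyUdevNaming:
    """Simplified stand-in for udev persistent net naming (hypothetical model).

    A MAC already in the persistent table keeps its recorded name. An unknown
    MAC gets the lowest-numbered name not claimed by the table or by another
    device seen earlier in the same boot. With persist_new=False (how the host
    treats SRIOV VFs in this model) the new MAC->name pair is not saved.
    """

    def __init__(self, prefix: str = "eth", first_index: int = 0) -> None:
        self.table: dict[str, str] = {}  # persistent MAC -> device name
        self.prefix = prefix
        self.first_index = first_index

    def boot(self, macs: list[str], persist_new: bool) -> list[str]:
        """Name every device found at boot; return the names in order."""
        in_use = set(self.table.values())
        names = []
        for mac in macs:
            name = self.table.get(mac)
            if name is None:
                i = self.first_index
                while f"{self.prefix}{i}" in in_use:
                    i += 1
                name = f"{self.prefix}{i}"
                if persist_new:
                    self.table[mac] = name
            in_use.add(name)
            names.append(name)
        return names


# The VF's MAC on three successive host boots (third value made up).
boot_macs = ["11:22:33:44:55:66", "66:55:44:33:22:11", "52:54:00:ab:cd:ef"]

# Host: the VF's MAC is never persisted, so the same name is free every boot.
host = ToyUdevNaming(first_index=10)
host_names = [host.boot([mac], persist_new=False)[0] for mac in boot_macs]

# Guest: the passed-through VF looks like an ordinary NIC, so every MAC is saved.
guest = ToyUdevNaming()
guest_names = [guest.boot([mac], persist_new=True)[0] for mac in boot_macs]

# Guest whose VF is given the same MAC before every boot.
pinned = ToyUdevNaming()
pinned_names = [pinned.boot([boot_macs[0]], persist_new=True)[0] for _ in boot_macs]

print(host_names)    # ['eth10', 'eth10', 'eth10']
print(guest_names)   # ['eth0', 'eth1', 'eth2']
print(pinned_names)  # ['eth0', 'eth0', 'eth0']
```

The stale entries left in the guest's table are what push each new MAC to the next free name; once the MAC stops changing, the lookup hits the existing entry and the name stays put.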

This makes using SRIOV VFs via PCI passthrough very unpalatable. The problem can be solved by setting the MAC address of the ethernet device prior to assigning it to the guest, but of course the <hostdev> element used to assign PCI devices to guests has no place to specify a MAC address (and I'm not sure it would be appropriate to add something that function-specific to <hostdev>). Dave Allan and I have discussed a different possible method of eliminating this problem (using a new forward type for libvirt networks) that I've outlined below. Please let me know what you think - is this reasonable in general? If so, what about the details? If not, any counter-proposals to solve the problem?

Providing Predictable/Configurable MAC Addresses for SRIOV VFs used via PCI Passthrough:

1) <network> will have a new forward type='hardware'. When forward type='hardware', a pool of ethernet interfaces can be specified, just as for the forward types "bridge", "vepa", "private", and "passthrough". At this point, that's the only thing that I've determined is needed in the network definition.

2) In a domain's <interface> definition, when type='network', if the network has a forward type='hardware', the domain code will request an unused ethernet device from the network driver, then do the following:

3) Save the ethernet device name in interface/actual so that it can be easily retrieved if libvirtd is restarted.

4) Set the MAC address of the given ethernet device according to the domain <interface> config.

5) Use the NodeDevice API to learn the device's PCI address (domain/bus/slot/function) and add a (non-persistent) <hostdev> element to the guest's config before starting it up.

6) When the guest is eventually destroyed, the ethernet device will be returned to the network pool for use by another guest.
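The six steps above can be sketched as a toy Python model, under several stated assumptions. `mode="hardware"` is only the value proposed in this RFC; existing libvirt networks spell the forward type as the `mode` attribute (e.g. `<forward mode='passthrough'>` with `<interface dev='...'/>` children), and the sketch follows that shape. The `<hostdev mode='subsystem' type='pci'>` element with a `<source><address .../></source>` child follows libvirt's existing PCI passthrough syntax. `HardwareNetworkPool`, `start_guest`, `destroy_guest`, `fake_set_mac` and the `pci_of` dict (standing in for the NodeDevice lookup) are all hypothetical, and the PCI addresses are made up. The MAC setter is a caller-supplied callback because this sketch does not touch hardware; on a real host a VF's MAC is commonly set through its parent PF, for instance with iproute2's `ip link set <pf> vf <n> mac <mac>`.

```python
import xml.etree.ElementTree as ET
from collections.abc import Callable
from dataclasses import dataclass, field


def network_xml(name: str, devs: list[str]) -> str:
    """Step 1: a <network> whose forward element lists a pool of devices.

    mode="hardware" is the value proposed in this RFC, not an existing one.
    """
    net = ET.Element("network")
    ET.SubElement(net, "name").text = name
    fwd = ET.SubElement(net, "forward", mode="hardware")
    for dev in devs:
        ET.SubElement(fwd, "interface", dev=dev)
    return ET.tostring(net, encoding="unicode")


def hostdev_xml(pci_addr: str) -> str:
    """Step 5: build a PCI <hostdev> from an address like "0000:06:10.1"."""
    domain, bus, slot_fn = pci_addr.split(":")
    slot, function = slot_fn.split(".")
    hd = ET.Element("hostdev", mode="subsystem", type="pci", managed="yes")
    src = ET.SubElement(hd, "source")
    ET.SubElement(src, "address", domain=f"0x{domain}", bus=f"0x{bus}",
                  slot=f"0x{slot}", function=f"0x{function}")
    return ET.tostring(hd, encoding="unicode")


@dataclass
class HardwareNetworkPool:
    """Hypothetical network-driver side: hands out unused pool devices."""
    devs: list[str]
    in_use: dict[str, str] = field(default_factory=dict)  # device -> guest

    def allocate(self, guest: str) -> str:
        for dev in self.devs:
            if dev not in self.in_use:
                self.in_use[dev] = guest
                return dev
        raise RuntimeError("no unused device left in the pool")

    def release(self, dev: str) -> None:
        self.in_use.pop(dev, None)


def start_guest(pool: HardwareNetworkPool, guest: str, mac: str,
                pci_of: dict[str, str],
                set_mac: Callable[[str, str], None]) -> tuple[dict[str, str], str]:
    dev = pool.allocate(guest)              # step 2: request an unused device
    actual = {"dev": dev}                   # step 3: stands in for interface/actual
    try:
        set_mac(dev, mac)                   # step 4: MAC from the <interface> config
        hostdev = hostdev_xml(pci_of[dev])  # step 5: pci_of stands in for NodeDevice
    except Exception:
        pool.release(dev)                   # don't leak the device if startup fails
        raise
    return actual, hostdev


def destroy_guest(pool: HardwareNetworkPool, actual: dict[str, str]) -> None:
    pool.release(actual["dev"])             # step 6: back to the pool


macs_set: dict[str, str] = {}


def fake_set_mac(dev: str, mac: str) -> None:
    # Records the request instead of touching hardware (hypothetical stub).
    macs_set[dev] = mac


pool = HardwareNetworkPool(devs=["eth10", "eth11"])
pci_of = {"eth10": "0000:06:10.0", "eth11": "0000:06:10.2"}
print(network_xml("sriov-pool", pool.devs))

a1, h1 = start_guest(pool, "guest1", "52:54:00:00:00:01", pci_of, fake_set_mac)
a2, h2 = start_guest(pool, "guest2", "52:54:00:00:00:02", pci_of, fake_set_mac)
destroy_guest(pool, a1)                     # eth10 is returned to the pool
a3, h3 = start_guest(pool, "guest3", "52:54:00:00:00:03", pci_of, fake_set_mac)
print(a1, a2, a3)  # {'dev': 'eth10'} {'dev': 'eth11'} {'dev': 'eth10'}
print(h2)
```

The `try`/`except` in `start_guest` is one design choice among several: if setting the MAC or building the `<hostdev>` fails, the device goes back to the pool instead of staying marked as in use by a guest that never started.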

One problem this doesn't solve is that when a guest is migrated, the PCI info for the allocated ethernet device on the destination host will almost surely be different. Is there any provision for dealing with this in the device passthrough code? If not, then migration will still not be possible.

Although I realize that many people are predisposed to not like the idea of PCI passthrough of ethernet devices (including me), it seems that it's going to be used, so we may as well provide the management tools to do it in a sane manner.
