[libvirt] [RFC] handling hostdev save/load net config for non SR-IOV devices

Alex Williamson alex.williamson at redhat.com
Thu Jul 18 22:06:18 UTC 2019


On Thu, 18 Jul 2019 17:08:23 -0400
Laine Stump <laine at laine.org> wrote:

> On 7/18/19 2:58 PM, Daniel Henrique Barboza wrote:
> >
> > On 7/18/19 2:18 PM, Laine Stump wrote:
> >  
> >> But to back up a bit - what is it about managed='yes' that makes you 
> >> want to do it that way instead of managed='no'? Do you really ever 
> >> need the devices to be bound to the host driver? Or are you just 
> >> using managed='yes' because there's not a standard/convenient place 
> >> to configure devices to be permanently bound to vfio-pci immediately 
> >> when the host boots?

driverctl can do this fwiw.
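For example (just a sketch; the PCI address is a placeholder, adjust for
the real device):

  # persistently override the driver, effective now and across reboots
  driverctl set-override 0000:01:00.0 vfio-pci

  # inspect or revert the override later
  driverctl list-overrides
  driverctl unset-override 0000:01:00.0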

> >
> > It's a case of user convenience for devices that have mixed usage, at
> > least in my opinion.
> >
> > For example, I can say from personal experience with devices that will
> > never be used directly by the host, such as NVIDIA GPUs that are used
> > only as hostdevs of guests, that this code I'm developing is pointless.
> > In that setup the GPUs are bound to vfio-pci right after the host boots,
> > using a /etc/rc.d script (or something equivalent). Not sure if this is
> > the standard way of binding a device to vfio-pci, but it works for that
> > environment.
> 
> 
> Yeah, the problem is that there really isn't a standard "this is *the 
> one correct way* to configure this" place for this config, so everybody 
> just does what works for them, making it difficult to provide a 
> "recommended solution" in the libvirt documentation that you (i.e. "I" 
> :-)) have a high level of confidence in.

I think driverctl is trying to be that solution, but as soon as you say,
for example, "NVIDIA GPUs only work this way", there are immediately a
dozen users explaining fifteen different ways that they bind their
NVIDIA GPU to vfio-pci only while the VM is running and return it to
the host driver when it stops.  There are cases where users try to
fight the kernel to get devices bound to vfio-pci exclusively, before
anything else touches them, and never changed; cases where we can let
the kernel do its thing and grab the device for vfio-pci later; and
cases where we bounce devices between drivers depending on the current
use case.
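The "bounce" case mostly boils down to the kernel's driver_override
mechanism (which is also what driverctl uses underneath); roughly, and
again with a made-up address:

  # steer the device to vfio-pci for the next probe
  echo vfio-pci > /sys/bus/pci/devices/0000:01:00.0/driver_override

  # release it from its current driver and reprobe
  echo 0000:01:00.0 > /sys/bus/pci/devices/0000:01:00.0/driver/unbind
  echo 0000:01:00.0 > /sys/bus/pci/drivers_probe

  # clear the override and reprobe to hand it back to the host driver
  echo > /sys/bus/pci/devices/0000:01:00.0/driver_override
  echo 0000:01:00.0 > /sys/bus/pci/devices/0000:01:00.0/driver/unbind
  echo 0000:01:00.0 > /sys/bus/pci/drivers_probe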


> > The problem is with devices that the user expects to use both in guests
> > and in the host. In that case, the user needs either to handle the
> > nodedev detach/reattach manually or to use managed=true and let Libvirt
> > re-attach the device every time the guest is destroyed.  Even if the
> > device is going to be used in the same or another guest right after
> > (meaning that we re-attached the device only to detach it again
> > immediately), using managed=true is convenient because the user doesn't
> > need to think about the state of the device.
> 
> 
> Yeah, I agree that there are uses for managed='yes' and it's a good 
> thing to have. It's just that I think most of the time it's being used 
> when it isn't needed (and thus shouldn't be used).

We can't guess what the user is trying to accomplish; managed=true is
the more user-friendly default.  Unfortunately, NetworkManager trying
to DHCP any new NIC that appears is also a more user-friendly default,
and the combination of the two is less than desirable.
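One mitigation (assuming I have the config syntax right; the MAC below
is a placeholder) is to tell NetworkManager to leave the VM-only NICs
alone:

  # /etc/NetworkManager/conf.d/99-vfio-unmanaged.conf
  [keyfile]
  unmanaged-devices=mac:52:54:00:12:34:56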

> >> I think we should spend more time making it easier to have devices 
> >> "pre-binded" to vfio-pci at boot time, so that we could discourage 
> >> use of managed='yes'. (not "instead of" what you're doing, but "in 
> >> addition to" it).

driverctl, and soon hopefully mdevctl.

> > I think the use of managed=true can be called a 'bad user habit' in
> > that sense. I can think of some ways to alleviate it:
> >
> > - a user setting in a conf file that changes how managed=true works.
> > Instead of detaching/re-attaching the device, Libvirt would only
> > detach the device, leaving it bound to vfio-pci even after guest
> > destroy
> 
> >
> > - same idea, but with yet another XML attribute, "re-attach=false",
> > in the hostdev definition. I like this idea better because you can
> > set customized behavior for each device/guest instead of changing the
> > managed mechanics for everyone
> 
> 
> I posted a patch to support that (with a new managed mode called
> "detach", which would automatically bind the device to vfio-pci at
> guest startup, and leave it bound to vfio-pci when the guest released
> it) a couple of years ago, and it was rejected upstream (after a lot
> of discussion):
> 
> 
> https://www.redhat.com/archives/libvir-list/2016-March/msg00593.html
> 
> 
> I believe the main reason was that it was "giving the consumer yet 
> another option that they probably wouldn't understand, and would make 
> the wrong choice on", or something like that...
> 
> 
> I still like the idea, but it wasn't worth spending more time on the debate.

If we have driverctl to bind a device to a user-preferred driver at
startup, doesn't managed=true become a no-op? (or is that what this
patch was trying to do?)
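I.e., on top of a driverctl override the hostdev could simply stay
managed='no' (a sketch; the address is an example) and libvirt never
has to touch the binding:

  <hostdev mode='subsystem' type='pci' managed='no'>
    <source>
      <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </source>
  </hostdev>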

> > - for boot time (daemon start time), one way I can think of is an XML
> > file with the hostdev devices that are supposed to be pre-bound to
> > vfio-pci by libvirtd. Then the user can run the guests with
> > managed=false for those devices, knowing that they are already taken
> > care of. I don't know how this idea interacts with the new
> > "remember_owner" feature that Michal implemented recently, though ...
> 
> 
> Back when I posted my patches, we discussed adding persistent config to 
> the nodedevice driver that would take care of binding certain devices to 
> vfio-pci. Unfortunately that misses one usage class - the case when a 
> device should *never* be bound to the host driver at all; those need to 
> be handled by something in the host boot process much earlier than libvirtd.

Why should any of this be handled by libvirt?  The problem of binding
devices to alternative drivers may lean heavily toward VM use cases,
but it's not exclusively a VM problem.  Is it sufficient to let
driverctl handle that init-time rebinding and have libvirt simply
handle what it gets, with managed=true/false deciding whether it's
allowed to put the device where it needs to be?

> > - we can make a 're-definition' of what managed=false means for PCI
> > hostdevs. Instead of failing to start the guest if the device isn't
> > bound to vfio-pci, managed=false would mean that Libvirt detaches the
> > device from the host if necessary, but does not re-attach it
> > afterwards. If such a redefinition is a massive break of the API and
> > user-expected behavior (I suspect it is ...) we can create a
> > "managed=mixed" with these mechanics
> 
> 
> Yeah, changing the meaning of managed='no' would be a non-starter. (I 
> guess your "managed='mixed'" is pretty much what my 2016 patch did, only 
> with a different name).

I'm still not sure what gap this is trying to fill.  If a user wants a
device only for VM use, set a driver override with driverctl, and then
managed=true/false is pretty much a no-op in libvirt.  It's only
recently that a relatively standardized way to do this has emerged, so
is it just a matter of educating users?  There have always been various
modprobe.d configs, initrd scripts, and kernel command-line options to
do this, and some of those are still required if we need to strong-arm
the kernel to keep default drivers from ever touching the device.  For
devices like NVIDIA GPUs, we might still recommend things like
pci-stub.ids=10de:ffffffff on the kernel command line (or blacklisting
nouveau) to prevent the host kernel from touching the device, but then
driverctl can come in and move it to vfio-pci, and users could set
managed=false if they want to avoid a pointless unbind and rebind to
the same driver (if libvirt isn't smart enough to avoid that already).
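Roughly (just a sketch; the address is a placeholder, and the
pci-stub/blacklist part only matters for devices the host must never
touch):

  # keep host drivers away from the device as early as possible, e.g.
  #   kernel command line:                 pci-stub.ids=10de:ffffffff
  #   /etc/modprobe.d/blacklist-gpu.conf:  blacklist nouveau

  # then hand the device to vfio-pci persistently
  driverctl set-override 0000:01:00.0 vfio-pci

and in the domain XML mark the <hostdev> managed='no', since the device
is already where it needs to be.  Thanks,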

Alex



