[libvirt PATCH] RFC: Add support for vDPA network devices

Jonathon Jongsma jjongsma at redhat.com
Thu Aug 27 14:17:14 UTC 2020


I got some feedback from John Ferlan on a different forum about missing
handling of migration and qemu capabilities. I'm adding this to my
patch, but I'd appreciate any additional feedback, particularly on the
xml format and the managed=yes/node-device questions below.


On Mon, 2020-08-24 at 16:33 -0500, Jonathon Jongsma wrote:
> On Thu, 2020-08-20 at 18:56 -0400, Laine Stump wrote:
> > On 8/18/20 2:37 PM, Jonathon Jongsma wrote:
> > > vDPA network devices allow high-performance networking in a
> > > virtual
> > > machine by providing a wire-speed data path. These devices
> > > require
> > > a
> > > vendor-specific host driver but the data path follows the virtio
> > > specification.
> > > 
> > > The support for vDPA devices was recently added to qemu. This
> > > allows
> > > libvirt to support these devices. It requires that the device is
> > > configured on the host with the appropriate vendor-specific
> > > driver.
> > > This will create a chardev on the host at e.g. /dev/vhost-vdpa-0.
> > > That
> > > chardev path can then be used to define a new interface with
> > > type='vdpa'.
> > > ---
> 
> [snip]
> 
> 
> > 
> > > +
> > > +::
> > > +
> > > +   ...
> > > +   <devices>
> > > +     <interface type='vdpa'>
> > > +       <source dev='/dev/vhost-vdpa-0'/>
> > 
> > (The above device is created (I just learned this from you in IRC!)
> > by 
> > unbinding a VF from its NIC driver on the host, and re-binding it
> > to
> > a 
> > special VDPA-VF driver.)
> > 
> > 
> > As we were just discussing online, on one hand it could be nice if 
> > libvirt could automatically handle rebinding the VF to the vdpa
> > host 
> > driver (given the PCI address of the VF), to make it easier to use 
> > (because no advance setup would be needed), similar to what's
> > already 
> > done with hostdev devices (and <interface type='hostdev'>) when 
> > managed='yes' (which is the default setting).
> > 
> > 
> > On the other hand, it is exactly that managed='yes' functionality
> > that 
> > has created more "libvirt-but-not-really-libvirt" bug reports than
> > any 
> > other aspect of vfio device assignment, because the process of
> > unbinding 
> > and rebinding drivers is timing-sensitive and causes code that's
> > usually 
> > run only once at host boot-time to be run hundreds of times thus
> > making 
> > it more likely to expose infrequently-hit bugs.
> > 
> > 
> > I just bring this up in advance of someone suggesting the addition
> > of 
> > managed='yes' here to put in my vote for *not* doing it, and
> > instead 
> > using that same effort to provide some sort of API in the node-
> > device 
> > driver for easily creating one or more VDPA devices from VFs,
> > which 
> > could be done once at host boot time, and thus avoid the level of 
> > "libvirt-not-libvirt" bug reports for VDPA. (and after that maybe
> > even 
> > an API to allocate a device from that pool to be used by a guest).
> > But 
> > that's for later.
> 
> I'd really love to hear some additional opinions on this topic from
> some more of the libvirt "old-timers". I intentionally kept the
> initial
> vdpa support simple by requiring that the device is already setup and
> bound to the appropriate driver. But I want to make sure that we can
> add additional capabilities later (as Laine suggested) with this same
> API/schema.
> 
> Anybody else have thoughts on this topic?
> 
> 
> > 
> > > +     </interface>
> > > +   </devices>
> > > +   ...
> > > +
> > >   :anchor:`<a id="elementsTeaming"/>`
> > >   
> > >   Teaming a virtio/hostdev NIC pair
> > > diff --git a/docs/schemas/domaincommon.rng
> > > b/docs/schemas/domaincommon.rng
> > > index 0d0dcbc5ce..17f74490f4 100644
> > > --- a/docs/schemas/domaincommon.rng
> > > +++ b/docs/schemas/domaincommon.rng
> > > @@ -3108,6 +3108,21 @@
> > >               <ref name="interface-options"/>
> > >             </interleave>
> > >           </group>
> > > +
> > > +        <group>
> > > +          <attribute name="type">
> > > +            <value>vdpa</value>
> > > +          </attribute>
> > > +          <interleave>
> > > +            <element name="source">
> > > +              <attribute name="dev">
> > > +                <ref name="deviceName"/>
> > > +              </attribute>
> > > +            </element>
> > > +            <ref name="interface-options"/>
> > > +          </interleave>
> > > +        </group>
> > > +
> > >         </choice>
> > >         <optional>
> > >           <attribute name="trustGuestRxFilters">
> > > 
> 
> [snip]
> 
> > > diff --git a/tests/qemuxml2argvdata/net-vdpa.x86_64-latest.args
> > > b/tests/qemuxml2argvdata/net-vdpa.x86_64-latest.args
> > > new file mode 100644
> > > index 0000000000..8e76ac7794
> > > --- /dev/null
> > > +++ b/tests/qemuxml2argvdata/net-vdpa.x86_64-latest.args
> > > @@ -0,0 +1,37 @@
> > > +LC_ALL=C \
> > > +PATH=/bin \
> > > +HOME=/tmp/lib/domain--1-QEMUGuest1 \
> > > +USER=test \
> > > +LOGNAME=test \
> > > +XDG_DATA_HOME=/tmp/lib/domain--1-QEMUGuest1/.local/share \
> > > +XDG_CACHE_HOME=/tmp/lib/domain--1-QEMUGuest1/.cache \
> > > +XDG_CONFIG_HOME=/tmp/lib/domain--1-QEMUGuest1/.config \
> > > +QEMU_AUDIO_DRV=none \
> > > +/usr/bin/qemu-system-i386 \
> > > +-name guest=QEMUGuest1,debug-threads=on \
> > > +-S \
> > > +-object secret,id=masterKey0,format=raw,\
> > > +file=/tmp/lib/domain--1-QEMUGuest1/master-key.aes \
> > > +-machine pc,accel=tcg,usb=off,dump-guest-core=off \
> > > +-cpu qemu64 \
> > > +-m 214 \
> > > +-overcommit mem-lock=off \
> > > +-smp 1,sockets=1,cores=1,threads=1 \
> > > +-uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \
> > > +-display none \
> > > +-no-user-config \
> > > +-nodefaults \
> > > +-chardev socket,id=charmonitor,fd=1729,server,nowait \
> > > +-mon chardev=charmonitor,id=monitor,mode=control \
> > > +-rtc base=utc \
> > > +-no-shutdown \
> > > +-no-acpi \
> > > +-boot strict=on \
> > > +-device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
> > > +-add-fd set=0,fd=1732 \
> > > +-netdev vhost-vdpa,vhostdev=/dev/fdset/0,id=hostnet0 \
> > 
> > Okay, I'm feeling too lazy to parse through the code above an see
> > how 
> > you arrived at "vhostdev='/dev/fdset/0'", but that doesn't look
> > right. 
> > Shouldn't you be ending up with "-netdev vhost-vdpa,fd=NN,..."?
> > The 
> > document I have shows that syntax is supported, so there shouldn't
> > be 
> > any need to do the add-fd stuff in this case.
> 
> The initial proposal for vdpa in qemu supported both vhostdev= and
> fd=
> parameters, but the final implementation does not actually support an
> fd= parameter here. The recommended way to pass an pre-opened fd to
> qemu in this scenario is to use the 'add-fd' and /dev/fdset/ method
> as
> shown above.
> 
> As far as I understand from reading qemu code, /dev/fdset/N is not
> actually a file that exists in the filesystem. Instead, when you call
> 'add-fd', qemu adds that fd to an internal mapping that maps from the
> fdset specified by set=N to the passed fd. Whenever qemu tries to
> open
> a file, qemu_open() has special handling for filenames that start
> with
> "/dev/fdset/": it looks up the fd associated with that fdset id and
> returns that instead of attempting to open the file path. 
> 
> So I think the above should be correct.
> 
> > 
> > I think the next step should be to find some hardware and give this
> > a 
> > smoke test! :-)
> 
> Indeed, I'm working with Cindy Lu (who implemented the qemu vdpa
> support) to try to get this tested properly. Unfortunately I've been
> unable to get vdpa_sim working and I don't have access to hardware at
> the moment.
> 
> Thanks for the review!
> 
> Jonathon
> 




More information about the libvir-list mailing list