[libvirt] [PATCH 1/2] vfio/mdev: add version field as mandatory attribute for mdev device

Alex Williamson alex.williamson at redhat.com
Wed Apr 24 14:14:50 UTC 2019


On Tue, 23 Apr 2019 23:39:34 -0400
Yan Zhao <yan.y.zhao at intel.com> wrote:

> On Tue, Apr 23, 2019 at 11:02:56PM +0800, Alex Williamson wrote:
> > On Tue, 23 Apr 2019 01:41:57 -0400
> > Yan Zhao <yan.y.zhao at intel.com> wrote:
> >   
> > > On Tue, Apr 23, 2019 at 09:21:00AM +0800, Alex Williamson wrote:  
> > > > On Mon, 22 Apr 2019 21:01:52 -0400
> > > > Yan Zhao <yan.y.zhao at intel.com> wrote:
> > > >  
> > > > > On Mon, Apr 22, 2019 at 10:39:50PM +0800, Alex Williamson wrote:  
> > > > > > On Fri, 19 Apr 2019 04:35:04 -0400
> > > > > > Yan Zhao <yan.y.zhao at intel.com> wrote:
> > > > > >  
> > > > > > > device version attribute in mdev sysfs is used by user space software
> > > > > > > (e.g. libvirt) to query device compatibility for live migration of VFIO
> > > > > > > mdev devices. This attribute is mandatory if a mdev device supports live
> > > > > > > migration.  
> > > > > >
> > > > > > The Subject: doesn't quite match what's being proposed here.
> > > > > >  
> > > > > > > It consists of two parts: common part and vendor proprietary part.
> > > > > > > common part: 32 bit. lower 16 bits is vendor id and higher 16 bits
> > > > > > >              identifies device type. e.g., for pci device, it is
> > > > > > >              "pci vendor id" | (VFIO_DEVICE_FLAGS_PCI << 16).  
> > > > > >
> > > > > > What purpose does this serve?  If it's intended as some sort of
> > > > > > namespace feature, shouldn't we first assume that we can only support
> > > > > > migration to devices of the same type?  Therefore each type would
> > > > > > already have its own namespace.  Also that would make the trailing bit
> > > > > > of the version string listed below in the example redundant.  A vendor
> > > > > > is still welcome to include this in their version string if they wish,
> > > > > > but I think the string should be entirely vendor defined.
> > > > > >  
> > > > > hi Alex,
> > > > > This common part is a kind of namespace.
> > > > > Because if version string is entirely defined by vendors, I'm worried about
> > > > > if there is a case that one vendor's version string happens to deceive and
> > > > > interfere with another vendor's version checking?
> > > > > e.g.
> > > > > vendor A has a version string like: vendor id + device id + mdev type
> > > > > vendor B has a version string like: device id + vendor id + mdev type
> > > > > but vendor A's vendor id is 0x8086, device id is 0x1217
> > > > > vendor B's vendor id is 0x1217, device id is 0x8086.
> > > > >
> > > > > In this corner case, the two vendors may regard the two device is
> > > > > migratable but actually they are not.
> > > > >
> > > > > That's the reason for this common part that serve as a kind of namespace
> > > > > that all vendors will comply with to avoid overlap.  
> > > >
> > > > If we assume that migration can only occur between matching mdev types,
> > > > this is redundant, each type already has their own namespace.
> > > >  
> > > hi Alex,
> > > do you mean user space software like libvirt needs to first check whether
> > > mdev type is matching and then check whether version is matching?  
> > 
> > Yes.
> >  
> may I know the drawback of defining the this common part?
> 	common part: 32 bit. lower 16 bits is vendor id and higher 16 bits
>              identifies device type. e.g., for pci device, it is
>              "pci vendor id" | (VFIO_DEVICE_FLAGS_PCI << 16).
> By doing so, user space software has no need to first check whether mdev type
> is matching.  A confident vendor driver can even allow devices migrating
> between different mdev types.


It's not practical to expect userspace to test the version of every
mdev type in the system to determine a match, let alone across a data
center.  Additionally, in order to be migration compatible the mdev
types must be software compatible to the VM, which is the basic
definition of the differences between mdev types.  Therefore let me
flip the question around, why would a vendor driver choose to create a
new mdev type for software compatible devices?  If the vendor wants to
maintain compatibility, indicate basic compatibility using the same
mdev type.  The original intention of the version attribute is to be
entirely opaque to userspace, introducing "common" parts is unnecessary
as above, degrades the original concept, and only defines a solution for
PCI devices. Thanks,

Alex

> > > if user space software only checks version for migration, it means vendor
> > > driver has to include mdev type in their vendor proprietary part string,
> > > right?  
> > 
> > Userspace attempting to migrate an nvidia-64 to an i915-GVT_V5_4 would
> > be a failure on the part of the user.
> >   
> > > Another thing is that could there be any future mdev parent driver that
> > > applies to all mdev devices, just like vfio-pci? like Yi's vfio-pci-mdev
> > > driver (https://lkml.org/lkml/2019/3/13/114)?  
> > 
> > For starters, this is just a sample driver from which vendor specific
> > mdev drivers could be forked to support these features, but
> > additionally, the type is defined by the vendor driver, so even a meta
> > driver like vfio-pci-mdev could create types like
> > "vfio-pci-mdev-8086_10c9_abcdef" if it wanted to provide that specific
> > device.  The "vfio-pci-mdev-type1" is just a sample implementation to
> > say "the native type of the thing bound to me" and it's going to have
> > limited usefulness for any sort of persistence to userspace.  Thus,
> > it's a sample driver.  Thanks,  
> Thanks for this explanation:)
> 
> 
> >   
> > > > > > > vendor proprietary part: this part is varied in length. vendor driver can
> > > > > > >              specify any string to identify a device.
> > > > > > >
> > > > > > > When reading this attribute, it should show device version string of the
> > > > > > > device of type <type-id>. If a device does not support live migration, it
> > > > > > > should return errno.
> > > > > > > When writing a string to this attribute, it returns errno for
> > > > > > > incompatibility or returns written string length in compatibility case.
> > > > > > > If a device does not support live migration, it always returns errno.
> > > > > > >
> > > > > > > For user space software to use:
> > > > > > > 1.
> > > > > > > Before starting live migration, user space software first reads source side
> > > > > > > mdev device's version. e.g.
> > > > > > > "#cat \
> > > > > > > /sys/bus/pci/devices/0000\:00\:02.0/5ac1fb20-2bbf-4842-bb7e-36c58c3be9cd/mdev_type/version"
> > > > > > > 00028086-193b-i915-GVTg_V5_4
> > > > > > >
> > > > > > > 2.
> > > > > > > Then, user space software writes the source side returned version string
> > > > > > > to device version attribute in target side, and checks the return value.
> > > > > > > If a negative errno is returned in the target side, then mdev devices in
> > > > > > > source and target sides are not compatible;
> > > > > > > If a positive number is returned and it equals to the length of written
> > > > > > > string, then the two mdev devices in source and target side are compatible.
> > > > > > > e.g.
> > > > > > > (a) compatibility case
> > > > > > > "# echo 00028086-193b-i915-GVTg_V5_4 >
> > > > > > > /sys/bus/pci/devices/0000\:00\:02.0/882cc4da-dede-11e7-9180-078a62063ab1/mdev_type/version"
> > > > > > >
> > > > > > > (b) incompatibility case
> > > > > > > "#echo 00028086-193b-i915-GVTg_V5_1 >
> > > > > > > /sys/bus/pci/devices/0000\:00\:02.0/882cc4da-dede-11e7-9180-078a62063ab1/mdev_type/version"
> > > > > > > -bash: echo: write error: Invalid argument
> > > > > > >
> > > > > > > 3. if two mdev devices are compatible, user space software can start
> > > > > > > live migration, and vice versa.
> > > > > > >
> > > > > > > Note: if a mdev device does not support live migration, it either does
> > > > > > > not provide a version attribute, or always returns errno when its version
> > > > > > > attribute is read/written.  
> > > > > >
> > > > > > I think it would be cleaner to do the former, not supply the
> > > > > > attribute.  This seems to do the latter in the sample drivers.  Thanks,  
> > > > > Ok. you are right!
> > > > > what about just keep one sample driver to show how to do the latter,
> > > > > and let the others do the former?  
> > > >
> > > > I'd rather that if a vendor driver doesn't support features requiring
> > > > the version attribute that they don't implement it.  It's confusing to
> > > > developers looking at the sample driver for guidance if we have
> > > > different implementations.  Of course if you'd like to add migration
> > > > support to one of the sample drivers, that'd be very welcome.  Thanks,
> > > >  
> > > Got it:)
> > >
> > > Thanks!
> > > Yan
> > >  
> > > >  
> > > > > > > Cc: Alex Williamson <alex.williamson at redhat.com>
> > > > > > > Cc: Erik Skultety <eskultet at redhat.com>
> > > > > > > Cc: "Dr. David Alan Gilbert" <dgilbert at redhat.com>
> > > > > > > Cc: Cornelia Huck <cohuck at redhat.com>
> > > > > > > Cc: "Tian, Kevin" <kevin.tian at intel.com>
> > > > > > > Cc: Zhenyu Wang <zhenyuw at linux.intel.com>
> > > > > > > Cc: "Wang, Zhi A" <zhi.a.wang at intel.com>
> > > > > > > Cc: Neo Jia <cjia at nvidia.com>
> > > > > > > Cc: Kirti Wankhede <kwankhede at nvidia.com>
> > > > > > >
> > > > > > > Signed-off-by: Yan Zhao <yan.y.zhao at intel.com>
> > > > > > > ---
> > > > > > >  Documentation/vfio-mediated-device.txt | 36 ++++++++++++++++++++++++++
> > > > > > >  samples/vfio-mdev/mbochs.c             | 17 ++++++++++++
> > > > > > >  samples/vfio-mdev/mdpy.c               | 16 ++++++++++++
> > > > > > >  samples/vfio-mdev/mtty.c               | 16 ++++++++++++
> > > > > > >  4 files changed, 85 insertions(+)
> > > > > > >
> > > > > > > diff --git a/Documentation/vfio-mediated-device.txt b/Documentation/vfio-mediated-device.txt
> > > > > > > index c3f69bcaf96e..bc28471c0667 100644
> > > > > > > --- a/Documentation/vfio-mediated-device.txt
> > > > > > > +++ b/Documentation/vfio-mediated-device.txt
> > > > > > > @@ -202,6 +202,7 @@ Directories and files under the sysfs for Each Physical Device
> > > > > > >    |     |   |--- available_instances
> > > > > > >    |     |   |--- device_api
> > > > > > >    |     |   |--- description
> > > > > > > +  |     |   |--- version
> > > > > > >    |     |   |--- [devices]
> > > > > > >    |     |--- [<type-id>]
> > > > > > >    |     |   |--- create
> > > > > > > @@ -209,6 +210,7 @@ Directories and files under the sysfs for Each Physical Device
> > > > > > >    |     |   |--- available_instances
> > > > > > >    |     |   |--- device_api
> > > > > > >    |     |   |--- description
> > > > > > > +  |     |   |--- version
> > > > > > >    |     |   |--- [devices]
> > > > > > >    |     |--- [<type-id>]
> > > > > > >    |          |--- create
> > > > > > > @@ -216,6 +218,7 @@ Directories and files under the sysfs for Each Physical Device
> > > > > > >    |          |--- available_instances
> > > > > > >    |          |--- device_api
> > > > > > >    |          |--- description
> > > > > > > +  |          |--- version
> > > > > > >    |          |--- [devices]
> > > > > > >
> > > > > > >  * [mdev_supported_types]
> > > > > > > @@ -225,6 +228,8 @@ Directories and files under the sysfs for Each Physical Device
> > > > > > >    [<type-id>], device_api, and available_instances are mandatory attributes
> > > > > > >    that should be provided by vendor driver.
> > > > > > >
> > > > > > > +  version is a mandatory attribute if a mdev device supports live migration.
> > > > > > > +
> > > > > > >  * [<type-id>]
> > > > > > >
> > > > > > >    The [<type-id>] name is created by adding the device driver string as a prefix
> > > > > > > @@ -246,6 +251,35 @@ Directories and files under the sysfs for Each Physical Device
> > > > > > >    This attribute should show the number of devices of type <type-id> that can be
> > > > > > >    created.
> > > > > > >
> > > > > > > +* version
> > > > > > > +
> > > > > > > +  This attribute is rw. It is used to check whether two devices are compatible
> > > > > > > +  for live migration. If this attribute is missing, then the corresponding mdev
> > > > > > > +  device is regarded as not supporting live migration.
> > > > > > > +
> > > > > > > +  It consists of two parts: common part and vendor proprietary part.
> > > > > > > +  common part: 32 bit. lower 16 bits is vendor id and higher 16 bits identifies
> > > > > > > +               device type. e.g., for pci device, it is
> > > > > > > +               "pci vendor id" | (VFIO_DEVICE_FLAGS_PCI << 16).
> > > > > > > +  vendor proprietary part: this part is varied in length. vendor driver can
> > > > > > > +               specify any string to identify a device.
> > > > > > > +
> > > > > > > +  When reading this attribute, it should show device version string of the device
> > > > > > > +  of type <type-id>. If a device does not support live migration, it should
> > > > > > > +  return errno.
> > > > > > > +  When writing a string to this attribute, it returns errno for incompatibility
> > > > > > > +  or returns written string length in compatibility case. If a device does not
> > > > > > > +  support live migration, it always returns errno.
> > > > > > > +
> > > > > > > +  for example.
> > > > > > > +  # cat \
> > > > > > > + /sys/bus/pci/devices/0000\:00\:02.0/mdev_supported_types/i915-GVTg_V5_2/version
> > > > > > > +  00028086-193b-i915-GVTg_V5_2
> > > > > > > +
> > > > > > > +  #echo 00028086-193b-i915-GVTg_V5_2 > \
> > > > > > > + /sys/bus/pci/devices/0000\:00\:02.0/mdev_supported_types/i915-GVTg_V5_4/version
> > > > > > > + -bash: echo: write error: Invalid argument
> > > > > > > +
> > > > > > >  * [device]
> > > > > > >
> > > > > > >    This directory contains links to the devices of type <type-id> that have been
> > > > > > > @@ -327,12 +361,14 @@ card.
> > > > > > >          |   |   |-- available_instances
> > > > > > >          |   |   |-- create
> > > > > > >          |   |   |-- device_api
> > > > > > > +        |   |   |-- version
> > > > > > >          |   |   |-- devices
> > > > > > >          |   |   `-- name
> > > > > > >          |   `-- mtty-2
> > > > > > >          |       |-- available_instances
> > > > > > >          |       |-- create
> > > > > > >          |       |-- device_api
> > > > > > > +        |       |-- version
> > > > > > >          |       |-- devices
> > > > > > >          |       `-- name
> > > > > > >          |-- mtty_dev
> > > > > > > diff --git a/samples/vfio-mdev/mbochs.c b/samples/vfio-mdev/mbochs.c
> > > > > > > index b038aa9f5a70..2f5ba96b91a2 100644
> > > > > > > --- a/samples/vfio-mdev/mbochs.c
> > > > > > > +++ b/samples/vfio-mdev/mbochs.c
> > > > > > > @@ -1391,11 +1391,28 @@ static ssize_t device_api_show(struct kobject *kobj, struct device *dev,
> > > > > > >  }
> > > > > > >  MDEV_TYPE_ATTR_RO(device_api);
> > > > > > >
> > > > > > > +static ssize_t version_show(struct kobject *kobj, struct device *dev,
> > > > > > > +             char *buf)
> > > > > > > +{
> > > > > > > +     /* do not support live migration */
> > > > > > > +     return -EINVAL;
> > > > > > > +}
> > > > > > > +
> > > > > > > +static ssize_t version_store(struct kobject *kobj, struct device *dev,
> > > > > > > +             const char *buf, size_t count)
> > > > > > > +{
> > > > > > > +     /* do not support live migration */
> > > > > > > +     return -EINVAL;
> > > > > > > +}
> > > > > > > +
> > > > > > > +static MDEV_TYPE_ATTR_RW(version);
> > > > > > > +
> > > > > > >  static struct attribute *mdev_types_attrs[] = {
> > > > > > >       &mdev_type_attr_name.attr,
> > > > > > >       &mdev_type_attr_description.attr,
> > > > > > >       &mdev_type_attr_device_api.attr,
> > > > > > >       &mdev_type_attr_available_instances.attr,
> > > > > > > +     &mdev_type_attr_version.attr,
> > > > > > >       NULL,
> > > > > > >  };
> > > > > > >
> > > > > > > diff --git a/samples/vfio-mdev/mdpy.c b/samples/vfio-mdev/mdpy.c
> > > > > > > index cc86bf6566e4..ff15fdfc7d46 100644
> > > > > > > --- a/samples/vfio-mdev/mdpy.c
> > > > > > > +++ b/samples/vfio-mdev/mdpy.c
> > > > > > > @@ -695,11 +695,27 @@ static ssize_t device_api_show(struct kobject *kobj, struct device *dev,
> > > > > > >  }
> > > > > > >  MDEV_TYPE_ATTR_RO(device_api);
> > > > > > >
> > > > > > > +static ssize_t version_show(struct kobject *kobj, struct device *dev,
> > > > > > > +             char *buf)
> > > > > > > +{
> > > > > > > +     /* do not support live migration */
> > > > > > > +     return -EINVAL;
> > > > > > > +}
> > > > > > > +
> > > > > > > +static ssize_t version_store(struct kobject *kobj, struct device *dev,
> > > > > > > +             const char *buf, size_t count)
> > > > > > > +{
> > > > > > > +     /* do not support live migration */
> > > > > > > +     return -EINVAL;
> > > > > > > +}
> > > > > > > +static MDEV_TYPE_ATTR_RW(version);
> > > > > > > +
> > > > > > >  static struct attribute *mdev_types_attrs[] = {
> > > > > > >       &mdev_type_attr_name.attr,
> > > > > > >       &mdev_type_attr_description.attr,
> > > > > > >       &mdev_type_attr_device_api.attr,
> > > > > > >       &mdev_type_attr_available_instances.attr,
> > > > > > > +     &mdev_type_attr_version.attr,
> > > > > > >       NULL,
> > > > > > >  };
> > > > > > >
> > > > > > > diff --git a/samples/vfio-mdev/mtty.c b/samples/vfio-mdev/mtty.c
> > > > > > > index 1c77c370c92f..4ae3aad3474d 100644
> > > > > > > --- a/samples/vfio-mdev/mtty.c
> > > > > > > +++ b/samples/vfio-mdev/mtty.c
> > > > > > > @@ -1390,10 +1390,26 @@ static ssize_t device_api_show(struct kobject *kobj, struct device *dev,
> > > > > > >
> > > > > > >  MDEV_TYPE_ATTR_RO(device_api);
> > > > > > >
> > > > > > > +static ssize_t version_show(struct kobject *kobj, struct device *dev,
> > > > > > > +             char *buf)
> > > > > > > +{
> > > > > > > +     /* do not support live migration */
> > > > > > > +     return -EINVAL;
> > > > > > > +}
> > > > > > > +
> > > > > > > +static ssize_t version_store(struct kobject *kobj, struct device *dev,
> > > > > > > +             const char *buf, size_t count)
> > > > > > > +{
> > > > > > > +     /* do not support live migration */
> > > > > > > +     return -EINVAL;
> > > > > > > +}
> > > > > > > +
> > > > > > > +static MDEV_TYPE_ATTR_RW(version);
> > > > > > >  static struct attribute *mdev_types_attrs[] = {
> > > > > > >       &mdev_type_attr_name.attr,
> > > > > > >       &mdev_type_attr_device_api.attr,
> > > > > > >       &mdev_type_attr_available_instances.attr,
> > > > > > > +     &mdev_type_attr_version.attr,
> > > > > > >       NULL,
> > > > > > >  };
> > > > > > >  
> > > > > >
> > > > > > _______________________________________________
> > > > > > intel-gvt-dev mailing list
> > > > > > intel-gvt-dev at lists.freedesktop.org
> > > > > > https://lists.freedesktop.org/mailman/listinfo/intel-gvt-dev  
> > > >
> > > > _______________________________________________
> > > > intel-gvt-dev mailing list
> > > > intel-gvt-dev at lists.freedesktop.org
> > > > https://lists.freedesktop.org/mailman/listinfo/intel-gvt-dev  
> > 
> > _______________________________________________
> > intel-gvt-dev mailing list
> > intel-gvt-dev at lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/intel-gvt-dev  




More information about the libvir-list mailing list