[libvirt] [Qemu-devel] [PATCH 1/2] vfio/mdev: add version field as mandatory attribute for mdev device

Christophe de Dinechin dinechin at redhat.com
Wed Apr 24 09:10:43 UTC 2019



> On 23 Apr 2019, at 12:39, Daniel P. Berrangé <berrange at redhat.com> wrote:
> 
> On Fri, Apr 19, 2019 at 04:35:04AM -0400, Yan Zhao wrote:
>> device version attribute in mdev sysfs is used by user space software
>> (e.g. libvirt) to query device compatibility for live migration of VFIO
>> mdev devices. This attribute is mandatory if a mdev device supports live
>> migration.
>> 
>> It consists of two parts: common part and vendor proprietary part.
>> common part: 32 bit. lower 16 bits is vendor id and higher 16 bits
>>             identifies device type. e.g., for pci device, it is
>>             "pci vendor id" | (VFIO_DEVICE_FLAGS_PCI << 16).
>> vendor proprietary part: this part is varied in length. vendor driver can
>>             specify any string to identify a device.
>> 
>> When reading this attribute, it should show device version string of the
>> device of type <type-id>. If a device does not support live migration, it
>> should return errno.
>> When writing a string to this attribute, it returns errno for
>> incompatibility or returns written string length in compatibility case.
>> If a device does not support live migration, it always returns errno.
>> 
>> For user space software to use:
>> 1.
>> Before starting live migration, user space software first reads source side
>> mdev device's version. e.g.
>> "#cat \
>> /sys/bus/pci/devices/0000\:00\:02.0/5ac1fb20-2bbf-4842-bb7e-36c58c3be9cd/mdev_type/version"
>> 00028086-193b-i915-GVTg_V5_4
>> 
>> 2.
>> Then, user space software writes the source side returned version string
>> to device version attribute in target side, and checks the return value.
>> If a negative errno is returned in the target side, then mdev devices in
>> source and target sides are not compatible;
>> If a positive number is returned and it equals to the length of written
>> string, then the two mdev devices in source and target side are compatible.
>> e.g.
>> (a) compatibility case
>> "# echo 00028086-193b-i915-GVTg_V5_4 >
>> /sys/bus/pci/devices/0000\:00\:02.0/882cc4da-dede-11e7-9180-078a62063ab1/mdev_type/version"
>> 
>> (b) incompatibility case
>> "#echo 00028086-193b-i915-GVTg_V5_1 >
>> /sys/bus/pci/devices/0000\:00\:02.0/882cc4da-dede-11e7-9180-078a62063ab1/mdev_type/version"
>> -bash: echo: write error: Invalid argument
> 
> What you have written here seems to imply that each mdev type is able to
> support many different versions at the same time. Writing a version into
> this sysfs file then chooses which of the many versions to actually use.
> 
> This is good as it allows for live migration across driver software upgrades.
> 
> A mgmt application may well want to know what versions are supported for an
> mdev type *before* starting a migration. A mgmt app can query all the 100's
> of hosts it knows and thus figure out which are valid to use as the target
> of a migration.
> 
> IOW, we want to avoid the ever hitting the incompatibility case in the
> first place, by only choosing to migrate to a host that we know is going
> to be compatible.
> 
> This would need some kind of way to report the full list of supported
> versions against the mdev supported types on the host.
> 
> 
>> 3. if two mdev devices are compatible, user space software can start
>> live migration, and vice versa.
>> 
>> Note: if a mdev device does not support live migration, it either does
>> not provide a version attribute, or always returns errno when its version
>> attribute is read/written.
>> 
>> Cc: Alex Williamson <alex.williamson at redhat.com>
>> Cc: Erik Skultety <eskultet at redhat.com>
>> Cc: "Dr. David Alan Gilbert" <dgilbert at redhat.com>
>> Cc: Cornelia Huck <cohuck at redhat.com>
>> Cc: "Tian, Kevin" <kevin.tian at intel.com>
>> Cc: Zhenyu Wang <zhenyuw at linux.intel.com>
>> Cc: "Wang, Zhi A" <zhi.a.wang at intel.com>
>> Cc: Neo Jia <cjia at nvidia.com>
>> Cc: Kirti Wankhede <kwankhede at nvidia.com>
>> 
>> Signed-off-by: Yan Zhao <yan.y.zhao at intel.com>
>> ---
>> Documentation/vfio-mediated-device.txt | 36 ++++++++++++++++++++++++++
>> samples/vfio-mdev/mbochs.c             | 17 ++++++++++++
>> samples/vfio-mdev/mdpy.c               | 16 ++++++++++++
>> samples/vfio-mdev/mtty.c               | 16 ++++++++++++
>> 4 files changed, 85 insertions(+)
>> 
>> diff --git a/Documentation/vfio-mediated-device.txt b/Documentation/vfio-mediated-device.txt
>> index c3f69bcaf96e..bc28471c0667 100644
>> --- a/Documentation/vfio-mediated-device.txt
>> +++ b/Documentation/vfio-mediated-device.txt
>> @@ -202,6 +202,7 @@ Directories and files under the sysfs for Each Physical Device
>>   |     |   |--- available_instances
>>   |     |   |--- device_api
>>   |     |   |--- description
>> +  |     |   |--- version
>>   |     |   |--- [devices]
>>   |     |--- [<type-id>]
>>   |     |   |--- create
>> @@ -209,6 +210,7 @@ Directories and files under the sysfs for Each Physical Device
>>   |     |   |--- available_instances
>>   |     |   |--- device_api
>>   |     |   |--- description
>> +  |     |   |--- version
>>   |     |   |--- [devices]
>>   |     |--- [<type-id>]
>>   |          |--- create
>> @@ -216,6 +218,7 @@ Directories and files under the sysfs for Each Physical Device
>>   |          |--- available_instances
>>   |          |--- device_api
>>   |          |--- description
>> +  |          |--- version
>>   |          |--- [devices]
>> 
>> * [mdev_supported_types]
>> @@ -225,6 +228,8 @@ Directories and files under the sysfs for Each Physical Device
>>   [<type-id>], device_api, and available_instances are mandatory attributes
>>   that should be provided by vendor driver.
>> 
>> +  version is a mandatory attribute if a mdev device supports live migration.
>> +
>> * [<type-id>]
>> 
>>   The [<type-id>] name is created by adding the device driver string as a prefix
>> @@ -246,6 +251,35 @@ Directories and files under the sysfs for Each Physical Device
>>   This attribute should show the number of devices of type <type-id> that can be
>>   created.
>> 
>> +* version
>> +
>> +  This attribute is rw. It is used to check whether two devices are compatible
>> +  for live migration. If this attribute is missing, then the corresponding mdev
>> +  device is regarded as not supporting live migration.
>> +
>> +  It consists of two parts: common part and vendor proprietary part.
>> +  common part: 32 bit. lower 16 bits is vendor id and higher 16 bits identifies
>> +               device type. e.g., for pci device, it is
>> +               "pci vendor id" | (VFIO_DEVICE_FLAGS_PCI << 16).
>> +  vendor proprietary part: this part is varied in length. vendor driver can
>> +               specify any string to identify a device.
>> +
>> +  When reading this attribute, it should show device version string of the device
>> +  of type <type-id>. If a device does not support live migration, it should
>> +  return errno.
>> +  When writing a string to this attribute, it returns errno for incompatibility
>> +  or returns written string length in compatibility case. If a device does not
>> +  support live migration, it always returns errno.
>> +
>> +  for example.
>> +  # cat \
>> + /sys/bus/pci/devices/0000\:00\:02.0/mdev_supported_types/i915-GVTg_V5_2/version
>> +  00028086-193b-i915-GVTg_V5_2
>> +
>> +  #echo 00028086-193b-i915-GVTg_V5_2 > \
>> + /sys/bus/pci/devices/0000\:00\:02.0/mdev_supported_types/i915-GVTg_V5_4/version
>> + -bash: echo: write error: Invalid argument
>> +
> 
> IIUC this path is against the physical device. IOW, the mgmt app would have
> to first write to the "version" file to choose a version, and then write to
> the "create" file to actually create an virtual device. This has the obvious
> concurrency problem if multiple devices are being created at the same time
> and distinct versions for each device are required. There would need to be
> a locking scheme defined to ensure safety.
> 
> Wouldn't it be better if we can pass the desired version when we write to
> the "create" file, so that we avoid any concurrent usage problems. "version"
> could be just a read-only file with a *list* of supported versions.
> 
> eg
> 
>  $ cat /sys/bus/pci/devices/0000\:00\:02.0/mdev_supported_types/i915-GVTg_V5_4/version
>  5.0
>  5.1
>  5.2
> 
>  $ echo "83b8f4f2-509f-382f-3c1e-e6bfe0fa1001;version=5.2" >
>      /sys/devices/virtual/mtty/mtty/mdev_supported_types/mtty-2/create
> 

I read Alex’s comment that creating an mdev with a specific version is not the intent here. However…

- If the goal is just to check compatibility with migration data, then I think the name should be more explicit, e.g. migration_version instead of version. Otherwise, I read the intent the way Daniel did.

- If we want to provide a list of versions and make it possible to create instances based on a version, I would rather use a directory structure for that, i.e. (replacing cat with ls):

$ ls /sys/bus/pci/devices/0000\:00\:02.0/mdev_supported_types/i915-GVTg_V5_4/version
 5.0
 5.1
 5.2

where each version is a directory that has its own available_instances, device_api, description, create, …

$ echo 83b8f4f2-509f-382f-3c1e-e6bfe0fa1001 > /sys/bus/pci/devices/0000\:00\:02.0/mdev_supported_types/i915-GVTg_V5_4/version/5.1/create

In addition to not having the problem of two consecutive writes to distinct files, I can imagine contrived cases where available_instances might depend on the version…


Thanks
Christophe

> 
> Regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|





More information about the libvir-list mailing list