In the previous post, we provided a high-level overview of the kernel vDPA framework solution. Starting with this post and continuing in the posts that follow, we will dive into the technical details of the architecture and use cases for the kernel vDPA framework interacting with containers and VMs.
These technical articles are aimed at readers who want to understand the logic and details behind the vDPA kernel design. If you only need the high-level solution, we recommend sticking to the previous posts.
We will start by reviewing the design and implementation of the vDPA bus, the vDPA bus driver and the vDPA device driver. These building blocks will be used to build the end-to-end solution as we progress.
A vDPA device uses a datapath that complies with the virtio specification together with a vendor specific control path. vDPA devices can be either implemented in hardware or emulated in software. As illustrated by the following figure, vDPA hardware devices are usually implemented as one of the following types:
PF (Physical Function): A single Physical Function.
VF (Virtual Function): A device that supports single root I/O virtualization (SR-IOV) exposes Virtual Functions. Each VF represents a virtualized instance of the device that can be assigned to a different partition independently.
VDEV (Virtual Device): With technologies such as Intel Scalable IOV, a virtual device is composed by the host OS from one or more Assignable Device Interfaces (ADIs). A device Physical Function may be configured to support multiple lightweight ADIs that can be assigned to different partitions as virtual devices.
SF (Sub Function): A vendor specific interface for slicing the Physical Function into multiple subfunctions that can be assigned to different partitions as virtual devices.
Note that although PCIe hardware is discussed in this proposal, the actual implementation could use non-PCIe hardware. From a driver's perspective, depending on how and where the DMA translation is done, vDPA devices are split into two types:
Platform specific DMA translation: From the driver's perspective, the device can be used on a platform where device access to data in memory is limited and/or translated by the platform IOMMU. An example is a PCIe vDPA device whose DMA requests are tagged in a bus specific (e.g., PCIe) way. DMA translation and protection are done at the PCIe bus IOMMU level.
Device specific DMA translation: The device implements DMA isolation and protection through its own logic, for example by implementing an on-chip IOMMU, as is typically the case for vDPA devices based on subfunctions or virtual functions. Note that such an on-chip IOMMU may cooperate with the platform IOMMU for a two stage DMA translation. This is usually done by using the on-chip IOMMU for level 1 DMA translation and the platform IOMMU for level 2 translation.
To hide the differences and complexity of the device types and IOMMU options described above, and to present a generic virtio/vhost device to the upper layer, a device agnostic framework is required.
Figure 1: different types of vDPA devices
vDPA framework overview
For the main objectives of the vDPA kernel framework and the vhost/virtio subsystems, refer to our previous blog post (solution overview).
By combining the vDPA framework and the vhost/virtio subsystems, kernel virtio drivers or userspace vhost drivers think they are controlling a vhost or virtio device while in practice it’s a vDPA device. In the following sections we will describe the internals of the vDPA framework including the vDPA bus and vDPA device driver. We will also touch briefly on the virtio-vDPA and vhost-vDPA bus drivers (covered in the next blog).
The vDPA framework implements a software bus where several types of devices and drivers (bus drivers, usually) can be attached. The vDPA bus abstracts common attributes of devices and allows bus drivers to connect vDPA devices with other kernel subsystems. This bus also defines operations that interface the device abstraction with the bus driver (vdpa_config_ops):
virtio specific operations - For controlling the virtio device. This includes:
Set and get device configuration space
Set and get virtqueue state
Virtio feature negotiation
Interrupt operations - Allows vDPA bus drivers to register callbacks for virtqueue and config interrupts, or provides helpers to support IRQ forwarding such as Intel posted interrupts
Doorbell operations - Allows vDPA device drivers to register the callbacks when a specific virtqueue is kicked or provide an operation method to report the doorbell location and map it to userspace directly
Migration operations - This includes two requirements:
Operation method to set and get the device or virtqueue state
Operation method to allow the bus driver to collect dirty pages modified by the hardware
DMA mapping operations - An operation for enabling the vDPA devices to set their own DMA mapping. This operation is optional and is only required for vDPA devices that need device specific DMA translations
The set of vDPA bus operations (vdpa_config_ops) is designed to be type agnostic so that it can support various kinds of vDPA bus drivers and vDPA devices. A vDPA bus driver may choose to use all or only part of these operations. For example, dirty page tracking may only be used for a VM during live migration.
Figure 2 describes the vDPA bus and vDPA device abstraction:
Figure 2: vDPA bus abstraction
With the abstraction of vDPA bus and vDPA bus operations, the difference and complexity of the underlying hardware is hidden from the upper layer. vDPA bus drivers and vDPA devices can cooperate with unified bus operations without knowing the details of each other.
vDPA device driver
A vDPA device driver is used to communicate directly with the vDPA device through a vendor specific method while presenting a vDPA device (by implementing the common vDPA bus operations) to the vDPA bus.
From the hardware point of view, the vDPA device driver is just another driver. From the vDPA bus point of view, however, the vDPA device driver is a vDPA device.
This is shown in figure 3:
Figure 3: vDPA device driver
vDPA device drivers need to accomplish the following tasks:
vDPA device probing/removing - Requires a vendor or platform specific procedure to probe/remove vDPA hardware on the hardware bus. While probing/removing a vDPA device, the driver needs to register/unregister the device to the vDPA bus accordingly
vDPA device management - The responsibility of the vDPA device driver is to cooperate with different management modules (such as PF driver) to support life cycle management for a vDPA device
Interrupt processing - vDPA device drivers need to allocate and process hardware interrupts in a platform or vendor specific way. The driver will report the interrupt used by each virtqueue to the vDPA bus. The vDPA bus layer will then propagate the interrupt to KVM for direct injection to the guest
Device abstraction - Abstract hardware as a vDPA device and implement vdpa_config_ops. vDPA device drivers will perform mediation between the vDPA bus operations and vendor/platform specific access functions provided by the hardware/platform
Perform DMA mapping - For devices that have their own DMA translation unit, the vDPA device driver needs to accept DMA mapping requests from the vDPA bus and translate those mappings into a data structure that can be used by the vendor specific DMA translation unit. The device driver must provide a DMA device to the vDPA bus that is used for performing DMA/IOMMU related operations such as IOMMU probing and populating DMA mappings.
The vDPA device driver serves as a mediator between the vDPA bus operations and vendor specific device access methods. It communicates with vDPA bus drivers through the vDPA bus operations and with the hardware through vendor specific access methods. Provisioning and management tasks are done by cooperating with the management module. From the point of view of the vDPA bus drivers there is only a common vDPA device abstraction which uses the vDPA bus operations to control different types of vDPA devices.
vDPA bus drivers
The vDPA bus drivers are used to connect the vDPA bus to the vhost and virtio subsystems sitting on top of the vDPA kernel framework. Note that this is different from the vDPA device drivers which are used to connect the vDPA bus towards the vDPA physical devices.
There are two types of vDPA bus drivers corresponding to the vhost and virtio subsystems:
vhost-vDPA bus driver - This driver connects the vDPA bus to the vhost subsystem and presents a vhost char device to userspace. This is useful for cases when the datapath is expected to bypass the kernel completely. Userspace drivers can control the vDPA device via vhost ioctls as if it were a vhost device. A typical use case is performing direct I/O to userspace (or a VM).
virtio-vDPA bus driver - This driver bridges the vDPA bus to a virtio bus and from there to a virtio interface. With the help of a virtio-vDPA bus driver, the vDPA device behaves as a virtio device so it can be used by various kernel subsystems such as networking, block, crypto etc. Applications that do not use vhost userspace APIs can keep using userspace APIs that are provided by kernel networking, block and other subsystems.
Figure 4 shows the different vDPA bus drivers and their connections:
Figure 4: vDPA bus drivers
By supporting different vDPA bus drivers in the vDPA kernel framework, vDPA gains flexibility since the use cases are not limited to direct userspace I/O via vhost ioctl. Most kernel subsystems can be used with vDPA devices and new vDPA bus drivers can be developed to support features derived from new platforms and hardware.
A sysfs based device and driver management interface is provided for userspace management tools. This is used for naming the vDPA device and selecting the driver that needs to be bound to the vDPA device.
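As a minimal sketch of how a management tool might read this interface, the helper below walks the vDPA bus directory in sysfs and prints each vDPA device together with the bus driver it is currently bound to. The function name vdpa_list_bindings and the SYSFS_ROOT override (handy for trying the logic against a mock directory tree) are our own assumptions for illustration and are not part of the kernel interface.

```shell
#!/bin/sh
# Sketch only: list every vDPA device and the bus driver bound to it.
# SYSFS_ROOT is a hypothetical override so the logic can be tried on a
# mock tree; on a real host it defaults to /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

vdpa_list_bindings() {
    for dev_dir in "$SYSFS_ROOT"/bus/vdpa/devices/*; do
        # Skip the unexpanded pattern when no vDPA device exists.
        [ -e "$dev_dir" ] || continue
        dev="$(basename "$dev_dir")"
        if [ -L "$dev_dir/driver" ]; then
            # The "driver" entry is a symlink into .../bus/vdpa/drivers/.
            drv="$(basename "$(readlink "$dev_dir/driver")")"
        else
            drv="(unbound)"
        fi
        printf '%s %s\n' "$dev" "$drv"
    done
}
```

Calling vdpa_list_bindings with no arguments prints one line per device, with the device name followed by the driver name.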
Let's take an example: in order to find the vDPA device name corresponding to the PCI device name we use the sysfs path for the PCI device. If our VF vDPA is located at PCI bus address 0000:07:00.2, we can get its vDPA device name via this command:
# ls /sys/bus/pci/devices/0000\:07\:00.2/ | grep vdpa
We can obtain the device object from /sys/bus/vdpa/devices:
# ls /sys/bus/vdpa/devices/
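The lookup above can be wrapped in a small helper. This is a sketch under the assumption, taken from the example, that the vDPA device appears as a vdpa* entry inside the PCI device's sysfs directory. The function name vdpa_name_for_pci and the SYSFS_ROOT override are hypothetical names introduced here for illustration.

```shell
#!/bin/sh
# Sketch only: print the vDPA device name(s) found under a PCI device.
# SYSFS_ROOT is a hypothetical override for testing against a mock tree.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

vdpa_name_for_pci() {
    pci_addr="$1"
    pci_dir="$SYSFS_ROOT/bus/pci/devices/$pci_addr"
    if [ ! -d "$pci_dir" ]; then
        echo "no such PCI device: $pci_addr" >&2
        return 1
    fi
    status=1
    for entry in "$pci_dir"/vdpa*; do
        # Skip the unexpanded pattern when nothing matches.
        [ -e "$entry" ] || continue
        basename "$entry"
        status=0
    done
    # Non-zero when the PCI device has no vDPA child.
    return "$status"
}
```

For the VF used in this post, the call would be vdpa_name_for_pci 0000:07:00.2. The function returns a non-zero status when the PCI device exists but has no vDPA device registered under it.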
A further question is how to switch a vDPA device from one vDPA bus driver to another.
The first step is to find which driver is bound to the vDPA device.
Returning to our previous example, we can check the driver binding for vdpa0 with the following command:
# ls -l /sys/bus/vdpa/devices/vdpa0/driver
lrwxrwxrwx. 1 root root 0 May 21 03:44 /sys/bus/vdpa/devices/vdpa0/driver -> ../../../../../bus/vdpa/drivers/vhost_vdpa
You can see that the vhost-vdpa bus driver is bound to vdpa0.
The second step is to switch the binding to virtio-vdpa.
# echo vdpa0 > /sys/bus/vdpa/drivers/vhost_vdpa/unbind
# echo vdpa0 > /sys/bus/vdpa/drivers/virtio_vdpa/bind
# ls -l /sys/bus/vdpa/devices/vdpa0/
lrwxrwxrwx 1 root root 0 June 23 11:19 driver -> ../../bus/vdpa/drivers/virtio_vdpa
drwxr-xr-x 2 root root 0 June 23 11:19 power
lrwxrwxrwx 1 root root 0 June 23 11:19 subsystem -> ../../bus/vdpa
-rw-r--r-- 1 root root 4096 June 23 11:19 uevent
drwxr-xr-x 4 root root 0 June 23 11:19 virtio0
You can see that the driver bound to vdpa0 is now virtio_vdpa and its virtio device is named virtio0.
Using these two steps you can perform the desired switching.
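The two steps can be combined into one helper, sketched below. It uses only the bind and unbind files and the driver symlink shown above. The function name vdpa_switch_driver and the SYSFS_ROOT override are assumptions made for illustration, and on a real host the writes require root privileges.

```shell
#!/bin/sh
# Sketch only: rebind a vDPA device to a different vDPA bus driver.
# SYSFS_ROOT is a hypothetical override for testing against a mock tree.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

vdpa_switch_driver() {
    dev="$1"       # e.g. vdpa0
    target="$2"    # e.g. virtio_vdpa or vhost_vdpa
    drivers="$SYSFS_ROOT/bus/vdpa/drivers"
    link="$SYSFS_ROOT/bus/vdpa/devices/$dev/driver"

    if [ ! -d "$drivers/$target" ]; then
        echo "bus driver not registered: $target" >&2
        return 1
    fi

    # Step 1: find the driver currently bound, if any.
    if [ -L "$link" ]; then
        current="$(basename "$(readlink "$link")")"
        # Nothing to do when the device already uses the target driver.
        [ "$current" = "$target" ] && return 0
        echo "$dev" > "$drivers/$current/unbind" || return 1
    fi

    # Step 2: bind the device to the requested driver.
    echo "$dev" > "$drivers/$target/bind"
}
```

With this in place, the switch shown above becomes vdpa_switch_driver vdpa0 virtio_vdpa. The early return keeps the call idempotent, so repeating it does not unbind a device that is already attached to the requested driver.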
A note on the management API
Except for the basic device/driver management interface via sysfs, the vDPA kernel framework does not enforce a unified management API for operations such as device provisioning. The vDPA kernel framework leaves this to the vendor specific vDPA device driver to implement.
For example, in the case of VF based vDPA the vendor may still use the existing management API for performing basic provisioning.
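As one possible illustration, assuming the vendor's PF driver supports the standard PCI SR-IOV sysfs attributes (sriov_totalvfs and sriov_numvfs), VFs could be provisioned as sketched below. The function name pf_set_numvfs and the SYSFS_ROOT override are hypothetical, and whether the resulting VFs show up as vDPA devices depends on the vendor's vDPA device driver.

```shell
#!/bin/sh
# Sketch only: set the number of SR-IOV VFs on a PF through sysfs.
# SYSFS_ROOT is a hypothetical override for testing against a mock tree.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

pf_set_numvfs() {
    pf_addr="$1"   # PCI address of the PF (assumed, e.g. 0000:07:00.0)
    wanted="$2"    # number of VFs to enable
    pf_dir="$SYSFS_ROOT/bus/pci/devices/$pf_addr"

    if [ ! -f "$pf_dir/sriov_totalvfs" ]; then
        echo "$pf_addr does not expose SR-IOV attributes" >&2
        return 1
    fi

    total="$(cat "$pf_dir/sriov_totalvfs")"
    current="$(cat "$pf_dir/sriov_numvfs")"

    if [ "$wanted" -gt "$total" ]; then
        echo "requested $wanted VFs, device supports at most $total" >&2
        return 1
    fi
    [ "$wanted" -eq "$current" ] && return 0

    # The kernel does not allow changing one non-zero VF count directly
    # to another, so existing VFs are disabled first. This destroys them.
    if [ "$current" -ne 0 ]; then
        echo 0 > "$pf_dir/sriov_numvfs" || return 1
    fi
    [ "$wanted" -eq 0 ] || echo "$wanted" > "$pf_dir/sriov_numvfs"
}
```

A call such as pf_set_numvfs 0000:07:00.0 4 would request four VFs on the PF at that (assumed) address. Checking sriov_totalvfs first gives a clearer error than letting the write fail.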
A devlink based management API is being developed upstream, and the vDPA subsystem is expected to integrate it in the future.
We introduced the concept of the vDPA bus. We then discussed design considerations and implementation of the vDPA device driver and provided an overview of vhost-vDPA and virtio-vDPA bus drivers. We also provided a detailed example for a vDPA PCI device.
Next, we will cover the details of the vDPA bus drivers and how they cooperate with userspace applications to provide services to both VMs and containers.