Traditionally, when looking at a virtio device and its corresponding virtio driver, we assume the device is trusted by the driver. We do, however, need to protect the virtio device from a possible malicious virtio driver.
The logic behind this approach is that the virtio driver is a “smart” software element which could contain malicious software logic while the virtio device is a “dumb” element capable of doing only what it’s asked to do. Another reason for this logic is that we traditionally focus on protecting the host from the workload running on it containing the virtio driver (be it a virtual machine (VM) or a container) or protecting workloads from each other.
Emerging hardware frameworks and security frameworks are turning things around by shifting the focus to protecting the virtio driver from the virtio device. One reason for this change is emerging smart NIC technologies that contain virtio devices that are transformed from “dumb” elements to sophisticated elements that are also capable of running malicious software.
Another reason for the change is the emergence of confidential computing/confidential containers approaches, where the focus moved to protecting the workloads (VM or container) from the host itself. This is achieved using emerging hardware encryption technologies such as SEV (ES/SNP), TDX, etc.
As a result, key changes are required in the security boundaries of virtio/vDPA drivers.
In this article we present a high-level overview of work currently being done in virtio/vDPA to address the emerging threats, including actual use cases. We also provide some insight into why using hardened virtio/vDPA devices and drivers is preferable to using vendor-specific devices and drivers as the industry shifts to smart NICs and hardware-enabled confidential computing technologies.
Virtio architecture and terminology
When discussing virtio in this article, we focus on virtio interfaces that span from the workloads (VM or container) to dedicated hardware that supports virtio and bypasses the kernel for the data plane. This is in contrast to traditional host-to-guest virtio interfaces. These interfaces correspond to the current market trend of implementing virtio devices in hardware (networking, storage, etc.).
A virtio device is a device implementation which complies with the virtio specification.
A virtio driver is the driver to control and communicate with a virtio device.
vDPA (virtual data path acceleration) is a kernel shim layer intended to translate hardware vendor control plane to virtio control plane and vice versa. It assumes the virtio data plane is already implemented in the vendor’s hardware, and aims to simplify the full integration of virtio in the hardware (control plane and data plane) by removing the need to implement virtio control plane as well.
Some vendors implement both the virtio data plane and control plane in hardware and thus do not need vDPA. This approach is called full virtio hardware offloading. To cover both cases, we use the term virtio/vDPA when talking about virtio in this article.
The virtio specification originally aimed to give virtual environments and guests (VMs) a straightforward, efficient, standard and extensible mechanism for using virtual devices rather than boutique per-environment or per-OS mechanisms. As mentioned, with the increasing popularity of virtio, vendors have started to implement and use virtio in non-virtualization environments, that is, in real hardware such as smart NICs, accelerators and embedded devices.
The following diagram shows the high level architecture of virtio devices:
Figure 1. Virtio architecture
Each virtio device consists of three layers:
The transport layer of a virtio device is a bus layer interface that the driver uses to access the basic facilities. For example, a virtio device implemented with PCI has registers that can be accessed via PCI configuration space or MMIO, and a virtio PCI driver uses those registers to set up the device.
The basic facilities for a virtio device consist of the virtqueues, feature bits and device config space that are defined in the virtio spec. For example, the device must obey the virtqueue layout and descriptor format defined in the virtio spec in order to communicate with a virtio driver correctly.
The device types for a virtio device implement different device logics such as ethernet (for networking), SCSI (for storage), etc.
The virtio driver and device must obey the semantics and normative requirements defined in the virtio specification in order to communicate correctly. For example, a virtio PCI driver must use the BAR (base address register) and register layout defined in the virtio spec in order to access and use the device facilities.
In reality, however, it’s challenging to create a perfect implementation of the spec. For example, the spec does not fully answer how one side should behave when the other side (virtio device or virtio driver) does not follow the spec. Vendors have to consider these types of questions in order to have a robust implementation of both the virtio driver and the virtio device.
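To make the descriptor format concrete, here is a minimal Python sketch of the split-virtqueue descriptor layout described in the virtio spec (a little-endian 64-bit address, 32-bit length, 16-bit flags and 16-bit next index). The helper names pack_desc and unpack_desc are our own and are used for illustration only.

```python
import struct

# Split-virtqueue descriptor: le64 addr, le32 len, le16 flags, le16 next.
# "<" selects little-endian with no padding, giving 16 bytes per entry.
DESC_FORMAT = "<QIHH"
DESC_SIZE = struct.calcsize(DESC_FORMAT)

VIRTQ_DESC_F_NEXT = 1      # the buffer continues in the descriptor at `next`
VIRTQ_DESC_F_WRITE = 2     # the buffer is write-only for the device
VIRTQ_DESC_F_INDIRECT = 4  # the buffer contains a table of descriptors


def pack_desc(addr, length, flags, next_idx):
    """Serialize one descriptor the way it sits in the descriptor table."""
    return struct.pack(DESC_FORMAT, addr, length, flags, next_idx)


def unpack_desc(raw):
    """Parse one 16-byte descriptor into a dictionary of its fields."""
    addr, length, flags, next_idx = struct.unpack(DESC_FORMAT, raw)
    return {"addr": addr, "len": length, "flags": flags, "next": next_idx}
```

Both sides read and write these same 16 bytes in shared memory, which is why every field in them becomes an input that may need validation in the threat models discussed later.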
To answer these questions, we will introduce the threat model for the virtio device and driver which will help us explain different approaches for addressing these issues.
Virtio device threat model
The device perspective trust model of virtio is shown in the following figure:
From the device implementation perspective, the security boundary is the device's own boundary. That is, the virtio device (green block) doesn’t trust any entity that talks to it via the transport layer (yellow blocks). This means the virtio device needs to validate any request that crosses its security boundary. This includes the following three types of validation.
Control path validation
The virtio device needs to validate every request it receives to access facilities such as virtqueues, config space and others, instead of assuming that the virtio driver follows the normative requirements of the spec. For example, a malicious driver could try to negotiate reserved feature bits or access a non-existent virtqueue in order to trigger an unexpected logic path in the device. To avoid this, the virtio device must be able to detect and fail such a request.
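As a rough illustration of control path validation, the following Python sketch models a device that rejects feature bits it never offered and requests for virtqueues that don't exist. ToyVirtioDevice, ControlPathError and their methods are hypothetical names invented for this example; they do not correspond to any real device implementation. The power-of-two size check assumes split virtqueues.

```python
class ControlPathError(Exception):
    """Raised when the toy device rejects a driver request."""


class ToyVirtioDevice:
    """Hypothetical device model; not taken from any real implementation."""

    def __init__(self, offered_features, num_queues, max_queue_size):
        self.offered_features = offered_features
        self.num_queues = num_queues
        self.max_queue_size = max_queue_size
        self.negotiated_features = 0
        self.queue_sizes = {}

    def set_driver_features(self, features):
        # Reject any bit the device never offered, including reserved bits.
        if features < 0 or features & ~self.offered_features:
            raise ControlPathError("driver acked features that were not offered")
        self.negotiated_features = features

    def set_queue_size(self, index, size):
        # Reject access to a virtqueue that does not exist.
        if not 0 <= index < self.num_queues:
            raise ControlPathError("no such virtqueue")
        # Assumption: split virtqueues, whose size must be a power of two.
        if not 0 < size <= self.max_queue_size or size & (size - 1):
            raise ControlPathError("invalid queue size")
        self.queue_sizes[index] = size
```

The point of the sketch is that a rejected request leaves the device state untouched, so a misbehaving driver can't push the device onto an unexpected logic path.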
Data path validation
The virtio device can’t assume the data in virtqueue and the buffers are well prepared as required by the virtio specification. This means the virtio device needs to validate the descriptors and the buffer metadata so it can fail invalid requests.
For example, a malicious driver may craft an infinite descriptor chain, causing a denial of service (DoS) in the device. To avoid these situations, the virtio device needs to be able to detect such a loop and fail the request. A malicious virtio driver can also attack the device by using illegal buffer metadata (for example, reserved values or unsupported combinations), so the virtio device should be able to detect those violations and fail them to mitigate unexpected behavior.
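One way a device could detect a looping chain is to bound the walk by the queue size, since a valid chain can never contain more descriptors than the queue holds. The Python sketch below illustrates the idea for split virtqueues. The function walk_chain, the exception InvalidChain and the tuple-based descriptor table are assumptions made for this example; indirect descriptor tables are not followed.

```python
VIRTQ_DESC_F_NEXT = 1
VIRTQ_DESC_F_WRITE = 2
VIRTQ_DESC_F_INDIRECT = 4
KNOWN_FLAGS = VIRTQ_DESC_F_NEXT | VIRTQ_DESC_F_WRITE | VIRTQ_DESC_F_INDIRECT


class InvalidChain(Exception):
    """Raised when a descriptor chain violates the checks below."""


def walk_chain(table, head):
    """Return the descriptor indexes of the chain starting at `head`.

    `table` is a list of (addr, len, flags, next) tuples with one entry
    per queue slot, so len(table) plays the role of the queue size.
    """
    queue_size = len(table)
    chain = []
    idx = head
    while True:
        if not 0 <= idx < queue_size:
            raise InvalidChain("descriptor index out of range")
        # A chain longer than the queue must revisit a descriptor: a loop.
        if len(chain) >= queue_size:
            raise InvalidChain("descriptor chain loops")
        _addr, _length, flags, next_idx = table[idx]
        if flags & ~KNOWN_FLAGS:
            raise InvalidChain("reserved flag bits set")
        # The spec forbids combining INDIRECT with NEXT on one descriptor.
        if flags & VIRTQ_DESC_F_INDIRECT and flags & VIRTQ_DESC_F_NEXT:
            raise InvalidChain("unsupported flag combination")
        chain.append(idx)
        if not flags & VIRTQ_DESC_F_NEXT:
            return chain
        idx = next_idx
```

Bounding the walk keeps the cost of validating a request proportional to the queue size, whatever the driver writes into the table.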
Inter device validation
Some of the virtio transports allow the virtio device to talk to other devices (for example, PCI allows inter-device communication via peer-to-peer transactions). The virtio device should not assume that input from such inter-device communication obeys the virtio or transport spec, even if it comes from another virtio device. Instead, the virtio device should validate all input from external devices and detect spec violations so it can fail them, or use the transport-specific inter-device isolation facility.
This threat model has been common practice for virtual devices for years. Most of the popular virtio backends (such as vhost and QEMU) have been hardened to prevent these kinds of attacks from both the guest and other virtual devices. Hardware virtio/vDPA vendors should apply the same practices in order to have a robust device implementation.
Virtio driver threat model
Traditionally, since virtio was born in a virtualization environment, the virtio driver usually trusts the virtio device and assumes the device follows the virtio specification. With recent technologies such as confidential computing, smart NICs and vDPA (where the device logic could be implemented in a less privileged entity), a new and stronger threat model is required, as demonstrated in the following figure:
In this threat model, from the virtio driver’s perspective as a software module, it must at least trust some hardware components (green blocks) in order to perform all the necessary protections and checks. This includes:
The virtio driver — the virtio driver should trust itself and the core operating system facilities.
The CPU — the virtio driver trusts the CPU and assumes it isn’t a malicious entity. There’s currently no other choice since the driver itself cannot perform validation on the CPU.
The platform devices — the virtio driver trusts the platform devices. For example, the virtio driver trusts the platform IOMMU device to perform the necessary checks for limiting the memory that a virtio device can access.
The security boundary then becomes the green dotted line in Figure 3. This also implies that we don’t trust the following entities (yellow blocks):
The virtio device — the virtio driver doesn’t trust the virtio device, so it can’t assume the device behaves exactly as the virtio specification requires. In a virtualization environment, the hypervisor/VMM is not trusted either. This means that any metadata stored in shared memory that the virtio device can access must not be trusted. For example, the virtio driver should not trust the metadata stored in the virtqueue (e.g., the buffer address, length and next pointer stored in the descriptor) since it can be mangled by a malicious virtio device.
Other devices — the virtio driver should not trust any other devices that can talk to the virtio device. If possible, the driver can choose to disable the inter-device communication in a transport-specific way.
The virtio driver should perform necessary checks on every input that crosses the security boundary:
Datapath validation — Any metadata that could be modified by the virtio device should be validated by the virtio driver. For example, the driver should store the buffer metadata in private memory that cannot be accessed by the virtio device instead of depending on the data stored in the virtqueue. The virtio driver should also perform the necessary checks on the buffer metadata produced by the virtio device and drop any illegal metadata.
Control path validation — The virtio driver should not trust any value that is read from the virtio device. For example, a malicious virtio device may try to crash the virtio driver or poke its private memory by crafting illegal config space values. The virtio driver should be able to sanitize those values or simply fail the driver probing.
Notification validation — The virtio driver should be prepared for the unexpected notification raised by the malicious virtio device at any time. The notification handler should be disabled by the driver when it is not expected to be called.
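The datapath check above can be sketched in a few lines of Python. In this hypothetical model, the driver records each buffer it posts in a private table and validates the id and length that the device reports on completion against that table, rather than against anything stored in the shared virtqueue. DriverQueue, DeviceMisbehaved and the method names are invented for this example and do not mirror the actual Linux driver code.

```python
class DeviceMisbehaved(Exception):
    """Raised when device-provided metadata fails validation."""


class DriverQueue:
    """Hypothetical driver-side bookkeeping for one virtqueue."""

    def __init__(self, size):
        self.size = size
        # Private to the driver: head index -> capacity of the posted buffer.
        self._inflight = {}

    def add_buffer(self, head, capacity):
        """Record a buffer before exposing it to the device."""
        if not 0 <= head < self.size or head in self._inflight:
            raise ValueError("invalid or busy descriptor head")
        self._inflight[head] = capacity

    def complete(self, used_id, used_len):
        """Validate a completion whose fields were written by the device."""
        if used_id not in self._inflight:
            raise DeviceMisbehaved("completion for a buffer that is not in flight")
        # Forget the buffer first so a replayed completion is also rejected.
        capacity = self._inflight.pop(used_id)
        if used_len > capacity:
            raise DeviceMisbehaved("device claims more bytes than the buffer holds")
        return used_len
```

For instance, after add_buffer(1, 1500), a completion reporting 64 bytes for id 1 is accepted, while a second completion for the same id, or one reporting 4096 bytes, is treated as a device fault. A real driver would then typically mark the device as broken instead of continuing.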
We believe the threat model described in this section is mandatory for a number of emerging use cases:
Confidential computing — In these cases, the guest memory is encrypted and hardware-based technologies can prevent the hypervisor from accessing arbitrary guest memory. Neither the hypervisor nor the virtualized virtio device is trusted. The guest's virtio driver requires a lot of hardening work, so the above threat model should be used together with the one defined by the vendor-specific confidential computing technology (for example SEV-SNP or TDX).
Accelerator or smart NIC — In this case, the virtio logic can be implemented in software or firmware running inside the SoC (system on chip) side of the hardware. There’s no guarantee of safety for those software modules.
Userspace virtio/vDPA devices — New technologies are emerging, such as VDUSE (vDPA device in userspace), that allow a user space device to interact with kernel virtio drivers. Hardening is required in this case to prevent an unprivileged user space virtio device from poking or crashing the kernel.
Currently, most of the common virtio drivers don't implement the above threat model. However, the work of hardening the different virtio drivers in Linux has begun. It should be noted that we expect the threat model may need to be strengthened or relaxed as the software and hardware technologies evolve.
Virtio hardening and vendor-specific hardening
We believe driver hardening will become a common practice in the future as the threat models evolve. This means the hardening needs to be done not only for the virtio driver but also for other vendor-specific drivers. With this trend, virtio will demonstrate the advantages of its open standard:
A well prepared specification as a reference for the cloud/hardware vendors to implement the devices or drivers
Easy-to-use or easy-to-implement fuzzing tools to validate robustness or detect defects
Reuse and simplify the driver validation across different vendors
Native static/dynamic checker support for some operating systems such as Linux
Open source code anyone can look at and verify there are no back doors
A large community constantly developing, testing and challenging the code to help address vulnerabilities more quickly
By supporting virtio/vDPA devices in their NICs, vendors can build on this work to help protect new hardware they develop from security risks. Instead of creating vendor-specific and closed source implementations of devices and drivers that are more prone to vulnerabilities, approaches such as virtio/vDPA can save time and help vendors focus on their “secret sauce” instead.
In this article we have reviewed two threat models virtio is required to deal with:
The historical use case of a virtio device not trusting the virtio driver
The emerging use case of virtio drivers not trusting the virtio device
For each of the threat models, we have provided some insight into what the virtio community is currently focusing on hardening. We covered a number of key validations which are being added, leveraging the open virtio specifications.
We believe this approach simplifies the process of hardening devices and drivers compared to vendor-specific approaches in terms of testing, maintaining and deploying such solutions.