Virtio-networking: first series finale and plans for 2020

November 18, 2019 · Amnon Ilan, Michael S. Tsirkin, Jason Wang, Ariel Adam · 5-minute read

Let's briefly recap the virtio-networking series that we've been running for the past few months. We've covered a lot of ground! Looking at the series from a high level, let's revisit some of the topics we covered:

A short recap of this series

"Introduction to virtio-networking and vhost-net", followed by the technical deep dive and hands-on posts, discussed the vhost-net/virtio-net architecture. We covered KVM, QEMU and libvirt, touched briefly on the virtio specification (devices/drivers) and the vhost protocol, and covered Open vSwitch (OVS) for communicating with the outside world.

"How vhost-user came into being: virtio-networking and DPDK", followed by the technical deep dive and hands-on posts, discussed the vhost-user/virtio-pmd architecture. We covered DPDK (Data Plane Development Kit), OVS-DPDK and the vhost-user library in DPDK. We also covered virtual function I/O (VFIO), virtio-pmd drivers, the I/O memory management unit (IOMMU) and vIOMMU, which together give userspace networking applications direct access to a NIC device.

"Achieving network wirespeed in an open standard manner: introducing vDPA", followed by the technical deep dive and hands-on posts, focused on providing wirespeed performance for VMs. We started by explaining how SR-IOV works and its advantages. Then we covered virtio full HW offloading and virtual data path acceleration (vDPA), explaining their value compared to SR-IOV. We explored mediated devices (MDEV), their connection to VFIO and how they are used as vDPA building blocks, including virtio-mdev and vhost-mdev. We also provided a comparison of the different virtio-networking architectures (vhost-net/virtio-net, vhost-user/virtio-pmd, virtio full HW offloading and vDPA).

The "Breaking cloud native network performance barriers" post moved on from the realm of VMs to the realm of containers. We explored how vDPA provides wirespeed/wirelatency L2 interfaces to containers, building on Kubernetes orchestration and the Multus CNI. We are presenting a demo at KubeCon NA November 2019 that follows this post.

In the "Making High Performance Networking Applications Work on Hybrid Clouds" post we turned our attention to accelerated containers running on multiple clouds. We showed how a single container image with a secondary accelerated interface (using Multus) could run over multiple clouds without changes to the container image. This was accomplished by using different virtio-networking technologies, including vDPA and virtio full HW offloading. We are presenting a demo at KubeCon NA November 2019 that follows this post.

Where is this all going?

We started this series with the blog "Introducing virtio-networking: Combining virtualization and networking for modern IT". In a nutshell, the virtio-networking community aims to leverage virtio technologies to provide network acceleration to VMs and containers in mixed virtual/cloud native environments and hybrid clouds. As a longer-term vision, virtio technologies could even serve as a generic, standard NIC solution for bare metal.

We aren’t saying that this isn’t already done to some extent today with closed proprietary solutions. We are saying that we can do it better with open, standardized solutions and gain broad industry acceptance in the process.

We are strong believers in not reinventing the wheel and in bringing innovation where it adds real value. The OASIS virtio community, which owns the specification, and the open-source projects virtio is tapped into (kernel, QEMU, DPDK etc…) are solid foundations for this goal.

Leveraging standard building blocks is a powerful way to innovate when it comes to accelerating VMs and containers in hybrid deployments. It brings solid value to users and can help prevent vendor lock-in.

What’s planned next?

A second virtio-networking series in 2020! There are a number of topics we haven’t had time to review, topics in development that are still missing the gritty low-level details (aka technical deep dives), and new directions currently being explored. Let’s mention a few:

The virtio spec, and especially the different ring layouts, have only been discussed briefly. We’d like to explore the spec further, explaining the feature bits and their negotiation, the notifications, the virtqueues (split-ring and packed-ring layouts), device initialization, transport options and, of course, the different device types. You are always welcome to simply go and read the spec; however, we’d like to give you the essence without your having to read 158 pages (give or take).
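As a small taste of the feature-bit topic, here is a minimal sketch of the idea behind negotiation: the device offers a set of feature bits, and the driver accepts the subset it understands. The bit positions are the ones we believe the virtio spec assigns; the helper names (feature_mask, negotiate, ring_layout) are hypothetical and exist only for this illustration. A real driver also goes through the device status handshake (FEATURES_OK and friends), which this sketch leaves out.

```python
# Feature bit positions as assigned in the virtio specification
# (quoted from memory; check the spec before relying on them).
VIRTIO_NET_F_CSUM = 0        # device handles packets with partial checksum
VIRTIO_NET_F_MAC = 5         # device has a given MAC address
VIRTIO_NET_F_MRG_RXBUF = 15  # driver can merge receive buffers
VIRTIO_F_VERSION_1 = 32      # compliance with virtio 1.0 or later
VIRTIO_F_RING_PACKED = 34    # packed virtqueue layout is supported


def feature_mask(*bits):
    """Hypothetical helper: build a bitmask from feature bit positions."""
    mask = 0
    for bit in bits:
        mask |= 1 << bit
    return mask


def negotiate(device_offered, driver_supported):
    """Hypothetical helper: the driver may only accept offered features,
    so the negotiated set is the intersection of the two masks."""
    return device_offered & driver_supported


def ring_layout(negotiated):
    """Packed ring is used only if VIRTIO_F_RING_PACKED was negotiated;
    otherwise the split ring layout applies."""
    if negotiated & (1 << VIRTIO_F_RING_PACKED):
        return "packed"
    return "split"


# Assumed example: the device offers the packed ring, the driver does not know it.
device_offered = feature_mask(
    VIRTIO_NET_F_CSUM,
    VIRTIO_NET_F_MAC,
    VIRTIO_NET_F_MRG_RXBUF,
    VIRTIO_F_VERSION_1,
    VIRTIO_F_RING_PACKED,
)
driver_supported = feature_mask(
    VIRTIO_NET_F_CSUM,
    VIRTIO_NET_F_MAC,
    VIRTIO_F_VERSION_1,
)

negotiated = negotiate(device_offered, driver_supported)
print(hex(negotiated), ring_layout(negotiated))
```

In this assumed example the driver does not support the packed ring, so the two sides fall back to the split-ring layout even though the device offered both.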

The Kubernetes vDPA integration was explained at a high level; however, many details of the actual implementation are missing. These include the vDPA device plugin and the vDPA CNI (both might materialize as just configuration add-ons to the SR-IOV plugin and CNI), initialization, and the connection to a number of Kubernetes internal models. The architecture and implementation are in advanced development stages and will be described in detail in the next series of posts.

AF_VIRTIO encapsulates a number of new initiatives and approaches aimed at simplifying the consumption of the vDPA interface. That is, instead of using DPDK libraries to consume accelerated interfaces, applications would use Linux-style sockets. As mentioned throughout the series, there is a tradeoff between the simplicity of the virtio interface for the user and its speed/latency. vDPA combined with DPDK is optimal in terms of speed/latency but is no walk in the park. AF_VIRTIO tries to maintain high speed and low latency while significantly simplifying the user’s experience. This project is in its initial conceptualization and design phase.

Scalable IOV is a new technology developed by Intel that uses an identifier called PASID (Process Address Space ID), attached to each PCI transaction, to enable fine-grained association of resources to processes (replacing or complementing VFs) and to the IOMMU. This means the number of virtual devices/connections is no longer limited by the number of VFs, and you can have a number that is larger by an order of magnitude or even more. These new IOMMU devices will also support PRI (Page Request Interface), which will enable flexible cooperation between the CPU and the devices. vDPA and AF_VIRTIO are able to leverage such interfaces.

Why else is Scalable IOV interesting? One reason is that we don’t need to pin memory pages as is done today. And why is that interesting? Because it means we now have a bunch of powerful features to build on, such as KSM (kernel samepage merging), NUMA balancing and transparent huge pages. Another reason is that with this technology, vIOMMU will be enabled by hardware instead of software, supporting DPDK applications in the guest userspace as well as nesting. This will be elaborated on in detail in future blogs.

Virtio as a bare-metal driver is yet another interesting direction that virtio can take. Imagine having a single network driver that is the same on any hardware, with any NIC, and implemented on any OS (see NVMe as an example from the storage realm). We still have a way to go; however, the vDPA drivers implemented today are already laying the groundwork for such an ambitious direction.

Summary

For those who didn’t crack and made it all the way here, we hope this series helped clarify the dark magic of virtio and low-level networking, both in the Linux kernel and in DPDK.

We are always open to comments, new ideas and suggestions, so feel free to drop any of the blog writers a quick email and we will get back to you. See you in 2020!