Up until now we have covered virtio-networking and its usage in VMs. We started with the original vhost-net/virtio-net architecture, moved on to the vhost-user/virtio-pmd architecture and continued to vDPA (vHost Data Path Acceleration), where the virtio ring layout was pushed all the way into the NIC, providing wirespeed/wirelatency to VMs.
We now turn our attention to using vDPA to provide wirespeed/wirelatency L2 interfaces to containers, leveraging Kubernetes to orchestrate the overall solution. We will demonstrate how Containerized Network Functions (CNFs) can be accelerated using a combination of vDPA interfaces and DPDK libraries. The vDPA interfaces are added as secondary interfaces to containers using the Multus CNI plugin.
This post is a high level solution overview describing the main building blocks and how they fit together. We assume that the reader has an overall understanding of Kubernetes, the Container Network Interface (CNI) and NFV terminology such as VNFs and CNFs.
Given that this is a work in progress, we will not provide a technical deep dive or hands-on blogs right now. There are still ongoing community discussions on how to productize this solution, with multiple approaches under consideration. We will, however, provide links to relevant repos for those interested in learning more (the repos should be updated by the community).
A PoC demonstrating the usage of vDPA for CNFs is planned for KubeCon NA November 2019 at the Red Hat booth, along with a few companies we've been working with. We encourage you to visit us and see firsthand how this solution works.
The shift from accelerating VMs to accelerating containers
So why are we talking about accelerating containers?
For years, IT enjoyed the advantages of virtualization, including hardware independence, flexibility and easier replication. Although they offer excellent isolation, virtual machines are often too heavyweight, because moving data across the virtualization barrier requires an extra copy of the data. IT then began switching to containers, which make use of cgroups and Linux namespaces to offer a lighter-weight solution while still providing effective isolation and increased portability. Unlike a VM, each container consists of (mostly) just the application, rather than an entire operating system to run it.
In order to fully utilize containerization, IT has been looking to orchestration platforms to automate the deployment and management of the containers. Kubernetes has become the orchestration platform of choice with its excellent management of the application lifecycle and simplified approach to networking.
Interest in leveraging the advantages of containers has moved well beyond conventional IT. Telecommunications service providers also want to realize these advantages for their specialized networks. Their networks, having both edge and core needs, could benefit enormously from containerization. In particular, the edge use cases are unique and their networking needs impose challenges for Kubernetes.
What this means in practice is that providers expect to be able to accelerate container networking in the same way they are able to accelerate VM networking. In the case of VMs we talk about VNFs (virtual network functions), and in the case of containers we talk about CNFs (containerized network functions). It should be noted that in this series we look at accelerating L2 and L3 traffic (similar to what we described above for VMs) and not higher networking layers.
VNFs are software forms of network appliances such as routers, firewalls and load balancers; in practice they are deployed as one or more VMs. CNFs package just the containerized network function, not an entire OS, which increases portability.
Conventional workloads vs. CNF
Let’s clarify the difference between a typical container running a conventional IT workload and a CNF:
In a typical generic application, the single network interface (eth0) provided by a Kubernetes pod (which contains the containers) is sufficient for most purposes, and the Kubernetes networking model can be extended to allow the application to reach outside the cluster (using Flannel, Calico or other CNI plugins).
However, telco use cases are often fundamentally different. They may need to perform routing at layer 3, encapsulation and decapsulation, or other layer 2 or layer 3 functions. They also need to perform all of these actions on large amounts of traffic and with low latency. These requirements can't be satisfied with a single eth0 interface owned by the Kubernetes CNI, and this is why we start talking about CNFs and ways to accelerate them.
To solve the challenge of meeting these needs, we need a way of providing additional network interfaces that offer direct access to these high capacity layer 2 and layer 3 networks outside of standard Kubernetes networking. In the next section we will see how Multus is deployed as part of the solution to this problem.
Kubernetes CNI plugins and how Multus fits in
The basic Kubernetes network model has been quite successful and it is quite sufficient for many applications. The beauty of this approach is that application developers generally don’t need to know about the underlying networking details. Kubernetes hides this complexity by making the pod appear to the application developer as a single host with a single interface. In the diagram below, we can see the simplicity of the standard Kubernetes networking model:
Kubernetes networking includes CNI plugins to facilitate the deployment of additional networking features in Linux containers. Generally, these CNI plugins have been used to extend Kubernetes networking with overlays, IPAM, ingress and egress, and much more. For example, OVN can be added as an enterprise-oriented network fabric; it provides general layer 3 networking services, such as overlays, for the general networking interface.
Kubernetes networking is well suited for a web server application such as nginx. However, the needs of a network function, whether in the core or at the edge of the network, can be quite different. Kubernetes has had a "let us do the driving" approach to networking. Although it is an excellent orchestration platform for most application developers, by itself it may not meet the needs of applications requiring advanced network services.
To provide advanced network services for CNFs, we need the ability to receive and transmit directly to a high speed layer 2 or layer 3 interface. This must be done independently of the transformations provided by Kubernetes under the covers. CNFs require faster, accelerated access independent of the "simple" eth0 interface provided to the pod by Kubernetes. We require the ability to "attach" an additional network interface to our Kubernetes pod that allows our application to perform bulk reads and writes directly at layer 2 or layer 3.
Fortunately, we have a solution to achieve this. Multus, originally introduced in 2017, is a CNI plugin that allows multiple network interfaces to be added to the pod. The additional interfaces allow attachment of the pod to additional layer 3 domains, networks, or "raw" layer 2 interfaces. Of more importance for our purposes here, the additional network interfaces can be completely separate from Kubernetes networking, other than benefiting from Kubernetes lifecycle management (attach, detach, etc.).
In the diagram below we can see a deployment with Multus CNI allowing multiple network interfaces to be attached to a pod. The standard network interface (eth0) is attached to Kubernetes pods, however net0 is an additional network interface attached to external unassociated namespaces, which could be a host-based dataplane or a physical external network. In this same manner we could have added net1, net2, etc.
Note the following:
Multus is a CNI plugin that allows the attachment of multiple network interfaces to the pod. In our case we install Multus first, and once Multus is installed, we can use a second CNI plugin to provide an additional network interface.
As mentioned, Multus makes it possible to use many CNI plugins. Multus allows the same delegate CNI plugin to be called multiple times for a given pod, or different delegates to be called for a given pod. This allows multiple additional interfaces, with the same or different characteristics, to be added to a pod.
The additional interface utilizes a CNI plugin and a device plugin. The CNI plugin is provided so that the state of our new interface can be managed by Kubernetes services. The device plugin provides details of hardware specific resources on the node. In this case, the device plugin provides details about our added network interface.
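To make this more concrete, here is a minimal sketch of how an additional interface could be defined with Multus. We use the macvlan delegate purely as a stand-in for "some second CNI plugin"; the names (`macvlan-net`, `eth1`, the subnet, the pod image) are illustrative, not part of the solution described above.

```yaml
# NetworkAttachmentDefinition: tells Multus which delegate CNI plugin
# to invoke for the additional interface (all names illustrative).
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: macvlan-net
spec:
  config: '{
      "cniVersion": "0.3.1",
      "type": "macvlan",
      "master": "eth1",
      "mode": "bridge",
      "ipam": { "type": "host-local", "subnet": "192.168.1.0/24" }
    }'
---
# A pod requests the extra interface through an annotation; Multus
# attaches it as net0 alongside the standard eth0.
apiVersion: v1
kind: Pod
metadata:
  name: sample-pod
  annotations:
    k8s.v1.cni.cncf.io/networks: macvlan-net
spec:
  containers:
  - name: app
    image: busybox
    command: ["sleep", "infinity"]
```

Adding a second entry to the `k8s.v1.cni.cncf.io/networks` annotation (comma-separated) would attach net1, net2, and so on, as described above.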
SR-IOV for Containers
In this blog, we focus on CNI plugins that provide a direct connection between a CNF and a physical NIC. To this end, Single Root Input/Output Virtualization (SR-IOV) has been in use for quite some time as a mechanism, provided by NIC vendors, for user space direct access to the NIC.
SR-IOV allows the NIC to split the extremely high capacity of a high speed physical network interface into multiple virtual functions (VFs) and make them available for direct IO to a VM or a user space application. In an earlier post (blog #8), we described how this technique is used by VNFs to enable access to high speed NICs at layer 2 or 3 via a zero-copy interface.
SR-IOV is widely used for CNFs as well. It provides a number of advantages such as low latency, leveraging hardware based QoS, and high throughput through the usage of DPDK-based user-space networking. Later in this blog, we will see how we can extend this concept to support high speed accelerated interfaces.
In this example, we are performing user space networking. Our CNF is linked with DPDK and uses the virtio-user PMD in DPDK.
Later we will show how to connect vDPA to containers. First, though, we will discuss how the plumbing is done for SR-IOV, and later we will show how the governing process is very similar for vDPA. SR-IOV internals are discussed in detail in "Achieving network wirespeed in an open standard manner: introducing vDPA" and "How deep does the vDPA rabbit hole go?"
A disadvantage of SR-IOV is that it has specific vendor hardware dependency. When used for CNFs, it is difficult to assign a network function that is fully portable from node to node unless the nodes have identical NICs.
Multus is a prerequisite for adding multiple CNIs, each with an additional network interface, to our pod. The diagram below shows this deployment using a DPDK-based CNF running over the DPDK user space library. This data path entirely bypasses the Linux kernel on the data plane, so the data IO is directed from user space in the pod to and from the NIC VF via SR-IOV.
The following diagram shows how SR-IOV is deployed in Kubernetes:
Note the following:
The SR-IOV CNI interacts both with the pod and the physical NIC: to the pod it adds the accelerated interface (which we call here net0-SR-IOV), and it configures the NIC (through its PF, or physical function) to provision the attributes of a VF mapped to the pod.
The SR-IOV device plugin is responsible for the bookkeeping of SR-IOV resources (VFs) when creating a pod with an SR-IOV interface. It connects to the Kubernetes device plugin manager.
The vendor-VF-PMD is created as part of the DPDK library in the application and is used for managing the control plane and data plane into the NIC’s VFs.
The net0-SR-IOV interface is connected to the vendor-VF-PMD and provides an access point to the physical NIC.
In practice we are memory mapping the container userspace directly to the physical NIC, achieving near wirespeed/wirelatency to our pod with a zero-copy solution.
As pointed out, each vendor implements its own data plane/control plane through SR-IOV (for example, proprietary ring layouts); thus a different vendor-VF-PMD, with a different instruction set, is required for each vendor's NICs.
In this SR-IOV diagram we deploy Multus, which allows us to use the SR-IOV CNI. To deploy SR-IOV in Kubernetes, we use an additional SR-IOV CNI and device plugin. The reader will find the code, as well as instructions to build and run the SR-IOV CNI, in the GitHub repo above and in the device plugin GitHub repo.
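As a sketch of the plumbing just described, the SR-IOV CNI and device plugin tie together through a resource name. The names below (`sriov-net0`, `intel.com/sriov_netdevice`, the VLAN, the CNF image) are illustrative; the exact schema is documented in the SR-IOV CNI and device plugin repos.

```yaml
# NetworkAttachmentDefinition for the SR-IOV delegate; the resourceName
# annotation ties it to a VF pool advertised by the SR-IOV device plugin.
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: sriov-net0
  annotations:
    k8s.v1.cni.cncf.io/resourceName: intel.com/sriov_netdevice
spec:
  config: '{
      "cniVersion": "0.3.1",
      "type": "sriov",
      "vlan": 100
    }'
---
# The CNF pod requests one VF from the pool; the device plugin does the
# bookkeeping and the SR-IOV CNI attaches the VF as an extra interface.
apiVersion: v1
kind: Pod
metadata:
  name: dpdk-cnf
  annotations:
    k8s.v1.cni.cncf.io/networks: sriov-net0
spec:
  containers:
  - name: cnf
    image: dpdk-cnf-app   # illustrative DPDK-based CNF image
    resources:
      requests:
        intel.com/sriov_netdevice: "1"
      limits:
        intel.com/sriov_netdevice: "1"
```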
Accelerating a CNF with vDPA
vDPA is another mechanism to plumb high performance networking interfaces directly into containers. vDPA is similar to SR-IOV in that the NIC exports a virtual function to the user space CNF. Also, similarly to SR-IOV, it can be deployed for connecting an accelerated high speed NIC to a user space DPDK application behaving as a CNF.
Although widely used, SR-IOV has some significant limitations, such as a ring structure unique to each vendor's NIC, vendor-specific setup and control, and a lack of independent configuration options for each VF. Because of these limitations, VFs need to be deployed on specific bare metal hosts, which means the network functions can't readily be moved from one host to another.
In contrast, vDPA includes a virtio buffering scheme (similar to virtio-user) that is an open standard adopted among NIC vendors. In a similar fashion to the SR-IOV example above, the vDPA interface exports virtual functions, but it has an advantage over SR-IOV: because the ring layout follows the virtio standard, the CNF is more portable.
Please refer to previous vDPA blogs for additional details on how it compares with SR-IOV and the additional values it brings.
Let’s shift our attention back to vDPA for CNFs in Kubernetes. From the Kubernetes point of view, vDPA is similar to SR-IOV in that it consists of a CNI plugin and a device plugin. However, it includes an additional component not required for SR-IOV: the vDPA daemonset. The vDPA daemonset is used for communicating with the NIC and setting up the virtio buffering scheme.
With vDPA, as is the case with SR-IOV, the data path goes directly from the VF in the NIC to the user space memory in the pod.
Note the following:
The vDPA CNI interacts both with the pod and the physical NIC (similar to the SR-IOV case). To the pod it adds the accelerated interface (called here the virtio mdev interface), and it configures the NIC (through its PF, or physical function) to provision the attributes of a VF mapped to the pod. The goal with vDPA is to abstract away the different vendor NICs via the vDPA framework and vendor vDPA kernel drivers.
The vDPA device plugin is responsible for the bookkeeping of VF resources (similar to SR-IOV) when creating a vDPA interface for a pod. It connects to the Kubernetes device plugin manager.
The virtio-net-PMD is created as part of the DPDK library for managing the control plane and data plane. It is important to emphasize that the main feature differentiating vDPA from the SR-IOV approach described earlier is that we can use the same CNF regardless of the specific NIC vendor. For example, note that in this drawing we show a single virtio-net-PMD regardless of the vendor NIC we are using.
The net0-virtio mdev interface is connected to the virtio-net-PMD, providing access to the physical NIC. However, it uses a standard virtio ring layout in the data plane and vDPA as the standard control plane.
The vDPA framework plus the vDPA vendor drivers are where all the magic happens, as described in blogs #8 and #9. Without going too deep into the technical parts explained in those blogs, this block is responsible for providing the framework each NIC vendor can hook into. This way, as long as the NIC vendor supports the virtio ring layout in HW, it can maintain its own control plane and add a kernel driver (connected to the framework) supporting the standard vDPA interface to the containers and vDPA CNIs.
As pointed out in the SR-IOV case as well, in practice we are memory mapping the container user space directly to the physical NIC, achieving wirespeed/wirelatency to this pod with no additional copying.
As pointed out in the introduction, the vDPA Kubernetes enablement is a work in progress, and the community is discussing a number of possible approaches for productizing this solution upstream. For this PoC a number of assumptions were made, and components were connected in ways that are not always optimal, since showing a working prototype was the first priority.
In the example used in this blog and in the KubeCon PoC, we utilize vDPA as a DPDK user space implementation instead of the vDPA kernel implementation, which would use virtio-mdev and vhost-mdev.
The vDPA deployment GitHub repo contains the most up-to-date details and instructions on how the different parts are connected together.
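Assuming the PoC components (vDPA CNI, device plugin and daemonset) are deployed, attaching a vDPA interface to a pod could look much like the SR-IOV case, with only the delegate CNI type and resource pool changing. All names below (`vdpa-net0`, the `vdpa.io/vdpa-vhost` resource pool, the CNF image) are hypothetical and depend on how the community productizes the solution; see the deployment repo for the actual schema.

```yaml
# Hypothetical vDPA attachment definition; the resource pool would be
# advertised by the vDPA device plugin.
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: vdpa-net0
  annotations:
    k8s.v1.cni.cncf.io/resourceName: vdpa.io/vdpa-vhost   # hypothetical pool
spec:
  config: '{
      "cniVersion": "0.3.1",
      "type": "vdpa"
    }'
---
# The CNF requests one vDPA VF; the same pod spec works regardless of
# NIC vendor, since the data plane is the standard virtio ring layout.
apiVersion: v1
kind: Pod
metadata:
  name: vdpa-cnf
  annotations:
    k8s.v1.cni.cncf.io/networks: vdpa-net0
spec:
  containers:
  - name: cnf
    image: dpdk-cnf-app   # illustrative; links DPDK with the virtio-net-PMD
    resources:
      requests:
        vdpa.io/vdpa-vhost: "1"
      limits:
        vdpa.io/vdpa-vhost: "1"
```

Note how, unlike the SR-IOV example, nothing in this pod spec is vendor-specific; that portability is the point of the vDPA approach.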
We have discussed vDPA accelerated CNFs in a Kubernetes environment. We began with a basic explanation of Kubernetes networking and discussed the specific requirements of CNFs, most of which are not addressed by the basic Kubernetes network interface. We shared how we use Multus to enable additional network interfaces in a pod and explained how these additional network interfaces could be used with a user space DPDK application.
In this blog, we covered SR-IOV CNF acceleration and discussed how the SR-IOV CNI is deployed in Kubernetes. We discussed the advantages of SR-IOV, as well as some significant limitations when used with Kubernetes. We then shifted to vDPA as a mechanism to take advantage of accelerated NICs using open standard interfaces. Finally, we highlighted how vDPA mitigates some of the limitations of SR-IOV and reviewed how vDPA is integrated into Kubernetes.
Going back to telco use cases, we believe that adding vDPA interfaces to pods (via Multus) and using DPDK is a powerful enabler for the applications inside those pods. The vDPA interfaces provide wirespeed/wirelatency to pods in an open standard manner (as opposed to SR-IOV). Looking forward, vDPA will also support advanced technologies that replace SR-IOV VFs (such as Scalable IOV). Additionally, the same vDPA kernel interface can be used for both containers and VMs, which we believe is critical for future brownfield deployments given today’s huge investment by telcos in VNFs.