Scaling virtual machine network performance for network-intensive workloads

Virtual network environments offer a number of benefits, and we’re already seeing providers and enterprises moving their entire infrastructures to virtual systems. But there are still shortcomings. For one, virtual machine (VM) network performance needs to scale so it can support more demanding workloads such as network functions virtualization (NFV). Now, there’s a multi-queue feature in Open vSwitch (OVS) to scale VM network performance.

In the multi-core era, performance needs to scale linearly with the number of CPUs involved. In order to offer linear scalability, contention between CPUs (locks) should be avoided and resources have to be made available per CPU. This is why modern NICs support multiple queues: each CPU gets its own RX and TX queue, so multiple CPUs can access the same NIC without stepping on each other’s toes, and overall performance scales with the number of participating CPUs. For example, if a single-CPU VM is able to process 4 million packets per second, scaling to 32 million packets per second will require an 8-CPU VM, and thus 8 queues per NIC.
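As a quick illustration, a NIC’s queue layout on a Linux host can be inspected and changed with ethtool (the interface name and queue count below are placeholders):

```
# Show how many RX/TX queue pairs the NIC supports and currently uses
ethtool -l eth0

# Enable 4 combined (RX+TX) queue pairs, one per participating CPU
ethtool -L eth0 combined 4
```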

Although there are several options available today for deploying network-intensive workloads (such as NFV) on OpenStack, here we are focusing on a standard software solution based on Open vSwitch, an open source software switch designed to be used as a vSwitch within virtualized server environments.

The VirtIO paravirtualized NIC standard supports multi-queue in order to scale guest networking performance with the number of its virtual CPUs (vCPUs), and is available for the Red Hat OpenStack Platform via two host implementations:

  • The vhost-user userland implementation, which is planned for inclusion in Red Hat OpenStack Platform 10, will rely on the DPDK-accelerated Open vSwitch (userland netdev data path).
  • The vhost-net kernel implementation relies on the kernel-based Open vSwitch data path (a libvirt sketch for requesting queues follows below).
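Outside of OpenStack, the number of VirtIO queue pairs for a guest NIC is requested in the libvirt domain XML. A minimal sketch, assuming the vhost-net backend and an illustrative guest name and queue count:

```
# Request 4 VirtIO queue pairs for the guest NIC by editing the domain XML
# (my-guest is a placeholder); the <interface> element would carry:
#   <model type='virtio'/>
#   <driver name='vhost' queues='4'/>
virsh edit my-guest
```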

With one of the two host backends above, guests can benefit from multi-queue by using either the regular Linux VirtIO driver, which supports multi-queue (enabled via ethtool), or the DPDK VirtIO Poll-Mode Driver (PMD).
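A rough sketch of both guest-side options, with illustrative interface name, core list, and queue counts:

```
# Kernel VirtIO driver: enable one queue pair per vCPU
# (the guest kernel enables a single queue by default)
ethtool -L eth0 combined 4

# DPDK VirtIO PMD: run testpmd with one queue pair per forwarding core
testpmd -l 0-4 -n 4 -- --rxq=4 --txq=4 --nb-cores=4
```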

Thanks to the VirtIO standard, any combination of DPDK or kernel implementations between guest and host will be functional. However, not all of them make sense:

  • A DPDK guest with the regular vhost-net kernel-based host implementation is not accelerated, as the host is the bottleneck.
  • A DPDK host will not accelerate a vhost-net guest, as the guest is the bottleneck.

Table 1

Red Hat OpenStack Platform required versions for multi-queue:

Table 2
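In OpenStack, multi-queue is requested per image. A minimal sketch using the hw_vif_multiqueue_enabled image property (the image name is a placeholder):

```
# Ask Nova to create one VirtIO queue pair per vCPU for instances
# booted from this image
openstack image set --property hw_vif_multiqueue_enabled=true my-guest-image
```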

Open vSwitch multi-queue enables VM networking performance to scale by adding CPUs to the VM, and is often enough for enterprise-class networking applications. But this is still not enough for NFV carrier-class workloads: many platform parameters, such as CPU pinning and huge pages, have to be properly tuned, either manually or, preferably, by the OpenStack installer.
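As a rough sketch of that kind of tuning through Nova flavor extra specs (the flavor name is a placeholder):

```
# Pin each vCPU to a dedicated host CPU and back guest memory with huge pages
openstack flavor set nfv-flavor \
  --property hw:cpu_policy=dedicated \
  --property hw:mem_page_size=large
```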

Nonetheless, Open vSwitch multi-queue is a must-have feature for scaling VM networking performance, especially for more demanding workloads such as NFV. The feature can be sufficient for enterprise workloads, and work is underway to expand its capabilities so that it is sufficient for NFV carrier-class workloads as well. OPNFV VSperf aims to characterize OVS and OVS-DPDK performance across various configurations, including how performance scales thanks to multi-queue. The project is still a work in progress, but it already provides many data points and confirms the benefits of the multi-queue feature for both OVS and OVS-DPDK.
