Hands on vhost-net: Do. Or do not. There is no try

2019 年 9 月 18 日Eugenio Pérez Martín, Adrian Moreno Zapata16 分钟阅读

Vhost-net has silently become the default traffic offloading mechanism for qemu-kvm based virtual environments leveraging the standard virtio networking interface. This mechanism allows the network processing to be performed in a kernel module freeing the qemu process and improving the overall network performance.

In the previous blog posts we have introduced the different elements that compose this architecture in a high level overview and we have provided a detailed technical explanation of how these elements work together. In this post, we will provide a step-by-step hands-on guide on how to set up this sample architecture. Once up and running, we will be able to inspect the main components in action.

This post is directed towards developers, hackers and anyone else interested in learning how network offloading is performed from real examples.

After going through this post (hopefully recreating the environment in your own PC), you will be familiar with the tools commonly used in virtualization (such as `virsh`), you will know how to set up a vhost-net environment and you will know how to inspect a running VM and measure its network performance.

For those of you who want to set up this environment quickly and start reverse engineer it directly, there’s a treat for you! An ansible playbook is available that will automate the deployment of this sample scenario.

Setting things up

Requirements:

A computer running a Linux distribution. This guide is focused on Fedora 30, however the commands should not change significantly for other Linux distros.
A user with sudo permissions
~ 25 G of free space in your home directory
At least 8GB of RAM

First, let’s install the packages we are going to need:

user@host $  sudo dnf install qemu-kvm libvirt-daemon-qemu libvirt-daemon-kvm libvirt iperf3 virt-install libguestfs-tools-c kernel-tools

You will need to install netperf also if you want to perform the benchmarking. To do so, you can get the one of the packages for your OS. At the moment of writing, you can install the last version executing:

user@host $  sudo dnf install https://raw.githubusercontent.com/rpmsphere/x86_64/master/n/netperf-2.7.0-2.1.x86_64.rpm

You will also need to install it in the guest when the moment comes!

In order to be able to use libvirt, your user must be part of the libvirt group:

user@host $ sudo usermod -a -G libvirt $(whoami)

After modifying the group membership of a user, you might need to log in again for the change to be applied. Also, you’ll need to restart libvirt:

user@host $ sudo systemctl restart libvirtd

Creating a VM

First, download the latest Fedora-Cloud-Base image:

user@host $ sudo wget -O /var/lib/libvirt/images/Fedora-Cloud-Base-30-1.2.x86_64.qcow2 http://fedora.inode.at/releases/30/Cloud/x86_64/images/Fedora-Cloud-Base-30-1.2.x86_64.qcow2

(Note the URL above might change, update it to the latest qcow2 image in http://fedora.inode.at/releases/30/Cloud/x86_64/images/)

This downloads a preinstalled version of Fedora30, ready to run in an OpenStack environment. Since we’re not running OpenStack, we have to clean the image. To do that, first we will make a copy of the image so we can reuse it in the future:

user@host $ sudo qemu-img create -f qcow2 -b  /var/lib/libvirt/images/Fedora-Cloud-Base-30-1.2.x86_64.qcow2  /var/lib/libvirt/images//virtio-test1.qcow2 20G

The following libvirt commands can be executed with an unprivileged user (recommended) if we export the following variable:

user@host $ export LIBVIRT_DEFAULT_URI="qemu:///system"

Now, the cleaning command (change the password to your own):

user@host $ sudo virt-sysprep --root-password password:changeme --uninstall cloud-init --selinux-relabel -a /var/lib/libvirt/images/virtio-test1.qcow2

This command mounts the filesystem and applies some basic configuration automatically so that the image is ready to boot afresh.

We need a network to connect our VM as well. Libvirt handles networks in a similar way it manages VMs, you can define a network using an XML file and start it or stop it through the command line.

For this example, we will use a network called ‘default’ whose definition is shipped inside libvirt for convenience. The following commands define the ‘default’ network, start it and check its running.

user@host $ virsh net-define /usr/share/libvirt/networks/default.xml
Network default defined from /usr/share/libvirt/networks/default.xml
user@host $ virsh net-start default
Network default started
user@host $virsh net-list
 Name      State    Autostart   Persistent
--------------------------------------------
 default   active   no          yes

Finally, we can use virt-install to create the VM. This command line utility creates the needed definitions for a set of well known operating systems. This will give us the base definitions that we can then customize:

user@host $ virt-install --import  --name virtio-test1 --ram=4096 --vcpus=2 \
--nographics --accelerate \
       --network network:default,model=virtio --mac 02:ca:fe:fa:ce:01 \
      --debug --wait 0 --console pty \
      --disk /var/lib/libvirt/images/virtio-test1.qcow2,bus=virtio --os-variant fedora30

The options used for this command specify the number of vCPUs, the amount of RAM of our VM as well as the disk path and the network we want the VM to be connected to.

Apart from defining the VM according to the options that we specified, the virt-install command should have also started the VM for us so we should be able to list it:

user@host $ virsh list
 Id   Name           State
------------------------------
 1    virtio-test1   running

Voilà! Our VM is running.
Just as a remainder, the virsh is a command line interface to libvirt daemon. You can start a VM by running :

user@host $ virsh start virtio-test1

Jump into the console by running:

user@host $ virsh console virtio-test1

Stop the VM by running:

user@host $ virsh shutdown virtio-test1

And delete the VM (don’t do it now if you don’t want to need to create it again!) by running:

user@host $ virsh undefine virtio-test1

Inspecting the guest

As already mentioned, the virt-install command has automatically created and started a VM using libvirt. Every VM created using libvirt is described by an XML file that defines the hardware being emulated. Let’s have a look at the relevant parts of this file by dumping its content:

user@host $ virsh dumpxml virtio-test1

Specifically, let’s look at the network device that virt-install created for us:

 <devices>
...     
             <interface type='network'>                                                                                                                                                                                 
      <mac address='02:ca:fe:fa:ce:01'/>   
      <source network='default' bridge='virbr0'/>                
      <target dev='vnet0'/>   
      <model type='virtio'/>   
      <alias name='net0'/>  
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>                                                                                                                               
    </interface>                                                                                     

…
</devices>

We can see that a virtio device has been created and that it is connected to a network that, among other things, is comprised of a linux bridge (called virbr0). Also, the device has been assigned a PCI domain, bus and slot.

Now, let’s jump into the VM’s console and see what it looks like from the inside:

user@host $ virsh console virtio-test1

Once logged in (use the password you configured on the virt-sysprep step) and before going any further, let’s install some packages:

[root@guest ~]# dnf install pciutils iperf3

Now, let’s look around. We can see that there is, indeed, a network device on our virtual PCI bus (modify the PCI bus according to your XML or inspect with lspci first):

[root@localhost ~]# lspci -s 0000:01:00.0 -v
01:00.0 Ethernet controller: Red Hat, Inc. Virtio network device (rev 01)
     Subsystem: Red Hat, Inc. Device 1100
     Physical Slot: 0
     Flags: bus master, fast devsel, latency 0, IRQ 21
     Memory at fda40000 (32-bit, non-prefetchable) [size=4K]
     Memory at fea00000 (64-bit, prefetchable) [size=16K]
     Expansion ROM at fda00000 [disabled] [size=256K]
     Capabilities: [dc] MSI-X: Enable+ Count=3 Masked-
     Capabilities: [c8] Vendor Specific Information: VirtIO: <unknown>
     Capabilities: [b4] Vendor Specific Information: VirtIO: Notify
     Capabilities: [a4] Vendor Specific Information: VirtIO: DeviceCfg
     Capabilities: [94] Vendor Specific Information: VirtIO: ISR
     Capabilities: [84] Vendor Specific Information: VirtIO: CommonCfg
     Capabilities: [7c] Power Management version 3
     Capabilities: [40] Express Endpoint, MSI 00
     Kernel driver in use: virtio-pci

Apart from the typical PCI information (such as the memory regions and capabilities), we see that the driver used is virtio-pci. This driver implements the common virtio over PCI functionality and creates a virtio device that is then driven by virtio_net as we can see if we inspect the PCI device a bit further:

[root@localhost ~]# readlink /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/virtio0/driver
../../../../../bus/virtio/drivers/virtio_net

It is the virtio_net driver the one in charge of creating a network interface for the rest of the operating system to use:

[root@localhost ~]# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
    link/ether 02:ca:fe:fa:ce:01 brd ff:ff:ff:ff:ff:ff
[root@localhost ~]#

Inspecting the host

Ok, so we’ve had a look at the guest, now let’s look at the host. Note that if we configure an interface of type ‘network’ then the default behaviour is to use vhost-net.

First of all, let’s see if the vhost-net is loaded:

user@host $ sudo lsmod | grep vhost
vhost_net              32768  1
vhost                  53248  1 vhost_net
tap                    28672  1 vhost_net
tun                    57344  4 vhost_net

We can also check that QEMU interacts with tun, kvm and vhost-net devices, and the file descriptors QEMU process has assigned to them examining /proc filesystem (the actual file descriptor id can vary):

user@host $ ls -lh /proc/$(pgrep qemu)/fd | grep '/dev'
...
lrwx------. 1 qemu qemu 64 Aug 27 06:38 13 -> /dev/kvm
lrwx------. 1 qemu qemu 64 Aug 27 06:38 30 -> /dev/net/tun
lrwx------. 1 qemu qemu 64 Aug 27 06:38 31 -> /dev/vhost-net

This means that the qemu process, apart from having opened a kvm device to perform the actual virtualization and created a tun/tap device, it has opened a vhost-net device. Also, we can see the vhost kernel thread associated with our qemu instance has been created:

user@host $ ps -ef | grep '\[vhost'
root     21743     2  0 14:11 ?        00:00:00 [vhost-21702]
$ pgrep qemu
21702

Lastly, we can see the tun interface created by the qemu process (also seen by the qemu open file descriptors) and the bridge joining host and guest. Note that, although the tap device is attached to the qemu process, the vhost-net kernel thread is the actual tap reader and writer.

user@host $ # ip -d tuntap
virbr0-nic: tap persist
    Attached to processes:
vnet0: tap vnet_hdr
    Attached to processes:qemu-system-x86(21702)

OK, so vhost is up and running, and qemu is connected to it. Now, let’s generate some traffic to see how the system performs.

Generating traffic

If you have followed properly the previous steps you can send data from the host to the guest and vice-versa using their ip addresses. For example, test for the network performance using iperf3. Note that these measurements are not proper benchmarks, tiny variations in any parameter like software or hardware versions or different network stack parameters can alter the obtained results significantly. Performance tuning or specific usage benchmarking are outside of the scope of this document.

First check the IP address of the guest, and execute iperf3 server on it (or whatever tool you want to use to check connectivity or do benchmarking):

[root@guest ~]# ip addr
...
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
    link/ether 02:ca:fe:fa:ce:01 brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.41/24 brd 192.168.122.255 scope global dynamic noprefixroute eth0
       valid_lft 2534sec preferred_lft 2534sec
    inet6 fe80::ca:feff:fefa:ce01/64 scope link
       valid_lft forever preferred_lft forever
[root@localhost ~]# iperf3 -s

And the iperf3 client in the host:

user@host $ iperf3 -c 192.168.122.41
...
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  26.3 GBytes  22.6 Gbits/sec    0             sender
[  5]   0.00-10.04  sec  26.3 GBytes  22.5 Gbits/sec                  receiver

In the iperf3 output we can see a transfer speed of about 22.5 Gbit/sec in both directions (remember that network bandwidth depends on a lot of things, so don’t expect to get the same on your environment). We can modify the size of the packet (-l option) to stress more the data plane.
If we run top during the iperf3 test we can see that vhost-$pid kernel thread is using one full core for packet forwarding, and that QEMU is using almost two cores.

user@host $ top
...
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
14548 qemu      20   0 6094256   1.1g  22304 S 180.5   3.5   1:37.80 qemu-system-x86     
14586 root      20   0       0      0      0 R  99.7   0.0   0:20.83 vhost-14548
14753 root      20   0    3904   2360   2116 R  57.0   0.0   0:13.39 iperf3

To measure latency, we use netperf command to start a netperf server, and the next one to measure latency:

user@host $ netperf -l 30 -H 192.168.122.41 -p 16604 -t TCP_RR
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate      
bytes  Bytes  bytes    bytes   secs.    per sec

212992 212992 1        1       30.00    28822.60

So the latency in this example is 1/28822.60 = 0.000000347s, since all the requests are serialized.

As said before, we are not doing proper benchmarking or fine tuning. We are doing this as a part of the hands on to you to get familiar with the technology.

Extra: Disable vhost-net

As we’ve already seen, the use of vhost-net is the default behaviour because of the performance improvement it brings. However, since we’re here to get our hands dirty and learn, let’s disable the vhost-net and see the difference in performance. By doing that, we will see how qemu has to do all the heavy lifting of the packet processing and how that impacts performance.

First, stop the VM:

user@host $ virsh shutdown virtio-test1

Then, edit the VM and add <driver name="qemu"/> to the network definition:

user@host $ virsh edit virtio-test1   # it will open your editor

Modify the network interface so it looks like this:

<devices>
...     
        <interface type='network'>                                                                                                                                                                                 
      <mac address='02:ca:fe:fa:ce:01'/>   
      <source network='default' bridge='virbr0'/>                
      <target dev='vnet0'/>   
      <model type='virtio'/>
             <driver name="qemu"/>
      <alias name='net0'/>  
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
        </interface>                                                                                     

…
</devices>

Exit your editor, and restart the VM with:

user@host $ virsh start virtio-test1

You can check that there is no file descriptor pointing to `/dev/vhost-net` anymore.

Analyzing the performance impact

If we repeat the benchmarking from the previous section without vhost-net, now we can see that no vhost-net kernel thread is shown in the top output, and we can appreciate a performance drop (to about 19.2 Gb/sec):

user@host $ iperf3 -c 192.168.122.41
...
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  22.4 GBytes  19.2 Gbits/sec    2             sender
[  5]   0.00-10.00  sec  22.4 GBytes  19.2 Gbits/sec                  receiver

Also, we see a CPU usage increment in the qemu process if we use the top command, from ~180% to 190-260%, and of course no trace of vhost-$pid kernel thread.

user@host $ top
...
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 1954 qemu      20   0 6007188 569192  22020 R 220.6   1.7   2:39.32 qemu-system-x86

If we compare TCP and UDP latency of both architectures, we see that the use of vhost-net provides a consistent improvement with respect to QEMU:

Latency improvement by type of traffic

Another good indicator of what’s going on is the number of IOCTLs that qemu has to send to KVM. This is because every time there is an I/O event that has to be handled by qemu, it needs to process it and to send an IOCTL to KVM in order to switch the context back to the guest. We can analyze the time qemu spends on each syscall with the strace command. The following table compares the results obtained with and without vhost-net driver:

Qemu and vhost comparison

In this table, we can see that without vhost-user qemu spends a lot of time reading data and sending IOCTLs.

Ansible scripts available!

Setting this environment up and running is the first and fundamental step in order to understand, debug and test this architecture. In order to make it as quick and easy as possible, Red Hat’s virtio-net team has developed a set of Ansible scripts for everyone to use available on GitHub.

Just follow the instructions in the README and Ansible should take care of the rest.

Conclusions

In this post we have guided you through the process of creating a VM with QEMU and vhost-net, inspecting both guest and host in order to understand the ins and outs of this architecture. We have also shown the performance improvement that vhost-net traffic offloading brings to the table.

This is the last stop on a journey that started with "Introduction to virtio-networking and vhost-net," where an overview of the architecture was presented. That post was followed by "Deep dive into Virtio-networking and vhost-net" where a deeply technical explanation of all the componentes was given. Now, by showing how to actually set things up, we conclude this topic hoping to have provided enough resources for IT experts, architects and developers to understand the benefits of this technology and get started with it.

Stay tuned for the next topic that we will address: Userland networking and DPDK!

关于作者

Eugenio Pérez Martín

Software Engineer

Eugenio Pérez works as a Software Engineer in the Virtualization and Networking (virtio-net) team at Red Hat. He has been developing and promoting free software on Linux since his career start. Always closely related to networking, being with packet capture or classic monitoring. He enjoys to learn about how things are implemented and how he can expand them, keeping them simple (KISS) and focusing on maintainability and security.

Read full bio

Adrian Moreno Zapata

按频道浏览

探索所有频道