Vhost-net has silently become the default traffic offloading mechanism for qemu-kvm based virtual environments leveraging the standard virtio networking interface. This mechanism allows the network processing to be performed in a kernel module, freeing the qemu process and improving the overall network performance.
In the previous blog posts we have introduced the different elements that compose this architecture in a high level overview and we have provided a detailed technical explanation of how these elements work together. In this post, we will provide a step-by-step hands-on guide on how to set up this sample architecture. Once up and running, we will be able to inspect the main components in action.
This post is directed towards developers, hackers and anyone else interested in learning how network offloading is performed from real examples.
After going through this post (hopefully recreating the environment in your own PC), you will be familiar with the tools commonly used in virtualization (such as `virsh`), you will know how to set up a vhost-net environment and you will know how to inspect a running VM and measure its network performance.
For those of you who want to set up this environment quickly and start reverse engineering it directly, there’s a treat for you! An Ansible playbook is available that will automate the deployment of this sample scenario.
Setting things up
A computer running a Linux distribution. This guide is focused on Fedora 30; however, the commands should not change significantly for other Linux distros.
A user with
~ 25 G of free space in your home directory
At least 8GB of RAM
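Before installing anything, it may help to check the disk and memory figures listed above. The following is a minimal sketch, assuming a Linux host with `/proc/meminfo`; the function names `free_gib` and `total_ram_gib` are made up for this example. It only reports the values and does not enforce the thresholds, since the kernel reports slightly less memory than is physically installed.

```shell
#!/bin/sh
# Preflight sketch: report free disk space and total RAM in GiB.
# free_gib and total_ram_gib are hypothetical helper names, not part of any tool.

free_gib() {
    # df -Pk prints one POSIX-format line per filesystem in 1 KiB blocks;
    # column 4 is the available space.
    df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1048576 }'
}

total_ram_gib() {
    # MemTotal in /proc/meminfo is expressed in KiB.
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1048576 }' /proc/meminfo
}

home_dir="${HOME:-/}"
echo "free space in $home_dir: $(free_gib "$home_dir") GiB (the guide asks for about 25)"
echo "total RAM: $(total_ram_gib) GiB (the guide asks for at least 8)"
```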
First, let’s install the packages we are going to need:
user@host $ sudo dnf install qemu-kvm libvirt-daemon-qemu libvirt-daemon-kvm libvirt iperf3 virt-install libguestfs-tools-c kernel-tools
You will also need to install netperf if you want to perform the benchmarking. To do so, you can get one of the packages for your OS. At the time of writing, you can install the latest version by executing:
user@host $ sudo dnf install https://raw.githubusercontent.com/rpmsphere/x86_64/master/n/netperf-2.7.0-2.1.x86_64.rpm
You will also need to install it in the guest when the moment comes!
In order to be able to use libvirt, your user must be part of the libvirt group:
user@host $ sudo usermod -a -G libvirt $(whoami)
After modifying the group membership of a user, you might need to log in again for the change to be applied. Also, you’ll need to restart libvirt:
user@host $ sudo systemctl restart libvirtd
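To find out whether the current login session has already picked up the new group, something like the following sketch can be used. It relies on `id -nG` without a user name, which prints the groups of the running process rather than those stored in the group database; `session_in_group` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: check whether the *current session* is a member of a given group.
# session_in_group is a hypothetical helper name.

session_in_group() {
    # id -nG (no user argument) lists the group names of the calling process.
    id -nG | tr ' ' '\n' | grep -qxF -- "$1"
}

if session_in_group libvirt; then
    echo "this session is in the libvirt group"
else
    echo "this session is not in the libvirt group yet (log in again?)"
fi
```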
Creating a VM
First, download the latest Fedora-Cloud-Base image:
user@host $ sudo wget -O /var/lib/libvirt/images/Fedora-Cloud-Base-30-1.2.x86_64.qcow2 http://fedora.inode.at/releases/30/Cloud/x86_64/images/Fedora-Cloud-Base-30-1.2.x86_64.qcow2
(Note that the URL above might change; if it does, update it to the latest qcow2 image in http://fedora.inode.at/releases/30/Cloud/x86_64/images/)
This downloads a preinstalled version of Fedora 30, ready to run in an OpenStack environment. Since we’re not running OpenStack, we have to clean the image. Before doing that, we will create a new image backed by the downloaded one, so the original stays untouched and can be reused in the future:
user@host $ sudo qemu-img create -f qcow2 -b /var/lib/libvirt/images/Fedora-Cloud-Base-30-1.2.x86_64.qcow2 /var/lib/libvirt/images/virtio-test1.qcow2 20G
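Because the command above passes `-b`, the new image records the downloaded one as its backing file instead of duplicating its contents. A small sketch to confirm this from the text printed by `qemu-img info` follows; `backing_file` is a hypothetical helper, and it assumes the human-readable output contains a `backing file:` line, as it does on the versions I am aware of.

```shell
#!/bin/sh
# Sketch: extract the backing file path from `qemu-img info` text on stdin.
# backing_file is a hypothetical helper name.

backing_file() {
    # Keep only the "backing file:" line and strip an optional
    # " (actual path: ...)" suffix that qemu-img may append.
    sed -n '/^backing file: /{s/^backing file: //;s/ (actual path: .*)$//;p;}'
}
```

A possible invocation would be `sudo qemu-img info /var/lib/libvirt/images/virtio-test1.qcow2 | backing_file`, which should print the path of the Fedora-Cloud-Base image.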
The following libvirt commands can be executed with an unprivileged user (recommended) if we export the following variable:
user@host $ export LIBVIRT_DEFAULT_URI="qemu:///system"
Now, the cleaning command (change the password to your own):
user@host $ sudo virt-sysprep --root-password password:changeme --uninstall cloud-init --selinux-relabel -a /var/lib/libvirt/images/virtio-test1.qcow2
This command mounts the filesystem and applies some basic configuration automatically so that the image is ready to boot afresh.
We need a network to connect our VM as well. Libvirt handles networks much as it manages VMs: you can define a network using an XML file and start or stop it through the command line.
For this example, we will use a network called ‘default’ whose definition is shipped inside libvirt for convenience. The following commands define the ‘default’ network, start it and check that it is running.
user@host $ virsh net-define /usr/share/libvirt/networks/default.xml
Network default defined from /usr/share/libvirt/networks/default.xml
user@host $ virsh net-start default
Network default started
user@host $ virsh net-list
 Name      State    Autostart   Persistent
--------------------------------------------
 default   active   no          yes
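If you want to script the "is the network running" check, one option is to parse the `virsh net-list` table. This is only a sketch: `net_is_active` is a hypothetical helper, and it assumes the column layout shown above (name in the first column, state in the second).

```shell
#!/bin/sh
# Sketch: succeed if the named libvirt network is listed as "active".
# Reads `virsh net-list --all` output on stdin; net_is_active is a hypothetical name.

net_is_active() {
    awk -v name="$1" '
        $1 == name && $2 == "active" { found = 1 }
        END { exit !found }
    '
}
```

For example, `virsh net-list --all | net_is_active default && echo "default is up"`. The Autostart column above reads "no", so the network will not come back after a reboot unless it is started again or marked with `virsh net-autostart default`.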
Finally, we can use virt-install to create the VM. This command line utility creates the needed definitions for a set of well known operating systems. This will give us the base definitions that we can then customize:
user@host $ virt-install --import --name virtio-test1 --ram=4096 --vcpus=2 \
    --nographics --accelerate \
    --network network:default,model=virtio --mac 02:ca:fe:fa:ce:01 \
    --debug --wait 0 --console pty \
    --disk /var/lib/libvirt/images/virtio-test1.qcow2,bus=virtio --os-variant fedora30
The options used for this command specify the number of vCPUs, the amount of RAM of our VM as well as the disk path and the network we want the VM to be connected to.
Apart from defining the VM according to the options that we specified, the virt-install command should have also started the VM for us so we should be able to list it:
user@host $ virsh list
 Id   Name           State
------------------------------
 1    virtio-test1   running
Voilà! Our VM is running.
Just as a reminder, `virsh` is a command line interface to the libvirt daemon. You can start a VM by running:
user@host $ virsh start virtio-test1
Jump into the console by running:
user@host $ virsh console virtio-test1
Stop the VM by running:
user@host $ virsh shutdown virtio-test1
And delete the VM (don’t do it now unless you want to create it again!) by running:
user@host $ virsh undefine virtio-test1
Inspecting the guest
As already mentioned, the `virt-install` command has automatically created and started a VM using libvirt. Every VM created using libvirt is described by an XML file that defines the hardware being emulated. Let’s have a look at the relevant parts of this file by dumping its content:
user@host $ virsh dumpxml virtio-test1
Specifically, let’s look at the network device that `virt-install` created for us:
<devices>
...
  <interface type='network'>
    <mac address='02:ca:fe:fa:ce:01'/>
    <source network='default' bridge='virbr0'/>
    <target dev='vnet0'/>
    <model type='virtio'/>
    <alias name='net0'/>
    <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
  </interface>
…
</devices>
We can see that a virtio device has been created and that it is connected to a network that, among other things, comprises a Linux bridge (called `virbr0`). Also, the device has been assigned a PCI domain, bus and slot.
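When the full XML is long, it can be handy to pull a single attribute out of the `<interface>` element. The sketch below does this with `sed` only; `iface_attr` is a hypothetical helper, and it assumes the layout printed by `virsh dumpxml` (one element per line, single-quoted attribute values) and a single network interface.

```shell
#!/bin/sh
# Sketch: print attribute $2 of element $1 inside the <interface> block.
# Reads domain XML on stdin; iface_attr is a hypothetical helper name.

iface_attr() {
    # Restrict the substitution to the lines between <interface ...> and
    # </interface>, so that e.g. a disk <target dev='vda'/> is ignored.
    sed -n "/<interface /,/<\/interface>/ s/.*<$1 [^>]*$2='\([^']*\)'.*/\1/p"
}
```

For instance, `virsh dumpxml virtio-test1 | iface_attr target dev` should print the tap device name (`vnet0` in the listing above), and `iface_attr model type` should print `virtio`.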
Now, let’s jump into the VM’s console and see what it looks like from the inside:
user@host $ virsh console virtio-test1
Once logged in (use the password you configured in the `virt-sysprep` step) and before going any further, let’s install some packages:
[root@guest ~]# dnf install pciutils iperf3
Now, let’s look around. We can see that there is, indeed, a network device on our virtual PCI bus (modify the PCI bus according to your XML or inspect with
[root@localhost ~]# lspci -s 0000:01:00.0 -v
01:00.0 Ethernet controller: Red Hat, Inc. Virtio network device (rev 01)
        Subsystem: Red Hat, Inc. Device 1100
        Physical Slot: 0
        Flags: bus master, fast devsel, latency 0, IRQ 21
        Memory at fda40000 (32-bit, non-prefetchable) [size=4K]
        Memory at fea00000 (64-bit, prefetchable) [size=16K]
        Expansion ROM at fda00000 [disabled] [size=256K]
        Capabilities: [dc] MSI-X: Enable+ Count=3 Masked-
        Capabilities: [c8] Vendor Specific Information: VirtIO: <unknown>
        Capabilities: [b4] Vendor Specific Information: VirtIO: Notify
        Capabilities: [a4] Vendor Specific Information: VirtIO: DeviceCfg
        Capabilities:  Vendor Specific Information: VirtIO: ISR
        Capabilities:  Vendor Specific Information: VirtIO: CommonCfg
        Capabilities: [7c] Power Management version 3
        Capabilities:  Express Endpoint, MSI 00
        Kernel driver in use: virtio-pci
Apart from the typical PCI information (such as the memory regions and capabilities), we see that the driver used is virtio-pci. This driver implements the common virtio over PCI functionality and creates a virtio device that is then driven by virtio_net as we can see if we inspect the PCI device a bit further:
[root@localhost ~]# readlink /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/virtio0/driver
../../../../../bus/virtio/drivers/virtio_net
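The PCI path used above depends on the guest's topology. A more generic way to see which driver is bound to each virtio device is to walk `/sys/bus/virtio/devices`. This is a sketch under that assumption; `list_virtio_drivers` is a hypothetical helper, and its optional argument exists only so the function can be pointed at a different sysfs root.

```shell
#!/bin/sh
# Sketch: print "<virtio device> <driver>" for every bound virtio device.
# list_virtio_drivers is a hypothetical helper name.

list_virtio_drivers() {
    root="${1:-/sys}"
    for dev in "$root"/bus/virtio/devices/*; do
        # Each bound device has a "driver" symlink pointing at its driver directory.
        [ -L "$dev/driver" ] || continue
        printf '%s %s\n' "$(basename "$dev")" "$(basename "$(readlink "$dev/driver")")"
    done
}

list_virtio_drivers
```

Run inside the guest, this would be expected to include a line such as `virtio0 virtio_net`, although the device numbering can differ.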
The virtio_net driver is the one in charge of creating a network interface for the rest of the operating system to use:
[root@localhost ~]# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
    link/ether 02:ca:fe:fa:ce:01 brd ff:ff:ff:ff:ff:ff
Inspecting the host
Ok, so we’ve had a look at the guest, now let’s look at the host. Note that if we configure an interface of type ‘network’ then the default behaviour is to use vhost-net.
First of all, let’s see if the vhost-net module is loaded:
user@host $ sudo lsmod | grep vhost
vhost_net              32768  1
vhost                  53248  1 vhost_net
tap                    28672  1 vhost_net
tun                    57344  4 vhost_net
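The same check can be scripted by reading `/proc/modules`, which is where `lsmod` gets its data. The sketch below uses a hypothetical `module_loaded` helper; note that it would report "not loaded" on a kernel where vhost-net is built in rather than compiled as a module.

```shell
#!/bin/sh
# Sketch: succeed if a kernel module appears in /proc/modules.
# module_loaded is a hypothetical helper; $2 optionally overrides the file to read.

module_loaded() {
    # The first field of each /proc/modules line is the module name.
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}
```

A typical use would be `module_loaded vhost_net || sudo modprobe vhost_net`.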
We can also check that QEMU interacts with the tun, kvm and vhost-net devices, and see which file descriptors the QEMU process has assigned to them, by examining the /proc filesystem (the actual file descriptor numbers can vary):
user@host $ ls -lh /proc/$(pgrep qemu)/fd | grep '/dev'
...
lrwx------. 1 qemu qemu 64 Aug 27 06:38 13 -> /dev/kvm
lrwx------. 1 qemu qemu 64 Aug 27 06:38 30 -> /dev/net/tun
lrwx------. 1 qemu qemu 64 Aug 27 06:38 31 -> /dev/vhost-net
This means that the qemu process, apart from opening a kvm device to perform the actual virtualization and creating a tun/tap device, has also opened a vhost-net device. We can also see that the vhost kernel thread associated with our `qemu` instance has been created:
user@host $ ps -ef | grep '\[vhost'
root     21743     2  0 14:11 ?        00:00:00 [vhost-21702]
user@host $ pgrep qemu
21702
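As the listing shows, the kernel thread is named after the PID of the QEMU process that owns it. That naming makes it possible to match the two programmatically. The following sketch assumes that convention and a `ps` that supports `-o pid=,comm=`; `vhost_threads` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: print the PIDs of vhost kernel threads belonging to a QEMU process.
# Reads `ps -e -o pid=,comm=` output on stdin; $1 is the QEMU PID.
# vhost_threads is a hypothetical helper name.

vhost_threads() {
    # The comm column shows the thread name without the square brackets.
    awk -v want="vhost-$1" '$2 == want { print $1 }'
}
```

With a single VM running, `ps -e -o pid=,comm= | vhost_threads "$(pgrep qemu)"` would be expected to print one PID (21743 in the example above).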
Lastly, we can see the tun interface created by the `qemu` process (also seen in the `qemu` open file descriptors) and the bridge joining host and guest. Note that, although the tap device is attached to the `qemu` process, the vhost-net kernel thread is the actual tap reader and writer.
user@host $ ip -d tuntap
virbr0-nic: tap persist
        Attached to processes:
vnet0: tap vnet_hdr
        Attached to processes:qemu-system-x86(21702)
vhost is up and running, and `qemu` is connected to it. Now, let’s generate some traffic to see how the system performs.
If you have followed the previous steps properly, you can send data from the host to the guest and vice versa using their IP addresses. For example, test the network performance using iperf3. Note that these measurements are not proper benchmarks: tiny variations in any parameter, such as software or hardware versions or network stack settings, can alter the results significantly. Performance tuning and usage-specific benchmarking are outside the scope of this document.
First check the IP address of the guest, and run an `iperf3` server on it (or whatever tool you want to use to check connectivity or do benchmarking):
[root@guest ~]# ip addr
...
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
    link/ether 02:ca:fe:fa:ce:01 brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.41/24 brd 192.168.122.255 scope global dynamic noprefixroute eth0
       valid_lft 2534sec preferred_lft 2534sec
    inet6 fe80::ca:feff:fefa:ce01/64 scope link
       valid_lft forever preferred_lft forever
[root@localhost ~]# iperf3 -s
Then run the `iperf3` client in the host:
user@host $ iperf3 -c 192.168.122.41
...
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  26.3 GBytes  22.6 Gbits/sec    0             sender
[  5]   0.00-10.04  sec  26.3 GBytes  22.5 Gbits/sec                  receiver
In the `iperf3` output we can see a transfer speed of about 22.5 Gbit/sec, as reported by both the sender and the receiver (remember that network bandwidth depends on a lot of things, so don’t expect to get the same in your environment). We can modify the size of the packet (`-l` option) to stress the data plane more.
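To relate the Transfer and Bitrate columns of the iperf3 summary, the arithmetic can be sketched as below. It assumes that iperf3 counts GBytes in units of 2^30 bytes and Gbits in units of 10^9 bits, which is consistent with the figures above; `gbytes_to_gbits` is a hypothetical helper, and the input is already rounded by iperf3, so the result is approximate.

```shell
#!/bin/sh
# Sketch: convert an iperf3 "Transfer" figure into a bitrate.
# Assumes 1 GByte = 2^30 bytes and 1 Gbit = 10^9 bits.
# gbytes_to_gbits is a hypothetical helper name.

gbytes_to_gbits() {
    # $1: transferred GBytes, $2: interval length in seconds
    awk -v gb="$1" -v secs="$2" \
        'BEGIN { printf "%.1f\n", gb * 1073741824 * 8 / secs / 1e9 }'
}

gbytes_to_gbits 26.3 10.00   # sender line   -> 22.6
gbytes_to_gbits 26.3 10.04   # receiver line -> 22.5
```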
If we run `top` during the `iperf3` test we can see that the vhost-$pid kernel thread is using one full core for packet forwarding, and that QEMU is using almost two cores.
user@host $ top
...
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
14548 qemu      20   0 6094256   1.1g  22304 S 180.5   3.5   1:37.80 qemu-system-x86
14586 root      20   0       0      0      0 R  99.7   0.0   0:20.83 vhost-14548
14753 root      20   0    3904   2360   2116 R  57.0   0.0   0:13.39 iperf3
To measure latency, we use `netperf`: first start a netperf server in the guest, then run the following command in the host to measure latency:
user@host $ netperf -l 30 -H 192.168.122.41 -p 16604 -t TCP_RR
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec
212992 212992 1        1       30.00    28822.60
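The TCP_RR test reports a transaction rate rather than a latency. Assuming the default behaviour of one outstanding request at a time, the mean round-trip time is simply the inverse of that rate. A small sketch of the conversion follows (`rr_latency_us` is a hypothetical helper):

```shell
#!/bin/sh
# Sketch: derive the mean round-trip latency from a netperf TCP_RR rate.
# Assumes a single outstanding transaction, so latency = 1 / rate.
# rr_latency_us is a hypothetical helper name.

rr_latency_us() {
    # $1: transactions per second; prints microseconds per transaction
    awk -v rate="$1" 'BEGIN { printf "%.1f\n", 1000000 / rate }'
}

rr_latency_us 28822.60   # -> 34.7
```

So the run above corresponds to roughly 35 microseconds per request/response pair between host and guest.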
Extra: Disable vhost-net
user@host $ virsh shutdown virtio-test1
user@host $ virsh edit virtio-test1 # it will open your editor
<devices>
...
  <interface type='network'>
    <mac address='02:ca:fe:fa:ce:01'/>
    <source network='default' bridge='virbr0'/>
    <target dev='vnet0'/>
    <model type='virtio'/>
    <driver name="qemu"/>
    <alias name='net0'/>
    <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
  </interface>
…
</devices>
Exit your editor, and restart the VM with:
user@host $ virsh start virtio-test1
You can check that there is no file descriptor pointing to `/dev/vhost-net` anymore.
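One way to script this check is to walk the file descriptors of the QEMU process and look for a link to `/dev/vhost-net`. The sketch below uses a hypothetical `has_vhost_fd` helper; its second argument only exists to allow pointing it at a different proc root. Reading another user's `/proc/<pid>/fd` usually requires root.

```shell
#!/bin/sh
# Sketch: succeed if process $1 has /dev/vhost-net open.
# has_vhost_fd is a hypothetical helper; $2 optionally overrides /proc.

has_vhost_fd() {
    for fd in "${2:-/proc}/$1"/fd/*; do
        # Each entry under fd/ is a symlink to the file the descriptor refers to.
        [ "$(readlink "$fd" 2>/dev/null)" = "/dev/vhost-net" ] && return 0
    done
    return 1
}
```

When called from a root shell as `has_vhost_fd "$(pgrep qemu)" && echo "vhost-net in use" || echo "vhost-net not in use"`, this should print the second message once the `<driver name="qemu"/>` change is in place, assuming a single qemu process.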
Analyzing the performance impact
If we repeat the benchmarking from the previous section without vhost-net, no vhost-net kernel thread is shown in the top output, and we can see a performance drop (to about 19.2 Gbit/sec):
user@host $ iperf3 -c 192.168.122.41
...
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  22.4 GBytes  19.2 Gbits/sec    2             sender
[  5]   0.00-10.00  sec  22.4 GBytes  19.2 Gbits/sec                  receiver
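To put the two throughput figures from this post side by side (about 22.5 Gbit/sec with vhost-net and 19.2 Gbit/sec without), the relative drop can be computed as in the sketch below. `pct_drop` is a hypothetical helper, and the result only describes this particular run.

```shell
#!/bin/sh
# Sketch: relative drop, in percent, from a baseline value to a new value.
# pct_drop is a hypothetical helper name.

pct_drop() {
    # $1: baseline (with vhost-net), $2: new value (without vhost-net)
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (a - b) / a * 100 }'
}

pct_drop 22.5 19.2   # -> 14.7
```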
Also, if we use the top command we see an increase in the CPU usage of the qemu process, from ~180% to 190-260%, and of course no trace of the vhost-$pid kernel thread.
user@host $ top
...
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 1954 qemu      20   0 6007188 569192  22020 R 220.6   1.7   2:39.32 qemu-system-x86
If we compare the TCP and UDP latency of both architectures, we see that the use of `vhost-net` provides a consistent improvement with respect to QEMU:
Another good indicator of what’s going on is the number of IOCTLs that qemu has to send to KVM. This is because every time there is an I/O event that has to be handled by qemu, it needs to process it and then send an IOCTL to KVM in order to switch the context back to the guest. We can analyze the time qemu spends on each syscall with the `strace` command. The following table compares the results obtained with and without the vhost-net driver:
In this table, we can see that without vhost-net, `qemu` spends a lot of time reading data and sending IOCTLs.
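To collect comparable numbers yourself, one possible approach is to attach `strace -c` to the running qemu process during an iperf3 run and read the per-syscall summary it prints on exit. The helper below is a sketch for pulling the call count of one syscall out of that summary; `syscall_calls` is a hypothetical name, and it assumes the usual `strace -c` column order (% time, seconds, usecs/call, calls, errors, syscall).

```shell
#!/bin/sh
# Sketch: print the "calls" column for one syscall from a `strace -c` summary.
# Reads the summary on stdin; syscall_calls is a hypothetical helper name.

syscall_calls() {
    # The syscall name is the last field; "calls" is the fourth field whether
    # or not the optional "errors" column is filled in.
    awk -v name="$1" '$NF == name { print $4 }'
}
```

For example, run `sudo strace -c -f -o summary.txt -p "$(pgrep qemu)"` while iperf3 is running, stop it with Ctrl-C, and then compare `syscall_calls ioctl < summary.txt` between the vhost-net and non-vhost-net configurations.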
Ansible scripts available!
Getting this environment up and running is the first and fundamental step in order to understand, debug and test this architecture. To make it as quick and easy as possible, Red Hat’s virtio-net team has developed a set of Ansible scripts, available on GitHub for everyone to use.
Just follow the instructions in the README and Ansible should take care of the rest.
In this post we have guided you through the process of creating a VM with QEMU and vhost-net, inspecting both guest and host in order to understand the ins and outs of this architecture. We have also shown the performance improvement that vhost-net traffic offloading brings to the table.
This is the last stop on a journey that started with "Introduction to virtio-networking and vhost-net," where an overview of the architecture was presented. That post was followed by "Deep dive into Virtio-networking and vhost-net," where a deeply technical explanation of all the components was given. Now, by showing how to actually set things up, we conclude this topic hoping to have provided enough resources for IT experts, architects and developers to understand the benefits of this technology and get started with it.
Stay tuned for the next topic that we will address: Userland networking and DPDK!