Hardening QEMU through continuous security testing

May 21, 2020 | Bandan Das, Alexander Bulekov

Red Hat’s virtualization ecosystem consists of QEMU, an emulator; the Kernel-based Virtual Machine (KVM), a driver in the Linux kernel; and many other software projects built around QEMU and KVM. These projects, or subsets of them, are the backbone of Red Hat products such as Red Hat Virtualization and Red Hat OpenStack Platform. While KVM relies on the processor’s hardware virtualization extensions to virtualize the CPU, QEMU is responsible for emulating the devices that provide input/output (I/O) functionality to guests.

QEMU’s role in the virtualization stack

QEMU implements the bulk of device emulation, including many paravirtualized I/O devices that follow the VIRTIO specification. QEMU is also the direct userspace interface to KVM: it implements a process abstraction on top of KVM so that guests can be managed as normal processes.
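KVM exposes its services to userspace through ioctl() calls on /dev/kvm, and a VMM such as QEMU builds its per-guest process on top of them. The minimal sketch below, written against the Linux KVM headers, only opens the device, checks the API version and creates an empty VM. The function name create_vm is ours, and QEMU’s actual KVM code additionally sets up guest memory, vCPU threads and the run loop.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <linux/kvm.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Open the KVM device, check the API version and create an empty VM.
 * Returns the VM file descriptor, or -1 if KVM is not usable here. */
static int create_vm(void)
{
    int kvm = open("/dev/kvm", O_RDWR | O_CLOEXEC);
    if (kvm < 0)
        return -1;                      /* no KVM on this host */

    /* The stable KVM API reports version 12 (KVM_API_VERSION). */
    if (ioctl(kvm, KVM_GET_API_VERSION, 0) != KVM_API_VERSION) {
        close(kvm);
        return -1;
    }

    /* Each VM is a file descriptor; memory regions and vCPUs would be
     * added with further ioctls on this fd (not shown). */
    int vm = ioctl(kvm, KVM_CREATE_VM, 0);
    close(kvm);                         /* the VM fd keeps its own reference */
    return vm;
}
```

On a host without KVM access the function simply returns -1, so the same binary runs anywhere.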

QEMU is the backend to libvirt, which typically sits on top of QEMU and talks to the Linux applications that consume virtualization functionality. Because of this layering, users and application developers may assume that the security of their data and environment depends only on libvirt. In reality, libvirt is a link between applications and the functionality QEMU provides, so QEMU’s own security is just as important.

Figure 1: Virtualization, a key component of Red Hat’s open hybrid cloud model

Security vulnerabilities in emulated devices can be critical, since they can put the host system and the data in other virtual machines at risk of being exploited by malicious actors.

Well-known security bugs in emulated devices include VENOM (CVE-2015-3456), a vulnerability in the virtual floppy disk controller; CVE-2016-4964, which affects the mptsas device; and CVE-2019-6778, an overflow in QEMU’s SLIRP implementation.

Developers and security researchers are in a constant battle to stay ahead of vulnerabilities lurking in crucial pieces of software, which attackers could use to compromise end users’ data and application environments. In this post, we describe a few of the ways in which QEMU developers proactively apply static and dynamic testing techniques to locate and fix bugs in the code.

A look at the security process

The QEMU project follows a well-defined process to identify, report and fix security vulnerabilities, as outlined on QEMU’s Security Process page. It consists of the following steps:

Vulnerability reporting: The reporter contacts a closed group of QEMU developers and stakeholders with a detailed description of the issue and an explanation of why they think it’s a security vulnerability.
Determination: The security team determines whether the report is indeed a security vulnerability. If it is not, but a fix is still required, the fix follows the usual process for non-security QEMU bug fixes.
Impact and severity: If the issue is found to be a security bug, the security team formally evaluates the ease of exploitation and extent of the potential damage by an attacker.
Publication embargo: If the vulnerability hasn’t been publicly disclosed, the security team and the reporter agree upon a deadline for public disclosure. The length of this embargo may depend on the severity and impact of the vulnerability and is generally less than 15 days.
CVE assignment: If a CVE number hasn’t been assigned yet, one is requested and an entry is added to the CVE database.
Identifying a fix: The security team collaborates with the reporter and additional QEMU developers to rapidly identify, implement and deploy a fix for the vulnerability.

While the QEMU security process streamlines reporting and fixing vulnerabilities, QEMU developers also work constantly to limit the attack surface by following the principle of least privilege, and to find problem areas using techniques such as static analysis and fuzzing.

Principle of least privilege

Besides hiding low-level implementation details from high-level applications, the pairing of libvirt and QEMU has another advantage: it makes it easier to follow good security practices. For example, libvirt can use SELinux labels managed by sVirt to confine QEMU processes and prevent unauthorized access. This would be cumbersome if end users and applications had to deal with QEMU directly.

Figure 2: Classic x86 rings depicting principle of least privilege (By Hertzsprung on English-language Wikipedia, CC BY-SA 3.0)

QEMU has also made several design decisions, some of them in work-in-progress projects, that make it easier to apply the principle of least privilege. These include the multi-process QEMU project, which splits QEMU services into separate processes much like a microkernel, and a module system that loads features on demand.

Static analysis

In its simplest form, static analysis checks code formatting and reports style errors. A popular example is the checkpatch.pl script in the Linux kernel. More advanced analysis can point out common programming bugs, such as off-by-one errors, use-after-free and buffer overflows, without having to run QEMU.
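As an illustration of the kind of defect a static analyzer can report without executing anything, the following sketch shows a guest-controlled register index guarded by a bounds check. The toy_dev device and its write function are hypothetical, not QEMU code; the comment describes the off-by-one variant an analyzer would flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical emulated device with a small register file. */
#define TOY_NUM_REGS 8
struct toy_dev {
    uint32_t regs[TOY_NUM_REGS];
};

/* Handle a guest write to register `idx`; `idx` is guest-controlled.
 * A buggy variant that checks `idx > TOY_NUM_REGS` would let
 * idx == TOY_NUM_REGS write one element past regs[], the kind of
 * off-by-one that bounds-tracking static analyzers report. */
static bool toy_dev_reg_write(struct toy_dev *d, uint32_t idx, uint32_t val)
{
    if (idx >= TOY_NUM_REGS)        /* correct bounds check */
        return false;               /* ignore out-of-range guest access */
    d->regs[idx] = val;
    return true;
}
```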

Figure 3: Fixing improper usage of return values, spotted by static analysis
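The class of bug in Figure 3, an unchecked return value, can be sketched in the same spirit. The hypothetical lookup helper below returns -1 on failure; using that result directly as an array index would write before the start of the array, which is why analyzers warn about unchecked negative returns. All names are illustrative, and this is not the fix shown in the figure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lookup: return the index of `addr` in `table`, or -1. */
static int find_slot(const uint64_t *table, size_t n, uint64_t addr)
{
    for (size_t i = 0; i < n; i++) {
        if (table[i] == addr)
            return (int)i;
    }
    return -1;
}

/* Mark the slot holding `addr` as used. Writing used[find_slot(...)]
 * directly would index used[-1] when the lookup fails; checking the
 * return value first is the fix analyzers ask for. */
static bool mark_slot_used(const uint64_t *table, bool *used, size_t n,
                           uint64_t addr)
{
    int slot = find_slot(table, n, addr);
    if (slot < 0)
        return false;               /* propagate the failure instead */
    used[slot] = true;
    return true;
}
```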

To understand the impact of static analysis, we identified QEMU commits that fix defects found by static analysis. The graph below shows how effectively the QEMU project has been using static analysis. The actual number of such fixes may be larger, since we relied on keywords in commit messages to identify them.

Figure 4: Defects fixed after Coverity scans

Automated scans that run at regular intervals help catch known classes of defects in newly merged code. Many QEMU developers also run static analysis tools as part of their development workflow.

Fuzzing

Fuzzing is a dynamic software testing technique that feeds randomized inputs to a program under test. Coverage-guided fuzzing augments this basic principle by evaluating and prioritizing inputs based on the code coverage they achieve. Fuzzing has been applied successfully to find bugs in a wide range of software, but applying it to a target such as QEMU poses several challenges:

QEMU exposes a vast input space to VMs in the form of virtual devices. The fuzzer must be connected to this input space.
The fuzzing framework should be accessible to virtual device developers. Ideally, the difference between writing standard device testing code and writing fuzzing "harnesses" should be minimal.
Hypervisors and virtual devices are stateful systems. To ensure that inputs are reproducible, their state must be reset after each input is executed.
Each randomized input has a relatively low chance of triggering previously unseen program behavior, so a useful fuzzer must have high performance, i.e. a high rate of input execution.

QEMU's fuzzing framework addresses each of these challenges. The framework relies on QEMU's existing testing system, QTest. By leveraging QTest, the fuzzer can make use of high-level abstractions which facilitate communication over the variety of interfaces attached to virtual devices.
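Under the hood, QTest drives the VM through a line-based text protocol; libqtest helpers such as qtest_outl() and qtest_inl() send lines like "outl 0xcf8 0x..." and "inl 0xcfc". The sketch below assumes that protocol and formats the two commands for a 32-bit PCI configuration read through the legacy 0xCF8/0xCFC ports. The helper names are hypothetical; real tests call libqtest rather than building strings by hand.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* PCI configuration mechanism #1: the address goes to port 0xCF8,
 * the data is read from port 0xCFC. */
static uint32_t pci_cfg_addr(uint8_t bus, uint8_t dev, uint8_t fn, uint8_t off)
{
    return 0x80000000u                      /* enable bit */
         | ((uint32_t)bus << 16)
         | ((uint32_t)(dev & 0x1f) << 11)
         | ((uint32_t)(fn & 0x07) << 8)
         | (uint32_t)(off & 0xfc);          /* dword-aligned register */
}

/* Hypothetical helper: write the QTest commands that read a 32-bit PCI
 * config register into buf. Returns snprintf's result. */
static int qtest_pci_cfg_read_cmds(char *buf, size_t len, uint8_t bus,
                                   uint8_t dev, uint8_t fn, uint8_t off)
{
    return snprintf(buf, len, "outl 0xcf8 0x%" PRIx32 "\ninl 0xcfc\n",
                    pci_cfg_addr(bus, dev, fn, off));
}
```

A fuzzer built on such an abstraction can emit port and memory accesses without knowing how each bus is wired into the VM.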

The QTest framework is familiar to QEMU developers. Writing a fuzzer for a particular virtual device is nearly identical to writing testing code. Where a standard test relies on hard-coded inputs to verify device functionality, the fuzzing code sources inputs from a randomized buffer. The fuzzing framework allows the developer to specify a strategy for executing each input.
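The difference between a test and a fuzz target can be shown with a hypothetical toy device. The sketch assumes a libFuzzer-style entry point, LLVMFuzzerTestOneInput, which the fuzzing engine calls once per generated input; to stay self-contained it calls the toy device directly instead of going through QTest. The conventional test and the harness make the same calls, but only the harness takes its register/value pairs from the randomized buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy device standing in for a real device model. */
#define TOY_NREGS 4
struct toy_dev {
    uint8_t regs[TOY_NREGS];
    unsigned writes;                /* number of guest writes handled */
};

static void toy_dev_write(struct toy_dev *d, uint8_t reg, uint8_t val)
{
    if (reg < TOY_NREGS)            /* ignore writes to unknown registers */
        d->regs[reg] = val;
    d->writes++;
}

/* Conventional test: hard-coded input, hard-coded expectation. */
static void test_toy_dev_write(void)
{
    struct toy_dev d = {0};
    toy_dev_write(&d, 1, 0xab);
    assert(d.regs[1] == 0xab);
}

/* Fuzz harness: the same device calls, but every (reg, val) pair comes
 * from the fuzzer-provided buffer. A libFuzzer-style engine calls this
 * once per input and treats crashes or sanitizer reports as findings. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    struct toy_dev d = {0};         /* fresh device state per input */
    for (size_t i = 0; i + 1 < size; i += 2)
        toy_dev_write(&d, data[i], data[i + 1]);
    assert(d.writes == size / 2);   /* one write per complete pair */
    return 0;
}
```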

For example, with the reboot strategy, the VM is rebooted after each input. With the fork strategy, the VM is booted only once and each input is executed in a forked child process, so that no state leaks between executions.
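The fork strategy can be sketched with plain POSIX calls. In this hypothetical example, dev_state stands in for the state of the VM that the parent booted once; each input runs in a child created with fork(), and the parent only learns whether the child crashed. Because the child writes to its own copy of memory, the next input always starts from the same state. A real implementation also has to make the child’s coverage data visible to the parent, which this sketch omits.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical device state, set up once by the parent ("booted VM"). */
static int dev_state;

/* Hypothetical executor: each byte updates the device state; 0xff
 * simulates a bug that crashes the device model. */
static void run_input(const uint8_t *data, size_t size)
{
    for (size_t i = 0; i < size; i++) {
        if (data[i] == 0xff)
            abort();
        dev_state += data[i];
    }
}

/* Fork strategy: execute one input in a child process. The child's
 * changes to dev_state are discarded when it exits.
 * Returns 1 if the child was killed by a signal (a crash), 0 if it
 * exited, or -1 if fork/waitpid failed. */
static int run_forked(const uint8_t *data, size_t size)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {                 /* child: run the input, then exit */
        run_input(data, size);
        _exit(0);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? 1 : 0;
}
```

The reboot strategy gives the same isolation by resetting the VM instead, at the cost of a full reset per input; forking reuses the booted state, which matters for the execution rate.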

After an input has been executed, the parent process examines the code coverage of the child (executor) process to determine whether the input led to previously unseen behavior. These “interesting” inputs serve as the basis for future mutations by the fuzzing engine.
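This feedback loop can be reduced to a few dozen lines. In the sketch below, a hypothetical target marks entries in a coverage map by hand (real fuzzers get this from compiler instrumentation), each candidate is a single-byte mutation of a corpus entry, and a candidate is kept only if it reaches a branch that no earlier input reached. The xorshift generator and the fixed four-byte inputs are simplifications for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAP_SIZE 64
#define INPUT_LEN 4
#define MAX_CORPUS 64

static uint8_t cov[MAP_SIZE];    /* branches hit by the current input */
static uint8_t seen[MAP_SIZE];   /* branches hit by any input so far */

/* Hypothetical target; a real fuzzer gets cov[] from compiler
 * instrumentation instead of manual marks. */
static void target(const uint8_t *in)
{
    cov[0] = 1;
    if (in[0] == 'Q') {
        cov[1] = 1;
        if (in[1] == 'E')
            cov[2] = 1;
    }
    if (in[2] & 0x80)
        cov[3] = 1;
}

/* Execute one input; return true if it hit a branch never hit before. */
static bool run_and_check(const uint8_t *in)
{
    bool is_new = false;
    memset(cov, 0, sizeof cov);
    target(in);
    for (size_t i = 0; i < MAP_SIZE; i++) {
        if (cov[i] && !seen[i]) {
            seen[i] = 1;
            is_new = true;
        }
    }
    return is_new;
}

static uint32_t rng_state = 2463534242u;    /* xorshift32 state */

static uint32_t rng(void)
{
    rng_state ^= rng_state << 13;
    rng_state ^= rng_state >> 17;
    rng_state ^= rng_state << 5;
    return rng_state;
}

/* Coverage-guided loop: mutate one byte of a corpus entry and keep the
 * result only if it is "interesting". Returns the final corpus size. */
static size_t fuzz(size_t iterations)
{
    static uint8_t corpus[MAX_CORPUS][INPUT_LEN];
    size_t n = 1;                           /* corpus[0]: all-zero seed */

    memset(corpus, 0, sizeof corpus);
    memset(seen, 0, sizeof seen);
    run_and_check(corpus[0]);

    for (size_t it = 0; it < iterations && n < MAX_CORPUS; it++) {
        uint8_t in[INPUT_LEN];
        memcpy(in, corpus[rng() % n], INPUT_LEN);
        size_t pos = rng() % INPUT_LEN;
        in[pos] = (uint8_t)rng();           /* single-byte mutation */
        if (run_and_check(in))
            memcpy(corpus[n++], in, INPUT_LEN);
    }
    return n;
}
```

With four instrumented branches the corpus can hold at most four entries. Reaching the nested 'Q', 'E' branch requires first keeping an input that starts with 'Q', which is the stepping-stone effect that coverage guidance provides.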

Figure 5: Fuzzing the emulated network device

Recently, QEMU joined the OSS-Fuzz program for continuous fuzzing of open-source software. As part of OSS-Fuzz, changes to the source code are fuzzed as soon as they land upstream, which allows potential bugs to be found before they reach a release.

Fuzzing has found bugs in a variety of devices, including VIRTIO devices, the console and the PCI implementation. This demonstrates that fuzzing is an effective way of locating bugs in hypervisors, but there is still much room for improvement: support for fuzzing additional virtual devices, better performance, and automated generation of crash-reproducing code that can be sent to device maintainers and used for regression testing.

Conclusion

The static and dynamic bug-finding techniques used by QEMU developers improve the security of each release and provide a high level of protection to end users. Static analysis with Coverity has led to over 1,000 bug fixes in QEMU. Of these, nearly 50 have been annotated as buffer overflows, a dangerous type of vulnerability.

Fuzzing support has recently been merged into mainline QEMU and has already helped identify and patch bugs in the serial console and VIRTIO implementations. To date, the fuzzer has produced over 100 uniquely crashing inputs. Fuzzing has proven to be an effective tool, and we look forward to applying it to other emulated devices exposed by QEMU, such as the USB framework, and to user-facing interfaces such as SPICE.

Preventing security vulnerabilities is an ongoing battle. The QEMU project has been hard at work to identify potential security issues so that QEMU stays a powerful hypervisor for enabling cloud computing.