[libvirt] RFC: Improving unit test coverage

Mon May 20 15:45:24 UTC 2013

Over the many years that libvirt has been in existance, we have
developed a huge number of features, and rate of change in the
codebase shows no sign of slowing down.

Unfortunately the rate of regressions and/or new bugs also shows
no sign of decreasing. As libvirt is used ever more widely, poor
quality releases are having an increasingly serious negative
impact on the impressions of libvirt by downstream consumers.

In addition the introduction of the new access control facilities
will have the unfortunate side-effect of increasing the severity
of many bugs. Historically a read-write connection to libvirtd
could be considered to be equivalent to root, so a bug affecting
an authenticated user would not be considered a privilege
escalation, merely denial-of-service. With the introduction of
access control, a read-write connection to libvirtd may have
substantially fewer privileges than root. The result is that
many (arguably all) crasher bugs in libvirtd which were previously
considered mere denial of service issues, and not worthy of
security alerts, will now be classed as potential privilege
escalation flaws.

It is clear that the current rate of defects in libvirt code
is no longer tenable / acceptable. There are a number of test
suites which are targetting libvirt, but this document will focus
on the unit testing framework, rather than the integration test
harnesses. The rationale for this is

 - Unit tests are run by all developers daily via 'make check'
 - Unit tests gate the release process
 - Unit tests are run every time any RPM is built
 - Unit tests do not have an dependancy on external system
   infrastructure / setup
 - Unit tests are directly synchronized with the code commits
   since they're in the same GIT repo

Unit testing techniques
=======================

One of the challenges of unit testing is how to isolate the
test suite from the broader environment of the machine being
used for development. For example, the test suite must not
rely on information located in the filesystem outside the
source treee (eg it must not touch /proc or /sys). In addition
it must be careful not to negatively impact anything running
on the developers' machine (eg it must not spawn libvirtd in
a way which would clash with existing running libvirtd)

There is no silver bullet to achieve isolation of the test
suite from the host OS environment. Instead a variety of
different techniques must be applied depending on the scenario
to be addressed.

Code structure
--------------

First, and foremost, code structure has an enourmous impact
on the ability to unit test code. Ensuring there are clear
boundaries & interfaces between different areas of cdoe is
key to allowing pieces to be unit tested in isolation from
each other.

For example, testing of the QEMU command line generator, can
only be achieved because this code for converting from XML
to ARGV has been isolated from the code which actually starts
up the QEMU process.

While some areas of libvirt code are very well modularized,
others are not. When writing new unit tests it is not at all
unexpected to have to re-factor existing code to further
modularize it. Such refactoring is usually good for overall
readability of the code, as well as facilitating testing.

Intermediate data forms
-----------------------

As hinted at above, a key idea when refactoring code to make
it more amenable to testing is to introduce an intermediate
data form. For example, when generating the QEMU command
line arguments from an XML document, the virCommandPtr object
serves as the intermediate data form. Instead of having the
API which builds the command line, directly execute the QEMU
binary, it spits out a virCommandPtr object as an intermediate
data form. The contents of this object can then be validated
against pre-defined expected command lines.

As a counter-example, the network filter code is very hard to
unit test as it is currently designed because it does not have
any intermediate data form. The methods which process the
network filter configuration build up command lines and then
directly execute them as they go. There is no way to extract
the command line data, without them being executed. To enable
unit testing it is neccessary to refactor the network filter
code to introduce a clear intermediate data form which can
represent the commands that will be executed.

Static analysis
---------------

While the elephant in the room is Coverity, there is a large
amount of static analysis that can be done directly in the
libvirt source tree. Indeed all of the 'syntax-check' rules
are considered to be static analysis. While these rules all
focus on style issues, there is no reason why this should be
the case.

What makes static analysis tools like Coverity so hard to
write is the problem of understanding the semantics of C
code. Fortunately, when doing static analysis of a specific
project (like libvirt), there is often no need to solve the
general case. If there are certain style patterns used in
a project, these can make it possible to write project
specific analysis rules with little more than a few regular
expressions or other simple scripting statements.

For example, the two style checks 'src/check-drivername.pl'
'src/check-driverimpls.pl' enforce a consistent naming
convention on each internal driver method. With this naming
pattern enforced, it is now possible to write a script which
validates that every driver method includes a call to the
correct access control hook. This is project-specific static
analysis that could not be done with Coverity.

For more advanced problems it is possible to make use of a
framework such as CIL to perform static analysis. CIL is an
OCaml project which can parse most C code providing an
in-memory representation of its structure. Then it has a
number of modules to perform data flow analysis on the code.

Combined with knowledge of project coding patterns this tool
allows the creation of scripts to validate mutex lock
acquisition ordering rules, or correct error reporting on
all error exit paths.

Monkey patching / Duck Punching
-------------------------------

One of the pain-points with a project such as libvirt is finding
a way to test parts of the code which rely on external system
state, such as /proc, or /sys filesystem data. It is not always
possible to refactor the code to test it in isolation from the
system services. In such cases, the technique of monkey patching
or duck punching (better known from the world of scripting
languages) can in fact be used from C.

The idea behind the technique is to replace the APIs / objects
that are used by the test suite, with set of "mock" objects or
API implementations which simulate their operation, without
relying on external state. The trick to applying this technique
in the C world with native code, is to rely on the use of the
dynamic linkers' library preload facility, via the LD_PRELOAD.
Any symbol exported by a library listed in the LD_PRELOAD env
variable, will override symbols with the same name in the library
that the test suite is actually linked to. So, for example, if
the preloaded library exports an 'open' symbol, this will
replace the 'open' symbol exported by the standard C library.

To use this technique it is neccessary to first understand what
interfaces need to be replaced. If attempting to test some code
which uses files in some location under /sys/, one might change
the implementation of 'open', so that when passed a filename
starting with '/sys', it will re-write the path to point to a
location under the test suite directory.

This technique is used in the vircgrouptest.c / vircgroupmock.c
files to allow comprehensive testing of libvirt's cgroup
management code, without actually ever touching the cgroup
filesystem. It is also used by the securityselinuxtest.c /
securityselinuxhelper.c files to stub out the sys calls which
write SELinux attributes.

Fuzz testing
------------

This refers to the technique of taking an API or interface and
giving it some data which has some intentionale inaccuracies
or edge cases. For example, a API accepting a 'const char *str'
might be given a NULL string, or a zero length string, or a
string containing non-printable control characters. An API
accepting an 'int' might be tested with various boundary
conditions such as -1, 0, -MAX_INT, +MAX_INT.

In the case of an XML document, one technique would be to
incrementally remove attributes and/or elements to see if the
parser will ever crash on a NULL pointer (it should either
succeed or return a error, never crash). Or alternatively
change the values in various attributes as one would with
parameters passed to an API. If an attribute appears to be
a number, then replace it with a string. If an attribute
appears to be an enum, replace it with some bogus value.
Try zero length strings for various attributes, or again
try non-printable control characters.

In the case of an RPC protocol, the idea would be to make
(intelligent) changes to packets in an attempt to trick
the server into accepting bad data, or even crashing.

It is not practical to test all possible wierd data sets,
so the trick with fuzz testing is to try to intelligently
tailor the data to the interface being tested. eg if a
string parameter represents a file path, there's no point
just randomly changing characters in the path, since that'll
just generate loads of other valid file paths. Instead it is
neccessary to make some more planned changes like removing
the leading '/' to produce a relative path, inserting a
number of '../../../' dirs to trick the application into
writing to the incorrect location, and things of that nature.

For any moderately complex parser, intelligently designed
fuzz testing is likely to identify a wide range of bugs.

Error testing
-------------

Validating correct operation is great, but there is plenty
of code which lives in error handling code paths, which will
never be exercised this way. Since they are seldom run, and
by their nature deal with unexpected data or events, the
error paths are often a good source of bugs. The fuzz testing
technique is one very effective way of exercising error
handling paths, but should not be considered the only possible
one. The monkey patching techniques can also be applied to
the task of error checking. Alternatively the code structure
may facilitate the injection of error conditions in a well
defined manner. For example, all libvirt memory allocation
goes via a wrapper API, instead of to malloc() directly. It
is possible to tell the wrapper API to fail specific allocation
attempts. By running the test suite repeatedly, failing a
different memory allocation each time, it is possible to
exhaustively test error handling of the tested code.

Areas requiring testing
=======================

The best way to identify areas lacking code coverage is to
simply pick a random libvirt source code file. The odds are
that it has insufficient testing. The more optimistic way
is to look at the code coverage reports.

Run 'autogen.sh' with the --enable-test-coverage arg. Then
build it and run 'make check'. Once the tests have completed,
then run 'make cov' to generate an HTML code coverage report.

Some known areas needing work are:

Cgroups setup for QEMU
----------------------

For a given QEMU guest XML configuration there is a well
defined set of cgroups parameters that are expected to be
set.

The vircgroupmock.c file demonstrates how to monkey patch
the cgroups filesystem, allowing unit tests to be written.
The vircgroupmock.c helper could be reused to create a test
suite for validating the operation of the qemu_cgroups.c
APIs.

Cgroups setup for LXC
----------------------

The same as the QEMU test, but for the lxc_cgroups.c file.

SELinux security driver labelling
---------------------------------

For a given XML configuration, there is a well defined set
of file labels that the SELinux security driver must apply
to the filesystem on VM startup.

The securityselinuxhelper.c file demonstrates how to monkey
patch the SELinux library file labelling APIs from unit
tests.

Currently the securityselinuxlabeltest.c test suite does
checks for labels set on disks and serial devices. There
are many other aspects of domain XML that imply labelling.
In particular PCI/USB host devices & kernel/initrd boot
images. It is also neccessary to deal with backing files
for qcow2 format disks.

Audit messages
--------------

Upon QEMU or LXC guest startup a number of audit records must
be logged detailing various pieces of metadata about the guest.

By monkey patching the audit_log_user_message() and audit_open()
library APIs with an LD_PRELOAD library, it would be possible
to perform unit testsing of the domain_audit.c APIs to ensure
that correct audit records are logged for various different
guest configurations.

Network filter
--------------

As described earlier, the network filter code is really badly
designed from the point of view of unit testing. It needs to
be refactored to have an intermediate data form that represents
the list of iptables/ebtables rules that are expected for a
given XML configuration.

Currently it generates shell scripts containing iptables
commands, optionally re-directing via the 'firewall-cmd' program
for integration with firewalld. Refactoring to unit test this
offers a prime opportunity to remove the use of shell entirely

The core observation of the nwfilter code is that it consists
of lists of commands to be run to create ebtables/iptables rules.
Periodically there are commands which are able to rollback the
state changes made by earlier commands when an error occurs.

The idea for the intermediate data form would thus be to introduce
the concept of a "script" object, as a higher level interface than
that offered by virCommandPtr.

A script would comprise one or more sets of command lines for
making changes. At various points it would be possible to add a
checkpoint marker, together with one or more sets of command
lines that would be execute when errors occur which require
rollback.

The network filter code would thus output "script" objects
whose command sets could be validated by a unit test suite.
These script objects can then be translated into virCommandPtr
objects for actual execution by separate code. Or even translated
into DBus API calls for direct communication with firewalld,
bypassing the inefficient firewall-cmd program.

The unit test would consist of XML data files with network
filter configuration rules, alongside expected script command
lines, similar to how the qemu xml -> argv test works.

virsh testing
-------------

The virsh command has rarely had any serious direct testing
of its own. Its code is mostly exercised indirectly by test
suites doing testing of QEMU or other drivers.

Libvirt has a hypervisor driver that is explicit targetted
to testing tools like virsh - the "test" driver. This is a
100% in-memory driver that stubs out all the libvirt APIs
with semi-serious semantics. It is used to do automated
testing of large parts of virt-install and virt-manager.

It can be used to get near 100% testing coverage of the virsh
command from within unit tests. Since it is all in-memory
there is no scope for it to be affected the host system
state.

libvirtd testing
----------------

Testing libvirtd is a tricky proposition, because merely
starting the daemon will have an effect on host OS state,
particularly if any resources are set to autostart. It
can also clash with any libvirtd process the developer is
already running.

The test driver is able to be run inside libvirtd. If it
were possible to disable all the other hypervisor drivers
in libvirtd, the test driver could be used to execise the
RPC dispatch code in libvirtd in a safe contained manner.

This could be particularly good for doing scalability
testing of libvirtd when large numbers of objects. eg since
the test driver doesn't actually run anything, it is possible
to start many 1000's of "test" guests with the only memory
overhead being that of the parsed XML configuration.

This would allow the test suite to identify & validate
limits in the RPC protocol or inefficiencies in libvirtd.

lock driver client
------------------

The virtlockd client code recently had a bug where it
generated invalid RPC packets in response to certain
operations. It is not desirable to have the unit tests
actually run the full virtlockd daemon, just to validate
the client code.

It would, however, be possible to have a unit test which
provided a fake virtlockd UNIX domain socket for the
client to connect to. It would blindly read any data off
the socket & validate it against pre-defined RPC packets.
This approach is already used with some success in the
QEMU JSON monitor test suite.

QEMU disk snapshots
-------------------

The QEMU snapshot code is some of the most complex code in
the QEMU driver. As such it would be desirable to get test
coverage of it. The easiest way todo this would be via the
libvirt TCK integration test harness. There is, however,
value in refactoring some of the code to allow its testing
in a unit test environment.

One approach would be for the test suite to directly invoke
methods in the qemu driver dispatch table. This would require
refactoring the qemuStateInitialize code though, to make it
possible to setup the basic QEMU driver global state. More
desirable would be to refactor the snapshot code so that it
was more detached from the public driver APIs. For example
if some of it could be isolated in a qemu_snapshot.c file,
this may make it practical to unit test some of it.

Techniques from the qemumonitorjsontest.c file could be used
to stub out the monitor APIs that the snapshot code requires
in order to operate.

QEMU driver startup
-------------------

An important aspect of the QEMU driver is that it is able
to reconnect to existing domains when it starts up. It would
be desirable to unit test this functionality to validate the
sequence of operations done when reconnecting to a guest.

Again this would require some refactoring of the
qemuStateInitialize method to allow its execution in a test
suite. It would also likely require the techniques for stubing
the QEMU monitor APIs.

XML parser/formatter reliability
--------------------------------

While we have good testing of valid XML documents, there have
been a number of cases where invalid XML documents have caused
crashes or other bad behaviour. It is not reasonable to add
test cases for every possible invalid XML document. Instead
we need to adopt a fuzzing approach, where we take valid XML
documents and then make incremental changes to them & then
ensure that the code does not crash.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|