[libvirt] Redesigning Libvirt: Adapting for the next 12 years

Tue Nov 14 17:22:30 UTC 2017

Hold tight, this is a long one...

It is hard for me to believe it, but the libvirt project is now 12 years old
(born on Nov 2, 2005), and I've been working on it since March 2006, making it
easily the most significant project I've worked on. It started off life as an
attempt to provide a stable application development API for the Xen hypervisor,
interfacing across XenD, XenStore and Xen hypercalls. It was initially just a
plain C library and Python binding, but when we added QEMU support in Feb 2007
the libvirtd daemon was born. That cemented a split of hypervisor drivers into
two distinct architectures: stateless drivers where all logic was in the library
(VMware ESX, VirtualBox, original Xen) and stateful drivers where all logic was
in the daemon (QEMU, LXC, UML, modern Xen).

The project has been wildly successful beyond our expectations, in particular
the hypervisor abstraction layer made it possible for RHEL to switch from using
Xen to KVM while keeping the userspace tooling the same for users. Libvirt is
now used, to some degree, by what is likely hundreds of applications, with KVM
being the dominant hypervisor choice by a long way. There is an old adage in the
computer industry though:

   "Adapt or die"

This is usually applied to companies who see their primary product suddenly
become a commodity, or disappear into irrelevance as new technology disrupts
the market, killing their revenue stream. It is, however, just as reasonable to
apply this to open source projects which can see their core usage scenarios
disrupted by new startup projects & technologies. While the open source code
will never go away, the companies who pay for the project's developers can
quickly reassign them elsewhere, seriously harming the viability of the
community thereafter.

IOW, while Libvirt has seen 12 years of great success, we must not be so naive
as to assume we are going to see another 12 years without being disrupted.

Over time we've done a lot of work refactoring libvirt code to introduce new
concepts and support new hypervisor targets, but I think it's fair to say that
at a high level the architecture is unchanged since we first introduced
libvirtd, and then its multithreaded internals, in the 2006-2008 timeframe.
We've taken a fairly conservative, evolutionary approach to our changes. This is
good, because providing stability to our users is a critically important reason
for libvirt to exist. This is bad, because we've not been willing to take risks
in the short term that could potentially be very beneficial in the long term
(5-10 years' time).

I think that now is the time to consider some major architectural changes in the
approach we take. There's no single reason, rather a combination of factors all
coming together to form a compelling case for ambitious change.

Before going further though, I want to highlight one important point:

I am NOT suggesting changing the public API or the XML format in a backwards
incompatible manner. API & XML stability is the single most important part of
libvirt and MUST be maintained on a par with what's available today. IOW we can
add new features, but can't remove what's there already. This even leaves the
door open for providing a libvirt2.so, provided we're willing to still maintain
libvirt.so indefinitely alongside, though that's not something I'd encourage.
The majority of the hard problems we face are not in the API design, or in the
XML format, so that's not a significant limiting factor IMHO.

There are three core areas of libvirt I see that are problematic, and where
the fixes have major implications. 

At a very high level, what I'm going to suggest is:

 - Expose key hypervisor specific concepts as fully supported features to
   applications. In particular provide a way for an application to launch
   QEMU processes directly in their process execution environment, rather
   than as a child of libvirtd.

 - Explode the libvirtd daemon into a swarm of independent daemons. This
   would provide a more reliable system where a single bug doesn't take
   out the entire libvirt management daemon. It would allow for better
   security isolation of components. It would let session libvirtd
   use system daemons for networking & hostdev setup.

 - Adopt use of Go and gradually convert (all|most) our C code into Go.
   This would improve the reliability of libvirt, by giving us a memory
   safe language with garbage collection. It would improve productivity
   by letting us spend more time writing interesting code, rather than
   wasting time on platform portability or building basic abstractions
   for things like OO programming, hash tables, etc (much of the stuff
   we have in src/util). There would be no more hand-written XML parsers
   needed (just annotated struct fields). It would increase the talent
   pool of potential contributors to libvirt by lowering the bar to
   getting work done.
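
To give a flavour of the "annotated struct fields" point, here is a minimal
sketch using Go's standard encoding/xml package. The Domain struct and the
parseDomain helper are hypothetical names made up for this illustration, and
the struct covers only a tiny subset of a domain document. It is not a proposal
for the actual data model, just a demonstration that the struct tags take the
place of a hand-written parser.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Domain is a hypothetical, deliberately tiny subset of a libvirt domain
// document. The struct tags tell encoding/xml how to map attributes and
// child elements onto fields; no parsing code is written by hand.
type Domain struct {
	XMLName xml.Name `xml:"domain"`
	Type    string   `xml:"type,attr"` // <domain type="...">
	Name    string   `xml:"name"`      // <name>...</name>
	VCPU    uint     `xml:"vcpu"`      // <vcpu>...</vcpu>
}

// parseDomain (hypothetical helper) decodes an XML document into a Domain.
func parseDomain(doc string) (*Domain, error) {
	var d Domain
	if err := xml.Unmarshal([]byte(doc), &d); err != nil {
		return nil, err
	}
	return &d, nil
}

func main() {
	const doc = `<domain type="kvm">
  <name>demo</name>
  <vcpu>2</vcpu>
</domain>`

	d, err := parseDomain(doc)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(d.Type, d.Name, d.VCPU)

	// The same annotations drive formatting in the other direction.
	out, err := xml.MarshalIndent(d, "", "  ")
	if err != nil {
		fmt.Println("format error:", err)
		return
	}
	fmt.Println(string(out))
}
```

The same struct definition serves both parsing and formatting, which is where
the saving over separate hand-written parse and format functions would come
from.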

To avoid this mail getting too long, I'll cover each area in a separate mail.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|