[libvirt] Redesigning Libvirt: Adopting use of a safe language

Bjoern Walk bwalk at linux.vnet.ibm.com
Wed Nov 15 11:28:30 UTC 2017


Hello Daniel,

thank you for this interesting insight. The future-proof choice of
tools, especially programming languages, is certainly a problem that a
lot of projects have to solve sooner rather than later. For projects
that are currently written in a non-memory-managed language, this issue
is even more pressing, I guess, looking back at the last decade of more
or less devastating security vulnerabilities.

However, since your solution part already reads like a call to arms, I
have to express my concerns about your problem description and proposal.

Daniel P. Berrange <berrange at redhat.com> [2017-11-14, 05:27PM +0000]:
> The Problem(s)
> ==============
> 
> When libvirt was created, C was the only viable choice for anything aiming to be
> a core system library component. At that time 2005, aside from C there were
> common choices of Java, Python, Perl. Java was way too heavy for a low level
> system component, Python was becoming popular but not widely used for low level
> system services and Perl was on a downward trend. None of them are accessible to
> arbitrary languages as libraries, without providing a RPC based API service. As
> it turns out libvirt did end up having RPC based approach for many virt drivers,
> but the original approach was to be a pure library component.
> 
> IOW it is understandable why C was chosen back in 2005, but 12 years on the world
> around us has changed significantly. It has long been accepted that C is a very
> challenging language to write "safe" applications. By "safe" I mean avoiding the
> many problems that lead to critical security bugs. In particular the lack of a
> safe memory management framework leads to memory leaks, double free's, stack or
> heap corruption and more. The lack of strict type safety just compounds these
> problems. We've got many tools to help us in this area, and at times have tried
> to design our APIs to avoid problems, but there's no getting away from fact that
> even the best programmers will continually screw up memory management leading to
> crashes & security flaws. It is just a fact of life when using C, particularly if
> you want to be fast at accepting new feature proposals.
> 
> It is no surprise that there have been no new mainstream programming languages in
> years (decades) which provide an inherently unsafe memory management framework.
> Even back in 2005 security was a serious challenge, but in the last 10+ years
> the situation has only got worse with countless high profile security bugs a
> direct result of the choice to use C. Given the threats faced today, one has to
> seriously consider the wisdom of writing any new system software in C.

I agree for newly written software. There is almost no reason to start
a new project in C, especially given the number of problem-specific
languages out there nowadays. But I don't think the argument holds for
existing projects. I would suggest that the amount of time that has
already been spent on finding and mitigating critical security bugs
outweighs the possible inherent safety of any new language.

> In another 10 years time, it would not surprise me if any system
> software still using C is considered an obsolete relic, and ripe for a
> rewrite in a memory safe language.

I guess this has been said about the C language a lot of times. Of
course I don't have a better crystal ball than you do, but at least
as of now I wouldn't think so.

> 
> There are long term implications for the potential pool of contributors in the
> future. There has always been a limited pool of programmers able to do a good job
> in C, compared to those who know higher level languages like Python/Java. A
> programmer can write bad code in any language, but in C/C++ that bad code quickly
> turns into a serious problem. Libvirt has done ok despite this, but I feel our
> level of contribution, particularly "drive by" patch submissions, is held back
> by use of C. Move forward another 10 years, and while C will certainly exist, I
> struggle to imagine the talent pool being larger. On the contrary I would expect
> it to shrink, certainly in relative terms, and possibly in absolute terms, as
> other new languages take C's place for low level systems programming. 10 years
> ago, Docker would have been written in C, but they took the sensible decision to
> pick Go instead. This is happening everywhere I look, and if not Go, then Rust.

Out of interest, I took a look at the CVE history of both libvirt and
docker:

    https://www.cvedetails.com/product/20594/Redhat-Libvirt.html?vendor_id=25
    https://www.cvedetails.com/product/28125/Docker-Docker.html?vendor_id=13534

Not sure how up to date and complete these lists are, but for the sake
of argument, let's take them. Docker since its creation in 2014 had 15
CVEs, 2 of them code execution and 3 of them privilege escalation. On
the other hand, libvirt had, in the same time frame since 2014, a total
of 20 CVEs, 1 of them code execution and 2 privilege escalations. The
year 2014 was even an outlier with 13 CVEs that year. So honestly, in
terms of security, I don't see a compelling argument for Go as the
better language compared to C. Keep in mind as well that libvirt's
codebase is roughly 3-6 times larger than docker's, depending on how
you count it.
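
To put rough numbers on the size argument, here is a back-of-the-envelope
sketch. It only uses the CVE counts quoted above and the 3-6x size
estimate, none of which I have re-verified; the function name
density_ratio is made up for illustration, and CVEs per unit of code is
admittedly a crude metric.

```python
# CVE counts since 2014 as quoted above (cvedetails.com); not re-verified.
DOCKER_CVES = 15
LIBVIRT_CVES = 20


def density_ratio(size_factor):
    """Docker's CVEs per unit of code relative to libvirt's.

    size_factor is how many times larger libvirt's codebase is than
    docker's (an assumption, estimated above as somewhere in 3..6).
    """
    # docker density  = DOCKER_CVES / 1
    # libvirt density = LIBVIRT_CVES / size_factor
    return DOCKER_CVES * size_factor / LIBVIRT_CVES


for factor in (3, 6):
    ratio = density_ratio(factor)
    print(f"libvirt {factor}x larger: docker/libvirt CVE density = {ratio:.2f}")
```

Under these assumptions docker ends up with 2.25 to 4.5 times as many
CVEs per unit of code as libvirt over the same period.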

One could argue that at a more mature stage of a project one would
expect fewer and fewer CVEs, but even if we were to compare docker to
libvirt's initial years of CVE history, I don't see a clearer argument.

> We push up against the boundaries of what's sane to do in C in other ways too.
> For portability across operating systems, we have to rely on GNULIB to try
> to sanitize the platform inconsistencies where we use POSIX, and assume that
> any 3rd party libraries we use have done likewise.
> 
> Even then, we've tried to avoid using the platform APIs because their designs
> are often too unsafe to risk using directly (strcat, malloc, free), or are not
> thread safe (APIs lacking _r variants). So we build our own custom C platform
> library on top of the base POSIX system, re-inventing the same wheel that every
> other project written in C invents.

Why has there never been a truly satisfying standard library for C for
this kind of stuff? If such a project existed, this re-inventing of the
wheel would be avoided, and the platform library code would be of
higher quality.

> Every time we have to do work at the core C platform level, it is
> diverting time away from doing work managing higher level concepts.

How often is this the case? I assume that platform code does not change
that often and will converge to a stable fixed point.

> Our code is following an object oriented design in many areas, but such a notion
> is foreign to C, so we have to bolt a poor-mans OO framework on the side. This
> feeds back into the memory safety problem, because our OO invention cannot be
> type checked reliably at compile time, making it easy to do unsafe things with
> objects. It relies on reference counting because there's no automatic memory
> management.
> 
> The other big trend of the past 10 years has been the increase in CPU core
> counts. My first libvirt dev machine had 1 physical CPU with no cores or threads
> or NUMA. My current libvirt dev machine has 2 CPUs, each with 6 cores, for 12
> logical CPUs. Common server machines have 32/64 logical CPUs, and high end has
> 100's of CPUs. In 10 years, we'll see high end machines with 1000's of CPUs and
> entry level with mere 100's. IOW good concurrency is going to be key for any
> scalable application. Libvirt is actually doing reasonably well in this respect
> via our heavily threaded libvirtd daemon. It is not without cost though with
> ever more complex threading & locking models, which still have scalability
> problems. Part of the problem is that, despite Linux having very low overhead
> thread spawning, threads still consume non-trivial resources, so we try to
> constrain how many we use, which forces an M:N relationship between jobs we need
> to process and threads we have available.

This is the one argument against C that I fully support. Support for
parallelism is missing, and with the current development of multi- and
many-core platforms this is really a show-stopper.
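
For readers less familiar with the M:N relationship described in the
quoted paragraph, it can be sketched as a fixed pool of N worker threads
draining a larger set of M jobs. This is Python rather than libvirt's
actual C worker pool, purely for brevity; N_THREADS, M_JOBS and
handle_job are hypothetical names, not anything from libvirt.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

N_THREADS = 4   # hypothetical cap on worker threads
M_JOBS = 20     # hypothetical number of pending jobs


def handle_job(job_id):
    """Stand-in for real work; reports which thread ran the job."""
    return job_id, threading.current_thread().name


# M jobs are multiplexed onto at most N threads; a job waits in the
# pool's queue until one of the workers becomes free.
with ThreadPoolExecutor(max_workers=N_THREADS) as pool:
    results = list(pool.map(handle_job, range(M_JOBS)))

threads_used = {name for _, name in results}
print(f"{len(results)} jobs ran on {len(threads_used)} thread(s)")
```

In C this queueing and locking has to be written and maintained by the
project itself, whereas the Go runtime does the equivalent multiplexing
of goroutines onto OS threads on its own.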

> The Solution(s)
> ===============
> 
> [...]
> 
> The obvious question / difficulty is deciding how to adopt usage of a new
> language, without throwing everything away and starting from scratch. It needs
> to be possible for contributors to continue working on every other aspect of the
> project while adoption takes place over the long term. Blocking ongoing feature
> work for prolonged periods of time is not acceptable.

Yes, I fully concur. But still, I have seen many projects that
underestimated the amount of work even a partial rewrite in another
language takes. And in the end, feature development and even bug fixing
WILL suffer from this transition.

Maybe it is a good idea to look at the GCC project and their transition
from C to C++ and learn from their experience beforehand.

> There is also a question of scope of the work. A possible target would be to aim
> for 100% elimination of C in N years time (for a value of N that is certainly
> greater than 5, possibly as much as 10). There is a question of just whether that
> is a good use of resources, and even practical. In terms of management of KVM
> guests the bulk of ongoing development work, and complexity is in the libvirtd
> daemon. The libvirt.so library merely provides the remote driver client which is
> largely stable & unchanging. So with this in the mind the biggest benefits would
> be in tackling the daemon part of the code where all the complexity lives.
> 
> As mentioned earlier, Go has a very effective FFI mechanism for calling C code
> from Go, and also allows Go code to be called from C. There are some caveats to
> be aware of with passing data between the languages, however, generally it is
> necessary to copy data structures as C code is not permitted to dereference
> pointers that are owned by the Go GC system. There are two possible approaches
> to take, which can be crudely described as top down, or bottom up.

Earlier you talked about the contributor pool. But wouldn't your
proposal limit this pool even further by actually requiring the
intersection of the pool of C developers AND Go developers?

> [...]

What I would like to see before any rewrite is taken into consideration
is an effort to reduce complexity, even on the architectural level. Your
proposal to split libvirt into a set of daemons with specific tasks can
help here tremendously. In my opinion, a rewrite in another language
should be a last resort once every other option has been exhausted,
because, from experience, it WILL set a project back.

Best,
Bjoern

-- 
IBM Systems
Linux on z Systems & Virtualization Development
------------------------------------------------------------------------
IBM Deutschland
Schönaicher Str. 220
71032 Böblingen
Phone: +49 7031 16 1819
E-Mail: bwalk at de.ibm.com
------------------------------------------------------------------------
IBM Deutschland Research & Development GmbH
Vorsitzende des Aufsichtsrats: Martina Koederitz
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294 