
An introduction to crun, a fast and low-memory footprint container runtime

Check out crun, an OCI-compliant alternative to runc as a Linux container runtime.

At Red Hat, we have a mandatory shutdown each year during the Christmas holiday week. During that time, all Red Hat offices close, and all engineers are free to go home and pursue their passions. Mine are eating and drinking too much, and binge-watching Netflix. Some, like Giuseppe Scrivano, go home and work on open source projects.

crun is born

Giuseppe had been working on container engine projects at Red Hat, including Podman, Buildah, and CRI-O. All three of these projects use the OCI runtime runc to launch their containers.

runc is a Go language-based tool that reads a runtime specification, configures the Linux kernel, and eventually creates and starts the container processes. As it turns out, Go might not have been the best programming language for this task. Go does not have good support for the fork/exec model of computing. Because the Go runtime is multi-threaded, it expects programs to fork a second process and then exec immediately. However, an OCI container runtime is expected to fork off the first process in the container. It may then do some additional configuration, including potentially executing hook programs, before exec-ing the container process. The runc developers have added a lot of clever hacks to make this work but are still constrained by Go's limitations.

Giuseppe wanted to see if an OCI runtime written in C could improve upon runc. Unlike Go, C is not multi-threaded by default, and was built and designed around the fork/exec model. It could handle this part of an OCI runtime in a much cleaner fashion. C is also a much lower level language and interacts very well with the Linux kernel. Finally, C is lightweight, so it compiles down to a far smaller size and uses less memory than Go.
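To make the fork/exec sequence concrete, here is a minimal shell sketch. It is not code from crun or runc. The subshell plays the role of fork, the lines before exec stand in for the runtime's setup work, and exec replaces the child with the container process. The launch function and the CONTAINER_DEMO variable are hypothetical names chosen for illustration.

```shell
# Sketch of fork -> configure -> exec, assuming a POSIX shell.
launch() {
    workdir=$1
    shift
    (
        # We are now in a forked child (the subshell).
        cd "$workdir" || exit 125   # setup step: change the working directory
        CONTAINER_DEMO=1            # setup step: prepare the environment (hypothetical variable)
        export CONTAINER_DEMO
        exec "$@"                   # replace the child with the target program
    )
}

# The child sees the configuration; the calling shell is left untouched.
launch / sh -c 'echo "cwd=$(pwd) demo=$CONTAINER_DEMO"'
```

A real runtime does far more between the fork and the exec (namespaces, cgroups, mounts, hooks), but the shape is the same: the setup happens in the child, after the fork and before the exec.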

After a few days, Giuseppe completed a prototype of his new OCI runtime, which he called "crun." Originally, we viewed this as a nice proof of concept that showed that tools other than runc could implement the OCI runtime specification. However, Giuseppe used crun as a tool to investigate new ways of running containers and enabling new use cases. Once he proved the value of a new feature, he could work with the OCI community to update the specification and eventually get it into runc. Hence, crun evolved into a playground for innovating on new container technologies.

cgroups v2 - the chicken and the egg

In 2018, going into 2019, we were frustrated as we approached the release of RHEL 8. We had been asked by upper management if we should support cgroups v2 by default. We had to tell them no. The problem was that the entire container community at the time was based on cgroups v1. Tools like Kubernetes, OpenShift, and Docker were locked into cgroups v1 and showed no signs of moving. Containers were too crucial to enterprise customers to break them by default.

It was a real chicken-and-egg situation. Upstream kernel engineers had moved to cgroups v2 for all future development, but the container world was stuck in cgroups v1. While the rest of Linux had added support for v2, containers were so important that none of the leading distributions would change their default. Because of this, there was no incentive for container engines and orchestrators to move to cgroups v2. If we did nothing, the next major version of RHEL was going to be stuck with cgroups v1.

We wanted to force the issue by changing the Fedora default to cgroups v2. At Devconf.cz in 2019, I pledged to get our container engines, Podman, Buildah, and CRI-O, to support cgroups v2. Giuseppe agreed to make crun support cgroups v2 and collaborate with the OCI community to get cgroups v2 support into the runtime specification. We were off and running.

Fedora 31 switched the default, and there were a lot of growing pains. We were ultimately successful, mainly because of the excellent support from crun. Giuseppe kicked off an effort to get the container ecosystem to support cgroups v2, and there is ongoing work to add support to the entire Kubernetes stack. runc now also has experimental support for cgroups v2 as of v1.0.0-rc91, thanks to the upstream work of Kir Kolyshkin and Akihiro Suda.

Over time, we noticed that many users and customers have a specific set of questions. These questions led to a call for a FAQ about crun, how it compares to runc, and its performance.

A crun FAQ

Can I just swap runc with crun?
To make a long story short, yes. runc and crun can be used interchangeably as both implement the OCI runtime specification. While crun is feature-compatible with runc, it also offers a set of helpful and experimental features that you can find below.

Do crun and runc containers work differently? If I give the same Podman CLI or Kubernetes YAML, do I get the same containers?
In almost every case, they should work identically. The OCI runtime's job is to instrument the kernel to control how PID 1 of the container runs. After it finishes setting up the kernel and executing PID 1, the OCI runtime exits. It is up to higher-level tools like conmon or the container engine to monitor the container. In some cases, crun supports additional features that have not made it into the OCI runtime spec yet, and the containers are launched to take advantage of these features, as explained below.

How much smaller is crun versus runc?
crun is a much smaller binary. If compiled with -Os, the crun binary is about 300 KB, while runc is currently about 15 MB. That makes runc about 50 times larger than crun.

How much faster is crun than runc?
Depending on the container configuration, crun can be almost twice as fast as runc. Here are some results for sequentially creating 100 containers that run /usr/bin/true, using both runc and crun. In this run, crun took 3.86 seconds of wall-clock time against 6.89 seconds for runc:

# for RUNTIME in runc crun; do \
    \time -v sh -c "for i in {1..100}; do $RUNTIME run foo < /dev/null; done"; \
done
    Command being timed: "sh -c for i in {1..100}; do runc run foo; done"
    User time (seconds): 2.16
    System time (seconds): 4.60
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:06.89
    ...
    Maximum resident set size (kbytes): 15120
...
    Command being timed: "sh -c for i in {1..100}; do crun run foo; done"
    User time (seconds): 0.53
    System time (seconds): 1.87
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:03.86
    ...
    Maximum resident set size (kbytes): 3752
 ...

What is the memory consumption of crun compared to runc?
In the output above, the maximum resident set size is 3752 KB with crun and 15120 KB with runc, so crun uses about a quarter of the memory. crun also needs much less memory for the init process. We have experimented with running a container with a memory limit as small as 250K.

We see a decent amount of interest from IoT users because of crun's small size and memory footprint.
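The ratios behind these comparisons can be checked directly from the benchmark output above. The following is a small sketch using awk for the floating-point division. The ratio helper is a hypothetical name, and the CPU figures are the sums of user and system time from the output (2.16 + 4.60 = 6.76 seconds for runc, 0.53 + 1.87 = 2.40 seconds for crun).

```shell
# ratio A B: print A divided by B with two decimal places.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

echo "wall clock:      runc/crun = $(ratio 6.89 3.86)"
echo "CPU (user+sys):  runc/crun = $(ratio 6.76 2.40)"
echo "peak RSS:        runc/crun = $(ratio 15120 3752)"
```

For this particular run, the ratios come out at about 1.78 for wall-clock time, 2.82 for CPU time, and 4.03 for peak memory.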

What is the minimal number of PIDs for crun and runc?
Sometimes users want to run their containers with a minimal number of processes, or even limit it to one. With runc you can't set a PIDs limit that is too low, because the Go runtime spawns several threads. Since crun is written in C, it does not have that problem.

$ podman --runtime /usr/bin/runc run --rm --pids-limit 5 fedora echo it works
Error: container create failed (no logs from conmon): EOF
$ podman --runtime /usr/bin/crun run --rm --pids-limit 1 fedora echo it works
it works
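The reason threads matter here is that the PIDs limit counts kernel tasks, and every thread is a task. As a rough, Linux-only illustration, the sketch below reads the Threads field from /proc for a given process. The thread_count helper is a hypothetical name, and this only shows how to inspect a process, not how Podman or the runtimes enforce the limit.

```shell
# Linux-only sketch: print the number of threads (kernel tasks) of a process.
thread_count() {
    awk '/^Threads:/ { print $2 }' "/proc/$1/status"
}

# Inspect the current shell; point it at any other PID to compare.
echo "this shell runs $(thread_count $$) thread(s)"
```

Running the same check against a Go program typically shows several threads even when the program itself does nothing concurrent, which is why a very low limit fails with runc.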

Is crun mature enough to be used in production?
crun has been the default for the past year on Fedora 31, 32, and Rawhide, running hundreds of thousands of Fedora installations with few issues. Other distributions, such as ArchLinux and Gentoo, use crun to support containers on cgroups v2-enabled systems. CRI-O's testing infrastructure runs Kubernetes with crun, adding hundreds of thousands of successful tests. Community members are running crun under containerd as well.

crun will be in tech preview as an alternative OCI runtime as of the RHEL 8.3 release.

Does crun support OCI hooks?
Because crun is compliant with the OCI runtime specification, it supports OCI hooks. Such hooks allow the execution of specific programs at different stages of the container's lifecycle, for instance, before or after starting the container.

Can I use crun with Docker?
Yes, both Docker and containerd can use crun. Any container engine that uses OCI-compliant container runtimes can use crun. For instance, the Sarus developers reported having tested crun successfully. Sarus is an OCI-compliant container engine, so it easily switches between crun and runc.

Additional features of crun

Since runc is the reference implementation of the OCI runtime specification, it cannot really experiment with new features, as we saw with cgroups v2. Changes need to be made to the specification before runc can officially adopt them. This causes a drag on innovation since the OCI committee wants to see proof of the need and an implementation before updating the specification. Giuseppe uses crun to experiment with new features based on the needs of Podman users and the greater container community. Once crun has proven the use case, we open up the discussion with the OCI to get the feature formally adopted and implemented in runc. Below are some examples of these experimental features.

Sharing files by group for rootless containers

One problem we have with rootless Podman involves files and directories that users can access only through their group membership. For example, an administrator might create a directory on disk owned by the engineering group. Individual users on the system can then be added to the engineering group and share the files in that directory. The problem is that users want to share these files inside a container, and rootless Podman blocks the access. When rootless Podman executes, it creates a user namespace and maps only the UID of the user and the primary group of the user into the container. It does this for security reasons: if you want to isolate your container from the host, you do not want to leak a powerful group like wheel into the container. Giuseppe added an annotation, interpreted by crun, that allows these groups to be leaked into the container.

man podman run
…
       Note:  if the user only has access rights via a group, accessing the device from inside a rootless container will fail. The crun(1) runtime offers a workaround for this by adding the option --annotation run.oci.keep_original_groups=1.

We are now working to get this feature added to the OCI runtime specification.
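To see which of your own groups are affected, a quick sketch like the following lists every supplementary GID of the current user, that is, every group other than the primary one. Going by the description above, these are the groups that are not mapped into a rootless container by default. The unmapped_groups helper is a hypothetical name, and the script only inspects the host user; it does not talk to Podman.

```shell
# List the current user's supplementary GIDs (all groups except the primary one).
unmapped_groups() {
    primary=$(id -g)
    for gid in $(id -G); do
        [ "$gid" = "$primary" ] || printf '%s\n' "$gid"
    done
}

unmapped_groups
```

If access to a shared directory depends on one of the listed groups, the run.oci.keep_original_groups=1 annotation from the man page excerpt is the workaround to try.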

Controlling stdout and stderr of OCI hooks

crun offers another feature that is not yet part of the runtime specification. Debugging hooks can be quite tricky because, by default, it's not possible to get the hook's stdout and stderr. Getting the error or debug messages may require some yoga. A commonly-used trick is logging to the syslog to access the hook logs via journalctl, but that may not be possible in all cases. And that's yet another use case where crun shines, because it allows for redirecting the stdout and stderr streams of the hooks via annotations.

If you’re using Podman, you can easily do it as follows:

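$ podman run --annotation run.oci.hooks.stdout=/tmp/hook.stdout --annotation run.oci.hooks.stderr=/tmp/hook.stderr

The executed hooks will now write stdout to /tmp/hook.stdout and stderr to /tmp/hook.stderr, ultimately allowing easy access. Each stream has its own annotation, so you can redirect only one of them if that is all you need. We plan to propose this approach to the OCI runtime specification to make it generally available.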
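For illustration, here is a sketch of the kind of hook whose output those files would capture. Per the OCI runtime specification, a hook receives the container state as JSON on its stdin. Everything else here is an assumption made for the example: the hook_main name is hypothetical, the message format is arbitrary, and the sed expression is a crude way to pull out the id field without a JSON parser.

```shell
# Hypothetical hook body: read the container state from stdin, report on both streams.
hook_main() {
    state=$(cat)   # the runtime passes the container state JSON on stdin
    id=$(printf '%s\n' "$state" |
        sed -n 's/.*"id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p')
    echo "hook: container ${id:-unknown}"             # normal output goes to stdout
    echo "hook: debug details go to stderr" >&2       # diagnostics go to stderr
}

# Simulate the runtime by piping in a sample state document.
printf '%s\n' '{"ociVersion":"1.0.2","id":"demo","status":"created"}' | hook_main
```

Without redirection, both messages are lost. With the annotations, each stream lands in the file you named.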

crun supports running older versions of systemd on cgroup v2

crun supports running older versions of systemd that lack support for cgroup v2. With the custom annotation run.oci.systemd.force_cgroup_v1, crun forces a cgroup v1 mount inside the container for the name=systemd hierarchy, which is enough for systemd to work. This is used to run older container images, such as RHEL 7, on a cgroup v2-enabled system.

$ podman run --annotation run.oci.systemd.force_cgroup_v1=/sys/fs/cgroup centos:7 /usr/lib/systemd/systemd
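Whether you need this annotation depends on which cgroup version the host uses. A common way to check is to look at the filesystem type mounted on /sys/fs/cgroup. The sketch below assumes GNU stat and a conventional mount layout, and cgroup_version is a hypothetical helper name.

```shell
# Report the host's cgroup version from the filesystem type of /sys/fs/cgroup.
cgroup_version() {
    fstype=$(stat -fc %T /sys/fs/cgroup 2>/dev/null) || fstype=
    case "$fstype" in
        cgroup2fs) echo v2 ;;       # unified hierarchy
        tmpfs)     echo v1 ;;       # v1 controllers mounted under a tmpfs
        *)         echo unknown ;;  # not Linux, no GNU stat, or unusual layout
    esac
}

echo "cgroup hierarchy: $(cgroup_version)"
```

On a host that reports v2, older systemd images such as centos:7 are the ones that need the forced v1 mount.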

crun as a library

crun provides a C library that other programs can use. We are considering integrating it with conmon, the container monitor used by Podman and CRI-O, so that conmon could call the library directly rather than executing an OCI runtime.

Extensibility of crun

With crun, we can easily use all the kernel features, including syscalls that have not been enabled in Go. For instance, the openat2 syscall, which protects against symlink-based path attacks, is already supported by crun and is used on Fedora 32. There was an interesting bug recently where a user tried to join existing namespaces and, at the same time, create a new user namespace. The fix required using the new mount API available in Linux 5.3+, which was much easier to plug into the C code.

crun is more portable

crun works on architectures where Go support is limited. For instance, crun was used to port Docker to RISC-V.

Conclusion

crun is an excellent alternative to runc for the OCI runtime. It proves the power of standards like the OCI runtime specification and the open source way. It has several advantages over runc and is leading the way in innovation for how we run containers. While we use crun as an experimental platform for developing new features, it is ready for regular production use. We continue to work on getting these new features approved into the OCI runtime specification and merged into runc.


Dan Walsh

Daniel Walsh has worked in the computer security field for over 30 years. Dan is a Consulting Engineer at Red Hat. He joined Red Hat in August 2001. Dan has led the Red Hat Container Engineering team since August 2013, but has been working on container technology for several years.


Valentin Rothberg

Container engineer at Red Hat, bass player, music lover.


Giuseppe Scrivano

Giuseppe is an engineer in the containers runtime team at Red Hat. He enjoys working on everything that is low level. He contributes to projects like Podman and CRI-O.
