The limits of compatibility and supportability with containers

May 30, 2019 | 10-minute read | Containers

Principal Product Manager - Containers

Many folks who do container development have run Alpine container images. You might have run Fedora, Red Hat Enterprise Linux (RHEL), CentOS, Debian, and Ubuntu images as well. If you are adventurous, you may have even run Arch, Gentoo, or dare I say, really old container images - like, RHEL 5 old.

If you have some experience running container images, you might be led to believe that anything will just work, all the time, because containers are often thought to be completely portable across time and space. And a lot of the time, they do work! (Until they don't.)

It’s easy to assume that there is nothing to worry about when mixing and matching the container image user space and the host operating system. This post aims to give a realistic explanation of the limits of compatibility with container images, and to demonstrate why bring your own images (BYI) isn't a workable enterprise solution.

Background

It all starts with Application Programming Interfaces (APIs). We throw this term around a lot today with microservices, but this concept has been around for decades, and it applies to system software like compilers, C libraries, and Linux kernels as much as more modern web applications.

The Linux kernel is in part founded on the idea of a fairly strict set of API calls. By strict, I mean, a lot of really smart people try to make sure that it’s designed well, doesn’t need to change very often, and when it does change, it rarely breaks applications which depend on it -- and then only if absolutely necessary. The Linux API is designed to be treated as a dependable platform to build applications on.

Applications use this kernel API to do things like open files, open TCP sockets, and send data to those files and sockets. In casual conversation, we often call this kernel API “the syscall layer,” “syscalls,” “the Linux kernel interface,” or “the Linux API.” Tangentially, all of the programs and libraries that depend on these Linux system calls are referred to as “user space” and everything that services these API calls is referred to as “the kernel” or “kernel space.” All of this language can be a blocker to understanding how this works.

Most applications don’t interact with system calls directly, unless the code is written in assembly or machine code (don’t feel bad if you don’t fully understand what that means). Most applications use a C library, either directly through compilation and linking, or indirectly through a language runtime like PHP, Python, Ruby, or Java. Yes, Java still uses a C library: the JVM itself is native code, written in C and C++ and compiled (no worries if you haven’t thought about that).

The C library provides functions which map fairly directly to the kernel system calls, but it handles a lot of compatibility problems, and creates some too. In the Linux world, and in particular in the Red Hat world, the C library of choice is glibc. We put a lot of work into it to help make it the best performing C library on Linux.
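To make the layering concrete, here is a minimal sketch in Python, assuming a Linux system where ctypes can load the C library. It asks for the process ID twice: once through Python's os module and once by calling getpid() in the C library directly. The helper name libc_getpid is made up for this illustration.

```python
import ctypes
import ctypes.util
import os


def libc_getpid():
    """Call getpid() in the C library directly and return the result."""
    # On glibc systems find_library("c") usually returns a name such as
    # "libc.so.6". If it returns None, CDLL(None) falls back to the symbols
    # already loaded into this process, which include the C library.
    libc_name = ctypes.util.find_library("c")
    libc = ctypes.CDLL(libc_name)
    libc.getpid.restype = ctypes.c_int
    return libc.getpid()


if __name__ == "__main__":
    print("via os.getpid():  ", os.getpid())
    print("via libc getpid():", libc_getpid())
```

Both paths end at the same kernel interface. The interpreter is just another program linked against the C library that happens to be installed next to it.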

But, the syscall layer isn’t the only API into the kernel -- and the C library isn’t the only way to get at bits of the kernel. There are other interfaces into the kernel that can be thought of as APIs. Things like ioctl, /proc, /dev, /sys and plenty of OTHER EXAMPLES (in rage caps). For the most part, applications stick to the syscall interface, but privileged applications which are used for troubleshooting (things like nmap, tcpdump, eBPF, dtrace, systemtap, etc.) often tiptoe outside of the syscall layer or invoke system calls in hardware specific ways which limit portability and compatibility between Linux systems.

But, containers

You use containers, so none of this applies to you right? Wrong. Containers make system calls, they also tiptoe outside of the syscall API just like regular applications, especially privileged containers.

I have previously written about this, so I will not rehash it here, but suffice it to say, containers are just regular Linux processes with many of the same advantages and disadvantages when it comes to portability and compatibility. If you need a bit of a primer, look at the Architecting Containers series. The takeaway is, there is no magic compatibility layer with containers. With containers the kernel and C library are the interpreter (in the strict Computer Science sense) for binaries.
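A small sketch of what "no magic compatibility layer" means in practice, assuming Python is available in the environment you want to inspect. The function name describe_runtime is hypothetical. It reports the kernel release, which is answered by whatever kernel the process is running on, and the C library that Python detects in user space.

```python
import os
import platform


def describe_runtime():
    """Return the kernel release and C library version seen by this process."""
    # uname is answered by the running kernel. In a container that is the
    # host's kernel, not something shipped in the image.
    kernel = os.uname().release
    # libc_ver() inspects the Python binary in user space. It may return
    # empty strings when it cannot identify the library (for example on
    # images that do not use glibc).
    libc_name, libc_version = platform.libc_ver()
    libc = f"{libc_name} {libc_version}".strip() or "unknown"
    return {"kernel": kernel, "libc": libc}


if __name__ == "__main__":
    print(describe_runtime())
```

Run on a host and then inside containers built from different images on that same host, the kernel value should stay the same while the libc value follows whatever the image ships.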

So, what are the limits of compatibility?

That’s a tricky question. Let’s level set. Most applications that you want to use in a container will just work. Most applications stick to the syscall layer pretty well, so when they are put into a container, they behave nicely. For example, web servers and the Java JVM, two very commonly used applications in containers, behave fairly well in containers. That’s because they don’t typically do anything too complex at the syscall layer. In fact, web servers make a fairly limited set of system calls under most situations (mostly file open, and socket open, and write calls).

But, there are some limits...

First, there is the ever-present Turing completeness problem with containers and the syscall layer. When you give somebody an interpreter like Python, or Java (technically a virtual machine, not an interpreter from a Computer Science perspective), they can execute almost any system call they want. A human can load any code they want into the container and have the interpreter or virtual machine attempt to execute it.

If you have ever used the system() function call or its equivalents (os.system() in Python, system() in Perl, Runtime.exec() in Java), or just executed Bash commands, you might realize that you have a lot of power in an application. A call like system() lets a programmer hand an arbitrary command to the C library and, through it, ask the kernel to do just about anything.

Predicting which system calls will be executed at runtime is, in the general case, an undecidable problem (a close cousin of the halting problem). Once a Turing-complete interpreter is involved, it’s impossible to accurately predict what system calls any given application will execute.

A perfect example is a shell: a user can type anything they want, and you can’t predict what that will be. You won’t know which system calls will be executed until runtime, so you can’t determine them ahead of time by analyzing a container image or, more accurately, the binaries in it. This makes it impossible to guarantee compatibility between kernel versions, because you don’t know what the user will try to execute. New syscalls can be added, or existing ones can behave differently over time, so if your application does more than open files and ship them across the network, it’s more likely that you will run into this problem.
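The point can be sketched in a few lines of Python. This is an illustration only, and run_user_command is a hypothetical helper. Nothing in this file says which programs, and therefore which system calls, will end up running. That is decided by whatever string arrives at runtime.

```python
import subprocess
import sys


def run_user_command(command):
    """Run a command string chosen at runtime and return its standard output."""
    # shell=True hands the string to /bin/sh, much as system() would.
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    # The command comes from the command line; "uname -r" is only a default.
    command = " ".join(sys.argv[1:]) or "uname -r"
    print(run_user_command(command), end="")
```

Scanning the image that contains this script would find Python and a shell, but it could not tell you whether tomorrow's input will be a harmless echo or a tool that depends on a kernel feature the host does not have.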

Second, you might say, “we only use web servers, so we don’t have this problem.” But, I have seen this movie before, and I know how it ends. Your workloads may expand, and you can’t predict how they will. Today, you are using web servers, but tomorrow you may be running AI workloads, HPC, GPU workloads, or edge IoT stuff with weird hardware requirements.

The list goes on and on. Like Linux bare metal applications, and then virtualization, workloads are almost certain to expand in containers. They already have! Red Hat has to think about this problem today because some customer, somewhere already cares about all of these use cases on Red Hat OpenShift and RHEL. As your workloads expand, you too will have to think about these problems.

Third, there are meta-APIs. These aren’t necessarily pieces of technology that a programmer interacts with directly from their code, but their code ends up touching them at runtime. I’m talking about things like SELinux, seccomp, Linux capabilities, firewalls, VPNs, overlay networks, etc.

All of these pieces of technology sit between your code and your customers. Changes to any one of them can affect performance, security, or just simply break backwards compatibility at any time. We see this all the time. Infrastructure technology changes, and applications stop working like they did before. The same is true in containers.
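Some of these meta-APIs are visible from inside a process even though the application never asked for them. As a rough sketch, assuming a Linux kernel that exposes the Seccomp and CapEff fields in /proc/self/status, the following reads what the surrounding infrastructure has applied. The function names are made up for this example.

```python
SECCOMP_MODES = {"0": "disabled", "1": "strict", "2": "filter"}


def parse_status(text):
    """Turn the 'Key: value' lines of /proc/<pid>/status into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def security_summary(status_text):
    """Report the seccomp mode and effective capability mask, if present."""
    fields = parse_status(status_text)
    return {
        "seccomp": SECCOMP_MODES.get(fields.get("Seccomp", ""), "unknown"),
        "effective_capabilities": fields.get("CapEff", "unknown"),
    }


if __name__ == "__main__":
    try:
        with open("/proc/self/status") as handle:
            print(security_summary(handle.read()))
    except OSError:
        print("could not read /proc/self/status")
```

The same binary can report different values depending on the container engine and host configuration it runs under, which is exactly why a change in that layer can alter its behavior.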

The Bugzilla Breakdown

So, you’re saying to yourself, “alright this guy seems to be making a solid case for why I need to think about this, but I need an example to really believe it!” No problem, I got you covered. Let’s try something which I shall call “The BZ Breakdown.”

Let’s start with the situation. We have a bug reported in the RHEL 6 container image. Luckily, this bug was reported by Dan Walsh, and it has a wonderful set of instructions to reproduce it and understand it. At its core, the useradd command is failing when run in a container.

This is something that is commonly done when packages are installed during container image builds, so this is a big deal. It will pretty much prevent you from installing any package that requires useradd - like most web servers. Remember, web servers are one of the most common, if not the most common applications put into containers. Yeah, that.

Strangely, the RHEL 6 image works fine on RHEL 6, but this bug manifests itself when you run it on a RHEL 7 container host that has a newer kernel, and a newer container engine. As of today, RHEL 7 is the most common place for Red Hat customers to run RHEL 6 or RHEL 7 container images, so we have to figure out what’s wrong, or customers can’t run RHEL 6 images on RHEL 7.

Let’s test it, so you can see what happens in real life. I have created a Fedora 14 container image to make this easier and will run the image on RHEL 7. If you’ll kindly remember, RHEL 6 was derived from packages in Fedora 12, 13, and even 14, so we can use that as a proxy for an “unpatched” RHEL 6 for this test.

First, run the command in Fedora 14:

podman run -it quay.io/fatherlinux/fedora14 bash -c "useradd fred && cat /etc/passwd"

Output:

useradd: failure while writing changes to /etc/passwd

Now, let’s try it on an ancient RHEL 6.5-11 image:

podman run -it registry.redhat.io/rhel6:6.5-11 bash -c "useradd fred && cat /etc/passwd"

Output:

…
fred:x:500:500::/home/fred:/bin/bash

Finally, let’s try it on a newer RHEL 6 image:

podman run -it registry.redhat.io/rhel6:latest bash -c "useradd fred && cat /etc/passwd"

Output:

...

fred:x:500:500::/home/fred:/bin/bash

So, if you discovered this problem in your environment, where would you start to troubleshoot it? Is it the container image? That would be counterintuitive. Is it the container engine? Possibly. What about the container runtime? Maybe. Is it the C library? Possibly. Is it the Linux kernel? Sure, maybe. In reality it could be any of these things and more. You have to ask yourself two questions:

Which component has the problem?
Once I nail down which component it is, can I patch or hack around it?

I’ll save you the pain and agony on this bug and just give you the answer. There was no way to hack around it; it required a patch. Counterintuitively, the problem was with the container image (that is not where I would have bet my money).

The libselinux package needed patching. A number of applications in a container image are SELinux aware, which means their behavior changes when SELinux is enabled or disabled. The useradd command is a perfect example. Libselinux was reporting that SELinux was enabled in the container, so useradd executed the wrong code path. How long do you think it would have taken your developers or operations team to track this down, and patch it?
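To show why an "is SELinux enabled?" answer can differ between images on the same host, here is a deliberately simplified heuristic in Python. It is not how libselinux implements its check. It only classifies how selinuxfs appears in /proc/mounts-style text, and the function name is hypothetical. The point is that the answer depends both on what the container is shown of the host and on how the library inside the image interprets it.

```python
def selinux_mount_state(mounts_text):
    """Classify selinuxfs in /proc/mounts text.

    Returns 'absent', 'read-only', or 'read-write'.
    """
    for line in mounts_text.splitlines():
        parts = line.split()
        # /proc/mounts columns: device, mount point, fs type, options, ...
        if len(parts) >= 4 and parts[2] == "selinuxfs":
            options = parts[3].split(",")
            return "read-only" if "ro" in options else "read-write"
    return "absent"


if __name__ == "__main__":
    try:
        with open("/proc/mounts") as handle:
            print("selinuxfs:", selinux_mount_state(handle.read()))
    except OSError:
        print("could not read /proc/mounts")
```

Two versions of a library could look at the same mount table and draw different conclusions from it, and a tool like useradd then follows whichever code path its library tells it to.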

The Limits of Supportability

As the product manager for containers in RHEL and OpenShift, I love meeting customers’ wants and needs. I would love to tell customers that we could support any container host and any container image. I would love to confidently tell customers that we can support Alpine, Debian, Ubuntu, Fedora, and Gentoo container images, on RHEL container hosts. I would love to tell them that they can run RHEL images on any Linux container host and it would be completely supported.

It would be great, because customers want to do it, and they want to know that they have a vendor that can help them do it in a supportable way. They want to know that the vendor can patch any problems that come up. But, I can’t tell customers that in good faith. If I did commit to that, my colleagues in Red Hat Support, and Engineering would not send me a Christmas card. In fact, I can feel their transdimensional stares through time and space for just writing this paragraph.

Why does supporting all these different permutations worry my colleagues in Red Hat Support, Engineering and Quality Engineering? At first glance, it feels like such problems should be fairly rare, and, in my experience, they are. Things do just work most of the time, so why are they so worried?

First and foremost, Red Hat Engineers can’t support and patch other people’s container images, C libraries, container engines, container runtimes and container hosts. Even though problems are rare, Red Hat can’t commit to patching other people’s code. That’s just not possible.

Second, when problems are discovered, they are difficult to troubleshoot and patch. This by itself limits what Red Hat or any other vendor can commit to support. Engineering is expensive and a finite resource. Red Hat works with the upstream community, and this kind of problem is not something that everyone in the upstream is committed to solving.

If a particular set of container images doesn’t work with a container host, the answer in the community will probably be, "use a different combination of components." Stated another way, they will likely just tell you to recompile on the newer kernel, get a different version, etc.

These kinds of problems can manifest themselves any time the kernel and the user space aren’t built and shipped together. This is true even with containers. Telling customers to just recompile is not an enterprise solution that Red Hat feels comfortable with. This is one reason Red Hat limits the set of components that it will commit to support.

Third, my colleagues in Red Hat Support, Engineering, and Quality Engineering are really smart people who have seen these types of problems a lot. They have a feel for how they manifest, and how hard they are to find and patch. I trust their judgment on this, and I agree with it. To best support customers, we have to limit the support matrix.

Conclusion

If you are a systems administrator, you are probably at least thinking a little bit about this problem. If you are a developer, you may not be thinking about this at all. After reading this post, you should be aware that this problem is real, and we have examples of it manifesting.

There are reasonable temporal and spatial limits to what can be supported with regard to combinations of container images, C libraries, container engines, container runtimes and container hosts. At Red Hat we feel confident that we can offer support, and patching of RHEL 6, RHEL 7, and RHEL 8 container images on RHEL 7 and RHEL 8 container hosts.
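A support matrix like this is easy to encode and check before a deployment. The sketch below reads the sentence above as the full cross product of images and hosts, which is an assumption made for illustration. The names are hypothetical, and the vendor's published support policy is the authority for real decisions.

```python
# Image and host versions taken from the paragraph above.
SUPPORTED_IMAGES = {"RHEL 6", "RHEL 7", "RHEL 8"}
SUPPORTED_HOSTS = {"RHEL 7", "RHEL 8"}


def is_supported(image, host):
    """Return True if the image/host pair falls inside the matrix."""
    return image in SUPPORTED_IMAGES and host in SUPPORTED_HOSTS


if __name__ == "__main__":
    for image, host in [("RHEL 6", "RHEL 7"), ("Alpine", "RHEL 8"), ("RHEL 7", "RHEL 6")]:
        verdict = "inside" if is_supported(image, host) else "outside"
        print(f"{image} image on {host} host: {verdict} the matrix")
```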

In fact, this is why we've introduced the Red Hat Universal Base Image. We want to make it easier for developers to work with our images, so that we can make it easier to support applications when they're finished. By lowering the barrier of entry to get and distribute RHEL-based images, we hope to make it easier to create supportable container images for all of the folks using RHEL and Red Hat OpenShift.

I’ll leave you with one question on supportability - do you think you are properly judging how expensive one of these problems will be for your engineering teams when you run into one?

About the author

Scott McCarty

Principal Product Manager - Containers

At Red Hat, Scott McCarty is Senior Principal Product Manager for RHEL Server, arguably the largest open source software business in the world. Focus areas include cloud, containers, workload expansion, and automation. Working closely with customers, partners, engineering teams, sales, marketing, other product teams, and even in the community, he combines personal experience with customer and partner feedback to enhance and tailor strategic capabilities in Red Hat Enterprise Linux.

McCarty is a social media start-up veteran, an e-commerce old timer, and a weathered government research technologist, with experience across a variety of companies and organizations, from seven person startups to 20,000 employee technology companies. This has culminated in a unique perspective on open source software development, delivery, and maintenance.