Distroless is a popular idea in the world of containers: package an application in a container image while removing as much of the operating system as possible (package managers, libraries, shells, etc.). This does provide some security benefits, but they are often blown out of proportion because of a naive understanding of what an operating system is, how it works, and in particular what a Linux distribution is and how it works.
This article tries to give a clearer picture of the actual benefits of distroless while tempering the over-hyped marketing around it. Let’s explore some fallacies.
Fallacy #1: Size Is the Most Important Contributor to Attack Surface
Pratyusa K. Manadhata from Carnegie Mellon has an elegantly simple definition of attack surface. In his paper An Attack Surface Metric, he states that “a system’s attack surface is the set of ways in which an adversary can enter the system and potentially cause damage.” But how does this translate to containers and container images?
Often, the attack surface of a container image is measured by the number of files in it, or how many megabytes of space it uses on disk. These measurements are naive proxies for the actual attack surface. To truly understand attack surface, a security analyst must understand several things:
Not all files in a container image contribute to attack surface equally.
Files which are directly in the execution path (web servers, C libraries, etc.) are more likely to expand the attack surface than files that aren’t (shells, config files, etc.).
The quality of the software and configuration (aka files) in the direct execution path contributes to attack surface more than the size of the container image or the number of files contained in the image.
The software in the direct execution path which is used in many different container images (C libraries, web servers, encryption libraries, etc) contributes a larger share to the attack surface.
Standardizing on the exact same versions (Linux distribution and version) of this widely used software in the direct execution path (C libraries, web servers, encryption libraries, etc) reduces attack surface, and can make compliance and remediation easier.
Stated another way, standardization and quality of the software in your direct execution path lower your attack surface more than distroless does. If you can’t run everything on the exact same distroless images, you will not benefit much from distroless.
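To make the standardization argument concrete, here is a minimal sketch that counts how many distinct versions of each execution-path package are deployed across a set of images. The inventory, image names, and version strings are all hypothetical and made up for illustration; in practice the data would come from whatever package inventory or scanning tooling you already use.

```python
from collections import defaultdict

# Hypothetical inventory: image name -> {package: version} for packages in the
# direct execution path. All names and versions are invented for illustration.
FLEET = {
    "frontend": {"glibc": "2.34", "openssl": "3.0.7", "nginx": "1.22.1"},
    "api": {"glibc": "2.34", "openssl": "3.0.7"},
    "batch": {"musl": "1.2.3", "openssl": "1.1.1t"},
}


def versions_in_use(fleet):
    """Map each package to the set of distinct versions deployed across the fleet."""
    seen = defaultdict(set)
    for packages in fleet.values():
        for name, version in packages.items():
            seen[name].add(version)
    return dict(seen)


def permutation_count(fleet):
    """Count distinct (package, version) pairs, a rough proxy for what must be tracked and patched."""
    return sum(len(versions) for versions in versions_in_use(fleet).values())


if __name__ == "__main__":
    for name, versions in sorted(versions_in_use(FLEET).items()):
        print(f"{name}: {len(versions)} version(s) in use: {sorted(versions)}")
    print("distinct package/version pairs:", permutation_count(FLEET))
```

The count is only a proxy, not a formal attack surface metric, but it moves in the direction the list above describes: it drops when images share the same C library and encryption library, and it does not change when man pages or locale data are stripped.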
Fallacy #2: You Can Actually Remove the Operating System from a Container Image
Much like cloud, there is no such thing as distroless, just somebody else’s Linux distro. So-called "distroless" container images are typically very slimmed down user space environments without package managers, shells or other apps you might find in a typical distribution. That's it.
An operating system, and more specifically a Linux distro, is made up of two main components: a kernel and a user space. The kernel is fairly easy to understand: it’s a special program that runs on the hardware or virtual machine. The user space is a bit harder to understand. It includes everything you can see in a container image, for example the C library (glibc or musl), web servers, encryption libraries, timezone data, locale data (language and cultural), etc.
The user space can’t truly be removed from a distroless container image, and a distroless image must always run on a kernel.
That’s right, even Google’s distroless project relies on Debian for user space requirements. (See this request to the distroless project asking for them to rebase to the latest Debian Bullseye.)
Even in a distroless set of container images, things like the Java virtual machine, Python, and Node.js are compiled against a C library, which gives these user space programs access to low-level functions in the Linux kernel (network sockets, storage volumes, files, etc.).
Linux distros exist because we just want to write applications; we don’t want to patch and maintain widely used libraries and infrastructure like timezone data, C libraries, etc. It’s completely logical for projects like distroless to rely on a Linux distro to provide this infrastructure.
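The dependency on a C library is easy to observe from inside a language runtime. The sketch below, which assumes a Linux (or other POSIX) system with a dynamically linked CPython, calls getpid() straight from the C library loaded into the interpreter process and compares it with what the runtime itself reports. The function name libc_getpid is just a label chosen for this example.

```python
import ctypes
import os
import platform


def libc_getpid():
    """Call getpid() from the C library already loaded into this interpreter."""
    # On Linux and other POSIX systems, CDLL(None) returns a handle to the
    # symbols of the running process, including the C library it links against.
    libc = ctypes.CDLL(None)
    return libc.getpid()


if __name__ == "__main__":
    # libc_ver() reports the C library name and version for glibc-linked builds;
    # it may return empty strings for other C libraries such as musl.
    print("C library:", platform.libc_ver())
    print("getpid via libc:", libc_getpid())
    print("getpid via os:  ", os.getpid())
```

Both calls end up in the same place: the runtime asks the C library, and the C library asks the kernel. Removing the shell and package manager from an image does not remove that layer.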
Don’t trick yourself into thinking that you’re reducing your attack surface with distroless alone. It’s a wash. There's still a distro in "distroless" containers, just less of one. See also: Do Linux distributions still matter with containers?
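One way to check which distro a given image is built on is to read its os-release file, which most distributions ship at /etc/os-release. The sketch below parses that KEY=value format; the sample contents are hypothetical and not copied from any real image, and whether a particular distroless image includes the file at all is something to verify rather than assume.

```python
import shlex


def parse_os_release(text):
    """Parse os-release style KEY=value lines into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        # Values may use shell-style quoting, which shlex understands.
        parts = shlex.split(raw)
        fields[key] = parts[0] if parts else ""
    return fields


# Hypothetical file contents, for illustration only.
SAMPLE = '''\
NAME="Example Linux"
ID=example
VERSION_ID="1.0"
'''

if __name__ == "__main__":
    info = parse_os_release(SAMPLE)
    print("distro:", info.get("ID"), "version:", info.get("VERSION_ID"))
```

To use it against a real image, export the image filesystem with your container tooling and pass the contents of its etc/os-release file to parse_os_release.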
Fallacy #3: Hackers Can Break into Your Containers Using All of These Files Just Sitting There
This one makes me laugh. Most of the excess size in container images comes from files that do nothing at runtime: man pages, timezone data, locale data, debug binary data, etc.
Ask yourself: how would a hacker use excess timezone data, provided by a Linux distro, to break into your container? (This is not to be confused with a hacker sneaking bad timezone data in and confusing glibc. Notice: glibc.) I’m not advocating for larger container images; the bloat is still a waste of space. However, it’s difficult to argue that most of the bloat in container images is actually usable by attackers. I’m more concerned that a piece of Ruby (or whatever, no offense to Ruby) code copied and pasted into an application will introduce a vulnerability than that a piece of timezone data will.
Annoying as bloat is, an attacker will likely just bring the tools they need with them. It’s quite common for attackers to use buffer overflows and other exploits to inject their own shellcode into memory. Once they have a shell, they can literally bring a toolkit with them. They probably don’t want to use the tools on the system; they want to use their own, like a plumber who brings a toolbox instead of using the tools in your garage.
Instead, think about the bloat across the whole fleet: cloud servers, network devices, routers, and switches, as well as all of the different web servers (httpd, nginx, Go’s built-in one, and the many scripting-language modules that implement HTTP directly in the language). Think about this entire set of software across the entire environment, from encryption modules to TCP stacks to implementations of the HTTP protocol (aka web servers).
This is the attack surface for your applications and infrastructure. All of this code is functional and in the direct execution path from the outside of your organization when a web request is served.
When you think about attack surface the right way, the bloat in a container image seems like the least of your worries. I’m not saying it doesn’t matter; it does. But each different piece of software deployed in an environment is a new permutation, and it’s the number of permutations that you should worry about first.
Red Hat Universal Base Image Micro (UBI Micro)
You might be asking yourself, why did Red Hat release UBI Micro if distroless isn’t that big of a deal? Well, once you’ve standardized on a single glibc, a single OpenSSL library, and a single nginx or Apache web server version, the next optimization is indeed trimming the size of the individual services down.
Once you’ve standardized on UBI as a whole, which is built from RHEL packages, it’s time to trim the individual container images down. UBI Micro helps with that. UBI Micro ships the same glibc as RHEL.
When you add OpenSSL to UBI Micro, it’s the same package from RHEL. When you add httpd to UBI Micro, it’s the same Apache from RHEL. This should form a pretty clear picture. When you use UBI Micro, you’re not pulling yet another Linux distro into your environment and thereby increasing your attack surface; you really are reducing it.
For more information on UBI Micro, check out: Introduction to Red Hat's UBI Micro.