Do rootless containers sound interesting? What exactly are rootless containers? It sounds so mysterious, right? Rootless containers are Linux containers that run as non-root, unprivileged users. In this post, we’ll go into what rootless containers are, as well as how you can test them on Red Hat Enterprise Linux (RHEL) 7.6.
But let’s ask the question another way: why should you need root to run containers? The whole point of a container is to limit the capabilities of a process to only those it needs, right? And that’s exactly why rootless containers are so interesting.
Why are they called rootless containers? Well, every now and then a name just sticks, and that’s what has happened with rootless containers.
You might be saying to yourself, “but I have rootless containers right now with Docker - I run my docker commands as a regular user and it works all the time.” Even though you are executing the docker command line tool without root, the docker daemon is executing those requests as root on your behalf, like this:
Docker Client (TCP/Unix Socket) -> Docker Daemon (Parent/Child Processes) -> Container
When your client connects to the daemon, you literally have root access on the system.
If a user breaks something by mistake, or worse, on purpose, it’s almost impossible to figure out who did it (or when). That’s not rootless.
To demonstrate the problem, try the following example. First make a note of the numeric user ID (UID) for your regular login (I’ll explain why later):
[fatherlinux@rhel7 ~]$ cat /proc/self/loginuid
Now let’s run a container. Containers are supposed to be isolated, so this should be safe, right? Notice that we are mounting the host’s root file system (/) at /host in the container, then setting up a chroot into it:
[fatherlinux@rhel7 ~]$ docker run -ti --privileged -v /:/host fedora chroot /host /bin/sh
Now let’s try some commands that would require root access:
sh-4.2# touch /etc/passwd
sh-4.2# cat /etc/shadow
Those commands succeeded, and if you were paying attention, you might notice that you are touching and reading the host’s files, not those of an isolated container. So, just by creating this container, the fatherlinux user has become root on the host system (cue evil laugh).
OK, let’s see if auditing could at least track what this user is doing in this container. On Red Hat systems (and most Linux systems), /proc/self/loginuid is recorded when you first log into a system, and can’t be changed no matter what you do. The idea is that even when you run sudo or su commands, your login UID can be tracked to prove who ran commands as root. This enables system auditing to track who is doing something, even when they’ve switched user IDs.
Let’s check whether the operations in the container will still be logged under the same numeric UID. Run the following command, still in the container:
sh-4.2# cat /proc/self/loginuid
That’s not the same numeric user ID you noted earlier. (The astute reader might notice that the number isn’t arbitrary: it’s 4294967295, which is 2^32 - 1, or 0xFFFFFFFF. Interpreted as a signed 32-bit value, that’s -1, the kernel’s marker for a login UID that was never set.)
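If you want to check that arithmetic yourself, here is a small sketch in plain shell. The to_signed32 helper is a name I made up for illustration, and the snippet assumes your shell does 64-bit arithmetic (which bash and dash do on 64-bit systems).

```shell
# Largest unsigned 32-bit value: 2^32 - 1.
unset_loginuid=$(( (1 << 32) - 1 ))
echo "$unset_loginuid"

# Hypothetical helper: reinterpret an unsigned 32-bit value as a
# signed 32-bit value (two's complement).
to_signed32() {
    if [ "$1" -ge 2147483648 ]; then
        echo $(( $1 - 4294967296 ))
    else
        echo "$1"
    fi
}

to_signed32 "$unset_loginuid"
```

The first line printed is the unsigned value and the second is its signed reading, which is where the -1 comes from.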
So how did this happen if the loginuid can’t be changed?
When you fire up a container with the docker client, you are talking to an already running server, which fires off some subprocess which eventually fires up your container. When processes are called with a parent->child mechanism, the loginuid is preserved. On the other hand, when a client talks to a server, over a Unix or TCP socket (docker client talks to the daemon), the parent->child connection to the user who ran the command does not exist. This prevents the loginuid from working correctly.
The login UID of your containerized processes inherits the login UID of the docker server instead of your user. The docker server is running as root and was most likely started by systemd.
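To see the parent->child half of that story for yourself, here is a minimal sketch that compares the login UID of your current shell with that of a child process it spawns. It assumes a Linux kernel with audit support (otherwise the loginuid file does not exist), and the variable names are mine.

```shell
# Login UID of the current shell ($$ is this shell's PID).
# Fall back to a placeholder if the kernel does not expose loginuid.
if [ -r "/proc/$$/loginuid" ]; then
    parent_uid=$(cat "/proc/$$/loginuid")
    # Spawn a child shell and let it report its own login UID.
    child_uid=$(sh -c 'cat /proc/self/loginuid')
else
    parent_uid=unavailable
    child_uid=unavailable
fi

echo "parent: $parent_uid"
echo "child:  $child_uid"
```

Both values should match, because the child inherits the login UID from its parent. A container started through the docker daemon has no such ancestry back to your login session, which is why the value differs there.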
So, what does all of this mean? It means your docker users have root, and nobody can track them. That’s bad.
The solution to this problem is to use a container engine that truly runs as your user, not as root. Coincidentally, Podman has a cool new feature to let you do this. You can test it upstream in Fedora, but it’s coming to RHEL too. As of RHEL 7.6, you can test it as Developer Preview with plans for it to be fully supported in RHEL 7.7.
First, make sure you’ve got an up-to-date installation of RHEL 7.6 Server that has been registered so packages can be downloaded from Red Hat. Note: You can download RHEL 7.6 server and get a no-cost subscription through the Red Hat Developer program, if you don’t already have a subscription through your organization.
Install Container Tools
If you can’t find podman, make sure you enable the Extras repo first:
subscription-manager repos --enable=rhel-7-server-extras-rpms
Now, install Podman (and Buildah, and Skopeo while we are at it):
yum install -y podman skopeo buildah
Test Podman as Root
The first step is to do some simple testing:
podman pull rhel7
podman run -it rhel7 bash
OK now, if that looks good, let’s get crazy…
Running regular containers with Podman is cool, but let’s go rootless. First, as root, let’s do some hacking. Just a warning, we are entering unsupported territory, so your mileage may vary. Do not run these commands on a production system.
We will install a set of development packages from the Copr build service which we use in the Fedora community (see them here). These packages were built by an engineer on our team named Vincent Batts. I asked Vincent to build these packages for this preview with RHEL 7.6.
Now, let’s use these packages and make a few modifications on a development system. First install the development packages:
curl -o /etc/yum.repos.d/rhel7.6-rootless-preview.repo https://copr.fedorainfracloud.org/coprs/vbatts/shadow-utils-newxidmap/repo/epel-7/vbatts-shadow-utils-newxidmap-epel-7.repo
yum install -y shadow-utils46-newxidmap slirp4netns
Enable user namespaces by setting a non-zero limit on how many may be created. User namespaces are what map root in the container to a regular user outside the container:
echo 10000 > /proc/sys/user/max_user_namespaces
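Before moving on, it can be worth confirming that the setting took effect. This is a rough check that can be run as any user; it only reads the same /proc file written above, and treating a missing file as "disabled" is my assumption.

```shell
ns_file=/proc/sys/user/max_user_namespaces

# Read the current limit; assume 0 (disabled) if the file is missing.
if [ -r "$ns_file" ]; then
    max_ns=$(cat "$ns_file")
else
    max_ns=0
fi

if [ "$max_ns" -gt 0 ]; then
    echo "user namespaces enabled (limit: $max_ns)"
else
    echo "user namespaces disabled"
fi
```

Keep in mind that a value written straight into /proc does not survive a reboot.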
Add a new user, and set the password:
Manually add some entries to /etc/subuid and /etc/subgid (the useradd command provided in shadow-utils 4.6 will handle this at GA in RHEL 7.7). These are the entries that give a regular user a range of UIDs and GIDs to use in containers:
echo "fatherlinux:100000:65536" >> /etc/subuid
echo "fatherlinux:100000:65536" >> /etc/subgid
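Each of those entries has the form user:start:count. As a rough illustration of what that buys you, the sketch below splits the entry apart and works out the range it grants. The host_id_for helper is hypothetical, and it assumes the usual rootless layout in which container ID 0 maps to your own UID and container IDs from 1 upward map onto the subordinate range.

```shell
entry="fatherlinux:100000:65536"

# Split user:start:count using POSIX parameter expansion.
user=${entry%%:*}
rest=${entry#*:}
start=${rest%%:*}
count=${rest#*:}
last=$(( start + count - 1 ))

echo "$user may use subordinate IDs $start through $last"

# Hypothetical helper: host ID backing a container ID >= 1,
# assuming container ID 1 maps to the first subordinate ID.
host_id_for() {
    echo $(( start + $1 - 1 ))
}

host_id_for 1
```

So a process running as UID 1 inside the container would show up on the host as UID 100000, an ID with no special privileges there.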
With the above instructions completed, you will be able to run containers as this new user. As of today, you have to SSH in to get all of the right environment variables (su - fatherlinux won’t work):
Now, pull an image:
podman pull rhel7
Since you are performing these operations as a regular user, container images will be stored in your home directory. Since you aren’t root, podman won’t be able to write to the main system’s image cache (
Verify that the image was pulled locally:
Finally, let’s run a container. Fingers crossed:
podman run -it rhel7 bash
Red Hat Enterprise Linux Server release 7.6 (Maipo)
You just ran a container as a regular user, congratulations!
Rootless containers take advantage of RHEL’s user namespace support to let users run containers without any additional privileges, all while preserving auditing on your systems. This improves the security and manageability of containers in RHEL. You can test rootless containers today in RHEL 7.6 and 8.0 Beta, depending on your needs.
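If you are curious what the user namespace mapping looks like from the inside, the kernel exposes it in /proc/self/uid_map. Here is a small sketch that prints it in a friendlier form; the variable names are mine, and the wording of the output is just for illustration.

```shell
# Each line of uid_map holds three numbers:
#   first ID inside the namespace, first ID outside it, length of the range.
mappings=0
if [ -r /proc/self/uid_map ]; then
    while read -r inside outside length; do
        echo "inside ID $inside -> outside ID $outside (range of $length)"
        mappings=$(( mappings + 1 ))
    done < /proc/self/uid_map
fi
echo "found $mappings mapping(s)"
```

Run on the host, this typically shows a single identity mapping. Run inside a rootless container, you would expect to see ID 0 mapped to your own unprivileged UID on the host.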
The work we are doing in Podman and the User Namespace separated containers is also the foundation for the work we are doing on CRI-O in OpenShift 4.X. You have to admit, that’s kinda cool. Stay tuned for more to come with Podman, Buildah, Skopeo, CRI-O, and crictl. There is a ton of work going on in this space.