In a recent GitHub issue on libpod, a Podman user suggested that rootless containers eliminate the need for the --user option when running containers. They assumed that the --user option had originated in Docker as a way to run a container as a different user, and that since rootless Podman already runs without root, the option was no longer needed.
They were mistaken.
Rootless and rootful Podman both support running containers as multiple users. By default, both run the container's initial process as root of the user namespace the container is launched in. With rootless containers, Podman launches the first process as root of your user namespace. In a previous blog, Understanding root inside and outside of the container, I dug deeper into what is happening here.
If you looked at the process from outside of the container, you would see that it is running as your UID.
$ podman run fedora cat /proc/self/uid_map
0 3267 1
1 100000 65536
My UID is 3267, and you can see that the user namespace maps UID 0 to 3267 for a range of 1 UID. Another way to examine this is by using podman top to display the user inside of the container and the host user.
$ podman run -d fedora sleep 100
41ad82a732526673299d6785105e1b4a0ef4397ed7ceb8b13b9218e2f3a77003
$ podman top -l user huser
USER HUSER
root 3267
If you want to run as a different user within the container, use -u to select the user. Even in rootless mode, root in the container is more powerful than a non-root user in the container, so it is still advisable to run as non-root in a rootless container.
Even in rootless containers, the root of the container has user namespace capabilities. These capabilities are a subset of root's power, scoped to the user namespace.
$ podman top -l capeff
EFFECTIVE CAPS
AUDIT_WRITE,CHOWN,DAC_OVERRIDE,FOWNER,FSETID,KILL,MKNOD,NET_BIND_SERVICE,NET_RAW,SETFCAP,SETGID,SETPCAP,SETUID,SYS_CHROOT
As you can see, the rootless process has a bunch of capabilities. You can even run --privileged and get all of the capabilities. But these capabilities are not the same as the capabilities you get as root; they are user namespace capabilities. They have full control over the namespaces mapped into the container, but no power on other parts of the operating system.
For example, the container has CAP_SETUID. This allows the root process to change its UID to any other UID inside the container. In the case above, it can change the process's host UID to any UID from 100000 to 165535, as well as back to 3267. The root process is not allowed to setuid to the host's UID 0 or any other UID on the system. Some capabilities, like CAP_SYS_ADMIN, are stripped down. CAP_NET_ADMIN can only manipulate the container's network namespace, not the host's.
Now let’s look at what happens when I run the container with the --user option.
$ podman run --user 1000 -d fedora sleep 10
976d7f3f034d38657cfba60aef406e7f65eae9eef735619ca7c13f8a946a0122
$ podman top -l user huser
USER HUSER
1000 100999
You can see that the container process is running as UID 1000 inside of the container, but it is actually running as UID 100999 on the host.
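The translation from 1000 to 100999 follows from the uid_map output shown earlier. As a minimal sketch of that lookup (the helper names parse_uid_map and to_host_uid are hypothetical and not part of Podman), it could be written in Python like this:

```python
from typing import List, Optional, Tuple

# Each uid_map line is "<first UID inside namespace> <first UID outside> <count>".
UidMap = List[Tuple[int, int, int]]


def parse_uid_map(text: str) -> UidMap:
    """Parse the contents of a uid_map file into (inside, outside, count) tuples."""
    entries = []
    for line in text.strip().splitlines():
        inside, outside, count = (int(field) for field in line.split())
        entries.append((inside, outside, count))
    return entries


def to_host_uid(container_uid: int, uid_map: UidMap) -> Optional[int]:
    """Return the host UID for a container UID, or None if the UID is unmapped."""
    for inside, outside, count in uid_map:
        if inside <= container_uid < inside + count:
            return outside + (container_uid - inside)
    return None


# The mapping printed by `podman run fedora cat /proc/self/uid_map` in this article.
uid_map = parse_uid_map("0 3267 1\n1 100000 65536")

print(to_host_uid(0, uid_map))     # root in the container -> 3267
print(to_host_uid(1000, uid_map))  # --user 1000 -> 100999
```

Container UID 0 falls in the first range and maps to my own UID, while every other container UID is offset into the 100000 range.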
Now, we see that the container has no capabilities and is locked down.
$ podman top -l capeff
EFFECTIVE CAPS
none
As long as your container does not need root, I always recommend using the --user option to improve security further.
Using the --userns=keep-id flag
Just as an addendum, rootless Podman has another cool option: --userns=keep-id. The keep-id option tells Podman to create a user namespace where the current rootless user's UID:GID maps to the same values in the container. When the container is launched, it is running as your UID inside the container and on the host. Many HPC (High-Performance Computing) environments are using this flag and running the entire container with a single non-root UID.
$ podman run --userns keep-id -d fedora sleep 100
319813af33af1f54d2a6a4c00eeb1100dec36e8ba9d4bef76846d0e0dd54a6b8
$ podman top -l user huser
USER HUSER
3267 103266
Unfortunately, writing this blog revealed a bug in podman top: it displays the wrong host user (i.e., HUSER). If I use the ps command, I see that the sleep process is actually running on the host as my UID. I have opened an issue to fix the bug. Thanks to Giuseppe Scrivano, the bug is fixed in the next release of Podman.
$ ps -ef | grep sleep
dwalsh 198080 198069 1 10:57 ? 00:00:00 sleep 100
Since the main process of the container runs as my UID, it no longer has root capabilities.
$ podman top -l capeff
EFFECTIVE CAPS
none
Depending on how the container is configured, processes in the container can use setuid tools like sudo, or binaries with file capabilities (setfcap), to gain additional capabilities, just like a normal login session. Fedora Toolbox uses Podman with the keep-id option under the covers to give users access to different OS environments.
One potential issue that we have seen users hit is specifying a large UID in a rootless container. Remember that the rootless Podman user is only allocated a limited number of UIDs, as defined in the /etc/subuid file. Usually, you can only use 65536 UIDs. This means that if you attempt to launch a rootless container with a UID greater than 65536, the container will fail. If you have to launch with a larger UID, then you need to enlarge your allocation in /etc/subuid so that the mapping covers the UID you want to use.
$ podman run --user 70000 fedora id -u
Error: container_linux.go:346: starting container process caused "setup user: invalid argument": OCI runtime error
$ podman run --user 65536 fedora id -u
65536
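To see why 65536 works while 70000 fails, it helps to work out the highest container UID that the /etc/subuid allocation can cover. The following Python sketch assumes the default rootless mapping described in this article, where container UID 0 is your own UID and container UIDs 1 and up are taken from your subordinate ranges. The function name max_container_uid and the sample /etc/subuid entry are illustrative assumptions, not output from a real system.

```python
def max_container_uid(subuid_text: str, user: str) -> int:
    """Highest container UID usable by `user`, assuming the default rootless mapping."""
    total = 0
    for line in subuid_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # /etc/subuid entries have the form "name:first_subordinate_uid:count".
        name, _start, count = line.split(":")
        if name == user:
            total += int(count)
    # Container UID 0 is the user's own UID; UIDs 1..total come from /etc/subuid.
    return total


# Assumed entry, chosen to be consistent with the uid_map shown in this article.
sample = "dwalsh:100000:65536\n"
limit = max_container_uid(sample, "dwalsh")

for uid in (65536, 70000):
    print(uid, "ok" if uid <= limit else "outside the mapped range")
```

With a single 65536-entry allocation, the limit is 65536, which matches the two podman run results above.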
The --user option is still necessary and adds a lot of security even when using rootless Podman, and users should still use it to be as secure as possible.