Rootless Podman containers are a really cool feature that allows users to run almost all containers in their home directory without requiring any additional privileges.
Rootless containers take advantage of the user namespace, as I explained in this blog.
Sometimes the user namespace and other container security layers like SELinux make it more difficult to share content inside the container. We have seen many users who want to share system directories into their containers but fail with permission errors. These directories are usually shared via some group access, which allows the user to read/write content in the directories.
For example, the user might have a directory, /mnt/engineering, on their system that is owned by root and group eng, with permissions set to 770.
Let's take a look at an example. First, create the eng group:
# groupadd eng
Make the share directory:
# mkdir /mnt/engineering
Change the group access for the directory:
# chown root:eng /mnt/engineering
Change the permissions so that the group can write to the directory:
# chmod 770 /mnt/engineering
Change the SELinux type of the directory so that containers can use it:
# chcon -t container_file_t /mnt/engineering
Next, let’s look at the permissions on the directory:
# ls -lZ /mnt/engineering/ -d
drwxrwx---. 2 root eng unconfined_u:object_r:user_tmp_t:s0 40 Feb 9 07:24 /mnt/engineering/
Now the user can be added to the eng group:
$ grep eng /etc/group
eng:x:14905:dwalsh
Log in to the system and make sure the user is in the eng group:
$ id
uid=3267(dwalsh) gid=3267(dwalsh) groups=3267(dwalsh),10(wheel),14905(eng) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
Make sure the user can read/write content in the directory:
$ touch /mnt/engineering/test
$ ls -l /mnt/engineering/
total 0
-rw-------. 1 dwalsh dwalsh 0 Feb 9 07:36 test
Now run a container using this volume, but the container gets permission denied:
$ podman run --userns=keep-id -v /mnt/engineering/:/mnt/engineering ubi8 ls /mnt/engineering/
ls: cannot open directory '/mnt/engineering/': Permission denied
We know that SELinux is not blocking access because we set the label correctly, so what happened?
The problem is the user namespace.
$ podman run --userns=keep-id -v /mnt/engineering/:/mnt/engineering ubi8 id
uid=3267(dwalsh) gid=3267(dwalsh) groups=3267(dwalsh)
Note that the --userns=keep-id flag is used to ensure that the UID inside the container is not root but the user’s regular UID. Notice above that when I run the id command outside of the container, my groups include the eng group, but when the container is run, the eng group does not show up. From a security perspective, this is a good thing because if the container processes escaped, they would not have access to directories that I have group access to. But users who do want to grant that access have a problem.
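One way to spot directories affected by this is to check whether your access to them comes from a supplementary group rather than your primary group. The following is a minimal sketch, not part of Podman: the function name reaches_via_supplementary_group is made up for illustration, it assumes GNU stat (stat -c), and it looks only at group ownership, not at the permission bits or SELinux labels.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) when the directory's group is one of
# the current user's supplementary groups rather than the primary group.
# Returns 1 when it is not, and 2 when the directory cannot be inspected.
reaches_via_supplementary_group() {
    dir=$1
    # GNU stat: %g prints the numeric GID that owns the directory.
    dir_gid=$(stat -c '%g' "$dir" 2>/dev/null) || return 2
    # The primary group still shows up with --userns=keep-id, so it is
    # not the problem case described above.
    [ "$dir_gid" = "$(id -g)" ] && return 1
    for gid in $(id -G); do
        [ "$gid" = "$dir_gid" ] && return 0
    done
    return 1
}
```

With the setup from this article, running reaches_via_supplementary_group /mnt/engineering as dwalsh would succeed, because the directory belongs to eng and eng is only a supplementary group for that user.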
The issue is that it is difficult to grant the container access to these directories. Creating an eng group inside of the container would not match the eng group on the host because the user namespace offsets the real group ID (GID).
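To see why the IDs do not line up, it helps to look at how a user namespace maps ID ranges. The kernel exposes the mapping in /proc/self/gid_map as lines of three numbers: the first ID inside the namespace, the first ID outside it, and the length of the range. The sketch below translates a host GID through such a map. The function name and the sample values (a single mapping for GID 3267 plus a 65536-ID subordinate range starting at 100000) are assumptions chosen for illustration, not output from the system in this article.

```shell
#!/bin/sh
# Hypothetical helper: translate a host GID into the GID seen inside a user
# namespace, using gid_map-style lines: "<inside-start> <outside-start> <count>".
# Prints "unmapped" when no range covers the host GID.
host_gid_to_ns() {
    host_gid=$1
    map_file=$2
    awk -v gid="$host_gid" '
        BEGIN { gid += 0 }   # force numeric comparison
        NF == 3 && gid >= $2 + 0 && gid < $2 + $3 {
            print $1 + (gid - $2)
            found = 1
            exit
        }
        END { if (!found) print "unmapped" }
    ' "$map_file"
}

# Sample map (assumed values): host GID 3267 becomes 0 inside the namespace,
# and the subordinate range starting at 100000 becomes 1 onward.
sample_map=$(mktemp)
printf '0 3267 1\n1 100000 65536\n' > "$sample_map"

host_gid_to_ns 3267 "$sample_map"     # prints 0
host_gid_to_ns 100005 "$sample_map"   # prints 6
host_gid_to_ns 14905 "$sample_map"    # prints unmapped (the eng GID)
rm -f "$sample_map"
```

Running podman unshare cat /proc/self/gid_map prints the real map for your account. A host GID that falls outside every range, like the eng GID in this sample, has no counterpart inside the namespace, so a group created inside the container cannot stand in for it.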
Luckily, the OCI runtime crun supports a special feature to leak these additional groups into the container. This ability is covered in the podman run man page:
man podman run
...
Note: if the user only has access rights via a group, accessing the device from inside a rootless container will fail. The crun(1) runtime offers a workaround for this by adding the option --annotation run.oci.keep_original_groups=1.
And further explained in the crun man page:
man crun
…
run.oci.keep_original_groups=1
If the annotation run.oci.keep_original_groups is present, then crun will skip the setgroups syscall that is used to either set the additional groups specified in the OCI configuration, or to reset the list of additional groups if none is specified.
If I set that annotation, my rootless Podman now has access to the volume, as seen below:
$ podman run -ti --annotation run.oci.keep_original_groups=1 --userns=keep-id -v /mnt/engineering/:/mnt/engineering ubi8 ls /mnt/engineering/
-rw-------. 1 dwalsh dwalsh 0 Feb 9 12:36 test
We use an annotation since the OCI Specification currently does not have a way to tell this to the OCI runtime. We have suggested adding it to the specification. At this time, no OCI runtime other than crun can handle this, not even runc. Perhaps in the future, this feature will be added to the OCI Specification.
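Because the annotation only has an effect under crun, it can be worth checking which runtime Podman is configured to use before relying on it. The sketch below wraps that check in a small function. The function name is made up for illustration, and the .Host.OCIRuntime.Name field of podman info is used on the assumption that your Podman version exposes it under that name.

```shell
#!/bin/sh
# Hypothetical helper: succeed only for the runtime that, according to this
# article, handles run.oci.keep_original_groups.
runtime_keeps_groups() {
    case $1 in
        crun) return 0 ;;
        *)    return 1 ;;
    esac
}

# Only query Podman when it is installed.
if command -v podman >/dev/null 2>&1; then
    # Assumed field name; fall back to "unknown" if the query fails.
    runtime=$(podman info --format '{{.Host.OCIRuntime.Name}}' 2>/dev/null) || runtime=unknown
    if runtime_keeps_groups "$runtime"; then
        echo "runtime '$runtime' handles the annotation"
    else
        echo "runtime '$runtime' does not handle the annotation"
    fi
fi
```

If the check reports a different runtime, the global --runtime option (for example, podman --runtime crun run ...) selects crun for a single command, provided crun is installed.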
If users want all of their containers to share their groups, they can add this annotation to the containers.conf file in their home directory:
$ cat ~/.config/containers/containers.conf
[containers]
annotations=["run.oci.keep_original_groups=1",]
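If the file does not exist yet, it has to be created first. The following is a minimal sketch of one cautious way to do that. The function name ensure_keep_groups_annotation is hypothetical, and the sketch deliberately refuses to modify an existing file, since appending a second [containers] table would produce invalid TOML.

```shell
#!/bin/sh
# Hypothetical helper: create a per-user containers.conf carrying the
# annotation, but never rewrite a file that already exists.
ensure_keep_groups_annotation() {
    conf=${1:-"$HOME/.config/containers/containers.conf"}
    if [ -e "$conf" ]; then
        # -F treats the dots in the annotation name literally.
        if grep -qF 'run.oci.keep_original_groups=1' "$conf"; then
            echo "annotation already present in $conf"
            return 0
        fi
        echo "$conf exists; add the annotation to its [containers] table by hand" >&2
        return 1
    fi
    mkdir -p "$(dirname "$conf")" || return 1
    printf '[containers]\nannotations=["run.oci.keep_original_groups=1",]\n' > "$conf"
    echo "created $conf"
}
```

Calling ensure_keep_groups_annotation with no argument targets ~/.config/containers/containers.conf; passing a path lets you try it against a scratch file first.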
Now even the default Podman can create content in the volume, and user processes outside of the container see the correct content.
$ podman run -ti --userns=keep-id -v /mnt/engineering/:/mnt/engineering ubi8 touch /mnt/engineering/test2
$ ls -l /mnt/engineering/
total 0
-rw-------. 1 dwalsh dwalsh 0 Feb 9 07:36 test
-rw-r--r--. 1 dwalsh dwalsh 0 Feb 9 13:36 test2
Sharing content from the host into a container via volume mounting can sometimes be blocked by various security features of containers. Luckily, Podman and crun have advanced features to allow specific containers to share this content, and containers.conf enables users to set up their system so that all containers gain access to these volumes.