Using files and devices in Podman rootless containers
One key problem Podman users are having is accessing files and devices that they can use from the host but cannot use while in a container, even if they volume mount the objects into the container.
In this case, we are going to look at supplemental group access. Often, systems are set up with files and devices that are only accessible to specific groups of users. For example, I am in the wheel group on my system, which allows my user to access some administrative controls. Administrators might set up a directory to be shared by multiple users on the system by creating the eng group, adding users to the eng group, and then giving the eng group rwx permissions on the directory. Now all users in eng can read and write the directory.
Recently we received an issue where a user was struggling to give access to a GPU device on his system.
He was adding the device using a command like:
$ podman run --device /dev/video0 …
Note: In rootless mode, users cannot create new device nodes, so Podman just bind mounts the device from the host into the container. In rootful mode, Podman creates a new device node that processes inside the container can access.
Podman volume mounts /dev/video0 into the container, but every time the user attempted to use the device within the container, it failed with Permission denied. However, when he checked the device on the host and the groups he was a member of, everything looked correct. For example:
$ ls -l /dev/video0
crw-rw----+ 1 root video 81, 0 May 3 14:06 /dev/video0
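The host-side check described above (does the file's owning group appear among my groups?) can be sketched as a small shell helper. This is only an illustration: the function name file_group_in_my_groups is hypothetical, stat -c is the GNU coreutils form, and the check ignores ACLs, which the + at the end of the mode string in the listing indicates are also present on the device.

```shell
# Hypothetical helper, not part of Podman.
# Succeeds (exit 0) when the file's owning group is one of the caller's groups.
file_group_in_my_groups() {
    path=$1
    # Numeric GID that owns the file (GNU stat syntax, an assumption).
    file_gid=$(stat -c '%g' "$path") || return 2
    # id -G lists the numeric primary and supplemental GIDs of the caller.
    for gid in $(id -G); do
        [ "$gid" = "$file_gid" ] && return 0
    done
    return 1
}

# Usage sketch on the host:
# file_group_in_my_groups /dev/video0 && echo "group access looks fine on the host"
```

A check like this passing on the host says nothing about the container, because the container process gets a different set of groups.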
He could fully use the device outside of containers. Realizing that the container process was not in the video group, he then thought of adding the video group to the container to get access. He tried this command:
$ podman run --group-add video --device /dev/video0 …
But it still failed with Permission denied.
When you use --group-add video, it adds the video group defined inside the container image to the container's primary process, like this:
$ grep video /etc/group
$ podman run --group-add video fedora id
uid=0(root) gid=0(root) groups=0(root),39(video)
Inside of the container, the process has group 39, but this is not the same as group 39 on the host. Rootless containers run in a user namespace, so group IDs inside the container are offset according to that namespace's GID mapping. Here is the mapping:
$ podman unshare cat /proc/self/gid_map
0 3267 1
1 100000 65536
This means that the video group inside the container will be GID 100038 on the host. Take a look at this example:
$ ctr=$(podman run -d --group-add video fedora sleep 100)
$ pid=$(podman top -l hpid | tail -1)
$ grep Groups /proc/$pid/status
In order to access the video device on the host, the process needs real GID=39, so it fails. Rootless users cannot force access to the real GID=39 on the host since standard Linux protections block it.
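The offset arithmetic from the GID mapping shown earlier can be sketched in shell. Each gid_map line holds three fields: the first GID inside the namespace, the first GID on the host, and the length of the range. The helper name map_gid is hypothetical, and the map text is copied from the example output above; on a real system it would come from podman unshare cat /proc/self/gid_map.

```shell
# Hypothetical helper, not a Podman command.
# Translates a container GID to the host GID using gid_map-style text.
map_gid() {
    cgid=$1
    gid_map=$2
    printf '%s\n' "$gid_map" | awk -v g="$cgid" '
        # Pick the range that contains the container GID and apply its offset.
        NF == 3 && g >= $1 && g < $1 + $3 { print $2 + (g - $1); found = 1; exit }
        END { if (!found) { print "unmapped"; exit 1 } }
    '
}

# Mapping taken from the example output above.
gid_map='0 3267 1
1 100000 65536'

map_gid 39 "$gid_map"   # container video group: prints 100038
map_gid 0 "$gid_map"    # container root group: prints 3267
```

Container GID 39 falls in the second range, so it lands at 100000 + (39 - 1) = 100038 on the host, which is not the host's real video group.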
Podman to the rescue
As of Podman 3.2, we have added a new feature, --group-add keep-groups, which works with the OCI runtime crun. Ordinarily, when you start a Podman container, the OCI runtime executes the setgroups system call; this gives the main process inside of the container the groups defined within the container and also drops access to the parent process's groups. Ordinarily, this is what you want to happen since you don't want a process that escapes from a container to get access to your wheel group, for instance.
When you run with --group-add keep-groups, the OCI container runtime (crun) does not call setgroups, so the new container process keeps the groups of its parent process. If the parent process has access to GID=39, the processes inside of the container will still have that group, and they can use the device.
$ podman run --device /dev/video0 --group-add keep-groups …
And everything works!
Note that inside the container, the GID 39 is not mapped, so the processes within the container will see this as the nobody group. It looks like this:
$ ./bin/podman run --group-add keep-groups fedora groups
Older versions of Podman have a less user-friendly interface to trigger this behavior in crun. By adding the run.oci.keep_original_groups=1 annotation, crun will not execute setgroups:
$ podman run --annotation run.oci.keep_original_groups=1 --device /dev/video0
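Choosing between the two forms can be sketched as a small shell helper that takes a Podman version string and prints the matching option. This is an illustration under assumptions: keep_groups_args is a hypothetical name, the comparison relies on sort -V from GNU coreutils, and the 3.2 cutoff is the version named above. Either form still requires crun as the OCI runtime.

```shell
# Hypothetical helper, not part of Podman.
# Prints the option that asks crun to skip setgroups for a given Podman version.
keep_groups_args() {
    version=$1
    # sort -V orders version strings; if 3.2 sorts first (or the two are equal),
    # the given version is at least 3.2.
    lowest=$(printf '%s\n%s\n' "3.2" "$version" | sort -V | head -n 1)
    if [ "$lowest" = "3.2" ]; then
        printf '%s\n' '--group-add keep-groups'
    else
        printf '%s\n' '--annotation run.oci.keep_original_groups=1'
    fi
}

# Usage sketch (assumes podman is installed and that this --format template
# is available in your version):
# version=$(podman version --format '{{.Client.Version}}')
# podman run $(keep_groups_args "$version") --device /dev/video0 …
```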
If you use the --group-add keep-groups option, you cannot set other groups within the container. Instead, the container can only inherit the parent's groups. The reason for this is that Podman requires the setgroups call to set additional groups within the container, and this would lose access to the parent's groups. Giuseppe Scrivano has proposed two patches to allow setgroups in this situation. This approach is still under discussion. Giuseppe has also opened an issue with the runtime-spec to make this a formal part of the specification and get it into other OCI runtimes like runc, but it has not been merged yet.
Podman users are running into a problem accessing files and devices within a container, even when the users have access to those resources on the host. We looked at use cases where this problem is exposed and discussed some of the proposed patches to address the issue.