Rootless Podman and NFS
A lot of people are interested in rootless Podman. This tool lets you build, install, and play with containers without running as root or running a big root-owned daemon on your system. Instead, Podman (by default) stores container images in the user's home directory. Because most container images contain files owned by more than one UID, Podman relies on user namespaces to make this work. I have explained how this works in previous articles.
One thing that does not work, however, is storing these images in an NFS-based home directory.
Why doesn’t Podman support storage on NFS?
First let me say that, for most use cases, rootless Podman works fine with an NFS volume. The use case that does not work well is having the container image store reside on an NFS mount point.
The problem is easiest to understand by examining what happens when a user, running rootless Podman, pulls an image or installs a tarball or RPM package inside a container.
For our examples, I will use the user myuser with a UID of 1000, and a UID map set up in /etc/subuid that looks like this:
myuser:100000:65536
Inside rootless Podman's user namespace, the resulting map looks like this:
$ podman unshare cat /proc/self/uid_map
0 1000 1
1 100000 65536
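The map follows directly from the /etc/subuid entry: container UID 0 maps to the user's own UID, and container UIDs starting at 1 map onto the subordinate range. The following is a minimal sketch of that relationship, assuming the default rootless Podman layout; expected_uid_map is a hypothetical helper, not part of Podman, and the real /proc/self/uid_map pads its columns with extra spaces.

```shell
# Hypothetical helper (not part of Podman): print the uid_map that rootless
# Podman is expected to create for a user, given the user's UID and the
# user's /etc/subuid entry in name:start:count form.
expected_uid_map() (
    user_uid=$1      # e.g. 1000
    subuid_entry=$2  # e.g. "myuser:100000:65536"
    start=$(printf '%s\n' "$subuid_entry" | cut -d: -f2)
    count=$(printf '%s\n' "$subuid_entry" | cut -d: -f3)
    # Container UID 0 (root) maps to the user's own UID, a range of one...
    printf '%s %s %s\n' 0 "$user_uid" 1
    # ...and container UIDs from 1 upward map onto the subordinate range.
    printf '%s %s %s\n' 1 "$start" "$count"
)

expected_uid_map 1000 "myuser:100000:65536"
# 0 1000 1
# 1 100000 65536
```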
Now, inside the container, I want to install the httpd package. The httpd package installs files owned both by root and by the apache user, which has a UID of 60 in this example. This means that when the container's root process installs httpd, with the container storage living in your home directory, it attempts to run something like this:
$ chown 60:60 /var/www/html/index.html
When this happens on a local filesystem, the kernel checks two things. First, it checks whether UID 60 and GID 60 are mapped inside the user namespace. Second, it checks whether the process doing the chown has the CAP_CHOWN capability within that user namespace.
The process needs that capability because it is not running as UID 60, and giving a file to another user requires privilege. When the process runs as root inside the container, it is actually UID 1000 on the host; when it runs as UID 60 inside the container, it is actually UID 100059 on the host. Note that I'm only talking about capabilities within the user namespace (such as CAP_CHOWN and CAP_DAC_OVERRIDE), which let the process inside the container override ownership and permission checks only for UIDs and GIDs that are mapped into that user namespace, that is, into the container.
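The UID arithmetic above can be checked with a small sketch. container_to_host_uid is a hypothetical helper (not a Podman command) that reads uid_map lines on standard input and translates a UID inside the user namespace into the host UID that the kernel, and any NFS server, actually sees. It fails when the UID is not mapped, which corresponds to the first check the kernel makes.

```shell
# Hypothetical helper (not a Podman command): translate a UID inside the
# user namespace into the host UID, using uid_map lines read from stdin.
# Each line is "inside-start outside-start count". Exits non-zero when the
# UID is not mapped into the namespace.
container_to_host_uid() {
    awk -v uid="$1" '
        BEGIN { u = uid + 0 }
        # This line covers u when inside-start <= u < inside-start + count.
        u >= $1 && u < $1 + $3 { print $2 + (u - $1); found = 1; exit }
        END { exit !found }
    '
}

uid_map='0 1000 1
1 100000 65536'
printf '%s\n' "$uid_map" | container_to_host_uid 0    # prints 1000
printf '%s\n' "$uid_map" | container_to_host_uid 60   # prints 100059
```

On a live system, you could feed it the real map instead:
$ podman unshare cat /proc/self/uid_map | container_to_host_uid 60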
On a local filesystem, these checks always work because the local kernel makes every decision. With NFS, you have to satisfy the local kernel as well as the remote kernel, and the remote kernel enforces its own rules.
Look at this issue from the NFS server's point of view. The server's kernel sees a process running as UID 1000 (root in the container) trying to chown a file owned by UID 1000 to UID 100059 (UID 60 inside the container). The server denies this request.
The NFS protocol has no concept of user namespaces, so the server has no way to know that the process running as UID 1000 is in one. Nor can it know that the client process holds the relevant capability in that user namespace, or that UID 100059 is mapped into the same namespace. In other words, the NFS server simply does not have the information it would need to allow the operation.
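To see why the server says no, here is a deliberately simplified, hypothetical model of the ownership check an NFS server makes. It assumes AUTH_SYS-style credentials, where the client sends only numeric host UIDs, and it ignores real-world details such as root_squash and ACLs; nfs_server_allows_chown is an illustrative name, not actual server code.

```shell
# Simplified, hypothetical model of an NFS server's ownership check; not
# real server code. With AUTH_SYS credentials the server sees only the
# client's host UIDs, so it cannot know that UID 1000 is "root" inside a
# user namespace on the client. root_squash and ACLs are ignored here.
nfs_server_allows_chown() (
    caller_uid=$1  # host UID sent by the client
    file_owner=$2  # current owner of the file on the server
    new_owner=$3   # owner the client asked for
    # The owner may "change" ownership to itself without privilege.
    if [ "$caller_uid" -eq "$file_owner" ] && [ "$new_owner" -eq "$file_owner" ]; then
        exit 0
    fi
    # Any real change of owner requires privilege, modeled here as host UID 0.
    [ "$caller_uid" -eq 0 ]
)

# Root in the container (host UID 1000) chowning to container UID 60 (host 100059):
if nfs_server_allows_chown 1000 1000 100059; then echo allowed; else echo denied; fi
# denied
```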
Now, if a normal process creates files on an NFS share without relying on user-namespaced capabilities, everything works fine. The problem comes in when the root process inside the container needs to do something on the NFS share that requires a capability. In that case, the remote kernel does not know about the capability and will most likely deny access.
How can I make NFS work with rootless Podman?
There are a couple of ways to set up a user's home directory on an NFS share so that rootless Podman works. One is to set the graphroot option in the ~/.config/containers/storage.conf file so that image storage points at a directory that is not on the NFS share. For example, change:
[storage]
driver = "overlay"
runroot = "/run/user/1000"
graphroot = "/home/myuser/.local/share/containers/storage"
to:
[storage]
driver = "overlay"
runroot = "/run/user/1000"
graphroot = "/var/tmp/myuser/containers/storage"
With this change, the images Podman pulls or builds, along with the containers' storage, are kept under /var/tmp/myuser, outside the NFS-mounted home directory.
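If you prefer to script this change, the sketch below writes a per-user storage.conf with the graphroot moved off the NFS share. write_storage_conf is a hypothetical helper; it assumes the overlay driver and a runroot of /run/user/ followed by your UID, as in the example above, and it overwrites any existing storage.conf in the target directory.

```shell
# Hypothetical helper (not part of Podman): write a per-user storage.conf
# whose graphroot lives outside the NFS-mounted home directory.
# Usage: write_storage_conf CONFIG_DIR GRAPHROOT
write_storage_conf() (
    config_dir=$1
    graphroot=$2
    # Create the config directory and the local graph root as the calling user.
    mkdir -p "$config_dir" "$graphroot" || exit 1
    # Note: this overwrites any existing storage.conf in CONFIG_DIR.
    cat > "$config_dir/storage.conf" <<EOF
[storage]
driver = "overlay"
runroot = "/run/user/$(id -u)"
graphroot = "$graphroot"
EOF
)
```

For example:
$ write_storage_conf "$HOME/.config/containers" "/var/tmp/$USER/containers/storage"
$ podman info --format '{{.Store.GraphRoot}}'
The second command should then report the new location.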
Another option would be to create a disk image and mount it onto the ~/.local/share/containers directory. You might use a script like this:
truncate -s 10g /home/myuser/xfs.img
mkfs.xfs -m reflink=1 /home/myuser/xfs.img
Then, on each machine that mounts the home directories, an administrator could add an fstab entry for the image, or mount it by hand with a command like this:
$ sudo mount -o loop /home/myuser/xfs.img /home/myuser/.local/share/containers
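Because the mount has to be in place on every machine, a persistent /etc/fstab entry is usually more convenient than running mount by hand. The following is a minimal sketch, assuming the image and mount point layout above; add_container_fstab_entry is a hypothetical helper that appends a loop-mount line only if it is not already present, and it takes an optional fstab path so you can try it on a scratch file first.

```shell
# Hypothetical helper: append an fstab line that loop-mounts a user's XFS
# image over their container storage directory, unless it is already there.
# Usage: add_container_fstab_entry HOME_DIR [FSTAB_PATH]
add_container_fstab_entry() (
    home=$1                 # e.g. /home/myuser
    fstab=${2:-/etc/fstab}  # defaults to the real fstab; pass a scratch file to test
    # Fields: source, mount point, filesystem type, options, dump, fsck pass.
    # The "loop" option attaches the image file to a loop device at mount time.
    entry="$home/xfs.img $home/.local/share/containers xfs loop 0 0"
    # Append only if the exact line is not already present (idempotent).
    grep -qxF "$entry" "$fstab" 2>/dev/null || printf '%s\n' "$entry" >> "$fstab"
)
```

As root, you might run add_container_fstab_entry /home/myuser and then mount /home/myuser/.local/share/containers, which mounts the image using the new fstab line (the mount point directory must already exist). Afterwards, check that the mounted directory is owned by myuser; if it is not, chown it once as root.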
Rootless and rootful Podman work great with remote network shares mounted as volumes, including NFS shares. However, out of the box, rootless Podman does not work well with NFS home directories, because the NFS protocol does not understand user namespaces. Luckily, with minor configuration changes, you can use rootless Podman on an NFS home directory.