One of the questions most often put to people working on upstream container technologies is how to run Podman within a container. Historically, most of these questions were about Docker in Docker (DIND), but now people also want to run Podman in Podman (PINP) or Podman in Docker (PIND).
Podman can also run in two modes, rootful and rootless, so we end up with people wanting to run various combinations of the two:
- Rootful Podman in rootful Podman
- Rootless Podman in rootful Podman
- Rootful Podman in rootless Podman
- Rootless Podman in rootless Podman
You get the picture.
This blog will attempt to cover each combination, starting with a discussion of privileges. We'll start with the PINP scenario here in part one. In part two of the series, we'll cover similar ground but do so within the context of Kubernetes. Be sure to read both articles for a complete picture.
Container engines require privileges
In order to run a container engine like Podman within a container, the first thing you need to understand is that you need a fair amount of privilege.
- Containers require multiple UIDs. Most container images need more than one UID to work. For example, you might have an image with most of the files owned by root, but some owned by the apache user (UID=60).
- Container engines mount file systems and use the system call clone to create user namespaces.
Note: You might need a newer version of Podman. Examples in this blog were run with Podman 3.2.
Our test image
For the examples in this blog, we'll use the quay.io/podman/stable image, which was built with the idea of finding the best way to run Podman within a container. You can examine how we build this image from the Dockerfile and containers.conf files in the github.com repo.
# stable/Dockerfile
#
# Build a Podman container image from the latest
# stable version of Podman on the Fedora Updates System.
# https://bodhi.fedoraproject.org/updates/?search=podman
# This image can be used to create a secured container
# that runs safely with privileges within the container.
#
FROM registry.fedoraproject.org/fedora:latest

# Don't include container-selinux and remove
# directories used by yum that are just taking
# up space.
RUN dnf -y update; yum -y reinstall shadow-utils; \
    yum -y install podman fuse-overlayfs --exclude container-selinux; \
    rm -rf /var/cache /var/log/dnf* /var/log/yum.*

RUN useradd podman; \
    echo podman:10000:5000 > /etc/subuid; \
    echo podman:10000:5000 > /etc/subgid;

VOLUME /var/lib/containers
VOLUME /home/podman/.local/share/containers

ADD https://raw.githubusercontent.com/containers/libpod/master/contrib/podmanimage/stable/containers.conf /etc/containers/containers.conf
ADD https://raw.githubusercontent.com/containers/libpod/master/contrib/podmanimage/stable/podman-containers.conf /home/podman/.config/containers/containers.conf

RUN chown podman:podman -R /home/podman

# chmod containers.conf and adjust storage.conf to enable Fuse storage.
RUN chmod 644 /etc/containers/containers.conf; sed -i -e 's|^#mount_program|mount_program|g' -e '/additionalimage.*/a "/var/lib/shared",' -e 's|^mountopt[[:space:]]*=.*$|mountopt = "nodev,fsync=0"|g' /etc/containers/storage.conf

RUN mkdir -p /var/lib/shared/overlay-images /var/lib/shared/overlay-layers /var/lib/shared/vfs-images /var/lib/shared/vfs-layers; touch /var/lib/shared/overlay-images/images.lock; touch /var/lib/shared/overlay-layers/layers.lock; touch /var/lib/shared/vfs-images/images.lock; touch /var/lib/shared/vfs-layers/layers.lock

ENV _CONTAINERS_USERNS_CONFIGURED=""
Let’s examine the Dockerfile.
FROM registry.fedoraproject.org/fedora:latest

# Don't include container-selinux and remove
# directories used by yum that are just taking
# up space.
RUN dnf -y update; yum -y reinstall shadow-utils; \
    yum -y install podman fuse-overlayfs --exclude container-selinux; \
    rm -rf /var/cache /var/log/dnf* /var/log/yum.*
First, pull the latest Fedora image and update it to the latest packages. Note that the Dockerfile reinstalls shadow-utils, since there is a known issue in the shadow-utils install on the Fedora image where the file capabilities on newuidmap and newgidmap are not set. Reinstalling shadow-utils fixes the problem. Next, install Podman as well as fuse-overlayfs. We don't install container-selinux because it is not needed within the container.
RUN useradd podman; \
    echo podman:10000:5000 > /etc/subuid; \
    echo podman:10000:5000 > /etc/subgid;
Next, I create a user named podman and set up the /etc/subuid and /etc/subgid files to give it 5000 UIDs and GIDs, starting at 10000. These are used to set up the user namespace within the container. 5000 is an arbitrary number and potentially too small. We picked this number because it is smaller than the 65k allocated to rootless users. If you were only running the container as root, 65k would have been a better number.
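To make the arithmetic concrete, here is a small sketch that reads one entry in the name:start:count format used above and prints the range it grants. The function name subid_range is made up for this illustration and is not part of Podman or shadow-utils.

```shell
# Hypothetical helper: print the ID range granted by one
# /etc/subuid or /etc/subgid entry of the form name:start:count.
subid_range() {
    entry=$1
    name=${entry%%:*}       # text before the first colon
    rest=${entry#*:}        # "start:count"
    start=${rest%%:*}
    count=${rest#*:}
    last=$((start + count - 1))
    printf '%s: %s-%s (%s IDs)\n' "$name" "$start" "$last" "$count"
}

# The entry written by the Dockerfile.
subid_range "podman:10000:5000"
```

For the entry in this image, that works out to IDs 10000 through 14999.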
VOLUME /var/lib/containers
VOLUME /home/podman/.local/share/containers
Since we can run rootful and rootless containers with this image, we create two volumes. Rootful Podman uses /var/lib/containers for its container storage and rootless uses /home/podman/.local/share/containers. Overlay over overlay is often denied by the kernel, so this creates non-overlay volumes to be used within the container.
ADD https://raw.githubusercontent.com/containers/libpod/master/contrib/podmanimage/stable/containers.conf /etc/containers/containers.conf
ADD https://raw.githubusercontent.com/containers/libpod/master/contrib/podmanimage/stable/podman-containers.conf /home/podman/.config/containers/containers.conf
I have pre-configured two containers.conf files so that containers run more easily in each mode.
The image is set up to run with fuse-overlayfs by default. In certain cases, you could run the kernel's overlay file system for rootful mode, and you'll soon be able to do this in rootless mode. However, for now, we use fuse-overlayfs as our container storage within the container. Other people have used the VFS storage driver, but it is less efficient.
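The switch to fuse-overlayfs is made by the sed command near the end of the Dockerfile. As a rough sketch of what its three expressions do, the snippet below applies them to a made-up storage.conf fragment. The fragment is an assumption for illustration only, not the real file shipped in Fedora, and the one-line form of the a command assumes GNU sed.

```shell
# The three sed expressions from the Dockerfile, applied to stdin
# instead of editing /etc/containers/storage.conf in place.
patch_storage_conf() {
    sed -e 's|^#mount_program|mount_program|g' \
        -e '/additionalimage.*/a "/var/lib/shared",' \
        -e 's|^mountopt[[:space:]]*=.*$|mountopt = "nodev,fsync=0"|g'
}

# Made-up fragment, for illustration only.
sample='[storage.options]
additionalimagestores = [
]
[storage.options.overlay]
#mount_program = "/usr/bin/fuse-overlayfs"
mountopt = "nodev,metacopy=on"'

patched=$(printf '%s\n' "$sample" | patch_storage_conf)
printf '%s\n' "$patched"
```

In this fragment, the first expression uncomments mount_program, the second adds /var/lib/shared as an additional image store, and the third replaces the mount options.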
The --privileged flag
The easiest way to run Podman inside of a container is to use the --privileged flag.
Rootful Podman in rootful Podman with --privileged
# podman run --privileged quay.io/podman/stable podman run ubi8 echo hello
Resolved "ubi8" as an alias (/etc/containers/registries.conf.d/shortnames.conf)
Trying to pull registry.access.redhat.com/ubi8:latest...
Getting image source signatures
Copying blob sha256:a591faa84ab05242a17131e396a336da172b0e1ec66d921c9f130b7c4c24586d
Copying blob sha256:76b9354adec626b01ffb0faae4a217cebd616661fd90c4b54ba4415f53392fb8
Copying config sha256:dc080723f596f2407300cca2c19a17accad89edcf39f7b8b33e6472dd41e30f1
Writing manifest to image destination
Storing signatures
hello
To save time, since I will be doing a lot of experiments, I created a directory on my host, ./mycontainers, which I volume mount into the container so that the image does not have to be pulled each time.
# podman run --privileged -v ./mycontainers:/var/lib/containers quay.io/podman/stable podman run ubi8 echo hello
hello
Rootless Podman in rootful Podman with --privileged
The quay.io/podman/stable image is set up with a podman user that you can use to run rootless containers.
# podman run --user podman --privileged quay.io/podman/stable podman run ubi8 echo hello
Resolved "ubi8" as an alias (/etc/containers/registries.conf.d/shortnames.conf)
...
hello
Note in this case, the Podman running inside the container is running as the user podman. This is because the containerized Podman uses the user namespace to create a confined container within the privileged container.
Rootful Podman in Docker with --privileged
Just as with rootful Podman in Podman, you can run rootful Podman within Docker with the --privileged flag.
# docker run --privileged quay.io/podman/stable podman run ubi8 echo hello
Rootless Podman with Docker
# docker run --user podman --privileged quay.io/podman/stable podman run ubi8 echo hello
Resolved "ubi8" as an alias (/etc/containers/registries.conf.d/shortnames.conf)
...
hello
Can we do this more securely?
Notice that even though we ran the outer containers with --privileged above, the inner containers are running in locked-down mode. The rootless Podman running within the container is really locked down and would have a very difficult time escaping. Even so, I am not a fan of using the --privileged flag. I believe we can do better from a security perspective.
Running without the --privileged flag
Let's look at how we can remove the --privileged flag for better security.
Rootful Podman in rootful Podman without --privileged
# podman run --cap-add=sys_admin,mknod --device=/dev/fuse --security-opt label=disable quay.io/podman/stable podman run ubi8-minimal echo hello
hello
We can eliminate the --privileged flag from rootful Podman, but we still have to disable some security features to make rootful Podman work within the container.
- Capabilities: --cap-add=sys_admin,mknod adds the two Linux capabilities we need.
  - CAP_SYS_ADMIN is required for the Podman running as root inside of the container to mount the required file systems.
  - CAP_MKNOD is required for Podman running as root inside of the container to create the devices in /dev. (Note that Docker allows this by default.)
- Devices: The --device /dev/fuse flag is needed to use fuse-overlayfs inside the container. This option tells Podman on the host to add /dev/fuse to the container so that containerized Podman can use it.
- Disable SELinux: The --security-opt label=disable option tells the host's Podman to disable SELinux separation for the container. SELinux does not allow containerized processes to mount all of the file systems required to run inside a container.
Rootful Podman in Docker without --privileged
# docker run --cap-add=sys_admin --cap-add mknod --device=/dev/fuse --security-opt seccomp=unconfined --security-opt label=disable quay.io/podman/stable podman run ubi8-minimal echo hello
hello
- Note that Docker does not support the comma-separated --cap-add syntax, so I had to add sys_admin and mknod separately.
- We still need --device /dev/fuse, since the container defaults to fuse-overlayfs.
- Docker always creates built-in volumes as owned by root:root, so we need to create a volume to mount for Podman in the container to use for storage.
- As always, I need to disable SELinux separation.
- We also need to disable seccomp, since Docker has a slightly stricter seccomp policy than Podman. You could instead apply Podman's seccomp policy by using --security-opt seccomp=/usr/share/containers/seccomp.json:
# docker run --cap-add=sys_admin --cap-add mknod --device=/dev/fuse --security-opt seccomp=/usr/share/containers/seccomp.json --security-opt label=disable quay.io/podman/stable podman run ubi8-minimal echo hello
hello
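If you keep your capability list in the comma-separated form used with Podman, converting it to the repeated flags used in the Docker commands above is mechanical. The sketch below shows one way to do it; split_cap_add is a hypothetical helper name, and it only prints the flags rather than running Docker.

```shell
# Hypothetical helper: turn "sys_admin,mknod" into
# "--cap-add sys_admin --cap-add mknod".
# The body runs in a subshell so the IFS change does not leak out.
split_cap_add() (
    IFS=,
    out=""
    for cap in $1; do
        out="$out --cap-add $cap"
    done
    printf '%s\n' "${out# }"   # drop the leading space
)

split_cap_add "sys_admin,mknod"
```

You could then splice the result into a docker run command line with command substitution.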
Rootless Podman in rootful Podman without --privileged
Run a non-privileged container with Podman inside as a non-root user, relying on the user namespace.
# podman run --user podman --security-opt label=disable --security-opt unmask=ALL --device /dev/fuse -ti quay.io/podman/stable podman run -ti docker.io/busybox echo hello
hello
- Note that, unlike the rootful within rootful case before, we don't have to add the dangerous security capabilities sys_admin and mknod.
- In this case, I am running with --user podman, which automatically causes the Podman within the container to run within the user namespace.
- We are still disabling SELinux, since it blocks the mounting.
- We still need --device /dev/fuse to use fuse-overlayfs within the container.
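To compare the two non-privileged Podman in Podman cases side by side, the flag sets from the commands above can be collected in a small helper. This is only a sketch: inner_podman_flags is a made-up name, the flags are copied from the examples in this blog, and the loop just prints the resulting command lines instead of running them.

```shell
# Hypothetical helper: print the outer "podman run" flags used in this
# blog for running Podman inside a container without --privileged.
inner_podman_flags() {
    case "$1" in
        rootful)
            printf '%s\n' "--cap-add=sys_admin,mknod --device=/dev/fuse --security-opt label=disable"
            ;;
        rootless)
            printf '%s\n' "--user podman --security-opt label=disable --security-opt unmask=ALL --device /dev/fuse"
            ;;
        *)
            printf 'unknown mode: %s\n' "$1" >&2
            return 1
            ;;
    esac
}

# Print (do not execute) the command line for each mode.
for mode in rootful rootless; do
    echo "podman run $(inner_podman_flags "$mode") quay.io/podman/stable podman run ubi8-minimal echo hello"
done
```

Seen this way, the rootless variant trades the two added capabilities for --user podman and --security-opt unmask=ALL.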
Podman-remote in rootful Podman with a leaked Podman socket from the host
# podman run -v /run:/run --security-opt label=disable quay.io/podman/stable podman --remote run busybox echo hi
hi
In this case, we are leaking the /run directory from the host into the container. This allows podman --remote to communicate with the Podman socket on the host and start the container on the host OS. This is often how people execute Docker in Docker, especially Docker builds. You could also execute Podman builds this way and take advantage of images previously pulled to the system.
Note, however, this is extremely insecure. The processes within the container can totally take over the host machine.
- You still need to disable SELinux separation because SELinux would block the container processes from using sockets leaked in /run.
- The podman --remote flag is added to tell Podman to work in remote mode. Note that you could also just install the podman-remote executable into a container and use that.
Podman-remote in Docker with a leaked Podman socket from the host
# docker run -v /run:/run --security-opt label=disable quay.io/podman/stable podman --remote run busybox echo hi
hi
The same example works for a Docker container.
The next example shows a fully locked-down container (other than SELinux being disabled) with only the Podman socket directory leaked into the container. With SELinux enabled, this access would be blocked, as it should be.
# /bin/podman run --security-opt=label=disable -v /run/podman:/run/podman quay.io/podman/stable podman --remote run alpine echo hi
hi
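Before relying on podman --remote, it can help to confirm that a socket is actually visible at the mounted path. The sketch below assumes the rootful socket lives at /run/podman/podman.sock, which is where the podman.socket systemd unit normally places it; adjust the path if your host differs. The helper name podman_socket_present is made up for this example.

```shell
# Hypothetical helper: succeed only if the given path is a Unix socket.
podman_socket_present() {
    [ -S "$1" ]
}

# Assumed default location of the rootful Podman socket.
sock=/run/podman/podman.sock

if podman_socket_present "$sock"; then
    echo "socket found at $sock"
else
    echo "no socket at $sock; is the Podman socket enabled on the host?"
fi
```

The same check works on the host and inside the container, since the directory is mounted at the same path in both.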
Rootless Podman with containerized rootful Podman
$ podman run --privileged quay.io/podman/stable podman run ubi8 echo hello
Resolved "ubi8" as an alias (/etc/containers/registries.conf.d/shortnames.conf)
..
hello
Rootless Podman running rootless Podman
$ podman run --security-opt label=disable --user podman --device /dev/fuse quay.io/podman/stable podman run alpine echo hello
Now you have some context for Podman in Podman options, using both rootful and rootless modes in various combinations. You also have a better sense of the necessary privileges and the considerations surrounding the --privileged flag.
Part two in this series looks at the use of Podman and Kubernetes. The article covers similar territory but within the context of Kubernetes.