Podman in Kubernetes/OpenShift

In part one, the focus was on Podman in Podman scenarios. We saw some of the different rootful and rootless Podman combinations. We also discussed the ramifications of the --privileged flag.

But what about Podman and Kubernetes? There are plenty of options available for relating these two services, as well.

For part two of the series, I am using a Kubernetes cluster running with CRI-O as the runtime.

Rootful Podman with the privileged flag set

Here we're running a privileged container with the root user so that Podman will run as root inside the container.

Here is the YAML file, rootful-priv.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: podman-priv
spec:
  containers:
    - name: priv
      image: quay.io/podman/stable
      args:
        - sleep
        - "1000000"
      securityContext:
        privileged: true
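Before we can exec into the pod, the manifest has to be applied and the pod has to become ready. The following is a minimal sketch of that step and not part of the original walkthrough: deploy_pod and the KUBECTL override variable are hypothetical names used for illustration, and the 120-second timeout is an arbitrary assumption.

```shell
# deploy_pod MANIFEST POD_NAME
# Apply a pod manifest, then block until the pod reports Ready.
# KUBECTL is a hypothetical override (for example KUBECTL=echo for a dry run);
# it defaults to the real kubectl binary.
deploy_pod() {
    ${KUBECTL:-kubectl} apply -f "$1" &&
        ${KUBECTL:-kubectl} wait --for=condition=Ready "pod/$2" --timeout=120s
}
```

For the pod above, this would be called as deploy_pod rootful-priv.yaml podman-priv. The same helper should work for the other manifests in this article, given the matching pod name.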

➜ kubectl exec -it podman-priv -- sh
sh-5.0# id
uid=0(root) gid=0(root) groups=0(root)
sh-5.0# podman run ubi8 echo hello
Resolved "ubi8" as an alias (/etc/containers/registries.conf.d/shortnames.conf)
Trying to pull registry.access.redhat.com/ubi8:latest...
Getting image source signatures
Copying blob fdb393d8227c done
Copying blob 6b536614e8f8 done
Copying config 4199acc83c done
Writing manifest to image destination
Storing signatures
hello

We can also successfully build images inside the privileged container with rootful Podman. Let's build an image where we install BusyBox on Fedora.

sh-5.0# cat Containerfile
FROM fedora
RUN dnf install -y busybox
ENV foo=bar
sh-5.0# podman build -t myimage -f Containerfile .
STEP 1: FROM fedora
STEP 2: RUN dnf install -y busybox
Fedora 33 openh264 (From Cisco) - x86_64        3.0 kB/s | 2.5 kB     00:00
Fedora Modular 33 - x86_64                      1.4 MB/s | 3.3 MB     00:02
Fedora Modular 33 - x86_64 - Updates            1.3 MB/s | 3.1 MB     00:02
Fedora 33 - x86_64 - Updates                    1.6 MB/s |  27 MB     00:16
Fedora 33 - x86_64                              3.6 MB/s |  72 MB     00:19
Dependencies resolved.
...
Running transaction
  Preparing        :                                   1/1
  Installing       : busybox-1:1.32.1-1.fc33.x86_64    1/1
  Running scriptlet: busybox-1:1.32.1-1.fc33.x86_64    1/1
  Verifying        : busybox-1:1.32.1-1.fc33.x86_64    1/1
Installed:
  busybox-1:1.32.1-1.fc33.x86_64
Complete!
--> 734a45854d1
STEP 3: ENV foo=bar
STEP 4: COMMIT myimage
--> 2326e34ac82
2326e34ac82173c849e0282b6644de5326f6b5bfba8431cf1c1115d846e440e9
sh-5.0# podman images
REPOSITORY                         TAG     IMAGE ID      CREATED         SIZE
localhost/myimage                  latest  2326e34ac821  48 seconds ago  427 MB
registry.fedoraproject.org/fedora  latest  9f2a56037643  3 months ago    182 MB
sh-5.0# podman run myimage busybox
BusyBox v1.32.1 (2021-03-22 18:56:41 UTC) multi-call binary.
BusyBox is copyrighted by many authors between 1998-2015.
Licensed under GPLv2. See source distribution for detailed
copyright notices.
Usage: busybox [function [arguments]...]
   or: busybox --list[-full]
   or: busybox --show SCRIPT
   or: busybox --install [-s] [DIR]
   or: function [arguments]...
...

Rootless Podman with the privileged flag set

Here we're running a privileged container with the podman(1000) user so that Podman runs as user 1000 inside the container.

Here is the YAML file, rootless-priv.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: podman-rootless
spec:
  containers:
    - name: rootless
      image: quay.io/podman/stable
      args:
        - sleep
        - "1000000"
      securityContext:
        privileged: true
        runAsUser: 1000

➜ kubectl exec -it podman-rootless -- sh
sh-5.0$ id
uid=1000(podman) gid=1000(podman) groups=1000(podman)
sh-5.0$ podman run ubi8 echo hello
Resolved "ubi8" as an alias (/etc/containers/registries.conf.d/shortnames.conf)
Trying to pull registry.access.redhat.com/ubi8:latest...
Getting image source signatures
Copying blob 6b536614e8f8 done
Copying blob fdb393d8227c done
Copying config 4199acc83c done
Writing manifest to image destination
Storing signatures
hello

We can also successfully build images inside the privileged container with rootless Podman. Let's build an image where we install BusyBox on Fedora.

sh-5.0$ cat Containerfile
FROM fedora
RUN dnf install -y busybox
ENV foo=bar
sh-5.0$ podman build -t myimage -f Containerfile .
STEP 1: FROM fedora
Resolved "fedora" as an alias (/etc/containers/registries.conf.d/000-shortnames.conf)
Getting image source signatures
Copying blob 157ab8011454 done
Copying config 9f2a560376 done
Writing manifest to image destination
Storing signatures
STEP 2: RUN dnf install -y busybox
Fedora 33 openh264 (From Cisco) - x86_64        4.8 kB/s | 2.5 kB     00:00
Fedora Modular 33 - x86_64                      462 kB/s | 3.3 MB     00:07
Fedora Modular 33 - x86_64 - Updates            520 kB/s | 3.1 MB     00:06
Fedora 33 - x86_64 - Updates                    7.5 MB/s |  27 MB     00:03
Fedora 33 - x86_64                              522 kB/s |  72 MB     02:20
Dependencies resolved.
...
Installed:
  busybox-1:1.32.1-1.fc33.x86_64
Complete!
--> 92087429448
STEP 3: ENV foo=bar
STEP 4: COMMIT myimage
--> 16dd65e3f57
16dd65e3f57a5808035b713a6ba3267146caf2a03dd4205097a5727f9d326de9
sh-5.0$ podman images
REPOSITORY                         TAG     IMAGE ID      CREATED             SIZE
localhost/myimage                  latest  16dd65e3f57a  About a minute ago  427 MB
registry.fedoraproject.org/fedora  latest  9f2a56037643  3 months ago        182 MB
sh-5.0$ podman run myimage busybox
BusyBox v1.32.1 (2021-03-22 18:56:41 UTC) multi-call binary.
BusyBox is copyrighted by many authors between 1998-2015.
Licensed under GPLv2. See source distribution for detailed
copyright notices.
Usage: busybox [function [arguments]...]
   or: busybox --list[-full]
   or: busybox --show SCRIPT
   or: busybox --install [-s] [DIR]
   or: function [arguments]...
...

Rootless Podman without the privileged flag

To eliminate the privileged flag, we need to do the following:

Devices: /dev/fuse is required to use fuse-overlayfs inside of the container. The device has to be added to the container so that containerized Podman can use it.

Disable SELinux: SELinux does not allow containerized processes to mount all of the file systems required to run inside a container, so we need to disable SELinux on the host that is running the Kubernetes cluster.

To be able to mount a device in Kubernetes, you first have to create a Device Plugin and then use that in the pod spec.

Here is an example of a Device Plugin for /dev/fuse: https://github.com/kuberenetes-learning-group/fuse-device-plugin/blob/main/fuse-device-plugin-k8s-1.16.yml

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fuse-device-plugin-daemonset
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: fuse-device-plugin-ds
  template:
    metadata:
      labels:
        name: fuse-device-plugin-ds
    spec:
      hostNetwork: true
      containers:
        - image: soolaugust/fuse-device-plugin:v1.0
          name: fuse-device-plugin-ctr
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop: ["ALL"]
          volumeMounts:
            - name: device-plugin
              mountPath: /var/lib/kubelet/device-plugins
      volumes:
        - name: device-plugin
          hostPath:
            path: /var/lib/kubelet/device-plugins
      imagePullSecrets:
        - name: registry-secret

Here is the YAML file, rootless-no-priv.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: no-priv
spec:
  containers:
    - name: no-priv
      image: quay.io/podman/stable
      args:
        - sleep
        - "1000000"
      securityContext:
        runAsUser: 1000
      resources:
        limits:
          github.com/fuse: 1
      volumeMounts:
        - mountPath: /home/podman/.local/share/containers
          name: podman-local
  volumes:
    - name: podman-local
      hostPath:
        path: /home/umohnani/.local/share/containers

✗ kubectl exec -it no-priv -- sh
sh-5.0$ id
uid=1000(podman) gid=1000(podman) groups=1000(podman)
sh-5.0$ podman run ubi8 echo hello
Resolved "ubi8" as an alias (/etc/containers/registries.conf.d/000-shortnames.conf)
Trying to pull registry.access.redhat.com/ubi8:latest...
Getting image source signatures
Copying blob 55eda7743468 done
Copying blob 4b21dcdd136d done
Copying config 613e5da7a9 done
Writing manifest to image destination
Storing signatures
hello
sh-5.1$ cat containerfile
FROM ubi8
RUN echo "hello"
ENV foo=bar
sh-5.1$ podman build --isolation chroot -t myimage -f containerfile .
STEP 1: FROM ubi8
STEP 2: RUN echo "hello"
hello
--> 096250be78f
STEP 3: ENV foo=bar
STEP 4: COMMIT myimage
--> ea849ac9875
ea849ac9875eb926d743362bce2e32e90d34fda7a88f28ebd6a1a546db99338f
sh-5.1$ podman images
REPOSITORY                       TAG     IMAGE ID      CREATED         SIZE
localhost/myimage                latest  ea849ac9875e  41 seconds ago  245 MB
registry.access.redhat.com/ubi8  latest  0724f7c987a7  3 weeks ago     245 MB

Rootful Podman without the privileged flag

Create your device plugin as shown above.

You'll need to add the following capabilities for this:

CAP_SYS_ADMIN is required for the Podman running as root inside of the container to mount the required file systems.

CAP_MKNOD is required for Podman running as root inside of the container to create the devices in /dev. (Note that Docker allows this by default.)

CAP_SYS_CHROOT and CAP_SETFCAP are required because they are part of the default list of capabilities in Podman. When you run a Podman command, it adds the capabilities it needs, so Podman fails if you run your Kubernetes pod without them.
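To confirm that a running pod actually received these four capabilities, we can decode the effective capability mask that the kernel reports in the CapEff line of /proc/PID/status. The following is a minimal sketch, assuming the bit numbers from linux/capability.h (SYS_CHROOT 18, SYS_ADMIN 21, MKNOD 27, SETFCAP 31). The function names has_cap and report_caps are hypothetical helpers, not part of Podman or Kubernetes.

```shell
# Capability bit numbers as defined in <linux/capability.h>.
CAP_SYS_CHROOT=18
CAP_SYS_ADMIN=21
CAP_MKNOD=27
CAP_SETFCAP=31

# has_cap HEXMASK BIT
# Print "yes" if bit BIT is set in the hexadecimal mask, otherwise "no".
has_cap() {
    if [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]; then
        echo yes
    else
        echo no
    fi
}

# report_caps HEXMASK
# Print one line per capability discussed above, with yes/no for each.
report_caps() {
    for pair in SYS_ADMIN:$CAP_SYS_ADMIN MKNOD:$CAP_MKNOD \
                SYS_CHROOT:$CAP_SYS_CHROOT SETFCAP:$CAP_SETFCAP; do
        echo "${pair%%:*} $(has_cap "$1" "${pair##*:}")"
    done
}
```

Inside the pod, the mask of the current shell can be passed in with report_caps "$(awk '/^CapEff/ {print $2}' /proc/$$/status)". Any capability reported as "no" would need to be added to the securityContext of the pod spec.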

Here is the YAML file, rootful-no-priv.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: no-priv-rootful
spec:
  containers:
    - name: no-priv-rootful
      image: quay.io/podman/stable
      args:
        - sleep
        - "1000000"
      securityContext:
        capabilities:
          add:
            - "SYS_ADMIN"
            - "MKNOD"
            - "SYS_CHROOT"
            - "SETFCAP"
      resources:
        limits:
          github.com/fuse: 1

✗ kubectl exec -it no-priv-rootful -- sh
sh-5.0# id
uid=0(root) gid=0(root) groups=0(root)
sh-5.0# podman run ubi8 echo hello
Resolved "ubi8" as an alias (/etc/containers/registries.conf.d/000-shortnames.conf)
Trying to pull registry.access.redhat.com/ubi8:latest...
Getting image source signatures
Copying blob 55eda7743468 done
Copying blob 4b21dcdd136d done
Copying config 613e5da7a9 done
Writing manifest to image destination
Storing signatures
hello

Podman-remote in a Kubernetes pod with the Podman socket running on the host

You need to do the following to set up for this use case:

Disable SELinux on the host.

Follow this article to enable the Podman socket on your host.

Here is the YAML file, remote.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: podman-remote
spec:
  containers:
    - name: remote
      image: quay.io/podman/stable
      args:
        - sleep
        - "1000000"
      volumeMounts:
        - mountPath: /var/run/podman
          name: podman-sock
  volumes:
    - name: podman-sock
      hostPath:
        path: /var/run/podman

We're leaking the Podman socket that is running on the host into the pod by creating a volume mount for it.
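To make the connection between the mounted directory and podman --remote explicit, here is a minimal sketch. It assumes the host socket was enabled through the podman.socket systemd unit and that the socket file inside the mounted directory is named podman.sock. The helper names socket_url and remote_hello are hypothetical, and CONTAINER_HOST is the environment variable the remote client reads for its connection URL.

```shell
# On the host (not in the pod), the system socket is typically enabled with:
#   sudo systemctl enable --now podman.socket
# which is assumed here to listen on /run/podman/podman.sock.

# socket_url DIR
# Build the unix:// URL for a Podman socket located in DIR.
# A trailing slash on DIR is dropped so the URL has no double slash.
socket_url() {
    echo "unix://${1%/}/podman.sock"
}

# remote_hello DIR
# Run a test container through the Podman service behind the socket in DIR.
remote_hello() {
    CONTAINER_HOST=$(socket_url "$1") podman --remote run ubi8 echo hello
}
```

Inside the pod defined above, remote_hello /var/run/podman would point the client at the mounted socket explicitly instead of relying on the client's default location.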

✗ kubectl exec -it podman-remote -- sh
sh-5.0# id
uid=0(root) gid=0(root) groups=0(root)
sh-5.0# podman --remote run ubi8 echo hello
Resolved "ubi8" as an alias (/etc/containers/registries.conf.d/000-shortnames.conf)
Trying to pull registry.access.redhat.com/ubi8:latest...
Getting image source signatures
Copying blob sha256:55eda774346862e410811e3fa91cefe805bc11ff46fad425dd1b712709c05bbc
Copying blob sha256:4b21dcdd136d133a4df0840e656af2f488c226dd384a98b89ced79064a4081b4
Copying config sha256:613e5da7a934e1963e37ed935917e8be6b8dfd90cac73a724ddc224fbf16da20
Writing manifest to image destination
Storing signatures
hello

Builds with the Podman socket leaked into the container:

sh-5.0# cat /home/podman/Containerfile
FROM fedora
RUN dnf install -y busybox
ENV foo=bar
sh-5.0# podman --remote build -t myimage -f Containerfile .
STEP 1: FROM fedora
STEP 2: RUN dnf install -y busybox
Fedora 33 openh264 (From Cisco) - x86_64        4.7 kB/s | 2.5 kB     00:00
Fedora Modular 33 - x86_64                      1.8 MB/s | 3.3 MB     00:01
Fedora Modular 33 - x86_64 - Updates            5.2 MB/s | 3.1 MB     00:00
Fedora 33 - x86_64 - Updates                    4.3 MB/s |  27 MB     00:06
Fedora 33 - x86_64                              1.0 MB/s |  72 MB     01:13
Dependencies resolved.
...
Installed:
  busybox-1:1.32.1-1.fc33.x86_64
Complete!
--> 6ef78b975e1
STEP 3: ENV foo=bar
STEP 4: COMMIT myimage
--> 481c5a0e453
481c5a0e4534573a3872f7cc1ff6806a3ce143edce2ed39568d23efe6f65a292
sh-5.0# podman --remote images
REPOSITORY                         TAG     IMAGE ID      CREATED        SIZE
localhost/myimage                  latest  481c5a0e4534  2 minutes ago  427 MB
registry.fedoraproject.org/fedora  latest  9f2a56037643  3 months ago   182 MB
sh-5.0# podman --remote run myimage busybox
BusyBox v1.32.1 (2021-03-22 18:56:41 UTC) multi-call binary.
BusyBox is copyrighted by many authors between 1998-2015.
Licensed under GPLv2. See source distribution for detailed
copyright notices.
Usage: busybox [function [arguments]...]
   or: busybox --list[-full]
   or: busybox --show SCRIPT
   or: busybox --install [-s] [DIR]
   or: function [arguments]...
...

Podman in a locked-down container using user namespaces in Kubernetes

This only works if you are using CRI-O as your runtime engine for your Kubernetes cluster.

We need to add the userns annotation to the runtime (e.g., runc, crun, or kata) you'll be using with CRI-O.

[crio.runtime.runtimes.runc]
runtime_path = ""
runtime_type = "oci"
runtime_root = "/run/runc"
allowed_annotations = [
    "io.containers.trace-syscall",
    "io.kubernetes.cri-o.userns-mode",
]

Add the Podman UID/GID ranges to the subuid and subgid files on the host.

✗ cat /etc/subuid
umohnani:100000:65536
containers:200000:268435456
✗ cat /etc/subgid
umohnani:100000:65536
containers:200000:268435456
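The size of the containers entry determines how many pods can receive their own ID range. As a rough sketch of the arithmetic, assuming each pod asks for a 65536-wide range (as the userns annotation used later in this article does) and that ranges are carved out of the entry back to back, the helpers below compute how many ranges fit and where a given range would start. The function names are hypothetical, and the back-to-back allocation order is an assumption, not documented CRI-O behavior.

```shell
# userns_slots SUBID_LINE SIZE
# How many SIZE-wide ranges fit in a "name:start:count" subuid/subgid entry.
userns_slots() (
    count=${1##*:}
    echo $(( count / $2 ))
)

# userns_slot_start SUBID_LINE SIZE INDEX
# First host ID of the INDEX-th (0-based) SIZE-wide range in the entry.
userns_slot_start() (
    rest=${1#*:}          # strip the name, leaving "start:count"
    start=${rest%%:*}
    echo $(( start + $3 * $2 ))
)
```

For the entry shown above, userns_slots "containers:200000:268435456" 65536 gives the number of pods that could hold a full 65536-ID range at the same time, and userns_slot_start with index 1 gives the first host ID of the second range.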

Restart CRI-O after this and then start up your Kubernetes cluster:

✗ sudo systemctl restart cri-o
✗ ./local-cluster-up.sh

Since we're running this without the privileged flag, we need to mount /dev/fuse, as shown in the examples above. So, create your /dev/fuse Device Plugin to be used in the pod spec.

Here is the YAML file, userns.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: podman-userns
  annotations:
    io.kubernetes.cri-o.userns-mode: "auto:size=65536;keep-id=true"
spec:
  containers:
    - name: userns
      image: quay.io/podman/stable
      command: ["sleep", "10000"]
      securityContext:
        capabilities:
          add:
            - "SYS_ADMIN"
            - "MKNOD"
            - "SYS_CHROOT"
            - "SETFCAP"
      resources:
        limits:
          github.com/fuse: 1

We've added the userns annotation to the pod spec. It specifies the range of UIDs/GIDs to use and which ID should be set in the container. In this case, it is set to the root user.

✗ kubectl exec -it podman-userns -- sh
sh-5.0# id
uid=0(root) gid=0(root) groups=0(root)
sh-5.0# cat /proc/self/uid_map
0 265536 65536
sh-5.0# cat /proc/self/gid_map
0 265536 65536
sh-5.0# podman run ubi8 echo hello
Resolved "ubi8" as an alias (/etc/containers/registries.conf.d/000-shortnames.conf)
Trying to pull registry.access.redhat.com/ubi8:latest...
Getting image source signatures
Copying blob 4b21dcdd136d done
Copying blob 55eda7743468 done
Copying config 613e5da7a9 done
Writing manifest to image destination
Storing signatures
hello
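Each line of uid_map and gid_map has three fields: the first ID inside the namespace, the first ID outside it, and the length of the range. So although id reports root inside the pod, that root is an unprivileged ID on the host. The sketch below translates an in-container ID to its host ID from one such line. The name map_to_host is a hypothetical helper, and it handles a single map line only.

```shell
# map_to_host "INSIDE OUTSIDE COUNT" ID
# Translate an ID inside the user namespace to the matching host ID,
# or print "unmapped" if ID falls outside the mapped range.
map_to_host() {
    echo "$1" | {
        read -r inside outside count
        if [ "$2" -ge "$inside" ] && [ "$2" -lt $(( inside + count )) ]; then
            echo $(( outside + $2 - inside ))
        else
            echo unmapped
        fi
    }
}
```

With the map from the session above, map_to_host "0 265536 65536" 0 shows which host UID the container's root user really is.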

Builds with rootful Podman in a locked-down container with user namespaces

sh-5.0# cat Containerfile
FROM fedora
RUN dnf install -y busybox
ENV foo=bar
sh-5.0# podman build -t myimage -f Containerfile .
STEP 1: FROM fedora
Resolved "fedora" as an alias (/etc/containers/registries.conf.d/000-shortnames.conf)
Getting image source signatures
Copying blob 157ab8011454 done
Copying config 9f2a560376 done
Writing manifest to image destination
Storing signatures
STEP 2: RUN dnf install -y busybox
Fedora 33 openh264 (From Cisco) - x86_64        764  B/s | 2.5 kB     00:03
Fedora Modular 33 - x86_64                      348 kB/s | 3.3 MB     00:09
Fedora Modular 33 - x86_64 - Updates            2.2 MB/s | 3.1 MB     00:01
Fedora 33 - x86_64 - Updates                     11 MB/s |  27 MB     00:02
Fedora 33 - x86_64                              2.1 MB/s |  72 MB     00:34
Dependencies resolved.
...
Installed:
  busybox-1:1.32.1-1.fc33.x86_64
Complete!
--> 1b0633e5309
STEP 3: ENV foo=bar
STEP 4: COMMIT myimage
--> 2212a101136
2212a1011369ee7e6a4a5d4c15a56fc531a5d43ac24f49d432730c620cec4378
sh-5.0# podman images
REPOSITORY                         TAG     IMAGE ID      CREATED             SIZE
localhost/myimage                  latest  2212a1011369  About a minute ago  427 MB
registry.fedoraproject.org/fedora  latest  9f2a56037643  3 months ago        182 MB
sh-5.0# podman run myimage busybox
BusyBox v1.32.1 (2021-03-22 18:56:41 UTC) multi-call binary.
BusyBox is copyrighted by many authors between 1998-2015.
Licensed under GPLv2. See source distribution for detailed
copyright notices.
Usage: busybox [function [arguments]...]
   or: busybox --list[-full]
   or: busybox --show SCRIPT
   or: busybox --install [-s] [DIR]
   or: function [arguments]...
...

Final thoughts

Here, in part two of the article series, I demonstrated various use cases related to Podman and Kubernetes interactions. Many of the choices are similar to those we saw in the part one article with Podman in Podman.

Series wrap up

It's common for the Podman team to field questions related to running Podman inside containers. There are many possible approaches to doing this, with various related security concerns.

One of the biggest differentiators is whether you run Podman in Podman or Podman within Kubernetes, along with how Docker plays into the discussion.

As you start to implement Podman in these scenarios, don't forget the privileges information discussed at the start of article one, and be sure to weigh the considerations regarding the --privileged flag. Contact the Podman team for more information.

Don't forget that Enable Sysadmin has lots of Podman content.