The ability to use systemd services to run and manage containers has been requested by users for many years. There were several attempts in Docker’s early days to allow running Docker containers with systemd, but that functionality turned out to be harder than expected. Why? Systemd must be aware of and have control over the processes running inside the systemd service to properly manage it. That’s especially important so that systemd knows whether the main process is running and whether it’s in a healthy state.
The problem is that Docker’s client-server architecture complicates things. All Docker commands are sent to the Docker daemon, which makes it almost impossible for systemd to control container processes. Moreover, successful execution of the Docker client does not necessarily imply that the container is up and running. Multiple attempts to improve the situation have been rejected, so the integration still leaves a lot of room for improvement.
The good news is that Podman is an excellent choice for running containers, and especially so for running them in systemd services. Let’s take a look at how this works.
systemd service file generation
Podman’s fork-and-exec architecture allows systemd to properly control and manage container processes. In fact, Podman makes putting a container into a systemd service as simple as calling podman generate systemd $container. Let’s generate a service for a container:
```
$ podman create -d --name foo fedora:latest top
54502f309f3092d32b4c496ef3d099b270b2af7b5464e7cb4887bc16a4d38597
$ podman generate systemd --name foo
# container-foo.service
# autogenerated by Podman 1.6.2
# Tue Nov 19 15:49:15 CET 2019

[Unit]
Description=Podman container-foo.service
Documentation=man:podman-generate-systemd(1)

[Service]
Restart=on-failure
ExecStart=/usr/bin/podman start foo
ExecStop=/usr/bin/podman stop -t 10 foo
KillMode=none
Type=forking
PIDFile=/run/user/1000/overlay-containers/54502f309f3092d32b4c496ef3d099b270b2af7b5464e7cb4887bc16a4d38597/userdata/conmon.pid

[Install]
WantedBy=multi-user.target
```
The generated systemd service file can now be used to manage the foo container via systemd. We can copy the file to ~/.config/systemd/user/container-foo.service and start a rootless container via systemctl --user start container-foo.service.
Specific versus generic container services
The ability to generate systemd service files offers a lot of flexibility to users, and intentionally blurs the difference between a container and any other program or service on the host. Since Podman v1.6, we can also generate service files for pods, and the --files flag conveniently writes the generated units to files. However, all of these generated files are specific to containers and pods that already exist. As shown in the example above, we first have to create a container or pod, and only then can we generate service files for it. But what if we want to run a new container directly via the service? What if we want to share a service file with other users?
After gathering more experience in this domain and receiving feedback from the community, we reflected on how to improve the situation and provide a generic service skeleton that is backward compatible with the versions of Podman already released in the wild. The good news is that we found such backward-compatible service files, which we’ll now take a closer look at:
```
[Unit]
Description=Podman in Systemd

[Service]
Restart=on-failure
ExecStartPre=/usr/bin/rm -f /%t/%n-pid /%t/%n-cid
ExecStart=/usr/bin/podman run --conmon-pidfile /%t/%n-pid --cidfile /%t/%n-cid -d alpine:latest top
ExecStop=/usr/bin/sh -c "/usr/bin/podman rm -f `cat /%t/%n-cid`"
KillMode=none
Type=forking
PIDFile=/%t/%n-pid

[Install]
WantedBy=multi-user.target
```
The service file above sets the restart policy to on-failure, which instructs systemd to restart the service when, among other things, the main process exits with a non-zero exit code, is killed by an unclean signal such as SIGKILL, or when an operation such as starting or stopping the service times out. The ExecStartPre line removes PID and container ID files left over from a previous run, the ExecStart line describes how we start the container, and the ExecStop line describes how we stop and remove the container. In this example, we want to run a simple alpine:latest container in the background that runs top. But there are two more flags we should look at:
The --conmon-pidfile flag points to a path where Podman stores the process ID of the container’s conmon process. Conmon is a small monitoring tool that Podman uses to perform operations such as keeping ports and file descriptors open, streaming the container logs, and cleaning up once the container has finished. Conmon also exits with the container’s exit code, which is essential for the systemd use case, as we can use the conmon-pidfile as the PIDFile of the same service. If the container exits non-zero, conmon will as well, and systemd can report the correct service status and restart the service if needed:
```
[Service]
Restart=on-failure
ExecStartPre=/usr/bin/rm -f /%t/%n-pid /%t/%n-cid
ExecStart=/usr/bin/podman run --conmon-pidfile /%t/%n-pid --cidfile /%t/%n-cid -d alpine:latest top
...
PIDFile=/%t/%n-pid
```
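The value of a PID file for supervision can be illustrated without Podman. The sketch below is a simplified model, not how systemd is implemented: a background sleep stands in for conmon, its PID is written to a temporary file the way --conmon-pidfile does, and a hypothetical helper, pid_alive, reads that file and checks whether the process still exists. The helper name and the use of sleep are assumptions made for illustration.

```shell
#!/bin/sh
# Simplified model of PID-file supervision (not systemd's implementation).
# pid_alive is a hypothetical helper, not part of Podman or systemd.

# Succeed if the file holds the PID of a process that still exists.
# kill -0 sends no signal; it only checks that the PID can be signaled.
pid_alive() {
    pidfile=$1
    [ -s "$pidfile" ] || return 1
    kill -0 "$(cat "$pidfile")" 2>/dev/null
}

pidfile=$(mktemp)

# Stand-in for conmon: a long-running background process whose PID is
# written to a file, as --conmon-pidfile does for the real conmon.
sleep 60 &
echo "$!" > "$pidfile"

if pid_alive "$pidfile"; then
    echo "main process is running"
fi

# Terminate the stand-in and reap it, so its PID no longer exists.
kill "$(cat "$pidfile")"
wait "$(cat "$pidfile")" || true

if ! pid_alive "$pidfile"; then
    echo "main process has exited"
fi

rm -f "$pidfile"
```

A real supervisor such as systemd also reaps the exited process and inspects its exit status, which is how it learns that conmon, and therefore the container, exited non-zero.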
The --cidfile flag points to the path where Podman stores the container ID. When running or creating a container, Podman writes the corresponding container ID to the specified path. Doing so allows us to write elegant and generic service files, because we can use the same file to stop or remove the container as well. In the previous example, the ExecStop line uses a shell trick (sh -c followed by a command string for the shell to interpret, in which cat reads the container ID from the cidfile) to remove the container:
```
[Service]
Restart=on-failure
ExecStartPre=/usr/bin/rm -f /%t/%n-pid /%t/%n-cid
ExecStart=/usr/bin/podman run --conmon-pidfile /%t/%n-pid --cidfile /%t/%n-cid -d alpine:latest top
ExecStop=/usr/bin/sh -c "/usr/bin/podman rm -f `cat /%t/%n-cid`"
...
```
Starting with the upcoming release of Podman v1.7, podman stop and podman rm support the --cidfile flag as well, so we won’t need this shell trickery anymore; the stop line can then simply read ExecStop=/usr/bin/podman rm -f --cidfile /%t/%n-cid.
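To see what the shell trick in ExecStop actually does, the sketch below reproduces only its expansion step: sh -c receives the command string, substitutes the output of cat for the backquoted part, and then runs the resulting command. Here echo stands in for podman so that the sketch runs without Podman installed; the temporary file and the container ID written to it are example values, not anything Podman produces.

```shell
#!/bin/sh
# Sketch of the ExecStop shell trick; echo stands in for /usr/bin/podman.
# The temporary cidfile and the container ID are example values.
cidfile=$(mktemp)
printf '%s\n' f20988d59920 > "$cidfile"

# Mirrors: ExecStop=/usr/bin/sh -c "/usr/bin/podman rm -f `cat /%t/%n-cid`"
# The backquotes are escaped so that the inner sh, not this shell, runs cat.
stop_cmd=$(sh -c "echo /usr/bin/podman rm -f \`cat '$cidfile'\`")
echo "$stop_cmd"   # prints: /usr/bin/podman rm -f f20988d59920

rm -f "$cidfile"
```

Inside a real unit, systemd first replaces %t and %n, so the inner shell sees the concrete cidfile path.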
Now, let’s look at the paths used for the conmon-pidfile and the cidfile, /%t/%n-pid and /%t/%n-cid, which deserve some explanation as well. In these statements, %t is the path to the runtime directory’s root (i.e., /run/user/$UserID for user services, and /run for system services). This is where Podman also stores most of its runtime data. The %n portion is the full name of the service, including its .service suffix. Systemd guarantees uniqueness for service names, so we don’t need to worry about potential file name conflicts.
Assuming our service is named foo and runs for the user with ID 1000, the corresponding conmon-pidfile is placed in /run/user/1000/foo.service-pid, while the cidfile is placed in /run/user/1000/foo.service-cid.
Note: It’s important to set the kill mode to none (KillMode=none). Otherwise, systemd will start competing with Podman to stop and kill the container processes, which can lead to various undesired side effects and invalid states.
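To make the substitution concrete, here is a rough sketch that performs it by hand for a user service. It takes %t from XDG_RUNTIME_DIR, which is what the systemd user manager uses for that specifier, and %n from the unit name; the helper name expand_unit_path is hypothetical, and only these two specifiers are handled. Because %t is already an absolute path, the leading slash in /%t yields a double slash (//run/user/1000/...), which is also what the systemctl status output in the walk-through shows; on Linux, such a path refers to the same file as /run/user/1000/....

```shell
#!/bin/sh
# Hypothetical helper: expand %t and %n in a path template, roughly the way
# the systemd user manager does. Other specifiers, and the %% escape, are ignored.
# Usage: expand_unit_path TEMPLATE UNIT_NAME RUNTIME_DIR
expand_unit_path() {
    template=$1
    unit=$2
    runtime_dir=$3
    # sed substitutes every %t and %n; names containing | or & would need escaping.
    printf '%s\n' "$template" | sed -e "s|%t|$runtime_dir|g" -e "s|%n|$unit|g"
}

# For user managers, %t is $XDG_RUNTIME_DIR, normally /run/user/<UID>.
runtime_dir=${XDG_RUNTIME_DIR:-/run/user/$(id -u)}
expand_unit_path '/%t/%n-pid' foo.service "$runtime_dir"
expand_unit_path '/%t/%n-cid' foo.service "$runtime_dir"
```

For the user with ID 1000, the two calls print //run/user/1000/foo.service-pid and //run/user/1000/foo.service-cid.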
A walk-through example
So much for theory; let’s see it in action. First, make sure that the service file is accessible to our non-root user:
```
$ cat ~/.config/systemd/user/container.service
[Unit]
Description=Podman in Systemd

[Service]
Restart=on-failure
ExecStartPre=/usr/bin/rm -f /%t/%n-pid /%t/%n-cid
ExecStart=/usr/bin/podman run --conmon-pidfile /%t/%n-pid --cidfile /%t/%n-cid -d alpine:latest top
ExecStop=/usr/bin/sh -c "/usr/bin/podman rm -f `cat /%t/%n-cid`"
KillMode=none
Type=forking
PIDFile=/%t/%n-pid

[Install]
WantedBy=multi-user.target
```
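Since only the image and the command in this skeleton are specific to a particular container, sharing it with other users can be as simple as filling in those two fields. The following is a minimal sketch, assuming a hypothetical shell helper named write_generic_unit (not part of Podman) that writes the skeleton into a directory of your choice. It does not quote arguments, so commands whose arguments contain spaces would need more care.

```shell
#!/bin/sh
# Hypothetical helper: write the generic service skeleton for an arbitrary
# image and command. Usage: write_generic_unit DIR UNIT_NAME IMAGE [COMMAND...]
write_generic_unit() {
    dir=$1
    unit=$2
    image=$3
    shift 3
    mkdir -p "$dir"
    # Unquoted here-document so $image and $* expand; the backquotes are
    # escaped so they reach the unit file literally for sh -c to evaluate.
    cat > "$dir/$unit" <<EOF
[Unit]
Description=Podman in Systemd

[Service]
Restart=on-failure
ExecStartPre=/usr/bin/rm -f /%t/%n-pid /%t/%n-cid
ExecStart=/usr/bin/podman run --conmon-pidfile /%t/%n-pid --cidfile /%t/%n-cid -d $image $*
ExecStop=/usr/bin/sh -c "/usr/bin/podman rm -f \`cat /%t/%n-cid\`"
KillMode=none
Type=forking
PIDFile=/%t/%n-pid

[Install]
WantedBy=multi-user.target
EOF
}
```

For example, write_generic_unit ~/.config/systemd/user container.service alpine:latest top recreates the service file shown above, after which systemctl --user daemon-reload makes systemd pick it up.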
Now, we can load and start the service:
```
$ systemctl --user daemon-reload
$ systemctl --user start container.service
$ systemctl --user status container.service
● container.service - Podman in Systemd
   Loaded: loaded (/home/valentin/.config/systemd/user/container.service; disabled; vendor preset: enabled)
   Active: active (running) since Mon 2019-11-18 15:32:56 CET; 1min 5s ago
  Process: 189705 ExecStartPre=/usr/bin/rm -f //run/user/1000/container.service-pid //run/user/1000/container.service-cid (code=exited, status=0/SUCCESS)
  Process: 189706 ExecStart=/usr/bin/podman run --conmon-pidfile //run/user/1000/container.service-pid --cidfile //run/user/1000/container.service-cid -d alpine:latest top (code=exited, status=0/SUCCESS)
 Main PID: 189731 (conmon)
   CGroup: /user.slice/user-1000.slice/user@1000.service/container.service
           ├─189724 /usr/bin/fuse-overlayfs [...]
           ├─189726 /usr/bin/slirp4netns [...]
           ├─189731 /usr/bin/conmon [...]
           └─189737 top

$ podman ps
CONTAINER ID  IMAGE                            COMMAND  CREATED         STATUS                 PORTS  NAMES
f20988d59920  docker.io/library/alpine:latest  top      12 seconds ago  Up 11 seconds ago             funny_zhukovsky
```
Great! Systemd started the service successfully, and Podman reports the container as running as well. Note that I trimmed parts of the output above for brevity. An important part is the Main PID, which points to the correct conmon process. Without explicitly pointing systemd to the correct process via the PIDFile option, systemd might wrongly choose another process in this cgroup as the main process. There are a few other processes listed (fuse-overlayfs, slirp4netns, and top), and they all run in the same cgroup. Fuse-overlayfs is a user-space implementation of the overlay filesystem based on FUSE, and slirp4netns allows unprivileged networking. Both of these tools are essential for running rootless containers with Podman.
Before properly stopping the service via systemctl --user stop container.service, let’s test the restart policy, which is set to on-failure. We can cause such a failure by killing the top process (i.e., …):
```
$ kill -9 189731
$ systemctl --user status container.service
● container.service - Podman in Systemd
   Loaded: loaded (/home/valentin/.config/systemd/user/container.service; disabled; vendor preset: enabled)
   Active: active (running) since Mon 2019-11-18 16:09:38 CET; 1min 3s ago
[...]
 Main PID: 191263 (conmon)
```
We can see that the Main PID has changed from 189731 to 191263. That’s the expected outcome: we killed the container process, which therefore exited non-zero; conmon exited with the same exit code, and systemd correctly restarted the service. Note that the service will also be restarted when we manually stop the container via podman stop $container, because the top binary in the alpine:latest container exits with 143 when stopped with SIGTERM. The top binary from other distributions (e.g., Fedora) exits with 0 after SIGTERM, so systemd would not restart the service. Such behavioral differences are important to consider when writing systemd services, so we need to be careful when setting the restart policy.
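The exit code 143 is 128 + 15, the usual way a process killed by SIGTERM (signal 15) is reported. The sketch below shows the two behaviors described above using only the shell: sleep keeps the default SIGTERM action, while the inline sh -c script traps SIGTERM and exits 0, roughly mimicking the Alpine and Fedora top binaries respectively. It involves neither Podman nor systemd, and status_after_sigterm is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: print the exit status a command reports after receiving SIGTERM.
# status_after_sigterm is a hypothetical helper for illustration only.
# Usage: status_after_sigterm COMMAND [ARGS...]
status_after_sigterm() {
    "$@" &
    pid=$!
    sleep 1                   # let the command start and install any trap
    kill -TERM "$pid"
    status=0
    wait "$pid" || status=$?  # 128 + signal number if the signal killed it
    echo "$status"
}

# Default SIGTERM action: the process dies from the signal, so 128 + 15 = 143.
status_after_sigterm sleep 30     # prints 143

# A process that traps SIGTERM and exits cleanly reports 0 instead.
status_after_sigterm sh -c 'trap "exit 0" TERM; while :; do sleep 1; done'   # prints 0
```

With Restart=on-failure, a main process exiting with code 143 counts as a failure and triggers a restart, whereas an exit code of 0 does not.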
Back to work
The nice thing about the generic systemd service file presented in this article is that it is backward compatible with the versions of Podman running in the wild. Whether it’s Red Hat Enterprise Linux, Fedora, or Ubuntu, users can immediately follow the suggested format. Nonetheless, the Podman team is continuing to improve the support and user experience when running containers in systemd services. Try it out!