For a while, I’ve been working on a project that uses GitLab Runner with Docker as its executor. Since runners are hosted on CentOS 7, everything made sense—until we started looking at moving it to CentOS 8, and our world exploded.
Our first thought was that, since Podman is a drop-in replacement for Docker, this should be easy (you can judge the veracity of that statement yourselves). The truth was that Podman didn’t have the API that GitLab Runner uses to manage containers, so even though we could do everything manually, we weren’t there yet.
OK, back to the drawing board. How about we file a GitLab issue asking for Podman to be implemented as a GitLab Runner executor? It turns out that the issue already existed. The paraphrased answer was, "We’re not taking any new executors, but if you write it yourself, we can see if we can take it." My knowledge of the internals of GitLab Runner is worse than my German, and all I can say is "Danke," not even the whole word.
Don't try this at home
This "simple" migration was proving to be anything but, so as a last resort, we thought there must be a way to install Docker on CentOS 8. Well, you can read some "tutorials" that do it, but they make you want to claw your eyes out, so that wasn’t an option. (Seriously, don’t try this at home.)
Some time went by, and we temporarily moved to another project that was more urgent. Even though we wanted to move everything to CentOS 8, there was no hurry.
But then a few months ago, we found a post saying that Podman was implementing a Docker compatible REST API. It felt like they were reading our minds; this is exactly what we needed. This should be easy now. (You see where I’m going with this, right?)
We started testing again when Podman 2.0 was released, all happy and optimistic. GitLab Runner was connecting to the socket but failing to create volumes. Then we read the release notes more carefully, and they said that the endpoint for volumes was not implemented yet, but it was in the main tree (soon to be 2.1). So we still had a chance.
A hacky backport
A few days went by; we made a hacky backport of the volumes endpoint to 2.0 and tried the main branch too, but it was all failing, and we had no idea why.
Luckily, Podman 2.1 was released pretty quickly, and we were back on track. We started again, but this time, we took a different approach. After commenting a little bit on the corresponding Podman issue, we started hanging out in #podman on IRC and asking questions about this problem. We got a few answers from the devs, and that’s when things became interesting!
We set up a test repo in GitLab, registered a bunch of runners, and started tackling every problem, one by one. We also set up a Docker instance to use as a baseline but monitored all its communication with GitLab Runner using socat—that way, we could see exactly what was going on and what we needed to match.
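To give an idea of what that kind of monitoring involves, here is a minimal sketch in Python of a logging relay between two Unix sockets. It is not the socat setup we actually used, and the names `relay`, `proxy_connection`, and `serve` are made up for illustration; it only shows the idea of sitting between the client and the daemon and recording every chunk that passes in either direction.

```python
import socket
import threading


def relay(src, dst, label, log):
    """Copy bytes from src to dst until src closes, recording each chunk."""
    while True:
        chunk = src.recv(4096)
        if not chunk:
            break
        log.append((label, chunk))
        dst.sendall(chunk)
    # Pass the end-of-stream on so the other side sees the close too.
    try:
        dst.shutdown(socket.SHUT_WR)
    except OSError:
        pass


def proxy_connection(client, upstream, log):
    """Relay both directions of one connection, logging all traffic."""
    threads = [
        threading.Thread(target=relay, args=(client, upstream, ">", log)),
        threading.Thread(target=relay, args=(upstream, client, "<", log)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    client.close()
    upstream.close()


def serve(listen_path, upstream_path, log):
    """Accept clients on listen_path and forward them to upstream_path.

    Both paths are assumptions chosen by the caller, for example a
    scratch socket for the runner and the real daemon socket upstream.
    """
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(listen_path)
    server.listen()
    while True:
        client, _ = server.accept()
        upstream = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        upstream.connect(upstream_path)
        threading.Thread(
            target=proxy_connection,
            args=(client, upstream, log),
            daemon=True,
        ).start()
```

Pointing the client at the listening socket instead of the real one leaves `log` holding a list of `(direction, bytes)` pairs, with `">"` for requests and `"<"` for responses, which can then be compared between the Docker baseline and Podman.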
We started with an issue where the job seemed to work, but it actually wasn’t doing anything; worst of all, it wasn’t logging anything, so the Podman developers had to fix the logs first and then go back to the main issue. With that out of the way, there was another issue with /dev entries, which was solved too. After a few days of testing, things were starting to look really good; we could run easy one-liners without much trouble. So we thought we were done, but we actually still had some way to go.
When we moved to longer-running jobs, they were being cut off midway because of an issue in the idle connection tracking. Fixing that led to another issue with Podman never closing, but the Podman maintainers addressed both of these issues.
Bug for bug
However, there was one issue that had been bugging us from the beginning: it forced us to remove the volumes before every run. The thing that nobody tells you about compatibility is that sometimes, to achieve it, you have to be bug-for-bug compatible. Docker has an issue filed over five years ago about the fact that when you ask to create a volume that already exists, it will return "all good" and act like nothing ever happened. Podman, on the other hand, was returning the correct error message. (Emphasis on "was," because it now acts the same way Docker does, at least in compat mode. Bug for bug, right?)
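The difference between the two behaviors is easy to picture with a toy model. The sketch below is not Podman's or Docker's code; `VolumeStore`, `VolumeExistsError`, and the `compat` flag are hypothetical names used only to illustrate "strict" versus "bug-for-bug" handling of a duplicate volume name.

```python
class VolumeExistsError(Exception):
    """Raised in strict mode when the volume name is already taken."""


class VolumeStore:
    """In-memory stand-in for a volume backend (illustration only)."""

    def __init__(self, compat=False):
        # compat=True mimics the Docker-style behavior described above.
        self.compat = compat
        self.volumes = {}

    def create(self, name):
        if name in self.volumes:
            if self.compat:
                # Bug-for-bug: report success and hand back the old volume.
                return self.volumes[name]
            # Strict: tell the caller the name is already in use.
            raise VolumeExistsError(name)
        volume = {"Name": name}
        self.volumes[name] = volume
        return volume
```

With `VolumeStore(compat=False)`, a second `create("cache")` raises, which is why a client that blindly recreates its volumes on every job needs them removed first. With `compat=True`, the second call quietly returns the existing volume and the client never notices.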
With those major issues out of the way, some minor things appeared, but everything was running smoothly, at least as far as we could test.
So, how are things now? Well, all of the issues that we ran into with Podman now have fixes in the main branch, and if all goes well, they will be part of the upcoming release of Podman 2.2. That will mark the first Podman release that can run with GitLab Runner right out of the box, without the runner even knowing that it is talking to Podman. That’s a really major milestone for us.