How to lighten the load on your container registry using Quay.io
In this post, I show you how to use Quay.io to host container images, and how to avoid over-taxing your container registry by limiting unnecessary requests for images. I use Buildah, Skopeo, and Quay.io, but the tips on limiting image pulls will work with any container registry you might use.
In late November 2020, Docker Hub started throttling or limiting the number of container images you could pull anonymously or as a Free Docker Hub user. If you're an anonymous user, you can only pull 100 container images in any 6-hour period. If you're a Free Docker Hub user, you can pull 200 container images in any 6-hour period.
When we perform our functional testing of the container tools that we work on, like Buildah and Podman, this limit is generally not a problem. For instance, when you're building a container image using a Containerfile, and then testing the resulting container to see how it behaves after you run particular commands on it, you generally pull the main container image specified in the FROM instruction in the Containerfile one time. If you later rebuild the container from scratch, you typically reuse the already pulled-down container image and therefore don't hit the counter. In this scenario, throttling doesn't cause any pain, but it's always in the back of my mind.
Initial reduction of Docker Hub interactions
We did find a spot where we ran into the throttling on Docker Hub though. Ed, my colleague and one of the container engine's QE leads, created a very nice workaround for it. First, a little background. Several months ago, Ed reduced the number of times we fetched the container images that the Buildah Continuous Integration (CI) tests use by reusing the cache that Podman had already created. Before this, the Buildah CI abused the poor alpine container image that lives in the Docker Hub at docker.io/library to no end, along with busybox and a few other assorted container images there, pulling them many times over. This prefetching scheme that Ed worked up not only sped up our tests, but it also allowed us to reduce the bandwidth we were using on Docker Hub.
Despite these changes, the Buildah CI began to fail several times a day in November with this error: You have reached your pull rate limit. Hitting the rate limit was due to the number of times our CI tests ran each day. Even though the prefetching had reduced the number of times that the Buildah CI needed to pull the images, the CI was still running into the Docker Hub throttling.
Solving the throttling
The solution that Ed delivered makes use of Buildah's flexibility and the container tools under the Containers repository on GitHub. First, Ed created a free account on quay.io, copied images there, and made them public. Ed picked quay.io because that's where we store a lot of our container images, and it's convenient for us. Still, it could have been a local container image repository or some other company's repository.
As a bonus, quay.io isn't throttled like Docker Hub.
Using Skopeo to copy the initial image
Let's say that your project requires the alpine and centos:8 images. You would start by creating a free account at quay.io, with a name of myquayaccountname. On a host with Skopeo installed, you would then run:
skopeo login -u myquayaccountname quay.io
skopeo copy --all docker://docker.io/library/alpine:latest docker://quay.io/myquayaccountname/alpine:latest
Then repeat, replacing alpine:latest with centos:8, and so forth for all needed images.
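If you have more than a couple of images, the repeated copy can be wrapped in a small loop. The following is a minimal sketch, not the script our CI uses: the image list and account name are the placeholders from this example, and DRY_RUN is a hypothetical safety switch of the sketch (not a Skopeo option) that prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: copy each needed image from Docker Hub to a quay.io account.
# QUAY_ACCOUNT and IMAGES are placeholders; DRY_RUN is a hypothetical
# switch of this sketch, not a skopeo feature.
QUAY_ACCOUNT="${QUAY_ACCOUNT:-myquayaccountname}"
IMAGES="${IMAGES:-alpine:latest centos:8}"
DRY_RUN="${DRY_RUN:-1}"

mirror_image() {
    # $1 is an image name with its tag, for example alpine:latest
    src="docker://docker.io/library/$1"
    dst="docker://quay.io/$QUAY_ACCOUNT/$1"
    if [ "$DRY_RUN" = "1" ]; then
        # Only print the command that would run.
        echo "skopeo copy --all $src $dst"
    else
        # --all copies every image in a multi-architecture list.
        skopeo copy --all "$src" "$dst"
    fi
}

for image in $IMAGES; do
    mirror_image "$image"
done
```

Run skopeo login first, then set DRY_RUN=0 once the printed commands look right.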
The images are now on quay.io, but they're private by default. To make them public, log back into the quay.io web UI and click on each image name. That takes you to a new page showing image details. Click the gear icon at the bottom of the left navbar, find the Make Public button, and press it. You will need to confirm with OK and then repeat for all images showing a pink lock icon.
In our case, the first thing Ed did was to pull the container images that we use from Docker Hub and place them into the libpod container image repository on quay.io that he had created.
Configure registries.conf for mirroring
We solved the problem of throttling by moving those images. However, we now had the issue of changing the hundreds if not thousands of the tests' references to those images so that the CI would pull from quay.io/libpod rather than docker.io/library. This needed change was a perfect showcase for the flexibility that the container tools afford. Ed addressed this with a relatively small change in the configuration, rather than globally changing all of the tests.
Here's what Ed worked up. When Buildah searches for a container image, it is not hardcoded to just pull from docker.io. Instead, it reads /etc/containers/registries.conf to determine which container image registry it should pull from.
Ed simply changed that file such that quay.io/libpod is contacted whenever the tests go looking for docker.io/library. Using our example from above, you would append the following lines to /etc/containers/registries.conf on all systems where you want to use your cache:
[[registry]]
prefix = "docker.io/library"
location = "quay.io/myquayaccountname"
With those lines in place, podman pull alpine commands will fetch from your mirror. You can see the change Ed made for Podman here in this Pull Request.
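Since the same lines need to land on every system that uses the cache, it can help to make the append repeatable. Here is one possible sketch, assuming the placeholder account name from above. The CONF variable and the add_remap function are names invented for this example, and CONF defaults to a scratch file so you can try it without touching the real configuration.

```shell
#!/bin/sh
# Sketch: add the remapping stanza to registries.conf only if it is missing.
# CONF defaults to a scratch file; point it at /etc/containers/registries.conf
# (as root) for real use. QUAY_ACCOUNT is a placeholder.
CONF="${CONF:-./registries.conf.sketch}"
QUAY_ACCOUNT="${QUAY_ACCOUNT:-myquayaccountname}"

add_remap() {
    # Skip the append when a docker.io/library prefix is already configured.
    if [ -f "$CONF" ] && grep -q '^prefix *= *"docker.io/library"' "$CONF"; then
        echo "remap already present in $CONF"
        return 0
    fi
    cat >> "$CONF" <<EOF

[[registry]]
prefix = "docker.io/library"
location = "quay.io/$QUAY_ACCOUNT"
EOF
    echo "remap added to $CONF"
}

add_remap
```

Running it a second time leaves the file unchanged, which makes it safe to call from a provisioning step.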
To further highlight the mirroring abilities in the containers/image project, which Buildah uses, you can set a mirror for container images, which allows you to pull with the old name from a different registry. Mirroring was originally added to support disconnected environments. Environments without internet connectivity that run software like OpenShift often cannot pull images from non-local registries, so we allow users to mirror the images at internal registries without needing to change the software.
Here's a snippet with more information from the containers-registries.conf man page, which is part of the containers/image project:
$ man containers-registries.conf
Remapping and mirroring registries
The user-specified image reference is, primarily, a "logical" image name, always used for naming the image. By default, the image reference also directly specifies the registry and repository to use, but the following options can be used to redirect the underlying accesses to different registry servers or locations (e.g., to support configurations with no access to the internet without having to change Dockerfiles, or to add redundancy).
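As an illustration of the mirroring option the man page describes, the sketch below prints a stanza that keeps docker.io/library as the primary location and adds a [[registry.mirror]] entry. As I read containers-registries.conf(5), mirrors are tried in order before the primary location, so the original registry remains a fallback. This is a variation on the remapping above, not the change Ed made. The mirror_stanza function name is made up for this example, and the mirror location is the placeholder account.

```shell
#!/bin/sh
# Sketch: print a registries.conf stanza that keeps the upstream registry
# as the primary location and lists a mirror to be tried first.
mirror_stanza() {
    # $1 = upstream prefix, $2 = mirror location
    cat <<EOF
[[registry]]
prefix = "$1"
location = "$1"

[[registry.mirror]]
location = "$2"
EOF
}

# The mirror location is a placeholder account name.
mirror_stanza "docker.io/library" "quay.io/myquayaccountname"
```

The remapping shown earlier never contacts docker.io for these images, while this form can still fall back to docker.io if an image is missing from the mirror. Which one you want depends on whether a silent fallback to a throttled registry is acceptable in your CI.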
Caveats: This procedure does a one-time copy of the container images. Your cached image will not magically pick up security fixes pushed to docker.io. (Neither will it pick up random vandalism such as removed binaries or other breaking changes—don't get me started.)
Given the caveat, the image maintenance is up to you now, and you might consider adding the Skopeo commands that copy the images to the start of your test procedure. Another possible workaround is to enable a public mirror such as the Google Container Registry (GCR), or possibly to refine the registries.conf file further to set up multiple mirrors. Better yet, this is probably a great fit for the skopeo sync command, as it has a nice CLI and can be used with a YAML file offering a wide array of configuration options.
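To give a rough idea of what the skopeo sync approach might look like, the sketch below writes a small YAML file listing the images from this example and then prints the sync command. Treat it as a starting point under stated assumptions: the file name, the account name, and the DRY_RUN switch are invented for the sketch, and you should check skopeo-sync(1) for how destination repository names are derived (the --scoped option affects this) before relying on it.

```shell
#!/bin/sh
# Sketch: describe the images to mirror in a YAML file for `skopeo sync`.
# SYNC_FILE, QUAY_ACCOUNT, and DRY_RUN are assumptions of this sketch.
SYNC_FILE="${SYNC_FILE:-./sync.yml}"
QUAY_ACCOUNT="${QUAY_ACCOUNT:-myquayaccountname}"
DRY_RUN="${DRY_RUN:-1}"

write_sync_file() {
    # Source registry at the top level, then repositories with their tags.
    # Tags are quoted so that "8" is read as a string, not a number.
    cat > "$SYNC_FILE" <<EOF
docker.io:
  images:
    library/alpine:
      - "latest"
    library/centos:
      - "8"
EOF
}

sync_images() {
    if [ "$DRY_RUN" = "1" ]; then
        # Only print the command that would run.
        echo "skopeo sync --src yaml --dest docker $SYNC_FILE quay.io/$QUAY_ACCOUNT"
    else
        skopeo sync --src yaml --dest docker "$SYNC_FILE" "quay.io/$QUAY_ACCOUNT"
    fi
}

write_sync_file
sync_images
```

Keeping the image list in one file means a new test dependency is a one-line change rather than another copy command.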
There are various ways to work around the throttling Docker Hub put in place, but the method Ed used was quick and painless, and it got our CI back online. Now that we have some breathing room, we can work on a more complete solution.
With this change in place, the Buildah tests no longer exceed the Docker Hub pull limit, so the throttling problem is solved.