Expanding Podman capabilities to deploy SIF-formatted containers

May 31, 2022 · 4-minute read · Containers

Interest in using Podman to run containerized High Performance Computing (HPC) applications continues to grow. Red Hat has been collaborating with a number of HPC sites and the Exascale Computing Project (ECP) to add features and capabilities integral to the HPC ecosystem directly to Podman and the associated collection of tools. One of the reasons the HPC community has shown interest in Podman is its reliance on common, accepted standards and practices, such as those defined by the Open Container Initiative (OCI).

However, a number of large supercomputing sites have historically deployed containers created using the Singularity Image Format (SIF). The format dates back to October 2017, when Yannick Côté first committed its reference implementation in C. Its advantages for HPC have led the community to create many SIF containers over the last five years.

To accommodate organizations that would like to use Podman and have existing SIF container images for their projects, the Podman community decided to implement SIF image support.

What is SIF and why does it matter?

A SIF container image is constructed as a single file that encapsulates an entire file system and a number of predefined and user-defined annotations and arbitrary data. This differs somewhat from OCI/Docker-type containers in that Singularity containers ship data and the application together, whereas data would normally be maintained outside a container when running with Podman, Docker, Kubernetes, etc.

In HPC, examples of this additional user-specified information are transitory data and checkpoints. With everything contained in a single file, a SIF-formatted container is very portable and can be transported more easily and efficiently than trying to move workloads with data and applications separately. These properties made SIF images well suited for running simultaneously on thousands of compute nodes.

Another important reason for the creation of SIF, and its consequent use in HPC, was the ability to guarantee that the immutable runtime container image does not change during the lifecycle of the container.

Image: SIF internal representation

To support immutable objects, SIF provides the ability to link the desired data with a signature block that cryptographically signs that data, which forces the creation of a new descriptor that is itself cryptographically signed, in addition to the header. This helps to prevent the data from being tampered with or otherwise changed as the container moves around. In contrast, other popular container formats are rendered as tarballs that create new files every time the archive is expanded.
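The tamper-detection idea described above can be illustrated with a plain checksum on a single file. This is only a sketch: a SHA-256 digest stands in for SIF's signature block, the file name `example.img` is hypothetical, and the `sha256sum` utility from GNU coreutils is assumed to be available. It does not reproduce the actual SIF signing mechanism.

```shell
# Illustration only: a plain SHA-256 digest stands in for SIF's signature block.
workdir=$(mktemp -d)
image="$workdir/example.img"   # hypothetical single-file image

printf 'application and data in one file\n' > "$image"

# Record the digest at the moment the image is published.
recorded=$(sha256sum "$image" | cut -d' ' -f1)

# Simulate tampering while the file is being moved around.
printf 'extra bytes\n' >> "$image"

# Recompute the digest and compare it with the recorded one.
current=$(sha256sum "$image" | cut -d' ' -f1)

if [ "$recorded" = "$current" ]; then
    status="unchanged"
else
    status="modified"
fi

echo "image is $status"
rm -rf "$workdir"
```

Because the whole image is one file, a single comparison covers the application and its data together, which is harder to arrange when an image is unpacked into many separate files.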

These factors helped drive the decision to expand Podman's addressable audience by bringing SIF support to it. As a result, Yannick Côté collaborated with the Podman engineering team to look under the SIF hood once more.

Running SIF images with Podman

Let’s take a closer look at how SIF images run on Podman. Podman relies on a container image conversion and manipulation library to handle containers stored in different formats. Each container format added to this library is called a transport.

A transport is a module that implements the internal representation of a specific container image type and a way to export universally defined structures used to convert the image into other supported formats. Adding a transport implementation for SIF into “containers/image” was the main task required to enable SIF support in Podman and other utilities, like Skopeo.

An additional effort was needed to define the rules for SIF to OCI conversion. This is because SIF and OCI container formats are very different in their internal implementation, where SIF images can contain a variety of data object types, some of which can even be defined by users.

Verifying SIF support

How can one be sure that SIF containers actually work with Podman? Those in the Singularity community will instantly recognize the “lolcow” container created by Dave Godlove ages ago. It can still be seen in many presentations, demos and tutorials where it is used as an example and a way of testing a simple but wholesome container. What could be more appropriate than lolcow to show Podman and SIF in action?

To prove that things are indeed working as intended, let’s execute a simple run command with the lolcow container:

$ podman run -t sif:lolcow_1.0.sif

The screenshot below confirms that a SIF container can run with Podman.

Image: Running the infamous lolcow SIF container in Podman

To try this on your local computer, run the following sequence of commands to download the SIF image and run it with Podman:

$ curl -o lolcow_1.0.sif -L https://tinyurl.com/2pw5rvtw
$ podman run -t sif:lolcow_1.0.sif

Note that running the lolcow container as shown above will take a little while the first time, because some of the packages installed in the container include thousands of small files that need to be converted.
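The `sif:` prefix in the `podman run` command is what selects the SIF transport: image names of this kind follow a `transport:reference` pattern, split at the first colon. The snippet below is a minimal sketch of that split using plain shell parameter expansion. The variable names are made up for illustration, and it is not the parsing code used by Podman itself.

```shell
# Split a "transport:reference" image name at the first colon.
image_name="sif:lolcow_1.0.sif"

transport=${image_name%%:*}   # everything before the first colon
reference=${image_name#*:}    # everything after the first colon

echo "transport=$transport reference=$reference"
# prints: transport=sif reference=lolcow_1.0.sif
```

Here the reference is simply the path to the downloaded `.sif` file, which is why the command works against a local file rather than a registry.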

What’s next for SIF and Podman

The engineering work for using SIF as a valid format was completed, accepted and landed in Podman 4. Users can now pull and run their SIF images natively in Podman with the most recent versions of CentOS Stream, Fedora and RHEL 8 and 9 operating systems. Now the community is looking into enabling several tools from the Podman ecosystem to help inspect and manipulate SIF-formatted images. They are also looking into simplifying some external dependencies, such as having to use `squashfs-tools` from EPEL.

Other features in Podman that should greatly help HPC sites to manage rootless containers (the preferred way of deploying containers in security-sensitive environments) at scale include enablement for LDAP management of /etc/subuid and /etc/subgid and support for native overlay mounting, allowing direct access to the disk from the container for better performance (essentially on par with bare metal access).

And the Podman team is not stopping there. Several relevant features that will come in the near future aim to address common bandwidth limitation issues when pulling containers from the registry.

The Podman team is working on minimizing downloads from the registry by pulling only file-level differences that are based on the checksum, instead of downloading the entire container image layer. This greatly improves local storage utilization since this file-level deduplication effectively eliminates the redundant data.

Additionally, this feature helps make better use of kernel memory: checking for duplicate files before loading them into memory could result in better performance and allow more containers to run simultaneously.
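The general idea of checksum-based, file-level deduplication can be sketched with ordinary files. In this hypothetical example, the `local` directory holds files already on disk and `incoming` stands in for the contents of a layer about to be pulled. The directory and file names are invented, `sha256sum` from GNU coreutils is assumed, and the sketch does not reflect how Podman implements the feature.

```shell
# Hypothetical layout: "local" holds files already on disk,
# "incoming" stands in for the files of a layer about to be pulled.
workdir=$(mktemp -d)
mkdir -p "$workdir/local" "$workdir/incoming"

printf 'shared library\n' > "$workdir/local/libfoo.so"
printf 'shared library\n' > "$workdir/incoming/libfoo.so"   # same content
printf 'new binary\n'     > "$workdir/incoming/app"         # not yet local

# Checksums of everything already stored locally, one per line.
local_sums=$(sha256sum "$workdir"/local/* | cut -d' ' -f1)

to_fetch=""
for f in "$workdir"/incoming/*; do
    sum=$(sha256sum "$f" | cut -d' ' -f1)
    # Fetch only files whose checksum is not already present locally.
    if ! printf '%s\n' "$local_sums" | grep -qxF "$sum"; then
        to_fetch="$to_fetch${to_fetch:+ }$(basename "$f")"
    fi
done

echo "files to fetch: $to_fetch"
# prints: files to fetch: app
rm -rf "$workdir"
```

Only `app` needs to be transferred, because `libfoo.so` already exists locally with an identical checksum. The same comparison is what avoids storing the duplicate content twice.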

We appreciate your interest in these features. If you find them useful, please let us know!

About the authors

Yannick Côté

Senior Software Engineer

As a senior software engineer, Yannick Côté is part of the Core Kernel team at Red Hat, working on the Kpatch / Live Patching subsystem for the Linux kernel. Prior to Red Hat, he worked at Sylabs Inc., where his efforts on data storage and file systems led him to create and develop the Singularity Image Format (SIF), a new container image format, as well as implement cryptographic signing/verification strategies for Singularity containers.

Yan Fisher

Global Evangelist

Yan Fisher is a global evangelist at Red Hat, where he extends his expertise in enterprise computing to emerging areas that Red Hat is exploring.

Fisher has a deep background in systems design and architecture. He has spent the past 20 years of his career in the computer and telecommunication industries, working in areas ranging from sales and operations to systems performance and benchmarking.

With an eye for innovative approaches, Fisher closely tracks partners' emerging technology strategies as well as customer perspectives on several nascent topics, such as performance-sensitive workloads and accelerators, hardware innovation and alternative architectures, and exascale and edge computing.