Following the rise of Linux container use in commercial environments, the adoption of container technologies has gained momentum in technical and scientific computing, commonly referred to as high-performance computing (HPC). Containers can help solve many HPC problems, but the mainstream container engines didn't quite tick all the boxes. Podman is showing a lot of promise in bringing a standards-based, multi-architecture-enabled container engine to HPC. Let’s take a closer look.
The trend toward AI-accelerated solutions often requires repackaging applications and staging data for easier consumption, breaking up the otherwise massively parallel flow of purely computational solutions.
The ability to package application code, its dependencies and even user data, the demand to simplify sharing of scientific research and findings with a global community across multiple locations, and the ability to migrate these applications into public or hybrid clouds all make containers very relevant for HPC environments. A number of supercomputing sites already have portions of their workflows containerized, especially those related to artificial intelligence (AI) and machine learning (ML) applications.
Another reason containerized deployments are becoming increasingly important for HPC environments is that they provide an effective and inexpensive way to isolate workloads. Partitioning large systems for use by multiple users, or by multiple applications running side by side, has always been a challenge.
The desire to protect applications and their data from other users and potentially malicious actors is not new and has been addressed by virtualization in the past. With Linux cgroups, and later Linux containers, the ability to partition system resources with practically no overhead has made containers particularly suitable for HPC environments, where the goal is maximum system utilization.
However, most recent implementations of mainstream container runtime environments have focused on enabling CI/CD pipelines and microservices and have not addressed supercomputing requirements, prompting the creation of several incompatible implementations intended solely for HPC.
Podman and Red Hat Universal Base Image
That landscape changed when Podman arrived. Based on standards from the Open Container Initiative (OCI), Podman is rootless (it does not require superuser privileges) and daemonless (it does not need a constantly running background process), and it focuses on delivering performance and security benefits.
Most importantly, Podman and the accompanying container development tools, Buildah and Skopeo, are delivered with Red Hat Enterprise Linux (RHEL), making them relevant to the many HPC environments that have standardized on and rely on this operating system (OS).
Another important aspect is that Podman shares many underlying components with other container engines, such as CRI-O. This provides a proving ground for new and interesting features and maintains a direct technology link to Kubernetes and Red Hat OpenShift. The benefits of technology continuity, the ability to contribute to and tinker with code at the lowest layers of the stack, and the presence of a thriving community were the fundamental reasons for Red Hat’s investment in Podman, Buildah and Skopeo.
To further foster collaboration in the community and enable participants to freely redistribute their applications and the containers that encapsulate them, Red Hat introduced the Red Hat Universal Base Image (UBI). UBI is an OS container image that does not run directly on bare-metal hardware and is not supported as a standalone entity. However, it offers the same proven quality and reliability characteristics as Red Hat Enterprise Linux, since it is tested by the same quality, security and performance teams.
UBI comes with a different end user license agreement (EULA) that allows users to freely redistribute containerized applications built with it. Moreover, when a container built with the UBI image runs on top of Red Hat platforms, such as RHEL with Podman or OpenShift, it can inherit the support terms of the host system it runs on. For the many sites that are required to run supported software, this seamlessly creates a trusted software stack based on a verified OS container image.
Podman for HPC
Podman offers several features that are critical to HPC. For example, it enables containers to run with a single UID/GID pair based on the logged-in user’s UID/GID (that is, without root privileges), and it can enforce additional security requirements through advanced kernel features such as SELinux and seccomp. Podman also allows users to set up or disable namespaces, specify mount points for every container, and modify default security control settings across the cluster by outlining these tasks in the containers.conf file.
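To make the "no root privileges" point concrete, here is a minimal Python sketch of how a rootless container's user namespace maps container UIDs onto host UIDs. It assumes the default rootless Podman layout, in which container UID 0 maps to the invoking user's own host UID and container UIDs from 1 upward map onto the user's subordinate range from an /etc/subuid-style file. The helper names (`rootless_uid_map`, `to_host_uid` and so on) and the sample user data are hypothetical, for illustration only; this is not Podman's code.

```python
"""Sketch: how a rootless container's user namespace maps UIDs to the host.

Assumes the default rootless Podman layout: container UID 0 maps to the
invoking user's host UID, and container UIDs 1..N map onto the user's
subordinate range from an /etc/subuid-style file. All helper names here
are hypothetical, for illustration only.
"""
from typing import List, NamedTuple, Optional, Tuple


class IdMapEntry(NamedTuple):
    container_id: int  # first ID inside the container
    host_id: int       # first ID on the host
    length: int        # number of consecutive IDs in this range


def find_subid_range(subid_text: str, user: str) -> Optional[Tuple[int, int]]:
    """Return (start, count) for `user` from 'name:start:count' lines."""
    for line in subid_text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, start, count = line.split(":")
        if name == user:
            return int(start), int(count)
    return None


def rootless_uid_map(user: str, host_uid: int, subid_text: str) -> List[IdMapEntry]:
    """Build the uid_map a rootless container would get for `user`."""
    entries = [IdMapEntry(0, host_uid, 1)]  # "root" in the container is just the user
    subrange = find_subid_range(subid_text, user)
    if subrange is not None:
        start, count = subrange
        entries.append(IdMapEntry(1, start, count))
    return entries


def to_host_uid(uid_map: List[IdMapEntry], container_uid: int) -> Optional[int]:
    """Translate a container UID to the host UID; None if it is unmapped."""
    for entry in uid_map:
        if entry.container_id <= container_uid < entry.container_id + entry.length:
            return entry.host_id + (container_uid - entry.container_id)
    return None


if __name__ == "__main__":
    subuid = "alice:100000:65536\nbob:165536:65536\n"
    uid_map = rootless_uid_map("alice", 1000, subuid)
    print(to_host_uid(uid_map, 0), to_host_uid(uid_map, 1), to_host_uid(uid_map, 65536))
    # prints: 1000 100000 165535
```

Every process in such a container, even one that is root inside it, therefore runs on the host as the user or as one of their subordinate IDs. Podman's `--userns=keep-id` option changes the layout so the user's own UID appears unchanged inside the container; the sketch does not model that case.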
To be truly useful for mainstream HPC, Podman needs the ability to run jobs via the Message Passing Interface (MPI). MPI applications still represent the bulk of HPC workloads, and that is not going to change overnight. In fact, even AI/ML workflows often use MPI for multi-node execution. Red Hat engineers worked in the community to enable Podman to run MPI jobs with containers. This feature was then made available in RHEL 8 and was further tested and benchmarked against different container runtime implementations by members of the community and independent researchers, resulting in a published paper.
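One common way to combine MPI with Podman, sometimes called the hybrid model, is for the host's `mpirun` to start one `podman run` per rank. The minimal Python sketch below only builds such a command line; it assumes Open MPI's `mpirun -np` syntax and that each container shares the host's network, PID and IPC namespaces so the ranks can reach one another. The image name, binary path and shared directory are hypothetical placeholders, and this is an illustration of the pattern rather than the exact setup used in the work described above.

```python
"""Sketch: building a 'hybrid' MPI launch line that runs each rank in Podman.

Assumptions: Open MPI's mpirun on the host starts one `podman run` per rank,
and each container shares the host's network, PID and IPC namespaces so the
ranks can reach one another. The image, binary and shared directory below
are hypothetical placeholders.
"""
import shlex
from typing import List


def podman_mpi_command(ranks: int, image: str, binary: str,
                       shared_dir: str = "/tmp/podman-mpirun") -> List[str]:
    """Return the argv for `mpirun -np <ranks> podman run ...`."""
    if ranks < 1:
        raise ValueError("need at least one MPI rank")
    return [
        "mpirun", "-np", str(ranks),
        "podman", "run",
        "--env-host",                        # copy host env, incl. rank variables mpirun sets
        "-v", f"{shared_dir}:{shared_dir}",  # same host directory visible to every rank
        "--userns=keep-id",                  # keep the caller's UID/GID inside the container
        "--net=host", "--pid=host", "--ipc=host",  # share host namespaces between ranks
        image, binary,
    ]


if __name__ == "__main__":
    cmd = podman_mpi_command(4, "registry.example.com/hpc/ring:latest", "/opt/app/ring")
    print(shlex.join(cmd))  # shlex.join needs Python 3.8+
```

Sharing the host namespaces trades some isolation for the low-overhead communication MPI ranks expect. Whether the bind-mounted directory is needed, and where the MPI library keeps its per-job session files, depends on the MPI implementation and version and should be checked against its documentation.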
This ecosystem, consisting of the container runtime, associated tools and a container base image, offers tangible benefits to scientists and HPC developers. They can create and prototype containers on their laptops, test and validate them in a workflow on a single server (referred to as a "node" in HPC), and then deploy them on thousands of similarly configured nodes across large supercomputing clusters using MPI. Moreover, with UBI, scientists can now distribute their applications and data within the global community more easily.
All these traits of Podman have not gone unnoticed in the scientific community and at the large national supercomputing sites. Red Hat has a long history of collaborating with supercomputing sites and building software stacks for many of the world’s TOP500 supercomputers. We have a keen interest in the Exascale Computing Project (ECP) and are tracking the next generation of systems that seek to break the exascale threshold. So when ECP kicked off the SuperContainers project, one of its newest efforts, Andrew Younge of Sandia National Laboratories, a lead investigator for that project, reached out to Red Hat to see how we could collaborate on and expand container technologies for use in the first exascale supercomputers, which are expected to arrive as soon as 2021.
Red Hat contributes to upstream Podman and has engineers with deep Linux expertise and a background in HPC, who were able to work out a multi-phase plan. The plan expedites the development of HPC-friendly features in the Podman, Buildah and Skopeo tools that come with Red Hat Enterprise Linux, with the goal of getting these features into Kubernetes and then into OpenShift.
SuperContainers and multiple architectures
The first phase of the collaboration plan with ECP would focus on enabling a single-host environment, incorporating UBI for ease of sharing container packages, and providing support for accelerators and other special devices so that containers are aware of the hardware that exists on the host. In the second phase, we would enable container runtime support using MPI on the vast majority of pre-exascale systems, across multiple architectures such as Arm and POWER. The final phase calls for using OpenShift to provision containers, manage their life cycle and enable scheduling at exascale.
Here is what Younge shared with us in a recent conversation: "When the ECP Supercomputing Containers project (aka SuperContainers) was launched, several container technologies were in use at different Department of Energy (DOE) Labs. However, a more robust production-quality container solution is desired as we are anticipating the arrival of exascale systems. Due to a culture of open source software development, support for standards, and interoperability, we’ve looked to Red Hat to help coalesce container runtimes for HPC."
Sandia National Labs is home to Astra, the world's first Arm-based petascale supercomputer. Red Hat collaborated with HPE, Mellanox and Marvell to deliver this supercomputer to Sandia in 2018 as part of the Vanguard program. Vanguard aims to expand the high-performance computing ecosystem by evaluating and accelerating the development of emerging technologies in order to increase their viability for future large-scale production platforms. That collaboration was enabled by Red Hat’s multi-architecture strategy, which helps customers design and build infrastructure on their choice of several commercially available hardware architectures using a fully open, enterprise-ready software stack.
Astra is now fully operational, and Sandia researchers are using it to build and validate containers with Podman on the 64-bit Armv8 architecture. Younge provided the following insight: "Building containers on less widespread architectures such as Arm and POWER can be problematic, unless you have access to servers of the target architecture. Having Podman and Buildah running on Astra hardware is of value to our researchers and developers as it enables them to do unprivileged and user-driven container builds. The ability to run Podman on Arm servers is a great testament to the strength of that technology and the investment that Red Hat made in multi-architecture enablement."
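Part of why per-architecture builds matter is that each architecture needs its own image. A common way to publish them under one name is an OCI image index (often called a manifest list), where each entry records a platform (architecture and OS), and a container engine such as Podman pulls the entry that matches the host. The minimal Python sketch below illustrates that selection step; the machine-name mapping and the sample index with fake digests are assumptions for illustration, not Podman's own implementation, and the optional `variant` field is ignored.

```python
"""Sketch: picking the image that matches the host from an OCI image index.

An OCI image index (often called a manifest list) groups one image per
platform under a single name. The digests below are fake placeholders,
and the optional 'variant' field (for example 'v8' on arm64) is ignored.
"""
import json
import platform
from typing import Optional

# Kernel machine names (as from `uname -m`) mapped to OCI architecture names.
MACHINE_TO_OCI_ARCH = {
    "x86_64": "amd64",
    "aarch64": "arm64",
    "arm64": "arm64",
    "ppc64le": "ppc64le",
    "s390x": "s390x",
}

SAMPLE_INDEX = json.dumps({
    "schemaVersion": 2,
    "mediaType": "application/vnd.oci.image.index.v1+json",
    "manifests": [
        {"digest": "sha256:fake-amd64", "platform": {"architecture": "amd64", "os": "linux"}},
        {"digest": "sha256:fake-arm64", "platform": {"architecture": "arm64", "os": "linux"}},
        {"digest": "sha256:fake-ppc64le", "platform": {"architecture": "ppc64le", "os": "linux"}},
    ],
})


def select_manifest(index_json: str, machine: Optional[str] = None,
                    os_name: str = "linux") -> Optional[str]:
    """Return the digest of the manifest for this machine, or None."""
    machine = machine or platform.machine()
    arch = MACHINE_TO_OCI_ARCH.get(machine, machine)
    index = json.loads(index_json)
    for manifest in index.get("manifests", []):
        plat = manifest.get("platform", {})
        if plat.get("architecture") == arch and plat.get("os") == os_name:
            return manifest["digest"]
    return None


if __name__ == "__main__":
    for machine in ("x86_64", "aarch64", "ppc64le", "riscv64"):
        print(machine, "->", select_manifest(SAMPLE_INDEX, machine))
```

Because an Arm or POWER node only looks for its own entry, an image published for x86 alone gives those nodes no matching entry, which is part of why building natively on hardware such as Astra is valuable.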
International Supercomputing Conference and the TOP500 list
If you are following or virtually attending the International Supercomputing Conference (ISC) that starts today, be sure to check out the "Introduction to Podman for HPC use cases" keynote by Daniel Walsh, senior distinguished engineer at Red Hat. It will be presented during the Workshop on Virtualization in High-Performance Cloud Computing. For a deeper dive into the practical implementation of HPC containers, be sure to check out the High Performance Container Workshop, where a panel of industry experts, including Andrew Younge and engineers from Red Hat, will provide insights into the most popular container technologies and the latest trends.
While it is fascinating to see Red Hat Enterprise Linux running Podman and containers on the world’s first Arm-based supercomputer, RHEL is also powering the world’s largest Arm supercomputer, according to the latest edition of the TOP500 list, published today at ISC 2020. Fujitsu's Supercomputer Fugaku is the newest and largest supercomputer in the world, and it runs RHEL 8. Installed at RIKEN, Fugaku is based on the Arm architecture and is the first Arm-based system ever to top the list, with a score of 415.5 Pflop/s on the HPL benchmark.
RHEL now claims the top three spots on the TOP500 list as it continues to power the #2 and #3 supercomputers in the world, Summit and Sierra, that are based on IBM POWER architecture.
RHEL also underpins six out of the top ten most power-efficient supercomputers on the planet according to the Green500 list.
So what does the road ahead look like for Podman and RHEL in supercomputing?
RHEL serves as the unifying glue that makes many TOP500 supercomputers run reliably and uniformly across various architectures and configurations. It enables the underlying hardware and creates a familiar interface for users and administrators.
New container capabilities in Red Hat Enterprise Linux 8 are paving the way for SuperContainers and can help smooth the transition of HPC workloads into the exascale space.
In the meantime, growing HPC capabilities in OpenShift could be the next logical step for successfully provisioning and managing containers at exascale, while also opening up a path for deploying them into public or hybrid clouds.
About the author
Yan Fisher is a global evangelist at Red Hat, where he extends his expertise in enterprise computing to emerging areas that Red Hat is exploring.
Fisher has a deep background in systems design and architecture. He has spent the past 20 years of his career working in the computer and telecommunications industries, where he tackled areas as diverse as sales and operations, and systems performance and benchmarking.