Supercomputers and scientific research tend to go hand in hand. Designed to tackle fundamental scientific problems, such as finding a cure for cancer and harnessing fusion energy, supercomputers have recently become more open to global collaboration and information exchange among researchers. In the past, by contrast, supercomputer designs and implementations were relatively closed, often built by a single vendor from its own inventory of components. Today, this model is evolving: as science becomes more accessible to a global community, supercomputers are becoming much more open through vendor collaboration.
The new Summit supercomputer at the Department of Energy’s (DOE) Oak Ridge National Laboratory (ORNL) is the next step in this journey. Built with standard CPUs from IBM, GPU accelerators from NVIDIA, InfiniBand networking from Mellanox, and a standard Linux operating system, Red Hat Enterprise Linux, from Red Hat, this system is the result of a multi-year collaboration and highlights a convergence between esoteric and mainstream technologies.
The composition of Summit should not come as a surprise, as modern supercomputers are frequently built from a collection of components that each meet specific needs, rather than from the product line of a single vendor. All of these components are commoditized, in that they can be (and often are) deployed in enterprise datacenters. Also like traditional datacenters, this hardware often runs a Linux distribution: according to the most recent (November 2017) Top500 list, the ten fastest supercomputers in the world all run a variant of Linux.
The total system design of Summit, consisting of 4,608 IBM compute servers, aims to make it easier to bring research applications to this behemoth. Part of this is the consistent environment provided by Red Hat Enterprise Linux, with which many leading researchers around the world are already familiar. Red Hat Enterprise Linux is also widely deployed in National Labs and research centers around the globe and is a proven platform for large-scale computing across multiple hardware architectures.
Summit uses a new building-block architecture that supports a wide range of applications, from astrophysics to materials science to human systems biology. All of these could be augmented by enhanced artificial intelligence and machine learning (AI/ML) capabilities, making it a truly "intelligent" machine.
Each of Summit’s building blocks includes two IBM POWER9 processors, six NVIDIA Volta V100 GPUs, 512 GB of memory, local NVMe storage, and Mellanox InfiniBand. A critical part of developing Summit was the close cooperation between all of the partners, beginning with the early enablement of the POWER9 CPU in Red Hat Enterprise Linux, then collaborating with NVIDIA on GPU integration, tying the system together via the Mellanox interconnect, and, finally, making Summit accessible to end users through the common interface of the operating system provided by Red Hat.
The experience of running 12 leading science applications on Summit as part of the Oak Ridge Early Science program demonstrates the power of the new design in making high performance readily available to researchers. These Early Science results suggest that Summit is flexible enough to host a wide range of scientific and technical workloads and is effective in supporting the next generation of supercomputing workloads through AI and machine learning capabilities.
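To get a feel for the scale these building blocks add up to, here is a rough back-of-the-envelope sketch in Python. It multiplies the per-node figures above by the 4,608 compute servers mentioned earlier. The names NodeSpec and system_totals are illustrative only, and the sketch assumes every node carries the identical configuration, which is a simplification rather than a confirmed detail of the deployment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    """Per-node figures as quoted in this post (hypothetical helper type)."""
    cpus: int = 2          # IBM POWER9 processors per node
    gpus: int = 6          # NVIDIA Volta V100 GPUs per node
    memory_gb: int = 512   # memory per node, in GB


# Number of IBM compute servers quoted in this post.
NODE_COUNT = 4608


def system_totals(node: NodeSpec, node_count: int) -> dict:
    """Scale per-node resources to the whole system.

    Assumes all nodes are identical, which is a simplification.
    """
    return {
        "cpus": node.cpus * node_count,
        "gpus": node.gpus * node_count,
        "memory_gb": node.memory_gb * node_count,
    }


totals = system_totals(NodeSpec(), NODE_COUNT)
for name, value in totals.items():
    print(f"{name}: {value:,}")
```

Under those assumptions, the totals come to 9,216 CPUs, 27,648 GPUs, and roughly 2.36 million GB of memory across the system.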
Why Linux and why Red Hat Enterprise Linux?
Supercomputers are built for running extreme workloads, and Summit is no different. ORNL and the DOE plan to use Summit for workloads that are intended not just to further scientific knowledge, but also to provide real-world benefits to all of humanity. With that in mind, these aren’t your average applications running in a datacenter.
The compute resources required by Summit and its workloads go well beyond how we would normally talk about flexibility and scalability for IT operations. Supercomputing often pairs standard hardware at scale with additional, highly specialized components, which is why Summit is using Linux, and specifically Red Hat Enterprise Linux. Red Hat Enterprise Linux forms a common bridge at the operating system level that links all of Summit’s resources together, making it easier for individual application stacks to take advantage of the specific resources they need.
The open nature of Red Hat Enterprise Linux also allows ORNL researchers to keep pace with the high-performance computing innovations in the Linux kernel while retaining the level of stability and support required for running mission-critical workloads. All of this, combined with Red Hat’s expertise in supporting mission-critical open source systems, makes Red Hat Enterprise Linux a powerful platform not only for Summit, but also for future supercomputing endeavors in the United States and across the globe.
Summit as the future of supercomputing: Open foundations, close collaboration, and rapid innovation
Summit breaks the mold of what is typically expected from supercomputer architecture, in that it is based on IBM POWER rather than an x86 hardware architecture. This highlights a new path emerging not just for supercomputing, but for enterprise computing generally: the need for more seamless multi-architecture support. The broader range of architectural choices available enables organizations to choose the computing backbone that best meets their unique needs, whether that is a traditional datacenter environment or a high-powered supercomputer like Summit.
Summit is also an example of how high-performing compute resources can be consumed and used to power emerging workloads. In effect, it’s acting as a high-performance test bed for the next wave of enterprise technologies: if this deployment can power massive scientific research projects, how could a traditional enterprise use this configuration to fuel its digital transformation?
But the rapid innovation showcased by Summit must be consumable, and that’s where Red Hat Enterprise Linux comes in. Despite the scale, processing capability, and "intelligence" of Summit’s composition, end users interact with something they understand: Linux, in the form of the world’s leading enterprise Linux platform. Red Hat Enterprise Linux provides a common, stable basis that ties together all of this innovation.
Summit is a current model for technology innovation at scale, but underneath it all, Linux ties everything together. Red Hat is proud to have helped bring Summit to life, and we stand ready to support and collaborate on the future waves of supercomputing and enterprise IT innovation, no matter what they might be.
About the authors
Chris Wright is senior vice president and chief technology officer (CTO) at Red Hat. Wright leads the Office of the CTO, which is responsible for incubating emerging technologies and developing forward-looking perspectives on innovations such as artificial intelligence, cloud computing, distributed storage, software defined networking and network functions virtualization, containers, automation and continuous delivery, and distributed ledger.