The computational capabilities and scale of supercomputers have grown and are expected to continue growing. The next big trend we’re likely to see is exascale computing, where supercomputers will be able to perform at least one billion billion (one quintillion, or 10^18) floating point operations per second. At the moment, teams in the United States, Europe, Japan and China are all racing to build and deliver exascale systems in the next 3-5 years. It is no coincidence that this new generation of systems is referred to as "intelligent" supercomputers, as they have nearly enough processing power to simulate a human brain in software.
We recently attended SC18, the leading supercomputing conference, and have several takeaways on what the future looks like for high performance computing (HPC).
Originally projected to arrive this year, based on Moore's law, exascale-class systems are now expected to appear by 2021, largely thanks to innovative approaches in hardware design, system-level optimizations and workload-specific acceleration. Several years ago, HPC visionaries determined that we can no longer rely on commodity Central Processing Unit (CPU) technologies alone to achieve exascale computing, and they turned their attention to innovating in other parts of the system.
According to the latest edition of the Top500 list, Summit, the world’s fastest supercomputer, is rated at a theoretical peak of about 200 petaflops (PFlops) for traditional workloads. Sierra moved up one ranking - from the third to the second fastest machine in the world - with a theoretical peak of about 125 PFlops. Both of these systems also run artificial intelligence (AI) and machine learning (ML) applications to solve complex problems in science, in disciplines like astrophysics, materials science, systems biology, weather modeling and cancer research. Traditional applications use double (64 bit) or single precision (32 bit) calculations, while AI workloads can use half precision (16 bit) floating point calculations, which can be executed much faster. The NVIDIA Tesla V100 GPUs in these systems, some of the most powerful computational accelerators currently available, let researchers use half-precision floating point calculations inside their ML codes. Running a specialized genomics workload this way, researchers achieved nearly two exaflops on Summit. This achievement is a preview of things to come and a proof point that exascale performance will be obtainable in the next few years.
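To make the precision trade-off concrete, here is a minimal sketch in Python that stores the same number at half, single and double precision and reports how much accuracy is lost. It uses only the standard library's struct module; the helper names round_trip and storage_bytes are made up for this illustration and have nothing to do with the codes that ran on Summit.

```python
import struct

# struct format codes for IEEE 754 floating point values:
# 'e' = half (16 bit), 'f' = single (32 bit), 'd' = double (64 bit).
FORMATS = {"half": "<e", "single": "<f", "double": "<d"}


def round_trip(value: float, precision: str) -> float:
    """Store value at the given precision, then read it back as a Python float."""
    fmt = FORMATS[precision]
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


def storage_bytes(precision: str) -> int:
    """Number of bytes one value occupies at the given precision."""
    return struct.calcsize(FORMATS[precision])


if __name__ == "__main__":
    x = 0.1  # not exactly representable in binary at any of these precisions
    for name in FORMATS:
        stored = round_trip(x, name)
        error = abs(stored - x)  # measured against Python's own 64 bit float
        print(f"{name:>6}: {storage_bytes(name)} bytes, "
              f"stored={stored!r}, abs error={error:.3e}")
```

Each half-precision value takes a quarter of the memory of a double, which is part of why accelerators can process many more of them per second. The price is a much larger rounding error, which many ML workloads can tolerate and most traditional simulations cannot.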
Two research projects that have already successfully pushed the supercomputing boundaries are this year’s Gordon Bell prize winners. The prize is awarded to "track the progress over time of parallel computing," and both winning projects used the ORNL Summit system and deep learning. Both projects come from U.S. Department of Energy (DOE) National Laboratories; one used scientific computing to battle the opioid epidemic and the other to gain a deeper understanding of climate change.
Lassen supercomputer at Lawrence Livermore National Laboratory
To keep up with growing demands for higher computational capabilities in open science research, another supercomputer became available this fall. Lassen, installed at Lawrence Livermore National Laboratory (LLNL), uses the same building blocks as Sierra and landed at #11 on the coveted Top500 list of fastest computers.
"Sierra is the big system and Lassen is a good size in its own right. Lassen’s 40 racks - compared to Sierra’s 240 racks - makes it exactly one sixth of the size of its larger brother, while still topping 20 Petaflops. Lassen serves as an unclassified resource that will be available to our application teams for open science going forward." said Bronis R. de Supinski, chief technology officer, Lawrence Livermore National Laboratory.
Notable performance improvements in modern scalable computing are coming from extensive software work: large quantities of code are being optimized to take advantage of the specialized hardware that is quickly becoming an integral part of heterogeneous systems like Summit, Sierra and Lassen.
Innovation happens at every level, including hardware - the core of every machine. Recent hardware innovations are showing up in ARM-based systems, like Astra, which at SC18 became the first ARM-based supercomputer to make the Top500. While Astra is the first visible system of this kind, we are likely to see many similar systems follow from Red Hat partners, such as HPE and Atos, which will run Red Hat Enterprise Linux for ARM. The Red Hat Enterprise Linux operating system serves as the unifying glue that makes these systems work uniformly across various architectures and configurations, enabling the underlying hardware and creating a familiar interface for users and administrators on premises, in the cloud and in space.
Model of a section of ISS' Spaceborne Computer at SC18
Supercomputing continues to advance science not only on Earth, but also in space, which was represented at SC18 by HPE and NASA’s Spaceborne Computer. Running on Red Hat Enterprise Linux 6.8, it is the first commercial-off-the-shelf (COTS) computer system sent to space. Spaceborne has been orbiting Earth for over a year - currently onboard the International Space Station (ISS) - where it is working to expand our knowledge of the computational needs for unmanned Mars missions.
With supercomputers now used both in space and to advance scientific study on Earth, the possibilities seem endless, especially with extra-terrestrial installations. Following this move, we are likely to see supercomputing become more available to the masses and adopt a more user-friendly architecture for scientists and open science alike.
About the authors
Yan Fisher is a global evangelist at Red Hat, where he extends his expertise in enterprise computing to emerging areas that Red Hat is exploring.
Fisher has a deep background in systems design and architecture. He has spent the past 20 years of his career working in the computer and telecommunication industries, where he has tackled areas as diverse as sales and operations, systems performance and benchmarking.