High performance computing 101

11 de novembro de 20214 minutos (tempo de leitura)Containers

Managing Editor, Red Hat Blog

The data is in—massive amounts of it, and high computing power can help enterprises make some sense out of it. For a technology that has gone through ebbs and flows in popularity, high performance computing (HPC) may be expanding to use cases beyond those found in scientific research as more industries can tap into valuable insights gained from artificial intelligence, machine learning, and other emerging technologies.

So, what does this mean to your organization? If you’re increasingly facing the need to translate large amounts of consumer data to track trends or calculate thousands of financial transactions a day to support business growth, is HPC something you should be considering?

Consider this post your beginner’s guide to HPC. We’ll break down some of the basics of HPC so that when you’re ready to take the plunge, you’ll have a better idea of what you’re getting into.

What is HPC?

By aggregating compute power of hundreds or even thousands of compute servers (more on these later), high performance computing is the ability to carry out data-intensive calculations at much higher speeds than a regular desktop computer. We’re talking about computational power that can be measured in petaflops—that’s millions of times faster than a “regular” computer’s power—or even exaflops. This kind of exascale computing is necessary to advance work in countless industries and specialties, from genome sequencing to storm tracking.

Why should I know about HPC?

HPC capabilities were historically used to advance academic research, but with the rise of artificial intelligence and other data-intensive use cases, other industries may begin to realize the advantages of utilizing HPC-like infrastructure and supercomputers, the most powerful category of HPC solutions.

Though you might hear HPC and supercomputing terms used interchangeably, HPC generally refers to the robust processing ability we’ve already covered, while a supercomputer is a single system made up of many computers, interconnected via a very fast network, and the associated storage repository that feeds the data into the processors.

Given the upfront cost, specialized infrastructure and maintenance expenses they require, many organizations consider supercomputers out of reach. This is understandable when you consider that this type of compute power was historically reserved for national research laboratories.

However, with the introduction of open source approaches, HPC solutions have become more accessible. For example, by the 1990s, clusters of massively parallel computers built from industry-standard components began to emerge alongside proprietary machines. This is when Linux and open source software adoption in HPC skyrocketed leading to ubiquitous presence at all high performance computing sites. And recently, since the late 2000s, some of the traditional HPC workloads began to move to cloud resources. In place of monolithic, on-premise supercomputers, the industry has moved towards flexible infrastructure based on open standards, with workloads decoupled from specific hardware.

And now, HPC is also at a point where it’s converging with big data as more and more companies, regardless of the industry, are facing large-scale computational problems stemming from the flood of data and the need to process it within time and cost parameters. Greater compute power allows larger datasets to be ingested more quickly and more permutations to be performed in the same amount of time, ultimately allowing problems to be solved more efficiently.

What makes up an HPC system?

HPC systems are actually groups—or clusters—of computers. The main elements in an HPC system are network, storage and compute. The compute part of the system takes the data given to it via the network and spins up results.

An HPC cluster consists of hundreds or even thousands of compute servers—or nodes—that are networked together. The nodes work in parallel with each other, running smaller parts of a large job simultaneously, which reduces the time it takes to solve one problem. If you’ve heard the term parallel processing, this is where it comes from. Therein lies the main advantage of HPC, although, we are starting to see an increasing number of workflows that use hardware and software resources in a predefined sequence or in multiple consecutive stages.

The nodes need to be able to talk to one another to work in harmony—and computers talk over networks. Networking makes it possible for the cluster to talk to data storage, and it must be able to support high-speed transfers of data between these components.

And finally, the storage component, which is critical to the performance of HPC applications, must be able to feed and ingest data to and from the compute servers as quickly as it is processed.

Open source and Linux power the world’s top supercomputers

Your HPC cluster won’t run without software. Open source software takes down adoption barriers and makes it easier for the scientific community to collaborate.

One of the most popular choices for running HPC clusters is Linux and Red Hat Enterprise Linux (RHEL), is the operating system of choice for some of the top supercomputers in the world. The list of 500 of the world’s most powerful computer systems is revealed twice a year (in June and November). For almost 30 years, the TOP500 ranking has provided a pulse on the HPC market while encouraging the exchange of data and software throughout the tech industry and beyond.

You can read more about the TOP500 and how Red Hat is paving the way for next-generation HPC based on containers here.

Containers and what’s next for HPC

Containers provide the ability to package application code, its dependencies and user data, simplifying sharing of insights and data across global communities and making migrating applications between distributed sites and clouds easier.

These capabilities make containers relevant for HPC. Podman is just one of the container engines that is showing a lot of promise in the HPC market and as the adoption of containers continues to grow, the need for HPC-aware container orchestration platforms, like the Kubernetes-based Red Hat OpenShift, will come into focus.

Open source software, off-the-shelf hardware, and community-driven standards are critical to HPC’s sustainability. Innovation in the HPC world has been community-driven, and recent adoption of cloud technologies can help make HPC capabilities applicable to organizations outside the realm of academia. Collaboration among the tech industry, academia and other public agencies will ultimately drive expanded capabilities for HPC and clouds alike.

Conclusion

Before the world of digital computers, the term “computer” was used to describe people who performed these mathematical calculations. While humans sure can do a whole lot, computers make it possible to process information beyond our reach. And the computational power of a laptop or desktop computer pales in comparison to high performance computing systems, which can be used to solve some of today’s most complex and important scientific problems.

With open source approaches, HPC infrastructure and applications can span far beyond science, bleeding into enterprise datacenters and moving to clouds. You can take the next step by exploring what Red Hat is doing in HPC.

Sobre o autor

Thanh Wong

Managing Editor, Red Hat Blog

As the Managing Editor of the Red Hat Blog, Thanh Wong works with technical subject matter experts to develop and edit content for publication. She is fascinated with learning about new technologies and processes, and she's vested in sharing how they can help solve problems for enterprise environments. Outside of Red Hat, Wong hears a lot about the command line from her system administrator husband. Together, they're raising a young daughter and live in Maryland.