The role of the OS in the Age of AI

Technically Speaking with Chris Wright

00:01 — Bronis R. de Supinski, CTO Livermore Computing at Lawrence Livermore National Laboratory

"Exascale computing systems are the largest such systems that we're building today. It would take the 8 billion people on Earth eight years to do the number of calculations that El Capitan will be able to do in one second."

00:16 — Chris Wright

Just think about that. We're talking about a massive amount of compute power here. It's hard to relate to two exaFLOPS. Certain types of AI workloads align directly to how supercomputers process large amounts of data. But what if you need to run inference at the edge or retrain models in space? There's a whole spectrum of AI that demands more flexibility. Let's explore how adaptable, composable building blocks and new approaches in an operating system can open the door to the future of AI computing.
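
As a rough sanity check on that comparison, here is a minimal sketch of the arithmetic. It assumes each person performs one calculation per second without stopping, which the quote does not state, and it uses a 365.25-day year. Under those assumptions, eight years of work by 8 billion people comes out close to the two exaFLOPS figure mentioned above.

```python
# Back-of-the-envelope check of the "8 billion people, eight years" comparison.
# Assumption (not from the transcript): one calculation per person per second.
PEOPLE = 8_000_000_000
YEARS = 8
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # 31,557,600 seconds
CALCS_PER_PERSON_PER_SECOND = 1           # assumed rate


def human_calculations(people: int, years: float, rate: float) -> float:
    """Total calculations done by `people` working for `years` at `rate` per second."""
    return people * years * SECONDS_PER_YEAR * rate


total = human_calculations(PEOPLE, YEARS, CALCS_PER_PERSON_PER_SECOND)

# One exaFLOPS is 1e18 floating-point operations per second, so dividing the
# total by 1e18 gives the machine speed that would match it in one second.
exaflops_equivalent = total / 1e18

print(f"{exaflops_equivalent:.2f} exaFLOPS")  # prints "2.02 exaFLOPS"
```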


00:54 — Chris Wright

Supercomputers work on a massive scale, and to accommodate that scale they need to be built on precisely defined building blocks. But they aren't designed to be versatile. They're like rocket ships made for one main job. They're all about maxing out performance for big tasks like processing billions of data points in the blink of an eye. AI workloads need more than just raw power. They're distributed and need flexibility and accessibility. And it's exactly these demands that have given rise to cloud computing. It's adaptable, it can grow with our needs, and we can tap into it from anywhere.

01:29 — Chris Wright

With tools like containers and Kubernetes, managing complex distributed systems becomes a lot less daunting. Design principles such as composability, where components are self-contained and reusable, and immutability, meaning the state remains unchanged, usher in a new era of highly customizable and resilient systems. Immutability in supercomputing is more about a fixed-purpose, function-specific, large-scale computer. Immutability in cloud computing is more about smaller composable pieces that, when combined, can build the power of something like a supercomputer.

02:08 — Raghu Moorthy, Global Director & Principal Engineer at Intel

"So instead of going and making this huge compute investment, we now have choices. I can essentially now just reserve a compute instance in a cloud for a short duration of time and accomplish what I want to accomplish. The other aspect of that is this whole edge AI piece. So I can also go take a trained model, for example, and then fine-tune it using my edge endpoint devices. So that entire chain, both hardware and software put together, we can absolutely optimize every aspect of this as we look at this from a holistic systems perspective."

02:47 — Chris Wright

One of the things I love in computer science is repeatable patterns. And one of the things we're seeing is this compression of compute power, where inside a computer it looks like a distributed system. Then that compressed compute power is connected, building our new distributed system. From clusters to nodes, containers to pods, even down to the hardware, CPUs, GPUs and more, all becoming part of a grand composable infrastructure. With composable infrastructure, we can dynamically allocate and combine computing resources using this notion of immutable, composable building blocks to support the flexible needs of new workloads like AI and machine learning, dependent on accelerators and dependent on large-scale infrastructure. Integrating AI into every application isn't as simple as plug and play, at least not yet, but we're getting there. The key is building a foundation that supports rapid changes in growth, focusing on the fundamentals: storage, hardware, networking. And one thing ties it all together. One thing keeps track of all the dependencies and makes sure everything plays nicely together: the operating system. And there are modern approaches to the OS that address the unique needs of AI.

04:06 — Colin Walters, Senior Principal Software Engineer at Red Hat

"One way we like to describe it is we're updating the OS the same way Kubelet and Kubernetes update pods. Immutability, reprovisionability, GitOps, that's the idea. We're aligning how containers update with how the host updates in an immutable, transactional, image-based fashion. And being able to use all your container infrastructure for it."
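
The idea of an immutable, transactional, image-based host update can be illustrated with a toy model. This is only a sketch under stated assumptions, not how any particular tool implements it: the `Image` and `Host` classes, their method names, and the `registry.example.com` image references are all hypothetical. It shows the pattern described in the quote: a new image is staged without touching the running system, the switch happens as one step, and the previous image stays available for rollback.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Image:
    """An immutable OS image, identified by a (hypothetical) registry reference."""
    ref: str


@dataclass
class Host:
    """Toy model of a host that boots from whole images rather than patching in place."""
    booted: Image
    staged: Optional[Image] = None
    rollback: Optional[Image] = None

    def stage(self, image: Image) -> None:
        # Staging never modifies the running system; it only records the next image.
        self.staged = image

    def reboot(self) -> None:
        # The switch is all-or-nothing: the staged image becomes the booted one
        # and the previous image is kept so the change can be undone.
        if self.staged is None:
            return
        self.rollback, self.booted, self.staged = self.booted, self.staged, None

    def roll_back(self) -> None:
        # Swap back to the previous image if the new one misbehaves.
        if self.rollback is None:
            raise RuntimeError("no previous image to roll back to")
        self.booted, self.rollback = self.rollback, self.booted


host = Host(booted=Image("registry.example.com/os:v1"))
host.stage(Image("registry.example.com/os:v2"))
running_before_reboot = host.booted.ref   # still v1: staging changed nothing
host.reboot()
running_after_reboot = host.booted.ref    # now v2, with v1 kept for rollback
```

Because `Image` is frozen, an update can only replace one image with another; nothing in the model edits an image in place, which is the sense of "immutable" used here.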

04:29 — Chris Wright

The operating system is evolving to meet the needs of AI so that we can build a top-to-bottom optimized stack, starting with the machine learning models at the top and bringing through all of the dependencies, with optimizations grounded in hardware.

04:45 — Colin Walters, Senior Principal Software Engineer at Red Hat

"So a really common case, and it's actually an interesting one, is where the application, whether that's a chatbot or another type of AI workload, becomes hardware dependent. In Kubernetes land there's actually a way to say this instance has this hardware and make sure that the pods that need that are scheduled that way. And so with this bootable container flow, I can add the drivers or user space that I need on top of RHEL alongside my application image and know when I go to roll it out that it's in that desired state."
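
The scheduling idea in that quote, where nodes advertise their hardware and pods that need it land only on matching nodes, can be sketched as simple label matching. This is a toy model loosely in the spirit of Kubernetes node labels and a pod's node selector, not the Kubernetes API or its scheduler. The `Node` and `Pod` classes, the `eligible_nodes` helper, and the `accelerator` label are hypothetical names chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A machine that advertises its hardware through key/value labels."""
    name: str
    labels: dict[str, str] = field(default_factory=dict)


@dataclass
class Pod:
    """A workload that may require specific labels on the node it runs on."""
    name: str
    node_selector: dict[str, str] = field(default_factory=dict)


def eligible_nodes(pod: Pod, nodes: list[Node]) -> list[str]:
    """Return the names of nodes whose labels satisfy every selector entry."""
    return [
        node.name
        for node in nodes
        if all(node.labels.get(key) == value
               for key, value in pod.node_selector.items())
    ]


nodes = [
    Node("node-a", {"accelerator": "gpu"}),  # hypothetical label
    Node("node-b", {}),                      # no special hardware
]
chatbot = Pod("chatbot", {"accelerator": "gpu"})
web = Pod("web")  # no hardware requirement

print(eligible_nodes(chatbot, nodes))  # ['node-a']
print(eligible_nodes(web, nodes))      # ['node-a', 'node-b']
```

A pod with an empty selector matches every node, so only workloads that declare a hardware need are constrained.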

05:27 — Chris Wright

So it's the operating system that's doing the hardware enablement, and it's the operating system that's bringing together all of the dependencies. And we can look to these immutable, composable layers to build an optimized system supporting AI workloads. As we stand on the brink of new frontiers, machine learning, edge computing, the Internet of Things, the pressure mounts to not just innovate, but to do so efficiently, flexibly and sustainably. The principles of cloud native design, the lessons from the rise of Kubernetes, they're not just guidelines, they're the keys to unlocking a future where technology evolves in ways we're just beginning to imagine. Composability isn't just about building technology, it's about building the future. And the building blocks of today will shape the innovations of tomorrow. Thanks for watching and let's keep building together.

Keywords: AI, ML

Presented by Red Hat
