This blog is adapted from a recent conversation I had with University of California, Berkeley’s Ion Stoica, featured in Red Hat Research Quarterly’s article, From silos to startups: Why universities must be a part of industry’s AI growth. Read our full conversation here.
For the last several years, the narrative around artificial intelligence (AI) has been dominated by large language models (LLMs) and the monumental effort of training them. The technology industry has been focused on the discovery phase—but that era is rapidly shifting.
The conversation is moving from, "How do we build the model?" to, "How do we actually run the model in production at scale?"
This shift is more than a technical detail; it’s the new center of gravity for enterprise AI. When AI leaves the research lab and becomes a core business capability, the focus lands squarely on inference—the firing synapses in a trained model’s “brain” as it generates an answer or takes action. And in the enterprise, inference must be fast, cost-effective, and fully controlled.
The open source answer to the inference challenge
Moving AI from a proof-of-concept into a reliable, production-grade service introduces significant complexity, cost, and control challenges for IT leaders.
First, the hardware required to run these models—especially at the scale the enterprise needs—is expensive and often scarce. Second, demand is unpredictable: bursts of high usage can be followed by long periods of low activity, a problem that compounds across hundreds of domain-tuned model variants. This variability makes it extremely difficult to maximize resource utilization and protect those critical hardware investments.
We’ve seen the open source community rise to this challenge by focusing on performance and efficiency optimizations for serving LLMs. One of the most successful projects leading this charge is vLLM, which was established under Ion Stoica’s leadership at the Sky Computing Lab at the University of California, Berkeley. As Ion mentioned in our conversation, this academic root is crucial; it demonstrates how university research is directly solving the most pressing, real-world inference problems. vLLM has quickly become the de facto standard for high-performance LLM serving—an engine designed for speed and efficiency to maximize throughput and minimize latency.
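To make that concrete, here is a minimal sketch of batched generation with the vLLM Python API. The model ID and prompts are placeholders chosen for illustration; the point is that vLLM batches requests onto the accelerator to drive up throughput.

```python
from vllm import LLM, SamplingParams

# Placeholder prompts; in production these arrive as a stream of user requests.
prompts = [
    "Summarize the benefits of distributed inference.",
    "Explain Kubernetes resource scheduling in one sentence.",
]

# Sampling settings for generation; tune these for your workload.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=128)

# Any Hugging Face model ID works here; this small model is just an example.
llm = LLM(model="facebook/opt-125m")

# vLLM batches these requests together to keep the accelerator fully utilized.
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.prompt, "->", output.outputs[0].text)
```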
Hardening community innovation for the enterprise
Community projects like vLLM are where innovation begins, but they must be adapted to meet the rigorous demands of enterprise production environments. That's where Red Hat’s value as the trusted Linux and Kubernetes expert comes into play.
We are taking the groundbreaking work of vLLM and combining it with other community-driven projects to create a hardened, supported, and scalable platform for production AI. A key component in this evolution is llm-d, a distributed inference framework for managing LLMs at cluster scale and beyond.
By integrating llm-d, we are fundamentally changing how LLMs run natively on Kubernetes. This brings the proven value of container orchestration—control, consistency, and efficient resource scheduling—to the most challenging phase of AI thus far: high-volume, variable-demand inference.
This combination allows organizations to:
- Maximize infrastructure spend: By leveraging Kubernetes orchestration, we enable the distributed serving of large models. This allows IT teams to fully utilize their expensive, limited hardware accelerators across multiple workloads and models, treating their infrastructure not as siloed hardware, but as a pool of elastic compute capacity.
- Achieve faster response times: Distributed inference intelligently manages unpredictable demand, ensuring applications get the responses they need without latency spikes.
- Accelerate deployment with confidence: We provide a trusted path from cutting-edge research and community innovation to hardened, supported software. This accelerates time-to-value for AI engineers and gives platform teams the necessary management and governance controls.
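As a sketch of what this looks like from the application side, the snippet below calls a model served behind a Kubernetes service through vLLM’s OpenAI-compatible API. The service hostname, model name, and lack of authentication are assumptions for illustration only; your llm-d or vLLM deployment will expose its own endpoint and policies.

```python
from openai import OpenAI

# Hypothetical in-cluster service address; substitute the endpoint your platform exposes.
client = OpenAI(
    base_url="http://llm-gateway.ai-platform.svc.cluster.local/v1",
    api_key="not-needed",  # assumption: no API key required for internal traffic
)

# Example model name; use whichever model your platform team has deployed.
response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Summarize today's open support tickets."}],
)

print(response.choices[0].message.content)
```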
The essential open model for AI
Ion and I agree: the innovation pipeline that gave us vLLM and llm-d—starting with academic research, evolving through open source communities, and finally being stabilized and supported for enterprise scale—is the model that will define the next decade of AI adoption.
For AI to truly become an indispensable business tool, it cannot remain isolated in proprietary labs or be confined to proof-of-concepts. It must be accessible, transparent, and built on a foundation that allows for continuous, collaborative improvement. Red Hat’s commitment is to ensure that the open hybrid cloud remains the best place to operationalize this innovation, giving enterprises the foundation they need to own their data, control their destiny, and confidently navigate the evolving AI landscape.
About the author
Brian Stevens is Red Hat's Senior Vice President and Chief Technology Officer (CTO) for AI, where he drives the company's vision for an open, hybrid AI future. His work empowers enterprises to build and deploy intelligent applications anywhere, from the datacenter to the edge. As Red Hat’s CTO of Engineering (2001-2014), Brian was central to the company’s initial growth and the expansion of its portfolio into cloud, middleware, and virtualization technologies.
After helping scale Google Cloud as its VP and CTO, Brian’s passion for transformative technology led him to become CEO of Neural Magic, a pioneer in software-based AI acceleration. Red Hat’s strategic acquisition of Neural Magic in 2025 brought Brian back to the company, uniting his leadership with Red Hat's mission to make open source the foundation for the AI era.