Red Hat AI Inference
Red Hat® AI Inference is an integrated stack that provides fast, consistent, and cost-effective inference at scale.
What is Red Hat AI Inference?
Red Hat AI Inference provides the operational control to run any model on any accelerator across the hybrid cloud.
Powered by vLLM and llm-d, the end-to-end inference stack optimizes token economics and hardware capacity for faster response times. Acting as the engine for agentic AI and Model-as-a-Service patterns, the open source technology increases efficiency without sacrificing performance.
vLLM: the driving open source technology
vLLM is a high-efficiency inference engine that solves GPU utilization issues with lower cost-per-token and stable latency at scale.
With its portable, open source approach and a growing community, vLLM is emerging as the Linux® of gen AI inference.
As a leading commercial contributor, Red Hat offers the unique vLLM expertise to help you reach your AI goals.
The vLLM community today
500K+ GPUs deployed 24/7[1]
200+ different accelerator types[2]
500+ supported model architectures[2]
24x higher throughput compared to competitors[3]
Benefits
Hardware and model flexibility
Maintain operational consistency with any model on any hardware and cloud.
Decouple AI from its underlying infrastructure to build a unified Model-as-a-Service architecture and serve models and power agents efficiently.
Manage token economics
Use vLLM and llm-d to increase throughput and reduce cost-per-token.
Optimize existing resources to run agents cost effectively and scale AI sustainably.
Scale predictably
Intelligently distribute inference traffic to serve more users and agents on existing infrastructure.
Manage diverse use cases and demand reliably, from multimodal agentic workflows to RAG-based chatbots and code assistants.
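The token-economics benefit above reduces to simple arithmetic: higher throughput per accelerator translates directly into lower cost-per-token. A minimal sketch, using entirely hypothetical accelerator prices and throughput figures (not Red Hat or vLLM benchmarks):

```python
# Illustrative only: cost-per-token arithmetic for capacity planning.
# All dollar and throughput figures below are hypothetical.

def cost_per_million_tokens(gpu_hourly_cost: float, tokens_per_second: float) -> float:
    """Dollars to generate one million tokens on a single accelerator."""
    tokens_per_hour = tokens_per_second * 3600
    return gpu_hourly_cost / tokens_per_hour * 1_000_000

# Example: a $4/hr accelerator serving 2,500 tokens/s vs. 5,000 tokens/s
baseline = cost_per_million_tokens(4.0, 2500)   # lower throughput
optimized = cost_per_million_tokens(4.0, 5000)  # doubled throughput

print(f"baseline:  ${baseline:.3f} per 1M tokens")
print(f"optimized: ${optimized:.3f} per 1M tokens")
```

Doubling throughput on the same hardware halves cost-per-token, which is why the stack optimizes the serving engine rather than requiring more accelerators.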
Get early access to llm-d
Red Hat AI Inference now offers early access to llm-d on third-party Kubernetes environments and distributed inference capabilities on Red Hat OpenShift®.
Your models are your choice
Build a unified Model-as-a-Service architecture without rebuilding your AI stack.
Red Hat AI Inference provides operational consistency across any combination of open source models and hardware accelerators. Deploy with confidence using our collection of cost-optimized models, validated to run efficiently on the Red Hat AI platform.
233% ROI with Red Hat AI
Red Hat commissioned Forrester Consulting to conduct a Total Economic Impact™ (TEI) study and examine the potential return on investment (ROI) enterprises may realize by deploying Red Hat AI.
After interviewing Red Hat AI customers, the analysis found that a composite organization realized an ROI of 233% over 3 years, representing a total value of more than triple their initial investment.[4]
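The ROI figure follows the standard formula: net benefit divided by cost. A quick check with hypothetical dollar amounts (the 233% is Forrester's; the amounts below are invented to show the arithmetic):

```python
# How a Total Economic Impact ROI percentage relates to costs and benefits.
# The 233% figure is Forrester's; the dollar amounts here are hypothetical.

def roi_percent(total_benefits: float, total_costs: float) -> float:
    """ROI = (benefits - costs) / costs, expressed as a percentage."""
    return (total_benefits - total_costs) / total_costs * 100

# A composite organization spending $1.0M whose three-year benefits total
# $3.33M sees roughly 233% ROI -- total value of more than triple the
# initial investment, consistent with the study's framing.
print(f"{roi_percent(3.33e6, 1.0e6):.0f}%")
```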
Product highlights
Get a comprehensive, fully integrated inference stack designed to serve models efficiently at scale.
| Feature | Details | Benefit |
|---|---|---|
| llm-d | Run distributed inference capabilities on OpenShift or get early access to llm-d on third-party Kubernetes environments. | Speed up inference and get more out of your AI infrastructure running on your Kubernetes distributions of choice. |
| Gen AI-specific telemetry | See model-specific performance metrics like time-to-first-token (TTFT), KV-cache hit rate, and GPU utilization. | Get insights to meet strict service level objectives (SLOs) and see where your models can improve. |
| Model optimization toolkit | Optimize custom or foundation models with techniques like sparsity or quantization. | Maximize hardware capacity to minimize costs and speed up inference. |
| Sparse Mixture of Experts (MoE) | Run sparse MoE architectures for low-latency agents and sophisticated reasoning models. | Reduce inference costs without sacrificing performance with an efficient model architecture. |
| Certified for all Red Hat products | Red Hat AI Inference’s capabilities are part of Red Hat AI Enterprise and Red Hat OpenShift® AI. It is also supported on Red Hat OpenShift and Red Hat Enterprise Linux. | Use Red Hat products or deploy across Linux and Kubernetes platforms under our third-party support policy. |
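Two of the telemetry signals named in the table, time-to-first-token and KV-cache hit rate, are simple ratios once per-request data is collected. A minimal sketch over hypothetical request records (the field layout is an assumption for illustration, not an actual Red Hat or vLLM metrics schema):

```python
# Illustrative only: deriving time-to-first-token (TTFT) and KV-cache hit
# rate from hypothetical per-request records. The tuple layout below is an
# assumption, not a Red Hat or vLLM schema.

from statistics import mean

requests = [
    # (request_start_s, first_token_s, cache_hits, cache_lookups)
    (0.00, 0.18, 90, 100),
    (0.50, 0.92, 60, 120),
    (1.00, 1.25, 110, 130),
    (1.50, 2.10, 40, 150),
]

ttfts = [first - start for start, first, _, _ in requests]
avg_ttft = mean(ttfts)                    # mean latency to first token
worst_ttft = max(ttfts)                   # worst case, useful for SLO tracking

total_hits = sum(h for _, _, h, _ in requests)
total_lookups = sum(n for _, _, _, n in requests)
kv_hit_rate = total_hits / total_lookups  # fraction of lookups served from cache

print(f"avg TTFT {avg_ttft:.3f}s, worst {worst_ttft:.2f}s, "
      f"KV-cache hit rate {kv_hit_rate:.0%}")
```

Tracking the worst-case TTFT rather than only the average is what makes these metrics useful against strict SLOs.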
How to buy
Red Hat AI Inference is available as a standalone product or as part of Red Hat AI. Its llm-d and vLLM-based capabilities are included in Red Hat AI Enterprise and Red Hat OpenShift AI.
Why choose Red Hat AI?
Build on a trusted foundation that supports any model and any agent on any hardware accelerator—across the hybrid cloud. Red Hat AI gives organizations the freedom to deploy where their data, compliance, and cost requirements demand.
Inference
Manage model complexity with fast, efficient inference powered by vLLM and the control to run any model on any accelerator across the hybrid cloud.
Data
Customize domain-specific agentic AI use cases with models connected to your organization’s own private data.
Agents
Simplify and accelerate your journey to successful agentic AI adoption with governance and control.
Platform
Deploy resilient, trustworthy AI solutions on a foundation of open source transparency and hybrid cloud scalability.
Deploy with partners
Experts and technologies are coming together so our customers can do more with AI. Explore all of the partners working with Red Hat to certify interoperability with our solutions.
AI customer stories from Red Hat Summit and AnsibleFest 2025
Turkish Airlines doubled the speed of deployment times with organization-wide data access.
JCCM improved the region's environmental impact assessment (EIA) processes using AI.
Denizbank sped up time to market from days to minutes.
Hitachi operationalized AI across its entire business with Red Hat AI.
Frequently asked questions
Do I need to buy Red Hat AI Enterprise or Red Hat OpenShift AI to use Red Hat AI Inference?
No. You can purchase Red Hat AI Inference as a standalone Red Hat product.
Do I need to buy Red Hat AI Inference and Red Hat AI Enterprise?
No. Red Hat AI Inference’s vLLM and llm-d based capabilities are already part of Red Hat AI Enterprise as well as Red Hat OpenShift AI.
Can Red Hat AI Inference run on Red Hat Enterprise Linux and Red Hat OpenShift?
Yes. Its vLLM-based runtime can also run on third-party Linux and Kubernetes environments under our third-party support policy, and early access is available for running its llm-d-based distributed inference capabilities on third-party Kubernetes environments.
How is Red Hat AI Inference priced?
It is priced per accelerator.
Explore more AI resources
How to get started with AI in the enterprise
How to get started with AI inference
Scale enterprise AI inference across the hybrid cloud
Webinar: How to boost performance and optimize costs
Contact Sales
Talk to a Red Hatter about Red Hat AI
1. Goin, Michael. “[vLLM Office Hours #38] vLLM 2025 Retrospective & 2026 Roadmap - December 18, 2025.” YouTube, Dec. 8, 2025.
2. Kwon, Woosuk. “Today, vLLM supports 500+ model architectures, runs on 200+ accelerator types, and powers inference at global scale.” X, Jan. 26, 2026.
3. Kwon, Woosuk, et al. “vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention.” vLLM Blog, June 20, 2023.
4. Forrester Consulting study, commissioned by Red Hat. “Forrester Total Economic Impact™ Of Red Hat AI.” February 2026.