Red Hat AI Inference

Red Hat® AI Inference is an integrated stack that provides fast, consistent, and cost-effective inference at scale.

What is Red Hat AI Inference?

Red Hat AI Inference provides the operational control to run any model on any accelerator across the hybrid cloud. 

Powered by vLLM and llm-d, the end-to-end inference stack optimizes token economics and hardware capacity for faster response times. Acting as the engine for agentic AI and Model-as-a-Service patterns, the open source technology increases efficiency without sacrificing performance.

vLLM: the driving open source technology

vLLM is a high-efficiency inference engine that improves GPU utilization, delivering lower cost-per-token and stable latency at scale.

With its portable, open source approach and a growing community, vLLM is emerging as the Linux® of gen AI inference.
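
For a sense of the developer experience, here is a minimal sketch of vLLM's offline Python API. The model name is an illustrative assumption; any supported checkpoint works the same way.

```python
from vllm import LLM, SamplingParams

# Load a model; the checkpoint name here is illustrative, and any
# vLLM-supported architecture works the same way.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")

# Decoding settings for the request.
params = SamplingParams(temperature=0.7, max_tokens=128)

# vLLM batches requests and manages the KV cache internally
# (PagedAttention), which is where its throughput gains come from.
outputs = llm.generate(["What is AI inference?"], params)
print(outputs[0].outputs[0].text)
```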

As a leading commercial contributor to the project, Red Hat offers unique vLLM expertise to help you reach your AI goals.

The vLLM community today

500K+ GPUs deployed 24/7¹

200+ different accelerator types²

500+ supported model architectures²

24x higher throughput compared to competitors³

Benefits

Hardware and model flexibility

Maintain operational consistency with any model on any hardware and cloud.

Decouple AI from its underlying infrastructure to build a unified Model-as-a-Service architecture that serves models and powers agents efficiently.
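
As a sketch of that pattern: a vLLM server exposes an OpenAI-compatible HTTP API, so applications keep the same client code while the model and accelerator underneath change. The endpoint URL and model name below are assumptions for a locally started server (for example, one started with the vllm serve command).

```python
from openai import OpenAI

# Point the standard OpenAI client at a vLLM server (for example,
# one started with `vllm serve <model>`). The URL and model name
# are illustrative assumptions for a local deployment.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "What is Model-as-a-Service?"}],
)
print(resp.choices[0].message.content)
```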

Manage token economics

Use vLLM and llm-d to increase throughput and reduce cost-per-token. 

Optimize existing resources to run agents cost-effectively and scale AI sustainably.
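
To make cost-per-token concrete, here is a back-of-the-envelope calculation; the hourly GPU price and throughput figures are hypothetical inputs, not measured benchmarks.

```python
# Hypothetical inputs: adjust to your own accelerator pricing and
# measured throughput.
gpu_cost_per_hour = 4.00        # USD per GPU-hour (illustrative)
throughput_tok_per_s = 2_500    # output tokens/s per GPU (illustrative)

tokens_per_hour = throughput_tok_per_s * 3600
cost_per_million_tokens = gpu_cost_per_hour / tokens_per_hour * 1_000_000

# Doubling throughput at the same hourly rate halves cost-per-token.
print(f"${cost_per_million_tokens:.2f} per 1M tokens")
```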

Scale predictably

Intelligently distribute inference traffic to serve more users and agents on existing infrastructure. 

Manage diverse use cases and demand reliably, from multimodal agentic workflows to RAG-based chatbots and code assistants. 

Get early access to llm-d

Red Hat AI Inference now offers early access to llm-d on third-party Kubernetes environments and distributed inference capabilities on Red Hat OpenShift®. 

Your models are your choice

Build a unified Model-as-a-Service architecture without rebuilding your AI stack. 

Red Hat AI Inference provides operational consistency across any combination of open source models and hardware accelerators. Deploy with confidence using our collection of cost-optimized models, validated to run efficiently on the Red Hat AI platform.

233% ROI with Red Hat AI

Red Hat commissioned Forrester Consulting to conduct a Total Economic Impact™ (TEI) study and examine the potential return on investment (ROI) enterprises may realize by deploying Red Hat AI. 

After interviewing Red Hat AI customers, the analysis found that a composite organization realized an ROI of 233% over 3 years, representing a total value of more than triple their initial investment.⁴

Product highlights

Get a comprehensive, fully integrated inference stack designed to serve models efficiently at scale.

Feature | Details | Benefit
llm-d | Run distributed inference capabilities on OpenShift, or get early access to llm-d on third-party Kubernetes environments. | Speed up inference and get more out of your AI infrastructure on your Kubernetes distributions of choice. See documentation.
Gen AI-specific telemetry | See model-specific performance metrics like time-to-first-token (TTFT), KV-cache hit rate, and GPU utilization. | Get insights to meet strict service-level objectives (SLOs) and see where your models can improve (see the sketch after this table).
Model optimization toolkit | Optimize custom or foundation models with techniques like sparsity or quantization. | Maximize hardware capacity to minimize costs and speed up inference. See documentation.
Sparse Mixture of Experts (MoE) | Run sparse MoE architectures for low-latency agents and sophisticated reasoning models. | Reduce inference costs without sacrificing performance with an efficient model architecture. See documentation.
Certified for all Red Hat products | Red Hat AI Inference’s capabilities are part of Red Hat AI Enterprise and Red Hat OpenShift® AI. It is also supported on Red Hat OpenShift and Red Hat Enterprise Linux. | Use Red Hat products or deploy across Linux and Kubernetes platforms under our third-party support policy. See documentation.

How to buy

Red Hat AI Inference is available as a standalone product or as part of Red Hat AI. Its llm-d and vLLM-based capabilities are included in Red Hat AI Enterprise and Red Hat OpenShift AI. 

Why choose Red Hat AI?

Build on a trusted foundation that supports any model and any agent on any hardware accelerator—across the hybrid cloud. Red Hat AI gives organizations the freedom to deploy where their data, compliance, and cost requirements demand.

Inference

Manage model complexity with fast, efficient inference powered by vLLM and the control to run any model on any accelerator across the hybrid cloud.

Data

Customize domain-specific agentic AI use cases with models connected to your organization’s own private data.

Agents

Simplify and accelerate your journey to successful agentic AI adoption with governance and control.

Platform

Deploy resilient, trustworthy AI solutions on a foundation of open source transparency and hybrid cloud scalability.

Deploy with partners

Experts and technologies are coming together so our customers can do more with AI. Explore all of the partners working with Red Hat to certify interoperability with our solutions.

Dell Technologies, Cisco, Intel, NVIDIA, and AMD

AI customer stories from Red Hat Summit and AnsibleFest 2025

Turkish Airlines

Turkish Airlines doubled deployment speed with organization-wide data access.

JCCM

JCCM improved the region's environmental impact assessment (EIA) processes using AI.

DenizBank

DenizBank cut time to market from days to minutes.

Hitachi

Hitachi operationalized AI across its entire business with Red Hat AI.

Frequently asked questions

Do I need to buy Red Hat AI Enterprise or Red Hat OpenShift AI to use Red Hat AI Inference?

No. You can purchase Red Hat AI Inference as a standalone Red Hat product. 

Do I need to buy Red Hat AI Inference and Red Hat AI Enterprise?

No. Red Hat AI Inference’s vLLM- and llm-d-based capabilities are already part of Red Hat AI Enterprise as well as Red Hat OpenShift AI.

Can Red Hat AI Inference run on Red Hat Enterprise Linux or Red Hat OpenShift?

Yes, it can. Its vLLM-based runtime can also run on third-party Linux and Kubernetes environments under our third-party support policy, and its llm-d-based distributed inference capabilities are available in early access on third-party Kubernetes environments.

How is Red Hat AI Inference priced?

It is priced per accelerator. 

Explore more AI resources

How to get started with AI in the enterprise

How to get started with AI inference

Scale enterprise AI inference across the hybrid cloud

Webinar: How to boost performance and optimize costs

Contact Sales

Talk to a Red Hatter about Red Hat AI

¹ Goin, Michael. “[vLLM Office Hours #38] vLLM 2025 Retrospective & 2026 Roadmap - December 18, 2025.” YouTube, Dec. 8, 2025.

² Kwon, Woosuk. “Today, vLLM supports 500+ model architectures, runs on 200+ accelerator types, and powers inference at global scale.” X, Jan. 26, 2026.

³ Kwon, Woosuk, et al. “vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention.” vLLM Blog, June 20, 2023.

⁴ Forrester Consulting study, commissioned by Red Hat. “Forrester Total Economic Impact™ of Red Hat AI.” February 2026.