Red Hat AI Inference

Red Hat® AI Inference is an integrated stack that provides fast, consistent, and cost-effective inference at scale.

What is Red Hat AI Inference?

Red Hat AI Inference provides the operational control to run any model on any accelerator across the hybrid cloud. 

Powered by vLLM and llm-d, the end-to-end inference stack optimizes token economics and hardware capacity for faster response times. Acting as the engine for agentic AI and Model-as-a-Service patterns, the open source technology increases efficiency without sacrificing performance.

vLLM: the driving open source technology

vLLM is a high-efficiency inference engine that improves GPU utilization, delivering lower cost-per-token and stable latency at scale.

With its portable, open source approach and a growing community, vLLM is emerging as the Linux® of gen AI inference.
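
For a sense of the developer experience, here is a minimal sketch of vLLM's offline Python API. The model name is an illustrative assumption; any supported checkpoint works the same way.

```python
from vllm import LLM, SamplingParams

# Load a model; the checkpoint name here is illustrative, and any
# vLLM-supported architecture works the same way.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")

# Decoding settings for the request.
params = SamplingParams(temperature=0.7, max_tokens=128)

# vLLM batches requests and manages the KV cache internally
# (PagedAttention), which is where its throughput gains come from.
outputs = llm.generate(["What is AI inference?"], params)
print(outputs[0].outputs[0].text)
```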

As a leading commercial contributor to the project, Red Hat offers unique vLLM expertise to help you reach your AI goals.

The vLLM community today

500K+ GPUs deployed 24/7¹

200+ different accelerator types²

500+ supported model architectures²

24x higher throughput compared to competitors³

Benefits

Hardware and model flexibility

Maintain operational consistency with any model on any hardware and cloud.

Decouple AI from its underlying infrastructure to build a unified Model-as-a-Service architecture that serves models and powers agents efficiently.
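
As a sketch of that pattern: a vLLM server exposes an OpenAI-compatible HTTP API, so applications keep the same client code while the model and accelerator underneath change. The endpoint URL and model name below are assumptions for a locally started server (for example, one started with the vllm serve command).

```python
from openai import OpenAI

# Point the standard OpenAI client at a vLLM server (for example,
# one started with `vllm serve <model>`). The URL and model name
# are illustrative assumptions for a local deployment.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "What is Model-as-a-Service?"}],
)
print(resp.choices[0].message.content)
```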

Manage token economics

Use vLLM and llm-d to increase throughput and reduce cost-per-token. 

Optimize existing resources to run agents cost-effectively and scale AI sustainably.
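
To make cost-per-token concrete, here is a back-of-the-envelope calculation; the hourly GPU price and throughput figures are hypothetical inputs, not measured benchmarks.

```python
# Hypothetical inputs: adjust to your own accelerator pricing and
# measured throughput.
gpu_cost_per_hour = 4.00        # USD per GPU-hour (illustrative)
throughput_tok_per_s = 2_500    # output tokens/s per GPU (illustrative)

tokens_per_hour = throughput_tok_per_s * 3600
cost_per_million_tokens = gpu_cost_per_hour / tokens_per_hour * 1_000_000

# Doubling throughput at the same hourly rate halves cost-per-token.
print(f"${cost_per_million_tokens:.2f} per 1M tokens")
```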

Scale predictably

Intelligently distribute inference traffic to serve more users and agents on existing infrastructure. 

Manage diverse use cases and demand reliably, from multimodal agentic workflows to RAG-based chatbots and code assistants. 

Get early access to llm-d

Red Hat AI Inference now offers early access to llm-d on third-party Kubernetes environments and distributed inference capabilities on Red Hat OpenShift®. 

Your models are your choice

Build a unified Model-as-a-Service architecture without rebuilding your AI stack. 

Red Hat AI Inference provides operational consistency across any combination of open source models and hardware accelerators. Deploy with confidence using our collection of cost-optimized models, validated to run efficiently on the Red Hat AI platform.

233% ROI with Red Hat AI

Red Hat commissioned Forrester Consulting to conduct a Total Economic Impact™ (TEI) study and examine the potential return on investment (ROI) enterprises may realize by deploying Red Hat AI. 

After interviewing Red Hat AI customers, the analysis found that a composite organization realized an ROI of 233% over 3 years, representing a total value of more than triple their initial investment.⁴

Product highlights

Get a comprehensive, fully integrated inference stack designed to serve models efficiently at scale.

Feature | Details | Benefit
llm-d | Run distributed inference capabilities on OpenShift, or get early access to llm-d on third-party Kubernetes environments. | Speed up inference and get more out of your AI infrastructure on your Kubernetes distributions of choice. See documentation.
Gen AI-specific telemetry | See model-specific performance metrics like time-to-first-token (TTFT), KV-cache hit rate, and GPU utilization. | Get insights to meet strict service-level objectives (SLOs) and see where your models can improve (see the sketch after this table).
Model optimization toolkit | Optimize custom or foundation models with techniques like sparsity or quantization. | Maximize hardware capacity to minimize costs and speed up inference. See documentation.
Sparse Mixture of Experts (MoE) | Run sparse MoE architectures for low-latency agents and sophisticated reasoning models. | Reduce inference costs without sacrificing performance with an efficient model architecture. See documentation.
Certified for all Red Hat products | Red Hat AI Inference’s capabilities are part of Red Hat AI Enterprise and Red Hat OpenShift® AI. It is also supported on Red Hat OpenShift and Red Hat Enterprise Linux. | Use Red Hat products or deploy across Linux and Kubernetes platforms under our third-party support policy. See documentation.

How to buy

Red Hat AI Inference is available as a standalone product or as part of Red Hat AI. Its llm-d and vLLM-based capabilities are included in Red Hat AI Enterprise and Red Hat OpenShift AI. 

Why choose Red Hat AI?

Build on a trusted foundation that supports any model and any agent on any hardware accelerator—across the hybrid cloud. Red Hat AI gives organizations the freedom to deploy where their data, compliance, and cost requirements demand.

Inference

Manage model complexity with fast, efficient inference powered by vLLM and the control to run any model on any accelerator across the hybrid cloud.

Data

Customize domain-specific agentic AI use cases with models connected to your organization’s own private data.

Agents

Simplify and accelerate your journey to successful agentic AI adoption with governance and control.

Platform

Deploy resilient, trustworthy AI solutions on a foundation of open source transparency and hybrid cloud scalability.

Deploy with partners

Experts and technologies are coming together so our customers can do more with AI. Explore all of the partners working with Red Hat to certify interoperability with our solutions.

Dell Technologies, Cisco, Intel, NVIDIA, and AMD

AI customer stories from Red Hat Summit and AnsibleFest 2025

Turkish Airlines

Turkish Airlines doubled deployment speed with organization-wide data access.

JCCM

JCCM improved the region's environmental impact assessment (EIA) processes using AI.

DenizBank

DenizBank cut time to market from days to minutes.

Hitachi

Hitachi operationalized AI across its entire business with Red Hat AI.

Frequently asked questions

Do I need to buy Red Hat AI Enterprise or Red Hat OpenShift AI to use Red Hat AI Inference?

No. You can purchase Red Hat AI Inference as a standalone Red Hat product. 

Do I need to buy Red Hat AI Inference and Red Hat AI Enterprise?

No. Red Hat AI Inference’s vLLM- and llm-d-based capabilities are already part of Red Hat AI Enterprise as well as Red Hat OpenShift AI.

Can Red Hat AI Inference run on Red Hat Enterprise Linux or Red Hat OpenShift?

Yes, it can. Its vLLM-based runtime can also run on third-party Linux and Kubernetes environments under our third-party support policy, and its llm-d-based distributed inference capabilities are available in early access on third-party Kubernetes environments.

How is Red Hat AI Inference priced?

It is priced per accelerator. 

Explore more AI resources

How to get started with AI in the enterprise

How to get started with AI inference

Scale enterprise AI inference across the hybrid cloud

Webinar: How to boost performance and optimize costs

Contact Sales

Talk to a Red Hatter about Red Hat AI

¹ Goin, Michael. “[vLLM Office Hours #38] vLLM 2025 Retrospective & 2026 Roadmap - December 18, 2025.” YouTube, Dec. 8, 2025.

² Kwon, Woosuk. “Today, vLLM supports 500+ model architectures, runs on 200+ accelerator types, and powers inference at global scale.” X, Jan. 26, 2026.

³ Kwon, Woosuk, et al. “vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention.” vLLM Blog, June 20, 2023.

⁴ Forrester Consulting study, commissioned by Red Hat. “Forrester Total Economic Impact™ of Red Hat AI.” February 2026.