Production AI for private and hybrid cloud environments

Develop, train, and deploy AI models and applications

Red Hat® OpenShift® AI is a platform for traditional, agentic, and gen AI that allows IT teams to develop, train, and deploy AI models and applications at scale across private and hybrid cloud environments. OpenShift AI offers organizations an efficient way to deploy an integrated set of common open source and third-party tools to perform gen AI and predictive AI modeling. Adopters gain a collaborative open source toolset and platform for building models and serving them to production in a container-ready format across public cloud, private cloud, on-premises, and edge environments for consistent AI operations.

As a key component of Red Hat AI, OpenShift AI provides IT operations teams and platform engineers with an environment that is simple to manage, scalable, and security-focused. For data scientists and AI engineers, it provides a comprehensive, unified platform for self-service development and deployment of AI solutions at scale.

OpenShift AI supports gen AI foundation models, which businesses can customize and serve with their own private data to ensure data sovereignty and maintain governance and security controls. Workloads are highly portable and can be distributed across Red Hat OpenShift clusters, independent of their location. The platform is built on Red Hat OpenShift so businesses can get the most out of their AI hardware. It supports various accelerators—like central processing units (CPUs), graphics processing units (GPUs), and extended processing units (XPUs) from NVIDIA, AMD, and Intel—whether they are on-premises or in sovereign or public clouds.

Highlights

  • Simplifies and accelerates AI adoption across business practices, and provides flexibility in AI initiatives to reduce operational complexity.
  • Establishes operational consistency across teams with a user experience that empowers AI engineers, data scientists, and platform teams to collaborate effectively as they scale AI projects into production.
  • Offers flexibility and consistency to build, deploy, and manage AI at scale across any hardware and hybrid cloud, addressing data constraints, privacy, security, and cost control.

Features and benefits of Red Hat OpenShift AI

Model development and customization

Accelerate AI development using self-service notebooks and integrated development environments (IDEs) preloaded with curated AI/ML libraries. Speed up model development by integrating data ingestion, synthetic data, InstructLab, and retrieval-augmented generation (RAG). AutoRAG and AutoML (previews) automate optimization so teams can focus on more critical projects. 
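For illustration, the retrieve-then-augment pattern behind RAG can be sketched in a few lines. This is a minimal, self-contained sketch using toy embeddings and cosine similarity; it is not an OpenShift AI API, and the documents and query are hypothetical.

```python
import numpy as np

# Toy document "embeddings" (hypothetical 3-dimensional vectors for illustration;
# a real pipeline would use an embedding model and a vector database).
docs = {
    "Returns are accepted within 30 days.": np.array([0.9, 0.1, 0.0]),
    "Shipping takes 3-5 business days.":    np.array([0.1, 0.9, 0.0]),
}
query_vec = np.array([0.8, 0.2, 0.0])  # embedding of "What is the returns policy?"

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Retrieve: pick the document most similar to the query.
best_doc = max(docs, key=lambda d: cosine(docs[d], query_vec))

# Augment: ground the prompt in the retrieved context before calling an LLM.
prompt = f"Context: {best_doc}\nQuestion: What is the returns policy?\nAnswer:"
print(prompt)
```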

Model training and experimentation

Cut training time and cost by running distributed workloads across GPU clusters with intelligent hardware allocation and experiment tracking. Versioned artifacts and reproducible workflows keep teams aligned, eliminating repeated work. 
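Distributed workloads in OpenShift AI build on upstream projects such as Ray (via the CodeFlare stack) and Kueue. As a hedged sketch of the programming model only, assuming the upstream ray library and an illustrative train_shard function:

```python
import ray

ray.init()  # starts a local runtime here; inside a cluster, connects via address="auto"

@ray.remote  # add num_gpus=1 to request a GPU for each task on accelerator nodes
def train_shard(shard_id: int) -> float:
    # Placeholder for one training step over a data shard; returns a toy loss.
    return 1.0 / (shard_id + 1)

# Fan out four shards across the cluster and gather the results.
losses = ray.get([train_shard.remote(i) for i in range(4)])
print(f"mean loss: {sum(losses) / len(losses):.3f}")
```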

Intelligent GPU and hardware scheduling

Maximize GPU use and control costs with intelligent workload scheduling, quota enforcement, and priority-based access across NVIDIA, AMD, and other accelerator hardware. Hardware profiles give platform teams real-time visibility into GPU consumption while allowing data scientists to provision accelerators on demand, without requiring operational intervention. 

AI pipelines

Eliminate manual handoffs and reduce human error with automated, versioned AI pipelines. Each tracked run lets teams reproduce, audit, and optimize workflows from experimentation to production without relying on institutional knowledge. 
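Data science pipelines in OpenShift AI are based on Kubeflow Pipelines. As a minimal sketch of a versioned two-step pipeline using the upstream kfp SDK (component names, images, and logic are illustrative):

```python
from kfp import dsl, compiler

@dsl.component(base_image="python:3.11")
def preprocess(rows: int) -> int:
    # Stand-in for data preparation; doubles the row count for illustration.
    return rows * 2

@dsl.component(base_image="python:3.11")
def train(rows: int) -> str:
    # Stand-in for a training step consuming the preprocessed output.
    return f"model trained on {rows} rows"

@dsl.pipeline(name="demo-pipeline")
def demo_pipeline(rows: int = 100):
    prep = preprocess(rows=rows)
    train(rows=prep.output)  # step ordering is inferred from the data dependency

# Compile to a YAML definition that can be uploaded and run with full versioning.
compiler.Compiler().compile(demo_pipeline, "demo_pipeline.yaml")
```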

Optimized model serving

Serve large language models (LLMs) at production scale with high throughput and low latency using vLLM, and deploy predictive ML models using out-of-the-box and custom runtime servers. Achieve cost-efficient distributed inference with the llm-d framework for predictable, scalable performance. Reduce serving cost through LLM Compressor quantization and use a curated catalog of optimized, validated gen AI models to accelerate time to production.
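As a minimal sketch of batch inference with the upstream vLLM library (the model name is illustrative, and a GPU-equipped environment is assumed; production serving in OpenShift AI is handled by the platform's model-serving stack):

```python
from vllm import LLM, SamplingParams

# Small model chosen purely for illustration.
llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["What is Red Hat OpenShift AI?"], params)
for out in outputs:
    print(out.outputs[0].text)
```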

Agentic AI and gen AI user interfaces (UIs)

Speed up agentic AI workflows with an expanding focus on Agent Ops, and connect agents to core platform services. The platform delivers a unified application programming interface (API) layer, Model Context Protocol (MCP) support, agentic APIs (e.g., the Open Responses API), and a dedicated dashboard experience (AI hub and gen AI studio). MLflow integration provides end-to-end agent traceability and observability, logging LLM calls and tool use for comprehensive visibility.
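For a flavor of that traceability, here is a hedged sketch assuming a recent MLflow release with the tracing API; the agent and its tool are stand-ins, not platform components:

```python
import mlflow

mlflow.set_experiment("agent-demo")  # traces are recorded under the active experiment

@mlflow.trace  # records inputs, outputs, and latency of the tool call as a span
def lookup_tool(query: str) -> str:
    return f"stubbed result for: {query}"

@mlflow.trace  # the agent span nests the tool span, yielding an end-to-end trace
def agent(question: str) -> str:
    context = lookup_tool(question)
    return f"answer grounded in: {context}"

print(agent("What changed in quarterly revenue?"))
```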

Model observability and governance 

Monitor model health by continuously tracking performance, data drift, and bias in real time, allowing proactive intervention before quality issues reach users. Pair runtime guardrails with LM Eval and GuideLLM benchmarking to validate models against real-world inference conditions, and capture audit trails through MLflow as compliance evidence for governance and regulatory requirements.
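Data drift detection of this kind can be illustrated with a standard two-sample test. This sketch uses SciPy directly on synthetic feature values; it shows the underlying idea, not the platform's monitoring API:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
training = rng.normal(0.0, 1.0, 5_000)  # feature values seen at training time
live = rng.normal(0.4, 1.0, 5_000)      # feature values observed in production

# Kolmogorov-Smirnov test: a low p-value means the two distributions differ.
stat, p_value = ks_2samp(training, live)
if p_value < 0.01:
    print(f"drift detected (KS statistic {stat:.3f}); trigger review or retraining")
```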

Evaluation 

Prevent costly production failures with EvalHub (preview), a unified evaluation control plane to scientifically benchmark, score, and assess models, RAG pipelines, and AI agents before and during deployment. Built-in domain-specific evaluation collections replace ad hoc manual testing with reproducible, standardized evaluation suites.
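EvalHub itself is in preview, so no platform API is shown here. For a flavor of standardized benchmarking, the upstream lm-evaluation-harness behind LM Eval can be driven from Python; the model, task, and example limit below are illustrative:

```python
import lm_eval

# Evaluate a small Hugging Face model on a standard benchmark subset.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=facebook/opt-125m",
    tasks=["hellaswag"],
    limit=20,  # cap examples so the run finishes quickly; drop for full scores
)
print(results["results"]["hellaswag"])
```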

Catalog and registry

Govern AI assets from a central registry that includes predictive and gen AI models, MCP servers, metadata, and deployment artifacts. A curated ecosystem of validated models reduces onboarding time, while metadata management ensures traceability and compliance across hybrid cloud deployments.

Feature store 

Reduce data preparation time with a centralized feature store providing consistent, reusable feature sets. Shared feature definitions eliminate redundant feature engineering and training-serving skew, accelerating the delivery of production-ready models.
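The feature store in OpenShift AI is based on the open source Feast project. A minimal sketch of online feature retrieval with the upstream feast SDK, assuming a configured feature repository (the feature view, field, and entity key are hypothetical):

```python
from feast import FeatureStore

# Points at a feature repository containing feature_store.yaml.
store = FeatureStore(repo_path=".")

# Fetch the same features at serving time that were used in training,
# avoiding training-serving skew.
features = store.get_online_features(
    features=["driver_stats:avg_daily_trips"],  # hypothetical feature_view:field
    entity_rows=[{"driver_id": 1001}],          # hypothetical entity key
).to_dict()
print(features)
```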

Models-as-a-service

Provides AI engineers with self-service API access to approved models via a managed, built-in gateway. Usage tracking gives administrators visibility into consumption patterns for showback, quota enforcement, and cost accountability.
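Gateways like this typically expose an OpenAI-compatible endpoint, so consumption can be sketched with the standard openai client. The URL, token, and model name below are hypothetical placeholders:

```python
from openai import OpenAI

# Hypothetical gateway endpoint and per-user token issued by the platform.
client = OpenAI(base_url="https://maas.apps.example.com/v1", api_key="YOUR_TOKEN")

resp = client.chat.completions.create(
    model="granite-3-8b-instruct",  # an approved model published by administrators
    messages=[{"role": "user", "content": "Summarize our returns policy."}],
)
print(resp.choices[0].message.content)
```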

AI safety and security

Catch common AI risks such as jailbreaks, prompt injections, and toxic outputs before production with automated adversarial vulnerability scanning powered by Garak and NVIDIA NeMo Guardrails. Synthetic data generation (SDG, preview) creates tailored adversarial test datasets, validating guardrails against realistic threat scenarios and supporting the risk documentation required by AI regulations.
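Garak is driven from its command line; a hedged sketch of invoking a scan from Python follows, with the target model and probe selection chosen purely for illustration:

```python
import subprocess

# Run garak's prompt-injection probes against a small Hugging Face model.
# Flag names follow garak's documented CLI; adjust model and probes as needed.
subprocess.run(
    [
        "python", "-m", "garak",
        "--model_type", "huggingface",
        "--model_name", "gpt2",
        "--probes", "promptinject",
    ],
    check=True,
)
```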

Disconnected environments and edge

Deploy portable AI workloads across disconnected, air-gapped, and edge environments to meet strict data sovereignty and regulatory compliance requirements.

In addition to the capabilities of OpenShift AI, integrated partner products include:

  • Starburst for distributed data access across diverse datasets.
  • HPE for data lineage and versioning.
  • NVIDIA for performance management of GPUs.
  • AMD for GPU acceleration.
  • Intel for high performance inference on Intel hardware.
  • Elastic and EDB for vector databases used in RAG applications.

Next steps