Discover how Red Hat OpenShift AI 3.4 and Red Hat Connectivity Link deliver Models-as-a-Service (MaaS) to centrally govern and scale enterprise AI model serving.
Many enterprises have moved past the AI pilot phase. Models are running in production and teams are consuming them, but now they're hitting the governance wall. Who controls which team can access which model? Who approved that inference endpoint for customer-facing use? What does the organization owe to a compliance team asking for usage reports?
The answer depends on your starting point. Many organizations are building their AI inference infrastructure from scratch and want the fully integrated, Kubernetes-native capabilities of Red Hat OpenShift AI. Others have existing corporate API policies or use standalone third-party proxies (such as LiteLLM) and need a robust AI platform to integrate with them. OpenShift AI 3.4 excels in both scenarios.
What is Models-as-a-Service?
Models-as-a-Service (MaaS) is an approach to providing AI models as consumable, shared resources via API endpoints, enabling private and scalable AI adoption within the enterprise.
Key benefits of MaaS
- Centralized AI governance: OpenShift AI 3.4 eliminates the "shadow AI" problem of teams standing up their own unmanaged models by providing centralized resources, natively managing token quotas, rate limits, and API keys for enterprise model serving.
- Self-service access: Developers gain fast, security-focused API access to approved AI models (whether hosted locally or externally) without relying on IT provisioning tickets.
- Kubernetes-native control: The platform's AI gateway capabilities are powered by Red Hat Connectivity Link, delivering a unified, scalable solution for policy management, token rate limiting, and API key self-service.
- Cost tracking and visibility: Integrated showback dashboards provide granular tracking of token consumption, allowing administrators to allocate costs across different teams and projects accurately.
Built-in governance: MaaS in OpenShift AI 3.4
MaaS, including the AI inference gateway, is available as of OpenShift AI 3.4. No additional tooling is required, and there is no separate lifecycle to manage.
The MaaS architecture is straightforward, delivering enterprise-grade control through several key features:
- Token quotas and rate limiting: Administrators can define Kubernetes-native CRDs (subscriptions) that dictate specific rate limits and token usage per team to prevent budget overruns.
- Self-service API keys: Developers generate their own API keys scoped to their specific subscriptions, which are bound at creation time and instantly revocable.
- Showback dashboards (currently in Technology Preview, or TP): Embedded directly in the OpenShift AI dashboard, these provide aggregate token consumption tracking per model and per subscription group.
- Enterprise authentication (TP): Authentication flows through Authorino, OpenShift AI's OpenID Connect-compatible authorization layer, supporting integration with existing enterprise identity providers like Microsoft Azure AD, Okta, and Keycloak.
- External model routing (TP): An OpenAI-compatible /v1/chat/completions endpoint routes traffic to locally hosted models (via vLLM) or to external providers like AWS Bedrock, Microsoft Azure OpenAI, or Anthropic. Applications don't need to know where the model runs; the gateway handles the routing transparently.
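Because the gateway exposes a standard OpenAI-compatible endpoint, any OpenAI-style client can call it. The sketch below builds such a request using only the Python standard library; the gateway URL, API key, and model name are placeholders you would replace with values from your own MaaS subscription and generated key.

```python
import json

# Placeholders -- in practice these come from the OpenShift AI dashboard
# after creating a subscription and generating an API key.
GATEWAY_BASE = "https://maas-gateway.apps.example.com"  # hypothetical
API_KEY = "sk-example"                                  # hypothetical


def build_chat_request(model: str, prompt: str):
    """Build (url, headers, body) for an OpenAI-compatible
    /v1/chat/completions call through the MaaS gateway."""
    url = f"{GATEWAY_BASE}/v1/chat/completions"
    headers = {
        "Authorization": f"Bearer {API_KEY}",  # MaaS API key as bearer token
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    })
    return url, headers, body


url, headers, body = build_chat_request("granite-3-8b-instruct", "Hello")
# Send with any HTTP client, for example:
#   requests.post(url, headers=headers, data=body)
print(url)
```

Whether the model behind this endpoint is a local vLLM deployment or an external provider is invisible to the client; only the gateway's routing configuration changes.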
OpenShift AI includes essential AI gateway and API management features at no additional cost. These capabilities are powered by the core technology stack of Connectivity Link, providing a built-in, Kubernetes-native way to manage policies and control token rate limits. By using this integrated stack—built on open source standards like Envoy, Kuadrant, and Istio—organizations can avoid the complexity of managing multiple separate proxies.
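To illustrate the policy-as-CRD model this stack uses, a Kuadrant-style rate-limit policy attached to a gateway route looks roughly like the following. This is a sketch only: the API version, field layout, route name, and counter expression are illustrative and vary between Kuadrant and Connectivity Link releases, so consult the product documentation for the exact schema.

```yaml
# Illustrative sketch of a Kuadrant-style rate-limit policy (field names
# and apiVersion vary by release; values here are hypothetical).
apiVersion: kuadrant.io/v1
kind: RateLimitPolicy
metadata:
  name: chat-completions-limits
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: maas-chat-route          # hypothetical route to the model endpoint
  limits:
    per-user:
      rates:
        - limit: 100               # allowed requests per window
          window: 60s
      counters:
        - expression: auth.identity.userid   # one counter per authenticated user
```

Because the policy is an ordinary Kubernetes resource, it is versioned, audited, and applied with the same GitOps workflows as the rest of the cluster configuration.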
For broader enterprise needs, organizations can move to the full Connectivity Link product, available through the Red Hat Application Foundations subscription. While the version in OpenShift AI focuses on AI-specific traffic, the full product extends these capabilities across your entire infrastructure. It provides advanced features like multicluster routing, high availability and disaster recovery (HA/DR), and automated DNS management.
Integrating with existing API gateways
OpenShift AI can work for you regardless of which gateway you use. Organizations with legacy API management policies, or those using standalone third-party proxies for now, can continue routing traffic to OpenShift AI-hosted models. While a standalone proxy acts as a single gateway, OpenShift AI provides the heavy-lifting platform capabilities underneath: validated and optimized model serving, GPU-aware Kubernetes scheduling, lifecycle management, OpenShift observability integration, and the full security posture of a Red Hat enterprise product. Routing traffic through an existing AI gateway to models hosted on OpenShift AI continues to be a supported approach.
To demonstrate this interoperability, we have published two reference integrations (one with LiteLLM and one with Portkey AI Gateway) showing how third-party proxies can connect to OpenShift AI-hosted model endpoints. These documented patterns illustrate how organizations can use external tools alongside OpenShift AI to handle agentic workflows (via frameworks like LlamaStack) and manage cost attribution per team.
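In practice, pointing a standalone LiteLLM proxy at an OpenShift AI-hosted model amounts to registering the model's OpenAI-compatible endpoint in the proxy's model list, along the lines of the sketch below. The model names, endpoint URL, and environment variable are placeholders; see the published reference integration for the exact configuration.

```yaml
# Sketch of a LiteLLM proxy config entry for an OpenShift AI-hosted model.
# The endpoint URL, model names, and env var are placeholders.
model_list:
  - model_name: granite-on-openshift        # alias that clients will request
    litellm_params:
      model: openai/granite-3-8b-instruct   # served via the OpenAI-compatible API
      api_base: https://granite.apps.example.com/v1
      api_key: "os.environ/OPENSHIFT_AI_API_KEY"
```

The proxy then handles client-facing concerns such as cost attribution per team, while OpenShift AI handles serving, scheduling, and observability behind the endpoint.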
What's next
The OpenShift AI 3.4 release includes MaaS as a generally available, production-ready enterprise capability. In the future, we're focused on extending it to help organizations move from being AI token consumers to becoming their own internal token providers. The end goal is to provide a comprehensive AI factory where enterprise AI inference is a security-focused, managed service that any team can use independently, without ever needing to open a ticket to a platform engineer.
Getting started
If you're evaluating AI inference governance for your enterprise, whether through MaaS or alongside your existing gateway, reach out to your Red Hat account team about getting a trial of Red Hat OpenShift AI 3.4.
In the meantime, check out the video demo, Accelerate enterprise software development with NVIDIA and MaaS, or learn more about MaaS through A guide to Models-as-a-Service.
About the authors
Jonathan Zarecki is Principal Product Manager for AI data infrastructure at Red Hat, focusing on vendor-neutral solutions that accelerate enterprise AI innovation. He leads product strategy for feature stores, and enterprise AI data management within the Red Hat AI portfolio. Prior to Red Hat, Jonathan was a Co-founder & CPO at Jounce (acquired by Red Hat), where he specialized in MLOps platforms and enterprise AI deployment strategies.
Will McGrath is a Senior Principal Product Marketing Manager at Red Hat. He is responsible for marketing strategy, developing content, and driving marketing initiatives for Red Hat OpenShift AI. He has more than 30 years of experience in the IT industry. Before Red Hat, Will worked for 12 years as strategic alliances manager for media and entertainment technology partners.
Building, breaking, and occasionally over-automating with OpenShift, Podman, KServe, and agentic AI—then writing about what actually works in production.
Hadar Cohen is a software engineer specializing in AI and machine learning, with a strong focus on building production-grade algorithms and scalable systems. He works at Red Hat, where he contributes to AI engineering initiatives, including model deployment and infrastructure on OpenShift.
Before joining Red Hat, Hadar worked as a data scientist and algorithms developer, leading the development of machine learning models for risk prediction, onboarding optimization, and identity verification.
Hadar holds a master’s degree in engineering from Ben-Gurion University, where his research focused on interpreting neural networks from an algorithmic perspective, with an emphasis on solving the Boolean Satisfiability Problem (SAT). His work bridges the gap between theoretical understanding and practical application of deep learning systems.
With a background spanning software engineering, signal processing, and AI explainability, Hadar brings a rigorous, systems-level approach to developing intelligent solutions.