Discover how Red Hat OpenShift AI 3.4 and Red Hat Connectivity Link deliver Models-as-a-Service (MaaS) to centrally govern and scale enterprise AI model serving.

Many enterprises have moved past the AI pilot phase. Models are running in production and teams are consuming them, but now they're hitting the governance wall. Who controls which team can access which model? Who approved that inference endpoint for customer-facing use? What does the organization owe to a compliance team asking for usage reports?

The answer depends on your starting point. Many organizations are building their AI inference infrastructure from scratch and want the fully-integrated, Kubernetes-native capabilities of Red Hat OpenShift AI. Others have existing, corporate API policies or are using standalone, third party proxies (such as LiteLLM) and need a robust AI platform to integrate with them. OpenShift AI 3.4 excels in both scenarios.   

What is Models-as-a-Service?

Models-as-a-Service (MaaS) is an approach to providing AI models as consumable, shared resources via API endpoints, enabling private and scalable AI adoption within the enterprise.

Key benefits of MaaS

  • Centralized AI governance: OpenShift AI 3.4 eliminates "shadow AI" of teams standing up their own models by providing centralized resources, natively managing token quotas, rate limits, and API keys for enterprise model serving.
  • Self-service access: Developers gain fast, security-focused API access to approved AI models (whether hosted locally or externally) without relying on IT provisioning tickets.
  • Kubernetes-native control: The platform's AI gateway capabilities are powered by Red Hat Connectivity Link, delivering a unified, scalable solution for policy management, token rate limiting, and API key self-service.
  • Cost tracking and visibility: Integrated showback dashboards provide granular tracking of token consumption, allowing administrators to allocate costs across different teams and projects  accurately.

Built-in governance: MaaS in OpenShift AI 3.4

MaaS is available as of OpenShift AI 3.4, including the AI inference gateway. There is no additional tooling required, and no separate lifecycle to manage.

The MaaS architecture is straightforward, delivering enterprise-grade control through several key features:

  • Token quotas and rate limiting: Administrators can define Kubernetes-native CRDs (subscriptions) that dictate specific rate limits and token usage per team to prevent budget overruns.
  • Self-service API keys: Developers generate their own API keys scoped to their specific subscriptions, which are bound at creation time and instantly revocable.
  • Showback dashboards (currently in technical preview (TP)): These are embedded directly in the OpenShift AI dashboard, providing aggregate token consumption tracking per model and subscription group.
  • Enterprise authentication (TP): Authentication flows through Authorino, OpenShift AI's OpenID Connect-compatible authorization layer, supporting integration with existing enterprise identity providers like Microsoft Azure AD, Okta, and Keycloak.
  • External model routing (TP): An OpenAI-compatible/v1/chat/completionsendpoint routes traffic to locally hosted models (via vLLM) or external providers like AWS Bedrock, Microsoft Azure OpenAI, or Anthropic. Applications don't need to know where the model runs, the gateway takes care of all of that.

OpenShift AI includes essential AI gateway and API management features at no additional cost. These capabilities are powered by the core technology stack of Connectivity Link, providing a built-in, Kubernetes-native way to manage policies and control token rate limits. By using this integrated stack—built on open source standards like Envoy, Kuadrant, and Istio—organizations can avoid the complexity of managing multiple separate proxies.

For broader enterprise needs, organizations can move to the full Connectivity Link product, available through the Red Hat Application Foundations subscription. While the version in OpenShift AI focuses on AI-specific traffic, the full product extends these capabilities across your entire infrastructure. It provides advanced features like multicluster routing, high availability and disaster recovery (HA/DR), and automated DNS management.

Integrating with existing API gateways 

OpenShift AI can work for you regardless of which gateway you use. Organizations with legacy API management policies or those temporarily utilizing standalone third-party proxies can continue routing traffic to OpenShift AI-hosted models. While a standalone proxy acts as a single gateway, OpenShift AI provides essential, heavy-lifting platform capabilities, including validated and optimized model serving, GPU-aware Kubernetes scheduling, lifecycle management, OpenShift observability integration, and the full security posture of a Red Hat enterprise product. For organizations routing traffic through an existing AI gateway to models hosted on OpenShift AI, this continues to be a supported approach. 

To demonstrate this interoperability, we have published 2 reference integrations, one with LiteLLM and one with Portkey AI Gateway, showing how third-party proxies can connect to OpenShift AI-hosted model endpoints. These documented patterns illustrate how organizations can use external tools alongside OpenShift AI to handle agentic workflows (via frameworks like LlamaStack) and manage cost attribution per team.

What's next

The OpenShift AI 3.4 release includes MaaS as a generally available, production-ready enterprise capability. In the future, we're focused on extending it to help organizations move from being AI token consumers to becoming their own internal token providers. The end goal is to provide a comprehensive AI factory where enterprise AI inference is a security-focused, managed service that any team can use independently, without ever needing to open a ticket to a platform engineer.

Getting started

If you're evaluating AI inference governance for your enterprise, whether through MaaS or alongside your existing gateway, reach out to your Red Hat account team about getting a trial of Red Hat OpenShift AI  3.4.

In the meantime, check out the video demo, Accelerate enterprise software development with NVIDIA and MaaS, or learn more about MaaS through A guide to Models-as-a-Service.

产品试用

红帽培训订阅 | 产品试用

了解红帽培训订阅试用版的优势,弥补技能差距并应对业务挑战

关于作者

Jonathan Zarecki is Principal Product Manager for AI data infrastructure at Red Hat, focusing on vendor-neutral solutions that accelerate enterprise AI innovation. He leads product strategy for feature stores, and enterprise AI data management within the Red Hat AI portfolio. Prior to Red Hat, Jonathan was a Co-founder & CPO at Jounce (acquired by Red Hat), where he specialized in MLOps platforms and enterprise AI deployment strategies.

Will McGrath is a Senior Principal Product Marketing Manager at Red Hat. He is responsible for marketing strategy, developing content, and driving marketing initiatives for Red Hat OpenShift AI. He has more than 30 years of experience in the IT industry. Before Red Hat, Will worked for 12 years as strategic alliances manager for media and entertainment technology partners.

Building, breaking, and occasionally over-automating with OpenShift, Podman, KServe, and agentic AI—then writing about what actually works in production.

Hadar Cohen is a software engineer specializing in AI and machine learning, with a strong focus on building production-grade algorithms and scalable systems. He works at Red Hat, where he contributes to AI engineering initiatives, including model deployment and infrastructure on OpenShift.

Before joining Red Hat, Hadar worked as a data scientist and algorithms developer, leading the development of machine learning models for risk prediction, onboarding optimization, and identity verification.

Hadar holds a master’s degree in engineering from Ben-Gurion University, where his research focused on interpreting neural networks from an algorithmic perspective, with an emphasis on solving the Boolean Satisfiability Problem (SAT). His work bridges the gap between theoretical understanding and practical application of deep learning systems.

With a background spanning software engineering, signal processing, and AI explainability, Hadar brings a rigorous, systems-level approach to developing intelligent solutions.

UI_Icon-Red_Hat-Close-A-Black-RGB

按频道浏览

automation icon

自动化

有关技术、团队和环境 IT 自动化的最新信息

AI icon

人工智能

平台更新使客户可以在任何地方运行人工智能工作负载

open hybrid cloud icon

开放混合云

了解我们如何利用混合云构建更灵活的未来

security icon

安全防护

有关我们如何跨环境和技术减少风险的最新信息

edge icon

边缘计算

简化边缘运维的平台更新

Infrastructure icon

基础架构

全球领先企业 Linux 平台的最新动态

application development icon

应用领域

我们针对最严峻的应用挑战的解决方案

Virtualization icon

虚拟化

适用于您的本地或跨云工作负载的企业虚拟化的未来