A guide to Models-as-a-Service

September 11, 2025
Resource type: Overview

AI adoption is growing, but infrastructure and access issues create challenges

Interest in AI is rapidly expanding, with organizations eager to use large language models (LLMs), predictive analytics, vision capabilities, and other advanced tools to extract business value. However, moving AI from isolated experimentation to widespread organizational adoption presents significant infrastructure and operational challenges.

Many organizations begin their AI journey by connecting to commercial LLM application programming interfaces (APIs) such as those from OpenAI or Anthropic, assuming it is the fastest way to production. But as use grows, costs increase, and teams encounter limitations around data privacy, observability, and customization. In some cases, commercial AI providers also change their models with little advance warning, disrupting the business processes that depend on them.

In response, some organizations swing to the opposite extreme: building their own model infrastructure from scratch. This do-it-yourself path often leads to teams independently deploying open source models such as Llama or Mistral with little coordination. The result is a fragmented landscape where groups stand up their own stacks, leading to redundant infrastructure, idle graphics processing units (GPUs), and significant operational overhead. Security and governance suffer, and costs spiral without delivering much business value.

These challenges are further exacerbated by the ballooning size of recent LLMs such as Llama, DeepSeek, Mistral, and Qwen. Unlike the relatively small-scale AI models of just a few years ago, today's large models can require terabytes of GPU memory, and the GPUs that provide it are expensive. Using these resources inefficiently can quickly lead to soaring costs, and the situation worsens when multiple teams within the same organization independently attempt to deploy these models. This fragmented approach compounds operational overhead and inflates expenditure.
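To make the memory arithmetic concrete, here is a rough back-of-the-envelope sketch: weight storage alone scales with parameter count times bytes per parameter. The parameter counts below are illustrative examples, and the estimate ignores the KV cache, activations, and replicas, so real serving requirements are higher.

```python
# Back-of-the-envelope estimate of GPU memory needed just to hold model weights.
# Parameter counts are illustrative; real deployments also need memory for the
# KV cache, activations, and any replicas, so actual requirements are higher.
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory for weights alone, assuming 16-bit (2-byte) parameters."""
    return num_params * bytes_per_param / 1e9

for name, params in [("8B model", 8e9), ("70B model", 70e9), ("671B model", 671e9)]:
    print(f"{name}: ~{weight_memory_gb(params):,.0f} GB of GPU memory for weights")
```

At 16-bit precision, weights alone for the largest open models approach or exceed a terabyte before any serving overhead, which is why idle or duplicated GPU capacity is so costly.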

Organizations need an internal approach that streamlines and consolidates model use, optimizes hardware resources, and allows for controlled, scalable access for diverse sets of internal users. Without such an approach, AI initiatives risk low adoption and high operational expenses, infrastructure investments remain underused, and measurable outcomes—such as increased productivity, lower operational costs, or faster time to insights—remain difficult to achieve.

What is Models-as-a-Service?

Models-as-a-Service (MaaS) is an approach to delivering AI models as shared resources, allowing users within an organization to access them on demand. MaaS offers a ready-to-go AI foundation, in the form of API endpoints, that encourages private and efficient AI at scale.

The Models-as-a-Service approach to this challenge

Models-as-a-Service (MaaS) is an approach that helps organizations deploy AI models once and deliver them as shared, security-focused resources across the entire enterprise. Instead of managing isolated deployments for individual teams, a MaaS approach helps companies to centralize AI infrastructure and operations, which simplifies internal AI adoption.

Figure 1. The workflow for a Models-as-a-Service setup.

Deliver shared access to AI with centralized model operations

  • For AI engineers, MaaS provides quicker access to high-performing models via APIs, eliminating the need to download models, manage dependencies, or request GPU allocations through lengthy IT tickets.

MaaS works by establishing an AI operations team as the central owner of shared AI resources. Models are deployed on a scalable platform (such as Red Hat® OpenShift® AI or a similar platform) and exposed through an API gateway. This setup lets developers and business units consume models without needing direct hardware access or deep technical expertise, while meeting the security and governance priorities of IT and finance teams, including chargeback capabilities. The goal is to provide user-friendly access to the AI models themselves, not to the underlying resources that run them, such as GPUs and tensor processing units (TPUs), while still meeting enterprise performance and compliance requirements and without complicating access for end users.

In practice, users interact only with APIs that deliver model-generated responses. Just as public AI providers abstract hardware complexity away from end users, internal MaaS deployments offer the same simplicity. Users do not directly manage hardware or software infrastructure, wait for an IT ticket to be resolved on their behalf, or stand by while an environment is configured for them. Instead, IT operations and AI teams centrally manage the model lifecycle, security, updates, and infrastructure scaling, offering users streamlined yet controlled access.
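As a minimal sketch of what that end-user experience can look like, the example below calls a hypothetical internal gateway endpoint with a team-issued credential. The URL, key, model name, and OpenAI-compatible request shape are assumptions for illustration (many open source model servers expose this format), not a specific Red Hat API.

```python
import requests

# Hypothetical internal MaaS gateway endpoint and team credential (placeholders).
MAAS_ENDPOINT = "https://maas.example.internal/v1/chat/completions"
TEAM_API_KEY = "example-team-credential"

def ask_model(prompt: str, model: str = "llama-3-8b-instruct") -> str:
    """Send a chat request through the internal API gateway and return the reply text."""
    response = requests.post(
        MAAS_ENDPOINT,
        headers={"Authorization": f"Bearer {TEAM_API_KEY}"},
        json={
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        },
        timeout=60,
    )
    response.raise_for_status()
    # Assumes an OpenAI-compatible chat completions response shape.
    return response.json()["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(ask_model("Summarize the benefits of centralized model serving in two sentences."))
```

From the user's perspective, this is the entire integration surface: no GPU allocation, driver setup, or model download is involved.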

This centralization not only streamlines internal AI operations but also enhances security focus and governance. Access to AI models is tightly controlled through credential management via an API gateway. Organizations can readily track use, set up internal chargeback mechanisms, make sure privacy compliance guidelines are being followed, and establish clear operational boundaries, which makes enterprise AI both manageable and practical. Tracking usage at the token level (in and out) is the most accurate and granular way to do so, and much more precise than any GPU-level metric.

Control use, throttle access, and manage costs

  • IT and platform engineers benefit from centralized oversight, which prevents unauthorized model deployments, enforces security and compliance standards, and simplifies lifecycle and infrastructure management.
  • For finance teams, centralized use tracking and internal chargeback mechanisms reduce waste and make GPU use more predictable and accountable, avoiding overspending from underused, team-specific hardware allocations.

Control in a MaaS deployment is delivered primarily by integrating an API gateway with the AI infrastructure, which allows teams to manage and monitor AI use at a very granular level.

Traditional AI deployments often suffer from unmanaged or inefficient use, as individuals or teams independently deploy models without centralized oversight. This fragmented approach can lead to costly inefficiencies, with GPU resources idling or underused. Placing an API gateway at the heart of the AI infrastructure creates a controlled access point between users and models.

This setup facilitates precise use tracking, down to the individual token level. Teams can clearly identify how much each user, team, or application consumes, attributing GPU and infrastructure costs accurately. For example, organizations can determine whether a particular user or application is using resources excessively and take corrective action—such as throttling use or allocating costs through internal chargeback mechanisms.
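As an illustration of the kind of accounting this enables, the sketch below rolls per-request token counts, as an API gateway might log them, up into an allocated cost per team. The log format and per-token rates are invented for the example; a real deployment would derive its rates from its own GPU and infrastructure costs.

```python
from collections import defaultdict

# Hypothetical per-token rates used to allocate infrastructure cost (illustrative only).
COST_PER_1K_INPUT_TOKENS = 0.0004   # USD
COST_PER_1K_OUTPUT_TOKENS = 0.0016  # USD

# Example usage records a gateway might emit: one entry per completed request.
usage_log = [
    {"team": "marketing", "input_tokens": 1200, "output_tokens": 350},
    {"team": "support",   "input_tokens": 800,  "output_tokens": 900},
    {"team": "marketing", "input_tokens": 400,  "output_tokens": 150},
]

def chargeback_by_team(records):
    """Aggregate token usage and allocated cost per team for internal chargeback."""
    totals = defaultdict(lambda: {"input_tokens": 0, "output_tokens": 0, "cost_usd": 0.0})
    for rec in records:
        team = totals[rec["team"]]
        team["input_tokens"] += rec["input_tokens"]
        team["output_tokens"] += rec["output_tokens"]
        team["cost_usd"] += (
            rec["input_tokens"] / 1000 * COST_PER_1K_INPUT_TOKENS
            + rec["output_tokens"] / 1000 * COST_PER_1K_OUTPUT_TOKENS
        )
    return dict(totals)

if __name__ == "__main__":
    for team, summary in chargeback_by_team(usage_log).items():
        print(team, summary)
```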

Throttling capabilities provided by the API gateway help maintain consistent performance and prevent resource exhaustion. Throttling allows IT teams to manage access intensity, preventing any single user from monopolizing GPU resources or degrading the performance experienced by others.
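One common mechanism behind this kind of throttling is a token-bucket rate limiter, shown below in simplified form. The limits are illustrative, and a production gateway would enforce this at the proxy layer rather than in application code; the "tokens" here are rate-limit tokens, unrelated to LLM tokens.

```python
import time

class TokenBucket:
    """Simplified per-credential rate limiter: refills at a fixed rate up to a burst capacity."""

    def __init__(self, rate_per_sec: float, capacity: float):
        self.rate = rate_per_sec      # sustained requests per second allowed
        self.capacity = capacity      # maximum burst size
        self.tokens = capacity
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# One bucket per API credential: for example, 5 requests/second sustained, bursts of 20.
limits = {"team-a-credential": TokenBucket(rate_per_sec=5, capacity=20)}

def handle_request(credential: str) -> str:
    bucket = limits.get(credential)
    if bucket is None or not bucket.allow():
        return "429 Too Many Requests"   # throttled: protects shared GPU capacity
    return "200 OK"                      # forwarded to the model server
```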

Additionally, API gateways offer fine-grained credential management and access control. Internal users can generate credentials to access AI models on their own, reducing administrative overhead. Credentials can also be revoked or modified quickly in response to changing security requirements or use patterns.
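To make that credential lifecycle concrete, the sketch below shows self-service issuance of a scoped key and immediate revocation. The in-memory store and function names are hypothetical; a real gateway would back this with its own credential store and admin API.

```python
import secrets
from datetime import datetime, timezone

# In-memory stand-in for the gateway's credential store (illustrative only).
credentials = {}

def issue_credential(team: str, allowed_models: list[str]) -> str:
    """Self-service issuance of a scoped API key for a team."""
    key = secrets.token_urlsafe(32)
    credentials[key] = {
        "team": team,
        "allowed_models": allowed_models,
        "created": datetime.now(timezone.utc),
        "active": True,
    }
    return key

def revoke_credential(key: str) -> None:
    """Revoke a key immediately, for example after a security review."""
    if key in credentials:
        credentials[key]["active"] = False

def is_authorized(key: str, model: str) -> bool:
    """Check that a key is active and scoped to the requested model."""
    cred = credentials.get(key)
    return bool(cred and cred["active"] and model in cred["allowed_models"])
```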

This all means that cost management becomes more transparent and accountable. IT teams can allocate GPU and infrastructure expenses accurately to the teams or business units that consume them.

Support any model, any accelerator, and any cloud

A core tenet of the MaaS approach is choice. It allows organizations to select and deploy a broad range of AI models, choose their preferred hardware accelerators, and operate within their existing cloud or on-premises environments. This approach gives organizations the freedom to implement AI precisely according to their technical needs, security requirements, and operational preferences.

  • Organizations face rigid limitations when adopting AI. They are often:
    • Restricted by specific cloud services.
    • Locked into proprietary model ecosystems.
    • Constrained by fixed hardware infrastructures.
  • MaaS addresses these limitations in a number of ways, including:
    • Supporting open source or proprietary models, custom-trained models, and popular LLMs such as Llama and Mistral.
    • Extending beyond text-based models to include predictive analytics, computer vision, audio transcription tools, and other multimodal gen AI use cases like image or video generation.
  • MaaS remains agnostic to hardware accelerators, so:
    • Organizations can select GPUs or other accelerators that align with their workloads, cost structures, and performance needs.
    • Centralized AI teams can make critical sizing and deployment decisions, improving efficiency and reducing errors from less technical users.
  • Centralized management allows:
    • Optimal allocation and use of infrastructure.
    • Reduced operational overhead and prevention of resource misconfiguration.
  • MaaS supports deployment across any environment, including:
    • On-premises, hybrid cloud, air-gapped, and public cloud environments. This flexibility is especially valuable for highly regulated sectors that require data sovereignty, regulatory compliance, or strict security controls.

How Red Hat implements MaaS

Red Hat has embraced MaaS internally by centralizing AI model deployment and access. Our internal AI team centrally manages AI resources and model operations, using Red Hat OpenShift and Red Hat OpenShift AI as the underlying platform. This centralized model deployment simplifies AI consumption for users across the organization, allowing our developers and business teams to efficiently integrate AI capabilities into their workflows without needing dedicated hardware or deep technical expertise.

Our implementation features a scalable serving architecture that uses GPUs within OpenShift AI and connects users through a centralized API gateway. This provides controlled, security-focused, and traceable access to AI models. Use is carefully managed through token-based monitoring, facilitating precise tracking of who is using models, how often, and in what quantity. The result is optimized hardware use, reduced unnecessary consumption of GPU resources, and detailed insights for accurately allocating costs across different internal teams or projects.

Our MaaS implementation uses GitOps workflows, providing high availability and reliability. This operational approach reduces manual intervention and potential errors, establishing clear control over AI deployments.

A key benefit of our internal MaaS implementation has been a marked improvement in resource efficiency and user experience. Rather than multiple teams independently provisioning GPUs and deploying models, our MaaS has eliminated duplicate efforts, streamlined internal operations, and significantly accelerated time-to-value. When new models are tested and verified, Red Hat teams can integrate and use them immediately, instead of being delayed by hardware allocation or provisioning tasks.

Start building your internal AI platform today

Ready to simplify AI delivery and unlock real value from your infrastructure investments? Start by reviewing our in-depth explainer on MaaS for further insight into how it works. Then explore the OpenShift AI product page to evaluate platform capabilities and review GPU use guidance.

For teams building a MaaS internally, Red Hat Consulting helps organizations design and operationalize model-serving environments tailored to their needs. Learn more at the Red Hat Consulting for AI page.

Want a more comprehensive look into real-world examples? Check out our on-demand webinar series, including the session dedicated to MaaS.

Tags: Artificial intelligence
