This past May at Red Hat Summit, we made several announcements across the Red Hat AI portfolio, including the introduction of Red Hat AI Inference Server and Red Hat AI third-party validated models, the integration of Llama Stack and Model Context Protocol (MCP) APIs as a developer preview, and the establishment of the llm-d community project. The portfolio's latest iteration, Red Hat AI 3, generally available in November, brings many of these capabilities to enterprises as production-ready features. We're also providing more tools and services to empower teams to increase efficiency, collaborate more effectively, and deploy anywhere. Let's explore what Red Hat AI 3 means for your business.
1. Achieve new levels of efficiency with SLA-aware inference
Red Hat’s strategy is to serve any model across any accelerator and any environment. The latest inferencing improvements offer features to meet Service Level Agreements (SLAs) of generative AI (gen AI) applications, support for additional hardware accelerators, and an expanded catalog of validated and optimized third-party models. Some highlights include:
- llm-d is now generally available in Red Hat OpenShift AI 3.0. llm-d provides Kubernetes-native distributed inference, which is essential for scaling and managing the unpredictable nature of large language models (LLMs). Unlike many traditional scale-out workloads, which behave consistently, LLM requests can vary greatly in prompt and response size, making monolithic scaling highly inefficient. By intelligently distributing the inference process, llm-d offers consistent resource allocation and predictable response times, which is critical for meeting strict SLAs and for the economic and performance viability of enterprise gen AI applications.
- The latest release of Red Hat AI Inference Server, version 3.2, provides consistent, fast, and cost-effective inference through an enterprise-grade version of vLLM, along with access to Red Hat AI's model optimization capabilities. It also extends accelerator support beyond NVIDIA and AMD GPUs to include IBM Spyre, giving customers the flexibility, optimization, and risk management needed to support their future AI strategies.
- Red Hat AI 3 features a new batch of third-party validated and optimized models, which encompass frontier open source models from providers like OpenAI, Google, and NVIDIA. This simplifies model selection and helps organizations reduce hardware costs, achieve higher throughput, and decrease latency during inference. These enterprise-ready models are made available in the Red Hat AI Hugging Face repository and in the Red Hat OpenShift AI model catalog as scanned and traceable containers. The new models cover multilingual, coding, summarization, and chat use cases, among others.
- For enterprise IT organizations looking to become model providers for their users, OpenShift AI 3.0 provides access to Models as a Service (MaaS) capabilities as a developer preview. MaaS allows organizations to take advantage of a mix of API-based and self-managed models for use cases that cannot run in public cloud environments. This release includes a MaaS control plane, an integrated API gateway, role-based access control (RBAC), and cost-tracking metrics, which together allow organizations to centralize resources, accelerate innovation, and reduce the operational costs associated with private AI.
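The load-aware distribution that llm-d performs can be illustrated with a toy sketch. The snippet below is not llm-d's actual scheduling logic (which is Kubernetes-native and considers factors like KV-cache locality); it is a minimal, hypothetical illustration of why routing each request to the least-loaded replica beats monolithic scaling when request sizes vary.

```python
class Replica:
    """Toy model of an inference replica with a pending-token counter."""
    def __init__(self, name):
        self.name = name
        self.queued_tokens = 0  # rough proxy for current load

def route(replicas, prompt_tokens):
    """Send the request to the least-loaded replica (toy load-aware routing)."""
    target = min(replicas, key=lambda r: r.queued_tokens)
    target.queued_tokens += prompt_tokens
    return target.name
```

Because LLM prompts differ wildly in size, a large request sent to one replica should steer subsequent requests elsewhere, which is exactly what tracking queued tokens accomplishes.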
2. Accelerate agentic AI innovation
The evolution of cloud-native development revolutionized how many organizations built applications over the last decade. Similarly, gen AI has transformed software development practices. Now, a third wave of AI is set to usher in an even bigger transformation: agentic AI.
Several of the new capabilities found in OpenShift AI 3.0 help lay the foundation for scalable AI agentic systems and workflows, providing the frameworks, tools, and capabilities you need to accelerate the delivery of agentic AI, including:
- Modular and adaptive AI platform with Llama Stack: To enhance flexibility and simplify AI agent operations, we released the Llama Stack API as a technical preview in OpenShift AI 3.0. This provides a standardized entry point for a wide range of AI capabilities, from retrieval-augmented generation (RAG), safety, and evaluation to telemetry, inference with vLLM, and tool calling with MCP, enabling organizations to integrate their own APIs, external providers, and preferred agentic frameworks. Red Hat AI provides a trusted, comprehensive, and consistent platform that makes it easier to deploy, manage, and run AI agents in a security-focused manner and at scale in production environments.
- MCP support - To accelerate the deployment of AI agentic systems, OpenShift AI 3.0 provides support for the emerging open standard MCP as a developer preview. The MCP server acts as a standardized "translator" for a wide range of external tools, data sources, and applications. It complements the Llama Stack API by handling the complex integrations with external applications and data sources, freeing the Llama Stack from requiring a custom integration for each external tool. We've also curated a collection of MCP servers. This enables ISVs to connect their tools and services directly to Red Hat AI.
- Streamlined, dedicated experiences - OpenShift AI 3.0 offers dedicated experiences, such as the AI hub and gen AI studio, that serve the distinct needs of platform and AI engineers. The AI hub empowers platform engineers to explore, deploy, and manage foundational assets like LLMs and MCP servers. It serves as the central point for managing the lifecycle and governance of AI assets. The gen AI studio provides AI engineers with a hands-on environment to discover, test, and manage deployed AI assets. AI engineers can experiment with different models, tune hyperparameters, and prototype gen AI applications, like chat and RAG.
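MCP's role as a "translator" comes from its use of JSON-RPC 2.0 as a common wire format: an agent invokes any connected tool with the same message shape regardless of what sits behind the server. The sketch below builds such a tool-call request; the tool name and arguments are illustrative placeholders, not a real MCP server's catalog.

```python
import json

def mcp_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 request in the shape MCP uses to invoke a tool.

    tool_name and arguments are hypothetical; a real client would first
    discover available tools from the server's tool listing.
    """
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })
```

Because every tool is addressed the same way, the agent framework needs one integration with the protocol rather than a custom integration per external application.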
3. Connect models to your private data
Red Hat AI 3 allows teams to boost model performance and accuracy by offering multiple ways to customize AI for your domain. The tooling in Red Hat AI 3 is accessible to contributors at all levels of AI expertise, from developers to data scientists to AI engineers, streamlining collaboration and interoperability. New capabilities include:
- A modular and extensible approach - OpenShift AI 3.0 introduces a new modular and extensible toolkit for model customization, exemplifying the progression of InstructLab as it moves from a powerful, end-to-end methodology to a more flexible approach. The toolkit includes individual, specialized Python libraries for data ingestion, synthetic data generation (SDG), model tuning, and evaluation, which gives teams greater control and a more efficient path to model customization. This allows data scientists, AI researchers, and AI engineers to select only the components they need, helping them to work faster and more efficiently.
- Enhanced RAG capabilities - A new, expanded RAG experience is now available in OpenShift AI. This streamlined workflow enables developers and AI engineers to easily access data sources with open source technologies like docling, and connect them to models, applications, and agents. The platform now supports OpenAI's embedding and completion APIs alongside Llama Stack options, providing the flexibility to deploy RAG solutions across different environments while maintaining consistent functionality.
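At its core, the RAG workflow described above embeds a query, ranks documents by similarity, and feeds the best matches to a model. The sketch below illustrates only the retrieval step, using a toy bag-of-words "embedding" in place of a real embedding model; in practice the `embed` function would call an embedding endpoint, such as one exposed through the OpenAI-compatible APIs the platform supports.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: a bag-of-words count vector. A real pipeline would
    call an embedding model instead of counting words."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]
```

The retrieved passages would then be inserted into the model's prompt, grounding its answer in your private data rather than its training corpus.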
4. Scale AI across the hybrid cloud
Productivity, consistency, and an enhanced user experience are key to a successful AI strategy. At Red Hat, our goal is to provide an AI platform that enables enterprises to consistently build, tune, deploy, and manage AI models and agentic applications at scale across the hybrid cloud, delivering a unified experience that increases time to value. OpenShift AI 3.0 offers:
- Centralized control through a model registry - The model registry provides a more streamlined experience for managing AI models, allowing teams to more easily discover, reuse, and manage a wide range of assets, from customers' own models and artifacts to popular community and third-party options. These capabilities are designed to boost productivity, promote consistency, and help ensure centralized lifecycle management.
- Improved UX for AI pipelines - The enhanced user experience for AI pipelines provides the tooling data scientists and AI engineers need to train and tune models faster, streamlining workflows through runnable examples and reusable components, as well as the ability to bring your own Argo workflows for ultimate flexibility.
- Enhanced observability - To provide organizations with a centralized perspective on AI performance and improved control and consistency, OpenShift AI 3.0 includes foundational platform metrics built on the OpenTelemetry observability standard, zero-configuration GPU monitoring, reference dashboards for key AI metrics like time-to-first-token and throughput, and APIs for exporting metrics for smooth integration with enterprise monitoring platforms.
- Intelligent GPU-as-a-service - OpenShift AI 3.0 uses advanced features to enhance GPU utilization, maximize efficiency, and support a wide range of workloads. With accelerator slicing for all NVIDIA MIG-enabled devices, enterprises can partition GPUs for multiple users, helping ensure no resource goes to waste. By utilizing Kueue, the platform supports a more diverse set of AI workloads, including Ray training jobs, training operator-based jobs, and inference services for efficient scheduling and management across shared hardware.
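The two headline inference metrics mentioned above, time-to-first-token (TTFT) and throughput, can be derived directly from a token stream's timestamps. The sketch below is a minimal, illustrative calculation (not the platform's own instrumentation): TTFT measures the wait before the first generated token, while throughput measures how fast tokens arrive after that.

```python
def inference_metrics(request_time, token_timestamps):
    """Compute TTFT and decode throughput from a token stream.

    request_time: when the request was sent (seconds).
    token_timestamps: arrival time of each generated token (seconds).
    """
    ttft = token_timestamps[0] - request_time
    span = token_timestamps[-1] - token_timestamps[0]
    # Throughput counts the inter-token intervals after the first token.
    tokens_per_s = (len(token_timestamps) - 1) / span if span > 0 else 0.0
    return {"ttft_s": ttft, "tokens_per_s": tokens_per_s}
```

Dashboards built on metrics like these make SLA violations visible: a rising TTFT typically signals queueing pressure, while falling throughput points to saturated accelerators.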
A new approach to enterprise AI
Red Hat AI is built on the belief that enterprise AI is not a one-size-fits-all solution. It's a strategic, holistic approach that recognizes the complexity and diversity of real-world business challenges. Red Hat provides a flexible platform that empowers organizations to move beyond the status quo, offering the freedom to choose any model, hardware, or deployment strategy across the hybrid cloud. This commitment to choice, control, and efficiency is what sets us apart—we don't just offer AI, we provide a reliable, comprehensive foundation that enables organizations to get the most out of their AI investments.
To learn more about Red Hat AI 3 and discover how you can build AI for your world, watch our What’s new and what’s next live session and visit our website. Red Hat AI 3 will be generally available in November.
About the authors
Jennifer Vargas is a marketer — with previous experience in consulting and sales — who enjoys solving business and technical challenges that seem disconnected at first. In the last five years, she has been working in Red Hat as a product marketing manager supporting the launch of a new set of cloud services. Her areas of expertise are AI/ML, IoT, Integration and Mobile Solutions.
Carlos Condado is a Senior Product Marketing Manager for Red Hat AI. He helps organizations navigate the path from AI experimentation to enterprise-scale deployment by guiding the adoption of MLOps practices and integration of AI models into existing hybrid cloud infrastructures. As part of the Red Hat AI team, he works across engineering, product, and go-to-market functions to help shape strategy, messaging, and customer enablement around Red Hat’s open, flexible, and consistent AI portfolio.
With a diverse background spanning data analytics, integration, cybersecurity, and AI, Carlos brings a cross-functional perspective to emerging technologies. He is passionate about technological innovations and helping enterprises unlock the value of their data and gain a competitive advantage through scalable, production-ready AI solutions.
Will McGrath is a Senior Principal Product Marketing Manager at Red Hat. He is responsible for marketing strategy, developing content, and driving marketing initiatives for Red Hat OpenShift AI. He has more than 30 years of experience in the IT industry. Before Red Hat, Will worked for 12 years as strategic alliances manager for media and entertainment technology partners.
As a principal technologist for AI at Red Hat with over 30 years of experience, Robbie works to support enterprise AI adoption through open source innovation. His focus is on cloud-native technologies, Kubernetes, and AI platforms, helping to deliver scalable and secure solutions using Red Hat AI.
Robbie is deeply committed to open source, open source AI, and open data, believing in the power of transparency, collaboration, and inclusivity to advance technology in meaningful ways. His work involves exploring private generative AI, traditional machine learning, and enhancing platform capabilities to support open and hybrid cloud solutions for AI. His focus is on helping organizations adopt ethical and sustainable AI technologies that make a real impact.
Aom is a Product Marketing Manager in Red Hat AI. She leads the strategy and coordination of the AI BU blog, ensuring timely and impactful storytelling around Red Hat’s AI efforts. She also drives the distribution of AI content across social channels and curates an internal newsletter to keep Red Hatters aligned on the latest developments in Red Hat AI.
In addition, she works with the global event team to shape AI-related event strategies, ensuring alignment between the AI BU and key marketing moments. She also collaborates closely with the AI BU’s Growth Marketing Manager to build pipeline strategies and engage with regional teams, ensuring consistent messaging and execution across markets.