It’s no secret: the tech industry is rapidly adopting agentic software development, converting business processes into fully autonomous workflows. While the power of these tools is undeniable, the current consumption models present a challenge. Most of these solutions are delivered through a model-as-a-service approach that’s poised to trigger an AI version of the cloud paradox: the agentic paradox.

The paradox is simple. The fastest way to accelerate your business processes is to use powerful frontier models. However, as adoption scales, this strategy becomes unsustainable: token costs erode profit margins, unpredictable latency degrades performance, and routing sensitive data through public APIs can violate confidentiality, sovereignty and regulatory mandates. To relieve these tensions, enterprises must move beyond simple consumption toward a hybrid architectural strategy that prioritizes choice.

The cost of innovation

The friction points are already visible. Some reports show enterprises exhausting their entire cloud spend budget on tokens by the middle of Q2. We’re approaching a critical juncture where we need new approaches to model inference to reassert control over cost, performance and data.
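To make the budget pressure concrete, here is a back-of-the-envelope projection of monthly token spend. All figures (request volume, tokens per request, per-token prices) are hypothetical and chosen only to illustrate the arithmetic, not to reflect any real provider’s pricing.

```python
# Illustrative token cost projection -- all prices and volumes are hypothetical.
def monthly_token_cost(requests_per_day: int,
                       tokens_per_request: int,
                       usd_per_million_tokens: float) -> float:
    """Estimate monthly spend for a given request volume and per-token price."""
    tokens_per_month = requests_per_day * tokens_per_request * 30
    return tokens_per_month / 1_000_000 * usd_per_million_tokens

# Hypothetical: 50k agentic requests/day, 8k tokens each, $10 per 1M tokens.
frontier = monthly_token_cost(50_000, 8_000, 10.0)
# Hypothetical self-managed serving at an amortized $2 per 1M tokens.
self_managed = monthly_token_cost(50_000, 8_000, 2.0)
print(f"frontier API: ${frontier:,.0f}/month, self-managed: ${self_managed:,.0f}/month")
```

Even at modest agentic volumes, the gap between the two curves compounds monthly, which is what drives the budget exhaustion described above.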

How will organizations respond when the bill for yesterday’s innovation arrives tomorrow? We’re moving beyond the era of simply using models; we must architect the systems that support them. Adoption will likely follow a hybrid pattern: some token consumption will use frontier models, while the rest is served by self-managed models on the public cloud or in enterprise data centers.

A system-centric mindset

Much of our work in Red Hat’s Research and Emerging Technologies groups focuses on the relationship between the intelligence and infrastructure layers, and results in innovative open source community projects, like a recently built hardened, image-based foundation for AI agents. By treating AI workloads with the same rigor as traditional enterprise software, open source provides the stability required for production environments. 

This architectural shift allows organizations to move away from a model-centric view and toward a system-centric mindset. In this model, value is found in the reliability of the entire stack rather than a single provider’s API.

The mechanism of choice

As an enterprise works to regain its financial footing and establish a foundation for hybrid control and consistency, the initial path typically uses an inference proxy or router. This is the least disruptive approach to driving down inference costs in an existing agentic implementation with minimal architectural change. By keeping inference endpoints consistent, organizations can switch between service providers or self-managed models that provide better value. 
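The selection policy behind such a proxy can be sketched as a simple constraint filter: keep the client-facing endpoint fixed and choose a backend per request. The backend names, prices and latency figures below are hypothetical, and a production router would of course draw them from live telemetry rather than constants.

```python
# Minimal sketch of an inference-proxy selection policy. The client always
# calls one consistent endpoint; the proxy picks the backend per request.
# Backend names, prices and latencies are hypothetical.
from dataclasses import dataclass

@dataclass
class Backend:
    name: str
    usd_per_million_tokens: float
    p95_latency_ms: float
    allows_sensitive_data: bool

BACKENDS = [
    Backend("frontier-api", 10.0, 900.0, False),
    Backend("self-managed-vllm", 2.0, 1400.0, True),
]

def choose_backend(sensitive: bool, latency_budget_ms: float) -> Backend:
    """Pick the cheapest backend that satisfies data-handling and latency constraints."""
    candidates = [b for b in BACKENDS
                  if (b.allows_sensitive_data or not sensitive)
                  and b.p95_latency_ms <= latency_budget_ms]
    if not candidates:
        raise RuntimeError("no backend satisfies the request constraints")
    return min(candidates, key=lambda b: b.usd_per_million_tokens)

print(choose_backend(sensitive=True, latency_budget_ms=2000).name)   # self-managed-vllm
print(choose_backend(sensitive=False, latency_budget_ms=1000).name)  # frontier-api
```

Because the policy lives in the proxy rather than in each agent, switching providers is a configuration change, not an application rewrite.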

This is why Red Hat created projects like vLLM Semantic Router and llm-d: to explore novel ideas in artificial intelligence and, in the case of vLLM Semantic Router, inference routing and token economics. This trailblazing research and development is the building block that eventually shapes Red Hat platforms. Projects like vLLM Semantic Router provide the intelligent, efficient routing needed to navigate a multi-model landscape, and by owning this routing intelligence layer, organizations regain control over their workloads across any infrastructure.
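The core idea of semantic routing can be illustrated with a toy classifier: inspect each prompt and send simple requests to a small, cheap model while reserving a larger one for harder work. A real router such as vLLM Semantic Router uses learned classifiers over prompt semantics; the keyword heuristic and model names below are purely illustrative.

```python
# Toy illustration of the semantic-routing idea: cheap model for easy
# prompts, larger model for complex ones. Model names are hypothetical,
# and the keyword check stands in for a learned semantic classifier.
SMALL_MODEL = "small-8b"
LARGE_MODEL = "large-70b"

HARD_MARKERS = ("prove", "derive", "multi-step", "plan", "refactor")

def route(prompt: str) -> str:
    """Return the model tier a prompt should be served by."""
    text = prompt.lower()
    return LARGE_MODEL if any(m in text for m in HARD_MARKERS) else SMALL_MODEL

print(route("Summarize this ticket"))        # small-8b
print(route("Plan a multi-step refactor"))   # large-70b
```

Even this crude split captures the token-economics point: most agentic traffic is routine, so routing it away from the most expensive model yields outsized savings.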

A hybrid reality

Beyond inference routing, the next step for organizations is exploring self-managed solutions. This means using the latest open weight model offerings served by a high-performance inference platform like vLLM, hosted on their own infrastructure. 

Then comes a core challenge: How can organizations take the powerful, agent-driven business processes developed via model-as-a-service and replace them with open weight models? How can enterprises replicate model-as-a-service patterns to pivot from being a token-consumer and become an AI provider? What trade-offs will they face during this transition? How do they do it successfully, without reducing efficacy?

Every enterprise has years of unique data, and models trained on the public domain lack this specific context. Open weight models running locally can be coupled with these private data sources to safely enhance the accuracy and capabilities of agents. While some open weight models can act as an immediate replacement, others require work to close the performance gap through fine-tuning, distillation and reinforcement learning. As reinforcement learning techniques mature, the accuracy of these models and the resulting agentic workloads will be further enhanced. This path ultimately leads to a hybrid architecture: some models remain self-managed for core workloads while others are consumed via a third-party managed service interface.
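The coupling of a locally served open weight model with private data can be sketched as retrieval-augmented prompting: fetch the most relevant internal document and prepend it to the prompt before inference. The documents and the word-overlap scorer below are illustrative stand-ins; a production deployment would use embedding search over a real private corpus.

```python
# Minimal sketch of grounding a locally served model in private data.
# The documents are invented examples, and word overlap stands in for
# a proper embedding-based retriever.
PRIVATE_DOCS = {
    "refund-policy": "Refunds are approved automatically under $50.",
    "escalation": "Escalate outages to the on-call SRE within 15 minutes.",
}

def retrieve(query: str) -> str:
    """Return the private document sharing the most words with the query."""
    q = set(query.lower().split())
    return max(PRIVATE_DOCS.values(),
               key=lambda doc: len(q & set(doc.lower().split())))

def build_prompt(query: str) -> str:
    """Prepend retrieved private context so a local model can answer from it."""
    return f"Context: {retrieve(query)}\n\nQuestion: {query}"

print(build_prompt("When are refunds approved?"))
```

Because retrieval and inference both run on enterprise infrastructure, the private context never leaves the organization's boundary, which is the confidentiality advantage the hybrid model offers.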

Red Hat specializes in hybrid solutions. In the same way that we delivered a hybrid platform for cloud consumption with Red Hat Enterprise Linux and Red Hat OpenShift, Red Hat AI Enterprise provides a hybrid platform for agent deployments and inference, regardless of the model you’re using or where it’s hosted, proof that choice arrives through open source. The future of AI is hybrid, and the platforms to build that future are already here at Red Hat.

To hear more from Red Hat executives, as well as our customers and partners, watch the Red Hat Summit keynotes live on YouTube.

Learn more about Red Hat Summit and take a look at all of Red Hat’s announcements this week in the Red Hat Summit newsroom. Follow @RedHatSummit or #RHSummit on X for event-specific updates.


About the author

Steve Watt is a Distinguished Engineer and vice president of the Office of the CTO, which includes Red Hat Research and Emerging Technologies. Prior to joining Red Hat, Steve was the founder of the Hadoop business and Hadoop Chief Technologist at HP, and a Software Architect and Master Inventor at IBM Emerging Technologies. Prior to IBM, Steve worked for a number of consumer-facing software startups in the USA and his native South Africa.
