It’s no secret: the tech industry is rapidly adopting agentic software development, converting business processes into fully autonomous workflows. While the power of these tools is undeniable, the current consumption models present a challenge. Most of these solutions are delivered through a model-as-a-service approach that’s poised to trigger an AI version of the cloud paradox: the agentic paradox.
The paradox is simple. The fastest way to accelerate your business processes is to use powerful frontier models. As adoption scales, however, that strategy becomes unsustainable: token costs erode profit margins, unpredictable latency degrades performance, and routing sensitive data to public APIs can violate confidentiality, sovereignty and regulatory mandates. To relieve these tensions, enterprises must move beyond simple consumption toward a hybrid architectural strategy that prioritizes choice.
The cost of innovation
The friction points are already visible. Some reports show enterprises exhausting their entire cloud budget on tokens by the middle of Q2. We’re approaching a critical juncture where new approaches to model inference are needed to reassert control over cost, performance and data.
How will organizations respond when the bill for yesterday’s innovation arrives tomorrow? We’re moving beyond the era of simply using models; we must architect the systems that support them. Adoption will likely follow a hybrid pattern: some inference will run on frontier models, while the rest will be served by self-managed models on the public cloud or in enterprise data centers.
A system-centric mindset
Much of our work in Red Hat’s Research and Emerging Technologies groups focuses on the relationship between the intelligence and infrastructure layers. That work results in innovative open source community projects, like a recently built hardened, image-based foundation for AI agents. By treating AI workloads with the same rigor as traditional enterprise software, open source provides the stability required for production environments.
This architectural shift allows organizations to move away from a model-centric view and toward a system-centric mindset. In this model, value is found in the reliability of the entire stack rather than a single provider’s API.
The mechanism of choice
As an enterprise works to regain its financial footing and establish a foundation for hybrid control and consistency, the initial path typically uses an inference proxy or router. This is the least disruptive approach to driving down inference costs in an existing agentic implementation with minimal architectural change. By keeping inference endpoints consistent, organizations can switch between service providers or self-managed models that provide better value.
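The idea of keeping inference endpoints consistent while swapping what sits behind them can be sketched in a few lines. The backend names, prices and policy below are illustrative assumptions, not real quotes or the logic of any particular router; production routers (such as semantic routers) use far richer signals than prompt length.

```python
# Minimal sketch of an inference router. Backend names, prices and the
# routing policy are hypothetical; agents only ever see route(), so the
# backends behind it can change without touching the agent code.
from dataclasses import dataclass

@dataclass
class Backend:
    name: str
    cost_per_1k_tokens: float  # USD, illustrative numbers only
    keeps_data_local: bool

BACKENDS = [
    Backend("frontier-api", 0.0150, keeps_data_local=False),
    Backend("self-managed-vllm", 0.0009, keeps_data_local=True),
]

def route(prompt: str, sensitive: bool) -> Backend:
    """Pick a backend: sensitive data must stay local; long, routine
    prompts go to the cheapest backend; short, harder prompts go to
    the most capable (here: most expensive) one."""
    candidates = [b for b in BACKENDS if b.keeps_data_local] if sensitive else BACKENDS
    if sensitive or len(prompt) > 200:
        return min(candidates, key=lambda b: b.cost_per_1k_tokens)
    return max(candidates, key=lambda b: b.cost_per_1k_tokens)

print(route("Summarize this patient record ...", sensitive=True).name)
# self-managed-vllm
```

Because every agent calls the same `route()` interface, switching providers becomes a policy change rather than a rewrite of the agentic workflow.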
This is why Red Hat created projects like vLLM Semantic Router and llm-d: to explore novel ideas in artificial intelligence and, in the case of vLLM Semantic Router, inference routing and token economics. This trailblazing research and development provides the building blocks that eventually shape Red Hat platforms. Projects like vLLM Semantic Router provide the intelligent, efficient routing needed to navigate a multi-model landscape, and by owning this routing intelligence layer, organizations can regain control over their workloads across any infrastructure.
A hybrid reality
Beyond inference routing, the next step for organizations is exploring self-managed solutions. This means using the latest open weight model offerings served by a high-performance inference platform like vLLM, hosted on their own infrastructure.
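In practice, the barrier to entry here is low. The commands below are a sketch assuming vLLM is installed on a GPU host; the model name is illustrative, and any open weight model vLLM supports could be substituted.

```shell
# Serve an open weight model behind an OpenAI-compatible endpoint.
# The model name is illustrative; substitute any supported open weight model.
pip install vllm
vllm serve Qwen/Qwen2.5-7B-Instruct --port 8000

# Agents then target the local endpoint instead of a public API:
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen/Qwen2.5-7B-Instruct",
       "messages": [{"role": "user", "content": "Hello"}]}'
```

Because the served endpoint speaks the same API shape as hosted model-as-a-service offerings, existing agent code can often be repointed at it with a configuration change.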
Then comes a core challenge: How can organizations take the powerful, agent-driven business processes developed via model-as-a-service and replace them with open weight models? How can enterprises replicate model-as-a-service patterns to pivot from being a token-consumer and become an AI provider? What trade-offs will they face during this transition? How do they do it successfully, without reducing efficacy?
Every enterprise has years of unique data, and models trained on the public domain lack this specific context. Open weight models running locally can be coupled with these private data sources to safely enhance the accuracy and capabilities of agents. While some open weight models can act as immediate replacements, others need fine-tuning, distillation and reinforcement learning to close the performance gap. As reinforcement learning techniques mature, the accuracy of these models, and of the agentic workloads built on them, will improve further. This path ultimately leads to a hybrid architecture: some models remain self-managed for core workloads, while others are consumed through a third-party managed service.
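Coupling a locally served model with private data most commonly takes the form of retrieval-augmented generation. The sketch below is a toy illustration: the documents are invented, the word-overlap scorer stands in for a real embedding index, and the final call to a locally served model is omitted.

```python
# Toy sketch of grounding a self-managed model in private data
# (retrieval-augmented generation). Documents and the retrieval
# heuristic are illustrative stand-ins for a real vector index.
PRIVATE_DOCS = [
    "Refund policy: enterprise customers may cancel within 30 days.",
    "Outage runbook: page the on-call SRE before restarting the gateway.",
]

def retrieve(question: str, docs: list[str]) -> str:
    """Return the document sharing the most words with the question."""
    q = set(question.lower().split())
    return max(docs, key=lambda d: len(q & set(d.lower().split())))

def build_prompt(question: str) -> str:
    """Assemble a grounded prompt to send to the locally served model."""
    context = retrieve(question, PRIVATE_DOCS)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

print(build_prompt("What is the refund policy for enterprise customers?"))
```

The data never leaves the enterprise boundary: retrieval and inference both run on infrastructure the organization controls.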
Red Hat specializes in hybrid solutions. In the same way that we delivered a hybrid platform for cloud consumption with Red Hat Enterprise Linux and Red Hat OpenShift, Red Hat AI Enterprise provides a hybrid platform for agent deployments and inference, regardless of which model you’re using or where it’s hosted. That choice arrives through open source. The future of AI is hybrid, and the platforms to build that future are already here at Red Hat.
To hear more from Red Hat executives, as well as our customers and partners, watch the Red Hat Summit keynotes live on YouTube.
- The next platform is choice — Tuesday, May 12, 8:30-10 a.m. EDT
- The AI-ready enterprise is here — Wednesday, May 13, 9-10 a.m. EDT
Learn more about Red Hat Summit and take a look at all of Red Hat’s announcements this week in the Red Hat Summit newsroom. Follow @RedHatSummit or #RHSummit on X for event-specific updates.
About the author
Steve Watt is a Distinguished Engineer and vice president of the Office of the CTO, which includes Red Hat Research and Emerging Technologies. Prior to joining Red Hat, Steve was the founder of the Hadoop Business and Hadoop Chief Technologist at HP and a Software Architect and Master Inventor at IBM Emerging Technologies. Prior to IBM, Steve worked for a number of consumer facing software startups in the USA and his native South Africa.