It’s no secret: the tech industry is rapidly adopting agentic software development, converting business processes into fully autonomous, agentic workflows. While the power of these tools is undeniable, the current consumption models present a challenge. Most of these solutions are delivered through a model-as-a-service approach that’s poised to trigger an AI version of the cloud paradox: the agentic paradox.
The paradox is simple. The fastest path to increase the velocity of your business processes is to use powerful frontier models. However, as adoption scales, this strategy becomes unsustainable. Token costs erode profit margins, unpredictable latency can degrade performance, and routing sensitive data to public APIs can violate confidentiality, sovereignty and regulatory mandates. To relieve these tensions, enterprises must move beyond simple consumption toward a hybrid architectural strategy that prioritizes choice.
The cost of innovation
The friction points are already visible. Some reports show enterprises exhausting their entire cloud spend budget on tokens by the middle of Q2. We’re approaching a critical juncture where new approaches to model inference are needed to reassert control over cost, performance and data.
How will organizations respond when the bill for yesterday’s innovation arrives tomorrow? We’re moving beyond the era of simply using models. We must architect the systems that support them. Adoption will likely follow a hybrid pattern. Some token consumption will use frontier models, while some will be self-managed on the public cloud or in enterprise data centers.
A system-centric mindset
Much of our work in Red Hat’s Research and Emerging Technologies groups focuses on the relationship between the intelligence and infrastructure layers. That work results in innovative open source community projects, such as a recently built hardened, image-based foundation for AI agents. By treating AI workloads with the same rigor as traditional enterprise software, open source provides the stability required for production environments.
This architectural shift allows organizations to move away from a model-centric view and toward a system-centric mindset. In this model, value is found in the reliability of the entire stack rather than a single provider’s API.
The mechanism of choice
As an enterprise works to regain its financial footing and establish a foundation for hybrid control and consistency, the initial path typically uses an inference proxy or router. This is the least disruptive approach to driving down inference costs in an existing agentic implementation with minimal architectural change. By keeping inference endpoints consistent, organizations can switch between service providers or self-managed models that provide better value.
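The routing idea above can be sketched in a few lines of Python. This is a minimal, illustrative policy, not a real product API: the backend names, endpoints, per-token costs and the sensitivity rule are all hypothetical. The point it demonstrates is that when both backends expose the same OpenAI-compatible interface, switching providers is a configuration decision rather than an architectural change.

```python
# Minimal sketch of an inference routing policy. Backend names, endpoints,
# costs and the sensitivity heuristic are illustrative assumptions only.
from dataclasses import dataclass

@dataclass
class Backend:
    name: str
    base_url: str            # OpenAI-compatible endpoint
    cost_per_1k_tokens: float
    keeps_data_private: bool

FRONTIER = Backend("frontier", "https://api.example-provider.com/v1", 0.0100, False)
SELF_MANAGED = Backend("self-managed-vllm", "http://vllm.internal:8000/v1", 0.0012, True)

def route(prompt: str, sensitive: bool) -> Backend:
    """Pick a backend without changing the caller-facing endpoint contract.

    Sensitive requests must stay on infrastructure the enterprise controls;
    everything else goes to whichever backend offers better value.
    """
    if sensitive:
        return SELF_MANAGED
    # Illustrative cost policy: long prompts are cheaper self-managed.
    return SELF_MANAGED if len(prompt) > 2000 else FRONTIER

# Because both backends speak the same OpenAI-compatible API, callers only
# swap base_url -- the rest of the request code stays unchanged.
backend = route("Summarize this quarterly report...", sensitive=True)
```

In practice the routing decision can be far richer (semantic classification of the request, latency budgets, model capability tiers), which is exactly the territory projects like vLLM Semantic Router explore; the value of the pattern is that the decision lives in a layer the organization owns.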
This is why Red Hat created projects like vLLM Semantic Router and llm-d: to explore novel ideas in artificial intelligence and, in the case of vLLM Semantic Router, inference routing and token economics. This trailblazing research and development forms the building blocks that eventually shape Red Hat platforms. Projects like vLLM Semantic Router provide the intelligent, efficient routing needed to navigate a multi-model landscape, and by owning this routing intelligence layer, organizations can regain control over their workloads across any infrastructure.
A hybrid reality
Beyond inference routing, the next step for organizations is exploring self-managed solutions. This means using the latest open weight model offerings served by a high-performance inference platform like vLLM, hosted on their own infrastructure.
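As a concrete deployment fragment, standing up a self-managed endpoint with vLLM can be as simple as serving an open weight model and querying its OpenAI-compatible API. The model name below is just an example; substitute whichever open weight model fits your hardware and use case.

```shell
# Install vLLM and serve an open weight model (example model shown);
# this exposes an OpenAI-compatible server on port 8000 by default.
pip install vllm
vllm serve Qwen/Qwen2.5-7B-Instruct

# Query it with the same chat-completions request shape a hosted
# model-as-a-service provider would accept.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Qwen/Qwen2.5-7B-Instruct",
        "messages": [{"role": "user", "content": "Summarize our renewal policy."}]
      }'
```

Because the request shape matches the hosted providers’ APIs, existing agentic code can often be pointed at this endpoint with only a base URL change.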
Then comes a core challenge: How can organizations take the powerful, agent-driven business processes developed via model-as-a-service and replace them with open weight models? How can enterprises replicate model-as-a-service patterns to pivot from token consumer to AI provider? And what trade-offs will they face in making that transition without reducing efficacy?
Every enterprise has years of unique data, and models trained on public data lack this specific context. Open weight models running locally can be coupled with these private data sources to safely enhance the accuracy and capabilities of agents. While some open weight models can act as an immediate replacement, others require work to close the performance gap through fine-tuning, distillation and reinforcement learning. As reinforcement learning techniques mature, the accuracy of these models and the resulting agentic workloads will improve further. This path ultimately leads to a hybrid architecture: some models remain self-managed for core workloads, while others are consumed through a third-party managed service.
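The coupling of a local model with private data can be sketched as a retrieval step that grounds each prompt in enterprise documents before the model sees it. This is a deliberately naive illustration, assuming keyword-overlap retrieval and made-up documents; production systems typically use vector embeddings and a proper document store.

```python
# Minimal sketch of grounding a locally served open weight model with
# private documents. Keyword-overlap retrieval is for illustration only.

PRIVATE_DOCS = [
    "Q3 revenue for the EMEA region grew 12% year over year.",
    "The on-call rotation for the payments service changes every Monday.",
    "Customer contract renewals are handled by the accounts team.",
]

def retrieve(question: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by shared words with the question and keep the top k."""
    q_words = set(question.lower().split())
    scored = sorted(docs,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(question: str, docs: list[str]) -> str:
    """Prepend retrieved private context so the model answers from it."""
    context = "\n".join(f"- {d}" for d in retrieve(question, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# The resulting prompt is what gets sent to the self-managed endpoint.
prompt = build_prompt("How did EMEA revenue change in Q3?", PRIVATE_DOCS)
```

Because the private data never leaves the enterprise boundary, this pattern delivers the confidentiality and sovereignty properties that routing the same prompt to a public API cannot.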
Red Hat specializes in hybrid solutions. In the same way that we delivered a hybrid platform for cloud consumption with Red Hat Enterprise Linux and Red Hat OpenShift, Red Hat AI Enterprise provides a hybrid platform for agent deployments and inference, regardless of the model you’re using or where it’s hosted. That choice arrives through open source. The future of AI is hybrid, and the platforms to build that future are already here at Red Hat.
To hear more from Red Hat executives, as well as our customers and partners, watch the Red Hat Summit keynotes live on YouTube.
- The next platform is choice — Tuesday, May 12, 8:30-10 a.m. EDT
- The AI-ready enterprise is here — Wednesday, May 13, 9-10 a.m. EDT
Learn more about Red Hat Summit and take a look at all of Red Hat’s announcements this week in the Red Hat Summit newsroom. Follow @RedHatSummit or #RHSummit on X for event-specific updates.
About the author
Steve Watt is a Distinguished Engineer and vice president of the Office of the CTO, which includes Red Hat Research and Emerging Technologies. Prior to joining Red Hat, Steve was the founder of the Hadoop business and Hadoop Chief Technologist at HP, and a Software Architect and Master Inventor at IBM Emerging Technologies. Prior to IBM, Steve worked for a number of consumer-facing software startups in the USA and his native South Africa.