In the AI industry, we’ve spent the last three years obsessed with scale. We’ve chased parameter counts into the trillions, believing that "bigger" was the only path to "smarter." But as the dust settles, a new reality is emerging for the enterprise: size is not the metric that matters. Delivering reliable, deterministic outcomes is.
At Red Hat, we’ve always believed that the most powerful technologies are those that are distributed, open, and fit-for-purpose. Small language models (SLMs) represent that exact shift. The distinction between SLMs and large language models (LLMs) is less important than the architectural role the model serves. What matters is the functional sovereignty a small model brings to the table.
We are moving away from a world of conversational AI—where we ask a giant, black-box model a question—and entering the era of agentic AI, where a fleet of specialized models performs the actual work of the business.
Every business will run AI agents
We are on the verge of a shift as fundamental as the transition to the web.
Think back to the evolution of business identity. In 1995, the industry asked, "Why do I need an email address?" In 2005, it was a website. In 2015, a social media presence. In 2026, the question will be, "How many agents do I have running?"
We are heading toward a world where there will be more AI agents than people. Every business will have a swarm of them:
- Customer-facing agents that don’t just answer questions, but solve complex logistics issues.
- Workflow agents that automate the invisible "glue" between departments.
- Headless agents that silently execute API calls to reconcile inventory and process payments.
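To make the headless-agent idea concrete, here is a minimal sketch of a tool-dispatch loop. Everything in it is illustrative: the model call is stubbed out (a real agent would query a locally served SLM over an inference endpoint), and the tool name and arguments are hypothetical.

```python
import json

# Hypothetical tool registry: each tool is a plain function the agent may invoke.
def reconcile_inventory(sku: str, count: int) -> dict:
    """Pretend back-office action: adjust stock for a SKU."""
    return {"sku": sku, "adjusted_to": count}

TOOLS = {"reconcile_inventory": reconcile_inventory}

def stub_model(prompt: str) -> str:
    # Stand-in for a locally served SLM: a real agent would send the prompt
    # to an inference endpoint and receive a structured tool call back.
    return json.dumps({"tool": "reconcile_inventory",
                       "args": {"sku": "A-42", "count": 17}})

def run_agent(prompt: str) -> dict:
    call = json.loads(stub_model(prompt))   # parse the model's tool call
    tool = TOOLS[call["tool"]]              # dispatch to the named tool
    return tool(**call["args"])             # execute it and return the result

print(run_agent("Stock count for SKU A-42 is 17; reconcile inventory."))
# → {'sku': 'A-42', 'adjusted_to': 17}
```

The point of the sketch is that the agent never renders prose: it parses a structured tool call and executes it, which is exactly the kind of work a small, fast model can handle silently in the background.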
But you cannot build a sustainable, cost-effective agentic fleet on someone else's subsidized cloud tokens. This is where the SLM becomes the essential tool for enabling enterprise use cases at scale.
Why SLMs rule the agentic backend
While frontier LLMs are masterpieces of high-throughput engineering, they are often too heavy for the role of a reflexive digital employee. In an agentic workflow, we don’t just need raw power; we need low-latency execution. SLMs let us deliver the sub-second response times and deterministic reliability that business-critical automation demands.
1. The power of specialization (efficiency > scale)
While few organizations would consider fine-tuning a 400B-parameter model, a 3B or 7B model offers a manageable and highly effective entry point. This is where architectural control begins. Research from late 2025 demonstrates that even a 350M-parameter model fine-tuned on high-quality, synthetic data can outperform generalist frontier models in specific tool-calling and API-orchestration domains. For a robust agentic backend, the goal isn't broad, poetic language capability—it is high-precision specialization.
2. Determinism and the "math of reliability"
One of the biggest hurdles for enterprise AI is non-determinism: the risk that an agent formats a response correctly one time and fails the next. While no LLM is a perfectly deterministic mathematical function, SLMs allow us to enforce architectural controls that were previously much harder. By using constrained decoding techniques such as JSON Schema enforcement or context-free grammars (CFGs), we can prune the model’s token search space, making it impossible for the model to choose an invalid next token. This shifts the focus from open-ended magic to schema-constrained accuracy. Combined with local execution and specialized fine-tuning, SLMs can achieve over 98% validity in structured tasks, offering the predictable reliability required for sensitive agentic workflows.
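The pruning idea can be illustrated with a toy greedy decoder. Everything here is a deliberate simplification: the "vocabulary" is five string tokens, the "grammar" is a set of two valid outputs, and `toy_scores` stands in for the model's logits. Production systems instead compile a JSON Schema or CFG into a per-step token mask inside the inference server, but the principle is the same: invalid continuations are masked before sampling, so they can never be emitted.

```python
# The only outputs our toy "grammar" accepts (a real system would compile a
# JSON Schema or CFG into an equivalent token mask).
VALID_OUTPUTS = {'{"status": "ok"}', '{"status": "fail"}'}
VOCAB = ['{"status": "', 'ok', 'fail', 'maybe', '"}']

def allowed_tokens(prefix: str, vocab: list) -> list:
    # Keep only tokens that leave the output on a path to a valid string.
    return [t for t in vocab
            if any(v.startswith(prefix + t) for v in VALID_OUTPUTS)]

def constrained_decode(score_fn, vocab: list) -> str:
    """Greedy decoding with invalid continuations masked out at each step."""
    out = ""
    while out not in VALID_OUTPUTS:
        legal = allowed_tokens(out, vocab)
        scores = score_fn(out)  # stand-in for the model's logits
        out += max(legal, key=lambda t: scores.get(t, float("-inf")))
    return out

def toy_scores(prefix: str) -> dict:
    # This "model" actually prefers the invalid token 'maybe' (0.9 > 0.5),
    # but the mask makes emitting it impossible.
    return {'{"status": "': 1.0, 'maybe': 0.9, 'ok': 0.5,
            'fail': 0.1, '"}': 1.0}

print(constrained_decode(toy_scores, VOCAB))
# → {"status": "ok"}
```

Note that the toy model's highest-scoring continuation after the opening brace is the invalid token `maybe`; the mask simply removes it from consideration, which is why constrained decoding yields validity guarantees rather than validity probabilities.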
3. Data sovereignty is not optional
Your data is your most precious asset. In an agentic world, these models will handle your customer relationship management (CRM), your proprietary code, and your internal strategy. Giving that data away to a third-party cloud provider in exchange for "intelligence-as-a-service" is a strategic mistake.
Running SLMs on-prem or within your own hybrid cloud environment means you remain the owner of your IP. It allows for a "zero trust" AI architecture where sensitive data never leaves your perimeter, fulfilling the strict regulatory requirements common in industries such as healthcare, finance, and government.
Final thoughts
We are transitioning from a world of generative AI (gen AI) producing conversation and content to one of agentic AI taking action on our behalf. In this new era, the question is no longer about which model is the biggest, but which infrastructure is the most reliable and protected. When your business operations depend on a fleet of specialized digital agents, the "black box" cloud model is no longer enough. You need sovereignty, speed, and precision.
At Red Hat, we believe the path to the agentic future is open. By leveraging curated small language models that can be fine-tuned, served, and orchestrated with the Red Hat AI portfolio, enterprises can move AI out of the lab and into the core of their business logic.
The space is moving fast, but the goal is clear: stop chasing the giants and start building the backbone. The future of AI is small, fast, and built on the open hybrid cloud.
About the authors
Catherine Weeks is an Engineering Director in Red Hat AI, where she leads the teams building software with the latest generative AI innovations.
With a background in software design, Catherine is a leader who excels at translating complex customer needs into practical engineering solutions. She is known for her ability to work at every level—from high-level strategy down to the hands-on work of getting it done. This approach helps her balance the fast-moving world of AI innovation with the need to build the reliable, high-quality products customers depend on, all while fostering a supportive team culture.
With over 20 years in the software industry, Catherine has a proven record of mentoring strong teams and has always been a champion for the end-user.
Ricardo is a Principal Software Engineer in Red Hat's Office of the CTO, serving as an initiative lead in the Emerging Technologies organization. He is currently focused on different architectures in the AI space, such as SLMs and multimodality, and has been part of the MicroShift and Edge Manager projects since their inception.
He is a former member of the Akraino Technical Steering Committee and Project Technical Lead of the Kubernetes-Native-Infrastructure blueprint family. He has done R&D related to OpenStack and contributed to the OpenDaylight and OPNFV projects. He is passionate about new technologies and everything related to the open source world. Ricardo holds an MSc degree in Telecommunications from the Technical University of Madrid (UPM). He loves music, photography, and outdoor sports.