As AI workloads move from experimental prototypes into production environments, enterprises face a familiar challenge—how do you protect, manage, and govern these new components with the same rigor you apply to traditional software applications? A key piece of the puzzle lies in something your organization likely already uses extensively—containers, specifically Open Container Initiative (OCI) containers.

What is the Open Container Initiative?

The Open Container Initiative defines open specifications for image formats, container runtimes, and distribution, helping organizations avoid vendor lock-in. OCI containers are an industry-standard format for packaging software applications, enabling them to run consistently across different environments, container engines (like Docker or Podman), and cloud platforms.

An OCI artifact is similar to a container image, but instead of an executable image, an artifact stores other content, such as files and directories. OCI-compatible artifact repositories (including Quay, Artifactory, Docker Hub, and registries from the major cloud providers) can store and manage versioning of OCI containers and artifacts.

OCI provides a standardized and portable way to package and distribute software. By packaging your AI models, Model Context Protocol (MCP) servers, and AI agents using OCI containers, you can use your existing software supply chain security processes, CI/CD pipelines, and container orchestration infrastructure. This approach brings the same governance and auditability to your AI stack that you already apply to your application workloads.

Containerizing AI models with ModelCar

Large language models (LLMs) and other AI models present unique packaging challenges. They consist of large binary files, configuration metadata, and specific file structure requirements. In the past, organizations have relied on S3-compatible object storage to distribute models, but this approach creates friction with existing container-based workflows and security processes. We recommend building your AI models into OCI containers using a specific file structure we call ModelCar.

What is a ModelCar container?

A ModelCar container is straightforward: AI model files are placed in a /models folder within the container. No additional packages or runtime components are required; the container simply holds the model artifacts in an OCI-compliant format.
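As a minimal sketch, a ModelCar image can be built with a two-line Containerfile; the local model directory name here is illustrative:

```dockerfile
# Minimal ModelCar sketch: the image carries only model files under /models.
# "my-model/" stands in for your downloaded model directory.
FROM scratch
COPY my-model/ /models/
```

Building and pushing the image then uses the same tooling as any other container, for example `podman build` followed by `podman push` to your internal registry.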

The benefits of this approach are significant. Once your model is packaged as a container, you can manage it using the same software supply chain security processes you already use for your application containers. You can generate a software bill of materials (SBOM) and AI bill of materials (AI-BOM) for the container as a task in your CI/CD pipeline. You can also sign and validate the container, generate provenance attestations showing that the container was built by your trusted build system, store the container in your internal OCI artifact repository, and configure your deployment policies to only pull containers from approved repositories.

Red Hat Trusted Artifact Signer gives you the ability to sign models and to validate a model's authenticity and transparency using Sigstore and Rekor.
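For example, a model image can be signed and then verified with Sigstore's cosign CLI; the registry path and key file names below are illustrative:

```shell
# Sign the ModelCar image with a private key, then verify it with the public key.
cosign sign --key cosign.key quay.io/example/modelcar:1.0
cosign verify --key cosign.pub quay.io/example/modelcar:1.0
```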

Red Hat OpenShift AI 2.14 and later versions support serving models directly from ModelCar containers using KServe, eliminating the dependency on S3-compatible storage entirely. This simplifies deployment, improves inference server startup times, especially when the container is cached on a node, and provides a unified approach to artifact management across your organization.
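As a sketch of what this looks like in practice, a KServe InferenceService can reference a ModelCar image with an `oci://` storage URI; the names and model format below are assumptions for illustration:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: example-model
spec:
  predictor:
    model:
      modelFormat:
        name: vLLM
      # Pull the model directly from the ModelCar image, no S3 bucket needed.
      storageUri: oci://quay.io/example/modelcar:1.0
```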

A catalog of example ModelCar containers is available on GitHub, providing templates and best practices for packaging various model types.

Model size considerations

While the OCI specification does not impose hard limits on image sizes, practical constraints exist. Container registries typically support images ranging from several GB to tens of GB, with most enterprise registries handling images up to 15-20 GB without issues. For very large models that exceed these practical limits, you may need to consider model quantization techniques to reduce file sizes, or alternative distribution mechanisms. However, for the majority of production models—especially quantized variants like 8-bit Floating Point (FP8) or 4-bit integer (INT4)—containerization with ModelCar is both practical and recommended.

OCI artifacts for models

ModelCar uses OCI containers for maximum compatibility with older systems that don’t fully support OCI artifacts, but OCI artifacts are arguably a better, more efficient choice for model storage. Build engineers can package their models into OCI artifacts instead of containers and store them in their OCI artifact repository. At deployment time, you can mount the OCI artifact directly to your model-serving Pod as storage, using a Kubernetes image volume.
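A sketch of this mount, using the Kubernetes image volume source (a beta feature in newer Kubernetes releases); the image references and container name are illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: model-server
spec:
  containers:
  - name: server
    image: quay.io/example/inference-server:1.0
    volumeMounts:
    - name: model
      mountPath: /models     # model files appear here, read-only
      readOnly: true
  volumes:
  - name: model
    image:
      reference: quay.io/example/model-artifact:1.0
      pullPolicy: IfNotPresent
```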

Containerizing MCP servers for enterprise deployment

MCP has emerged as a standard way to connect AI assistants and agents with external tools, data sources, and APIs. MCP servers act as bridges between AI systems and your enterprise resources, making their security and governance critically important.

For MCP servers that will be shared across teams or deployed in production environments, we recommend building them into containers using the same build tools and processes you use for your other applications. This approach provides consistency in how you manage, deploy, and protect these components. The process is familiar to anyone who has containerized applications: write a Containerfile, build the image, push it to your registry, and deploy using Red Hat OpenShift, Kubernetes, Podman, or Docker. You can use your typical build tools to sign and verify MCP servers, generate an SBOM, store them in OCI artifact repositories, and so on.
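A minimal Containerfile sketch for a Python-based, HTTP-transport MCP server; the base image, file names, and port are assumptions:

```dockerfile
# Sketch: containerize an HTTP-transport MCP server written in Python.
FROM registry.access.redhat.com/ubi9/python-312
WORKDIR /opt/app-root/src
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY server.py .
EXPOSE 8000
CMD ["python", "server.py"]
```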

Like any software, it is possible for malicious code to make its way into an MCP server. Analyze and continuously validate your MCP servers using Red Hat Trusted Profile Analyzer to identify vulnerabilities, malicious dependencies, and policy violations.

Benefits of containerized MCP servers

Containerized MCP servers can be scaled horizontally to handle increased load, monitored using standard observability tools, and governed through your existing security policies.

The OpenShift Kubernetes MCP server demonstrates this pattern. It can run locally for development or be deployed in-cluster using the Streamable HTTP transport for team access. The server supports configurable access modes (read-only, non-destructive, or full access) and integrates with Kubernetes role-based access control (RBAC) for authorization.

An ecosystem is already growing around containerized MCP servers. For example, the MCP lifecycle operator facilitates deploying containerized MCP servers and connecting them to your agents through Kubernetes-native configuration. The Kuadrant MCP Gateway provides advanced enterprise security features. Other popular tools like Docker also work with MCP servers in containers.

When not to containerize your MCP servers

Not all MCP servers benefit from containerization. The MCP specification supports 2 primary transport mechanisms—stdio (standard input/output) and HTTP-based transports (including Streamable HTTP and the legacy Server-Sent Events (SSE) transport). This distinction matters for deployment decisions.

Stdio-based MCP servers communicate through process streams, with the AI client spawning the server as a child process. This model works well for single-user scenarios—a developer's coding assistant, a local productivity tool, or a personal automation script. In these cases, the MCP server runs on the user's laptop, accesses local files and resources, and terminates when no longer needed. Containerizing stdio MCP servers adds complexity without significant benefit for these single-user, local use cases.

HTTP-based MCP servers, by contrast, run as independent processes that can handle multiple client connections concurrently. They expose network endpoints and operate more like traditional web services. These servers are natural candidates for containerization and benefit from centralized deployment, scaling, and management.

The decision framework is as follows: 

  • For shared/production environments: If your MCP server will be shared across a team, accessed over a network, or deployed to a server environment, containerize it.
  • For containerized agents: If your agent is running in a container, a stdio-based MCP server should run in the same container as the agent, while an HTTP-based MCP server should run in a separate container.
  • For single-user, local use: If an MCP server runs locally on a single developer's machine using stdio transport, containerization is optional and may add unnecessary overhead. 

Containerizing Agent Skills

Agent Skills have emerged as an alternative and complement to MCP servers. They are "folders of instructions, scripts, and resources that agents can discover and use to do things more accurately and efficiently." The specification for skills has them packaged as zip files.

You can extend your build system to package your skills into OCI containers or artifacts, just like models and MCP servers. Then, you can sign and verify skills, generate an SBOM, store them in OCI artifact repositories, download them, and attach them to Pods just like models. Because skills can contain scripts that might be platform-specific, OCI also has facilities to provide a different set of files for each platform. If your skill-enabled applications can’t work with OCI containers or artifacts yet, you can use the ORAS command line utility to extract a skill to the correct directory. 
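For example, the ORAS CLI can pull a skill stored as an OCI artifact into a local directory; the repository path and target directory below are illustrative:

```shell
# Pull the skill artifact and unpack it where the agent looks for skills.
oras pull quay.io/example/skills/report-writer:1.0 \
  --output ~/.config/agent/skills/report-writer
```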

Alternatively, skills could be managed using package managers in the future, similar to how other shared libraries are managed for various programming languages. In this case, skills would be imported and used within containers, but not necessarily distributed as containers themselves. 

This is an emerging technology, so watch for future developments! 

Containerizing AI agents

AI agents—autonomous systems that can plan and execute multistep tasks using AI models and other tools—represent the next evolution of AI applications. As agents move from prototypes to production, enterprises need a structured approach to building, deploying, and managing them.

The Kagenti project provides a Kubernetes-native framework for exactly this purpose. Kagenti works with any agent framework or SDK and provides modular components for production deployment. At its core, Kagenti treats agents as containerized workloads that can be defined declaratively using Kubernetes custom resources.

Kagenti uses Shipwright and Buildah to build agents into containers. If your organization uses Tekton or Jenkins for CI/CD, you can add a similar Buildah task to your existing pipelines. As with AI models and MCP servers, you can use your typical build tools to sign and verify agent containers, generate an SBOM, store them in OCI artifact repositories, and so on.

Like MCP servers, containerized agents can be scaled horizontally to handle increased load, monitored using standard observability tools, and governed through your existing security policies.  You can also analyze and continuously validate your agents using Red Hat Trusted Profile Analyzer to identify vulnerabilities, malicious dependencies, and policy violations.

Single-user agents and sub-agents

Similar to MCP servers, not all agents require containerization. Agents that run locally on a single user's laptop—such as sub-agents spawned by a coding assistant or personal automation agents—may not need the overhead of container packaging and Kubernetes deployment. These lightweight agents often run as child processes of a parent application and share the security context of that application.

For these single-user scenarios, the focus should be on verifying the parent application (the IDE, coding assistant, or automation tool) is appropriately protected, rather than containerizing every sub-component. Enterprise management of these local agents is an evolving area, and organizations should monitor developments in agent frameworks and tooling.

Containers for sandboxing

Because agents, MCP servers, and models that write and execute code introduce new vulnerabilities, a best practice is to restrict their access to systems and contain the damage they can do. This is sometimes called an "agent sandbox" or "code sandbox." 

Containerized software can be deployed with network policies that restrict communication with outside services, from opening and blocking ports to allow-listing specific websites and services. Kubernetes RBAC and service mesh capabilities provide fine-grained control over access. OpenShift containers typically run without root permissions, restricting their access to data and compute resources.
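As a sketch, a default-deny egress NetworkPolicy for an agent's Pods might look like the following; the labels are illustrative, and allow rules for approved services would be added as needed:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: agent-deny-egress
spec:
  podSelector:
    matchLabels:
      app: my-agent
  policyTypes:
  - Egress
  egress: []   # no egress rules listed: all outbound traffic is denied
```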

Containers also have limits on their CPU and memory usage. On developer workstations, Podman runs its containers without root permissions by default, and it also restricts the container’s access to your network and file system. OpenShift has long offered container isolation, and the Red Hat build of Podman Desktop enables container isolation on developer workstations as well.

Another concern is runaway processing by agents, leading to a denial-of-service attack. With OpenShift, cluster administrators can set resource quotas for each namespace, limiting the GPU, CPU, memory, and storage resources that any one project can consume. With reasonable resource quotas in place, a runaway agent cannot starve other workloads for cluster resources.
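A sketch of such a namespace ResourceQuota, with illustrative values, might look like this:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: agent-quota
spec:
  hard:
    requests.cpu: "8"            # total CPU requested across the namespace
    requests.memory: 32Gi        # total memory requested
    requests.nvidia.com/gpu: "2" # cap on GPUs any one project can claim
    requests.storage: 200Gi      # total persistent storage requested
```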

While they can run on a personal laptop, we have found that running coding assistants and personal agents in a container is often worth the effort. When an agent runs in a sandboxed container, it can’t damage the valuable documents on your laptop or access data that you don’t want it to use. This means that you can pre-approve common actions like file read/write, then let the agent run with less supervision, and simply review the end result of the assigned task.
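For example, a coding agent can be run in a Podman sandbox that cuts off network access and limits it to a single project directory; the image name, paths, and resource limits are illustrative:

```shell
# Rootless container, no network, only ./workspace mounted read-write.
podman run --rm -it \
  --network=none \
  --memory=4g --cpus=2 \
  -v ./workspace:/workspace:Z \
  example.com/my-agent:latest
```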

You can also start up several agents at the same time and let them do their work in parallel, without interfering with one another. You can deploy long-running agents to your personal namespace in OpenShift, close your laptop, and go home while the agents continue working for you. You can even save the agent’s container to preserve its state and resume it later. 2 emerging projects in this area are paude for coding agents and openclaw-installer for OpenClaw.

Benefits of containerizing AI workloads

Containerizing your AI components—models, MCP servers, and agents—brings significant benefits that compound across your organization.

Software supply chain security

Containers can be signed, attested, and verified before deployment. You can require that all AI components are built by your trusted CI/CD systems, scanned for vulnerabilities, and stored in approved registries. Provenance attestations provide an audit trail showing exactly how each artifact was produced.

Version control and rollback

Container images are immutable and tagged. You can deploy specific versions, roll back to previous versions if issues arise, and maintain a clear history of what was deployed and when.

Consistent deployment

The same container image runs identically across development, staging, and production environments, reducing the risk of files being copied incorrectly from one environment to another.

Observability

Containerized workloads integrate with existing monitoring and logging infrastructure. Kagenti, for example, supports OpenTelemetry (OTEL) tracing out of the box, allowing you to monitor agent operations using your standard observability stack.

Isolation and access control

Finally, containers offer rich capabilities for sandboxing, helping control the blast radius of any problems.

Looking ahead: Workload identity and zero trust

Containerization also lays the groundwork for more advanced security patterns. The Kagenti project is exploring integration with SPIFFE/SPIRE for workload identity and zero trust architecture for agents and MCP servers. While these capabilities are still emerging, having your AI components containerized and running on Kubernetes makes adopting these security features significantly easier as they mature.

Workload identity makes sure that each agent or MCP server has a cryptographically verifiable identity, enabling secure service-to-service communication without shared secrets. Zero trust principles—never trust, always verify—become practical to implement when your AI components are containerized and can be integrated with your existing identity and access management infrastructure.

Final thoughts

OCI containers provide a proven, standardized approach to packaging and distributing software that extends naturally to AI workloads. By containerizing your AI models, MCP servers, and agents, you bring the same governance, security, and operational maturity to your AI stack that you already apply to your applications.

The key insight is recognizing when containerization adds value. Containers are essential for shared, networked, production deployments. For single-user, local usage, containers are optional, but they offer sandboxing and portability across devices: build once, run anywhere. This pragmatic approach lets you apply the right level of governance to each component while avoiding unnecessary complexity.

Red Hat OpenShift AI, Red Hat Trusted Artifact Signer, Red Hat Trusted Profile Analyzer, and the Red Hat build of Podman provide a solid foundation for managing your AI workloads from prototype to production. Red Hat adds enterprise support to open source tools with active communities to help you keep pace with the latest developments in AI.


About the author

I've been a software engineer for 20+ years, I was a manager for 3 years, and I've been a security focal since 2018. Today I'm an AI architect and strategy lead, focused on helping developers, AI engineers, and platform engineers adopt AI into enterprise applications. In the past, I've worked in research, consulting, web portal development, IT systems management development, cloud computing, hybrid cloud, deployment automation, web platform development and operations, developer tools for Kubernetes, DevOps, SRE and platform engineering.

My specialties are leveraging artificial intelligence, AI Engineering, DevOps, cybersecurity, platform engineering, continuous delivery, cloud computing, distributed systems, agile development, scaling microservices, and high availability / disaster recovery for services.

In my free time, I enjoy reading, scuba diving, travel, games, and having fun with my husband, two daughters, and the family dog.
