Platform engineering for AI agents ft. Tushar Katarki
As we move from chatbots to autonomous AI agents, complexity is exploding. Red Hat’s Tushar Katarki joins Chris Wright to discuss building a "Kubernetes for Agents," the importance of the Model Context Protocol (MCP), and how to engineer platforms that get AI out of PoC purgatory and into production.
Transcript
00:00 - Chris Wright
A decade ago, the enterprise shifted from monolithic applications to microservices, creating an explosion of complexity.
00:08 - Chris Wright
The open source community's answer was Kubernetes, a standardized platform for the cloud designed to manage this new distributed environment.
00:17 - Chris Wright
Today, we're facing a similar moment with the rise of agentic AI. We're moving beyond simple chatbots to build complex autonomous systems. And once again, we're seeing fragmentation and complexity. How do you actually build, deploy, and manage these multi-part applications reliably at scale?
00:37 - Chris Wright
The agentic era needs its own Kubernetes. Today we're diving into the critical infrastructure needed for the next wave of AI.
00:46 - Chris Wright
Our guest is a fellow Red Hatter, Tushar Katarki, who is focused on how we can build a scalable, open platform for agentic AI. Welcome to Technically Speaking, where we explore how open source is shaping the future of technology. I'm your host, Chris Wright.
01:03 - Chris Wright
Tushar, thanks for joining me. And before we dive into agentic AI and all of the exciting stuff there, let's just talk a little bit about your journey. You've kind of been at the center of a bunch of important transformations in the enterprise, cloud being a prominent example there, and Kubernetes. So, what's your kind of background and experience, and how do you see this sort of bringing you into this AI space?
01:26 - Tushar Katarki
Yeah, thanks. Thanks for having me, Chris. And it's lovely to see you and talk to you. And yeah, I've been in the infrastructure software space for probably close to two decades now, and so, I mean, without going too much into history, I'll just start with my experience at Red Hat. How about that?
01:46 - Tushar Katarki
Right, so, you know, I worked on everything at Red Hat. That's one of the nice things I like about Red Hat. I worked on everything from high performance computing and grid computing to, you know, Kubernetes and OpenShift and storage and now AI. So I think that's one of the exciting things, how all of those have laid a solid foundation for what AI infrastructure should look like, and this is a great conversation to have with you on that.
02:15 - Chris Wright
Yeah, I mean, grid computing, there's a distributed element there. Certainly Kubernetes is distributed, bringing Linux to clusters, so there's an interesting connection between those two, especially when you think about grid computing and high performance computing starting to leverage GPUs, and then, you know, the early days of CUDA to today, where we are in a very different world with these generative AI workloads. And you kinda had a front row seat to all that.
02:40 - Tushar Katarki
Yeah, no, absolutely. I mean, you know, like when I joined Red Hat first, you know, cloud was just beginning to happen with AWS, so there was kind of elastic compute happening. Then there was this whole thing where, as you said, GPUs were coming into the market, obviously for graphics first, and then for high performance computing.
03:05 - Tushar Katarki
And then on the hardware side, starting there, there has been a lot of change, you know, since then, right? Like those were GPUs, graphic processing units, but now we have TPUs, now we have, you know, AI chips per se, and then on the software side, the elastic cloud, the elastic compute with cloud has yielded to containers and Kubernetes and that revolution, right? Like so, and I certainly have been in the middle of all of that. And so I think it has been, you know, and then even on the data side, in fact, when you think about it, I mean, you know, SQL databases have led to NoSQL databases, you know, vector databases now for AI.
03:39 - Tushar Katarki
So I think all that has laid a solid foundation. I feel like I've been involved in a lot of different things, including Kubernetes and OpenShift. So yeah.
04:04 - Chris Wright
That's awesome. I'm an operating system person, maybe, at heart, so it's easy to get excited about the hardware and the changes that you described at the low level, but we can't forget the systems architecture view. You know, we are talking about distributed systems and that transition from traditional software architectures to more microservices architectures, with containers being that unit of deployment. We're starting to see some interesting parallels in the AI space. And certainly you can deploy a container with a model in it. You can think about what it means to distribute that content across multiple GPUs, either on one server or on multiple servers, and ultimately getting all of this into production. So we've seen a lot of shift from the traditional application side. How are you seeing that inform some of the work that we're starting on the AI side?
04:59 - Tushar Katarki
Yeah, I mean, that's a great parallel to draw, right? You know, let's step back and think about what a generative AI application is. I mean, you can think about it as a RAG application, or you can generalize it more as an agent or even an agentic system. And when I think about it, those involve, you know, some Python code, right? If I have to describe an agent: some Python code which is making a call to a large language model endpoint, you know, it's probably making some tool call to do some action or just read some external entity, and then it probably needs some memory and state so that it can recall context. So, starting with just that, there is a piece of Python code that glues it all together. So, that's a container, right? That's Python code which can be containerized. As you said, the language model itself can be containerized, and especially in the enterprise context, you know, we get asked about that a lot, right? Like, hey, I can download this model from Hugging Face, that's great, but we are an enterprise and we need to have some kind of provenance about where we can download the model from, and, you know, establish some kind of security and safeguards around it before we start using it in an enterprise context. So I can start kind of seeing those parallels. And then you mentioned distributed systems also with that, right?
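For illustration, here is a minimal Python sketch of that pattern: a little glue code that calls an OpenAI-compatible chat endpoint, offers one tool, and keeps conversation memory. The endpoint URL, model name, and tool are placeholders, not any particular product or framework.

```python
import json
import requests

# Placeholder, in-cluster model-serving endpoint (OpenAI-compatible chat API assumed).
LLM_ENDPOINT = "http://models.internal/v1/chat/completions"
MODEL = "example-model"  # placeholder model name

def lookup_order(order_id: str) -> str:
    """Stand-in 'tool': reads state from some external enterprise system."""
    return f"Order {order_id}: shipped"

TOOLS = [{"type": "function", "function": {
    "name": "lookup_order",
    "description": "Look up the status of an order",
    "parameters": {"type": "object",
                   "properties": {"order_id": {"type": "string"}},
                   "required": ["order_id"]}}}]

def agent_turn(user_message: str, memory: list) -> str:
    # "Memory" is just the running message list, replayed on every call
    # so the model can recall earlier context.
    memory.append({"role": "user", "content": user_message})
    reply = requests.post(LLM_ENDPOINT, json={
        "model": MODEL, "messages": memory, "tools": TOOLS,
    }).json()["choices"][0]["message"]
    memory.append(reply)

    # If the model requested the tool, run it and let the model finish the answer.
    if reply.get("tool_calls"):
        for call in reply["tool_calls"]:
            args = json.loads(call["function"]["arguments"])
            memory.append({"role": "tool",
                           "tool_call_id": call["id"],
                           "content": lookup_order(**args)})
        reply = requests.post(LLM_ENDPOINT, json={
            "model": MODEL, "messages": memory,
        }).json()["choices"][0]["message"]
        memory.append(reply)
    return reply.get("content") or ""
```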
06:28 - Tushar Katarki
So you've got, I mean, two things. One is the large language model itself. You know, there could be many models, there could be very large models that don't fit into one GPU's memory. So you have to think about how you distribute these model weights across all these different GPUs. That's kind of one part of it, which has led to things like llm-d, for example, which we can talk about later. But also, you know, how do you scale the Python itself, the runtime itself? And that's where we have Kubernetes and pods and deployments and replicas and so on and so forth.
07:07 - Chris Wright
So basically an API and a simple application over an LLM. You started bringing agents into this picture and describing an agent as an application, essentially wrapped around a large language model. So not unlike a chatbot, but it's doing more, it's got a mission in life. It is connecting with the rest of the enterprise to bring in data, tool calling, you know, querying applications within the enterprise. This is where I think we're really starting to see the year of the agent. Probably next year will also be the year of the agent. Really, enterprises are trying to make this useful for themselves. And what are some of the challenges that you see in that context of bringing, you know, agents and models and enterprise context together?
07:58 - Tushar Katarki
Yeah, I mean, that's a great question. I mean, so first of all, a few things. One is, let's just start with what an agent is, as you just said, right? The most fundamental thing really is that generative AI is probabilistic, not deterministic. So I think that's one of the fundamental things to kind of grapple with, so to speak. The same prompt may not give you the same answer, although semantically it might be the same. Obviously there could be hallucinations also, but even if it's semantically the same, in the context of an application you can't have a JSON output that is different, because if you're gonna call a tool later on, I mean, you can't pass an arbitrarily structured JSON output to that tool, right? So I think those things, just in terms of what language models and generative AI are, have their own challenges. I think then beyond that, now you are talking about, as you said, there's a tool call involved here, right?
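One common way to cope with that non-determinism is to validate the model's output against the exact structure the downstream tool expects before calling it. A small sketch using only the Python standard library; the expected fields are made up for illustration:

```python
import json

# Illustrative schema: the exact fields the downstream tool expects.
EXPECTED_FIELDS = {"tool": str, "order_id": str}

def parse_tool_call(raw_model_output: str) -> dict:
    """Reject model output that is not the exact JSON structure the tool expects."""
    try:
        payload = json.loads(raw_model_output)
    except json.JSONDecodeError as err:
        raise ValueError(f"model did not return JSON: {err}")
    for field, field_type in EXPECTED_FIELDS.items():
        if field not in payload or not isinstance(payload[field], field_type):
            raise ValueError(f"missing or mistyped field: {field!r}")
    return payload

# A caller would typically retry the model, or fall back to a default, when
# ValueError is raised, rather than passing an arbitrary structure to the tool.
print(parse_tool_call('{"tool": "lookup_order", "order_id": "A-1234"}'))
```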
09:03 - Tushar Katarki
You know, sometimes the model itself doesn't know anything about certain context, and so that context is either provided from a vector database, which happens in a RAG kind of application, or, in the case of agents, that context might be provided by a database, which can sit behind an MCP server or what have you, right? And then beyond that, there is kind of the real enterprise setting, right? Like, you know, everything from, what about security, what about access control? How do I provide fine-grained access control? How do I make sure that this tool call is not accessing parts of the enterprise that I do not want it to access, and how do I give it access to certain things only? Then beyond that you have, you know, how do I protect the agentic system from prompt injection attacks, jailbreaks, and so on, right? And then beyond that, you know, you've gotta deploy all this at scale, with lots of models, lots of agents. You know, how do I observe it? How do I make sure that I'm collecting the right metrics from an operator's point of view? How do I respond to audit questions?
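As a rough illustration of that kind of fine-grained control, an agent runtime might check every requested tool call against a per-agent policy before executing it. The policy table, agent names, and tool names below are hypothetical:

```python
# Hypothetical policy: which tools each agent identity may use, and whether it
# may only read through them or also take actions with them.
TOOL_POLICY = {
    "support-agent": {"lookup_order": "read", "issue_refund": "deny"},
    "finance-agent": {"lookup_order": "read", "issue_refund": "write"},
}

def authorize_tool_call(agent_id: str, tool_name: str, action: str) -> bool:
    """Return True only if this agent may perform this action with this tool."""
    allowed = TOOL_POLICY.get(agent_id, {}).get(tool_name, "deny")
    if allowed == "deny":
        return False
    if allowed == "read" and action != "read":
        return False
    return True

# The runtime would call this before executing any tool call the model requests.
assert authorize_tool_call("support-agent", "lookup_order", "read")
assert not authorize_tool_call("support-agent", "issue_refund", "write")
```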
10:18 - Chris Wright
I love, I mean, you really gave us a lot of things to think about there, and I love recognizing that, just as we started with a simple monolithic application and it turned into more distributed applications, there's many of them. There's no enterprise that's run by a single app, it's run by many. We start with something simple like a chatbot, maybe even a single agent. In the fullness of time, you see this very complex system: many agents, many models, a lot of data sources, many tools to call, lifecycle management, and managing the dependencies across all of those things becomes this very complicated distributed system. Which is where, you know, we're so excited about what Kubernetes can bring to this world, having learned a lot from the application space. What are some of the core tools that you're most interested in in the agentic AI space that I think help advance that picture and tame the complexity?
11:17 - Tushar Katarki
You know, just from a frameworks point of view, you know, AI developers love using different tools because they have different capabilities and they're familiar with them, right? That's what they've developed with. I think one thing to think about is, okay, you have developed these using different agent frameworks; what does deployment in a real production enterprise setting mean? You know, so that's one, and that's kind of where, what I am excited about really is Llama Stack, you know, which is something that, you know, as a company, but also individually, we have talked about the need for, just starting to draw that analogy with Kubernetes. You know, what is Kubernetes? It's got that API and primitive substrate or layer for cloud-native applications. Similarly, you know, Llama Stack could be that API layer for agentic applications. And then, so that's one part of it. So now, what does that API provide for, right? Like, you know, we talked earlier about what an agent needs. It needs one or more endpoints to call, it needs one or more tools that it can call, it needs memory, it needs safety, it needs some kind of telemetry. I mean, those are five I can think of right off the top of my head, and Llama Stack has those APIs and primitives to describe that. So, beyond that, there is also tool calling, which we mentioned, and that's where I think, you know, the Model Context Protocol from Anthropic, which launched earlier this year, has gotten so much traction in the community, and there are so many thousands of MCP servers that people have written now. And so certainly that's a point of interest, MCP is definitely a big one, you know, to enable tool calling. And now we are talking about using MCP for so many different things, you know, from calling SaaS applications to calling databases, and also for taking action.
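To make that tool-calling layer a bit more concrete: MCP is a JSON-RPC 2.0-based protocol, so at the wire level a client discovers and invokes tools with requests roughly like the sketch below. The server URL, tool name, and arguments are placeholders, and a real client would normally use an MCP SDK, which also handles initialization and transport details.

```python
import requests

MCP_SERVER = "http://mcp-inventory.internal/mcp"  # placeholder MCP server endpoint

# Discover the tools the server exposes (JSON-RPC 2.0 method "tools/list").
list_request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

# Invoke one of those tools ("tools/call") with structured arguments.
call_request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {"name": "lookup_order", "arguments": {"order_id": "A-1234"}},
}

for payload in (list_request, call_request):
    print(requests.post(MCP_SERVER, json=payload).json())
```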
13:38 - Chris Wright
I think MCP becomes a prominent, central actor in this whole picture because it's a bridge between the model and the state that's captured in data sources, tools, and applications within the enterprise, which of course the large language models weren't trained on. So we need to pull that context into the model in some way, shape, or form. It's interesting to think about agent-to-agent communication, even taking it a little further in the sense of agents having identity, and that identity being part of a security framework. If an agent's working on my behalf, I'm delegating some kind of authority. You know, we have a service mesh in the application world; do we have an agent mesh that is important to highlight in this context of agentic workflows?
14:33 - Tushar Katarki
Right, like how do I make sure that there is a person behind this identity, right? Like behind this agent identity. Otherwise, if something goes wrong or if I have to respond to a compliance request, how do I make sure that I know who to call, which human being to call? So there is kind of that aspect to it. Then, in terms of what we are doing really, I mean, obviously the MCP community also is taking it very seriously, and there are new, quote-unquote, "RFCs", if you will, for lack of a better word, that they're working on for how to manage identity in the context of MCP and MCP servers. And, I mean, Red Hat also has joined those communities and is participating in that. And the second part of it really is the agent-to-agent one that you talked about. In some ways I think that some of the work that MCP is doing hopefully can translate to that, because in some ways you could argue that calling a tool and calling an agent could kind of reuse the same constructs and primitives, et cetera. Whether you call a tool or whether you call another agent, ultimately, as we said, we are exchanging context, either from an external source or from a different agent. So that's kind of one way to think about it. And then, yeah, you know, in an agentic system with many agents there is this concept of an agent mesh that you can imagine, although I personally have not seen it yet, Chris, but I think that is a natural evolution I can see, just as Istio and service mesh and all these came about in the world of containers, and really more than containers, microservices. And, you know, the separation of concerns that naturally brings between the AI developer and their innovation and what they want to do, versus the platform engineering, the operations person, and what they want to do. So I think that really, the platform is what makes that happen.
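One common pattern for keeping a person accountable behind an agent identity is to propagate a delegated user credential alongside the agent's own identity on every downstream call, so actions can be traced back to a human. A simplified sketch; the header names and token handling here are illustrative, not a specific standard:

```python
import requests

def call_tool_as(agent_id: str, user_token: str, tool_url: str, payload: dict) -> dict:
    """Attach both the agent's identity and the delegating user's credential,
    so downstream services (and auditors) can trace every action to a person."""
    headers = {
        "Authorization": f"Bearer {user_token}",   # the human the agent acts on behalf of
        "X-Agent-Identity": agent_id,              # illustrative header, not a standard
    }
    return requests.post(tool_url, json=payload, headers=headers).json()
```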
16:50 - Chris Wright
Observability, I think, is really important. As systems get more and more complicated, we have to be able to look inside and understand what's happening. There's an audit and compliance aspect to that, and there's just a debuggability, you know, how do you root cause an issue, side of that. What are you seeing as the really important aspects of observability when you think about all these pieces, the MCP servers, the agent itself, the models? What are the key pieces?
17:15 - Tushar Katarki
Yeah, I mean, I think from an observability point of view, a few things, right? One is, when I think about it, the model input and output has to be observed, both from a metrics and a time-series point of view, right? So that's kind of where we are. I mean, I'm now stepping back and thinking about where llm-d and others play a role here, right? An enterprise building an agentic system needs access to a lot of different models, right? And so that's kind of where Model-as-a-Service and llm-d and those constructs come into play. But there is kind of this gateway API or AI gateway layer where there is a need for observing token metrics, you know, everything from time to first token, to inter-token latency, end-to-end latency, throughput or requests per second, et cetera, those kinds of metrics at that layer. That's definitely where I see a lot of innovation happening, and certainly I'm in the middle of all that right now. And you could also argue that there's some amount of governance that needs to happen there too, in terms of input and output filtering, in terms of safety, you know. Can I have some kind of enterprise-level input and output filtering, just to guard both against input attacks but also against leaking enterprise data and so on and so forth? So that's one part of it. Then, you know, the other one over there still is, how do I enforce quotas? You know, like how do I do rate limiting, and how do I enforce quotas at that token layer, and how do I assign costs, right? So those are some aspects, and again, they build upon each other. So you can think about metrics, logs, and traces as kind of the fundamental building blocks, both at the infrastructure, the GPU and compute layer, and at the API gateway layer. Then the cost and other things are overlaid on top of that. So that's kind of one way to think about this. But in terms of observability, so that's at the AI gateway layer; then, you know, there's also the observability for the operator, because if I'm an operator, right, my concern is about my whole entire distributed system, my whole entire Kubernetes cluster. How are my GPUs being utilized? What's the utilization on them, et cetera? So I have my concerns at that level, and then there are the individual applications being deployed on top of that, and that's at that gateway layer, and everywhere in between, is what I'd say. And then that ties back to what you were saying about the mesh also, right? I'll just say there is a lot of exciting stuff, but I don't think we have figured all of that out yet.
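For the gateway-level token metrics mentioned here, time to first token, inter-token latency, end-to-end latency, and throughput can all be derived by timestamping the chunks of a streaming completion. A rough Python sketch against a generic streaming endpoint; the URL, model name, and payload shape are placeholders:

```python
import time
import requests

# Placeholder AI gateway / model endpoint that streams completions.
STREAM_ENDPOINT = "http://gateway.internal/v1/completions"

def measure_token_metrics(prompt: str) -> dict:
    start = time.monotonic()
    timestamps = []
    with requests.post(STREAM_ENDPOINT,
                       json={"model": "example-model", "prompt": prompt, "stream": True},
                       stream=True) as resp:
        for chunk in resp.iter_lines():
            if chunk:  # each streamed chunk carries one or more tokens
                timestamps.append(time.monotonic())
    ttft = timestamps[0] - start
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    total = timestamps[-1] - start
    return {
        "time_to_first_token_s": ttft,
        "avg_inter_token_latency_s": sum(gaps) / len(gaps) if gaps else 0.0,
        "end_to_end_latency_s": total,
        "throughput_chunks_per_s": len(timestamps) / total,
    }
```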
20:29 - Chris Wright
All right, so you laid out a lot of the complexity there, a lot of different pieces, the MCP servers, the agents, the frameworks, the guardrails; all of these things almost feel a little overwhelming. And I know in the beginning we're thinking, how do you just get started and build an agent? In the fullness of time, we're thinking the enterprise is filled with agents, and agents are working together and really helping all of the people involved build a more effective, more impactful business. It feels like all of this comes together on a platform, and that's the binding glue that brings together all the different tools and even simplifies the production deployment of all these pieces that you enumerated. How do you see the platform? What is the role of the platform in this context?
21:23 - Tushar Katarki
Yeah, I mean, so first of all, to deal with all this complexity, enterprises certainly have to start thinking about exactly that: how do I build it? What does the AI platform mean for me? Who are my consumers, right? And what are some of the services that I need to provide? You know, what kind of hardware budget do I have that I can play with? What kind of SLAs and SLOs do I need, you know, SLAs that I can promise and SLOs that I can execute on? So that kind of thinking, which we have done in some ways in the world of modern applications and microservices, you know, by developing a DevSecOps platform, right, and an open hybrid platform, is the thinking that really needs to come to AI to deal with the complexity that we talked about, right?
22:21 - Tushar Katarki
And so to that end, what does a platform mean? A platform to me means that it provides all the essential AI services that the agentic system needs, right? We touched upon several of them earlier, but just to recount: you know, how do I provide inferencing for various kinds of models? How do I provide memory, you know, and context? How do I provide tool calling? Those are the essential three. And then beyond that, we talked about safety and guardrails, we talked about evaluation, we talked about telemetry, right? So how do I build that? Who are going to be my backend providers for that, right? There are different vendors out there in, again, a very rapidly evolving and innovative landscape.
23:17 - Tushar Katarki
So I've got to think about who are the vendors that I want to work with, who is my platform provider? And also you've gotta think about where I want to run it. You know, one thing that is different, especially in the context of AI, well, not just in the context of AI but also in the present kind of geopolitical situation, is two things. One is the regulation that comes with AI itself, right, and the possibility of reputational harm and so on and so forth. So where can I run this? How much of my enterprise data can I really afford to give to a third party, both from a policy perspective, we talked about that earlier. So what I'm saying really is, where do I run this platform? How much should I run inside my "private cloud" or sovereign cloud or data center? How much should it be on a hyperscaler or a new cloud, et cetera? Those are decisions that you have to make. And what is a platform, and what are the technologies that allow me to be flexible in that regard, right? Again, I mean, the past 20 years have been really fast and furious, but this AI is at a totally different layer. So how do I be adaptable? So that's the second part of this, you know: how do I build a platform? Where should I run it? How do I have that flexibility? For example, one example that we could think about is, you know, obviously NVIDIA is supreme, I mean they make amazing chips and they have the software stack to support it, but then, you know, there could be different circumstances when I need to look at other vendors also, be it for cost reasons, or just availability reasons, or just because of certain other geopolitical reasons. So that kind of hybrid mentality, hybrid cloud mentality, which is something that Red Hat has been talking about for a very long time, that's important. I think we covered it: you know, what's my platform? Where do I run it? What are the services that I'm going to provide? How do I scale it? How do I secure it? And, you know, what are the providers and vendors that I'll use to bring the best of breed and provide choice to my-
25:41 - Chris Wright
Yeah, yeah. What I heard in that description is that the platform creates this layer of predictability and consistency. It allows you to choose models, all the frameworks, the possibilities of what you're going to deploy, infused with best practices around observability, guardrails, et cetera, that might be implemented at a platform level. It's also insulating you from the choice underneath, which could be the infrastructure, the location, the hardware, which might be the focus for the platform engineering team to really build the best, most efficient infrastructure to support the agentic workflows. The platform is the point of consistency between those two fast-moving, innovative spaces: the models and the agents and the frameworks, and the underlying hardware.
26:35 - Tushar Katarki
And, you know, the separation of concerns that that naturally brings between the AI developer and their innovation and what they want to do, versus the platform engineering, the operations person, and what they want to do. So I think that really, the platform is what makes that happen.
26:50 - Chris Wright
Tushar, it's been really, really fun having you on. I especially love that analogy you've drawn to the platform and how we can bring choice and flexibility with consistency to really help get out of the challenges that we've seen with POC purgatory and transition into production. So, appreciate all of your insights. Thank you for spending time with us today.
27:14 - Tushar Katarki
Thank you, Chris. Appreciate it. Thanks for having me.
27:17 - Chris Wright
The rapid evolution of agents in the AI world is driving a need for a standardized platform to help manage that chaos. And as we've discussed, that platform isn't about replacing the dozens of agent frameworks or models. It's about providing that common, Kubernetes-like backbone that can manage all of them. By providing a common way to handle tool calling, memory, safety, and telemetry, we can allow developers to bring the frameworks they choose while giving operators the governance and observability they need. That's the essential bridge from demos and experiments to bringing AI into production at scale. Thanks for listening. Can't wait to see what we discuss next on Technically Speaking.
About the show
Technically Speaking
What's next for enterprise IT? No one has all the answers, but CTO Chris Wright knows the tech experts and industry leaders who are working on them.