Technically Speaking | Build a production-ready AI toolbox

Build a production-ready AI toolbox ft. Cat Weeks

Technically Speaking Team  |  Artificial intelligence

Is the era of the massive AI monolith over? Red Hat CTO Chris Wright and Cat Weeks discuss why specialized models, agentic workflows, and a "right tool for the job" mindset are the keys to moving AI from flashy demos to secure enterprise production.

Transcript

00:00 - Chris Wright
In any trade, you use the right tool for the job. You wouldn't roll a massive floor-to-ceiling mechanic's tool chest with thousands of specialized tools onto a job site just to hang a single picture frame. Yet in the world of AI, many enterprises are being told to use the largest, most expensive tool, the frontier model, for every single task. This isn't just inefficient; it's incredibly costly. But the alternative, using smaller, specialized models, introduces its own set of challenges. Today, we're moving past the demos to talk about what it really takes to get AI into production and why the biggest hurdles aren't just the technology and the hardware. To get into that, we have longtime Red Hatter Cat Weeks. Welcome to Technically Speaking, where we explore how open source is shaping the future of technology. I'm your host, Chris Wright. Catherine, it's great to have you on the show. You're working in this really important space of helping get AI into production, building that production reality for AI workloads. We talk a lot about agents and agentic workflows, and the demos are flashy, but the translation of those demos into real production environments is still leaving some scratching their heads. So what do you see as the real disconnect there?

00:18 - Cat Weeks
Yeah, it's great to be here, Chris. It's so interesting to watch companies try to go from a POC to production. I've been working in enterprise software for 20 years, and with every major technology disruption, you see it fall down immediately on enterprise readiness. As soon as companies try to take that new technology and apply the "-ilities" to it, you know, make it reliable, make it secure, make it portable, make it foundational to their infrastructure, all of those things, that's where it can fall down so quickly. And it's the same with AI. That's what we're seeing right now. People are doing really, really great demos, but as soon as you try to take that and put it into production, you have to move from "look at what this thing can do" to "how can I do this in a reliable, secure way for millions of users at once?" And it's a whole different ball game at that point.

02:20 - Chris Wright
I can really relate to those enterprise challenges, having also worked in the enterprise space for quite a long time. I wonder if there's also an element of trying to take the canned demo and bring it into a real-world environment that isn't the canned demo, so there's the "-ilities," but there's also the connection into the enterprise. I know data is central to the AI workflow. What do you see as some of the challenges on the people and process side, before we get into the technology of what it means to bring AI, agents, and autonomy into the enterprise?

03:00 - Cat Weeks
I mean, there are so many concerns it's hard to even pick one direction, but to start with one, I think AI is still scary, and so there's gonna be organizational pushback on adoption of AI just because people don't know what it means for them, and there's fear around it. So I think organizations need to think about how they bring their employees in, how they experiment with their employees as part of the experimentation process, and give time for those organizations to learn and apply these technologies in ways that work for the employees, help the employees, and that the employees feel good about. Otherwise, they're gonna get continual pushback on any implementation they try out.

03:54 - Chris Wright
I love that you're bringing the human aspect into this. I think it's critically important. I know we've gone through a lot of thinking within Red Hat on this exact topic, and some of the language I usually use is about augmentation and amplification, just to make it a little less scary sounding. It's not about replacing roles. I would acknowledge it's about being able to do tasks, so there's potentially some task replacement, and you could logically conclude that where a role is made up exclusively of those tasks, maybe it's harder to see how that kind of role evolves. But it's about bringing the employees along with the experimentation process. I think you also mentioned giving people the time and space, giving that grace, as we go through the learning process collectively but also as individuals. That's inclusive of enabling people with some training, giving people the opportunity to upskill, because it's not just playing with it at home and creating cool videos to upload to your favorite social media site. It's also how you apply it to your job and make it a little more real, and that requires some time. So it's not just in your own time; it's actually part of your job to figure out how to use these tools.

05:27 - Cat Weeks
Yeah, and that's an excellent point. How do you build it into the work day and actually get people using these tools? Where we're seeing this happen really effectively here at Red Hat in my teams is exactly that: really letting people experiment in their day job and ask, could you use AI to do this a different way? How could you do it differently? What's possible? Letting our designers, our engineers, our documentation folks experiment and see what they can achieve with AI. And what they're coming up with is amazing. It's so cool to watch, and when you see people excited about the technology and building on each other's ideas, that's when you're like, "okay, this is what we want. This is what we want to see."

06:20 - Chris Wright
That new way of solving problems, the enthusiasm we're seeing, now we can own it because we're excited about it. I think that also touches on how the processes we have within an organization were designed around different tooling, often just human-to-human interactions. I usually say, "a stupid process automated is an automated stupid process." So what do we have to do to think differently about the processes we use to get work done as we introduce AI into the picture?

06:57 - Cat Weeks
There are probably fundamental changes to our processes ahead, and I don't even know that we can conceptualize them fully. We might start with AI agents that try to replicate humans and follow the same processes we have today, but we're gonna learn pretty quickly that there are a lot of bad processes. There are a lot of fundamental mistakes in how we think about our processes, and that will change things pretty dramatically as we move along and move into a space where these agents work together in ways that we don't work together as humans, ways that don't look anything like our human space.

07:37 - Chris Wright
So we've got some organizational concerns around the antibodies that reject change. We've got some understanding of the processes and how to either reimagine them or even understand what they are today. Then there's data. All of this is gonna require access to data, consistent views of data. What do you see as the challenges there?

08:01 - Cat Weeks
Yeah, data, I think, is the fundamental challenge. Data was critical back when enterprises were doing predictive AI as well, so this isn't a new challenge. In predictive AI, there was a lot more onus on "we have to spend a lot of time with our data to do anything with it," really working with our data to get it into a format we can train on. But with generative AI, there are some new patterns that are sort of interesting. If you can actually let generative AI get access to your data, and you can put in agents that are exploring it and agents that are validating that exploration, those patterns get you a little bit further away from "I need a PhD-level data scientist to understand what's going on with my data." So your data foundations have to be strong, and you have to make sure that you're able to get access to that data in a secure way, that you're bringing your data together in the right places, and that, at the end of the day, evaluation is still key. If you can't evaluate whether generative AI is doing the right thing, you're not gonna get very far, and to evaluate your LLMs and SLMs and know if they're doing the right thing, you have to have well-labeled data. So that's still gonna be a key part of the solution for you.
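
To make the evaluation point concrete, here's a minimal sketch of the kind of labeled-data eval loop Cat describes. The `generate` stub, the prompts, and the labels are hypothetical stand-ins for whatever model and data set you're actually testing.

```python
# A minimal evaluation loop over labeled data. The generate() stub stands in
# for whichever LLM or SLM you are testing; swap in a real client call.

labeled_data = [
    {"prompt": "Classify the sentiment: 'The rollout went smoothly.'", "label": "positive"},
    {"prompt": "Classify the sentiment: 'The outage lasted four hours.'", "label": "negative"},
]

def generate(prompt: str) -> str:
    # Stand-in for a real model call (e.g., an OpenAI-compatible endpoint).
    return "positive"

def evaluate(dataset) -> float:
    # Count the examples where the model's answer matches the human label.
    correct = sum(
        1 for ex in dataset
        if generate(ex["prompt"]).strip().lower() == ex["label"]
    )
    return correct / len(dataset)

print(f"accuracy: {evaluate(labeled_data):.2f}")  # 0.50 with the dummy stub
```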

09:39 - Chris Wright
I love that you brought up evals. My personal mission is to make sure we think about evals all the time. I think they're really, really important. The access to data, okay, clearly there's a security question there. Aggregating data in the appropriate way also brings in a security question: can you pull in aggregated data sources that you shouldn't have been able to access as a single source? There's also another question about data stewardship. Who owns the data? Within an organization, who's the authoritative owner of a data set, so that it's lifecycle-managed and you're always accessing the right data set? I think an easy degenerate case would be two agents making similar decisions on apparently the same data, but practically speaking, very different sources of data. Maybe one's older, or maybe one's missing some of the full context. So how do you think about, I guess it's data quality? It's not the data engineering part that you described with ML; it's who owns it, who's the steward of it?

10:54 - Cat Weeks
This is going to be a huge challenge, I think, for enterprises, because for years and years, we've built up bad practices across the industry around copying data all over the place. There are some enterprises that have spent a lot of time trying to clean that up, that have true sources of truth and really understand what the source of truth for data is in their enterprise, and I think those that have spent time doing that are gonna be at an advantage at this point. But agents and apps using generative AI are gonna reach in, and they're not gonna understand whether one source of data has been modified by another system, or who owns that data, or who's maintaining it. They're just looking at it for what it is. So you have the risk of unreliable outcomes if you don't spend time really making sure your data is strong at the foundation.
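
As a rough illustration of the source-of-truth idea, here's a hypothetical sketch of a dataset registry in which every data set has one accountable owner and version, so agents resolve to the same authoritative copy. The team name, data set, and storage path are invented for the example.

```python
from dataclasses import dataclass

# Illustrative only: a toy dataset registry where each data set has a single
# authoritative owner and version, so two agents asking the same question
# resolve to the same source of truth instead of divergent copies.

@dataclass(frozen=True)
class DatasetRecord:
    name: str
    owner: str   # the accountable steward for lifecycle management
    version: int
    uri: str     # authoritative location; ad hoc copies elsewhere are ignored

REGISTRY = {
    "customer_accounts": DatasetRecord(
        name="customer_accounts",
        owner="data-platform-team",                      # hypothetical team
        version=7,
        uri="s3://example-bucket/customer_accounts/v7",  # hypothetical path
    ),
}

def resolve(name: str) -> DatasetRecord:
    """Agents fetch data through the registry, never from stray copies."""
    return REGISTRY[name]

record = resolve("customer_accounts")
print(record.owner, record.version, record.uri)
```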

12:01 - Chris Wright
Yeah, I think it's important, and I know one of the things that we've done, while building that best-case data foundation, which can be a real time-consuming, big effort, is identify some key data sources and build agents around those, almost in a greenfield way, to show what's possible, recognizing that we also need to merge this in with a more consistent data foundation

12:28 - Cat Weeks
Yeah.

12:29 - Chris Wright
inclusive of the security concerns. So maybe we could bring security into the picture. There are a lot of security questions with agents, the access to data, but even something simpler, like the identity of the agent and the relationship

12:44 - Cat Weeks
Yeah, that's right.

12:44 - Chris Wright
between the agent's identity and maybe the human that might be kicking off the agent's workflow.

12:52 - Cat Weeks
That's right. So historically, when we've been working with systems, we look at identity as static. You have a static user, and you're expecting that user has an identity, or you have a fairly static application that has a certain job to do, and you're expecting that static identity from that application. As we move into agents, we're talking about a dynamic identity all of a sudden, because the agent is changing over time. It's trying to accomplish a task, and as it learns more about that task and about the systems around it, it might need to dynamically change what level of permissions it needs to get that task done. So it's a whole different paradigm for us to deal with when it comes to identity and authentication and security. There's a lot of thought happening right now about how we create systems that enable that sort of understanding of the agent, so that we have two main things: attestation of where that agent is running and who it is, making sure it's coming from a trusted and secure place, but also the identity, authentication, and permissions of what that agent should actually be allowed to do and what the limits are. You have to pair both of those together moving forward. So it's very interesting right now to see where the security of agents is going to head.
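
Here's a deliberately simplified sketch of the pairing Cat describes: attestation of where an agent is running, combined with a ceiling on the permissions it may dynamically request. The workload IDs and scopes are invented, and a real system would use something like SPIFFE/SPIRE rather than an in-memory set.

```python
from dataclasses import dataclass, field

# Illustrative only: pairing workload attestation with scoped, dynamically
# granted permissions. Loosely inspired by the SPIFFE/SPIRE pattern that
# comes up later in the conversation.

TRUSTED_WORKLOADS = {"spiffe://example.org/agents/expense-agent"}  # hypothetical

@dataclass
class AgentCredential:
    workload_id: str                                   # attestation claim
    granted_scopes: set = field(default_factory=set)   # what it may do now
    scope_ceiling: set = field(default_factory=set)    # the most it may request

def request_scope(cred: AgentCredential, scope: str) -> bool:
    """Grant a scope only if the workload is attested and the request stays
    within the policy ceiling (no arbitrary privilege escalation)."""
    if cred.workload_id not in TRUSTED_WORKLOADS:
        return False  # fails attestation: untrusted origin
    if scope not in cred.scope_ceiling:
        return False  # outside the agent's allowed limits
    cred.granted_scopes.add(scope)
    return True

cred = AgentCredential(
    workload_id="spiffe://example.org/agents/expense-agent",
    scope_ceiling={"read:receipts", "write:reports"},
)
assert request_scope(cred, "read:receipts")     # within the ceiling: granted
assert not request_scope(cred, "delete:users")  # escalation attempt: denied
```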

14:32 - Chris Wright
Yeah, historically there are access controls, there are notions of least privilege, there are concepts around capabilities and really negotiating capabilities. I think about agents in the context of humans kicking off an agentic workflow: the same agent used by two different people should have two different kinds of authentication and authorization capabilities. But I hadn't considered what you were describing, which is, given my security access capabilities, asking an agent to do something and having it effectively come back and negotiate some expanded way of operating within the enterprise that's still confined. It shouldn't just be able to negotiate arbitrarily for privilege escalation; somehow the system has to recognize that's an okay request versus something that looks malicious.

15:32 - Cat Weeks
Yeah.

15:32 - Chris Wright
So I can see there's a lot of research work to be done

15:34 - Cat Weeks
There is.

15:36 - Chris Wright
in this space.

15:36 - Cat Weeks
Yeah, and I think your case is, I hope, one we can solve in a simpler way, which is: I'm asking an agent to do something, and somebody else is asking an agent to do something, our identities are different, and you start from our identities as the base,

15:53 - Chris Wright
Yeah, yeah.

15:54 - Cat Weeks
but I think it does get much more complicated as you kind of let these agents have more agency and get out there and do more things on their own in an enterprise.

16:06 - Chris Wright
I think largely in the delegation sense: I've kicked it off, you've kicked it off, we come with different security contexts. But in any case, my "go buy chewing gum" agent shouldn't ever have the ability to then fire people in my organization, right? So there's the identity of what the agent is, and there are practical negotiations of the capabilities required to get the task done. It'll be interesting to note where that's automated and where it still requires a lot of manual intervention. I'm sure we'll start with human in the loop.

16:38 - Cat Weeks
Absolutely, yeah, and human in the loop is a great pattern to start with for almost anything an enterprise starts experimenting with. We should always start with human in the loop, make sure that we understand what these systems are doing, where they're touching, and then build confidence and trust, and as you build confidence and trust, then you can start to think about how you remove human in the loop.
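
A minimal sketch of the human-in-the-loop pattern: the agent proposes an action, and nothing runs until a person approves it. The `input()` prompt and the action itself are stand-ins for whatever approval channel and side effect an enterprise would actually use.

```python
# Human-in-the-loop gate: the agent proposes, a person decides. In a real
# system the approval would flow through a ticket, chat, or UI rather than
# a terminal prompt.

def with_human_approval(action_description: str, action):
    answer = input(f"Agent wants to: {action_description}. Approve? [y/N] ")
    if answer.strip().lower() == "y":
        return action()
    print("Action rejected; logged for review.")
    return None

def update_customer_record():
    # Placeholder for the real side effect the agent would perform.
    return "record updated"

result = with_human_approval("update customer record #1234", update_customer_record)
print(result)
```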

17:01 - Chris Wright
You brought up the microservices world earlier, and one of the things that became important in that context is connectivity between the services. We have a service mesh, we have notions like SPIFFE and SPIRE, which created an identity for everything, not just human identity but even application identity. Now we're taking it a step further to agent identity, potentially even sub-delegated identities for subagents. I see this agent mesh concept emerging out of all of that. So there are so many interesting things we're doing from a technology and research and development point of view to really make sure these are secure, agentic, autonomous workflows for enterprises. This is an important topic, and we'll keep having this conversation as we go forward. But maybe to switch gears a little bit: with microservices, the idea was to take monoliths and make them smaller, bounded contexts, domain-driven design. The larger models are used to great success as general-purpose knowledge bases, but generally speaking, they're trained on the internet, not on the enterprise. So there are a couple of different ways we can bring the enterprise to those models. One is RAG, or retrieval augmented generation, and one is fine tuning. Do you have any guidance or recommendations for where those tools can be used, how to use them, and getting started? What is the best way to jump in with your own data?

18:39 - Cat Weeks
Yeah, we see a lot of enterprises start with RAG. I think retrieval augmented generation is an easy entry point for people to try first, and it's incredibly useful if your use case is one where you want to reference the data. I think of it like when you were writing a paper as a kid and referencing an encyclopedia. You go through your encyclopedia, you find the information you want, you go to your research paper, you put in your interpretation of it, but you reference back to the encyclopedia, and you can actually go look at that encyclopedia and see the article. RAG works very much the same way: you're referencing information that is in your enterprise. But if you really want to get to the point where you're using generative AI to its full power, where you're letting it generate additional information and make predictions based on information that doesn't even exist today, then you probably wanna move to fine tuning. Fine tuning a model is actually changing the weights of the model. You take your data, apply it to the model, and change the weights of the model itself, and now that generative AI model can make new assessments of the world based on that training. So that's more powerful for use cases where you want to use the model for things that don't exist today, things that you can't predict will exist tomorrow.
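
Here's a deliberately stripped-down sketch of the RAG pattern: retrieve the most relevant enterprise document, then hand it to the model as referenced context. Word overlap stands in for the embedding model and vector database a real system would use, and the documents are invented.

```python
# Minimal RAG illustration: retrieve the most relevant document, then build
# a prompt that asks the model to answer from (and cite) that context.
# Real systems use an embedding model plus a vector database; simple word
# overlap stands in here so the sketch stays self-contained.

documents = [
    "Refund requests over $500 require manager approval.",
    "Laptops are refreshed on a three-year cycle.",
]

def score(query: str, doc: str) -> int:
    # Crude relevance: how many words the query and document share.
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query: str) -> str:
    return max(documents, key=lambda d: score(query, d))

def build_prompt(query: str) -> str:
    context = retrieve(query)
    return (
        "Answer using only this context, and cite it:\n"
        f"Context: {context}\n"
        f"Question: {query}"
    )

print(build_prompt("Who approves a $700 refund?"))
# The assembled prompt is then sent to the LLM or SLM of your choice.
```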

20:18 - Chris Wright
And in either case, you can start leveraging maybe a more cost-effective model, something that's smaller. It doesn't need to know all the information on the internet and how to reproduce content in haikus. It just needs to know how to look up content specified in embeddings in a vector database or weights that it's been pre-trained on or post-trained on.

20:40 - Cat Weeks
Yeah, and that's an excellent point. If you try to do RAG or, especially, fine tuning with a large language model, the cost is going to be much higher. Just the training cost for fine tuning might be so high that you can't even manage it, right? So small language models are often going to be able to do what you need. They're able to speak the language and interact in that human-like way, but now you're feeding in your data and making them specialized for what you're trying to do. So now, first of all, you can be faster with your answers, 'cause a small language model is going to run faster. You're gonna get better performance, you're gonna have lower cost, and it's gonna be just as accurate.

21:29 - Chris Wright
Yeah, performance, I think, is overlooked sometimes. As a user of agents, we've done some experiments with some pretty slow agents, and you start feeling like, "I could do this faster," and then you just go do it, and it sort of defeats the purpose. And then there are time-sensitive cases, like a speech-to-text context, where you really need to respond in the real time that humans are actually speaking; that requires a very latency-sensitive type of model. And bringing cost into the equation: success looks like not one agent doing one thing, but tens, hundreds, thousands of agents doing many, many things, which means large-scale deployments, lots of agents, lots of models, and all of those models can incur costs. So cost management, I feel, is a really important part of the full picture for the enterprise.

22:25 - Cat Weeks
It is, and as your intro talked about a little bit, we want a full toolbox of capabilities here. Sometimes, if you're doing deep research kinds of tasks, maybe an LLM is appropriate, 'cause you want all of that context and information. But many times in the enterprise, when you're trying to solve specific use cases, creating very specific small language models and creating agents that do those specific tasks, you're gonna get a lot more done pairing a bunch of those things together and building out your toolbox with a lot of small, very focused answers than you are just throwing a big LLM at it.
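
As a sketch of that right-tool-for-the-job idea, here's a toy router that sends narrow, well-understood tasks to a small specialized model and everything else to a larger general one. The task names, model names, and routing heuristic are assumptions for illustration.

```python
# Toy "right tool for the job" router: cheap specialized SLM for narrow
# tasks, larger general model as the fallback. The task/model names and
# the routing heuristic are illustrative assumptions.

SLM_TASKS = {"classify_ticket", "extract_invoice_fields", "summarize_log"}

def call_model(model_name: str, prompt: str) -> str:
    # Placeholder for a real inference call (e.g., an OpenAI-compatible API).
    return f"[{model_name}] response to: {prompt[:40]}"

def route(task_type: str, prompt: str) -> str:
    if task_type in SLM_TASKS:
        return call_model("small-specialized-model", prompt)  # fast, low cost
    return call_model("large-general-model", prompt)          # broad context

print(route("classify_ticket", "Customer cannot log in after password reset"))
print(route("deep_research", "Compare vector database architectures"))
```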

23:07 - Chris Wright
Yeah, it reminds me of the Unix days and early Linux days, with many small tools that you compose together to create an outcome. Ultimately, microservices are like that, many small bounded services that you aggregate together to build a big system, and I think agents are very much in that same ballpark of focused, domain-specific areas. We can add agents together and really change how we do the work, and go back to defining smart processes, maybe writing them down this time. Maybe it's some markdown file.

23:41 - Cat Weeks
Let your agents write down your processes for you. Actually, a very fun thing to do is to build a bunch of stuff and then go back to your LLM or your SLM and say, "tell me what this is. What did I just build?" That's a lot of fun now with generative AI: just kind of get lost in building something, then come back around and say, "show me a systems diagram of what I just built."

24:11 - Chris Wright
Yeah, you never know what you might get, and there's actually some interesting work that I've heard people doing: using a bot to do some interviewing, and then taking the transcription to understand what a process is

24:27 - Cat Weeks
Oh, interesting.

24:28 - Chris Wright
as a beginning of building a spec to define an agent that can go do said process. So it's turtles all the way down.

24:35 - Cat Weeks
It does, it does. Yeah, it's gonna be fascinating to see where the industry goes with this technology, and I think one of the things that's really exciting right now is just the potential of all of these different directions. The LLMs continue to expand and improve what they can do. We're starting to see what we can do with agents and how specific we can get with them to achieve really interesting use cases. As we expand out these different possibilities, I think it's gonna be fascinating to see how that changes the industry over time.

25:15 - Chris Wright
Well, I couldn't agree more. I really appreciate your time and your insights on how to get started and then build this really impactful, embraced, and cost-effective enterprise-wide set of AI tools in our AI toolbox. Thanks, Cat.

25:35 - Cat Weeks
Thanks, Chris.

25:37 - Chris Wright
This conversation with Cat really gets to the heart of the challenge we're all facing. The AI space is moving at an incredible speed, but many organizations are still figuring out the foundations: the capabilities, the risks, the knowledge needed. Much of it is still being defined as we go. So my takeaway is this: the entire industry is experimenting, but the future of the enterprise won't be a monolith. We may see a diverse fleet of models, large and small, each one securely fine-tuned for a specific job, and experimenting early is the key to learning enough to start moving toward production. If we bring experimentation and innovation together with a strong foundation, we can be ready to take advantage of the opportunity ahead with a full toolbox of LLMs, SLMs, and agentic AI. Thanks for joining the conversation. I'm Chris Wright, and I can't wait to see what we explore next on Technically Speaking.

About the show

Technically Speaking

What’s next for enterprise IT? No one has all the answers, but CTO Chris Wright knows the tech experts and industry leaders who are working on them.