Security for the AI supply chain ft. Aeva Black
Securing the software supply chain has always been a challenge, and AI has introduced increased scale and complexity. We're no longer just securing lines of code; we're responsible for ensuring the models that generate code are secure. In this episode, Red Hat CTO Chris Wright is joined by open source security expert Aeva Black to explore this new class of threats. They discuss everything from hidden backdoors in models to the weaponization of optimization techniques and why open source principles like transparency and community governance are our most effective tools for building trust in these complex systems.
Transcript
00:00 - Chris Wright
We've spent the better part of a decade talking about software supply chain security. We've learned to check the ingredients, verify the source, and scan for vulnerabilities. But what if the instructions for building the software were compromised? What if the blueprints themselves contained invisible ink with hidden guidance that only appears under specific conditions? That's the new reality we face with AI. We're not just consuming code anymore. We're consuming models that generate code, and the attack surface has shifted in ways we're only beginning to understand. Today, we have Aeva Black with us to look under the hood at this new class of threats and discuss how we can build trust in an AI-powered world. And spoiler alert: the answer looks a lot like the open source communities that got us here. Welcome to Technically Speaking, where we explore how open source is shaping the future of technology. I'm your host, Chris Wright. Before we dive too deeply into the details, I'm curious: for you, what was that 'This changes everything' moment, when you realized that AI was going to fundamentally rewrite the rules for open source and security?
01:21 - Aeva Black
The first moment for me was really probably in 2018 or so. I took a little break from my day jobs to dive into AI, did some LLM training myself, and was preparing for a talk at a conference in Portland. Building that demo showed me how easy it is to embed a bias and create hidden backdoors in a model, and then I demoed it. And then the very next year at DEFCON, I got to watch the demo from the AI team there of a deepfake of Senator Ron Wyden while he was in the room, watching his own face and his own surprise. I think this was the first time that anyone in government really realized, "Oh crap, this stuff's going to be a problem."
02:02 - Chris Wright
I love the shoutout to Portland. Some might know that's where I'm from. I don't live there now, but what a powerful moment to really see, especially the look on Ron Wyden's real face reacting to a deepfake, and the recognition of what that means, the implications of all of that. You have a really interesting career history. You've had the opportunity to be in the middle of a lot of different important technology shifts: the early days of P2P file sharing, the open source database world with MySQL. I think where we first got to connect was really in the OpenStack community. So what are some of the aspects of that journey that brought you into this thinking about the connection of open source and cybersecurity?
03:01 - Aeva Black
Yeah. Well, in the early days of my career, I guess right before my career started, I watched the crypto wars unfolding in the '90s when I was in college studying computer science, and I thought, "I don't really want to get anywhere near that." But then during the OpenStack days, when I was building Ironic and working with teams on managing hardware remotely, hardware security, and how we were implementing things like IPMI and remote firmware management, I started to see two things that really changed the path of my career after that. Product security wasn't something developers didn't want to do; they wanted to do it. But so often I saw business decisions forcing good open source engineers to make the tough trade-off between "go build a feature" and "go fix this bug." I saw that I could not fix some of those problems just by writing more code. I needed to start thinking about policy and how people use software, not just what's available but how it's used.
04:04 - Chris Wright
I mean, I know the cliche would be "shift left," putting developers more prominently in the position of being part of that security story. Maybe that draws to the foreground where you see the role of community and governance, and even how we think about the definition of community health, in that context of bringing security forward. And then, as we look at AI as a new actor in this story, what that means at the community level, at the governance level.
04:39 - Aeva Black
As you know, building open source communities is a lot of work, and it's not often seen as the technical work of writing code, but it's so critical to the sustainability and safety of the final product. When communities are struggling to get maintainers, or their infrastructure isn't being funded to run all the CI tests or fuzzing or whatever else, it creates a lot of security problems downstream. This is why I think so many of us have spent the last five years trying to raise this awareness that security starts by shifting left when your products depend on open source, and that shifting left has to include the open source communities as well. So I'm kind of worried that this wave of AI enthusiasm, like most things with a hype cycle, will probably slow down and there'll be something else exciting in five more years; who knows what it is. But right now there's all of this investment in AI, and there's a lot of new pressure on communities, from AI-generated code that is sometimes of questionable quality to AI-generated bug reports. I know there's been an effort, some DARPA funding, a big announcement at DEFCON this year and last year, for sort of an AI-driven bug hunt challenge, and some open source projects actually won that challenge, which is great for open source. But the impact, well, some of the bug reports aren't great. I got to see Daniel Stenberg, the lead maintainer of the cURL project, talk about how much "AI slop," that's what he calls it and it's the title of his keynote, is burdening the cURL project, enough that they're actually going to have to stop their bug bounty program.
06:29 - Chris Wright
The hype cycle is a part of this, and then tool and user maturity are connected together. Users get better at driving the tools, and the tools themselves get better, so I'm actually quite optimistic that leveraging generative AI to produce more tests, better test coverage, and a better understanding of where we should be including security checks and even policy enforcement points could help us write more secure code. But I do think we're in that uncomfortable stage, maybe it's the teenage years; we're not quite experts at this. I think it is important to think about the whole supply chain of a model's development and the provenance of the content that goes into models as they are aligned, tuned, and further augmented in post-training. What does that provenance look like? How do you see that in the context of the supply chain security we understand in a software world? I mean, there are so many analogies we can draw between the two, but in some cases it's really just a different space with AI. How do you see that evolution going?
07:56 - Aeva Black
Yeah, I don't think we've changed the goal of the game. Of course, it's still to sustain open source communities, to be able to build businesses with them, to work together, and sometimes to see our friends in the community. But we're playing on a different field now, where we need to start considering that data in all of its forms is effectively code for these systems, and that's a new layer. It involves other aspects of intellectual property, of privacy, and all of those sorts of things that we didn't have to think about before. And yes, SBOMs: I hope everyone's making an SBOM now, but it's pretty easy to conflate authenticity and integrity. Is this package from the source I think it's from, or was it changed along the way? Does it contain what I expected it to contain? If multiple hands have done fine-tuning along the way, how do we know? Because the data doesn't get transmitted. There's no code fingerprint left over from it that we can audit.
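(A minimal sketch of the integrity half of that distinction, added for illustration. The artifact name and expected digest below are hypothetical placeholders; in practice the digest would come from a publisher's signed manifest or an SBOM entry. A matching digest shows the bytes weren't altered, but it says nothing about who produced them or what data went into fine-tuning.)

```python
# Minimal sketch: checking the *integrity* of a downloaded model artifact.
# The artifact name and expected digest are hypothetical placeholders.
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file so large model weights never need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

EXPECTED_DIGEST = "0" * 64            # hypothetical published value
artifact = Path("model-q4.gguf")      # hypothetical downloaded artifact

if not artifact.exists():
    print(f"{artifact} not found; nothing to verify.")
elif sha256_of(artifact) == EXPECTED_DIGEST:
    print("Integrity check passed: the bytes match the published digest.")
else:
    print("Integrity check FAILED: the artifact differs from what was published.")
# Note: a matching digest proves the bytes are unchanged (integrity), not who
# produced them (authenticity), and says nothing about the fine-tuning data.
```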
09:03 - Chris Wright
That's a great insight, and I think it draws in an interesting recent paper, the quote-unquote "Mind the Gap" paper, which described a backdoor attack that targets a well-defined model quantization format. A model can look perfectly safe, but it can be malicious after it's been quantized, effectively compressed. What do you think about that? I mean, it feels like we're introducing a whole new set of concerns, maybe vaguely analogous to the XZ Utils kind of concerns, but different. So what do you think about that?
09:55 - Aeva Black
I think it's an interesting paper, and I've been trying to stay on top of all the different types of model supply chain attacks that have been emerging over the past five, six years, and I'll admit this particular paper was new to me. It's pretty new research, but it reminds me of CPU attacks like Rowhammer and Plundervolt, where an optimization creates the vulnerability, and of things like Spectre and Meltdown with branch prediction. The problem there is that it's more efficient to build a CPU that does certain things, but under certain conditions an attacker can take advantage of that optimization. I think we're going to see a lot more attempts in the same vein to both optimize a model and then figure out how to exploit the optimization, and it's just going to keep moving. It's always going to be a moving target.
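(To make the quantization point concrete, here is a toy sketch, not the attack from the paper itself. It uses a made-up 8-level uniform quantizer and hypothetical weights to show how two weights that behave almost identically at full precision can snap to different grid points after rounding, flipping a downstream decision only in the quantized model.)

```python
# Toy illustration of behavior that only appears after quantization.
import numpy as np

def quantize(w: np.ndarray, levels: int = 8, w_max: float = 1.0) -> np.ndarray:
    """Uniform quantization: snap each weight to the nearest of `levels` steps."""
    step = 2 * w_max / (levels - 1)
    return np.round(w / step) * step

x = np.array([1.0])            # a fixed input feature
benign_w = np.array([0.135])   # hypothetical weight a trainer might pick
planted_w = np.array([0.145])  # hypothetical weight an attacker could plant
threshold = 0.2                # toy decision threshold

for name, w in [("benign", benign_w), ("planted", planted_w)]:
    full = float(x @ w)                # full-precision score: both ~0.14, "allow"
    quant = float(x @ quantize(w))     # planted weight rounds up across the boundary
    print(f"{name}: full={full:.3f} -> {'allow' if full < threshold else 'DENY'}, "
          f"quantized={quant:.3f} -> {'allow' if quant < threshold else 'DENY'}")
```

Run as written, the benign weight behaves the same before and after quantization, while the planted weight only crosses the decision threshold in the quantized model, which is the kind of gap an auditor looking only at the full-precision weights would miss.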
10:51 - Chris Wright
I sort of liken it to the late '90s, the turn of the century, which makes it sound really old, in application security, when we were looking at buffer overflows and cross-site scripting. Today we're looking at context-window corruption to bypass guardrails, we're looking at prompt injection, and then maybe new techniques like corrupting the quantization process, meaning it's very early days. There's a lot ahead of us: like we did with application and infrastructure security, identify the issues, respond, build the mitigations. But it takes time, and there's a lot that we don't know at this point. From a developer point of view, you're using a coding assistant to help you be more productive. How do we help developers and maintainers build confidence? What tools do we need? PRs are coming, and we talked about the slop. How do we make sure we're not introducing a gaping hole in our security story from the supply chain point of view, but actually producing better code, more quantity but, well, better quality?
12:21 - Aeva Black
I wish I had a good answer for how to do that today. When we're just using a GPT to write some code, most of these are trained off of public data sets, and they will produce code that looks like code, looks fine. If you don't review it and pay extra special attention, it's kind of the same as grabbing 10 lines, or 10,000 lines, off of a random source on the internet. You are still the developer responsible for reviewing it and making sure it's secure. So that's one aspect. But the agents, I know folks are hooking them up to the network or their command line and letting the agents take actions, not just write code. That's a whole scarier situation. I have no idea right now, other than setting up entire sandbox machines that are completely isolated, separate from your corporate environment, separate from your SSH and PGP keys. A couple of years ago we saw a whole spate of open source packages that were typosquats. Those didn't involve AI, but if you typo'd the wrong package name and ran it through your build pipeline, it would exfiltrate your build keys. I'm sure we're going to see the same, if we haven't already, with AI agents, and we need better tools to protect against that.
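(As an illustration of that last point, here is a minimal sketch of a pre-install typosquat check. The allowlist and package names are hypothetical; a real pipeline would pull its allowlist from a curated internal index, and this would be only one layer among several.)

```python
# Minimal sketch: flag requested packages that are not on an allowlist but
# look suspiciously close to one that is (a classic typosquat pattern).
from difflib import get_close_matches

ALLOWED = {"requests", "numpy", "cryptography", "urllib3"}  # hypothetical allowlist

def check_packages(requested: list[str]) -> list[str]:
    """Return warnings for names that are unknown or resemble an allowed package."""
    warnings = []
    for name in requested:
        if name in ALLOWED:
            continue
        near = get_close_matches(name, ALLOWED, n=1, cutoff=0.8)
        if near:
            warnings.append(f"'{name}' is not allowed but resembles '{near[0]}' (possible typosquat)")
        else:
            warnings.append(f"'{name}' is not on the allowlist; needs manual review")
    return warnings

for w in check_packages(["requests", "reqeusts", "numpyy", "leftpad"]):
    print("WARNING:", w)
```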
13:49 - Chris Wright
I really love that perspective: whether it's copy and paste from Stack Overflow or code from a GitHub repo into your project, you still have that responsibility. I love that. It reminds us that there's nothing magic. It's just generated content, and you need to make sure it's working well in your environment. However, I wonder if there's opportunity in this world to think not just of AI that's doing code generation, but AI that's assisting in the review process. I can think of really simple things. We already have linters; imagine a more intelligent linter that can do something that's difficult to do with essentially static analysis, where we can include policy in a Claude.md-esque file that describes expectations for a project, just to ensure a set of filters is passed before it ever even gets to human review. So we're not putting more burden on, we're not adding slop to the mix for, the human reviewers, who are that really precious commodity in a community context. What are your thoughts there?
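(A hedged sketch of that "filters before human review" idea: a small gate that runs a required set of checks and refuses to hand a change to human reviewers until they all pass. The tools invoked here, ruff, pytest, and gitleaks, are illustrative stand-ins rather than a prescription; a real project would encode its own required checks in its own policy file and CI.)

```python
# Sketch of a pre-review gate: every required check must pass before a change
# is marked ready for human review.
import subprocess
import sys

REQUIRED_CHECKS = {
    "lint": ["ruff", "check", "."],
    "tests": ["pytest", "-q"],
    "secret scan": ["gitleaks", "detect"],
}

def run_gate() -> bool:
    all_passed = True
    for name, cmd in REQUIRED_CHECKS.items():
        try:
            result = subprocess.run(cmd, capture_output=True, text=True)
            passed = result.returncode == 0
        except FileNotFoundError:
            # Treat a missing tool as a failure: the policy can't be verified.
            passed = False
        print(f"[{'pass' if passed else 'FAIL'}] {name}")
        all_passed = all_passed and passed
    return all_passed

if __name__ == "__main__":
    if run_gate():
        print("All required checks passed: ready for human review.")
    else:
        print("Gate failed: fix the findings before requesting human review.")
        sys.exit(1)
```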
15:04 - Aeva Black
I agree. We have been facing a sustainability crisis in open source for several years now: not enough human time available to do the reviews necessary for the amount of code coming in. Actually, I think I first made this point back in 2022, when I got to give a talk in D.C. about open source security, that Linus's Law really should be updated for the AI era. Linus's Law, from I think 1999 or so, said that 'Given enough eyeballs, all bugs are shallow.' But now there is so much code that there aren't enough eyes looking at it, and the more rapidly we generate code, the more burden there is on review. However, I'm not sure that AI review is yet the right solution, because that brings up the question of trust, and we trust open source because of this old maxim, Linus's Law. We trust it because we assume people have actually put in the time to review it, because we trust the people who've done that work. At some point, it still has to be a person who's made that decision for us to trust it.
16:18 - Chris Wright
Certainly puts us in danger of the foxes watching the henhouse. Well, clearly it's a complex space with a lot of new threats, and in order to defend against these kinds of new threats, whether it's tampering at the model level or the threat of an agent in the development process, what shifts are you seeing in how critical open source infrastructure is built and, more importantly, sustained?
16:58 - Aeva Black
Yeah, I am seeing a lot of shifts in that space in the past three, four years, and I guess going back even 10 years or so: the slow work of some of the public sector in the US, groups like the Technology Transformation Services trying to do open source in government, or all of the growth of open source program offices in the public sector in Europe. I think there's a growing awareness of how important open source, and specific sets of projects in particular, are to building the infrastructure that powers the modern world. Governments are really taking notice of that and discussing ways to fund it. I think it was about two weeks ago that the Sovereign Tech Agency, a group in Germany that's working on using public funds to sustain public digital infrastructure, actually proposed a Europe-wide sovereign tech fund to help sustain all of the key projects and technologies that make an open internet possible. So I'm hopeful, very hopeful, that we'll see more of this kind of broader engagement. It's not just hobbyists and businesses; it's the public sector as well that depends on this, and I'm really glad to see that awareness rising.
18:21 - Chris Wright
I know we've both had the opportunity to be involved in things like the OpenSSF and in touching on what is critical infrastructure, what are these key projects that help sustain this open internet. For the enterprise leaders or CTOs that are listening, how should they think about their role in this? It's not just being a passive consumer of open source. We touched briefly on where you can set some expectations for what consumption looks like. What advice can you give our audience on how to become better contributors to the long-term health and security of this critical ecosystem?
19:11 - Aeva Black
As you said, it's never really been enough to just consume open source. At a minimum, I've always said, safely using open source requires a consumer to take an active role and pay attention to the project's health; at best, to actually have some developer time at the company to go engage and help maintain it. Interestingly, this is going to become a bit more compulsory in Europe in about two years, when the Cyber Resilience Act kicks in. There's a clause in there that says if a manufacturer is using open source and they find a vulnerability in the open source they're using, they have to tell the project, and if they fix it, they have to share the fix in some way. So it'll be interesting to see how that plays out. But it's nice to see the advice that I've given for a long time, and I think you have as well, starting to be more widely understood. And for companies that don't have the staff to do this themselves, I think just picking a supplier or a company to partner with that has the staff, that has that role, is a perfectly fine choice.
20:16 - Chris Wright
Great words of wisdom, Aeva. I appreciate it, and thank you just for spending some time with us today talking through this really important topic of supply chain security and the intersection of that with AI, as it's clearly going to have massive impacts for all of humanity. Not to be too dramatic, but it feels very true.
20:41 - Aeva Black
But it sure will. Yeah, I agree, and I do think we are still at the beginning of what is guaranteed to be both interesting and incredibly impactful on humanity, so let's keep it going in a good direction.
20:54 - Chris Wright
What I'm really taking away from my conversation with Aeva Black is that strong AI security and real innovation go hand in hand. The security of these systems over the long run is directly tied to the health and sustainability of the open communities that create them. So the big question is how we support them, making sure these communities have the resources to thrive and to build security into their work from the very beginning. My optimism comes from the thoughtfulness and passion I heard from Aeva on this. The fact that the community builders themselves are pushing for more sustainable and forward-thinking practices is the best sign that we're on the right track. Thank you for listening. Look forward to seeing you in the next one.
About the show
Technically Speaking
What's next for enterprise IT? No one has all the answers, but CTO Chris Wright knows the tech experts and industry leaders who are working on them.