A practical look at what happens when kernel bugs meet containers.
Author’s note: Refer to this Red Hat Security Bulletin for the most recent information about this CVE. This blog post was originally published on May 4, 2026 and has been updated.
Today, I spent some time trying to break out of a Red Hat OpenShift container.
No, not because I had to… but because CVE-2026-31431 dropped, and I wanted to see how bad it really is.
Short answer: it’s real, it’s exploitable, and your default controls probably aren’t stopping it.
Longer answer: defense-in-depth still matters… a lot. And this is one of those cases where you really see where Red Hat Advanced Cluster Security and Red Hat Advanced Cluster Management for Kubernetes can help.
CVE-2026-31431 is a Linux kernel issue that lets an unprivileged user mess with the page cache of pretty much any readable file and use that to skip authentication entirely.
We ran it against an OpenShift worker kernel (5.14.0-570.96.1.el9_6.x86_64) and it worked immediately. Root in the container in seconds. What makes this one uncomfortable isn’t just that it works, it’s how clean it is. You’re not asking for privileges, you’re not tripping obvious alarms, and you’re not leaving much behind. It’s the kind of thing that fits very nicely into a real attack chain.
Going in, I assumed that once I had root in the container, I could pivot out through something like /proc/1/root. That assumption didn't hold. I was running as a limited user in a limited user namespace, with an SCC slightly more permissive than restricted-v2. Even as UID 0, I was still stuck inside a user namespace with no real capabilities. Everything you'd expect to work just quietly failed: nsenter didn't go anywhere, chroot wasn't allowed, kernel modules were a non-starter, and cgroups weren't giving me anything useful. This is OpenShift's security model doing exactly what it's supposed to do. Even with UID 0 inside the container, you're still bound by SCCs, SELinux, and a restricted set of capabilities at the node level. It's worth noting that if this pod had been deployed by an admin in the default namespace, the results could differ. From a detection standpoint, though, this is where things get interesting. Even though the escape failed, the behavior leading up to it no longer looks like a normal application.
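If you want to see this for yourself, a couple of quick checks from inside a pod show how little UID 0 actually buys you in a user namespace. This is a minimal sketch; exact tooling depends on what's in your base image:

```bash
# Capability mask for the current process. Inside a user namespace these
# bits only apply to namespace-owned resources, not to the host.
grep CapEff /proc/self/status

# The usual pivot attempts quietly fail with "Operation not permitted"
nsenter --target 1 --mount true
chroot /proc/1/root /bin/sh
```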
This is actually the good news. Container isolation does hold up when it’s done properly. The problem is that most teams don’t actually know if they’re in that “properly configured” category until someone tries to break it, and that’s not a great time to find out.
Red Hat Advanced Cluster Security doesn’t magically fix the kernel. If you’re vulnerable, you’re vulnerable. What it does do is cover the parts of the attack lifecycle where this turns from an interesting vulnerability into an actual incident. At build time, it gives you immediate visibility into the fact that you’re shipping a vulnerable kernel instead of spending days figuring out where you’re exposed. At deployment, it gives you a control point that can actually say “no” when something doesn’t meet your baseline, even if it slipped through CI or someone tried to bypass the process.
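Both of those checkpoints can be exercised from the CLI. A minimal sketch, assuming roxctl is installed and pointed at your Central instance via ROX_ENDPOINT and ROX_API_TOKEN (the image and file names here are placeholders):

```bash
# Build time: fail the pipeline if the image violates build-time policies
roxctl image check --image=registry.example.com/myapp:latest

# Deploy time: evaluate a manifest against deploy-time policies before applying it
roxctl deployment check --file=deployment.yaml
```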
Where it really becomes useful is at runtime. When I ran the exploit, it wasn't noisy in the traditional sense, but it was definitely weird. AF_ALG usage inside a container stands out if you're looking for it. su behaving in ways that don't line up with normal execution stands out. Privilege transitions that don't match the workload baseline stand out, and if you set behavioral baselines in Red Hat Advanced Cluster Security, you get an alert whenever something deviates. That kind of signal only matters if you're actually watching what's happening at the kernel level.
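RHACS collects this kind of signal for you, but you can sanity-check it at the node level yourself. As an illustrative one-liner (not how RHACS implements detection), assuming bpftrace is available on the node; AF_ALG is address family 38 on Linux:

```bash
# Print every process that opens an AF_ALG socket (family 38)
bpftrace -e 'tracepoint:syscalls:sys_enter_socket /args->family == 38/ { printf("%s (pid %d) opened an AF_ALG socket\n", comm, pid); }'
```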
Because the real problem isn’t that I got root in a container in 30 seconds. The real problem is what happens if nobody notices for three days.
Red Hat Advanced Cluster Security isn’t a silver bullet. It won’t patch your kernel, it won’t stop every exploit, and it won’t fix bad architectural decisions. What it does is bring together visibility, enforcement, detection, and response in a way that actually works when something slips through, which it inevitably will. You can also create custom policies to block the deployment of workloads that are vulnerable.
If you're running at scale, this is where Red Hat Advanced Cluster Management for Kubernetes comes into play, because at some point you need a way to actually enforce that mitigation for the kernel flaw across a fleet of clusters, not just detect that you're at risk. Red Hat Advanced Cluster Management's governance policies can enforce the necessary configuration and help remediate your clusters. See the remediation knowledge base article for details (Red Hat account required); it covers two mitigation scenarios, one that requires a node reboot and one that does not.
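I won't reproduce the KB article here, but to make the mechanism concrete: an RHACM governance policy can at least surface which clusters still report the vulnerable kernel. A minimal sketch, with names and the kernel-version match treated as assumptions; the actual enforcement steps come from the KB article, and the Placement/PlacementBinding that targets clusters is omitted for brevity:

```yaml
apiVersion: policy.open-cluster-management.io/v1
kind: Policy
metadata:
  name: check-vulnerable-kernel
  namespace: policies
spec:
  remediationAction: inform   # report only; adjust per the KB article's guidance
  disabled: false
  policy-templates:
    - objectDefinition:
        apiVersion: policy.open-cluster-management.io/v1
        kind: ConfigurationPolicy
        metadata:
          name: no-vulnerable-kernel-nodes
        spec:
          severity: high
          object-templates:
            - complianceType: mustnothave
              objectDefinition:
                apiVersion: v1
                kind: Node
                status:
                  nodeInfo:
                    kernelVersion: 5.14.0-570.96.1.el9_6.x86_64
```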
At the end of the day, the fundamentals haven’t changed. You patch when you can, because it’s necessary. You reduce the attack surface where it makes sense. You accept that prevention isn’t perfect and invest in detection. You assume something will get through and limit the blast radius with mitigating controls. And you keep doing the boring work consistently so you don’t drift into a bad state over time.
This vulnerability just happens to be a very clear reminder of all of that.
We’ve spent a lot of time focusing on application security, supply chain, APIs, etc., which are all important. But this is a reminder that the kernel is still very much in play. In this scenario, the attacker is already inside your container, doesn’t need additional privileges, doesn’t need persistence right away, and doesn’t need to be loud. If you’re not watching runtime behavior, you’re effectively blind.
If you’re running Kubernetes in production, you don’t necessarily need Red Hat Advanced Cluster Security specifically, but you do need to be able to answer a few uncomfortable questions. Do you actually know where you’re vulnerable right now? Can you stop something from being deployed if it shouldn’t be? Would you detect behavior like this while it’s happening? And if you did, could you respond fast enough to matter?
If we had to deal with this in a real environment, the first priority would be getting a clear picture of where we’re actually exposed. Not a spreadsheet, not a ticket, not something someone updated last week—real, current data. This is where something like Red Hat Advanced Cluster Management becomes useful. It provides a way to look across clusters and understand which nodes are still running a vulnerable kernel, based on what’s actually happening in the environment right now.
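Per cluster, that raw data is already sitting in the Node objects; RHACM's search aggregates the same information across the fleet. A quick single-cluster check:

```bash
# Kernel version reported by each node's kubelet
oc get nodes -o custom-columns='NAME:.metadata.name,KERNEL:.status.nodeInfo.kernelVersion'
```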
At the same time, we would be thinking about blast radius. The exploit itself doesn’t require privileges, but that doesn’t mean all workloads carry the same level of risk. Anything running with elevated SCCs, host networking, or looser constraints still represents a more attractive pivot point if something goes wrong. Those are the areas that deserve closer attention while remediation is in progress.
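Finding those pivot points doesn't require special tooling. A couple of illustrative one-liners (the openshift.io/scc annotation is set by OpenShift at admission):

```bash
# Pods running with host networking
oc get pods -A -o jsonpath='{range .items[?(@.spec.hostNetwork==true)]}{.metadata.namespace}{"/"}{.metadata.name}{"\n"}{end}'

# Which SCC each pod was admitted under
oc get pods -A -o custom-columns='NS:.metadata.namespace,POD:.metadata.name,SCC:.metadata.annotations.openshift\.io/scc'
```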
If patching cannot happen immediately—and in most environments it cannot—then guardrails become critical. Admission control needs to actively enforce security posture, not just exist as a recommendation. Whether that enforcement is done through Red Hat Advanced Cluster Security or another policy engine, workloads should be required to define a proper seccomp profile rather than relying on defaults. This does not eliminate the vulnerability, but it does reduce the available attack surface during the window of exposure.
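To make that concrete: a custom seccomp profile can return an error for AF_ALG socket creation outright, on top of whatever RuntimeDefault already blocks. A minimal sketch (AF_ALG is address family 38; the profile file, here hypothetically named deny-af-alg.json, must be installed on each node under the kubelet's seccomp directory before pods can reference it):

```json
{
  "defaultAction": "SCMP_ACT_ALLOW",
  "syscalls": [
    {
      "names": ["socket"],
      "action": "SCMP_ACT_ERRNO",
      "args": [
        { "index": 0, "value": 38, "op": "SCMP_CMP_EQ" }
      ]
    }
  ]
}
```

A workload then opts in through its security context:

```yaml
securityContext:
  seccompProfile:
    type: Localhost
    localhostProfile: deny-af-alg.json
```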
At runtime, the focus shifts to identifying behavior that does not align with expected workload activity. Most applications have no legitimate reason to interact with interfaces like AF_ALG, nor do they typically exhibit unusual su behavior or unexpected privilege transitions. These are the kinds of signals that stand out—provided there is visibility into runtime activity.
Ultimately, the vulnerability still needs to be remediated at the source. As of this publication date, there are two mitigation scenarios: one requires a node reboot and the other does not. The reboot scenario must be coordinated carefully to respect Pod Disruption Budgets and maintain application availability.
Once remediation is complete, validation is just as important as the fix itself. A controlled test in a canary namespace can confirm whether detection and alerting mechanisms are functioning as expected. If no signal is generated, or if detection is delayed, that indicates a gap in visibility that should be addressed.
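One way to generate that controlled signal without running the exploit is to create a benign AF_ALG socket from a throwaway pod and confirm your runtime detection fires. A hypothetical sketch (the image, pod name, and namespace are assumptions):

```bash
# Create an AF_ALG socket from a canary pod; your detection should alert on it
oc run afalg-canary -n canary-tests --rm -it \
  --image=registry.access.redhat.com/ubi9/python-39 -- \
  python3 -c "import socket; s = socket.socket(socket.AF_ALG, socket.SOCK_SEQPACKET); s.bind(('hash', 'sha256')); print('AF_ALG socket bound')"
```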
Because in the end, resolving the CVE is only part of the outcome. The more important question is whether similar behavior would be detected and responded to the next time it occurs.
Getting root in a container in under 30 seconds isn’t the scary part anymore. The scary part is how long that container could live like that without anyone noticing.
That’s the difference between having security controls and actually having security.
About the author
Sean Rickerd, a distinguished professional in the technology and security domain, seamlessly blends his extensive career journey with a commitment to excellence. From his early days at SUSE to his current role as Principal Technical Marketing Manager at Red Hat, Sean's writing reflects a dedication to continuous learning. With a focus on authoring about cutting-edge fields like DevSecOps and Kubernetes security, he stands at the forefront of driving innovation and elevating security practices.