Speculative Store Bypass explained: what it is, how it works

May 21, 2018 | Jon Masters | 10-minute read

In January, the world became aware of a new class of security threat that allows attackers to exploit common industry-wide performance optimizations of modern microprocessors (aka chips). Almost every kind of computing device was affected - from servers, workstations, and laptops, to tablets, smartphones, and other gadgets. As such, “Meltdown” and “Spectre” had a significant impact across the industry.

In the wake of the discovery of these vulnerabilities, technology leaders, including Red Hat, came together to mitigate these problems ahead of public disclosure, using a combination of hardware and software updates. While those of us in the software community cannot fix already-deployed hardware, we can, and did, collaborate with microprocessor vendors to create and release software workarounds engineered to prevent the conditions required to perform the attacks. As a result of this collaboration, we were proud of our ability to respond to our customers’ needs. In the months following the January disclosure, we continued to work on improvements to the mitigations, chiefly in terms of the impact to system performance, reducing the impact still further with each additional update.

Red Hat has reprised this role once more for the newly disclosed Speculative Store Buffer Bypass CVE. Over the course of the past few months, we have further refined our understanding of the many nuances of speculative execution attacks, to develop mitigations for this latest vulnerability while working with other vendors under an industry embargo process. We are proud of the countless hours invested to make these mitigations available to our customers on a timely basis. Over the coming weeks, we plan to share more technical detail as we work with customers to deploy the latest updates.

We recommend all Red Hat customers review the Red Hat Security Advisory for CVE-2018-3639. In this post, I’m going to explain the Speculative Store Bypass CVE.

Like the earlier exploits, Speculative Store Bypass is concerned with Speculative Execution. To explain what’s going on with this latest vulnerability, let’s return to the everyday coffee shop analogy that we used in our previous post on Meltdown and Spectre.

Suppose a group of coworker friends take turns stopping at a local coffee shop on the way to work. They each want their caffeinated beverage of choice, but nobody has time to wait in line for coffee every day. To make ordering (and general office gossiping) easier, the group has an incredibly long, never-ending group text message chat thread, in which each person calls out their preferred order. Often the orders are the same, but not always, so the person making the order on any given day will read out the orders from the group chat to their favorite barista rather than guessing about what kind of caffeinated beverage each person wants to have.

This process works well, but it isn’t flawless. One morning, as the usual list comes in for various beverages, the person whose turn it is to stand in line gets ready to make the order. An indecisive member of the group texts an update for a different beverage, which arrives just in time, but following dozens of other messages and general gossip that came after the original orders were listed. The person making the order knows that there could be such an update, but they also know that these are rare. Rather than read through the entire text thread looking for such updates, they decide to go ahead and place the order anyway before they catch up.

As the orders are read out, the group’s favorite barista dutifully labels each on a personalized cup with a name and order details using a permanent marker. After paying and moving to the end of the line to wait for the coffee, the person making the order finally catches up with the group thread and notices that one of the orders had changed. Quickly, they call out to the barista, who is still able to label a replacement cup. The cup with the wrong order written on it is thrown away, but personally identifiable information written on it is visible to anyone watching.

As in the case of Spectre and Meltdown, the coffee shop scenario once again involves speculative execution. The person making the orders knows from experience that updates to orders are unlikely to occur and proceeds to read the list of beverage orders as if there are none. They notice late in the ordering process that one of the earlier orders has been changed and have to quickly intervene to ensure that things are put right. This process of speculation generally works well because such updates are rare and it isn’t necessary to search the entire group chat history before making the order.

A similar process happens in our computers when it comes to handling updates to data values upon which they are operating as part of a program.

In the case of computers, data upon which the machine is operating are commonly stored in large banks of memory chips, known as DRAM (Dynamic Random Access Memory) or external memory. These are much larger than the tiny amounts of cache and register memory contained within the microprocessor itself, but also far slower. In order to speed up access to this external memory, the microprocessor uses special internal buffers. These contain copies of reads and writes, or what microprocessor designers call loads and stores. Whenever external memory is updated with a new value, the update is first written to a store buffer inside the chip. This buffer is eventually written back to the external DRAM, but meanwhile, the processor can keep going.

Buffering greatly increases performance because the microprocessor does not usually have to wait for the buffers to be written back to slower memory. But it introduces a complication. Data read from memory (loads) might have been updated by an outstanding write (store) that is in the buffer but hasn’t yet made it back to the external memory. Thus, every time the processor performs a load from a memory address, it must check the store buffer to see if this address is part of a recent store. Things are further complicated by the fact that modern microprocessors allow unaligned access, meaning that addresses might not exactly match but could slightly overlap. Thus the logic required to search the store buffer is complex and slow. This process is often referred to as “Memory Disambiguation”, but terms can vary by vendor.
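To make the overlap check concrete, here is a minimal software sketch of a load that consults a store buffer. It is a toy model, not a description of any real microprocessor: the names `PendingStore`, `overlaps`, and `load` are invented for illustration, and real hardware performs this search in parallel circuitry rather than in a loop.

```python
from dataclasses import dataclass


@dataclass
class PendingStore:
    addr: int    # address of the first byte written
    data: bytes  # bytes waiting to be written back to external memory


def overlaps(store, load_addr, load_size):
    """True if the store's byte range intersects the load's byte range."""
    store_end = store.addr + len(store.data)
    load_end = load_addr + load_size
    return store.addr < load_end and load_addr < store_end


def load(memory, store_buffer, addr, size):
    """Read `size` bytes, letting pending stores override external memory."""
    result = bytearray(memory[addr:addr + size])
    # Walk oldest to youngest so the most recent matching store wins.
    for store in store_buffer:
        if not overlaps(store, addr, size):
            continue
        for i, byte in enumerate(store.data):
            pos = store.addr + i - addr
            if 0 <= pos < size:
                result[pos] = byte
    return bytes(result)


memory = bytearray(b"AAAAAAAA")
store_buffer = [PendingStore(addr=2, data=b"BBBB")]  # not yet written back

print(load(memory, store_buffer, 0, 4))  # partially overlaps the pending store
print(load(memory, store_buffer, 6, 2))  # no overlap: memory is already current
```

The first load only partially overlaps the pending store, which is the unaligned case that makes an exact address match insufficient.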

Store buffers are highly sophisticated and they utilize fast memories (known as CAM - or Content Addressable Memory), but searching them for all possible overlapping addresses still takes time. Rather than wait for serial searching of the store buffer to complete, many modern microprocessors implement a performance optimization (known as “Speculative Store Buffer Bypass”) that will assume no such update is present in the store buffer, then speculatively execute later program instructions while performing the store buffer search in parallel. In the common case that no matching recent store exists, a significant speedup is obtained. Conflicts are trivially resolved by throwing away the results of the speculation and repeating calculations.
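The predict, verify, and replay sequence can be sketched in a few lines. This is a deliberately simplified model under stated assumptions: the function `speculative_load` and the `observed` list are hypothetical names, and the sequential steps stand in for work that real hardware overlaps in time.

```python
def speculative_load(memory, store_buffer, addr, observed):
    """Toy model: predict 'no pending store', then verify and replay if wrong.

    memory:       dict of address -> value held in (slow) external memory
    store_buffer: list of (address, value) stores not yet written back
    observed:     list recording every value that dependent work got to see
    """
    # 1. Speculate: assume no conflicting store and use the value in memory.
    stale = memory[addr]
    observed.append(stale)  # later instructions already run on this value

    # 2. The store buffer search completes (the youngest store is last).
    pending = [value for a, value in store_buffer if a == addr]
    if not pending:
        return stale  # common case: the speculation was correct

    # 3. Conflict: throw away the speculative result and repeat the work.
    fresh = pending[-1]
    observed.append(fresh)
    return fresh


memory = {0x10: "old value"}
store_buffer = [(0x10, "new value")]
observed = []

print(speculative_load(memory, store_buffer, 0x10, observed))
print(observed)
```

The caller always receives the correct value, but the stale one was briefly in use before the conflict was detected.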

This process of store buffer speculation improves performance because conflicting updates are unlikely. Normal program code, of course, performs loads and stores to and from memory. These typically take the form of a machine-level instruction with a base pointer (e.g. the stack) and an offset (e.g. a local automatic variable address). But while loads and stores are common, programming tools - such as compilers and language runtimes - are already optimized to avoid reloading recently used data values from memory, because DRAM is slow. As a consequence, a great deal of effort already goes into reducing exactly the kind of circumstances that lead to a conflicting load and store in microprocessor store buffers.

In addition, many microprocessors are designed to handle loads and stores using the same base pointer (such as a stack) differently from those that have a different base pointer. In those microprocessors, the circumstances required for a speculative store buffer bypass attack are even more complicated. Some microprocessors take this even further, tracking how often there are conflicting loads and stores and automatically disabling the store buffer bypass feature when there are many conflicts. This can serve to reduce the impact of potential attacks. Yet while some microprocessor designs are difficult to exploit, ultimately store buffer speculation is designed in such a way that it can be exploited if a determined attacker is patient and skilled enough.

Meltdown and Spectre demonstrated that aggressive speculative execution such as store buffer bypassing effectively improves performance just as long as it behaves as an unobservable black box. In reality, the speculation apparatus has observable side-effects upon shared resources, such as the high-performance cache memories contained within microprocessors. Loads and stores will cause data to be loaded into caches, or evicted from them. Careful timing of subsequent loads and stores can be used to determine whether those values were previously cached. As a result, it is possible to infer what data has been loaded speculatively. This process is known as side channel analysis because secrets are not leaked directly, but instead inferred by measurement.
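The inference step can be illustrated with a simulated cache. Everything here is a made-up model: the `ToyCache` class, the latency numbers, and the `victim` and `attacker_probe` functions are assumptions chosen for clarity, and a real attack would measure actual access times on real hardware.

```python
FAST, SLOW = 1, 100  # made-up latencies, in arbitrary "cycles"


class ToyCache:
    def __init__(self):
        self.lines = set()

    def flush(self):
        self.lines.clear()

    def access(self, line):
        """Return the simulated latency and leave the line cached."""
        latency = FAST if line in self.lines else SLOW
        self.lines.add(line)
        return latency


def victim(cache, secret):
    # Speculative work touches a cache line selected by the secret value.
    cache.access(secret)


def attacker_probe(cache, candidates):
    # Time every candidate line; the fast one was cached by the victim.
    timings = {line: cache.access(line) for line in candidates}
    return min(timings, key=timings.get)


cache = ToyCache()
cache.flush()
victim(cache, secret=42)
print(attacker_probe(cache, range(256)))
```

The attacker never reads the secret directly; it is recovered purely from which access was fast.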

When cache side-channel analysis is applied to store buffer speculation, it is possible to leak earlier values of certain memory locations. Unlike in previous attacks, Speculative Store Buffer Bypass (usually) allows only reading of memory locations from within the same privilege level. Thus, it would allow only a kernel to attack itself, or an application to read memory to which it already has legitimate access. It would, therefore, seem that such accesses are harmless and that it is implicitly safe to allow aggressive speculation. Unfortunately, this is not the case.

One potential problem arises when an application is implementing a sandbox or other attempt at isolation within a single running process. In this case, there are really two active contexts: the trusted sandbox environment, and the untrusted code running within it. Microprocessors are designed with the concept of different privilege levels, and of course, our entire computing world relies upon this in order to isolate processes and virtual machines from one another. But microprocessor designers don’t (traditionally) factor in separated contexts within the same process (same privilege or exception level). As a result, untrusted (possibly malicious) code can run within a sandbox and abuse store buffer speculation to read sensitive data from the sandbox itself.

In the common case of managed code environments (such as Java or JavaScript), the ability of managed code to dump arbitrary content from its managing process could be fatal to the security of the application, or of other applications running within the same shared process. The attack is possible because the code may be constructed to appear to perform benign reads of values to which it has legitimate access. These accesses are seen by the runtime security checks that validate the managed code prior to allowing it to execute. Unfortunately, the untrusted managed code could, in fact, be abusing speculation to see unsafe previous values of memory variables, pointers, and sensitive security structures through a cache side channel.

Mitigating Speculative Store Buffer Bypass attacks is a complex topic. We could simply globally disable every speculative performance feature. But that would rapidly remove many decades’ worth of performance gains across the industry. And doing so wouldn’t necessarily make us any safer in the process because in most cases store buffer speculation is safe. This is because applications that rely upon process-level separations aren’t impacted by this vulnerability. Thus, a “big hammer” approach of disabling store buffer speculation would unfairly penalize all applications to protect just a few that could be exploited through a carefully crafted attack.

Rather than globally disable all performance features, the industry has come together to provide a range of options, including new APIs for use by sandbox code. In addition, a “big hammer” Speculative Store Buffer Bypass global disable option is available to those who want to use it. System Administrators wanting to globally disable Speculative Store Buffer Bypassing can do so quickly and easily through the new “spec_store_bypass_disable” kernel parameter.
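As a rough sketch, an administrator could check whether the parameter was passed at boot by inspecting the kernel command line. The helper name `ssb_boot_setting` and the sample command line below are illustrative assumptions; the value `on` is used only as an example, and the kernel documentation for your release lists the values it accepts.

```python
def ssb_boot_setting(cmdline):
    """Return the value of spec_store_bypass_disable, or None if absent."""
    setting = None
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "spec_store_bypass_disable":
            # A bare parameter with no "=value" is reported as "".
            setting = value if sep else ""
    return setting  # if repeated, this sketch reports the last occurrence


# Illustrative command line, not taken from a real system.
sample = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro spec_store_bypass_disable=on"
print(ssb_boot_setting(sample))
print(ssb_boot_setting("ro quiet"))
```

On a running Linux system, the same function can be applied to the contents of `/proc/cmdline`.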

By default, updated Linux kernels providing mitigations for this vulnerability will leave Speculative Store Buffer Bypass enabled globally but also provide a new standard Linux API intended for sandboxes and managed code environments that could be at risk from exploitation. Applications providing sandboxing environments have access to a new “prctl” interface through which they can determine whether a given microprocessor is vulnerable to store buffer attacks, and through which they can easily disable store buffer bypass speculation on a per-process level in a few lines of code. When such a per-process mitigation is applied at runtime, it will apply to all further processes and threads created by an application while it continues to run.
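A minimal sketch of using this interface follows. It calls `prctl` through `ctypes` and assumes a Linux system with an updated kernel; the constant values are those defined in `<linux/prctl.h>`, while the helper names `describe`, `ssb_query`, and `ssb_disable` are invented for this example. A sandbox written in C would make the same calls directly.

```python
import ctypes
import sys

# Constants as defined in <linux/prctl.h>.
PR_GET_SPECULATION_CTRL = 52
PR_SET_SPECULATION_CTRL = 53
PR_SPEC_STORE_BYPASS = 0
PR_SPEC_PRCTL = 1 << 0          # per-task control is available
PR_SPEC_ENABLE = 1 << 1         # store bypass speculation is enabled
PR_SPEC_DISABLE = 1 << 2        # store bypass speculation is disabled
PR_SPEC_FORCE_DISABLE = 1 << 3  # disabled and cannot be re-enabled


def describe(state):
    """Turn a PR_GET_SPECULATION_CTRL result into a short description."""
    if state < 0:
        return "query failed (kernel may not support this prctl)"
    if state == 0:
        return "CPU not affected"
    parts = ["per-task control available" if state & PR_SPEC_PRCTL
             else "no per-task control"]
    if state & PR_SPEC_FORCE_DISABLE:
        parts.append("bypass force-disabled")
    elif state & PR_SPEC_DISABLE:
        parts.append("bypass disabled")
    elif state & PR_SPEC_ENABLE:
        parts.append("bypass enabled")
    return ", ".join(parts)


def _prctl(option, arg2, arg3):
    libc = ctypes.CDLL(None, use_errno=True)
    # prctl takes unsigned long arguments; unused ones must be zero.
    args = [ctypes.c_ulong(a) for a in (arg2, arg3, 0, 0)]
    return libc.prctl(option, *args)


def ssb_query():
    return _prctl(PR_GET_SPECULATION_CTRL, PR_SPEC_STORE_BYPASS, 0)


def ssb_disable():
    return _prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_STORE_BYPASS, PR_SPEC_DISABLE)


if __name__ == "__main__" and sys.platform.startswith("linux"):
    print("before:", describe(ssb_query()))
    if ssb_disable() != 0:
        print("disable request was rejected")
    print("after: ", describe(ssb_query()))
```

A sandbox would typically issue the disable request early in startup, before loading any untrusted code.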

Because of previous threats, many applications are already being refactored to use process-level isolation between trusted and untrusted code. Those efforts have been expedited in light of the latest vulnerability. Browser vendors and other trusted third parties have worked together during the embargo process on updates that are expected to begin shipping in the coming days.

Since it will take time to enable explicit mitigations in all impacted third-party software, steps have been taken to automatically enable mitigation for some classes of applications. The Linux kernel “seccomp” (“secure computing”) framework allows an application to request that limits be placed upon itself, for example, to prevent further use of certain system calls and other system interfaces after its initial startup phase. Seccomp is often used by sandboxes and managed code frameworks to limit the ability for sandbox escape once potentially malicious untrusted code begins to execute. The seccomp framework has been modified in the latest Linux kernel updates such that its use will automatically have the side-effect of disabling Speculative Store Buffer Bypass, equivalent to applications making an explicitly programmed “prctl” request.

System Administrators can quickly determine the mitigation status of a specific system either globally (in the kernel boot log messages, and through a new vulnerabilities entry in sysfs), or on a per-application level by looking at the “status” file in /proc for the running application. In many (but not all) cases, full mitigation will also require updated microcode from the system microprocessor vendor. Red Hat intends to ship updated microcode as a convenience to our customers as it is made available to us. In the interim, customers are strongly advised to contact their OEM, ODM, or system manufacturer to receive this via a system BIOS update.
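Both checks can be scripted. The sketch below reads the sysfs vulnerabilities entry and extracts the `Speculation_Store_Bypass` field from a process status file, as exposed by updated upstream kernels. The helper names are hypothetical, and the sample status text is illustrative rather than captured from a real system.

```python
from pathlib import Path

SYSFS_ENTRY = Path("/sys/devices/system/cpu/vulnerabilities/spec_store_bypass")


def global_status():
    """System-wide status line from sysfs, or None if the entry is missing."""
    try:
        return SYSFS_ENTRY.read_text().strip()
    except OSError:
        return None


def process_status(status_text):
    """Extract the Speculation_Store_Bypass field from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip() == "Speculation_Store_Bypass":
            return value.strip()
    return None  # older kernels do not report this field


# Illustrative status text, not captured from a real system.
sample = (
    "Name:\tsandbox\n"
    "State:\tS (sleeping)\n"
    "Speculation_Store_Bypass:\tthread mitigated\n"
)
print(process_status(sample))
print(global_status())  # None on systems without the sysfs entry
```

For a live process, pass the contents of `/proc/<pid>/status` (or `/proc/self/status` for the current process) to `process_status`.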

Red Hat and other vendors have worked with the upstream Linux kernel community to create best practices, as well as new security APIs, including mitigations against Speculative Store Buffer Bypass exploitation that can be enabled globally or on a per-process level. In addition, we are shipping updates to common managed code environments that could be subject to attack. You should apply these updates, along with any necessary microcode updates, as soon as possible.

Learn more about Red Hat’s response in the Speculative Store Bypass vulnerability article (CVE-2018-3639).