There is much to discuss in the wake of the security news flow last week. It was dominated by the Meltdown and Spectre CPU bug announcements — 2018 has certainly got off to an interesting start. In part one of this two part blog I will look at these bugs from a high level. In part two I shine the spotlight on the implications for mobile security, and for Android in particular.
Although patches for these bugs have been in the works for some time, the announcements have been rushed out and it has showed at times. However The Register and Ars Technica amongst others have provided some excellent coverage which I won’t repeat again here.
Cache side channel attacks are not new but until now many of those attacks have been somewhat contrived. For the attack to work, there needs to be a strong signal coming from the victim application you are trying to eavesdrop upon and locating this has been the biggest challenge for the attacker. In this context a strong signal means a cache line access pattern that itself reveals some secret data. In other words the data has to influence the memory access pattern in a particularly convenient way. Some key piece of data you are trying to steal needs to be used to calculate addresses that conveniently span lots of cache lines to give you some good cache side channel information to spy on later. Furthermore you need that cache layout to remain intact long enough that your eavesdropping code can observe it. Often, the cache layout information is quickly degraded by other operations being performed by the code under attack. These requirements don’t occur very frequently in reality, so although there are some nice demos of side channel communication the opportunities of finding a good signal generator in the code you want to attack is actually rather slim.
Meltdown and Spectre change the game fundamentally. You get to make your own signal generating code that you can then eavesdrop upon yourself. Very convenient. This is much more generic and much more powerful. The key insight here is the use of speculative execution to do this for you. To maintain decent performance, CPUs execute speculatively quite a lot of the time. In particular they predict which way branches are likely to take and as it turns out they effectively predict that, sure, you are definitely allowed to access that memory location you just read from.
First let us deal with Meltdown, which effectively only impacts Intel CPUs. Formally this is Rogue Data Cache Load (CVE-2017–5754), a rather clever abuse of the out of order speculative nature of CPUs. The first thing you do is you load from a memory location that are not meant to have permission to read. When you do this your code is going to get an exception and it is going to stop, and certainly not tell you directly what was held in that memory location. But, as it turns out, it doesn’t do that right away because it deals with the exception as a separate sub-operation to the load itself. So if you cleverly construct your code you can actually take the data load and transform it into an address and then load from that location. Now when the exception finally catches up with you all the register state you got from the original load gets rolled back. Washed away as if it never happened. However, the fact that you read from that data dependent address is still left in the cache state. It isn’t washed away. This vestige remains and with careful measurement of the cache state from another thread in your code you can infer what the data was. This is very bad news. Now you can effectively read any memory you want to, albeit a little statistically and very indirectly. The really terrible news is that by default Linux and Windows make all of the machine’s memory accessible in the process address space. So all the secret data, passwords and other useful nuggets are potentially accessible to your attack code. If you can get your code running on the target machine then you can read anything you want in theory. A pretty devastating oversight in the security isolation of CPUs. It is truly remarkable that this hasn’t been obvious before now. Speculative execution has been around for many years. It is one of those insights which results in a collective sigh of “of course, why did nobody think of that before” once you realise its implication. Patches to fix Meltdown have been in the works for a while and are ready to roll out soon. Basically the fix is to make sure the only pages that are MMU mapped for a process are ones that it should be able to read anyway or are otherwise not very interesting. This is termed Kernel Page Table Isolation (KPTI ) for Linux, and there is a similar fix for Windows arriving soon. This has a performance impact though. There were good reasons to make the rest of the kernel pages quickly accessible. Even though systems will see a drop in performance, we will be safe from Meltdown. So although much of the initial hysteria has been around the impact of Meltdown, especially in multi-tenant cloud environments such as AWS, it seems that the patch cycle should catch up soon and Meltdown will then likely drop out of the headlines. Although the main Meltdown bug does not impact ARM CPUs, they define a special variant 3a whereby certain system register contents can be leaked to ordinary processes but overall this seems less serious than the Intel variety. A PoC on ARM hardware is already running.
I suspect that Spectre will be the bug that keeps on giving. It will certainly haunt us for years to come as been pointed out by the researchers who found this vulnerability. Its implications are deeper and are more difficult to mitigate. Unlike Meltdown, Spectre equally impacts ARM and AMD devices. This means that phones and tablets are in the frame too for Spectre related data leaks.
There are two variants of Spectre. The first is termed Bounds Check Bypass (CVE-2017–5753). Basically this requires a rather particular sequence of code including a range check of a conditional value. It uses the branch prediction to execute a block of code that shouldn’t be executed. The idea is to train the branch predictor to execute some conditional code and then pass in a value that means it shouldn’t. However the conditional check needs to be carefully constructed so that it can’t be evaluated straight away. Thus, rather than wait and waste time, the CPU will go ahead and execute the conditional code just like it did last time. What’s the harm, right? There is nothing better for it to do. The conditional code is constructed to load data from an arbitrary, attacker controlled, memory location and then use that data as an address to load from memory again. There are obvious similarities with Meltdown here. That second load leaves its impact in the cache layout even after the processor catches up with itself and realises it should have never executed that code path after all. So basically you can read any piece of memory that the process can read. You also need a relatively contrived code sequence to do it, one that is less likely to occur naturally so you really need to be able to generate that code yourself. The bug allows you to effectively load from memory that the process can load from anyway. What’s the big deal you might ask. Well the big deal is JIT and user generated code that is running in the context of a process. Memory bound checks will typically ensure that such code cannot just read arbitrary memory, only the parts it is allowed to in its little sandbox. This Spectre variant allows escape from the sandbox. The disclosure shows how Javascript can be constructed that is then JITted to generate the exact sequence to exploit this bug. Now Javascript running in a web page can potentially access all of the browser process memory. This might include all sorts of passwords, credentials, cookies and other useful confidential tidbits that should not be accessible to whatever rogue Javascript running advert wants to steal your data. The fact that many browsers run each tab in a separate process doesn’t seem quite like the act of extreme tin-foil hatted-ness that I thought it once was. Suddenly it seems awfully prescient. The other thing I discovered yesterday for the first time is that even the Linux kernel has a JIT! Apparently there is a thing called eBPF that can be exploited to leak arbitrary kernel data. Needless to say that fixes are in the pipeline for all these things, but they are more complex and require different code sequences to be generated with special fence instructions to ensure that bad speculations cannot leak confidential data in this way. The wider, longer term implication of all this is you can’t trust user code that you allow to run in the context of your process unless you really think about it. Even if you sandbox that code so you limit what it can access that won’t necessarily be sufficient. Suddenly we have to realise programs live in a strange quantum like world, and can actually take code paths that they never should. The veil of abstraction over speculation execution has been breached.
Now we come to the second variant of Spectre, Branch Target Injection (CVE-2017–5715). This one seems the most bizarre of them all. Again this is related to branch prediction, but this time to indirect branches. Indirect branches happen reasonably frequently, especially in object oriented languages that allow polymorphic method overrides. So it is quite important to speed them up, otherwise they would be rather slow. This is because the CPU can’t even start filling its pipeline until it knows what instruction to go to next, and modern CPUs have rather long pipelines to be filled. As it turns out, where a particular indirect branch went last time is a rather excellent predictor of where it will go next time. So again, in the absence of anything better to do, a typical CPU will do exactly that and execute speculatively to the last target. To support this there is a table in the CPU called the Branch Target Buffer (BTB) that records a mapping from an instruction address to where that instruction will likely go to if it is an indirect branch. So this gets updated when the determination is made to predict for next time. Of course this table can’t be infinitely big and there are some collisions between the entries but most frequently executed instructions will be retained. It acts just like a cache of recent indirect branches and in some architectures is called a Branch Target Address Cache (BTAC). The BTB content is retained if your code stops executing and then passes to the kernel and then perhaps on to some other process running on the same core. It makes sense to keep the entries because some of the valid ones might still be there by the time it is your code’s turn to execute again. There is no point in throwing away potentially useful information that speeds up execution and what is the harm? If the target address was wrong then the CPU didn’t have anything better to do anyway, and the incorrect execution can just be rolled back as if it never happened and then the correct destination can be branched to. Well I’m sure you can see where this is going. As we know already, speculative execution is not without side effects as it turns out and these can be exploited.
The really clever trick with Spectre Branch Target Injection is that an attacker can get to choose what code is speculatively executed. If the attacker knows the address of a particular indirect branch then they can train the BTB to point to where they want it to. That needs to be code that is already in the process being attacked, but let’s presume that the attacker knows what and where that code is. The code doesn’t need to execute cleanly, it just needs to execute long enough that it is going to leak something useful into the cache state to examine later. This code is speculative, and it will be the wrong prediction anyway, so will be rolled back. So basically if the attacker wants to know the value of a particular register at an indirect branch call site then they need to find some code that uses that register in an address calculation and memory load that will leave a distinctive enough trace in the cache state. There is an extremely strong parallel here with ROP gadgets. These are short sequences of existing code that an attacker can cobble together to do something useful if they can cause a buffer overflow in a process that allows them to control a target address, but can’t inject new code. Buffer overflows and ROP gadget chains are the scourge of application security. What we have in the Spectre case are speculation gadgets. We can choose just the right existing code sequence in the existing code that will give us the leakage signature that we want. Unlike ROP gadgets though we don’t need an existing vulnerability like a buffer overflow to exploit them. We just need to have run earlier and have set our trap in the BTB to force another process’s code to incorrectly speculate and leave us some useful state in the cache that we can examine later. Of course in practice this is hard to setup and seems to be much easier to leak small blobs of information (such as individual bytes or words) rather than data blocks since our poisonous BTB training is soon undone and we have limited capacity for transmitting data through the cache state. It seems to me that the most obvious use for such an attack is for software key logging. Potentially every register that is live at an indirect branch site is susceptible to leakage. We just need to find one that, for instance, holds the value of the last key press on a system. This may of course be part of a password or confidential message. Not good.
Note that in order to exploit this Spectre variant it is necessary to know the exact code layout of the process you are trying to attack. You need to know this so you can set your BTB trap correctly for the indirect branch that is being subverted. Since Address Space Layout Randomization (ASLR) is widely used nowadays, as a defence against ROP Gadget attacks, this is not necessarily straightforward. There are various techniques around to try and infer the code addresses but this does add another layer of complication in the already byzantine causal chain required to make this attack actually work.
Fixes are also in the works for this flaw. In particular the approach relies on BTB training existing beyond the boundary of a context switch. Intel have a microcode update coming that will disable the retention of the entries across the boundary, effectively rendering this bug unexploitable. ARM have similar patches available that can disable or clear the BTB across context switch boundaries. At this stage it is unclear what the performance impact of this is going to be. There is definitely going to be some. The BTB content is traditionally retained for good performance reasons. Compiler patches are also coming in the form of the new “retpoline” feature. This allows an application to be recompiled so that it is no longer susceptible, or indeed the kernel can be recompiled with this feature on. This is particularly for cases where an application might be run in an unpatched environment. Retpolines seem rather ugly but fix the issue. They basically cause the compiler to replace indirect branches and calls with a more complicated and slower sequence that abuses the return instruction to do the same thing. This is much slower but not susceptible to Spectre as returns don’t use the BTB for prediction. It is unclear when and if similar Retpoline fixes will come for other compiled and JITted languages such as Java, C#, Python or Go. In part two I discuss what this all means for mobile security. These bugs are definitely not restricted to Intel devices only.