Researchers: Erez Israel, Daniel Marx, Yoav Alon, Aviv Gafni and Ben Omelchenko

Last week, two publications describing a pair of vulnerabilities, which their publishers named Meltdown and Spectre, sent shockwaves through the cyber-security ecosystem. Using side-channel attacks, these vulnerabilities allow an attacker to break the memory isolation that lies at the core of a large share of today’s computer systems.

Meltdown (also known as Variant 3: rogue data cache load, CVE-2017-5754) exploits a design flaw in Intel CPUs, in a mechanism known as out-of-order execution. As an optimization, the CPU executes instructions out of program order as soon as their inputs are ready. As a result, the CPU may execute too far ahead into a flow that should never have been executed, for example because an earlier instruction raised an exception. When this happens, the results of those instructions are discarded and the CPU state is reverted.

The problem is that the state of the system is not completely reverted. The CPU cache keeps traces of memory accesses made during out-of-order execution, and these traces can be used to gather information about events that should never have occurred. When carefully employed, Meltdown allows an attacker to disclose sensitive kernel memory, which was assumed to be protected from access at the hardware level.
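To make this distinction concrete, the following Python sketch is a deliberately simplified toy model, not a description of real hardware; the names `ToyCPU`, `Fault` and `run_out_of_order` are hypothetical. A dictionary of registers stands for the architectural state that is rolled back after a fault, and a set of cached line numbers stands for the cache, which is not.

```python
class Fault(Exception):
    """Stands in for the exception that ends the out-of-order window."""


class ToyCPU:
    LINE = 64  # assumed cache-line size in bytes

    def __init__(self):
        self.registers = {}  # architectural state: rolled back on a fault
        self.cache = set()   # microarchitectural state: never rolled back

    def load(self, addr):
        # Every load, including a transient one, fills its cache line.
        self.cache.add(addr // self.LINE)

    def run_out_of_order(self, body):
        snapshot = dict(self.registers)
        try:
            body(self)
        except Fault:
            # The fault is handled: squash the transient results by
            # restoring the registers. The cache is left untouched.
            self.registers = snapshot


def transient_body(cpu):
    cpu.registers["rax"] = 0x1000  # result of a transiently executed instruction
    cpu.load(0x1000)               # a load that depends on that result
    # The earlier faulting instruction only raises when it retires,
    # i.e. after the instructions above have already executed.
    raise Fault("page fault")


cpu = ToyCPU()
cpu.run_out_of_order(transient_body)
print(cpu.registers)                       # {}
print(0x1000 // ToyCPU.LINE in cpu.cache)  # True
```

The register write disappears, but the cache line it touched is still present, and that leftover is exactly what a cache side channel measures.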

The second vulnerability, named Spectre, leverages yet another design flaw in modern hardware. To speed up execution, the CPU employs a mechanism known as speculative execution: it predicts ahead of time which instructions will be executed (for example, which way a branch will go) and executes them speculatively instead of sitting idle until the outcome is known. Spectre exploits a flaw in the design of this mechanism: when a prediction turns out to be wrong, the speculative results are discarded, but their effects on the cache remain. This allows an attacker to read memory that should be inaccessible and to break the isolation between processes running independently on the same system.
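As an illustration of what "predicting which instructions will be executed" means, the sketch below models the classic textbook 2-bit saturating counter. Real Intel predictors are considerably more elaborate and largely undocumented (see the microarchitecture guide in the references), so this is an assumed simplification; the class name and the branch address `0x400` are hypothetical.

```python
class TwoBitPredictor:
    """One 2-bit saturating counter per branch address (simplified model).
    Counter values 0-1 predict "not taken", 2-3 predict "taken"."""

    def __init__(self):
        self.counters = {}

    def predict(self, pc):
        return self.counters.get(pc, 1) >= 2

    def update(self, pc, taken):
        c = self.counters.get(pc, 1)
        self.counters[pc] = min(c + 1, 3) if taken else max(c - 1, 0)


bp = TwoBitPredictor()
BOUNDS_CHECK = 0x400  # hypothetical address of an `if (x < size)` branch
for _ in range(10):
    bp.update(BOUNDS_CHECK, taken=True)  # condition was true 10 times
print(bp.predict(BOUNDS_CHECK))          # True: the CPU speculates into the body

bp.update(BOUNDS_CHECK, taken=False)     # one mispredicted, out-of-bounds call
print(bp.predict(BOUNDS_CHECK))          # still True: one miss does not flip it
```

The hysteresis of the counter is what makes training effective: a few consistent outcomes are enough to make the CPU speculate down the attacker's chosen path.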

While Meltdown was patched earlier this week through an operating-system feature known as KPTI (kernel page-table isolation), Spectre remains unpatched. This leaves even up-to-date systems vulnerable to attack.

It is important to note that both vulnerabilities result from design flaws in the hardware, and properly dealing with them will require a fix at the hardware level. The patch for Meltdown addresses the problem at the operating-system level, and only against known attacks; it does not address the problem at its core. It is still unclear whether and when a hardware fix will be implemented.

It is also vital to note that even once patches are rolled out to help mitigate these vulnerabilities, many systems will most likely remain vulnerable for the foreseeable future. This is because many systems (such as IoT, custom, or legacy systems) cannot be patched, whether because of the difficulty of patching hardware or for other reasons, such as incompatibility with other parts of the system. For this reason, it is important to protect systems with an additional layer of defense.

Analysis of Meltdown

Leaking information through the exploitation of the Meltdown vulnerability requires three steps:

  1. Arranging a code path that speculatively reads from an inaccessible memory location in the process, followed by an access to a user-accessible memory location whose address depends on the value read.
  2. Ensuring that the above-mentioned memory read is never committed (for example, because it raises an exception), so that the complete code path executes only transiently, through the out-of-order execution mechanism.
  3. Using a side channel (the CPU cache) to gather information about the transiently executed code path, thereby disclosing secret information.
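The three steps can be illustrated with a pure-Python simulation. It is not an exploit: the kernel address and byte, the probe-array base and the hit/miss cycle counts are made-up assumptions, and the illegal load is modeled as a plain dictionary lookup. The probe layout (256 entries one page apart, read back with Flush+Reload) follows the Meltdown paper; all function names are hypothetical.

```python
PAGE = 4096           # one probe entry per page, as in the Meltdown paper
HIT, MISS = 40, 300   # assumed latencies (cycles) of a cached / uncached load

# Hypothetical kernel byte the attacker is not allowed to read ('S').
KERNEL_MEMORY = {0xFFFF_8800_0000_1000: 0x53}


class ToyCache:
    def __init__(self):
        self.lines = set()

    def flush(self, addr):   # stands in for clflush
        self.lines.discard(addr)

    def access(self, addr):  # returns the simulated latency of a load
        latency = HIT if addr in self.lines else MISS
        self.lines.add(addr)
        return latency


def transient_gadget(cache, kernel_addr, probe_base):
    # Step 1: illegal read plus a dependent access to the user-owned probe array.
    secret = KERNEL_MEMORY[kernel_addr]
    cache.access(probe_base + secret * PAGE)
    # Step 2: the fault is only raised at retirement, after the access above.
    raise PermissionError("page fault")


def leak_byte(cache, kernel_addr, probe_base=0x10_0000):
    for value in range(256):
        cache.flush(probe_base + value * PAGE)
    try:
        transient_gadget(cache, kernel_addr, probe_base)
    except PermissionError:
        pass  # the attacker suppresses the fault, e.g. with a SIGSEGV handler
    # Step 3: Flush+Reload. Time every entry; the single fast one is the secret.
    timings = [cache.access(probe_base + v * PAGE) for v in range(256)]
    return min(range(256), key=timings.__getitem__), timings


value, timings = leak_byte(ToyCache(), 0xFFFF_8800_0000_1000)
print(hex(value))           # 0x53
print(timings.count(MISS))  # 255 misses for the single hit
```

Note the last line: recovering a single byte this way produces one cache hit and 255 cache misses.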

To detect and prevent exploitation of Meltdown, we rely on the following insights:

  1. In step three, the attack uses the cache as a side channel to leak the information obtained in step two. To use the cache this way, the attacker must generate a large number of cache misses for every cache hit, because the data is recovered from the timing difference between a cache miss and a cache hit (for example, probing 256 candidate cache lines to recover one byte yields one hit and 255 misses).
  2. In step two, speculatively executed code reads from inaccessible memory locations. The values returned by these reads are then exfiltrated to an overt channel through a side channel (the cache). The speculative code is subsequently reverted and never committed (retired).

These insights lead us to conclude that code exploiting Meltdown is characterized by speculative code that reads inaccessible memory, followed by a secret-dependent access to user-controlled memory. This is in turn followed by a large number of cache misses as the data is exfiltrated to an overt channel.
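The signature above can be sketched as a matcher over an ordered stream of events. This assumes a hypothetical monitor that already labels events with the names shown; real CPUs do not expose such a stream directly, and the threshold of 50 misses per hit is an illustrative assumption (a 256-entry probe yields 255 misses per hit).

```python
from collections import Counter


def matches_meltdown_signature(events, min_miss_ratio=50):
    """events: list of hypothetical event names, in order:
      'inaccessible_read'          speculative read of memory the process may not access
      'dependent_access'           access to user memory whose address depends on it
      'cache_hit' / 'cache_miss'   loads observed afterwards
    True if an inaccessible read is followed by a dependent access and the
    loads after it show at least `min_miss_ratio` misses per hit."""
    try:
        i = events.index("inaccessible_read")
        j = events.index("dependent_access", i + 1)
    except ValueError:
        return False  # one of the first two stages never happened
    tail = Counter(events[j + 1:])
    hits, misses = tail["cache_hit"], tail["cache_miss"]
    return hits > 0 and misses / hits >= min_miss_ratio


attack = ["inaccessible_read", "dependent_access"] + ["cache_miss"] * 255 + ["cache_hit"]
benign = ["cache_hit"] * 90 + ["cache_miss"] * 10
print(matches_meltdown_signature(attack))  # True
print(matches_meltdown_signature(benign))  # False
```

Requiring all three stages in order, rather than the miss burst alone, is what keeps cache-heavy but benign programs from matching.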

Analysis of Spectre

In variant 1 (bounds check bypass, CVE-2017-5753; see https://googleprojectzero.blogspot.co.il/2018/01/reading-privileged-memory-with-side.html), Spectre exploits the branch predictor mechanism to perform an out-of-bounds read during speculative execution.

The stages required for the exploitation of this variant are as follows:

  1. Arranging a code path in which a conditional branch (such as a bounds check) guards a read that can reach an inaccessible memory location in the process, followed by an access to a user-accessible memory location whose address depends on the value read.
  2. (Optional) Training the branch predictor to take the exploitable path (see https://spectreattack.com/spectre.pdf – Poisoning Indirect Branches).
  3. Causing an out-of-bounds read, which triggers a speculative read of inaccessible memory locations due to a branch misprediction (the result of step two).
  4. Using a side channel (the CPU cache) to gather information about the speculatively executed code path, thereby disclosing secret information.
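The four steps can be walked through in a small Python simulation. The victim mirrors the bounds-check pattern from the Spectre paper, but everything else is an assumption: a 64-byte toy address space, a secret placed at offset 40, a single 2-bit counter standing in for the branch predictor, and made-up hit/miss latencies. Names such as `victim` and `leak` are hypothetical.

```python
PAGE = 4096
HIT, MISS = 40, 300  # assumed latencies (cycles) of a cached / uncached load


class Predictor:
    """2-bit saturating counter for the victim's single bounds check."""

    def __init__(self):
        self.counter = 1

    def predict(self):
        return self.counter >= 2

    def update(self, taken):
        self.counter = min(self.counter + 1, 3) if taken else max(self.counter - 1, 0)


memory = bytearray(64)   # toy address space; array1 is memory[0:16]
ARRAY1_SIZE = 16
memory[40] = 0x2A        # hypothetical secret just past array1's bounds
cached = set()           # array2 offsets currently in the cache
predictor = Predictor()


def victim(x):
    """Step 1, the gadget: if (x < array1_size) y = array2[array1[x] * 4096];"""
    in_bounds = x < ARRAY1_SIZE
    if in_bounds or predictor.predict():
        # For an out-of-bounds x this body runs only speculatively: the
        # result y is discarded, but the array2 line stays cached.
        cached.add(memory[x] * PAGE)
    predictor.update(in_bounds)


def leak(x_malicious, training_rounds=5):
    for _ in range(training_rounds):
        victim(0)          # step 2: train the branch with in-bounds calls
    cached.clear()         # flush array2 (stands in for clflush)
    victim(x_malicious)    # step 3: mispredicted out-of-bounds read
    # Step 4: reload all 256 candidate lines; the fast one reveals the byte.
    timings = [HIT if v * PAGE in cached else MISS for v in range(256)]
    return min(range(256), key=timings.__getitem__)


print(hex(leak(40)))  # 0x2a
```

Without the training loop the counter would start at "not taken" and the out-of-bounds call would leave no trace, which is why step two matters in practice.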

Although Spectre exploits a different hardware mechanism to induce speculative execution, the methodology used to exfiltrate the data is similar (a cache side channel discloses the secret), and it leaves similar, measurable side effects.

Variant 2 of Spectre (branch target injection, CVE-2017-5715) builds on the techniques described for variant 1 to subvert the normal operation of the branch predictor. It poisons the prediction of indirect branches so that speculative execution follows attacker-controlled paths. Because, from our detection point of view, the side effects of this variant are similar, we will not discuss it in further detail.

Detection of Meltdown and Spectre Attacks

A simple test reveals one of the statistical differences exhibited by a process that exploits Spectre: the ratio of cache misses to branch misses (see Figure 1). A branch miss is a branch that was executed speculatively and later reverted because the prediction was wrong; an attacker produces such misses by training the branch predictor to take a branch and then reading out of bounds. While the ratio of cache misses to branch misses was less than one for most programs we tested (see Figure 2), exploiting Spectre raised it significantly, to more than three (see Figure 1).
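This test can be expressed as a small classifier over two hardware-counter values per sampling window. On Linux, such counts could be gathered, for example, with `perf stat -e cache-misses,branch-misses`. The function names, the 3.0 threshold (taken from the observations above) and the example counts are illustrative assumptions, not measurements from the figures.

```python
def cache_to_branch_miss_ratio(cache_misses: int, branch_misses: int) -> float:
    """Ratio of cache misses to branch misses within one sampling window."""
    if branch_misses == 0:
        # No mispredictions at all: report infinity only if there were misses.
        return float("inf") if cache_misses else 0.0
    return cache_misses / branch_misses


def looks_like_spectre(cache_misses: int, branch_misses: int,
                       threshold: float = 3.0) -> bool:
    # Assumed threshold: most benign programs stayed below 1, the Spectre
    # PoC rose above 3. This is an illustration, not a tuned detector.
    return cache_to_branch_miss_ratio(cache_misses, branch_misses) > threshold


# Illustrative counts only (not the data behind the figures):
print(looks_like_spectre(cache_misses=12_000, branch_misses=30_000))  # False (ratio 0.4)
print(looks_like_spectre(cache_misses=90_000, branch_misses=20_000))  # True (ratio 4.5)
```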

Figure 1: cache-misses as compared to missed-branches data collected from Spectre POC

Figure 2: cache-misses as compared to missed-branches data collected from Spectre POC and standard programs showing an example of the statistical difference caused by the exploit of Spectre


Another statistical difference in the exploitation of Spectre arises from the training of the branch predictor. As described in “Spectre Attacks: Exploiting Speculative Execution” (https://spectreattack.com/spectre.pdf), Spectre exploits the design of the branch predictor by training it to predict according to the attacker’s intention. The training consists of a series of repeatedly executed (non-speculative) branches, which deceive the branch predictor when the exploit is launched (that is, when the attacker reads what should be inaccessible data). When analyzing the execution of the process image, this training pattern raises a red flag.
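A minimal sketch of how such a training pattern might be spotted in a branch trace follows. It assumes a hypothetical trace of `(branch_address, taken)` pairs and an arbitrary minimum run length of 8. On its own this would also flag ordinary loop exits, which have the same "long uniform run, then a flip" shape, so it is only meaningful in combination with the cache side-channel indicators described above.

```python
def find_training_runs(trace, min_run=8):
    """trace: (branch_address, taken) pairs in execution order, a hypothetical
    trace format. Returns the addresses of branches that resolved the same
    way at least `min_run` times in a row and then flipped."""
    flagged = []
    state = {}  # branch address -> (last outcome, length of current run)
    for addr, taken in trace:
        last, run = state.get(addr, (None, 0))
        if taken == last:
            run += 1
        else:
            if run >= min_run:
                flagged.append(addr)  # long uniform run, then a flip
            run = 1
        state[addr] = (taken, run)
    return flagged


training = [(0x400, True)] * 10 + [(0x400, False)]  # trained, then exploited
short = [(0x500, True)] * 3 + [(0x500, False)]      # too short to count
print(find_training_runs(training + short))         # [1024]
```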

Our analysis of Meltdown also reveals key indicators that can be used to detect exploitation of this vulnerability. One interesting indicator is the combination of segmentation faults on kernel-space addresses with the cache side-channel pattern described above. Together, these can be used to identify exploitation attempts with a high degree of certainty.
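The combination of these two indicators can be sketched as follows. The sketch assumes x86-64 with 4-level paging, where the kernel half of the canonical address space starts at `0xffff800000000000` (the boundary differs with 5-level paging), and a hypothetical monitor that reports the faulting address of each segmentation fault together with cache hit and miss counts for the same window. Function names and both thresholds are illustrative assumptions.

```python
# Start of the kernel half of the address space on x86-64 with 4-level paging.
KERNEL_SPACE_START = 0xFFFF_8000_0000_0000


def is_kernel_address(addr: int) -> bool:
    return KERNEL_SPACE_START <= addr <= 0xFFFF_FFFF_FFFF_FFFF


def meltdown_indicator(fault_addrs, cache_misses, cache_hits,
                       min_kernel_faults=10, min_miss_ratio=50):
    """Combine both indicators over one time window of a single process.
    fault_addrs: faulting addresses of segmentation faults seen by a
    (hypothetical) monitor. Both thresholds are illustrative assumptions."""
    kernel_faults = sum(1 for a in fault_addrs if is_kernel_address(a))
    if kernel_faults < min_kernel_faults or cache_hits == 0:
        return False
    return cache_misses / cache_hits >= min_miss_ratio


# One kernel fault per leaked byte, 255 probe misses per probe hit.
faults = [0xFFFF_8800_0000_1000 + i for i in range(256)]
print(meltdown_indicator(faults, cache_misses=255 * 256, cache_hits=256))  # True
print(meltdown_indicator([0x0], cache_misses=1_000, cache_hits=9_000))     # False
```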

As explained above, Meltdown and Spectre leave measurable anomalies in process behavior when they exfiltrate secret data through a side channel. They also exhibit anomalies in the flow of speculative and out-of-order execution.

These vulnerabilities are a perfect example of attacks that are essentially invisible to the operating system (as they don’t involve any operating system call), but can be detected when monitoring patterns and events at the CPU level.

By monitoring the occurrence and patterns of cache misses and cache hits during process execution, together with the execution of speculative code that is later reverted, we can identify with high certainty attempts to subvert the system using Meltdown and Spectre.

References:

https://googleprojectzero.blogspot.co.il/2018/01/reading-privileged-memory-with-side.html
https://spectreattack.com/spectre.pdf
https://meltdownattack.com/meltdown.pdf
https://gruss.cc/files/kaiser.pdf
https://xania.org/201602/bpu-part-one
http://www.agner.org/optimize/microarchitecture.pdf
https://software.intel.com/en-us/articles/intel-sdm