Utilizing hardware to stop attackers earlier and without disruption
Too often the defense community makes the mistake of focusing on the “what,” without considering and truly understanding the “why.” This mindset often leads to the development of technologies based on known exploitation techniques, which are ineffective and easily circumvented shortly after their release. Instead of focusing on those known exploit techniques, our research introduces a new method for early detection and prevention of exploits without prior knowledge of the vulnerability or technique.
Our hardware-assisted technique, presented yesterday at Black Hat USA 2016, has proven successful at blocking exploits while minimizing the impact on performance to ensure operational utility at scale.
Challenges with current approaches
Time and again we’ve seen newly developed software protections bypassed shortly after their release. This is especially true of exploit mitigations, and of defenses against Return-Oriented Programming (ROP) in particular. Over a decade ago, processor manufacturers began to add hardware enforcement of page-level permissions, which enabled operating systems to restrict code from executing anywhere in memory, a common exploit technique. But as Microsoft Windows and other operating systems introduced these countermeasures, researchers were quick to devise creative ways to bypass them.
Many of these exploits focused on reusing legitimate code, and ROP became enemy number one. However, times change and today new exploit techniques are less reliant on ROP. Many more techniques exist publicly, and as the HackingTeam leak proved, private and therefore unknown techniques exist, too.
The exploit kit graph below clearly illustrates the declining utility of ROP, which in turn demonstrates the difficulty of relying on ROP-based exploit mitigations. A single change in exploit technique trends can have a dramatic and long-lasting effect.
As attackers have moved away from ROP and toward more advanced, and frankly harder-to-detect, techniques for executing payloads, what can we do?
Toward earlier detection
Over the years the industry has realized that it is impossible to eliminate vulnerabilities. We also know that exploit authors are incredibly creative. Therefore, the biggest impact we can have on the success of exploits is to limit the opportunity for creative bypasses. To oversimplify, exploits have to trigger a vulnerability, and then “do something.” Anti-exploit solutions need to disrupt this “something” early in the stages of exploitation to maintain an advantage.
To demonstrate, consider the following graphic that illustrates the high-level stages of an exploit.
This progression highlights that real defense must fight in the “Exploitation” stage of the attack. Unfortunately, most exploit prevention products continue to focus on the “Post-Exploitation” stage, including the continued focus on ROP, and by that time, the attacker almost always wins. However, at the “Exploitation” stage of the attack, defenders still have the advantage and can stop attackers from achieving their objectives.
Hardware Assisted Control Flow Integrity (HA-CFI)
Advanced software exploitation is a rapidly evolving and increasingly mainstream field of study, introducing clever ways to bypass existing exploit defenses. Instead of focusing on the post-exploitation stage, we leverage the enforcement of coarse-grained Control Flow Integrity (CFI) to enhance detection at the exploitation stage. Existing implementations of CFI require recompilation or extensive software updates, or they incur a significant performance penalty, making them difficult to adopt and use in the enterprise.
To enable earlier detection while limiting the impact on performance, we have developed a new concept we’re calling Hardware Assisted Control Flow Integrity, or HA-CFI. This technology utilizes hardware features available in Intel processors to monitor and prevent exploitation in real time, with manageable overhead. By leveraging hardware features we can detect exploits before they reach the “Post-Exploitation” stage and provide stronger protections while defense still has the upper hand.
The Performance Monitoring Unit (PMU) of microprocessors is a good candidate for enforcing Control Flow Integrity. The PMU is a specialized unit in most microprocessor architectures that provides useful performance measuring facilities for developers. Most features of the unit are intended to count hardware-level events during program execution to aid in program optimization and debugging. To ensure our approach is resilient for enterprise security while also providing significant detection and prevention assurances, we established several functional requirements, such as support for both 32-bit and 64-bit operating systems and application without recompiling software or requiring access to source code.
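To make the counting idea concrete, here is a minimal user-space sketch of how an event counter can be turned into a trap. It is a toy model, not PMU code: the class name, the 48-bit width, and the callback are all assumptions made for illustration, and the post does not describe how its counters are actually programmed. The model relies only on the general mechanism that a hardware counter counts upward and can raise an interrupt when it overflows, so preloading it close to its maximum makes it fire after a chosen number of events.

```python
class SimulatedCounter:
    """Toy model of one event counter that traps on overflow (hypothetical)."""

    def __init__(self, width_bits, sample_after, on_overflow):
        if sample_after < 1:
            raise ValueError("sample_after must be at least 1")
        self.modulus = 1 << width_bits      # counter wraps at 2**width_bits
        self.sample_after = sample_after    # events between traps
        self.on_overflow = on_overflow      # stands in for the interrupt handler
        self._arm()

    def _arm(self):
        # Preload the counter so it overflows after `sample_after` events.
        self.value = self.modulus - self.sample_after

    def record_event(self, context):
        """Count one event; call the handler and re-arm if the counter overflows."""
        self.value += 1
        if self.value >= self.modulus:
            self.on_overflow(context)
            self._arm()


# Trap on every event (sample_after=1) versus every second event.
every_event = []
every_other = []
c1 = SimulatedCounter(width_bits=48, sample_after=1, on_overflow=every_event.append)
c2 = SimulatedCounter(width_bits=48, sample_after=2, on_overflow=every_other.append)
for event_id in range(5):
    c1.record_event(event_id)
    c2.record_event(event_id)
```

With a sample interval of 1, the handler sees every counted event, which is the configuration a trap-on-each-event design would need; larger intervals trade visibility for lower overhead.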
HA-CFI uses PMU-based traps to apply coarse-grained CFI to indirect calls on the x86 architecture. The system uses the PMU to count and trap mispredicted indirect branches so that branch destinations can be validated in real time. This approach requires three things: support from Intel’s Last Branch Record (LBR) feature, a method for tracking thread context switches in a given OS, and an algorithm for validating branch destination addresses, all while keeping performance overhead to a minimum on both Windows and Linux. Using a runtime-generated whitelist, we can determine the validity of each indirect call and classify calls to locations outside the whitelist as malicious. This approach greatly reduces the overhead of the instrumentation by moving the policy enforcement to a “coarse-grained” verifier on indirect branch targets. Our approach consists of four key components, which we discussed in more detail in our Black Hat USA 2016 presentation: indirect branch, added precision with the LBR, on-demand PMU-assisted CFI, and whitelist generation.
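The validation step can be illustrated with a short sketch. This is a simplified model under stated assumptions, not the HA-CFI implementation: it assumes the whitelist is a set of known function entry points collected per loaded module, and that the trap handler receives the most recent branch record as a source and destination address pair. The names BranchRecord, CoarseGrainedVerifier, and handle_trap are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BranchRecord:
    """One branch record: where the branch came from and where it landed."""
    source: int
    destination: int


class CoarseGrainedVerifier:
    """Accepts an indirect call only if it lands on a whitelisted address."""

    def __init__(self):
        self._valid_targets = set()

    def add_module(self, base, entry_offsets):
        # Assumption: the whitelist holds function entry points, stored as
        # absolute addresses computed when a module is loaded.
        self._valid_targets.update(base + offset for offset in entry_offsets)

    def is_valid(self, record):
        return record.destination in self._valid_targets


def handle_trap(verifier, record):
    """Decide what to do with a trapped indirect branch (hypothetical policy)."""
    return "allow" if verifier.is_valid(record) else "block"


verifier = CoarseGrainedVerifier()
verifier.add_module(base=0x400000, entry_offsets=[0x1000, 0x2000])

# A call that lands on a known function entry is allowed.
legit = handle_trap(verifier, BranchRecord(source=0x401234, destination=0x402000))
# A call that lands in the middle of a function is blocked.
hijack = handle_trap(verifier, BranchRecord(source=0x401234, destination=0x402005))
```

The check is "coarse-grained" in that it does not ask whether this particular call site may reach this particular function, only whether the destination is a legitimate target at all. That keeps the per-trap cost to a single set lookup.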
To ensure we minimized the overhead of HA-CFI while maintaining detection at the exploitation phase, we extensively tested the system. This included tests against a variety of exploits to determine its efficacy against as many bug classes and exploitation techniques as possible, with an emphasis on more recent samples using approaches intended to bypass other mitigation measures, including samples captured in the wild. Throughout all of this, we maintained an extremely low false positive rate. HA-CFI detected and prevented exploitation for each of the tested modules, including both ROP and “ROP-less” techniques.
Conclusion
Next-generation exploit defense must detect and prevent exploitation patterns at earlier stages of the process. Doing so maintains the defensive advantage needed to limit exploit authors’ creativity and effectively block them. For the time being, ROP defenses are still providing some protection, especially in commodity and less advanced exploits, or when reading and writing memory may be impossible. Our novel yet practical approach to exploit detection uses the Performance Monitoring Unit to enforce control flow integrity on branch mispredictions. HA-CFI advances the state of the art to give enterprise-scale security software an upper hand in earlier detection of exploitation.