JIT Spraying Never Dies
JIT Spraying Never Dies - Bypass CFG By Leveraging WARP Shader JIT Spraying
Bing Sun, Chong Xu
Abstract
• Many scripting languages, such as JavaScript and ActionScript, use Just-In-Time (JIT) compilation to improve script execution performance. Under some circumstances, however, an exploit can leverage the legitimate JIT mechanism to bypass memory protections and mitigations such as ASLR and DEP. This exploitation technique was first introduced as "JIT Spraying" in 2010. The idea is to use constant numeric values in a high-level scripting language to generate the desired JITed code at predictable locations. As JIT spraying gained popularity as a reliable exploitation technique, vendors started to revisit their JIT engine implementations. Since then, countermeasures such as randomizing JIT code page allocation and mutating JITed code generation have been employed to prevent JIT spraying. In particular, the MS WARP Shader JIT engine, which we exploit in this talk, has security mechanisms such as Shader complexity, a JIT cache size limit, and separation between constant data and code. As a result, JIT spraying became less effective in most exploitation scenarios. Nevertheless, JIT spraying has never died, even in the most secure Windows 10 era. In this talk, we present a completely different JIT spraying technique (based on MS WARP JIT) that bypasses Control Flow Guard (CFG) in the browser context in a generic way. The presentation details how to circumvent the MS WARP JIT restrictions and achieve a reliable CFG bypass. At the end, a live demo shows CFG being bypassed on IE11 and Edge on Windows 10.
About Speakers
• Bing Sun
– Bing Sun is a senior information security researcher who leads the IPS security research team of Intel Security Group (formerly McAfee). He has extensive experience in operating system kernel and information security R&D, with a deep focus on advanced vulnerability exploitation and detection, rootkit detection, firmware security, and virtualization technology. Bing is also a regular speaker at international security conferences such as XCon, Black Hat, and CanSecWest.
• Chong Xu
– Chong received his Ph.D. in networking and security from Duke University. His current focus includes research and innovation on intrusion and prevention techniques as well as threat intelligence. He is a senior director of the Intel Security IPS team, which leads Intel Security's vulnerability research, malware and APT detection, and botnet detection, and feeds security content and innovative protection solutions into Intel Security’s network IPS, host IPS, and sandbox products, as well as McAfee Global Threat Intelligence (GTI).
Agenda
• Background Knowledge
• Bypass CFG via WARP Shader JIT Spraying
• Demo (WARP Shader JIT Spraying)
• Extra Demo (Other 0day CFG/DEP Bypass Methods)
• Q&A
Background Knowledge
• Direct3D & DXGI
• Software Rasterizer & WARP
• Rendering Pipeline & Shader
• GLSL/HLSL
• WebGL & Its Usage
• Shader’s Lifecycle on WARP
• The Basic Principle of CFG
• Known CFG Bypass Methods
Direct3D & DXGI
• Direct3D (part of DirectX)
– A component of the Microsoft DirectX API. It acts as a thin abstraction layer between a graphics application and the graphics hardware drivers (comparable to GDI).
– Provides a low-level API for drawing primitives with the rendering pipeline or performing parallel operations with the compute shader.
– Competes with Khronos' OpenGL and its follow-on, Vulkan.
• DXGI (Microsoft DirectX Graphics Infrastructure)
– Encapsulates some of the low-level tasks needed by Direct3D 10/11/12.
– These tasks include enumerating graphics adapters, enumerating display modes, selecting buffer formats, sharing resources between processes, and presenting rendered frames to a window or monitor for display.
Software Rasterizer & WARP
• Software Rasterizer
– A software component that can render an image independently of graphics hardware (GPU). The rendering takes place entirely on the CPU.
• WARP (Windows Advanced Rasterization Platform)
– WARP is a full-featured Direct3D 10 software rasterizer that does not require graphics hardware (GPU) to execute.
– WARP can be used for rendering when no compatible hardware is available, in kernel mode applications, in a headless environment, or for remote rendering in the Remote Desktop Connection client.
– WARP contains two high-speed, real-time compilers:
• The high-level intermediate language compiler, which converts HLSL bytecode and the current render state into an optimized stream of vector commands for the Shaders.
• The high-performance JIT code generator.
Rendering Pipeline & Shader
• Rendering Pipeline
– The sequence of steps used to create a 2D raster representation of a 3D scene, that is, the process of turning a 3D model into what the computer displays.
– Modern GPUs use a programmable rendering pipeline, which makes it possible to write your own functions that control how shapes and images are rendered, using vertex and fragment Shaders.
• Shader
– Shader: A user-defined program used for shading (producing appropriate levels of color within an image), special effects, or video post-processing. A Shader is designed to execute one of the programmable stages of the rendering pipeline.
– Vertex Shader: A pipeline stage that processes individual vertices. It performs transformations to post-projection space (from a vertex's 3D position in virtual space to its 2D coordinate), per-vertex lighting, etc.
– Fragment Shader (aka Pixel Shader): A pipeline stage that runs after a primitive is rasterized. It processes a fragment (pixel) generated by rasterization into a set of colors and a single depth value.
GLSL & HLSL
• Shading Language
– A graphics programming language adapted to programming Shader effects (characterizing surfaces, volumes, and objects).
• GLSL (OpenGL Shading Language)
– A high-level shading language based on the syntax of the C programming language.
– It was created by the OpenGL ARB (OpenGL Architecture Review Board) to give developers more direct control of the graphics pipeline without having to use ARB assembly language or hardware-specific languages.
• HLSL (High-Level Shading Language)
– A proprietary shading language developed by Microsoft to augment the Shader assembly language.
– HLSL is analogous to GLSL, the shading language used with the OpenGL standard.
An Example of Shaders Defined with GLSL
[Figure: GLSL source of a Fragment Shader and a Vertex Shader]
WebGL and Its Usage
• WebGL (Web Graphics Library)
– A JavaScript API for rendering interactive 3D and 2D computer graphics within any compatible web browser without the use of plug-ins.
– WebGL programs consist of control code written in JavaScript and Shader code written in GLSL.
– Officially supported by MS IE11 & Edge.
• Create a WebGL Shader program
1. Define Shaders with GLSL in the page.
2. Add a Canvas element to the page, and create a new WebGL rendering context (getContext("experimental-webgl")).
3. Get the Shader source code and compile the Shader (createShader, shaderSource, compileShader).
4. Attach the Shaders to a program and link the program (createProgram, attachShader, linkProgram).
5. Feed data from JavaScript into the Shader program through attributes or uniforms (getAttribLocation, enableVertexAttribArray, bindBuffer, vertexAttribPointer, getUniformLocation, uniformxxx).
6. Draw to the screen (drawArrays, drawElements).
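The six steps above can be sketched as ordinary WebGL 1.0 boilerplate. This is a minimal illustration, not code from the talk: the helper names (compileStage, buildProgram, drawTriangle), the two GLSL sources, and the attribute/uniform names aPosition and uColor are all made up for the example; only the gl.* calls are standard WebGL API. Step 2 (obtaining gl from a Canvas via getContext) is left to the caller.

```javascript
// Step 1: GLSL sources (illustrative; any valid shader pair would do).
const VERTEX_SRC = `
attribute vec2 aPosition;
void main() {
  gl_Position = vec4(aPosition, 0.0, 1.0);
}`;

const FRAGMENT_SRC = `
precision mediump float;
uniform vec4 uColor;
void main() {
  gl_FragColor = uColor;
}`;

// Step 3: compile one shader stage and surface compile errors.
function compileStage(gl, type, source) {
  const shader = gl.createShader(type);
  gl.shaderSource(shader, source);
  gl.compileShader(shader);
  if (!gl.getShaderParameter(shader, gl.COMPILE_STATUS)) {
    throw new Error("compile failed: " + gl.getShaderInfoLog(shader));
  }
  return shader;
}

// Step 4: attach both stages to a program and link it.
function buildProgram(gl, vertexSrc, fragmentSrc) {
  const program = gl.createProgram();
  gl.attachShader(program, compileStage(gl, gl.VERTEX_SHADER, vertexSrc));
  gl.attachShader(program, compileStage(gl, gl.FRAGMENT_SHADER, fragmentSrc));
  gl.linkProgram(program);
  if (!gl.getProgramParameter(program, gl.LINK_STATUS)) {
    throw new Error("link failed: " + gl.getProgramInfoLog(program));
  }
  return program;
}

// Steps 5 and 6: feed one triangle and a color, then draw.
function drawTriangle(gl, program) {
  gl.useProgram(program);
  const buffer = gl.createBuffer();
  gl.bindBuffer(gl.ARRAY_BUFFER, buffer);
  gl.bufferData(gl.ARRAY_BUFFER, new Float32Array([0, 1, -1, -1, 1, -1]), gl.STATIC_DRAW);
  const location = gl.getAttribLocation(program, "aPosition");
  gl.enableVertexAttribArray(location);
  gl.vertexAttribPointer(location, 2, gl.FLOAT, false, 0, 0);
  gl.uniform4f(gl.getUniformLocation(program, "uColor"), 1, 0, 0, 1);
  gl.drawArrays(gl.TRIANGLES, 0, 3);
}
```

In a page, gl would come from something like canvas.getContext("webgl") || canvas.getContext("experimental-webgl"), followed by drawTriangle(gl, buildProgram(gl, VERTEX_SRC, FRAGMENT_SRC)).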
Shader’s Lifecycle on WARP
[Figure: flow of a WebGL Shader from GLSL source to WARP JIT code]
1. GLSL: the Shader GLSL source reaches the HTML Rendering Engine (edgehtml.dll/mshtml.dll).
2. HLSL: the rendering engine passes HLSL to the D3D HLSL Compiler (d3dcompiler_47.dll).
3. HLSL bytecode: the compiler hands HLSL bytecode back.
4. Create Shader or Draw: the rendering engine calls the D3D 11 runtime (d3d11.dll).
5. Create Shader or Draw: the D3D 11 runtime calls the D3D 10 Rasterizer (d3d10warp.dll).
6. Compile Shader: the rasterizer produces the WARP Shader JIT Code.
The Basic Principle of CFG
• About CFG (Control Flow Guard)
– A compiler-aided exploit mitigation mechanism that prevents an exploit from hijacking the control flow.
– The compiler inserts a CFG check before each indirect control transfer instruction (call/jmp). At runtime, the check validates the call target address against a pre-configured CFG bitmap to determine whether the target is valid. The process is terminated when an unexpected call target is identified.
– The RVAs of all valid call targets, determined at compile time, are kept in a Guard CF Function table in the PE file. During PE loading, the loader reads the CF info from this table and updates the CFG bitmap.
– The read-only CFG bitmap is maintained by the OS, and part of the bitmap is shared by all processes. Each even bit in the CFG bitmap corresponds to one 16-byte aligned address, while each odd bit corresponds to the 15 non 16-byte aligned addresses in the same 16-byte range.
– When the PE file is loaded, __guard_check_icall_fptr is resolved to point to ntdll!LdrpValidateUserCallTarget.
The Basic Principle of CFG (cont’d)
[Figure: annotated CFG check code, with the following callouts]
– The compiler inserts a call target check before each indirect function call/jmp.
– CFG bitmap base.
– The high 24 bits of the call target address are used as an index into the bitmap to get a 32-bit bitmap entry.
– Bits 3 ~ 7 of the target address are used as an offset.
– 16-byte aligned: the offset is used as is. Non 16-byte aligned: bit 0 of the offset is set.
– Test bit “offset” of that 32-bit bitmap entry. The target address is valid if the bit is set; otherwise INT 29h is triggered.
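The lookup described by these callouts can be written out as a small sketch for 32-bit addresses. It only models the arithmetic: the function names and the Map standing in for the bitmap are assumptions made for illustration, whereas the real bitmap is a read-only region maintained by the OS and the real check lives in ntdll.

```javascript
// Model of the 32-bit CFG bitmap lookup. `bitmap` is a hypothetical Map from
// entry index to a 32-bit entry value; missing entries are treated as zero.
function cfgBitPosition(address) {
  const addr = address >>> 0;           // treat the address as unsigned 32-bit
  const index = addr >>> 8;             // high 24 bits select a 32-bit entry
  let offset = (addr >>> 3) & 0x1f;     // bits 3..7 select a bit in that entry
  if ((addr & 0xf) !== 0) offset |= 1;  // not 16-byte aligned: use the odd bit
  return { index, offset };
}

function isValidCallTarget(bitmap, address) {
  const { index, offset } = cfgBitPosition(address);
  const entry = (bitmap.get(index) || 0) >>> 0;
  return ((entry >>> offset) & 1) === 1; // a clear bit would end in INT 29h
}

// Example: mark only the 16-byte aligned address 0x7EC30010 as valid.
const exampleBitmap = new Map([[0x7ec300, 1 << 2]]);
console.log(isValidCallTarget(exampleBitmap, 0x7ec30010)); // true
console.log(isValidCallTarget(exampleBitmap, 0x7ec30011)); // false
```

Because an aligned address always has bit 3 clear, its offset is even, and every unaligned address in the same 16-byte range maps to the odd bit next to it.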
Known CFG Bypass Methods
• Call a VirtualProtect wrapper to replace ___guard_check_icall_fptr
– The wrapper itself must be able to pass the CFG check.
– The wrapper should take as few arguments as possible, to make passing arguments from a high-level language easier.
– Fixed by adding extra logic to the wrapper to make sure it cannot be used for other purposes.
• Transit via an unguarded trampoline (either in an executable or in JIT code)
– The trampoline itself must be able to pass the CFG check.
– The target address of the unguarded indirect control transfer instruction must be controllable.
– Fixed by introducing a CFG check before the indirect control transfer instruction.
• Leverage a stack desynchronization situation to overwrite a function return address
– Requires a function that contains a controllable function callout, which is used to cause stack imbalance.
– A controllable value must be pushed onto the stack so that it overwrites the function’s saved return address.
– Fixed by enforcing a stack pointer sanity check.
Bypass CFG via WARP Shader JIT Spraying
• The Security Assessment on WARP Shader JIT Mechanism
• The Weakness of WARP Shader JIT Engine
• The Challenge of Exploiting WARP Shader JIT & Solution
• The Detailed Bypass Implementation
• The Possibility of Exploiting 64-bit Browser
The Security Assessment on WARP Shader JIT Mechanism
• Some security-related measures (intentional or otherwise) in the WARP Shader JIT implementation raise the bar for a successful JIT spraying attack.
– JIT cache limits
– Separation of data and code
JIT Cache Limits
The JIT cache holds at most 0x180 (384) Shaders. Exceeding that threshold leads to the deletion of cached Shaders, which breaks the continuity of the sprayed memory layout.
Separation of Data and Code
[Figure: JIT memory with separate Data and Code regions labeled]
The Weakness of WARP Shader JIT Engine
• Although security measures have been considered in the WARP Shader JIT implementation, weaknesses still exist that make it possible to leverage WARP Shader JIT to bypass memory mitigations.
– No randomization of JIT page allocation and JITed code generation
– No CFG for JITed code
The Weakness of WARP Shader JIT Engine (cont’d)
• No Randomization of JIT Page Allocation and Code Generation
– WARP JIT code pages are allocated by kernel32!VirtualAlloc with the MEM_TOP_DOWN flag. As a result, repeated JIT calls eventually generate contiguous RX pages at the high end of the address space. After spraying a big enough space (about 19M), certain addresses become stable and predictable.
– The same Shader always generates the same JITed code on the same OS (i.e. the same version of WARP).
• No CFG for JITed Code
– All bits in the CFG bitmap are set by default, meaning any address in a JIT code page is treated as a valid call target.
No Randomization for JIT Page Allocation & Code Generation
[Figure: Data and Code regions labeled]
No CFG for JITed Code
[Figure: Data and Code regions labeled]
The Challenge of Exploiting WARP Shader JIT
I. WARP module not loaded
– WARP will NOT be used if the hardware display device supports D3D10 and above.
– Some tested platforms where WARP is used by default: earlier versions of VMware (or VM tools not updated), MS Hyper-V (without RemoteFX), Remote Desktop, etc.
II. The speed of WARP JIT Spraying
– It takes approx. 5 minutes to spray 0x13F0000 bytes (19M) in order to cover some predictable address (such as 0x7EC3XXXX).
III. JITed instruction generation
– Separation of data and code makes it very difficult to control a JITed instruction that is three bytes or longer.
WARP Module Not Loaded
[Figure: a web page containing a WebGL Shader, and a VM settings dialog]
– Check this VM setting option to enable D3D10 support.
– WARP will not come into play when the underlying display hardware supports D3D10.
The Speed of WARP JIT Spraying
[Figure: screenshot with the following callouts]
– Out of 535M private bytes, WARP JIT data/code accounts for only 19M.
– It takes approx. 4.5 minutes to spray a space of 19M.
The Solution to Challenge I & II
• Without Arbitrary Address Read/Write (AAR/AAW) ability, there seems to be not much we can do to solve these problems!
• Magic will happen with the help of AAR/AAW.
• Manipulate the internal data structures of D3D to force the WARP module to take effect immediately on any platform (simply calling LoadLibrary will NOT work, though)!
• Tweak the internal parameters of WARP JIT page allocation to reduce the whole JIT spraying time to only a few seconds!
Force Loading WARP Module and Fast JIT Spraying
[Figure: screenshots with the following callouts]
– Use a larger JIT section allocation to speed up the JIT spraying.
– The WARP module is loaded even on D3D10 hardware.
The Solution to Challenge III
• A specially crafted Shader can generate some useful two-byte instruction sequences, such as \x94\xc3 (xchg eax, esp; ret); however, this sequence is no longer usable because the CFG check alters the value in the eax register.
• Sometimes simple things can make our life easier.
• The natural JIT function epilog.
• Indirect jmp via esi.
The Solution to Challenge III (cont’d)
• The natural JIT function epilog
– pop ebx       // pop function call return address
  pop ebp       // pop the 1st argument
  mov esp, ebp  // switch the stack to something we control
  pop ebp       // skip the 1st dword
  ret           // transfer the control to wherever we want
– The 1st argument of the function call must be controllable.
– No need for a specially crafted Shader.
– No need for an additional stack pivot ROP.
– Many usage scenarios, especially when AAR/AAW ability is acquired (see examples in the following slides).
The Solution to Challenge III (cont’d)
• Indirect jmp via esi
– jmp dword ptr [esi+0fh]
– The value in esi must be controllable.
– No need for a specially crafted Shader.
– Needs a stack pivot.
– Among virtual function call forms in 32-bit binaries, call dword ptr [esi+xx] is harder to find than call esi/call edi, and therefore harder to exploit.
CVE-2015-6055
[Figure: vbscript!VAR::ObjGetDefault; callouts: 1st Argument, ROP stack; Shellcode]
MS16-063
[Figure: typedarray.subarray(ropstack); callouts: 1st Argument, ROP stack; Shellcode]
CVE-2016-0193 (ret)
[Figure: array.fill(value, ropstack, ropstack + 1); callouts: 1st Argument, ROP stack; Shellcode]
CVE-2016-0193 (jmp)
[Figure: WebGLRenderingContext.drawArrays() -> d3d11!CDevice::CreateRasterizerState; callouts: esi -> Fake vftable; Stack pivot ROP gadget; Shellcode]
* Some trick needs to be played to make sure ebx holds a proper value, to avoid a crash @ 0x61d703b7.
The Detailed Bypass Implementation
• If for some reason WARP is not enabled on the system by default, do some magic to make it come into play at runtime.
• Craft some WebGL Shader and make sure it is able to generate the desired JIT gadget at a certain fixed offset.
• Trigger WARP Shader JIT in a repeated manner by creating/linking a new WebGL program object and drawing on the Canvas in a loop.
• After spraying a big enough space, some JIT gadget is expected to appear at a certain predictable address. (Sometimes some searching work may be needed.)
• Use that particular JIT gadget as a trampoline to bypass CFG, and later transfer to the main shellcode or another ROP gadget if necessary.
The Possibility of Exploiting 64-bit Browser
[Figure: Data and Code regions labeled]
* Spraying 9G is big enough to make some address covered, but it still needs some searching to find the JIT gadget within each 1/2G section.
* In fact, there is no need to spray such a huge space if AAR is acquired; the JIT page address can be deduced by leaking the WARP module base.
Demo
• CFG Bypass via WARP Shader JIT
– CVE-2016-0193 (Chakra OOB Write)
– MS16-063 (Jscript9 UAF)
– CVE-2015-6055 (VBScript Type Confusion)
Demo – MS16-063
Demo - CVE-2016-0193
Extra Demo
• Other 0day CFG/DEP Bypass Methods
– 0day I (Replace fptr in .idata section)
– 0day II (Create RWX memory)
– 0day III (Make arbitrary memory RW)
Demo - CFG Bypass 0day I
Replace fptr in .idata Section
Demo – CFG/DEP Bypass 0day II
Create RWX CFG Friendly Memory
Demo - CFG Bypass 0day III
Make Arbitrary Memory RW
Q&A
• You are welcome to send questions to
– Bing Sun @ bing.sun@intel.com
– Chong Xu @ chong.c.xu@intel.com
• Thanks to MSRC for helping get the issue fixed in the MS June patch.
• Special thanks to Haifei Li, Stanley Zhu and the ISecG IPS Vulnerability Research team.