22.09.2016 Views

JIT Spraying Never Dies


JIT Spraying Never Dies - Bypass CFG By Leveraging WARP Shader JIT Spraying

Bing Sun, Chong Xu


Abstract

• Many scripting languages, such as JavaScript and ActionScript, use Just-In-Time (JIT) compilation to improve script execution performance. Under some circumstances, however, an exploit can leverage the legitimate JIT mechanism to bypass memory protections and mitigations such as ASLR and DEP. This exploitation technique was first introduced as "JIT Spraying" in 2010. The idea is to use constant numeric values in a high-level scripting language to generate the desired JITed code at predictable locations. As JIT spraying gained popularity as a reliable exploitation technique, vendors started to revisit their JIT engine implementations. Since then, countermeasures such as randomizing JIT code page allocation and mutating JITed code generation have been employed to prevent JIT spraying. In particular, the MS WARP Shader JIT engine, which we will exploit in this talk, has security mechanisms such as Shader complexity limits, a JIT cache size limit, and separation of constant data from code. As a result, JIT spraying became less effective in most exploitation scenarios. Nevertheless, the JIT spraying technique has never died, even in the era of Windows 10, the most secure Windows release. In this talk, we present a completely different JIT spraying technique (based on MS WARP JIT) that bypasses Control Flow Guard (CFG) in the browser context in a generic way. The presentation details how to circumvent the MS WARP JIT restrictions and achieve a reliable CFG bypass. At the end, a live demo will show CFG being bypassed on IE11 and Edge on Windows 10.


About Speakers

• Bing Sun
– Bing Sun is a senior information security researcher who leads the IPS security research team of Intel Security Group (formerly McAfee). He has extensive experience in operating system kernel and information security R&D, with a particular focus on advanced vulnerability exploitation and detection, rootkit detection, firmware security, and virtualization technology. Bing is also a regular speaker at international security conferences such as XCon, Black Hat, and CanSecWest.
• Chong Xu
– Chong received his Ph.D. degree in networking and security from Duke University. His current focus includes research and innovation on intrusion and prevention techniques as well as threat intelligence. He is a senior director of the Intel Security IPS team, which leads Intel Security's vulnerability research, malware and APT detection, and botnet detection, and feeds security content and innovative protection solutions into Intel Security's network IPS, host IPS, and sandbox products, as well as McAfee Global Threat Intelligence (GTI).


Agenda

• Background Knowledge
• Bypass CFG via WARP Shader JIT Spraying
• Demo (WARP Shader JIT Spraying)
• Extra Demo (Other 0day CFG/DEP Bypass Methods)
• Q&A


Background Knowledge

• Direct3D & DXGI
• Software Rasterizer & WARP
• Rendering Pipeline & Shader
• GLSL/HLSL
• WebGL & Its Usage
• Shader's Lifecycle on WARP
• The Basic Principle of CFG
• Known CFG Bypass Methods


Direct3D & DXGI

• Direct3D (part of DirectX)
– A component of the Microsoft DirectX API. It acts as a thin abstraction layer between a graphics application and the graphics hardware drivers (comparable to GDI).
– Provides a low-level API for drawing primitives with the rendering pipeline or performing parallel operations with the compute shader.
– Competes with Khronos' OpenGL and its follow-on, Vulkan.
• DXGI (Microsoft DirectX Graphics Infrastructure)
– Encapsulates some of the low-level tasks needed by Direct3D 10/11/12.
– These tasks include enumerating graphics adapters, enumerating display modes, selecting buffer formats, sharing resources between processes, and presenting rendered frames to a window or monitor for display.


Software Rasterizer & WARP

• Software Rasterizer
– A software component that renders an image independently of graphics hardware (GPU). Rendering takes place entirely on the CPU.
• WARP (Windows Advanced Rasterization Platform)
– WARP is a full-featured Direct3D 10 software rasterizer that does not require graphics hardware (GPU) to execute.
– WARP can be used for rendering when no compatible hardware is available, in kernel mode applications, in a headless environment, or for remote rendering for the Remote Desktop Connection client.
– WARP contains two high-speed, real-time compilers:
  • The high-level intermediate language compiler, which converts HLSL bytecode and the current render state into an optimized stream of vector commands for the Shaders.
  • The high-performance JIT code generator.


Rendering Pipeline & Shader

• Rendering Pipeline
– The sequence of steps used to create a 2D raster representation of a 3D scene, that is, the process of turning a 3D model into what the computer displays.
– Modern GPUs use a programmable rendering pipeline, which makes it possible to write your own functions that control how shapes and images are rendered, using vertex and fragment Shaders.
• Shader
– Shader: A user-defined program used for shading (producing appropriate levels of color within an image), special effects, or video post-processing. A Shader is designed to execute one of the programmable stages of the rendering pipeline.
– Vertex Shader: A pipeline stage that processes individual vertices. It performs transformations to post-projection space (from a vertex's 3D position in virtual space to its 2D coordinate), per-vertex lighting, etc.
– Fragment Shader (aka Pixel Shader): A pipeline stage that runs after a primitive is rasterized. It processes a fragment (pixel) generated by rasterization into a set of colors and a single depth value.


GLSL & HLSL

• Shading Language
– A graphics programming language adapted to programming Shader effects (characterizing surfaces, volumes, and objects).
• GLSL (OpenGL Shading Language)
– A high-level shading language based on the syntax of the C programming language.
– It was created by the OpenGL ARB (OpenGL Architecture Review Board) to give developers more direct control of the graphics pipeline without having to use ARB assembly language or hardware-specific languages.
• HLSL (High-Level Shading Language)
– A proprietary shading language developed by Microsoft to augment the Shader assembly language.
– HLSL is analogous to the GLSL shading language used with the OpenGL standard.


An Example of Shaders Defined with GLSL

[Figure: GLSL source of a vertex Shader and a fragment Shader]


WebGL and Its Usage

• WebGL (Web Graphics Library)
– A JavaScript API for rendering interactive 3D and 2D graphics within any compatible web browser without the use of plug-ins.
– WebGL programs consist of control code written in JavaScript and Shader code written in GLSL.
– Officially supported by MS IE11 & Edge.
• Create a WebGL Shader program
1. Define Shaders with GLSL in the page.
2. Add a Canvas element to the page, and create a new WebGL rendering context (getContext("experimental-webgl")).
3. Get the Shader source code and compile the Shaders (createShader, shaderSource, compileShader).
4. Attach the Shaders to a program and link the program (createProgram, attachShader, linkProgram).
5. Feed data from JavaScript into the Shader program through attributes or uniforms (getAttribLocation, enableVertexAttribArray, bindBuffer, vertexAttribPointer, getUniformLocation, uniformxxx).
6. Draw to the screen (drawArrays, drawElements).
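
The numbered steps above can be sketched roughly as follows. This is a minimal, generic WebGL example and not code from the talk: the GLSL sources, the function names (compileShader, buildProgram, drawTriangle), and the attribute/uniform names (aPosition, uColor) are all illustrative assumptions. Only standard WebGL 1.0 calls are used, and the rendering context is passed in as a parameter so the step-2 browser code stays in comments.

```javascript
// Step 1: a minimal GLSL ES 1.00 Shader pair (illustrative, not from the slides).
const VERTEX_SRC = `
attribute vec2 aPosition;
void main() {
  gl_Position = vec4(aPosition, 0.0, 1.0);
}`;

const FRAGMENT_SRC = `
precision mediump float;
uniform vec4 uColor;
void main() {
  gl_FragColor = uColor;
}`;

// Step 3: compile one Shader and surface compiler errors.
function compileShader(gl, type, source) {
  const shader = gl.createShader(type);
  gl.shaderSource(shader, source);
  gl.compileShader(shader);
  if (!gl.getShaderParameter(shader, gl.COMPILE_STATUS)) {
    throw new Error("compile failed: " + gl.getShaderInfoLog(shader));
  }
  return shader;
}

// Step 4: attach both Shaders to a program and link it.
function buildProgram(gl, vertexSrc, fragmentSrc) {
  const program = gl.createProgram();
  gl.attachShader(program, compileShader(gl, gl.VERTEX_SHADER, vertexSrc));
  gl.attachShader(program, compileShader(gl, gl.FRAGMENT_SHADER, fragmentSrc));
  gl.linkProgram(program);
  if (!gl.getProgramParameter(program, gl.LINK_STATUS)) {
    throw new Error("link failed: " + gl.getProgramInfoLog(program));
  }
  return program;
}

// Steps 5 and 6: feed one triangle and a color into the program, then draw.
function drawTriangle(gl, program) {
  gl.useProgram(program);
  const buffer = gl.createBuffer();
  gl.bindBuffer(gl.ARRAY_BUFFER, buffer);
  gl.bufferData(gl.ARRAY_BUFFER, new Float32Array([0, 1, -1, -1, 1, -1]), gl.STATIC_DRAW);
  const location = gl.getAttribLocation(program, "aPosition");
  gl.enableVertexAttribArray(location);
  gl.vertexAttribPointer(location, 2, gl.FLOAT, false, 0, 0);
  gl.uniform4f(gl.getUniformLocation(program, "uColor"), 1.0, 0.5, 0.0, 1.0);
  gl.drawArrays(gl.TRIANGLES, 0, 3);
}

// Step 2, in a browser page that contains a <canvas> element:
//   const gl = document.querySelector("canvas").getContext("experimental-webgl");
//   drawTriangle(gl, buildProgram(gl, VERTEX_SRC, FRAGMENT_SRC));
```

Passing the context in as an argument keeps the helper functions independent of the page, so they can be exercised with any object that exposes the same methods.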


Shader's Lifecycle on WARP

[Figure: flow of a Shader from GLSL source to WARP JIT code]
1. GLSL: the Shader GLSL source is passed to the HTML rendering engine (edgehtml.dll/mshtml.dll).
2. HLSL: the HTML rendering engine hands HLSL to the D3D HLSL compiler (d3dcompiler_47.dll).
3. HLSL bytecode: the D3D HLSL compiler returns HLSL bytecode.
4. Create Shader or Draw: the HTML rendering engine calls into the D3D 11 runtime (d3d11.dll).
5. Create Shader or Draw: the D3D 11 runtime calls into the D3D 10 rasterizer (d3d10warp.dll).
6. Compile Shader: the rasterizer produces the WARP Shader JIT code.


The Basic Principle of CFG

• About CFG (Control Flow Guard)
– A compiler-aided exploit mitigation mechanism that prevents an exploit from hijacking the control flow.
– The compiler inserts a CFG check before each indirect control transfer instruction (call/jmp). At runtime, the CFG check validates the call target address against a pre-configured CFG bitmap to determine whether the call target is valid. The process is terminated when an unexpected call target is identified.
– The RVAs of all valid call targets determined at compile time are kept in a Guard CF Function table in the PE file. During PE loading, the loader reads the CF info from the Guard CF Function table and updates the CFG bitmap.
– The read-only CFG bitmap is maintained by the OS, and part of the bitmap is shared by all processes. Each even bit in the CFG bitmap corresponds to one 16-byte-aligned address, while each odd bit corresponds to 15 addresses that are not 16-byte aligned.
– When the PE file is loaded, __guard_check_icall_fptr is resolved to point to ntdll!LdrpValidateUserCallTarget.


The Basic Principle of CFG (cont'd)

[Figure: annotated CFG check]
• The compiler inserts a call target check before each indirect function call/jmp.
• CFG bitmap base.
• The high 24 bits of the call target address are used as an index into the bitmap to get a 32-bit bitmap entry.
• Bits 3 ~ 7 of the target address are used as an offset.
• 16-byte aligned: the offset is unchanged. Non-16-byte aligned: bit 0 of the offset is set.
• Test bit "offset" of that 32-bit bitmap entry. The target address is valid if the bit is set; otherwise INT 29h is triggered.
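
To make the lookup concrete, here is a small model of the 32-bit check as the annotations describe it. It is a sketch for illustration only: the sparse Map standing in for the OS-managed bitmap and the names cfgSlot, markValid, and isValidCallTarget are assumptions, not part of any Windows API.

```javascript
// Sparse stand-in for the CFG bitmap: entry index -> 32-bit entry value.
// (The real bitmap is a flat read-only region maintained by the OS.)
const bitmap = new Map();

// Split a 32-bit target address into the bitmap entry index and the bit offset.
function cfgSlot(address) {
  const index = address >>> 8;          // high 24 bits select a 32-bit entry
  let offset = (address >>> 3) & 0x1f;  // bits 3..7 select a bit within the entry
  if ((address & 0xf) !== 0) {
    offset |= 1;                        // not 16-byte aligned: set bit 0 of the offset
  }
  return { index, offset };
}

// Mark an address as a valid call target (what the loader does for each table entry).
function markValid(address) {
  const { index, offset } = cfgSlot(address);
  const entry = bitmap.get(index) || 0;
  bitmap.set(index, (entry | (1 << offset)) >>> 0);
}

// The runtime check: valid only if the selected bit is set.
function isValidCallTarget(address) {
  const { index, offset } = cfgSlot(address);
  const entry = bitmap.get(index) || 0;
  return ((entry >>> offset) & 1) === 1;
}

markValid(0x7ec30010);
console.log(isValidCallTarget(0x7ec30010)); // true: the aligned address was marked
console.log(isValidCallTarget(0x7ec30011)); // false: unaligned, so the odd bit is tested
```

In this model, an entry whose bits are all set (0xFFFFFFFF) makes every address in its 256-byte range pass the check, regardless of alignment.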


Known CFG Bypass Methods

• Call a VirtualProtect wrapper to replace ___guard_check_icall_fptr
– The wrapper itself must be able to pass the CFG check.
– The wrapper should take as few arguments as possible, to make it easier to pass arguments from a high-level language.
– Fixed by adding extra logic to the wrapper to make sure it cannot be used for other purposes.
• Transit via an unguarded trampoline (either in an executable or in JIT code)
– The trampoline itself must be able to pass the CFG check.
– The target address of the unguarded indirect control transfer instruction must be controllable.
– Fixed by introducing a CFG check before the indirect control transfer instruction.
• Leverage a stack desynchronization situation to overwrite a function return address
– Requires a function that contains a controllable function callout, which is used to cause a stack imbalance.
– A controllable value must be pushed onto the stack so that it overwrites the function's saved return address.
– Fixed by enforcing a stack pointer sanity check.


Bypass CFG via WARP Shader JIT Spraying

• The Security Assessment on WARP Shader JIT Mechanism
• The Weakness of WARP Shader JIT Engine
• The Challenge of Exploiting WARP Shader JIT & Solution
• The Detailed Bypass Implementation
• The Possibility of Exploiting 64-bit Browser


The Security Assessment on WARP Shader JIT Mechanism

• Some security-related measures (intentional or otherwise) in the WARP Shader JIT implementation raise the bar for a successful JIT spraying attack.
– JIT cache limits
– Separation of data and code


JIT Cache Limits

The JIT cache holds at most 0x180 (384) Shaders. Exceeding that threshold causes cached Shaders to be deleted, which breaks the continuity of the sprayed memory layout.


Separation of Data and Code

[Figure labels: Data; Code]


The Weakness of WARP Shader JIT Engine

• Although security measures have been considered in the WARP Shader JIT implementation, weaknesses still exist that make it possible to leverage the WARP Shader JIT to bypass memory mitigations.
– No randomization of JIT page allocation and JITed code generation
– No CFG for JITed code


The Weakness of WARP Shader JIT Engine (cont'd)

• No Randomization of JIT Page Allocation and Code Generation
– WARP JIT code pages are allocated by kernel32!VirtualAlloc with the MEM_TOP_DOWN flag. As a result, repeated JIT calls eventually generate contiguous RX pages at the high end of the address space. After spraying a big enough space (about 19M), certain addresses become stable and predictable.
– The same Shader always generates the same JITed code on the same OS (i.e. the same version of WARP).
• No CFG for JITed Code
– All bits in the CFG bitmap are set by default, meaning any address in a JIT code page is treated as a valid call target.


No Randomization for JIT Page Allocation & Code Generation

[Figure labels: Data; Code]


No CFG for JITed Code

[Figure labels: Data; Code]


The Challenge of Exploiting WARP Shader JIT

I. WARP module not loaded
‒ WARP will NOT be used if the hardware display device supports D3D10 and above.
‒ Some tested platforms where WARP is used by default: earlier versions of VMware (or VM tools not updated), MS Hyper-V (without RemoteFX), Remote Desktop, etc.
II. The speed of WARP JIT Spraying
‒ It takes approx. 5 minutes to spray 0x13F0000 (19M) in order to cover some predictable address (such as 0x7EC3XXXX).
III. JITed instruction generation
‒ Separation of data and code makes it very difficult to control a JITed instruction that is three bytes or longer.


WARP Module Not Loaded

[Figure annotations]
• Web page contains WebGL Shader.
• Check this VM setting option to enable D3D10 support.
• WARP will not come into play when the underlying display hardware supports D3D10.


The Speed of WARP JIT Spraying

[Figure annotations]
• Out of 535M private bytes, WARP JIT data/code only accounts for 19M.
• It takes approx. 4.5 minutes to spray a space of 19M.
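
As a rough sanity check on these figures, the sketch below converts the 0x13F0000 spray size quoted earlier in the deck into MiB and derives the throughput implied by the 4.5-minute duration. The helper name sprayStats is hypothetical, and the calculation is plain arithmetic on the slide values rather than a measurement.

```javascript
const SPRAY_BYTES = 0x13f0000; // spray size quoted on the slides
const MIB = 1024 * 1024;

// Convert a byte count and a duration into a size in MiB and a rate in KiB/s.
function sprayStats(bytes, seconds) {
  return {
    mib: bytes / MIB,
    kibPerSecond: bytes / 1024 / seconds,
  };
}

const stats = sprayStats(SPRAY_BYTES, 4.5 * 60);
console.log(stats.mib.toFixed(2) + " MiB");            // 19.94 MiB
console.log(stats.kibPerSecond.toFixed(1) + " KiB/s"); // 75.6 KiB/s
```

The low rate (tens of KiB per second) is what makes the default spraying speed a practical obstacle.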


The Solution to Challenge I & II

• Without the ability of Arbitrary Address Read/Write (AAR/AAW), there seems to be not much we can do to solve these problems!
• Magic will happen with the help of AAR/AAW.
• Manipulate the internal data structures of D3D to force the WARP module to start functioning immediately on any platform (simply calling LoadLibrary will NOT work, though)!
• Tweak the internal parameter of WARP JIT page allocation to reduce the whole JIT spraying time to only a few seconds!


Force Loading WARP Module and Fast JIT Spraying

[Figure annotations]
• Use a larger JIT section allocation to speed up the JIT spraying.
• WARP module is loaded even on D3D10 hardware.


The Solution to Challenge III

• A specially crafted Shader can generate some useful two-byte instruction sequences, such as \x94\xc3 (xchg eax, esp; ret); however, this sequence is no longer usable because the CFG check alters the value in the eax register.
• Sometimes simple things can make our life easier.
• The natural JIT function epilog.
• Indirect jmp via esi.


The Solution to Challenge III (cont'd)

• The natural JIT function epilog
‒ pop ebx       // pop function call return address
  pop ebp       // pop the 1st argument
  mov esp, ebp  // switch the stack to something we control
  pop ebp       // skip the 1st dword
  ret           // transfer the control to wherever we want
‒ The 1st argument of the function call must be controllable.
‒ No need for a specially crafted Shader.
‒ No need for an additional stack pivot ROP gadget.
‒ Many usage scenarios, especially when the AAR/AAW ability is acquired (see examples in the following slides).


The Solution to Challenge III (cont'd)

• Indirect jmp via esi
‒ jmp dword ptr [esi+0fh]
‒ The value in esi must be controllable.
‒ No need for a specially crafted Shader.
‒ Needs a stack pivot.
‒ Among virtual function call forms in 32-bit binaries, call dword ptr [esi+xx] is harder to find than call esi/call edi, and therefore harder to exploit.


CVE-2015-6055

[Figure labels: vbscript!VAR::ObjGetDefault; 1st argument, ROP stack; Shellcode]


MS16-063

[Figure labels: typedarray.subarray(ropstack); 1st argument, ROP stack; Shellcode]


CVE-2016-0193 (ret)

[Figure labels: array.fill(value, ropstack, ropstack + 1); 1st argument, ROP stack; Shellcode]


CVE-2016-0193 (jmp)

[Figure labels: esi -> Fake vftable; WebGLRenderingContext.drawArrays() -> d3d11!CDevice::CreateRasterizerState; Stack pivot ROP gadget; Shellcode]

* Some trick needs to be played to make sure ebx holds a proper value, to avoid a crash @ 0x61d703b7.


The Detailed Bypass Implementation

1. If for some reason WARP is not enabled on the system by default, do some magic to make it come into play at runtime.
2. Craft some WebGL Shader and make sure it is able to generate the desired JIT gadget at a certain fixed offset.
3. Trigger the WARP Shader JIT repeatedly by creating/linking new WebGL program objects and drawing on the Canvas in a loop.
4. After spraying a big enough space, some JIT gadget is expected to appear at a certain predictable address. (Sometimes some searching work may be needed.)
5. Use that particular JIT gadget as a trampoline to bypass CFG, and later transfer to the main shellcode or another ROP gadget if necessary.


The Possibility of Exploiting 64-bit Browser

* Spraying 9G is enough to cover some addresses, but some searching is still needed to find the JIT gadget within each 1/2G section.
* In fact, there is no need to spray such a huge space if AAR is acquired: the JIT page address can be deduced by leaking the WARP module base.


Demo

• CFG Bypass via WARP Shader JIT
– CVE-2016-0193 (Chakra OOB Write)
– MS16-063 (Jscript9 UAF)
– CVE-2015-6055 (VBScript Type Confusion)


Demo – MS16-063


Demo - CVE-2016-0193


Extra Demo

• Other 0day CFG/DEP Bypass Methods
– 0day I (Replace fptr in .idata section)
– 0day II (Create RWX memory)
– 0day III (Make arbitrary memory RW)


Demo - CFG Bypass 0day I

Replace fptr in .idata Section


Demo – CFG/DEP Bypass 0day II

Create RWX CFG Friendly Memory


Demo - CFG Bypass 0day III

Make Arbitrary Memory RW


Q&A

• You are welcome to send questions to
– Bing Sun @ bing.sun@intel.com
– Chong Xu @ chong.c.xu@intel.com
• Thanks to MSRC for helping to get the issue fixed in the MS June patch.
• Special thanks to Haifei Li, Stanley Zhu and the ISecG IPS Vulnerability Research team.


References

• https://www.blackhat.com/docs/us-15/materials/us-15-Zhang-Bypass-Control-Flow-Guard-Comprehensively-wp.pdf
• http://xlab.tencent.com/en/2015/12/09/bypass-dep-and-cfg-using-jitcompiler-in-charkra-engine/
• http://xlab.tencent.com/en/2016/01/04/use-chakra-engine-again-to-bypasscfg/
• https://blog.coresecurity.com/2015/03/25/exploiting-cve-2015-0311-part-iibypassing-control-flow-guard-on-windows-8-1-update-3/
• https://blog.coresecurity.com/2016/06/14/exploiting-internet-explorers-ms15-106-part-ii-jscript-arraybuffer-slice-memory-disclosure-cve-2015-6053/
• https://labs.bromium.com/2015/09/28/an-interesting-detail-about-controlflow-guard/
