
Lecture Notes in Computer Science 4917


Aggressive Function Inlining: Preventing Loop Blockings

foo: (80 times)
BB1:  Call bar
BB2:  CMP R3,0
      JEQ BB4
BB3:  ADD R3,12
BB4:  RET
bar: (90 times)
BB5:  CMP R6,R7
      JEQ BB7
BB6:  ADD R6,9
BB7:  RET

(a) Before code reordering

foo: (80 times)
BB1:  Call bar
      JMP BB2 (penalty)
bar:
BB5:  CMP R6,R7
      JNE BB6
BB7:  RET
BB2:  CMP R3,0
      JNE BB3
BB4:  RET
BB6:  ADD R6,9
      JMP BB7
BB3:  ADD R3,12
      JMP BB4

(b) After code reordering (i)

foo: (80 times)
BB1:  Call bar
BB2:  CMP R3,0
      JNE BB3
BB4:  RET
bar:
BB5:  CMP R6,R7
      JNE BB6
BB7:  RET
BB6:  ADD R6,9
      JMP BB7
BB3:  ADD R3,12
      JMP BB4

(c) After code reordering (ii)

foo: (80 times)
BB5:  CMP R6,R7
      JNE BB6
BB2:  CMP R3,0
      JNE BB3
BB4:  RET
BB6:  ADD R6,9
      JMP BB4
BB3:  ADD R3,12
      JMP BB4
bar: (10 times)
BB5:  CMP R6,R7
      JEQ BB7
BB6:  ADD R6,9
BB7:  RET

(d) After reordering and inlining

Fig. 7. Code reordering followed by function inlining

grouping increases the Icache miss rate by spreading the hot code over more
cache lines. In addition it increases the amount of “mixed” Icache lines containing
both hot and cold instructions. Thus, after inlining we apply code reordering
to rearrange the code layout by grouping hot consecutive basic blocks into consecutive
chains. Code reordering is the last phase of the optimization process and
as such it can determine the final location of instructions. Hence, code reordering
can rearrange code segments such that Icache misses are reduced.
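The effect of layout on "mixed" Icache lines can be illustrated with a small sketch. It is not taken from the paper: the 32-byte line size, the 16-byte block sizes, and the block names A to D are all hypothetical, chosen only to make the arithmetic easy to follow. The function counts how many cache lines hold hot code and how many of those also hold cold code, for blocks placed back to back from address 0.

```python
def icache_line_stats(layout, line_size=32):
    """Return (lines_holding_hot_code, mixed_lines) for a linear code layout.

    layout is a list of (name, size_in_bytes, is_hot) tuples, placed back to
    back starting at address 0. Sizes are assumed to be positive.
    The 32-byte default line size is an assumption for illustration only.
    """
    hot_lines, cold_lines = set(), set()
    addr = 0
    for _name, size, is_hot in layout:
        first = addr // line_size               # first cache line touched
        last = (addr + size - 1) // line_size   # last cache line touched
        target = hot_lines if is_hot else cold_lines
        target.update(range(first, last + 1))
        addr += size
    # A line is "mixed" when it holds both hot and cold instructions.
    return len(hot_lines), len(hot_lines & cold_lines)


# Hypothetical blocks: hot and cold code interleaved, then hot code grouped.
interleaved = [("A", 16, True), ("B", 16, False), ("C", 16, True), ("D", 16, False)]
grouped = [("A", 16, True), ("C", 16, True), ("B", 16, False), ("D", 16, False)]

print(icache_line_stats(interleaved))  # hot code spread over 2 lines, both mixed
print(icache_line_stats(grouped))      # hot code packed into 1 line, none mixed
```

With these made-up sizes, grouping the hot blocks halves the number of lines that hot code occupies and removes the mixed lines entirely, which is the property the reordering phase aims for.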

Most previous work also applied code reordering after inlining, as will be explained
in Section 5; hence this aspect of code reordering is well known. In this
section we focus on a different aspect of the interaction between code reordering
and inlining: the way inlining can improve code reordering’s ability to group larger and
more efficient code segments.

The code reordering algorithm for generating optimized sequences of basic
blocks is based on the tracing scheme [11]. The algorithm starts with an entry
point and grows a trace of basic blocks, based on profile information. A trace is
a sequence of basic blocks that are executed serially. When the trace reaches an
indirect branch instruction (which usually indicates a function return), or when
the frequency of the control flow to the next block falls below a certain threshold,
the algorithm stops growing the trace and starts a new one.
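The trace-growing rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm of [11] or the authors' implementation: the function name `build_traces`, the dictionary representation of the control flow graph, the first-in-first-out choice of the next seed block, the edge counts, and the threshold value are all hypothetical. Only the two stopping conditions (an indirect branch, or an edge frequency below a threshold) come from the text.

```python
def build_traces(entry, succs, ends_in_indirect_branch, threshold):
    """Greedily grow traces of basic blocks from profile edge counts.

    succs maps a block name to a list of (successor, edge_count) pairs.
    ends_in_indirect_branch is the set of blocks ending in an indirect
    branch (for example a RET). All names here are illustrative.
    """
    placed = set()
    traces = []
    worklist = [entry]
    while worklist:
        seed = worklist.pop(0)  # assumption: seeds are taken in discovery order
        if seed in placed:
            continue
        trace = [seed]
        placed.add(seed)
        block = seed
        while True:
            edges = succs.get(block, [])
            # Remember all successors so that cold blocks are placed later.
            worklist.extend(s for s, _ in edges)
            if block in ends_in_indirect_branch:
                break  # stop condition 1: indirect branch reached
            candidates = [(count, s) for s, count in edges if s not in placed]
            if not candidates:
                break
            count, nxt = max(candidates)  # follow the hottest outgoing edge
            if count < threshold:
                break  # stop condition 2: edge frequency below the threshold
            trace.append(nxt)
            placed.add(nxt)
            block = nxt
        traces.append(trace)
    return traces


# Control flow of function bar from Figure 7, with made-up edge counts.
bar_succs = {
    "BB5": [("BB7", 85), ("BB6", 5)],
    "BB6": [("BB7", 5)],
    "BB7": [],
}
bar_traces = build_traces("BB5", bar_succs, {"BB7"}, threshold=10)
print(bar_traces)
```

With these hypothetical counts the sketch places BB5 and BB7 in one trace and leaves the cold block BB6 in a trace of its own, which matches the order of bar's blocks in Figures 7b and 7c. Because BB7 ends in a return, the trace cannot continue from bar into its caller.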

Figure 7 illustrates this scenario (hot basic blocks are
boldfaced). Figure 7a shows the hot path within function foo, which includes
a hot call to function bar. There are two options for building the traces during code
reordering:
