Lecture Notes in Computer Science 4917


390 Y. Ben Asher et al.

[Figure 3: two bar charts of performance improvement (%). Top panel: "Performance improvement on Power4"; bottom panel: "Performance improvement on 440GX". Series: inline all exec funcs, inline hot funcs, inline small funcs, inline dominant calls, ILB. Benchmarks: gcc, parser, twolf, perlbmk, bzip2, crafty, vortex, eon, gzip, gap, vpr, mcf, Avg.]

Fig. 3. Performance improvements on the Power4 (top) and 440GX (bottom)

inlining methods that are based on combinations of size and temperature. The improvements on the embedded CPU (440GX) are smaller than those on the server (Power4). This can be explained by the fact that the Icache on the AMCC 440GX is highly associative. Therefore, inlinings that conflict in the Power4's direct-mapped Icache do not conflict in the 440GX's associative Icache.
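The associativity argument can be made concrete with a toy cache model. The sketch below is a minimal LRU set-associative simulator, not a model of either processor: the 32 KiB capacity, 32-byte lines, and the two hot-code addresses are hypothetical values chosen for illustration. Two hot code regions placed exactly one cache size apart (as can happen when inlining shifts the code layout) map to the same set. With one way (direct-mapped) they evict each other on every fetch; with two or more ways both stay resident after the first miss.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Minimal LRU set-associative cache model; ways=1 gives a direct-mapped cache."""

    def __init__(self, size_bytes: int, line_bytes: int, ways: int) -> None:
        self.line_bytes = line_bytes
        self.ways = ways
        self.num_sets = size_bytes // (line_bytes * ways)
        # One OrderedDict of tags per set, least recently used entry first.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.misses = 0

    def access(self, addr: int) -> bool:
        """Fetch the line holding addr; return True on a hit, False on a miss."""
        line = addr // self.line_bytes
        index = line % self.num_sets
        tag = line // self.num_sets
        cache_set = self.sets[index]
        if tag in cache_set:
            cache_set.move_to_end(tag)  # mark as most recently used
            return True
        self.misses += 1
        if len(cache_set) >= self.ways:
            cache_set.popitem(last=False)  # evict the least recently used line
        cache_set[tag] = None
        return False


def count_misses(ways: int, iterations: int = 1000) -> int:
    """Alternate fetches of two hot code regions that alias to the same set."""
    # Hypothetical 32 KiB instruction cache with 32-byte lines.
    cache = SetAssociativeCache(size_bytes=32 * 1024, line_bytes=32, ways=ways)
    caller = 0x10000              # hypothetical address of a hot caller's loop
    callee = caller + 32 * 1024   # hypothetical inlined code, one cache size away
    for _ in range(iterations):
        cache.access(caller)
        cache.access(callee)
    return cache.misses


if __name__ == "__main__":
    print("direct-mapped:", count_misses(ways=1))  # 2000: every fetch misses
    print("2-way LRU:", count_misses(ways=2))      # 2: only the cold misses
```

Raising `ways` further (for example to 64) leaves the result at 2 misses here, since only two lines compete for the set; the point is simply that the same layout which thrashes a direct-mapped cache is harmless once a set can hold both lines.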

Figure 4 depicts the number of functions inlined by each method. Note that ILB inlined fewer functions than most of the other methods, yet achieved higher performance. This suggests that the ILB scheme inlines the “correct” set of functions. Moreover, there is a clear correlation between inlining too many functions and performance degradation, demonstrating the need for “precise” inlining methods.

2.2 Removing Cold Edges

Here we describe in detail how “cold” edges are removed from the call graph so that they are not considered for inlining. The algorithm is based on a threshold that determines which edges are considered cold. The experimental results confirm that each program requires a different threshold to maximize the performance of aggressive inlining. This requires normalizing the profile information as follows:
