
Lecture Notes in Computer Science 4917

388 Y. Ben Asher et al.

Table 1. Dynamic branch predictors on leading high-end embedded processors

Processor               BHT    BTB    Return Stack   L1 I-cache
AMCC 440GX              1024   16     –              32KB, 32B, 64 ways
Broadcom BCM1250        1024   64     16             32KB, 32B, 4 ways
Cavium Octeon           256    –      4              32KB, 32B, 4 ways
IBM 750GX               512    64     –              32KB, 32B, 8 ways
FreeScale MPC7447A      2048   2      –              32KB, 32B, 8 ways
FreeScale MPC8560       512    512    –              32KB, 32B, 8 ways
PMC-Sierra RM9000x2GL   256    –      4              32KB, 32B, 4 ways

and 16-entry return address cache. On the other hand, their L1 I-caches are on par with the aforementioned class of processors. The numbers demonstrate the importance of aggressive inlining techniques for embedded systems.

2 ILB Based Aggressive Function Inlining

Inlining decisions are based on the Call Graph (CG) of the program. Each edge of the graph is assigned a weight: the frequency of the corresponding function call according to the collected profile information. The Average Heat ratio (AvgHeat) is the sum of the execution frequencies of all executed (dynamic) instructions (DI) gathered during the profiling stage, divided by the total number of static instructions (SI) in the program:

\[
\mathit{AvgHeat} = \frac{\sum_{i \in DI} \mathit{freq}_i}{SI}
\]
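
As a small worked illustration of the ratio, the sketch below computes AvgHeat from a per-instruction profile. The profile layout (a dictionary mapping a hypothetical instruction address to its execution count) and the name `avg_heat` are assumptions made for this example and are not taken from the authors' tooling. Instructions that never executed are assumed to appear with a count of 0, so they contribute to SI but not to the sum.

```python
def avg_heat(freq_by_instruction):
    """Return AvgHeat: summed execution counts divided by the static instruction count.

    freq_by_instruction maps each static instruction (hypothetical address key)
    to the number of times the profile saw it execute.
    """
    static_count = len(freq_by_instruction)            # SI
    if static_count == 0:
        raise ValueError("program has no instructions")
    dynamic_total = sum(freq_by_instruction.values())  # sum of freq_i
    return dynamic_total / static_count


# Four static instructions, one of which was never executed.
profile = {0x100: 90, 0x104: 90, 0x108: 20, 0x10C: 0}
heat = avg_heat(profile)  # (90 + 90 + 20 + 0) / 4
```

With this toy profile the ratio is 200 / 4 = 50.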

A cold edge is any edge in the extended CFG whose weight is lower than some fraction (e.g., 10%) of AvgHeat. The algorithm implementing the ILB method is:

1. Based on edge profile information, create the call graph CG for the given program and attach a weight to each edge. The weight is its execution frequency.

2. Traverse CG and remove all cold edges. Section 2.2 elaborates on this.

3. Remove cyclic paths from CG by finding the smallest weighted set of edges in CG using the algorithm of Eades et al. [6] for solving the feedback edge set problem. Section 3 elaborates on this.

4. Let EG be the extended control flow graph containing the control flow graph of each function and direct edges for call and return instructions. For each function f in the call graph CG that is a candidate for inlining:

(a) For every two incoming edges e1 and e2 to f in CG, let caller1 be the caller basic block ending with e1 and let caller2 be the caller basic block ending with e2 in EG. fallthru1 and fallthru2 are the basic blocks following caller1 and caller2 in EG, respectively.

(b) Traverse EG and search for directed paths from fallthru1 to caller2 and from fallthru2 to caller1.

(c) If both paths exist, remove the e1 and e2 edges from CG.
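
Steps 2 and 4(a)-(c) above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the graph encodings (a dictionary of weighted CG edges, an adjacency dictionary for EG, and a `call_sites` map from each incoming edge of f to its caller and fall-through blocks) and all function names are hypothetical. The text does not say whether the path search may pass through the body of f, so the sketch runs a plain breadth-first reachability search over whatever EG it is given. Step 3 (the feedback edge set heuristic of [6]) is not shown.

```python
from collections import deque
from itertools import combinations


def remove_cold_edges(cg_edges, avg_heat, ratio=0.10):
    """Step 2: drop call edges whose weight is below ratio * AvgHeat."""
    threshold = ratio * avg_heat
    return {edge: w for edge, w in cg_edges.items() if w >= threshold}


def reachable(eg, src, dst):
    """Breadth-first search for a directed path from src to dst in EG."""
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for succ in eg.get(node, ()):
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return False


def conflicting_call_edges(call_sites, eg):
    """Steps 4(a)-(c) for one callee f.

    call_sites maps each incoming CG edge id to (caller_block, fallthru_block).
    Returns the ids of the edges that step 4(c) would remove from CG.
    """
    to_remove = set()
    for (e1, (caller1, fallthru1)), (e2, (caller2, fallthru2)) in combinations(
        call_sites.items(), 2
    ):
        # 4(b): look for paths fallthru1 -> caller2 and fallthru2 -> caller1.
        if reachable(eg, fallthru1, caller2) and reachable(eg, fallthru2, caller1):
            to_remove.update((e1, e2))  # 4(c): both paths exist
    return to_remove


# Toy EG: three call sites of f; the calls in A and B can each reach the other.
eg = {
    "A": ["f_entry"], "B": ["f_entry"], "C": ["f_entry"],  # call edges
    "f_entry": ["f_exit"],
    "f_exit": ["A_ft", "B_ft", "C_ft"],                    # return edges
    "A_ft": ["B"],
    "B_ft": ["A"],
    "C_ft": [],
}
call_sites = {"e1": ("A", "A_ft"), "e2": ("B", "B_ft"), "e3": ("C", "C_ft")}
removed = conflicting_call_edges(call_sites, eg)
hot_edges = remove_cold_edges({("main", "f"): 100, ("main", "g"): 2}, avg_heat=50.0)
```

In the toy graph, only e1 and e2 are marked for removal, since no path leads to or from the call site in C; with AvgHeat = 50 and a 10% ratio, the call edge of weight 2 falls below the threshold of 5 and is pruned.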
