Lecture Notes in Computer Science 4917

Variation-Aware Software Techniques for Cache Leakage Reduction

[Figure: total leakage saving (%) plotted against within-die sigma-Vth (mV) for the benchmarks fir, compress, dct, fft, mpeg2, and jpeg.]

Fig. 11. Saving improvement with technology scaling

in all cases). Fig. 11 shows the trend in the saving results, which confirms the increasing significance of the approach in future technologies, where random within-die Vth variation is expected to increase [20] due to random dopant fluctuation, which grows as feature sizes approach atomic dimensions in nanometer processes.

Costs of Intra-BB Rescheduling and Register-Renaming. Register-renaming imposes no penalty at all. Instruction-rescheduling has no impact on the instruction cache but may, in rare cases, marginally affect the data cache. Since the order and addresses of basic-blocks do not change, instruction cache performance is kept intact. In the data cache, however, reordering instructions may change the sequence of accesses to data elements and hence may change cache behavior. If a miss-causing instruction is moved, the hit-ratio is preserved, but the residence-times (and hence the leakage power) of the evicted and fetched data items change negligibly. In addition, if two instructions that access cache-conflicting data elements change their relative order, the cache hit-ratio changes if the originally first access would have been a hit. This case may also change the data that finally remains in the cache after basic-block execution and hence potentially affects the leakage power of the data cache. It is, however, very unlikely to happen: due to locality of reference, two conflicting data accesses are unlikely to follow each other closely in time (and within a single BB). In our experiments, data cache power and performance varied by no more than 1%.
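
The reordering effect described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: it models a direct-mapped cache with 512 lines of 16 bytes (an 8KB geometry chosen for illustration; the paper does not give the data cache organization here), and the addresses `A` and `B` and the function `simulate` are hypothetical names, not part of the authors' tooling.

```python
# Hypothetical direct-mapped cache model: 512 lines x 16 bytes = 8KB (assumed geometry).
LINE_SIZE = 16
NUM_LINES = 512


def simulate(accesses, initial=()):
    """Replay byte addresses; return (hit count, final {line index: block number})."""
    cache = {}
    # Preload the blocks assumed to be resident before the basic block starts.
    for addr in initial:
        block = addr // LINE_SIZE
        cache[block % NUM_LINES] = block
    hits = 0
    for addr in accesses:
        block = addr // LINE_SIZE
        index = block % NUM_LINES
        if cache.get(index) == block:
            hits += 1             # block already resident: hit
        else:
            cache[index] = block  # miss: fetch the block, evicting any conflicting one
    return hits, cache


# A and B are 8KB apart, so they map to the same cache line (a conflict pair).
A, B = 0x0000, 0x2000

# A is resident on entry, so in the original order the first access is a hit.
original = simulate([A, B], initial=[A])
# After swapping the two accesses, B evicts A first and both accesses miss.
reordered = simulate([B, A], initial=[A])

print("original :", original)
print("reordered:", reordered)
```

In this toy case the swap both lowers the hit count and changes which block remains in the cache afterwards, which are the two effects the paragraph names. It says nothing about how often such conflict pairs occur within a single basic block.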

Cost of Cache Initialization. As explained in Section 3, the cache-initialization technique consumes some dynamic power to execute the cache-management instructions before it can save leakage power. Our implementation of the M32R processor, with separate 8KB instruction and data caches on a 0.18μm process technology, consumes 200mW at a 50MHz clock frequency. This gives, on average, 4nJ per clock cycle or, pessimistically, 20nJ per instruction in the 5-stage pipelined M32R processor. Assuming all 512 cache-lines of the instruction cache are to be initialized, 10.24μJ is consumed for cache-initialization. Tviable can now be calculated using the power-saving values obtained by cache-initialization (Fig. 7). Results are given in Table 4, which confirm that most often a small fraction of a second is enough to make the
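
The energy arithmetic in this paragraph can be reproduced with a short sketch. Two points are assumptions rather than statements from the text: that the pessimistic per-instruction figure is five pipeline stages times the per-cycle energy, with one cache-management instruction per cache line, and that Tviable is the break-even time obtained by dividing the initialization energy by the leakage power saved (Section 3 is not part of this excerpt). The 100μW saving passed to `t_viable` is a hypothetical placeholder, not a value from Fig. 7 or Table 4.

```python
# Figures quoted in the text.
POWER_W = 200e-3      # processor power at 50MHz
CLOCK_HZ = 50e6       # clock frequency
PIPELINE_STAGES = 5   # M32R pipeline depth
CACHE_LINES = 512     # instruction-cache lines to initialize

# Average energy per clock cycle: 200mW / 50MHz = 4nJ.
energy_per_cycle_j = POWER_W / CLOCK_HZ

# Pessimistic energy per instruction (assumed: every stage charged a full cycle).
energy_per_instr_j = PIPELINE_STAGES * energy_per_cycle_j

# Assumed: one cache-management instruction per cache line.
init_energy_j = CACHE_LINES * energy_per_instr_j


def t_viable(init_energy, leakage_saving_w):
    """Assumed break-even time in seconds: initialization energy / leakage power saved."""
    return init_energy / leakage_saving_w


print(f"energy per cycle      : {energy_per_cycle_j * 1e9:.2f} nJ")
print(f"energy per instruction: {energy_per_instr_j * 1e9:.2f} nJ")
print(f"initialization energy : {init_energy_j * 1e6:.2f} uJ")

# Hypothetical leakage saving of 100 uW, for illustration only.
print(f"break-even time       : {t_viable(init_energy_j, 100e-6):.4f} s")
```

Under this assumed definition, any leakage saving above roughly 10μW would bring the break-even time below one second.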
