
Lecture Notes in Computer Science 4917


228 M. Goudarzi, T. Ishihara, and H. Noori

remains invariant for a given σVth-intra, but the absolute value of the saved power increases with decreasing Vth. This is expected, since the saving opportunity is enabled by the Vth variation, not by the average Vth value.
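To see why the relative saving depends only on the variation while the absolute saving grows as Vth falls, consider a minimal sketch under two simplifying assumptions that are ours, not the paper's: leakage follows the textbook subthreshold form I ∝ exp(-Vth/(n·v_T)), and a cell's leakage in each stored state is dominated by a single transistor whose threshold is the average Vth plus a random within-die offset. The constants `N_SLOPE` and `V_T` and the helpers `leak` and `saving` are hypothetical.

```python
import math

# HYPOTHETICAL device constants (not taken from the paper).
N_SLOPE = 1.5   # subthreshold slope factor n
V_T = 0.026     # thermal voltage kT/q at about 300 K, in volts


def leak(vth):
    """Simplified subthreshold leakage, I ∝ exp(-Vth / (n * v_T)), in arbitrary units."""
    return math.exp(-vth / (N_SLOPE * V_T))


def saving(vth_mean, delta0, delta1):
    """Return (absolute, relative) saving from storing the cell's preferred value.

    Assumption: when the cell holds 0 its leakage is set by a transistor with
    threshold vth_mean + delta0, and when it holds 1 by one with vth_mean + delta1,
    where delta0 and delta1 are random within-die offsets.
    """
    leak0 = leak(vth_mean + delta0)
    leak1 = leak(vth_mean + delta1)
    absolute = abs(leak0 - leak1)            # |leak0 - leak1|
    relative = absolute / max(leak0, leak1)  # fraction of the worse state's leakage saved
    return absolute, relative


# Same offsets (same variation), progressively lower average Vth.
for vth_mean in (0.35, 0.30, 0.25):
    absolute, relative = saving(vth_mean, delta0=0.03, delta1=-0.03)
    print(f"Vth={vth_mean:.2f} V  absolute={absolute:.3e}  relative={relative:.3f}")
```

With fixed offsets the relative saving equals 1 - exp(-|δ0 - δ1|/(n·v_T)) for every average Vth, whereas the absolute saving scales with exp(-Vth/(n·v_T)) and therefore rises as Vth is lowered.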

Since $\sigma_{V_{th}\text{-intra}} \propto 1/\sqrt{L \times W}$ [3], where L and W are the effective channel length and width respectively, the Vth variation is bound to increase with technology scaling and, as Fig. 3 shows, this increases the significance of value-to-cell matching. For a 0.13 µm process, an empirical study [19] reports σVth-intra = 22.1 mV for W/L = 4, which by extrapolation gives σVth-intra > 60 mV at 90 nm for minimum-geometry transistors; the ITRS roadmap shows similar prospects [20]. (We found no public empirical report on 90 nm and 65 nm processes, apparently because such data are sensitive and confidential.) We therefore present results for various σVth-intra values, but consider 60 mV the typical case. Note that even if the extrapolation is not accurate for the 90 nm process, σVth-intra = 60 mV will eventually be reached at a finer technology node, because $\sigma_{V_{th}\text{-intra}} \propto 1/\sqrt{L \times W}$. Fig. 3 shows that the maximum theoretical saving from this phenomenon at 60 mV variation can be as high as 70%.
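The extrapolation above can be reproduced as a short calculation, assuming the standard Pelgrom area dependence σ ∝ 1/√(L·W). Reference [19] fixes only the ratio W/L = 4, so the absolute dimensions below (L = 0.13 µm and W = 0.52 µm for the measured device, W = L = 0.09 µm for a 90 nm minimum-geometry device) are hypothetical choices for illustration, as is the helper `scale_sigma`.

```python
import math


def scale_sigma(sigma_ref, w_ref, l_ref, w_new, l_new):
    """Rescale intra-die sigma(Vth) assuming sigma ∝ 1/sqrt(L * W).

    Widths and lengths only need consistent units; the result keeps the units of sigma_ref.
    """
    return sigma_ref * math.sqrt((w_ref * l_ref) / (w_new * l_new))


# Reference point from [19]: 22.1 mV at W/L = 4 in a 0.13 um process.
# HYPOTHETICAL geometry for that device: L = 0.13 um, W = 4 * L = 0.52 um.
# HYPOTHETICAL 90 nm minimum-geometry device: W = L = 0.09 um.
sigma_90 = scale_sigma(22.1, w_ref=0.52, l_ref=0.13, w_new=0.09, l_new=0.09)
print(f"sigma_Vth-intra at 90 nm ~ {sigma_90:.1f} mV")  # prints: ... ~ 63.8 mV
```

Under these assumptions the estimate lands just above 60 mV; a wider assumed minimum width would lower it, so the number is indicative only.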

3.1 Our Approach

Fig. 3. Leakage saving opportunity increases with Vth variation. [Plot: maximum possible saving (%) versus intra-die Vth standard deviation σVth-intra (10 to 100), with per-cache and per-cell curves; annotated values 70.62 and 57.08.]

We propose three techniques applicable to instruction caches: rescheduling instructions within basic blocks, static register renaming, and initializing unused cache lines. We first illustrate them with examples before formulating them formally.

Illustrative Example 1: Intra-BB Instruction Rescheduling. Fig. 4 illustrates our approach applied to a small basic block (shown at left in Fig. 4) consisting of three 8-bit instructions, mapped onto a 512-set direct-mapped cache with an 8-bit line size. The arrow at the right of the instruction-memory box represents the dependence of instruction 2 on instruction 1. For simplicity, we assume (i) all 3 instructions spend the same amount of time in the cache, and (ii) the leakage saving (i.e., |leak0-leak1|) is the same for all bits of the 3 cache lines. An SRAM cell is called 1-friendly (0-friendly), or equivalently is said to prefer 1 (prefer 0), if it leaks less power when storing a 1 (a 0). The leakage preferences of the cache lines are shown in gray in the middle of Fig. 4; for example, the leftmost bit of cache line 490 prefers 0 (is 0-friendly) while its rightmost bit prefers 1 (is 1-friendly).
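The setup of this example can be sketched as a small brute-force search. Under assumptions (i) and (ii) every matched bit saves the same amount of leakage, so the number of bits where an instruction word agrees with the preferences of the line holding it serves as a proxy for the saving. The sketch also assumes (our assumption) that, with 8-bit instructions and 8-bit lines, the k-th instruction of the block occupies the k-th of three consecutive lines 490 to 492. The instruction encodings and the preferences of lines 491 and 492 are hypothetical, as are all function names; only line 490's leftmost-0/rightmost-1 preference comes from the text. Exhaustive enumeration is practical only for tiny blocks and is not claimed to be the paper's formulation.

```python
from itertools import permutations

WIDTH = 8                  # 8-bit instructions and 8-bit cache lines, as in the example
MASK = (1 << WIDTH) - 1


def matched_bits(word, preference):
    """Count cells whose stored bit equals the value the cell prefers.

    `preference` has a 1 at each 1-friendly cell and a 0 at each 0-friendly cell,
    so the XNOR of word and preference marks the matched positions.
    """
    return bin(~(word ^ preference) & MASK).count("1")


def valid_schedules(n, deps):
    """Yield orders of instructions 1..n in which every (a, b) in deps has a before b."""
    for order in permutations(range(1, n + 1)):
        pos = {instr: k for k, instr in enumerate(order)}
        if all(pos[a] < pos[b] for a, b in deps):
            yield order


def schedule_score(order, instrs, line_prefs):
    """Total matched bits when the k-th scheduled instruction sits in the k-th line."""
    return sum(matched_bits(instrs[i], pref) for i, pref in zip(order, line_prefs))


def best_schedule(instrs, line_prefs, deps):
    """Exhaustively pick the dependence-respecting order with the most matched bits."""
    best = max(valid_schedules(len(instrs), deps),
               key=lambda order: schedule_score(order, instrs, line_prefs))
    return best, schedule_score(best, instrs, line_prefs)


# HYPOTHETICAL encodings and preferences (the real ones appear only in Fig. 4).
INSTRS = {1: 0b10110010, 2: 0b01001101, 3: 0b11100001}
# Lines 490, 491, 492; line 490's leftmost bit prefers 0 and its rightmost prefers 1.
LINE_PREFS = [0b01101011, 0b10110110, 0b01001100]
DEPS = [(1, 2)]  # instruction 2 depends on instruction 1

print(list(valid_schedules(3, DEPS)))  # [(1, 2, 3), (1, 3, 2), (3, 1, 2)]
print(best_schedule(INSTRS, LINE_PREFS, DEPS))
```

The single dependence leaves exactly three legal orders, which is why only three schedules need to be compared in this example.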

The Matching table in Fig. 4 shows the number of matched bits for each (instruction, cache-line) pair. Due to instruction dependencies, only three schedules are valid in this example: 1-2-3 (i.e., the original one), 1-3-2, and 3-1-2, with respectively
