Embedded Software for SoC - Grupo de Mecatrônica EESC/USP


Exploring High Bandwidth Pipelined Cache Architecture

Table 26-3. Performance, energy and area estimates. The first three MOPS columns are the % increase with respect to the conventional cache; the last three are the % increase with respect to the cache with pipelined decoder.

| Tech (μm) | % incr area | % incr read energy | % incr write energy | MOPS vs. conventional: BBR | MOPS vs. conventional: BBW | MOPS vs. conventional: ARW | MOPS vs. pipelined decoder: BBR | MOPS vs. pipelined decoder: BBW | MOPS vs. pipelined decoder: ARW |
|---|---|---|---|---|---|---|---|---|---|
| 0.18 | 3.3 | 3.1 | 3.7 | 41.1 | 16.5 | -6.5 | 15.9 | 3.9 | -19.1 |
| 0.13 | 7.5 | 11.2 | 12.4 | 85.0 | 48.9 | 20.7 | 50.5 | 32.0 | 3.6 |
| 0.10 | 15.3 | 23.1 | 25.4 | 85.4 | 49.3 | 21.0 | 51.8 | 32.4 | 4.2 |
| 0.07 | 31.3 | 54.4 | 60.6 | 68.1 | 42.9 | 13.4 | 47.7 | 32.2 | 3.0 |

pipelining for different technology generations. The energy estimate considers
the energy overhead due to extra latches, multiplexers, clock and decoders.
It also accounts for the decrease in precharge energy as the bit-line capacitance
is reduced. The increase in area due to extra precharge circuitry, latches,
decoders, sense amplifiers, and multiplexers is considered in this analysis. To
account for area and energy overhead together with performance gain, we use
MOPS (millions of operations per unit time per unit area per unit energy) as a
metric for comparison. Three scenarios have been considered to calculate
MOPS:

a) Back to Back Read (BBR) operations;
b) Back to Back Write (BBW) operations;
c) Alternate Read and Write (ARW).
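To make the MOPS metric concrete, the following minimal sketch shows one way a percentage change in MOPS could be computed from a throughput ratio and the percentage increases in area and energy. It assumes MOPS is simply throughput divided by the product of area and energy, as the definition in the text suggests; the function names and the input numbers are hypothetical and are not taken from Table 26-3.

```python
def mops(ops_per_unit_time, area, energy):
    """Million operations per unit time per unit area per unit energy.

    Assumption: the three normalizations are combined as a plain quotient.
    """
    return ops_per_unit_time / 1e6 / (area * energy)


def percent_increase_in_mops(throughput_ratio, area_incr_pct, energy_incr_pct):
    """Relative MOPS change of a new design versus a baseline, in percent.

    throughput_ratio  new throughput / baseline throughput
    area_incr_pct     % increase in area over the baseline
    energy_incr_pct   % increase in energy per access over the baseline
    """
    area_ratio = 1.0 + area_incr_pct / 100.0
    energy_ratio = 1.0 + energy_incr_pct / 100.0
    return (throughput_ratio / (area_ratio * energy_ratio) - 1.0) * 100.0


# Hypothetical inputs: 1.5x throughput, 5% more area, 10% more energy.
gain = percent_increase_in_mops(1.5, 5.0, 10.0)
print(f"{gain:.1f}% increase in MOPS")
```

Under this reading, a throughput gain only shows up as a MOPS gain if it outweighs the combined area and energy overhead, which is why the metric can go negative for some scenarios.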

Figure 26-6 shows the pipeline stalls required due to resource conflicts for
all three scenarios. In the case of a conventional unpipelined cache, both reads
and writes have to wait until the previous read or write has finished. In a cache
with a pipelined decoder, DD is hidden in the previous read or write access.
In the proposed pipelined design, a read access is divided into three stages
(Figure 26-6). In the case of BBR, a read comes out of the cache in every
clock cycle. A write is a read followed by a write, and that requires five stages.
Issuing writes back-to-back encounters a structural hazard because a single
write accesses the multiplexer stage more than once. Also, the fourth stage requires
the multiplexer and the decoder to be accessed simultaneously, causing a
resource conflict in the pipeline. Issuing the next write two cycles after the
previous write resolves this hazard. A similar hazard exists in the case of ARW;
again, issuing the next read or write two cycles after the previous read or write
resolves it. Fortunately, similar stalls are required for the case when the
multiplexer is pipelined and accessed in two clock cycles (0.10 and 0.07 μm
technology). The MOPS achieved by the pipelined cache is compared with that of the
conventional cache and the cache with pipelined decoder.
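The issue intervals described above can be checked with a small reservation-table sketch. The stage-to-resource assignment below is hypothetical: it is chosen only to match the description in the text (a three-stage read, a five-stage write that uses the multiplexer twice and needs the multiplexer and decoder together in its fourth stage), and the resource name `array` is an assumption. It is not the actual stage breakdown of Figure 26-6.

```python
# Hypothetical reservation tables: one set of busy resources per pipeline cycle.
READ = [{"decoder"}, {"array"}, {"mux"}]
WRITE = [{"decoder"}, {"array"}, {"mux"}, {"mux", "decoder"}, {"array"}]


def collides(first, second, latency):
    """True if `second`, issued `latency` cycles after `first`,
    needs a resource in a cycle where `first` is still using it."""
    for cycle, busy in enumerate(first):
        j = cycle - latency  # cycle index of `second` at the same time
        if 0 <= j < len(second) and busy & second[j]:
            return True
    return False


def hazard_free(ops, interval):
    """Check every pair of operations issued at a constant interval."""
    for i, first in enumerate(ops):
        for k in range(i + 1, len(ops)):
            if collides(first, ops[k], (k - i) * interval):
                return False
    return True


def min_issue_interval(pattern, repeats=8, max_interval=8):
    """Smallest constant issue interval with no structural hazard."""
    ops = pattern * repeats
    for interval in range(1, max_interval + 1):
        if hazard_free(ops, interval):
            return interval
    return None


print("BBR:", min_issue_interval([READ]))         # prints "BBR: 1"
print("BBW:", min_issue_interval([WRITE]))        # prints "BBW: 2"
print("ARW:", min_issue_interval([READ, WRITE]))  # prints "ARW: 2"
```

With these assumed tables, back-to-back reads can issue every cycle, while back-to-back writes and alternating reads and writes need a two-cycle interval, in line with the stalls described in the text.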

Table 26-3 shows the percentage improvement in MOPS achieved by
pipelining the cache. For 0.18 μm technology, the pipelined cache achieves 41% and
15.9% improvement in MOPS in the case of BBR and 16.5% and 3.86%
