
Chapter 37

LOW ENERGY ASSOCIATIVE DATA CACHES FOR EMBEDDED SYSTEMS

Dan Nicolaescu, Alex Veidenbaum, and Alex Nicolau

Center for Embedded Computer Systems, University of California, Irvine, USA

Abstract. Modern embedded processors use data caches with increasingly high degrees of associativity in order to increase performance. A set-associative data cache consumes a significant fraction of the total energy budget in such embedded processors. This paper describes a technique for reducing D-cache energy consumption and shows its impact on the energy consumption and performance of an embedded processor. The technique uses cache line address locality to determine (rather than predict) the cache way prior to the cache access, so that only the desired way is accessed for both tags and data. The proposed mechanism is shown to reduce the average L1 data cache energy consumption when running the MiBench embedded benchmark suite on 8-, 16- and 32-way set-associative caches by an average of 66%, 72% and 76%, respectively. The absolute energy savings from this technique increase significantly with associativity. The design has no impact on performance and, because it incurs no mis-prediction penalties, it introduces no new non-deterministic behavior in program execution.

Key words: set associative, data cache, energy consumption, low power

1. INTRODUCTION

A data cache is an important component of a modern embedded processor, indispensable for achieving high performance. Until recently most embedded processors had either no cache or a direct-mapped cache, but today there is a growing trend to increase the degree of associativity in order to further improve system performance. For example, Transmeta's Crusoe [7] and Motorola's MPC7450 [8] have 8-way set-associative caches, and Intel's XScale has 32-way set-associative caches.

Unfortunately, the energy consumption of set-associative caches adds to an already tight energy budget of an embedded processor.

In a set-associative cache the data store access is started at the same time as the tag store access. When a cache access is initiated, the way containing the requested cache line is not known, so all the cache ways in a set are accessed in parallel. This parallel lookup is inherently inefficient from the point of view of energy consumption, but it is essential for keeping cache latency low. As a result, the energy consumed per cache access grows with associativity.
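
To make the energy argument concrete, the following C sketch (not from the paper; the structure names, sizes and the activation counter are illustrative assumptions) models a conventional parallel lookup, which activates the tag and data arrays of every way in the selected set, next to an access that probes only a single, already known way, which is the kind of access a way-determination scheme enables.

/* Minimal sketch, assuming illustrative cache parameters: contrast a
 * conventional parallel set-associative lookup with a single-way access.
 * The counter array_activations is a rough proxy for dynamic energy. */
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_SETS  128
#define NUM_WAYS  8            /* e.g. an 8-way set-associative L1 D-cache */
#define LINE_SIZE 32

typedef struct {
    bool     valid;
    uint32_t tag;
    uint8_t  data[LINE_SIZE];
} cache_line_t;

static cache_line_t cache[NUM_SETS][NUM_WAYS];
static unsigned long array_activations;    /* proxy for dynamic energy */

/* Conventional access: all ways of the set are read in parallel, so the
 * tag array and the data array of every way are activated on each access. */
static cache_line_t *lookup_parallel(uint32_t addr)
{
    uint32_t set = (addr / LINE_SIZE) % NUM_SETS;
    uint32_t tag = addr / (LINE_SIZE * NUM_SETS);
    cache_line_t *hit = NULL;

    for (unsigned way = 0; way < NUM_WAYS; way++) {
        array_activations += 2;            /* one tag read + one data read */
        if (cache[set][way].valid && cache[set][way].tag == tag)
            hit = &cache[set][way];
    }
    return hit;                            /* NULL on a miss */
}

/* Way-determined access: when the way is known before the access starts,
 * only that way's tag and data arrays need to be activated. */
static cache_line_t *lookup_single_way(uint32_t addr, unsigned way)
{
    uint32_t set = (addr / LINE_SIZE) % NUM_SETS;
    uint32_t tag = addr / (LINE_SIZE * NUM_SETS);

    array_activations += 2;                /* just the selected way */
    if (cache[set][way].valid && cache[set][way].tag == tag)
        return &cache[set][way];
    return NULL;
}

In this model the activation count per access grows linearly with NUM_WAYS for the parallel lookup but stays constant for the single-way access, which is consistent with the savings reported in the abstract growing with associativity.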

