29.01.2015 Views

Embedded Software for SoC - Grupo de Mecatrônica EESC/USP

Embedded Software for SoC - Grupo de Mecatrônica EESC/USP

Embedded Software for SoC - Grupo de Mecatrônica EESC/USP

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

388 Chapter 29<br />

which already takes the impact of the software restructuring into account. In<br />

addition, there is a promising aspect of combining the hardware and software<br />

approaches. Usually compilers have a global view of the program, which is<br />

hard to obtain at run-time. If the in<strong>for</strong>mation about this global view can be<br />

conveyed to the hardware, the per<strong>for</strong>mance of the system can be increased<br />

significantly. This is particularly true if hardware can integrate this in<strong>for</strong>mation<br />

with the runtime in<strong>for</strong>mation it gathers during execution.<br />

But, can we have the best of both worlds Can we combine hardware and<br />

software techniques in a logical and selective manner so that we can obtain<br />

even better per<strong>for</strong>mance than either applying only one or applying each<br />

in<strong>de</strong>pen<strong>de</strong>ntly Are there simple and cost-effective ways to combine them<br />

Can one scheme interfere with another if applied in<strong>de</strong>pen<strong>de</strong>ntly; that is, does<br />

it result in per<strong>for</strong>mance <strong>de</strong>gradation as compared to using either one scheme<br />

only or no scheme at all<br />

To answer these questions, we use hardware and software locality optimization<br />

techniques in concert and study the interaction between these<br />

optimizations. We propose an integrated scheme, which selectively applies<br />

one of the optimizations and turns the other off. Our goal is to combine<br />

existing hardware and software schemes intelligently and take full advantage<br />

of both the approaches. To achieve this goal, we propose a region <strong>de</strong>tection<br />

algorithm, which is based on the compiler analysis and that <strong>de</strong>termines which<br />

regions of the programs are suitable <strong>for</strong> hardware optimization scheme and<br />

subsequently, which regions are suitable <strong>for</strong> software scheme. Based on this<br />

analysis, our approach turns the hardware optimizations on and off. In the<br />

following, we first <strong>de</strong>scribe some of the current hardware and software locality<br />

optimization techniques briefly, give an overview of our integrated strategy,<br />

and present the organization of this chapter.<br />

1.1. Hardware techniques<br />

Hardware solutions typically involve several levels of memory hierarchy and<br />

further enhancements on each level. Research groups have proposed smart<br />

cache control mechanisms and novel architectures that can <strong>de</strong>tect program<br />

access patterns at run-time and can fine-tune some cache policies so that the<br />

overall cache utilization and data locality are maximized. Among the techniques<br />

proposed are victim caches [10], column-associative caches [1],<br />

hardware prefetching mechanisms, cache bypassing using memory address<br />

table (MAT) [8, 9], dual/split caches [7], and multi-port caches.<br />

1.2. <strong>Software</strong> techniques<br />

In the software area, there is consi<strong>de</strong>rable work on compiler-directed data<br />

locality optimizations. In particular, loop restructuring techniques are wi<strong>de</strong>ly<br />

used in optimizing compilers [6]. Within this context, trans<strong>for</strong>mation techniques<br />

such as loop interchange [13], iteration space tiling [13], and loop

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!