
would not be easily obtained during execution. While one might think of different types of information that can be passed from the compiler to the OS, in this work, we focus on information that helps improve data cache performance. Specifically, we use the compiler to analyze and restructure the process codes so that the data reuse in the cache is maximized.

Such a strategy is expected to be viable in embedded environments where process codes are extracted from the same application and have significant data reuse between them. In many cases, it is preferable to code an embedded application as a set of co-operating processes (instead of a large monolithic code), because such a coding style generally leads to better modularity and maintainability. Such light-weight processes are often called threads. In array-intensive embedded applications (e.g., those found in the embedded image and video processing domains), we can expect a large degree of data sharing between processes extracted from the same application. Previous work on process scheduling in the embedded systems area includes work targeting instruction and data caches. A careful mapping of the process codes to memory can significantly reduce conflict misses in instruction caches. We refer the reader to [8] and [4] for elegant process mapping strategies that target instruction caches. Li and Wolfe [5] present a model for estimating the performance of multiple processes sharing a cache.

Section 2 presents the details of our strategy. Section 3 presents our benchmarks and experimental setting, and discusses the preliminary results obtained using a set of five embedded applications. Section 4 presents our concluding remarks.

2. OUR APPROACH

Our process scheduling algorithm takes a data-space oriented approach. The basic idea is to restructure the process codes to improve data cache locality. Let us first focus on a single array; that is, let us assume that each process manipulates the same array. We will relax this constraint shortly. We first logically divide the array in question into tiles (these tiles are called data tiles). Then, these data tiles are visited one by one (in some order), and for each process we determine a set of iterations (called an iteration tile) that will be executed in each of its quanta. This approach obviously increases data reuse in the cache, as the processes manipulate the same set of array elements in their corresponding quanta. The important questions here are how to divide the array into data tiles, in which order the tiles should be visited, and how to determine the iterations to execute (for each process in each quantum).
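To make the mechanism concrete, here is a minimal sketch of the restructuring, not taken from the chapter: the array size N, the tile edge T, the process count NUM_PROCS, and process_body() are all illustrative assumptions. It visits the data tiles of a shared two-dimensional array one by one and, for each tile, lets every process execute only the iteration tile that touches that data tile, so consecutive quanta operate on the same cached array elements.

/* Sketch of the data-space oriented restructuring. N, T, NUM_PROCS and
 * process_body() are illustrative assumptions, not names from the chapter.
 * Data tiles of the shared array are visited one by one; in each quantum
 * a process executes only the iterations that touch the current data
 * tile, so data reuse in the cache is maximized across quanta. */
#include <stdio.h>

#define N 8             /* array dimension (assumed)               */
#define T 4             /* data tile edge: (N/T) x (N/T) tiles     */
#define NUM_PROCS 3     /* number of co-operating processes        */

static double A[N][N];  /* array shared by all processes           */

/* Iteration tile of process pid restricted to data tile (ti, tj). */
static void process_body(int pid, int ti, int tj)
{
    for (int i = ti * T; i < (ti + 1) * T; i++)
        for (int j = tj * T; j < (tj + 1) * T; j++)
            A[i][j] += pid;          /* placeholder computation    */
}

int main(void)
{
    /* Visit data tiles one by one (row-major order here); for each */
    /* tile, give every process one quantum on that same tile.      */
    for (int ti = 0; ti < N / T; ti++)
        for (int tj = 0; tj < N / T; tj++)
            for (int pid = 1; pid <= NUM_PROCS; pid++)
                process_body(pid, ti, tj);

    printf("A[0][0] = %g\n", A[0][0]);
    return 0;
}

In a real system the processes would of course be separate schedulable entities and the tile size would be chosen to match the data cache capacity; the sketch only shows the loop restructuring that yields the per-quantum iteration tiles.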

In the following discussion, after presenting a simple example illustrating the overall idea, we explain our approach to these three issues in detail.

Figure 18-1 shows a simple case (for illustrative purposes) where three processes access the same two-dimensional array (shown in Figure 18-1(a)). The array is divided into four data tiles. The first and the third processes
