Embedded Software for SoC - Grupo de Mecatrônica EESC/USP


Chapter 23

DYNAMIC PARALLELIZATION OF ARRAY BASED ON-CHIP MULTIPROCESSOR APPLICATIONS

M. Kandemir¹, W. Zhang¹ and M. Karakoy²

¹ CSE Department, The Pennsylvania State University, University Park, PA 16802, USA
² Department of Computing, Imperial College, London SW7 2AZ, UK

E-mail: {kandemir,wzhang}@cse.psu.edu, m.karakoy@ic.ac.uk

Abstract. Chip multiprocessing (or multiprocessor system-on-a-chip) is a technique that combines two or more processor cores on a single piece of silicon to enhance computing performance. An important problem to be addressed in executing applications on an on-chip multiprocessor environment is to select the most suitable number of processors to use for a given objective function (e.g., minimizing execution time or energy-delay product) under multiple constraints. Previous research proposed an ILP-based solution to this problem that is based on exhaustive evaluation of each loop nest under all possible processor sizes. In this study, we take a different approach and propose a pure runtime strategy for determining the best number of processors to use. Our experiments show that the proposed approach is successful in practice.

Key words: code parallelization, array-based applications, on-chip multiprocessing

1. INTRODUCTION

Researchers agree that chip multiprocessing is the next big thing in CPU design [4]. The performance improvements obtained through better process technology, better micro-architectures and better compilers seem to saturate; in this respect, on-chip multiprocessing provides an important alternative for obtaining better performance/energy behavior than what is achievable using current superscalar and VLIW architectures.

An important problem to be addressed in executing applications on an on-chip multiprocessor environment is to select the most suitable number of processors to use. This is because, in most cases, using all available processors to execute a given code segment may not be the best choice. There are several reasons for this. First, in cases where loop bounds are very small, parallelization may not be a good idea in the first place, as creating individual threads of control and synchronizing them may itself take a significant amount of time, offsetting the benefits of parallelization. Second, in many cases, data dependences impose some restrictions on parallel execution of loop iterations; in such cases, the best results may be obtained by using a specific


A Jerraya et al. (eds.), Embedded Software for SOC, 305–318, 2003.
© 2003 Kluwer Academic Publishers. Printed in the Netherlands.
