
Lecture Notes in Computer Science 4917


Modeling Multigrain Parallelism on Heterogeneous Multi-core Processors

APUs, and/or HPU-APU pairs, using a variant of LogP [6] for point-to-point communication:

C_{i,j} = O_i + L + O_j,  (1)

where C_{i,j} is the communication cost, O_i and O_j are the send and receive overheads, respectively, and L is the communication latency.
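As an illustration, Eq. (1) can be evaluated with a small helper. The parameter values below are hypothetical, chosen only for the example; they are not measurements from any platform in the paper.

```python
def comm_cost(o_send: float, o_recv: float, latency: float) -> float:
    """Point-to-point communication cost C_ij = O_i + L + O_j (Eq. 1)."""
    return o_send + o_recv + latency

# Hypothetical per-message parameters (microseconds), for illustration only.
O_HPU = 0.5   # send overhead on the HPU side (O_i)
O_APU = 0.3   # receive overhead on the APU side (O_j)
L = 1.2       # interconnect latency

print(comm_cost(O_HPU, O_APU, L))  # -> 2.0
```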

2.2 Program Abstraction

Figure 2 illustrates the program abstraction used by MMGP. We model programs using a variant of the Hierarchical Task Graph (HTG [8]). An HTG represents multiple layers of concurrency with progressively finer granularity when moving from outermost to innermost layers. We use a phased HTG, in which we partition the application into multiple phases of execution and split each phase into nested sub-phases, each modeled as a single, potentially parallel task. The degree of concurrency may vary between tasks and within tasks.
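A phased HTG of the kind just described can be sketched as a nested data structure. The class name, fields, and helper below are illustrative only; they are not part of MMGP itself.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Task:
    """A (sub-)phase of a phased HTG: a single, potentially parallel task."""
    name: str
    degree: int = 1                                   # degree of concurrency within this task
    children: List["Task"] = field(default_factory=list)  # nested sub-phases

    def layers(self) -> int:
        """Number of nested layers of concurrency rooted at this task."""
        if not self.children:
            return 1
        return 1 + max(child.layers() for child in self.children)

# Two parallel tasks, each with three nested subtasks (as in Fig. 2).
phase = Task("MainProcess", children=[
    Task("Task1", degree=2, children=[Task(f"Subtask{i}") for i in (1, 2, 3)]),
    Task("Task2", degree=2, children=[Task(f"Subtask{i}") for i in (1, 2, 3)]),
])

print(phase.layers())  # -> 3
```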

[Figure omitted: a timeline in which the Main Process executes Task1 and Task2, each containing nested Subtask1, Subtask2, and Subtask3.]

Fig. 2. Program abstraction for two parallel tasks with nested parallelism

Mapping a workload with nested parallelism as shown in Figure 2 to an accelerator-based multi-core architecture can be challenging. In the general case, any task of any granularity could be mapped to any combination of HPUs and APUs, and the solution space under these conditions can be unmanageable. We confine the solution space by making some assumptions about the program and the hardware. First, we assume that the application exposes all available layers of inherent parallelism to the runtime environment, without, however, specifying how to map this parallelism to parallel execution vehicles in hardware. Second, we assume hardware configurations consist of a hierarchy of nested resources, even though the actual resources may not be physically nested in the architecture. For instance, the Cell Broadband Engine can be considered as 2 HPUs and 8 APUs, where the two HPUs correspond to the PowerPC dual-thread SMT core and the APUs to the synergistic (SPE) accelerator cores. This assumption is reasonable since it faithfully represents current accelerator architectures, in which front-end processors off-load computation and data to accelerators, and it simplifies modeling of both communication and computation.
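Under the nested-resource assumption, a platform such as the Cell BE can be described by a small hierarchical configuration. The dictionary layout and helper function below are a sketch for illustration, not MMGP's actual configuration format.

```python
# Hypothetical hierarchical platform description. Nesting reflects the
# logical (not necessarily physical) containment assumed by the model:
# front-end HPUs sit above the APUs they off-load work to.
cell_be = {
    "HPU": {
        "count": 2,               # PowerPC core, dual-thread SMT -> 2 HPUs
        "APU": {"count": 8},      # 8 synergistic (SPE) accelerator cores
    }
}

def hpu_count(config: dict) -> int:
    """Number of host processing units in the configuration."""
    return config["HPU"]["count"]

def apu_count(config: dict) -> int:
    """Number of accelerator units reachable from the front-end HPUs."""
    return config["HPU"]["APU"]["count"]

print(hpu_count(cell_be), apu_count(cell_be))  # -> 2 8
```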

