21.01.2013 Views

Lecture Notes in Computer Science 4917

F. Blagojevic et al.

Fig. 6. MMGP predictions and actual execution times of PBPI when the code uses one dimension of PPE-HPU ((a), (b)) and SPE-APU ((c), (d)) parallelism

error is 4.1% and the standard deviation is 3.2. The maximum prediction error in this case is 10%. We measured the execution time necessary for solving Equation 11 for T(m, p) to be 0.4 μs. The overhead of the model is therefore negligible.
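As an illustration of how such error figures can be obtained, the following minimal sketch computes the mean, standard deviation and maximum of the relative prediction error over a set of configurations. The function name, the use of the population standard deviation, and the sample timings are assumptions made for this example; they are not taken from the paper.

```python
from statistics import mean, pstdev


def prediction_error_stats(predicted, measured):
    """Return (mean, std dev, max) of the relative prediction error, in percent.

    Assumption: the error of one configuration is |predicted - measured| / measured.
    """
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [abs(p - a) / a * 100.0 for p, a in zip(predicted, measured)]
    # pstdev is the population standard deviation; the paper does not say
    # which variant it uses, so this choice is an assumption.
    return mean(errors), pstdev(errors), max(errors)


# Made-up timings in seconds, for illustration only (not data from the paper).
predicted = [10.0, 21.0, 27.0]
measured = [10.0, 20.0, 30.0]
mean_err, std_err, max_err = prediction_error_stats(predicted, measured)
print(f"mean={mean_err:.1f}%  std={std_err:.1f}  max={max_err:.1f}%")
```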

4.4 PBPI with Two Dimensions of Parallelism<br />

Figure 7 shows the modeled and actual execution times of PBPI for all feasible combinations of two-dimensional parallelism under the constraint that the code does not use more than 16 SPEs, i.e., the maximum number of SPEs on the experimental platform. MMGP’s mean prediction error is 3.2%, the standard deviation of the error is 2.6 and the maximum prediction error is 10%. The important observation in these results is that MMGP matches the experimental outcome in terms of the degrees of PPE and SPE parallelism to use in PBPI for maximizing performance. In a real program development scenario, MMGP would point the programmer in the direction of using two layers of parallelism with a balanced allocation of PPE contexts and SPEs between the two layers.
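The search over feasible configurations can be sketched as follows. This is a minimal example under stated assumptions: a configuration is taken to be a pair (m, p) of m PPE-level tasks that each use p SPEs, so feasibility means m * p <= 16, and `toy_predict` is a hypothetical placeholder cost function. It is not Equation 11 and its constants are invented.

```python
from itertools import product

MAX_SPES = 16  # SPEs available on the experimental platform (from the text)


def feasible_configs(max_spes=MAX_SPES):
    """Yield (m, p) pairs with m * p <= max_spes.

    Assumption: m PPE-level tasks each offload to p SPEs.
    """
    for m, p in product(range(1, max_spes + 1), repeat=2):
        if m * p <= max_spes:
            yield m, p


def toy_predict(m, p, work=100.0, ppe_overhead=1.5, spe_overhead=0.5):
    """Hypothetical stand-in for the model's T(m, p); NOT Equation 11."""
    return work / (m * p) + ppe_overhead * m + spe_overhead * p


def best_config(predict, max_spes=MAX_SPES):
    """Return the feasible configuration with the smallest predicted time."""
    return min(feasible_configs(max_spes), key=lambda c: predict(*c))


configs = list(feasible_configs())
best = best_config(toy_predict)
print(len(configs), "feasible configurations; predicted best:", best)
```

Because evaluating a model of this kind takes far less time than running the program, an exhaustive scan of the feasible space is affordable.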

In principle, if the difference between the optimal and nearly optimal configurations of parallelism is within the margin of error of MMGP, MMGP may not predict the optimal configuration accurately. In the applications we tested, MMGP never mispredicts the optimal configuration. We also anticipate that, due to its high accuracy, potential MMGP mispredictions should generally lead to configurations that perform only marginally worse than the actual optimal configuration.
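One possible way to make the margin-of-error argument concrete is to test whether the best predicted time still beats the runner-up when both predictions may be off by the model's relative error. The function below is a sketch of that check; its name, the symmetric error bound, and the sample values are assumptions made for illustration.

```python
def ranking_is_reliable(predicted_times, relative_error):
    """Return True if the predicted optimum cannot be overtaken by the runner-up.

    predicted_times maps a configuration to its predicted execution time.
    relative_error is the assumed worst-case relative error, e.g. 0.10 for 10%.
    """
    if len(predicted_times) < 2:
        return True
    best, runner_up = sorted(predicted_times.values())[:2]
    # Worst case: the best prediction is too optimistic and the runner-up
    # prediction is too pessimistic, both by the full error bound.
    return best * (1.0 + relative_error) < runner_up * (1.0 - relative_error)


# Made-up predicted times in seconds, keyed by a hypothetical (m, p) pair.
close_call = {(2, 8): 10.0, (4, 4): 10.2, (1, 16): 14.0}
clear_win = {(2, 8): 10.0, (1, 16): 14.0}
print(ranking_is_reliable(close_call, 0.10))
print(ranking_is_reliable(clear_win, 0.10))
```

When the check fails, the candidates are so close that choosing either one costs little, which is consistent with the expectation that a misprediction yields a configuration only marginally worse than the optimum.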

4.5 RAxML Outline

RAxML uses an embarrassingly parallel master-worker algorithm, implemented with MPI. In RAxML, workers perform two tasks: (i) calculation of multiple inferences on the initial alignment in order to determine the best known Maximum Likelihood tree, and (ii) bootstrap analyses to determine how well supported some parts of the Maximum Likelihood tree are. From a computational point of view, inferences and bootstraps are identical. We use an optimized port of RAxML on Cell, described in further detail in [5].
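The master-worker structure described above can be sketched in a few lines. This example swaps MPI for a thread pool from the Python standard library purely to stay self-contained. The function names and the placeholder result are hypothetical and do not reflect RAxML's actual code.

```python
from concurrent.futures import ThreadPoolExecutor


def run_tree_search(task):
    """Placeholder for one worker task; a real worker would run a tree search."""
    kind, seed = task
    # Inferences and bootstraps are computationally identical, so a single
    # worker function handles both kinds of task.
    return {"kind": kind, "seed": seed, "result": f"tree-{kind}-{seed}"}


def master(num_inferences, num_bootstraps, num_workers=4):
    """Hand out independent tasks to workers and collect the results in order."""
    tasks = [("inference", i) for i in range(num_inferences)]
    tasks += [("bootstrap", i) for i in range(num_bootstraps)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(run_tree_search, tasks))


results = master(num_inferences=3, num_bootstraps=5)
print(len(results), "tasks completed")
```

The tasks share no state, which is what makes the scheme embarrassingly parallel: the master only distributes work and gathers results.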
