26.06.2015 Views

Parallel Programming in Fortran 95 using OpenMP - People

Parallel Programming in Fortran 95 using OpenMP - People

Parallel Programming in Fortran 95 using OpenMP - People

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

3.2. Other clauses 49<br />

The 600 iterations are go<strong>in</strong>g to be distributed <strong>in</strong> different ways by chang<strong>in</strong>g the<br />

value of chunk. The results are shown graphically <strong>in</strong> figure 3.6, where each block<br />

represents a piece of work and the number <strong>in</strong>side it the thread identification number<br />

process<strong>in</strong>g the piece of work.<br />

0<br />

0<br />

0<br />

0<br />

0<br />

1<br />

1<br />

2<br />

1<br />

1<br />

0<br />

2<br />

2<br />

chunk = 150 chunk = 250 chunk = 300 default<br />

50<br />

100<br />

150<br />

200<br />

250<br />

300<br />

350<br />

400<br />

450<br />

500<br />

550<br />

600<br />

I<br />

t<br />

e r<br />

a<br />

ti<br />

o<br />

n<br />

s<br />

p<br />

ace<br />

Figure 3.6: Graphical representation of the example expla<strong>in</strong><strong>in</strong>g the general work<strong>in</strong>g pr<strong>in</strong>ciple<br />

of the SCHEDULE(STATIC,chunk) clause for different values of the optional parameter chunk.<br />

In the first case, chunk = 150, four blocks of work are needed to cover the complete<br />

iteration space. S<strong>in</strong>ce the total number of iterations is exactly four times the value<br />

of chunk, all blocks are exactly equal <strong>in</strong> size. To distribute the different blocks<br />

among the exist<strong>in</strong>g threads, a round-rob<strong>in</strong> procedure is used lead<strong>in</strong>g to the presented<br />

distribution. The result<strong>in</strong>g parallel runn<strong>in</strong>g code is not very efficient, s<strong>in</strong>ce thread<br />

0 is go<strong>in</strong>g to work while all the other threads are wait<strong>in</strong>g <strong>in</strong> an idle state.<br />

In the second case, chunk = 250, the number of blocks equals the number of threads,<br />

which is someth<strong>in</strong>g good. But this time the sizes of the blocks are not equal, s<strong>in</strong>ce<br />

the total number of iterations is not an exact multiply of chunk. Even though, this<br />

value of the optional parameter leads to a faster code than <strong>in</strong> the previous case,<br />

because the latency times are smaller 4 .<br />

In the third case, chunk = 300, the number of blocks is smaller than the number of<br />

available threads. This means that thread 2 is not go<strong>in</strong>g to work for the present<br />

do-loop. The result<strong>in</strong>g efficiency is worse than <strong>in</strong> the previous case and equal to the<br />

first case.<br />

In the last case, which corresponds to do not specify the optional parameter chunk,<br />

<strong>OpenMP</strong> creates a number of blocks equal to the number of available threads and<br />

with equal block sizes. In general this option leads to the best performance, if all<br />

iterations require the same computational time. This remark also applies to the<br />

previous cases and explanations.<br />

4 In the first case the fact, that thread 0 needs to compute <strong>in</strong> total 300 iterations, leads to a slower<br />

code than <strong>in</strong> the second case, where the maximum number of iterations to compute by one thread is 250.

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!