the element and its four neighbours, and writes the result to the corresponding element of a matrix of the same dimensions. The kernel is highly parallel because each element is processed independently, but the amount of computation per point is very small (an illustrative sketch of such a kernel is given at the end of this section). The compiler successfully carried out the array access analysis and generated GPU code for this benchmark. However, the data transfer overhead considerably outweighed the computation time savings of the GPU, so the benchmark ran slower when using the GPU than with OpenMP. The results are reported in Table 7.9. For each case, the minimum, maximum and mean execution times are presented.

Table 7.9: Execution time for 5-point stencil benchmark (milliseconds)

Problem size                  1024     2048     3072     4096
Serial Time          min      10.8     43.3     97.5     175
                     max      10.9     43.7     98.2     176.6
                     mean     10.8     43.4     97.7     175.8
OpenMP 1 thread      min      10.1     46       70       150
                     max      10.26    47       71.24    152
                     mean     10.17    46.3     70.4     151
OpenMP 4 threads     min      5        24       47.3     58.1
                     max      5        24       47.8     58.5
                     mean     5        24       47.5     58.2
GPU Total (Opt)      min      65       98.8     123.1    1010
                     max      65.8     99.6     124.9    1016
                     mean     65.3     99.1     123.7    1014
GPU Only (Opt)       min      0.39     0.89     2.8      25
                     max      0.39     0.89     2.8      26
                     mean     0.39     0.89     2.8      25.5
GPU Total (No Opt)   min      35.3     63.1     112.1    1000.3
                     max      35.6     63.7     112.7    1006
                     mean     35.4     63.25    112.3    1002
GPU Only (No Opt)    min      0.59     2.0      4.2      50
                     max      0.59     2.0      4.2      52
                     mean     0.59     2.0      4.2      51

7.5 RPES benchmark

The RPES benchmark is a Python adaptation of the benchmark from the Parboil suite. The benchmark involves indirect memory loads and triangular loops. unPython determined that a GPU version could not be generated and therefore did not emit calls to the JIT compiler. Consequently, the performance did not change when the GPU was enabled, because the GPU was never used and the JIT compiler was never called. The compiler only generated an OpenMP version of the benchmark. This benchmark illustrates that a programmer can safely add GPU parallel annotations without fear of errors when compiler limitations prevent GPU code generation.

The benchmark was only tested with the default parameters provided by the benchmark
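The sketch referenced in the stencil discussion above is given here. It is a minimal Python/NumPy illustration of the kind of 5-point stencil kernel being benchmarked; the function name, the choice of a simple five-point average, and the boundary handling are assumptions made for illustration and are not taken from the benchmark source. The problem size in the usage example matches the smallest size in Table 7.9.

import numpy as np

def stencil_5pt(a):
    # Each interior output element is computed from the corresponding input
    # element and its four neighbours. The simple average used here is an
    # assumption, not necessarily the benchmark's exact formula.
    out = np.zeros_like(a)
    out[1:-1, 1:-1] = (a[1:-1, 1:-1] +
                       a[:-2, 1:-1] + a[2:, 1:-1] +
                       a[1:-1, :-2] + a[1:-1, 2:]) / 5.0
    return out

# One sweep over a 1024x1024 matrix (the smallest problem size in Table 7.9).
a = np.random.rand(1024, 1024)
b = stencil_5pt(a)

Because each output element depends only on the input matrix, every element can be computed independently, which is why such a kernel is easy to parallelize with either OpenMP or GPU code generation, while still performing very little arithmetic per point.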
