A Compiler for Parallel Execution of Numerical Python Programs on ...


each parallel loop to half its original value. If there are m parallel loops, then 2^m tiles are formed. The compiler then computes the amount of data to be transferred for each tile, and if the number of elements is less than the amount of memory available, then the compiler continues tiling the loop. If the total number of tiles exceeds 64, the compiler aborts its attempts to tile the loop and abandons efforts to use the GPU.

5.5 Conclusions

This chapter presented a new heuristic algorithm for array access analysis and for automatically transferring data between the system memory and the GPU memory. The algorithm handles only one class of LMADs but can offer significant space savings on the GPU compared to more naive approaches. This chapter also presented a loop tiling algorithm that can automatically scale parallel loop nests so that the data required for the computation fits in the limited GPU memory. The algorithms have been implemented in jit4GPU, which currently generates code only for AMD GPUs, but the algorithms presented in this chapter are equally applicable to any GPGPU system with a separate address space and limited GPU memory.
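
The loop tiling heuristic summarized in this chapter can be illustrated with a small Python sketch. This is not the jit4GPU implementation; it is one possible reading of the description, under stated assumptions. The function name tile_loop_nest, the footprint callback (which stands in for the compiler's computation of the data transferred per tile), and the representation of loops as a list of trip counts are all hypothetical. The sketch also assumes that halving is repeated until the per-tile data is smaller than the available memory; only the halving step, the 2^m tile count, and the limit of 64 tiles come from the text.

```python
from math import ceil, prod

MAX_TILES = 64  # tile limit stated in the text


def tile_loop_nest(extents, footprint, capacity, max_tiles=MAX_TILES):
    """Return (tile_extents, tile_count), or None if the GPU is abandoned.

    extents   -- trip counts of the m parallel loops (hypothetical input form)
    footprint -- hypothetical callback: tile extents -> elements to transfer
    capacity  -- number of elements that fit in the available GPU memory
    """
    if not extents:
        raise ValueError("at least one parallel loop is required")
    tile_extents = list(extents)
    tile_count = 1
    # Assumption: keep halving until the per-tile data is less than capacity.
    while footprint(tile_extents) >= capacity:
        # Halving each of the m parallel loops multiplies the tile count by 2**m.
        tile_count *= 2 ** len(tile_extents)
        if tile_count > max_tiles:
            return None  # too many tiles: give up on using the GPU
        tile_extents = [ceil(e / 2) for e in tile_extents]
    return tile_extents, tile_count


# Example: a 2-deep nest whose tiles touch one element per iteration point.
result = tile_loop_nest([1024, 1024], prod, capacity=300_000)
print(result)
```

In this example a single round of halving is enough, so the nest is split into four tiles of 512 by 512 iterations. With a much smaller capacity the tile count would pass 64 and the function would return None, mirroring the fallback away from the GPU.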
