11.07.2015 Views

A Compiler for Parallel Execution of Numerical Python Programs on ...



with AMD architecture. Their loop optimizations are completely different from the loop optimizations described in this thesis because the architectures are very different.

Another work that extends OpenMP for GPGPU programming is the EXOCHI framework by Wang et al. [23]. EXOCHI is an extension of OpenMP for C/C++ for heterogeneous systems. Their implementation targets a multicore x86 CPU and an integrated Intel graphics chipset. Unlike the discrete GPUs considered in this thesis, such as the Radeon 4870, Intel graphics chipsets are integrated into the system northbridge and do not sit on a PCIe bus. These integrated chips also have no separate onboard memory and can access the system RAM. EXOCHI therefore does not need to copy data; it only needs to remap the memory address translation table from the CPU to the GPU. This remapping is handled by EXOCHI's runtime. To program the GPU, EXOCHI requires the programmer to write GPU code but not to perform any data transfers, because none are necessary. Instead, the GPU code can directly access any data in system RAM, which simplifies programming.

EXOCHI is only suitable for systems where both the CPU and the accelerator (such as the GPU) can access the system RAM directly and where the address translation table can simply be remapped. Therefore, EXOCHI is not applicable to current-generation discrete GPUs.

8.4 Conclusions

This thesis describes the first Python compiler to provide simple parallel programming support for numerical applications. The implemented compiler is also one of the first to automatically map a shared-memory parallel programming model to a GPGPU system. This thesis describes a new algorithm to automatically transfer relevant data between a CPU and a GPU. The implemented compiler provides a programming model that is simpler to program than current GPGPU APIs such as CUDA and that relies on compiler analysis and optimization to automatically generate GPU code.
