11.07.2015 Views

A Compiler for Parallel Execution of Numerical Python Programs on ...



access pattern analysis and GPU code generation is done by a JIT compiler jit4GPU and unPython only plays a supporting role in GPU code generation.

4.2.1 Motivation for the JIT compiler

One of the original goals in the project to compile Python was to extend unPython to generate GPU code. However, after some effort, it became clear that unPython does not have sufficient information to perform the analysis required for GPU code generation. For example, the compiler does not know the data layout and the aliasing of NumPy arrays that are parameters to a function being compiled. NumPy arrays are generally views of an underlying data array and are more like pointers than true arrays from the perspective of compiler analysis. The memory layout of a NumPy array is determined by the strides of the NumPy array and these strides are dynamic quantities unknown to unPython.
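To make the layout and aliasing point concrete, the following is a minimal sketch using only standard NumPy attributes (`strides`, `flags`) and `np.shares_memory`. The array names and the helper `describe` are illustrative assumptions and are not part of unPython or jit4GPU. It shows three arrays over one buffer, each with different strides that are only observable at run time.

```python
import numpy as np

# One underlying buffer of 12 float64 values, viewed as a 3x4 C-contiguous array.
base = np.arange(12, dtype=np.float64).reshape(3, 4)

# Two more views of the SAME buffer; no data is copied.
transposed = base.T         # strides swapped
every_other = base[:, ::2]  # column stride doubled

# An independent copy with its own buffer, for contrast.
separate = base.copy()


def describe(arr):
    """Hypothetical helper: collect layout facts a compiler would need."""
    return {
        "shape": arr.shape,
        "strides": arr.strides,  # bytes to step along each dimension
        "c_contiguous": bool(arr.flags.c_contiguous),
    }


for name, arr in [("base", base), ("transposed", transposed),
                  ("every_other", every_other)]:
    print(name, describe(arr))

# Apparently distinct arguments can alias the same memory.
print(np.shares_memory(transposed, every_other))  # views of one buffer
print(np.shares_memory(base, separate))           # independent buffers
```

A function that receives `transposed` and `every_other` as parameters sees two differently shaped arrays, yet a write through one is visible through the other. None of this can be read off the source text of the function.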
Two apparently distinct NumPy arrays passed into a function may be two different views of the same piece of memory. A somewhat similar problem arises in C functions that have pointer parameters. Typically C compilers perform a global analysis and can customize the function differently for different calling contexts. However, such an analysis is not applicable for unPython. UnPython is not a whole-program compiler and therefore cannot do global analysis, such as finding the calling context of all functions.

The objective of unPython is to compile Python libraries that can be used by various applications. The idea is that unPython will only see a small portion of the application because only a small portion of the application is performance critical.
Python programmers cannot be expected to impose the typing restrictions required by unPython on their entire program. Even if a whole-program compiler were implemented for Python, Python applications often contain bindings to C libraries and those libraries are opaque to a Python compiler unless it is also interfaced with a full C compiler. The complexity of such a system can be prohibitive to implement. Thus, effectively, unPython cannot do any global analysis. In absence of a global analysis, one other possible solution for generating code from unPython was to generate different versions of the loop for different cases of layouts of NumPy arrays accessed within a loop.
But NumPy arrays are very general structures and the number of possible loop versions can become intractable as the number of NumPy arrays increases.

A simpler solution to the problem of performing analysis for GPU code generation is to compile a parallel loop to GPU code just before the loop is to be executed. At this stage, jit4GPU can query all the NumPy arrays to determine their data layouts and also knows the value of loop-invariant numeric constants. For example, loop bounds may appear to be unknown symbolic constants to unPython but those constants are known to jit4GPU. Jit4GPU operates on a typed AST of the loop to be compiled and performs several analysis
