11.07.2015

A Compiler for Parallel Execution of Numerical Python Programs on ...

Algorithm 6 performs loop transformations for GPU code

for all loops with no child loops do
    Mark as candidate for unrolling
end for
for all parallel loops do
    Mark as suitable for unrolling if children loops have no children.
end for
while register usage is below threshold do
    Pick the innermost unrolling candidate loop available and unroll.
    Update register usage estimate of parent loops.
    Mark the unrolled loop as unsuitable for unrolling.
end while
Fuse as many loops as possible.
Perform load-coalescing transformations.
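The unrolling-selection part of Algorithm 6 can be sketched in Python as follows. This is only an illustrative sketch, not the compiler's actual implementation: the `Loop` class, the `select_unrolling` function, the fixed unroll `factor`, and the register cost model (unrolling by `factor` multiplies a loop's register estimate by `factor`, and the increase is added to every ancestor) are all assumptions made for the example. Loop fusion and load coalescing, the last two steps, are not modeled.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Loop:
    """Hypothetical loop-nest node; not the compiler's real IR."""
    name: str
    parallel: bool = False
    regs: int = 4  # assumed register estimate for this loop, children included
    children: List["Loop"] = field(default_factory=list)
    parent: Optional["Loop"] = field(default=None, repr=False)
    candidate: bool = False
    unrolled: bool = False

    def add(self, child: "Loop") -> "Loop":
        child.parent = self
        self.children.append(child)
        return self


def walk(loop, depth=0):
    """Yield (loop, depth) for a loop and all loops nested inside it."""
    yield loop, depth
    for child in loop.children:
        yield from walk(child, depth + 1)


def select_unrolling(roots, threshold, factor=2):
    """Return the names of the loops chosen for unrolling, in order."""
    nodes = [(l, d) for r in roots for l, d in walk(r)]

    # Loops with no child loops are unrolling candidates.
    for loop, _ in nodes:
        if not loop.children:
            loop.candidate = True

    # Parallel loops are suitable if their child loops have no children.
    for loop, _ in nodes:
        if loop.parallel and all(not c.children for c in loop.children):
            loop.candidate = True

    def usage():
        # Assumed metric: the largest estimate among top-level loops.
        return max(r.regs for r in roots)

    order = []
    while usage() < threshold:
        available = [(d, l) for l, d in nodes if l.candidate]
        if not available:
            break  # nothing left to unroll
        # Innermost candidate = greatest nesting depth.
        _, loop = max(available, key=lambda pair: pair[0])
        extra = loop.regs * (factor - 1)  # assumed cost of unrolling
        loop.regs += extra
        ancestor = loop.parent
        while ancestor is not None:  # update parent estimates
            ancestor.regs += extra
            ancestor = ancestor.parent
        loop.candidate = False  # unsuitable for further unrolling
        loop.unrolled = True
        order.append(loop.name)
    return order


# Usage: a parallel loop i containing j, plus an independent loop k.
i = Loop("i", parallel=True, regs=8).add(Loop("j", regs=4))
k = Loop("k", regs=3)
print(select_unrolling([i, k], threshold=16))
```

As in the pseudocode, the threshold is checked before each unrolling step, so the final estimate may exceed it by the cost of the last unrolled loop.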
