11.07.2015 Views

A Compiler for Parallel Execution of Numerical Python Programs on ...


the smallest set $E$ because any value of $b_0$ smaller than $b_1$ will result in either the same or a larger domain $D'$.

The methodology is generalized by construction in Algorithm 4 for $n$ RCSLMADs with $b_0 = b_1$. The idea is that if the number of distinct bases in the constructed ERL $E$ is $q$, and if the upper bound of the domain of $E$ is $t_n + u$, then the total number of elements to be transferred to the GPU is $(t_n + u) \cdot q$, since $t_n + u$ elements need to be transferred for each distinct base present in the ERL $E$. Thus, on the GPU, we can allocate space equal to $(t_n + u) \cdot q$ elements. For address mapping, consider the interval $b_0$ to $b_0 + m - 1$ in system RAM. Out of this interval, the $q$ elements accessed by the $q$ non-overlapping RCSLMADs are transferred. This pattern of $q$ out of every $m$ elements is repeated. Therefore, if the elements not accessed by the ERL were discarded, the pattern would be equivalent to $q$ interleaved accesses with stride $q$.

Algorithm 4 computes the approximate union of $n$ one-dimensional RCSLMADs with the same stride.

Inputs: $n$ one-dimensional RCSLMADs $\{L_1, L_2, \ldots, L_n\}$ with stride $m$ and bases $\{b_1, b_2, \ldots, b_n\}$ such that $b_1 \le b_2 \le \ldots \le b_n$, defined over the domain $D = \{i, 0 \le i < u\}$.
Outputs: ERL $E$ and a list of $n$ expressions representing the transformed address of each RCSLMAD.

1: for each RCSLMAD $L_j$ do
2:     Compute the terms $t_j = \lfloor (b_j - b_1)/m \rfloor$ and $r_j = (b_j - b_1) \bmod m$.
3: end for
4: Construct a list $R = \{b_1 + r_1, b_1 + r_2, \ldots, b_1 + r_n\}$.
5: Remove any duplicates from $R$. Let the number of elements remaining in $R$ be $q$.
6: Sort $R$ in-place.
7: Construct a one-dimensional ERL $E$ with stride $m$, bases $R$ and domain $D' = \{j, 0 \le j \le t_n + u\}$.
8: Construct an empty list $I$.
9: for each RCSLMAD $L_j$ do
10:     Find $x$ such that $R(x) = b_1 + r_j$.
11:     Append the expression (represented as an AST or other compiler IR) $q \cdot (i + t_j) + x$ to $I$, where $i$ is the original loop counter for which the analysis is being conducted.
12: end for
13: return $E$ and $I$.

5.3.3 Multidimensional RCSLMADs with common strides

Consider $n$ RCSLMADs $\{L_1, L_2, \ldots, L_n\}$ with the common strides $P = \{p_1, p_2, \ldots, p_d\}$. Let the RCSLMADs be defined over a $d$-dimensional domain $D = \{(i_1, i_2, \ldots, i_d) \mid 0 \le i_j$
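Returning to the one-dimensional case, the steps of Algorithm 4 can be sketched in Python as follows. This is only an illustrative sketch, not the compiler's implementation: the function names `approximate_union` and `pack` are hypothetical, the address expression $q \cdot (i + t_j) + x$ is returned as a plain tuple `(q, t_j, x)` rather than as an AST, the ERL bases are assumed to be $b_1 + r_j$, and the extent of the ERL domain is taken to be $t_n + u$ rows, matching the allocation size $(t_n + u) \cdot q$ given in the text.

```python
from typing import List, Tuple


def approximate_union(
    bases: List[int], m: int, u: int
) -> Tuple[List[int], int, List[Tuple[int, int, int]]]:
    """Sketch of Algorithm 4 for one-dimensional RCSLMADs with common stride m.

    bases must be sorted ascending (b_1 <= ... <= b_n); u is the size of the
    original domain 0 <= i < u. Returns (R, rows, maps), where R holds the
    ERL bases, rows = t_n + u is the assumed extent of the ERL domain, and
    maps[j] = (q, t_j, x) encodes the expression q * (i + t_j) + x.
    """
    b1 = bases[0]
    # Steps 1-3: quotient and remainder of each base offset relative to b_1.
    t = [(b - b1) // m for b in bases]
    r = [(b - b1) % m for b in bases]
    # Steps 4-6: distinct bases folded into [b_1, b_1 + m - 1], sorted.
    R = sorted(set(b1 + rj for rj in r))
    q = len(R)
    # Step 7: the ERL has stride m, bases R, and t_n + u rows (assumption).
    rows = t[-1] + u
    # Steps 8-12: one transformed-address descriptor per RCSLMAD.
    maps = [(q, tj, R.index(b1 + rj)) for tj, rj in zip(t, r)]
    return R, rows, maps


def pack(host: List[int], R: List[int], m: int, rows: int) -> List[int]:
    """Copy the elements covered by the ERL into a dense buffer.

    Row k contributes host[base + m * k] for each base in R, so the element
    for row k and the x-th base lands at index k * q + x.
    """
    return [host[base + m * k] for k in range(rows) for base in R]


if __name__ == "__main__":
    bases, m, u = [2, 3, 7], 4, 3
    R, rows, maps = approximate_union(bases, m, u)
    host = list(range(100, 120))  # stand-in for an array in system RAM
    gpu = pack(host, R, m, rows)  # stand-in for the buffer sent to the GPU
    for b, (q, tj, x) in zip(bases, maps):
        for i in range(u):
            # The original access b + m*i and the remapped access must agree.
            assert gpu[q * (i + tj) + x] == host[b + m * i]
    print(R, rows, maps)
```

In this example the bases 3 and 7 share the remainder 1, so they fold onto the same ERL base and only $q = 2$ of every $m = 4$ host elements are packed. Because the ERL is an approximate union, the packed buffer may contain elements that no individual RCSLMAD reads.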
