the smallest set $E$, because any value of $b_0$ smaller than $b_1$ will result in either the same or a larger domain $D'$. The methodology is generalized by construction in Algorithm 4 for $n$ RCSLMADs with $b_0 = b_1$. The idea is that if the number of distinct bases in the constructed ERL $E$ is $q$, and if the upper bound of the domain of $E$ is $t_n + u$, then the total number of elements to be transferred to the GPU is $(t_n + u) \cdot q$, since $t_n + u$ elements need to be transferred for each of the $q$ distinct bases of the ERL $E$. Thus, on the GPU, we can allocate space for $(t_n + u) \cdot q$ elements. For address mapping, consider the interval $b_0$ to $b_0 + m - 1$ in system RAM. Out of this interval, the $q$ elements accessed by the $q$ non-overlapping RCSLMADs are transferred, and this pattern of $q$ out of every $m$ elements is repeated.
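The construction that Algorithm 4 formalizes, together with the compacted GPU addressing, can be sketched in Python as follows. This is a minimal illustration, not the thesis implementation: it assumes all RCSLMADs share both the stride $m$ and the domain $0 \le i < u$, and that positions in $R$ are counted from 0. The names `Lmad1D`, `approximate_union`, `gpu_offset` and `mapping_is_consistent` are hypothetical. Bases may be given in any order; $t_n$ is taken as $\max_j t_j$, which equals $t_n$ when the bases are sorted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lmad1D:
    """Hypothetical 1-D RCSLMAD: addresses base + stride * i for 0 <= i < upper."""
    base: int
    stride: int
    upper: int


def approximate_union(lmads):
    """Sketch of Algorithm 4 for RCSLMADs sharing stride m and domain 0 <= i < u.

    Returns (R, t, x, q, size): the ERL bases R, the offsets t_j, the 0-based
    position x_j of each RCSLMAD's base in R, the number q of distinct bases,
    and the number of elements to allocate on the GPU.
    """
    m = lmads[0].stride
    u = lmads[0].upper
    if any(l.stride != m or l.upper != u for l in lmads):
        raise ValueError("all RCSLMADs must share the same stride and domain")
    b1 = min(l.base for l in lmads)           # smallest base b_1
    t = [(l.base - b1) // m for l in lmads]   # steps 1-3: t_j
    r = [(l.base - b1) % m for l in lmads]    # steps 1-3: r_j
    R = sorted({b1 + rj for rj in r})         # steps 4-6: deduplicate and sort
    q = len(R)
    index = {base: pos for pos, base in enumerate(R)}
    x = [index[b1 + rj] for rj in r]          # step 10: R(x_j) = b_1 + r_j
    size = (max(t) + u) * q                   # domain D' = {j : 0 <= j < t_n + u}
    return R, t, x, q, size


def gpu_offset(q, t_j, x_j, i):
    """Step 11: compacted GPU offset q * (i + t_j) + x_j for loop counter i."""
    return q * (i + t_j) + x_j


def mapping_is_consistent(lmads):
    """True if every host address maps to one in-range GPU offset, injectively."""
    R, t, x, q, size = approximate_union(lmads)
    host_to_gpu = {}
    for j, l in enumerate(lmads):
        for i in range(l.upper):
            host = l.base + l.stride * i
            off = gpu_offset(q, t[j], x[j], i)
            # Reject out-of-range offsets and one host address mapped to two offsets.
            if not 0 <= off < size or host_to_gpu.setdefault(host, off) != off:
                return False
    # Distinct host addresses must land on distinct GPU offsets.
    return len(set(host_to_gpu.values())) == len(host_to_gpu)


if __name__ == "__main__":
    lmads = [Lmad1D(0, 8, 4), Lmad1D(3, 8, 4), Lmad1D(10, 8, 4)]
    print(approximate_union(lmads))      # ([0, 2, 3], [0, 0, 1], [0, 2, 1], 3, 15)
    print(mapping_is_consistent(lmads))  # True
```

In the example, the three RCSLMADs have residues 0, 3 and 2 modulo 8, so $q = 3$ and the GPU buffer holds $(1 + 4) \cdot 3 = 15$ elements. Offsets 1, 12 and 14 are allocated but never accessed, which is the over-approximation accepted by the ERL.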
Therefore, if elements not accessed by the ERL were discarded, the pattern would be equivalent to $q$ interleaved accesses with stride $q$.

Algorithm 4 computes the approximate union of $n$ one-dimensional RCSLMADs with the same stride.
Inputs: $n$ one-dimensional RCSLMADs $\{L_1, L_2, \ldots, L_n\}$ with stride $m$ and bases $\{b_1, b_2, \ldots, b_n\}$ such that $b_1 \le b_2 \le \cdots \le b_n$, defined over the domain $D = \{i \mid 0 \le i < u\}$.
Outputs: ERL $E$ and a list of $n$ expressions representing the transformed address of each RCSLMAD.
1: for each RCSLMAD $L_j$ do
2:    Compute the terms $t_j = \lfloor (b_j - b_1)/m \rfloor$ and $r_j = (b_j - b_1) \bmod m$.
3: end for
4: Construct a list $R = \{b_1 + r_1, b_1 + r_2, \ldots, b_1 + r_n\}$.
5: Remove any duplicates from $R$.
   Let the number of elements remaining in $R$ be $q$.
6: Sort $R$ in place.
7: Construct a one-dimensional ERL $E$ with stride $m$, bases $R$, and domain $D' = \{j \mid 0 \le j < t_n + u\}$.
8: Construct an empty list $I$.
9: for each RCSLMAD $L_j$ do
10:   Find $x$ (positions in $R$ counted from 0) such that $R(x) = b_1 + r_j$.
11:   Append the expression $q \cdot (i + t_j) + x$ (represented as an AST or other compiler IR) to $I$, where $i$ is the original loop counter for which the analysis is being conducted.
12: end for
13: return $E$ and $I$.

5.3.3 Multidimensional RCSLMADs with common strides

Consider $n$ RCSLMADs $\{L_1, L_2, \ldots, L_n\}$ with the common strides $P = \{p_1, p_2, \ldots, p_d\}$. Let the RCSLMADs be defined over a $d$-dimensional domain $D = \{(i_1, i_2, \ldots, i_d) \mid 0 \le i_j$
solve for the unknown parameters of $E$ using an integer programming solver. The equations are derived as follows:

1. $E$ can be constrained such that the $m$-th component of $E$ must be a superset of RCSLMAD $L_m$. The idea is to assume that for each $V \in D$ there exists a point $V' \in D'$ such that $V' = V + (t_{m1}, t_{m2}, \ldots, t_{md})$ and $E(m)(V') = L_m(V)$. The values $t_{mk}$ are assumed to be unknown integer constants. The following equations can be stated:

$$b_m + \sum_{k=1}^{d} p_k \, i_k = b'_m + \sum_{k=1}^{d} p_k \, (i_k + t_{mk}) \tag{5.51}$$
$$u'_1 \ge u_1 + t_{m1} \tag{5.52}$$
$$u'_2 \ge u_2 + t_{m2} \tag{5.53}$$
$$\vdots$$
$$u'_d \ge u_d + t_{md} \tag{5.54}$$
$$t_{m1} \ge 0 \tag{5.55}$$
$$t_{m2} \ge 0 \tag{5.56}$$
$$\vdots$$
$$t_{md} \ge 0 \tag{5.57}$$

Since the $p_k \, i_k$ terms cancel, (5.51) reduces to the linear equality $b_m = b'_m + \sum_{k=1}^{d} p_k \, t_{mk}$. If suitable integer values are found for the unknowns $t_{mk}$ and $u'_k$ that satisfy the above constraints, then $E$ is a superset of the union of the RCSLMADs by construction.

2. Each component of $E$ must be an RCSLMAD and must therefore satisfy constraints relating the upper bounds and strides. For each integer $k$ such that $2 \le k \le d$:

$$p'_k \ge p_d + \sum_{j=k+1}^{d} p_j \, (u'_j - 1) \tag{5.58}$$

3.
From the definition of an ERL, the difference between any pair of bases $(b'_x, b'_y)$ should be less than the stride $p_d$ in the last dimension. For $\{(x, y) \mid 1 \le x \le n,\ 1 \le y \le n,\ x \ne y\}$:

$$b'_x - b'_y \le p_d - 1 \tag{5.59}$$

These inequalities form a set of $n(n - 1)$ constraints necessary for ensuring that $E$ is an ERL.

Thus a total of $n$ equality constraints and $n \cdot d + (d - 1) + n(n - 1)$ inequality constraints can be derived, for a total of $n + d + n \cdot d$ unknowns. The unknown variables are $u'_k$, $b'_m$