13.07.2015 Views

More Iteration Space Tiling

More Iteration Space Tiling

More Iteration Space Tiling

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

Program 13b:do I = 2, N-ldo J = 1+2, I+M-1s1: A(I,J-1)=0.2*(A(I-l,J-I)+A(I,J-I-1)+A(I,J-I)+A(I+l,J-I)+A(I,J-I+l))enddoenddoFigure i’.Program 13~:do J = 4, N+M-2do I = MAX(2,J-M+l), MIN(N-l,J-2)S 1: A(I,J-1)=0.2*(A(I-l,J-I)+A(I,J-I-1)+A(I,J-I)+A(I+l,J-I)+A(I,J-I+l))enddoenddoLoop skewing changes the shape of the iterationspace from a rectangle to a parallelogram. We can skewthe J loop of Program 13a with respect to the I loop byadding I to the upper and lower limits of the J loop; thisrequires that we then subtract I from J within the loop.The skewed loop is shown in Program 13b and the skewediteration space is shown in Figure 7. The direction vectorsfor the data dependence relations in the skewed loop willchange from (d,, d,) to (d,, dl+dl) , so the modifieddependence relations are:Sl 6(0,1) SlSl 6(1,1) SlSl J(O.1) Sl Sl X(1,1) SlInterchanging the skewed loops requires some clevermodifications to the loop limits, as shown in Program 13~.As before, interchanging the two loops requires that weswitch the corresponding elements in the direction vectors,giving:Sl F(LO) Sl Sl 6(1,1) SlSI X(1,0) $1 Sl J(l,l) SlNotice that in each case, the direction vector has a positivevalue in the first element, meaning that each dependencerelation is carried by the outer loop (the J loop);thus, the skewed and interchanged I loop can be executedin parallel, which gives us the wavefront formulation.Strip Mining: Vectorizing compilers often divide asingle loop into a pair of loops, where the maximum tripcount of the inner loop is equal to the maximum vectorlength of the machine. Thus, for a Cray vector computer,the loop in Program 14a will essentially be converted intothe pair of loops in Program 14b. This process is calledstrip mining [Love77]. The original loop is divided intostrips of some maximum size, the strip size; in Program14b, the inner loop (or element loop) has a strip size of 64,which is the length of the Cray vector registers. The outerloop (the IS loop, for “strip loop”) steps between thestrips; on the Cray, the I loop corresponds to the vectorinstructions.Program 14a:do I = 1, Ns1: A(I) = A(I) + B(1)sa: C(I) = A(I-1) * 2enddoProgram 14b:do IS = 1, N, 64do I = IS, MIN(N,IS+63)s1: A(1) = A(1) + B(1)s,: C(1) = A(I-1) * 2enddoenddoStrip mining is always legal; however it does have aneffect on the data dependence relations in the loop. Asstrip mining adds a loop, it adds a dimension to the iterationspace; thus it must also add an element to the distanceor direction vector. When a loop is strip mined, adependence relation with a (d) in the distance vector forthat loop produces one or two dependence relations. If d isa multiple of the strip size ss, then a distance vector (d)is changed to the distance vector (d/ss, 0) If d is not amultiple of ss, then a distance vector (d) generates twodependence relations, with distance vectors:( d,dmodss) (d, -d mod ss)ssSSI JI 1In either case, if the original dependence distance is largerthan (or equal to) the strip size, then after strip mining thestrip loop will carry that dependence relation, allowingparallel execution of the element loop.<strong>Iteration</strong> <strong>Space</strong> <strong>Tiling</strong>: When nested loops arestrip mined and the strip loops are all interchanged to theoutermost level, the result is a tiling of the iteration space.The double-nested loop in Program 15a can be tiled tobecome the four-nested loop in Program 15b. Thiscorresponds to dividing the two dimensional iterationspace for Program 15a into “tiles”, as shown in Figure 8.Each tile corresponds to the inner two element loops, andthe outer two “tile” loops step between the tiles.Program 15a:do I = 1, Ndo J = 1, Ns1: A(1.J) = A(I,J) + B(I,J)s1: C(I,J) = A(I-l,J) * 2enddoenddo659

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!