A Framework for Automatic High-Level Optimization of Applications ...

sti.cc.gatech.edu

R-Stream: A Framework for Automatic High-Level Optimization of Applications to the Cell/B.E. Microprocessor: Status and Challenges
Richard A. Lethin
Reservoir Labs, Inc.
New York, NY; Portland, OR; Columbia, MD
(212) 780-0527
GT STI Cell Workshop, 18 June 2007


The Challenge of Optimizing to Cell B.E.
High Level Mapping
• Partitioning between PPU/SPU
• Parallelization
• Scheduling
• Memory distribution & management
• Communication generation
• Synchronization
• Locality of reference
Low Level Mapping
• Instruction selection
• Instruction scheduling
• Register allocation
• SIMDization
[Figure: Cell B.E. block diagram. Labels: eight SPUs, each with a local store (LS); EIB; PPE with PPU; MIC; DRAM]


Libraries are good, still…
The plusses:
• Someone else does the work
• Just have to write to their abstraction
• People getting good results now
• Existing libraries for physics, image processing, linear algebra…
Still,
• What if the computation you want is not in the library?
• Who writes the library, and how?
• Non-standard or proprietary APIs
• Licensing costs – per core, per chip?
• Does the best performance occur when you break the abstraction and optimize across library calls?
Is hope justified for a tool that can automatically map code to Cell B.E.?


Compilation Flow
for (int i = 0; i […]


“C language API”
• Minimal API between HLC and LLC
• HLC backend generates two separate programs, one for the SPU and one for the PPU
• Execution model: most of the application runs on the PPU; the SPUs operate “long” running tasks, initiating their own DMA and synchronization
• Asynchronous DMA: CELL_dma_get, CELL_dma_put, CELL_dma_wait
• Process control: CELL_mapped_begin, CELL_mapped_end
• Memory management: C library
• Synchronization: CELL_barrier(int id)
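The get / wait / compute / put / wait pattern implied by an asynchronous DMA interface can be sketched as follows. The `sketch_dma_*` functions are hypothetical stand-ins, not the R-Stream `CELL_dma_*` calls, whose full signatures this slide does not give. They complete immediately through `memcpy`, so the sketch shows only the ordering of calls, not real overlap of transfer and computation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for an asynchronous DMA interface. These are NOT
 * the R-Stream CELL_dma_* signatures. Each transfer completes immediately
 * via memcpy; the tag only counts outstanding requests so that the wait
 * call has something to clear. */
enum { SKETCH_TAGS = 4 };
static int sketch_pending[SKETCH_TAGS];

static void sketch_dma_get(void *local, const void *global, size_t bytes, int tag)
{
    assert(tag >= 0 && tag < SKETCH_TAGS);
    memcpy(local, global, bytes);   /* "main memory" to "local store" */
    sketch_pending[tag]++;
}

static void sketch_dma_put(void *global, const void *local, size_t bytes, int tag)
{
    assert(tag >= 0 && tag < SKETCH_TAGS);
    memcpy(global, local, bytes);   /* "local store" to "main memory" */
    sketch_pending[tag]++;
}

static void sketch_dma_wait(int tag)
{
    assert(tag >= 0 && tag < SKETCH_TAGS);
    sketch_pending[tag] = 0;        /* all transfers on this tag are done */
}

/* One SPU-style task: fetch a block, scale it locally, write it back. */
static void scale_block(float *global, size_t n, float factor)
{
    float local[64];
    assert(n <= 64);
    sketch_dma_get(local, global, n * sizeof(float), 0);
    sketch_dma_wait(0);             /* data must have arrived before use */
    for (size_t i = 0; i < n; i++)
        local[i] *= factor;
    sketch_dma_put(global, local, n * sizeof(float), 1);
    sketch_dma_wait(1);             /* local buffer is reusable after this */
}
```

Calling `scale_block(v, 4, 2.0f)` on `{1, 2, 3, 4}` doubles the array in place. With a truly asynchronous interface the two waits are what make the result well defined.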


Under the Hood
R-Stream 3.0
[Figure: R-Stream 3.0 block diagram. Labels: ISO C Front End; Raising; Polyhedral Mapper; Lowering; Code Gen/Back End; Compiler Infrastructure; API; Low-Level Compilers]
Details are the subject of many hours of presentations, but I’m happy to talk about it offline today… or please visit us in New York


Polyhedral representation
for (i=2; i […]


Algorithmic Tools in the Polyhedral Model
• Basic linear algebra
– Hermite Normal Form, Smith Normal Form, Gaussian Elimination, Fourier-Motzkin Elimination, …
• Minkowski decomposition: P = conv(V) + cone(R) + lin.space(L)
• Set operations on polyhedra
• Linear programming and extensions
– LP, ILP, parametric LP, parametric ILP, …
• Farkas Lemma
• Computational geometry, e.g., bounding volumes computation
• Combinatorial optimizations
• Lattice point counting, e.g., Ehrhart Polynomials and related generating functions
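As a small illustration of lattice point counting, the sketch below counts the integer points of a triangular iteration domain by enumeration and compares the result with a closed-form polynomial in the size parameter. The domain and the function names are illustrative assumptions, not taken from the slides. A polyhedral tool would derive such a polynomial symbolically instead of enumerating.

```c
#include <assert.h>

/* Count integer points (i, j) with 0 <= i <= j < n by enumeration.
 * This is the iteration domain of a triangular loop nest. */
static long count_triangle_points(long n)
{
    assert(n >= 0);
    long count = 0;
    for (long i = 0; i < n; i++)
        for (long j = i; j < n; j++)
            count++;
    return count;
}

/* Closed form for the same count: a polynomial in the parameter n. */
static long triangle_points_closed_form(long n)
{
    assert(n >= 0);
    return n * (n + 1) / 2;
}
```

Having the count as a function of `n` is what allows a compiler to reason about buffer sizes or communication volume without knowing the problem size.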


Polyhedral Model Properties
• Model everything as polyhedra, linear constraints in N dimensions
• Can represent task, pipeline, data parallelism as affine schedules
• Can represent affine data layout transformations
• Do both computation mapping and data mapping in same framework
• Frameworks subsume “classic” optimizations (fusion, scaling, interchange, reversal, skewing) in a single phase
• Coarse grained parallelization – not JUST vectorization
– Can do vectorization too in same framework
• Tiling: imperfectly nested loop
• Communication generation and DMA optimizations
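To make "parallelism as affine schedules" concrete, here is a minimal sketch with an assumed two-dimensional recurrence that does not come from the slides. In the original loop order neither loop is parallel. Under the affine schedule theta(i, j) = i + j, all points on one anti-diagonal are independent, so the inner loop could be run as a doall.

```c
#include <assert.h>

enum { WF_N = 8 };   /* assumed problem size for the illustration */

/* Border cells are 1, interior cells start at 0. */
static void init_border(long a[WF_N][WF_N])
{
    for (int i = 0; i < WF_N; i++)
        for (int j = 0; j < WF_N; j++)
            a[i][j] = (i == 0 || j == 0) ? 1 : 0;
}

/* Original order: each point depends on its north and west neighbours,
 * so neither loop is parallel as written. */
static void sweep_original(long a[WF_N][WF_N])
{
    for (int i = 1; i < WF_N; i++)
        for (int j = 1; j < WF_N; j++)
            a[i][j] = a[i - 1][j] + a[i][j - 1];
}

/* Schedule theta(i, j) = i + j: points on anti-diagonal t depend only on
 * diagonal t - 1, so the inner loop carries no dependence. */
static void sweep_wavefront(long a[WF_N][WF_N])
{
    for (int t = 2; t <= 2 * (WF_N - 1); t++) {
        int lo = t - (WF_N - 1) > 1 ? t - (WF_N - 1) : 1;  /* max(1, t-(N-1)) */
        int hi = t - 1 < WF_N - 1 ? t - 1 : WF_N - 1;      /* min(N-1, t-1)   */
        for (int i = lo; i <= hi; i++) {                   /* could be a doall */
            int j = t - i;
            a[i][j] = a[i - 1][j] + a[i][j - 1];
        }
    }
}
```

Both sweeps produce the same array. The bounds `lo` and `hi` are the kind of max/min expressions that appear in the generated code later in the deck.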


Matrix Multiply Example
float A[1024][1024];
float B[1024][1024];
float C[1024][1024];
for (int i = 0; i […]


Parallelizing
doall (int i = 0; i […]


Tiling
doall (int i = 0; i […]
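A hand-written sketch of what tiling does to a matrix multiply is given below. It is not the R-Stream output. The sizes (128 with 64x64 tiles) are assumptions chosen to keep the example quick, and the fill values are small integers so that the float sums are exact and the two versions can be compared for equality.

```c
#include <assert.h>

enum { MM_N = 128, MM_TILE = 64 };  /* assumed sizes; the slides use 1024 */

static float mm_a[MM_N][MM_N], mm_b[MM_N][MM_N];
static float mm_ref[MM_N][MM_N], mm_out[MM_N][MM_N];

/* Reference: the plain triple loop. */
static void matmul_naive(void)
{
    for (int i = 0; i < MM_N; i++)
        for (int j = 0; j < MM_N; j++) {
            mm_ref[i][j] = 0.0f;
            for (int k = 0; k < MM_N; k++)
                mm_ref[i][j] += mm_a[i][k] * mm_b[k][j];
        }
}

/* Tiled: three outer loops walk over tiles, three inner loops stay inside
 * one MM_TILE x MM_TILE block of each operand. */
static void matmul_tiled(void)
{
    assert(MM_N % MM_TILE == 0);
    for (int i = 0; i < MM_N; i++)
        for (int j = 0; j < MM_N; j++)
            mm_out[i][j] = 0.0f;
    for (int ii = 0; ii < MM_N; ii += MM_TILE)
        for (int jj = 0; jj < MM_N; jj += MM_TILE)
            for (int kk = 0; kk < MM_N; kk += MM_TILE)
                for (int i = ii; i < ii + MM_TILE; i++)
                    for (int j = jj; j < jj + MM_TILE; j++)
                        for (int k = kk; k < kk + MM_TILE; k++)
                            mm_out[i][j] += mm_a[i][k] * mm_b[k][j];
}

/* Fill the inputs, run both versions, return 1 if the results agree. */
static int tiled_matches_naive(void)
{
    for (int i = 0; i < MM_N; i++)
        for (int j = 0; j < MM_N; j++) {
            mm_a[i][j] = (float)((i + j) % 3);
            mm_b[i][j] = (float)((i + 2 * j) % 5);
        }
    matmul_naive();
    matmul_tiled();
    for (int i = 0; i < MM_N; i++)
        for (int j = 0; j < MM_N; j++)
            if (mm_ref[i][j] != mm_out[i][j])
                return 0;
    return 1;
}
```

The inner three loops touch only three 64x64 blocks (16 KiB each in single precision), which is the property that makes a tile a candidate for an SPU local store.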


Processor Placement
// The processor grid on CELL is one dimensional.
// PROC0 is a parameter which stands for the processor number.
// It ranges from 0 to 7.
doall (i = 0; i […]
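One way to read processor placement on a one-dimensional grid is as a block distribution: the doall over rows is replaced, on each processor, by a loop over the rows that processor owns. The sketch below assumes 1024 rows split into eight contiguous blocks. The block mapping and all names are illustrative assumptions and are not claimed to be the mapping R-Stream chooses.

```c
#include <assert.h>

enum { PP_ROWS = 1024, PP_PROCS = 8, PP_BLOCK = PP_ROWS / PP_PROCS };

/* Half-open row range [lo, hi) owned by processor proc0 (0..7). */
static void rows_for_proc(int proc0, int *lo, int *hi)
{
    assert(proc0 >= 0 && proc0 < PP_PROCS);
    *lo = PP_BLOCK * proc0;
    *hi = *lo + PP_BLOCK;
}

/* What one processor executes: only its own block of the former doall. */
static void run_on_proc(int proc0, int visits[PP_ROWS])
{
    int lo, hi;
    rows_for_proc(proc0, &lo, &hi);
    for (int i = lo; i < hi; i++)
        visits[i]++;
}

/* Returns 1 if the eight processors together visit every row exactly once. */
static int placement_is_partition(void)
{
    static int visits[PP_ROWS];
    for (int i = 0; i < PP_ROWS; i++)
        visits[i] = 0;
    for (int p = 0; p < PP_PROCS; p++)
        run_on_proc(p, visits);
    for (int i = 0; i < PP_ROWS; i++)
        if (visits[i] != 1)
            return 0;
    return 1;
}
```

The check in `placement_is_partition` states the correctness condition for any placement: each iteration of the original doall is executed by exactly one processor.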


Initialization Section
for (k = 0; k […]= 1) { rotate C_l_v1 and C_l_v2; }
if (k […]


Pipelined 64x64 Matrix Multiply
for (k = -1; k […]
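A loop that starts at k = -1 is consistent with software pipelining of transfers: a prologue iteration issues the first fetch, and each later iteration fetches tile k + 1 while computing on tile k, rotating two buffers. That reading is an assumption. The sketch below shows the pattern on a simple sum, with a hypothetical `fetch_tile` that completes immediately in place of a real asynchronous DMA.

```c
#include <assert.h>
#include <string.h>

enum { DB_TILE = 16, DB_TILES = 8 };   /* assumed tile size and count */

/* Hypothetical stand-in for an asynchronous fetch; completes immediately. */
static void fetch_tile(float *dst, const float *src)
{
    memcpy(dst, src, DB_TILE * sizeof(float));
}

/* Sum DB_TILES * DB_TILE floats using two local buffers. */
static float pipelined_sum(const float *global)
{
    float buf0[DB_TILE], buf1[DB_TILE];
    float *cur = buf0, *next = buf1;
    float sum = 0.0f;

    assert(global != 0);
    for (int k = -1; k < DB_TILES; k++) {
        /* Issue the fetch for the following tile, if there is one.
         * For k = -1 this is the prologue: fetch tile 0, compute nothing. */
        if (k + 1 < DB_TILES)
            fetch_tile(next, global + (k + 1) * DB_TILE);

        /* With a real asynchronous DMA, a wait on the fetch issued in the
         * previous iteration would go here, before cur is read. */
        if (k >= 0)
            for (int i = 0; i < DB_TILE; i++)
                sum += cur[i];

        /* Rotate: the buffer just filled becomes the one to compute on. */
        float *tmp = cur;
        cur = next;
        next = tmp;
    }
    return sum;
}
```

The rotation at the end of each iteration plays the role of the "rotate" pseudo-statements that appear in the generated code on the neighbouring slides.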


Interface between PPU and SPU

Shared between PPU and SPU:
union __context {
  struct {
    float (*A)[1024];
    float (*B)[1024];
    float (*C)[1024];
  } context;
  double padding[2];
};

PPU:
union __context context;
extern spe_program_handle matmult1024_spu;
struct CELL_mapped_region* region;
context.context.A = A;
context.context.B = B;
context.context.C = C;
region = CELL_mapped_begin(0, 8, 0,
                           &matmult1024_spu, &context,
                           sizeof(context));
CELL_mapped_end(region);
return 0;
}

SPU:
int main(uint64_t id, uint64_t argp)
{
  union __context c;
  uint64_t t1;
  CELL_spu_init(id, argp, ...);
  CELL_dma_get((void *)_t1, &c,
               sizeof(c), 0, 0, 1, 0);
  CELL_dma_wait(0);
  __kernel(c.context.A,
           c.context.B,
           c.context.C);
  return 0;
}


QR Example
for (int i = 0; i […]


Parallelizing ...
// prologue code omitted
for (int i = 0; i […]


After Tiling
for (i = 0; i […]


After Processor Placement
// PROC0 ranges from 0 to 7
for (i = 0; i […]= -31)
doall (j = max(i, 32 * PROC0); j […]


Local Memory Compaction
for (i = 0; i […]


Local Memory Compaction/DMA, Synchronization Generation
doall (j = max(0, ceilDiv(i + -126, 128)); j […]= -31 && k […]= 0) {
    CELL_barrier(8);
    if (l - PROC0 == 0) {
      CELL_dma_wait(0);
      rotate pointers to QR_l_7, QR_l_8, s_l_3
    }
  }
  if (l - PROC0 == 0) {
    if (k […]= -31 && k […]= 0) {
      doall (m = max(128*j+16*PROC0, i + 1); m […]= 1)
      CELL_dma_wait(1);
      if (-i + 32 * k >= -31 && k […]= 0) {
        initialize write back of QR_l_8 to QR
      }
    }
  }
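Loop bounds such as `max(0, ceilDiv(i + -126, 128))` rely on a ceiling division that stays correct for negative numerators, which C's `/` operator does not provide directly because it truncates toward zero. A minimal sketch of such helpers follows. The names `ceil_div`, `floor_div` and `max_int` are assumptions, not the definitions R-Stream emits.

```c
#include <assert.h>

/* Ceiling of n / d for d > 0. C's '/' truncates toward zero, which already
 * rounds up when the quotient is negative, so only a positive remainder
 * needs the correction. */
static int ceil_div(int n, int d)
{
    assert(d > 0);
    int q = n / d;
    return (n % d > 0) ? q + 1 : q;
}

/* Floor of n / d for d > 0: the mirror case, corrected when the remainder
 * is negative. */
static int floor_div(int n, int d)
{
    assert(d > 0);
    int q = n / d;
    return (n % d < 0) ? q - 1 : q;
}

static int max_int(int a, int b)
{
    return a > b ? a : b;
}
```

For example, `max_int(0, ceil_div(i - 126, 128))` is 0 for every `i` up to 126 and becomes 1 at `i = 127`.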


Local memory compaction (Example)
Tiled inner loop from LU
float A[256][256];
doall (l = 128 * j + 16 * P; l […]
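The idea behind local memory compaction can be sketched as follows: a tile of the loop touches only a small footprint of the global array, so that footprint is copied into a compact local buffer and the accesses are re-indexed relative to the tile origin. The 16x16 footprint, the scaling computation and all names below are illustrative assumptions. Only the 256x256 array size comes from the slide.

```c
#include <assert.h>

enum { LC_N = 256, LC_T = 16 };   /* array size from the slide; tile assumed */

static float lc_A[LC_N][LC_N];

/* Scale the LC_T x LC_T block of lc_A whose top-left corner is (l0, m0),
 * working on a compact local copy instead of the full array. */
static void scale_block_compacted(int l0, int m0, float factor)
{
    float A_l[LC_T][LC_T];   /* local footprint: 1 KiB instead of 256 KiB */

    assert(l0 >= 0 && l0 + LC_T <= LC_N);
    assert(m0 >= 0 && m0 + LC_T <= LC_N);

    /* Copy in: global index (l, m) maps to local index (l - l0, m - m0). */
    for (int l = l0; l < l0 + LC_T; l++)
        for (int m = m0; m < m0 + LC_T; m++)
            A_l[l - l0][m - m0] = lc_A[l][m];

    /* Compute using only the local buffer. */
    for (int l = l0; l < l0 + LC_T; l++)
        for (int m = m0; m < m0 + LC_T; m++)
            A_l[l - l0][m - m0] *= factor;

    /* Copy out: write the modified footprint back to the global array. */
    for (int l = l0; l < l0 + LC_T; l++)
        for (int m = m0; m < m0 + LC_T; m++)
            lc_A[l][m] = A_l[l - l0][m - m0];
}
```

On Cell the copy-in and copy-out loops would become DMA transfers, and the index shift `(l - l0, m - m0)` is the affine data layout transformation that lets the buffer fit in local store.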


Scope of Application
• Generalities: Limitations of:
– C language
– Polyhedral model
– Scalability of optimization algorithms and underlying mathematical solvers
– Our implementation
– Raising/lowering modules in R-Stream
– …limit us to kernels within the immediate scope of model.
– …taking off-the-shelf code, and raising, optimizing it, is still a challenge
• Currently working through performance issues associated with handing code to an LLC to optimize
• Nevertheless, we have a very powerful substrate, underlying model, and our goal is to provide this as a commercial tool for programming Cell B.E.
– It’s a question of when, not if…


How (and when) can I get it?
• Now: inviting select users to log in to try it out on our systems…
• Will be able to distribute “Alpha” versions in ~2 months
• Goal: to have this usable as a flow alongside an IBM 3.0 SDK release (Q3/07)
• (As an aside, while our plan is to provide this as a commercial tool, it ALSO opens up an enormous number of research opportunities in automatic optimization
– We will be interested in working with academic, government, and industrial research groups on optimization on forward research projects)


Thanks
• To the commercial interest and support by the members of STI
• The extremely dedicated and talented members of the Reservoir compiler team
• The members of the Morphware Forum
• DARPA/AFRL for providing funding for this work (F03602-03-C-0033)
• Other government agencies and components for their interest and support
