12.07.2015 Views

NoC design and optimization for Multi-core media processors

CHAPTER 2. RELATED WORK

Work in [113] presents an instruction replication method along with a clustering approach to decrease inter-cluster (inter-PE) communication. The load balancing algorithm used to distribute instructions among clusters, along with the amount of inter-cluster communication, dictates the performance of a clustered processor. The work aims to reduce inter-cluster communication by replicating instructions in the processing elements (PEs) where their results are utilized. Resources idle and available in PEs are used by replicated instructions such that load balancing is maintained.

Data transfer on long latency wires can be reduced by value prediction [114] and cache line replication [115][116] techniques. Work presented in [114] reduces long wire delays by predicting data being communicated. The predicted value is then validated locally where it was produced. Correctly predicted values do not incur the long wire delay. The stride value predictor [117][118] predicts source operands of instructions to be executed.

Victim cache line replication presented in [115] replicates evicted primary cache lines into the L2 slice local to the CMP tile. The work considers a CMP with each tile containing a slice of the total L2. Cache line replication is a hybrid cache management policy combining private local L2 slices and shared L2. Total effective capacity of L2 is reduced when every tile has a local copy of accessed cache lines. On the other hand, a single shared L2 may incur large latencies when cache lines have to be accessed from remote tiles. Hits to replicated cache lines reduce the effective latency of the shared L2 cache and hence reduce latency effects from communication in CMPs.

GALS & Floorplanning Techniques

Scalable microarchitectural techniques to reduce the impact of wire delay have been looked at in [119][120].
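
As one concrete illustration of the wire-delay techniques surveyed above, the stride value prediction idea [117][118] can be sketched in a few lines of Python. This is a minimal toy model, not the mechanism of any cited work: the table organisation, the names `StridePredictor` and `count_avoided_transfers`, and the instruction address `pc` used as the index are all assumptions made for illustration. A correct prediction stands for a value that would not have to wait for the long wire.

```python
from dataclasses import dataclass


@dataclass
class StrideEntry:
    last_value: int  # most recent value observed for this instruction
    stride: int = 0  # difference between the last two observed values


class StridePredictor:
    """Toy stride predictor indexed by a hypothetical instruction address."""

    def __init__(self):
        self.table = {}

    def predict(self, pc):
        # No history yet: make no prediction.
        entry = self.table.get(pc)
        if entry is None:
            return None
        return entry.last_value + entry.stride

    def update(self, pc, actual):
        # Train on the value that was actually produced.
        entry = self.table.get(pc)
        if entry is None:
            self.table[pc] = StrideEntry(last_value=actual)
        else:
            entry.stride = actual - entry.last_value
            entry.last_value = actual


def count_avoided_transfers(values, pc=0x40):
    """Count values predicted correctly, i.e. transfers whose wire delay is hidden."""
    predictor = StridePredictor()
    hits = 0
    for actual in values:
        # Validation step: compare the prediction with the produced value.
        if predictor.predict(pc) == actual:
            hits += 1
        predictor.update(pc, actual)
    return hits
```

For a regular sequence such as `[8, 16, 24, 32, 40]` the predictor needs two observations to learn the stride, so the remaining three values are predicted correctly. An irregular sequence yields few or no hits, in which case every transfer still pays the wire latency.
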
Work in [119] investigates the interconnect bottleneck in FPGA-based systems and proposes Globally Asynchronous Locally Synchronous (GALS) design as a potential solution. The work proposes a design flow to investigate the optimal GALS island size to balance the amount of inter-island communication and the asynchronous communication overhead between GALS islands.

Floorplanning techniques to overcome long latencies between the processor and the
