NoC design and optimization for Multi-core media processors

CHAPTER 2. RELATED WORK

Work in [113] presents an instruction replication method along with a clustering approach to decrease inter-cluster (inter-PE) communication. The load balancing algorithm used to distribute instructions among clusters, along with the amount of inter-cluster communication, dictates the performance of a clustered processor. The work aims to reduce inter-cluster communication by replicating instructions in the processing elements (PEs) where their results are used. Replicated instructions use resources that are idle and available in the PEs, so that load balancing is maintained.

Data transfer on long latency wires can be reduced by value prediction [114] and cache line replication [115][116] techniques. Work presented in [114] reduces long wire delays by predicting data being communicated. The predicted value is then validated locally where it was produced. Correctly predicted values do not incur the long wire delay. The stride value predictor [117][118] predicts source operands of instructions to be executed.

Victim cache line replication presented in [115] replicates evicted primary cache lines into the L2 slice local to the CMP tile. The work considers a CMP with each tile containing a slice of the total L2. Cache line replication is a hybrid cache management policy combining a private local L2 slice and a shared L2. Total effective capacity of L2 is reduced when every tile has a local copy of accessed cache lines. On the other hand, a single shared L2 may incur large latencies when cache lines have to be accessed from remote tiles. Hits to replicated cache lines reduce the effective latency of the shared L2 cache and hence reduce the latency effects of communication in CMPs.

GALS & Floorplanning Techniques

Scalable microarchitectural techniques to reduce the impact of wire delay have been looked at [119][120].
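As an aside on the value prediction techniques surveyed earlier in this chapter, the general idea of a stride value predictor can be illustrated with a minimal sketch. This is not the implementation from [114], [117] or [118]: the class and function names (`StrideEntry`, `StridePredictor`, `count_correct`) are hypothetical, the table is an unbounded dictionary keyed by instruction address, and no confidence counters are modelled. The sketch only shows that a value following a constant stride is predicted correctly once the stride has been observed, which is the case in which the long wire delay would be avoided.

```python
from dataclasses import dataclass


@dataclass
class StrideEntry:
    """Per-instruction state: the last observed value and the last stride."""
    last_value: int
    stride: int = 0


class StridePredictor:
    """Toy stride predictor (illustrative assumption, not a cited design)."""

    def __init__(self):
        # Hypothetical unbounded table keyed by instruction address (pc).
        self.table = {}

    def predict(self, pc):
        """Return last value + stride, or None if pc has not been seen yet."""
        entry = self.table.get(pc)
        if entry is None:
            return None
        return entry.last_value + entry.stride

    def update(self, pc, actual):
        """Record the actual value and recompute the stride for this pc."""
        entry = self.table.get(pc)
        if entry is None:
            self.table[pc] = StrideEntry(last_value=actual)
        else:
            entry.stride = actual - entry.last_value
            entry.last_value = actual


def count_correct(pc, values):
    """Count how many values in the sequence are predicted correctly."""
    predictor = StridePredictor()
    hits = 0
    for value in values:
        if predictor.predict(pc) == value:
            hits += 1
        predictor.update(pc, value)
    return hits
```

For a sequence such as 100, 104, 108, 112, 116 the first two values train the entry and every later value is predicted correctly. An irregular sequence gives few or no hits, and each misprediction would still pay the full communication latency.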
Work in [119] investigates the interconnect bottleneck in FPGA-based systems and proposes Globally Asynchronous Locally Synchronous (GALS) design as a potential solution. The work proposes a design flow to investigate the optimal GALS island size, balancing the amount of inter-island communication against the asynchronous communication overhead between GALS islands.

Floorplanning techniques to overcome long latencies between the processor and the
Level-2 cache have been experimented on [121]. The work states that floorplans should aid CMP systems in distinguishing between shared and private data accessed by cores. The paper introduces a floorplan topology that partitions shared and private cached data and thus enables fast access to shared data for all processors while preserving the vicinity of private data to each processor.

2.4 Summary

Providing QoS guarantees in on-chip communication networks has been identified as one of the major research problems in NoCs [48]. A Label Switched QoS guaranteeing NoC that retains the advantages of both packet switched and circuit switched networks has been described in the thesis. LS-NoC sets up communication channels (pipes) between communicating nodes that are independent of existing pipes and contention free at routers. Flow identification algorithms identify contention free and resource rich paths, taking into account the existing routes in the NoC. LS-NoC is described in Chapters 5, 6 and 7.

Wire lengths have a significant influence on the latency of the interconnect, and hence need to be included in the simulation framework. It is clear from works emphasizing the effects of communication on chip performance that there is a need for co-design of interconnects, processing elements and memory blocks to fully optimize the overall system-on-chip performance. This necessitates a simulation framework which allows co-simulation of the communicating entities along with ICN simulation. Additionally, to optimize power fully, one also needs to incorporate link-level microarchitectural choices such as pipelining.

A System-C framework which enables NoC designers to assemble communicating entities along with the ICN, and also allows for exploration of architectural and microarchitectural parameters of the ICN in order to obtain the latency, throughput and power trade-offs, is presented in Chapter 3.
Chapter 3 presents results for NoC power by considering the effects of various pipelining configurations and of frequency and voltage scaling values. Various traffic generation and distribution models have been used to mimic realistic traffic patterns and activity in NoCs. Trade-off studies in this chapter consider Energy-Delay