PPKE ITK PhD and MPhil Thesis Classes

1. INTRODUCTION

Figure 1.14: Block diagram of the Synergistic Processor Element. Labeled blocks: even pipe (floating point, fixed point); odd pipe (branch, memory, permutation); dual-issue instruction logic; register file (128 x 16-byte registers); instruction buffer (3.5 x 32 instructions); local store (256 kB, single ported); DMA (globally coherent). Data-path width labels: 128 byte, 128 byte, 8 byte.
1.5 IBM Cell Broadband Engine Architecture

... the execution of branch, memory and permute instructions. Instructions for the even and odd pipelines can be issued in parallel. Like the PPE, the SPEs are in-order processors. Data for the instructions are provided by a very large, 128-entry register file in which each register is 16 bytes wide. SIMD instructions of the SPE therefore work on 16-byte-wide vectors, for example four single-precision floating-point numbers or eight 16-bit integers. The register file has 6 read and 2 write ports to supply data to the two pipelines.

The SPEs can directly address only their local 256 KB SRAM memory, but they can access the main memory of the system through DMA instructions. The Local Store is 128 bytes wide for the DMA and instruction fetch units, while the Memory unit can address data on 16-byte boundaries by using a buffer register. 16-byte data words arriving from the EIB are collected by the DMA engine and written to the memory in one cycle. The DMA engines can handle up to 16 concurrent DMA operations, each of which can transfer up to 16 KB. The DMA engine is part of the globally coherent memory address space, but note that the local store of the SPE is not coherent.

Consequently, the most significant difference between the SPE and the PPE lies in how they access memory. The PPE accesses main storage (the effective-address space) with load and store instructions that move data between main storage and a private register file, the contents of which may be cached. The SPEs, in contrast, access main storage with Direct Memory Access (DMA) commands that move data and instructions between main storage and a private local memory, called a local store or local storage (LS). An SPE's instruction fetches and its load and store instructions access its private LS rather than shared main storage, and the LS has no associated cache.
This three-level organization of storage (register file, LS, main storage), with asynchronous DMA transfers between LS and main storage, is a radical break from conventional architecture and programming models, because it explicitly parallelizes computation with the transfers of data and instructions that feed computation and store the results of computation in main storage.
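The overlap of computation and data transfer described above is commonly exploited with double buffering: while the SPE computes on one local-store buffer, the DMA engine fills the other. The following Python sketch is only a toy timing model of that idea, not Cell SDK code. The function names, the uniform per-chunk costs `t_dma` and `t_compute`, and the omission of result write-back and alignment rules are all assumptions made for illustration. Only the 16 KB per-transfer limit and the 256 KB local store size come from the text.

```python
# Toy timing model of double-buffered DMA (illustrative only, not Cell SDK code).

MAX_DMA_BYTES = 16 * 1024        # per-transfer limit quoted in the text
LOCAL_STORE_BYTES = 256 * 1024   # local store size quoted in the text


def plan_dma_chunks(total_bytes, chunk_bytes=MAX_DMA_BYTES):
    """Split a block in main storage into (offset, size) DMA transfers."""
    if not 0 < chunk_bytes <= MAX_DMA_BYTES:
        raise ValueError("chunk size must be between 1 byte and 16 KB")
    return [(off, min(chunk_bytes, total_bytes - off))
            for off in range(0, total_bytes, chunk_bytes)]


def serial_time(n_chunks, t_dma, t_compute):
    """Fetch a chunk, then compute on it, with no overlap at all."""
    return n_chunks * (t_dma + t_compute)


def double_buffered_time(n_chunks, t_dma, t_compute):
    """Two local-store buffers: fetch chunk i+1 while computing on chunk i.

    Assumed rules of the model: one fetch in flight at a time, and a buffer
    can be refilled only after the computation that used it has finished.
    """
    fetch_done = [0] * n_chunks
    compute_done = [0] * n_chunks
    for i in range(n_chunks):
        # Fetch i waits for the previous fetch and for its buffer (used by
        # chunk i-2) to be released.
        start = fetch_done[i - 1] if i >= 1 else 0
        if i >= 2:
            start = max(start, compute_done[i - 2])
        fetch_done[i] = start + t_dma
        # Compute i waits for its data and for the previous computation.
        c_start = fetch_done[i]
        if i >= 1:
            c_start = max(c_start, compute_done[i - 1])
        compute_done[i] = c_start + t_compute
    return compute_done[-1] if n_chunks else 0


if __name__ == "__main__":
    chunks = plan_dma_chunks(40000)
    # Two 16 KB buffers occupy only a small part of the local store.
    assert 2 * MAX_DMA_BYTES <= LOCAL_STORE_BYTES
    print(chunks)
    print(serial_time(4, 2, 3), double_buffered_time(4, 2, 3))
```

In this model, four chunks with a transfer cost of 2 and a compute cost of 3 (arbitrary time units) take 20 units serially and 14 units with double buffering. Only the first transfer remains exposed, and the total is then bounded by the slower of the two activities.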
