20.11.2014 Views

PPKE ITK PhD and MPhil Thesis Classes




LIST OF FIGURES

2.4 Performance of the implemented CNN simulator on the Cell architecture compared to other architectures, taking the speed of the Intel processor as the unit in both the linear and nonlinear cases (CNN cell array size: 256×256, 16 forward Euler iterations, *Core 2 Duo T7200 @2GHz, **Falcon Emulated Digital CNN-UM implemented on Xilinx Virtex-5 FPGA (XC5VSX95T) @550MHz, only one Processing Element (max. 71 Processing Elements)) . . . . 40

2.5 Instruction histogram in case of one and multiple SPEs . . . . 41

2.6 Data-flow of the pipelined multi-SPE CNN simulator . . . . 43

2.7 Instruction histogram in case of SPE pipeline . . . . 44

2.8 Startup overhead in case of SPE pipeline . . . . 44

2.9 Speedup of the multi-SPE CNN simulation kernel . . . . 45

2.10 Comparison of the number of instructions in case of different unrolling . . . . 47

2.11 Performance comparison of one and multiple SPEs . . . . 48

2.12 The computational domain . . . . 56

2.13 Local store buffers . . . . 61

2.14 Data distribution between SPEs . . . . 62

2.15 Number of slices in the arithmetic unit . . . . 64

2.16 Number of multipliers in the arithmetic unit . . . . 65

2.17 Number of block-RAMs in the arithmetic unit . . . . 66

2.18 Simulation around a cylinder in the initial state, at 0.25 seconds, at 0.5 seconds, and at 1 second . . . . 67

3.1 Error of the 1st order scheme in different precisions with 10^4 grid resolution . . . . 75

3.2 The arithmetic unit of the first order scheme . . . . 76

3.3 The arithmetic unit of the second order scheme . . . . 77

3.4 Structure of the system with the accelerator unit . . . . 78

3.5 Number of slices of the Accelerator Unit in different precisions . . . . 78

3.6 Error of the 1st order scheme in different precisions and step sizes using floating point numbers . . . . 80

3.7 Error of the 2nd order scheme in different precisions and step sizes using floating point numbers . . . . 81
