20.11.2014 Views

PPKE ITK PhD and MPhil Thesis Classes

PPKE ITK PhD and MPhil Thesis Classes

PPKE ITK PhD and MPhil Thesis Classes

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

4. IMPLEMENTING A GLOBAL ANALOGIC PROGRAMMING UNIT<br />

88 FOR EMULATED DIGITAL CNN PROCESSORS ON FPGA<br />

To overcome the former problems of the analog solutions, the emulated-digital<br />

CNN-UM implementations, as aggregated arrays of processing elements, provide<br />

non-linear template operations <strong>and</strong> multi-layer structures of high accuracy with<br />

somewhat lower performance [31], [26], <strong>and</strong> [45].<br />

Among the CNN-UM implementations the emulated-digital approaches based<br />

on FPGAs (Field Programmable Gate Arrays) have been proved to be very efficient<br />

in the numerical solution of various complex spatio-temporal problems<br />

described by PDEs (Partial Differential Equations) [46]. Several FPGA implementations<br />

of the specialized emulated-digital CNN architecture, for example a<br />

real-time, multi-layer retina model [18], non-linear template runner [72], 2D seismic<br />

wave propagation models [19], <strong>and</strong> solution of 2D Navier-Stokes equations,<br />

such as barotropic ocean model [73], <strong>and</strong> compressible flow simulation [74] have<br />

been successfully designed <strong>and</strong> tested.<br />

Hence, the elaborated Falcon architecture on FPGA [45] seems to be a good<br />

trade-off between the high-speed analog VLSI CNN-UM <strong>and</strong> the versatile software<br />

solutions. In order to provide high flexibility in CNN computations, it is<br />

interesting how we can reach large performance by connecting locally a lot of<br />

simple <strong>and</strong> relatively low-speed parallel processing elements, which are organized<br />

in a regular array. The large variety of configurable parameters of this architecture<br />

(such as state- <strong>and</strong> template-precision, size of templates, number of rows <strong>and</strong><br />

columns of processing elements, number of layers, size of pictures, etc.) allows<br />

us to arrange an implementation, which is best suited to the target application<br />

(e.g. image/video processing or fluid flow simulation). So far, without the proper<br />

controller (Global Analog Programming Unit, GAPU) extension, when solving<br />

different types of PDEs, a single set of CNN template operations has been implemented<br />

on the host PC: by downloading the image onto the FPGA board (across<br />

a quite slow parallel port), computing the transient, <strong>and</strong> finally uploading the result<br />

back to the host computer where logical, arithmetic <strong>and</strong> program organizing<br />

steps were executed.<br />

If, howewer, we want to run analogic CNN algorithms with template statements,<br />

the Falcon architecture has to be extended with this proposed GAPU<br />

implementation. It simplifies the downloading of the input picture(s)/image(s)<br />

(e.g.: pictures for image processing, or surfaces for tactile sensing, etc.) along

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!