PPKE ITK PhD and MPhil Thesis Classes
PPKE ITK PhD and MPhil Thesis Classes
PPKE ITK PhD and MPhil Thesis Classes
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
4. IMPLEMENTING A GLOBAL ANALOGIC PROGRAMMING UNIT<br />
88 FOR EMULATED DIGITAL CNN PROCESSORS ON FPGA<br />
To overcome the former problems of the analog solutions, the emulated-digital<br />
CNN-UM implementations, as aggregated arrays of processing elements, provide<br />
non-linear template operations <strong>and</strong> multi-layer structures of high accuracy with<br />
somewhat lower performance [31], [26], <strong>and</strong> [45].<br />
Among the CNN-UM implementations the emulated-digital approaches based<br />
on FPGAs (Field Programmable Gate Arrays) have been proved to be very efficient<br />
in the numerical solution of various complex spatio-temporal problems<br />
described by PDEs (Partial Differential Equations) [46]. Several FPGA implementations<br />
of the specialized emulated-digital CNN architecture, for example a<br />
real-time, multi-layer retina model [18], non-linear template runner [72], 2D seismic<br />
wave propagation models [19], <strong>and</strong> solution of 2D Navier-Stokes equations,<br />
such as barotropic ocean model [73], <strong>and</strong> compressible flow simulation [74] have<br />
been successfully designed <strong>and</strong> tested.<br />
Hence, the elaborated Falcon architecture on FPGA [45] seems to be a good<br />
trade-off between the high-speed analog VLSI CNN-UM <strong>and</strong> the versatile software<br />
solutions. In order to provide high flexibility in CNN computations, it is<br />
interesting how we can reach large performance by connecting locally a lot of<br />
simple <strong>and</strong> relatively low-speed parallel processing elements, which are organized<br />
in a regular array. The large variety of configurable parameters of this architecture<br />
(such as state- <strong>and</strong> template-precision, size of templates, number of rows <strong>and</strong><br />
columns of processing elements, number of layers, size of pictures, etc.) allows<br />
us to arrange an implementation, which is best suited to the target application<br />
(e.g. image/video processing or fluid flow simulation). So far, without the proper<br />
controller (Global Analog Programming Unit, GAPU) extension, when solving<br />
different types of PDEs, a single set of CNN template operations has been implemented<br />
on the host PC: by downloading the image onto the FPGA board (across<br />
a quite slow parallel port), computing the transient, <strong>and</strong> finally uploading the result<br />
back to the host computer where logical, arithmetic <strong>and</strong> program organizing<br />
steps were executed.<br />
If, howewer, we want to run analogic CNN algorithms with template statements,<br />
the Falcon architecture has to be extended with this proposed GAPU<br />
implementation. It simplifies the downloading of the input picture(s)/image(s)<br />
(e.g.: pictures for image processing, or surfaces for tactile sensing, etc.) along