20.11.2014 Views

PPKE ITK PhD and MPhil Thesis Classes

PPKE ITK PhD and MPhil Thesis Classes

PPKE ITK PhD and MPhil Thesis Classes

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

4.8 Conclusions 107<br />

designed on FPGA. It provides a fully programmable CNN-UM, on which the<br />

most sophisticated <strong>and</strong> complex analogic algorithms can be executed. It has<br />

shown that integrating this embedded GAPU is highly advantageous in case sequences<br />

of template operations in an analogic CNN algorithm are performed<br />

on large-sized (e.g., 128×128, or even 512×512) images. Due to the modular<br />

structure, the GAPU makes the integration easy with the different elaborated,<br />

emulated-digital CNN-UM implementations on FPGA, e.g., computing<br />

single layer dynamic, globally connected multi-layer computations, or running<br />

non-linear templates, as well.<br />

I made recommendations for the structure of the GAPU (precision) to develop<br />

an emulated digital CNN-UM. The Falcon processor should be extended<br />

with the GAPU, according to the original CNN-UM architecture, in order to execute<br />

a more complex algorithm time efficiently. The implemented GAPU should<br />

consume minimal area while providing high operating speed to avoid slow down<br />

of the Falcon processor, to gain the largest possible computational performance.<br />

The GAPU can be built from a properly configured MicroBlaze, or a dedicated<br />

PPC, or ARM processor. I made further considerations on the structure of the<br />

controller’s state registers <strong>and</strong> configuration of the template <strong>and</strong> state memory<br />

in order to adopt the system for the different kind of Falcon Processing Units.<br />

E.g.: Different Falcon units are optimal for black <strong>and</strong> white or grayscale image<br />

processing.<br />

I have developed a new memory organization methodology in order to exploit<br />

the possibilities of the latest FPGAs. The dedicated arithmetic units of the new<br />

generation FPGAs become faster, but the speed of the embedded processor <strong>and</strong><br />

bus architecture are evolving slower. The Falcon processor can work on higher<br />

operating frequency than the embedded microprocessor <strong>and</strong> the bus system on<br />

the latest FPGAs. I have developed a new architecture, where the embedded<br />

microprocessor, the controller circuit, the memory <strong>and</strong> the Falcon processing unit<br />

can be operated on different clock speed. In addition to the internal structural<br />

modifications the external memory can be accessed via a dedicated FIFO element.<br />

The new architecture makes concurrent access to the external memory possible<br />

for the MicroBlaze, the control unit <strong>and</strong> the Falcon processor.

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!