20.11.2014 Views

PPKE ITK PhD and MPhil Thesis Classes


4. IMPLEMENTING A GLOBAL ANALOGIC PROGRAMMING UNIT FOR EMULATED DIGITAL CNN PROCESSORS ON FPGA

between the MicroBlaze core and the array of Falcon / Vector processor elements.

4.6 Device utilization

The experimental system is implemented on the RC203 development board from
Celoxica [75], which is equipped with a Xilinx Virtex-II 3000 FPGA providing
14 336 slices, 96 18 × 18 bit signed multipliers, 96 BRAMs and 2 × 2 MB of ZBT
SSRAM. Rapid prototyping techniques and high-level hardware description
languages such as Handel-C from Celoxica make it possible to develop
optimized architectures much faster than with conventional VHDL or
Verilog based RTL-level approaches. In the implementation of the GAPU,
Handel-C sits at the top level of the design, while the MicroBlaze core
and its modules are wrapped as a low-level system processor macro. The
Platform Studio integrated development environment [37] from Xilinx supports
both MicroBlaze soft-core and IBM PowerPC hard processor core designs.

The resource requirements of the Falcon Processor Element and the
proposed GAPU were examined at different precisions (see Figures 4.6 and 4.7).
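The resource budgeting discussed in this section can be illustrated with a minimal sketch. It assumes a simple additive model: the controller is placed first, and each further processing element consumes a fixed number of slices, multipliers and BRAMs. The device totals are those quoted above for the XC2V3000. The per-PE and controller costs (`PE_COST`, `GAPU_COST`) are placeholder values, not the figures of Table 4.1; they are chosen only so that the controller costs four times the slices and BRAMs of one PE, as stated in this section. The names `Resources` and `max_processing_elements` are hypothetical helpers introduced for this illustration.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Resources:
    """Counts of the three FPGA resource types discussed in the text."""
    slices: int
    multipliers: int
    brams: int


# Device totals quoted in the text for the Virtex-II 3000 (XC2V3000).
XC2V3000 = Resources(slices=14336, multipliers=96, brams=96)

# ASSUMPTION: placeholder costs for illustration only, NOT taken from Table 4.1.
PE_COST = Resources(slices=430, multipliers=3, brams=5)
# The text states that the GAPU needs four times the slices and BRAMs of one PE;
# the multiplier count of the controller is a further assumption.
GAPU_COST = Resources(
    slices=4 * PE_COST.slices,
    multipliers=3,
    brams=4 * PE_COST.brams,
)


def max_processing_elements(device, controller, per_pe):
    """Return (n, limiting): the largest PE count n that fits next to the
    controller, and the name of the resource that limits it."""
    best = None
    for f in fields(Resources):
        cost = getattr(per_pe, f.name)
        if cost <= 0:
            continue  # this resource does not constrain the PE count
        free = getattr(device, f.name) - getattr(controller, f.name)
        count = max(free, 0) // cost
        if best is None or count < best[0]:
            best = (count, f.name)
    if best is None:
        raise ValueError("per-PE cost must be positive for at least one resource")
    return best


if __name__ == "__main__":
    count, limiting = max_processing_elements(XC2V3000, GAPU_COST, PE_COST)
    print(f"{count} PEs fit; limiting resource: {limiting}")
```

Under these placeholder costs the BRAM budget, not the slice budget, is the binding constraint. In such a case, moving buffers from on-chip BRAMs to external memory would be the first option to consider for fitting more processing elements.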

As shown in Table 4.1, the proposed GAPU, based on a Xilinx MicroBlaze
core, requires minimal additional area at 18-bit state precision, the precision
best suited to the dedicated multipliers (MULT18×18) and block-RAM (BRAM)
modules. Although the GAPU controller occupies four times as many slices and
BRAMs as one Falcon PE, it uses only a small fraction of the resources
available on the moderate-sized Virtex-II FPGA. Consequently, the embedded
GAPU does not significantly decrease the number of implementable Falcon
processor elements. The number of implementable FPEs and VPEs is
configurable. On our XC2V3000 FPGA only 12% of the available slices are
utilized, which makes it possible to implement 15 FPE cores, subject to the
limited number of BlockRAMs. Some additional area can be saved by using
external ZBT-SRAM modules instead of on-chip BRAMs. Moreover, the speed of
the GAPU is close to the clock frequency of a Falcon processing element on the
Virtex-II architecture (the maximum realizable clock frequency of the
MicroBlaze core is shown for the state-of-the-art Virtex-6 architecture [37]).
If we want to attain the best performance,
