PPKE ITK PhD and MPhil Thesis Classes
4. IMPLEMENTING A GLOBAL ANALOGIC PROGRAMMING UNIT FOR EMULATED DIGITAL CNN PROCESSORS ON FPGA
between the MicroBlaze core and the array of Falcon / Vector processor elements.
4.6 Device utilization<br />
The experimental system is implemented on the RC203 development board from Celoxica [75], which is equipped with a Xilinx Virtex-II 3000 FPGA containing 14 336 slices, 96 signed 18 × 18-bit multipliers and 96 BRAMs, together with 2 × 2 MB of ZBT SSRAM memory. Rapid prototyping techniques and high-level hardware description languages such as Handel-C from Celoxica make it possible to develop optimized architectures much faster than conventional VHDL- or Verilog-based RTL approaches. In the GAPU implementation, Handel-C is used at the top level of the design, while the MicroBlaze core and its modules are wrapped as a low-level system processor macro. The Platform Studio integrated development environment from Xilinx [37] supports both MicroBlaze soft-core and IBM PowerPC hard-core processor designs.
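Since the device totals are fixed by the board, a first-order estimate of how many processor elements fit beside the controller is a per-resource budget: subtract the controller's cost from each device total, divide by the cost of one PE, and take the smallest quotient. The Python sketch below does this for slices and BRAMs only. The XC2V3000 totals are the ones quoted above, and the GAPU is taken as four times one PE in slices and BRAMs, the ratio this section reports for Table 4.1. The per-PE costs in `falcon_pe` are hypothetical placeholders, not measured values, and `Resources` and `max_processor_elements` are illustrative names; multipliers, routing and timing closure are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """Counts of the two FPGA resources considered in this sketch."""
    slices: int
    brams: int


def max_processor_elements(device: Resources, pe: Resources,
                           controller: Resources) -> tuple[int, str]:
    """Return how many PEs fit next to the controller, and which resource limits them."""
    limits = {}
    for name in ("slices", "brams"):
        free = getattr(device, name) - getattr(controller, name)
        per_pe = getattr(pe, name)
        if free < 0 or per_pe <= 0:
            raise ValueError(f"invalid {name} budget")
        limits[name] = free // per_pe  # whole PEs that fit in what is left
    limiting = min(limits, key=limits.get)
    return limits[limiting], limiting


# XC2V3000 totals quoted for the RC203 board (multipliers not modelled here).
XC2V3000 = Resources(slices=14_336, brams=96)

# HYPOTHETICAL per-PE cost: illustrative placeholders, not the Table 4.1 figures.
falcon_pe = Resources(slices=400, brams=5)

# GAPU taken as 4x one PE in slices and BRAMs, the ratio stated in the text.
gapu = Resources(slices=4 * falcon_pe.slices, brams=4 * falcon_pe.brams)

count, limiter = max_processor_elements(XC2V3000, falcon_pe, gapu)
print(f"{count} PEs fit; limited by {limiter}")
```

With these assumed costs the slice budget would allow 31 PEs but the BRAM budget only 15, so BRAMs run out first; substituting the measured values of Table 4.1 would turn this into an estimate for the real design.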
The resource requirements of the Falcon Processor Element and of the proposed GAPU are examined at different precisions (see Figures 4.6 and 4.7).
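One reason the cost grows with precision is how products map onto the dedicated 18 × 18-bit signed multipliers: a signed operand of up to 18 bits fits one multiplier, while a wider operand has to be split into 17-bit unsigned low parts and a signed top part, so an n × n product built from k parts needs k² multipliers. The Python sketch below models this standard decomposition and checks it against exact integer multiplication. It is a generic illustration under that assumption, not the datapath of the Falcon PE, and `mult18x18`, `split_operand` and `wide_multiply` are hypothetical names.

```python
def mult18x18(a: int, b: int) -> int:
    """Model one dedicated 18 x 18-bit signed (two's complement) multiplier."""
    lo, hi = -(1 << 17), (1 << 17) - 1
    if not (lo <= a <= hi and lo <= b <= hi):
        raise ValueError("operand does not fit an 18-bit signed input")
    return a * b


def split_operand(x: int, width: int) -> list[int]:
    """Split a width-bit signed value into 17-bit unsigned low parts and a signed top part."""
    parts = []
    while width > 18:
        parts.append(x & ((1 << 17) - 1))  # 0 .. 2**17-1 fits an 18-bit signed input
        x >>= 17                           # arithmetic shift keeps the sign of the rest
        width -= 17
    parts.append(x)                        # signed top part, at most 18 bits wide
    return parts


def wide_multiply(a: int, b: int, width: int) -> tuple[int, int]:
    """Multiply two width-bit signed values using only mult18x18.

    Returns (product, number of 18 x 18 multipliers used).
    """
    for x in (a, b):
        if not -(1 << (width - 1)) <= x < (1 << (width - 1)):
            raise ValueError(f"{x} is not a {width}-bit signed value")
    a_parts, b_parts = split_operand(a, width), split_operand(b, width)
    product, used = 0, 0
    for i, pa in enumerate(a_parts):
        for j, pb in enumerate(b_parts):
            # Each partial product is weighted by 2**(17 * (i + j)).
            product += mult18x18(pa, pb) << (17 * (i + j))
            used += 1
    return product, used


for width in (18, 19, 35, 36):
    a, b = -(1 << (width - 1)), (1 << (width - 1)) - 1  # extreme operands
    p, n = wide_multiply(a, b, width)
    assert p == a * b
    print(f"{width}-bit x {width}-bit product: {n} multiplier(s)")
```

For 18-, 19-, 35- and 36-bit operands the loop reports 1, 4, 4 and 9 multipliers, so a precision just above 18 bits already quadruples the multiplier count per product in this model.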
As shown in Table 4.1, the proposed GAPU, based on a Xilinx MicroBlaze core, requires minimal additional area at 18-bit state precision, the precision best suited to the dedicated multipliers (MULT18×18) and block-RAM (BRAM) modules. Although the GAPU controller occupies four times as many slices and BRAMs as one Falcon PE, it uses only a small fraction of the resources available on the moderate-sized Virtex-II FPGA.
Consequently, the embedded GAPU does not significantly reduce the number of implementable Falcon processor elements. The number of FPEs and VPEs to be implemented is configurable. On our XC2V3000 FPGA only 12% of the available slices are utilized, so up to 15 FPE cores can be implemented, a number limited by the available BlockRAMs rather than by the slices. Some additional area can be saved by using the external ZBT SSRAM modules instead of on-chip BRAMs. Moreover, the clock frequency of the GAPU is close to that of a Falcon processing element on the Virtex-II architecture (for the state-of-the-art Virtex-6 architecture, the maximum achievable clock frequency of the MicroBlaze core is also given [37]). If we want to attain the best performance,