05.09.2016

“Gate” lookup table (LUT)


• “Gate”: lookup table (LUT)

LUT: I0 ∧ I1 ∧ I2 // AND (truth table holds a single 1)

LUT: I0 ⊕ I1 ⊕ I2 // odd parity (truth table holds four 1s)
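The LUT slides can be illustrated with a small software model. This is a minimal sketch, not any vendor's primitive: it assumes a 3-input LUT whose 8 configuration bits are indexed by (I2 I1 I0), and the names `Lut3`, `make_lut3`, `lut_and3` and `lut_parity3` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// A 3-input LUT is an 8-entry truth table: bit `index` of `table` is the
// output for the input combination index = (i2 i1 i0) read as binary.
struct Lut3 {
    std::uint8_t table;  // configuration bits, one per input combination

    bool eval(bool i0, bool i1, bool i2) const {
        unsigned index = (unsigned(i2) << 2) | (unsigned(i1) << 1) | unsigned(i0);
        return (table >> index) & 1u;
    }
};

// Fill in the configuration bits for any 3-input Boolean function by
// evaluating it once per input combination.
template <typename F>
Lut3 make_lut3(F f) {
    std::uint8_t table = 0;
    for (unsigned index = 0; index < 8; ++index) {
        bool i0 = index & 1u;
        bool i1 = (index >> 1) & 1u;
        bool i2 = (index >> 2) & 1u;
        if (f(i0, i1, i2)) table |= std::uint8_t(1u << index);
    }
    return Lut3{table};
}

// The two functions named on the slides.
inline Lut3 lut_and3() {
    return make_lut3([](bool a, bool b, bool c) { return a && b && c; });
}
inline Lut3 lut_parity3() {
    return make_lut3([](bool a, bool b, bool c) { return (a != b) != c; });
}
```

With this bit ordering the AND table is `0x80` (one bit set) and the odd-parity table is `0x96` (four bits set), which matches the number of 1s in each truth table. The same hardware computes either function; only the stored bits differ.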


• Logic element (LE): LUT + FF
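A logic element can be sketched as a LUT whose output feeds a flip-flop. The model below is an illustration under stated assumptions: the 3-input LUT width, the `registered` output-select bit, and the names `LogicElement`, `output` and `clock_edge` are hypothetical and are not taken from the slide.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of one logic element (LE): a 3-input LUT plus a D flip-flop.
struct LogicElement {
    std::uint8_t lut_table;  // LUT configuration bits (truth table)
    bool registered;         // assumption: config bit selecting FF or LUT output
    bool ff_state;           // current flip-flop contents

    LogicElement(std::uint8_t table, bool use_ff)
        : lut_table(table), registered(use_ff), ff_state(false) {}

    // Combinational LUT lookup: the inputs select one bit of the table.
    bool lut(bool i0, bool i1, bool i2) const {
        unsigned index = (unsigned(i2) << 2) | (unsigned(i1) << 1) | unsigned(i0);
        return (lut_table >> index) & 1u;
    }

    // Value seen on the LE output for the current inputs.
    bool output(bool i0, bool i1, bool i2) const {
        return registered ? ff_state : lut(i0, i1, i2);
    }

    // Rising clock edge: the flip-flop captures the LUT output.
    void clock_edge(bool i0, bool i1, bool i2) { ff_state = lut(i0, i1, i2); }
};
```

For example, `LogicElement le(0x96, true)` with `ff_state` fed back as `i2` accumulates the parity of the `i0` and `i1` inputs across clock cycles, which is one way a single LE holds a bit of state.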


• Configurable logic block (L): a group of 8 LEs


• Configurable logic blocks

[Figure: device floorplan. Four rows of L L M × + IO, and a bottom row of IO IO IO IO CFG]


• Interconnect fabric

• I/O blocks

• Fixed logic blocks

  • BRAM: dual-port 1K×36b

  • DSP: fixed/floating multiply-add

  • Much more
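The two fixed blocks named above can be modelled in a few lines. This is a behavioural sketch only: the read-before-write timing, the rule that port B wins when both ports write the same address, and the names `DualPortRam`, `PortOp` and `multiply_add` are assumptions made for illustration, not properties of any particular device.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kDepth = 1024;                    // 1K words
constexpr std::uint64_t kWordMask = (1ull << 36) - 1;   // 36-bit words

// What one port does in one clock cycle.
struct PortOp {
    std::uint16_t addr;
    bool write;
    std::uint64_t data;  // only the low 36 bits are stored
};

class DualPortRam {
public:
    // One clock cycle: both ports act at once. Each port returns the word
    // that was stored before this cycle (read-first behaviour, an assumption).
    void cycle(const PortOp& a, const PortOp& b,
               std::uint64_t& out_a, std::uint64_t& out_b) {
        assert(a.addr < kDepth && b.addr < kDepth);
        out_a = mem_[a.addr];
        out_b = mem_[b.addr];
        if (a.write) mem_[a.addr] = a.data & kWordMask;
        if (b.write) mem_[b.addr] = b.data & kWordMask;  // port B wins a collision (assumption)
    }

private:
    std::array<std::uint64_t, kDepth> mem_{};  // zero-initialised contents
};

// Integer multiply-add in the style of a DSP block: p = a * b + c.
inline std::int64_t multiply_add(std::int32_t a, std::int32_t b, std::int64_t c) {
    return std::int64_t(a) * b + c;
}
```

Two independent ports let a design read and write in the same cycle, for example as a FIFO between two pipeline stages, without spending LUTs on storage.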


http://www.xilinx.com/content/dam/xilinx/imgs/products/zynq/zynq-ev-block.PNG


https://www.altera.com/content/dam/altera-www/global/en_US/pdfs/literature/wp/wp-01264-stratix10mx-devices-solve-memory-bandwidth-challenge.pdf


• Cloud computing

• Deep learning

• Interconnect and storage

• SDN, NFV, programmable data plane

[http://www.xilinx.com/applications/megatrends.html]


[Andrew Putnam, Doug Burger, et al, MSR, ISCA-41, 2014] [https://www.microsoft.com/en-us/research/project/project-catapult/]


[Toward Accelerating Deep Learning at Scale Using Specialized Hardware in the Datacenter, Hot Chips 27]

[http://www.hotchips.org/wp-content/uploads/hc_archives/hc27/HC27.25-Tuesday-Epub/HC27.25.40-FPGAs-Epub/HC27.25.432-Catapult_HOTCHIPS2015_Chung_DRAFT_V8.pdf]


Verilog, VHDL → Synthesis → Technology mapping → Place → Route → Configuration bitstream

Slow!


#include <ap_int.h>

ap_int<…>

#pragma HLS ARRAY_PARTITION DIM=2 VARIABLE=a complete

#pragma HLS ARRAY_PARTITION DIM=1 VARIABLE=b complete

#pragma HLS pipeline

Without directives: latency 25121 clocks, 3 DSPs

With directives: latency 260 clocks, 16 DSPs

[http://tcfpga.org/fpga2013/VivadoHLS_Tutorial.pdf]
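To show where directives like these sit in source code, here is a hedged sketch. The slide's original listing did not survive, so the matrix-multiply kernel, the 4×4 sizes, and the names `matmul`, `ROWS`, `INNER` and `COLS` are all assumptions, and plain fixed-width integers stand in for `ap_int` so the code compiles without the HLS headers. A standard compiler ignores the `#pragma HLS` lines; the latency and DSP figures quoted above belong to the tutorial's design, not to this sketch.

```cpp
#include <cassert>
#include <cstdint>

constexpr int ROWS = 4, INNER = 4, COLS = 4;  // hypothetical sizes

// out = a * b. Partitioning `a` along dimension 2 and `b` along dimension 1
// is intended to make every operand of one dot product readable in the same
// cycle, so the pipelined loop body is not limited by memory ports.
void matmul(const std::int16_t a[ROWS][INNER],
            const std::int16_t b[INNER][COLS],
            std::int32_t out[ROWS][COLS]) {
#pragma HLS ARRAY_PARTITION DIM=2 VARIABLE=a complete
#pragma HLS ARRAY_PARTITION DIM=1 VARIABLE=b complete
    for (int i = 0; i < ROWS; ++i) {
        for (int j = 0; j < COLS; ++j) {
#pragma HLS pipeline
            // One output element: the dot product of row i of a and column j of b.
            std::int32_t acc = 0;
            for (int k = 0; k < INNER; ++k) {
                acc += std::int32_t(a[i][k]) * b[k][j];
            }
            out[i][j] = acc;
        }
    }
}
```

The trade-off matches the slide's numbers in direction: more multipliers work in parallel, so the design spends more DSP blocks to finish in fewer clocks.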


• Clusters

1 MIPS/LUT


{240,000 LUTs + 600 BRAMs} ÷ 320 LUTs ≈ 750 PEs??
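The estimate above is a LUT budget divided by a per-PE cost. A minimal sketch of that arithmetic follows; the function names are hypothetical, the 8 PEs per cluster is read off the cluster figure rather than stated in the text, and the BRAM budget is not modelled because the slide gives no per-PE BRAM cost.

```cpp
#include <cassert>

// Upper bound on processing elements when LUTs are the limiting resource.
constexpr long max_pes_by_luts(long device_luts, long luts_per_pe) {
    return device_luts / luts_per_pe;
}

// Whole clusters that can be formed from a given number of PEs.
constexpr long whole_clusters(long pes, long pes_per_cluster) {
    return pes / pes_per_cluster;
}

constexpr long kDeviceLuts = 240000;  // from the slide
constexpr long kLutsPerPe = 320;      // from the slide
constexpr long kPesPerCluster = 8;    // assumption: counted from the cluster figure

static_assert(max_pes_by_luts(kDeviceLuts, kLutsPerPe) == 750,
              "240,000 / 320 = 750");
static_assert(whole_clusters(750, kPesPerCluster) == 93,
              "750 PEs fill 93 whole clusters of 8");
```

The division is exact (240,000 ÷ 320 = 750), so the "≈" and "??" on the slide presumably reflect doubt about whether routing, BRAM and interconnect overhead allow the full count, not rounding.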


[Figure: cluster block diagram. Labels: 8 × PE, 4 × IRAM each behind a 2:1 mux, 4:4 XBAR, 32 KB CRAM (cluster data RAM), accelerator(s)]


[Figure: cluster block diagram with NoC attachment. Labels: 8 × PE, 4 × IRAM each behind a 2:1 mux, 4:4 XBAR, 32 KB CRAM (cluster data RAM), accelerator(s), NOC ITF, Hoplite router, link labelled 300]
