05.03.2015 Views

High Performance Reconfigurable Computing


Anthony Agresta & Jeremy Coon


Intro: High Performance Reconfigurable Computing

Reconfigurable computing uses a high-speed reprogrammable “fabric,” which is (re)programmed as needed to solve tasks.

Takes advantage of data-level parallelism to boost performance.

Reprogramming a single FPGA as needed can be cheaper than placing several ASICs on a board.
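As a rough illustration of data-level parallelism, the sketch below applies the same operation to every element of an array using a hypothetical fabric of `num_pes` identical processing elements, each handling one element per cycle. The function name `apply_data_parallel` and the one-element-per-PE-per-cycle timing model are illustrative assumptions, not taken from the slides; the point is only that the cycle count shrinks as more copies of the operation are laid out in hardware.

```python
from typing import Callable, List, Tuple


def apply_data_parallel(
    data: List[int], op: Callable[[int], int], num_pes: int
) -> Tuple[List[int], int]:
    """Simulate num_pes identical (hypothetical) processing elements that each
    apply `op` to one element per cycle.

    Returns the results and the number of cycles the simulated fabric needs.
    """
    if num_pes < 1:
        raise ValueError("need at least one processing element")
    results: List[int] = []
    cycles = 0
    # Each loop iteration models one clock cycle in which up to num_pes
    # independent elements are processed side by side. (This simulation
    # computes them one after another; real hardware would not.)
    for start in range(0, len(data), num_pes):
        batch = data[start:start + num_pes]
        results.extend(op(x) for x in batch)
        cycles += 1
    return results, cycles


if __name__ == "__main__":
    samples = list(range(16))
    for pes in (1, 4, 16):
        out, cycles = apply_data_parallel(samples, lambda x: 3 * x + 1, pes)
        print(f"{pes:2d} PEs -> {cycles:2d} cycles, first results {out[:4]}")
```

Because no element depends on any other, the work divides cleanly among the processing elements: 16 elements take 16 cycles on one PE but a single cycle on 16 PEs.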


Computing System Element Choices

Choices, ordered from most programmable/flexible to most specialized:

General Purpose Processors (GPPs): Superscalar, VLIW

Application Specific Processors: DSPs, Network Processors, Graphics Processors, …

Re-configurable Hardware

Co-Processors

ASICs

Moving toward ASICs increases specialization, development cost/time, and performance per chip area per watt (computational efficiency), but also shortens the useful life cycle.

Hardware customization/reconfigurability, how? Change both the functionality of hardware cells (elements) and their spatial connectivity to match the requirements of the computation/application on the fly (at runtime).

Reconfigurable Computing, also known as Custom Computing Machines (CCMs), utilizes hardware devices customized to match the computation, using either FPGAs (fine grain) or micro-coded arrays of simple processors (coarse grain).

#3 lec # 9<br />

Fall 2010<br />

10-20-2010
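To make "change both the functionality of hardware cells and their spatial connectivity" concrete, here is a toy model in which a configuration is plain data: each cell is given an operation and two input sources (primary inputs or other cells). Loading a different configuration "rewires" the same three cells to compute something else. The `Fabric` class, the operation names, and the configuration format are hypothetical simplifications; real devices are configured with vendor-specific bitstreams.

```python
from typing import Callable, Dict, List, Tuple

# Operations a cell in this hypothetical toy fabric can be configured to perform.
OPS: Dict[str, Callable[[int, int], int]] = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "max": lambda a, b: max(a, b),
}

# Hypothetical configuration format: cell name -> (operation, [source_a, source_b]).
# A source is a primary input name ("x", "y") or the name of another cell.
Config = Dict[str, Tuple[str, List[str]]]


class Fabric:
    """Toy fabric: both cell functions and cell wiring come from configuration data."""

    def __init__(self) -> None:
        self.config: Config = {}
        self.output = ""

    def configure(self, config: Config, output: str) -> None:
        # Reconfiguration replaces cell functionality AND connectivity at once.
        for op, sources in config.values():
            if op not in OPS or len(sources) != 2:
                raise ValueError(f"bad cell configuration: {op} {sources}")
        self.config = dict(config)
        self.output = output

    def evaluate(self, inputs: Dict[str, int]) -> int:
        values: Dict[str, int] = dict(inputs)

        def value_of(name: str) -> int:
            # Follow the configured wiring back to the primary inputs
            # (assumes the wiring contains no combinational loops).
            if name not in values:
                op, (a, b) = self.config[name]
                values[name] = OPS[op](value_of(a), value_of(b))
            return values[name]

        return value_of(self.output)


if __name__ == "__main__":
    fabric = Fabric()
    # Configuration 1: (x + y) * (x - y)
    fabric.configure(
        {"c0": ("add", ["x", "y"]), "c1": ("sub", ["x", "y"]), "c2": ("mul", ["c0", "c1"])},
        output="c2",
    )
    print(fabric.evaluate({"x": 5, "y": 3}))  # 16
    # Configuration 2: the same three cells rewired to compute x*x + y*y
    fabric.configure(
        {"c0": ("mul", ["x", "x"]), "c1": ("mul", ["y", "y"]), "c2": ("add", ["c0", "c1"])},
        output="c2",
    )
    print(fabric.evaluate({"x": 5, "y": 3}))  # 34
```

Nothing about the `Fabric` object changes between the two runs except its configuration data, which is the sense in which the hardware is "reconfigured" rather than reprogrammed with new instructions.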


What is Reconfigurable Computing (RC)?

• Utilize reconfigurable hardware devices (spatially-programmed connections of hardware processing elements) tailored to the application.

• Customize hardware to match the computations needed/present in a particular application by changing hardware functionality on the fly (at runtime).

• Reconfigurable Computing goal: use reconfigurable hardware devices to build systems with advantages over conventional computing solutions in terms of:

- Flexibility (vs. ASICs)
- Performance and power, i.e., computational efficiency (vs. processors)
- Time-to-market (vs. ASICs)
- Life cycle cost (vs. ASICs)

“Hardware” customized to the specifics of the problem.

Direct map of problem-specific dataflow and control.

Circuits “adapted” as problem requirements change.

Still spatial computing, but neither the functionality nor the connectivity of the hardware elements is fixed.



Intro: von Neumann Architecture

A single instruction stream, executed in order

Implementations reach higher clock speeds than reconfigurable computers

Much less parallelism, due to the limitations of the architecture


Intro: High Performance Reconfigurable Computing

The processor is “rewired” as needed to perform a task in a massively parallel fashion

Slower clock speed than GPPs

Much more work gets done per cycle due to parallelism
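The trade-off on this slide (a slower clock, but far more work per cycle) comes down to simple arithmetic: throughput is roughly clock frequency times operations completed per cycle. The sketch below plugs in purely hypothetical numbers, not measurements of any real GPP or FPGA, to show how a design clocked 15 times slower can still come out ahead if it completes enough operations in parallel each cycle.

```python
def throughput_gops(clock_hz: float, ops_per_cycle: float) -> float:
    """Sustained throughput in billions of operations per second (GOP/s)."""
    return clock_hz * ops_per_cycle / 1e9


if __name__ == "__main__":
    # Hypothetical figures chosen only to illustrate the trade-off.
    gpp = throughput_gops(clock_hz=3.0e9, ops_per_cycle=4)     # fast clock, few ops per cycle
    fpga = throughput_gops(clock_hz=200e6, ops_per_cycle=256)  # slow clock, many parallel ops
    print(f"GPP:  {gpp:.1f} GOP/s")
    print(f"FPGA: {fpga:.1f} GOP/s")
    print(f"FPGA/GPP ratio: {fpga / gpp:.2f}x")
```

With these assumed figures the GPP delivers 12 GOP/s and the FPGA design 51.2 GOP/s; in practice the achievable operations per cycle depends on how much parallelism the application exposes and how much of it fits on the device.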


Spatial vs. Temporal Computing (Space vs. Time Trade-off)

Spatial (using hardware): defined by the fixed functionality and connectivity of hardware elements.

Temporal (using software/program): a processor running programs written using a pre-defined, fixed set of instructions (ISA).
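One way to see the space vs. time trade-off is to evaluate the same expression both ways. Temporally, a single shared ALU executes a fixed instruction sequence, one instruction per step, so time grows with the number of instructions. Spatially, every operation gets its own operator and all operations whose inputs are ready fire together, so time grows only with the depth of the dataflow graph, at the cost of more hardware. The instruction format, the example expression, and the one-operation-per-step timing model are illustrative assumptions.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical node/instruction format: (destination, operation, source1, source2).
Node = Tuple[str, str, str, str]

OPS: Dict[str, Callable[[int, int], int]] = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}

# y = (a + b) * (c - d) + a * d
PROGRAM: List[Node] = [
    ("t0", "add", "a", "b"),
    ("t1", "sub", "c", "d"),
    ("t2", "mul", "a", "d"),
    ("t3", "mul", "t0", "t1"),
    ("y", "add", "t3", "t2"),
]


def run_temporal(program: List[Node], env: Dict[str, int]) -> Tuple[int, int]:
    """One shared ALU executes the instructions in order: one step per instruction."""
    regs = dict(env)
    steps = 0
    for dst, op, s1, s2 in program:
        regs[dst] = OPS[op](regs[s1], regs[s2])
        steps += 1
    return regs["y"], steps


def run_spatial(program: List[Node], env: Dict[str, int]) -> Tuple[int, int]:
    """Every node has its own operator; all nodes whose inputs are ready fire together."""
    values = dict(env)
    pending = list(program)
    steps = 0
    while pending:
        ready = [n for n in pending if n[2] in values and n[3] in values]
        if not ready:
            raise ValueError("dataflow graph has an unsatisfiable dependency")
        # All ready nodes compute from the values available at the start of this step.
        results = {dst: OPS[op](values[s1], values[s2]) for dst, op, s1, s2 in ready}
        values.update(results)
        pending = [n for n in pending if n[0] not in results]
        steps += 1
    return values["y"], steps


if __name__ == "__main__":
    env = {"a": 2, "b": 3, "c": 7, "d": 4}
    print("temporal (result, steps):", run_temporal(PROGRAM, env))  # (23, 5)
    print("spatial  (result, steps):", run_spatial(PROGRAM, env))   # (23, 3)
```

Both versions produce the same result; the spatial version finishes in 3 steps instead of 5 because it spends five operators where the temporal version reuses one.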



Approaches for HPRC

• Pure FPGA approach:

○ An entire system is built around an FPGA, which is programmed as needed to solve a task.


Approaches for HPRC

• “Hybrid-core” approach:

○ An FPGA is used alongside a general purpose processor, often in the form of an FPGA expansion board or coprocessor installed in a normal computer.

○ The GPP reprograms the FPGA to do the massively parallel work best suited to it.
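A minimal sketch of the hybrid-core control flow, assuming a hypothetical `FpgaCoprocessor` that holds one configuration at a time: the host (GPP) decides which kernels to hand off, reprograms the coprocessor only when a different kernel is requested, and reuses the current configuration when the same kernel runs again. The class and method names and the kernel registry are illustrative only; no real vendor API is implied, and the "FPGA" here is simulated in software.

```python
from typing import Callable, Dict, List, Optional

Kernel = Callable[[int], int]


class FpgaCoprocessor:
    """Hypothetical simulated coprocessor holding one loaded configuration at a time."""

    def __init__(self) -> None:
        self.loaded: Optional[str] = None
        self.reconfigurations = 0
        self._kernel: Optional[Kernel] = None

    def load(self, name: str, kernel: Kernel) -> None:
        # Reconfiguring is costly on real hardware, so count how often it happens.
        self.loaded = name
        self._kernel = kernel
        self.reconfigurations += 1

    def run(self, data: List[int]) -> List[int]:
        if self._kernel is None:
            raise RuntimeError("no configuration loaded")
        # Stands in for the massively parallel work done by the fabric.
        return [self._kernel(x) for x in data]


class Host:
    """GPP side: chooses what to offload and reprograms the coprocessor as needed."""

    def __init__(self, copro: FpgaCoprocessor, kernels: Dict[str, Kernel]) -> None:
        self.copro = copro
        self.kernels = kernels

    def offload(self, name: str, data: List[int]) -> List[int]:
        if self.copro.loaded != name:  # reprogram only when the kernel changes
            self.copro.load(name, self.kernels[name])
        return self.copro.run(data)


if __name__ == "__main__":
    host = Host(FpgaCoprocessor(), {"square": lambda x: x * x, "negate": lambda x: -x})
    print(host.offload("square", [1, 2, 3]))
    print(host.offload("square", [4]))   # reuses the loaded configuration
    print(host.offload("negate", [5]))   # forces a reconfiguration
    print("reconfigurations:", host.copro.reconfigurations)
```

Tracking reconfigurations reflects a real design concern: if the host switches kernels too often, reprogramming time can eat into the gains from running on the FPGA.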


FPGAs

The FPGA, or Field Programmable Gate Array, lies at the heart of most reconfigurable computing designs

An FPGA’s function is determined long after it is manufactured, when it is programmed in the field

Programmed using a variant of C or an HDL (often VHDL or Verilog)


Fine-grain Reconfigurable Hardware Devices: FPGAs

Conventional FPGA tile: a K-input lookup table (K-LUT, typically k = 4) with an optional output flip-flop, also called a configurable logic block (CLB)

The 4-LUT logic accounts for roughly 25% of FPGA area; the remaining ~75% is taken up by the programmable interconnect (routing)
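A k-input LUT is essentially a 2^k-entry truth table held in configuration memory: the k inputs form an index that selects one stored bit, so any Boolean function of k inputs can be realized by loading the right table. The sketch below models a 4-LUT with the optional output flip-flop. The class and method names are hypothetical, and the bit ordering (input 0 as the least significant index bit) is an assumption; real devices fix their own ordering.

```python
from typing import Callable, Sequence


class Lut4:
    """Hypothetical model of a 4-input LUT with an optional output flip-flop."""

    K = 4

    def __init__(self, truth_table: int, registered: bool = False) -> None:
        # A 4-LUT stores 2**4 = 16 configuration bits, one per input combination.
        if not 0 <= truth_table < (1 << (1 << self.K)):
            raise ValueError("truth table must fit in 16 bits")
        self.table = truth_table
        self.registered = registered
        self.ff = 0  # flip-flop state; only visible when registered is True

    @classmethod
    def from_function(
        cls, f: Callable[[int, int, int, int], int], registered: bool = False
    ) -> "Lut4":
        # "Programming" the LUT: evaluate f once for every input combination.
        table = 0
        for index in range(1 << cls.K):
            bits = [(index >> i) & 1 for i in range(cls.K)]  # input i = bit i of index
            if f(*bits):
                table |= 1 << index
        return cls(table, registered)

    def lookup(self, inputs: Sequence[int]) -> int:
        # Combinational path: the inputs form an index that selects one stored bit.
        if len(inputs) != self.K:
            raise ValueError("a 4-LUT takes exactly 4 inputs")
        index = sum((bit & 1) << i for i, bit in enumerate(inputs))
        return (self.table >> index) & 1

    def clock(self, inputs: Sequence[int]) -> None:
        # Clock edge: the flip-flop captures the current LUT output.
        self.ff = self.lookup(inputs)

    def output(self, inputs: Sequence[int]) -> int:
        # Registered mode shows the stored value; otherwise the LUT drives the output.
        return self.ff if self.registered else self.lookup(inputs)


if __name__ == "__main__":
    xor4 = Lut4.from_function(lambda a, b, c, d: a ^ b ^ c ^ d)
    print(hex(xor4.table))  # 0x6996
    print(xor4.output([1, 1, 0, 1]))
```

Swapping the 16-bit table changes the function without changing the circuit, which is why a LUT is the basic unit of fine-grain reconfigurability.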



FPGA: Pros and Cons

Pros:

• FPGAs offer the reconfigurable hardware needed for HPRC

• The entire FPGA can be used every cycle

• Power consumption: better computational efficiency (performance per watt) than a GPP on suitable tasks

• Reduced time to market and startup cost compared to an ASIC

Cons:

• Lower clock speed than a GPP

• Harder to program than a GPP


Applications of FPGAs and HPRC

Programmable firmware for consumer electronics

Multi-band phones

Upgradeable firmware for consumer electronics (game consoles, etc.)


Applications of FPGAs and HPRC

Embedded systems, “systems-on-a-chip”

Hardware cryptography

Sensors

Image processing


Sample Configurable Computing Application: Prototype Video Communications System

Uses a single FPGA to perform four functions that typically require separate chips.

A memory chip stores the four circuit configurations and loads them sequentially into the FPGA as needed.

Initially, the FPGA's circuits are configured to acquire digitized video data.

The chip is then rapidly reconfigured to transform the video information into a compressed form, and reconfigured again to prepare it for transmission.

Finally, the FPGA's circuits are reconfigured to modulate and transmit the video information.

At the receiver, the four configurations are applied in reverse order to demodulate the data, decompress the image, and then send it to a digital-to-analog converter so it can be displayed on a television screen.
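The sequence described above, one device loaded in turn with four stored configurations, can be sketched as a loop over a configuration memory. The stage functions below are deliberately trivial, hypothetical stand-ins (clamping for acquisition, run-length coding for compression, a length prefix for transmission framing, an XOR for modulation), not real video codecs or modulators. The sketch only shows the time-multiplexing of a single device and the receiver applying the matching counterparts in reverse order.

```python
from typing import Callable, List, Tuple

Samples = List[int]
# Hypothetical stored configuration: (name, transmit function, receive-side inverse).
Stage = Tuple[str, Callable[[Samples], Samples], Callable[[Samples], Samples]]


def rle_encode(data: Samples) -> Samples:
    """Toy stand-in for compression: run-length encode as [value, count, ...]."""
    out: Samples = []
    for x in data:
        if out and out[-2] == x:
            out[-1] += 1
        else:
            out.extend([x, 1])
    return out


def rle_decode(data: Samples) -> Samples:
    out: Samples = []
    for value, count in zip(data[0::2], data[1::2]):
        out.extend([value] * count)
    return out


# The "memory chip": four configurations, loaded one at a time into the same device.
CONFIG_MEMORY: List[Stage] = [
    ("acquire", lambda d: [max(0, min(255, x)) for x in d], lambda d: d),  # rx: pass to DAC
    ("compress", rle_encode, rle_decode),
    ("frame", lambda d: [len(d)] + d, lambda d: d[1:1 + d[0]]),
    ("modulate", lambda d: [x ^ 0x55 for x in d], lambda d: [x ^ 0x55 for x in d]),
]


def transmit(frame: Samples) -> Tuple[Samples, List[str]]:
    log: List[str] = []
    data = frame
    for name, tx, _ in CONFIG_MEMORY:  # load each configuration in order
        log.append(f"load {name}")
        data = tx(data)
    return data, log


def receive(signal: Samples) -> Tuple[Samples, List[str]]:
    log: List[str] = []
    data = signal
    for name, _, rx in reversed(CONFIG_MEMORY):  # counterparts, in reverse order
        log.append(f"load inverse of {name}")
        data = rx(data)
    return data, log


if __name__ == "__main__":
    frame = [10, 10, 10, 200, 200, 7]
    signal, tx_log = transmit(frame)
    restored, rx_log = receive(signal)
    print(tx_log)
    print(rx_log)
    print(restored == frame)
```

Only one stage occupies the device at any moment, which is what lets a single FPGA replace four dedicated chips, provided reconfiguration is fast enough for the video rate.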



Conclusions

In the future, mainstream computers might contain a programmable FPGA coprocessor

This could allow applications to “re-wire” part of the computer to perform the required number crunching faster than a traditional CPU could


Conclusions

Modern processors have hit a physical clock-speed barrier

The best way to keep turning the transistor-count growth predicted by Moore’s Law into performance gains is increased parallelism

FPGAs and RC offer a good way to take advantage of data-level parallelism and increased transistor counts
