15.01.2013 Views

U. Glaeser


measured in megabits, not megabytes. Although external memory can always be added and wide buses employed to provide high bandwidth, this uses valuable I/O pins, and the path to external memory is likely to become a bottleneck and limit performance.

(c) “Decision-free” processing patterns: Multiplexors in the data paths will readily handle simple decisions, which switch the dataflow between down-line functional blocks, but complex decision trees will generally not map efficiently to hardware. When large numbers of branches exist, inevitably many paths are little used and thus expensive to implement in fixed hardware relative to their benefit. In particular, error-handling logic will generally be complex relative to its frequency of use. Complex decision logic is efficiently handled in high-performance modern processors, which move common logic to cache at the expense of little-used code. When branches have similar probabilities, speculative execution ensures good average rates of instruction completion. However, this criterion for successful hardware implementation should be applied with caution: if high throughput for all possible processing paths is required, then the resources devoted to implementing all paths (including little-used ones) may be justified. In the near future, dynamically reconfigurable logic may also provide effective solutions when there are complex decision trees.

(d) Ability to use local (i.e., between neighboring devices) data paths in problems that are large enough to require multiple devices: Most systems provide high-bandwidth paths between nearest neighbors, with lower-bandwidth multiple-device buses and global interconnects. The 3-D Achilles design provides more device–device data path flexibility, but at a cost: wiring patterns must be set up manually for each application [14].

(e) Integer arithmetic: Although it is possible to implement arbitrary-precision floating-point processors in FPGAs, the number of logic blocks required, and hence the delays introduced by data paths between logic blocks, make them expensive in area and low in performance compared to those found in superscalar processors.⁵ On the other hand, the ability to easily implement arbitrary-precision integer arithmetic allows a reconfigurable system designer to pack more functional units of higher performance into a given area by choosing the minimum required word length.
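
As a rough illustration of choosing the minimum required word length, the following Python sketch computes the narrowest integer width that can hold a given value range. The function name `min_word_length` and the example ranges are assumptions made for illustration only; they do not come from any particular FPGA tool flow.

```python
def min_word_length(lo: int, hi: int) -> int:
    """Return the smallest bit width able to represent every integer in [lo, hi].

    Non-negative ranges are treated as unsigned; ranges that include
    negative values are treated as two's complement.
    """
    if lo > hi:
        raise ValueError("lo must not exceed hi")
    if lo >= 0:
        # Unsigned: n bits cover 0 .. 2**n - 1 (at least one bit is always used).
        return max(1, hi.bit_length())
    # Two's complement: n bits cover -2**(n-1) .. 2**(n-1) - 1.
    magnitude_bits = max((-lo - 1).bit_length(), hi.bit_length())
    return magnitude_bits + 1  # one extra bit for the sign


if __name__ == "__main__":
    # Hypothetical example: an accumulator summing nine 8-bit pixel values.
    print(min_word_length(0, 9 * 255))
    # Hypothetical example: a signed difference of two 8-bit values.
    print(min_word_length(-255, 255))
```

Under these assumptions, an accumulator for nine 8-bit values needs 12 bits rather than a fixed 16- or 32-bit word, which is the kind of saving that lets more functional units fit into a given area.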

Image Processing

Real-time image processing presents a classic application for custom processors. A stream of pixels emanating from a camera can be passed through a wide, deep pipeline that performs as many unrelated and complex operations on each pixel as needed. Unrelated operations (e.g., thresholding and masking) are performed in parallel, and complex operations (e.g., masking) are performed in deep pipelines. For basic operations, little storage is required and the relatively inefficient memory on an FPGA suffices. A masking operation, such as applying a 3 × 3 mask to a group of neighboring pixels, requires the storage of two scanlines in a shift register and thus is feasible in large FPGAs. The reverse process, visualisation, or the processing of machine-generated images for display, is already the domain of special-purpose processors, but market volumes have justified use of ASICs.⁶
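
The two-scanline storage requirement for a 3 × 3 mask can be illustrated with a small software model. The Python sketch below is only a behavioural approximation of a streaming pipeline, not a hardware description: the function name `stream_mask_3x3`, the use of `deque` objects as stand-ins for shift registers, and the choice to emit results only where the full window lies inside the image are all assumptions made for illustration.

```python
from collections import deque


def stream_mask_3x3(pixels, width, mask):
    """Apply a 3x3 mask to a raster-order pixel stream.

    Only two scanlines of history and a 3x3 window are stored. A result is
    yielded for each position where the whole window lies inside the image.
    """
    # Fixed-length deques model shift registers one scanline long.
    line1 = deque([0] * width, maxlen=width)  # the previous `width` pixels
    line2 = deque([0] * width, maxlen=width)  # the `width` pixels before those
    window = [[0] * 3 for _ in range(3)]      # rows: two up, one up, current

    for i, p in enumerate(pixels):
        row, col = divmod(i, width)
        up1 = line1[0]  # pixel directly above the incoming one
        up2 = line2[0]  # pixel two rows above the incoming one

        # Shift the window one column left and insert the new column.
        for r in window:
            r.pop(0)
        window[0].append(up2)
        window[1].append(up1)
        window[2].append(p)

        # Advance both line buffers; maxlen discards the oldest entry.
        line2.append(up1)
        line1.append(p)

        if row >= 2 and col >= 2:
            yield sum(mask[r][c] * window[r][c]
                      for r in range(3) for c in range(3))


if __name__ == "__main__":
    image = list(range(16))            # hypothetical 4x4 test image, row-major
    box = [[1, 1, 1]] * 3              # hypothetical all-ones mask
    print(list(stream_mask_3x3(image, 4, box)))
```

Each incoming pixel produces at most one output, and the storage does not grow with image height, which is why the operation fits in on-chip FPGA memory once two scanlines can be held.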

Stereo Vision

The matching problem dominates research into fully automated stereo vision systems; it requires the comparison of pixels (or regions of pixels) to determine matches between corresponding segments of two images. The distance between matching regions on the left and right images (the disparity) is combined with camera system geometry to determine the distance to objects in the field of view. Without the apparent ability of a human brain to “jump” to the obvious match, a machine must try all possible disparities in order to find candidate matches between pixels or to correlate regions. Objects close to the camera system

⁵ Superscalar processor manufacturers are also prepared to invest large amounts in order to win benchmark competitions, which allows man-years of effort to be used to optimize individual circuits and layouts.

⁶ However, prototyping designs that are destined for ASICs is a major application for reconfigurable processors. They can be used to ensure that a design is correct and that the silicon will function correctly the first time. Some foundries will take FPGA-based designs and convert them directly to ASICs.

© 2002 by CRC Press LLC
