15.01.2013 Views

U. Glaeser


from modern FPGAs, e.g., Altera’s APEX 20K devices have built-in content addressable memories (CAMs), which would speed up the process of matching input strings with the dictionary.

Arithmetic

When designing a reconfigurable system, the widths of arithmetic function units, and hence their propagation delays, can be constrained trivially to the number of bits actually required for the application. This saves space, logic resources, and time. Designers also have considerable flexibility when complex arithmetic expressions must be evaluated; they can choose a single-stage combinatorial circuit or increase throughput by adding registers and forming a pipeline. This can often be done at essentially no cost: the logic blocks contain flip-flops already, so there is no space penalty and negligible time penalty.
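As a rough illustration of sizing a function unit to the bits actually required, the sketch below computes the width needed for an accumulator and for a multiplier result. The helper names `accumulator_width` and `product_width` are hypothetical, and the inputs are assumed to be unsigned.

```python
def accumulator_width(num_samples, sample_bits):
    """Bits needed to hold the sum of num_samples unsigned values,
    each sample_bits wide, without overflow."""
    max_sum = num_samples * ((1 << sample_bits) - 1)
    return max(1, max_sum.bit_length())


def product_width(a_bits, b_bits):
    """Bits needed to hold the product of two unsigned operands."""
    max_product = ((1 << a_bits) - 1) * ((1 << b_bits) - 1)
    return max(1, max_product.bit_length())


# Summing 16 samples from a 10-bit converter needs a 14-bit adder,
# not the 32 bits a general-purpose ALU would provide.
print(accumulator_width(16, 10))
print(product_width(10, 10))
```

A narrower adder has a shorter carry chain, which is where the propagation-delay saving mentioned above comes from.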

An application requiring floating-point arithmetic may be a poor candidate for a reconfigurable system: achieving performance comparable to that offered by a commodity processor will require significant effort. However, reconfigurable systems are excellent at processing streams of data from sensors: these data will be fixed point and readily handled by the same circuits used for integer arithmetic.
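The point that fixed-point data can reuse integer circuits can be sketched in software. The example below assumes a format with 8 fractional bits (an arbitrary choice for illustration); the names `to_fixed`, `fixed_mul`, and `to_float` are hypothetical helpers, not part of any library.

```python
FRAC_BITS = 8  # assumed number of fractional bits


def to_fixed(x, frac_bits=FRAC_BITS):
    """Scale a real value to an integer with frac_bits fractional bits."""
    return int(round(x * (1 << frac_bits)))


def to_float(a, frac_bits=FRAC_BITS):
    """Convert a fixed-point integer back to a float for display."""
    return a / (1 << frac_bits)


def fixed_mul(a, b, frac_bits=FRAC_BITS):
    """Multiply with an integer multiplier, then shift to rescale."""
    return (a * b) >> frac_bits


a = to_fixed(1.5)
b = to_fixed(2.25)
print(to_float(a + b))            # addition is plain integer addition
print(to_float(fixed_mul(a, b)))  # multiplication needs one extra shift
```

Addition needs no change at all, and multiplication adds only a shift, which in hardware is a matter of wiring rather than logic.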

CORDIC

Even trigonometric functions of fixed-point data are readily implemented using CORDIC arithmetic. CORDIC algorithms are iterative, but require only shifts and adds. Again, the designer has a large space in which to work [26]. Bit-serial designs are simple and compact, but require many cycles; this may not be a problem if the input data rate is relatively slow. An iterative bit-parallel design will require more space but fewer cycles. Finally, the iterative loop can be unrolled by one or more stages to produce the desired throughput/space balance.
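To make the shift-and-add structure concrete, here is a minimal software sketch of rotation-mode CORDIC computing sine and cosine. The word length (16 fractional bits), the iteration count, and the function name `cordic_sin_cos` are assumptions chosen for illustration; the text does not specify them. Each loop iteration corresponds to one stage that a hardware design could either reuse (iterative) or replicate (unrolled).

```python
import math

FRAC_BITS = 16          # assumed fixed-point precision
ITERATIONS = 16         # assumed number of CORDIC stages
ONE = 1 << FRAC_BITS

# Precomputed arctan(2^-i) constants, stored as fixed-point integers.
ATAN_TABLE = [int(round(math.atan(2.0 ** -i) * ONE)) for i in range(ITERATIONS)]

# The rotations stretch the vector by a known gain; starting from the
# reciprocal of that gain cancels it.
_gain = 1.0
for _i in range(ITERATIONS):
    _gain *= math.sqrt(1.0 + 2.0 ** (-2 * _i))
K_FIXED = int(round(ONE / _gain))


def cordic_sin_cos(angle):
    """Return (sin, cos) of angle in radians, for |angle| <= pi/2.

    The loop body uses only shifts, adds, and a table lookup.
    """
    x, y, z = K_FIXED, 0, int(round(angle * ONE))
    for i in range(ITERATIONS):
        if z >= 0:
            x, y, z = x - (y >> i), y + (x >> i), z - ATAN_TABLE[i]
        else:
            x, y, z = x + (y >> i), y - (x >> i), z + ATAN_TABLE[i]
    return y / ONE, x / ONE


s, c = cordic_sin_cos(math.pi / 6)
print(s, c)  # close to 0.5 and 0.866
```

The floating-point conversions at entry and exit are only for convenience here; in a hardware stream the angle and results would stay in fixed point throughout.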

String and Text Matching

Genetic sequencing is just one technology that is producing enormous databases that must be searched. Thus, there has been considerable interest in hardware to accelerate the process of comparing new sequences with those in existing databases. Biologists use a measure known as the edit distance when comparing sequences. A simple implementation of a dynamic programming algorithm can compute the edit distance in O(mn) time (m, n = lengths of the source and target sequences, respectively), but if the calculation is carried out on a processor array, all operations on a diagonal of the distance table can be performed in parallel. A single-board Splash 2 machine achieved a factor of 20 speedup over a CM-2, a massively parallel processor [27]!
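The diagonal parallelism can be seen by filling the dynamic programming table one anti-diagonal at a time. The sketch below assumes unit costs for insertion, deletion, and substitution (biological scoring schemes typically use other weights), and the function name `edit_distance_wavefront` is hypothetical. It runs sequentially, but every cell visited by the inner loop depends only on cells from the two previous anti-diagonals, so a processor array could compute them all in the same step.

```python
def edit_distance_wavefront(source, target):
    """Unit-cost edit distance, computed one anti-diagonal at a time."""
    m, n = len(source), len(target)
    # d[i][j] = edit distance between source[:i] and target[:j]
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j

    # Cells with i + j == k form one anti-diagonal; they are mutually
    # independent and could be evaluated in parallel.
    for k in range(2, m + n + 1):
        for i in range(max(1, k - n), min(m, k - 1) + 1):
            j = k - i
            cost = 0 if source[i - 1] == target[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
    return d[m][n]


print(edit_distance_wavefront("kitten", "sitting"))  # 3
```

With one processing element per cell on a diagonal, the m + n - 1 anti-diagonals give a running time proportional to m + n rather than mn.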

Similarly, full-text searching of documents for relevance has sufficient parallelism to make FPGA-based hardware effective. When document content cannot be adequately described by keywords, a searcher will supply a list of relevant words and require that every word of every document be checked against the list in order to build a relevance score for each document. Gunther et al. demonstrated that the original SPACE machine was effective in this application [28]. They used a technique called “data folding” in which the data are built into the circuitry. Match circuitry is built for each of the words in the list of relevant words and incorporated into a fixed matching structure. This is an excellent example of the power of partial reconfiguration; circuit patterns corresponding to the relevant words are loaded for each new search. They demonstrate that matching in hardware does not need to be limited to direct character-by-character matching. It is possible to implement simple regular expressions allowing, e.g., matching on the root of a word only. Overall, the system is able to test for each word in the relevant list in parallel and aggregate a weighted relevance score as the document is read. Results become available at a rate that is essentially limited by the rate at which documents can be read from disk.
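The scoring scheme described above can be mimicked in software as follows. This is only a sketch of the idea, not the SPACE machine's design: the function name `relevance_score`, the example patterns, and the weights are all invented for illustration. Where the hardware tests every pattern in parallel, the inner loop here tests them one after another.

```python
import re


def relevance_score(document, weighted_patterns):
    """Sum the weights of all list entries matched by words in document.

    weighted_patterns maps a regular expression (applied to a whole
    lowercase word) to its weight.
    """
    compiled = [(re.compile(p), w) for p, w in weighted_patterns.items()]
    score = 0
    for word in re.findall(r"[a-z]+", document.lower()):
        # In hardware, each pattern has its own match circuit and all
        # of them see the word simultaneously.
        for pattern, weight in compiled:
            if pattern.fullmatch(word):
                score += weight
    return score


patterns = {
    r"fpga": 3,              # exact word
    r"reconfigur[a-z]*": 2,  # match on the root of a word only
}
text = "Reconfigurable systems use FPGA devices; reconfiguration is fast."
print(relevance_score(text, patterns))  # 7
```

Loading a new dictionary of patterns for each search plays the role that partial reconfiguration plays in the hardware version.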

Simulations

Cellular automata map readily to reconfigurable systems. They involve arrays of cells: each cell is a simple finite state machine whose behavior depends only on its current state and the state of cells in its immediate environment. Milne extends the fundamental cellular automata concept by removing the restrictions on identical components, uniform update and synchronization of all updates to create generalized cellular

© 2002 by CRC Press LLC
