
Architecture of Computing Systems (Lecture Notes in Computer ...



A Tightly Coupled Accelerator Infrastructure for Exact Arithmetics

Fabian Nowak and Rainer Buchty

Chair for Computer Architecture
Karlsruhe Institute of Technology
76128 Karlsruhe, Germany
{nowak,buchty}@kit.edu

Abstract. Processor speed and available computing power constantly increase, enabling the computation of increasingly complex problems such as numerical simulations of physical processes. In this domain, however, accuracy problems arise from the rounding of intermediate results. One solution is to avoid intermediate rounding by using exact arithmetic. Using FPGAs as application-specific accelerators can speed up such operations compared to their software implementations. In this paper, we present a system approach employing state-of-the-art FPGA and interconnection technology for exact arithmetic with double-precision operands, delivering up to 400M exact MACs/s in total and providing a speedup of up to 88 times over competing software implementations in the case of matrix multiplication.

1 Introduction

With the computation of increasingly complex problems, accuracy issues have arisen from the rounding performed in current FPU implementations. These effects may lead to unsatisfactory results and even to physical damage if simulations carried out up front fail to reveal certain problems. This problem is typically addressed by exact arithmetics built into mathematics and simulation packages. Such arithmetics increase the width of the internal data representation or concatenate several floating-point numbers in order to enlarge the “accuracy window” in which no rounding is necessary. While these software implementations provide a viable workaround, they are orders of magnitude slower than native hardware support. Custom accelerator hardware can speed up such operations using several design techniques, e.g. pipelined execution and parallelization.
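
To illustrate what "concatenating several floating-point numbers" can look like in software, the following minimal Python sketch uses Knuth's TwoSum error-free transformation to carry the rounding error of each addition in a second double. This is a generic textbook technique chosen here for illustration; it is not claimed to be the scheme used by any particular package or by the accelerator presented in this paper, and the function names are hypothetical.

```python
def two_sum(a, b):
    """Knuth's TwoSum: return (s, e) where s is the rounded sum a + b
    and e is the rounding error, so that a + b == s + e exactly."""
    s = a + b
    b_virtual = s - a
    a_virtual = s - b_virtual
    e = (a - a_virtual) + (b - b_virtual)
    return s, e


def naive_sum(values):
    """Plain left-to-right summation; every addition may round."""
    total = 0.0
    for v in values:
        total += v
    return total


def compensated_sum(values):
    """Keep the running sum in `hi` and the collected rounding errors in `lo`.

    Assumption for this sketch: `lo` is itself a double, so the result is
    more accurate than naive summation but not exact in general.
    """
    hi, lo = 0.0, 0.0
    for v in values:
        hi, e = two_sum(hi, v)
        lo += e
    return hi + lo


# The small term 1.0 is lost when it is added to 1e16 in double precision.
values = [1e16, 1.0, -1e16]
naive = naive_sum(values)
compensated = compensated_sum(values)
print(naive, compensated)
```

For this input the naive loop loses the small addend entirely, whereas the pair of doubles retains it. The price is several extra floating-point operations per addition, which is one source of the slowdown of software solutions mentioned above.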

We therefore present an accelerator system for exact arithmetics. Based on a careful examination of the underlying algorithm for implementing exact arithmetics, we propose a hardware solution that employs pipelining techniques and exploits parallelism, delivering up to 400M exact MAC operations per second.
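
As a software reference for what an exact multiply-accumulate (MAC) computes, the sketch below accumulates the products of double-precision operands without any intermediate rounding and rounds only once at the end. It assumes Python's `fractions.Fraction` as a stand-in for a sufficiently wide accumulator; the names `exact_dot` and `float_dot` are hypothetical, and the code says nothing about how the hardware described in this paper represents its accumulator.

```python
from fractions import Fraction


def float_dot(xs, ys):
    """Conventional dot product: each multiply and each add may round."""
    acc = 0.0
    for x, y in zip(xs, ys):
        acc += x * y
    return acc


def exact_dot(xs, ys):
    """Dot product with exact MACs.

    Every double is converted to an exact rational, so products and sums
    are computed without rounding. Only the final conversion to float rounds.
    """
    acc = Fraction(0)
    for x, y in zip(xs, ys):
        acc += Fraction(x) * Fraction(y)  # one exact MAC step
    return float(acc)  # single rounding at the very end


# The middle product (1.0) is absorbed by 1e16 in plain double arithmetic.
xs = [1e8, 1.0, -1e8]
ys = [1e8, 1.0, 1e8]
rounded_result = float_dot(xs, ys)
exact_result = exact_dot(xs, ys)
print(rounded_result, exact_result)
```

Arbitrary-precision rationals are convenient for checking results but slow, which is the kind of software cost that motivates moving the exact MAC into hardware.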

In the remainder of this paper, we first outline related work in Section 2. We then present our general architecture design in Section 3 before discussing implementation issues and viable solutions in Section 4. The elaborated design is thoroughly evaluated in Section 5. We conclude by summarizing the results and presenting our plans for future work in Section 6.

C. Müller-Schloer, W. Karl, and S. Yehia (Eds.): ARCS 2010, LNCS 5974, pp. 222–233, 2010.

© Springer-Verlag Berlin Heidelberg 2010
