
Elektronika 2010-11.pdf - Instytut Systemów Elektronicznych ...


[Figure: block diagram. Labels recoverable from the drawing: nreset, clk, enable, zero, angle, iteration / FSM state, ROM 24×29 (rotation_angle 1 … rotation_angle 24), di, ±, z1, z2, angle L, angle R.]

Fig. 5. SVD architecture: calculation of the rotation angle
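The rotation-angle block of Fig. 5 can be illustrated with a minimal software sketch of CORDIC in vectoring mode. Several things here are assumptions rather than facts taken from the design: the 24 iterations are read from the "ROM 24×29" label, the table is assumed to hold arctan(2^-i), Python floats stand in for the hardware number format, and the names cordic_vectoring, ANGLE_ROM and CORDIC_GAIN are hypothetical.

```python
import math

N_ITER = 24  # assumption: one ROM entry per iteration, as the "ROM 24x29" label suggests

# Assumed ROM contents: elementary rotation angles arctan(2^-i)
ANGLE_ROM = [math.atan(2.0 ** -i) for i in range(N_ITER)]

# Constant gain accumulated by the N_ITER micro-rotations
CORDIC_GAIN = 1.0
for _i in range(N_ITER):
    CORDIC_GAIN *= math.sqrt(1.0 + 4.0 ** -_i)


def cordic_vectoring(x, y, n_iter=N_ITER):
    """Rotate (x, y) onto the x axis with shift-and-add steps.

    Returns (angle, magnitude): the accumulated angle approximates
    atan2(y, x) for x > 0, and the magnitude is the gain-corrected
    length of the input vector.
    """
    z = 0.0
    for i in range(n_iter):
        shift = 2.0 ** -i  # a right shift by i bits in hardware
        if y >= 0.0:
            # y is above the axis: rotate clockwise, add the angle
            x, y = x + y * shift, y - x * shift
            z += ANGLE_ROM[i]
        else:
            # y is below the axis: rotate counter-clockwise, subtract the angle
            x, y = x - y * shift, y + x * shift
            z -= ANGLE_ROM[i]
    return z, x / CORDIC_GAIN


angle, magnitude = cordic_vectoring(1.0, 1.0)
print(angle, magnitude)
```

The sign decision in each iteration needs only the sign of y, which is why the hardware can get by with adders, shifters and a small angle table.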

[Figure: two plots of |Δσ1/σ1| (vertical axis, 0.0 to 1.4×10^-6) against σ1 (horizontal axis, 10^-23 to 10^37).]

Fig. 6. Relative error of singular value determination for two kinds of arithmetic: 25-bit fixed point (lower plot) and 32-bit floating point (upper plot)

The SVD architecture was implemented in VHDL in both arithmetic variants and synthesized for a Xilinx Virtex-5 device. Synthesis results are summarized in Table 2. In terms of resource allocation, the number of registers used does not differ much between the variants. The floating-point variant, however, consumes much more combinatorial logic. The difference in maximum clock speed is also large: 148 MHz for the fixed-point version and only 35 MHz for the floating-point one. Arithmetic operations on floating-point numbers require long chains of combinatorial logic, which increase the time a signal needs to propagate from one register to the next.
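The relation between clock speed and register-to-register delay can be made concrete with a short calculation. The sketch assumes only that the minimum clock period is the reciprocal of the maximum clock frequency. The helper name clock_period_ns is hypothetical, and the two frequencies are the ones quoted above.

```python
def clock_period_ns(f_max_mhz):
    """Minimum clock period in nanoseconds for a given maximum frequency in MHz."""
    return 1000.0 / f_max_mhz


fixed_ns = clock_period_ns(148.0)  # fixed-point variant
float_ns = clock_period_ns(35.0)   # floating-point variant

print(f"fixed point:    {fixed_ns:.2f} ns per cycle")
print(f"floating point: {float_ns:.2f} ns per cycle")
print(f"ratio:          {float_ns / fixed_ns:.2f}")
```

The periods come out at about 6.76 ns and 28.57 ns, so the longest register-to-register path of the floating-point variant is roughly 4.2 times slower than that of the fixed-point one.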

Both variants were simulated in the Xilinx ISE environment for several sample matrices. The results were written to a file, converted, and compared with those given by an SVD algorithm run on a computer (Octave tools). Fig. 6 shows the relative errors obtained for the two architectures. Somewhat surprisingly, fixed-point arithmetic delivers substantially better results.
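The error metric of Fig. 6 can be sketched in software as follows. This is only an illustration of how a relative error |Δσ1/σ1| is formed against a double-precision reference. It emulates the quantization of the matrix entries alone and does not model the hardware datapath. The 24 fractional bits of the fixed-point grid, the sample matrix and all function names are assumptions.

```python
import math
import struct


def singular_values_2x2(a, b, c, d):
    """Closed-form singular values of [[a, b], [c, d]], largest first."""
    e, f = (a + d) / 2.0, (a - d) / 2.0
    g, h = (c + b) / 2.0, (c - b) / 2.0
    q, r = math.hypot(e, h), math.hypot(f, g)
    return q + r, abs(q - r)


def to_float32(x):
    """Round x to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def to_fixed(x, frac_bits=24):
    """Round x to a fixed-point grid with frac_bits fractional bits (assumed format)."""
    scale = 1 << frac_bits
    return round(x * scale) / scale


def relative_error(sigma, sigma_ref):
    """Relative error |delta sigma / sigma| against a reference value."""
    return abs((sigma - sigma_ref) / sigma_ref)


matrix = (0.3, -0.7, 0.5, 0.2)  # hypothetical sample matrix, row by row
sigma_ref = singular_values_2x2(*matrix)[0]

for name, quantize in (("float32 inputs", to_float32), ("fixed inputs", to_fixed)):
    sigma = singular_values_2x2(*(quantize(v) for v in matrix))[0]
    print(name, relative_error(sigma, sigma_ref))
```

In a comparison like the one described above, the value passed as sigma would come from the simulation output file and sigma_ref from the reference tool.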

Conclusions

A study of digital hardware dedicated to Singular Value Decomposition was performed. The motivation was the authors' interest in building specialized computing machines that perform operations on matrices in a highly parallel way. Significant effort was devoted to the CORDIC algorithm, which was used here for SVD but may also be treated as a separate issue. The results lead to the conclusion that contemporary FPGAs are very close to enabling the construction of machines that deal with huge computational complexity.

The presented results, although limited to small matrices, are a good basis for further work and already provide reasonable comparative material on the architecture and arithmetic variants. In this context the results obtained for fixed and floating point are very interesting. As expected, the fixed-point approach provides higher processing speed and lower allocation of logic resources. The surprising result was the higher precision obtained with fixed point. It should be noted, however, that the 25-bit representation was selected after very careful consideration and estimation.

Further research will focus on building devices that handle matrices of higher dimension, possibly with the processing decomposed into basic 2x2 elements, so that the described modules can be used without any redesign. An advantage of this approach is the chance to develop a methodology for processing matrices of unlimited dimension with a limited number of basic SVD/CORDIC units. That would enable optimal utilization of currently available resources while remaining at least partly independent of input complexity.
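One way to picture the decomposition of a larger problem into 2x2 steps is a Jacobi-type iteration. The sketch below uses the one-sided (Hestenes) Jacobi method, which is a swapped-in textbook technique chosen for brevity and not necessarily the scheme the authors have in mind. Every step touches only one pair of columns and solves a 2x2 problem for the rotation. The function name jacobi_singular_values and its parameters are hypothetical.

```python
import math


def jacobi_singular_values(a, sweeps=30, tol=1e-12):
    """Singular values of a matrix (list of rows) by one-sided Jacobi rotations."""
    m, n = len(a), len(a[0])
    u = [row[:] for row in a]  # work on a copy
    for _ in range(sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                # 2x2 Gram matrix of columns p and q
                alpha = sum(u[k][p] ** 2 for k in range(m))
                beta = sum(u[k][q] ** 2 for k in range(m))
                gamma = sum(u[k][p] * u[k][q] for k in range(m))
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue  # columns already orthogonal
                rotated = True
                # Rotation that makes the two columns orthogonal
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                for k in range(m):
                    up, uq = u[k][p], u[k][q]
                    u[k][p] = c * up - s * uq
                    u[k][q] = s * up + c * uq
        if not rotated:
            break  # a full sweep without rotations: converged
    # Singular values are the norms of the orthogonalized columns
    norms = [math.sqrt(sum(u[k][j] ** 2 for k in range(m))) for j in range(n)]
    return sorted(norms, reverse=True)


print(jacobi_singular_values([[2.0, 0.0, 1.0], [0.0, 3.0, 0.0], [1.0, 0.0, 2.0]]))
```

Because each rotation involves only two columns, rotations on disjoint column pairs are independent of each other, which is what would allow a limited number of identical units to be reused across a matrix of any size.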


Elektronika 11/2010, p. 29
