Elektronika 2010-11.pdf - Instytut Systemów Elektronicznych ...
[Figure 5: block diagram. Labels: nreset, clk, enable, zero, iteration / FSM state, angle R, angle L, z 1, Z 2, ± d_i, ROM 24×29 holding rotation_angle 1 … rotation_angle 24.]
Fig. 5. SVD architecture: calculation of the rotation angle
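The angle block of Fig. 5 (a table of 24 rotation angles and a z accumulator updated with ± d_i) resembles the vectoring mode of CORDIC. The following is a minimal sketch of that mode, not the authors' VHDL: it assumes the table stores atan(2^-i), uses Python floats instead of fixed-width words, and the names `ANGLE_ROM` and `cordic_angle` are hypothetical.

```python
import math

N_ITER = 24  # matches the 24 rotation_angle entries listed in Fig. 5

# Elementary angles atan(2^-i); assumed here to be the ROM contents.
ANGLE_ROM = [math.atan(2.0 ** -i) for i in range(N_ITER)]


def cordic_angle(x, y):
    """Estimate atan2(y, x) for x > 0 with CORDIC in vectoring mode."""
    z = 0.0
    for i in range(N_ITER):
        # Choose the direction that drives y towards zero.
        d = 1.0 if y < 0 else -1.0
        # Micro-rotation; multiplying by 2^-i is a plain shift in hardware.
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        # Accumulate the angle that was just applied.
        z -= d * ANGLE_ROM[i]
    return z
```

With 24 iterations the residual angle is bounded by the last table entry, atan(2^-23), which is about 1.2×10^-7 rad.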
[Figure 6: plot of |Δσ1/σ1| (vertical axis, 0.0 to 1.4×10^-6) against σ1 (horizontal axis, 10^-23 to 10^37).]
Fig. 6. Relative error of singular value determination for two arithmetic variants: 25-bit fixed point (lower plot) and 32-bit floating point (upper plot)
The SVD architecture with two variants of arithmetic was implemented
in VHDL and synthesized for a Xilinx Virtex-5 device.
Synthesis results are summarized in Table 2. Comparing the
allocation of resources, there is no large difference in the number
of registers used. On the other hand, the floating point variant
consumes much more combinatorial logic. There is also a large
difference in maximum clock speed: 148 MHz for the fixed point
version and only 35 MHz for the floating point one.
Arithmetic operations on floating point numbers require long
chains of combinatorial logic, which increase the time a signal
needs to propagate from one register to the next.
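To relate the two clock figures to register-to-register delay, the short sketch below converts each maximum frequency into the minimum clock period it implies. Only the two frequencies come from the text; the function name is hypothetical.

```python
def clock_period_ns(f_max_mhz):
    """Minimum clock period in nanoseconds for a maximum frequency in MHz."""
    return 1000.0 / f_max_mhz


fixed_ns = clock_period_ns(148.0)  # fixed point variant, about 6.76 ns
float_ns = clock_period_ns(35.0)   # floating point variant, about 28.57 ns
ratio = float_ns / fixed_ns        # about 4.23 times longer critical path
```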
The two variants were simulated in the Xilinx ISE environment
for several sample matrices. The results were written to a file,
converted and compared with those given by the SVD algorithm
run on a computer (Octave tools). Fig. 6 shows plots of the relative
errors obtained for the two architectures. It is somewhat surprising
that fixed point arithmetic delivers substantially better results.
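The comparison metric of Fig. 6 can be sketched as follows. This is an illustration under stated assumptions, not the authors' test bench: the reference value is computed from the closed-form expression for a 2×2 matrix in double precision (standing in for the Octave result), and the "device" value is simply that reference rounded to single precision (standing in for a value read from simulation). All function names and the sample matrix are hypothetical.

```python
import math
import struct


def sigma_max_2x2(a, b, c, d):
    """Largest singular value of [[a, b], [c, d]] in closed form."""
    s = a * a + b * b + c * c + d * d  # squared Frobenius norm
    det = a * d - b * c
    disc = math.sqrt(max(s * s - 4.0 * det * det, 0.0))
    return math.sqrt((s + disc) / 2.0)


def to_float32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def relative_error(computed, reference):
    """|delta sigma / sigma|, the quantity on the vertical axis of Fig. 6."""
    return abs(computed - reference) / abs(reference)


reference = sigma_max_2x2(3.0, 0.0, 4.0, 5.0)  # stand-in for the Octave value
device = to_float32(reference)                  # stand-in for a simulated result
err = relative_error(device, reference)
```

Rounding alone gives an error of at most 2^-24; the errors accumulated over a full sequence of arithmetic operations would be larger.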
Conclusions
A study of digital hardware dedicated to Singular Value Decomposition
was performed. The motivation was the authors' interest in the
construction of specialized computing machines performing operations
on matrices in a highly parallel way. Significant effort was
devoted to the CORDIC algorithm, which was used for SVD but may
also be treated as a separate issue. The results lead to the conclusion
that contemporary FPGAs are very close to enabling the construction
of machines dealing with huge computational complexity.
The presented results, although limited to small matrices, are a good
basis for further work and already at this stage deliver reasonable
comparative material on architecture and arithmetic variants. In
this context the results obtained for fixed and floating point are
very interesting. As expected, the fixed point approach provides
higher processing speed and lower logic resource allocation.
A surprising result was the higher precision obtained with fixed
point. It should be noted, however, that the 25-bit representation was
selected after very careful consideration and estimation.
Further research will focus on the construction of devices dealing
with matrices of higher dimension, perhaps with processing
decomposed into basic 2×2 elements, so that the described
modules may be used without any redesign. An advantage of
this approach is the chance to develop a methodology for processing
matrices of unlimited dimension with a limited number
of basic SVD/CORDIC units. That would enable optimal utilization
of currently available resources, at least partially
independent of input complexity.
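One standard way to reduce a larger SVD to a sequence of 2×2 problems is the one-sided (Hestenes) Jacobi method, sketched below in software. It is offered only as an illustration of the decomposition idea; the text does not state which scheme the authors intend to use. In this sketch each plane rotation is computed in floating point, whereas in hardware it would presumably be handled by a basic SVD/CORDIC unit. The function name and parameters are hypothetical.

```python
import math


def jacobi_singular_values(a, sweeps=30, tol=1e-12):
    """Singular values of a (rows >= cols) via one-sided Jacobi rotations."""
    a = [row[:] for row in a]  # work on a copy
    rows, cols = len(a), len(a[0])
    for _ in range(sweeps):
        rotated = False
        for p in range(cols - 1):
            for q in range(p + 1, cols):
                # Entries of the 2x2 Gram matrix of columns p and q.
                alpha = sum(a[k][p] * a[k][p] for k in range(rows))
                beta = sum(a[k][q] * a[k][q] for k in range(rows))
                gamma = sum(a[k][p] * a[k][q] for k in range(rows))
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue  # columns are already orthogonal
                rotated = True
                # Rotation (c, s) that makes the two columns orthogonal.
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                for k in range(rows):
                    ap, aq = a[k][p], a[k][q]
                    a[k][p] = c * ap - s * aq
                    a[k][q] = s * ap + c * aq
        if not rotated:
            break  # a full sweep without rotations: converged
    # Singular values are the norms of the orthogonalized columns.
    norms = [math.sqrt(sum(a[k][j] ** 2 for k in range(rows))) for j in range(cols)]
    return sorted(norms, reverse=True)
```

Rotations on disjoint column pairs are independent of one another, so a limited pool of 2×2 units could process them in parallel and be reused across sweeps.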
Elektronika 11/2010, p. 29