Elektronika 2010-11.pdf - Instytut Systemów Elektronicznych ...


cking speed available. The two concepts were implemented in VHDL, verified and synthesized with Xilinx ISE tools for a Virtex-5 programmable device. For this comparative study, fixed-point arithmetic with 8-bit numbers coded in two's complement was applied. The synthesis results summarized in Table 1 clearly show the difference between the low-cost (sequential) and high-speed (pipelined) approaches.

Table 1. Synthesis results for two variants of CORDIC architecture (Xilinx Virtex-5 FPGA)

Architecture                 Sequential          Pipelined
Number of Slice Registers    56                  208
Number of Slice LUTs         151                 243
Clock frequency              257 MHz             428 MHz
Levels of Logic              10                  2
Delay                        3.891 ns            2.336 ns
Delay on Logic               1.612 ns (41.4%)    0.659 ns (28.2%)
Delay on Route               2.279 ns (58.6%)    1.677 ns (71.8%)

The general concept of the SVD architecture based on CORDIC modules is presented in Fig. 3. The input is a basic 2×2 matrix. The primary outputs are the two singular values; the secondary outputs are the rotation angles. This module, either replicated or reused, may be applied to the construction of digital hardware working with bigger matrices.

Fig. 3. Basic SVD architecture composed of CORDIC blocks (inputs: elements a, b, c, d of the 2×2 matrix; blocks: two CORDIC units and four shift-sum units; outputs: σ1, σ2, θp, θl)

A detailed schematic of the vector rotation block is presented in Fig. 4. It is a synchronous machine based on a single CORDIC element reused in consecutive iterations. The CORDIC output is fed back to the input via a register until the final value is obtained and latched. The rotation angle is delivered by the module shown in Fig. 5. Its arithmetic block is likewise reused in consecutive iterations, so its output is also fed back. The angles of the elementary rotations are stored in a memory. Data flow in these two modules is controlled by a finite state machine working together with an iteration counter.
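The iterative scheme described above (one shift-and-add stage reused for every iteration, elementary angles read from a memory, scale corrected at the end) can be illustrated with a short software model. This is only a sketch of the textbook CORDIC recurrence, not the authors' VHDL design. The word format (23 fractional bits), the iteration count (24, suggested by the 24-entry ROM in Fig. 5) and all names such as `ATAN_ROM`, `cordic_rotate` and `cordic_vector` are assumptions made for illustration.

```python
import math

FRAC_BITS = 23            # assumed number of fractional bits
N_ITER = 24               # assumed iteration count
ONE = 1 << FRAC_BITS      # fixed-point representation of 1.0

# "ROM" with elementary rotation angles atan(2^-i) as fixed-point integers
ATAN_ROM = [round(math.atan(2.0 ** -i) * ONE) for i in range(N_ITER)]

# Every iteration stretches the vector; the accumulated gain is removed once
GAIN = math.prod(math.sqrt(1.0 + 4.0 ** -i) for i in range(N_ITER))
INV_GAIN = round(ONE / GAIN)


def to_fixed(value):
    return round(value * ONE)


def to_float(word):
    return word / ONE


def cordic_rotate(x, y, z):
    """Rotation mode: rotate (x, y) by angle z; all arguments are integers."""
    for i in range(N_ITER):
        d = 1 if z >= 0 else -1                      # direction of this step
        x, y = x - d * (y >> i), y + d * (x >> i)    # shift-and-add stage
        z -= d * ATAN_ROM[i]                         # residual angle
    # scale correction of the outputs
    return (x * INV_GAIN) >> FRAC_BITS, (y * INV_GAIN) >> FRAC_BITS


def cordic_vector(x, y):
    """Vectoring mode (x > 0): drive y to zero, return (magnitude, angle)."""
    z = 0
    for i in range(N_ITER):
        d = -1 if y >= 0 else 1
        x, y = x - d * (y >> i), y + d * (x >> i)
        z -= d * ATAN_ROM[i]
    return (x * INV_GAIN) >> FRAC_BITS, z


if __name__ == "__main__":
    xr, yr = cordic_rotate(to_fixed(1.0), 0, to_fixed(0.5))
    print(to_float(xr), to_float(yr))      # close to cos(0.5), sin(0.5)
    mag, ang = cordic_vector(to_fixed(0.3), to_fixed(0.4))
    print(to_float(mag), to_float(ang))    # close to 0.5, atan2(0.4, 0.3)
```

In this model the loop body plays the role of the single reused CORDIC element, and the loop index corresponds to the iteration counter. The rotation mode converges only for angles up to about 1.74 rad in magnitude, which is the sum of all elementary angles.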
Activation of the strobe signal triggers calculation of the angle, followed by the remaining processing steps: the left and right rotations and the correction of the output scale, which is distorted during the iterative approximation.

For this part of the study, two number formats and kinds of arithmetic were applied. In the first approach, floating-point numbers compatible with the IEEE 754 standard [9] were used. In this format the bit vector consists of a sign bit, an 8-bit exponent coded in two's complement and a 23-bit non-negative significand. The other approach was fixed-point arithmetic with 25-bit vectors coded in two's complement. For the constant angles a specific format was chosen: fixed point with 2 bits reserved for the integer part and the rest left for the fraction (the possible angle values, expressed in radians, do not exceed 2). The CORDIC module described in the previous section was redesigned twice, once for each of these two formats.

Table 2. Synthesis results for two variants of SVD architecture (Xilinx Virtex-5 FPGA)

                             32-bit IEEE floating point    25-bit fixed point
Clock frequency              35 MHz                        148 MHz
Levels of Logic              74                            35
Delay                        28.602 ns                     6.738 ns
Number of Slice Registers    337 (1%)                      314 (1%)
Number of Slice LUTs         4648 (14%)                    2609 (7%)

Fig. 4. SVD architecture: vector rotation block (block labels: CORDIC and shift-sum units; signals: nreset, clk, enable, iteration, FSM state; outputs: Out 1, Out 2)

Elektronika 11/2010
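The relation between a 2×2 input matrix, its two singular values and the left and right rotation angles can be illustrated with a short floating-point sketch. It uses a standard closed-form identity for 2×2 matrices and is not necessarily the data path used in the described hardware. Here `math.hypot` and `math.atan2` stand in for what a CORDIC block in vectoring mode would deliver, and the names `svd2x2`, `rot`, `matmul` and `reconstruct` are hypothetical.

```python
import math


def rot(t):
    """Counter-clockwise 2x2 rotation matrix R(t)."""
    c, s = math.cos(t), math.sin(t)
    return [[c, -s], [s, c]]


def matmul(p, q):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def svd2x2(a, b, c, d):
    """Return (s1, s2, theta_l, theta_r) with
    [[a, b], [c, d]] = R(theta_l) * diag(s1, s2) * R(theta_r).
    The singular values are s1 and abs(s2)."""
    # Half-sums and half-differences (adders and one-bit shifts in hardware)
    e, f = (a + d) / 2, (a - d) / 2
    g, h = (c + b) / 2, (c - b) / 2
    # Two vectoring operations, each giving a magnitude and an angle
    q, a2 = math.hypot(e, h), math.atan2(h, e)
    r, a1 = math.hypot(f, g), math.atan2(g, f)
    s1, s2 = q + r, q - r            # s2 is negative when det < 0
    theta_l = (a2 + a1) / 2          # left rotation angle
    theta_r = (a2 - a1) / 2          # right rotation angle
    return s1, s2, theta_l, theta_r


def reconstruct(s1, s2, theta_l, theta_r):
    """Rebuild the matrix from its factors, as a consistency check."""
    diag = [[s1, 0.0], [0.0, s2]]
    return matmul(matmul(rot(theta_l), diag), rot(theta_r))


if __name__ == "__main__":
    s1, s2, tl, tr = svd2x2(3.0, 1.0, 2.0, 4.0)
    print(s1, abs(s2), tl, tr)
    print(reconstruct(s1, s2, tl, tr))
```

The two magnitude and angle pairs are exactly the kind of result that two vectoring CORDIC units could supply in parallel, while the half-sums need only adders and a one-bit shift. Whether the shift-sum blocks in Fig. 3 play this role is an assumption.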
Fig. 5. SVD architecture: calculation of the rotation angle (block labels: ROM 24×29 holding rotation_angle 1 … rotation_angle 24; signals: nreset, clk, enable, iteration, FSM state; outputs: angle R, angle L)

Fig. 6. Relative error of singular value determination for the two kinds of arithmetic: 25-bit fixed point (lower plot) and 32-bit floating point (upper plot). Vertical axis: |Δσ1/σ1|, ticks from 0 to 1.4×10^-6; horizontal axis: σ1, ticks from 10^-23 to 10^37.

The SVD architecture with the two variants of arithmetic was implemented in VHDL and synthesized for a Xilinx Virtex-5 device. The synthesis results are summarized in Table 2. In terms of resource allocation, there is no large difference in the number of registers used (337 versus 314). On the other hand, the floating-point variant consumes much more combinatorial logic (4648 versus 2609 slice LUTs). There is also a large difference in maximum clock speed: 148 MHz for the fixed-point version and only 35 MHz for the floating-point one. Arithmetic operations on floating-point numbers require long chains of combinatorial logic (74 levels of logic versus 35), so a signal needs more time to propagate from one register to the next. The two variants were simulated in the Xilinx ISE environment for several sample matrices. The results were written to a file, converted and compared with those given by an SVD algorithm run on a computer (Octave). Fig. 6 shows plots of the relative errors obtained for the two architectures. Somewhat surprisingly, fixed-point arithmetic delivers substantially better results.

Conclusions

A study of digital hardware dedicated to Singular Value Decomposition was performed.
The motivation was the authors' interest in constructing specialized computing machines that perform matrix operations in a highly parallel way. Significant effort was devoted to the CORDIC algorithm, which was used for SVD but may be treated as a separate issue as well. The results lead to the conclusion that contemporary FPGAs are very close to enabling the construction of machines dealing with huge computational complexity. The presented results, although limited to small matrices, are a good basis for further work and already provide reasonable comparative material on architecture and arithmetic variants. In this context the results obtained for fixed and floating point are very interesting. As expected, the fixed-point approach provides higher processing speed and lower allocation of logic resources. A surprising result was the higher precision obtained with fixed point. It should be noted, however, that the 25-bit representation was selected after very careful consideration and estimation.

Further research will focus on the construction of devices dealing with matrices of higher dimension, perhaps with processing decomposed into basic 2×2 elements, so that the described modules may be used without any redesign. An advantage of this approach is the chance to develop a methodology for processing matrices of unlimited dimension with a limited number of basic SVD/CORDIC units. That would enable optimal utilization of currently available resources, at least partially independent of input complexity.

References

[1] Eckart C., Young G.: The approximation of one matrix by another of lower rank. Psychometrika, vol. 1, no. 3, 1936.
[2] Volder J.E.: The CORDIC Trigonometric Computing Technique. IRE Transactions on Electronic Computers, 1959.
[3] Golub G., Kahan W.: Calculating the singular values and pseudo-inverse of a matrix. J. SIAM Numerical Analysis, Ser. B, vol. 2, no. 2, 1965, pp. 205–224.
[4] Brent R.P., Luk F.T., Van Loan C.F.: Computation of the singular value decomposition using mesh-connected processors. Journal for VLSI Computer Systems, vol. 1, no. 3, 1985, pp. 243–270.
[5] Cavallaro J.R., Luk F.T.: CORDIC Arithmetic for a SVD Processor. Journal for Parallel and Distributed Computing, vol. 5, 1988, pp. 271–290.
[6] Andraka R.: A Survey of CORDIC Algorithms for FPGA based computers. In FPGA '98: Proc. of sixth international symposium on Field programmable gate arrays ACM/SIGDA, 1998, pp. 191–200.
[7] Deprettere F. (ed.): SVD and signal processing. Algorithms, applications and architectures. Department of Electrical Engineering, Delft University of Technology, Elsevier Science Publishers B.V., Amsterdam, 1988.
[8] Wang H., Leray P., Palicot J.: A CORDIC-based dynamically reconfigurable FPGA architecture for signal processing algorithms. URSI 08, The XXIX General Assembly of the International Union of Radio Science, Chicago IL, 2008.
[9] Floating-point arithmetic, IEEE Std No. 754, 2008.