15.01.2013 Views

U. Glaeser

FIGURE 9.30 Block diagram of a high-speed parallel multiplier.

multiplication requires a 54 × 54-bit hardware multiplier because the mantissa is represented internally by 52 bits, and a hidden bit and a sign bit must be added to handle 2’s complement numbers.
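
The bit counting above can be checked with a short Python sketch, assuming IEEE 754 double-precision operands: the 52 stored fraction bits plus the hidden bit give a 53-bit magnitude, and one more bit is needed to hold that magnitude as a 2’s complement value. The helper names `significand_bits` and `to_twos_complement` are hypothetical, and the sketch handles normal numbers only.

```python
import struct

def significand_bits(x: float) -> tuple[int, int]:
    """Return (sign, significand) of a normal IEEE 754 double, hidden bit restored."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]   # raw 64-bit pattern
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)                     # 52 stored fraction bits
    if exponent in (0, 0x7FF):
        raise ValueError("sketch handles normal numbers only")
    return sign, (1 << 52) | fraction                     # hidden bit -> 53-bit magnitude

def to_twos_complement(sign: int, magnitude: int, width: int = 54) -> int:
    """Encode +/-magnitude as a width-bit 2's complement bit pattern."""
    value = -magnitude if sign else magnitude
    return value & ((1 << width) - 1)

sign, mag = significand_bits(-1.5)
print(mag.bit_length())                            # 53
print(to_twos_complement(sign, mag).bit_length())  # 54: sign bit set in the 54-bit word
```

A 53-bit magnitude fits in the signed 54-bit range [-2^53, 2^53), which is why one extra bit, and no more, is required.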

A fast 54 × 54-bit parallel structured multiplier was developed by Mori et al. in 1991 [7]. They adopted the 2-bit Booth algorithm and a Wallace tree built from 58-transistor 4-2 compressors. With a Wallace tree of 4-2 compressors, only four addition stages suffice to compress the maximum number of partial-product bits at any one bit position. To increase the operating speed of the 4-2 compressors and the final CPA, the design uses the pseudo-CMOS XOR gate shown in Fig. 9.19(d). The result was a 54 × 54-bit multiplier with a delay of 10 ns and an area of 12.5 mm² (81,600 transistors) in 0.5 µm CMOS technology.
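
Assuming that the “2-bit Booth algorithm” refers to the common radix-4 (modified) Booth recoding, which retires two multiplier bits per partial product, the Python sketch below recodes a 54-bit multiplier into 27 digits in {-2, ..., 2} and counts how many levels of 4-2 compressors reduce one column of partial-product bits to two. The function names are hypothetical, the rows are summed with ordinary integer addition rather than a bit-level tree, and treating a leftover group of three bits as a 4-2 compressor with one input tied to 0 is a modeling assumption.

```python
def booth_radix4_digits(b: int, n: int) -> list[int]:
    """Recode an n-bit 2's complement pattern b into n/2 radix-4 Booth digits.

    Digit i = -2*b[2i+1] + b[2i] + b[2i-1], with b[-1] = 0, so each digit is in
    {-2, -1, 0, 1, 2} and sum(d_i * 4**i) equals the signed value of b.
    """
    assert n % 2 == 0, "n must be even"
    digits = []
    prev = 0                                   # implicit bit b[-1]
    for i in range(0, n, 2):
        lo = (b >> i) & 1
        hi = (b >> (i + 1)) & 1
        digits.append(-2 * hi + lo + prev)
        prev = hi
    return digits

def booth_multiply(a: int, b: int, n: int = 54) -> tuple[int, int]:
    """Return (a * signed(b), number of partial products) using radix-4 Booth rows."""
    digits = booth_radix4_digits(b, n)
    # each row is 0, +/-A or +/-2A shifted left by 2i bits (what a Booth selector picks)
    rows = [d * a * 4**i for i, d in enumerate(digits)]
    return sum(rows), len(rows)

def stages_4to2(height: int) -> int:
    """Count levels of 4-2 compressors needed to reduce one column of `height` bits to 2."""
    stages = 0
    while height > 2:
        # groups of 4 bits -> 2 bits; a leftover 3 -> 2 (one input tied to 0); 1 or 2 pass through
        height = 2 * (height // 4) + min(height % 4, 2)
        stages += 1
    return stages

mask = (1 << 54) - 1
print(booth_multiply(3, -7 & mask))   # (-21, 27)
print(stages_4to2(28))                # 4
```

For a column height of 28 the loop goes 28 → 14 → 8 → 4 → 2, which is consistent with the four addition stages cited above.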

Ohkubo et al. implemented a 54 × 54-bit parallel multiplier using pass-transistor multiplexers [35]. Circuits built from these multiplexers can achieve a smaller delay than equivalent circuits built from conventional CMOS gates because their internal critical path is shorter. They constructed the CSA tree of Fig. 9.31 entirely from the 4-2 compressors shown in Fig. 9.27(b). By combining this 4-2 compressor tree with a conditional carry-selection (CCS) adder [35], they obtained a fast multiplier with a delay of 4.4 ns and an area of 12.9 mm² (100,200 transistors) in 0.25 µm CMOS technology.
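
A 4-2 compressor takes four bits of equal weight plus a carry-in and produces a sum bit and two carry bits of double weight; its carry-out does not depend on its carry-in, so no carry ripples along a row of compressors. The Python sketch below models one common construction, two chained full adders, alongside an equivalent formulation written with 2:1 multiplexers in the spirit of a pass-transistor design. Both are assumptions for illustration and are not claimed to be the exact circuit of Fig. 9.27(b); the names `compressor_4_2` and `compressor_4_2_mux` are hypothetical.

```python
def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """Return (sum, carry) of three bits."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)

def compressor_4_2(x1: int, x2: int, x3: int, x4: int, cin: int) -> tuple[int, int, int]:
    """4-2 compressor from two chained full adders.

    Returns (sum, carry, cout) with x1 + x2 + x3 + x4 + cin == sum + 2 * (carry + cout).
    """
    s1, cout = full_adder(x1, x2, x3)    # cout depends only on x1..x3, never on cin
    s, carry = full_adder(s1, x4, cin)
    return s, carry, cout

def mux(sel: int, a: int, b: int) -> int:
    """2:1 multiplexer: returns b when sel is 1, otherwise a."""
    return b if sel else a

def compressor_4_2_mux(x1: int, x2: int, x3: int, x4: int, cin: int) -> tuple[int, int, int]:
    """Same function expressed with XORs and multiplexers."""
    p12 = x1 ^ x2
    p = p12 ^ x3 ^ x4                    # parity of the four main inputs
    cout = mux(p12, x1, x3)              # majority(x1, x2, x3)
    s = mux(p, cin, 1 - cin)             # p XOR cin
    carry = mux(p, x4, cin)              # majority(x1 ^ x2 ^ x3, x4, cin)
    return s, carry, cout
```

Selecting between already-computed signals with multiplexers, rather than recomputing majority functions with complex gates, is what allows a short critical path in this style of circuit.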

Goto proposed a new layout scheme named “Regularly Structured Tree (RST)” for implementing the Wallace tree in 1992 [34]. In a 54 × 54-bit parallel multiplier, up to 28 partial-product bits at the same bit position must be compressed into two. In this scheme, these bits are first divided among four 7-2 compressor blocks, as shown in Fig. 9.32. In this figure, a 4D2 block consists of two sets of four Booth selectors and a 4-2 compressor, a 3D2 block consists of two sets of three Booth selectors and an FA, and 4W denotes a 4-2 compressor. A 7D4 block thus comprises four 7-2 compressors at consecutive bit positions. By arranging 7D4 blocks regularly, as shown in Fig. 9.33, the Booth selectors and the CSA part of a 54 × 54-bit parallel multiplier, including the intermediate wiring among the blocks, can be laid out systematically. This scheme drastically simplifies the otherwise complicated layout and wiring, not only among the compressors in the CSA part but also between the compressors and the Booth selectors.
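
How a 7-2 compressor can be assembled from the blocks of Fig. 9.32 can be sketched at the bit level in Python. This is one plausible wiring under stated assumptions, not necessarily Goto’s exact netlist: in each column, a 4-2 compressor (standing in for the 4D2 block) takes four partial-product bits, a full adder (the 3D2 block) takes the other three, and a second 4-2 compressor (4W) merges their sum bits with the carries arriving from the next-lower column. Booth selection is not modeled; the seven input rows are taken as given, and `compress_7_to_2` is a hypothetical name. Any four consecutive columns of the loop correspond to one 7D4 block.

```python
def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """Return (sum, carry) of three bits."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)

def compressor_4_2(x1: int, x2: int, x3: int, x4: int, cin: int) -> tuple[int, int, int]:
    """Return (sum, carry, cout) with x1 + x2 + x3 + x4 + cin == sum + 2 * (carry + cout)."""
    s1, cout = full_adder(x1, x2, x3)
    s, carry = full_adder(s1, x4, cin)
    return s, carry, cout

def compress_7_to_2(rows: list[int], width: int) -> tuple[int, int]:
    """Reduce seven unsigned width-bit rows to two rows (S, C) with S + C == sum(rows)."""
    assert len(rows) == 7 and all(0 <= r < (1 << width) for r in rows)
    ncols = width + 3              # 7 * 2**width < 2**(width + 3), so every carry drains out
    cin_a = cin_w = carry_a_prev = carry_b_prev = 0
    s_row = c_row = 0
    for j in range(ncols):
        p = [(r >> j) & 1 for r in rows]                                   # 7 bits of weight 2**j
        sum_a, carry_a, cout_a = compressor_4_2(p[0], p[1], p[2], p[3], cin_a)  # "4D2"
        sum_b, carry_b = full_adder(p[4], p[5], p[6])                            # "3D2"
        s, c, cout_w = compressor_4_2(sum_a, sum_b, carry_a_prev, carry_b_prev, cin_w)  # "4W"
        s_row |= s << j            # output sum bit, weight 2**j
        c_row |= c << (j + 1)      # output carry bit, weight 2**(j + 1)
        # every other weight-2**(j + 1) signal moves horizontally into column j + 1
        cin_a, cin_w = cout_a, cout_w
        carry_a_prev, carry_b_prev = carry_a, carry_b
    return s_row, c_row

rows = [0b1011, 0b1111, 0b0001, 0b0110, 0b1110, 0b1001, 0b0111]
s, c = compress_7_to_2(rows, 4)
print(s + c == sum(rows))   # True
```

Because every column has the same structure and talks only to its immediate neighbors, the slice can be replicated with regular wiring, which is the property the RST layout exploits.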

A modified version of the RST multiplier achieved a delay of 4.1 ns and an area of only 1.27 mm² (60,797 transistors) in 0.25 µm CMOS technology [36]. By adopting a 48-transistor 4-2 compressor (Fig. 9.27(c)) and the sign-select Booth recoding algorithm described earlier, this design reduced the total transistor count by 24% compared with the earlier design.

© 2002 by CRC Press LLC

[Figure 9.30 blocks: Multiplier B, Booth encoder, Multiplicand A, Booth selector, CSA (Wallace) tree, carry-propagate adder (CPA), Product Z.]
