12.07.2015 Views

Implementation of MAC unit using booth multiplier ... - Gimt.edu.in

Implementation of MAC unit using booth multiplier ... - Gimt.edu.in

Implementation of MAC unit using booth multiplier ... - Gimt.edu.in

SHOW MORE
SHOW LESS
  • No tags were found...

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

International Journal <strong>of</strong> Applied Eng<strong>in</strong>eer<strong>in</strong>g Research, ISSN 0973-4562 Vol.7 No.11 (2012)© Research India Publications; http://www.ripublication.com/ijaer.htm<strong>Implementation</strong> <strong>of</strong> <strong>MAC</strong> <strong>unit</strong> <strong>us<strong>in</strong>g</strong> <strong>booth</strong> <strong>multiplier</strong> & ripple carry adderSwati Malik and Sangeeta DhallDepartment <strong>of</strong> Electronics Eng<strong>in</strong>eer<strong>in</strong>g YMCAUST Faridabad, Indiaer.swatimalik@gmail.com sangeeta_dhall@yahoo.co.<strong>in</strong>AbstractDigital signal process<strong>in</strong>g is the application <strong>of</strong> mathematicaloperations to digitally represented signals. DSP processorsshare basic features designed to support high-performance,repetitive, numerically <strong>in</strong>tensive tasks. <strong>MAC</strong> is the mostimportant block <strong>in</strong> DSP system. High throughput <strong>multiplier</strong>accumulator (<strong>MAC</strong>) is always a key element to achieve ahigh-performance digital signal process<strong>in</strong>g application for realtime signal process<strong>in</strong>g applications. This is because speed andthroughput rate are always the concerns <strong>of</strong> digital signalprocess<strong>in</strong>g systems. This is because, the limited battery energy<strong>of</strong> these portable products restricts the power consumption <strong>of</strong>the system. The goal <strong>of</strong> this project is to design the VLSIimplementation <strong>of</strong> <strong>MAC</strong> for high-speed DSP applications. Fordesign<strong>in</strong>g the <strong>MAC</strong>, various <strong>multiplier</strong>s and adders areconsidered. The total operation is coded with VHDL,synthesized and simulated <strong>us<strong>in</strong>g</strong> Xill<strong>in</strong>x ISE 6.2i.Keywords— VHDL, Multipliers, Adders, Shift registers.IntroductionAs <strong>in</strong>tegrated circuit technology has improved to allow moreand more components on a chip, digital systems havecont<strong>in</strong>ued to grow <strong>in</strong> complexity. As digital systems havebecome more complex, detailed design <strong>of</strong> the systems at thegate and flip-flop level has become very tedious and timeconsum<strong>in</strong>g. For this reason, use <strong>of</strong> hardware descriptionlanguages <strong>in</strong> the digital design process cont<strong>in</strong>ues to grow <strong>in</strong>importance. A hardware description language allows a digitalsystem to be designed and debugged at a higher level beforeconversion to the gate and flip-flop level.DSP processors are microprocessors designed to performdigital signal process<strong>in</strong>g—the mathematical manipulation <strong>of</strong>digitally represented signals. Digital signal process<strong>in</strong>g is one<strong>of</strong> the core technologies <strong>in</strong> rapidly grow<strong>in</strong>g application areassuch as wireless communications, audio and video process<strong>in</strong>g,and <strong>in</strong>dustrial control. Digital signal process<strong>in</strong>g (DSP)applications constitute the critical operations which usually<strong>in</strong>volve many multiplications and accumulations. Hence, highthroughput <strong>multiplier</strong> accumulator (<strong>MAC</strong>) is always a keyelement to achieve a high-performance digital signalprocess<strong>in</strong>g application for real time signal process<strong>in</strong>gapplications.In the last few years, the ma<strong>in</strong> consideration <strong>of</strong> <strong>MAC</strong>design has been to enhance its speed. This is because speedand throughput rate are always the concerns <strong>of</strong> digital signalprocess<strong>in</strong>g systems. Due to the <strong>in</strong>crease <strong>of</strong> portable electronicproducts, low power designs have also become majorconsiderations. This is because the limited battery energy <strong>of</strong>these portable products restricts the power consumption <strong>of</strong> thesystem. Therefore, the motivation beh<strong>in</strong>d this project is to<strong>in</strong>vestigate various pipel<strong>in</strong>ed <strong>MAC</strong> architectures and circuitand the design techniques which are suitable for theimplementation <strong>of</strong> high throughput signal process<strong>in</strong>galgorithms. The goal <strong>of</strong> this project is to design the VLSIimplementation <strong>of</strong> pipel<strong>in</strong>ed <strong>MAC</strong> for high-speed DSPapplications. For design<strong>in</strong>g the <strong>MAC</strong>, various architectures <strong>of</strong><strong>multiplier</strong>s and adders are considered. The total process iscoded with VHDL to describe the hardware.Overview <strong>of</strong> mac <strong>unit</strong>Multiply-accumulate operation is one <strong>of</strong> the basic arithmeticoperations extensively used <strong>in</strong> modern digital signalprocess<strong>in</strong>g (DSP). The <strong>MAC</strong> <strong>unit</strong> provides high-speedmultiplication, multiplication with cumulative addition,multiplication with cumulative subtraction, saturation, andclear-to-zero functions. Therefore, the ma<strong>in</strong> motivation is to<strong>in</strong>vestigate various pipel<strong>in</strong>ed <strong>MAC</strong> architectures and circuitand the design techniques which are suitable for theimplementation <strong>of</strong> high through put signal process<strong>in</strong>galgorithms.Hence, a high-speed <strong>MAC</strong> that is capable <strong>of</strong> support<strong>in</strong>gmultiple precisions and parallel operations is highly desirable.<strong>MAC</strong> is composed <strong>of</strong> an adder, <strong>multiplier</strong> and an accumulator.The implementation <strong>of</strong> the <strong>multiplier</strong> is <strong>in</strong> the form <strong>of</strong> BoothMultiplier. The adder used is Ripple Carry Adder. The layout<strong>of</strong> this adder is simple which allows for fast design time. TheParallel <strong>in</strong> Parallel out (PIPO) shift register is used as theaccumulator. The <strong>in</strong>puts for the <strong>MAC</strong> are to be fetched frommemory location and fed to the <strong>multiplier</strong> block <strong>of</strong> the <strong>MAC</strong>,which will perform multiplication and give the result to theadder which will accumulate the result and then will store theresult <strong>in</strong>to a memory location. This entire process is to beachieved <strong>in</strong> a s<strong>in</strong>gle clock cycle <strong>in</strong> the architecture <strong>of</strong> the<strong>MAC</strong> <strong>unit</strong>. The <strong>MAC</strong> design consists <strong>of</strong> one 8-bit <strong>booth</strong><strong>multiplier</strong>, one 16-bit ripple carry adder & a 17-bitaccumulator <strong>us<strong>in</strong>g</strong> PIPO shift register. To multiply the values<strong>of</strong> A and B, Booth Multiplier is used <strong>in</strong>stead <strong>of</strong> conventional<strong>multiplier</strong> because this is simple to design. However to ga<strong>in</strong>better performance, parallel <strong>multiplier</strong>s are used as they arethe fastest but the designs are much more complex. Hence,when regularity, high performance & low power are primaryconcerns, Booth Multipliers tend to be the primary choice.Apparently, together with the utilization <strong>of</strong> Booth<strong>multiplier</strong> approach, ripple carry adder as the adder and PIPOshift register as the accumulator, this <strong>MAC</strong> design canenhance the <strong>MAC</strong> <strong>unit</strong> speed so as to ga<strong>in</strong> better systemperformance. The product <strong>of</strong> Xi x Yi is always fed back <strong>in</strong>tothe 17-bit accumulator and then added aga<strong>in</strong> with the nextproduct Xi x Yi. This <strong>MAC</strong> <strong>unit</strong> is capable <strong>of</strong> multiply<strong>in</strong>g andadd<strong>in</strong>g with previous product consecutively. Hence, output =


International Journal <strong>of</strong> Applied Eng<strong>in</strong>eer<strong>in</strong>g Research, ISSN 0973-4562 Vol.7 No.11 (2012)© Research India Publications; http://www.ripublication.com/ijaer.htmΣ Xi Yi. The design <strong>of</strong> 8x8 <strong>multiplier</strong> <strong>unit</strong> is carried out thatcan perform accumulation on 17 bit number. This <strong>MAC</strong> <strong>unit</strong>has 17 bit output and its operation is to add repeatedly themultiplication results. The total design area can be <strong>in</strong>spectedby observ<strong>in</strong>g the total count <strong>of</strong> transistors. Several otherparameters can be calculated as well. Figure 1 shows the basicblock diagram <strong>of</strong> the <strong>MAC</strong> <strong>unit</strong>.The equation for <strong>MAC</strong> operation can be given as:a= a + (bxc) (1)where a is an accumulator register,b is the <strong>multiplier</strong> & c is the multiplicand.Equation (1) depicts the basic operation <strong>of</strong> the <strong>MAC</strong> <strong>unit</strong>.Figure 2 depicts the block diagram <strong>of</strong> <strong>MAC</strong> <strong>unit</strong>consist<strong>in</strong>g <strong>of</strong> Booth <strong>multiplier</strong>, Ripple carry adder & PIPOshift register.Figure 1. Basic <strong>MAC</strong> <strong>unit</strong>OperationBasically a <strong>MAC</strong> <strong>unit</strong> employs a fast <strong>multiplier</strong> fitted <strong>in</strong> thedata path and the multiplied output <strong>of</strong> <strong>multiplier</strong> is fed <strong>in</strong>to afast adder which is set to zero <strong>in</strong>itially. The result <strong>of</strong> additionis stored <strong>in</strong> an accumulator register. The <strong>MAC</strong> <strong>unit</strong> should beable to produce output <strong>in</strong> one clock cycle and the new result <strong>of</strong>addition is added to the previous one and stored <strong>in</strong> theaccumulator register. A s<strong>in</strong>gle <strong>MAC</strong> <strong>unit</strong> has <strong>multiplier</strong>, adder,and accumulator. The most typical feature that differentiates aDSP from any General Purpose Processor is the Multiply andAccumulate <strong>unit</strong>.All DSP Algorithms would require some form <strong>of</strong> theMultiplication and Accumulation Operation. This is the mostimportant block <strong>in</strong> DSP systems. It is composed <strong>of</strong> an adder,<strong>multiplier</strong> and the accumulator. Usually adders implemented<strong>in</strong> DSPs are Ripple Carry Adders, Carry-Select or Carry-Saveadders, as speed is <strong>of</strong> utmost importance <strong>in</strong> a DSP. Basicallythe <strong>multiplier</strong> will multiply the <strong>in</strong>puts and give the results tothe adder, which will add the <strong>multiplier</strong> results to thepreviously accumulated results. This operation eases thecomputation <strong>of</strong> the most important formula i.e. b(n)x(n-k)which is needed <strong>in</strong> filters, Fourier analyzers, etc. The <strong>in</strong>putsfor the <strong>MAC</strong> are supposed to be fetched from some memorylocation and fed to the <strong>multiplier</strong> block <strong>of</strong> the <strong>MAC</strong>, whichwill perform multiplication and give the result to adder whichwill accumulate the result and then if needed will also storethe result <strong>in</strong>to a memory location. This entire process is to beachieved <strong>in</strong> a s<strong>in</strong>gle clock cycle. The <strong>MAC</strong> operation can beverified with the help <strong>of</strong> the simulation waveform.Figure 2. Block diagram <strong>of</strong> <strong>MAC</strong> <strong>unit</strong>Experimental ResultsThe VHDL code <strong>of</strong> <strong>MAC</strong> process is synthesized andsimulated <strong>us<strong>in</strong>g</strong> Xil<strong>in</strong>x ISE 6.2i. It is implemented onxc3s1000-5fg456 FPGA device.Table 1 shows the estimated values <strong>of</strong> the logic utilization<strong>of</strong> the devices used. Hence, the estimated values <strong>of</strong> thenumerous logic utilized <strong>in</strong> the design <strong>of</strong> the <strong>MAC</strong> <strong>unit</strong> can becalculated i.e. % utilization <strong>of</strong> no. <strong>of</strong> LUTs, slices,<strong>in</strong>put/output blocks, global clock etc.Figure 3 shows the RTL schematic <strong>of</strong> <strong>MAC</strong> Operation. It hastwo <strong>in</strong>put ports and two output ports.Figure 3. RTL view <strong>of</strong> <strong>MAC</strong> operation


International Journal <strong>of</strong> Applied Eng<strong>in</strong>eer<strong>in</strong>g Research, ISSN 0973-4562 Vol.7 No.11 (2012)© Research India Publications; http://www.ripublication.com/ijaer.htmTable 1. Device Utilization Summary (estimated values)Logic Utilization Used Available UtilizationNumber <strong>of</strong> Slices 172 7680 2%Number <strong>of</strong> Slice flip flops 17 15360 0%Number <strong>of</strong> 4 <strong>in</strong>put LUTs 313 15360 2%Number <strong>of</strong> bonded IOBs 33 333 9%Number <strong>of</strong> GCLKs 1 8 12%Figure 4 displays the simulation waveform <strong>of</strong> <strong>MAC</strong>operation.Article: Efficient Multiplier Architecture <strong>in</strong> VLSIDesign,” Journal <strong>of</strong> Theoretical and AppliedInformation Technology, vol. 38, no. 2, April 2012.[3] Ravi Shankar Mishra, Puran Gour, Braj Bihari Soni,“Design and Implements <strong>of</strong> Booth and Robertson’s<strong>multiplier</strong>s algorithm on FPGA,” International Journal<strong>of</strong> Eng<strong>in</strong>eer<strong>in</strong>g Research and Applications, Vol. 1,Issue 3, pp. 905-910, 2011.[4] Tung Thanh Hoang, Magnus Själander, Per Larsson-Edefors, “A High-Speed, Energy-Efficient Two-CycleMultiply-Accumulate (<strong>MAC</strong>) Architecture and ItsApplication to a Double-Throughput <strong>MAC</strong> Unit,”IEEE transactions on Circuits & Systems, vol. 57, no.12, pp. 3073-3081, Dec. 2010.[5] A. Abdelgawad, Magdy Bayoumi, “High Speed andArea-Efficient Multiply Accumulate (<strong>MAC</strong>) Unit forDigital Signal Pross<strong>in</strong>g Applications,” IEEEInternational Symposium on Circuits & Systems , pp.3199 – 3202, 2007.[6] Berkeley Design Technology, Inc., “Choos<strong>in</strong>g a DSPProcessor,” World Wide Web,http://www.bdti.com/articles/choose_2000.pdf, 2000.[7] Jennifer Eyre and Jeff Bier, “The Evolution <strong>of</strong> DSPProcessors”, Berkeley Design Technology,Inc.,http://www.bdti.com/articles/evolution.pdf, 2000.Figure 4. Simulation waveform <strong>of</strong> <strong>MAC</strong> operationConclusionThe <strong>MAC</strong> process is coded with VHDL and synthesized <strong>us<strong>in</strong>g</strong>Xil<strong>in</strong>x ISE 6.2i. The <strong>MAC</strong> process is implemented <strong>us<strong>in</strong>g</strong>xc3s1000-5fg456 FPGA Xil<strong>in</strong>x device. The synthesis results<strong>of</strong> the <strong>MAC</strong> <strong>unit</strong> have been calculated as can be seen <strong>in</strong> Table2. Here, same FPGA device (part number & speed grade) withthe same design constra<strong>in</strong>ts implied for the synthesis <strong>of</strong> the<strong>MAC</strong> <strong>unit</strong> has been targeted. This <strong>MAC</strong> <strong>unit</strong> is generallypreferred for simpler designs. The experimental test showsthat the results have been validated.Table 2 shows the synthesis results <strong>of</strong> the MultiplyAccumulate Unit.Table 2. Synthesis results <strong>of</strong> <strong>MAC</strong> <strong>unit</strong>FPGA Device PackageM<strong>in</strong>imum periodMaximum frequencyXc3s1000-5fg4564.484 nsMHzReferences[1] Pratap Kumar Dakua, Anamika S<strong>in</strong>ha, Shivdhari &Gourab,“Hardware <strong>Implementation</strong> <strong>of</strong> <strong>MAC</strong> <strong>unit</strong>,”International Journal <strong>of</strong> Electronics Communicationand Computer Eng<strong>in</strong>eer<strong>in</strong>g, vol. 3, 2012.[2] M.Jeevitha, R.Muthaiah, P.Swam<strong>in</strong>athan, “Review

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!