Implementation of MAC unit using booth multiplier ... - Gimt.edu.in

International Journal of Applied Engineering Research, ISSN 0973-4562 Vol.7 No.11 (2012)© Research India Publications; http://www.ripublication.com/ijaer.htmImplementation of MAC unit using booth multiplier & ripple carry adderSwati Malik and Sangeeta DhallDepartment of Electronics Engineering YMCAUST Faridabad, Indiaer.swatimalik@gmail.com sangeeta_dhall@yahoo.co.inAbstractDigital signal processing is the application of mathematicaloperations to digitally represented signals. DSP processorsshare basic features designed to support high-performance,repetitive, numerically intensive tasks. MAC is the mostimportant block in DSP system. High throughput multiplieraccumulator (MAC) is always a key element to achieve ahigh-performance digital signal processing application for realtime signal processing applications. This is because speed andthroughput rate are always the concerns of digital signalprocessing systems. This is because, the limited battery energyof these portable products restricts the power consumption ofthe system. The goal of this project is to design the VLSIimplementation of MAC for high-speed DSP applications. Fordesigning the MAC, various multipliers and adders areconsidered. The total operation is coded with VHDL,synthesized and simulated using Xillinx ISE 6.2i.Keywords— VHDL, Multipliers, Adders, Shift registers.IntroductionAs integrated circuit technology has improved to allow moreand more components on a chip, digital systems havecontinued to grow in complexity. As digital systems havebecome more complex, detailed design of the systems at thegate and flip-flop level has become very tedious and timeconsuming. For this reason, use of hardware descriptionlanguages in the digital design process continues to grow inimportance. A hardware description language allows a digitalsystem to be designed and debugged at a higher level beforeconversion to the gate and flip-flop level.DSP processors are microprocessors designed to performdigital signal processing—the mathematical manipulation ofdigitally represented signals. Digital signal processing is oneof the core technologies in rapidly growing application areassuch as wireless communications, audio and video processing,and industrial control. Digital signal processing (DSP)applications constitute the critical operations which usuallyinvolve many multiplications and accumulations. Hence, highthroughput multiplier accumulator (MAC) is always a keyelement to achieve a high-performance digital signalprocessing application for real time signal processingapplications.In the last few years, the main consideration of MACdesign has been to enhance its speed. This is because speedand throughput rate are always the concerns of digital signalprocessing systems. Due to the increase of portable electronicproducts, low power designs have also become majorconsiderations. This is because the limited battery energy ofthese portable products restricts the power consumption of thesystem. Therefore, the motivation behind this project is toinvestigate various pipelined MAC architectures and circuitand the design techniques which are suitable for theimplementation of high throughput signal processingalgorithms. The goal of this project is to design the VLSIimplementation of pipelined MAC for high-speed DSPapplications. For designing the MAC, various architectures ofmultipliers and adders are considered. The total process iscoded with VHDL to describe the hardware.Overview of mac unitMultiply-accumulate operation is one of the basic arithmeticoperations extensively used in modern digital signalprocessing (DSP). The MAC unit provides high-speedmultiplication, multiplication with cumulative addition,multiplication with cumulative subtraction, saturation, andclear-to-zero functions. Therefore, the main motivation is toinvestigate various pipelined MAC architectures and circuitand the design techniques which are suitable for theimplementation of high through put signal processingalgorithms.Hence, a high-speed MAC that is capable of supportingmultiple precisions and parallel operations is highly desirable.MAC is composed of an adder, multiplier and an accumulator.The implementation of the multiplier is in the form of BoothMultiplier. The adder used is Ripple Carry Adder. The layoutof this adder is simple which allows for fast design time. TheParallel in Parallel out (PIPO) shift register is used as theaccumulator. The inputs for the MAC are to be fetched frommemory location and fed to the multiplier block of the MAC,which will perform multiplication and give the result to theadder which will accumulate the result and then will store theresult into a memory location. This entire process is to beachieved in a single clock cycle in the architecture of theMAC unit. The MAC design consists of one 8-bit boothmultiplier, one 16-bit ripple carry adder & a 17-bitaccumulator using PIPO shift register. To multiply the valuesof A and B, Booth Multiplier is used instead of conventionalmultiplier because this is simple to design. However to gainbetter performance, parallel multipliers are used as they arethe fastest but the designs are much more complex. Hence,when regularity, high performance & low power are primaryconcerns, Booth Multipliers tend to be the primary choice.Apparently, together with the utilization of Boothmultiplier approach, ripple carry adder as the adder and PIPOshift register as the accumulator, this MAC design canenhance the MAC unit speed so as to gain better systemperformance. The product of Xi x Yi is always fed back intothe 17-bit accumulator and then added again with the nextproduct Xi x Yi. This MAC unit is capable of multiplying andadding with previous product consecutively. Hence, output =

International Journal of Applied Engineering Research, ISSN 0973-4562 Vol.7 No.11 (2012)© Research India Publications; http://www.ripublication.com/ijaer.htmΣ Xi Yi. The design of 8x8 multiplier unit is carried out thatcan perform accumulation on 17 bit number. This MAC unithas 17 bit output and its operation is to add repeatedly themultiplication results. The total design area can be inspectedby observing the total count of transistors. Several otherparameters can be calculated as well. Figure 1 shows the basicblock diagram of the MAC unit.The equation for MAC operation can be given as:a= a + (bxc) (1)where a is an accumulator register,b is the multiplier & c is the multiplicand.Equation (1) depicts the basic operation of the MAC unit.Figure 2 depicts the block diagram of MAC unitconsisting of Booth multiplier, Ripple carry adder & PIPOshift register.Figure 1. Basic MAC unitOperationBasically a MAC unit employs a fast multiplier fitted in thedata path and the multiplied output of multiplier is fed into afast adder which is set to zero initially. The result of additionis stored in an accumulator register. The MAC unit should beable to produce output in one clock cycle and the new result ofaddition is added to the previous one and stored in theaccumulator register. A single MAC unit has multiplier, adder,and accumulator. The most typical feature that differentiates aDSP from any General Purpose Processor is the Multiply andAccumulate unit.All DSP Algorithms would require some form of theMultiplication and Accumulation Operation. This is the mostimportant block in DSP systems. It is composed of an adder,multiplier and the accumulator. Usually adders implementedin DSPs are Ripple Carry Adders, Carry-Select or Carry-Saveadders, as speed is of utmost importance in a DSP. Basicallythe multiplier will multiply the inputs and give the results tothe adder, which will add the multiplier results to thepreviously accumulated results. This operation eases thecomputation of the most important formula i.e. b(n)x(n-k)which is needed in filters, Fourier analyzers, etc. The inputsfor the MAC are supposed to be fetched from some memorylocation and fed to the multiplier block of the MAC, whichwill perform multiplication and give the result to adder whichwill accumulate the result and then if needed will also storethe result into a memory location. This entire process is to beachieved in a single clock cycle. The MAC operation can beverified with the help of the simulation waveform.Figure 2. Block diagram of MAC unitExperimental ResultsThe VHDL code of MAC process is synthesized andsimulated using Xilinx ISE 6.2i. It is implemented onxc3s1000-5fg456 FPGA device.Table 1 shows the estimated values of the logic utilizationof the devices used. Hence, the estimated values of thenumerous logic utilized in the design of the MAC unit can becalculated i.e. % utilization of no. of LUTs, slices,input/output blocks, global clock etc.Figure 3 shows the RTL schematic of MAC Operation. It hastwo input ports and two output ports.Figure 3. RTL view of MAC operation

International Journal of Applied Engineering Research, ISSN 0973-4562 Vol.7 No.11 (2012)© Research India Publications; http://www.ripublication.com/ijaer.htmTable 1. Device Utilization Summary (estimated values)Logic Utilization Used Available UtilizationNumber of Slices 172 7680 2%Number of Slice flip flops 17 15360 0%Number of 4 input LUTs 313 15360 2%Number of bonded IOBs 33 333 9%Number of GCLKs 1 8 12%Figure 4 displays the simulation waveform of MACoperation.Article: Efficient Multiplier Architecture in VLSIDesign,” Journal of Theoretical and AppliedInformation Technology, vol. 38, no. 2, April 2012.[3] Ravi Shankar Mishra, Puran Gour, Braj Bihari Soni,“Design and Implements of Booth and Robertson’smultipliers algorithm on FPGA,” International Journalof Engineering Research and Applications, Vol. 1,Issue 3, pp. 905-910, 2011.[4] Tung Thanh Hoang, Magnus Själander, Per Larsson-Edefors, “A High-Speed, Energy-Efficient Two-CycleMultiply-Accumulate (MAC) Architecture and ItsApplication to a Double-Throughput MAC Unit,”IEEE transactions on Circuits & Systems, vol. 57, no.12, pp. 3073-3081, Dec. 2010.[5] A. Abdelgawad, Magdy Bayoumi, “High Speed andArea-Efficient Multiply Accumulate (MAC) Unit forDigital Signal Prossing Applications,” IEEEInternational Symposium on Circuits & Systems , pp.3199 – 3202, 2007.[6] Berkeley Design Technology, Inc., “Choosing a DSPProcessor,” World Wide Web,http://www.bdti.com/articles/choose_2000.pdf, 2000.[7] Jennifer Eyre and Jeff Bier, “The Evolution of DSPProcessors”, Berkeley Design Technology,Inc.,http://www.bdti.com/articles/evolution.pdf, 2000.Figure 4. Simulation waveform of MAC operationConclusionThe MAC process is coded with VHDL and synthesized usingXilinx ISE 6.2i. The MAC process is implemented usingxc3s1000-5fg456 FPGA Xilinx device. The synthesis resultsof the MAC unit have been calculated as can be seen in Table2. Here, same FPGA device (part number & speed grade) withthe same design constraints implied for the synthesis of theMAC unit has been targeted. This MAC unit is generallypreferred for simpler designs. The experimental test showsthat the results have been validated.Table 2 shows the synthesis results of the MultiplyAccumulate Unit.Table 2. Synthesis results of MAC unitFPGA Device PackageMinimum periodMaximum frequencyXc3s1000-5fg4564.484 nsMHzReferences[1] Pratap Kumar Dakua, Anamika Sinha, Shivdhari &Gourab,“Hardware Implementation of MAC unit,”International Journal of Electronics Communicationand Computer Engineering, vol. 3, 2012.[2] M.Jeevitha, R.Muthaiah, P.Swaminathan, “Review

Implementation of MAC unit using booth multiplier ... - Gimt.edu.in

Create successful ePaper yourself

Delete template?

Save as template?