13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual

OPTIMIZING FOR SIMD INTEGER APPLICATIONS

5.6.11 Complex Multiply by a Constant

Complex multiplication is an operation which requires four multiplications and two additions. The PMADDWD instruction performs exactly this pattern: four 16-bit multiplications followed by two pairwise additions of the 32-bit products. In order to use this instruction, you need to format the data into multiple 16-bit values. The real and imaginary components should be 16 bits each. Consider Example 5-30, which assumes that the 64-bit MMX registers are being used:

• Let the input data be DR and DI, where DR is the real component of the data and DI is the imaginary component of the data.
• Format the constant complex coefficients in memory as four 16-bit values [CR -CI CI CR]. Remember to load the values into the MMX register using MOVQ.
• The real component of the complex product is PR = DR*CR - DI*CI and the imaginary component of the complex product is PI = DR*CI + DI*CR.
• The output is a packed doubleword. If needed, a pack instruction can be used to convert the result to 16-bit (thereby matching the format of the input).

Example 5-30. Complex Multiply by a Constant

    ; Input:
    ;   MM0   complex value, Dr, Di
    ;   MM1   constant complex coefficient in the form
    ;         [Cr -Ci Ci Cr]
    ; Output:
    ;   MM0   two 32-bit dwords containing [Pr Pi]
    ;
    punpckldq mm0, mm0    ; makes [dr di dr di]
    pmaddwd   mm0, mm1    ; done, the result is
                          ; [(Dr*Cr-Di*Ci)(Dr*Ci+Di*Cr)]

5.6.12 Packed 32*32 Multiply

The PMULUDQ instruction performs an unsigned multiply on the lower pair of doubleword operands within 64-bit chunks from the two sources; the full 64-bit result from each multiplication is returned to the destination register.

This instruction is provided in both a 64-bit and a 128-bit version; the latter performs two independent operations, on the low and high halves of a 128-bit register.

5.6.13 Packed 64-bit Add/Subtract

The PADDQ/PSUBQ instructions add/subtract quadword operands within each 64-bit chunk from the two sources; the 64-bit result from each computation is written to the destination register. Like the integer ADD/SUB instructions, PADDQ/PSUBQ can operate on either unsigned or signed (two's complement notation) integer operands.
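The lane arithmetic described in Sections 5.6.11 through 5.6.13 can be checked with a small scalar model. The sketch below is plain C rather than MMX/SSE2 code, and it is not taken from the manual: the function names (pmaddwd_model, complex_mul_const, pmuludq_lane, paddq_lane, psubq_lane) and the complex_product struct are hypothetical helpers introduced only for illustration. Array index 0 is assumed to correspond to the leftmost element of the bracket notation used in Example 5-30; the mapping of that notation to bit positions in the register is left open here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type for this sketch: the two 32-bit outputs [Pr Pi]. */
struct complex_product {
    int32_t pr; /* real part:      Dr*Cr - Di*Ci */
    int32_t pi; /* imaginary part: Dr*Ci + Di*Cr */
};

/* Scalar model of PMADDWD on one 64-bit register: four signed 16-bit lanes
   per source, multiplied pairwise, adjacent products summed into two 32-bit
   results. The sum is formed in 64 bits and truncated to 32 bits so that the
   one wrapping case (all four inputs equal to -32768) has no signed overflow. */
void pmaddwd_model(const int16_t a[4], const int16_t b[4], int32_t out[2])
{
    int64_t lo = (int64_t)a[0] * b[0] + (int64_t)a[1] * b[1];
    int64_t hi = (int64_t)a[2] * b[2] + (int64_t)a[3] * b[3];
    out[0] = (int32_t)(uint32_t)lo;
    out[1] = (int32_t)(uint32_t)hi;
}

/* Models the two-instruction sequence of Example 5-30. */
struct complex_product complex_mul_const(int16_t dr, int16_t di,
                                         int16_t cr, int16_t ci)
{
    /* Assumption of this sketch: -Ci must be representable in 16 bits. */
    assert(ci != INT16_MIN);

    const int16_t data[4]  = { dr, di, dr, di };            /* after PUNPCKLDQ mm0, mm0 */
    const int16_t coeff[4] = { cr, (int16_t)-ci, ci, cr };  /* [Cr -Ci Ci Cr] */
    int32_t prod[2];
    struct complex_product p;

    pmaddwd_model(data, coeff, prod);
    p.pr = prod[0];
    p.pi = prod[1];
    return p;
}

/* Scalar model of one 64-bit chunk of PMULUDQ: only the low doubleword of
   each source takes part, and the full 64-bit product is kept. */
uint64_t pmuludq_lane(uint64_t a, uint64_t b)
{
    return (a & 0xFFFFFFFFu) * (b & 0xFFFFFFFFu);
}

/* Scalar models of one 64-bit chunk of PADDQ and PSUBQ. The arithmetic is
   modulo 2^64, so the same bit pattern is correct for unsigned and for
   two's complement signed operands. */
uint64_t paddq_lane(uint64_t a, uint64_t b) { return a + b; }
uint64_t psubq_lane(uint64_t a, uint64_t b) { return a - b; }
```

As a usage example, complex_mul_const(3, 4, 1, 2) models (3 + 4i) * (1 + 2i) and yields pr = -5 and pi = 10, matching PR = DR*CR - DI*CI and PI = DR*CI + DI*CR. The negation of CI is done once when the coefficient is formatted, which is what lets a multiply-add instruction produce the subtraction in the real part.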
