13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual

OPTIMIZING FOR SIMD INTEGER APPLICATIONS

Example 5-28. Simplified Clipping to an Arbitrary Signed Range

; Input:  MM0 signed source operands
; Output: MM0 signed operands clipped to the signed
;         range [high, low]
paddssw mm0, (packed_max - packed_high)
                 ; in effect this clips to high
psubssw mm0, (packed_usmax - packed_high + packed_low)
                 ; clips to low
paddw   mm0, low ; undo the previous two offsets

This algorithm saves a cycle when it is known that (High - Low) >= 0x8000. The three-instruction algorithm does not work when (High - Low) < 0x8000, because 0xffff minus any number less than 0x8000 yields a value of at least 0x8000, which is a negative number when interpreted as a signed word.

When the second instruction of the three-step algorithm (Example 5-28), psubssw MM0, (0xffff - High + Low), is executed in that case, a negative number is subtracted. The subtraction therefore increases the values in MM0 instead of decreasing them, as should be the case, and an incorrect answer is generated.

5.6.6.2 Clipping to an Arbitrary Unsigned Range [High, Low]

Example 5-29 clips an unsigned value to the unsigned range [High, Low]. If the value is less than low or greater than high, it is clipped to low or high, respectively. This technique uses the packed-add and packed-subtract instructions with unsigned saturation, so it can only be used on packed-byte and packed-word data types. Figure 5-29 illustrates operation on word values.

Example 5-29. Clipping to an Arbitrary Unsigned Range [High, Low]

; Input:  MM0 unsigned source operands
; Output: MM0 unsigned operands clipped to the unsigned
;         range [HIGH, LOW]
paddusw mm0, 0xffff - high
                 ; in effect this clips to high
psubusw mm0, (0xffff - high + low)
                 ; in effect this clips to low
paddw   mm0, low ; undo the previous two offsets
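
The arithmetic behind both three-step sequences is easier to check on a single 16-bit lane. The following is a minimal scalar sketch in C, not the manual's code: the helper names (addus16, subus16, addw16, addss16, subss16, clip_u16, clip_s16_simplified) are hypothetical, and each helper only models one lane of the corresponding packed instruction. For the signed variant, the sketch assumes that the constant added in the last step is the net of the two earlier offsets, that is low + 0x8000; this is one reading of "undo the previous two offsets" and should be treated as an assumption.

```c
#include <stdint.h>

/* One lane of PADDUSW: unsigned add that saturates at 0xffff. */
static uint16_t addus16(uint16_t a, uint16_t b) {
    uint32_t s = (uint32_t)a + b;
    return (uint16_t)(s > 0xffffu ? 0xffffu : s);
}

/* One lane of PSUBUSW: unsigned subtract that saturates at 0. */
static uint16_t subus16(uint16_t a, uint16_t b) {
    return (uint16_t)(a > b ? a - b : 0);
}

/* One lane of PADDW: add with wraparound, no saturation. */
static uint16_t addw16(uint16_t a, uint16_t b) {
    return (uint16_t)(a + b);
}

/* One lane of PADDSSW: signed add that saturates at the int16 limits. */
static int16_t addss16(int16_t a, int16_t b) {
    int32_t s = (int32_t)a + b;
    if (s > INT16_MAX) s = INT16_MAX;
    if (s < INT16_MIN) s = INT16_MIN;
    return (int16_t)s;
}

/* One lane of PSUBSSW: signed subtract that saturates at the int16 limits. */
static int16_t subss16(int16_t a, int16_t b) {
    int32_t s = (int32_t)a - b;
    if (s > INT16_MAX) s = INT16_MAX;
    if (s < INT16_MIN) s = INT16_MIN;
    return (int16_t)s;
}

/* Unsigned three-step clip. Assumes low <= high. */
uint16_t clip_u16(uint16_t x, uint16_t low, uint16_t high) {
    /* Push values above high into saturation at 0xffff. */
    uint16_t t = addus16(x, (uint16_t)(0xffffu - high));
    /* Push values below low into saturation at 0. */
    t = subus16(t, (uint16_t)(0xffffu - high + low));
    /* Net offset so far is -low; add it back. */
    return addw16(t, low);
}

/* Signed simplified clip. Assumes (high - low) >= 0x8000, so that
 * both constants below fit in a signed 16-bit word. */
int16_t clip_s16_simplified(int16_t x, int16_t low, int16_t high) {
    int16_t c1 = (int16_t)(0x7fff - high);        /* packed_max - high        */
    int16_t c2 = (int16_t)(0xffff - high + low);  /* packed_usmax - high + low */
    int16_t t = addss16(x, c1);                   /* clips to high            */
    t = subss16(t, c2);                           /* clips to low             */
    /* Assumption: undo the net offset of -(low + 0x8000). The true
     * result lies in [low, high], so plain int arithmetic is enough. */
    return (int16_t)((int32_t)t + low + 0x8000);
}
```

With low = 10 and high = 200, clip_u16 maps 5 to 10, 300 to 200, and leaves 100 unchanged. The signed helper is only meaningful under its stated precondition; when (high - low) < 0x8000 the second constant no longer fits in a signed word, which is the failure case described in the text.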
