13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual



OPTIMIZING FOR SIMD INTEGER APPLICATIONS

Example 5-25. Big-Endian to Little-Endian Conversion Using PSHUFB

    __declspec(align(16)) BYTE bswapMASK[16] =
        {3,2,1,0, 7,6,5,4, 11,10,9,8, 15,14,13,12};

        lea     eax, src            ; source buffer (16-byte aligned for movdqa)
        lea     ecx, dst            ; destination buffer (16-byte aligned)
        mov     edx, elCount        ; count of 32-bit elements, a multiple of 4
        movaps  xmm7, bswapMASK     ; byte-reversal mask for each doubleword
    start:
        movdqa  xmm0, [eax]         ; load four doublewords
        pshufb  xmm0, xmm7          ; reverse the bytes within each doubleword
        movdqa  [ecx], xmm0         ; store the converted doublewords
        add     eax, 16
        add     ecx, 16
        sub     edx, 4              ; four elements processed per iteration
        jnz     start

5.6.6 Clipping to an Arbitrary Range [High, Low]

This section explains how to clip values to a range [HIGH, LOW]. Specifically, if the value is less than LOW or greater than HIGH, then clip to LOW or HIGH, respectively. This technique uses the packed-add and packed-subtract instructions with saturation (signed or unsigned), which means that this technique can only be used on packed-byte and packed-word data types.

The examples in this section use the constants PACKED_MAX and PACKED_MIN and show operations on word values. For simplicity, we use the following constants (corresponding constants are used in case the operation is done on byte values):

    PACKED_MAX    equals 0x7FFF7FFF7FFF7FFF
    PACKED_MIN    equals 0x8000800080008000
    PACKED_LOW    contains the value LOW in all four words of the packed-words data type
    PACKED_HIGH   contains the value HIGH in all four words of the packed-words data type
    PACKED_USMAX  all values equal 1
    HIGH_US       adds the HIGH value to all data elements (4 words) of PACKED_MIN
    LOW_US        adds the LOW value to all data elements (4 words) of PACKED_MIN

5.6.6.1 Highly Efficient Clipping

For clipping signed words to an arbitrary range, the PMAXSW and PMINSW instructions may be used. For clipping unsigned bytes to an arbitrary range, the PMAXUB and PMINUB instructions may be used.
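
As a rough illustration of the max/min approach to clipping, the following C sketch uses the SSE2 intrinsics _mm_max_epi16/_mm_min_epi16 and _mm_max_epu8/_mm_min_epu8, which correspond to PMAXSW/PMINSW and PMAXUB/PMINUB on XMM registers. The function names clip_words and clip_bytes, the in-place interface, and the requirement that the element count be a multiple of the vector width are assumptions made for this example and are not taken from the manual. Because it works on 128-bit registers, each packed constant here holds eight words or sixteen bytes rather than the four words described above.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Clip n signed 16-bit values in place to [low, high].
   Hypothetical helper: n is assumed to be a multiple of 8.
   Unaligned loads/stores are used, so buf needs no special alignment. */
void clip_words(int16_t *buf, size_t n, int16_t low, int16_t high)
{
    assert(low <= high);
    assert(n % 8 == 0);
    const __m128i packed_low  = _mm_set1_epi16(low);   /* LOW in every word  */
    const __m128i packed_high = _mm_set1_epi16(high);  /* HIGH in every word */
    for (size_t i = 0; i < n; i += 8) {
        __m128i v = _mm_loadu_si128((const __m128i *)(buf + i));
        v = _mm_max_epi16(v, packed_low);   /* PMAXSW: raise values below LOW  */
        v = _mm_min_epi16(v, packed_high);  /* PMINSW: lower values above HIGH */
        _mm_storeu_si128((__m128i *)(buf + i), v);
    }
}

/* Clip n unsigned 8-bit values in place to [low, high].
   Hypothetical helper: n is assumed to be a multiple of 16. */
void clip_bytes(uint8_t *buf, size_t n, uint8_t low, uint8_t high)
{
    assert(low <= high);
    assert(n % 16 == 0);
    const __m128i packed_low  = _mm_set1_epi8((char)low);
    const __m128i packed_high = _mm_set1_epi8((char)high);
    for (size_t i = 0; i < n; i += 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)(buf + i));
        v = _mm_max_epu8(v, packed_low);    /* PMAXUB */
        v = _mm_min_epu8(v, packed_high);   /* PMINUB */
        _mm_storeu_si128((__m128i *)(buf + i), v);
    }
}
```

With these helpers, a call such as clip_words(w, 8, -5, 100) would leave values already inside the range unchanged, raise anything below -5 to -5, and lower anything above 100 to 100.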
