13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual

OPTIMIZING FOR SIMD INTEGER APPLICATIONS

When an individual result is too large to be represented in 64 bits, the lower 64 bits of the result are written to the destination operand and therefore the result wraps around. These instructions are added in both a 64-bit and 128-bit version; the latter performs 2 independent operations, on the low and high halves of a 128-bit register.

5.6.14 128-bit Shifts

The PSLLDQ/PSRLDQ instructions shift the first operand to the left/right by the number of bytes specified by the immediate operand. The empty low/high-order bytes are cleared (set to zero).

If the value specified by the immediate operand is greater than 15, then the destination is set to all zeros.

5.7 MEMORY OPTIMIZATIONS

You can improve memory access using the following techniques:

• Avoiding partial memory accesses
• Increasing the bandwidth of memory fills and video fills
• Prefetching data with Streaming SIMD Extensions. See Chapter 9, “Optimizing Cache Usage.”

MMX registers and XMM registers allow you to move large quantities of data without stalling the processor.
Instead of loading single array values that are 8, 16, or 32 bits long, consider loading the values in a single quadword or double quadword and then incrementing the structure or array pointer accordingly.

Any data that will be manipulated by SIMD integer instructions should be loaded using either:

• An SIMD integer instruction that loads a 64-bit or 128-bit operand (for example: MOVQ MM0, M64)
• The register-memory form of any SIMD integer instruction that operates on a quadword or double quadword memory operand (for example: PMADDWD MM0, M64)

All SIMD data should be stored using an SIMD integer instruction that stores a 64-bit or 128-bit operand (for example: MOVQ M64, MM0).

The goal of the above recommendations is twofold. First, the loading and storing of SIMD data is more efficient using the larger block sizes. Second, following the above recommendations helps to avoid mixing of 8-, 16-, or 32-bit load and store operations with SIMD integer technology load and store operations to the same SIMD data.

This prevents situations in which small loads follow large stores to the same area of memory, or large loads follow small stores to the same area of memory.
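
The load and store recommendations above can be sketched in C with SSE2 intrinsics. This is a minimal illustration, not code from the manual: the helper name add_bytes is hypothetical, the unaligned load/store intrinsics are chosen here only so the sketch makes no alignment assumptions, and the scalar loop serves both as the tail handler and as a fallback when SSE2 is not available. Each iteration of the main loop moves 16 byte-sized elements with one 128-bit load per input and one 128-bit store, then advances the array index by 16.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 integer intrinsics */
#endif

/* Hypothetical helper: dst[i] = a[i] + b[i] (wrapping) for n bytes. */
static void add_bytes(uint8_t *dst, const uint8_t *a, const uint8_t *b,
                      size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    /* Process 16 bytes per iteration: two 128-bit loads, one packed
       byte add, one 128-bit store, instead of 16 separate 8-bit
       loads and stores. */
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_add_epi8(va, vb));
    }
#endif
    /* Scalar loop for the remaining n % 16 bytes (or for all bytes
       when SSE2 is unavailable). */
    for (; i < n; i++)
        dst[i] = (uint8_t)(a[i] + b[i]);
}
```

Only the tail touches the arrays one byte at a time, and it never overlaps the region written by the 128-bit stores, so the sketch does not issue a small load right after a large store to the same bytes. Code that later reads dst would ideally also use 64-bit or 128-bit loads, in line with the recommendations above.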
