13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual

GENERAL OPTIMIZATION GUIDELINES

The Intel C++ Compiler supports vectorization in three ways:

• The compiler may be able to generate SIMD code without intervention from the user.
• The user can insert pragmas to help the compiler realize that it can vectorize the code.
• The user can write SIMD code explicitly using intrinsics and C++ classes.

To help enable the compiler to generate SIMD code, avoid global pointers and global variables. These issues may be less troublesome if all modules are compiled simultaneously, and whole-program optimization is used.

User/Source Coding Rule 2. (H impact, M generality) Use the smallest possible floating-point or SIMD data type, to enable more parallelism with the use of a (longer) SIMD vector. For example, use single precision instead of double precision where possible.

User/Source Coding Rule 3. (M impact, ML generality) Arrange the nesting of loops so that the innermost nesting level is free of inter-iteration dependencies. Especially avoid the case where the store of data in an earlier iteration happens lexically after the load of that data in a future iteration, something which is called a lexically backward dependence.

The integer part of the SIMD instruction set extensions covers 8-bit, 16-bit and 32-bit operands. Not all SIMD operations are supported for 32 bits, meaning that some source code cannot be vectorized at all unless smaller operands are used.

User/Source Coding Rule 4. (M impact, ML generality) Avoid the use of conditional branches inside loops and consider using SSE instructions to eliminate branches.

User/Source Coding Rule 5. (M impact, ML generality) Keep induction (loop) variable expressions simple.

3.5.4 Optimization of Partially Vectorizable Code

Frequently, a program contains a mixture of vectorizable code and some routines that are non-vectorizable. A common situation of partially vectorizable code involves a loop structure that includes a mixture of vectorized code and unvectorizable code. This situation is depicted in Figure 3-1.
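A partially vectorizable loop of the kind described in this section can be sketched in C as follows. This is only an illustrative sketch under stated assumptions, not code from the manual: the names serial_step, mixed_loop and split_loop are hypothetical, and separating the two kinds of work into different loops is one general technique a programmer might try, not something the text above prescribes. The first loop mixes independent arithmetic with a serial routine whose result depends on the previous iteration. The second version moves the independent arithmetic into its own loop with a simple induction variable and no branches.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical serial routine: each result depends on the state left by
   the previous call, so the calls have to run in order. */
static float serial_step(float *state, float x)
{
    *state = *state * 0.5f + x;
    return *state;
}

/* Mixed form: vectorizable arithmetic and the serial routine share one
   loop body. */
void mixed_loop(const float *a, const float *b, float *out, size_t n)
{
    assert(a && b && out);
    float state = 0.0f;
    for (size_t i = 0; i < n; i++) {
        float t = a[i] * 2.0f + b[i];      /* independent per element */
        out[i] = serial_step(&state, t);   /* carried across iterations */
    }
}

/* Split form: the first loop has no inter-iteration dependence, so it is
   a candidate for vectorization; the second loop keeps the serial work.
   tmp is caller-provided scratch space of n floats. */
void split_loop(const float *a, const float *b, float *tmp, float *out,
                size_t n)
{
    assert(a && b && tmp && out);
    for (size_t i = 0; i < n; i++)
        tmp[i] = a[i] * 2.0f + b[i];

    float state = 0.0f;
    for (size_t i = 0; i < n; i++)
        out[i] = serial_step(&state, tmp[i]);
}
```

Both functions compute the same values. Whether the split form is actually faster depends on the compiler, the array sizes and the cost of the extra pass over tmp, so it should be measured rather than assumed.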
