Intel® 64 and IA-32 Architectures Optimization Reference Manual

CODING FOR SIMD ARCHITECTURES

…performance of applications using this methodology can approach that of one using the intrinsics. Further details on the use of these classes can be found in the Intel C++ Class Libraries for SIMD Operations User’s Guide, order number 693500.

Example 4-9 shows the C++ code using a vector class library. The example assumes the arrays passed to the routine are already aligned to 16-byte boundaries.

Example 4-9. C++ Code Using the Vector Classes

#include <fvec.h>

// a, b and c must each point to four floats aligned to a 16-byte boundary.
void add(float *a, float *b, float *c)
{
    // View each float array as one F32vec4 (four packed floats).
    F32vec4 *av = (F32vec4 *) a;
    F32vec4 *bv = (F32vec4 *) b;
    F32vec4 *cv = (F32vec4 *) c;
    // The overloaded "+" and "=" perform a packed add of all four elements.
    *cv = *av + *bv;
}

Here, fvec.h is the class definition file and F32vec4 is the class representing an array of four floats. The “+” and “=” operators are overloaded so that the actual Streaming SIMD Extensions implementation in the previous example is abstracted out, or hidden, from the developer. Note how much more closely this resembles the original code, allowing for simpler and faster programming.

Again, the example assumes that the arrays passed to the routine are already aligned to a 16-byte boundary.

4.3.1.4 Automatic Vectorization

The Intel C++ Compiler provides an optimization mechanism by which loops, such as the one in Example 4-6, can be automatically vectorized, or converted into Streaming SIMD Extensions code. The compiler uses techniques similar to those a programmer would use to identify whether a loop is suitable for conversion to SIMD.
This involves determining whether the following might prevent vectorization:
• The layout of the loop and the data structures used
• Dependencies amongst the data accesses in each iteration and across iterations

Once the compiler has made such a determination, it can generate vectorized code for the loop, allowing the application to use the SIMD instructions.

The caveat to this is that only certain types of loops can be automatically vectorized, and in most cases user interaction with the compiler is needed to fully enable this.
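As a rough illustration of the two factors listed above, the following C++ sketch contrasts loops a vectorizing compiler can typically handle with loops it typically cannot. The function names (add_arrays, running_sum, scale_x_aos, scale_x_soa) and the point types are hypothetical and are not part of the Intel C++ Compiler or its class libraries. The __restrict qualifier is a compiler extension supported by the major C++ compilers, not standard C++. Whether a given loop is actually vectorized depends on the compiler, its options, and the target processor.

```cpp
#include <cstddef>
#include <vector>

// --- Dependencies -----------------------------------------------------------
// Independent iterations: c[i] depends only on a[i] and b[i], so a vectorizing
// compiler may compute several elements per SIMD instruction. __restrict
// (an extension, not standard C++) promises the arrays do not overlap, which
// removes a possible aliasing dependency between iterations.
void add_arrays(const float* __restrict a, const float* __restrict b,
                float* __restrict c, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}

// Loop-carried dependency: x[i] needs the x[i - 1] produced by the previous
// iteration, so consecutive iterations cannot simply run side by side in
// SIMD lanes.
void running_sum(float* x, std::size_t n) {
    for (std::size_t i = 1; i < n; ++i)
        x[i] += x[i - 1];
}

// --- Data layout ------------------------------------------------------------
// Array of structures (AoS): consecutive x values are sizeof(PointAoS) bytes
// apart, so the loop reads memory with a stride rather than contiguously.
struct PointAoS { float x, y, z; };

void scale_x_aos(std::vector<PointAoS>& pts, float s) {
    for (PointAoS& p : pts)
        p.x *= s;
}

// Structure of arrays (SoA): all x values are contiguous, so a single SIMD
// load can fetch several adjacent x values (unit-stride access).
struct PointsSoA { std::vector<float> x, y, z; };

void scale_x_soa(PointsSoA& pts, float s) {
    for (std::size_t i = 0; i < pts.x.size(); ++i)
        pts.x[i] *= s;
}
```

Each pair computes the same kind of result; what differs is how easily the compiler can map loop iterations onto SIMD lanes. The first function of each pair is the form that is usually easier to vectorize automatically.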
