13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual



APPLICATION PERFORMANCE TOOLS

Table A-2. Recommended Processor Optimization Options for 64-bit Code (Contd.)

Need: Best performance on other processors supporting Intel 64 architecture, utilizing SSE3 where possible, while still running on older Intel as well as non-Intel x86-64 processors supporting SSE2
Recommendation: /QaxP /QxW (-axP -xW on Linux)
Comments:
  • Multiple code paths are generated.
  • Be sure to validate your application on all systems where it may be deployed.

A.1.2 Vectorization and Loop Optimization

The Intel C++ and Fortran Compiler's vectorization feature can detect sequential data access by the same instruction and transform the code to use SSE, SSE2, SSE3, and SSSE3, depending on the target processor platform. The vectorizer supports the following features:

  • Multiple data types: float/double, char/short/int/long (both signed and unsigned), and _Complex float/double are supported.
  • Step-by-step diagnostics: Through the /Qvec-report[n] (-vec-report[n] on Linux and Mac OS) switch (see Table A-3), the vectorizer can identify, line by line and variable by variable, what code was vectorized, what code was not vectorized, and, more importantly, why it was not vectorized. This feedback gives the developer the information necessary to slightly adjust or restructure code, with dependency directives and restrict keywords, to allow vectorization to occur.
  • Advanced dynamic data-alignment strategies: Alignment strategies include loop peeling and loop unrolling. Loop peeling can generate aligned loads, enabling faster application performance. Loop unrolling matches the prefetch of a full cache line and allows better scheduling.
  • Portable code: By using appropriate Intel compiler switches to take advantage of new processor features, developers can avoid the need to rewrite source code.

The processor-specific vectorizer switch options are -Qx[K,W,N,P,T] and -Qax[K,W,N,P,T]. The compiler provides a number of other vectorizer switch options that allow you to control vectorization. The latter switches require the -Qx[K,W,N,P,T] or -Qax[K,W,N,P,T] switch to be on. The default is off.

Table A-3. Vectorization Control Switch Options

-Qvec-report[n]: Controls the vectorizer's diagnostic levels, where n is 0, 1, 2, or 3.
-Qrestrict: Enables pointer disambiguation with the restrict qualifier.
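The role of the restrict qualifier and of loop peeling described in this section can be illustrated with a short C sketch. The function names scale_add and scale_add_peeled are hypothetical and do not come from the manual, and the peeled version is a hand-written approximation of a transformation that a vectorizing compiler would normally apply on its own. It assumes a C99 compiler, or a mode in which the restrict qualifier is enabled.

```c
#include <stddef.h>
#include <stdint.h>

/* Unit-stride loop over two arrays. The restrict qualifiers promise that
   dst and src never overlap, which removes the pointer-aliasing doubt
   that can otherwise stop a vectorizer from using packed SSE operations. */
void scale_add(float *restrict dst, const float *restrict src,
               float k, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dst[i] += k * src[i];
    }
}

/* Hand-written illustration of loop peeling (hypothetical helper).
   Scalar iterations run until dst + i sits on a 16-byte boundary, so the
   remaining iterations start at an address suitable for aligned 128-bit
   accesses to dst. Returns the number of peeled iterations. */
size_t scale_add_peeled(float *restrict dst, const float *restrict src,
                        float k, size_t n)
{
    size_t i = 0;

    /* Peel: process elements one at a time until dst + i is aligned. */
    while (i < n && ((uintptr_t)(dst + i) % 16u) != 0) {
        dst[i] += k * src[i];
        i++;
    }
    size_t peeled = i;

    /* Main loop: dst + i is now 16-byte aligned, or i == n. */
    for (; i < n; i++) {
        dst[i] += k * src[i];
    }
    return peeled;
}
```

Both functions compute the same result. Comparing the vectorizer's diagnostic report for scale_add with and without the restrict qualifiers is one way to see whether pointer disambiguation was the obstacle to vectorization.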
