13.07.2015

Intel® 64 and IA-32 Architectures Optimization Reference Manual

CODING FOR SIMD ARCHITECTURES

4.3.1 Coding Methodologies

Software developers need to compare the performance improvement that can be obtained from assembly code versus the cost of those improvements. Programming directly in assembly language for a target platform may produce the required performance gain; however, assembly code is not portable between processor architectures and is expensive to write and maintain.

Performance objectives can be met by taking advantage of the different SIMD technologies using high-level languages as well as assembly. The new C/C++ language extensions designed specifically for SSSE3, SSE3, SSE2, SSE, and MMX technology help make this possible.

Figure 4-2 illustrates the trade-offs involved in the performance of hand-coded assembly versus the ease of programming and portability.

[Figure 4-2. Axes: Performance versus Ease of Programming/Portability. Approaches shown: Assembly, Intrinsics, C/C++/Fortran, Automatic Vectorization.]

Figure 4-2. Hand-Coded Assembly and High-Level Compiler Performance Trade-offs

The examples that follow illustrate the use of coding adjustments to enable the algorithm to benefit from SSE. The same techniques may be used for single-precision floating-point, double-precision floating-point, and integer data under SSSE3, SSE3, SSE2, SSE, and MMX technology.
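To make the trade-off concrete, the following is a minimal sketch, not a listing from the manual, that contrasts two of the approaches named above: plain C, which relies on the compiler's automatic vectorization, and SSE intrinsics. The function names add_scalar and add_sse are hypothetical and chosen only for illustration. The sketch assumes an x86 target with SSE enabled and uses unaligned loads and stores, so the arrays need no particular alignment.

```c
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */

/* Plain C: portable and easy to maintain. Whether this loop is
 * vectorized depends on the compiler and its optimization flags. */
static void add_scalar(const float *a, const float *b, float *c, int n)
{
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* SSE intrinsics (hypothetical helper): processes four single-precision
 * values per iteration, then finishes any leftover elements one at a time. */
static void add_sse(const float *a, const float *b, float *c, int n)
{
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);           /* load 4 floats, unaligned */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(c + i, _mm_add_ps(va, vb));  /* packed add, then store */
    }
    for (; i < n; i++)                             /* scalar tail */
        c[i] = a[i] + b[i];
}
```

Both functions produce the same results for element-wise addition. The intrinsics version states the SIMD operation explicitly, which ties the source to the x86 SSE instruction set, while the plain C version stays portable and leaves vectorization to the compiler.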
