13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual


OPTIMIZING FOR SIMD FLOATING-POINT APPLICATIONS

• Which part of the code benefits from SIMD floating-point instructions?
• Is the current algorithm the most appropriate for SIMD floating-point instructions?
• Is the code floating-point intensive?
• Does either single-precision or double-precision floating-point computation provide enough range and precision?
• Is the result of the computation affected by enabling flush-to-zero or denormals-to-zero modes?
• Is the data arranged for efficient utilization of the SIMD floating-point registers?
• Is this application targeted for processors without SIMD floating-point instructions?

See also: Section 4.2, “Considerations for Code Conversion to SIMD Programming.”

6.3 USING SIMD FLOATING-POINT WITH X87 FLOATING-POINT

Because the XMM registers used for SIMD floating-point computations are separate registers and are not mapped to the existing x87 floating-point stack, SIMD floating-point code can be mixed with x87 floating-point or 64-bit SIMD integer code.

With Intel Core microarchitecture, 128-bit SIMD integer instructions provide substantially higher efficiency than 64-bit SIMD integer instructions. Software should favor using SIMD floating-point and integer instructions with XMM registers where possible.

6.4 SCALAR FLOATING-POINT CODE

There are SIMD floating-point instructions that operate only on the least-significant operand in the SIMD register. These instructions are known as scalar instructions. They allow the XMM registers to be used for general-purpose floating-point computations.

In terms of performance, scalar floating-point code can match or exceed x87 floating-point code, and it has the following advantages:

• SIMD floating-point code uses a flat register model, whereas x87 floating-point code uses a stack model. Using scalar floating-point code eliminates the need to use FXCH instructions. These have performance limits on the Intel Pentium 4 processor.
• Mixing with MMX technology code without penalty.
• Flush-to-zero mode.
• Shorter latencies than x87 floating-point.
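
The scalar instructions and the flush-to-zero mode described above can be illustrated with a minimal C sketch using the SSE intrinsics from `<xmmintrin.h>`. The function names `scalar_mul_add` and `halve_min_normal` are hypothetical helpers chosen for this illustration, not part of the manual. The sketch assumes an x86 target with SSE enabled and the default MXCSR setting in which the underflow exception is masked.

```c
#include <xmmintrin.h>  /* SSE intrinsics and the MXCSR flush-to-zero macros */
#include <float.h>      /* FLT_MIN: smallest normal single-precision value */

/* Hypothetical helper: computes a*b + c with scalar SSE instructions.
   The _ss intrinsics operate only on the least-significant element of
   each XMM register, and any register can hold any operand (flat model,
   no stack shuffling as with x87 FXCH). */
float scalar_mul_add(float a, float b, float c)
{
    __m128 va = _mm_set_ss(a);
    __m128 vb = _mm_set_ss(b);
    __m128 vc = _mm_set_ss(c);
    return _mm_cvtss_f32(_mm_add_ss(_mm_mul_ss(va, vb), vc));
}

/* Hypothetical helper: halves FLT_MIN, which produces a denormal result,
   with flush-to-zero either enabled or disabled. The previous mode is
   restored before returning. */
float halve_min_normal(int flush_to_zero)
{
    unsigned int saved = _MM_GET_FLUSH_ZERO_MODE();
    _MM_SET_FLUSH_ZERO_MODE(flush_to_zero ? _MM_FLUSH_ZERO_ON
                                          : _MM_FLUSH_ZERO_OFF);

    /* volatile keeps the compiler from folding the multiply at compile
       time, so the result reflects the run-time MXCSR setting. */
    volatile float x = FLT_MIN;
    volatile float half = 0.5f;
    volatile float r =
        _mm_cvtss_f32(_mm_mul_ss(_mm_set_ss(x), _mm_set_ss(half)));

    _MM_SET_FLUSH_ZERO_MODE(saved);
    return r;
}
```

With flush-to-zero enabled, `halve_min_normal(1)` is expected to return zero because the denormal result is flushed. With it disabled, `halve_min_normal(0)` returns a small positive denormal. Whether flushing is acceptable depends on the range and precision the application needs, which is one of the questions listed at the top of this page.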
