13.07.2015

Intel® 64 and IA-32 Architectures Optimization Reference Manual

OPTIMIZING FOR SIMD INTEGER APPLICATIONS

throughput and provides three ports to execute multiple SIMD instructions in parallel.

• When writing SIMD code that works for both integer and floating-point data, use the subset of SIMD convert instructions or load/store instructions to ensure that the input operands in XMM registers contain data types that are properly defined to match the instruction.
  Code sequences containing cross-typed usage produce the same result across different implementations but incur a significant performance penalty. Using SSE/SSE2/SSE3/SSSE3 instructions to operate on type-mismatched SIMD data in the XMM register is strongly discouraged.
• Use the optimization rules and guidelines described in Chapter 3 and Chapter 4.
• Take advantage of the hardware prefetcher where possible. Use the PREFETCH instruction only when data access patterns are irregular and the prefetch distance can be predetermined. See Chapter 7, “Optimizing Cache Usage.”
• Emulate conditional moves by using masked compares and logicals instead of conditional branches.

5.2 USING SIMD INTEGER WITH X87 FLOATING-POINT

All 64-bit SIMD integer instructions use MMX registers, which share register state with the x87 floating-point stack. Because of this sharing, certain rules and considerations apply. Instructions using MMX registers cannot be freely intermixed with x87 floating-point instructions. Take care when switching between 64-bit SIMD integer instructions and x87 floating-point instructions.
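As a minimal C sketch of this switching rule, the example below adds 16-bit integers with MMX intrinsics and then executes EMMS through `_mm_empty()` before any x87 floating-point code runs. It assumes GCC or Clang targeting x86 with `<mmintrin.h>`; the helper names `add_i16_mmx` and `mean_i16` are hypothetical. It also assumes that `long double` arithmetic compiles to x87 instructions, which is typical for GCC and Clang on x86 Linux but not universal (MSVC, for example, treats `long double` as `double`). On x86-64, some compilers may implement MMX intrinsics with SSE2 instructions, so treat this as an illustration of instruction ordering rather than a guarantee of the emitted code.

```c
#include <assert.h>
#include <mmintrin.h>  /* MMX intrinsics: __m64, _mm_add_pi16, _mm_empty */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: out[i] = a[i] + b[i] with 16-bit wrap-around,
 * four lanes per 64-bit MMX register (PADDW). For brevity, n must be
 * a multiple of 4. */
static void add_i16_mmx(const int16_t *a, const int16_t *b,
                        int16_t *out, size_t n)
{
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4) {
        __m64 va, vb, vs;
        memcpy(&va, a + i, sizeof va);  /* 64-bit load with no alignment assumption */
        memcpy(&vb, b + i, sizeof vb);
        vs = _mm_add_pi16(va, vb);      /* PADDW: four wrap-around 16-bit adds */
        memcpy(out + i, &vs, sizeof vs);
    }
    /* EMMS: mark every x87 register empty so later x87 code starts
     * from a clean stack. Issued once, after the MMX loop. */
    _mm_empty();
}

/* Hypothetical x87 consumer: with GCC/Clang on x86 Linux, long double
 * arithmetic is usually compiled to x87 instructions, so it should only
 * run after _mm_empty(). */
static long double mean_i16(const int16_t *v, size_t n)
{
    long double sum = 0.0L;
    for (size_t i = 0; i < n; ++i)
        sum += v[i];
    return n ? sum / (long double)n : 0.0L;
}
```

Placing `_mm_empty()` once after the loop, rather than inside it, keeps EMMS out of the hot path and follows the advice to minimize switching between MMX and x87 instruction types.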
See Section 5.2.1, “Using the EMMS Instruction.”

SIMD floating-point operations and 128-bit SIMD integer operations can be freely intermixed with either x87 floating-point operations or 64-bit SIMD integer operations. SIMD floating-point operations and 128-bit SIMD integer operations use registers that are unrelated to the x87 FP / MMX registers. The EMMS instruction is not needed to transition to or from SIMD floating-point operations or 128-bit SIMD operations.

5.2.1 Using the EMMS Instruction

When generating 64-bit SIMD integer code, keep in mind that the eight MMX registers are aliased to the x87 floating-point registers. Switching from MMX instructions to x87 floating-point instructions incurs a finite delay, so it is best to minimize switching between these instruction types. When switching is necessary, however, the EMMS instruction provides an efficient means to clear the x87 stack so that subsequent x87 code can operate properly.

As soon as an instruction makes reference to an MMX register, all valid bits in the x87 floating-point tag word are set, which implies that all x87 registers contain valid