Intel® 64 and IA-32 Architectures Optimization Reference Manual


GENERAL OPTIMIZATION GUIDELINES

For example, loading data from memory with MOVSS or MOVSD causes an extra micro-op for zeroing the upper part of the XMM register. On Pentium M, Intel Core Solo, and Intel Core Duo processors, this penalty can be avoided by using MOVLPD. However, using MOVLPD causes a performance penalty on Pentium 4 processors.

Another situation occurs when mixing single-precision and double-precision code. On processors based on Intel NetBurst microarchitecture, using CVTSS2SD has a performance penalty relative to the alternative sequence:

    XORPS XMM1, XMM1
    MOVSS XMM1, XMM2
    CVTPS2PD XMM1, XMM1

On Intel Core Solo and Intel Core Duo processors, using CVTSS2SD is more desirable than the alternative sequence.

3.8.4.2 x87 Floating-point Operations with Integer Operands

For processors based on Intel NetBurst microarchitecture, splitting floating-point operations (FIADD, FISUB, FIMUL, and FIDIV) that take 16-bit integer operands into two instructions (FILD and a floating-point operation) is more efficient. However, for floating-point operations with 32-bit integer operands, using FIADD, FISUB, FIMUL, and FIDIV is as efficient as using separate instructions.

Assembly/Compiler Coding Rule 64. (M impact, L generality) Try to use 32-bit operands rather than 16-bit operands for FILD. However, do not do so at the expense of introducing a store-forwarding problem by writing the two halves of the 32-bit memory operand separately.

3.8.4.3 x87 Floating-point Comparison Instructions

The FCOMI and FCMOV instructions should be used when performing x87 floating-point comparisons. Using the FCOM, FCOMP, and FCOMPP instructions typically requires an additional instruction such as FSTSW. The latter alternative causes more μops to be decoded, and should be avoided.

3.8.4.4 Transcendental Functions

If an application needs to emulate math functions in software for performance or other reasons (see Section 3.8.1, “Guidelines for Optimizing Floating-point Code”), it may be worthwhile to inline math library calls because the CALL and the prologue/epilogue involved with such calls can significantly affect the latency of operations.

Note that transcendental functions are supported only in x87 floating point, not in Streaming SIMD Extensions or Streaming SIMD Extensions 2.
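
As an illustration of the single-precision to double-precision conversion discussed earlier on this page, the following is a minimal sketch in C using SSE2 intrinsics. The helper names widen_via_cvtps2pd and widen_via_cvtss2sd are hypothetical and do not come from the manual. Each intrinsic is annotated with the instruction it normally corresponds to, but this is an assumption about typical code generation: an optimizing compiler is free to select different instructions, so the generated assembly should be inspected before drawing performance conclusions.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; also provides the SSE ones */

/* Hypothetical helper: widen a float using the three-instruction
   alternative sequence (XORPS, MOVSS, CVTPS2PD). */
static double widen_via_cvtps2pd(float x)
{
    __m128  src  = _mm_set_ss(x);          /* x in the low lane (plays XMM2) */
    __m128  zero = _mm_setzero_ps();       /* XORPS    XMM1, XMM1 */
    __m128  low  = _mm_move_ss(zero, src); /* MOVSS    XMM1, XMM2 */
    __m128d wide = _mm_cvtps_pd(low);      /* CVTPS2PD XMM1, XMM1 */
    return _mm_cvtsd_f64(wide);            /* extract the low double */
}

/* Hypothetical helper: widen a float with the single scalar
   conversion instruction (CVTSS2SD). */
static double widen_via_cvtss2sd(float x)
{
    __m128  src  = _mm_set_ss(x);
    __m128d wide = _mm_cvtss_sd(_mm_setzero_pd(), src); /* CVTSS2SD */
    return _mm_cvtsd_f64(wide);
}
```

Both helpers return the same value for any input, since widening a float to a double is exact. Which form is preferable depends on the target microarchitecture, as the text notes: the three-instruction sequence on Intel NetBurst, and CVTSS2SD on Intel Core Solo and Intel Core Duo.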
