13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual


GENERAL OPTIMIZATION GUIDELINES

These are all true NOPs, having no effect on the state of the machine except to advance the EIP. Because NOPs require hardware resources to decode and execute, use the fewest number to achieve the desired padding.

The one-byte NOP, [XCHG EAX,EAX], has special hardware support. Although it still consumes a µop and its accompanying resources, the dependence upon the old value of EAX is removed. This µop can be executed at the earliest possible opportunity, reducing the number of outstanding instructions, and it is the lowest-cost NOP.

The other NOPs have no special hardware support. Their input and output registers are interpreted by the hardware. Therefore, a code generator should arrange to use the register containing the oldest value as input, so that the NOP will dispatch and release RS resources at the earliest possible opportunity.

Try to observe the following NOP generation priority:

• Select the smallest number of NOPs and pseudo-NOPs to provide the desired padding.
• Select NOPs that are least likely to execute on slower execution unit clusters.
• Select the register arguments of NOPs to reduce dependencies.

3.5.1.9 Mixing SIMD Data Types

Previous microarchitectures (before Intel Core microarchitecture) do not have explicit restrictions on mixing integer and floating-point (FP) operations on XMM registers. For Intel Core microarchitecture, mixing integer and floating-point operations on the content of an XMM register can degrade performance. Software should avoid mixed use of integer and FP operations on XMM registers. Specifically,

• Use SIMD integer operations to feed SIMD integer operations. Use PXOR for the register-zeroing idiom.
• Use SIMD floating-point operations to feed SIMD floating-point operations. Use XORPS for the register-zeroing idiom.
• When floating-point operations are bitwise equivalent, use the PS data type instead of the PD data type. MOVAPS and MOVAPD do the same thing, but MOVAPS takes one less byte to encode the instruction.

3.5.1.10 Spill Scheduling

The spill scheduling algorithm used by a code generator will be impacted by the memory subsystem. A spill scheduling algorithm is an algorithm that selects what
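
As an illustration of the first NOP generation priority in the padding discussion (use the fewest NOPs for the desired padding), the following is a minimal C sketch of how a code generator might split a padding gap into NOP lengths. The function name `plan_nop_padding` and its parameters are hypothetical and do not come from the manual. It also assumes that the generator can emit a NOP or pseudo-NOP of every length from 1 up to `max_nop_len` bytes, which may not hold for a given target. Register selection and execution-unit placement, the other two priorities, are not modeled here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: split `pad` bytes of padding into NOP lengths.
 *
 * Assumption: a NOP or pseudo-NOP exists for every length in the range
 * 1..max_nop_len. Under that assumption, always taking the longest
 * available NOP yields the smallest possible number of NOPs.
 *
 * The chosen lengths are written to out[0..count-1]. The return value is
 * the number of NOPs, or 0 if no padding is needed or if `out` (with
 * capacity `cap`) is too small to hold the plan. */
size_t plan_nop_padding(size_t pad, size_t max_nop_len,
                        size_t *out, size_t cap)
{
    assert(max_nop_len > 0);

    size_t count = 0;
    while (pad > 0) {
        /* Take the longest NOP that still fits in the remaining gap. */
        size_t len = pad < max_nop_len ? pad : max_nop_len;

        if (count == cap)
            return 0; /* plan does not fit in the caller's buffer */

        out[count++] = len;
        pad -= len;
    }
    return count;
}
```

For example, with `pad = 7` and `max_nop_len = 3`, the sketch plans three NOPs of lengths 3, 3, and 1 rather than seven one-byte NOPs. A real code generator would still have to map each length to a concrete encoding and choose its register operands.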
