13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual

GENERAL OPTIMIZATION GUIDELINES

• On processors based on Intel NetBurst microarchitecture, the code size limit of interest is imposed by the trace cache. On Pentium M processors, the code size limit is governed by the instruction cache.
• Dependencies for partial register writes incur large penalties when using the Pentium M processor (this applies to processors with CPUID signature family 6, model 9). On Pentium 4, Intel Xeon, and Pentium M processors (the latter with CPUID signature family 6, model 13), such penalties are relieved by artificial dependencies between each partial register write. Intel Core Solo, Intel Core Duo processors and processors based on Intel Core microarchitecture can experience minor delays due to partial register stalls. To avoid false dependences from partial register updates, use full register updates and extended moves.
• Use appropriate instructions that support dependence-breaking (PXOR, SUB, XOR instructions). Dependence-breaking support for XORPS is available in Intel Core Solo, Intel Core Duo processors and processors based on Intel Core microarchitecture.
• Floating point register stack exchange instructions are slightly more expensive due to issue restrictions in processors based on Intel NetBurst microarchitecture.
• Hardware prefetching can reduce the effective memory latency for data and instruction accesses in general. But different microarchitectures may require some custom modifications to adapt to the specific hardware prefetch implementation of each microarchitecture.
• On processors based on Intel NetBurst microarchitecture, latencies of some instructions are relatively significant (including shifts, rotates, integer multiplies, and moves from memory with sign extension). Use care when using the LEA instruction. See Section 3.5.1.3, “Using LEA.”
• On processors based on Intel NetBurst microarchitecture, there may be a penalty when instructions with immediates requiring more than 16-bit signed representation are placed next to other instructions that use immediates.

3.2.1 CPUID Dispatch Strategy and Compatible Code Strategy

When optimum performance on all processor generations is desired, applications can take advantage of the CPUID instruction to identify the processor generation and integrate processor-specific instructions into the source code. The Intel C++ Compiler supports the integration of different versions of the code for different target processors. The selection of which code to execute at runtime is made based on the CPU identifiers. Binary code targeted for different processor generations can be generated under the control of the programmer or by the compiler.

For applications that target multiple generations of microarchitectures, and where minimum binary code size and a single code path are important, a compatible code strategy is best. Optimizing applications using techniques developed for the Intel Core microarchitecture, combined with some for Intel NetBurst microarchitecture, is likely to improve code efficiency and scalability when running on processors based on current and future generations of Intel 64 and IA-32 processors.
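The CPUID dispatch strategy described in Section 3.2.1 can be sketched in C as follows. This is a minimal illustration, not the mechanism the Intel C++ Compiler uses. It decodes the family and model fields from the CPUID leaf 1 signature, maps them to a processor group, and selects a function pointer. The names uarch_t, classify, select_sum and sum_dispatch, and the summation kernels, are hypothetical. Treating every family 15 part as NetBurst is a simplifying assumption, since the sketch does not check the vendor string. The signature is read with __get_cpuid from the GCC/Clang <cpuid.h> header where available; other toolchains fall back to the generic path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical labels for the processor groups named in the text. */
typedef enum {
    UARCH_GENERIC = 0,  /* fallback: the "compatible code" path */
    UARCH_PENTIUM_M,    /* family 6, model 9 or 13 (from the text) */
    UARCH_NETBURST,     /* family 15 (assumption: vendor not checked) */
    UARCH_COUNT
} uarch_t;

/* Displayed family: the extended family field (bits 27:20) is added
   only when the base family field (bits 11:8) equals 0xF. */
static unsigned cpu_family(uint32_t eax)
{
    unsigned base = (eax >> 8) & 0xF;
    unsigned ext = (eax >> 20) & 0xFF;
    return base == 0xF ? base + ext : base;
}

/* Displayed model: the extended model field (bits 19:16) supplies the
   high nibble only when the base family is 6 or 0xF. */
static unsigned cpu_model(uint32_t eax)
{
    unsigned base_family = (eax >> 8) & 0xF;
    unsigned model = (eax >> 4) & 0xF;
    if (base_family == 0x6 || base_family == 0xF)
        model |= ((eax >> 16) & 0xF) << 4;
    return model;
}

/* Map a CPUID leaf 1 signature (EAX) to one of the groups above. */
static uarch_t classify(uint32_t eax)
{
    unsigned family = cpu_family(eax);
    unsigned model = cpu_model(eax);
    if (family == 15)
        return UARCH_NETBURST;
    if (family == 6 && (model == 9 || model == 13))
        return UARCH_PENTIUM_M;
    return UARCH_GENERIC;
}

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
/* Read the signature from CPUID leaf 1; 0 if the leaf is unsupported. */
static uint32_t read_signature(void)
{
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return eax;
}
#else
/* Other targets: a signature of 0 selects the generic path. */
static uint32_t read_signature(void) { return 0; }
#endif

typedef long (*sum_fn)(const int *, size_t);

/* Placeholder kernels: a real build would put tuned code in each. */
static long sum_generic(const int *v, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}
static long sum_pentium_m(const int *v, size_t n) { return sum_generic(v, n); }
static long sum_netburst(const int *v, size_t n) { return sum_generic(v, n); }

/* Pick the kernel for a processor group. */
static sum_fn select_sum(uarch_t kind)
{
    static const sum_fn table[UARCH_COUNT] = {
        [UARCH_GENERIC] = sum_generic,
        [UARCH_PENTIUM_M] = sum_pentium_m,
        [UARCH_NETBURST] = sum_netburst
    };
    assert((unsigned)kind < UARCH_COUNT);
    return table[kind];
}

/* Entry point: identify the processor, then run the selected kernel. */
long sum_dispatch(const int *v, size_t n)
{
    return select_sum(classify(read_signature()))(v, n);
}
```

A caller would invoke sum_dispatch(values, count) and never see which kernel ran. In practice the selection would usually be made once at startup and the function pointer cached, rather than executing CPUID on every call. A compatible code strategy corresponds to shipping only the generic path.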
