13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual



GENERAL OPTIMIZATION GUIDELINES

Example 3-3. Eliminating Branch with CMOV Instruction

        test ecx, ecx
        jne 1H
        mov eax, ebx
    1H:
        ; To optimize code, combine jne and mov into one cmovcc instruction
        ; that checks the equal (zero) flag
        test ecx, ecx       ; Test the flags
        cmove eax, ebx      ; If the equal (zero) flag is set, move
                            ; ebx to eax; the 1H: tag is no longer needed

The CMOV and FCMOV instructions are available on the Pentium II and subsequent processors, but not on Pentium processors and earlier IA-32 processors. Use the CPUID instruction to check whether a processor supports these instructions before relying on them.

3.4.1.2 Spin-Wait and Idle Loops

The Pentium 4 processor introduces a new PAUSE instruction; architecturally, the instruction behaves as a NOP on Intel 64 and IA-32 processor implementations.

To the Pentium 4 and later processors, this instruction acts as a hint that the code sequence is a spin-wait loop. Without a PAUSE instruction in such loops, the Pentium 4 processor may suffer a severe penalty when exiting the loop because the processor may detect a possible memory order violation. Inserting the PAUSE instruction significantly reduces the likelihood of a memory order violation and, as a result, improves performance.

In Example 3-4, the code spins until memory location A matches the value stored in the register EAX. Such code sequences are common when protecting a critical section, in producer-consumer sequences, for barriers, or in other synchronization.

Example 3-4. Use of PAUSE Instruction

    lock:   cmp eax, a
            jne loop
            ; Code in critical section:
    loop:   pause
            cmp eax, a
            jne loop
            jmp lock

3.4.1.3 Static Prediction

Branches that do not have a history in the BTB (see Section 3.4.1, “Branch Prediction Optimization”) are predicted using a static prediction algorithm. Pentium 4,
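
Returning to the spin-wait pattern of Example 3-4: the same loop can be sketched in C for readers who do not write assembly directly. This is a minimal illustration under stated assumptions, not code from the manual. The helper name `spin_until_equal` and the macro `CPU_RELAX` are hypothetical. The sketch assumes a GCC- or Clang-style compiler where `_mm_pause()` from `<immintrin.h>` emits the PAUSE instruction, and it falls back to a no-op on non-x86 targets.

```c
#include <stdatomic.h>
#include <stdint.h>

#if defined(__i386__) || defined(__x86_64__)
#include <immintrin.h>              /* _mm_pause() emits the PAUSE instruction */
#define CPU_RELAX() _mm_pause()
#else
#define CPU_RELAX() ((void)0)       /* assumption: no-op fallback on non-x86 targets */
#endif

/*
 * Hypothetical helper mirroring Example 3-4: spin until the memory
 * location *a matches `expected`. Returns how many times the loop
 * body (the PAUSE hint) executed before the values matched.
 */
static unsigned long spin_until_equal(_Atomic uint32_t *a, uint32_t expected)
{
    unsigned long spins = 0;

    /* Corresponds to "cmp eax, a / jne loop" in the assembly listing. */
    while (atomic_load_explicit(a, memory_order_acquire) != expected) {
        CPU_RELAX();                /* corresponds to "loop: pause" */
        ++spins;
    }
    return spins;
}
```

In a producer-consumer arrangement, one thread would typically call `spin_until_equal(&flag, 1)` while another thread stores 1 into `flag`. If the location already holds the expected value, the function returns immediately without executing the PAUSE hint.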
