13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual


GENERAL OPTIMIZATION GUIDELINES

needed to wait decreases with larger offsets, until an offset of 96 resolves the issue (as there are no pending stores by the time of the load with the same address).

The Intel Core microarchitecture provides a performance monitoring event (see LOAD_BLOCK.OVERLAP_STORE in Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3B) that allows software tuning effort to detect the occurrence of aliasing conditions.

Example 3-38. Aliasing Between Loads and Stores Across Loop Iterations

    lp:
        movaps xmm0, [eax+ecx]    ; load 16 bytes from the source
        movaps [edx+ecx], xmm0    ; store 16 bytes to the destination
        add ecx, 16               ; advance to the next 16-byte block
        jnz lp                    ; repeat until ecx reaches zero

3.6.8 Mixing Code and Data

The aggressive prefetching and pre-decoding of instructions by Intel processors have two related effects:

• Self-modifying code works correctly, according to the Intel architecture processor requirements, but incurs a significant performance penalty. Avoid self-modifying code if possible.

• Placing writable data in the code segment might be impossible to distinguish from self-modifying code. Writable data in the code segment might suffer the same performance penalty as self-modifying code.

Assembly/Compiler Coding Rule 56. (M impact, L generality) If (hopefully read-only) data must occur on the same page as code, avoid placing it immediately after an indirect jump. For example, follow an indirect jump with its most likely target, and place the data after an unconditional branch.

Tuning Suggestion 1. In rare cases, a performance problem may be caused by executing data on a code page as instructions. This is very likely to happen when execution is following an indirect branch that is not resident in the trace cache. If this is clearly causing a performance problem, try moving the data elsewhere, or inserting an illegal opcode or a PAUSE instruction immediately after the indirect branch. Note that the latter two alternatives may degrade performance in some circumstances.
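
Returning to the aliasing discussion around Example 3-38, the following C sketch estimates whether a source/destination pair for such a copy loop falls into the offset range described there. It is a minimal sketch under stated assumptions, not a description of the hardware. Only the threshold of 96 comes from the text. The 4-KByte window (ALIAS_WINDOW) and the definition of the offset as destination minus source, modulo that window, are assumptions that this excerpt does not state. The names alias_offset and may_stall_on_aliasing are hypothetical.

```c
#include <stdint.h>

/* Assumption: aliasing is judged on address bits within a 4-KByte window.
   This value is not stated in the excerpt above. */
#define ALIAS_WINDOW 4096u

/* From the text: an offset of 96 resolves the issue. */
#define SAFE_OFFSET 96u

/* Hypothetical helper: distance from src to dst, reduced modulo the window.
   Unsigned wraparound keeps the result correct when dst is below src,
   because the pointer-sized modulus is a multiple of ALIAS_WINDOW. */
static unsigned alias_offset(const void *src, const void *dst)
{
    uintptr_t s = (uintptr_t)src;
    uintptr_t d = (uintptr_t)dst;
    return (unsigned)((d - s) % ALIAS_WINDOW);
}

/* Hypothetical check: returns 1 when the offset is nonzero and below
   SAFE_OFFSET, the range in which a later load may have to wait for a
   pending store from an earlier iteration. An offset of zero is treated
   as safe here, which is also an assumption. */
static int may_stall_on_aliasing(const void *src, const void *dst)
{
    unsigned off = alias_offset(src, dst);
    return off != 0u && off < SAFE_OFFSET;
}
```

A caller might evaluate may_stall_on_aliasing(src, dst) before choosing buffer placement and pad the destination when it returns 1. Such a check is only a heuristic; the LOAD_BLOCK.OVERLAP_STORE event mentioned above is the way to confirm that aliasing actually occurs.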
