13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual

GENERAL OPTIMIZATION GUIDELINES

Software can maximize memory performance by not exceeding the issue or buffering limitations of the machine. In the Intel Core microarchitecture, only 20 stores and 32 loads may be in flight at once. Since only one load can issue per cycle, algorithms which operate on two arrays are constrained to one operation every other cycle unless you use programming tricks to reduce the amount of memory usage.

Intel NetBurst microarchitecture has the same number of store buffers, slightly more load buffers and similar throughput of issuing load operations. Intel Core Duo and Intel Core Solo processors have fewer buffers. Nevertheless, the general heuristic applies to all of them.

3.6.2 Enhance Speculative Execution and Memory Disambiguation

Prior to Intel Core microarchitecture, when code contains both stores and loads, the loads cannot be issued before the addresses of the preceding stores are resolved. This rule ensures correct handling of load dependencies on preceding stores.

The Intel Core microarchitecture contains a mechanism that allows some loads to be issued early, speculatively. The processor later checks whether the load address overlaps with a store. If the addresses do overlap, the processor re-executes the instructions.

Example 3-27 illustrates a situation in which the compiler cannot be sure that "Ptr->Array" does not change during the loop. Therefore, the compiler cannot keep "Ptr->Array" in a register as an invariant and must read it again in every iteration. Although this situation can be fixed in software by rewriting the code so that the address of the pointer is invariant, memory disambiguation provides a performance gain without rewriting the code.

Example 3-27. Loads Blocked by Stores of Unknown Address

C code

    typedef struct AA AA;       /* lets "AA" be used as a type name in C */
    struct AA {
        AA **Array;
    };

    /* DWORD is the Windows 32-bit unsigned integer type. */
    void nullify_array ( AA *Ptr, DWORD Index, AA *ThisPtr )
    {
        while ( Ptr->Array[--Index] != ThisPtr )
        {
            Ptr->Array[Index] = NULL;
        }
    }

Assembly sequence

    nullify_loop:
        mov dword ptr [eax], 0          ; store: Ptr->Array[Index] = NULL
        mov edx, dword ptr [edi]        ; load: re-read Ptr->Array after the store
        sub ecx, 4                      ; --Index, scaled to 4-byte elements
        cmp dword ptr [ecx+edx], esi    ; Ptr->Array[Index] == ThisPtr ?
        lea eax, [ecx+edx]              ; address for the next store
        jne nullify_loop
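
The software fix mentioned above, making the pointer invariant, could look roughly like the following sketch. It is not taken from the manual: the name nullify_array_hoisted is hypothetical, uint32_t stands in for DWORD so that the code builds without Windows headers, and the rewrite is only valid under the assumption that no element store inside the loop can overwrite Ptr->Array itself. The caller has to guarantee that, because the compiler cannot prove it. As in the original, the array must contain ThisPtr below the starting index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct AA AA;
struct AA {
    AA **Array;
};

/* Hypothetical variant of nullify_array from Example 3-27.
 * Assumption: stores to array[Index] never alias Ptr->Array,
 * so the base pointer can be read once, before the loop. */
void nullify_array_hoisted(AA *Ptr, uint32_t Index, AA *ThisPtr)
{
    assert(Ptr != NULL && Ptr->Array != NULL);

    AA **array = Ptr->Array;        /* loop-invariant base pointer */

    /* Walk downward from Index-1, clearing slots until ThisPtr is found. */
    while (array[--Index] != ThisPtr) {
        array[Index] = NULL;
    }
}
```

With the base pointer held in a local variable, the compiler is free to keep it in a register, so the loop body no longer needs a load of Ptr->Array that has to wait behind the preceding store. On processors with memory disambiguation the original code may already perform comparably, which is the point the text makes.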
