Intel® 64 and IA-32 Architectures Optimization Reference Manual
GENERAL OPTIMIZATION GUIDELINES

These recommendations are approximate. They can vary depending on coding style, application domain, and other factors.

The purpose of the high, medium, and low (H, M, and L) priorities is to suggest the relative level of performance gain one can expect if a recommendation is implemented. Because it is not possible to predict the frequency of a particular code instance in applications, priority hints cannot be directly correlated to application-level performance gain. In cases in which application-level performance gain has been observed, we have provided a quantitative characterization of the gain (for information only). In cases in which the impact has been deemed inapplicable, no priority is assigned.

3.4 OPTIMIZING THE FRONT END

Optimizing the front end covers two aspects:

• Maintaining a steady supply of μops to the execution engine — Mispredicted branches can disrupt streams of μops, or cause the execution engine to waste execution resources on executing streams of μops in the non-architected code path. Much of the tuning in this respect focuses on working with the Branch Prediction Unit. Common techniques are covered in Section 3.4.1, "Branch Prediction Optimization."

• Supplying streams of μops to utilize the execution bandwidth and retirement bandwidth as much as possible — For Intel Core microarchitecture and the Intel Core Duo processor family, this aspect focuses on maintaining high decode throughput. In Intel NetBurst microarchitecture, this aspect focuses on keeping the Trace Cache operating in stream mode. Techniques to maximize decode throughput for Intel Core microarchitecture are covered in Section 3.4.2, "Fetch and Decode Optimization."

3.4.1 Branch Prediction Optimization

Branch optimizations have a significant impact on performance. By understanding the flow of branches and improving their predictability, you can increase the speed of code significantly.

Optimizations that help branch prediction are:

• Keep code and data on separate pages. This is very important; see Section 3.6, "Optimizing Memory Accesses," for more information.
• Eliminate branches whenever possible (see the branchless selection sketch after this list).
• Arrange code to be consistent with the static branch prediction algorithm.
• Use the PAUSE instruction in spin-wait loops (see the spin-wait sketch after this list).
• Inline functions and pair up calls and returns.
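The branch-elimination guideline can be illustrated with a short C sketch (illustrative only; the function and variable names are assumptions, not code from this manual). The branchless form computes the comparison result arithmetically, so a compiler can typically lower it to SETcc or CMOV rather than a conditional jump that the Branch Prediction Unit might mispredict.

    #include <stdint.h>

    /* Branchy form: a hard-to-predict comparison becomes a conditional jump. */
    static uint32_t pick_branchy(uint32_t a, uint32_t b,
                                 uint32_t val_if_less, uint32_t val_otherwise)
    {
        if (a < b)
            return val_if_less;
        return val_otherwise;
    }

    /* Branchless form: the comparison yields 0 or 1; subtracting it from zero
       builds an all-ones or all-zeros mask that selects the result without a
       branch, so there is nothing for the predictor to mispredict. */
    static uint32_t pick_branchless(uint32_t a, uint32_t b,
                                    uint32_t val_if_less, uint32_t val_otherwise)
    {
        uint32_t mask = (uint32_t)0 - (uint32_t)(a < b);
        return (val_if_less & mask) | (val_otherwise & ~mask);
    }

Whether the branchless form is faster depends on how predictable the branch is; for a well-predicted branch the branchy form is usually at least as fast.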
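For the PAUSE recommendation, the following is a minimal spin-wait sketch in C, assuming a simple flag-polling protocol (the flag variable and function name are hypothetical). The _mm_pause() intrinsic emits the PAUSE instruction, which reduces the memory-order-violation penalty when the loop exits and lowers power consumption while spinning.

    #include <immintrin.h>   /* provides _mm_pause(), which emits PAUSE */

    /* Spin until *flag becomes nonzero, hinting to the processor on each
       iteration that this is a spin-wait loop. */
    static void spin_wait(volatile int *flag)
    {
        while (*flag == 0) {
            _mm_pause();
        }
    }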
