13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual



GENERAL OPTIMIZATION GUIDELINES

values to spill to memory when there are too many live values to fit in registers. Consider the code in Example 3-17, where it is necessary to spill either A, B, or C.

Example 3-17. Spill Scheduling Code

LOOP
    C := ...
    B := ...
    A := A + ...

For modern microarchitectures, using dependence depth information in spill scheduling is even more important than in previous processors. The loop-carried dependence in A makes it especially important that A not be spilled. Not only would a store/load be placed in the dependence chain, but there would also be a data-not-ready stall of the load, costing further cycles.

Assembly/Compiler Coding Rule 41. (H impact, MH generality) For small loops, placing loop invariants in memory is better than spilling loop-carried dependencies.

A possibly counter-intuitive result is that in such a situation it is better to put loop invariants in memory than in registers, since loop invariants never have a load blocked by store data that is not ready.

3.5.2 Avoiding Stalls in Execution Core

Although the design of the execution core is optimized to make common cases execute quickly, a μop may encounter various hazards, delays, or stalls while making forward progress from the front end to the ROB and RS. The significant cases are:

• ROB Read Port Stalls
• Partial Register Reference Stalls
• Partial Updates to XMM Register Stalls
• Partial Flag Register Reference Stalls

3.5.2.1 ROB Read Port Stalls

As a μop is renamed, it determines whether its source operands have executed and been written to the reorder buffer (ROB), or whether they will be captured “in flight” in the RS or in the bypass network. Typically, the great majority of source operands are found to be “in flight” during renaming. Those that have been written back to the ROB are read through a set of read ports.

Since the Intel Core Microarchitecture is optimized for the common case where the operands are “in flight”, it does not provide a full set of read ports to enable all renamed μops to read all sources from the ROB in the same cycle.
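The effect of a limited set of ROB read ports can be illustrated with a toy model. The sketch below is not a description of the actual rename hardware. The function name `rename_cycles`, the grouping of μops into per-cycle rename groups, the port count passed as `rob_read_ports`, and the rule that a group needs ceil(reads / ports) cycles are all assumptions made for illustration only.

```python
from math import ceil


def rename_cycles(groups, rob_read_ports):
    """Estimate rename cycles under a hypothetical ROB read port limit.

    groups: list of rename groups; each group is a list of uops that would
        ideally rename in the same cycle; each uop is a list of booleans,
        one per source operand, True if the operand must be read from the
        ROB and False if it is captured "in flight".
    rob_read_ports: assumed number of ROB reads available per cycle
        (a model parameter, not a documented hardware value).
    """
    if rob_read_ports < 1:
        raise ValueError("rob_read_ports must be at least 1")

    total = 0
    for group in groups:
        # Count the source operands in this group that need a ROB read port.
        rob_reads = sum(1 for uop in group for from_rob in uop if from_rob)
        # A group always takes at least one cycle; extra cycles model the
        # stall while the remaining ROB reads wait for a free port.
        total += max(1, ceil(rob_reads / rob_read_ports))
    return total


# One group whose sources are mostly in flight: a single ROB read.
in_flight_heavy = [[[False, False], [False, True], [False]]]
# One group whose sources have all been written back: five ROB reads.
rob_heavy = [[[True, True], [True, True], [True]]]

print(rename_cycles(in_flight_heavy, rob_read_ports=2))  # 1
print(rename_cycles(rob_heavy, rob_read_ports=2))        # 3
```

In this model the group whose operands are mostly in flight renames in one cycle, while the group that reads every source from the ROB takes three cycles with two assumed ports. Real stall behavior depends on details of the microarchitecture that this sketch does not capture.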
