13.07.2015

Intel® 64 and IA-32 Architectures Optimization Reference Manual



GENERAL OPTIMIZATION GUIDELINES

proceed with fixed latencies. General guidelines to make use of the available parallelism are:

• Follow the rules (see Section 3.4) to maximize useful decode bandwidth and front end throughput. These rules include favoring single-μop instructions and taking advantage of micro-fusion, the stack pointer tracker, and macro-fusion.
• Maximize rename bandwidth. Guidelines are discussed in this section and include properly dealing with partial registers, ROB read ports, and instructions that cause side effects on flags.
• Follow the scheduling recommendations on sequences of instructions so that multiple dependency chains are alive in the reservation station (RS) simultaneously, thus ensuring that your code utilizes maximum parallelism.
• Avoid hazards and minimize delays that may occur in the execution core, allowing the dispatched μops to make progress and be ready for retirement quickly.

3.5.1 Instruction Selection

Some execution units are not pipelined; this means that μops cannot be dispatched in consecutive cycles and the throughput is less than one per cycle.

It is generally a good starting point to select instructions by considering the number of μops associated with each instruction, favoring, in order: single-μop instructions, simple instructions with fewer than 4 μops, and lastly instructions requiring the microsequencer ROM (μops which are executed out of the microsequencer involve extra overhead).

Assembly/Compiler Coding Rule 28. (M impact, H generality) Favor single-micro-operation instructions. Also favor instructions with shorter latencies.

A compiler may already be doing a good job on instruction selection. If so, user intervention usually is not necessary.

Assembly/Compiler Coding Rule 29. (M impact, L generality) Avoid prefixes, especially multiple non-0F-prefixed opcodes.

Assembly/Compiler Coding Rule 30. (M impact, L generality) Do not use many segment registers.

On the Pentium M processor, there is only one level of renaming of segment registers.

Assembly/Compiler Coding Rule 31. (ML impact, M generality) Avoid using complex instructions (for example, enter, leave, or loop) that have more than four μops and require multiple cycles to decode. Use sequences of simple instructions instead.

Complex instructions may save architectural registers, but incur a penalty of 4 μops to set up parameters for the microsequencer ROM in Intel NetBurst microarchitecture.

Theoretically, arranging instruction sequences to match the 4-1-1-1 template applies to processors based on Intel Core microarchitecture. However, with macro-fusion and micro-fusion capabilities in the front end, attempts to schedule instruction sequences using the 4-1-1-1 template will likely provide diminishing returns.
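
The scheduling guideline listed before Section 3.5.1 (keeping multiple dependency chains alive in the reservation station simultaneously) can be illustrated at the source level. The following C sketch is not taken from the manual; the function names sum_single_chain and sum_four_chains are hypothetical, and the choice of four accumulators is an assumption made for illustration rather than a tuned value. The first function forms one serial chain of dependent adds, while the second splits the work into four independent chains that an out-of-order core may be able to overlap.

```c
#include <assert.h>
#include <stddef.h>

/* One accumulator: every add depends on the result of the previous
   add, so the loop forms a single serial dependency chain. */
double sum_single_chain(const double *a, size_t n)
{
    assert(a != NULL || n == 0);
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Four accumulators (an illustrative choice): the four adds in each
   iteration do not depend on one another, so up to four chains can
   be in flight at the same time. */
double sum_four_chains(const double *a, size_t n)
{
    assert(a != NULL || n == 0);
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; i++)      /* leftover elements when n is not a multiple of 4 */
        s0 += a[i];
    return (s0 + s1) + (s2 + s3);
}
```

Both functions add the same elements, but in a different order, so for general floating-point input the results may differ in the last bits because of rounding. Whether the second form is actually faster depends on the processor and on what the compiler already does with the first form, so it is worth measuring before adopting it.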
