13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual


GENERAL OPTIMIZATION GUIDELINES

Assembly/Compiler Coding Rule 20. (M impact, ML generality) Software can enable macro-fusion when it can be logically determined that a variable is non-negative at the time of comparison; use TEST appropriately to enable macro-fusion when comparing a variable with 0.

Example 3-13. Macro-fusion, Signed Variable

Without Macro-fusion        With Macro-fusion
test ecx, ecx               test ecx, ecx
jle  OutSideTheIF           jle  OutSideTheIF
cmp  ecx, 64H               cmp  ecx, 64H
jge  OutSideTheIF           jae  OutSideTheIF
OutSideTheIF:               OutSideTheIF:

In the right-hand column, the preceding TEST/JLE pair guarantees that ecx is positive when the CMP executes, so the unsigned JAE takes the same path as the signed JGE.

For either a signed or an unsigned variable ‘a’, “CMP a,0” and “TEST a,a” produce the same result as far as the flags are concerned. Since TEST can be macro-fused more often, software can use “TEST a,a” to replace “CMP a,0” for the purpose of enabling macro-fusion.

Example 3-14. Macro-fusion, Signed Comparison

C Code          Without Macro-fusion    With Macro-fusion
if (a == 0)     cmp a, 0                test a, a
                jne lbl                 jne lbl
                ...                     ...
                lbl:                    lbl:
if (a >= 0)     cmp a, 0                test a, a
                jl  lbl                 jl  lbl
                ...                     ...
                lbl:                    lbl:

3.4.2.3 Length-Changing Prefixes (LCP)

An instruction can be up to 15 bytes long. Some prefixes can dynamically change the length of an instruction that the decoder must recognize. Typically, the pre-decode unit estimates the length of an instruction in the byte stream assuming the absence of LCP. When the predecoder encounters an LCP in the fetch line, it must use a slower length decoding algorithm. With the slower length decoding algorithm, the predecoder decodes the fetch in 6 cycles, instead of the usual 1 cycle. Normal queuing throughout the machine pipeline generally cannot hide LCP penalties.

The prefixes that can dynamically change the length of an instruction include:
• operand size prefix (0x66)
• address size prefix (0x67)
