13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual


GENERAL OPTIMIZATION GUIDELINES

Example 3-30. Non-forwarding Example of Large Load After Small Store

    mov BYTE PTR [EBP], 'a'
    mov BYTE PTR [EBP + 1], 'b'
    mov BYTE PTR [EBP + 2], 'c'
    mov BYTE PTR [EBP + 3], 'd'
    mov EAX, [EBP]              ; Blocked
    ; The first four small stores can be consolidated into
    ; a single DWORD store to prevent this non-forwarding
    ; situation.

Example 3-31 illustrates a stalled store-forwarding situation that may appear in compiler-generated code. Sometimes a compiler generates code similar to that shown in Example 3-31 to handle a spilled byte to the stack and convert the byte to an integer value.

Example 3-31. A Non-forwarding Situation in Compiler Generated Code

    mov DWORD PTR [esp+10h], 00000000h
    mov BYTE PTR [esp+10h], bl
    mov eax, DWORD PTR [esp+10h]    ; Stall
    and eax, 0xff                   ; Converting back to byte value

Example 3-32 offers two alternatives to avoid the non-forwarding situation shown in Example 3-31.

Example 3-32. Two Ways to Avoid Non-forwarding Situation in Example 3-31

    ; A. Use MOVZX instruction to avoid large load after small
    ;    store, when spills are ignored.
    movzx eax, bl                   ; Replaces the last three instructions

    ; B. Use MOVZX instruction and handle spills to the stack
    mov DWORD PTR [esp+10h], 00000000h
    mov BYTE PTR [esp+10h], bl
    movzx eax, BYTE PTR [esp+10h]   ; Not blocked

When moving data that is smaller than 64 bits between memory locations, 64-bit or 128-bit SIMD register moves are more efficient (if aligned) and can be used to avoid unaligned loads. Although floating-point registers allow the movement of 64 bits at a time, floating-point instructions should not be used for this purpose, as data may be inadvertently modified.
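The two situations above can also be sketched at the source level. The C fragment below is only an illustration, not code from the manual: the function names are hypothetical, and whether a given compiler emits the byte stores, merges them, or uses MOVZX depends on the compiler and its optimization settings. It contrasts the large-load-after-small-stores pattern with assembling the value in a register, and shows a byte widened without a store and reload.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical name. Pattern to avoid: four 1-byte stores followed by
 * one 4-byte load that covers the same bytes (compare Example 3-30). */
static uint32_t load_after_byte_stores(const uint8_t src[4])
{
    uint8_t buf[4];
    uint32_t value;

    buf[0] = src[0];
    buf[1] = src[1];
    buf[2] = src[2];
    buf[3] = src[3];
    memcpy(&value, buf, sizeof value);   /* large load after small stores */
    return value;
}

/* Hypothetical name. Alternative: assemble the 32-bit value in a register
 * so it can be written with a single DWORD store. The byte order matches
 * the little-endian memory layout used on x86. */
static uint32_t pack_in_register(const uint8_t src[4])
{
    return  (uint32_t)src[0]
         | ((uint32_t)src[1] << 8)
         | ((uint32_t)src[2] << 16)
         | ((uint32_t)src[3] << 24);
}

/* Hypothetical name. Widening a byte with a plain conversion needs no
 * memory round trip; compilers commonly implement it with MOVZX
 * (compare alternative A in Example 3-32). */
static uint32_t zero_extend_byte(uint8_t b)
{
    return (uint32_t)b;
}
```

On a little-endian target, `load_after_byte_stores` and `pack_in_register` return the same value for the same input; the difference lies only in the memory traffic the source suggests to the compiler. Inspecting the generated assembly is the reliable way to confirm which form a particular build actually uses.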
