13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual


GENERAL OPTIMIZATION GUIDELINES

before it), and instruction 6 on 5 are broken. This creates two independent chains of computation instead of one serial one.

Example 3-18. Dependencies Caused by Referencing Partial Registers

    1:  add  ah, bh
    2:  add  al, 3      ; instruction 2 has a false dependency on 1
    3:  mov  bl, al     ; depends on 2, but the dependence is real
    4:  mov  ah, ch     ; instruction 4 has a false dependency on 2
    5:  sar  eax, 16    ; this wipes out the al/ah/ax part, so the
                        ; result really doesn't depend on them programmatically,
                        ; but the processor must deal with real dependency on
                        ; al/ah/ax
    6:  mov  al, bl     ; instruction 6 has a real dependency on 5
    7:  add  ah, 13     ; instruction 7 has a false dependency on 6
    8:  imul dl         ; instruction 8 has a false dependency on 7
                        ; because al is implicitly used
    9:  mov  al, 17     ; instruction 9 has a false dependency on 7
                        ; and a real dependency on 8
    10: imul cx         ; implicitly uses ax and writes to dx, hence
                        ; a real dependency

Example 3-19 illustrates the use of MOVZX to avoid a partial register stall when packing three byte values into a register.

Example 3-19. Avoiding Partial Register Stalls in Integer Code

A Sequence Causing Partial Register Stall

    mov   al, byte ptr a[2]
    shl   eax, 16
    mov   ax, word ptr a
    movd  mm0, eax
    ret

Alternate Sequence Using MOVZX to Avoid Delay

    movzx eax, byte ptr a[2]
    shl   eax, 16
    movzx ecx, word ptr a
    or    eax, ecx
    movd  mm0, eax
    ret

3.5.2.4 Partial XMM Register Stalls

Partial register stalls can also apply to XMM registers. The following SSE and SSE2 instructions update only part of the destination register:

    MOVL/HPD XMM, MEM64
    MOVL/HPS XMM, MEM32
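
As a portable illustration of the value the MOVZX sequence in Example 3-19 builds, the following C sketch packs the same three bytes into a 32-bit result. It is not taken from the manual: the function name pack3_bytes is hypothetical, and the low word is assembled byte by byte on the assumption of x86 little-endian layout for "word ptr a". Each step zero-extends into a full-width value before combining, which mirrors how the MOVZX version writes whole registers instead of merging into AL or AX.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: packs a[0], a[1], a[2] into bits 0..23 of a
 * 32-bit value, leaving bits 24..31 zero. */
static uint32_t pack3_bytes(const uint8_t *a)
{
    assert(a != NULL);

    /* movzx eax, byte ptr a[2] ; zero-extend the third byte */
    uint32_t high = a[2];

    /* shl eax, 16 ; move it into bits 16..23 */
    high <<= 16;

    /* movzx ecx, word ptr a ; zero-extend the low 16 bits
     * (little-endian order assumed: a[0] is the low byte) */
    uint32_t low = (uint32_t)a[0] | ((uint32_t)a[1] << 8);

    /* or eax, ecx ; combine the two full-width values */
    return high | low;
}
```

A compiler targeting x86 may or may not emit MOVZX for this source; the sketch only shows the data flow, in which every intermediate value is a complete 32-bit quantity and no step depends on the stale upper bits of a previously written register.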
