13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual



SUMMARY OF RULES AND SUGGESTIONS

that any code transformations made do not introduce problems with overflow. (page 3-30)

Assembler/Compiler Coding Rule 41. (H impact, MH generality) For small loops, placing loop invariants in memory is better than spilling loop-carried dependencies. (page 3-32)

Assembler/Compiler Coding Rule 42. (M impact, ML generality) Avoid introducing dependences with partial floating point register writes, e.g. from the MOVSD XMMREG1, XMMREG2 instruction. Use the MOVAPD XMMREG1, XMMREG2 instruction instead. (page 3-38)

Assembler/Compiler Coding Rule 43. (ML impact, L generality) Instead of using MOVUPD XMMREG1, MEM for an unaligned 128-bit load, use MOVSD XMMREG1, MEM; MOVSD XMMREG2, MEM+8; UNPCKLPD XMMREG1, XMMREG2. If the additional register is not available, then use MOVSD XMMREG1, MEM; MOVHPD XMMREG1, MEM+8. (page 3-38)

Assembler/Compiler Coding Rule 44. (M impact, ML generality) Instead of using MOVUPD MEM, XMMREG1 for a store, use MOVSD MEM, XMMREG1; UNPCKHPD XMMREG1, XMMREG1; MOVSD MEM+8, XMMREG1. (page 3-38)

Assembler/Compiler Coding Rule 45. (H impact, H generality) Align data on natural operand size address boundaries. If the data will be accessed with vector instruction loads and stores, align the data on 16-byte boundaries. (page 3-48)

Assembler/Compiler Coding Rule 46. (H impact, M generality) Pass parameters in registers instead of on the stack where possible. Passing arguments on the stack requires a store followed by a reload. While this sequence is optimized in hardware by providing the value to the load directly from the memory order buffer without the need to access the data cache if permitted by store-forwarding restrictions, floating point values incur a significant latency in forwarding. Passing floating point arguments in (preferably XMM) registers should save this long latency operation. (page 3-50)

Assembler/Compiler Coding Rule 47. (H impact, M generality) A load that forwards from a store must have the same address start point and therefore the same alignment as the store data. (page 3-52)

Assembler/Compiler Coding Rule 48. (H impact, M generality) The data of a load which is forwarded from a store must be completely contained within the store data. (page 3-52)

Assembler/Compiler Coding Rule 49. (H impact, ML generality) If it is necessary to extract a non-aligned portion of stored data, read out the smallest aligned portion that completely contains the data and shift/mask the data as necessary. This is better than incurring the penalties of a failed store-forward. (page 3-52)

Assembler/Compiler Coding Rule 50. (MH impact, ML generality) Avoid several small loads after large stores to the same area of memory by using a single large read and register copies as needed. (page 3-52)

Assembler/Compiler Coding Rule 51. (H impact, MH generality) Where it is possible to do so without incurring other penalties, prioritize the allocation of
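
The shift/mask approach of Rule 49 and the single-large-read approach of Rule 50 can be sketched in C as follows. This is an illustrative sketch only, not code from the manual: the function names extract_bytes and split_halves are hypothetical, little-endian byte order (as on x86) is assumed, and a compiler is free to generate different loads than the source suggests, so the generated assembly should be inspected before relying on it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from the manual), Rule 49 style:
 * given the full 64-bit word that was just stored, extract `len` bytes
 * starting at byte `offset` by shifting and masking in a register,
 * instead of issuing a narrow load at an offset inside the stored data.
 * Assumes little-endian layout: memory byte k holds bits 8k..8k+7. */
static uint64_t extract_bytes(uint64_t word, unsigned offset, unsigned len)
{
    assert(len >= 1 && offset + len <= 8);
    uint64_t shifted = word >> (8 * offset);
    /* A shift by 64 is undefined in C, so the full-width mask is special-cased. */
    uint64_t mask = (len == 8) ? ~0ULL : ((1ULL << (8 * len)) - 1);
    return shifted & mask;
}

/* Hypothetical helper (not from the manual), Rule 50 style:
 * after a 64-bit store, obtain both 32-bit halves from one 64-bit read
 * plus register operations, rather than two separate 32-bit loads. */
static void split_halves(const uint64_t *stored, uint32_t *lo, uint32_t *hi)
{
    uint64_t w = *stored;        /* one large read matching the store */
    *lo = (uint32_t)w;           /* low half: register copy */
    *hi = (uint32_t)(w >> 32);   /* high half: shift in register */
}
```

For example, with the stored value 0x1122334455667788, extract_bytes(value, 1, 2) yields 0x6677, and split_halves produces 0x55667788 and 0x11223344. In both helpers the only memory read has the same start address and size as the store, which is the condition Rules 47 and 48 place on a load that forwards from a store.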
