13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual



GENERAL OPTIMIZATION GUIDELINES

Table 3-1. Store Forwarding Restrictions of Processors Based on Intel Core Microarchitecture (Contd.)

Store Alignment          Width of Store (bits)  Load Alignment (byte)  Width of Load (bits)  Store Forwarding Restriction
To Natural size          64                     not qword aligned      8, 16                 stalled
To Natural size          64                     dword aligned          32                    not stalled
To Natural size          64                     not dword aligned      32                    stalled
To Natural size          128                    dqword aligned         8, 16, 128            not stalled
To Natural size          128                    not dqword aligned     8, 16                 stalled
To Natural size          128                    dword aligned          32                    not stalled
To Natural size          128                    not dword aligned      32                    stalled
To Natural size          128                    qword aligned          64                    not stalled
To Natural size          128                    not qword aligned      64                    stalled
Unaligned, start byte 1  32                     byte 0 of store        8, 16, 32             not stalled
Unaligned, start byte 1  32                     not byte 0 of store    8, 16                 stalled
Unaligned, start byte 1  64                     byte 0 of store        8, 16, 32             not stalled
Unaligned, start byte 1  64                     not byte 0 of store    8, 16, 32             stalled
Unaligned, start byte 1  64                     byte 0 of store        64                    stalled
Unaligned, start byte 7  32                     byte 0 of store        8                     not stalled
Unaligned, start byte 7  32                     not byte 0 of store    8                     not stalled
Unaligned, start byte 7  32                     don’t care             16, 32                stalled
Unaligned, start byte 7  64                     don’t care             16, 32, 64            stalled

3.6.4.2 Store-forwarding Restriction on Data Availability

The value to be stored must be available before the load operation can be completed. If this restriction is violated, the execution of the load will be delayed until the data is available. This delay causes some execution resources to be used unnecessarily, and that can lead to sizable but non-deterministic delays. However, the overall impact of this problem is much smaller than that from violating size and alignment requirements.

In processors based on Intel NetBurst microarchitecture, hardware predicts when loads are dependent on and get their data forwarded from preceding stores. These predictions can significantly improve performance. However, if a load is scheduled too soon after the store it depends on or if the generation of the data to be stored is delayed, there can be a significant penalty.
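
The data-availability restriction can be illustrated with a minimal C sketch. It is not taken from the manual: the function names, the use of a division as the slow producer of the store data, and the `volatile` qualifier (used only so the compiler keeps the store and the reload) are all assumptions made for illustration. The first function reloads a value immediately after storing it, so the load depends on store data that is ready only once the division completes. The second keeps the value in a local variable and never reloads it, which is one possible way to avoid the dependency. Both compute the same result; any timing difference depends on the compiler and the microarchitecture and is not measured here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: each iteration stores a quotient and reloads it
 * straight away. The volatile qualifier forces the compiler to emit both
 * the store and the load, so the load depends on store data that is only
 * available after the division finishes. */
uint32_t sum_quotients_via_memory(volatile uint32_t *slot, const uint32_t *src,
                                  size_t n, uint32_t divisor)
{
    assert(divisor != 0);
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        *slot = src[i] / divisor;  /* store data is produced late */
        sum += *slot;              /* dependent load must wait for it */
    }
    return sum;
}

/* Hypothetical alternative: the quotient is kept in a local variable, so
 * the accumulation does not load from *slot. The store still happens. */
uint32_t sum_quotients_via_register(volatile uint32_t *slot, const uint32_t *src,
                                    size_t n, uint32_t divisor)
{
    assert(divisor != 0);
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t q = src[i] / divisor;
        *slot = q;                 /* same store as above */
        sum += q;                  /* value reused without a reload */
    }
    return sum;
}
```

Calling either function with the same `slot`, `src`, `n`, and `divisor` returns the same sum and leaves the last quotient in `*slot`; only the dependency between the store and a following load differs.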
