13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual



INTEL® 64 AND IA-32 PROCESSOR ARCHITECTURES

Sometimes a modified cache line has to be evicted to make space for a new cache line. The modified cache line is evicted in parallel with bringing in the new data and does not require additional latency. However, when data is written back to memory, the eviction uses cache bandwidth and possibly bus bandwidth as well. Therefore, when multiple cache misses require the eviction of modified lines within a short time, there is an overall degradation in cache response time.

2.1.5.2 Stores

When an instruction writes data to a memory location that has WB memory type, the processor first ensures that the line is in Exclusive or Modified state in its own DCU. The processor looks for the cache line in the following locations, in the specified order:

1. DCU of initiating core
2. DCU of the other core and L2 cache
3. System memory

The cache line is taken from the DCU of the other core only if it is modified, ignoring the cache line availability or state in the L2 cache. After reading for ownership is completed, the data is written to the first-level data cache and the line is marked as modified.

Reading for ownership and storing the data happen after instruction retirement and follow the order of retirement. Therefore, the store latency does not affect the store instruction itself. However, several sequential stores may have cumulative latency that can affect performance. Table 2-4 presents store latencies depending on the initial cache line location.

2.2 INTEL NETBURST® MICROARCHITECTURE

The Pentium 4 processor, Pentium 4 processor Extreme Edition supporting Hyper-Threading Technology, Pentium D processor, and Pentium processor Extreme Edition implement the Intel NetBurst microarchitecture. Intel Xeon processors that implement Intel NetBurst microarchitecture can be identified using CPUID (family encoding 0FH).

This section describes the features of the Intel NetBurst microarchitecture and its operation common to the above processors. It provides the technical background required to understand optimization recommendations and the coding rules discussed in the rest of this manual. For implementation details, including instruction latencies, see Appendix C, “Instruction Latency and Throughput.”

Intel NetBurst microarchitecture is designed to achieve high performance for integer and floating-point computations at high clock rates. It supports the following features:

• hyper-pipelined technology that enables high clock rates
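
The CPUID-based identification mentioned at the start of this section (family encoding 0FH) can be illustrated with a minimal C sketch. It is not taken from the manual. It assumes the caller has already executed CPUID leaf 01H and passes in the returned EAX value, and it relies on the documented layout of that value: family ID in bits 11:8, extended family ID in bits 27:20. The function names display_family and is_family_0fh are hypothetical, chosen only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decode the display family from the EAX value returned by CPUID leaf 01H.
   Bits 11:8 hold the family ID; bits 27:20 hold the extended family ID,
   which is added only when the family ID is 0FH. */
static unsigned display_family(uint32_t cpuid_1_eax)
{
    unsigned family = (cpuid_1_eax >> 8) & 0xFu;
    unsigned ext_family = (cpuid_1_eax >> 20) & 0xFFu;
    return (family == 0xFu) ? family + ext_family : family;
}

/* Hypothetical helper: true when the signature reports family 0FH.
   Assumption: the caller has separately confirmed an Intel vendor string,
   because the family value alone does not identify the vendor. */
static bool is_family_0fh(uint32_t cpuid_1_eax)
{
    return display_family(cpuid_1_eax) == 0x0Fu;
}
```

For example, a signature value such as 0x00000F29 (used here purely as an illustrative input) decodes to family 0FH, while 0x000006FB decodes to family 06H. Keeping the decoding separate from the CPUID instruction itself lets the logic be checked without compiler-specific intrinsics.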
