13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual

USING PERFORMANCE MONITORING EVENTS

B.7.6.2 L1 Data Cache Misses

37. L1 Data Cache Miss Rate: L1D_REPL / INST_RETIRED.ANY

A high value for L1 Data Cache Miss Rate indicates that the code misses the L1 data cache too often and pays the penalty of accessing the L2 cache. See also Loads Blocked by L1 Data Cache Rate (Ratio 32).

You can count separately cache misses due to loads, stores, and locked operations using the events L1D_CACHE_LD.I_STATE, L1D_CACHE_ST.I_STATE, and L1D_CACHE_LOCK.I_STATE, accordingly.

B.7.6.3 L2 Cache Misses

38. L2 Cache Miss Rate: L2_LINES_IN.SELF.ANY / INST_RETIRED.ANY

A high L2 Cache Miss Rate indicates that the running workload has a data set larger than the L2 cache. Some of the data might be evicted without being used. Unless all the required data is brought ahead of time by the hardware prefetcher or software prefetching instructions, bringing data from memory has a significant impact on the performance.

39. L2 Cache Demand Miss Rate: L2_LINES_IN.SELF.DEMAND / INST_RETIRED.ANY

A high value for L2 Cache Demand Miss Rate indicates that the hardware prefetchers are not exploited to bring the data this workload needs. Data is brought from memory when needed to be used and the workload bears memory latency for each such access.

B.7.7 Memory Sub-system - Prefetching

B.7.7.1 L1 Data Prefetching

The event L1D_PREFETCH.REQUESTS is counted whenever the DCU attempts to prefetch cache lines from the L2 (or memory) to the DCU. If you expect the DCU prefetchers to work and to count this event, but instead you detect the event MEM_LOAD_RETIRE.L1D_MISS, it might be that the IP prefetcher suffers from load instruction address collision of several loads.

B.7.7.2 L2 Hardware Prefetching

With the event L2_LD.SELF.PREFETCH.MESI you can count the number of prefetch requests that were made to the L2 by the L2 hardware prefetchers. The actual number of cache lines prefetched to the L2 is counted by the event L2_LD.SELF.PREFETCH.I_STATE.
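
As an illustration of how Ratios 37 through 39 might be evaluated once raw event counts have been collected, the following minimal Python sketch divides each event count by INST_RETIRED.ANY. The container CacheCounts, the helper functions, and the sample numbers are hypothetical and are not part of any Intel tool. The sketch assumes that all counts come from the same measurement interval.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheCounts:
    """Raw event counts from one measurement run (hypothetical container)."""
    inst_retired_any: int          # INST_RETIRED.ANY
    l1d_repl: int                  # L1D_REPL
    l2_lines_in_self_any: int      # L2_LINES_IN.SELF.ANY
    l2_lines_in_self_demand: int   # L2_LINES_IN.SELF.DEMAND


def per_instruction(event_count: int, inst_retired: int) -> float:
    """Normalize an event count by the number of retired instructions."""
    if inst_retired <= 0:
        raise ValueError("INST_RETIRED.ANY must be positive")
    return event_count / inst_retired


def cache_miss_ratios(counts: CacheCounts) -> dict:
    """Evaluate Ratios 37, 38, and 39 as defined in the text."""
    n = counts.inst_retired_any
    return {
        # Ratio 37: L1D_REPL / INST_RETIRED.ANY
        "l1d_miss_rate": per_instruction(counts.l1d_repl, n),
        # Ratio 38: L2_LINES_IN.SELF.ANY / INST_RETIRED.ANY
        "l2_miss_rate": per_instruction(counts.l2_lines_in_self_any, n),
        # Ratio 39: L2_LINES_IN.SELF.DEMAND / INST_RETIRED.ANY
        "l2_demand_miss_rate": per_instruction(counts.l2_lines_in_self_demand, n),
    }


if __name__ == "__main__":
    # Made-up counts for illustration only, not measurements.
    sample = CacheCounts(
        inst_retired_any=1_000_000,
        l1d_repl=20_000,
        l2_lines_in_self_any=5_000,
        l2_lines_in_self_demand=1_000,
    )
    for name, value in cache_miss_ratios(sample).items():
        print(f"{name}: {value:.6f}")
```

Comparing the L2 demand miss rate with the overall L2 miss rate on the same run gives a rough sense of how much of the L2 fill traffic was requested on demand and how much was brought in by prefetching.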
