13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual

USING PERFORMANCE MONITORING EVENTS

count of 1 per 64 bytes for earlier processors and for the FSB IOQ (This granularity may change in future implementations).

• Retries: If the chipset requests a retry, the FSB IOQ allocations get one count per retry.

There are two noteworthy cases where there may be BSQ allocations without FSB IOQ allocations. The first is UC reads and writes to the local XAPIC registers. Second, if a cache line is evicted from the 2nd-level cache but it hits in the on-die 3rd-level cache, then a BSQ entry is allocated but no FSB transaction is necessary, and there will be no allocation in the FSB IOQ. The difference in the number of write transactions of the writeback (WB) memory type for the FSB IOQ and the BSQ can be an indication of how often this happens. It is less likely to occur for applications with poor locality of writes to the 3rd-level cache, and of course cannot happen when no 3rd-level cache is present.

B.2.3 Usage Notes for Specific Metrics

The difference between the metrics “Read from the processor” and “Reads non-prefetch from the processor” is nominally the number of hardware prefetches.

The paragraphs below cover several performance metrics that are based on the Pentium 4 processor performance-monitoring event “BSQ_cache_reference”. The metrics are:

• 2nd-Level Cache Read Misses
• 2nd-Level Cache Read References
• 3rd-Level Cache Read Misses
• 3rd-Level Cache Read References
• 2nd-Level Cache Reads Hit Shared
• 2nd-Level Cache Reads Hit Modified
• 2nd-Level Cache Reads Hit Exclusive
• 3rd-Level Cache Reads Hit Shared
• 3rd-Level Cache Reads Hit Modified
• 3rd-Level Cache Reads Hit Exclusive

These metrics based on BSQ_cache_reference may be useful as an indicator of the relative effectiveness of the 2nd-level cache, and the 3rd-level cache if present. But due to the current implementation of BSQ_cache_reference in Pentium 4 and Intel Xeon processors, they should not be used to calculate cache hit rates or cache miss rates. The following three paragraphs describe some of the issues related to BSQ_cache_reference, so that its results can be better interpreted.

Current implementations of the BSQ_cache_reference event do not distinguish between programmatic read and write misses. Programmatic writes that miss must get the rest of the cache line and merge the new data. Such a request is called a read for ownership (RFO). To the “BSQ_cache_reference” hardware, both a programmatic
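
The two differences described earlier on this page (all reads versus non-prefetch reads, and BSQ versus FSB IOQ write transactions of the WB memory type) can be sketched as simple arithmetic on raw counter values. The function and parameter names below are hypothetical, chosen for illustration only; they are not part of any Intel tool or API. The sketch also assumes both counters in each pair were collected over the same interval and at the same counting granularity, which should be checked against the allocation rules for the processor in question. As with the metrics themselves, the results are indicators, not exact counts.

```python
def estimated_hardware_prefetches(reads_from_processor: int,
                                  reads_non_prefetch: int) -> int:
    """Nominal hardware prefetch count: all reads minus non-prefetch reads.

    Both arguments are raw metric values (hypothetical names) gathered
    over the same measurement interval.
    """
    return reads_from_processor - reads_non_prefetch


def wb_writes_without_fsb_transaction(bsq_wb_writes: int,
                                      fsb_ioq_wb_writes: int) -> int:
    """Indication of how often a WB write allocated a BSQ entry but needed
    no FSB transaction (for example, a 2nd-level eviction that hit in the
    on-die 3rd-level cache).

    Assumes both counts use the same granularity; this is an assumption,
    not something the counters guarantee.
    """
    return bsq_wb_writes - fsb_ioq_wb_writes


if __name__ == "__main__":
    # Made-up counter values, purely to show the arithmetic.
    print(estimated_hardware_prefetches(1_000_000, 850_000))
    print(wb_writes_without_fsb_transaction(40_000, 31_500))
```

A negative result from either function suggests the two counters were not sampled consistently (different intervals or granularities) and the pair should be re-collected rather than interpreted.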
