13.07.2015

Intel® 64 and IA-32 Architectures Optimization Reference Manual

USING PERFORMANCE MONITORING EVENTS

operations. These misses can impact performance if they do not occur in parallel to other instructions. In addition, if there are many stores in a row and some of them miss the DTLB, stalls may occur because the store buffer is full.

B.7.9 Memory Sub-system - Core Interaction

B.7.9.1 Modified Data Sharing

50. Modified Data Sharing Ratio: EXT_SNOOP.ALL_AGENTS.HITM / INST_RETIRED.ANY

Frequent occurrences of modified data sharing may be due to two threads using and modifying data laid out in one cache line. Modified data sharing causes L2 cache misses. When it happens unintentionally (aka false sharing), it usually causes demand misses that have a high penalty. When false sharing is removed, code performance can improve dramatically.

51. Local Modified Data Sharing Ratio: EXT_SNOOP.THIS_AGENT.HITM / INST_RETIRED.ANY

Modified Data Sharing Ratio indicates the total amount of modified data sharing observed in the system. For systems with several processors, you can use Local Modified Data Sharing Ratio to indicate the amount of modified data sharing between two cores in the same processor. (In systems with one processor, the two ratios are similar.)

B.7.9.2 Fast Synchronization Penalty

52. Locked Operations Impact: (L1D_CACHE_LOCK_DURATION + 20 * L1D_CACHE_LOCK.MESI) / CPU_CLK_UNHALTED.CORE * 100

Fast synchronization is frequently implemented using locked memory accesses. A high value for Locked Operations Impact indicates that locked operations used in the workload have a high penalty. The latency of a locked operation depends on the location of the data: L1 data cache, L2 cache, other core cache, or memory.

B.7.9.3 Simultaneous Extensive Stores and Load Misses

53. Store Block by Snoop Ratio: (STORE_BLOCK.SNOOP / CPU_CLK_UNHALTED.CORE) * 100

A high value for “Store Block by Snoop Ratio” indicates that store operations are frequently blocked and performance is reduced. This happens when one core executes a dense stream of stores while the other core in the processor frequently snoops it for cache lines missing in its L1 data cache.
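
The four ratios defined above (items 50 through 53) are plain arithmetic over raw event counts, so they can be sketched in a few lines of Python. This is a minimal illustration, not part of the manual: the CoreCounters container, the function names, and the sample counts are all hypothetical, and the code assumes the raw counts were already collected with a profiling tool over one measurement interval.

```python
from dataclasses import dataclass


@dataclass
class CoreCounters:
    """Raw event counts for one interval (hypothetical container, not an Intel API)."""
    ext_snoop_all_agents_hitm: int   # EXT_SNOOP.ALL_AGENTS.HITM
    ext_snoop_this_agent_hitm: int   # EXT_SNOOP.THIS_AGENT.HITM
    inst_retired_any: int            # INST_RETIRED.ANY
    l1d_cache_lock_duration: int     # L1D_CACHE_LOCK_DURATION
    l1d_cache_lock_mesi: int         # L1D_CACHE_LOCK.MESI
    store_block_snoop: int           # STORE_BLOCK.SNOOP
    cpu_clk_unhalted_core: int       # CPU_CLK_UNHALTED.CORE


def _ratio(numerator: float, denominator: int) -> float:
    """Divide, rejecting an empty interval instead of raising ZeroDivisionError."""
    if denominator == 0:
        raise ValueError("denominator event count is zero; interval too short?")
    return numerator / denominator


def modified_data_sharing_ratio(c: CoreCounters) -> float:
    # Item 50: HITM snoops from all agents per retired instruction.
    return _ratio(c.ext_snoop_all_agents_hitm, c.inst_retired_any)


def local_modified_data_sharing_ratio(c: CoreCounters) -> float:
    # Item 51: HITM snoops from this agent per retired instruction.
    return _ratio(c.ext_snoop_this_agent_hitm, c.inst_retired_any)


def locked_operations_impact(c: CoreCounters) -> float:
    # Item 52: (lock duration + 20 * locked accesses) as a percentage of core cycles.
    cycles = c.l1d_cache_lock_duration + 20 * c.l1d_cache_lock_mesi
    return _ratio(cycles, c.cpu_clk_unhalted_core) * 100


def store_block_by_snoop_ratio(c: CoreCounters) -> float:
    # Item 53: stores blocked by snoops as a percentage of core cycles.
    return _ratio(c.store_block_snoop, c.cpu_clk_unhalted_core) * 100


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    sample = CoreCounters(
        ext_snoop_all_agents_hitm=2_000,
        ext_snoop_this_agent_hitm=500,
        inst_retired_any=1_000_000,
        l1d_cache_lock_duration=30_000,
        l1d_cache_lock_mesi=1_000,
        store_block_snoop=5_000,
        cpu_clk_unhalted_core=2_000_000,
    )
    print("Modified Data Sharing Ratio:      ", modified_data_sharing_ratio(sample))
    print("Local Modified Data Sharing Ratio:", local_modified_data_sharing_ratio(sample))
    print("Locked Operations Impact (%):     ", locked_operations_impact(sample))
    print("Store Block by Snoop Ratio (%):   ", store_block_by_snoop_ratio(sample))
```

Keeping each ratio in its own function mirrors the numbered definitions, so a change to one formula stays local. Comparing the first two values on a multi-processor system gives the split described under item 51.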
