13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual

USING PERFORMANCE MONITORING EVENTS

Frequent modifications to the Floating-Point Control Word (FPCW) might significantly decrease performance. The main reason for changing FPCW is to change the rounding mode when doing FP to integer conversions.

B.7.5 Memory Sub-System - Access Conflicts Ratios

A high value for Load or Store Buffer Full Ratio (Ratio 4) indicates that the load buffer or store buffer is frequently full, hence new micro-ops cannot enter the execution pipeline. This can reduce execution parallelism and decrease performance.

30. Load Rate: L1D_CACHE_LD.MESI / CPU_CLK_UNHALTED.CORE

One memory read operation can be served by a core each cycle. A high "Load Rate" indicates that execution may be bound by memory read operations.

31. Store Order Block: STORE_BLOCK.ORDER / CPU_CLK_UNHALTED.CORE * 100

Store Order Block ratio is the percentage of cycles in which store operations that miss the L2 cache block later stores from committing their data to the memory sub-system. This behavior can further cause the store buffer to fill up (see Ratio 4).

B.7.5.1 Loads Blocked by the L1 Data Cache

32. Loads Blocked by L1 Data Cache Rate: LOAD_BLOCK.L1D / CPU_CLK_UNHALTED.CORE

A high value for "Loads Blocked by L1 Data Cache Rate" indicates that load operations are blocked by the L1 data cache due to lack of resources, usually as a result of many simultaneous L1 data cache misses.

B.7.5.2 4K Aliasing and Store Forwarding Block Detection

33. Loads Blocked by Overlapping Store Rate: LOAD_BLOCK.OVERLAP_STORE / CPU_CLK_UNHALTED.CORE

4K aliasing and store forwarding block are two different scenarios in which loads are blocked by preceding stores, each for a different reason. Both scenarios are detected by the same event: LOAD_BLOCK.OVERLAP_STORE. A high value for "Loads Blocked by Overlapping Store Rate" indicates that either 4K aliasing or store forwarding block may affect performance.

B.7.5.3 Load Block by Preceding Stores

34. Loads Blocked by Unknown Store Address Rate: LOAD_BLOCK.STA / CPU_CLK_UNHALTED.CORE

A high value for "Loads Blocked by Unknown Store Address Rate" indicates that loads are frequently blocked by preceding stores with an unknown address, which implies a performance penalty.
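Ratios 30 through 34 are all a raw event count divided by CPU_CLK_UNHALTED.CORE, so they can be computed together once the counts are available. The following is a minimal sketch, not part of the manual: the function name `access_conflict_ratios`, the dictionary layout, and the sample counts are assumptions for illustration, and it presumes all events were collected over the same interval by whatever profiling tool is in use.

```python
def access_conflict_ratios(counts):
    """Compute Ratios 30-34 from raw event counts.

    `counts` is a hypothetical dict keyed by event name; how the counts
    are gathered (profiler, counter-reading driver) is outside this sketch.
    """
    cycles = counts["CPU_CLK_UNHALTED.CORE"]
    if cycles <= 0:
        raise ValueError("CPU_CLK_UNHALTED.CORE must be positive")
    return {
        # Ratio 30: loads per core cycle
        "Load Rate": counts["L1D_CACHE_LD.MESI"] / cycles,
        # Ratio 31: expressed as a percentage of cycles, per the text
        "Store Order Block (%)": counts["STORE_BLOCK.ORDER"] / cycles * 100,
        # Ratio 32: loads blocked by the L1 data cache, per cycle
        "Loads Blocked by L1 Data Cache Rate":
            counts["LOAD_BLOCK.L1D"] / cycles,
        # Ratio 33: covers both 4K aliasing and store forwarding blocks
        "Loads Blocked by Overlapping Store Rate":
            counts["LOAD_BLOCK.OVERLAP_STORE"] / cycles,
        # Ratio 34: loads blocked by stores with an unknown address
        "Loads Blocked by Unknown Store Address Rate":
            counts["LOAD_BLOCK.STA"] / cycles,
    }


if __name__ == "__main__":
    # Made-up counts, purely illustrative; not measured data.
    sample = {
        "CPU_CLK_UNHALTED.CORE": 1_000_000,
        "L1D_CACHE_LD.MESI": 400_000,
        "STORE_BLOCK.ORDER": 50_000,
        "LOAD_BLOCK.L1D": 20_000,
        "LOAD_BLOCK.OVERLAP_STORE": 10_000,
        "LOAD_BLOCK.STA": 5_000,
    }
    for name, value in access_conflict_ratios(sample).items():
        print(f"{name}: {value:.4f}")
```

Only Ratio 31 is scaled by 100, so it is a percentage while the others are per-cycle rates. Ratio 33 cannot by itself distinguish 4K aliasing from a store forwarding block, since both are counted by the same event.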
