13.07.2015

Intel® 64 and IA-32 Architectures Optimization Reference Manual

USING PERFORMANCE MONITORING EVENTS

8. L2 Instruction Cache Line Miss Rate: L2_IFETCH.SELF.I_STATE / INST_RETIRED.ANY

An L2 Instruction Cache Line Miss Rate higher than zero indicates that instruction cache line misses from the L2 cache may have a noticeable impact on program performance.

B.7.2.2 Branching and Front-end

9. BACLEAR Performance Impact: 7 * BACLEARS / CPU_CLK_UNHALTED.CORE

A high value for the BACLEAR Performance Impact ratio usually indicates that the code has many branches such that they cannot be consumed by the Branch Prediction Unit.

10. Taken Branch Bubble: (BR_TKN_BUBBLE_1 + BR_TKN_BUBBLE_2) / CPU_CLK_UNHALTED.CORE

A high value for the Taken Branch Bubble ratio indicates that the code contains many taken branches coming one after the other, which cause bubbles in the front-end. This may affect performance only if it is not covered by execution latencies and stalls later in the pipe.

B.7.2.3 Stack Pointer Tracker

11. ESP Synchronization: ESP.SYNCH / ESP.ADDITIONS

The ESP Synchronization ratio calculates the ratio of explicit uses of ESP (for example, by a load or store instruction) to implicit uses (for example, by a PUSH or POP instruction). The expected ratio value is 0.2 or lower. If the ratio is higher, consider rearranging your code to avoid ESP synchronization events.

B.7.2.4 Macro-fusion

12. Macro-Fusion: UOPS_RETIRED.MACRO_FUSION / INST_RETIRED.ANY

The Macro-Fusion ratio calculates how many of the retired instructions were fused to a single micro-op. You may find this ratio is high for a 32-bit binary executable but significantly lower for the equivalent 64-bit binary, and the 64-bit binary performs slower than the 32-bit binary. A possible reason is that the 32-bit binary benefited significantly from macro-fusion.

B.7.2.5 Length Changing Prefix (LCP) Stalls

13. LCP Delays Detected: ILD_STALL / CPU_CLK_UNHALTED.CORE

A high value of the LCP Delays Detected ratio indicates that many Length Changing Prefix (LCP) delays occur in the measured code.
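As a rough illustration, ratios 8 through 13 above could be computed from raw event counts along the following lines. This is a minimal sketch, not part of the manual: the function names and the sample counts are hypothetical, and it assumes the counts were already collected by some profiling tool over the same measurement interval. Only the event names, the formulas, and the 0.2 guideline for ESP Synchronization come from the text.

```python
# Sketch: derive ratios 8-13 from raw performance-event counts.
# Function names and sample counts are hypothetical (illustration only);
# the event names and formulas follow the definitions in the text.

ESP_SYNC_EXPECTED_MAX = 0.2  # the text expects this ratio to be 0.2 or lower


def front_end_ratios(counts):
    """Return ratios 8-13, keyed by name, from a dict of event counts.

    Assumes every denominator event has a nonzero count.
    """
    clk = counts["CPU_CLK_UNHALTED.CORE"]
    inst = counts["INST_RETIRED.ANY"]
    return {
        "L2 Instruction Cache Line Miss Rate":
            counts["L2_IFETCH.SELF.I_STATE"] / inst,
        "BACLEAR Performance Impact":
            7 * counts["BACLEARS"] / clk,
        "Taken Branch Bubble":
            (counts["BR_TKN_BUBBLE_1"] + counts["BR_TKN_BUBBLE_2"]) / clk,
        "ESP Synchronization":
            counts["ESP.SYNCH"] / counts["ESP.ADDITIONS"],
        "Macro-Fusion":
            counts["UOPS_RETIRED.MACRO_FUSION"] / inst,
        "LCP Delays Detected":
            counts["ILD_STALL"] / clk,
    }


def esp_sync_exceeds_guideline(ratios):
    """True when ESP Synchronization is above the 0.2 guideline."""
    return ratios["ESP Synchronization"] > ESP_SYNC_EXPECTED_MAX


# Made-up counts, chosen only to exercise the formulas.
sample_counts = {
    "CPU_CLK_UNHALTED.CORE": 1_000_000,
    "INST_RETIRED.ANY": 800_000,
    "L2_IFETCH.SELF.I_STATE": 400,
    "BACLEARS": 2_000,
    "BR_TKN_BUBBLE_1": 30_000,
    "BR_TKN_BUBBLE_2": 10_000,
    "ESP.SYNCH": 5_000,
    "ESP.ADDITIONS": 20_000,
    "UOPS_RETIRED.MACRO_FUSION": 80_000,
    "ILD_STALL": 15_000,
}

ratios = front_end_ratios(sample_counts)
for name, value in ratios.items():
    print(f"{name}: {value:.4f}")
if esp_sync_exceeds_guideline(ratios):
    print("ESP Synchronization is above 0.2; consider rearranging the code.")
```

Apart from ESP Synchronization, the text gives no numeric thresholds, so the sketch reports the remaining ratios without classifying them as high or low.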
