13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual

USING PERFORMANCE MONITORING EVENTS

A high value of the Branch Misprediction Performance Impact ratio (Ratio 15) together with a high Virtual Table Misuse ratio indicates that significant time is spent on mispredicted indirect function calls.

In addition to explicit use of function pointers in C code, indirect calls are used for implementing inheritance, abstract classes, and virtual methods in C++.

B.7.3.3 Mispredicted Returns

19. Mispredicted Return Instruction Rate: BR_RET_MISSP_EXEC / BR_RET_EXEC

The processor has a special mechanism that tracks CALL-RETURN pairs. The processor assumes that every CALL instruction has a matching RETURN instruction. If a RETURN instruction restores a return address that is not the one stored during the matching CALL, the code incurs a misprediction penalty.

B.7.4 Execution Ratios

This section covers event ratios that can provide insights into the interactions of micro-ops with the RS, ROB, execution units, etc.

B.7.4.1 Resource Stalls

A high value for the RS Full Ratio (Ratio 2) indicates that the Reservation Station (RS) often gets full with μops due to long dependency chains. The μops that get into the RS cannot execute because they wait for their operands to be computed by previous μops, or they wait for a free execution unit. This prevents exploiting the parallelism provided by the multiple execution units.

A high value for the ROB Full Ratio (Ratio 3) indicates that the reorder buffer (ROB) often gets full with μops. This usually implies long-latency operations, such as L2 cache demand misses.

B.7.4.2 ROB Read Port Stalls

20. ROB Read Port Stall Rate: RAT_STALLS.ROB_READ_PORT / CPU_CLK_UNHALTED.CORE

The ROB Read Port Stall Rate identifies ROB read port stalls. However, it should be used only if the number of resource stalls, as indicated by the Resource Stall Ratio, is low.

B.7.4.3 Partial Register Stalls

21. Partial Register Stalls Ratio: RAT_STALLS.PARTIAL_CYCLES / CPU_CLK_UNHALTED.CORE * 100
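
As a rough illustration of how Ratios 19 to 21 might be evaluated once the raw event counts have been collected, the following Python sketch divides the counts as the formulas state. The helper names `event_ratio` and `summarize` are hypothetical, the sample counts are invented for the example, and returning 0.0 for a zero denominator is an assumption made here; none of these come from the manual or from any profiling tool.

```python
def event_ratio(numerator, denominator, scale=1.0):
    """Return scale * numerator / denominator.

    Returns 0.0 when the denominator is zero (an assumption of this
    sketch, so that an empty sample does not raise an exception).
    """
    if denominator == 0:
        return 0.0
    return scale * numerator / denominator


def summarize(counts):
    """Compute Ratios 19, 20 and 21 from a dict of raw event counts.

    The dict keys are the event names used in the ratio definitions.
    """
    clocks = counts["CPU_CLK_UNHALTED.CORE"]
    return {
        # Ratio 19: mispredicted returns per executed return
        "mispredicted_return_rate": event_ratio(
            counts["BR_RET_MISSP_EXEC"], counts["BR_RET_EXEC"]),
        # Ratio 20: ROB read port stalls per unhalted core clock
        "rob_read_port_stall_rate": event_ratio(
            counts["RAT_STALLS.ROB_READ_PORT"], clocks),
        # Ratio 21: partial register stall cycles as a percentage of clocks
        "partial_register_stalls_pct": event_ratio(
            counts["RAT_STALLS.PARTIAL_CYCLES"], clocks, scale=100.0),
    }


# Invented counts, for illustration only (not measured data).
sample = {
    "BR_RET_MISSP_EXEC": 50,
    "BR_RET_EXEC": 1_000,
    "RAT_STALLS.ROB_READ_PORT": 20_000,
    "RAT_STALLS.PARTIAL_CYCLES": 30_000,
    "CPU_CLK_UNHALTED.CORE": 1_000_000,
}

ratios = summarize(sample)
for name, value in ratios.items():
    print(f"{name}: {value:.4f}")
```

Note that only Ratio 21 carries the factor of 100, so it reads as a percentage of core clocks, while Ratios 19 and 20 are plain fractions.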
