12.07.2015 Views

Commonly Used Metrics for Performance Analysis - Power.org


Figure 2-2 A simplified view of the execution pipeline

[Figure: Fetch, Decode, Dispatch, and Issue feed the load store unit, fixed point unit, floating point unit, and branch unit. At dispatch, the Completion Unit creates a group completion entry in the Global Completion Table; an example entry holds a group tag and one finish indicator per instruction tag. Each unit sends a finish report carrying the group tag (gtag) and instruction tag (itag) to the Finish Interface, which sets the matching finish bits to 1. The Completion Logic completes the oldest group of instructions if all of its finish indicators are '1'.]

The stages are:

1. FETCH/DECODE – instructions are fetched from the instruction cache.
2. DISPATCH – instructions are placed into groups (of up to 6 instructions) and sent to distributed issue queues.
   a. An entry for the instruction group is made in the Global Completion Table (GCT), which tracks every group that has been dispatched and is still executing somewhere in the core.
3. ISSUE – instructions (up to 8 at a time) are sent from the issue queues to their target functional units (e.g. LSU, VSX unit).
4. FINISH – instructions that were dispatched in order can execute and finish out of order from any functional unit. Up to 8 internal operations can finish in a cycle.
5. COMPLETION – an instruction group is marked complete when all of its member instructions have finished executing. A completed group is deallocated from the GCT.

Generally speaking, the CPI stack apportions the total CPU (compute cycle) time among three places in the execution pipeline on a per-thread basis:

1. Cycles where an instruction group completed.
   A group can contain one to six PPC instructions. When they have all finished, the group entry in the Global Completion Table is removed and the next group is eligible for completion. These cycles are associated with the COMPLETION stage.
2. Cycles where the GCT is empty.
   In this case no new instructions were dispatched and the pipeline is empty for that thread. These cycles are associated with the DISPATCH stage. These are also referred to as front-end delays.
3. Cycles where there are groups present in the GCT, but no group has completed.
   These are completion stall cycles, and most optimization work consists of finding which types of stall consume significant numbers of cycles and where in the code they occur. These cycles are associated with the EXECUTE stage. These are also referred to as back-end delays.

Copyright ©2011 IBM Corporation Page 7 of 52
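The three-way split described above can be illustrated with a toy model of the GCT. The following is a minimal sketch, not the hardware algorithm: the names Group, classify_cycles, and cpi_stack are hypothetical, and the model assumes at most one group is dispatched and at most one group completes per cycle, and that a finish report and the resulting completion can fall in the same cycle. Each cycle is labeled by the state of the GCT, and each label's cycle count is then divided by the number of completed instructions so that the components sum to the overall CPI.

```python
from collections import Counter, deque
from dataclasses import dataclass


@dataclass
class Group:
    """Toy GCT entry: a group tag plus one finish bit per instruction."""
    gtag: int
    finish_bits: list  # one bool per instruction in the group


def classify_cycles(dispatch_at, finish_at, n_cycles):
    """Label each cycle 'completion', 'gct_empty', or 'completion_stall'.

    dispatch_at: {cycle: (gtag, instruction_count)} (assumed: one group per cycle)
    finish_at:   {cycle: [(gtag, itag), ...]} finish reports from the units
    """
    gct = deque()  # oldest dispatched group sits at the left
    labels = []
    for cycle in range(n_cycles):
        # DISPATCH: create a GCT entry with all finish bits cleared.
        if cycle in dispatch_at:
            gtag, count = dispatch_at[cycle]
            if not 1 <= count <= 6:
                raise ValueError("a group holds one to six instructions")
            gct.append(Group(gtag, [False] * count))
        # FINISH: units may report in any order; set the matching bit.
        for gtag, itag in finish_at.get(cycle, []):
            for group in gct:
                if group.gtag == gtag:
                    group.finish_bits[itag] = True
        # COMPLETION: only the oldest group may complete.
        if not gct:
            labels.append("gct_empty")
        elif all(gct[0].finish_bits):
            gct.popleft()  # deallocate the completed group
            labels.append("completion")
        else:
            labels.append("completion_stall")
    return labels


def cpi_stack(labels, instructions_completed):
    """Split total CPI into per-category components (cycles / instructions)."""
    counts = Counter(labels)
    return {kind: n / instructions_completed for kind, n in counts.items()}


if __name__ == "__main__":
    # Made-up trace: group 0 has two instructions, group 1 has one.
    # Group 1 finishes first but must wait behind the older group 0.
    labels = classify_cycles(
        dispatch_at={1: (0, 2), 2: (1, 1)},
        finish_at={3: [(1, 0)], 4: [(0, 0)], 5: [(0, 1)]},
        n_cycles=8,
    )
    print(labels)
    print(cpi_stack(labels, instructions_completed=3))
```

In this made-up trace the younger group finishes at cycle 3 yet cannot complete until the older group does at cycle 5, so cycles 1 through 4 count as completion stalls even though work finished during them. On real hardware these categories are read from performance counters rather than simulated.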
