12.07.2015 Views

Commonly Used Metrics for Performance Analysis - Power.org



1 Performance Event Data for Application Optimization

First, this paper briefly covers the POWER7 execution pipeline and the PMU hardware. Then it introduces some AIX and Linux tools that can be used to collect hardware events. Finally, the paper discusses several useful sets of metrics.

The first step in optimizing an application is characterizing how well the application runs on a POWER7 system. The fundamental intensive metric used to characterize the performance of any given program/workload is CPI (Cycles Per Instruction): the average number of clock cycles (or fractions of a cycle) needed to complete an instruction. CPI is best understood as a relative quantity. Lower is better, but that assumes that useful work is being done. For a given set of calculations (an execution path), the lower the CPI, the more effectively the processor hardware is being kept busy. Note that the CPI is a measure of processor performance, “How busy is the system hardware?” which is a narrower question than “Can a program be sped up?”

The CPI stack (also referred to as a “CPI stall analysis”) hierarchically breaks down the CPI based on what the execution pipeline is doing (or not doing) at any given cycle on a per-hardware-thread basis. It is used to answer “What are the main front-end and back-end delays encountered while executing?”

The CPI stack uses data from the PMU (Performance Monitoring Unit) hardware in the POWER7 chip. Focusing on the core (and not the “nest”, the subsystems that transfer data to and from memory), data access accounting is simplified: either the data is found in L1 cache or it’s not (and there is a processing delay). And performance data for disk I/O (and other “slow” hardware interrupts like networking) are excluded.

Many other metrics are also useful in both characterizing how well an application runs on a POWER7 system and how efficiently the application uses the available hardware resources. These include metrics for memory bandwidth, L1 cache instruction and data behavior, branch prediction, data locality, address translation, flushes and read-claim machines. While not an exhaustive list, these metrics do cover several common areas of concern.

Copyright ©2011 IBM Corporation
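The CPI definition and the idea of a CPI stack described in this section can be illustrated with a small calculation. The following is a minimal sketch, not the paper's tooling: the function names `cpi` and `cpi_stack`, the three bucket names, and all counts are hypothetical and chosen only for illustration. It assumes that every cycle has been attributed to exactly one bucket, so the per-bucket contributions add up to the overall CPI.

```python
from typing import Dict


def cpi(cycles: int, instructions: int) -> float:
    """Average clock cycles per completed instruction."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return cycles / instructions


def cpi_stack(cycle_buckets: Dict[str, int], instructions: int) -> Dict[str, float]:
    """Split the overall CPI into one contribution per cycle bucket.

    Each bucket holds the cycles attributed to one pipeline state. If the
    buckets cover every cycle exactly once, the contributions sum to the
    overall CPI.
    """
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return {name: count / instructions for name, count in cycle_buckets.items()}


# Made-up counts for illustration only; these are not measurements and the
# bucket names are hypothetical, not actual PMU event names.
buckets = {
    "completing": 400_000,
    "front_end_delay": 100_000,
    "back_end_delay": 500_000,
}
instructions = 500_000
total_cycles = sum(buckets.values())

overall = cpi(total_cycles, instructions)
stack = cpi_stack(buckets, instructions)

for name, part in stack.items():
    print(f"{name:16s} {part:.2f}")
print(f"{'total CPI':16s} {overall:.2f}")
```

Because each contribution shares the same instruction count as its denominator, comparing the buckets shows directly which kind of delay accounts for most of the CPI, which is the question the CPI stack is meant to answer.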
