Metrics for Performance Analysis
Copyright ©2011 IBM Corporation

3 The POWER7 PMU

The POWER7 processor has a built-in Performance Monitoring Unit (PMU), which is designed to provide instrumentation to aid in performance monitoring, workload characterization, system characterization, and code analysis.

The PMU comprises six thread-level Performance Monitor Counters (PMCs). PMC1 through PMC4 are programmable, PMC5 counts non-idle completed instructions, and PMC6 counts non-idle cycles.
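Because PMC5 counts non-idle completed instructions and PMC6 counts non-idle cycles, dividing the second by the first gives cycles per instruction (CPI), the quantity that a CPI stack breaks down into components. The following Python sketch only illustrates that arithmetic; the function name and the counter values are hypothetical and are not output from any AIX tool.

```python
def cycles_per_instruction(pmc5_instructions, pmc6_cycles):
    """Return CPI from the two fixed-function counters.

    pmc5_instructions: non-idle completed instructions (PMC5)
    pmc6_cycles:       non-idle cycles (PMC6)
    """
    if pmc5_instructions <= 0:
        raise ValueError("PMC5 must be positive to compute CPI")
    return pmc6_cycles / pmc5_instructions


# Hypothetical counter readings, for illustration only.
cpi = cycles_per_instruction(pmc5_instructions=2_000_000, pmc6_cycles=3_000_000)
print(cpi)  # 1.5
```

A lower CPI means more instructions complete per cycle; the programmable counters PMC1 through PMC4 are what allow the total to be attributed to individual stall events.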
4 AIX Tools for Collecting PMC Data

The AIX tool hpmcount can be used to get CPI stack data for a complete workload. If the workload has a relatively uniform profile and runs long enough (at least 1 CPU-minute per group is recommended, or 11 minutes for the complete set of CPI stack groups), hpmcount can multiplex all CPI stack event groups in one run. This is the quickest way to get an idea of the events with the largest stalls.

Once the critical events have been isolated, the tprof tool can be used to give profiling information by event. By default, tprof provides function-level counts, but it can also do micro-profiling of events.

4.1 Profiling Tools

Profiling refers to charging CPU time to subroutines, and micro-profiling refers to charging CPU time to instructions. Profiling is frequently used in benchmarking and tuning activities to find the "hotspots" in code and data, identify performance-sensitive areas, and identify problem instructions and data regions.

Several tools are available for profiling in UNIX in general, and AIX offers additional tools. For many years, UNIX has included gprof, and this is also available in AIX. tprof is an AIX-only alternative which can provide profiling information from the original binaries.

4.1.1 Profiling with gprof/Xprofiler

To get gprof-compatible output, binaries first need to be compiled and linked with the added "-pg" option (additional options, such as the optimization level -On, can also be added):

    xlc -pg -o myprog.exe myprog.c

or

    xlf -pg -o myprog.exe myprog.f

When the program is executed, a gmon.out file is generated (or, for a parallel job, several gmon.out files are generated, one per task).
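For a parallel job it can be useful to check which profile files were actually written before post-processing them. The Python sketch below is an illustration only. It assumes the file names used in this section (a single gmon.out for a serial run, files matching gmon.*.out for per-task output), and the helper name find_gmon_files is hypothetical rather than part of any AIX tool.

```python
import glob
import os


def find_gmon_files(run_dir="."):
    """Return the gprof data files found in run_dir, sorted by name.

    Assumed naming: per-task files matching gmon.*.out for a parallel
    job, otherwise a single gmon.out for a serial run.
    """
    per_task = sorted(glob.glob(os.path.join(run_dir, "gmon.*.out")))
    if per_task:
        return per_task
    single = os.path.join(run_dir, "gmon.out")
    return [single] if os.path.exists(single) else []


if __name__ == "__main__":
    # List whatever profile data exists in the current directory.
    for path in find_gmon_files():
        print(path)
```

An empty result usually means the binary was not built with -pg or the program did not exit normally.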
To get the human-readable profile, run:

    gprof myprog.exe gmon.out > myprog.gprof

or

    gprof myprog.exe gmon.*.out > myprog.gprof

To get micro-profiling information from gprof output, you need to use the Xprofiler tool.

Full documentation for gprof can be found here

4.1.2 Profiling with tprof

Description from Man Page

    The tprof command reports CPU usage for individual programs and the system as a whole. This command is a useful tool for anyone with a JAVA, C, C++, or FORTRAN program that might be CPU-bound and who wants to know which sections of the program are most heavily using the CPU. The tprof command can charge CPU time to object files, processes, threads, subroutines (user mode, kernel mode and shared library) and even to source lines of programs or individual instructions.

tprof estimates the CPU time spent in a program, the kernel, a shared library, etc. by sampling the instructions every 10 milliseconds. When the sampling occurs, a "tic" is applied to the components running at