13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual

APPLICATION PERFORMANCE TOOLS

A.2.1.1 Time-based Sampling

Time-based sampling (TBS) uses an operating system’s (OS) timer to periodically interrupt the processor to collect samples. The sampling interval is user definable. TBS is useful for identifying the software on your computer that is taking the most CPU time. This feature is only available in the Windows version of the VTune Analyzer.

A.2.1.2 Event-based Sampling

Event-based sampling (EBS) can be used to provide detailed information on the behavior of the microprocessor as it executes software. Some of the events that can be used to trigger sampling include clockticks, cache misses, and branch mispredictions.

The VTune analyzer indicates where microarchitectural events, specific to the Intel Core microarchitecture, Pentium 4, Pentium M and Intel Xeon processors, occur the most often. On processors based on Intel Core microarchitecture, it is possible to collect up to 5 events (three events using fixed-function counters, two events using general-purpose counters) at a time from a list of over 400 events (see Appendix A, “Performance Monitoring Events” of Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3B). On Pentium M processors, the VTune analyzer can collect two different events at a time. The number of the events that the VTune analyzer can collect at once on the Pentium 4 and Intel Xeon processor depends on the events selected.

Event-based samples are collected periodically after a specific number of processor events have occurred while the program is running. The program is interrupted, allowing the interrupt handling driver to collect the Instruction Pointer (IP), load module, thread and process IDs. The instruction pointer is then used to derive the function and source line number from the debug information created at compile time. The data can be displayed as horizontal bar charts or in more detail as spreadsheets that can be exported for further manipulation and easy dissemination.

A.2.1.3 Workload Characterization

Using event-based sampling and processor-specific events can provide useful insights into the nature of the interaction between a workload and the microarchitecture. A few metrics useful for workload characterization are discussed in Appendix B. The event lists available on various Intel processors can be found in Appendix A, “Performance Monitoring Events” of Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3B.

A.2.2 Call Graph

Call graph helps you understand the relationships between the functions in your application by providing timing and caller/callee (functions called) information. Call graph works by instrumenting the functions in your application. Instrumentation is the process of modifying a function so that performance data can be captured when the function is executed. Instrumentation does not change the functionality of the
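
The idea of instrumentation described under Call Graph can be illustrated with a small sketch. The Python code below is not how the VTune analyzer instruments an application, and it does not work on compiled binaries. It only shows the general technique: wrap each function so that every call records its caller, its callee, and the elapsed time, while the function's return value stays the same. All names (`instrument`, `call_stack`, `edge_calls`, `edge_time`, `leaf`, `parent`) are hypothetical and chosen for illustration.

```python
import time
from collections import defaultdict
from functools import wraps

# Hypothetical bookkeeping structures; not part of any Intel tool.
call_stack = []                 # names of instrumented functions currently executing
edge_calls = defaultdict(int)   # (caller, callee) -> number of calls
edge_time = defaultdict(float)  # (caller, callee) -> total seconds spent in callee


def instrument(func):
    """Wrap func so each call records a caller/callee edge and its duration."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        # The caller is whichever instrumented function is on top of the stack.
        caller = call_stack[-1] if call_stack else "<root>"
        call_stack.append(func.__name__)
        start = time.perf_counter()
        try:
            # The original function runs unchanged; only measurement is added.
            return func(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            call_stack.pop()
            edge = (caller, func.__name__)
            edge_calls[edge] += 1
            edge_time[edge] += elapsed
    return wrapper


@instrument
def leaf(n):
    return sum(range(n))


@instrument
def parent(n):
    return leaf(n) + leaf(n // 2)


result = parent(1000)

# Print one line per caller/callee edge with call count and total time.
for (caller, callee), count in sorted(edge_calls.items()):
    print(f"{caller} -> {callee}: {count} call(s), {edge_time[(caller, callee)]:.6f} s")
```

In this sketch the timing recorded for `parent` includes the time spent in `leaf`, so the figures are inclusive times. A real call graph tool would typically also report self time, and the wrapper itself adds measurement overhead.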
