13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual


APPLICATION PERFORMANCE TOOLS

program. However, it can reduce performance. The VTune analyzer can detect modules as they are loaded by the operating system, and instrument them at runtime. Call graph can be used to profile Win32*, Java*, and Microsoft .NET* applications. Call graph only works for application (ring 3) software.

Call graph profiling provides the following information on the functions called by your application: total time, self-time, total wait time, wait time, callers, callees, and the number of calls. This data is displayed using three different views: function summary, call graph, and call list. These views are all synchronized.

The Function Summary View can be used to focus the data displayed in the call graph and call list views. This view displays all the information about the functions called by your application in a sortable table format. However, it does not provide callee and caller information. It just provides timing information and the number of times a function is called.

The Call Graph View depicts the caller/callee relationships. Each thread in the application is the root of a call tree. Each node (box) in the call tree represents a function. Each edge (line with an arrow) connecting two nodes represents the call from the parent to the child function. If the mouse pointer is hovered over a node, a tool tip will pop up displaying the function's timing information.

The Call List View is useful for analyzing programs with large, complex call trees. This view displays only the caller and callee information for the single function that you select in the Function Summary View. The data is displayed in a table format.

A.2.3 Counter Monitor

Counter monitor helps you identify system level performance bottlenecks. It periodically polls software and hardware performance counters.
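The periodic polling described above can be illustrated with a minimal sketch. This is not how the VTune analyzer implements counter monitor; it only shows the general pattern of sampling a set of counters at a fixed interval and logging the samples to a file. The names `poll_counters` and `write_log`, and the idea of representing each counter as a zero-argument callable, are assumptions made for this example.

```python
import csv
import time
from typing import Callable, Dict, List, TextIO


def poll_counters(
    counters: Dict[str, Callable[[], float]],
    samples: int,
    interval_s: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[Dict[str, float]]:
    """Read every counter `samples` times, waiting `interval_s` between reads.

    `counters` maps a counter name to a hypothetical reader function.
    Each returned row holds a timestamp under "t" plus one value per counter.
    """
    rows: List[Dict[str, float]] = []
    for i in range(samples):
        row: Dict[str, float] = {"t": clock()}
        for name, read in counters.items():
            row[name] = read()
        rows.append(row)
        # Do not wait after the final sample.
        if i < samples - 1:
            sleep(interval_s)
    return rows


def write_log(rows: List[Dict[str, float]], out: TextIO) -> None:
    """Write the polled samples as CSV, one row per polling step."""
    if not rows:
        return
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
```

As a usage example, passing `{"cpu_s": time.process_time}` as `counters` with `samples=5` and `interval_s=0.1` would collect five readings of the process CPU time, which `write_log` can then write to an open file. Real hardware counters require platform-specific interfaces that this sketch does not cover.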
The performance counter data can help you understand how your application is impacting the performance of the computer's various subsystems. Counter monitor data can be displayed in real time and logged to a file. The VTune analyzer can also correlate performance counter data with sampling data. This feature is only available in the Windows version of the VTune analyzer.

A.3 INTEL® PERFORMANCE LIBRARIES

The Intel Performance Library family contains a variety of specialized libraries that have been optimized for performance on Intel processors. These optimizations take advantage of appropriate architectural features, including MMX technology, Streaming SIMD Extensions (SSE), Streaming SIMD Extensions 2 (SSE2) and Streaming SIMD Extensions 3 (SSE3). The library set includes the Intel Math Kernel Library (MKL) and the Intel Integrated Performance Primitives (IPP).

• The Intel Math Kernel Library for Linux and Windows: MKL is composed of highly optimized mathematical functions for engineering, scientific and financial applications requiring high performance on Intel platforms. The functional areas of the
