13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual



CODING FOR SIMD ARCHITECTURES

To use any of the SIMD technologies optimally, you must evaluate the following situations in your code:

• Fragments that are computationally intensive
• Fragments that are executed often enough to have an impact on performance
• Fragments with little data-dependent control flow
• Fragments that require floating-point computations
• Fragments that can benefit from moving data 16 bytes at a time
• Fragments of computation that can be coded using fewer instructions
• Fragments that require help in using the cache hierarchy efficiently

4.2.1 Identifying Hot Spots

To optimize performance, use the VTune Performance Analyzer to find the sections of code that occupy most of the computation time. Such sections are called hotspots. See Appendix A, “Application Performance Tools.”

The VTune analyzer provides a hotspots view of a specific module to help you identify sections in your code that take the most CPU time and that have potential performance problems.

The VTune analyzer enables you to change the view to show hotspots by memory location, functions, classes, or source files. You can double-click on a hotspot to open the source or assembly view for that hotspot and see more detailed information about the performance of each instruction in it.

The VTune analyzer offers focused analysis and performance data at all levels of your source code and can also provide advice at the assembly language level. The code coach analyzes and identifies opportunities for better performance of C/C++, Fortran and Java* programs, and suggests specific optimizations. Where appropriate, the coach displays pseudo-code to suggest the use of highly optimized intrinsics and functions in the Intel® Performance Library Suite. Because the VTune analyzer is designed specifically for Intel architecture (IA)-based processors, including the Pentium 4 processor, it can offer detailed approaches to working with IA. See Appendix A.1.1, “Recommended Optimization Settings for Intel 64 and IA-32 Processors,” for details.

4.2.2 Determine If Code Benefits by Conversion to SIMD Execution

Identifying code that benefits by using SIMD technologies can be time-consuming and difficult. Likely candidates for conversion are applications that are highly computation intensive, such as the following:

• Speech compression algorithms and filters
• Speech recognition algorithms
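As a rough illustration of the kind of fragment this section describes as a conversion candidate, the sketch below shows a floating-point loop with no data-dependent control flow, first in scalar form and then rewritten to process 16 bytes (four single-precision values) per iteration with SSE intrinsics. The function names scale_add_scalar and scale_add_sse and the computation itself are hypothetical and chosen only for illustration; they do not come from the manual. The sketch assumes an x86 compiler that provides <xmmintrin.h> and uses unaligned loads and stores so that no alignment requirement is placed on the caller.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_loadu_ps, _mm_mul_ps, ... */

/* Hypothetical scalar baseline: out[i] = scale * a[i] + b[i]. */
void scale_add_scalar(float *out, const float *a, const float *b,
                      float scale, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = scale * a[i] + b[i];
}

/* Hypothetical SSE version of the same loop.
 * Each iteration of the main loop handles four floats (16 bytes). */
void scale_add_sse(float *out, const float *a, const float *b,
                   float scale, size_t n)
{
    const __m128 vscale = _mm_set1_ps(scale);  /* broadcast scale to 4 lanes */
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);       /* unaligned 16-byte load */
        __m128 vb = _mm_loadu_ps(b + i);
        __m128 vr = _mm_add_ps(_mm_mul_ps(vscale, va), vb);
        _mm_storeu_ps(out + i, vr);            /* unaligned 16-byte store */
    }

    /* Scalar tail for the remaining 0 to 3 elements. */
    for (; i < n; i++)
        out[i] = scale * a[i] + b[i];
}
```

Whether the SSE version is actually faster depends on the processor, the compiler (which may already auto-vectorize the scalar loop), and how much of the run time the loop accounts for, so it is worth confirming with a profiler that the loop is a hotspot before converting it.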
