12.07.2015 Views

PGI User's Guide



Chapter 3. Optimizing & Parallelizing

The remainder of this chapter describes the -O options, the loop unroller option -Munroll, the vectorizer option -Mvect, the auto-parallelization option -Mconcur, the interprocedural analysis optimization -Mipa, and the profile-feedback instrumentation (-Mpfi) and optimization (-Mpfo) options. You should be able to get very near optimal compiled performance using some combination of these switches.

Common Compiler Feedback Format (CCFF)

Using the Common Compiler Feedback Format (CCFF), PGI compilers save information about how your program was optimized, or why a particular optimization was not made, in the executable file. To append this information to the object file, use the compiler option -Minfo=ccff.

If you choose to use PGPROF to aid with your optimization, PGPROF can extract this information and associate it with source code and other performance data, allowing you to view all of this information simultaneously in one of the available profiler panels.

Local and Global Optimization using -O

Using the PGI compiler commands with the -Olevel option (the capital O is for Optimize), you can specify any of the following optimization levels:

-O0
Level zero specifies no optimization. A basic block is generated for each language statement.

-O1
Level one specifies local optimization. Scheduling of basic blocks is performed. Register allocation is performed.

-O2
Level two specifies global optimization. This level performs all level-one local optimization as well as level-two global optimization. If optimization is specified on the command line without a level, level 2 is the default.

-O3
Level three specifies aggressive global optimization. This level performs all level-one and level-two optimizations and enables more aggressive hoisting and scalar replacement optimizations that may or may not be profitable.

-O4
Level four performs all level-one, level-two, and level-three optimizations and enables hoisting of guarded invariant floating point expressions.

Note
If you use the -O option to specify optimization and do not specify a level, then level-two optimization (-O2) is the default.

Level-zero optimization specifies no optimization (-O0). At this level, the compiler generates a basic block for each statement. Performance will almost always be slowest using this optimization level. This level is useful for the initial execution of a program. It is also useful for debugging, since there is a direct correlation between the program text and the code generated.
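To make the hoisting mentioned for -O3 and -O4 more concrete, the following is a minimal C sketch of a loop that contains a guarded, loop-invariant floating point expression, next to a hand-hoisted equivalent. The file name hoist.c and the function names sum_guarded and sum_hoisted are hypothetical, chosen only for illustration, and the compile lines in the comment assume that pgcc is the PGI C compiler driver on your system. Whether the compiler actually performs this transformation on a given loop is not guaranteed; the sketch only shows the kind of rewrite involved.

```c
/* hoist.c: hypothetical file name, used only for this sketch.
 *
 * Possible compile lines using the options described above
 * (assumes pgcc is the PGI C compiler driver):
 *   pgcc -O0 -c hoist.c
 *   pgcc -O2 -c hoist.c
 *   pgcc -O4 -Minfo=ccff -c hoist.c
 */
#include <stddef.h>

/* The expression a / b is loop-invariant, but it is guarded:
 * it is evaluated only on iterations where x[i] > 0.0. */
double sum_guarded(const double *x, size_t n, double a, double b)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (x[i] > 0.0) {
            sum += x[i] * (a / b);
        }
    }
    return sum;
}

/* Hand-hoisted equivalent: the invariant a / b is computed once,
 * before the loop, whether or not the guard is ever true. */
double sum_hoisted(const double *x, size_t n, double a, double b)
{
    const double factor = a / b;
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (x[i] > 0.0) {
            sum += x[i] * factor;
        }
    }
    return sum;
}
```

The hoisted form performs the division even when no element passes the guard. This is one reason such transformations may or may not be profitable for a particular program.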
