12.07.2015 Views

PGI User's Guide


Chapter 3. Optimizing & Parallelizing

Vectorization Sub-options

The vectorizer performs high-level loop transformations on countable loops. A loop is countable if the number of iterations is set only before loop execution and cannot be modified during loop execution. Some of the vectorizer transformations can be controlled by arguments to the -Mvect command line option. The following sections describe the arguments that affect the operation of the vectorizer. In addition, some of these vectorizer operations can be controlled from within code using directives and pragmas. For details on the use of directives and pragmas, refer to Chapter 8, “Using Directives and Pragmas”.

The vectorizer performs the following operations:

• Loop interchange
• Loop splitting
• Loop fusion
• Memory-hierarchy (cache tiling) optimizations
• Generation of SSE instructions on processors where these are supported
• Generation of prefetch instructions on processors where these are supported
• Loop iteration peeling to maximize vector alignment
• Alternate code generation

By default, -Mvect without any sub-options is equivalent to:

-Mvect=assoc,cachesize:c

where c is the actual cache size of the machine.

This enables the options for nested loop transformation and various other vectorizer options. These defaults may vary depending on the target system.

Assoc Option

The option -Mvect=assoc instructs the vectorizer to perform associativity conversions that can change the results of a computation due to round-off error (-Mvect=noassoc disables this option). For example, a typical optimization is to change one arithmetic operation to another arithmetic operation that is mathematically equivalent, but can be computationally different and generate faster code.
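
To make the round-off effect concrete, the following is a minimal hand-written sketch in C, not output of the PGI vectorizer. It assumes a sum reduction over float data and compares the source-order sum with a reassociated sum that keeps four partial sums, one per lane of a 4-wide single-precision vector. The function names sum_sequential and sum_four_lanes are hypothetical, and the volatile qualifiers are there only to stop the compiler from reordering or widening the arithmetic in this demonstration.

```c
#include <assert.h>

/* Source-order reduction: a single accumulator, elements added left to right.
 * volatile forces every partial sum to be rounded to float and keeps the
 * compiler from reassociating the additions itself. */
static float sum_sequential(const float *x, int n)
{
    volatile float s = 0.0f;
    for (int i = 0; i < n; i++)
        s = s + x[i];
    return s;
}

/* Reassociated reduction: four independent partial sums, combined pairwise
 * at the end. This mimics the order of additions in a 4-wide vectorized
 * loop. Illustrative assumption: n is a multiple of 4. */
static float sum_four_lanes(const float *x, int n)
{
    assert(n % 4 == 0);
    volatile float lane[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    for (int i = 0; i < n; i += 4)
        for (int k = 0; k < 4; k++)
            lane[k] = lane[k] + x[i + k];
    volatile float lo = lane[0] + lane[1];
    volatile float hi = lane[2] + lane[3];
    return lo + hi;
}
```

For the eight-element input {1.0e8f, 1.0f, 1.0f, 1.0f, -1.0e8f, 1.0f, 1.0f, 1.0f}, sum_sequential returns 3 because each 1.0f added to 1.0e8f is lost to rounding, while sum_four_lanes returns 6, which happens to be the exact sum. Both orders are mathematically equivalent, yet the results differ.
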
This option is provided to enable or disable this transformation, since round-off error from such associativity conversions may produce unacceptable results.

Cachesize Option

The option -Mvect=cachesize:n instructs the vectorizer to tile nested loop operations assuming a data cache size of n bytes. By default, the vectorizer attempts to tile nested loop operations, such as matrix multiply, using multi-dimensional strip-mining techniques to maximize re-use of items in the data cache.

SSE Option

The option -Mvect=sse instructs the vectorizer to automatically generate packed SSE (Streaming SIMD Extensions), SSE2, and prefetch instructions when vectorizable loops are encountered. SSE instructions, first
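
The cache tiling described under the Cachesize option can be pictured with a hand-written sketch in C. This is only an illustration of strip-mining a matrix multiply into tiles, not the code the PGI vectorizer generates. The names matmul_naive, matmul_tiled, and tile_for_cache are hypothetical, and the rule in tile_for_cache (three tile-by-tile blocks of doubles must fit in the cache) is an assumed heuristic chosen for the example, not a documented PGI rule.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed heuristic: largest tile edge t such that three t-by-t blocks of
 * doubles (one each from a, b, and c) fit in cache_bytes. */
static int tile_for_cache(size_t cache_bytes)
{
    size_t t = 1;
    while (3 * (t + 1) * (t + 1) * sizeof(double) <= cache_bytes)
        t++;
    return (int)t;
}

/* Untiled reference: c = a * b for n-by-n row-major matrices. */
static void matmul_naive(int n, const double *a, const double *b, double *c)
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            double s = 0.0;
            for (int k = 0; k < n; k++)
                s += a[i * n + k] * b[k * n + j];
            c[i * n + j] = s;
        }
}

static int min_int(int x, int y) { return x < y ? x : y; }

/* Tiled version: the three outer loops step over tiles, the three inner
 * loops work inside one tile, so each block of a, b, and c is reused while
 * it is still resident in the data cache. */
static void matmul_tiled(int n, int tile, const double *a, const double *b,
                         double *c)
{
    assert(tile > 0);
    for (int i = 0; i < n * n; i++)
        c[i] = 0.0;
    for (int ii = 0; ii < n; ii += tile)
        for (int kk = 0; kk < n; kk += tile)
            for (int jj = 0; jj < n; jj += tile)
                for (int i = ii; i < min_int(ii + tile, n); i++)
                    for (int k = kk; k < min_int(kk + tile, n); k++) {
                        double aik = a[i * n + k];
                        for (int j = jj; j < min_int(jj + tile, n); j++)
                            c[i * n + j] += aik * b[k * n + j];
                    }
}
```

A caller would pick the tile edge once, for example tile_for_cache(32768) for an assumed 32 KB data cache, and pass it to matmul_tiled. The tiled loop computes the same products as the naive loop but accumulates them in a different order, so floating-point results can differ slightly for general data.
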
