12.07.2015 Views

PGI User's Guide


Vectorization using -Mvect

Assume the preceding program is compiled as follows, where -Mvect=nosse disables SSE vectorization:

    % pgfortran -fast -Mvect=nosse -Minfo vadd.f
    vector_op:
        4, Loop unrolled 4 times
    loop:
        18, Loop unrolled 4 times

The following output shows a sample result if the generated executable is run and timed on a standalone AMD Opteron 2.2 GHz system:

    % /bin/time vadd
    -1.000000 -771.000 -3618.000 -6498.00 -9999.00
    5.39user 0.00system 0:05.40elapsed 99%CPU

Now recompile with SSE vectorization enabled, and you see results similar to these:

    % pgfortran -fast -Minfo vadd.f -o vadd
    vector_op:
        4, Unrolled inner loop 8 times
           Loop unrolled 7 times (completely unrolled)
    loop:
        18, Generated 4 alternate loops for the inner loop
            Generated vector sse code for inner loop
            Generated 3 prefetch instructions for this loop

Notice the informational message for the loop at line 18.

• The first two lines of the message indicate that the loop was vectorized, SSE instructions were generated, and four alternate versions of the loop were also generated. The loop count and the alignments of the arrays determine which of these versions is executed.
• The last line of the informational message indicates that prefetch instructions have been generated for three loads to minimize the latency of data transfers from main memory.

Executing again, you should see results similar to the following:

    % /bin/time vadd
    -1.000000 -771.000 -3618.00 -6498.00 -9999.0
    3.59user 0.00system 0:03.59elapsed 100%CPU

The result is a 50% speed-up over the equivalent scalar (that is, non-SSE) version of the program.

The speed-up realized by a given loop or program can vary widely based on a number of factors:

• The performance improvement from vector SSE or SSE2 instructions is greatest when the vectors of data are resident in the data cache.
• If data is aligned properly, performance is generally better than when vector SSE operations are used on unaligned data.
• If the compiler can guarantee that data is aligned properly, even more efficient sequences of SSE instructions can be generated.
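
The 50% speed-up quoted for this example can be checked against the two user times reported by /bin/time (5.39 s without SSE, 3.59 s with SSE). The symbols T_scalar, T_SSE and S below are notation introduced here for illustration only; they do not appear in the compiler output.

```latex
\[
S \;=\; \frac{T_{\text{scalar}}}{T_{\text{SSE}}}
  \;=\; \frac{5.39\ \text{s}}{3.59\ \text{s}}
  \;\approx\; 1.50
\]
\[
1 - \frac{T_{\text{SSE}}}{T_{\text{scalar}}}
  \;=\; 1 - \frac{3.59}{5.39}
  \;\approx\; 0.33
\]
```

In other words, the SSE build does the same work about 1.5 times as fast (the "50% speed-up"), which corresponds to roughly a one-third reduction in run time. The two figures describe the same measurement and should not be confused with each other.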
