24.05.2014 Views

XL Fortran Enterprise Edition for AIX : User's Guide - IBM

• -qarch=auto generates code that may take advantage of instructions available only on the compiling machine (or similar machines).
• To get sqrt optimization, you need to specify -qarch=ppc64grsq or another -qarch option that supports the square root instruction set.
• Specifying a -qarch option that is not compatible with your hardware may cause undefined behaviour, even if your program appears to work, because the compiler may emit instructions that are not available on that hardware.

Use -qtune to specify the machine on which performance should be best. If you are not sure, let the compiler determine how best to tune for a given -qarch setting.

Before using the -qcache option, look at the options sections of the listing using -qlist to see if the current settings are satisfactory. The settings appear in the listing itself when the -qlistopt option is specified. Modification of cache geometry may be useful in cases where the systems have configurable L2 or L3 cache options or where the execution mode reduces the effective size of a shared level of cache (for example, two-core-per-chip SMP execution on POWER4).

If you decide to use -qcache, use -qhot or -qsmp along with it.<br />

Optimizing Floating-Point Calculations<br />

Special compiler options exist for handling floating-point calculations efficiently. By default, the compiler makes a trade-off to violate certain IEEE floating-point rules in order to improve performance. For example, multiply-add instructions are generated by default because they are faster and produce a more precise result than separate multiply and add instructions. Floating-point exceptions, such as overflow or division by zero, are masked by default. If you need to catch these exceptions, you have the choice of enabling hardware trapping of these exceptions or using software-based checking. The option -qflttrap enables software-based checking. On the POWER4, POWER5, or PowerPC 970 processor, hardware trapping is recommended.
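To make the precision claim concrete, the following Python sketch contrasts two roundings with one. It is not XL Fortran output and does not use a hardware multiply-add instruction: `fused_multiply_add` is a hypothetical helper that models the single rounding with exact rational arithmetic, and the operand values are chosen purely for illustration.

```python
from fractions import Fraction


def separate_multiply_add(a, b, c):
    """Two roundings: the product is rounded to double, then the sum is rounded."""
    product = a * b
    return product + c


def fused_multiply_add(a, b, c):
    """Software model (an assumption, not hardware): compute a*b + c exactly,
    then round to double once, as a fused multiply-add would."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


# Illustrative operands: the exact product is 1 - 2**-54, which lies halfway
# between two doubles and rounds to 1.0 when the multiply is rounded separately.
a = 1.0 + 2.0 ** -27
b = 1.0 - 2.0 ** -27
c = -1.0

print(separate_multiply_add(a, b, c))  # 0.0: the low-order bits were rounded away
print(fused_multiply_add(a, b, c))     # equals -2**-54: the exact result survives
```

The two results differ because the separate multiply discards the low-order bits of the product before the add sees them. This is also why code compiled with multiply-add instructions can produce results that differ in the last bits from code compiled without them.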

Options for handling floating-point calculations

Option       Description
-qfloat      Provides precise control over the handling of floating-point calculations.
-qflttrap    Enables software checking of IEEE floating-point exceptions. This technique is sometimes more efficient than hardware checking because checks can be executed less frequently.
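The remark that software checks can be executed less frequently can be pictured with a small Python sketch. This is only an analogy, not a description of the code that -qflttrap generates: the function names are hypothetical, and the test for a non-finite running total stands in for an exception check.

```python
import math


def sum_check_each(values, scale):
    """Accumulate values * scale, checking for a non-finite total after every step."""
    total = 0.0
    checks = 0
    for v in values:
        total += v * scale
        checks += 1
        if not math.isfinite(total):
            raise OverflowError("non-finite total detected inside the loop")
    return total, checks


def sum_check_once(values, scale):
    """Same accumulation, but with a single check after the loop.

    This works here because a running sum that becomes inf or nan stays
    non-finite, so the problem is still visible at the end."""
    total = 0.0
    for v in values:
        total += v * scale
    if not math.isfinite(total):
        raise OverflowError("non-finite total detected after the loop")
    return total, 1


print(sum_check_each([1.0, 2.0], 2.0))  # (6.0, 2): one check per element
print(sum_check_once([1.0, 2.0], 2.0))  # (6.0, 1): one check in total
```

Both versions reject the same inputs, but the second performs one check regardless of the loop length. The trade-off is that the deferred check reports the problem later and cannot say which iteration caused it.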

To understand the performance considerations for floating-point calculations with different combinations of compiler options, see “Maximizing Floating-Point Performance” on page 295 and “Minimizing the Performance Impact of Floating-Point Exception Trapping” on page 302.

High-order Transformations (-qhot)

High-order transformations are optimizations that specifically improve the performance of loops and array language. Optimization techniques can include interchange, fusion, and unrolling of loops, and reducing the generation of temporary arrays. The goals of these optimizations include:

Optimizing XL Fortran Programs 311
