XL Fortran Enterprise Edition for AIX : User's Guide - IBM

- Reducing the costs of memory access through the effective use of caches and translation look-aside buffers
- Overlapping computation and memory access through effective use of the data prefetching capabilities that the hardware provides
- Improving the use of processor resources by reordering instructions and balancing the use of instructions that have complementary resource requirements

-qhot=vector is the default when -qhot is specified. Compiling with -qhot=vector transforms some loops to use optimized versions of functions rather than the standard versions. The optimized functions reside in a built-in library that includes functions and operations such as reciprocal and square root. The optimized versions make different trade-offs between precision and performance. Specifying -qstrict implies -qhot=novector.

Getting the Most out of -qhot

- Try using -qhot along with -O3 for all of your code. (The compiler assumes at least -O2 level for -qhot.) The option is designed to have a neutral effect when no opportunities for transformation exist.
- If you encounter unacceptably long compile times (this can happen with complex loop nests), or if performance degrades when you use -qhot, try using -qhot=novector, or use -qstrict or -qcompact along with -qhot.
- If necessary, deactivate -qhot selectively, allowing it to improve some of your code.

Optimizing Loops and Array Language

The -qhot option performs the following transformations to improve the performance of loops, array language, and memory management:
- Scalar replacement, loop blocking, distribution, fusion, interchange, reversal, skewing, and unrolling
- Reducing generation of temporary arrays

The -qhot option requires at least optimization level 2 (-O2), and the -C option inhibits it.

If you have SMP hardware, you can enable automatic parallelization of loops by specifying the -qsmp option.
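Whether a loop can be parallelized automatically depends on whether its iterations are independent of one another. The module below is a minimal sketch, in plain Fortran with hypothetical routine names (it is not taken from the XL Fortran documentation), of one loop that has this property and one that does not. In scaled_sum, each a(i) is computed only from b(i) and c(i), so the iterations could in principle be divided among threads. In running_sum, each iteration reads the value that the previous iteration wrote, which is a loop-carried dependency, so the iterations must run in order.

```fortran
module loop_kinds
  implicit none
contains
  ! Independent iterations: a(i) depends only on b(i) and c(i), so any
  ! iteration can run before, after, or alongside any other one.
  subroutine scaled_sum(a, b, c, s)
    real, intent(out) :: a(:)
    real, intent(in)  :: b(:), c(:), s
    integer :: i
    do i = 1, size(a)
      a(i) = b(i) + s * c(i)
    end do
  end subroutine scaled_sum

  ! Loop-carried dependency: iteration i reads x(i-1), which iteration
  ! i-1 has just written, so the iterations must execute in order.
  subroutine running_sum(x)
    real, intent(inout) :: x(:)
    integer :: i
    do i = 2, size(x)
      x(i) = x(i) + x(i-1)
    end do
  end subroutine running_sum
end module loop_kinds
```

Asserting independence for a loop like running_sum, for example with -qassert=nodeps, would permit the compiler to reorder or parallelize it and could produce wrong results.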
Automatic parallelization applies both to explicitly coded DO loops and to DO loops that the compiler generates for array language (WHERE, FORALL, array assignment, and so on). The compiler can parallelize only loops that are independent, that is, loops in which each iteration can be computed independently of every other iteration.

One case in which the compiler does not automatically parallelize a loop is when the loop contains I/O, because parallelizing it could lead to unexpected results. In this case, you can use the PARALLEL DO or work-sharing DO directive to advise the compiler that the loop can be safely parallelized. However, the I/O must be of one of the following types:
- Direct-access I/O where each iteration writes to or reads from a different record
- Sequential I/O where each iteration writes to or reads from a different unit
- Stream-access I/O where each iteration uses the POS= specifier to write to or read from a different part of the file
- Stream-access I/O where each iteration writes to or reads from a different unit

For more details, refer to the description of the PARALLEL DO or work-sharing DO directive in the XL Fortran Enterprise Edition for AIX Language Reference.

You can use the -qhot and -qsmp options on:
- Programs with performance bottlenecks that are caused by loops and structured memory accesses
- Programs that contain significant amounts of array language (which can be optimized in the same ways as FORTRAN 77 loops that perform array operations)

Cost Model for Loop Transformations

The loop transformations that the -qhot option performs are controlled by a set of assumptions about the characteristics of typical loops and about the costs (in terms of registers used and potential delays introduced) of performing particular transformations.

The cost model takes into consideration:
- The number of available registers and functional units that the processor has
- The configuration of cache memory in the system
- The number of iterations of each loop
- The need to make conservative assumptions to ensure correct results

When the compiler can determine information precisely, such as the number of iterations of a loop, it uses this information to improve the accuracy of the cost model at that location in the program. If it cannot determine the information, the compiler relies on the default assumptions of the cost model. You can change these default assumptions, and thus influence how the compiler optimizes loops, by specifying compiler options:
- -qassert=nodeps asserts that none of the loops in the files being compiled have dependencies that extend from one iteration to any other iteration within the same loop. Such a dependency is known as a loop-carried dependency. If you can assert that the computations performed during iteration n do not require results that are computed during any other iteration, the compiler is better able to rearrange the loops for efficiency.
- -qassert=itercnt=n asserts that a “typical” loop in the files that you are compiling iterates approximately n times. If this option is not specified, the compiler assumes that loops iterate approximately 1024 times. The compiler uses this information to assist in transformations such as putting a high-iteration loop inside a low-iteration one.
  It is not crucial to get the value exactly right, and the value does not have to be accurate for every loop in the file. This value is not used if either of the following conditions is true:
  - The compiler can determine the exact iteration count.
  - You specified the ASSERT(ITERCNT(n)) directive.

Some of the loop transformations speed up only loops that iterate many times. For programs with many such loops, or for programs whose hotspots and bottlenecks are high-iteration loops, specify a large value for n.

A program might contain a variety of loops, some of which are sped up by these options and others that are unaffected or even slowed down. Therefore, you might want to determine which loops benefit most from which options, split some loops into different files, and compile each file with the set of options and directives that suits it best.

Unrolling Loops

Loop unrolling expands the loop body to do the work of two, three, or more iterations and reduces the iteration count proportionately. Benefits of loop unrolling include the following:
- Data dependence delays may be reduced or eliminated
- Loads and stores may be eliminated in successive loop iterations
- Loop overhead may be reduced
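To make the transformation visible in source form, the module below is a minimal sketch of unrolling written out by hand; the routine names are hypothetical, and in practice you would normally let the compiler unroll loops (unrolling is one of the -qhot transformations) rather than write this yourself. axpy_unrolled4 does the work of four iterations of axpy per trip through its main loop, which mainly illustrates the reduced loop overhead; a short remainder loop handles element counts that are not a multiple of 4.

```fortran
module unroll_demo
  implicit none
contains
  ! Reference version: one element per iteration.
  subroutine axpy(n, a, x, y)
    integer, intent(in) :: n
    real, intent(in)    :: a, x(n)
    real, intent(inout) :: y(n)
    integer :: i
    do i = 1, n
      y(i) = y(i) + a * x(i)
    end do
  end subroutine axpy

  ! Same computation with the loop body unrolled by a factor of 4.
  subroutine axpy_unrolled4(n, a, x, y)
    integer, intent(in) :: n
    real, intent(in)    :: a, x(n)
    real, intent(inout) :: y(n)
    integer :: i, m
    m = mod(n, 4)
    ! Remainder loop: handles the first mod(n,4) elements so that the
    ! main loop covers an exact multiple of 4 elements.
    do i = 1, m
      y(i) = y(i) + a * x(i)
    end do
    ! Main loop: four elements per trip, so the loop control
    ! (increment, compare, branch) runs about n/4 times instead of n.
    do i = m + 1, n, 4
      y(i)   = y(i)   + a * x(i)
      y(i+1) = y(i+1) + a * x(i+1)
      y(i+2) = y(i+2) + a * x(i+2)
      y(i+3) = y(i+3) + a * x(i+3)
    end do
  end subroutine axpy_unrolled4
end module unroll_demo
```

Because each element receives exactly the same operations in both versions, the two routines produce identical results. Unrolling a reduction such as a dot product into several partial sums, which is a common way to reduce data dependence delays, changes the order of floating-point additions and can change the rounded result.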