Compiler Usage Guidelines for 64-Bit Operating Systems on AMD64 ...

32035 Rev. 3.22 November 2007

Compiler Usage Guidelines for AMD64 Platforms

3. Recompile the program with the -prof_use switch. It is recommended to also use the -ipo switch in this stage.

-nolib_inline. For programs with many calls to memory-related library routines (such as memmove and memcpy), using the -nolib_inline switch may improve performance with Intel compiler versions 7.1 and 8.0. This switch is not recommended for version 9.1.

-unroll[n]. This switch sets the maximum number of times to unroll a loop. Experiment with values of n from 1 to 4. For scientific programs, a particular value may slightly improve performance.
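
To make the transformation concrete, the following sketch unrolls a summation loop by hand with a factor of 4, which is roughly what the compiler is allowed to do internally when a value such as -unroll4 is given. The function names are hypothetical and chosen only for illustration; in practice the source stays in its rolled form and the compiler decides whether to unroll.

```cpp
#include <cstddef>
#include <vector>

// Rolled form: one addition and one loop test per element.
long sum_rolled(const std::vector<int>& v) {
    long total = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        total += v[i];
    }
    return total;
}

// Hand-unrolled by a factor of 4 (illustration only): one loop test
// per four elements, followed by a remainder loop for the leftovers.
long sum_unrolled4(const std::vector<int>& v) {
    long total = 0;
    const std::size_t n = v.size();
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        total += v[i];
        total += v[i + 1];
        total += v[i + 2];
        total += v[i + 3];
    }
    for (; i < n; ++i) {  // remainder: at most 3 elements
        total += v[i];
    }
    return total;
}
```

Both functions return the same result for any input. Whether the unrolled form is faster depends on the loop body and the target processor, which is why experimenting with several values is worthwhile.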

-fno-rtti. This switch instructs the C++ compiler not to keep C++ run-time type information (RTTI). This may improve performance. However, C++ features requiring RTTI (exceptions, dynamic_cast, etc.) will not be supported.
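
Before enabling this switch, it helps to check whether the program depends on RTTI. The sketch below uses hypothetical classes (Shape, Circle, Square) to show two constructs that rely on RTTI, dynamic_cast and typeid, next to a virtual-function alternative that does not. As written, the example must be built with RTTI enabled; the two RTTI-based functions would have to be removed or rewritten before compiling with -fno-rtti.

```cpp
#include <typeinfo>

// Hypothetical class hierarchy, for illustration only.
struct Shape {
    virtual ~Shape() {}
    // RTTI-free alternative: each class reports its own kind.
    virtual bool is_circle() const { return false; }
};

struct Circle : Shape {
    bool is_circle() const override { return true; }
};

struct Square : Shape {};

// Relies on RTTI: dynamic_cast inspects the dynamic type at run time.
bool is_circle_rtti(const Shape& s) {
    return dynamic_cast<const Circle*>(&s) != nullptr;
}

// Relies on RTTI: typeid on a polymorphic object yields its dynamic type.
bool same_dynamic_type(const Shape& a, const Shape& b) {
    return typeid(a) == typeid(b);
}

// Does not rely on RTTI: an ordinary virtual call answers the question.
bool is_circle_no_rtti(const Shape& s) {
    return s.is_circle();
}
```

If a search of the code base finds no uses of dynamic_cast or typeid, or if they can be replaced along the lines of is_circle_no_rtti, the switch is a candidate for experimentation.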

-ansi-alias. Try this switch if the program strictly conforms to the ISO C99 standard, in particular its aliasing rules. For such a program, this switch allows the compiler to perform more aggressive optimizations.
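
A common way for a program to break the aliasing rules is type punning through a pointer cast. The sketch below is written in C++, where an analogous rule applies. It shows the non-conforming pattern in a comment and a conforming replacement based on memcpy. The function name float_bits is hypothetical, and the example assumes that float is a 32-bit IEEE 754 type, as it is on AMD64 platforms.

```cpp
#include <cstdint>
#include <cstring>

// NOT conforming: reads a float object through an unrelated integer type.
// With aliasing-based optimization enabled, the compiler may assume this
// access never refers to a float and reorder or drop it:
//
//     std::uint32_t bits = *reinterpret_cast<const std::uint32_t*>(&f);

// Conforming: copy the object representation byte by byte.
// Optimizing compilers typically reduce this memcpy to a register move.
std::uint32_t float_bits(float f) {
    static_assert(sizeof(std::uint32_t) == sizeof(float),
                  "assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Code that contains pointer casts between unrelated types should be reviewed, and rewritten where needed, before this switch is tried.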

3.10 PathScale Compilers (32-Bit) for Linux®

PathScale provides C, C++, and Fortran compilers for x86 Linux. The current version (as of August 2007) is 3.0. All the options described in this section apply to this release. To generate 32-bit binaries, the -m32 switch must be used with the PathScale compiler.

3.10.1 Invocation Commands

The following commands invoke specific compilers:

pathcc invokes the PathScale C compiler.

pathCC invokes the PathScale C++ compiler.

pathf90 invokes the PathScale Fortran compiler.

3.10.2 Generic Performance Switches

Use the -O3 and -OPT:Ofast switches as the first step of optimization. For further tuning, experiment with the switches in Section 3.10.3.

3.10.3 Other Switches

In addition to the -O3 and -OPT:Ofast switches, the following switches may improve the performance of the program. It is worth experimenting with these switches.

Profile Guided Optimization. The 32-bit PathScale compiler also allows profile guided optimization. Use the following steps for profile guided optimization with PathScale compilers.

1. Compile the program with the -fb_create fbdata switch.

2. Run the executable produced in Step 1. Running this executable generates several files with profile information.

Chapter 3 Performance-Centric Compiler Switches 35