26.11.2012 Views

Compiler Usage Guidelines for 64-Bit Operating Systems on AMD64 ...


Compiler Usage Guidelines for AMD64 Platforms

32035 Rev. 3.22 November 2007

3.1.2 General Performance Switches

To get a program running, start by compiling and linking without optimization. Use the optimization level -O0, or select -g, to perform minimal optimization. At this level, you can debug a program easily and isolate any coding errors exposed during porting to x86 or AMD64 platforms. Use the -tp (target processor) option to specify the target architecture. The options -tp k8-64 and -tp k8-64e generate code that is supported on and optimized for AMD64 processors. Edition 7 supports the quad-core AMD Opteron processor with the options -tp barcelona-64 to generate 64-bit code and -tp barcelona to generate 32-bit code.
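As a minimal sketch of the debug-first build described above, the following shell fragment assembles a command line for an unoptimized 64-bit build. The driver name pgcc, the source file app.c, and the output name app_debug are assumptions for illustration and do not come from this guide. The fragment only prints the command so that the flags can be reviewed before anything is compiled.

```shell
#!/bin/sh
# Assumed names: pgcc (PGI C driver), app.c and app_debug are placeholders.
CC=pgcc
SRC=app.c

# -O0 and -g keep optimization minimal; -tp k8-64 selects 64-bit AMD64 code.
DEBUG_FLAGS="-O0 -g -tp k8-64"

# Print the command instead of running it, so it can be inspected first.
debug_cmd="$CC $DEBUG_FLAGS -o app_debug $SRC"
echo "$debug_cmd"
```

Swapping k8-64 for barcelona-64 in DEBUG_FLAGS would give the corresponding command for the quad-core target named above.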

Note: The 64-bit PGI compiler can generate 32-bit binaries.

To get started quickly with optimization, use the options -fast and -Mipa=fast with any PGI compiler. For C++ programs, add -Minline=levels:10 --no_exceptions (a C++ program compiled with --no_exceptions will fail if it uses exception handling). Beginning in Edition 7, the -fast option became synonymous with the -fastsse option, and the optimizations performed by -fast in previous releases were placed under the -nfast option.

Note: The -fastsse option is still necessary to compile 32-bit code.
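The quick-start flag sets above can be sketched as a small shell helper. The function name quick_flags, the driver names pgcc and pgCC, and the file names are hypothetical and are used only for illustration; the flags themselves are the ones listed in the text. The commands are printed rather than executed.

```shell
#!/bin/sh
# Hypothetical helper: prints the quick-start optimization flags for a language.
quick_flags() {
    case "$1" in
        # C++ adds deeper inlining and disables exceptions. Use --no_exceptions
        # only if the program does not rely on exception handling.
        c++) echo "-fast -Mipa=fast -Minline=levels:10 --no_exceptions" ;;
        # Any other language uses the two general options.
        *)   echo "-fast -Mipa=fast" ;;
    esac
}

# Assumed driver and file names; the commands are only printed.
echo "pgcc $(quick_flags c) -o app app.c"
echo "pgCC $(quick_flags c++) -o app app.cpp"
```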

Generally, further significant performance gains can be realized. However, individual optimizations can sometimes cause slowdowns, depending on coding style. The optimization flags most likely to further improve performance are -O3, -Mpfi/-Mpfo, -Minline, and, on targets with multiple processors, -Mconcur.
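The -Mpfi/-Mpfo pair is normally used as a two-pass, profile-feedback build. The sketch below assumes the usual sequence (instrument with -Mpfi, run a representative workload, rebuild with -Mpfo); that sequence, the driver name pgcc, and the names app.c and train.input are assumptions rather than details from this guide. The run helper is a dry-run stand-in that prints each step instead of executing it.

```shell
#!/bin/sh
# Dry-run helper: prints each command instead of executing it.
run() { echo "+ $*"; }

# Assumed names: pgcc, app.c and train.input are placeholders.
pfo_steps() {
    run pgcc -fast -Mpfi -o app app.c   # 1. build an instrumented executable
    run ./app train.input               # 2. run it on a representative workload
    run pgcc -fast -Mpfo -o app app.c   # 3. rebuild using the collected profile
}

pfo_steps
```

Because individual optimizations can cause slowdowns, it is worth timing the final executable against a plain -fast build before adopting the two-pass procedure.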

The --zc_eh option enables zero-cost exception handling for C++.

For C++ BASE optimization, use --zc_eh with -Mipa=fast,inline and -Msmartalloc=huge. The huge flag enables the use of huge pages if the OS is configured to provide them.

3.1.3 Optimization Switches

In addition to the -tp (target processor) switch, the following switches may improve the performance of the program. It is worth experimenting with these switches, but take care to verify that each one actually improves performance.

Local and Global Optimization using -O. Specify any of the following optimization level (-Olevel) options:

-O0 (level-0) specifies no optimization. This level generates a basic block for each language statement, which is useful for debugging because there is a direct correlation between the program text and the generated code.

-O1 (level-1) specifies local optimization. This level performs scheduling of basic blocks and allocates registers.

-O2 (level-2) specifies global optimization. This level performs all level-1 local optimization as well as level-2 global optimization.
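One way to experiment with the levels above is to build one executable per level and compare them on the same workload. The sketch below only prints the build commands; the function name opt_level_cmds, the driver name pgcc, and the file names are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical helper: prints one build command per -O level listed above.
# Assumed names: pgcc and app.c are placeholders; nothing is compiled here.
opt_level_cmds() {
    for level in 0 1 2; do
        # A separate binary per level (app_O0, app_O1, app_O2) makes it
        # possible to time each build against the same input.
        echo "pgcc -O$level -tp k8-64 -o app_O$level app.c"
    done
}

opt_level_cmds
```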

20 Performance-Centric Compiler Switches Chapter 3
