26.11.2012 Views

Compiler Usage Guidelines for 64-Bit Operating Systems on AMD64 ...


32035 Rev. 3.22 November 2007

Compiler Usage Guidelines for AMD64 Platforms

2. Run the executable produced in Step 1. Running the executable generates several files with profile information (*.dyn and *.dpi).

3. Recompile the program with the -Qprof_use switch. It is recommended to also use the -Qipo switch in this stage.
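To make the profile-guided flow above more concrete, the following C sketch shows the kind of code that tends to benefit from it: a branch whose bias is only known at run time. The function name count_above and its parameters are hypothetical and do not come from this guide; the only switches referenced are the ones named in the steps above.

```c
#include <stddef.h>

/* Hypothetical hot routine: counts the samples that exceed a limit.
 * If the training run in Step 2 mostly sees small samples, the recorded
 * profile (*.dyn, *.dpi) tells the compiler that the branch is rarely
 * taken, which it can use in Step 3 (-Qprof_use, ideally with -Qipo)
 * when laying out and inlining code. */
size_t count_above(const int *samples, size_t n, int limit)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; ++i) {
        if (samples[i] > limit) {   /* branch bias depends on the input data */
            ++hits;
        }
    }
    return hits;
}
```

The input used for the training run in Step 2 should resemble real workloads; a profile collected from unrepresentative data can steer the optimizer toward the wrong paths.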

-Oi-. For programs with many calls to memory-related library routines (such as memset and memcpy), using the -Oi- switch may improve performance for Intel compiler versions 7.1 and 8.0. This switch is not recommended for version 9.1.
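As an illustration of the kind of code this switch targets, the sketch below is dominated by memset and memcpy calls. The record_t type and the function names are made up for this example. With intrinsic expansion enabled, the compiler may replace such calls with inline code, whereas -Oi- is understood to turn that expansion off so that the library routines are called instead. Which form is faster is something to measure rather than assume.

```c
#include <string.h>

/* Hypothetical fixed-size record used only for this illustration. */
typedef struct {
    char payload[64];
    int  length;
} record_t;

/* Clears a record; the memset call is a candidate for intrinsic expansion. */
void record_clear(record_t *r)
{
    memset(r, 0, sizeof *r);
}

/* Copies at most sizeof payload bytes into the record.
 * Returns the number of bytes actually stored. */
int record_fill(record_t *r, const char *src, int n)
{
    if (n < 0) n = 0;
    if (n > (int)sizeof r->payload) n = (int)sizeof r->payload;
    memcpy(r->payload, src, (size_t)n);
    r->length = n;
    return n;
}
```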

-Qunroll[n]. This switch sets the maximum number of times to unroll a loop. Experiment with values 1–4. For scientific programs, a particular value may slightly improve performance.
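For readers unfamiliar with unrolling, the following sketch shows by hand roughly what an unroll factor of 4 does to a simple reduction loop. It illustrates the transformation only and is not the code the Intel compiler actually emits; the function names are hypothetical.

```c
#include <stddef.h>

/* Straightforward loop: one addition and one loop test per element. */
long sum_simple(const int *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; ++i)
        s += a[i];
    return s;
}

/* Same reduction unrolled by 4: fewer loop tests and branches per element,
 * at the cost of larger code and a cleanup loop for the leftovers. */
long sum_unrolled4(const int *a, size_t n)
{
    long s = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s += a[i];
        s += a[i + 1];
        s += a[i + 2];
        s += a[i + 3];
    }
    for (; i < n; ++i)   /* remainder when n is not a multiple of 4 */
        s += a[i];
    return s;
}
```

Both functions return the same sum. The trade-off is between loop overhead and code size, which is why the text suggests trying several small values rather than picking one up front.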

-Qansi-alias. Try this switch if the program strictly conforms to the ISO C99 standard. If the program adheres to the standard, this switch allows the compiler to perform aggressive optimizations.
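The aliasing rules in question can be illustrated with a short sketch. Under the ISO C rules, a pointer to an integer type and a pointer to float are assumed not to refer to the same object, so code that reinterprets a float through an integer pointer is the kind of code that can break under type-based alias analysis. The conforming alternative shown here copies the bytes with memcpy. The function name is hypothetical, and the example assumes a 32-bit IEEE 754 float, which this guide does not state.

```c
#include <stdint.h>
#include <string.h>

/* Non-conforming pattern (shown only as a comment):
 *     uint32_t bits = *(uint32_t *)&f;
 * It reads a float object through an incompatible pointer type, which
 * type-based alias analysis is allowed to assume never happens. */

/* Conforming alternative: copy the object representation.
 * Assumes sizeof(float) == sizeof(uint32_t). */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Code written in the second style should behave the same with or without the switch, which makes it a reasonable pattern to adopt before experimenting with -Qansi-alias.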

3.12 Microsoft® Compilers (32-Bit) for Microsoft® Windows®

The 32-bit Microsoft compilers can be installed and run on 32-bit Microsoft Windows and 64-bit Microsoft Windows on AMD Athlon 64, AMD Opteron, and AMD Family 10h processors. The current version is Visual Studio 2008. All the options below apply to this version.

3.12.1 Invocation Command

The cl command invokes the Microsoft C/C++ compiler.<br />

3.12.2 Generic Performance Switches

The /O2, /GL, /Oy, and /fp:fast switches almost always result in improved performance. The /O2 switch turns on several general optimizations. The /GL switch enables whole-program interprocedural analysis (IPA), and /Oy allows the compiler to use the frame pointer register as a general-purpose register, which usually results in better performance. Using /fp:fast allows the compiler to use fast math library routines with extensive error checking turned off. It also allows the compiler to adhere to a fast but less predictable floating-point model in general, so applications that require high precision should avoid this switch. For code that may be sensitive to cache size, consider using the /O1 compiler switch. /O1 generates smaller code at the possible expense of instruction execution speed. However, the performance gained from a smaller code footprint may outweigh any loss due to slower instructions.
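One reason high-precision applications need care with a fast floating-point model is that floating-point addition is not associative, so an optimizer that is free to regroup operations can change results. The sketch below demonstrates the effect in plain C. The function names and values are chosen purely for illustration, and the example makes no claim about whether a given compiler actually reorders this particular expression under /fp:fast.

```c
/* Evaluates (a + b) + c in the order written. */
double sum_left(double a, double b, double c)
{
    return (a + b) + c;
}

/* Evaluates a + (b + c): mathematically the same, numerically not. */
double sum_right(double a, double b, double c)
{
    return a + (b + c);
}
```

With a = 1e20, b = -1e20, and c = 1.0 under strict IEEE 754 double arithmetic, sum_left returns 1.0 while sum_right returns 0.0, because the 1.0 is lost when it is added to -1e20. Code whose correctness depends on a specific evaluation order is the kind that should be tested carefully before enabling /fp:fast.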

3.12.3 Other Switches

In addition to the /O2, /GL, /Oy, and /fp:fast switches, the following list of switches may improve the performance of the program. It is worth experimenting with these switches.

Chapter 3 Performance-Centric Compiler Switches 37
