Compiler Usage Guidelines for 64-Bit Operating Systems on AMD64 ...
Compiler Usage Guidelines for AMD64 Platforms
32035 Rev. 3.22 November 2007
Table 6. Recommended Option Switches for 32-Bit GCC Compilers for Linux®
SuSE GCC 4.2.0 (for C/C++ and Fortran) and Red Hat gcc-ssa (for C/C++ and Fortran)
The -O3 switch turns on several general optimizations.
The -ffast-math switch allows the compiler to use a faster, less strict floating-point model, trading exact IEEE/ISO conformance for speed.
The -fomit-frame-pointer switch omits the frame pointer, which frees a register and improves performance. Do not use this switch if you need to unwind the stack using the frame pointer.
The -malign-double switch aligns double-precision values on 8-byte boundaries, which results in faster code on the AMD Athlon 64 and AMD Opteron processors.
Using -mfpmath=sse causes the compiler to generate SSE/SSE2 instructions instead of the default x87 instructions for floating-point arithmetic.
Because the 32-bit gcc compiler defaults to -march=i386, use -march=k8 to generate high-performance code for the AMD Athlon 64 and AMD Opteron processors, or -march=amdfam10 to generate high-performance code for AMD Family 10h processors.
The GCC 4.2.0 compiler can perform loop vectorization when the -ftree-vectorize flag is used.
-O3 -march=k8 -ffast-math -fomit-frame-pointer -malign-double -mfpmath=sse
FSF GCC 4.2.0 Red Hat GCC 3.4.1 -O3 -march=k8 -ffast-math -fomit-frame-pointer -malign-double -mfpmath=sse -fpeel-loops -ftracer -funswitch-loops -funit-at-a-time
SuSE GCC 4.2.0 -O3 -march=k8 -ffast-math -fomit-frame-pointer -malign-double -mfpmath=sse -fpeel-loops
FSF GCC 4.2.0 (for C/C++ and Fortran) -O3 -march=k8 -ffast-math -fomit-frame-pointer -malign-double -mfpmath=sse -fpeel-loops -ftracer -funswitch-loops -ftree-vectorize
3.8.4 Other Switches
In addition to the switches mentioned in Table 6, “Recommended Option Switches for 32-Bit GCC Compilers for Linux®,” on page 31, the following switches may also improve program performance. It is worth experimenting with these switches.
Profile Guided Optimization. The 32-bit GCC compiler supports profile guided optimization. Table 7 shows the profile guided optimization switches for the three GCC compilers.
Table 7. Profile Guided Optimization for 32-Bit GCC Compilers for Linux®
Compiler Version    Optimization Switches
32 Performance-Centric Compiler Switches Chapter 3