13.07.2015

Intel® 64 and IA-32 Architectures Optimization Reference Manual

GENERAL OPTIMIZATION GUIDELINES

3.8.3 Floating-point Modes

On the Pentium III processor, the FLDCW instruction is an expensive operation. On early generations of Pentium 4 processors, FLDCW is improved only for situations where an application alternates between two constant values of the x87 FPU control word (FCW), such as when performing conversions to integers. On Pentium M, Intel Core Solo, Intel Core Duo and Intel Core 2 Duo processors, FLDCW is improved over previous generations.

Specifically, the FLDCW optimization in the first two generations of Pentium 4 processors allows programmers to alternate efficiently between two constant values. For the optimization to be effective, the two constant FCW values may differ only in the following 5 bits of the FCW:

FCW[8-9] ; Precision control
FCW[10-11] ; Rounding control
FCW[12] ; Infinity control

If programmers need to modify other bits of the FCW (for example, the exception mask bits), the FLDCW instruction is still an expensive operation.

In situations where an application cycles between three (or more) constant values, the FLDCW optimization does not apply, and the performance penalty is incurred on every FLDCW instruction.

One solution to this problem is to choose two constant FCW values, use the FLDCW optimization to alternate between only these two values, and devise some other means of accomplishing the task that requires the third FCW value without actually loading a third constant value into the FCW. An alternative solution is to structure the code so that, for periods of time, the application alternates between only two constant FCW values. When the application later alternates between a different pair of FCW values, the performance penalty occurs only during the transition.

It is expected that SIMD applications are unlikely to alternate between FTZ and DAZ mode values.
Consequently, the SIMD control register (MXCSR) does not have the short latencies that the x87 FPU control word does. A read of the MXCSR register has a fairly long latency, and a write to the register is a serializing instruction.

MXCSR has no separate control fields for single and double precision; both use the same modes, and this applies to the FTZ and DAZ modes as well.

Assembly/Compiler Coding Rule 59. (H impact, M generality) Minimize changes to bits 8-12 of the floating-point control word. Alternating among more than two values (each value being a combination of the precision, rounding and infinity control bits and the remaining FCW bits) leads to delays on the order of the pipeline depth.

3.8.3.1 Rounding Mode

Many libraries provide float-to-integer library routines that convert floating-point values to integer. Many of these libraries conform to ANSI C coding standards which
