13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual


GENERAL OPTIMIZATION GUIDELINES

There are basically three ways to reduce the impact of overflow/underflow situations with x87 FPU code:
• Choose floating-point data types that are large enough to accommodate results without generating arithmetic overflow and underflow exceptions.
• Scale the range of operands/results to reduce as much as possible the number of arithmetic overflow/underflow situations.
• Keep intermediate results on the x87 FPU register stack until the final results have been computed and stored in memory. Overflow or underflow is less likely to happen when intermediate results are kept on the x87 FPU stack, because data on the stack is stored in double extended-precision format and overflow/underflow conditions are detected accordingly.

In addition, denormalized floating-point constants (which are read-only, and hence never change) should be avoided and replaced, if possible, with zeros of the same sign.

3.8.2.3 Floating-point Exceptions in SSE/SSE2/SSE3 Code

Most special situations that involve masked floating-point exceptions are handled efficiently in hardware. When a masked overflow exception occurs while executing SSE/SSE2/SSE3 code, processor hardware can handle it without a performance penalty.

Underflow exceptions and denormalized source operands are usually treated according to the IEEE 754 specification, but this can incur a significant performance delay. If a programmer is willing to trade pure IEEE 754 compliance for speed, two non-IEEE 754 compliant modes are provided to speed up situations where underflows and denormal inputs are frequent: FTZ (flush-to-zero) mode and DAZ (denormals-are-zero) mode.

When the FTZ mode is enabled, an underflow result is automatically converted to a zero with the correct sign.
Although this behavior is not compliant with IEEE 754, it is provided for use in applications where performance is more important than IEEE 754 compliance. Since denormal results are not produced when the FTZ mode is enabled, the only denormal floating-point numbers that can be encountered in FTZ mode are the ones specified as constants (read only).

The DAZ mode is provided to handle denormal source operands efficiently when running a SIMD floating-point application. When the DAZ mode is enabled, input denormals are treated as zeros with the same sign. Enabling the DAZ mode is the way to deal with denormal floating-point constants when performance is the objective.

If departing from the IEEE 754 specification is acceptable and performance is critical, run SSE/SSE2/SSE3 applications with FTZ and DAZ modes enabled.

NOTE
The DAZ mode is available with both the SSE and SSE2 extensions, although the speed improvement expected from this mode is fully realized only in SSE code.
