13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual

CHAPTER 6
OPTIMIZING FOR SIMD FLOATING-POINT APPLICATIONS

This chapter discusses rules for optimizing for the single-instruction, multiple-data (SIMD) floating-point instructions available in Streaming SIMD Extensions (SSE), Streaming SIMD Extensions 2 (SSE2) and Streaming SIMD Extensions 3 (SSE3). The chapter also provides examples that illustrate the optimization techniques for single-precision and double-precision SIMD floating-point applications.

6.1 GENERAL RULES FOR SIMD FLOATING-POINT CODE

The rules and suggestions in this section help optimize floating-point code containing SIMD floating-point instructions. Generally, it is important to understand and balance port utilization to create efficient SIMD floating-point code. Basic rules and suggestions include the following:

• Follow all guidelines in Chapter 3 and Chapter 4.
• Mask exceptions to achieve higher performance. When exceptions are unmasked, software performance is slower.
• Utilize the flush-to-zero and denormals-are-zero modes for higher performance to avoid the penalty of dealing with denormals and underflows.
• Use the reciprocal instructions followed by iteration for increased accuracy. These instructions yield reduced accuracy but execute much faster. Note the following:
  — If reduced accuracy is acceptable, use them with no iteration.
  — If near full accuracy is needed, use a Newton-Raphson iteration.
  — If full accuracy is needed, then use divide and square root, which provide more accuracy but slow down performance.

6.2 PLANNING CONSIDERATIONS

Whether adapting an existing application or creating a new one, using SIMD floating-point instructions to achieve optimum performance gain requires programmers to consider several issues. In general, when choosing candidates for optimization, look for code segments that are computationally intensive and floating-point intensive. Also consider efficient use of the cache architecture.

The sections that follow answer the questions that should be raised before implementation:

• Can data layout be arranged to increase parallelism or cache utilization?
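
The Section 6.1 rules on masking exceptions and on the flush-to-zero and denormals-are-zero modes can be sketched in C as follows. This is a minimal illustration, not code from the manual: the helper names enable_fast_fp_modes and restore_fp_modes are hypothetical, and the sketch assumes an x86 compiler that provides the standard xmmintrin.h and pmmintrin.h headers. Note that flush-to-zero and denormals-are-zero trade strict IEEE 754 behavior for speed, so they may not suit every application.

```c
#include <xmmintrin.h>  /* _mm_getcsr, _mm_setcsr, exception-mask and FTZ macros */
#include <pmmintrin.h>  /* _MM_SET_DENORMALS_ZERO_MODE */

/* Hypothetical helper: mask all SIMD floating-point exceptions and turn on
 * flush-to-zero (FTZ) and denormals-are-zero (DAZ) in the MXCSR register.
 * Returns the previous MXCSR value so the caller can restore it later. */
static inline unsigned int enable_fast_fp_modes(void)
{
    unsigned int saved = _mm_getcsr();

    _MM_SET_EXCEPTION_MASK(_MM_MASK_MASK);              /* mask all six exceptions */
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);         /* denormal results become 0 */
    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON); /* denormal inputs read as 0 */

    return saved;
}

/* Hypothetical helper: put MXCSR back to a previously saved value. */
static inline void restore_fp_modes(unsigned int saved)
{
    _mm_setcsr(saved);
}
```

MXCSR is per-thread state, so a sketch like this would typically be called once at the start of each worker thread that runs the SIMD kernel, with the saved value restored before returning to code that expects default floating-point behavior.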
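
The three accuracy levels listed in Section 6.1 for the reciprocal instructions can be illustrated with SSE intrinsics. The sketch below is an assumption-laden example rather than the manual's own listing: the function names recip_fast, recip_nr and recip_full are hypothetical. It relies on the documented behavior that RCPPS returns an approximation with a relative error of at most 1.5 × 2^-12, and on the standard Newton-Raphson step for a reciprocal, x1 = x0 * (2 - a * x0), which roughly doubles the number of correct bits.

```c
#include <xmmintrin.h>  /* SSE intrinsics: _mm_rcp_ps, _mm_mul_ps, _mm_div_ps, ... */

/* Reduced accuracy: RCPPS alone, no iteration. */
static inline __m128 recip_fast(__m128 a)
{
    return _mm_rcp_ps(a);
}

/* Near full accuracy: RCPPS refined by one Newton-Raphson step,
 * x1 = x0 * (2 - a * x0). */
static inline __m128 recip_nr(__m128 a)
{
    __m128 x0  = _mm_rcp_ps(a);                      /* initial estimate */
    __m128 two = _mm_set1_ps(2.0f);
    __m128 t   = _mm_sub_ps(two, _mm_mul_ps(a, x0)); /* 2 - a * x0 */
    return _mm_mul_ps(x0, t);                        /* refined estimate */
}

/* Full accuracy: a true divide (DIVPS), which is slower. */
static inline __m128 recip_full(__m128 a)
{
    return _mm_div_ps(_mm_set1_ps(1.0f), a);
}
```

Which variant is appropriate depends on how much error the surrounding computation tolerates; measuring on the target processor is the only reliable way to confirm that the approximate versions are actually faster for a given workload.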
