13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual


CHAPTER 5
OPTIMIZING FOR SIMD INTEGER APPLICATIONS

SIMD integer instructions provide performance improvements in applications that are integer-intensive and can take advantage of SIMD architecture.

Guidelines in this chapter for using SIMD integer instructions (in addition to those described in Chapter 3) may be used to develop fast and efficient code that scales across processors with MMX technology, processors that use Streaming SIMD Extensions (SSE) SIMD integer instructions, as well as processors with the SIMD integer instructions in SSE2, SSE3 and SSSE3.

The 64-bit and 128-bit SIMD integer instructions supported by MMX technology, SSE, SSE2, SSE3 and SSSE3 are collectively referred to as SIMD integer instructions. Code sequences in this chapter demonstrate the use of 64-bit SIMD integer instructions and 128-bit SIMD integer instructions.

Processors based on Intel Core microarchitecture support MMX, SSE, SSE2, SSE3, and SSSE3. Execution of 128-bit SIMD integer instructions in Intel Core microarchitecture is substantially more efficient than equivalent implementations on previous microarchitectures. Conversion from 64-bit SIMD integer code to 128-bit SIMD integer code is highly recommended.

This chapter contains examples that will help you to get started with coding your application. The goal is to provide simple, low-level operations that are frequently used.
The examples use the minimum number of instructions necessary to achieve best performance on the current generation of Intel 64 and IA-32 processors. Each example includes a short description, sample code, and notes if necessary. These examples do not address scheduling, as it is assumed the examples will be incorporated in longer code sequences.

For planning considerations of using the new SIMD integer instructions, refer to Section 4.1.3.

5.1 GENERAL RULES ON SIMD INTEGER CODE

General rules and suggestions are:

• Do not intermix 64-bit SIMD integer instructions with x87 floating-point instructions. See Section 5.2, “Using SIMD Integer with x87 Floating-point.” Note that all SIMD integer instructions can be intermixed without penalty.
• Favor 128-bit SIMD integer code over 64-bit SIMD integer code. On previous microarchitectures, most 128-bit SIMD instructions have two-cycle throughput restrictions due to the underlying 64-bit data path in the execution engine. Intel Core microarchitecture executes almost all SIMD instructions with one-cycle
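
The rule favoring 128-bit SIMD integer code can be illustrated with compiler intrinsics rather than assembly. The following is a minimal sketch, assuming a C compiler that provides the SSE2 header `<emmintrin.h>`; the function name `add_words_128` and its signature are hypothetical and do not come from this manual. It adds two arrays of 16-bit integers, eight elements per 128-bit operation, using XMM registers only.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics: __m128i, _mm_add_epi16, ... */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the manual): dst[i] = a[i] + b[i] with
   16-bit wraparound, processing eight words per 128-bit instruction. */
void add_words_128(int16_t *dst, const int16_t *a, const int16_t *b, size_t n)
{
    size_t i = 0;

    assert(dst != NULL && a != NULL && b != NULL);

    /* Main loop: unaligned 128-bit loads, one packed add, one store. */
    for (; i + 8 <= n; i += 8) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_add_epi16(va, vb));
    }

    /* Scalar tail for the remaining n % 8 elements. */
    for (; i < n; ++i)
        dst[i] = (int16_t)(uint16_t)((uint16_t)a[i] + (uint16_t)b[i]);
}
```

A 64-bit MMX version of the same loop would use `_mm_add_pi16` on `__m64` values, handle only four words per instruction, and need `_mm_empty()` (EMMS) before any following x87 floating-point code. The 128-bit form avoids that transition because it does not use the MMX registers.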
