Intel® Architecture Instruction Set Extensions Programming Reference

INTEL® ADVANCED VECTOR EXTENSIONS

SIMD prefixes. The 128-bit data processing instructions in AVX cover floating-point and integer data movement primitives.

Additional enhancements in AVX on 128-bit data processing primitives include 16 new instructions with the following capabilities:

• Non-unit-strided fetching of SIMD data. AVX provides several flexible SIMD floating-point data fetching primitives:

— broadcast of a single data element into a 128-bit destination;

— masked move primitives to load or store SIMD data elements conditionally.

• Intra-register manipulation of SIMD data elements. AVX provides several flexible SIMD floating-point data manipulation primitives:

— permute primitives to facilitate efficient manipulation of floating-point data elements in 128-bit SIMD registers.

• Branch handling. AVX provides several primitives to enable handling of branches in SIMD programming:

— new variable blend instructions support a four-operand syntax with a non-destructive source operand. Branching conditions that depend on floating-point or integer data can benefit from Intel AVX. This four-operand form is more flexible than the non-VEX-encoded syntax, which uses the XMM0 register as an implied mask for blend selection. While variable blend with the implied XMM0 mask is supported in SSE4 using SIMD-prefix encoding, the VEX-encoded 128-bit variable blend instructions support only the more flexible four-operand syntax.

— Packed TEST instructions for floating-point data.
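The fetching and blend primitives above can be sketched as a scalar Python model. It assumes that a 128-bit register is a list of four single-precision values and that each mask element is an unsigned 32-bit integer. The helper names (`broadcast_ss`, `maskload_ps`, `cmpgt_ps`, `blendv_ps`) are hypothetical. They only mirror the selection rules of VBROADCASTSS, the load form of VMASKMOVPS, a greater-than VCMPPS, and the four-operand VBLENDVPS. Only bit 31 of each mask element matters. In C these instructions are reached through intrinsics such as `_mm_broadcast_ss`, `_mm_maskload_ps` and `_mm_blendv_ps`. This is an illustration, not a performance-accurate implementation.

```python
from typing import List, Sequence

LANES = 4  # one 128-bit XMM register = four 32-bit single-precision elements


def msb_set(dword: int) -> bool:
    """Masked moves and variable blends look only at bit 31 of each mask element."""
    return bool(dword & 0x80000000)


def broadcast_ss(value: float) -> List[float]:
    """Like VBROADCASTSS xmm, m32: copy one element into every lane."""
    return [value] * LANES


def maskload_ps(memory: Sequence[float], offset: int, mask: Sequence[int]) -> List[float]:
    """Like the load form of VMASKMOVPS: masked-off lanes become 0.0.

    In this model a masked-off element is never read, so a short `memory`
    sequence only raises IndexError for lanes whose mask bit is set.
    """
    return [memory[offset + i] if msb_set(mask[i]) else 0.0 for i in range(LANES)]


def cmpgt_ps(a: Sequence[float], b: Sequence[float]) -> List[int]:
    """Like VCMPPS with a greater-than predicate: all-ones where a > b, else 0."""
    return [0xFFFFFFFF if a[i] > b[i] else 0 for i in range(LANES)]


def blendv_ps(src1: Sequence[float], src2: Sequence[float], mask: Sequence[int]) -> List[float]:
    """Like four-operand VBLENDVPS: take src2 where the mask MSB is set, else src1.

    The mask is an explicit operand (no implied XMM0) and neither source is
    overwritten; the result is a new list, i.e. a separate destination.
    """
    return [src2[i] if msb_set(mask[i]) else src1[i] for i in range(LANES)]


# Branch-free per-lane "a if a > b else b" (an element-wise maximum).
a = [1.0, 5.0, -2.0, 7.0]
b = [3.0, 4.0, -1.0, 9.0]
print(blendv_ps(b, a, cmpgt_ps(a, b)))  # [3.0, 5.0, -1.0, 9.0]
```

The final lines show the branch-handling pattern. A compare produces an all-ones/all-zeros mask per lane, and the blend then selects between two sources without a conditional jump.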

1.5.5 AVX2 and 256-bit Vector Integer Processing

AVX2 promotes the vast majority of the 128-bit integer SIMD instructions to operate on 256-bit-wide YMM registers. AVX2 instructions are encoded using the VEX prefix and require the same operating system support as AVX. Most of the promoted 256-bit vector integer instructions operate within 128-bit lanes, similar to the promoted 256-bit floating-point SIMD instructions in AVX.
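The in-lane behavior can be illustrated with a small Python model of the unpack-low-dwords operation: PUNPCKLDQ and its 256-bit VPUNPCKLDQ form (`_mm256_unpacklo_epi32` in C). The model assumes a YMM register is a list of eight 32-bit integers, and the function names are hypothetical. The 256-bit result is simply the 128-bit operation applied to each lane separately, so no element crosses from the low lane into the high lane.

```python
from typing import List, Sequence

DWORDS_PER_LANE = 4  # a 128-bit lane holds four 32-bit integers


def unpacklo_dq_128(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """128-bit PUNPCKLDQ: interleave the two low dwords of a and b."""
    return [a[0], b[0], a[1], b[1]]


def unpacklo_dq_256(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """256-bit VPUNPCKLDQ: the 128-bit operation repeated per lane.

    Dwords 0-3 form the low lane and dwords 4-7 the high lane; each output
    lane is computed only from the matching input lanes.
    """
    result: List[int] = []
    for lane in range(2):
        lo = lane * DWORDS_PER_LANE
        hi = lo + DWORDS_PER_LANE
        result.extend(unpacklo_dq_128(a[lo:hi], b[lo:hi]))
    return result


a = list(range(0, 8))    # a0..a7
b = list(range(10, 18))  # b0..b7
print(unpacklo_dq_256(a, b))  # [0, 10, 1, 11, 4, 14, 5, 15]
```

A full-width interleave would have produced `[0, 10, 1, 11, 2, 12, 3, 13]`. The actual upper half comes from elements 4 and 5 of each source, which is the per-lane behavior described above.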

New functionality in AVX2 generally falls into the following categories:

• Fetching non-contiguous data elements from memory using vector-index memory addressing. These “gather” instructions introduce a new memory-addressing form consisting of a base register and multiple indices specified by a vector register (either XMM or YMM). Data element sizes of 32 and 64 bits are supported, for both floating-point and integer data types.

• Cross-lane functionality is provided by several new instructions for broadcast and permute operations. Some of the 256-bit vector integer instructions promoted from legacy SSE instruction sets also exhibit cross-lane behavior, e.g., the VPMOVZX/VPMOVSX family.

• AVX2 complements the AVX instructions that are typed for floating-point operation with a full complement of equivalent instructions operating on 32/64-bit integer data elements.

• Vector shift instructions with per-element shift counts. Data element sizes of 32 and 64 bits are supported.
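Three of these categories can be sketched with a scalar Python model. It assumes a YMM register is a list of eight unsigned 32-bit integers and memory is a little-endian `bytes` object. The function names are hypothetical stand-ins for VPGATHERDD (a gather with 32-bit indices), VPERMD (a cross-lane permute), and the per-element shifts VPSLLVD and VPSRAVD. The corresponding C intrinsics include `_mm256_mask_i32gather_epi32`, `_mm256_permutevar8x32_epi32`, `_mm256_sllv_epi32` and `_mm256_srav_epi32`. The model assumes the gather completes without faults, so it clears the whole mask at once.

```python
from typing import List, Sequence, Tuple

MASK32 = 0xFFFFFFFF  # every element is modeled as an unsigned 32-bit integer


def gather_dd(dest: Sequence[int], memory: bytes, base: int, index: Sequence[int],
              scale: int, mask: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Like VPGATHERDD: element i is loaded from base + index[i] * scale
    when bit 31 of mask[i] is set; otherwise dest[i] is kept unchanged.

    Returns the new destination and the final mask, which is all zeros
    once every element has been gathered (faults are not modeled).
    """
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8")
    out = list(dest)
    for i in range(len(out)):
        if mask[i] & 0x80000000:
            addr = base + index[i] * scale  # index[i] is a signed 32-bit value
            if not 0 <= addr <= len(memory) - 4:
                raise IndexError("gather address outside the modeled memory")
            out[i] = int.from_bytes(memory[addr:addr + 4], "little")
    return out, [0] * len(mask)


def permd(indices: Sequence[int], src: Sequence[int]) -> List[int]:
    """Like VPERMD: dst[i] = src[indices[i] & 7], so data may cross 128-bit lanes."""
    return [src[idx & 7] for idx in indices]


def sllv_d(values: Sequence[int], counts: Sequence[int]) -> List[int]:
    """Like VPSLLVD: per-element left shift; (unsigned) counts above 31 give 0."""
    return [(v << c) & MASK32 if c <= 31 else 0 for v, c in zip(values, counts)]


def srav_d(values: Sequence[int], counts: Sequence[int]) -> List[int]:
    """Like VPSRAVD: per-element arithmetic right shift; counts above 31 fill with the sign bit."""
    out = []
    for v, c in zip(values, counts):
        signed = v - (1 << 32) if v & 0x80000000 else v
        out.append((signed >> min(c, 31)) & MASK32)
    return out


# Gather eight dwords from a table holding the values 100..107.
table = b"".join(x.to_bytes(4, "little") for x in range(100, 108))
values, mask_after = gather_dd([0] * 8, table, 0, [7, 0, 3, 3, 1, 6, 2, 5], 4, [MASK32] * 8)
print(values)  # [107, 100, 103, 103, 101, 106, 102, 105]
```

The gather replaces what would otherwise be eight scalar loads plus inserts. `permd` shows why it counts as a cross-lane operation: any of the eight source elements can land in any destination position.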

1.6 GENERAL PURPOSE INSTRUCTION SET ENHANCEMENTS

Enhancements in the general-purpose instruction set fall into several categories:

• A rich collection of instructions to manipulate integer data at bit granularity. Most of the bit-manipulation instructions employ VEX-prefix encoding to support a three-operand syntax with non-destructive source operands. Two of the bit-manipulation instructions (LZCNT, TZCNT) are not encoded using VEX. The VEX-encoded bit-manipulation instructions include ANDN, BEXTR, BLSI, BLSMSK, BLSR, BZHI, PEXT, PDEP, SARX, SHLX, SHRX, and RORX.

• An enhanced integer multiply instruction (MULX), in conjunction with some of the bit-manipulation instructions, allows software to accelerate calculations on large integers (wider than 128 bits).

• The INVPCID instruction targets system software that manages process-context identifiers (PCIDs).
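The semantics of several of these instructions fit in a short Python model that assumes 64-bit operands held in Python integers. The function names are hypothetical lower-case stand-ins for the instructions. C code would use intrinsics such as `_pext_u64`, `_pdep_u64`, `_bzhi_u64` and `_mulx_u64`. `mul_limbs` is an illustrative schoolbook multiply over little-endian 64-bit limbs. It shows one way MULX can serve integers wider than 128 bits and is not a prescribed algorithm.

```python
from typing import List, Tuple

M64 = (1 << 64) - 1  # operands are modeled as unsigned 64-bit integers


def andn(a: int, b: int) -> int:
    """ANDN: (NOT a) AND b."""
    return ~a & b & M64


def blsi(x: int) -> int:
    """BLSI: isolate the lowest set bit."""
    return x & -x & M64


def blsmsk(x: int) -> int:
    """BLSMSK: mask of all bits up to and including the lowest set bit."""
    return (x ^ (x - 1)) & M64


def blsr(x: int) -> int:
    """BLSR: clear the lowest set bit."""
    return x & (x - 1) & M64


def bzhi(src: int, index: int) -> int:
    """BZHI: zero the bits of src from position index[7:0] upward."""
    n = index & 0xFF
    return src & ((1 << n) - 1) if n < 64 else src


def bextr(src: int, control: int) -> int:
    """BEXTR: extract control[15:8] bits of src starting at bit control[7:0]."""
    start, length = control & 0xFF, (control >> 8) & 0xFF
    return (src >> start) & ((1 << length) - 1)


def pext(src: int, mask: int) -> int:
    """PEXT: gather the src bits selected by mask into the low bits of the result."""
    result, k = 0, 0
    for bit in range(64):
        if (mask >> bit) & 1:
            result |= ((src >> bit) & 1) << k
            k += 1
    return result


def pdep(src: int, mask: int) -> int:
    """PDEP: scatter the low bits of src to the positions selected by mask."""
    result, k = 0, 0
    for bit in range(64):
        if (mask >> bit) & 1:
            result |= ((src >> k) & 1) << bit
            k += 1
    return result


def tzcnt(x: int) -> int:
    """TZCNT: trailing zero count; 64 for a zero input."""
    return 64 if x == 0 else (x & -x).bit_length() - 1


def lzcnt(x: int) -> int:
    """LZCNT: leading zero count; 64 for a zero input."""
    return 64 - x.bit_length()


def mulx(a: int, b: int) -> Tuple[int, int]:
    """MULX: unsigned 64x64 -> 128-bit product as (high, low); flags are not modeled."""
    product = a * b
    return product >> 64, product & M64


def mul_limbs(x: List[int], y: List[int]) -> List[int]:
    """Schoolbook product of little-endian 64-bit limb lists, one MULX per limb pair."""
    out = [0] * (len(x) + len(y))
    for i, xi in enumerate(x):
        carry = 0
        for j, yj in enumerate(y):
            hi, lo = mulx(xi, yj)
            t = out[i + j] + lo + carry  # in machine code, typically an ADD/ADC carry chain
            out[i + j] = t & M64
            carry = hi + (t >> 64)  # provably stays below 2**64
        out[i + len(y)] = carry
    return out


print(hex(pext(0x12345678, 0x0F0F0F0F)))  # 0x2468
```

The usage line packs the low nibble of each byte into a contiguous field, and `pdep` with the same mask reverses it. On hardware, MULX does not modify the arithmetic flags. This lets the carry additions in a loop like `mul_limbs` be interleaved with the multiplies. The Python model has no flags, so this property is noted here only.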

Ref. # 319433-014 1-5
