03.03.2013 Views

Intel® Architecture Instruction Set Extensions Programming Reference


INTEL® ADVANCED VECTOR EXTENSIONS

1.5.1 256-bit Floating-Point Arithmetic Processing Enhancements

Intel AVX provides 35 256-bit floating-point arithmetic instructions. The arithmetic operations cover add, subtract, multiply, divide, square-root, compare, max, min, round, etc., on single-precision and double-precision floating-point data.

The AVX enhancement to the floating-point compare operation provides 32 conditional predicates to improve programming flexibility in evaluating conditional expressions.

FMA provides 36 256-bit floating-point instructions to perform computation on 256-bit vectors. The arithmetic operations cover fused multiply-add, fused multiply-subtract, fused multiply add/subtract interleave, and signed-reversed multiply on fused multiply-add and multiply-subtract.
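
The FMA operation forms listed above can be illustrated with a scalar reference model. The sketch below is not an instruction-level implementation: the names `fma_op`, `fma_model`, and `LANES` are hypothetical, it assumes four double-precision lanes per 256-bit vector, and it computes the product and the sum as two separately rounded steps, whereas a real fused operation rounds once. It only shows which sign and lane pattern each form applies.

```c
#include <stddef.h>

enum { LANES = 4 };  /* assumption: four doubles fill one 256-bit vector */

/* Hypothetical labels for the operation forms named in the text. */
typedef enum {
    FMADD,     /*  (a * b) + c                                  */
    FMSUB,     /*  (a * b) - c                                  */
    FNMADD,    /* -(a * b) + c  (sign-reversed multiply-add)    */
    FNMSUB,    /* -(a * b) - c  (sign-reversed multiply-sub)    */
    FMADDSUB,  /* interleave: subtract in even lanes, add in odd */
    FMSUBADD   /* interleave: add in even lanes, subtract in odd */
} fma_op;

/* Scalar model: applies one operation form to every lane.
 * Rounding is NOT fused here; only the sign pattern is modeled. */
void fma_model(fma_op op, const double a[LANES], const double b[LANES],
               const double c[LANES], double out[LANES])
{
    for (size_t i = 0; i < LANES; ++i) {
        double p = a[i] * b[i];
        int odd = (int)(i & 1u);
        switch (op) {
        case FMADD:    out[i] = p + c[i];                    break;
        case FMSUB:    out[i] = p - c[i];                    break;
        case FNMADD:   out[i] = -p + c[i];                   break;
        case FNMSUB:   out[i] = -p - c[i];                   break;
        case FMADDSUB: out[i] = odd ? p + c[i] : p - c[i];   break;
        case FMSUBADD: out[i] = odd ? p - c[i] : p + c[i];   break;
        }
    }
}
```

With exactly representable inputs the model and a fused operation agree; they can differ in the last bit when the intermediate product is not exactly representable.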

1.5.2 256-bit Non-Arithmetic Instruction Enhancements

Intel AVX provides new primitives for handling data movement within 256-bit floating-point vectors and promotes many 128-bit floating-point data processing instructions to handle 256-bit floating-point vectors.

AVX includes 39 256-bit data processing instructions that are promoted from previous generations of SIMD instruction extensions, covering logical, blend, convert, test, unpack, shuffle, load, and store operations.

AVX introduces 18 new data processing instructions that operate on 256-bit vectors. These new primitives cover the following operations:

• Non-unit-stride fetching of SIMD data. AVX provides several flexible SIMD floating-point data fetching primitives:
— broadcast of single or multiple data elements into a 256-bit destination
— masked move primitives to load or store SIMD data elements conditionally

• Intra-register manipulation of SIMD data elements. AVX provides several flexible SIMD floating-point data manipulation primitives:
— insert/extract multiple SIMD floating-point data elements to/from 256-bit SIMD registers
— permute primitives to facilitate efficient manipulation of floating-point data elements in 256-bit SIMD registers

• Branch handling. AVX provides several primitives to enable handling of branches in SIMD programming:
— new variable blend instructions that support a four-operand, non-destructive source syntax. This is more flexible than the equivalent SSE4 instruction syntax, which uses the XMM0 register as the implied mask for blend selection.
— packed TEST instructions for floating-point data
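
Three of the primitives in the list above (broadcast, masked load, and variable blend) can be sketched as a scalar reference model. The function names and the `LANES` constant are hypothetical, the model assumes eight single-precision lanes per 256-bit vector, and it represents each mask lane as a 32-bit integer whose most significant bit selects the lane. It illustrates the data flow only and is not how the hardware instructions are implemented.

```c
#include <stddef.h>
#include <stdint.h>

enum { LANES = 8 };  /* assumption: eight floats fill one 256-bit vector */

/* A lane is selected when the most significant bit of its mask is set. */
static int lane_selected(uint32_t mask_lane)
{
    return (int)(mask_lane >> 31);
}

/* Broadcast model: replicate one scalar into every lane. */
void broadcast_model(float x, float out[LANES])
{
    for (size_t i = 0; i < LANES; ++i)
        out[i] = x;
}

/* Masked load model: selected lanes are read from memory, the rest
 * become zero. Unselected memory elements are never touched. */
void maskload_model(const float *mem, const uint32_t mask[LANES],
                    float out[LANES])
{
    for (size_t i = 0; i < LANES; ++i)
        out[i] = lane_selected(mask[i]) ? mem[i] : 0.0f;
}

/* Variable blend model: per lane, take b where the mask selects,
 * otherwise a. Neither source is overwritten (non-destructive). */
void blendv_model(const float a[LANES], const float b[LANES],
                  const uint32_t mask[LANES], float out[LANES])
{
    for (size_t i = 0; i < LANES; ++i)
        out[i] = lane_selected(mask[i]) ? b[i] : a[i];
}
```

In SIMD code a compare typically produces the mask and a blend then merges the results of both sides of a branch, so each lane keeps the value from the path it would have taken.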

1.5.3 Arithmetic Primitives for 128-bit Vector and Scalar Processing

Intel AVX provides 131 128-bit numeric processing instructions that employ VEX-prefix encoding. These VEX-encoded instructions generally provide the same functionality as instructions operating on XMM registers that are encoded using SIMD prefixes. The 128-bit numeric processing instructions in AVX cover floating-point and integer data processing across 128-bit vector and scalar processing.

The AVX enhancement to the 128-bit floating-point compare operation provides 32 conditional predicates to improve programming flexibility in evaluating conditional expressions. In contrast, the floating-point SIMD compare instructions in SSE and SSE2 support only 8 conditional predicates.
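
The difference between the 8 legacy predicates and the 32 AVX predicates can be illustrated with a scalar model of one compare lane. The sketch below is a reference model under stated assumptions, not the hardware definition: the names `cmp_lane_model`, `pred_truth`, and the `REL_*` constants are hypothetical. Predicates 0 to 7 correspond to the legacy set (EQ, LT, LE, UNORD, NEQ, NLT, NLE, ORD), and predicates 16 to 31 return the same results as 0 to 15 but differ in whether a quiet NaN operand signals an exception, which this model does not represent.

```c
#include <assert.h>
#include <math.h>    /* isunordered(), NAN */
#include <stdint.h>

/* The four mutually exclusive relations between two operands. */
enum { REL_LT = 1, REL_EQ = 2, REL_GT = 4, REL_UN = 8 };

/* For predicates 0..15: the set of relations that yield "true". */
static const uint8_t pred_truth[16] = {
    REL_EQ,                             /*  0 EQ    (ordered)   */
    REL_LT,                             /*  1 LT    (ordered)   */
    REL_LT | REL_EQ,                    /*  2 LE    (ordered)   */
    REL_UN,                             /*  3 UNORD             */
    REL_LT | REL_GT | REL_UN,           /*  4 NEQ   (unordered) */
    REL_EQ | REL_GT | REL_UN,           /*  5 NLT   (unordered) */
    REL_GT | REL_UN,                    /*  6 NLE   (unordered) */
    REL_LT | REL_EQ | REL_GT,           /*  7 ORD               */
    REL_EQ | REL_UN,                    /*  8 EQ    (unordered) */
    REL_LT | REL_UN,                    /*  9 NGE   (unordered) */
    REL_LT | REL_EQ | REL_UN,           /* 10 NGT   (unordered) */
    0,                                  /* 11 FALSE             */
    REL_LT | REL_GT,                    /* 12 NEQ   (ordered)   */
    REL_EQ | REL_GT,                    /* 13 GE    (ordered)   */
    REL_GT,                             /* 14 GT    (ordered)   */
    REL_LT | REL_EQ | REL_GT | REL_UN   /* 15 TRUE              */
};

/* Model of one single-precision lane: returns an all-ones mask when
 * the predicate holds and zero otherwise. Exception signaling, the
 * only difference between predicates n and n + 16, is not modeled. */
uint32_t cmp_lane_model(float a, float b, unsigned pred)
{
    unsigned rel;

    assert(pred < 32u);
    if (isunordered(a, b))
        rel = REL_UN;       /* at least one operand is NaN */
    else if (a < b)
        rel = REL_LT;
    else if (a == b)
        rel = REL_EQ;
    else
        rel = REL_GT;

    return (pred_truth[pred & 15u] & rel) ? 0xFFFFFFFFu : 0u;
}
```

The added predicates make it possible to choose directly how NaN operands are treated. For example, predicate 4 reports "not equal" as true when an operand is NaN, while predicate 12 reports it as false.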

FMA provides 60 128-bit floating-point instructions to process 128-bit vector and scalar data. The arithmetic operations cover fused multiply-add, fused multiply-subtract, and signed-reversed multiply on fused multiply-add and multiply-subtract.

1.5.4 Non-Arithmetic Primitives for 128-bit Vector and Scalar Processing

Intel AVX provides 126 data processing instructions that employ VEX-prefix encoding. These VEX-encoded instructions generally provide the same functionality as instructions operating on XMM registers that are encoded using

1-4 Ref. # 319433-014
