Intel® Architecture Instruction Set Extensions Programming Reference


INSTRUCTION SET REFERENCE

PALIGNR — Byte Align

| Opcode | Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description |
|---|---|---|---|---|---|
| 66 0F 3A 0F /r ib | PALIGNR xmm1, xmm2/m128, imm8 | A | V/V | SSSE3 | Concatenate destination and source operands, extract a byte-aligned result shifted to the right by the constant value in imm8, and store the result in xmm1. |
| VEX.NDS.128.66.0F3A.WIG 0F /r ib | VPALIGNR xmm1, xmm2, xmm3/m128, imm8 | B | V/V | AVX | Concatenate xmm2 and xmm3/m128 into a 32-byte intermediate result, extract a byte-aligned result shifted to the right by the constant value in imm8, and store the result in xmm1. |
| VEX.NDS.256.66.0F3A.WIG 0F /r ib | VPALIGNR ymm1, ymm2, ymm3/m256, imm8 | B | V/V | AVX2 | Concatenate pairs of 16 bytes in ymm2 and ymm3/m256 into 32-byte intermediate results, extract a byte-aligned 16-byte result shifted to the right by the constant value in imm8 from each intermediate result, and store the two 16-byte results in ymm1. |

Instruction Operand Encoding

| Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
|---|---|---|---|---|
| A | ModRM:reg (r, w) | ModRM:r/m (r) | NA | NA |
| B | ModRM:reg (w) | VEX.vvvv | ModRM:r/m (r) | NA |
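To connect the opcode column with the operand-encoding table, here is a minimal Python sketch that assembles only the register-to-register forms. It relies on the general REX and 3-byte VEX prefix layouts from the Intel SDM (not restated in this section), and the helper names are hypothetical. In Op/En A, ModRM:reg names xmm1 (read and written) and ModRM:r/m names xmm2; in Op/En B, VEX.vvvv names the first source and ModRM:r/m the second.

```python
# Sketch: machine-code bytes for the register-to-register forms of
# PALIGNR (Op/En A) and VPALIGNR (Op/En B). Helper names are hypothetical;
# memory operands (mod != 11) are not handled, and VEX.W is emitted as 0
# because the instruction is WIG (W ignored).


def _modrm_reg_reg(reg: int, rm: int) -> int:
    # mod=11 means ModRM:r/m names a register, not memory.
    return 0xC0 | ((reg & 7) << 3) | (rm & 7)


def _check(*regs: int, imm8: int) -> None:
    if any(not 0 <= r <= 15 for r in regs) or not 0 <= imm8 <= 0xFF:
        raise ValueError("registers must be 0-15 and imm8 must fit in a byte")


def encode_palignr(dst: int, src: int, imm8: int) -> bytes:
    """PALIGNR xmm{dst}, xmm{src}, imm8 -> 66 [REX] 0F 3A 0F /r ib."""
    _check(dst, src, imm8=imm8)
    out = bytearray([0x66])  # the mandatory 66 prefix precedes any REX byte
    rex = 0x40 | ((dst >> 3) << 2) | (src >> 3)  # REX.R extends reg, REX.B extends r/m
    if rex != 0x40:
        out.append(rex)  # only needed for xmm8-xmm15 in 64-bit mode
    out += bytes([0x0F, 0x3A, 0x0F, _modrm_reg_reg(dst, src), imm8])
    return bytes(out)


def encode_vpalignr(dst: int, src1: int, src2: int, imm8: int, ymm: bool = False) -> bytes:
    """VPALIGNR {x,y}mm{dst}, {x,y}mm{src1}, {x,y}mm{src2}, imm8 (3-byte VEX)."""
    _check(dst, src1, src2, imm8=imm8)
    r_bar = 0 if dst & 8 else 1   # VEX.R is stored inverted
    b_bar = 0 if src2 & 8 else 1  # VEX.B is stored inverted
    # Byte 1: inverted R, X (unused, so 1), B; then mmmmm = 00011 selects map 0F3A.
    byte1 = (r_bar << 7) | (1 << 6) | (b_bar << 5) | 0b00011
    # Byte 2: W=0, vvvv = ~src1 (first source), L = 0/1 for 128/256 bits, pp = 01 (66).
    byte2 = ((~src1 & 0xF) << 3) | (int(ymm) << 2) | 0b01
    return bytes([0xC4, byte1, byte2, 0x0F, _modrm_reg_reg(dst, src2), imm8])
```

For example, `encode_palignr(1, 2, 4).hex()` gives `'660f3a0fca04'`, and `encode_vpalignr(0, 1, 2, 4).hex()` gives `'c4e3710fc204'`; setting `ymm=True` changes only the L bit, giving `'c4e3750fc204'`. The 3-byte `C4` form is used because the 2-byte VEX prefix can only select the 0F map.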

(V)PALIGNR concatenates a 16-byte block from the first source operand and a 16-byte block from the second source operand into an intermediate 32-byte composite, shifts the composite to the right at byte granularity by the constant in the immediate, and extracts the right-aligned (low) 16 bytes of the shifted composite into the destination. The immediate value is treated as unsigned. For 128-bit operands, immediate shift counts of 32 or greater produce a zero result.
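The following minimal Python sketch models the 128-bit operation described above. It assumes each operand is a 16-byte `bytes` value with byte 0 as the least significant byte (the XMM in-memory order); `palignr128` is a hypothetical name. The argument order follows the operand order, first source then second source, which is also the order used by the C intrinsic `_mm_alignr_epi8(a, b, imm8)`.

```python
# Minimal model of 128-bit (V)PALIGNR. Operands are 16-byte little-endian
# byte strings, so byte 0 is the least significant byte of the register.
MASK128 = (1 << 128) - 1


def palignr128(src1: bytes, src2: bytes, imm8: int) -> bytes:
    """Return the 16-byte result for one 128-bit block (hypothetical helper).

    src1: first source (xmm1 for legacy PALIGNR, xmm2 for VPALIGNR).
    src2: second source (xmm2/m128 for PALIGNR, xmm3/m128 for VPALIGNR).
    """
    if len(src1) != 16 or len(src2) != 16:
        raise ValueError("both operands must be exactly 16 bytes")
    count = imm8 & 0xFF  # the immediate is an unsigned byte
    # 32-byte composite: first source in the high half, second in the low half.
    composite = (int.from_bytes(src1, "little") << 128) | int.from_bytes(src2, "little")
    # Byte-granular right shift; a count of 32 or more shifts every byte out.
    return ((composite >> (8 * count)) & MASK128).to_bytes(16, "little")
```

With `hi = bytes(range(16, 32))` and `lo = bytes(range(16))`, `palignr128(hi, lo, 4)` returns the bytes 4 through 19, a count of 16 returns `hi` unchanged, and any count of 32 or more returns 16 zero bytes.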

Legacy SSE instructions: In 64-bit mode, use the REX prefix to access additional registers.

128-bit Legacy SSE version: Bits (255:128) of the corresponding YMM destination register remain unchanged.

VEX.256 encoded version: The first source operand is a YMM register and contains two 16-byte blocks. The second source operand is a YMM register or a 256-bit memory location containing two 16-byte blocks. The destination operand is a YMM register and contains two 16-byte results. imm8[7:0] is the common shift count used for both the pair of lower 16-byte source blocks and the pair of upper 16-byte source blocks. The low 16-byte blocks of the two source operands produce the low 16-byte result of the destination operand; the high 16-byte blocks of the two source operands produce the high 16-byte result of the destination operand.
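Because the shift is applied to each 128-bit lane separately, the 256-bit form is not a 32-byte alignment across the whole register: no byte ever moves between lanes. The sketch below, using the same little-endian byte-string convention and hypothetical helper names, models that per-lane behavior (the AVX2 intrinsic `_mm256_alignr_epi8` behaves the same way).

```python
# Sketch of 256-bit VPALIGNR: the same imm8 is applied to each 128-bit lane
# independently. Operands are 32-byte little-endian byte strings.
MASK128 = (1 << 128) - 1


def _align_lane(hi: bytes, lo: bytes, count: int) -> bytes:
    # One lane: 32-byte composite (hi:lo) shifted right by `count` bytes.
    composite = (int.from_bytes(hi, "little") << 128) | int.from_bytes(lo, "little")
    return ((composite >> (8 * count)) & MASK128).to_bytes(16, "little")


def vpalignr256(src1: bytes, src2: bytes, imm8: int) -> bytes:
    """Model of VPALIGNR ymm1, ymm2 (src1), ymm3/m256 (src2), imm8."""
    if len(src1) != 32 or len(src2) != 32:
        raise ValueError("both operands must be exactly 32 bytes")
    count = imm8 & 0xFF
    low = _align_lane(src1[:16], src2[:16], count)    # low blocks -> low result
    high = _align_lane(src1[16:], src2[16:], count)   # high blocks -> high result
    return low + high
```

With `a = bytes(range(100, 132))` as the first source and `b = bytes(range(32))` as the second, a count of 4 yields bytes 4..15 followed by 100..103 in the low half, and 20..31 followed by 116..119 in the high half. Code that needs a true 32-byte alignment across lanes must combine VPALIGNR with a lane-crossing permute.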

VEX.128 encoded version: The first source operand is an XMM register. The second source operand is an XMM register or a 128-bit memory location. The destination operand is an XMM register. The upper bits (255:128) of the corresponding YMM destination register are zeroed.
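The two 128-bit forms differ only in what happens to the upper half of the YMM register. A small sketch, modeling a YMM register as 32 little-endian bytes (bytes 16 to 31 hold bits 255:128) and using a hypothetical helper name:

```python
# Sketch: how a 16-byte (V)PALIGNR result is written into the full YMM
# destination register, modeled as 32 little-endian bytes.


def write_128bit_result(ymm_before: bytes, result: bytes, vex_encoded: bool) -> bytes:
    """Return the YMM destination contents after a 128-bit (V)PALIGNR."""
    if len(ymm_before) != 32 or len(result) != 16:
        raise ValueError("expected a 32-byte register and a 16-byte result")
    # Legacy SSE PALIGNR keeps bits 255:128; VEX.128 VPALIGNR zeroes them.
    upper = bytes(16) if vex_encoded else ymm_before[16:]
    return result + upper
```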

Concatenation is done with 128-bit data in the first and second source operands for both the 128-bit and 256-bit instructions. The high 128 bits of the intermediate 256-bit composite come from the 128-bit data of the first source operand; the low 128 bits of the intermediate composite come from the 128-bit data of the second source operand.
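This ordering is what makes the instruction useful for reading a 16-byte window at an arbitrary offset from two aligned 16-byte blocks: the later block goes in the first source (high half) and the earlier block in the second source (low half). The sketch below illustrates this; `extract_window` is a hypothetical helper, and it assumes the stream holds at least 16 bytes beyond the aligned block that contains `offset`.

```python
# Sketch: emulate an unaligned 16-byte read from two aligned 16-byte blocks.


def palignr128(src1: bytes, src2: bytes, imm8: int) -> bytes:
    # 128-bit (V)PALIGNR model: src1 is the high half, src2 the low half.
    composite = (int.from_bytes(src1, "little") << 128) | int.from_bytes(src2, "little")
    return ((composite >> (8 * (imm8 & 0xFF))) & ((1 << 128) - 1)).to_bytes(16, "little")


def extract_window(stream: bytes, offset: int) -> bytes:
    """Return stream[offset:offset + 16] using only 16-byte-aligned blocks."""
    base = (offset // 16) * 16
    if base + 32 > len(stream):
        raise ValueError("need two full aligned blocks starting at the base")
    lo = stream[base:base + 16]       # earlier block -> second source (low half)
    hi = stream[base + 16:base + 32]  # later block -> first source (high half)
    return palignr128(hi, lo, offset % 16)
```

For a 64-byte `stream = bytes(range(64))`, `extract_window(stream, 21)` returns the same bytes as `stream[21:37]`. Swapping the two sources would instead splice the blocks in the wrong order, which is the most common mistake when using this idiom.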

5-34 Ref. # 319433-014