
Intel® Architecture Instruction Set Extensions Programming Reference


INSTRUCTION SET REFERENCE

VEX.256 version: For dword indices, the instruction gathers eight dword values. For qword indices, the instruction gathers four dword values and zeroes the upper 128 bits of the destination.
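To illustrate the qword-index layout described above, here is a minimal scalar C model of the VEX.256 qword-index form (VPGATHERQD with a YMM index). The function name `gather_qd_256` and the array-based register model are assumptions for this sketch. It mirrors the Operation pseudocode's test of each mask element's most significant bit, takes `base` as already including any displacement, and does not model faults.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of the VEX.256 qword-index gather.
 * dest models a 256-bit register as eight dwords; only the low four
 * receive data, and the upper 128 bits are zeroed. */
void gather_qd_256(uint32_t dest[8], const uint8_t *base,
                   const int64_t vindex[4], uint32_t mask[4], int scale)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    for (int j = 0; j < 4; j++) {
        /* Only the MSB of each mask dword enables the element. */
        if (mask[j] & 0x80000000u)
            memcpy(&dest[j], base + vindex[j] * scale, sizeof dest[j]);
        mask[j] = 0; /* every mask element is cleared */
    }
    for (int j = 4; j < 8; j++)
        dest[j] = 0; /* upper 128 bits of the destination */
}
```

With dword indices the same loop would simply run over eight elements and fill the whole 256-bit destination.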

Note that:

• If any two of the index, mask, and destination registers are the same, this instruction causes a #UD fault.

• The values may be read from memory in any order. Memory ordering with other instructions follows the Intel-64 memory-ordering model.

• Faults are delivered in a right-to-left manner. That is, if a fault is triggered by an element and delivered, all elements closer to the LSB of the destination will be completed (and non-faulting). Individual elements closer to the MSB may or may not be completed. If a given element triggers multiple faults, they are delivered in the conventional order.

• Elements may be gathered in any order, but faults must be delivered in right-to-left order; thus, elements to the left of a faulting one may be gathered before the fault is delivered. A given implementation of this instruction is repeatable: given the same input values and architectural state, the same set of elements to the left of the faulting one will be gathered.

• This instruction does not perform alignment checks (AC), so it never delivers an #AC fault.

• This instruction will cause a #UD if the address size attribute is 16-bit.

• This instruction should not be used to access memory-mapped I/O, because the ordering of its individual loads is implementation-specific, and some implementations may use loads larger than the data element size or load elements an indeterminate number of times.

• The scaled index may require more bits to represent than the address bits used by the processor (e.g., in 32-bit mode, if the scale is greater than one). In this case, the most significant bits beyond the number of address bits are ignored.
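Read together with the Operation pseudocode that follows, which clears each mask element as it is processed and lets a fault exit the instruction, these rules make a gather restartable: elements below the faulting one are complete and their mask elements are zero, so re-executing the instruction after the fault is handled fetches only what remains. The sketch below models this for a four-element dword gather that completes elements strictly from LSB to MSB, which is one ordering the notes permit; real implementations may also gather elements to the left before delivering the fault. The `mem_model` type (only the first `mapped_bytes` bytes above `base` are accessible), the function name, and the return-value convention are hypothetical stand-ins for a real page fault.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical memory model: offsets [0, mapped_bytes) from base are
 * accessible; anything else "faults". */
typedef struct {
    int64_t mapped_bytes;
} mem_model;

/* Returns -1 when all elements are done, or the index of the element
 * whose access faulted. Elements below that index are already gathered
 * and have their mask cleared, so calling again resumes the work. */
int gather_dd_128_restartable(uint32_t dest[4], const uint8_t *base,
                              const int32_t vindex[4], uint32_t mask[4],
                              int scale, const mem_model *mem)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    for (int j = 0; j < 4; j++) {
        if (mask[j] & 0x80000000u) {
            int64_t off = (int64_t)vindex[j] * scale;
            if (off < 0 || off + 4 > mem->mapped_bytes)
                return j; /* deliver the fault; mask[j..3] untouched */
            memcpy(&dest[j], base + off, sizeof dest[j]);
        }
        mask[j] = 0; /* element complete (or was masked off) */
    }
    return -1; /* success: the whole mask is now zero */
}
```

In a usage scenario, a first call with `mapped_bytes = 16` can stop at element 1; after raising `mapped_bytes` (the "fault handler"), a second call with the same arguments finishes elements 1 to 3 without refetching element 0.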

Operation

DEST ← SRC1;
BASE_ADDR: base register encoded in VSIB addressing;
VINDEX: the vector index register encoded by VSIB addressing;
SCALE: scale factor encoded by SIB:[7:6];
DISP: optional 1- or 4-byte displacement;
MASK ← SRC3;
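To make the address arithmetic concrete, here is a hedged sketch of the per-element DATA_ADDR computation for a dword index, including the earlier note that bits beyond the address size are ignored. The function `vsib_addr_d` and its `addr_bits` parameter are illustrative names, not taken from this reference. Only 32- and 64-bit address sizes are modeled, since a 16-bit address size causes #UD.

```c
#include <assert.h>
#include <stdint.h>

/* DATA_ADDR = BASE_ADDR + SignExtend(index) * SCALE + DISP, then
 * truncated to the address size. Hypothetical helper for illustration. */
uint64_t vsib_addr_d(uint64_t base, uint32_t index_elem, int scale,
                     int32_t disp, int addr_bits)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    assert(addr_bits == 32 || addr_bits == 64);
    /* SignExtend(VINDEX[i+31:i]) without implementation-defined casts */
    int64_t idx = (index_elem & 0x80000000u)
                      ? (int64_t)index_elem - 0x100000000LL
                      : (int64_t)index_elem;
    /* Unsigned arithmetic wraps modulo 2^64, matching address wraparound. */
    uint64_t addr = base + (uint64_t)(idx * scale) + (uint64_t)(int64_t)disp;
    if (addr_bits == 32)
        addr &= 0xFFFFFFFFu; /* drop bits beyond the address size */
    return addr;
}
```

For example, `vsib_addr_d(0x10, 0x80000000u, 2, 0, 32)` yields `0x10`: the scaled index is -2^32, which needs more than 32 bits, and the bits above bit 31 are discarded.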

VPGATHERDD (VEX.128 version)
FOR j ← 0 to 3
    i ← j * 32;
    IF MASK[31+i] THEN
        MASK[i+31:i] ← 0xFFFFFFFF; // extend from most significant bit
    ELSE
        MASK[i+31:i] ← 0;
    FI;
ENDFOR
MASK[VLMAX:128] ← 0;
FOR j ← 0 to 3
    i ← j * 32;
    DATA_ADDR ← BASE_ADDR + (SignExtend(VINDEX[i+31:i]) * SCALE) + DISP;
    IF MASK[31+i] THEN
        DEST[i+31:i] ← FETCH_32BITS(DATA_ADDR); // a fault exits the instruction
    FI;
    MASK[i+31:i] ← 0;
ENDFOR
DEST[VLMAX:128] ← 0;

(Mask elements whose most significant bit was clear also have their contents cleared, so after the instruction completes the entire mask register is zero.)
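The VEX.128 VPGATHERDD pseudocode maps almost line for line onto a scalar C function. In this sketch the registers are modeled as arrays of eight dwords, which assumes VLMAX = 256 (a YMM register); `base` is taken to already include DISP; faults are not modeled; and the name `vpgatherdd_128` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar transcription of VPGATHERDD (VEX.128). dest and mask model
 * 256-bit registers as eight dwords each. */
void vpgatherdd_128(uint32_t dest[8], const uint8_t *base,
                    const uint32_t vindex[4], uint32_t mask[8], int scale)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);

    /* First loop: extend each mask element's MSB across the element. */
    for (int j = 0; j < 4; j++)
        mask[j] = (mask[j] & 0x80000000u) ? 0xFFFFFFFFu : 0u;
    for (int j = 4; j < 8; j++)
        mask[j] = 0; /* MASK[VLMAX:128] <- 0 */

    /* Second loop: fetch enabled elements and clear every mask element. */
    for (int j = 0; j < 4; j++) {
        int64_t idx = (vindex[j] & 0x80000000u)
                          ? (int64_t)vindex[j] - 0x100000000LL
                          : (int64_t)vindex[j]; /* SignExtend */
        if (mask[j] & 0x80000000u)
            memcpy(&dest[j], base + idx * scale, sizeof dest[j]);
        mask[j] = 0;
    }
    for (int j = 4; j < 8; j++)
        dest[j] = 0; /* DEST[VLMAX:128] <- 0 */
}
```

Because the second loop clears each mask element whether or not it was gathered, a caller of this model always observes an all-zero mask after a successful call, while destination elements whose mask MSB was clear keep their previous (SRC1) contents.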

Ref. # 319433-014 5-213
