03.03.2013 Views

Intel® Architecture Instruction Set Extensions Programming Reference


INSTRUCTION SET REFERENCE

VEX.256 version: The instruction will gather four qword values. For dword indices, only the lower four indices in the vector index register are used.

Note that:

• If any pair of the index, mask, or destination registers are the same, this instruction results in a #UD fault.

• The values may be read from memory in any order. Memory ordering with other instructions follows the Intel-64 memory-ordering model.

• Faults are delivered in a right-to-left manner. That is, if a fault is triggered by an element and delivered, all elements closer to the LSB of the destination will be completed (and non-faulting). Individual elements closer to the MSB may or may not be completed. If a given element triggers multiple faults, they are delivered in the conventional order.

• Elements may be gathered in any order, but faults must be delivered in a right-to-left order; thus, elements to the left of a faulting one may be gathered before the fault is delivered. A given implementation of this instruction is repeatable: given the same input values and architectural state, the same set of elements to the left of the faulting one will be gathered.

• This instruction does not perform AC checks, and so will never deliver an AC fault.

• This instruction will cause a #UD if the address size attribute is 16-bit.

• This instruction should not be used to access memory-mapped I/O, because the ordering of the individual loads it performs is implementation-specific, and some implementations may use loads larger than the data element size or load elements an indeterminate number of times.

• The scaled index may require more bits to represent than the address bits used by the processor (e.g., in 32-bit mode, if the scale is greater than one). In this case, the most significant bits beyond the number of address bits are ignored.
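The last note can be illustrated with a small worked example. The following is only a sketch of the address arithmetic, not a description of any particular implementation: the helper names `sign_extend32` and `gather_address` and the `addr_bits` parameter are assumptions introduced here, and dropping the high bits of the final sum is used as an equivalent stand-in for ignoring the high bits of the scaled index.

```python
def sign_extend32(value):
    """Interpret the low 32 bits of value as a signed dword."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def gather_address(base, index_dword, scale, disp=0, addr_bits=64):
    """Hypothetical helper: BASE + SignExtend(index) * SCALE + DISP,
    keeping only the low addr_bits bits of the result."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8")
    full = base + sign_extend32(index_dword) * scale + disp
    return full & ((1 << addr_bits) - 1)


# 32-bit addressing: 0x7FFFFFFF * 8 needs 35 bits, so the top bits are dropped.
wrapped = gather_address(0x1000, 0x7FFFFFFF, 8, addr_bits=32)
# An index of 0xFFFFFFFF is -1 after sign extension.
negative = gather_address(0x1000, 0xFFFFFFFF, 4, addr_bits=32)
print(hex(wrapped), hex(negative))  # prints "0xff8 0xffc"
```

With a 32-bit address size, the first element address wraps around to 0xFF8 instead of faulting on an out-of-range value.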

Operation<br />

DEST ← SRC1;
BASE_ADDR: base register encoded in VSIB addressing;
VINDEX: the vector index register encoded by VSIB addressing;
SCALE: scale factor encoded by SIB:[7:6];
DISP: optional 1- or 4-byte displacement;
MASK ← SRC3;

VPGATHERDQ (VEX.128 version)
FOR j ← 0 to 1
    i ← j * 64;
    IF MASK[63+i] THEN
        MASK[i+63:i] ← 0xFFFFFFFF_FFFFFFFF; // extend from most significant bit
    ELSE
        MASK[i+63:i] ← 0;
    FI;
ENDFOR
FOR j ← 0 to 1
    k ← j * 32;
    i ← j * 64;
    DATA_ADDR ← BASE_ADDR + (SignExtend(VINDEX[k+31:k]) * SCALE) + DISP;
    IF MASK[63+i] THEN
        DEST[i+63:i] ← FETCH_64BITS(DATA_ADDR); // a fault exits the instruction
    FI;
    MASK[i+63:i] ← 0;
ENDFOR
MASK[VLMAX:128] ← 0;
DEST[VLMAX:128] ← 0;

(non-masked elements of the mask register have the content of respective element cleared)<br />
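The VEX.128 operation above, including the way a fault leaves the mask describing the work still to be done, can be mimicked with a small software model. This is a hedged sketch under several assumptions: registers are modelled as two-element Python lists of qwords, memory as a dict from address to qword, a missing address stands in for a faulting load, and the names `Fault` and `vpgatherdq_128` are invented for illustration. Elements are processed strictly from the LSB upward, which is only one of the orderings the notes permit, and the zeroing of bits above 128 is not modelled.

```python
class Fault(Exception):
    """Stands in for a memory fault raised by one element load."""


def sign_extend32(value):
    """Interpret the low 32 bits of value as a signed dword."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def vpgatherdq_128(dest, mask, vindex, base, scale, disp, memory):
    """Model of the VEX.128 loops; dest and mask are updated in place so
    the state left behind by a fault stays visible to the caller."""
    # First loop: widen each mask element from its most significant bit.
    for j in range(2):
        mask[j] = 0xFFFFFFFFFFFFFFFF if (mask[j] >> 63) & 1 else 0
    # Second loop: element 0 (closest to the LSB) is processed first.
    for j in range(2):
        addr = base + sign_extend32(vindex[j]) * scale + disp
        if mask[j]:
            if addr not in memory:
                raise Fault(addr)  # a fault exits the instruction
            dest[j] = memory[addr]
        mask[j] = 0  # completed (or masked-off) elements clear their mask
    return dest, mask


# Element 0 reads 0x100; element 1 reads 0x100 + 5 * 8 = 0x128, which is unmapped.
memory = {0x100: 0x1111}
dest = [0xAAAA, 0xBBBB]
mask = [1 << 63, 1 << 63]
vindex = [0, 5]
try:
    vpgatherdq_128(dest, mask, vindex, 0x100, 8, 0, memory)
except Fault as fault:
    faulting_address = fault.args[0]
after_fault = (list(dest), list(mask))

# "Handle" the fault by mapping the address, then re-execute the instruction.
memory[faulting_address] = 0x2222
vpgatherdq_128(dest, mask, vindex, 0x100, 8, 0, memory)
```

In this model the first execution completes element 0 and clears its mask before element 1 faults, so the re-execution skips element 0 and fetches only element 1.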

Ref. # 319433-014 5-217
