03.03.2013 Views

Intel® Architecture Instruction Set Extensions Programming Reference


PMADDWD — Multiply and Add Packed Integers

Description

Multiplies the individual signed words of the first source operand by the corresponding signed words of the second source operand, producing temporary signed doubleword results. The adjacent doubleword results are then summed and stored in the destination operand. For example, the corresponding low-order words (15:0) and (31:16) in the second source and first source operands are multiplied by one another, and the doubleword results are added together and stored in the low doubleword of the destination register (31:0). The same operation is performed on the other pairs of adjacent words.
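The multiply-and-add step described here can be sketched in scalar Python. This is only an illustrative model, not Intel's reference code: the names `to_signed` and `pmaddwd_words` are hypothetical, and the operands are represented as lists of 16-bit words, lowest word first.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def pmaddwd_words(src1, src2):
    """Model the multiply-add on two equal-length lists of 16-bit words.

    Each list holds an even number of words, lowest word first. The result
    holds one signed 32-bit value per pair of adjacent words.
    """
    assert len(src1) == len(src2) and len(src1) % 2 == 0
    dest = []
    for i in range(0, len(src1), 2):
        # Signed 16 x 16 -> 32-bit products of the two adjacent word pairs.
        lo = to_signed(src1[i], 16) * to_signed(src2[i], 16)
        hi = to_signed(src1[i + 1], 16) * to_signed(src2[i + 1], 16)
        # Keep only the low 32 bits of the sum, read as a signed doubleword.
        dest.append(to_signed(lo + hi, 32))
    return dest


# Eight words per operand models the 128-bit form (four doubleword results).
result = pmaddwd_words([1, 2, 3, 4, -5, 6, 7, -8],
                       [10, 20, 30, 40, 50, 60, 70, 80])
print(result)
```

Passing sixteen words per operand models the 256-bit form in the same way, since each doubleword result depends only on its own pair of adjacent words.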

The (V)PMADDWD instruction wraps around in only one situation: when all four words in a group (the two pairs of words being multiplied) are 8000H. In this case, the sum of the two products is 2^31, which does not fit in a signed doubleword, so the result wraps around to 80000000H.
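A short arithmetic check makes the wraparound case concrete. The sketch below uses a hypothetical helper, `madd_pair`, that sums two signed 16-bit products and wraps the sum to 32 bits; it compares the all-8000H case with the nearest extremes, which still fit in a signed doubleword.

```python
INT16_MIN, INT16_MAX = -0x8000, 0x7FFF


def madd_pair(a0, b0, a1, b1):
    """Sum two signed 16x16-bit products and wrap the sum to a signed 32-bit value."""
    total = (a0 * b0 + a1 * b1) & 0xFFFFFFFF
    return total - (1 << 32) if total & 0x80000000 else total


# All four words are 8000H: 2^30 + 2^30 = 2^31, which overflows and wraps.
wrapped = madd_pair(INT16_MIN, INT16_MIN, INT16_MIN, INT16_MIN)

# Largest sum when at least one word differs from 8000H: still representable.
largest_exact = madd_pair(INT16_MIN, INT16_MIN, INT16_MIN, -0x7FFF)

# Most negative sum: two products of 8000H by 7FFFH, also representable.
smallest_exact = madd_pair(INT16_MIN, INT16_MAX, INT16_MIN, INT16_MAX)

print(f"{wrapped & 0xFFFFFFFF:08X}H", largest_exact, smallest_exact)
```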

128-bit Legacy SSE version: The first source and destination operands are XMM registers. The second source operand is an XMM register or a 128-bit memory location. Bits (255:128) of the corresponding YMM destination register remain unchanged.

VEX.128 encoded version: The first source and destination operands are XMM registers. The second source operand is an XMM register or a 128-bit memory location. Bits (255:128) of the corresponding YMM register are zeroed.

VEX.256 encoded version: The second source operand can be a YMM register or a 256-bit memory location. The first source and destination operands are YMM registers.
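The three encodings differ in how they treat bits (255:128) of the destination. A minimal sketch of that difference follows, modeling a YMM register as a 256-bit Python integer. The function `write_result` and the labels `"sse"`, `"vex128"`, and `"vex256"` are hypothetical names chosen for illustration only.

```python
LOW_128 = (1 << 128) - 1
ALL_256 = (1 << 256) - 1


def write_result(old_ymm, result, encoding):
    """Return the new 256-bit register value after the destination write.

    old_ymm  -- previous 256-bit contents of the destination YMM register
    result   -- computed result (128 bits, or 256 bits for "vex256")
    encoding -- "sse", "vex128", or "vex256" (labels invented for this sketch)
    """
    if encoding == "sse":
        # Legacy SSE: upper half of the YMM register is left unchanged.
        return (old_ymm & ~LOW_128 & ALL_256) | (result & LOW_128)
    if encoding == "vex128":
        # VEX.128: upper half of the YMM register is zeroed.
        return result & LOW_128
    if encoding == "vex256":
        # VEX.256: all 256 bits are written.
        return result & ALL_256
    raise ValueError(f"unknown encoding: {encoding}")


old = (0xAAAA << 128) | 0x1111
after_sse = write_result(old, 0x2222, "sse")
after_vex128 = write_result(old, 0x2222, "vex128")
after_vex256 = write_result(old, (0xBBBB << 128) | 0x2222, "vex256")
```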

Opcode/Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description
66 0F F5 /r PMADDWD xmm1, xmm2/m128 | A | V/V | SSE2 | Multiply the packed word integers in xmm1 by the packed word integers in xmm2/m128, add adjacent doubleword results, and store in xmm1.
VEX.NDS.128.66.0F.WIG F5 /r VPMADDWD xmm1, xmm2, xmm3/m128 | B | V/V | AVX | Multiply the packed word integers in xmm2 by the packed word integers in xmm3/m128, add adjacent doubleword results, and store in xmm1.
VEX.NDS.256.66.0F.WIG F5 /r VPMADDWD ymm1, ymm2, ymm3/m256 | B | V/V | AVX2 | Multiply the packed word integers in ymm2 by the packed word integers in ymm3/m256, add adjacent doubleword results, and store in ymm1.

Instruction Operand Encoding

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | ModRM:reg (r, w) | ModRM:r/m (r) | NA | NA
B | ModRM:reg (w) | VEX.vvvv | ModRM:r/m (r) | NA

Operation

VPMADDWD (VEX.256 encoded version)
DEST[31:0] ← (SRC1[15:0] * SRC2[15:0]) + (SRC1[31:16] * SRC2[31:16])
DEST[63:32] ← (SRC1[47:32] * SRC2[47:32]) + (SRC1[63:48] * SRC2[63:48])
DEST[95:64] ← (SRC1[79:64] * SRC2[79:64]) + (SRC1[95:80] * SRC2[95:80])
DEST[127:96] ← (SRC1[111:96] * SRC2[111:96]) + (SRC1[127:112] * SRC2[127:112])
DEST[159:128] ← (SRC1[143:128] * SRC2[143:128]) + (SRC1[159:144] * SRC2[159:144])
DEST[191:160] ← (SRC1[175:160] * SRC2[175:160]) + (SRC1[191:176] * SRC2[191:176])
DEST[223:192] ← (SRC1[207:192] * SRC2[207:192]) + (SRC1[223:208] * SRC2[223:208])
DEST[255:224] ← (SRC1[239:224] * SRC2[239:224]) + (SRC1[255:240] * SRC2[255:240])
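As a rough illustration, the VEX.256 Operation pseudocode in this entry can be transcribed almost line for line into Python by treating each 256-bit operand as a single integer. The helper names (`bits`, `signed16`, `pack_words`, `vpmaddwd_256`) are invented for this sketch and are not part of any Intel tool or library.

```python
def bits(value, hi, lo):
    """Return value[hi:lo] as an unsigned integer (inclusive bit range)."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def signed16(x):
    """Reinterpret an unsigned 16-bit value as a signed word."""
    return x - 0x10000 if x & 0x8000 else x


def vpmaddwd_256(src1, src2):
    """Model of the VEX.256 form: eight doubleword results from sixteen word pairs."""
    dest = 0
    for lane in range(8):
        lo = 32 * lane
        p0 = signed16(bits(src1, lo + 15, lo)) * signed16(bits(src2, lo + 15, lo))
        p1 = signed16(bits(src1, lo + 31, lo + 16)) * signed16(bits(src2, lo + 31, lo + 16))
        # DEST[lo+31:lo] receives the low 32 bits of the sum of the two products.
        dest |= ((p0 + p1) & 0xFFFFFFFF) << lo
    return dest


def pack_words(words):
    """Pack a list of signed words (lowest first) into one integer."""
    value = 0
    for i, w in enumerate(words):
        value |= (w & 0xFFFF) << (16 * i)
    return value


src1 = pack_words([1, 2, 3, 4, -5, 6, 7, -8, 100, 200, -1, -1, 0, 0, -32768, -32768])
src2 = pack_words([10, 20, 30, 40, 50, 60, 70, 80, 3, 4, 1, 1, 9, 9, -32768, -32768])
dest = vpmaddwd_256(src1, src2)
dwords = [bits(dest, 32 * i + 31, 32 * i) for i in range(8)]
print([f"{d:08X}" for d in dwords])
```

The last lane uses 8000H for all four words, so it exercises the single wraparound case and yields 80000000H.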

Ref. # 319433-014 5-69
