Intel® Architecture Instruction Set Extensions Programming Reference


INSTRUCTION SET REFERENCE - FMA

VFMADDSUB132PS/VFMADDSUB213PS/VFMADDSUB231PS - Fused Multiply-Alternating Add/Subtract of Packed Single-Precision Floating-Point Values

| Opcode/Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description |
| --- | --- | --- | --- | --- |
| VEX.DDS.128.66.0F38.W0 96 /r VFMADDSUB132PS xmm0, xmm1, xmm2/m128 | A | V/V | FMA | Multiply packed single-precision floating-point values from xmm0 and xmm2/mem, add/subtract elements in xmm1 and put result in xmm0. |
| VEX.DDS.128.66.0F38.W0 A6 /r VFMADDSUB213PS xmm0, xmm1, xmm2/m128 | A | V/V | FMA | Multiply packed single-precision floating-point values from xmm0 and xmm1, add/subtract elements in xmm2/mem and put result in xmm0. |
| VEX.DDS.128.66.0F38.W0 B6 /r VFMADDSUB231PS xmm0, xmm1, xmm2/m128 | A | V/V | FMA | Multiply packed single-precision floating-point values from xmm1 and xmm2/mem, add/subtract elements in xmm0 and put result in xmm0. |
| VEX.DDS.256.66.0F38.W0 96 /r VFMADDSUB132PS ymm0, ymm1, ymm2/m256 | A | V/V | FMA | Multiply packed single-precision floating-point values from ymm0 and ymm2/mem, add/subtract elements in ymm1 and put result in ymm0. |
| VEX.DDS.256.66.0F38.W0 A6 /r VFMADDSUB213PS ymm0, ymm1, ymm2/m256 | A | V/V | FMA | Multiply packed single-precision floating-point values from ymm0 and ymm1, add/subtract elements in ymm2/mem and put result in ymm0. |
| VEX.DDS.256.66.0F38.W0 B6 /r VFMADDSUB231PS ymm0, ymm1, ymm2/m256 | A | V/V | FMA | Multiply packed single-precision floating-point values from ymm1 and ymm2/mem, add/subtract elements in ymm0 and put result in ymm0. |

Instruction Operand Encoding

| Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
| --- | --- | --- | --- | --- |
| A | ModRM:reg (r, w) | VEX.vvvv (r) | ModRM:r/m (r) | NA |

Description

VFMADDSUB132PS: Multiplies the four or eight packed single-precision floating-point values from the first source operand by the four or eight packed single-precision floating-point values in the third source operand.
From the infinite-precision intermediate result, adds the odd-indexed single-precision floating-point elements and subtracts the even-indexed single-precision floating-point elements of the second source operand, performs rounding, and stores the resulting four or eight packed single-precision floating-point values to the destination operand (first source operand).

VFMADDSUB213PS: Multiplies the four or eight packed single-precision floating-point values from the second source operand by the four or eight packed single-precision floating-point values in the first source operand. From the infinite-precision intermediate result, adds the odd-indexed single-precision floating-point elements and subtracts the even-indexed single-precision floating-point elements of the third source operand, performs rounding, and stores the resulting four or eight packed single-precision floating-point values to the destination operand (first source operand).

VFMADDSUB231PS: Multiplies the four or eight packed single-precision floating-point values from the second source operand by the four or eight packed single-precision floating-point values in the third source operand. From the infinite-precision intermediate result, adds the odd-indexed single-precision floating-point elements and subtracts the even-indexed single-precision floating-point elements of the first source operand, performs rounding, and stores the resulting four or eight packed single-precision floating-point values to the destination operand (first source operand).

VEX.256 encoded version: The destination operand (also first source operand) is a YMM register and encoded in reg_field. The second source operand is a YMM register and encoded in VEX.vvvv. The third source operand is a YMM register or a 256-bit memory location and encoded in rm_field.

VEX.128 encoded version: The destination operand (also first source operand) is an XMM register and encoded in reg_field. The second source operand is an XMM register and encoded in VEX.vvvv. The third source operand is an XMM register or a 128-bit memory location and encoded in rm_field. The upper 128 bits of the YMM destination register are zeroed.

Compiler tools may optionally support a complementary mnemonic for each instruction mnemonic listed in the opcode/instruction column of the summary table. The behavior of the complementary mnemonic in situations involving NaNs is governed by the definition of the instruction mnemonic defined in the opcode/instruction column. See also Section 2.3.1, “FMA Instruction Operand Order and Arithmetic Behavior”.

Ref. # 319433-014

Operation

In the operations below, "+", "-", and "*" symbols represent addition, subtraction, and multiplication operations with infinite precision inputs and outputs (no rounding).
VFMADDSUB132PS DEST, SRC2, SRC3
IF (VEX.128) THEN
    MAXVL = 2
ELSEIF (VEX.256)
    MAXVL = 4
FI
For i = 0 to MAXVL - 1 {
    n = 64*i;
    DEST[n+31:n] ← RoundFPControl_MXCSR(DEST[n+31:n]*SRC3[n+31:n] - SRC2[n+31:n])
    DEST[n+63:n+32] ← RoundFPControl_MXCSR(DEST[n+63:n+32]*SRC3[n+63:n+32] + SRC2[n+63:n+32])
}
IF (VEX.128) THEN
    DEST[VLMAX-1:128] ← 0
FI

VFMADDSUB213PS DEST, SRC2, SRC3
IF (VEX.128) THEN
    MAXVL = 2
ELSEIF (VEX.256)
    MAXVL = 4
FI
For i = 0 to MAXVL - 1 {
    n = 64*i;
    DEST[n+31:n] ← RoundFPControl_MXCSR(SRC2[n+31:n]*DEST[n+31:n] - SRC3[n+31:n])
    DEST[n+63:n+32] ← RoundFPControl_MXCSR(SRC2[n+63:n+32]*DEST[n+63:n+32] + SRC3[n+63:n+32])
}
IF (VEX.128) THEN
    DEST[VLMAX-1:128] ← 0
FI

VFMADDSUB231PS DEST, SRC2, SRC3
IF (VEX.128) THEN
    MAXVL = 2
ELSEIF (VEX.256)
    MAXVL = 4
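The operand ordering and the alternating subtract/add pattern described above can be checked with a small software model. The following Python sketch is illustrative only and is not part of the reference: the names `to_f32` and `fmaddsub_ps` are hypothetical, the 231 form is taken from the prose description of VFMADDSUB231PS, and the model assumes finite, in-range inputs with round-to-nearest. It does not model other MXCSR rounding modes, exceptions, NaNs, infinities, signed zeros, or the zeroing of the upper 128 bits for the VEX.128 form. Because the exact result is rounded to double before single precision, it may differ from hardware in rare double-rounding cases.

```python
import struct
from fractions import Fraction


def to_f32(x):
    """Round a number to the nearest single-precision value (returned as a Python float)."""
    return struct.unpack("<f", struct.pack("<f", float(x)))[0]


def fmaddsub_ps(form, dest, src2, src3):
    """Hypothetical model of VFMADDSUB{132,213,231}PS.

    form is "132", "213" or "231"; dest is the first source operand.
    Each operand is a list of 4 lanes (VEX.128) or 8 lanes (VEX.256).
    """
    if form not in ("132", "213", "231"):
        raise ValueError("form must be '132', '213' or '231'")
    if len(dest) not in (4, 8) or not (len(dest) == len(src2) == len(src3)):
        raise ValueError("expected three operands of 4 or 8 lanes each")
    operands = {"1": dest, "2": src2, "3": src3}
    # The first two digits name the multiplicands, the third names the addend.
    mul_a, mul_b, addend = (operands[digit] for digit in form)
    result = []
    for i in range(len(dest)):
        # Exact (infinite-precision) product of the two multiplicands.
        product = Fraction(to_f32(mul_a[i])) * Fraction(to_f32(mul_b[i]))
        term = Fraction(to_f32(addend[i]))
        # Even-indexed lanes subtract the addend, odd-indexed lanes add it.
        exact = product + term if i % 2 else product - term
        # One rounding step at the end (round-to-nearest only in this sketch).
        result.append(to_f32(float(exact)))
    return result


if __name__ == "__main__":
    a = 1.0 + 2.0**-12
    c = 1.0 + 2.0**-11
    # Lane 0 computes a*a - c with a single rounding at the end.
    print(fmaddsub_ps("132", [a, 0.0, 0.0, 0.0], [c, 0.0, 0.0, 0.0], [a, 0.0, 0.0, 0.0]))
```

In the demo, the exact product a*a is 1 + 2^-11 + 2^-24, so the fused lane keeps the 2^-24 term after subtracting c. A separate single-precision multiply followed by a subtract would first round the product to 1 + 2^-11 and return zero, which is the practical difference the "infinite precision intermediate result" wording refers to.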