
Intel® Architecture Instruction Set Extensions Programming Reference


APPLICATION PROGRAMMING MODEL

| x (multiplicand) | y (multiplier) | z | r=(x*y)+z | r=(x*y)-z | r=-(x*y)+z | r=-(x*y)-z | Comment |
|---|---|---|---|---|---|---|---|
| F | 0 | F | z | -z | z | -z | |
| F | F | 0 | x*y | x*y | -x*y | -x*y | Rounded to the destination precision, with bounded exponent |
| F | F | F | (x*y)+z | (x*y)-z | -(x*y)+z | -(x*y)-z | Rounded to the destination precision, with bounded exponent; however, if the exact values of x*y and z are equal in magnitude with signs resulting in the FMA operation producing 0, the rounding behavior described in Table 2-1 applies |
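The rows above can be illustrated with a small model. This is a minimal sketch, not the processor's implementation: it assumes finite double-precision inputs and round-to-nearest, the helper name `fma_forms` is hypothetical, and it does not model the sign of a zero result (the Table 2-1 case), NaNs, or infinities. It computes each form exactly and rounds once, which is what distinguishes a fused result from a separate multiply followed by an add.

```python
from fractions import Fraction


def fma_forms(x, y, z):
    """Return ((x*y)+z, (x*y)-z, -(x*y)+z, -(x*y)-z), each rounded once.

    Hypothetical helper: finite inputs only, round-to-nearest only,
    and the sign of a zero result is not modeled.
    """
    p, a = Fraction(x) * Fraction(y), Fraction(z)  # exact, no rounding yet
    exact = (p + a, p - a, -p + a, -p - a)
    # Converting a Fraction to float performs a single correctly rounded step.
    return tuple(float(v) for v in exact)


# x*y is exactly 1 - 2**-60, which a separate multiply rounds to 1.0.
x, y, z = 1.0 + 2.0**-30, 1.0 - 2.0**-30, -1.0
fused = fma_forms(x, y, z)[0]   # keeps the -2**-60 term
unfused = x * y + z             # two roundings lose it
```

With these inputs the fused form retains the small residual that the two-step computation discards, and the `F 0 F` and `F F 0` rows reduce to `±z` and `±x*y` respectively.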

If unmasked floating-point exceptions are signaled (invalid operation, denormal operand, overflow, underflow, or inexact result), the result register is left unchanged and a floating-point exception handler is invoked.
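The rule in the paragraph above can be sketched as a toy model. Everything here is an assumption made for illustration: the function name `fma_or_trap`, the representation of exception masks as a set of strings, and the restriction to the overflow and inexact conditions with finite inputs. Invalid operation, denormal operand, and underflow are deliberately not modeled.

```python
from fractions import Fraction


def fma_or_trap(dest, x, y, z, unmasked, handler):
    """Model r = (x*y)+z with a single rounding (hypothetical helper).

    If any raised condition is in `unmasked`, `handler` is invoked and the
    old destination value is returned unchanged; otherwise the rounded
    result is returned.
    """
    exact = Fraction(x) * Fraction(y) + Fraction(z)
    raised = set()
    try:
        rounded = float(exact)          # one correctly rounded conversion
    except OverflowError:               # magnitude too large for binary64
        rounded = float("inf") if exact > 0 else float("-inf")
        raised |= {"overflow", "inexact"}
    else:
        if Fraction(rounded) != exact:  # rounding changed the value
            raised.add("inexact")
    pending = raised & set(unmasked)
    if pending:
        handler(pending)                # stand-in for the exception handler
        return dest                     # result register left unchanged
    return rounded
```

For example, calling it with `unmasked={"inexact"}` and an inexact product returns the old destination value and passes `{"inexact"}` to the handler, while the same call with an empty `unmasked` set returns the rounded result.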

2.3.1 FMA Instruction Operand Order and Arithmetic Behavior

FMA instruction mnemonics are defined explicitly with an ordered set of three digits, e.g. VFMADD132PD. The value of each digit refers to the ordering of the three source operands as defined by the instruction encoding specification:

• ‘1’: The first source operand (also the destination operand) in the syntactical order listed in this specification.

• ‘2’: The second source operand in the syntactical order. This is a YMM/XMM register, encoded using the VEX prefix.

• ‘3’: The third source operand in the syntactical order. The first and third operands are encoded following ModR/M encoding rules.

The ordering of each digit within the mnemonic refers to the floating-point data listed on the right-hand side of the arithmetic equation of each FMA operation (see Table 2-2):

• The first position in the three digits of an FMA mnemonic refers to the operand position of the first FP data expressed in the arithmetic equation of the FMA operation, the multiplicand.

• The second position in the three digits of an FMA mnemonic refers to the operand position of the second FP data expressed in the arithmetic equation of the FMA operation, the multiplier.

• The third position in the three digits of an FMA mnemonic refers to the operand position of the FP data being added to or subtracted from the multiplication result.
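The digit convention described above can be made concrete with a short decoder. This is an illustrative sketch: `operand_roles` and `vfmadd` are hypothetical names, the operands are plain scalars rather than vector registers, and the arithmetic uses an ordinary multiply and add (two roundings), so it shows operand selection only, not fused rounding.

```python
import re


def operand_roles(mnemonic):
    """Return (multiplicand, multiplier, addend) operand positions.

    Hypothetical helper: reads the three digits embedded in a mnemonic
    such as "VFMADD132PD".
    """
    m = re.search(r"(\d)(\d)(\d)", mnemonic)
    if m is None:
        raise ValueError(f"no three-digit operand order in {mnemonic!r}")
    multiplicand, multiplier, addend = (int(d) for d in m.groups())
    return multiplicand, multiplier, addend


def vfmadd(mnemonic, op1, op2, op3):
    """Value written to operand 1 (the destination) for a VFMADD form."""
    ops = {1: op1, 2: op2, 3: op3}
    a, b, c = operand_roles(mnemonic)
    # Not fused: shown only to make the operand selection visible.
    return ops[a] * ops[b] + ops[c]
```

With operands 2.0, 3.0, and 5.0, the 132 form computes op1*op3+op2, the 213 form computes op2*op1+op3, and the 231 form computes op2*op3+op1.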

Note that the non-numerical result of an FMA operation does not follow the mathematically defined commutative property between the multiplicand and the multiplier values (see Table 2-2). Consequently, software tools (such as an assembler) may support a complementary set of FMA mnemonics for each FMA instruction, for ease of programming, to take advantage of the mathematical property of commutative multiplication. For example, an assembler may optionally support the complementary mnemonic “VFMADD312PD” in addition to the true mnemonic “VFMADD132PD”. The assembler will generate the same instruction opcode sequence corresponding to VFMADD132PD. The processor executes VFMADD132PD and reports any NaN conditions based on the definition of VFMADD132PD. Similarly, if the complementary mnemonic VFMADD123PD is supported by an assembler at source level, it must generate the opcode sequence corresponding to VFMADD213PD; the complementary mnemonic VFMADD321PD must produce the opcode sequence defined by VFMADD231PD. In the absence of FMA operations reporting a NaN result, the numerical results of using either mnemonic with an assembler supporting both mnemonics will match the behavior defined in Table 2-2. Support for the complementary FMA mnemonics by software tools is optional.
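The mapping from complementary to true mnemonics amounts to swapping the multiplicand and multiplier digits. The following is a sketch of how a tool might normalize a mnemonic before encoding; the function name `true_mnemonic` and the error handling are assumptions, and only the three true digit orders named in the text (132, 213, 231) are taken from the specification.

```python
import re

# Digit orders that the text identifies as true mnemonics.
TRUE_FORMS = {"132", "213", "231"}


def true_mnemonic(mnemonic):
    """Map a complementary FMA mnemonic to its true form (hypothetical helper)."""
    m = re.search(r"\d{3}", mnemonic)
    if m is None:
        raise ValueError(f"no three-digit operand order in {mnemonic!r}")
    digits = m.group()
    if digits in TRUE_FORMS:
        return mnemonic
    # Swap the multiplicand and multiplier positions; the addend digit stays.
    swapped = digits[1] + digits[0] + digits[2]
    if swapped not in TRUE_FORMS:
        raise ValueError(f"unsupported operand order {digits!r}")
    return mnemonic[:m.start()] + swapped + mnemonic[m.end():]
```

Because the processor only ever sees the true form, NaN reporting follows the true mnemonic's definition even when the source used the complementary spelling.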

2.4 ACCESSING YMM REGISTERS<br />


The lower 128 bits of a YMM register are aliased to the corresponding XMM register. Legacy SSE instructions (i.e. SIMD instructions operating on XMM state but not using the VEX prefix, also referred to as non-VEX encoded SIMD instructions) will not access the upper bits (255:128) of the YMM registers. AVX and FMA instructions with a VEX prefix and a vector length of 128 bits zero the upper 128 bits of the YMM register. See Chapter 2, “Programming Considerations with 128-bit SIMD Instructions” for more details.
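The difference between the two kinds of 128-bit write can be sketched by modeling a YMM register as a 256-bit integer. This is an illustrative model only; the function names are hypothetical and no real instruction encoding is involved.

```python
MASK128 = (1 << 128) - 1  # selects bits 127:0, the XMM alias


def read_xmm(ymm):
    """Return the XMM view (bits 127:0) of a modeled YMM value."""
    return ymm & MASK128


def write_legacy_sse(ymm, xmm_value):
    """Non-VEX 128-bit write: bits 255:128 are not accessed, so they persist."""
    upper = (ymm >> 128) << 128
    return upper | (xmm_value & MASK128)


def write_vex128(ymm, xmm_value):
    """VEX-encoded 128-bit write: bits 255:128 are zeroed."""
    return xmm_value & MASK128
```

Starting from a register whose upper half is nonzero, `write_legacy_sse` keeps that upper half, while `write_vex128` clears it; both leave the same value visible through `read_xmm`.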

2-8 Ref. # 319433-014
