09.12.2012 Views

Cortex-A8 Technical Reference Manual - ARM Information Center


Instruction Cycle Timing

The Subnormal penalty column indicates whether additional cycles are required for subnormal operands, subnormal intermediate values, or subnormal final results. This penalty only applies when the VFP coprocessor has flush-to-zero mode disabled.

For operations that have the result penalty, six to seven additional cycles are required to format the final result.

For operations that have the operand penalty:

• one subnormal operand requires five to six additional cycles
• two subnormal operands require nine to ten additional cycles
• three subnormal operands require nine to ten additional cycles plus an intermediate penalty.
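The operand penalties listed above can be written as a small lookup. This is only an illustrative sketch: the names `OPERAND_PENALTY` and `operand_penalty` are hypothetical, the zero entry for no subnormal operands is an assumption implied by the text rather than stated in it, and the ranges are copied from the list above.

```python
from typing import Tuple

# Extra cycles as (min, max), keyed by the number of subnormal operands.
# The 0 entry is an assumption: no subnormal operand, no operand penalty.
OPERAND_PENALTY = {0: (0, 0), 1: (5, 6), 2: (9, 10), 3: (9, 10)}


def operand_penalty(subnormal_operands: int, flush_to_zero: bool = False) -> Tuple[int, int]:
    """Return the (min, max) operand penalty in cycles.

    For three subnormal operands, the intermediate penalty mentioned in the
    text comes on top of this value and is not included here.
    """
    if subnormal_operands not in OPERAND_PENALTY:
        raise ValueError("expected 0 to 3 subnormal operands")
    if flush_to_zero:
        # The subnormal penalty only applies with flush-to-zero mode disabled.
        return (0, 0)
    return OPERAND_PENALTY[subnormal_operands]


print(operand_penalty(1))  # (5, 6)
print(operand_penalty(2))  # (9, 10)
```

Keeping each penalty as a (min, max) pair mirrors the way the text states every figure as a range rather than a single value.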

All 3-input operations (FMAC, FNMAC, FMSC, and FNMSC) are variations of multiply-add, that is, a multiplication followed by an addition. The multiplication produces an intermediate result that might itself be subnormal. This intermediate subnormal has a penalty that is the same as the output penalty (applied to the multiply) plus the input penalty (applied to the addition), which amounts to an additional 11 to 13 cycles.
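The 11 to 13 cycle figure follows from adding the two ranges given earlier: the result penalty of six to seven cycles and the single-operand penalty of five to six cycles. A minimal sketch of that sum, using a hypothetical helper `add_ranges`:

```python
def add_ranges(*ranges):
    """Add (min, max) cycle ranges element-wise."""
    return (sum(r[0] for r in ranges), sum(r[1] for r in ranges))


RESULT_PENALTY = (6, 7)        # output penalty, applied to the multiply
ONE_OPERAND_PENALTY = (5, 6)   # input penalty, applied to the addition

INTERMEDIATE_PENALTY = add_ranges(RESULT_PENALTY, ONE_OPERAND_PENALTY)
print(INTERMEDIATE_PENALTY)  # (11, 13)
```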

A slightly simpler way to look at 3-input operations is to split them into equivalent multiply and add instructions. A 3-input operation takes the same amount of time as its component multiplication and addition, usually minus one cycle.

An FMAC operation with three normal operands might have a multiplication that takes 12 cycles and an addition that takes nine cycles. The corresponding multiply-add instruction takes:

12 + 9 - 1 = 20 cycles
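The rule of thumb above can be expressed as a one-line estimate. The function name `multiply_add_cycles` is hypothetical, and because the text says the one-cycle saving applies "usually", the result is an estimate rather than a guaranteed count.

```python
def multiply_add_cycles(multiply_cycles: int, add_cycles: int) -> int:
    """Estimate a 3-input operation as multiply + add, minus one cycle."""
    return multiply_cycles + add_cycles - 1


# The FMAC example from the text: 12-cycle multiply, nine-cycle add.
print(multiply_add_cycles(12, 9))  # 20
```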

Consider a multiplication of a normal number by a subnormal number whose product is also subnormal. This operation has an operand penalty and a result penalty, and takes a total of 21 to 25 cycles. We then add the subnormal product to another subnormal number, resulting in a normal sum. This addition has two subnormal operands, so it incurs the two-operand penalty and takes a total of 18 to 20 cycles. The total time the two operations take is between:

10 + 5 + 6 + 18 = 39 cycles and 12 + 6 + 7 + 20 = 45 cycles

The corresponding FMAC instruction has a two-operand penalty of nine to ten cycles, an intermediate penalty of 11 to 13 cycles, and the cost of the multiply-add itself of 18 to 21 cycles. The total time is between:

9 + 11 + 18 = 38 cycles and 10 + 13 + 21 = 44 cycles
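Both totals in this example can be reproduced by summing the ranges term by term. The sketch below only restates the arithmetic from the text. The constant names are hypothetical, and the base multiply cost of 10 to 12 cycles is read off the first terms of the sums above rather than stated separately.

```python
def add_ranges(*ranges):
    """Add (min, max) cycle ranges element-wise."""
    return (sum(r[0] for r in ranges), sum(r[1] for r in ranges))


# Separate multiply followed by a separate add.
MULTIPLY_BASE = (10, 12)        # assumed base cost, taken from the sums in the text
ONE_OPERAND_PENALTY = (5, 6)    # one subnormal operand on the multiply
RESULT_PENALTY = (6, 7)         # subnormal product
ADD_TOTAL = (18, 20)            # addition including its two-operand penalty
separate = add_ranges(MULTIPLY_BASE, ONE_OPERAND_PENALTY, RESULT_PENALTY, ADD_TOTAL)

# Single FMAC instruction on the same operands.
TWO_OPERAND_PENALTY = (9, 10)
INTERMEDIATE_PENALTY = (11, 13)
MULTIPLY_ADD_BASE = (18, 21)
fused = add_ranges(TWO_OPERAND_PENALTY, INTERMEDIATE_PENALTY, MULTIPLY_ADD_BASE)

print(separate)  # (39, 45)
print(fused)     # (38, 44)
```

At both ends of the range the FMAC form comes out one cycle shorter, which matches the "minus one cycle" rule of thumb for 3-input operations.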

16.7.2 VFP instruction execution in the NFP pipeline

The NFP pipeline can execute a subset of the VFPv3 data-processing instructions more quickly than the VFP coprocessor. The following constraints define which VFP instructions are executable by the NFP pipeline:

• single-precision data-processing operations only
• RunFast mode must be enabled
• scalar only or non-short vector instructions

If these constraints are met, the following instructions can execute in the NFP pipeline:

• FADDS, FSUBS
• FABSS, FNEGS
• FMULS, FNMULS
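The constraints and the instruction list above can be combined into a simple check. This is a sketch only: `can_run_in_nfp` and `NFP_INSTRUCTIONS` are hypothetical names, and the set contains only the mnemonics visible in this excerpt, so it should not be read as the complete list of NFP-capable instructions.

```python
# Mnemonics listed in this excerpt; all are single-precision operations.
# The list may continue beyond what is shown here.
NFP_INSTRUCTIONS = frozenset(
    {"FADDS", "FSUBS", "FABSS", "FNEGS", "FMULS", "FNMULS"}
)


def can_run_in_nfp(mnemonic: str, runfast_enabled: bool, short_vector: bool) -> bool:
    """Return True if the instruction meets the constraints stated in the text."""
    return (
        mnemonic.upper() in NFP_INSTRUCTIONS  # listed single-precision operation
        and runfast_enabled                   # RunFast mode must be enabled
        and not short_vector                  # scalar or non-short vector only
    )


print(can_run_in_nfp("FADDS", runfast_enabled=True, short_vector=False))  # True
print(can_run_in_nfp("FADDS", runfast_enabled=False, short_vector=False))  # False
```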

ARM DDI 0344K Copyright © 2006-2010 ARM Limited. All rights reserved. 16-35

ID060510 Non-Confidential
