09.12.2012 Views

Cortex-A8 Technical Reference Manual - ARM Information Center


Instruction Cycle Timing

16.4 Other pipeline-dependent latencies

This section describes other factors that can affect the timing of a code sequence. For the most part, these factors cannot be accurately predicted on a case-by-case basis, but they can be accounted for statistically when determining the overall timing for a larger section of code.
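
As a rough illustration of accounting for these factors statistically, the sketch below multiplies event counts by a per-event cycle penalty and sums the results. The penalty values are the minimums quoted in the rest of this section. The function name, the event labels, and the example counts are hypothetical and are not part of the manual; real counts would have to come from profiling or simulation.

```python
# Minimum per-event penalties in core cycles, as quoted in this section.
# The dictionary keys are illustrative labels, not ARM-defined names.
MIN_PENALTY_CYCLES = {
    "branch_mispredict": 13,   # pipeline flush, section 16.4.1
    "l1_load_miss": 8,         # replay, L2 cache hit assumed
    "l1_tlb_miss": 24,         # table walk, entries found in the L2 cache
    "store_buffer_full": 8,    # minimum; draining can take longer
    "unaligned_access": 8,     # access crosses the relevant boundary
}


def estimate_stall_cycles(event_counts):
    """Return a lower-bound estimate of stall cycles for the given counts."""
    total = 0
    for event, count in event_counts.items():
        if event not in MIN_PENALTY_CYCLES:
            raise KeyError(f"unknown event: {event}")
        if count < 0:
            raise ValueError(f"negative count for {event}")
        total += count * MIN_PENALTY_CYCLES[event]
    return total


# Hypothetical counts for a larger section of code.
example_counts = {"branch_mispredict": 40, "l1_load_miss": 100, "l1_tlb_miss": 5}
print(estimate_stall_cycles(example_counts))  # prints 1440
```

Because every entry is a minimum, the result is best read as a lower bound: L2 cache misses and slow store buffer drains add cycles that depend on the external memory system.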

16.4.1 Cycle penalty for instruction flow change

Whenever a control flow change occurs in the processor that the prefetch unit has not predicted, the pipeline must be flushed. This results in a stall whose length in cycles equals the length of the integer pipeline, so the branch mispredict penalty is 13 cycles. See Chapter 5 Program Flow Prediction for details on program execution prediction.
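
To give a feel for how the 13-cycle penalty scales, the following sketch computes the average extra cycles per instruction caused by mispredicted branches. Both inputs (the fraction of instructions that are branches and the fraction of those branches that are mispredicted) are hypothetical workload parameters chosen for illustration, and the function name is not from the manual.

```python
BRANCH_MISPREDICT_PENALTY = 13  # cycles, from this section


def mispredict_cycles_per_instruction(branch_fraction, mispredict_rate):
    """Average stall cycles added per instruction by mispredicted branches."""
    for name, value in (("branch_fraction", branch_fraction),
                        ("mispredict_rate", mispredict_rate)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    return branch_fraction * mispredict_rate * BRANCH_MISPREDICT_PENALTY


# Assumed workload: 20% branches, 5% of them mispredicted.
overhead = mispredict_cycles_per_instruction(0.20, 0.05)
print(f"{overhead:.2f} extra cycles per instruction")  # prints 0.13 ...
```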

16.4.2 Memory system effects on instruction timings

Because the processor is a statically scheduled design, any stall from the memory system can result in a minimum 8-cycle delay. This 8-cycle minimum is balanced with the minimum number of cycles needed to receive data from the L2 cache in the case of an L1 load miss. Table 16-16 gives the most common cases that can result in an instruction replay because of a memory system stall.

Table 16-16 Memory system effects on instruction timings

Replay event: Load data miss
Delay: 8 cycles
Description:
1. A load instruction misses in the L1 data cache.
2. A request is then made to the L2 data cache.
3. If a miss also occurs in the L2 data cache, then a second replay occurs. The number of stall cycles depends on the external system memory timing. The time required to receive the critical word for an L2 cache miss is 18 core cycles plus the number of cycles required by the external memory system. The minimum number of additional cycles required for the external system is 2 cycles, making the total minimum cycle count 20 cycles. However, 20 cycles is likely to be optimistic because this can only occur in a system with a 1:1 bus ratio and zero wait-state memory.

Replay event: Data TLB miss
Delay: 24 cycles
Description:
1. A table walk because of a miss in the L1 TLB causes a 24-cycle delay, assuming the translation table entries are found in the L2 cache.
2. If the translation table entries are not present in the L2 cache, the number of stall cycles depends on the external system memory timing.

Replay event: Store buffer full
Delay: 8 cycles plus latency to drain fill buffer
Description:
1. A store instruction miss does not result in any stalls unless the store buffer is full.
2. In the case of a full store buffer, the delay is at least eight cycles. The delay can be more if it takes longer to drain some entries from the store buffer.

Replay event: Unaligned load or store request
Delay: 8 cycles
Description:
1. If a load instruction address is unaligned and the full access is not contained within a 128-bit boundary, there is an 8-cycle penalty.
2. If a store instruction address is unaligned and the full access is not contained within a 64-bit boundary, there is an 8-cycle penalty.

16.4.3 Thumb-2 instructions

As a general rule, Thumb-2 instructions are executed with timing constraints identical to their ARM counterparts. However, there are some second-order effects on the cycle timing that you must observe. First, the code footprint is smaller, which can reduce the number of instruction cache misses and therefore reduce the cycle count. Second, branch instructions tend to be more

ARM DDI 0344K Copyright © 2006-2010 ARM Limited. All rights reserved. 16-14

ID060510 Non-Confidential
