Fast Models Reference Manual - ARM Information Center

Accuracy and Functionality

Real processors attempt to prefetch instructions ahead of execution and predict branch destinations to keep the prefetch queue full. The instruction prefetch behavior of a processor can be observed by a program that writes into its own prefetch queue (without using explicit barriers). The architecture does not define the results.

The CT engine processes code in blocks. The effect is as if the processor filled its prefetch queue with a block of instructions, then executed the block to completion. As a result, this virtual prefetch queue is sometimes larger and sometimes smaller than the corresponding hardware queue. In the current implementation, the virtual prefetch queue can follow small forward branches.

With an L1 instruction cache turned on, the instruction block size is limited to a single cache line. The processor ensures that a line is present in the cache at the point where it starts executing instructions from that line.

In real hardware, the instruction prefetch queue causes additional fetch transactions, some of which are redundant because of incorrect branch prediction. This causes extra cache and bus pressure.

2.3.4 Out-of-order execution and write-buffers

The current CT implementation always executes instructions sequentially in program order. One instruction is completely retired before the next starts to execute. In a real processor, multiple memory accesses can be outstanding at once, and can complete in a different order from their program order. Writes can also be delayed in a write-buffer.

The programmer-visible effect of these behaviors is defined in the architecture as the Weakly Ordered memory model, which the programmer must be aware of when writing lock-free multiprocessor code.

Within Fast Models, all memory accesses can be observed to happen in program order, effectively as if all memory is Strongly Ordered.

2.3.5 Caches

The effects of caches are programmer-visible because they can cause a single memory location to exist as multiple inconsistent copies. If caches are not correctly maintained, reads can observe stale copies of locations, and flushes/cleans can cause writes to be lost.

There are three ways in which incorrect cache maintenance can be programmer-visible:

From the D-Side interface of a single processor
The only way of detecting the presence of caches is to create aliases in the memory map, so that the same range of physical addresses can be observed as both cached and non-cached memory.

From the D-Side of a single processor to its I-Side
Stale instruction data can be fetched when new instructions have been written by the D-side. This can be due either to deliberate self-modifying code or to incorrect OS demand paging.

Between one processor and another device
For example, another processor in a non-coherent MP system, or an external DMA device.

ARM DUI 0423J Copyright © 2008-2011 ARM. All rights reserved. 2-7
ID051811
Non-Confidential
