29.01.2015 Views

Embedded Software for SoC - Grupo de Mecatrônica EESC/USP


360 Chapter 27

evaluate the performance improvement that can be achieved by exploiting IR.

Dynamic Instruction Reuse (IR) [1, 2] improves the execution time of an
application by reducing the number of instructions that have to be executed
dynamically. Research has shown that many instructions are executed
repeatedly with the same inputs and hence produce the same output [3].
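To make "executed repeatedly with the same inputs" concrete, the following is a minimal sketch that measures how often this happens in an instruction trace. The trace format (a list of `(pc, operands)` pairs), the function name `repeated_fraction` and the sample values are assumptions made for illustration; they do not come from the cited studies. Because the sketch remembers every instance ever seen, it gives an upper bound on what a finite buffer could capture.

```python
def repeated_fraction(trace):
    """Fraction of dynamic instructions whose (pc, operands) pair was seen before."""
    seen = set()
    repeats = 0
    for pc, operands in trace:
        key = (pc, tuple(operands))
        if key in seen:
            repeats += 1      # same instruction, same inputs: same output
        else:
            seen.add(key)     # first time this instance is observed
    return repeats / len(trace) if trace else 0.0


# Hypothetical trace of a small loop: (PC, operand values).
trace = [
    (0x10, (4, 2)),
    (0x14, (1, 1)),
    (0x10, (4, 2)),   # repeat of the first instruction instance
    (0x10, (5, 2)),   # same PC, different inputs: not a repeat
    (0x14, (1, 1)),   # repeat of the second instruction instance
]
fraction = repeated_fraction(trace)
```

In this made-up trace, two of the five dynamic instructions repeat an earlier instance with identical inputs.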

Dynamic instruction reuse is a scheme in which instructions are buffered in
a Reuse Buffer (RB), and future dynamic instances of the same instruction
use the results from the RB if they have the same input operands. The RB
stores the operand values and results of instructions that are executed by
the processor. This scheme is denoted S_v ('v' for value) and was proposed
by Sodani and Sohi [1]. The RB consists of tag, input operands, result,
address and memvalid fields [1]. When an instruction is decoded, its operand
values are compared with those stored in the RB. The PC of the instruction
is used to index into the RB. If a match occurs in the tag and operand
fields, the instruction under consideration is said to be reused and the
result from the RB is used, i.e. the computation is not required.

We assume that the reuse test can be performed in parallel with the
instruction decode [1] and register read stages. The reuse test will usually
not lie on the critical path, since the accesses to the RB can be pipelined.
The tag match can be initiated during the instruction fetch stage, since the
PC value of an instruction is known by then. The accesses to the operand
fields in the RB can begin only after the operand registers have been read
(the register read stage of the pipeline).

Execution of a load instruction involves a memory address computation
followed by an access to the memory location specified by the address. The
address computation part of a load instruction can be reused if the
instruction operands match an entry in the RB, while the actual memory value
(the outcome of the load) can be reused only if the addressed memory location
has not been written by a store instruction [1]. The memvalid field indicates
whether the value loaded from memory is valid (i.e. has not been overwritten
by a store instruction), while the address field holds the memory address.
When the processor executes a store instruction, the address field of each
RB entry is searched for a matching address, and the memvalid bit is reset
for matching entries [1].

In this article we assume that the RB is updated by instructions that have
completed execution (non-speculative) and are ready to update the register
file. This ensures that precise state is maintained, with the RB containing
only the results of committed instructions.

IR improves performance since reused instructions bypass some stages in the
pipeline, which allows the dataflow limit to be exceeded. The performance
improvement depends on the number of pipeline stages between the register
read and writeback stages, and is significant if long-latency operations
such as divide instructions are reused. Additionally, in the case of
dynamically scheduled processors, performance is further improved since
subsequent instructions that depend on the reused instruction are resolved
earlier and can be issued earlier (out-of-order issue).
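The RB behaviour described above (tag and operand match, load handling via the address and memvalid fields, invalidation on stores, update at commit) can be sketched as a small software model. This is only an illustration under stated assumptions: the direct-mapped organisation, the entry count, the modulo index/tag split and all names (`RBEntry`, `ReuseBuffer`, `lookup`, `commit`, `store`, `reuse_load`) are hypothetical and are not taken from the chapter or from [1].

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class RBEntry:
    tag: int                        # PC bits that identify the instruction
    operands: Tuple[int, ...]       # input operand values
    result: Optional[int]           # computed result, or loaded value for a load
    address: Optional[int] = None   # memory address (loads only)
    memvalid: bool = False          # loaded value not overwritten by a store


class ReuseBuffer:
    """Assumed direct-mapped RB, indexed by the PC (organisation is hypothetical)."""

    def __init__(self, num_entries=16):
        self.num_entries = num_entries
        self.entries = [None] * num_entries

    def _index(self, pc):
        return pc % self.num_entries    # assumed index function

    def _tag(self, pc):
        return pc // self.num_entries   # remaining PC bits form the tag

    def lookup(self, pc, operands):
        """Reuse test: return the entry if tag and operands match, else None."""
        entry = self.entries[self._index(pc)]
        if (entry is not None and entry.tag == self._tag(pc)
                and entry.operands == tuple(operands)):
            return entry
        return None

    def commit(self, pc, operands, result, address=None):
        """Update the RB with a committed (non-speculative) instruction."""
        self.entries[self._index(pc)] = RBEntry(
            self._tag(pc), tuple(operands), result,
            address, memvalid=address is not None)

    def store(self, address):
        """A store resets memvalid in every entry with a matching address."""
        for entry in self.entries:
            if entry is not None and entry.address == address:
                entry.memvalid = False

    def reuse_load(self, pc, operands):
        """Return (address, value) for a load; value is None if memvalid is reset."""
        entry = self.lookup(pc, operands)
        if entry is None or entry.address is None:
            return None
        return entry.address, (entry.result if entry.memvalid else None)


rb = ReuseBuffer()
rb.commit(pc=0x40, operands=(12, 3), result=4)        # e.g. a divide, 12 / 3
hit = rb.lookup(0x40, (12, 3))                        # same inputs: reused
miss = rb.lookup(0x40, (12, 4))                       # different inputs: no reuse

rb.commit(pc=0x44, operands=(0x1000, 8), result=99, address=0x1008)  # a load
before = rb.reuse_load(0x44, (0x1000, 8))             # address and value reusable
rb.store(0x1008)                                      # store to the same address
after = rb.reuse_load(0x44, (0x1000, 8))              # only the address reusable
```

In the usage lines, the second lookup of the divide fails because one operand differs, and the load keeps its reusable address after the store while its loaded value is no longer trusted.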
