15.01.2013 Views

U. Glaeser


FIGURE 42.60 Basic pipeline architecture for a RISC and a DSP processor.

FIGURE 42.61 Memory-intensive number crunching on a RISC and a DSP.

accumulator is set. In the simple examples of Fig. 42.62, two instruction cycles on the RISC processor and three on the DSP processor must elapse between the setting of the accumulator flag and its use in the decode stage. The RISC therefore has an advantage for control-dominated applications.
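The two-versus-three cycle gap can be reproduced with a very small timing model. The sketch below is illustrative only: it assumes the stage orders labelled in Fig. 42.60, one cycle per stage, no stalls and no forwarding, and that a flag set in the execute stage becomes readable in the following cycle. The function name `min_flag_distance` is hypothetical.

```python
# Stage orders as labelled in Fig. 42.60 (assumed; one clock cycle per stage).
RISC_STAGES = ["Fetch", "Decode", "Execute", "Memory Access", "Write Back"]
DSP_STAGES = ["Fetch", "Decode", "Memory Access", "Execute", "Write Back"]


def min_flag_distance(stages, producer="Execute", consumer="Decode"):
    """Smallest instruction distance at which the consumer stage of a later
    instruction runs strictly after the producer stage of an earlier one.

    Assumption: instruction i occupies stage number s in cycle i + s, and a
    flag set in one cycle can first be read in the next cycle.
    """
    set_offset = stages.index(producer)
    use_offset = stages.index(consumer)
    # Instruction j may use the flag of instruction i only if
    # j + use_offset > i + set_offset, i.e. j - i > set_offset - use_offset.
    return max(set_offset - use_offset + 1, 1)


print("RISC:", min_flag_distance(RISC_STAGES))
print("DSP: ", min_flag_distance(DSP_STAGES))
```

Under these assumptions the model gives a distance of 2 for the RISC ordering and 3 for the DSP ordering, because the DSP places its execute stage one position later in the pipeline.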

In practice, these pipeline hazards are either hidden from the programmer by hardware solutions (e.g., forwarding or stalls) or visible to the programmer, who can then optimize the code around them. A typical example is the pair of branch and “delayed branch” instructions in DSP processors. Because an instruction is fetched in the cycle before it is decoded, a regular branch instruction incurs an unnecessary fetch of the instruction that follows the branch in memory. To avoid this wasted cycle, DSP processors provide a delayed branch instruction: the instruction that follows the branch in memory is executed before the branch actually takes place. Hence, a delayed branch effectively takes one cycle to execute, whereas a regular branch takes two.
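The cycle counts for the two branch types can be illustrated with a toy issue-slot counter. This is a simplified sketch, not a model of any particular DSP: it assumes unconditional branches, one cycle per instruction, a single delay slot that holds a non-branch instruction, and hypothetical instruction names (`add`, `mul`, `sub`, `store`).

```python
def run(program, delayed):
    """Execute a list of (name, branch_target) tuples and count cycles.

    branch_target is None for ordinary instructions and an index into
    `program` for a branch. Assumes a branch is never the last instruction
    and that a delay slot never contains another branch.
    """
    pc, cycles, trace = 0, 0, []
    while pc < len(program):
        name, target = program[pc]
        trace.append(name)
        cycles += 1
        if target is None:
            pc += 1
            continue
        if delayed:
            # The instruction after the branch was already fetched,
            # and it is executed before the branch takes effect.
            trace.append(program[pc + 1][0])
        # Regular branch: the fetched instruction is discarded.
        # Either way the slot after the branch costs one cycle.
        cycles += 1
        pc = target
    return trace, cycles


# Same useful work (add, mul, store) written for each branch type.
regular_prog = [("add", None), ("mul", None), ("branch", 4),
                ("sub", None), ("store", None)]
delayed_prog = [("add", None), ("branch", 4), ("mul", None),
                ("sub", None), ("store", None)]

print(run(regular_prog, delayed=False))
print(run(delayed_prog, delayed=True))
```

In this sketch the regular version needs 5 cycles and the delayed version 4 for the same useful work, because the programmer has moved `mul` into the slot that the regular branch wastes.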

© 2002 by CRC Press LLC

[Figure 42.60 panels; axes: program instruction order versus time in clock cycles]
(a) RISC pipeline: Fetch, Decode, Execute (execution/address generation), Memory Access (memory access/branch), Write Back
(b) DSP pipeline: Fetch, Decode, Memory Access (memory access/address post-modification), Execute (execution), Write Back

[Figure 42.61 panels; axis: time in clock cycles]
(a) RISC pipeline:
r0 = *p0; // load data
a0 = a0 + r0; // operate
(b) DSP pipeline:
a0 = a0 + *p0;
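The two code fragments in Figure 42.61 compute the same accumulation; they differ only in how many instructions are issued per data word. The sketch below mimics both styles and counts issued instructions, not clock cycles. The function names and the sample data are hypothetical, and pointer updates are left out because the figure does not show them.

```python
def accumulate_risc(data):
    """Load/store style: a separate load, then a register-only add."""
    a0, issued = 0, 0
    for word in data:
        r0 = word        # r0 = *p0;       load data
        issued += 1
        a0 = a0 + r0     # a0 = a0 + r0;   operate
        issued += 1
    return a0, issued


def accumulate_dsp(data):
    """Memory-operand style: the add reads its operand from memory."""
    a0, issued = 0, 0
    for word in data:
        a0 = a0 + word   # a0 = a0 + *p0;
        issued += 1
    return a0, issued


samples = [3, 1, 4, 1, 5]
print(accumulate_risc(samples))
print(accumulate_dsp(samples))
```

Both functions return the same sum; the load/store version issues two instructions per data word and the memory-operand version issues one.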
