Soft-Core Processor Design - CiteSeer

Figure 4.1 UT Nios datapath
Similarly, the control-flow instructions require an instruction fetch from the target address in the instruction memory, so their latency depends on the instruction memory latency. The control-flow instructions introduce a branch penalty of at least two cycles if synchronous memory is used.

Pipeline execution results are stored temporarily in the pipeline registers. The result of the fetch stage is the fetched instruction, which is stored in the IR register. The results of the decode and operand stages are stored in pipeline registers D/O and O/X, respectively. Unlike traditional RISC architectures [34], UT Nios does not have a write-back stage. The operand stage was introduced instead to reduce the delay of the critical path in the execute stage. Introducing a write-back stage would likely decrease processor performance because of the stalls caused by data hazards. The operand stage does not incur such stalls. A discussion of the pipeline organization and its implications for performance is given in Chapter 6.

The following sections present the structure and the functionality of the datapath modules.

4.1.1. Prefetch Unit

The UT Nios prefetch unit performs the function of the UT Nios instruction master. The prefetch unit connects directly to the Avalon bus, and communicates with the instruction memory by using the predefined Avalon signals. If the pipeline commits one instruction per cycle, instructions from the prefetch unit are forwarded directly to the decode stage of the pipeline. Since the instruction master on the Avalon bus supports latency transfers, the prefetch unit issues several consecutive reads even if the pipeline stalls and the instructions are not required immediately. In this case, the prefetched instructions are temporarily stored in a FIFO buffer. When the stall is resolved, the next instruction is ready, and the pipeline execution may continue immediately. Using the FIFO buffer reduces the pipeline latency.
Without it, a memory read would have to be issued, and execution could continue only once the new instruction had been fetched. While the pipeline is stalled, the prefetch unit issues only as many memory reads as there are entries in the FIFO buffer. The size of the UT Nios FIFO buffer is configurable using a Verilog defparam statement.

On system reset, the prefetch unit starts fetching instructions from a user-defined starting memory address. To keep track of the instruction to be fetched next, the prefetch unit maintains a copy of the program counter called the prefetch program counter (PPC). The PPC is independent of the program counter visible to the programmer, and is incremented every time the prefetch unit issues a memory read. It is updated with the branch target address by the branch unit when a taken branch executes in the pipeline. Since branches are executed in the third stage of the pipeline, the prefetch unit may have already fetched instructions past the delay slot of
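
The interplay between the FIFO buffer and the PPC described in Section 4.1.1 can be illustrated with a small behavioral model. This is a minimal sketch in Python, not the UT Nios Verilog source. The class name `PrefetchUnit`, the methods `tick` and `redirect`, and the parameters `fifo_size` and `step` are all hypothetical. The model assumes one memory read per cycle, routes every instruction through the FIFO (the direct forwarding path to the decode stage is not modeled), and ignores the delay slot. How UT Nios treats instructions already fetched past the delay slot is not covered by the text above, so `redirect` simply discards the buffered instructions, which is an assumption.

```python
from collections import deque


class PrefetchUnit:
    """Toy cycle-level model of a prefetch unit with a FIFO and a PPC."""

    def __init__(self, memory, reset_address=0, fifo_size=2, step=1):
        self.memory = memory        # dict: address -> instruction word
        self.ppc = reset_address    # prefetch program counter (PPC)
        self.fifo_size = fifo_size  # buffer depth (configurable)
        self.step = step            # address increment per read (assumed)
        self.fifo = deque()

    def tick(self, pipeline_stalled):
        """Advance one cycle; return the instruction passed to decode, or None."""
        issued = None
        # Hand the oldest buffered instruction to decode unless stalled.
        if not pipeline_stalled and self.fifo:
            issued = self.fifo.popleft()
        # Keep issuing reads while the FIFO has room, even during a stall.
        if len(self.fifo) < self.fifo_size:
            self.fifo.append(self.memory.get(self.ppc, "nop"))
            self.ppc += self.step   # PPC advances on every issued read
        return issued

    def redirect(self, target):
        """Taken branch: load the PPC with the branch target address."""
        self.fifo.clear()           # assumption: drop wrong-path instructions
        self.ppc = target


if __name__ == "__main__":
    program = {addr: f"i{addr}" for addr in range(16)}
    unit = PrefetchUnit(program, reset_address=0, fifo_size=2)
    for stalled in (False, False, True, True, False, False):
        print(stalled, unit.tick(stalled), "PPC =", unit.ppc)
```

In the demo run, the two stalled cycles deliver nothing to the decode stage while the FIFO fills up to `fifo_size` entries and the PPC stops advancing. The first cycle after the stall delivers the next instruction immediately, which is the latency benefit the text attributes to the buffer.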
