Soft-Core Processor Design - CiteSeer

Figure 5.1 shows that the ONCHIP system performs best on all benchmarks when the FIFO buffer has unit size. Performance drops as the buffer size increases because of memory access conflicts: both data and instructions reside in the same memory. If the instruction master and the data master request memory access at the same time, the bus arbitration logic grants access to one of the masters and generates wait states for the other. This is evident in Figure 5.1, where the Memory test benchmark suffers the largest performance penalty as the FIFO buffer size increases. The Pipeline-Memory benchmark that reads its data from the BOOT memory, which the instruction master does not access while the program is running, suffers no performance penalty. The Pipeline-Memory benchmark that reads its data from the PROGRAM memory suffers mildly from increasing the FIFO buffer size.

Figures 5.2 and 5.3 show how benchmark performance varies with the FIFO buffer size on the SRAM system. All benchmarks perform best with a buffer size of 2. As on the ONCHIP system, further increases in the buffer size hurt performance. The Pipeline test benchmark benefits the most from increasing the FIFO buffer size to two registers, because this benchmark does not experience any stalls if the prefetch unit always has the next instruction ready. With the unit buffer size, however, the prefetch unit can issue only two reads in three cycles, because the FIFO buffer has only one register to store an instruction coming from memory, while the other instruction is forwarded to the decode stage immediately. With a FIFO buffer of size two, instructions are fetched continuously. Compared with two instruction fetches per three clock cycles, continuous fetching corresponds to a performance improvement of 3/2, which explains the speedup of the Pipeline benchmark in Figure 5.2. Further increases in the FIFO buffer size do not influence its performance because the benchmark does not include memory operations.

Figure 5.2 Performance of the test and toy benchmarks vs. the FIFO buffer size on the SRAM system (vertical axis: speedup over buffer size 1; series: buffer sizes 2, 3, 4, and 15; benchmarks: Loops, Memory, Pipeline, Pipeline-Memory (BOOT), Pipeline-Memory (PROGRAM), Fibo, Multiply, QsortInt)

Figure 5.3 Performance of the application benchmarks vs. the FIFO buffer size on the SRAM system (vertical axis: speedup over buffer size 1; series: buffer sizes 2, 3, 4, and 15; benchmarks: 1 bit/loop, Recursive by nibbles, Non-recursive by nibbles, Non-recursive by bytes, Shift and count, CRC32, Dijkstra Large, Game of Life, Patricia, Qsort Large, SHA, Stringsearch)

The Stringsearch benchmark suffers a significant performance penalty with a buffer size of 15 compared with the unit buffer size. We analyzed the performance of Stringsearch on a system with a FIFO buffer size of 15 using ModelSim and determined that the benchmark contains store instructions that take as long as 16 cycles to commit. These instructions are targets of branches in the program. When a branch commits, the prefetch FIFO buffer is flushed, and instructions from the target address are fetched. When the store instruction reaches the execute pipeline stage and issues a write request, the request is not handled before the prefetch unit lowers
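Returning to the Pipeline benchmark, the 3/2 speedup quoted above can be checked with a few lines of arithmetic. The sketch below is only an illustration, not the thesis's measurement method: it assumes that instruction fetch is the sole bottleneck, that "fetched continuously" means one fetch per clock cycle, and that the function names and the instruction count are hypothetical.

```python
from fractions import Fraction


def fetch_cycles(num_instructions: int, fetches_per_cycle: Fraction) -> Fraction:
    """Cycles needed to fetch num_instructions at a steady fetch rate.

    Assumption: fetch throughput is the only limit on execution time.
    """
    return Fraction(num_instructions) / fetches_per_cycle


def speedup(baseline_rate: Fraction, improved_rate: Fraction,
            num_instructions: int = 3000) -> Fraction:
    """Ratio of baseline cycles to improved cycles for the same instruction count."""
    baseline = fetch_cycles(num_instructions, baseline_rate)
    improved = fetch_cycles(num_instructions, improved_rate)
    return baseline / improved


# From the text: with a unit-size buffer, two reads are issued every three cycles.
UNIT_BUFFER_RATE = Fraction(2, 3)
# Assumption: continuous fetching with a two-entry buffer means one read per cycle.
TWO_ENTRY_RATE = Fraction(1, 1)

pipeline_speedup = speedup(UNIT_BUFFER_RATE, TWO_ENTRY_RATE)
print(pipeline_speedup)  # 3/2
```

Exact fractions are used instead of floats so that the result compares cleanly with the 3/2 figure; the instruction count cancels out of the ratio, so its value does not matter.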
