Soft-Core Processor Design - CiteSeer

5.4.1. Performance

The metric used in the performance comparison is the wall clock time required to execute a benchmark program. The wall clock time can be expressed as [34]:

T = IC × CPI × C

where IC is the instruction count, CPI is the average number of clock cycles needed to execute an instruction, and C is the cycle time (i.e. the duration of a clock cycle). The instruction count is the same for both processors, because the same binary programs are run on both systems. Therefore, the performance is determined by the number of cycles needed to execute the program (the cycle count) and the cycle time.

The best cycle times and the corresponding best Fmax for the four systems used in the performance comparison are given in Table 5.3. The UT Nios has an almost 50% longer cycle time than the Altera Nios for the SRAM system, and a 40% longer cycle time for the ONCHIP system.

Processor     System   Cycle Time (ns)   Fmax (MHz)
Altera Nios   SRAM     8.47              118
Altera Nios   ONCHIP   8.59              116
UT Nios       SRAM     12.66             79
UT Nios       ONCHIP   12.03             83

Table 5.3 Cycle time and Fmax of the systems used in performance comparison

All system configurations were run at a clock speed of 50 MHz. Thus, the run times obtained can be directly used to compare the cycle counts of the two architectures. To obtain the wall clock run times, the measured times are prorated with the appropriate factor, that is, scaled by the ratio of each system's best cycle time to the 20 ns cycle time of the 50 MHz test clock.

The comparison of the SRAM systems based on the Altera and UT Nios is presented in Figures 5.10 and 5.11. The graphs in the figures show the improvement of the UT Nios over the Altera Nios. Both the wall clock performance ratio and the cycle count ratio are given. The cycle count advantage of the UT Nios for the Loops benchmark shows that the UT Nios implements branch logic better: the Loops benchmark executes almost 56% faster in terms of the cycle count. However, because of the longer cycle time, the advantage of the UT Nios over the Altera Nios in terms of the wall clock time for the Loops benchmark is only 4%.
The lower cycle count comes from the branch logic implementation in UT Nios, where the control-flow instructions are committed early in the pipeline (in the operand stage), as described in section 4.1.5. The Altera Nios executes the control-flow instructions less efficiently, which is expected because of the deeper pipeline.
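The relationship between the cycle count ratio and the wall clock ratio discussed in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used to produce the figures: the function names are hypothetical, the cycle times are those of Table 5.3, the proration factor is assumed to be the ratio of the best cycle time to the 20 ns test clock period, and "almost 56% faster" is read as a cycle count ratio of about 1.56.

```python
# Best cycle times in ns, taken from Table 5.3.
CYCLE_TIME_NS = {
    ("Altera Nios", "SRAM"): 8.47,
    ("Altera Nios", "ONCHIP"): 8.59,
    ("UT Nios", "SRAM"): 12.66,
    ("UT Nios", "ONCHIP"): 12.03,
}

TEST_CLOCK_MHZ = 50.0  # all configurations were measured at this clock speed


def fmax_mhz(cycle_time_ns):
    """Maximum clock frequency (MHz) corresponding to a cycle time in ns."""
    return 1000.0 / cycle_time_ns


def prorate(measured_time, cycle_time_ns, test_clock_mhz=TEST_CLOCK_MHZ):
    """Scale a run time measured at the test clock to the best cycle time.

    Assumption: the cycle count does not depend on the clock speed, so
    T = cycles * C scales linearly with the cycle time C.
    """
    test_cycle_ns = 1000.0 / test_clock_mhz  # 20 ns at 50 MHz
    return measured_time * cycle_time_ns / test_cycle_ns


def wall_clock_improvement(cycle_count_ratio, system):
    """Wall clock improvement of UT Nios over Altera Nios.

    cycle_count_ratio is (Altera cycle count) / (UT cycle count); the
    instruction count IC is identical on both and cancels out.
    """
    altera_c = CYCLE_TIME_NS[("Altera Nios", system)]
    ut_c = CYCLE_TIME_NS[("UT Nios", system)]
    return cycle_count_ratio * altera_c / ut_c


if __name__ == "__main__":
    # Loops benchmark on the SRAM systems, assuming a cycle count ratio of 1.56.
    print(round(wall_clock_improvement(1.56, "SRAM"), 2))
```

With the assumed ratio of 1.56, the sketch gives a wall clock improvement of about 1.04, which agrees with the 4% advantage reported for the Loops benchmark.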
[Figure 5.10 Performance comparison of the toy and test benchmarks on the UT and Altera Nios based SRAM systems. Vertical axis: improvement of UT Nios over Altera Nios. Series: Wall Clock, Cycle Count. Benchmarks: Loops, Memory, Pipeline, Pipeline-Memory (Boot), Pipeline-Memory (Program), Fibo, Multiply, QsortInt.]

[Figure 5.11 Performance comparison of the application benchmarks on the UT and Altera Nios based SRAM systems. Vertical axis: improvement of UT Nios over Altera Nios. Series: Wall Clock, Cycle Count. Benchmarks: 1 bit/loop, Recursive by nibbles, Non-recursive by nibbles, Non-recursive by bytes, Shift and count, CRC32, Dijkstra Small, Dijkstra Large, Game of Life, Patricia, Qsort Small, Qsort Large, SHA, Stringsearch.]