Soft-Core Processor Design - CiteSeer

In addition to optional instructions, the Altera Nios instruction set can be customized by defining up to five user-defined (custom) instructions. Instruction encoding reserves five opcodes for user instructions. In assembly language, user instructions are referred to using the mnemonics USR0 through USR4. User instruction functionality is defined as a module with a predefined interface [24]. The functionality can be specified using any of the design entry formats supported by Quartus II. Both single-cycle (combinational) and multi-cycle (sequential) operations are allowed in user instructions. The number of cycles a multi-cycle operation takes to produce its result has to be fixed and declared at system design time. Custom instruction logic is added in parallel to the functional units of the Nios processor, which corresponds to a closely coupled reconfigurable system. A module with a user instruction specification can also interface with user logic external to the processor, as shown in Figure 3.2 [24].

Figure 3.2 Adding custom instructions to Nios [24]

3.1.3. Datapath and Memory Organization

The Altera Nios datapath is organized as a 5-stage single-issue pipeline with separate data and instruction masters. The data master performs memory load and store operations, while the instruction master fetches instructions from the instruction memory. Many pipeline details are not provided in the Altera literature.
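
To make the relationship between per-instruction latency and pipelined throughput concrete, the following minimal sketch estimates the cycle count for a straight-line instruction sequence. It uses the 5-cycle base latency and the 6-cycle shift latency quoted in this section. The stall model is an assumption, not something the Altera literature specifies: an instruction that needs more than 5 cycles is assumed to hold up the pipeline for each extra cycle, and no other hazards or memory delays are modeled. The function names are hypothetical.

```python
from typing import Iterable

PIPELINE_DEPTH = 5  # number of pipeline stages, as stated in the text


def total_cycles(latencies: Iterable[int], depth: int = PIPELINE_DEPTH) -> int:
    """Estimate the cycles needed to run a sequence of instructions.

    Assumption: every cycle of latency beyond `depth` stalls the whole
    pipeline by one cycle; branches and memory waits are ignored.
    """
    lats = list(latencies)
    if not lats:
        return 0
    if any(lat < depth for lat in lats):
        raise ValueError("latency cannot be shorter than the pipeline depth")
    stalls = sum(lat - depth for lat in lats)
    # The first instruction needs `depth` cycles to drain; in the ideal case
    # each later instruction finishes one cycle after its predecessor.
    return depth + (len(lats) - 1) + stalls


def instructions_per_cycle(latencies: Iterable[int]) -> float:
    """Average number of instructions completed per cycle under this model."""
    lats = list(latencies)
    cycles = total_cycles(lats)
    return len(lats) / cycles if cycles else 0.0


if __name__ == "__main__":
    ideal = [5] * 100              # only 5-cycle instructions
    mixed = [5] * 90 + [6] * 10    # ten 6-cycle shifts mixed in
    print(total_cycles(ideal), instructions_per_cycle(ideal))
    print(total_cycles(mixed), instructions_per_cycle(mixed))
```

Under these assumptions, 100 five-cycle instructions take 104 cycles, while replacing ten of them with 6-cycle shifts raises the total to 114 cycles, so throughput falls further below one instruction per cycle.
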
Most instructions take 5 cycles to execute. Execution time of the MUL instruction varies from 5 to 8 cycles, depending on the FPGA device it is implemented in. MSTEP and shift instructions take 6 cycles to execute. Memory operations take 5 or more cycles to execute, depending on the memory latency and bus arbitration. Although each instruction takes at least 5 cycles to execute, pipelined execution allows one instruction to finish per cycle in the ideal case. In reality, fewer than one instruction will commit per cycle, since some instructions take more than 5 cycles to execute.

The pipeline implementation is not visible to user programs, except for instructions in the delay slot and the WRCTL instruction. An instruction in the delay slot executes out of the original program order. Any WRCTL instruction modifying the STATUS register has to be followed by a NOP instruction. NOP is a pseudo-instruction implemented as MOV %r0, %r0.

Systems using the Nios processor can use both on- and off-chip memory for instruction and data storage. Memory is byte addressable, and words and halfwords are stored in memory using little-endian byte ordering. All memory addresses are word aligned for 32-bit Nios, and halfword aligned for 16-bit Nios. Special control signals, called byte-enable lines, are used when a partial word has to be written to the memory.

Systems using off-chip memory can use the on-chip memory as instruction and data caches. Caches are direct-mapped, with a write-through write policy for the data cache. The instruction cache is read-only. If a program writes to the instruction memory, the corresponding lines in the instruction cache have to be invalidated. Cache support is only provided for the 32-bit Altera Nios.

Several Nios datapath parameters can be customized. The pipeline can be optimized for speed or area. The instruction decoder can be implemented in logic or in memory.
Implementation in logic is faster, while memory implementation uses on-chip memory and leaves more resources for user logic. According to [38], both 16- and 32-bit Nios can run at clock speeds over 125 MHz. This number varies depending on the target FPGA device.

3.1.4. Interrupt Handling

There are three possible sources of interrupts in the Nios architecture: internal exceptions, software interrupts, and I/O interrupts. Internal exceptions occur because of unexpected results in instruction execution. Software interrupts are explicit calls to trap routines using the TRAP instruction, often used for operating system calls. I/O interrupts come from I/O devices external to the Nios processor, which may reside both on- and off-chip. The Nios architecture supports up to 64 vectored interrupts. Internal exceptions have predefined exception numbers, while software interrupts provide the exception number as an
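
The idea of vectored interrupts can be illustrated with a small software model: a table with one slot per exception number, and a dispatcher that looks up the handler for whichever number the interrupt source supplies. This is only a sketch of the general technique, not the Nios hardware mechanism. The class name, the handler signature, and the two exception numbers used below are hypothetical; only the limit of 64 vectors comes from the text.

```python
from typing import Callable, List, Optional

NUM_VECTORS = 64  # the text states up to 64 vectored interrupts

Handler = Callable[[int], str]


class VectorTable:
    """Toy model of a vector table indexed by exception number."""

    def __init__(self) -> None:
        self._handlers: List[Optional[Handler]] = [None] * NUM_VECTORS

    @staticmethod
    def _check(number: int) -> None:
        if not 0 <= number < NUM_VECTORS:
            raise ValueError(f"exception number {number} out of range")

    def register(self, number: int, handler: Handler) -> None:
        """Install a handler for one exception number."""
        self._check(number)
        self._handlers[number] = handler

    def dispatch(self, number: int) -> str:
        """Run the handler installed for the given exception number."""
        self._check(number)
        handler = self._handlers[number]
        if handler is None:
            raise LookupError(f"no handler installed for exception {number}")
        return handler(number)


# Hypothetical numbers chosen only for illustration.
INTERNAL_EXCEPTION = 1  # stands in for a predefined internal exception number
SYSCALL_TRAP = 40       # stands in for a number supplied by a software interrupt

table = VectorTable()
table.register(INTERNAL_EXCEPTION, lambda n: f"internal exception {n}")
table.register(SYSCALL_TRAP, lambda n: f"system call via trap {n}")

if __name__ == "__main__":
    print(table.dispatch(INTERNAL_EXCEPTION))
    print(table.dispatch(SYSCALL_TRAP))
```

In this model the three interrupt sources differ only in where the exception number comes from; once the number is known, the lookup is the same for all of them.
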
