Introduction to the Altera Nios II Soft Processor - FTP - Altera
Introduction to the Altera Nios II Soft Processor - FTP - Altera
Introduction to the Altera Nios II Soft Processor - FTP - Altera
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
<strong>Introduction</strong> <strong>to</strong> <strong>the</strong> <strong>Altera</strong> <strong>Nios</strong> <strong>II</strong> <strong>Soft</strong> <strong>Processor</strong><br />
This tu<strong>to</strong>rial presents an introduction <strong>to</strong> <strong>Altera</strong>’s <strong>Nios</strong> R○ <strong>II</strong> processor, which is a soft processor that can be instantiated<br />
on an <strong>Altera</strong> FPGA device. It describes <strong>the</strong> basic architecture of <strong>Nios</strong> <strong>II</strong> and its instruction set. The <strong>Nios</strong><br />
<strong>II</strong> processor and its associated memory and peripheral components are easily instantiated by using <strong>Altera</strong>’s SOPC<br />
Builder in conjuction with <strong>the</strong> Quartus R○ <strong>II</strong> software.<br />
A full desciption of <strong>the</strong> <strong>Nios</strong> <strong>II</strong> processor is provided in <strong>the</strong> <strong>Nios</strong> <strong>II</strong> <strong>Processor</strong> Reference Handbook, which<br />
is available in <strong>the</strong> literature section of <strong>the</strong> <strong>Altera</strong> web site. An introduction <strong>to</strong> <strong>the</strong> SOPC Builder is given in <strong>the</strong><br />
tu<strong>to</strong>rial <strong>Introduction</strong> <strong>to</strong> <strong>the</strong> <strong>Altera</strong> SOPC Builder, which can be found in <strong>the</strong> University Program section of <strong>the</strong> web<br />
site.<br />
Contents:<br />
<strong>Nios</strong> <strong>II</strong> System<br />
Overview of <strong>Nios</strong> <strong>II</strong> <strong>Processor</strong> Features<br />
Register Structure<br />
Accessing Memory and I/O Devices<br />
Addressing<br />
Instruction Set<br />
Assembler Directives<br />
Example Program<br />
Exception Processing<br />
Cache Memory<br />
Tightly Coupled Memory<br />
1
<strong>Altera</strong>’s <strong>Nios</strong> <strong>II</strong> is a soft processor, defined in a hardware description language, which can be implemented in<br />
<strong>Altera</strong>’s FPGA devices by using <strong>the</strong> Quartus R○ <strong>II</strong> CAD system. This tu<strong>to</strong>rial provides a basic introduction <strong>to</strong> <strong>the</strong><br />
<strong>Nios</strong> <strong>II</strong> processor, intended for a user who wishes <strong>to</strong> implement a <strong>Nios</strong> <strong>II</strong> based system on <strong>the</strong> <strong>Altera</strong> DE2 board.<br />
1 <strong>Nios</strong> <strong>II</strong> System<br />
The <strong>Nios</strong> <strong>II</strong> processor can be used with a variety of o<strong>the</strong>r components <strong>to</strong> form a complete system. These components<br />
include a number of standard peripherals, but it is also possible <strong>to</strong> define cus<strong>to</strong>m peripherals. <strong>Altera</strong>’s DE2<br />
Development and Education board contains several components that can be integrated in<strong>to</strong> a <strong>Nios</strong> <strong>II</strong> system. An<br />
example of such a system is shown in Figure 1.<br />
Host computer<br />
USB-Blaster<br />
interface<br />
<strong>Nios</strong> <strong>II</strong> processor<br />
JTAG Debug<br />
module<br />
JTAG UART<br />
interface<br />
Cyclone <strong>II</strong><br />
FPGA chip<br />
Avalon switch fabric<br />
On-chip<br />
memory<br />
SRAM<br />
interface<br />
SDRAM<br />
interface<br />
Flash<br />
memory<br />
interface<br />
Parallel I/O<br />
interface<br />
Serial I/O<br />
interface<br />
SRAM<br />
chip<br />
SDRAM<br />
chip<br />
Flash<br />
memory<br />
chip<br />
Parallel<br />
I/O port<br />
lines<br />
Serial<br />
I/O port<br />
lines<br />
Figure 1. A <strong>Nios</strong> <strong>II</strong> system implemented on <strong>the</strong> DE2 board.<br />
2
The <strong>Nios</strong> <strong>II</strong> processor and <strong>the</strong> interfaces needed <strong>to</strong> connect <strong>to</strong> o<strong>the</strong>r chips on <strong>the</strong> DE2 board are implemented in<br />
<strong>the</strong> Cyclone <strong>II</strong> FPGA chip. These components are interconnected by means of <strong>the</strong> interconnection network called<br />
<strong>the</strong> Avalon Switch Fabric. Memory blocks in <strong>the</strong> Cyclone <strong>II</strong> device can be used <strong>to</strong> provide an on-chip memory for<br />
<strong>the</strong> <strong>Nios</strong> <strong>II</strong> processor. They can be connected <strong>to</strong> <strong>the</strong> processor ei<strong>the</strong>r directly or through <strong>the</strong> Avalon network. The<br />
SRAM and SDRAM memory chips on <strong>the</strong> DE2 board are accessed through <strong>the</strong> appropriate interfaces. Input/output<br />
interfaces are instantiated <strong>to</strong> provide connection <strong>to</strong> <strong>the</strong> I/O devices used in <strong>the</strong> system. A special JTAG UART<br />
interface is used <strong>to</strong> connect <strong>to</strong> <strong>the</strong> circuitry that provides a Universal Serial Bus (USB) link <strong>to</strong> <strong>the</strong> host computer <strong>to</strong><br />
which <strong>the</strong> DE2 board is connected. This circuitry and <strong>the</strong> associated software is called <strong>the</strong> USB-Blaster. Ano<strong>the</strong>r<br />
module, called <strong>the</strong> JTAG Debug module, is provided <strong>to</strong> allow <strong>the</strong> host computer <strong>to</strong> control <strong>the</strong> <strong>Nios</strong> <strong>II</strong> processor.<br />
It makes it possible <strong>to</strong> perform operations such as downloading programs in<strong>to</strong> memory, starting and s<strong>to</strong>pping<br />
execution, setting program breakpoints, and collecting real-time execution trace data.<br />
Since all parts of <strong>the</strong> <strong>Nios</strong> <strong>II</strong> system implemented on <strong>the</strong> FPGA chip are defined by using a hardware description<br />
language, a knowledgeable user could write such code <strong>to</strong> implement any part of <strong>the</strong> system. This would be an<br />
onnerous and time consuming task. Instead, one can use <strong>the</strong> SOPC Builder <strong>to</strong>ol in <strong>the</strong> Quartus <strong>II</strong> software <strong>to</strong><br />
implement a desired system simply by choosing <strong>the</strong> required components and specifying <strong>the</strong> parameters needed<br />
<strong>to</strong> make each component fit <strong>the</strong> overall requirements of <strong>the</strong> system.<br />
2 Overview of <strong>Nios</strong> <strong>II</strong> <strong>Processor</strong> Features<br />
The <strong>Nios</strong> <strong>II</strong> processor has a number of features that can be configured by <strong>the</strong> user <strong>to</strong> meet <strong>the</strong> demands of a desired<br />
system. The processor can be implemented in three different configurations:<br />
• <strong>Nios</strong> <strong>II</strong>/f is a "fast" version designed for superior performance. It has <strong>the</strong> widest scope of configuration<br />
options that can be used <strong>to</strong> optimize <strong>the</strong> processor for performance.<br />
• <strong>Nios</strong> <strong>II</strong>/s is a "standard" version that requires less resources in an FPGA device as a trade-off for reduced<br />
performance.<br />
• <strong>Nios</strong> <strong>II</strong>/e is an "economy" version which requires <strong>the</strong> least amount of FPGA resources, but also has <strong>the</strong> most<br />
limited set of user-configurable features.<br />
The <strong>Nios</strong> <strong>II</strong> processor has a Reduced Instruction Set Computer (RISC) architecture. Its arithmetic and logic<br />
operations are performed on operands in <strong>the</strong> general purpose registers. The data is moved between <strong>the</strong> memory<br />
and <strong>the</strong>se registers by means of Load and S<strong>to</strong>re instructions.<br />
The wordlength of <strong>the</strong> <strong>Nios</strong> <strong>II</strong> processor is 32 bits. All registers are 32 bits long. Byte addresses in a 32-bit<br />
word are assigned in little-endian style, in which <strong>the</strong> lower byte addresses are used for <strong>the</strong> less significant bytes<br />
(<strong>the</strong> rightmost bytes) of <strong>the</strong> word.<br />
The <strong>Nios</strong> <strong>II</strong> architecture uses separate instruction and data buses, which is often referred <strong>to</strong> as <strong>the</strong> Harvard<br />
architecture.<br />
A <strong>Nios</strong> <strong>II</strong> processor may operate in <strong>the</strong> following three modes:<br />
• Supervisor mode – allows <strong>the</strong> processor <strong>to</strong> execute all instructions and perform all available functions. When<br />
<strong>the</strong> processor is reset, it enters this mode.<br />
• User mode – <strong>the</strong> intent of this mode is <strong>to</strong> prevent execution of some instructions that shoud be used for<br />
systems purposes only. Some processor features are not accessible in this mode.<br />
• Debug mode – is used by software debugging <strong>to</strong>ols <strong>to</strong> implement features such as breakpoints and watchpoints.<br />
Application programs can be run in ei<strong>the</strong>r <strong>the</strong> User or Supervisor modes. Presently available versions of <strong>the</strong> <strong>Nios</strong><br />
<strong>II</strong> processor do not support <strong>the</strong> User mode.<br />
3
3 Register Structure<br />
The <strong>Nios</strong> <strong>II</strong> processor has thirty two 32-bit general purpose registers, as shown in Figure 2. Some of <strong>the</strong>se registers<br />
are intended for a specific purpose and have special names that are recognized by <strong>the</strong> Assembler.<br />
• Register r0 is referred <strong>to</strong> as <strong>the</strong> zero register. It always contains <strong>the</strong> constant 0. Thus, reading this register<br />
returns <strong>the</strong> value 0, while writing <strong>to</strong> it has no effect.<br />
• Register r1 is used by <strong>the</strong> Assembler as a temporary register; it should not be referenced in user programs<br />
• Registers r24 and r29 are used for processing of exceptions; <strong>the</strong>y are not available in User mode<br />
• Registers r25 and r30 are used exclusively by <strong>the</strong> JTAG Debug module<br />
• Registers r27 and r28 are used <strong>to</strong> control <strong>the</strong> stack used by <strong>the</strong> <strong>Nios</strong> <strong>II</strong> processor<br />
• Register r31 is used <strong>to</strong> hold <strong>the</strong> return address when a subroutine is called<br />
Register Name Function<br />
r0 zero 0x00000000<br />
r1 at Assembler Temporary<br />
r2<br />
r3<br />
· · ·<br />
· · ·<br />
· · ·<br />
r23<br />
r24 et Exception Temporary (1)<br />
r25 bt Breakpoint Temporary (2)<br />
r26 gp Global Pointer<br />
r27 sp Stack Pointer<br />
r28 fp Frame Pointer<br />
r29 ea Exception Return Address (1)<br />
r30 ba Breakpoint Return Address (2)<br />
r31 ra Return Address<br />
(1) The register is not available in User mode<br />
(2) The register is used exclusively by <strong>the</strong> JTAG Debug module<br />
Figure 2. General Purpose registers.<br />
There are six 32-bit control registers, as indicated in Figure 3. The names given in <strong>the</strong> figure are recognized<br />
by <strong>the</strong> Assembler. These registers are used au<strong>to</strong>matically for control purposes. They can be read and written <strong>to</strong> by<br />
special instructions rdctl and wrctl, which can be executed only in <strong>the</strong> supervisor mode. The registers are used as<br />
follows:<br />
• Register ctl0 reflects <strong>the</strong> operating status of <strong>the</strong> processor. Only two bits of this register are meaningful:<br />
– U is <strong>the</strong> User/Supervisor mode bit; U = 1 for User mode, while U = 0 for Supervisor mode.<br />
– PIE is <strong>the</strong> processor interrupt-enable bit. When PIE = 1, <strong>the</strong> processor may accept external interrupts.<br />
When PIE = 0, <strong>the</strong> processor ignores external interrupts.<br />
• Register ctl1 holds a saved copy of <strong>the</strong> status register during exception processing. The bits EU and EPIE<br />
are <strong>the</strong> saved values of <strong>the</strong> status bits U and PIE.<br />
4
• Register ctl2 holds a saved copy of <strong>the</strong> status register during debug break processing. The bits BU and BPIE<br />
are <strong>the</strong> saved values of <strong>the</strong> status bits U and PIE.<br />
• Register ctl3 is used <strong>to</strong> enable individual external interrupts. Each bit corresponds <strong>to</strong> one of <strong>the</strong> interrupts<br />
irq0 <strong>to</strong> irq31. The value of 1 means that <strong>the</strong> interrupt is enabled, while 0 means that it is disabled.<br />
• Register ctl4 indicates which interrupts are pending. The value of a given bit, ctl4 k , is set <strong>to</strong> 1 if <strong>the</strong> interrupt<br />
irqk is both active and enabled by having <strong>the</strong> interrupt-enable bit, ctl3 k , set <strong>to</strong> 1.<br />
• Register ctl5 holds a value that uniquely identifies <strong>the</strong> processor in a multiprocessor system.<br />
Register Name b 31 · · · b 2 b 1 b 0<br />
ctl0 status Reserved U PIE<br />
ctl1 estatus Reserved EU EPIE<br />
ctl2 bstatus Reserved BU BPIE<br />
ctl3 ienable Interrupt-enable bits<br />
ctl4 ipending Pending-interrupt bits<br />
ctl5 cpuid Unique processor identifier<br />
Figure 3. Control registers.<br />
4 Accessing Memory and I/O Devices<br />
Figure 4 shows how a <strong>Nios</strong> <strong>II</strong> processor can access memory and I/O devices. For best performance, <strong>the</strong> <strong>Nios</strong> <strong>II</strong>/f<br />
processor can include both instruction and data caches. The caches are implemented in <strong>the</strong> FPGA memory blocks.<br />
Their usage is optional and <strong>the</strong>y are specified (including <strong>the</strong>ir size) at <strong>the</strong> system generation time by using <strong>the</strong><br />
SOPC Builder. The <strong>Nios</strong> <strong>II</strong>/s version can have <strong>the</strong> instruction cache but not <strong>the</strong> data cache. The <strong>Nios</strong> <strong>II</strong>/e version<br />
has nei<strong>the</strong>r instruction nor data cache.<br />
Ano<strong>the</strong>r way <strong>to</strong> give <strong>the</strong> processor fast access <strong>to</strong> <strong>the</strong> on-chip memory is by using <strong>the</strong> tightly coupled memory<br />
arrangement, in which case <strong>the</strong> processor accesses <strong>the</strong> memory via a direct path ra<strong>the</strong>r than through <strong>the</strong> Avalon<br />
network. Accesses <strong>to</strong> a tightly coupled memory bypass <strong>the</strong> cache memory. There can be one or more tightly<br />
coupled instruction and data memories. If <strong>the</strong> instruction cache is not included in a system, <strong>the</strong>n <strong>the</strong>re must be at<br />
least one tightly coupled memory provided for <strong>Nios</strong> <strong>II</strong>/f and <strong>Nios</strong> <strong>II</strong>/s processors. On-chip memory can also be<br />
accessed via <strong>the</strong> Avalon network.<br />
Off-chip memory devices, such as SRAM, SDRAM, and Flash memory chips are accessed by instantiating <strong>the</strong><br />
appropriate interfaces. The input/output devices are memory mapped and can be accessed as memory locations.<br />
Data accesses <strong>to</strong> memory locations and I/O interfaces are performed by means of Load and S<strong>to</strong>re instructions,<br />
which cause data <strong>to</strong> be transferred between <strong>the</strong> memory and general purpose registers.<br />
5
Program counter<br />
Cyclone <strong>II</strong><br />
FPGA chip<br />
General purpose<br />
registers<br />
Instruction bus selec<strong>to</strong>r logic<br />
Data bus selec<strong>to</strong>r logic<br />
Instruction<br />
cache<br />
Data<br />
cache<br />
Avalon switch fabric<br />
Tightly coupled<br />
instruction memory<br />
Memory<br />
interface<br />
I/O<br />
interface<br />
Tightly coupled<br />
data memory<br />
Memory<br />
device<br />
I/O<br />
device<br />
Figure 4. Memory and I/O organization.<br />
5 Addressing<br />
The <strong>Nios</strong> <strong>II</strong> processor issues 32-bit addresses. The memory space is byte-addressable. Instructions can read and<br />
write words (32 bits), halfwords (16 bits), or bytes (8 bits) of data. Reading or writing <strong>to</strong> an address that does not<br />
correspond <strong>to</strong> an existing memory or I/O location produces an undefined result.<br />
There are five addressing modes provided:<br />
• Immediate mode – a 16-bit operand is given explicitly in <strong>the</strong> instruction. This value may be sign extended<br />
<strong>to</strong> produce a 32-bit operand in instructions that perform arithmetic operations.<br />
• Register mode – <strong>the</strong> operand is in a processor register<br />
• Displacement mode – <strong>the</strong> effective address of <strong>the</strong> operand is <strong>the</strong> sum of <strong>the</strong> contents of a register and a<br />
signed 16-bit displacement value given in <strong>the</strong> instruction<br />
• Register indirect mode – <strong>the</strong> effective address of <strong>the</strong> operand is <strong>the</strong> contents of a register specified in <strong>the</strong><br />
instruction. This is equivalent <strong>to</strong> <strong>the</strong> displacement mode where <strong>the</strong> displacement value is equal <strong>to</strong> 0.<br />
6
• Absolute mode – a 16-bit absolute address of an operand can be specified by using <strong>the</strong> displacement mode<br />
with register r0 which always contains <strong>the</strong> value 0.<br />
6 Instructions<br />
All <strong>Nios</strong> <strong>II</strong> instructions are 32-bits long. In addition <strong>to</strong> machine instructions that are executed directly by <strong>the</strong> processor,<br />
<strong>the</strong> <strong>Nios</strong> <strong>II</strong> instruction set includes a number of pseudoinstructions that can be used in assembly language<br />
programs. The Assembler replaces each pseudoinstruction by one or more machine instructions.<br />
Figure 5 depicts <strong>the</strong> three possible instruction formats: I-type, R-type and J-type. In all cases <strong>the</strong> six bits b 5−0<br />
denote <strong>the</strong> OP code. The remaining bits are used <strong>to</strong> specify registers, immediate operands, or extended OP codes.<br />
• I-type – Five-bit fields A and B are used <strong>to</strong> specify general purpose registers. A 16-bit field IMMED16<br />
provides immediate data which can be sign extended <strong>to</strong> provide a 32-bit operand.<br />
• R-type – Five-bit fields A, B and C are used <strong>to</strong> specify general purpose registers. An 11-bit field OPX is<br />
used <strong>to</strong> extend <strong>the</strong> OP code.<br />
• J-type – A 26-bit field IMMED26 contains an unsigned immediate value. This format is used only in <strong>the</strong><br />
Call instruction.<br />
31<br />
27<br />
26<br />
22<br />
21<br />
6<br />
5<br />
0<br />
A<br />
B<br />
IMMED16<br />
OP<br />
(a) I-type<br />
31<br />
27<br />
26<br />
22<br />
21<br />
17<br />
16<br />
6<br />
5<br />
0<br />
A<br />
B<br />
C<br />
OPX<br />
OP<br />
(b) R-type<br />
31<br />
6<br />
5<br />
0<br />
IMMED26<br />
OP<br />
(c) J-type<br />
Figure 5. Formats of <strong>Nios</strong> <strong>II</strong> instructions.<br />
The following subsections discuss briefly <strong>the</strong> main features of <strong>the</strong> <strong>Nios</strong> <strong>II</strong> instruction set. For a complete<br />
description of <strong>the</strong> instruction set, including <strong>the</strong> details of how each instruction is encoded, <strong>the</strong> reader should<br />
consult <strong>the</strong> <strong>Nios</strong> <strong>II</strong> <strong>Processor</strong> Reference Handbook.<br />
6.1 Load and S<strong>to</strong>re Instructions<br />
Load and S<strong>to</strong>re instructions are used <strong>to</strong> move data between memory (and I/0 interfaces) and <strong>the</strong> general purpose<br />
registers. They are of I-type. For example, <strong>the</strong> Load Word instruction<br />
ldw rB, byte_offset(rA)<br />
7
determines <strong>the</strong> effective address of a memory location as <strong>the</strong> sum of a byte_offset value and <strong>the</strong> contents of register<br />
A. The 16-bit byte_offset value is sign extended <strong>to</strong> 32 bits. The 32-bit memory operand is loaded in<strong>to</strong> register B.<br />
For instance, assume that <strong>the</strong> contents of register r4 are 1260 10 and <strong>the</strong> byte_offset value is 80 10 . Then, <strong>the</strong><br />
instruction<br />
ldw r3, 80(r4)<br />
loads <strong>the</strong> 32-bit operand at memory address 1340 10 in<strong>to</strong> register r3.<br />
The S<strong>to</strong>re Word instruction has <strong>the</strong> format<br />
stw rB, byte_offset(rA)<br />
It s<strong>to</strong>res <strong>the</strong> contents of register B in<strong>to</strong> <strong>the</strong> memory location at <strong>the</strong> address computed as <strong>the</strong> sum of <strong>the</strong> byte_offset<br />
value and <strong>the</strong> contents of register A.<br />
There are Load and S<strong>to</strong>re instructions that use operands that are only 8 or 16 bits long. They are referred <strong>to</strong> as<br />
Load/S<strong>to</strong>re Byte and Load/S<strong>to</strong>re Halfword instructions, respectively. Such Load instructions are:<br />
• ldb (Load Byte)<br />
• ldbu (Load Byte Unsigned)<br />
• ldh (Load Halfword)<br />
• ldhu (Load Halfword Unsigned)<br />
When a shorter operand is loaded in<strong>to</strong> a 32-bit register, its value has <strong>to</strong> be adjusted <strong>to</strong> fit in<strong>to</strong> <strong>the</strong> register. This<br />
is done by sign extending <strong>the</strong> 8- or 16-bit value <strong>to</strong> 32 bits in <strong>the</strong> ldb and ldh instructions. In <strong>the</strong> ldbu and ldhu<br />
instructions <strong>the</strong> operand is zero extended.<br />
The corresponding S<strong>to</strong>re instructions are:<br />
• stb (S<strong>to</strong>re Byte)<br />
• sth (S<strong>to</strong>re Halfword)<br />
The stb instruction s<strong>to</strong>res <strong>the</strong> low byte of register B in<strong>to</strong> <strong>the</strong> memory byte specified by <strong>the</strong> effective address. The<br />
sth instruction s<strong>to</strong>res <strong>the</strong> low halfword of register B. In this case <strong>the</strong> effective address must be halfword aligned.<br />
Each Load and S<strong>to</strong>re instruction has a version intended for accessing locations in I/O device interfaces. These<br />
instructions are:<br />
• ldwio (Load Word I/O)<br />
• ldbio<br />
• ldbuio<br />
• ldhio<br />
• ldhuio<br />
(Load Byte I/O)<br />
(Load Byte Unsigned I/O)<br />
(Load Halfword I/O)<br />
(Load Halfword Unsigned I/O)<br />
• stwio (S<strong>to</strong>re Word I/O)<br />
• stbio<br />
• sthio<br />
(S<strong>to</strong>re Byte I/O)<br />
(S<strong>to</strong>re Halfword I/O)<br />
The difference is that <strong>the</strong>se instructions bypass <strong>the</strong> cache, if one exists.<br />
8
6.2 Arithmetic Instructions<br />
The arithmetic instructions operate on <strong>the</strong> data that is ei<strong>the</strong>r in <strong>the</strong> general purpose registers or given as an immediate<br />
value in <strong>the</strong> instruction. These instructions are of R-type or I-type, respectively. They include:<br />
• add (Add Registers)<br />
• addi (Add Immediate)<br />
• sub (Subtract Registers)<br />
• subi (Subtract Immediate)<br />
• mul (Multiply)<br />
• muli (Multiply Immediate)<br />
• div (Divide)<br />
• divu (Divide Unsigned)<br />
The Add instruction<br />
add rC, rA, rB<br />
adds <strong>the</strong> contents of registers A and B, and places <strong>the</strong> sum in<strong>to</strong> register C.<br />
The Add Immediate instruction<br />
addi rB, rA, IMMED16<br />
adds <strong>the</strong> contents of register A and <strong>the</strong> sign-extended 16-bit operand given in <strong>the</strong> instruction, and places <strong>the</strong> result<br />
in<strong>to</strong> register B. The addition operation in <strong>the</strong>se instructions is <strong>the</strong> same for both signed and unsigned operands;<br />
<strong>the</strong>re are no condition flags that are set by <strong>the</strong> operation. This means that when unsigned operands are added, <strong>the</strong><br />
carry from <strong>the</strong> most significant bit position has <strong>to</strong> be detected by executing a separate instruction. Similarly, when<br />
signed operands are added, <strong>the</strong> arithmetic overflow has <strong>to</strong> be detected separately. The detection of <strong>the</strong>se conditions<br />
is dicussed in section 6.11.<br />
The Subtract instruction<br />
sub rC, rA, rB<br />
subtracts <strong>the</strong> contents of register B from register A, and places <strong>the</strong> result in<strong>to</strong> register C. Again, <strong>the</strong> carry and<br />
overflow detection has <strong>to</strong> be done by using additional instructions, as explained in section 6.11.<br />
The immediate version, subi, is a pseudoinstruction implemented as<br />
addi rB, rA, -IMMED16<br />
The Multiply instruction<br />
mul rC, rA, rB<br />
multiplies <strong>the</strong> contents of registers A and B, and places <strong>the</strong> low-order 32 bits of <strong>the</strong> product in<strong>to</strong> register C. The<br />
operands are treated as unsigned numbers. The carry and overflow detection has <strong>to</strong> be done by using additional<br />
instructions. In <strong>the</strong> immediate version<br />
muli rB, rA, IMMED16<br />
<strong>the</strong> 16-bit immediate operand is sign extended <strong>to</strong> 32 bits.<br />
9
The Divide instruction<br />
div rC, rA, rB<br />
divides <strong>the</strong> contents of register A by <strong>the</strong> contents of register B and places <strong>the</strong> integer portion of <strong>the</strong> quotient in<strong>to</strong><br />
register C. The operands are treated as signed integers. The divu instruction is performed in <strong>the</strong> same way except<br />
that <strong>the</strong> operands are treated as unsigned integers.<br />
6.3 Logic Instructions<br />
The logic instructions provide <strong>the</strong> AND, OR, XOR, and NOR operations. They operate on data that is ei<strong>the</strong>r in<br />
<strong>the</strong> general purpose registers or given as an immediate value in <strong>the</strong> instruction. These instructions are of R-type or<br />
I-type, respectively.<br />
The AND instruction<br />
and rC, rA, rB<br />
performs a bitwise logical AND of <strong>the</strong> contents of registers A and B, and s<strong>to</strong>res <strong>the</strong> result in register C. Similarly,<br />
<strong>the</strong> instructions or, xor and nor perform <strong>the</strong> OR, XOR and NOR operations, respectively.<br />
The AND Immediate instruction<br />
andi rB, rA, IMMED16<br />
performs a bitwise logical AND of <strong>the</strong> contents of register A and <strong>the</strong> IMMED16 operand which is zero-extended<br />
<strong>to</strong> 32 bits, and s<strong>to</strong>res <strong>the</strong> result in register B. Similarly, <strong>the</strong> instructions ori, xori and nori perform <strong>the</strong> OR, XOR<br />
and NOR operations, respectively.<br />
It is also possible <strong>to</strong> use <strong>the</strong> 16-bit immediate operand as <strong>the</strong> 16 high-order bits in <strong>the</strong> logic operations, in which<br />
case <strong>the</strong> low-order 16 bits of <strong>the</strong> operand are zeros. This is accomplished with <strong>the</strong> instructions:<br />
• andhi<br />
(AND High Immediate)<br />
• orhi (OR High Immediate)<br />
• xorhi<br />
(XOR High Immediate)<br />
6.4 Move Instructions<br />
The Move instructions copy <strong>the</strong> contents of one register in<strong>to</strong> ano<strong>the</strong>r, or <strong>the</strong>y place an immediate value in<strong>to</strong> a<br />
register. They are pseudoinstructions implemented by using o<strong>the</strong>r instructions. The instruction<br />
mov rC, rA<br />
copies <strong>the</strong> contents of register A in<strong>to</strong> register C. It is implemented as<br />
add rC, rA, r0<br />
The Move Immediate instruction<br />
movi rB, IMMED16<br />
sign extends <strong>the</strong> IMMED16 value <strong>to</strong> 32 bits and loads it in<strong>to</strong> register B. It is implemented as<br />
addi rB, r0, IMMED16<br />
The Move Unsigned Immediate instruction<br />
10
movui rB, IMMED16<br />
zero extends <strong>the</strong> IMMED16 value <strong>to</strong> 32 bits and loads it in<strong>to</strong> register B. It is implemented as<br />
ori rB, r0, IMMED16<br />
The Move Immediate Address instruction<br />
movia rB, LABEL<br />
loads a 32-bit value that corresponds <strong>to</strong> <strong>the</strong> address LABEL in<strong>to</strong> register B. It is implemented as:<br />
orhi<br />
ori<br />
rB, r0, %hi(LABEL)<br />
rB, rB, %lo(LABEL)<br />
The %hi(LABEL) and %lo(LABEL) are <strong>the</strong> Assembler macros which extract <strong>the</strong> high-order 16 bits and <strong>the</strong> loworder<br />
16 bits, respectively, of a 32-bit value LABEL. The orhi instruction sets <strong>the</strong> high-order bits of register B,<br />
followed by <strong>the</strong> ori instruction which sets <strong>the</strong> low-order bits of B. Note that two instructions are used because <strong>the</strong><br />
I-type format provides for only a 16-bit immediate operand.<br />
6.5 Comparison Instructions<br />
The Comparison instructions compare <strong>the</strong> contents of two registers or <strong>the</strong> contents of a register and an immediate<br />
value, and write ei<strong>the</strong>r 1 (if true) or 0 (if false) in<strong>to</strong> <strong>the</strong> result register. They are of R-type or I-type, respectively.<br />
These instructions correspond <strong>to</strong> <strong>the</strong> equality and relational opera<strong>to</strong>rs in <strong>the</strong> C programming language.<br />
The Compare Less Than Signed instruction<br />
cmplt rC, rA, rB<br />
performs <strong>the</strong> comparison of signed numbers in registers A and B, rA < rB, and writes a 1 in<strong>to</strong> register C if <strong>the</strong><br />
result is true; o<strong>the</strong>rwise, it writes a 0.<br />
The Compare Less Than Unsigned instruction<br />
cmpltu rC, rA, rB<br />
performs <strong>the</strong> same function as <strong>the</strong> cmplt instruction, but it treats <strong>the</strong> operands as unsigned numbers.<br />
O<strong>the</strong>r instructions of this type are:<br />
• cmpeq rC, rA, rB<br />
• cmpne rC, rA, rB<br />
• cmpge rC, rA, rB<br />
(Comparison rA == rB)<br />
(Comparison rA != rB)<br />
(Signed comparison rA >= rB)<br />
• cmpgeu rC, rA, rB (Unsigned comparison rA >= rB)<br />
• cmpgt rC, rA, rB (Signed comparison rA > rB)<br />
This is a pseudoinstruction implemented as <strong>the</strong> cmplt instruction by swapping its rA and rB operands.<br />
• cmpgtu rC, rA, rB (Unsigned comparison rA > rB)<br />
This is a pseudoinstruction implemented as <strong>the</strong> cmpltu instruction by swapping its rA and rB operands.<br />
• cmple rC, rA, rB (Signed comparison rA
The immediate versions of <strong>the</strong> Comparison instructions involve an immediate operand. For example, <strong>the</strong><br />
Compare Less Than Signed Immediate instruction<br />
cmplti rB, rA, IMMED16<br />
compares <strong>the</strong> signed number in register A with <strong>the</strong> sign-extended immediate operand. It writes a 1 in<strong>to</strong> register B<br />
if rA < IMMED16; o<strong>the</strong>rwise, it writes a 0.<br />
The Compare Less Than Unsigned Immediate instruction<br />
cmpltui rB, rA, IMMED16<br />
compares <strong>the</strong> unsigned number in register A with <strong>the</strong> zero-extended immediate operand. It writes a 1 in<strong>to</strong> register<br />
B if rA < IMMED16; o<strong>the</strong>rwise, it writes a 0.<br />
O<strong>the</strong>r instructions of this type are:<br />
• cmpeqi rB, rA, IMMED16<br />
• cmpnei rB, rA, IMMED16<br />
• cmpgei rB, rA, IMMED16<br />
• cmpgeui rB, rA, IMMED16<br />
(Comparison rA == IMMED16)<br />
(Comparison rA != IMMED16)<br />
(Signed comparison rA >= IMMED16)<br />
(Unsigned comparison rA >= IMMED16)<br />
• cmpgti rB, rA, IMMED16 (Signed comparison rA > IMMED16)<br />
This is a pseudoinstruction which is implemented by using <strong>the</strong> cmpgei instruction with an immediate value<br />
IMMED16 + 1.<br />
• cmpgtui rB, rA, IMMED16 (Unsigned comparison rA > IMMED16)<br />
This is a pseudoinstruction which is implemented by using <strong>the</strong> cmpgeui instruction with an immediate<br />
value IMMED16 + 1.<br />
• cmplei rB, rA, IMMED16 (Signed comparison rA and
The srl instruction shifts <strong>the</strong> contents of register A <strong>to</strong> <strong>the</strong> right by <strong>the</strong> number of bit positions specified by <strong>the</strong> five<br />
least-significant bits (number in <strong>the</strong> range 0 <strong>to</strong> 31) in register B, and s<strong>to</strong>res <strong>the</strong> result in register C. The vacated<br />
bits on <strong>the</strong> left side of <strong>the</strong> shifted operand are filled with 0s.<br />
The srli instruction shifts <strong>the</strong> contents of register A <strong>to</strong> <strong>the</strong> right by <strong>the</strong> number of bit positions specified by <strong>the</strong><br />
five-bit unsigned value, IMMED5, given in <strong>the</strong> instruction.<br />
The sra and srai instructions perform <strong>the</strong> same actions as <strong>the</strong> srl and srli instructions, except that <strong>the</strong> sign bit,<br />
rA 31 , is replicated in<strong>to</strong> <strong>the</strong> vacated bits on <strong>the</strong> left side of <strong>the</strong> shifted operand.<br />
The sll and slli instructions are similar <strong>to</strong> <strong>the</strong> srl and srli instructions, but <strong>the</strong>y shift <strong>the</strong> operand in register A <strong>to</strong> <strong>the</strong><br />
left and fill <strong>the</strong> vacated bits on <strong>the</strong> right side with 0s.<br />
6.7 Rotate Instructions<br />
There are three Rotate instructions, which use <strong>the</strong> R-type format:<br />
• ror rC, rA, rB (Rotate Right)<br />
• rol rC, rA, rB (Rotate Left)<br />
• roli rC, rA, IMMED5<br />
(Rotate Left Immediate)<br />
The ror instruction rotates <strong>the</strong> bits of register A in <strong>the</strong> left-<strong>to</strong>-right direction by <strong>the</strong> number of bit positions specified<br />
by <strong>the</strong> five least-significant bits (number in <strong>the</strong> range 0 <strong>to</strong> 31) in register B, and s<strong>to</strong>res <strong>the</strong> result in register C.<br />
The rol instruction is similar <strong>to</strong> <strong>the</strong> ror instruction, but it rotates <strong>the</strong> operand in <strong>the</strong> right-<strong>to</strong>-left direction.<br />
The roli instruction rotates <strong>the</strong> bits of register A in <strong>the</strong> right-<strong>to</strong>-left direction by <strong>the</strong> number of bit positions specified<br />
by <strong>the</strong> five-bit unsigned value, IMMED5, given in <strong>the</strong> instruction, and s<strong>to</strong>res <strong>the</strong> result in register C.<br />
6.8 Branch and Jump Instructions<br />
The flow of execution of a program can be changed by executing Branch or Jump instructions. It may be changed<br />
ei<strong>the</strong>r unconditionally or conditionally.<br />
The Jump instruction<br />
jmp rA<br />
transfers execution unconditionally <strong>to</strong> <strong>the</strong> address contained in register A.<br />
The Unconditional Branch instruction<br />
br LABEL<br />
transfers execution unconditionally <strong>to</strong> <strong>the</strong> instruction at address LABEL. This is an instruction of I-type, in which<br />
a 16-bit immediate value (interpreted as a signed number) specifies <strong>the</strong> offset <strong>to</strong> <strong>the</strong> branch target instruction. The<br />
offset is <strong>the</strong> distance in bytes from <strong>the</strong> instruction that immediately follows br <strong>to</strong> <strong>the</strong> address LABEL.<br />
Conditional transfer of execution is achieved with <strong>the</strong> Conditional Branch instructions, which compare <strong>the</strong><br />
contents of two registers and cause a branch if <strong>the</strong> result is true. These instructions are of I-type and <strong>the</strong> offset is<br />
determined as explained above for <strong>the</strong> br instruction.<br />
13
The Branch if Less Than Signed instruction<br />
blt rA, rB, LABEL<br />
performs <strong>the</strong> comparison rA < rB, treating <strong>the</strong> contents of <strong>the</strong> registers as signed numbers.<br />
The Branch if Less Than Unsigned instruction<br />
bltu rA, rB, LABEL<br />
performs <strong>the</strong> comparison rA < rB, treating <strong>the</strong> contents of <strong>the</strong> registers as unsigned numbers.<br />
The o<strong>the</strong>r Conditional Branch instructions are:<br />
• beq rA, rB, LABEL<br />
• bne rA, rB, LABEL<br />
• bge rA, rB, LABEL<br />
• bgeu rA, rB, LABEL<br />
(Comparison rA == rB)<br />
(Comparison rA != rB)<br />
(Signed comparison rA >= rB)<br />
(Unsigned comparison rA >= rB)<br />
• bgt rA, rB, LABEL (Signed comparison rA > rB)<br />
This is a pseudoinstruction implemented as <strong>the</strong> blt instruction by swapping <strong>the</strong> register operands.<br />
• bgtu rA, rB, LABEL (Unsigned comparison rA > rB)<br />
This is a pseudoinstruction implemented as <strong>the</strong> bltu instruction by swapping <strong>the</strong> register operands.<br />
• ble rA, rB, LABEL (Signed comparison rA
6.10 Control Instructions<br />
The <strong>Nios</strong> <strong>II</strong> control registers can be read and written by special instructions. The Read Control Register instruction<br />
rdctl rC, ctlN<br />
copies <strong>the</strong> contents of control register ctlN in<strong>to</strong> register C.<br />
The Write Control Register instruction<br />
wrctl ctlN, rA<br />
copies <strong>the</strong> contents of register A in<strong>to</strong> <strong>the</strong> control register ctlN.<br />
There are two instructions provided for dealing with exceptions: trap and eret. They are similar <strong>to</strong> <strong>the</strong> call<br />
and ret instructions, but <strong>the</strong>y are used for exceptions. Their use is discussed in section 8.2.<br />
The instructions break and bret generate breaks and return from breaks. They are used exclusively by <strong>the</strong><br />
software debugging <strong>to</strong>ols.<br />
The <strong>Nios</strong> <strong>II</strong> cache memories are managed with <strong>the</strong> instructions: flushd (Flush Data Cache Line), flushi (Flush<br />
Instruction Cache Line), initd (Initialize Data Cache Line), and initi (Initialize Instruction Cache Line). These<br />
instructions are discussed in section 9.1.<br />
6.11 Carry and Overflow Detection<br />
As pointed out in section 6.2, <strong>the</strong> Add and Subtract instructions perform <strong>the</strong> corresponding operations in <strong>the</strong> same<br />
way for both signed and unsigned operands. The possible carry and arithmetic overflow conditions are not detected,<br />
because <strong>Nios</strong> <strong>II</strong> does not contain condition flags that might be set as a result. These conditions can be<br />
detected by using additional instructions.<br />
Consider <strong>the</strong> Add instruction<br />
add rC, rA, rB<br />
Having executed this instruction, a possible occurrence of a carry out of <strong>the</strong> most-significant bit (C 31 ) can be<br />
detected by checking whe<strong>the</strong>r <strong>the</strong> unsigned sum (in register C) is less than one of <strong>the</strong> unsigned operands. For<br />
example, if this instruction is followed by <strong>the</strong> instruction<br />
<strong>the</strong>n <strong>the</strong> carry bit will be written in<strong>to</strong> register D.<br />
cmpltu rD, rC, rA<br />
Similarly, if a branch is required when a carry occurs, this can be accomplished as follows:<br />
add<br />
bltu<br />
rC, rA, rB<br />
rC, rA, LABEL<br />
A test for arithmetic overflow can be done by checking <strong>the</strong> signs of <strong>the</strong> summands and <strong>the</strong> resulting sum. An<br />
overflow occurs if two positive numbers produce a negative sum, or if two negative numbers produce a positive<br />
sum. Using this approach, <strong>the</strong> overflow condition can control a conditional branch as follows:<br />
add rC, rA, rB /* The required Add operation */<br />
xor rD, rC, rA /* Compare signs of sum and rA */<br />
xor rE, rC, rB /* Compare signs of sum and rB */<br />
and rD, rD, rE /* Set D 31 = 1 if ((A 31 == B 31 ) ! = C 31 ) */<br />
blt rD, r0, LABEL /* Branch if overflow occurred */<br />
15
A similar approach can be used <strong>to</strong> detect <strong>the</strong> carry and overflow conditions in Subtract operations. A carry out<br />
of <strong>the</strong> most-significant bit of <strong>the</strong> resulting difference can be detected by checking whe<strong>the</strong>r <strong>the</strong> first operand is less<br />
than <strong>the</strong> second operand. Thus, <strong>the</strong> carry can be used <strong>to</strong> control a conditional branch as follows:<br />
sub<br />
bltu<br />
rC, rA, rB<br />
rA, rB, LABEL<br />
The arithmetic overflow in a Subtract operation is detected by comparing <strong>the</strong> sign of <strong>the</strong> generated difference with<br />
<strong>the</strong> signs of <strong>the</strong> operands. Overflow occurs if <strong>the</strong> operands in registers A and B have different signs, and <strong>the</strong> sign<br />
of <strong>the</strong> difference in register C is different than <strong>the</strong> sign of A. Thus, a conditional branch based on <strong>the</strong> arithmetic<br />
overflow can be achieved as follows:<br />
sub rC, rA, rB /* The required Subtract operation */<br />
xor rD, rA, rB /* Compare signs of rA and rB */<br />
xor rE, rA, rC /* Compare signs of rA and rC */<br />
and rD, rD, rE /* Set D 31 = 1 if ((A 31 ! = B 31 ) && (A 31 ! = C 31 )) */<br />
blt rD, r0, LABEL /* Branch if overflow occurred */<br />
7 Assembler Directives<br />
The <strong>Nios</strong> <strong>II</strong> Assembler conforms <strong>to</strong> <strong>the</strong> widely used GNU Assembler, which is software available in <strong>the</strong> public<br />
domain. Thus, <strong>the</strong> GNU Assembler directives can be used in <strong>Nios</strong> <strong>II</strong> programs. Assembler directives begin with a<br />
period. We describe some of <strong>the</strong> more frequently used assembler directives below.<br />
.ascii "string"...<br />
A string of ASC<strong>II</strong> characters is loaded in<strong>to</strong> consecutive byte addresses in <strong>the</strong> memory. Multiple strings, separated<br />
by commas, can be specified.<br />
.asciz "string"...<br />
This directive is <strong>the</strong> same as .ascii, except that each string is followed (terminated) by a zero byte.<br />
.byte expressions<br />
Expressions separated by commas are specified. Each expression is assembled in<strong>to</strong> <strong>the</strong> next byte. Examples of<br />
expressions are: 8, 5 + LABEL, and K − 6.<br />
.data<br />
Identifies <strong>the</strong> data that should be placed in <strong>the</strong> data section of <strong>the</strong> memory. The desired memory location for <strong>the</strong><br />
data section can be specified in <strong>the</strong> <strong>Altera</strong> Moni<strong>to</strong>r Program’s system configuration window.<br />
.end<br />
Marks <strong>the</strong> end of <strong>the</strong> source code file; everything after this directive is ignored by <strong>the</strong> assembler.<br />
.equ symbol, expression<br />
Sets <strong>the</strong> value of symbol <strong>to</strong> expression.<br />
16
.global symbol<br />
Makes symbol visible outside <strong>the</strong> assembled object file.<br />
.hword expressions<br />
Expressions separated by commas are specified. Each expression is assembled in<strong>to</strong> a 16-bit number.<br />
.include "filename"<br />
Provides a mechanism for including supporting files in a source program.<br />
.org new-lc<br />
Advances <strong>the</strong> location counter by new-lc, where new-lc is used as an offset from <strong>the</strong> starting location specified<br />
in <strong>the</strong> <strong>Altera</strong> Moni<strong>to</strong>r Program’s system configuration window. The .org directive may only increase <strong>the</strong> location<br />
counter, or leave it unchanged; it cannot move <strong>the</strong> location counter backwards.<br />
.skip size<br />
Emits <strong>the</strong> number of bytes specified in size; <strong>the</strong> value of each byte is zero.<br />
.text<br />
Identifies <strong>the</strong> code that should be placed in <strong>the</strong> text section of <strong>the</strong> memory. The desired memory location for <strong>the</strong><br />
text section can be specified in <strong>the</strong> <strong>Altera</strong> Moni<strong>to</strong>r Program’s system configuration window.<br />
.word expressions<br />
Expressions separated by commas are specified. Each expression is assembled in<strong>to</strong> a 32-bit number.<br />
8 Example Program<br />
As an illustration of <strong>Nios</strong> <strong>II</strong> instructions and assembler directives, Figure 6 gives an assembly language program<br />
that computes a dot product of two vec<strong>to</strong>rs, A and B. The vec<strong>to</strong>rs have n elements. The required computation is<br />
Dot product = ∑ n−1<br />
i=0<br />
A(i) × B(i)<br />
The vec<strong>to</strong>rs are s<strong>to</strong>red in memory locations at addresses AVECTOR and BVECTOR, respectively. The number of elements,<br />
n, is s<strong>to</strong>red in memory location N. The computed result is written in<strong>to</strong> memory location DOT_PRODUCT.<br />
Each vec<strong>to</strong>r element is assumed <strong>to</strong> be a signed 32-bit number.<br />
17
.include "nios_macros.s"<br />
.global _start<br />
_start:<br />
movia r2, AVECTOR /* Register r2 is a pointer <strong>to</strong> vec<strong>to</strong>r A */<br />
movia r3, BVECTOR /* Register r3 is a pointer <strong>to</strong> vec<strong>to</strong>r B */<br />
movia r4, N<br />
ldw r4, 0(r4) /* Register r4 is used as <strong>the</strong> counter for loop iterations */<br />
add r5, r0, r0 /* Register r5 is used <strong>to</strong> accumulate <strong>the</strong> product */<br />
LOOP: ldw r6, 0(r2) /* Load <strong>the</strong> next element of vec<strong>to</strong>r A */<br />
ldw r7, 0(r3) /* Load <strong>the</strong> next element of vec<strong>to</strong>r B */<br />
mul r8, r6, r7 /* Compute <strong>the</strong> product of next pair of elements */<br />
add r5, r5, r8 /* Add <strong>to</strong> <strong>the</strong> sum */<br />
addi r2, r2, 4 /* Increment <strong>the</strong> pointer <strong>to</strong> vec<strong>to</strong>r A */<br />
addi r3, r3, 4 /* Increment <strong>the</strong> pointer <strong>to</strong> vec<strong>to</strong>r B */<br />
subi r4, r4, 1 /* Decrement <strong>the</strong> counter */<br />
bgt r4, r0, LOOP /* Loop again if not finished */<br />
stw r5, DOT_PRODUCT(r0) /* S<strong>to</strong>re <strong>the</strong> result in memory */<br />
STOP: br STOP<br />
N:<br />
.word 6 /* Specify <strong>the</strong> number of elements */<br />
AVECTOR:<br />
.word 5, 3, −6, 19, 8, 12 /* Specify <strong>the</strong> elements of vec<strong>to</strong>r A */<br />
BVECTOR:<br />
.word 2, 14, −3, 2, −5, 36 /* Specify <strong>the</strong> elements of vec<strong>to</strong>r B */<br />
DOT_PRODUCT:<br />
.skip 4<br />
Figure 6. A program that computes <strong>the</strong> dot product of two vec<strong>to</strong>rs.<br />
Note that <strong>the</strong> program ends by continuously looping on <strong>the</strong> last Branch instruction. If instead we wanted <strong>to</strong> pass<br />
control <strong>to</strong> debugging software, we could replace this br instruction with <strong>the</strong> break instruction.<br />
The program includes <strong>the</strong> assembler directive<br />
.include "nios_macros.s"<br />
which informs <strong>the</strong> Assembler <strong>to</strong> use some macro commands that have been created for <strong>the</strong> <strong>Nios</strong> <strong>II</strong> processor. In<br />
this program, <strong>the</strong> macro used converts <strong>the</strong> movia pseudoinstruction in<strong>to</strong> two OR instructions as explained in section<br />
6.4.<br />
The directive<br />
.global _start<br />
indicates <strong>to</strong> <strong>the</strong> Assembler that <strong>the</strong> label _start is accessible outside <strong>the</strong> assembled object file. This label is <strong>the</strong><br />
default label we use <strong>to</strong> indicate <strong>to</strong> <strong>the</strong> Linker program <strong>the</strong> beginning of <strong>the</strong> application program.<br />
The program includes some sample data. It illustrates how <strong>the</strong> .word directive can be used <strong>to</strong> load data items<br />
in<strong>to</strong> memory. The memory locations involved are those that follow <strong>the</strong> location occupied by <strong>the</strong> br instruction.<br />
Since we have not explicitly specified <strong>the</strong> starting address of <strong>the</strong> program itself, <strong>the</strong> assembled code will be loaded<br />
in memory starting at address 0.<br />
To execute <strong>the</strong> program in Figure 6 on <strong>Altera</strong>’s DE2 board, it is necessary <strong>to</strong> implement a <strong>Nios</strong> <strong>II</strong> processor<br />
and its memory (which can be just <strong>the</strong> on-chip memory of <strong>the</strong> Cyclone <strong>II</strong> FPGA). Since <strong>the</strong> program includes <strong>the</strong><br />
18
Multiply instruction, it cannot be executed on <strong>the</strong> economy version of <strong>the</strong> processor, because <strong>Nios</strong> <strong>II</strong>/e does not<br />
support <strong>the</strong> mul instruction. Ei<strong>the</strong>r <strong>Nios</strong> <strong>II</strong>/s or <strong>Nios</strong> <strong>II</strong>/f processors can be used.<br />
The tu<strong>to</strong>rial <strong>Introduction</strong> <strong>to</strong> <strong>the</strong> <strong>Altera</strong> SOPC Builder explains how a <strong>Nios</strong> <strong>II</strong> system can be implemented.<br />
The tu<strong>to</strong>rial <strong>Altera</strong> Moni<strong>to</strong>r Program explains how an application program can be assembled, downloaded and<br />
executed on <strong>the</strong> DE2 board.<br />
9 Exception Processing<br />
An exception in <strong>the</strong> normal flow of program execution can be caused by:<br />
• <strong>Soft</strong>ware trap<br />
• Hardware interrupt<br />
• Unimplemented instruction<br />
In response <strong>to</strong> an exception <strong>the</strong> <strong>Nios</strong> <strong>II</strong> processor performs <strong>the</strong> following actions:<br />
1. Saves <strong>the</strong> existing processor status information by copying <strong>the</strong> contents of <strong>the</strong> status register (ctl0) in<strong>to</strong> <strong>the</strong><br />
estatus register (ctl1)<br />
2. Clears <strong>the</strong> U bit in <strong>the</strong> status register, <strong>to</strong> ensure that <strong>the</strong> processor is in <strong>the</strong> Supervisor mode<br />
3. Clears <strong>the</strong> PIE bit in <strong>the</strong> status register, thus disabling <strong>the</strong> additional external processor interrupts<br />
4. Writes <strong>the</strong> address of <strong>the</strong> instruction after <strong>the</strong> exception in<strong>to</strong> <strong>the</strong> ea register (r29)<br />
5. Transfers execution <strong>to</strong> <strong>the</strong> address of <strong>the</strong> exception handler which determines <strong>the</strong> cause of <strong>the</strong> exception and<br />
dispatches an appropriate exception routine <strong>to</strong> respond <strong>to</strong> <strong>the</strong> exception<br />
The address of <strong>the</strong> exception handler is specified at system generation time using <strong>the</strong> SOPC Builder, and it cannot<br />
be changed by software at run time. This address can be provided by <strong>the</strong> designer; o<strong>the</strong>rwise, <strong>the</strong> default address<br />
is 20 16 from <strong>the</strong> starting address of <strong>the</strong> main memory. For example, if <strong>the</strong> memory starts at address 0, <strong>the</strong>n <strong>the</strong><br />
default address of <strong>the</strong> exception handler is 0x00000020.<br />
9.1 <strong>Soft</strong>ware Trap<br />
A software exception occurs when a trap instruction is encountered in a program. This instruction saves <strong>the</strong><br />
address of <strong>the</strong> next instruction in <strong>the</strong> ea register (r29). Then, it disables interrupts and transfers execution <strong>to</strong> <strong>the</strong><br />
exception handler.<br />
In <strong>the</strong> exception-service routine <strong>the</strong> last instruction is eret (Exception Return), which returns execution control<br />
<strong>to</strong> <strong>the</strong> instruction that follows <strong>the</strong> trap instruction that caused <strong>the</strong> exception. The return address is given by <strong>the</strong><br />
contents of register ea. The eret instruction res<strong>to</strong>res <strong>the</strong> previous status of <strong>the</strong> processor by copying <strong>the</strong> contents<br />
of <strong>the</strong> estatus register in<strong>to</strong> <strong>the</strong> status register.<br />
A common use of <strong>the</strong> software trap is <strong>to</strong> transfer control <strong>to</strong> a different program, such as an operating system.<br />
9.2 Hardware Interrupts<br />
Hardware interrupts can be raised by external sources, such as I/O devices, by asserting one of <strong>the</strong> processor’s 32<br />
interrupt-request inputs, irq0 through irq31. An interrupt is generated only if <strong>the</strong> following three conditions are<br />
true:<br />
• The PIE bit in <strong>the</strong> status register is set <strong>to</strong> 1<br />
• An interrupt-request input, irqk, is asserted<br />
• The corresponding interrupt-enable bit, ctl3 k , is set <strong>to</strong> 1<br />
19
The contents of <strong>the</strong> ipending register (ctl4) indicate which interrupt requests are pending. An exception routine<br />
determines which of <strong>the</strong> pending interrupts has <strong>the</strong> highest priority, and transfers control <strong>to</strong> <strong>the</strong> corresponding<br />
interrupt-service routine.<br />
Upon completion of <strong>the</strong> interrupt-service routine, <strong>the</strong> execution control is returned <strong>to</strong> <strong>the</strong> interrupted program<br />
by means of <strong>the</strong> eret instruction, as explained above. However, since an external interrupt request is handled<br />
without first completing <strong>the</strong> instruction that is being executed when <strong>the</strong> interrupt occurs, <strong>the</strong> interrupted instruction<br />
must be re-executed upon return from <strong>the</strong> interrupt-service routine. To achieve this, <strong>the</strong> interrupt-service routine<br />
has <strong>to</strong> adjust <strong>the</strong> contents of <strong>the</strong> ea register which are at this time pointing <strong>to</strong> <strong>the</strong> next instruction of <strong>the</strong> interrupted<br />
program. Hence, <strong>the</strong> value in <strong>the</strong> ea register has <strong>to</strong> be decremented by 4 prior <strong>to</strong> executing <strong>the</strong> eret instruction.<br />
9.3 Unimplemented Instructions<br />
This exception occurs when <strong>the</strong> processor encounters a valid instruction that is not implemented in hardware. This<br />
may be <strong>the</strong> case with instructions such as mul and div. The exception handler may call a routine that emulates <strong>the</strong><br />
required operation in software.<br />
9.4 Determining <strong>the</strong> Type of Exception<br />
When an exception occurs, <strong>the</strong> exception-handling routine has <strong>to</strong> determine what type of exception has occurred.<br />
The order in which <strong>the</strong> exceptions should be checked is:<br />
1. Read <strong>the</strong> ipending register <strong>to</strong> see if a hardware interrupt has occurred; if so, <strong>the</strong>n go <strong>to</strong> <strong>the</strong> appropriate<br />
interrupt-service routine.<br />
2. Read <strong>the</strong> instruction that was being executed when <strong>the</strong> exception occurred. The address of this instruction<br />
is <strong>the</strong> value in <strong>the</strong> ea register minus 4. If this is <strong>the</strong> trap instruction, <strong>the</strong>n go <strong>to</strong> <strong>the</strong> software-trap-handling<br />
routine.<br />
3. O<strong>the</strong>rwise, <strong>the</strong> exception is due <strong>to</strong> an unimplemented instruction.<br />
9.5 Exception Processing Example<br />
The following example illustrates <strong>the</strong> <strong>Nios</strong> <strong>II</strong> code needed <strong>to</strong> deal with a hardware interrupt. We will assume that<br />
an I/O device raises an interrupt request on <strong>the</strong> interrupt-request input irq1. Also, let <strong>the</strong> exception handler start at<br />
address 0x020, and <strong>the</strong> interrupt-service routine for <strong>the</strong> irq1 request start at address 0x0100.<br />
Figure 7 shows a portion of <strong>the</strong> code that can be used for this purpose. The exception handler first determines<br />
<strong>the</strong> type of exception that has occurred. Having determined that <strong>the</strong>re is a hardware interrupt request, it finds <strong>the</strong><br />
specific interrupt by examining <strong>the</strong> bits of <strong>the</strong> et register which has a copy of control register ctl4. If bit et 1 is<br />
equal <strong>to</strong> 1, <strong>the</strong>n <strong>the</strong> <strong>the</strong> interrupt-service routine EXT_IRQ1 is executed. O<strong>the</strong>rwise, it is necessary <strong>to</strong> check for<br />
o<strong>the</strong>r possible interrupts.<br />
Note that in Figure 7 we are using register r13 in <strong>the</strong> process of testing whe<strong>the</strong>r <strong>the</strong> bit irq1 is set <strong>to</strong> 1. In a<br />
practical application program this register may also be used for some o<strong>the</strong>r purpose, in which case its contents<br />
should first be saved on <strong>the</strong> stack and later res<strong>to</strong>red prior <strong>to</strong> returning from <strong>the</strong> exception handler.<br />
20
.org 0x20<br />
/* Exception handler */<br />
rdctl et, ctl4 /* Check if external interrupt occurred */<br />
beq et, r0, OTHER_EXCEPTIONS /* If zero, check exceptions */<br />
subi ea, ea, 4 /* Hardware interrupt, decrement ea <strong>to</strong> execute <strong>the</strong> interrupted */<br />
/* instruction upon return <strong>to</strong> main program */<br />
andi r13, et, 2 /* Check if irq1 asserted */<br />
bne r13, r0, EXT_IRQ1 /* If yes, go <strong>to</strong> IRQ1 service routine */<br />
/* Instructions that check for o<strong>the</strong>r hardware interrupts should be placed here */<br />
OTHER_EXCEPTIONS:<br />
/* Instructions that check for o<strong>the</strong>r types of exceptions should be placed here */<br />
.org 0x100<br />
/* Interrupt-service routine for <strong>the</strong> desired hardware interrupt */<br />
EXT_IRQ1:<br />
/* Instructions that handle <strong>the</strong> irq1 interrupt request should be placed here */<br />
eret /* Return from exception */<br />
Figure 7. Code used <strong>to</strong> handle a hardware interrupt.<br />
10 Cache Memory<br />
As shown in Figure 4, a <strong>Nios</strong> <strong>II</strong> system can include instruction and data caches, which are implemented in <strong>the</strong><br />
memory blocks in <strong>the</strong> FPGA chip. The caches can be specified when a system is being designed by using <strong>the</strong><br />
SOPC Builder software. Inclusion of caches improves <strong>the</strong> performance of a <strong>Nios</strong> <strong>II</strong> system significantly, particularly<br />
when most of <strong>the</strong> main memory is provided by an external SDRAM chip, as is <strong>the</strong> case with <strong>Altera</strong>’s DE2<br />
board. Both instruction and data caches are direct-mapped.<br />
The instruction cache can be implemented in <strong>the</strong> fast and standard versions of <strong>the</strong> <strong>Nios</strong> <strong>II</strong> processor systems.<br />
It is organized in 8 words per cache line, and its size is a user-selectable design parameter.<br />
The data cache can be implemented only with <strong>the</strong> <strong>Nios</strong> <strong>II</strong>/f processor. It has a configurable line size of 4, 16<br />
or 32 bytes per cache line. Its overall size is also a user-selectable design parameter.<br />
10.1 Cache Management<br />
Cache management is handled by software. For this purpose <strong>the</strong> <strong>Nios</strong> <strong>II</strong> instruction set includes <strong>the</strong> following<br />
instructions:<br />
• initd IMMED16(rA) (Initialize data-cache line)<br />
Invalidates <strong>the</strong> line in <strong>the</strong> data cache that is associated with <strong>the</strong> address determined by adding <strong>the</strong> signextended<br />
value IMMED16 and <strong>the</strong> contents of register rA.<br />
• initi rA (Initialize instruction-cache line)<br />
Invalidates <strong>the</strong> line in <strong>the</strong> instruction cache that is associated with <strong>the</strong> address contained in register rA.<br />
21
• flushd IMMED16(rA) (Flush data-cache line)<br />
Computes <strong>the</strong> effective address by adding <strong>the</strong> sign-extended value IMMED16 and <strong>the</strong> contents of register<br />
rA. Then, it identifies <strong>the</strong> cache line associated with this effective address, writes any dirty data in <strong>the</strong> cache<br />
line back <strong>to</strong> memory, and invalidates <strong>the</strong> cache line.<br />
• flushi rA (Flush instruction-cache line)<br />
Invalidates <strong>the</strong> line in <strong>the</strong> instruction cache that is associated with <strong>the</strong> address contained in register rA.<br />
10.2 Cache Bypass Methods<br />
A <strong>Nios</strong> <strong>II</strong> processor uses its data cache in <strong>the</strong> standard manner. But, it also allows <strong>the</strong> cache <strong>to</strong> be bypassed in<br />
two ways. As mentioned in section 6.1, <strong>the</strong> Load and S<strong>to</strong>re instructions have a version intended for accessing I/O<br />
devices, where <strong>the</strong> effective address specifies a location in an I/O device interface. These instructions are: ldwio,<br />
ldbio, lduio, ldhio, ldhuio, stwio, stbio, and sthio. They bypass <strong>the</strong> data cache.<br />
Ano<strong>the</strong>r way of bypassing <strong>the</strong> data cache is by using bit 31 of an address as a tag that indicates whe<strong>the</strong>r <strong>the</strong><br />
processor should transfer <strong>the</strong> data <strong>to</strong>/from <strong>the</strong> cache, or bypass it. This feature is available only in <strong>the</strong> <strong>Nios</strong> <strong>II</strong>/f<br />
processor.<br />
Mixing cached and uncached accesses has <strong>to</strong> be done with care. O<strong>the</strong>rwise, <strong>the</strong> coherence of <strong>the</strong> cached data<br />
may be compromised.<br />
11 Tightly Coupled Memory<br />
As explained in section 4, a <strong>Nios</strong> <strong>II</strong> processor can access <strong>the</strong> memory blocks in <strong>the</strong> FPGA chip as a tightly coupled<br />
memory. This arrangement does not use <strong>the</strong> Avalon network. Instead, <strong>the</strong> tightly coupled memory is connected<br />
directly <strong>to</strong> <strong>the</strong> processor.<br />
Data in <strong>the</strong> tightly coupled memory is accessed using <strong>the</strong> normal Load and S<strong>to</strong>re instructions, such as ldw or<br />
stw. The <strong>Nios</strong> <strong>II</strong> control circuits determine if <strong>the</strong> address of a memory location is in <strong>the</strong> tightly coupled memory.<br />
Accesses <strong>to</strong> <strong>the</strong> tightly coupled memory bypass <strong>the</strong> caches. For <strong>the</strong> address span of <strong>the</strong> tightly coupled memory,<br />
<strong>the</strong> processor operates as if caches were not present.<br />
Copyright c○2008 <strong>Altera</strong> Corporation. All rights reserved. <strong>Altera</strong>, The Programmable Solutions Company, <strong>the</strong><br />
stylized <strong>Altera</strong> logo, specific device designations, and all o<strong>the</strong>r words and logos that are identified as trademarks<br />
and/or service marks are, unless noted o<strong>the</strong>rwise, <strong>the</strong> trademarks and service marks of <strong>Altera</strong> Corporation in<br />
<strong>the</strong> U.S. and o<strong>the</strong>r countries. All o<strong>the</strong>r product or service names are <strong>the</strong> property of <strong>the</strong>ir respective holders.<br />
<strong>Altera</strong> products are protected under numerous U.S. and foreign patents and pending applications, mask work<br />
rights, and copyrights. <strong>Altera</strong> warrants performance of its semiconduc<strong>to</strong>r products <strong>to</strong> current specifications in<br />
accordance with <strong>Altera</strong>’s standard warranty, but reserves <strong>the</strong> right <strong>to</strong> make changes <strong>to</strong> any products and services at<br />
any time without notice. <strong>Altera</strong> assumes no responsibility or liability arising out of <strong>the</strong> application or use of any<br />
information, product, or service described herein except as expressly agreed <strong>to</strong> in writing by <strong>Altera</strong> Corporation.<br />
<strong>Altera</strong> cus<strong>to</strong>mers are advised <strong>to</strong> obtain <strong>the</strong> latest version of device specifications before relying on any published<br />
information and before placing orders for products or services.<br />
This document is being provided on an “as-is” basis and as an accommodation and <strong>the</strong>refore all warranties, representations<br />
or guarantees of any kind (whe<strong>the</strong>r express, implied or statu<strong>to</strong>ry) including, without limitation, warranties<br />
of merchantability, non-infringement, or fitness for a particular purpose, are specifically disclaimed.<br />
22