20.01.2014 Views

Introduction to the Altera Nios II Soft Processor - FTP - Altera

Introduction to the Altera Nios II Soft Processor - FTP - Altera

Introduction to the Altera Nios II Soft Processor - FTP - Altera

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

<strong>Introduction</strong> <strong>to</strong> <strong>the</strong> <strong>Altera</strong> <strong>Nios</strong> <strong>II</strong> <strong>Soft</strong> <strong>Processor</strong><br />

This tu<strong>to</strong>rial presents an introduction <strong>to</strong> <strong>Altera</strong>’s <strong>Nios</strong> R○ <strong>II</strong> processor, which is a soft processor that can be instantiated<br />

on an <strong>Altera</strong> FPGA device. It describes <strong>the</strong> basic architecture of <strong>Nios</strong> <strong>II</strong> and its instruction set. The <strong>Nios</strong><br />

<strong>II</strong> processor and its associated memory and peripheral components are easily instantiated by using <strong>Altera</strong>’s SOPC<br />

Builder in conjuction with <strong>the</strong> Quartus R○ <strong>II</strong> software.<br />

A full desciption of <strong>the</strong> <strong>Nios</strong> <strong>II</strong> processor is provided in <strong>the</strong> <strong>Nios</strong> <strong>II</strong> <strong>Processor</strong> Reference Handbook, which<br />

is available in <strong>the</strong> literature section of <strong>the</strong> <strong>Altera</strong> web site. An introduction <strong>to</strong> <strong>the</strong> SOPC Builder is given in <strong>the</strong><br />

tu<strong>to</strong>rial <strong>Introduction</strong> <strong>to</strong> <strong>the</strong> <strong>Altera</strong> SOPC Builder, which can be found in <strong>the</strong> University Program section of <strong>the</strong> web<br />

site.<br />

Contents:<br />

<strong>Nios</strong> <strong>II</strong> System<br />

Overview of <strong>Nios</strong> <strong>II</strong> <strong>Processor</strong> Features<br />

Register Structure<br />

Accessing Memory and I/O Devices<br />

Addressing<br />

Instruction Set<br />

Assembler Directives<br />

Example Program<br />

Exception Processing<br />

Cache Memory<br />

Tightly Coupled Memory<br />

1


<strong>Altera</strong>’s <strong>Nios</strong> <strong>II</strong> is a soft processor, defined in a hardware description language, which can be implemented in<br />

<strong>Altera</strong>’s FPGA devices by using <strong>the</strong> Quartus R○ <strong>II</strong> CAD system. This tu<strong>to</strong>rial provides a basic introduction <strong>to</strong> <strong>the</strong><br />

<strong>Nios</strong> <strong>II</strong> processor, intended for a user who wishes <strong>to</strong> implement a <strong>Nios</strong> <strong>II</strong> based system on <strong>the</strong> <strong>Altera</strong> DE2 board.<br />

1 <strong>Nios</strong> <strong>II</strong> System<br />

The <strong>Nios</strong> <strong>II</strong> processor can be used with a variety of o<strong>the</strong>r components <strong>to</strong> form a complete system. These components<br />

include a number of standard peripherals, but it is also possible <strong>to</strong> define cus<strong>to</strong>m peripherals. <strong>Altera</strong>’s DE2<br />

Development and Education board contains several components that can be integrated in<strong>to</strong> a <strong>Nios</strong> <strong>II</strong> system. An<br />

example of such a system is shown in Figure 1.<br />

Host computer<br />

USB-Blaster<br />

interface<br />

<strong>Nios</strong> <strong>II</strong> processor<br />

JTAG Debug<br />

module<br />

JTAG UART<br />

interface<br />

Cyclone <strong>II</strong><br />

FPGA chip<br />

Avalon switch fabric<br />

On-chip<br />

memory<br />

SRAM<br />

interface<br />

SDRAM<br />

interface<br />

Flash<br />

memory<br />

interface<br />

Parallel I/O<br />

interface<br />

Serial I/O<br />

interface<br />

SRAM<br />

chip<br />

SDRAM<br />

chip<br />

Flash<br />

memory<br />

chip<br />

Parallel<br />

I/O port<br />

lines<br />

Serial<br />

I/O port<br />

lines<br />

Figure 1. A <strong>Nios</strong> <strong>II</strong> system implemented on <strong>the</strong> DE2 board.<br />

2


The <strong>Nios</strong> <strong>II</strong> processor and <strong>the</strong> interfaces needed <strong>to</strong> connect <strong>to</strong> o<strong>the</strong>r chips on <strong>the</strong> DE2 board are implemented in<br />

<strong>the</strong> Cyclone <strong>II</strong> FPGA chip. These components are interconnected by means of <strong>the</strong> interconnection network called<br />

<strong>the</strong> Avalon Switch Fabric. Memory blocks in <strong>the</strong> Cyclone <strong>II</strong> device can be used <strong>to</strong> provide an on-chip memory for<br />

<strong>the</strong> <strong>Nios</strong> <strong>II</strong> processor. They can be connected <strong>to</strong> <strong>the</strong> processor ei<strong>the</strong>r directly or through <strong>the</strong> Avalon network. The<br />

SRAM and SDRAM memory chips on <strong>the</strong> DE2 board are accessed through <strong>the</strong> appropriate interfaces. Input/output<br />

interfaces are instantiated <strong>to</strong> provide connection <strong>to</strong> <strong>the</strong> I/O devices used in <strong>the</strong> system. A special JTAG UART<br />

interface is used <strong>to</strong> connect <strong>to</strong> <strong>the</strong> circuitry that provides a Universal Serial Bus (USB) link <strong>to</strong> <strong>the</strong> host computer <strong>to</strong><br />

which <strong>the</strong> DE2 board is connected. This circuitry and <strong>the</strong> associated software is called <strong>the</strong> USB-Blaster. Ano<strong>the</strong>r<br />

module, called <strong>the</strong> JTAG Debug module, is provided <strong>to</strong> allow <strong>the</strong> host computer <strong>to</strong> control <strong>the</strong> <strong>Nios</strong> <strong>II</strong> processor.<br />

It makes it possible <strong>to</strong> perform operations such as downloading programs in<strong>to</strong> memory, starting and s<strong>to</strong>pping<br />

execution, setting program breakpoints, and collecting real-time execution trace data.<br />

Since all parts of <strong>the</strong> <strong>Nios</strong> <strong>II</strong> system implemented on <strong>the</strong> FPGA chip are defined by using a hardware description<br />

language, a knowledgeable user could write such code <strong>to</strong> implement any part of <strong>the</strong> system. This would be an<br />

onnerous and time consuming task. Instead, one can use <strong>the</strong> SOPC Builder <strong>to</strong>ol in <strong>the</strong> Quartus <strong>II</strong> software <strong>to</strong><br />

implement a desired system simply by choosing <strong>the</strong> required components and specifying <strong>the</strong> parameters needed<br />

<strong>to</strong> make each component fit <strong>the</strong> overall requirements of <strong>the</strong> system.<br />

2 Overview of <strong>Nios</strong> <strong>II</strong> <strong>Processor</strong> Features<br />

The <strong>Nios</strong> <strong>II</strong> processor has a number of features that can be configured by <strong>the</strong> user <strong>to</strong> meet <strong>the</strong> demands of a desired<br />

system. The processor can be implemented in three different configurations:<br />

• <strong>Nios</strong> <strong>II</strong>/f is a "fast" version designed for superior performance. It has <strong>the</strong> widest scope of configuration<br />

options that can be used <strong>to</strong> optimize <strong>the</strong> processor for performance.<br />

• <strong>Nios</strong> <strong>II</strong>/s is a "standard" version that requires less resources in an FPGA device as a trade-off for reduced<br />

performance.<br />

• <strong>Nios</strong> <strong>II</strong>/e is an "economy" version which requires <strong>the</strong> least amount of FPGA resources, but also has <strong>the</strong> most<br />

limited set of user-configurable features.<br />

The <strong>Nios</strong> <strong>II</strong> processor has a Reduced Instruction Set Computer (RISC) architecture. Its arithmetic and logic<br />

operations are performed on operands in <strong>the</strong> general purpose registers. The data is moved between <strong>the</strong> memory<br />

and <strong>the</strong>se registers by means of Load and S<strong>to</strong>re instructions.<br />

The wordlength of <strong>the</strong> <strong>Nios</strong> <strong>II</strong> processor is 32 bits. All registers are 32 bits long. Byte addresses in a 32-bit<br />

word are assigned in little-endian style, in which <strong>the</strong> lower byte addresses are used for <strong>the</strong> less significant bytes<br />

(<strong>the</strong> rightmost bytes) of <strong>the</strong> word.<br />

The <strong>Nios</strong> <strong>II</strong> architecture uses separate instruction and data buses, which is often referred <strong>to</strong> as <strong>the</strong> Harvard<br />

architecture.<br />

A <strong>Nios</strong> <strong>II</strong> processor may operate in <strong>the</strong> following three modes:<br />

• Supervisor mode – allows <strong>the</strong> processor <strong>to</strong> execute all instructions and perform all available functions. When<br />

<strong>the</strong> processor is reset, it enters this mode.<br />

• User mode – <strong>the</strong> intent of this mode is <strong>to</strong> prevent execution of some instructions that shoud be used for<br />

systems purposes only. Some processor features are not accessible in this mode.<br />

• Debug mode – is used by software debugging <strong>to</strong>ols <strong>to</strong> implement features such as breakpoints and watchpoints.<br />

Application programs can be run in ei<strong>the</strong>r <strong>the</strong> User or Supervisor modes. Presently available versions of <strong>the</strong> <strong>Nios</strong><br />

<strong>II</strong> processor do not support <strong>the</strong> User mode.<br />

3


3 Register Structure<br />

The <strong>Nios</strong> <strong>II</strong> processor has thirty two 32-bit general purpose registers, as shown in Figure 2. Some of <strong>the</strong>se registers<br />

are intended for a specific purpose and have special names that are recognized by <strong>the</strong> Assembler.<br />

• Register r0 is referred <strong>to</strong> as <strong>the</strong> zero register. It always contains <strong>the</strong> constant 0. Thus, reading this register<br />

returns <strong>the</strong> value 0, while writing <strong>to</strong> it has no effect.<br />

• Register r1 is used by <strong>the</strong> Assembler as a temporary register; it should not be referenced in user programs<br />

• Registers r24 and r29 are used for processing of exceptions; <strong>the</strong>y are not available in User mode<br />

• Registers r25 and r30 are used exclusively by <strong>the</strong> JTAG Debug module<br />

• Registers r27 and r28 are used <strong>to</strong> control <strong>the</strong> stack used by <strong>the</strong> <strong>Nios</strong> <strong>II</strong> processor<br />

• Register r31 is used <strong>to</strong> hold <strong>the</strong> return address when a subroutine is called<br />

Register Name Function<br />

r0 zero 0x00000000<br />

r1 at Assembler Temporary<br />

r2<br />

r3<br />

· · ·<br />

· · ·<br />

· · ·<br />

r23<br />

r24 et Exception Temporary (1)<br />

r25 bt Breakpoint Temporary (2)<br />

r26 gp Global Pointer<br />

r27 sp Stack Pointer<br />

r28 fp Frame Pointer<br />

r29 ea Exception Return Address (1)<br />

r30 ba Breakpoint Return Address (2)<br />

r31 ra Return Address<br />

(1) The register is not available in User mode<br />

(2) The register is used exclusively by <strong>the</strong> JTAG Debug module<br />

Figure 2. General Purpose registers.<br />

There are six 32-bit control registers, as indicated in Figure 3. The names given in <strong>the</strong> figure are recognized<br />

by <strong>the</strong> Assembler. These registers are used au<strong>to</strong>matically for control purposes. They can be read and written <strong>to</strong> by<br />

special instructions rdctl and wrctl, which can be executed only in <strong>the</strong> supervisor mode. The registers are used as<br />

follows:<br />

• Register ctl0 reflects <strong>the</strong> operating status of <strong>the</strong> processor. Only two bits of this register are meaningful:<br />

– U is <strong>the</strong> User/Supervisor mode bit; U = 1 for User mode, while U = 0 for Supervisor mode.<br />

– PIE is <strong>the</strong> processor interrupt-enable bit. When PIE = 1, <strong>the</strong> processor may accept external interrupts.<br />

When PIE = 0, <strong>the</strong> processor ignores external interrupts.<br />

• Register ctl1 holds a saved copy of <strong>the</strong> status register during exception processing. The bits EU and EPIE<br />

are <strong>the</strong> saved values of <strong>the</strong> status bits U and PIE.<br />

4


• Register ctl2 holds a saved copy of <strong>the</strong> status register during debug break processing. The bits BU and BPIE<br />

are <strong>the</strong> saved values of <strong>the</strong> status bits U and PIE.<br />

• Register ctl3 is used <strong>to</strong> enable individual external interrupts. Each bit corresponds <strong>to</strong> one of <strong>the</strong> interrupts<br />

irq0 <strong>to</strong> irq31. The value of 1 means that <strong>the</strong> interrupt is enabled, while 0 means that it is disabled.<br />

• Register ctl4 indicates which interrupts are pending. The value of a given bit, ctl4 k , is set <strong>to</strong> 1 if <strong>the</strong> interrupt<br />

irqk is both active and enabled by having <strong>the</strong> interrupt-enable bit, ctl3 k , set <strong>to</strong> 1.<br />

• Register ctl5 holds a value that uniquely identifies <strong>the</strong> processor in a multiprocessor system.<br />

Register Name b 31 · · · b 2 b 1 b 0<br />

ctl0 status Reserved U PIE<br />

ctl1 estatus Reserved EU EPIE<br />

ctl2 bstatus Reserved BU BPIE<br />

ctl3 ienable Interrupt-enable bits<br />

ctl4 ipending Pending-interrupt bits<br />

ctl5 cpuid Unique processor identifier<br />

Figure 3. Control registers.<br />

4 Accessing Memory and I/O Devices<br />

Figure 4 shows how a <strong>Nios</strong> <strong>II</strong> processor can access memory and I/O devices. For best performance, <strong>the</strong> <strong>Nios</strong> <strong>II</strong>/f<br />

processor can include both instruction and data caches. The caches are implemented in <strong>the</strong> FPGA memory blocks.<br />

Their usage is optional and <strong>the</strong>y are specified (including <strong>the</strong>ir size) at <strong>the</strong> system generation time by using <strong>the</strong><br />

SOPC Builder. The <strong>Nios</strong> <strong>II</strong>/s version can have <strong>the</strong> instruction cache but not <strong>the</strong> data cache. The <strong>Nios</strong> <strong>II</strong>/e version<br />

has nei<strong>the</strong>r instruction nor data cache.<br />

Ano<strong>the</strong>r way <strong>to</strong> give <strong>the</strong> processor fast access <strong>to</strong> <strong>the</strong> on-chip memory is by using <strong>the</strong> tightly coupled memory<br />

arrangement, in which case <strong>the</strong> processor accesses <strong>the</strong> memory via a direct path ra<strong>the</strong>r than through <strong>the</strong> Avalon<br />

network. Accesses <strong>to</strong> a tightly coupled memory bypass <strong>the</strong> cache memory. There can be one or more tightly<br />

coupled instruction and data memories. If <strong>the</strong> instruction cache is not included in a system, <strong>the</strong>n <strong>the</strong>re must be at<br />

least one tightly coupled memory provided for <strong>Nios</strong> <strong>II</strong>/f and <strong>Nios</strong> <strong>II</strong>/s processors. On-chip memory can also be<br />

accessed via <strong>the</strong> Avalon network.<br />

Off-chip memory devices, such as SRAM, SDRAM, and Flash memory chips are accessed by instantiating <strong>the</strong><br />

appropriate interfaces. The input/output devices are memory mapped and can be accessed as memory locations.<br />

Data accesses <strong>to</strong> memory locations and I/O interfaces are performed by means of Load and S<strong>to</strong>re instructions,<br />

which cause data <strong>to</strong> be transferred between <strong>the</strong> memory and general purpose registers.<br />

5


Program counter<br />

Cyclone <strong>II</strong><br />

FPGA chip<br />

General purpose<br />

registers<br />

Instruction bus selec<strong>to</strong>r logic<br />

Data bus selec<strong>to</strong>r logic<br />

Instruction<br />

cache<br />

Data<br />

cache<br />

Avalon switch fabric<br />

Tightly coupled<br />

instruction memory<br />

Memory<br />

interface<br />

I/O<br />

interface<br />

Tightly coupled<br />

data memory<br />

Memory<br />

device<br />

I/O<br />

device<br />

Figure 4. Memory and I/O organization.<br />

5 Addressing<br />

The <strong>Nios</strong> <strong>II</strong> processor issues 32-bit addresses. The memory space is byte-addressable. Instructions can read and<br />

write words (32 bits), halfwords (16 bits), or bytes (8 bits) of data. Reading or writing <strong>to</strong> an address that does not<br />

correspond <strong>to</strong> an existing memory or I/O location produces an undefined result.<br />

There are five addressing modes provided:<br />

• Immediate mode – a 16-bit operand is given explicitly in <strong>the</strong> instruction. This value may be sign extended<br />

<strong>to</strong> produce a 32-bit operand in instructions that perform arithmetic operations.<br />

• Register mode – <strong>the</strong> operand is in a processor register<br />

• Displacement mode – <strong>the</strong> effective address of <strong>the</strong> operand is <strong>the</strong> sum of <strong>the</strong> contents of a register and a<br />

signed 16-bit displacement value given in <strong>the</strong> instruction<br />

• Register indirect mode – <strong>the</strong> effective address of <strong>the</strong> operand is <strong>the</strong> contents of a register specified in <strong>the</strong><br />

instruction. This is equivalent <strong>to</strong> <strong>the</strong> displacement mode where <strong>the</strong> displacement value is equal <strong>to</strong> 0.<br />

6


• Absolute mode – a 16-bit absolute address of an operand can be specified by using <strong>the</strong> displacement mode<br />

with register r0 which always contains <strong>the</strong> value 0.<br />

6 Instructions<br />

All <strong>Nios</strong> <strong>II</strong> instructions are 32-bits long. In addition <strong>to</strong> machine instructions that are executed directly by <strong>the</strong> processor,<br />

<strong>the</strong> <strong>Nios</strong> <strong>II</strong> instruction set includes a number of pseudoinstructions that can be used in assembly language<br />

programs. The Assembler replaces each pseudoinstruction by one or more machine instructions.<br />

Figure 5 depicts <strong>the</strong> three possible instruction formats: I-type, R-type and J-type. In all cases <strong>the</strong> six bits b 5−0<br />

denote <strong>the</strong> OP code. The remaining bits are used <strong>to</strong> specify registers, immediate operands, or extended OP codes.<br />

• I-type – Five-bit fields A and B are used <strong>to</strong> specify general purpose registers. A 16-bit field IMMED16<br />

provides immediate data which can be sign extended <strong>to</strong> provide a 32-bit operand.<br />

• R-type – Five-bit fields A, B and C are used <strong>to</strong> specify general purpose registers. An 11-bit field OPX is<br />

used <strong>to</strong> extend <strong>the</strong> OP code.<br />

• J-type – A 26-bit field IMMED26 contains an unsigned immediate value. This format is used only in <strong>the</strong><br />

Call instruction.<br />

31<br />

27<br />

26<br />

22<br />

21<br />

6<br />

5<br />

0<br />

A<br />

B<br />

IMMED16<br />

OP<br />

(a) I-type<br />

31<br />

27<br />

26<br />

22<br />

21<br />

17<br />

16<br />

6<br />

5<br />

0<br />

A<br />

B<br />

C<br />

OPX<br />

OP<br />

(b) R-type<br />

31<br />

6<br />

5<br />

0<br />

IMMED26<br />

OP<br />

(c) J-type<br />

Figure 5. Formats of <strong>Nios</strong> <strong>II</strong> instructions.<br />

The following subsections discuss briefly <strong>the</strong> main features of <strong>the</strong> <strong>Nios</strong> <strong>II</strong> instruction set. For a complete<br />

description of <strong>the</strong> instruction set, including <strong>the</strong> details of how each instruction is encoded, <strong>the</strong> reader should<br />

consult <strong>the</strong> <strong>Nios</strong> <strong>II</strong> <strong>Processor</strong> Reference Handbook.<br />

6.1 Load and S<strong>to</strong>re Instructions<br />

Load and S<strong>to</strong>re instructions are used <strong>to</strong> move data between memory (and I/0 interfaces) and <strong>the</strong> general purpose<br />

registers. They are of I-type. For example, <strong>the</strong> Load Word instruction<br />

ldw rB, byte_offset(rA)<br />

7


determines <strong>the</strong> effective address of a memory location as <strong>the</strong> sum of a byte_offset value and <strong>the</strong> contents of register<br />

A. The 16-bit byte_offset value is sign extended <strong>to</strong> 32 bits. The 32-bit memory operand is loaded in<strong>to</strong> register B.<br />

For instance, assume that <strong>the</strong> contents of register r4 are 1260 10 and <strong>the</strong> byte_offset value is 80 10 . Then, <strong>the</strong><br />

instruction<br />

ldw r3, 80(r4)<br />

loads <strong>the</strong> 32-bit operand at memory address 1340 10 in<strong>to</strong> register r3.<br />

The S<strong>to</strong>re Word instruction has <strong>the</strong> format<br />

stw rB, byte_offset(rA)<br />

It s<strong>to</strong>res <strong>the</strong> contents of register B in<strong>to</strong> <strong>the</strong> memory location at <strong>the</strong> address computed as <strong>the</strong> sum of <strong>the</strong> byte_offset<br />

value and <strong>the</strong> contents of register A.<br />

There are Load and S<strong>to</strong>re instructions that use operands that are only 8 or 16 bits long. They are referred <strong>to</strong> as<br />

Load/S<strong>to</strong>re Byte and Load/S<strong>to</strong>re Halfword instructions, respectively. Such Load instructions are:<br />

• ldb (Load Byte)<br />

• ldbu (Load Byte Unsigned)<br />

• ldh (Load Halfword)<br />

• ldhu (Load Halfword Unsigned)<br />

When a shorter operand is loaded in<strong>to</strong> a 32-bit register, its value has <strong>to</strong> be adjusted <strong>to</strong> fit in<strong>to</strong> <strong>the</strong> register. This<br />

is done by sign extending <strong>the</strong> 8- or 16-bit value <strong>to</strong> 32 bits in <strong>the</strong> ldb and ldh instructions. In <strong>the</strong> ldbu and ldhu<br />

instructions <strong>the</strong> operand is zero extended.<br />

The corresponding S<strong>to</strong>re instructions are:<br />

• stb (S<strong>to</strong>re Byte)<br />

• sth (S<strong>to</strong>re Halfword)<br />

The stb instruction s<strong>to</strong>res <strong>the</strong> low byte of register B in<strong>to</strong> <strong>the</strong> memory byte specified by <strong>the</strong> effective address. The<br />

sth instruction s<strong>to</strong>res <strong>the</strong> low halfword of register B. In this case <strong>the</strong> effective address must be halfword aligned.<br />

Each Load and S<strong>to</strong>re instruction has a version intended for accessing locations in I/O device interfaces. These<br />

instructions are:<br />

• ldwio (Load Word I/O)<br />

• ldbio<br />

• ldbuio<br />

• ldhio<br />

• ldhuio<br />

(Load Byte I/O)<br />

(Load Byte Unsigned I/O)<br />

(Load Halfword I/O)<br />

(Load Halfword Unsigned I/O)<br />

• stwio (S<strong>to</strong>re Word I/O)<br />

• stbio<br />

• sthio<br />

(S<strong>to</strong>re Byte I/O)<br />

(S<strong>to</strong>re Halfword I/O)<br />

The difference is that <strong>the</strong>se instructions bypass <strong>the</strong> cache, if one exists.<br />

8


6.2 Arithmetic Instructions<br />

The arithmetic instructions operate on <strong>the</strong> data that is ei<strong>the</strong>r in <strong>the</strong> general purpose registers or given as an immediate<br />

value in <strong>the</strong> instruction. These instructions are of R-type or I-type, respectively. They include:<br />

• add (Add Registers)<br />

• addi (Add Immediate)<br />

• sub (Subtract Registers)<br />

• subi (Subtract Immediate)<br />

• mul (Multiply)<br />

• muli (Multiply Immediate)<br />

• div (Divide)<br />

• divu (Divide Unsigned)<br />

The Add instruction<br />

add rC, rA, rB<br />

adds <strong>the</strong> contents of registers A and B, and places <strong>the</strong> sum in<strong>to</strong> register C.<br />

The Add Immediate instruction<br />

addi rB, rA, IMMED16<br />

adds <strong>the</strong> contents of register A and <strong>the</strong> sign-extended 16-bit operand given in <strong>the</strong> instruction, and places <strong>the</strong> result<br />

in<strong>to</strong> register B. The addition operation in <strong>the</strong>se instructions is <strong>the</strong> same for both signed and unsigned operands;<br />

<strong>the</strong>re are no condition flags that are set by <strong>the</strong> operation. This means that when unsigned operands are added, <strong>the</strong><br />

carry from <strong>the</strong> most significant bit position has <strong>to</strong> be detected by executing a separate instruction. Similarly, when<br />

signed operands are added, <strong>the</strong> arithmetic overflow has <strong>to</strong> be detected separately. The detection of <strong>the</strong>se conditions<br />

is dicussed in section 6.11.<br />

The Subtract instruction<br />

sub rC, rA, rB<br />

subtracts <strong>the</strong> contents of register B from register A, and places <strong>the</strong> result in<strong>to</strong> register C. Again, <strong>the</strong> carry and<br />

overflow detection has <strong>to</strong> be done by using additional instructions, as explained in section 6.11.<br />

The immediate version, subi, is a pseudoinstruction implemented as<br />

addi rB, rA, -IMMED16<br />

The Multiply instruction<br />

mul rC, rA, rB<br />

multiplies <strong>the</strong> contents of registers A and B, and places <strong>the</strong> low-order 32 bits of <strong>the</strong> product in<strong>to</strong> register C. The<br />

operands are treated as unsigned numbers. The carry and overflow detection has <strong>to</strong> be done by using additional<br />

instructions. In <strong>the</strong> immediate version<br />

muli rB, rA, IMMED16<br />

<strong>the</strong> 16-bit immediate operand is sign extended <strong>to</strong> 32 bits.<br />

9


The Divide instruction<br />

div rC, rA, rB<br />

divides <strong>the</strong> contents of register A by <strong>the</strong> contents of register B and places <strong>the</strong> integer portion of <strong>the</strong> quotient in<strong>to</strong><br />

register C. The operands are treated as signed integers. The divu instruction is performed in <strong>the</strong> same way except<br />

that <strong>the</strong> operands are treated as unsigned integers.<br />

6.3 Logic Instructions<br />

The logic instructions provide <strong>the</strong> AND, OR, XOR, and NOR operations. They operate on data that is ei<strong>the</strong>r in<br />

<strong>the</strong> general purpose registers or given as an immediate value in <strong>the</strong> instruction. These instructions are of R-type or<br />

I-type, respectively.<br />

The AND instruction<br />

and rC, rA, rB<br />

performs a bitwise logical AND of <strong>the</strong> contents of registers A and B, and s<strong>to</strong>res <strong>the</strong> result in register C. Similarly,<br />

<strong>the</strong> instructions or, xor and nor perform <strong>the</strong> OR, XOR and NOR operations, respectively.<br />

The AND Immediate instruction<br />

andi rB, rA, IMMED16<br />

performs a bitwise logical AND of <strong>the</strong> contents of register A and <strong>the</strong> IMMED16 operand which is zero-extended<br />

<strong>to</strong> 32 bits, and s<strong>to</strong>res <strong>the</strong> result in register B. Similarly, <strong>the</strong> instructions ori, xori and nori perform <strong>the</strong> OR, XOR<br />

and NOR operations, respectively.<br />

It is also possible <strong>to</strong> use <strong>the</strong> 16-bit immediate operand as <strong>the</strong> 16 high-order bits in <strong>the</strong> logic operations, in which<br />

case <strong>the</strong> low-order 16 bits of <strong>the</strong> operand are zeros. This is accomplished with <strong>the</strong> instructions:<br />

• andhi<br />

(AND High Immediate)<br />

• orhi (OR High Immediate)<br />

• xorhi<br />

(XOR High Immediate)<br />

6.4 Move Instructions<br />

The Move instructions copy <strong>the</strong> contents of one register in<strong>to</strong> ano<strong>the</strong>r, or <strong>the</strong>y place an immediate value in<strong>to</strong> a<br />

register. They are pseudoinstructions implemented by using o<strong>the</strong>r instructions. The instruction<br />

mov rC, rA<br />

copies <strong>the</strong> contents of register A in<strong>to</strong> register C. It is implemented as<br />

add rC, rA, r0<br />

The Move Immediate instruction<br />

movi rB, IMMED16<br />

sign extends <strong>the</strong> IMMED16 value <strong>to</strong> 32 bits and loads it in<strong>to</strong> register B. It is implemented as<br />

addi rB, r0, IMMED16<br />

The Move Unsigned Immediate instruction<br />

10


movui rB, IMMED16<br />

zero extends <strong>the</strong> IMMED16 value <strong>to</strong> 32 bits and loads it in<strong>to</strong> register B. It is implemented as<br />

ori rB, r0, IMMED16<br />

The Move Immediate Address instruction<br />

movia rB, LABEL<br />

loads a 32-bit value that corresponds <strong>to</strong> <strong>the</strong> address LABEL in<strong>to</strong> register B. It is implemented as:<br />

orhi<br />

ori<br />

rB, r0, %hi(LABEL)<br />

rB, rB, %lo(LABEL)<br />

The %hi(LABEL) and %lo(LABEL) are <strong>the</strong> Assembler macros which extract <strong>the</strong> high-order 16 bits and <strong>the</strong> loworder<br />

16 bits, respectively, of a 32-bit value LABEL. The orhi instruction sets <strong>the</strong> high-order bits of register B,<br />

followed by <strong>the</strong> ori instruction which sets <strong>the</strong> low-order bits of B. Note that two instructions are used because <strong>the</strong><br />

I-type format provides for only a 16-bit immediate operand.<br />

6.5 Comparison Instructions<br />

The Comparison instructions compare <strong>the</strong> contents of two registers or <strong>the</strong> contents of a register and an immediate<br />

value, and write ei<strong>the</strong>r 1 (if true) or 0 (if false) in<strong>to</strong> <strong>the</strong> result register. They are of R-type or I-type, respectively.<br />

These instructions correspond <strong>to</strong> <strong>the</strong> equality and relational opera<strong>to</strong>rs in <strong>the</strong> C programming language.<br />

The Compare Less Than Signed instruction<br />

cmplt rC, rA, rB<br />

performs <strong>the</strong> comparison of signed numbers in registers A and B, rA < rB, and writes a 1 in<strong>to</strong> register C if <strong>the</strong><br />

result is true; o<strong>the</strong>rwise, it writes a 0.<br />

The Compare Less Than Unsigned instruction<br />

cmpltu rC, rA, rB<br />

performs <strong>the</strong> same function as <strong>the</strong> cmplt instruction, but it treats <strong>the</strong> operands as unsigned numbers.<br />

O<strong>the</strong>r instructions of this type are:<br />

• cmpeq rC, rA, rB<br />

• cmpne rC, rA, rB<br />

• cmpge rC, rA, rB<br />

(Comparison rA == rB)<br />

(Comparison rA != rB)<br />

(Signed comparison rA >= rB)<br />

• cmpgeu rC, rA, rB (Unsigned comparison rA >= rB)<br />

• cmpgt rC, rA, rB (Signed comparison rA > rB)<br />

This is a pseudoinstruction implemented as <strong>the</strong> cmplt instruction by swapping its rA and rB operands.<br />

• cmpgtu rC, rA, rB (Unsigned comparison rA > rB)<br />

This is a pseudoinstruction implemented as <strong>the</strong> cmpltu instruction by swapping its rA and rB operands.<br />

• cmple rC, rA, rB (Signed comparison rA


The immediate versions of <strong>the</strong> Comparison instructions involve an immediate operand. For example, <strong>the</strong><br />

Compare Less Than Signed Immediate instruction<br />

cmplti rB, rA, IMMED16<br />

compares <strong>the</strong> signed number in register A with <strong>the</strong> sign-extended immediate operand. It writes a 1 in<strong>to</strong> register B<br />

if rA < IMMED16; o<strong>the</strong>rwise, it writes a 0.<br />

The Compare Less Than Unsigned Immediate instruction<br />

cmpltui rB, rA, IMMED16<br />

compares <strong>the</strong> unsigned number in register A with <strong>the</strong> zero-extended immediate operand. It writes a 1 in<strong>to</strong> register<br />

B if rA < IMMED16; o<strong>the</strong>rwise, it writes a 0.<br />

O<strong>the</strong>r instructions of this type are:<br />

• cmpeqi rB, rA, IMMED16<br />

• cmpnei rB, rA, IMMED16<br />

• cmpgei rB, rA, IMMED16<br />

• cmpgeui rB, rA, IMMED16<br />

(Comparison rA == IMMED16)<br />

(Comparison rA != IMMED16)<br />

(Signed comparison rA >= IMMED16)<br />

(Unsigned comparison rA >= IMMED16)<br />

• cmpgti rB, rA, IMMED16 (Signed comparison rA > IMMED16)<br />

This is a pseudoinstruction which is implemented by using <strong>the</strong> cmpgei instruction with an immediate value<br />

IMMED16 + 1.<br />

• cmpgtui rB, rA, IMMED16 (Unsigned comparison rA > IMMED16)<br />

This is a pseudoinstruction which is implemented by using <strong>the</strong> cmpgeui instruction with an immediate<br />

value IMMED16 + 1.<br />

• cmplei rB, rA, IMMED16 (Signed comparison rA and


The srl instruction shifts <strong>the</strong> contents of register A <strong>to</strong> <strong>the</strong> right by <strong>the</strong> number of bit positions specified by <strong>the</strong> five<br />

least-significant bits (number in <strong>the</strong> range 0 <strong>to</strong> 31) in register B, and s<strong>to</strong>res <strong>the</strong> result in register C. The vacated<br />

bits on <strong>the</strong> left side of <strong>the</strong> shifted operand are filled with 0s.<br />

The srli instruction shifts <strong>the</strong> contents of register A <strong>to</strong> <strong>the</strong> right by <strong>the</strong> number of bit positions specified by <strong>the</strong><br />

five-bit unsigned value, IMMED5, given in <strong>the</strong> instruction.<br />

The sra and srai instructions perform <strong>the</strong> same actions as <strong>the</strong> srl and srli instructions, except that <strong>the</strong> sign bit,<br />

rA 31 , is replicated in<strong>to</strong> <strong>the</strong> vacated bits on <strong>the</strong> left side of <strong>the</strong> shifted operand.<br />

The sll and slli instructions are similar <strong>to</strong> <strong>the</strong> srl and srli instructions, but <strong>the</strong>y shift <strong>the</strong> operand in register A <strong>to</strong> <strong>the</strong><br />

left and fill <strong>the</strong> vacated bits on <strong>the</strong> right side with 0s.<br />

6.7 Rotate Instructions<br />

There are three Rotate instructions, which use <strong>the</strong> R-type format:<br />

• ror rC, rA, rB (Rotate Right)<br />

• rol rC, rA, rB (Rotate Left)<br />

• roli rC, rA, IMMED5<br />

(Rotate Left Immediate)<br />

The ror instruction rotates <strong>the</strong> bits of register A in <strong>the</strong> left-<strong>to</strong>-right direction by <strong>the</strong> number of bit positions specified<br />

by <strong>the</strong> five least-significant bits (number in <strong>the</strong> range 0 <strong>to</strong> 31) in register B, and s<strong>to</strong>res <strong>the</strong> result in register C.<br />

The rol instruction is similar <strong>to</strong> <strong>the</strong> ror instruction, but it rotates <strong>the</strong> operand in <strong>the</strong> right-<strong>to</strong>-left direction.<br />

The roli instruction rotates <strong>the</strong> bits of register A in <strong>the</strong> right-<strong>to</strong>-left direction by <strong>the</strong> number of bit positions specified<br />

by <strong>the</strong> five-bit unsigned value, IMMED5, given in <strong>the</strong> instruction, and s<strong>to</strong>res <strong>the</strong> result in register C.<br />

6.8 Branch and Jump Instructions<br />

The flow of execution of a program can be changed by executing Branch or Jump instructions. It may be changed<br />

ei<strong>the</strong>r unconditionally or conditionally.<br />

The Jump instruction<br />

jmp rA<br />

transfers execution unconditionally <strong>to</strong> <strong>the</strong> address contained in register A.<br />

The Unconditional Branch instruction<br />

br LABEL<br />

transfers execution unconditionally <strong>to</strong> <strong>the</strong> instruction at address LABEL. This is an instruction of I-type, in which<br />

a 16-bit immediate value (interpreted as a signed number) specifies <strong>the</strong> offset <strong>to</strong> <strong>the</strong> branch target instruction. The<br />

offset is <strong>the</strong> distance in bytes from <strong>the</strong> instruction that immediately follows br <strong>to</strong> <strong>the</strong> address LABEL.<br />

Conditional transfer of execution is achieved with <strong>the</strong> Conditional Branch instructions, which compare <strong>the</strong><br />

contents of two registers and cause a branch if <strong>the</strong> result is true. These instructions are of I-type and <strong>the</strong> offset is<br />

determined as explained above for <strong>the</strong> br instruction.<br />

13


The Branch if Less Than Signed instruction<br />

blt rA, rB, LABEL<br />

performs <strong>the</strong> comparison rA < rB, treating <strong>the</strong> contents of <strong>the</strong> registers as signed numbers.<br />

The Branch if Less Than Unsigned instruction<br />

bltu rA, rB, LABEL<br />

performs <strong>the</strong> comparison rA < rB, treating <strong>the</strong> contents of <strong>the</strong> registers as unsigned numbers.<br />

The o<strong>the</strong>r Conditional Branch instructions are:<br />

• beq rA, rB, LABEL<br />

• bne rA, rB, LABEL<br />

• bge rA, rB, LABEL<br />

• bgeu rA, rB, LABEL<br />

(Comparison rA == rB)<br />

(Comparison rA != rB)<br />

(Signed comparison rA >= rB)<br />

(Unsigned comparison rA >= rB)<br />

• bgt rA, rB, LABEL (Signed comparison rA > rB)<br />

This is a pseudoinstruction implemented as <strong>the</strong> blt instruction by swapping <strong>the</strong> register operands.<br />

• bgtu rA, rB, LABEL (Unsigned comparison rA > rB)<br />

This is a pseudoinstruction implemented as <strong>the</strong> bltu instruction by swapping <strong>the</strong> register operands.<br />

• ble rA, rB, LABEL (Signed comparison rA


6.10 Control Instructions<br />

The <strong>Nios</strong> <strong>II</strong> control registers can be read and written by special instructions. The Read Control Register instruction<br />

rdctl rC, ctlN<br />

copies <strong>the</strong> contents of control register ctlN in<strong>to</strong> register C.<br />

The Write Control Register instruction<br />

wrctl ctlN, rA<br />

copies <strong>the</strong> contents of register A in<strong>to</strong> <strong>the</strong> control register ctlN.<br />

There are two instructions provided for dealing with exceptions: trap and eret. They are similar <strong>to</strong> <strong>the</strong> call<br />

and ret instructions, but <strong>the</strong>y are used for exceptions. Their use is discussed in section 8.2.<br />

The instructions break and bret generate breaks and return from breaks. They are used exclusively by <strong>the</strong><br />

software debugging <strong>to</strong>ols.<br />

The <strong>Nios</strong> <strong>II</strong> cache memories are managed with <strong>the</strong> instructions: flushd (Flush Data Cache Line), flushi (Flush<br />

Instruction Cache Line), initd (Initialize Data Cache Line), and initi (Initialize Instruction Cache Line). These<br />

instructions are discussed in section 9.1.<br />

6.11 Carry and Overflow Detection<br />

As pointed out in section 6.2, <strong>the</strong> Add and Subtract instructions perform <strong>the</strong> corresponding operations in <strong>the</strong> same<br />

way for both signed and unsigned operands. The possible carry and arithmetic overflow conditions are not detected,<br />

because <strong>Nios</strong> <strong>II</strong> does not contain condition flags that might be set as a result. These conditions can be<br />

detected by using additional instructions.<br />

Consider <strong>the</strong> Add instruction<br />

add rC, rA, rB<br />

Having executed this instruction, a possible occurrence of a carry out of <strong>the</strong> most-significant bit (C 31 ) can be<br />

detected by checking whe<strong>the</strong>r <strong>the</strong> unsigned sum (in register C) is less than one of <strong>the</strong> unsigned operands. For<br />

example, if this instruction is followed by <strong>the</strong> instruction<br />

<strong>the</strong>n <strong>the</strong> carry bit will be written in<strong>to</strong> register D.<br />

cmpltu rD, rC, rA<br />

Similarly, if a branch is required when a carry occurs, this can be accomplished as follows:<br />

add<br />

bltu<br />

rC, rA, rB<br />

rC, rA, LABEL<br />

A test for arithmetic overflow can be done by checking <strong>the</strong> signs of <strong>the</strong> summands and <strong>the</strong> resulting sum. An<br />

overflow occurs if two positive numbers produce a negative sum, or if two negative numbers produce a positive<br />

sum. Using this approach, <strong>the</strong> overflow condition can control a conditional branch as follows:<br />

add rC, rA, rB /* The required Add operation */<br />

xor rD, rC, rA /* Compare signs of sum and rA */<br />

xor rE, rC, rB /* Compare signs of sum and rB */<br />

and rD, rD, rE /* Set D 31 = 1 if ((A 31 == B 31 ) ! = C 31 ) */<br />

blt rD, r0, LABEL /* Branch if overflow occurred */<br />

15


A similar approach can be used <strong>to</strong> detect <strong>the</strong> carry and overflow conditions in Subtract operations. A carry out<br />

of <strong>the</strong> most-significant bit of <strong>the</strong> resulting difference can be detected by checking whe<strong>the</strong>r <strong>the</strong> first operand is less<br />

than <strong>the</strong> second operand. Thus, <strong>the</strong> carry can be used <strong>to</strong> control a conditional branch as follows:<br />

sub<br />

bltu<br />

rC, rA, rB<br />

rA, rB, LABEL<br />

The arithmetic overflow in a Subtract operation is detected by comparing <strong>the</strong> sign of <strong>the</strong> generated difference with<br />

<strong>the</strong> signs of <strong>the</strong> operands. Overflow occurs if <strong>the</strong> operands in registers A and B have different signs, and <strong>the</strong> sign<br />

of <strong>the</strong> difference in register C is different than <strong>the</strong> sign of A. Thus, a conditional branch based on <strong>the</strong> arithmetic<br />

overflow can be achieved as follows:<br />

sub rC, rA, rB /* The required Subtract operation */<br />

xor rD, rA, rB /* Compare signs of rA and rB */<br />

xor rE, rA, rC /* Compare signs of rA and rC */<br />

and rD, rD, rE /* Set D 31 = 1 if ((A 31 ! = B 31 ) && (A 31 ! = C 31 )) */<br />

blt rD, r0, LABEL /* Branch if overflow occurred */<br />

7 Assembler Directives<br />

The <strong>Nios</strong> <strong>II</strong> Assembler conforms <strong>to</strong> <strong>the</strong> widely used GNU Assembler, which is software available in <strong>the</strong> public<br />

domain. Thus, <strong>the</strong> GNU Assembler directives can be used in <strong>Nios</strong> <strong>II</strong> programs. Assembler directives begin with a<br />

period. We describe some of <strong>the</strong> more frequently used assembler directives below.<br />

.ascii "string"...<br />

A string of ASC<strong>II</strong> characters is loaded in<strong>to</strong> consecutive byte addresses in <strong>the</strong> memory. Multiple strings, separated<br />

by commas, can be specified.<br />

.asciz "string"...<br />

This directive is <strong>the</strong> same as .ascii, except that each string is followed (terminated) by a zero byte.<br />

.byte expressions<br />

Expressions separated by commas are specified. Each expression is assembled in<strong>to</strong> <strong>the</strong> next byte. Examples of<br />

expressions are: 8, 5 + LABEL, and K − 6.<br />

.data<br />

Identifies <strong>the</strong> data that should be placed in <strong>the</strong> data section of <strong>the</strong> memory. The desired memory location for <strong>the</strong><br />

data section can be specified in <strong>the</strong> <strong>Altera</strong> Moni<strong>to</strong>r Program’s system configuration window.<br />

.end<br />

Marks <strong>the</strong> end of <strong>the</strong> source code file; everything after this directive is ignored by <strong>the</strong> assembler.<br />

.equ symbol, expression<br />

Sets <strong>the</strong> value of symbol <strong>to</strong> expression.<br />

16


.global symbol<br />

Makes symbol visible outside <strong>the</strong> assembled object file.<br />

.hword expressions<br />

Expressions separated by commas are specified. Each expression is assembled in<strong>to</strong> a 16-bit number.<br />

.include "filename"<br />

Provides a mechanism for including supporting files in a source program.<br />

.org new-lc<br />

Advances <strong>the</strong> location counter by new-lc, where new-lc is used as an offset from <strong>the</strong> starting location specified<br />

in <strong>the</strong> <strong>Altera</strong> Moni<strong>to</strong>r Program’s system configuration window. The .org directive may only increase <strong>the</strong> location<br />

counter, or leave it unchanged; it cannot move <strong>the</strong> location counter backwards.<br />

.skip size<br />

Emits <strong>the</strong> number of bytes specified in size; <strong>the</strong> value of each byte is zero.<br />

.text<br />

Identifies <strong>the</strong> code that should be placed in <strong>the</strong> text section of <strong>the</strong> memory. The desired memory location for <strong>the</strong><br />

text section can be specified in <strong>the</strong> <strong>Altera</strong> Moni<strong>to</strong>r Program’s system configuration window.<br />

.word expressions<br />

Expressions separated by commas are specified. Each expression is assembled in<strong>to</strong> a 32-bit number.<br />

8 Example Program<br />

As an illustration of <strong>Nios</strong> <strong>II</strong> instructions and assembler directives, Figure 6 gives an assembly language program<br />

that computes a dot product of two vec<strong>to</strong>rs, A and B. The vec<strong>to</strong>rs have n elements. The required computation is<br />

Dot product = ∑ n−1<br />

i=0<br />

A(i) × B(i)<br />

The vec<strong>to</strong>rs are s<strong>to</strong>red in memory locations at addresses AVECTOR and BVECTOR, respectively. The number of elements,<br />

n, is s<strong>to</strong>red in memory location N. The computed result is written in<strong>to</strong> memory location DOT_PRODUCT.<br />

Each vec<strong>to</strong>r element is assumed <strong>to</strong> be a signed 32-bit number.<br />

17


.include "nios_macros.s"<br />

.global _start<br />

_start:<br />

movia r2, AVECTOR /* Register r2 is a pointer <strong>to</strong> vec<strong>to</strong>r A */<br />

movia r3, BVECTOR /* Register r3 is a pointer <strong>to</strong> vec<strong>to</strong>r B */<br />

movia r4, N<br />

ldw r4, 0(r4) /* Register r4 is used as <strong>the</strong> counter for loop iterations */<br />

add r5, r0, r0 /* Register r5 is used <strong>to</strong> accumulate <strong>the</strong> product */<br />

LOOP: ldw r6, 0(r2) /* Load <strong>the</strong> next element of vec<strong>to</strong>r A */<br />

ldw r7, 0(r3) /* Load <strong>the</strong> next element of vec<strong>to</strong>r B */<br />

mul r8, r6, r7 /* Compute <strong>the</strong> product of next pair of elements */<br />

add r5, r5, r8 /* Add <strong>to</strong> <strong>the</strong> sum */<br />

addi r2, r2, 4 /* Increment <strong>the</strong> pointer <strong>to</strong> vec<strong>to</strong>r A */<br />

addi r3, r3, 4 /* Increment <strong>the</strong> pointer <strong>to</strong> vec<strong>to</strong>r B */<br />

subi r4, r4, 1 /* Decrement <strong>the</strong> counter */<br />

bgt r4, r0, LOOP /* Loop again if not finished */<br />

stw r5, DOT_PRODUCT(r0) /* S<strong>to</strong>re <strong>the</strong> result in memory */<br />

STOP: br STOP<br />

N:<br />

.word 6 /* Specify <strong>the</strong> number of elements */<br />

AVECTOR:<br />

.word 5, 3, −6, 19, 8, 12 /* Specify <strong>the</strong> elements of vec<strong>to</strong>r A */<br />

BVECTOR:<br />

.word 2, 14, −3, 2, −5, 36 /* Specify <strong>the</strong> elements of vec<strong>to</strong>r B */<br />

DOT_PRODUCT:<br />

.skip 4<br />

Figure 6. A program that computes <strong>the</strong> dot product of two vec<strong>to</strong>rs.<br />

Note that <strong>the</strong> program ends by continuously looping on <strong>the</strong> last Branch instruction. If instead we wanted <strong>to</strong> pass<br />

control <strong>to</strong> debugging software, we could replace this br instruction with <strong>the</strong> break instruction.<br />

The program includes <strong>the</strong> assembler directive<br />

.include "nios_macros.s"<br />

which informs <strong>the</strong> Assembler <strong>to</strong> use some macro commands that have been created for <strong>the</strong> <strong>Nios</strong> <strong>II</strong> processor. In<br />

this program, <strong>the</strong> macro used converts <strong>the</strong> movia pseudoinstruction in<strong>to</strong> two OR instructions as explained in section<br />

6.4.<br />

The directive<br />

.global _start<br />

indicates <strong>to</strong> <strong>the</strong> Assembler that <strong>the</strong> label _start is accessible outside <strong>the</strong> assembled object file. This label is <strong>the</strong><br />

default label we use <strong>to</strong> indicate <strong>to</strong> <strong>the</strong> Linker program <strong>the</strong> beginning of <strong>the</strong> application program.<br />

The program includes some sample data. It illustrates how <strong>the</strong> .word directive can be used <strong>to</strong> load data items<br />

in<strong>to</strong> memory. The memory locations involved are those that follow <strong>the</strong> location occupied by <strong>the</strong> br instruction.<br />

Since we have not explicitly specified <strong>the</strong> starting address of <strong>the</strong> program itself, <strong>the</strong> assembled code will be loaded<br />

in memory starting at address 0.<br />

To execute <strong>the</strong> program in Figure 6 on <strong>Altera</strong>’s DE2 board, it is necessary <strong>to</strong> implement a <strong>Nios</strong> <strong>II</strong> processor<br />

and its memory (which can be just <strong>the</strong> on-chip memory of <strong>the</strong> Cyclone <strong>II</strong> FPGA). Since <strong>the</strong> program includes <strong>the</strong><br />

18


Multiply instruction, it cannot be executed on <strong>the</strong> economy version of <strong>the</strong> processor, because <strong>Nios</strong> <strong>II</strong>/e does not<br />

support <strong>the</strong> mul instruction. Ei<strong>the</strong>r <strong>Nios</strong> <strong>II</strong>/s or <strong>Nios</strong> <strong>II</strong>/f processors can be used.<br />

The tu<strong>to</strong>rial <strong>Introduction</strong> <strong>to</strong> <strong>the</strong> <strong>Altera</strong> SOPC Builder explains how a <strong>Nios</strong> <strong>II</strong> system can be implemented.<br />

The tu<strong>to</strong>rial <strong>Altera</strong> Moni<strong>to</strong>r Program explains how an application program can be assembled, downloaded and<br />

executed on <strong>the</strong> DE2 board.<br />

9 Exception Processing<br />

An exception in <strong>the</strong> normal flow of program execution can be caused by:<br />

• <strong>Soft</strong>ware trap<br />

• Hardware interrupt<br />

• Unimplemented instruction<br />

In response <strong>to</strong> an exception <strong>the</strong> <strong>Nios</strong> <strong>II</strong> processor performs <strong>the</strong> following actions:<br />

1. Saves <strong>the</strong> existing processor status information by copying <strong>the</strong> contents of <strong>the</strong> status register (ctl0) in<strong>to</strong> <strong>the</strong><br />

estatus register (ctl1)<br />

2. Clears <strong>the</strong> U bit in <strong>the</strong> status register, <strong>to</strong> ensure that <strong>the</strong> processor is in <strong>the</strong> Supervisor mode<br />

3. Clears <strong>the</strong> PIE bit in <strong>the</strong> status register, thus disabling <strong>the</strong> additional external processor interrupts<br />

4. Writes <strong>the</strong> address of <strong>the</strong> instruction after <strong>the</strong> exception in<strong>to</strong> <strong>the</strong> ea register (r29)<br />

5. Transfers execution <strong>to</strong> <strong>the</strong> address of <strong>the</strong> exception handler which determines <strong>the</strong> cause of <strong>the</strong> exception and<br />

dispatches an appropriate exception routine <strong>to</strong> respond <strong>to</strong> <strong>the</strong> exception<br />

The address of <strong>the</strong> exception handler is specified at system generation time using <strong>the</strong> SOPC Builder, and it cannot<br />

be changed by software at run time. This address can be provided by <strong>the</strong> designer; o<strong>the</strong>rwise, <strong>the</strong> default address<br />

is 20 16 from <strong>the</strong> starting address of <strong>the</strong> main memory. For example, if <strong>the</strong> memory starts at address 0, <strong>the</strong>n <strong>the</strong><br />

default address of <strong>the</strong> exception handler is 0x00000020.<br />

9.1 <strong>Soft</strong>ware Trap<br />

A software exception occurs when a trap instruction is encountered in a program. This instruction saves <strong>the</strong><br />

address of <strong>the</strong> next instruction in <strong>the</strong> ea register (r29). Then, it disables interrupts and transfers execution <strong>to</strong> <strong>the</strong><br />

exception handler.<br />

In <strong>the</strong> exception-service routine <strong>the</strong> last instruction is eret (Exception Return), which returns execution control<br />

<strong>to</strong> <strong>the</strong> instruction that follows <strong>the</strong> trap instruction that caused <strong>the</strong> exception. The return address is given by <strong>the</strong><br />

contents of register ea. The eret instruction res<strong>to</strong>res <strong>the</strong> previous status of <strong>the</strong> processor by copying <strong>the</strong> contents<br />

of <strong>the</strong> estatus register in<strong>to</strong> <strong>the</strong> status register.<br />

A common use of <strong>the</strong> software trap is <strong>to</strong> transfer control <strong>to</strong> a different program, such as an operating system.<br />

9.2 Hardware Interrupts<br />

Hardware interrupts can be raised by external sources, such as I/O devices, by asserting one of <strong>the</strong> processor’s 32<br />

interrupt-request inputs, irq0 through irq31. An interrupt is generated only if <strong>the</strong> following three conditions are<br />

true:<br />

• The PIE bit in <strong>the</strong> status register is set <strong>to</strong> 1<br />

• An interrupt-request input, irqk, is asserted<br />

• The corresponding interrupt-enable bit, ctl3 k , is set <strong>to</strong> 1<br />

19


The contents of <strong>the</strong> ipending register (ctl4) indicate which interrupt requests are pending. An exception routine<br />

determines which of <strong>the</strong> pending interrupts has <strong>the</strong> highest priority, and transfers control <strong>to</strong> <strong>the</strong> corresponding<br />

interrupt-service routine.<br />

Upon completion of <strong>the</strong> interrupt-service routine, <strong>the</strong> execution control is returned <strong>to</strong> <strong>the</strong> interrupted program<br />

by means of <strong>the</strong> eret instruction, as explained above. However, since an external interrupt request is handled<br />

without first completing <strong>the</strong> instruction that is being executed when <strong>the</strong> interrupt occurs, <strong>the</strong> interrupted instruction<br />

must be re-executed upon return from <strong>the</strong> interrupt-service routine. To achieve this, <strong>the</strong> interrupt-service routine<br />

has <strong>to</strong> adjust <strong>the</strong> contents of <strong>the</strong> ea register which are at this time pointing <strong>to</strong> <strong>the</strong> next instruction of <strong>the</strong> interrupted<br />

program. Hence, <strong>the</strong> value in <strong>the</strong> ea register has <strong>to</strong> be decremented by 4 prior <strong>to</strong> executing <strong>the</strong> eret instruction.<br />

9.3 Unimplemented Instructions<br />

This exception occurs when <strong>the</strong> processor encounters a valid instruction that is not implemented in hardware. This<br />

may be <strong>the</strong> case with instructions such as mul and div. The exception handler may call a routine that emulates <strong>the</strong><br />

required operation in software.<br />

9.4 Determining <strong>the</strong> Type of Exception<br />

When an exception occurs, <strong>the</strong> exception-handling routine has <strong>to</strong> determine what type of exception has occurred.<br />

The order in which <strong>the</strong> exceptions should be checked is:<br />

1. Read <strong>the</strong> ipending register <strong>to</strong> see if a hardware interrupt has occurred; if so, <strong>the</strong>n go <strong>to</strong> <strong>the</strong> appropriate<br />

interrupt-service routine.<br />

2. Read <strong>the</strong> instruction that was being executed when <strong>the</strong> exception occurred. The address of this instruction<br />

is <strong>the</strong> value in <strong>the</strong> ea register minus 4. If this is <strong>the</strong> trap instruction, <strong>the</strong>n go <strong>to</strong> <strong>the</strong> software-trap-handling<br />

routine.<br />

3. O<strong>the</strong>rwise, <strong>the</strong> exception is due <strong>to</strong> an unimplemented instruction.<br />

9.5 Exception Processing Example<br />

The following example illustrates <strong>the</strong> <strong>Nios</strong> <strong>II</strong> code needed <strong>to</strong> deal with a hardware interrupt. We will assume that<br />

an I/O device raises an interrupt request on <strong>the</strong> interrupt-request input irq1. Also, let <strong>the</strong> exception handler start at<br />

address 0x020, and <strong>the</strong> interrupt-service routine for <strong>the</strong> irq1 request start at address 0x0100.<br />

Figure 7 shows a portion of <strong>the</strong> code that can be used for this purpose. The exception handler first determines<br />

<strong>the</strong> type of exception that has occurred. Having determined that <strong>the</strong>re is a hardware interrupt request, it finds <strong>the</strong><br />

specific interrupt by examining <strong>the</strong> bits of <strong>the</strong> et register which has a copy of control register ctl4. If bit et 1 is<br />

equal <strong>to</strong> 1, <strong>the</strong>n <strong>the</strong> <strong>the</strong> interrupt-service routine EXT_IRQ1 is executed. O<strong>the</strong>rwise, it is necessary <strong>to</strong> check for<br />

o<strong>the</strong>r possible interrupts.<br />

Note that in Figure 7 we are using register r13 in <strong>the</strong> process of testing whe<strong>the</strong>r <strong>the</strong> bit irq1 is set <strong>to</strong> 1. In a<br />

practical application program this register may also be used for some o<strong>the</strong>r purpose, in which case its contents<br />

should first be saved on <strong>the</strong> stack and later res<strong>to</strong>red prior <strong>to</strong> returning from <strong>the</strong> exception handler.<br />

20


.org 0x20<br />

/* Exception handler */<br />

rdctl et, ctl4 /* Check if external interrupt occurred */<br />

beq et, r0, OTHER_EXCEPTIONS /* If zero, check exceptions */<br />

subi ea, ea, 4 /* Hardware interrupt, decrement ea <strong>to</strong> execute <strong>the</strong> interrupted */<br />

/* instruction upon return <strong>to</strong> main program */<br />

andi r13, et, 2 /* Check if irq1 asserted */<br />

bne r13, r0, EXT_IRQ1 /* If yes, go <strong>to</strong> IRQ1 service routine */<br />

/* Instructions that check for o<strong>the</strong>r hardware interrupts should be placed here */<br />

OTHER_EXCEPTIONS:<br />

/* Instructions that check for o<strong>the</strong>r types of exceptions should be placed here */<br />

.org 0x100<br />

/* Interrupt-service routine for <strong>the</strong> desired hardware interrupt */<br />

EXT_IRQ1:<br />

/* Instructions that handle <strong>the</strong> irq1 interrupt request should be placed here */<br />

eret /* Return from exception */<br />

Figure 7. Code used <strong>to</strong> handle a hardware interrupt.<br />

10 Cache Memory<br />

As shown in Figure 4, a <strong>Nios</strong> <strong>II</strong> system can include instruction and data caches, which are implemented in <strong>the</strong><br />

memory blocks in <strong>the</strong> FPGA chip. The caches can be specified when a system is being designed by using <strong>the</strong><br />

SOPC Builder software. Inclusion of caches improves <strong>the</strong> performance of a <strong>Nios</strong> <strong>II</strong> system significantly, particularly<br />

when most of <strong>the</strong> main memory is provided by an external SDRAM chip, as is <strong>the</strong> case with <strong>Altera</strong>’s DE2<br />

board. Both instruction and data caches are direct-mapped.<br />

The instruction cache can be implemented in <strong>the</strong> fast and standard versions of <strong>the</strong> <strong>Nios</strong> <strong>II</strong> processor systems.<br />

It is organized in 8 words per cache line, and its size is a user-selectable design parameter.<br />

The data cache can be implemented only with <strong>the</strong> <strong>Nios</strong> <strong>II</strong>/f processor. It has a configurable line size of 4, 16<br />

or 32 bytes per cache line. Its overall size is also a user-selectable design parameter.<br />

10.1 Cache Management<br />

Cache management is handled by software. For this purpose <strong>the</strong> <strong>Nios</strong> <strong>II</strong> instruction set includes <strong>the</strong> following<br />

instructions:<br />

• initd IMMED16(rA) (Initialize data-cache line)<br />

Invalidates <strong>the</strong> line in <strong>the</strong> data cache that is associated with <strong>the</strong> address determined by adding <strong>the</strong> signextended<br />

value IMMED16 and <strong>the</strong> contents of register rA.<br />

• initi rA (Initialize instruction-cache line)<br />

Invalidates <strong>the</strong> line in <strong>the</strong> instruction cache that is associated with <strong>the</strong> address contained in register rA.<br />

21


• flushd IMMED16(rA) (Flush data-cache line)<br />

Computes <strong>the</strong> effective address by adding <strong>the</strong> sign-extended value IMMED16 and <strong>the</strong> contents of register<br />

rA. Then, it identifies <strong>the</strong> cache line associated with this effective address, writes any dirty data in <strong>the</strong> cache<br />

line back <strong>to</strong> memory, and invalidates <strong>the</strong> cache line.<br />

• flushi rA (Flush instruction-cache line)<br />

Invalidates <strong>the</strong> line in <strong>the</strong> instruction cache that is associated with <strong>the</strong> address contained in register rA.<br />

10.2 Cache Bypass Methods<br />

A <strong>Nios</strong> <strong>II</strong> processor uses its data cache in <strong>the</strong> standard manner. But, it also allows <strong>the</strong> cache <strong>to</strong> be bypassed in<br />

two ways. As mentioned in section 6.1, <strong>the</strong> Load and S<strong>to</strong>re instructions have a version intended for accessing I/O<br />

devices, where <strong>the</strong> effective address specifies a location in an I/O device interface. These instructions are: ldwio,<br />

ldbio, lduio, ldhio, ldhuio, stwio, stbio, and sthio. They bypass <strong>the</strong> data cache.<br />

Ano<strong>the</strong>r way of bypassing <strong>the</strong> data cache is by using bit 31 of an address as a tag that indicates whe<strong>the</strong>r <strong>the</strong><br />

processor should transfer <strong>the</strong> data <strong>to</strong>/from <strong>the</strong> cache, or bypass it. This feature is available only in <strong>the</strong> <strong>Nios</strong> <strong>II</strong>/f<br />

processor.<br />

Mixing cached and uncached accesses has <strong>to</strong> be done with care. O<strong>the</strong>rwise, <strong>the</strong> coherence of <strong>the</strong> cached data<br />

may be compromised.<br />

11 Tightly Coupled Memory<br />

As explained in section 4, a <strong>Nios</strong> <strong>II</strong> processor can access <strong>the</strong> memory blocks in <strong>the</strong> FPGA chip as a tightly coupled<br />

memory. This arrangement does not use <strong>the</strong> Avalon network. Instead, <strong>the</strong> tightly coupled memory is connected<br />

directly <strong>to</strong> <strong>the</strong> processor.<br />

Data in <strong>the</strong> tightly coupled memory is accessed using <strong>the</strong> normal Load and S<strong>to</strong>re instructions, such as ldw or<br />

stw. The <strong>Nios</strong> <strong>II</strong> control circuits determine if <strong>the</strong> address of a memory location is in <strong>the</strong> tightly coupled memory.<br />

Accesses <strong>to</strong> <strong>the</strong> tightly coupled memory bypass <strong>the</strong> caches. For <strong>the</strong> address span of <strong>the</strong> tightly coupled memory,<br />

<strong>the</strong> processor operates as if caches were not present.<br />

Copyright c○2008 <strong>Altera</strong> Corporation. All rights reserved. <strong>Altera</strong>, The Programmable Solutions Company, <strong>the</strong><br />

stylized <strong>Altera</strong> logo, specific device designations, and all o<strong>the</strong>r words and logos that are identified as trademarks<br />

and/or service marks are, unless noted o<strong>the</strong>rwise, <strong>the</strong> trademarks and service marks of <strong>Altera</strong> Corporation in<br />

<strong>the</strong> U.S. and o<strong>the</strong>r countries. All o<strong>the</strong>r product or service names are <strong>the</strong> property of <strong>the</strong>ir respective holders.<br />

<strong>Altera</strong> products are protected under numerous U.S. and foreign patents and pending applications, mask work<br />

rights, and copyrights. <strong>Altera</strong> warrants performance of its semiconduc<strong>to</strong>r products <strong>to</strong> current specifications in<br />

accordance with <strong>Altera</strong>’s standard warranty, but reserves <strong>the</strong> right <strong>to</strong> make changes <strong>to</strong> any products and services at<br />

any time without notice. <strong>Altera</strong> assumes no responsibility or liability arising out of <strong>the</strong> application or use of any<br />

information, product, or service described herein except as expressly agreed <strong>to</strong> in writing by <strong>Altera</strong> Corporation.<br />

<strong>Altera</strong> cus<strong>to</strong>mers are advised <strong>to</strong> obtain <strong>the</strong> latest version of device specifications before relying on any published<br />

information and before placing orders for products or services.<br />

This document is being provided on an “as-is” basis and as an accommodation and <strong>the</strong>refore all warranties, representations<br />

or guarantees of any kind (whe<strong>the</strong>r express, implied or statu<strong>to</strong>ry) including, without limitation, warranties<br />

of merchantability, non-infringement, or fitness for a particular purpose, are specifically disclaimed.<br />

22

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!