10.07.2015

80960KA EMBEDDED 32-BIT MICROPROCESSOR - Datasheet ...


80960KA

…purpose registers provided in other popular microprocessors. The term global refers to the fact that these registers retain their contents across procedure calls. The local registers, on the other hand, are procedure-specific. For each procedure call, the 80960KA allocates 16 local registers (R0 through R15). Each local register is 32 bits wide.

1.1.4. Multiple Register Sets

To further increase the efficiency of the register set, multiple sets of local registers are stored on-chip (see Figure 4). This cache holds up to four local register frames, which means that up to three procedure calls can be made without having to access the procedure stack resident in memory.

Although programs may have procedure calls nested many calls deep, a program typically oscillates back and forth between only two or three levels. As a result, with four stack frames in the cache, the probability of having a free frame available in the cache when a call is made is very high. In fact, runs of representative C-language programs show that 80% of the calls are handled without needing to access memory.

If four or more procedures are active and a new procedure is called, the 80960KA moves the oldest local register set in the stack-frame cache to a procedure stack in memory to make room for a new set of registers. Global register G15 is the frame pointer (FP) to the procedure stack.

Global registers are not exchanged on a procedure call, but retain their contents, making them available to all procedures for fast parameter passing.

1.1.5. Instruction Cache

To further reduce memory accesses, the 80960KA includes a 512-byte on-chip instruction cache. The instruction cache is based on the concept of locality of reference: most programs are not executed in a steady stream but consist of many branches, loops, and procedure calls that jump back and forth within the same small section of code.
Thus, by keeping a block of instructions in the cache, the processor greatly reduces the number of memory references required to read instructions.

To load the instruction cache, instructions are fetched in 16-byte blocks; up to four instructions can be fetched at one time. An efficient prefetch algorithm increases the probability that an instruction will already be in the cache when it is needed.

Code for small loops often fits entirely within the cache, leading to a great increase in processing speed, since further memory references might not be necessary until the program exits the loop. Similarly, when calling short procedures, the code for the calling procedure is likely to remain in the cache, so it will be there on the procedure's return.

1.1.6. Register Scoreboarding

The instruction decoder is optimized in several ways. One optimization is the ability to overlap instructions by using register scoreboarding.

Register scoreboarding occurs when a LOAD moves a variable from memory into a register. When the instruction initiates, a scoreboard bit on the target register is set. Once the register is loaded, the bit is reset. In between, any reference to the register contents is accompanied by a test of the scoreboard bit to ensure that the load has completed before processing continues. Since the processor does not need to wait for the LOAD to complete, it can execute additional instructions placed between the LOAD and the instruction that uses the register contents, as shown in the following example:

    ld   data_2, r4
    ld   data_2, r5
    Unrelated instruction
    Unrelated instruction
    add  R4, R5, R6

In essence, the two unrelated instructions between the LOAD and the ADD are executed “for free” (i.e., take no apparent time to execute) because they are executed while the registers are being loaded. Up to three load instructions can be pending at one time, with three corresponding scoreboard bits set. By exploiting this feature, system programmers and compiler writers have a useful tool for optimizing execution speed.
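The scoreboarding example can be illustrated with a small cycle-counting model. This is a toy sketch, not the 80960KA pipeline: it assumes one instruction issues per cycle in program order, uses a hypothetical load latency of 4 cycles (LOAD_LATENCY is not a datasheet figure), and only checks source registers against the scoreboard. The function name issue_cycles and the "other" opcode for unrelated instructions are illustrative inventions. It compares the datasheet's ordering against the same instructions with the ADD placed directly after the loads, and enforces the three-pending-loads limit.

```python
# Toy model of register scoreboarding. Assumptions for illustration:
#  - one instruction issues per cycle, in program order
#  - load data arrives LOAD_LATENCY cycles after the load issues
#  - other instructions produce their result one cycle after issue
#  - only SOURCE registers are checked against the scoreboard

LOAD_LATENCY = 4        # hypothetical value, not taken from the datasheet
MAX_PENDING_LOADS = 3   # up to three loads may be pending at once


def issue_cycles(program):
    """Return the number of cycles until the last instruction has issued.

    program: list of (op, dest, sources) tuples.
    """
    ready_at = {}   # register -> cycle at which its scoreboard bit clears
    load_done = []  # completion cycle of every issued load
    cycle = 0
    for op, dest, sources in program:
        # Stall while any source register still has its scoreboard bit set.
        start = max([cycle] + [ready_at.get(r, 0) for r in sources])
        if op == "ld":
            pending = [t for t in load_done if t > start]
            if len(pending) >= MAX_PENDING_LOADS:
                start = min(pending)  # wait for the oldest load to finish
            done = start + LOAD_LATENCY
            load_done.append(done)
            ready_at[dest] = done     # scoreboard bit on dest until `done`
        else:
            ready_at[dest] = start + 1
        cycle = start + 1
    return cycle


# Datasheet ordering: two unrelated instructions between the loads and ADD.
overlapped = [
    ("ld", "r4", []),
    ("ld", "r5", []),
    ("other", "r8", ["r9"]),    # unrelated instruction
    ("other", "r10", ["r11"]),  # unrelated instruction
    ("add", "r6", ["r4", "r5"]),
]
# Same instructions, but ADD placed right after the loads.
back_to_back = [overlapped[i] for i in (0, 1, 4, 2, 3)]

print(issue_cycles(overlapped), issue_cycles(back_to_back))  # 6 8
```

Under these assumptions the overlapped ordering finishes issuing two cycles earlier, which is exactly the cost of the two unrelated instructions: they fill cycles that would otherwise be spent stalled on r5.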
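The stack-frame cache described under 1.1.4 can also be sketched as a small counter model. The spill and reload policy here is an assumption made for illustration (spill the oldest frame only when a call finds all four on-chip sets in use; reload one frame when a return lands in a caller whose frame was spilled); the class FrameCache and its fields are hypothetical names, not part of any Intel tool. The sketch shows the two claims in the text: three nested calls need no memory traffic, and a program oscillating between two and three levels never touches the procedure stack.

```python
# Toy model of the 80960KA stack-frame cache (illustrative only).
# Assumptions (not from the datasheet): a frame is spilled only when a call
# finds all on-chip sets in use, and exactly one frame is reloaded when a
# return lands in a caller whose frame was spilled.

NUM_FRAMES = 4  # on-chip local register sets, R0..R15 each


class FrameCache:
    def __init__(self, num_frames: int = NUM_FRAMES) -> None:
        self.num_frames = num_frames
        self.depth = 1    # active procedures, counting the initial one
        self.cached = 1   # active frames currently held on chip
        self.spills = 0   # frames moved to the procedure stack in memory
        self.fills = 0    # frames read back from memory on return

    def call(self) -> None:
        self.depth += 1
        if self.cached == self.num_frames:
            self.spills += 1  # oldest on-chip frame goes to memory
        else:
            self.cached += 1

    def ret(self) -> None:
        if self.depth == 1:
            raise RuntimeError("no caller to return to")
        self.depth -= 1
        self.cached -= 1
        if self.cached == 0:
            self.fills += 1   # caller's frame was spilled; reload it
            self.cached = 1


# Three nested calls fit in the four on-chip frames; a fourth spills one.
fc = FrameCache()
for _ in range(3):
    fc.call()
print(fc.spills)  # 0
fc.call()
print(fc.spills)  # 1

# Oscillating between two and three levels never touches memory.
osc = FrameCache()
osc.call()
for _ in range(1000):
    osc.call()
    osc.ret()
print(osc.spills, osc.fills)  # 0 0
```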
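The claim in 1.1.5 that small loops run from the cache after their first pass can be checked with simple arithmetic: 512 bytes divided into 16-byte blocks gives 32 blocks. The sketch below counts block fetches for a straight-line loop body. It assumes a direct-mapped organization, which this section does not state, and ignores prefetching; count_block_fetches is a hypothetical helper. A 256-byte loop is fetched once and then hits on every pass, while a 528-byte loop (33 blocks) is one block too large, so in this model blocks 0 and 32 evict each other on every iteration.

```python
# Toy model of a 512-byte instruction cache filled in 16-byte blocks.
# Assumption (not stated in this section): direct-mapped organization,
# i.e. each block can live in only one of the 32 cache slots.

CACHE_BYTES = 512
BLOCK_BYTES = 16                        # up to four 32-bit instruction words
NUM_SLOTS = CACHE_BYTES // BLOCK_BYTES  # 32


def count_block_fetches(loop_start: int, loop_bytes: int, iterations: int) -> int:
    """Count 16-byte blocks fetched from memory while a straight-line loop
    body of loop_bytes bytes, starting at address loop_start, runs
    `iterations` times. Prefetching is ignored."""
    slots = [None] * NUM_SLOTS          # block number held in each slot
    fetches = 0
    first = loop_start // BLOCK_BYTES
    last = (loop_start + loop_bytes - 1) // BLOCK_BYTES
    for _ in range(iterations):
        for block in range(first, last + 1):
            slot = block % NUM_SLOTS
            if slots[slot] != block:    # miss: fetch the block from memory
                slots[slot] = block
                fetches += 1
    return fetches


print(count_block_fetches(0, 256, 100))  # 16: loop fits, misses only once
print(count_block_fetches(0, 528, 100))  # 231: blocks 0 and 32 keep evicting each other
```

A real cache with a different mapping or replacement policy would give different counts for the oversized loop; the fits-entirely case behaves the same under any organization.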
