Soft-Core Processor Design - CiteSeer
underflow and overflow exceptions always occur in pairs in normal program operation,
because the procedure whose call caused the window underflow eventually returns and causes the
window overflow exception. We will refer to the register window underflow/overflow pair as a
register window exception in the rest of this section.
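The pairing of underflow and overflow exceptions can be illustrated with a minimal Python sketch. It assumes a simplified model that is not taken from the measured system: each underflow spills exactly one window to the stack and each overflow reloads exactly one. The function name `count_window_exceptions` and the event names `"call"` and `"ret"` are hypothetical.

```python
def count_window_exceptions(trace, num_windows):
    """Count (underflows, overflows) for a trace of 'call'/'ret' events.

    Simplified model (an assumption): one window is spilled per underflow
    and one window is reloaded per overflow.
    """
    resident = 1   # windows held in the register file; main's window at start
    spilled = 0    # windows currently saved on the stack
    underflows = 0
    overflows = 0
    for event in trace:
        if event == "call":
            if resident == num_windows:
                # No free window: spill the oldest one to make room.
                underflows += 1
                spilled += 1
            else:
                resident += 1
        elif event == "ret":
            if resident > 1:
                resident -= 1
            elif spilled > 0:
                # The caller's window was spilled earlier: reload it.
                overflows += 1
                spilled -= 1
            else:
                raise ValueError("return without a matching call")
        else:
            raise ValueError(f"unknown event: {event!r}")
    return underflows, overflows


# Five nested calls followed by five returns, with three windows available.
nested = ["call"] * 5 + ["ret"] * 5
print(count_window_exceptions(nested, 3))
```

In this model, any trace in which every call eventually returns yields equal underflow and overflow counts, which is the pairing described above.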
The register file size changes are simulated by writing appropriate values to the WVALID<br />
register. Since the program only uses registers between LO_LIMIT and HI_LIMIT, and the CWP<br />
Manager saves and reloads only these registers, the run times will be equivalent to those of a system
with the corresponding register file size. The physical register file size is 512 registers in all
experiments, which allows the use of 31 register windows. The number of register windows
available to the program varies from 1 to 29, because one register window is used by the GERMS<br />
monitor, and one is reserved for the interrupt handler. We also measured the performance of the<br />
benchmarks when the register windows are not used, but only one register window is visible to<br />
the program. This is achieved by compiling a program with the mflat compiler option, which<br />
instructs the compiler not to use the SAVE and RESTORE instructions, but to save the registers<br />
altered by a procedure on the stack on entrance to the procedure, and restore the registers from the<br />
stack when returning from the procedure [61]. This is different from the system with the CWP<br />
Manager and only one register window available. The CWP Manager always saves the whole<br />
register window on the stack, while the code generated with the mflat compiler option saves only<br />
the registers that are actually altered by the procedure. The CRC32 and Qsort benchmarks could not
be compiled with the mflat option because the compiler terminated with an “Internal Compiler
Error” message.
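The window counts quoted above can be checked with a short Python sketch. The register layout used here is an assumption chosen to reproduce the quoted figure of 31 windows, not something stated in the text: each window contributes 16 new registers, and 16 registers are fixed overhead. The constant and function names are hypothetical.

```python
# Assumed layout (not stated in the text): each additional window adds
# 16 registers, and 16 registers are fixed overhead shared by all windows.
REGS_PER_WINDOW = 16
FIXED_REGS = 16

# From the text: one window for the GERMS monitor, one for the interrupt handler.
RESERVED_WINDOWS = 2


def total_windows(register_file_size):
    """Number of register windows a register file of the given size holds."""
    return (register_file_size - FIXED_REGS) // REGS_PER_WINDOW


def program_windows(register_file_size):
    """Maximum number of windows left for the program after reservations."""
    return total_windows(register_file_size) - RESERVED_WINDOWS


print(total_windows(512))    # 31
print(program_windows(512))  # 29
```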
Figure 5.4 shows how the performance of the programs running on the ONCHIP system<br />
depends on the number of register windows available. Only toy benchmarks and two algorithms<br />
from the Bitcount benchmark with large datasets are shown. The other benchmarks are not<br />
presented, because they show trends similar to the ones presented in the figure. All values are<br />
program run times relative to the run time of the program when 29 register windows are<br />
available. Results beyond 12 register windows are not shown, because the relative run time remains close to
1. The point on the horizontal axis with zero available register windows corresponds to the
programs compiled with the mflat compiler option.<br />
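The normalization used for the figure can be sketched in Python as follows. The function name `relative_run_times` is hypothetical, and the numbers in the usage example are placeholders chosen only to exercise the function; they are not measurements from these experiments.

```python
def relative_run_times(run_times, baseline_windows=29):
    """Divide each run time by the run time at the baseline window count.

    run_times maps the number of available register windows to a run time;
    key 0 can stand for the program compiled with the mflat option.
    """
    baseline = run_times[baseline_windows]
    return {windows: t / baseline for windows, t in sorted(run_times.items())}


# Placeholder values, not measured data.
placeholder = {1: 300.0, 2: 150.0, 29: 100.0}
print(relative_run_times(placeholder))
```

A value of 1 means the program runs as fast as it does with 29 windows available, and larger values indicate a slowdown.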
There are three distinct trends visible in Figure 5.4. The performance of the Multiply<br />
benchmark does not depend on the register file size, because it does not contain calls to<br />
procedures that use the SAVE and RESTORE instructions. The Bitcount algorithm Non-recursive<br />
by bytes has only one procedure call from the main program to a bit counting procedure.<br />
Therefore, it experiences a slowdown when only one register window is available, because the<br />