01.12.2012 Views

Architecture of Computing Systems (Lecture Notes in Computer ...


threads  ALUTs  regs   MHz
1        14808   3857  27.17
2        21129   5968  26.10
3        27519   8061  24.38
4        31603  10125  17.55
5        39325  12195  11.10
6        45400  14271   8.52
7        49082  16378   7.00

How to Enhance a Superscalar Processor

Fig. 2. CarCore hardware characteristics depending on the number of threads (plot of ALUTs, regs, and freq against #threads, 1 to 7)

sent some cycles earlier so that the second microinstruction arrives at the
write-back stage in the same cycle as the data arrives from memory.

To avoid restricting the other threads from issuing memory operations,
address buffers are added. There is one address buffer per thread, located in
the memory controller. After a store or the first microinstruction of a load, the
thread is temporarily suspended from issuing further instructions. When the
memory instruction arrives at the memory controller, the address is saved in
the address buffer of the corresponding thread. Whenever a memory operation
completes, the memory controller examines the address buffers in priority order
and starts a new memory operation if there is a valid entry. In the same cycle, the
memory controller notifies the RTI to resume issuing instructions of the thread
whose data word has just arrived. Depending on the kind of instruction, the RTI
continues with the second microinstruction of a load or with the next instruction
after a store.
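The address buffer scheme described above can be sketched as a small cycle-level model. This is only an illustration under stated assumptions, not the CarCore implementation: the class name `MemoryController`, its methods, the fixed per-access latency, and the convention that thread 0 has the highest priority are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MemoryController:
    """Toy model: one address buffer per thread, served in priority order."""
    num_threads: int
    latency: int = 4                  # cycles per access (assumed fixed here)
    buffers: list = field(default_factory=list)
    active: Optional[int] = None      # thread whose access is in flight
    remaining: int = 0                # cycles until that access completes

    def __post_init__(self):
        # One buffer slot per thread; None marks an invalid (empty) entry.
        self.buffers = [None] * self.num_threads

    def request(self, thread: int, address: int) -> None:
        """A memory instruction arrives; its thread is already suspended."""
        self.buffers[thread] = address

    def tick(self) -> Optional[int]:
        """Advance one cycle; return the thread the RTI may resume, if any."""
        resumed = None
        if self.active is not None:
            self.remaining -= 1
            if self.remaining <= 0:
                resumed = self.active     # data word has arrived
                self.active = None
        if self.active is None:
            # Scan buffers in priority order (index 0 = highest, an assumption).
            for thread, address in enumerate(self.buffers):
                if address is not None:
                    self.buffers[thread] = None
                    self.active, self.remaining = thread, self.latency
                    break
        return resumed


mc = MemoryController(num_threads=3, latency=2)
mc.request(2, 0x100)                  # low-priority thread asks first
mc.request(0, 0x200)                  # high-priority thread asks second
order = [t for t in (mc.tick() for _ in range(6)) if t is not None]
print(order)                          # prints [0, 2]
```

In this sketch a completing access and the start of the next one happen in the same call to `tick`, mirroring the statement that the controller resumes a thread and starts a new operation in the same cycle.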

The Split Phase Load / Address Buffer technique has another advantage:
the memory latency can vary, and there is no upper bound. If the memory
access is fast, the second microinstruction of the load is issued earlier; if it
takes longer, the second microinstruction (or, for a store, the next instruction)
is issued later. Multiple memories with different access times are therefore
supported.

5 Evaluation

We started our SMT enhancement with a single-threaded SystemC model of
the CarCore and extended it to support multithreading. The final SystemC
model was translated to VHDL for FPGA synthesis. There are separate data and
instruction memory buses, each 64 bits wide. In the FPGA model, the memory
latencies are fixed at 0 cycles for the instruction memory (internal on-chip RAM)
and 4 cycles for the off-chip data memory.

Fig. 2 shows the size of the CarCore on an Altera Stratix II EP2S180F1020C3
device. The numbers of Adaptive Lookup Tables (ALUTs) and register bits
grow nearly linearly with the number of threads. Each thread requires about 6000
ALUTs and 2000 registers, and the base processor adds about 9000 ALUTs and
2000 registers.
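The per-thread and base costs quoted above can be checked against the synthesis numbers listed with Fig. 2. The following is a minimal sketch using an ordinary least-squares line fit; the choice of fitting method and the helper name `linear_fit` are assumptions made for illustration, since the text does not say how the rounded figures were obtained.

```python
# Synthesis results listed with Fig. 2: (threads, ALUTs, register bits).
DATA = [
    (1, 14808, 3857),
    (2, 21129, 5968),
    (3, 27519, 8061),
    (4, 31603, 10125),
    (5, 39325, 12195),
    (6, 45400, 14271),
    (7, 49082, 16378),
]


def linear_fit(xs, ys):
    """Ordinary least-squares fit y = base + slope * x; returns (base, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


threads = [row[0] for row in DATA]
alut_base, alut_per_thread = linear_fit(threads, [row[1] for row in DATA])
reg_base, reg_per_thread = linear_fit(threads, [row[2] for row in DATA])

print(f"ALUTs ~ {alut_base:.0f} + {alut_per_thread:.0f} per thread")
print(f"regs  ~ {reg_base:.0f} + {reg_per_thread:.0f} per thread")
```

Under this fit the slope comes out at roughly 5800 ALUTs and 2100 register bits per thread, on a base of roughly 9400 ALUTs and 1800 register bits, which is in line with the rounded values in the text.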

