
FIGURE 5.15 Organizing the PEs of a multithreaded processor as a circular queue.

parallel on multiple PEs instead of time-sharing a single PE. As we look into the future, and the prospect of a billion transistors on a single chip, it seems inevitable that microprocessors will have multiple PEs.

PE Organization

The next issue of importance in a multithreaded processor is the organization of the PEs. This issue is strongly tied to the PE interconnect used. Most processors based on the sequential threads model organize the PEs as a circular queue, as shown in Fig. 5.15. The circular queue imposes a sequential order among the PEs, with the head pointer indicating the oldest active PE. When the tail PE is idle, a thread allocation unit (TAU) invokes the next thread (as per the sequential thread ordering) on the tail PE and advances the tail pointer. Completed threads are retired from the head of the PE queue, enforcing the required sequential ordering. Although this PE organization is tailored for sequential threads (from a sequential program), the multithreaded hardware can also execute multiple threads from different processes, if required.
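The head/tail discipline described above can be sketched in software. This is a minimal illustrative model, not an actual hardware interface; the names (pe_queue_t, tau_dispatch, retire_head) and the fixed PE count are assumptions made for the example.

```c
#include <stdbool.h>

#define NUM_PES 8  /* assumed queue size, matching the eight PEs in Fig. 5.15 */

typedef enum { PE_IDLE, PE_RUNNING, PE_DONE } pe_state_t;

typedef struct {
    pe_state_t state[NUM_PES];
    int thread_id[NUM_PES];  /* which thread each PE is executing */
    int head;                /* oldest active PE */
    int tail;                /* next PE to receive a thread */
    int next_thread;         /* sequential thread ordering */
} pe_queue_t;

/* TAU: if the tail PE is idle, invoke the next thread (in sequential
 * thread order) on it and advance the tail pointer. */
static bool tau_dispatch(pe_queue_t *q) {
    if (q->state[q->tail] != PE_IDLE)
        return false;                        /* all PEs busy */
    q->state[q->tail] = PE_RUNNING;
    q->thread_id[q->tail] = q->next_thread++;
    q->tail = (q->tail + 1) % NUM_PES;
    return true;
}

/* Retire only from the head, enforcing sequential ordering: a completed
 * thread waits until all older threads have retired before its PE is freed. */
static bool retire_head(pe_queue_t *q) {
    if (q->state[q->head] != PE_DONE)
        return false;                        /* oldest thread not finished */
    q->state[q->head] = PE_IDLE;
    q->head = (q->head + 1) % NUM_PES;
    return true;
}
```

Note how `retire_head` refuses to free a younger PE that finished early; that waiting is exactly the load-balancing cost discussed next.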

An important issue that needs to be considered when organizing the PEs as a circular queue is load balancing. If some PEs are assigned long threads and the rest short ones, only modest performance will be obtained: a short thread may complete soon and then perform no useful computation while it waits for longer predecessor threads to retire. To get good performance, threads should be of uniform length.¹ One option for dealing with load balancing, albeit with additional hardware complexity, is to let each physical PE have multiple virtual PEs and assign a thread to each of the virtual PEs.
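The virtual-PE option can be sketched as a mapping from the sequential thread order onto (physical PE, virtual PE) pairs. The structures, the round-robin mapping, and the degree of two virtual PEs per physical PE are illustrative assumptions, not a scheme prescribed by the text.

```c
#define NUM_PHYS_PES 4
#define VPES_PER_PE  2   /* assumed number of virtual PEs per physical PE */

typedef struct {
    int thread_id;       /* -1 when this virtual PE is idle */
} vpe_t;

typedef struct {
    vpe_t vpe[VPES_PER_PE];
} phys_pe_t;

/* The circular queue now has NUM_PHYS_PES * VPES_PER_PE slots. Adjacent
 * threads are spread round-robin across physical PEs, so a physical PE
 * stalled on a long thread still holds other threads it can work on. */
static void assign_thread(phys_pe_t pes[], int thread) {
    int slot = thread % (NUM_PHYS_PES * VPES_PER_PE);
    pes[slot % NUM_PHYS_PES].vpe[slot / NUM_PHYS_PES].thread_id = thread;
}
```

The extra hardware cost shows up as the replicated per-virtual-PE context (here, just `thread_id`; in real hardware, register state as well).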

Inter-PE Register Communication and Synchronization

As discussed earlier, a few multithreading approaches have a shared register space for all threads, and the rest do not. When threads share a common register space, the thread sequencing model has always been the sequential threads model. Because the semantics of this model are in line with sequential control flow, synchronization happens automatically once inter-PE register communication is handled properly.

Register File Implementation

When threads do not share a common register space, it is straightforward to implement the register file (RF): each PE can have its own register file, thereby providing fast register access. When threads share

¹ The actual, more stringent, requirement is that the thread execution times should be matched across all PEs. This is a more difficult problem, because it depends on intra- and inter-PE data dependences as well.

© 2002 by CRC Press LLC

[Figure 5.15: eight PEs, numbered 0–7, arranged as a circular queue; the head (H) and tail (T) pointers mark the oldest and newest active PEs, and the TAU dispatches threads at the tail.]
