
multithreaded hardware, possibly by making it part of the ISA (at the expense of incompatibility for existing binaries). Parallelizing compilers have been successful in parallelizing many numeric applications for the parallel threads model. As pointed out earlier, however, their success with non-numeric applications under the parallel threads model has been modest. Several researchers are currently working on parallelizing compilers that parallelize such applications for the sequential threads model.
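The contrast drawn above can be made concrete. The sketch below (illustrative only; the function names and the two-way split are invented, not from the text) shows a numeric loop whose iterations are independent and can be divided among threads, next to a pointer-chasing loop whose loop-carried dependence defeats that simple partitioning:

```python
# Illustrative sketch: why numeric loops are easy targets for
# parallelizing compilers while pointer-chasing code is not.
from concurrent.futures import ThreadPoolExecutor

def saxpy(a, x, y):
    """Numeric DOALL loop: iterations are independent, so the index
    range can be split across parallel threads."""
    n = len(x)
    out = [0.0] * n
    def chunk(lo, hi):
        for i in range(lo, hi):          # no loop-carried dependence
            out[i] = a * x[i] + y[i]
    mid = n // 2
    with ThreadPoolExecutor(max_workers=2) as pool:
        # the two halves run concurrently; the context exit waits for both
        list(pool.map(lambda r: chunk(*r), [(0, mid), (mid, n)]))
    return out

def list_sum(node):
    """Pointer-chasing loop: each iteration needs the node produced by
    the previous one, a serial chain of dependences."""
    total = 0
    while node is not None:
        total += node[0]
        node = node[1]                   # next node only known after this step
    return total
```

A compiler can prove the `saxpy` iterations independent and emit parallel threads; for `list_sum` it cannot, which is why such code is targeted at the sequential threads model instead.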

• Hardware: It is also possible to let the run-time hardware do the program partitioning. If partitioning decisions are taken by the hardware, the multithreaded processor provides object code compatibility with existing sequential code. Furthermore, it has the ability to adapt to run-time behavior. Hardware-based partitioning is typically done only if the thread granularity is small and sequential control flow is used. Its main limitation is the significant impact it may have on clock cycle time. To simplify the dynamic partitioning hardware and to reduce this impact, the partitioning job is often split into two parts: a static part (done by pre-processing hardware) and a dynamic part. The static part collects information that is static in nature (such as register dependences in a straight-line piece of code) and stores it in a special i-cache structure, often after performing some additional processing. The dynamic part uses this information when deciding the final partitioning at run-time. Examples of multithreaded processors that use hardware-based partitioning are the trace processor [28,36], the speculative multithreading processor [22], and the dynamic multithreading processor [1].

Compiling for Multithreading

Most of the multithreading approaches perform partitioning at compile time, possibly with some help from the programmer; it is somewhat unrealistic at this time to expect programmers to write only parallel programs. The hardware is also limited in its program partitioning capability. Therefore, the compiler has the potential to play a significant role in multithreading. Besides program partitioning, it can schedule threads as well as the instructions within threads.

The task of the compiler is to identify sufficient parallelism to keep the processors busy, while minimizing the effects of synchronization and communication latencies on the execution time of the program. To accomplish this objective, a parallelizing compiler typically performs the following functions:

1. Identify the parallelism inherent in the program. This phase has received the most attention in parallel compiler research to date [25,26]. A wide variety of program transformations have been developed to unearth parallelism buried in the semantics of sequential programs.

2. Partition the program into multiple threads for parallel execution. This is perhaps the most crucial phase. Many factors must be considered, such as inter-thread dependences, intra-thread locality, thread size, critical path, and deadlock avoidance.

3. Schedule the concurrent execution of threads; the final scheduling is often determined by the run-time environment. The compiler must assign threads to processors in a way that maximizes processor utilization without severely restricting the amount of parallelism to be exploited.

4. After program partitioning, the compiler can schedule the instructions within each thread so as to reduce inter-thread wait times. For instance, if a shared value is produced very late in one thread but is needed very early in another, very little parallelism will be exploited by the hardware. This problem is likely to surface frequently if the compiler assumed a single-threaded processor during code generation. In such situations, post-partitioning scheduling can help minimize the waiting time of instructions by ensuring that shared values required in other threads are produced as early as possible. Post-partitioning scheduling is especially beneficial if the PEs execute their instructions in strict serial order.
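Step 4 above can be sketched as a small list scheduler. The sketch is a simplified assumption-laden model (the instruction representation and the `exported` set are invented for illustration): among the instructions whose operands are ready, it prefers those whose results other threads are waiting on, so shared values are produced as early as dependences allow:

```python
# Hedged sketch of post-partitioning scheduling: within one thread,
# hoist producers of values consumed by *other* threads.
def schedule_thread(instrs, exported):
    """instrs: list of (name, dest, [srcs]) in original program order.
    exported: set of value names consumed by other threads.
    Returns the instructions in a new, dependence-respecting order."""
    produced = {dest for _, dest, _ in instrs}   # values defined in this thread
    defined, order = set(), []
    remaining = list(instrs)
    while remaining:
        # an instruction is ready once every source is either already
        # computed here or comes from outside the block
        ready = [ins for ins in remaining
                 if all(s in defined or s not in produced for s in ins[2])]
        # prefer ready instructions whose result another thread needs
        ready.sort(key=lambda ins: ins[1] not in exported)
        pick = ready[0]
        order.append(pick)
        defined.add(pick[1])
        remaining.remove(pick)
    return order
```

With `exported = {"s"}`, an instruction computing `s` that originally sat last in the thread is moved ahead of independent local work, shrinking the time the consuming thread spends waiting, which is exactly the effect the text describes.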

Object Code Compatibility

Another important issue, especially from the commercial point of view, is the level of compatibility that the multithreaded processor provides. We can think of three levels of compatibility in the context of multithreaded processors: full compatibility, family-wide compatibility, and no compatibility.

© 2002 by CRC Press LLC
