12.07.2015 Views

Simultaneous Multithreading – Blending Thread-level and ...


... based on a hypothetical out-of-order issue superscalar microprocessor that resembles the MIPS R10000 and HP PA-8000 [7]. This approach evaluated more realistic processor configurations, again with the 8-threaded and 8-issue superscalar organization, and reached a throughput of 5.4 IPC on the same benchmarks.

The SMT processor simulated at the University of Karlsruhe [23] was based on a simplified PowerPC 604 processor with multimedia enhancement [21]. The simulations showed that a single-threaded, 8-issue maximum processor (assuming an abundance of resources) reaches an IPC of only 1.60, while an 8-threaded 8-issue processor reaches an IPC of 6.07. A more realistic processor model reaches an IPC of 1.21 in the single-threaded 8-issue model and 3.11 in the 8-threaded 8-issue model. Increasing the issue bandwidth from 4 to 8 yields only a marginal gain (except for the 4-threaded to 8-threaded maximum processor). Increasing the number of threads from single-threaded to 2-threaded or 4-threaded yields a high gain for the 2-issue to 8-issue model, and a significant gain for the 8-threaded model. The steepest performance increases arise for the 4-issue model from the single-threaded (IPC of 1.21) to the two-threaded (IPC of 2.07) and to the 4-threaded (IPC of 2.97) cases. In [21], 2-threaded 4-issue and 4-threaded 4-issue processor configurations are suggested as realistic next-generation processors.

Researchers at the University of California at Irvine combined out-of-order execution within an instruction stream with the simultaneous execution of instructions from different instruction streams [19], which resulted in the superscalar digital signal processor. In simulations, multithreading yielded a performance gain of 20–55% across a range of benchmarks.
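
The Karlsruhe IPC figures quoted above can be turned into relative gains with a little arithmetic. The following is a minimal sketch, not part of the original study: the configuration labels and the helper names `speedups` and `utilization` are illustrative assumptions, and the text does not say which processor model the 4-issue figures belong to, so they are kept in a separate group.

```python
# (issue width, {thread count: IPC}) per configuration, copied from the text.
# The labels are illustrative names, not terms defined by the cited study.
CONFIGS = {
    "maximum 8-issue": (8, {1: 1.60, 8: 6.07}),
    "realistic 8-issue": (8, {1: 1.21, 8: 3.11}),
    # Assumption: the text does not state which model these figures belong to.
    "4-issue": (4, {1: 1.21, 2: 2.07, 4: 2.97}),
}


def speedups(ipc_by_threads):
    """Return {threads: IPC / single-threaded IPC} for one configuration."""
    base = ipc_by_threads[1]
    return {t: ipc / base for t, ipc in ipc_by_threads.items()}


def utilization(issue_width, ipc_by_threads):
    """Return {threads: fraction of issue slots filled per cycle}."""
    return {t: ipc / issue_width for t, ipc in ipc_by_threads.items()}


for name, (width, ipcs) in CONFIGS.items():
    gain = speedups(ipcs)
    used = utilization(width, ipcs)
    for t in sorted(ipcs):
        print(f"{name}, {t}-threaded: IPC {ipcs[t]:.2f}, "
              f"speedup {gain[t]:.2f}x, issue slots used {used[t]:.0%}")
```

Under these figures, eight threads raise throughput by a factor of roughly 3.8 on the maximum processor and roughly 2.6 on the more realistic one, while the 4-issue model gains roughly 1.7 with two threads and 2.5 with four. Even the best case fills only about 76% of the eight issue slots.
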
Researchers at the Polytechnic University of Catalunya likewise combined simultaneous multithreaded execution and out-of-order execution, adding an integrated vector unit and vector instructions [8]. Recently, a commercial four-threaded SMT processor, the Alpha 21464, has been announced.

6 Conclusion

Research on multithreaded architectures has been motivated by two concerns: tolerating memory latency and bridging synchronization waits by rapid context switches. Older multithreaded processor approaches from the 1980s usually extend scalar RISC processors by a multithreading technique and focus on effectively bridging very long remote memory access latencies. Such processors will only be useful as processor nodes in distributed-shared-memory multiprocessors. However, developing a processor that is specifically designed for distributed-shared-memory multiprocessors is commonly regarded as too expensive. Multiprocessors today comprise standard off-the-shelf microprocessors and almost never specifically designed processors (with the exception of Tera MTA and SPELL). Therefore, newer multithreaded processor approaches also strive to tolerate smaller latencies that arise from primary cache misses that hit in the secondary cache, from long-latency operations, or even from unpredictable branches.

Multithreaded processors aim at a low execution time of a multithreaded workload, while a superscalar processor aims at a low execution time of a single program. Depending on the implemented multithreading technique, a multithreaded processor running only a single thread does not reach the same efficiency as a comparable single-threaded processor. The penalty may be only slight in the case of a block-interleaving processor, or the run-time may be several times as long as on a single-threaded processor in the case of a cycle-by-cycle interleaving processor.

References:

[1] A. Agarwal, J. Babb, D. Chaiken, G. D'Souza, K.L. Johnson, D. Kranz, J. Kubiatowicz, B.-H. Lim, G. Maa, K. Mackenzie, Sparcle: A multithreaded VLSI processor for parallel computing, Lect. Notes Comput. Sc., Vol.748, 1993, pp.359.
[2] R. Alverson, D. Callahan, D. Cummings, B. Koblenz, A. Porterfield, J.B. Smith, The Tera computer system, Proc. 1990 Int. Conf. Supercomput., Amsterdam, The Netherlands, June 1990, pp.1-6.
[3] U. Brinkschulte, C. Krakowski, J. Kreuzinger, T. Ungerer, A multithreaded Java microcontroller for thread-oriented real-time event-handling, Proc. 1999 Conf. PACT, Newport Beach, CA, 1999, pp.34-39.
[4] M. Butler, T.-Y. Yeh, Y.N. Patt, M. Alsup, H. Scales, M. Shebanow, Single instruction stream parallelism is greater than two, Proc. 18th Ann. Symp. Comp. Arch., Toronto, Canada, May 1991, pp.276-286.
[5] M. Dorojevets, COOL Multithreading in HTMT SPELL-1 processors, Intl. Journal on High Speed Electronics and Systems, 1999 (to be published).
[6] M.N. Dorozhevets, P. Wolcott, The El'brus-3 and MARS-M: Recent advances in Russian high-performance computing, The Journal of Supercomputing, Vol.6, 1992, pp.5-48.
[7] S.J. Eggers, J.S. Emer, H.M. Levy, J.L. Lo, R.M. Stamm, D.M. Tullsen, Simultaneous multithreading: A platform for next-generation processors, IEEE Micro, Vol.17, September/October 1997, pp.12-19.
[8] R. Espasa, M. Valero, Exploiting instruction- and data-level parallelism, IEEE Micro, Vol.17, September/October 1997, pp.20-27.
[9] A. Formella, J. Keller, T. Walle, HPP: A high performance PRAM, Lect. Notes Comput. Sc., Vol.1123, 1996, pp.425-434.
