Simultaneous Multithreading - Blending Thread-level and ...
...based on a hypothetical out-of-order issue superscalar microprocessor that resembles the MIPS R10000 and HP PA-8000 [7]. This approach evaluated more realistic processor configurations, again with the 8-threaded and 8-issue superscalar organization, and reached a throughput of 5.4 IPC on the same benchmarks.

The SMT processor simulated at the University of Karlsruhe [23] was based on a simplified PowerPC 604 processor with multimedia enhancement [21]. The simulations showed that a single-threaded, 8-issue maximum processor (assuming an abundance of resources) reaches an IPC of only 1.60, while an 8-threaded 8-issue processor reaches an IPC of 6.07. A more realistic processor model reaches an IPC of 1.21 in the single-threaded 8-issue model and 3.11 in the 8-threaded 8-issue model. Increasing the issue bandwidth from 4 to 8 yields only a marginal gain (except for the 4-threaded to 8-threaded maximum processor). Increasing the number of threads from single-threaded to 2-threaded or 4-threaded yields a high gain for the 2-issue to 8-issue models, and a significant gain for the 8-threaded model. The steepest performance increases arise for the 4-issue model from the single-threaded (IPC of 1.21) to the two-threaded (IPC of 2.07) and to the 4-threaded (IPC of 2.97) cases. In [21], 2-threaded 4-issue or 4-threaded 4-issue processor configurations are suggested as realistic next-generation processors.

A group at the University of California at Irvine combined out-of-order execution within an instruction stream with the simultaneous execution of instructions of different instruction streams [19], which resulted in the superscalar digital signal processor. Based on simulations, a performance gain of 20–55% due to multithreading was achieved across a range of benchmarks.
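The Karlsruhe IPC figures quoted above can be read as throughput gains over the single-threaded case by dividing each multithreaded IPC by the single-threaded IPC of the same configuration. The following is a minimal sketch of that arithmetic only. The dictionary layout, the configuration labels, and the function name `multithreading_gain` are illustrative assumptions and are not taken from the cited work. The text does not state which resource model the 4-issue figures belong to, so they are kept in a separate group.

```python
# IPC values as quoted in the text for the Karlsruhe simulations.
# Keys and labels are hypothetical; only the numbers come from the text.
IPC = {
    "maximum 8-issue": {1: 1.60, 8: 6.07},
    "realistic 8-issue": {1: 1.21, 8: 3.11},
    # Resource model not stated in the text for these figures.
    "4-issue": {1: 1.21, 2: 2.07, 4: 2.97},
}


def multithreading_gain(config: str, threads: int) -> float:
    """Return IPC with `threads` threads divided by single-threaded IPC."""
    by_threads = IPC[config]
    return by_threads[threads] / by_threads[1]


if __name__ == "__main__":
    for config, by_threads in IPC.items():
        for threads in sorted(by_threads):
            if threads == 1:
                continue  # the single-threaded case is the baseline
            gain = multithreading_gain(config, threads)
            print(f"{config}, {threads} threads: {gain:.2f}x")
```

With the quoted figures, the ratio is about 3.79 for the 8-threaded maximum model and about 2.57 for the 8-threaded realistic model. In both cases the gain is well below the thread count of 8.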
Similarly, a group at the Polytechnic University of Catalunya combined simultaneous multithreaded execution and out-of-order execution with an integrated vector unit and vector instructions [8]. Recently, a commercial four-threaded SMT processor, the Alpha 21464, has been announced.

6 Conclusion

Research on multithreaded architectures has been motivated by two concerns: tolerating memory latency and bridging synchronization waits by rapid context switches. Older multithreaded processor approaches from the 1980s usually extend scalar RISC processors by a multithreading technique and focus on effectively bridging very long remote memory access latencies. Such processors will only be useful as processor nodes in distributed-shared-memory multiprocessors. However, developing a processor that is specifically designed for distributed-shared-memory multiprocessors is commonly regarded as too expensive. Multiprocessors today comprise standard off-the-shelf microprocessors and almost never specifically designed processors (with the exception of Tera MTA and SPELL). Therefore, newer multithreaded processor approaches also strive to tolerate smaller latencies that arise from primary cache misses that hit in the secondary cache, from long-latency operations, or even from unpredictable branches.

Multithreaded processors aim at a low execution time of a multithreaded workload, while a superscalar processor aims at a low execution time of a single program. Depending on the implemented multithreading technique, a multithreaded processor running only a single thread does not reach the same efficiency as a comparable single-threaded processor. The penalty may be only slight in the case of a block-interleaving processor, or the run time may be several times as long as on a single-threaded processor in the case of a cycle-by-cycle interleaving processor.

References:

[1] A. Agarwal, J. Babb, D. Chaiken, G. D'Souza, K. L. Johnson, D. Kranz, J. Kubiatowicz, B.-H. Lim, G. Maa, K. Mackenzie, Sparcle: A multithreaded VLSI processor for parallel computing, Lect. Notes Comput. Sc., Vol. 748, 1993, pp. 359.
[2] R. Alverson, D. Callahan, D. Cummings, B. Koblenz, A. Porterfield, J.B. Smith, The Tera computer system, Proc. 1990 Int. Conf. Supercomput., Amsterdam, The Netherlands, June 1990, pp. 1-6.
[3] U. Brinkschulte, C. Krakowski, J. Kreuzinger, T. Ungerer, A multithreaded Java microcontroller for thread-oriented real-time event-handling, Proc. 1999 Conf. PACT, Newport Beach, CA, 1999, pp. 34-39.
[4] M. Butler, T.-Y. Yeh, Y.N. Patt, M. Alsup, H. Scales, M. Shebanow, Single instruction stream parallelism is greater than two, Proc. 18th Ann. Symp. Comp. Arch., Toronto, Canada, May 1991, pp. 276-286.
[5] M. Dorojevets, COOL Multithreading in HTMT SPELL-1 processors, Intl. Journal on High Speed Electronics and Systems, 1999 (to be published).
[6] M.N. Dorozhevets, P. Wolcott, The El'brus-3 and MARS-M: Recent advances in Russian high-performance computing, The Journal of Supercomputing, Vol. 6, 1992, pp. 5-48.
[7] S.J. Eggers, J.S. Emer, H.M. Levy, J.L. Lo, R.M. Stamm, D.M. Tullsen, Simultaneous multithreading: A platform for next-generation processors, IEEE Micro, Vol. 17, September/October 1997, pp. 12-19.
[8] R. Espasa, M. Valero, Exploiting instruction- and data-level parallelism, IEEE Micro, Vol. 17, September/October 1997, pp. 20-27.
[9] A. Formella, J. Keller, T. Walle, HPP: A high performance PRAM, Lect. Notes Comput. Sc., Vol. 1123, 1996, pp. 425-434.