Actas JP2011 - Universidad de La Laguna


Proceedings of the XXII Jornadas de Paralelismo (JP2011), La Laguna, Tenerife, 7-9 September 2011

…form or local, while the latter introduces contention and hotspots that reduce performance, as in complement or butterfly. Due to space limitations, only the results for three traffic patterns are shown, as they are representative of the behaviour observed for the rest. These are uniform, bit-complement and butterfly.

Figure 4 shows the throughput and latency of king networks using Knaive compared to those of 2D tori and meshes. It shows that, thanks to their increased degree, king networks outperform their baseline counterparts by more than a factor of two. The average latency at zero load is reduced in line with the theoretical values of the average distance. Packets are 16 phits long, which makes the latency improvement less obvious in the graphs. Observe that king meshes perform significantly better than 2D tori, both in throughput and in latency.

Figure 5 presents an analysis of the different routing techniques under the three traffic patterns for 8 × 8 king tori and meshes. Comparing the results of networks of different sizes highlights that the throughput per node is halved. This is due to the well-known fact that the number of nodes in square networks grows quadratically with the side while the bisection bandwidth grows only linearly, so the bandwidth available per node shrinks in inverse proportion to the side.

For benign traffic patterns, the best results are given by Knaive routing. Under adverse traffic, however, a noticeable decrease in performance is observed, caused by the reduced path diversity. As mentioned in Section III, this limitation is overcome by Kmiss routing. In fact, this routing yields poor performance under benign traffic patterns but very good performance under adverse ones.

Our composite routing algorithm KBugal gives the best average performance on all traffic patterns.
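KBugal itself is defined in Section III, which lies outside this excerpt, so its exact rule is not reproduced here. As a rough illustration of how a UGAL-style composite scheme can decide between a minimal and a non-minimal route, the following Python sketch applies the generic UGAL comparison (queue occupancy times hop count) on a king mesh, where the hop count between two nodes is the Chebyshev distance. The function names, the queue-occupancy arguments and the choice of intermediate node are all assumptions made for this example, not the authors' implementation.

```python
def king_mesh_distance(a, b):
    """Hop count between nodes a=(x, y) and b=(x, y) in a king mesh.

    King links join each node to its 8 neighbours, so the minimal
    hop count is the Chebyshev distance (no wraparound links assumed).
    """
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def ugal_choose(src, dst, intermediate, q_min, q_nonmin):
    """Generic UGAL-style decision (illustrative, not the paper's KBugal).

    q_min / q_nonmin are hypothetical queue occupancies observed at the
    source for the first hop of the minimal and the non-minimal route.
    The route with the smaller (occupancy * hop count) estimate wins.
    """
    hops_min = king_mesh_distance(src, dst)
    hops_nonmin = (king_mesh_distance(src, intermediate)
                   + king_mesh_distance(intermediate, dst))
    if q_min * hops_min <= q_nonmin * hops_nonmin:
        return "minimal"
    return "nonminimal"


if __name__ == "__main__":
    # Lightly loaded network: the minimal route is preferred.
    print(ugal_choose((0, 0), (3, 1), (5, 5), q_min=1, q_nonmin=1))
    # Congested minimal port: the detour becomes the better estimate.
    print(ugal_choose((0, 0), (3, 1), (5, 5), q_min=10, q_nonmin=2))
```

Under this kind of rule, a lightly loaded network behaves like minimal routing, while congestion on the minimal port pushes traffic onto the longer path, which matches the qualitative behaviour reported for a composite algorithm.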
In benign situations, the throughput of KBugal is slightly lower than that of Knaive. Under adverse traffic, its performance is similar to that of Kmiss routing, and even better in some situations. The results show that KBugal gives better performance than its more generic predecessor UGAL. As can be seen, under benign traffic an improvement of 15% is obtained, and under adverse traffic the improvement ranges between 10% (complement) and 90% (butterfly).

V. Conclusion

In this paper we have presented the foundations of king networks. Their topological properties offer tantalising possibilities, positioning them as clear candidates for future network-on-chip systems. Noteworthy are king meshes, which have the implementation simplicity and wire length of a mesh yet better performance than 2D tori. In addition, we have presented a series of routing techniques specific to king networks that are both adaptive and deadlock free, and that make it possible to exploit their topological richness. A first performance evaluation of these algorithms under synthetic traffic has been presented, in which their properties are highlighted. Further study will be required to take full advantage of these novel topologies, which promise higher throughput, lower latency, trivial partitioning and high fault tolerance.

Acknowledgment

This work has been funded by the Spanish Ministry of Education and Science (grant TIN2007-68023-C02-01, Consolider CSD2007-00050) and by the HiPEAC European Network of Excellence.

References

[1] J. Kim, W.J. Dally, S. Scott, and D. Abts, “Technology-driven, highly-scalable dragonfly topology,” SIGARCH Comput. Archit. News, vol. 36, no. 3, pp. 77–88, 2008.
[2] S. Scott, D. Abts, J. Kim, and W.J. Dally, “The BlackWidow high-radix Clos network,” SIGARCH Comput. Archit. News, vol. 34, no. 2, pp. 16–28, 2006.
[3] J. Kim, J. Balfour, and W. Dally, “Flattened butterfly topology for on-chip networks,” in MICRO 07: Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture, Washington, DC, USA, 2007, pp. 172–182, IEEE Computer Society.
[4] D. Wentzlaff, P. Griffin, H. Hoffmann, L. Bao, B. Edwards, C. Ramey, M. Mattina, C.-C. Miao, J.F.B. III, and A. Agarwal, “On-chip interconnection architecture of the tile processor,” IEEE Micro, vol. 27, pp. 15–31, 2007.
[5] S.R. Vangal, J. Howard, G. Ruhl, S. Dighe, H. Wilson, J. Tschanz, D. Finan, A. Singh, T. Jacob, S. Jain, V. Erraguntla, C. Roberts, Y. Hoskote, N. Borkar, and S. Borkar, “An 80-tile sub-100-W teraflops processor in 65-nm CMOS,” Solid-State Circuits, IEEE Journal of, vol. 43, no. 1, pp. 29–41, 2008.
[6] M. Igarashi, T. Mitsuhashi, A. Le, S. Kazi, Y.T. Lin, A. Fujimura, and S. Teig, “A diagonal interconnect architecture and its application to RISC core design,” IEIC Technical Report (Institute of Electronics, Information and Communication Engineers), vol. 102, no. 72, pp. 19–23, 2002.
[7] A. Marshall, T. Stansfield, I. Kostarnov, J. Vuillemin, and B. Hutchings, “A reconfigurable arithmetic array for multimedia applications,” in FPGA 99: Proceedings of the 1999 ACM/SIGDA seventh international symposium on Field programmable gate arrays, New York, NY, USA, 1999, pp. 135–143, ACM.
[8] K.W. Tang and S.A. Padubidri, “Diagonal and toroidal mesh networks,” Computers, IEEE Transactions on, vol. 43, no. 7, pp. 815–826, 1994.
[9] K.G. Shin and G. Dykema, “A distributed I/O architecture for HARTS,” in Computer Architecture, 1990. Proceedings., 17th Annual International Symposium on, 1990, pp. 332–342.
[10] W.H. Hu, S.E. Lee, and N. Bagherzadeh, “DMesh: a diagonally-linked mesh network-on-chip architecture,” nocarc, 2008.
[11] I.S. Honkala and T. Laihonen, “Codes for identification in the king lattice,” Graphs and Combinatorics, vol. 19, no. 4, pp. 505–516, 2003.
[12] J.M. Camara, M. Moreto, E. Vallejo, R. Beivide, J. Miguel-Alonso, C. Martinez, and J. Navaridas, “Twisted torus topologies for enhanced interconnection networks,” IEEE Transactions on Parallel and Distributed Systems, vol. 99, no. PrePrints, 2010.
[13] W. Dally and B. Towles, Principles and Practices of Interconnection Networks, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2003.
[14] C. Martinez, E. Stafford, R. Beivide, C. Camarero, F. Vallejo, and E. Gabidulin, “Graph-based metrics over QAM constellations,” Information Theory, 2008. ISIT 2008. IEEE International Symposium on, pp. 2494–2498, 2008.
[15] A. Singh, Load-Balanced Routing in Interconnection Networks, Ph.D. thesis, 2005.
[16] L.G. Valiant, “A scheme for fast parallel communication,” SIAM Journal on Computing, vol. 11, no. 2, pp. 350–361, 1982.
[17] F.J. Ridruejo Perez and J. Miguel-Alonso, “INSEE: An interconnection network simulation and evaluation environment,” 2005.
[18] V. Puente, C. Izu, R. Beivide, J.A. Gregorio, F. Vallejo, and J.M. Prellezo, “The adaptive bubble router,” J. Parallel Distrib. Comput., vol. 61, no. 9, pp. 1180–1208, 2001.
