On-chip Networks for Manycore Architecture Myong ... - People - MIT

More documents

Recommendations

Info

RANDOM FFT RADIX LU OCEAN WATER 8 61 60 61 61 61 Table 4.3: Maximum size of context queues in SWAPinf relative to the size of a thread context in RANDOM than in the application benchmarks because the random migration evenly distributes threads across the cores so there is no heavily congested core (cf. Table 4.3). However, the maximum number of contexts is over 60 for all application benchmarks, which is more than 95% of all threads on the system. This discourages the use of context bu↵ers to avoid deadlock. 2 Despite the potential overhead of ENC described earlier in this section, both ENC and ENC0 have comparable performance, and are overall 11.7% and 15.5% worse than SWAPinf, respectively. Although ENC0 has relatively large overhead of 30% in total migration cost under the random migration pattern, ENC reduces the overhead to only 0.8%. Under application-specific migration patterns, the performance largely depends on the characteristics of the patterns; while ENC and ENC0 have significantly greater migration costs than SWAPinf under RADIX, they perform much more competitively in most applications, sometimes better as in applications such as WATER and OCEAN. This is because each thread in these applications mostly works on its private data; provided a thread’s private data is assigned to its native core, the thread will mostly migrate to the native core (cf. Figure 4-4). Therefore, the native core is not only a safe place to move a context, but also the place where the context most likely makes progress. This is why ENC0 usually has less cost for autonomous migration, but higher eviction costs. Whenever a thread migrates, it needs to be “evicted” to its native core. After eviction, however, the thread need not migrate again if its native core was its migration destination. The e↵ect of the portion of native cores in total migration destinations can be seen in Figure 4-6, showing total migration distances in hop counts normalized to the SWAPinf case. When the destinations of most migrations are native cores, such as 2 Note that, however, the maximum size of context bu↵ers from the simulation results is not a necessary condition, but a su cient condition to prevent deadlock. 84
Figure 4-6: Total migration distance of ENC and SWAP for various SPLASH-2 benchmarks. in FFT, ENC has not much di↵erent total migration distance from SWAPinf. When the ratio is lower, such as in LU, the migration distance for ENC is longer because it is more likely for a thread to migrate to non-native cores after it is evicted to its native core. This also explains why ENC has the most overhead in total migration distance under random migrations because the least number of migrations are going to native cores. Even when the migrating thread’s destination is often not its native core, ENC has an overall migration cost similar to SWAPinf as shown in LU, because it is less a↵ected by network congestion than SWAPinf. This is because ENC e↵ectively distributes network tra c over the entire network, by sending out threads to their native cores. Figure 4-7 shows how many cycles are spent on migration due to congestion, normalized to the SWAPinf case. ENC and ENC0 have less congestion costs under 85
Page 1:
On-chip Networks for Manycore Archi
Page 5 and 6:
Acknowledgments This thesis would n
Page 7 and 8:
Contents 1 Introduction 15 1.1 Chip
Page 9:
4.2.2 Evaluation with Synthetic Mig
Page 12 and 13:
3-11 BAN performance under bursty t
Page 15 and 16:
Chapter 1 Introduction 1.1 Chip Mul
Page 17 and 18:
However, a simple common bus is not
Page 19 and 20:
Name #Cores Technology Frequency Po
Page 21:
1.2.3 System-Level Optimization Whi
Page 24 and 25:
gestion dynamically, it may have lo
Page 26 and 27:
2-phase n-phase DOR O1TURN Valiant
Page 28 and 29:
B 1 1 D 0.5 A 0.5 0.5 S 0.5 Figure
Page 30 and 31:
instanceproma x instancepromb D x D
Page 32 and 33:
(a) West-First (rotated 180 ) (b) N
Page 34 and 35: proof allocvc D4 D1 D4 D1 S m/2+1:m
Page 36 and 37: 2.4 Experimental Results To evaluat
Page 38 and 39: Characteristic Configuration Topolo
Page 40 and 41: Using Exclusive Dynamic VC allocati
Page 42 and 43: 2.2 Bit-complement 2.4 Bit-reverse
Page 45 and 46: Chapter 3 Oblivious Routing in On-c
Page 47 and 48: andwidth of the link in one directi
Page 49 and 50: 3.2.2 Bidirectional Links In the co
Page 51 and 52: neighbor mesh network. As can be se
Page 53 and 54: ectional links. Consequently, the r
Page 55 and 56: approach: the arbiter makes its dec
Page 57 and 58: 3.4 Results and Comparisons 3.4.1 E
Page 59 and 60: Total Throughput (packets/cycle) 7
Page 61 and 62: 3.4.3 Non-bursty Synthetic Tra c wi
Page 63 and 64: Total Throughput (packets/cycle) 6
Page 65: from asymmetric bandwidth allocatio
Page 68 and 69: and is performed by the operating s
Page 70 and 71: a core can block contexts arriving
Page 72 and 73: Figure 4-2: Deadlock scenarios with
Page 74 and 75: N n →C n This node depends on not
Page 76 and 77: 4.3.1 The Basic ENC Algorithm (ENC0
Page 78 and 79: advance to the next cycle. This alg
Page 80 and 81: 4.4.2 Network-Independent Traces (N
Page 82 and 83: Protocols and Migration Patterns Mi
Page 86 and 87: Figure 4-7: Part of migration cost
Page 88 and 89: and where to migrate; a thread may
Page 90 and 91: 32KB D$ 8KB I$ Core Predictor Route
Page 92 and 93: instructions follow a custom stack-
Page 94 and 95: compromised performance for power e
Page 96 and 97: (a) Processor core (b) Migration pr
Page 98 and 99: locks. Figure 5-3 illustrates the a
Page 100 and 101: (a) A corner of global power rings
Page 102 and 103: Figure 5-9: Tapeout-ready EM 2 proc
Page 104 and 105: (a) Concentrate-in Figure 5-11: Mig
Page 106 and 107: 106
Page 108 and 109: deadlock-free, fine-grained thread
Page 110 and 111: [12] Intel Corporation. Intel deliv
Page 112 and 113: [38] Jason Howard, Saurabh Dighe, Y
Page 114 and 115: [63] Gordon E. Moore. Cramming more
Page 116: [88] Sriram Vangal, Nitin Borkar, a
show all

On-chip Networks for Manycore Architecture Myong ... - People - MIT

Create successful ePaper yourself

Delete template?

Save as template?