On-chip Networks for Manycore Architecture Myong ... - People - MIT

More documents

Recommendations

Info

4.3.1 The Basic ENC Algorithm (ENC0) Whenever a thread needs to move from a non-native core to a destination core, ENC0 first sends the thread to its native core which has a dedicated resource for the thread. If the destination core is not the native core, the thread will then move from its native core to the destination core. Therefore, from a network standpoint, a thread movement either ends at its native core or begins from its native core. Since a thread arriving at its native core is guaranteed to be unloaded from the network, any migration is fully unloaded (and therefore momentarily occupies no network resources) somewhere along its path. To keep the migrations deadlock-free, however, we must also ensure that movements destined for a native core actually get there without being blocked by any other movements; otherwise the native-core movements might never arrive and be unloaded from the network. The most straightforward way of ensuring this is to use two sets of virtual channels, one for to-native-core tra c and the other for from-native-core tra c. If the baseline routing algorithm requires only one virtual channel to prevent network-level deadlock like dimension-order routing, ENC0 requires a minimum of two virtual channels per link to provide protocol-level deadlock avoidance. Note that ENC0 may work with any baseline routing algorithm for a given source-destination pair, such as Valiant [87] or O1TURN [76], both of which require two virtual channels to avoid deadlock. In this case, ENC0 will require four virtual channels. 4.3.2 The Full ENC Algorithm Although ENC0 is simple and straightforward, it su↵ers the potential overhead of introducing an intermediate destination for each thread migration: if thread T wishes to move from core A to B, it must first go to N, the native core for T . In some cases, this overhead might be significant: if A and B are close to each other, and N is far away, the move may take much longer than if it had been a direct move. To reduce this overhead, we can augment the ENC0 algorithm by distinguishing migrating tra c and evicted tra c: the former consists of threads that wish to mi- 76
grate on their own because, for example, they wish to access resources in a remote core, while the latter corresponds to the threads that are evicted from a core by another arriving thread. Whenever a thread is evicted, ENC, like ENC0, sends the thread to its native core, which is guaranteed to accept the thread. We will not therefore have a chain of evictions: even if the evicted thread wishes to go to a di↵erent core to make progress (e.g., return to the core it was evicted from), it must first visit its native core, get unloaded from the network, and then move again to its desired destination. Unlike ENC0, however, whenever a thread migrates on its own accord, it may go directly to its destination without visiting the home core. (Like ENC0, ENC must guarantee that evicted tra c is never blocked by migrating tra c; as before, this requires two sets of virtual channels). Note that network packets always travel within the same set of virtual channels. Based on these policies, the ENC migration algorithm works as follows: 1. If a native context has arrived and is waiting on the network, move it to a reserved register file and proceed to Step 3. 2. (a) If a non-native context is waiting on the network and there is an available register file for non-native contexts, move the context to the register file and proceed to Step 3. (b) If a non-native context is waiting on the network and all the register files for non-native contexts are full, choose one among the threads that have finished executing an instruction on the core 1 and the threads that want to migrate to other cores. Send the chosen thread to its native core on the virtual channel set for evicted tra c. Then, advance to the next cycle. (No need for Step 3). 3. Among the threads that want to migrate to other cores, choose one and send it to the desired destination on the virtual channel set for migrating tra c. Then, 1 No instructions should be in flight. 77
Page 1:
On-chip Networks for Manycore Archi
Page 5 and 6:
Acknowledgments This thesis would n
Page 7 and 8:
Contents 1 Introduction 15 1.1 Chip
Page 9:
4.2.2 Evaluation with Synthetic Mig
Page 12 and 13:
3-11 BAN performance under bursty t
Page 15 and 16:
Chapter 1 Introduction 1.1 Chip Mul
Page 17 and 18:
However, a simple common bus is not
Page 19 and 20:
Name #Cores Technology Frequency Po
Page 21:
1.2.3 System-Level Optimization Whi
Page 24 and 25:
gestion dynamically, it may have lo
Page 26 and 27: 2-phase n-phase DOR O1TURN Valiant
Page 28 and 29: B 1 1 D 0.5 A 0.5 0.5 S 0.5 Figure
Page 30 and 31: instanceproma x instancepromb D x D
Page 32 and 33: (a) West-First (rotated 180 ) (b) N
Page 34 and 35: proof allocvc D4 D1 D4 D1 S m/2+1:m
Page 36 and 37: 2.4 Experimental Results To evaluat
Page 38 and 39: Characteristic Configuration Topolo
Page 40 and 41: Using Exclusive Dynamic VC allocati
Page 42 and 43: 2.2 Bit-complement 2.4 Bit-reverse
Page 45 and 46: Chapter 3 Oblivious Routing in On-c
Page 47 and 48: andwidth of the link in one directi
Page 49 and 50: 3.2.2 Bidirectional Links In the co
Page 51 and 52: neighbor mesh network. As can be se
Page 53 and 54: ectional links. Consequently, the r
Page 55 and 56: approach: the arbiter makes its dec
Page 57 and 58: 3.4 Results and Comparisons 3.4.1 E
Page 59 and 60: Total Throughput (packets/cycle) 7
Page 61 and 62: 3.4.3 Non-bursty Synthetic Tra c wi
Page 63 and 64: Total Throughput (packets/cycle) 6
Page 65: from asymmetric bandwidth allocatio
Page 68 and 69: and is performed by the operating s
Page 70 and 71: a core can block contexts arriving
Page 72 and 73: Figure 4-2: Deadlock scenarios with
Page 74 and 75: N n →C n This node depends on not
Page 78 and 79: advance to the next cycle. This alg
Page 80 and 81: 4.4.2 Network-Independent Traces (N
Page 82 and 83: Protocols and Migration Patterns Mi
Page 84 and 85: RANDOM FFT RADIX LU OCEAN WATER 8 6
Page 86 and 87: Figure 4-7: Part of migration cost
Page 88 and 89: and where to migrate; a thread may
Page 90 and 91: 32KB D$ 8KB I$ Core Predictor Route
Page 92 and 93: instructions follow a custom stack-
Page 94 and 95: compromised performance for power e
Page 96 and 97: (a) Processor core (b) Migration pr
Page 98 and 99: locks. Figure 5-3 illustrates the a
Page 100 and 101: (a) A corner of global power rings
Page 102 and 103: Figure 5-9: Tapeout-ready EM 2 proc
Page 104 and 105: (a) Concentrate-in Figure 5-11: Mig
Page 106 and 107: 106
Page 108 and 109: deadlock-free, fine-grained thread
Page 110 and 111: [12] Intel Corporation. Intel deliv
Page 112 and 113: [38] Jason Howard, Saurabh Dighe, Y
Page 114 and 115: [63] Gordon E. Moore. Cramming more
Page 116: [88] Sriram Vangal, Nitin Borkar, a
show all

On-chip Networks for Manycore Architecture Myong ... - People - MIT

You also want an ePaper? Increase the reach of your titles

Delete template?

Save as template?