Unreliable Failure Detectors for Reliable Distributed Systems


T. D. CHANDRA AND S. TOUEG

failure detectors can be grouped into four classes according to the actual accuracy property that they satisfy:

$\mathcal{SF}(k)$: the class of Strongly $k$-Mistaken failure detectors,
$\mathcal{SF}$: the class of Strongly Finitely Mistaken failure detectors,
$\mathcal{WF}(k)$: the class of Weakly $k$-Mistaken failure detectors, and
$\mathcal{WF}$: the class of Weakly Finitely Mistaken failure detectors.

Clearly, $\mathcal{SF}(0) \subseteq \mathcal{SF}(1) \subseteq \cdots \subseteq \mathcal{SF}(k) \subseteq \mathcal{SF}(k+1) \subseteq \cdots \subseteq \mathcal{SF}$. A similar order holds for the $\mathcal{WF}$s. Consider a system of $n$ processes of which at most $f$ may crash. In this system, there are at least $n - f$ correct processes. Since any failure detector $\mathcal{D} \in \mathcal{SF}((n - f) - 1)$ makes fewer mistakes than the number of correct processes, there is at least one correct process that $\mathcal{D}$ never suspects. Thus, $\mathcal{D}$ is also weakly 0-mistaken, and we conclude that $\mathcal{SF}((n - f) - 1) \subseteq \mathcal{WF}(0)$. Furthermore, it is clear that $\mathcal{SF} \subseteq \mathcal{WF}$.

These classes of repentant failure detectors can be ordered by reducibility into an infinite hierarchy, which is illustrated in Figure A1 (an edge $\rightarrow$ represents the $\succeq$ relation). Each failure detector class defined in Section 2.4 is equivalent to some class in this hierarchy. In particular, it is easy to show that:

[...]

For example, it is easy to see that the algorithm in Figure 3 transforms any failure detector in $\mathcal{WF}$ into one in $\Diamond\mathcal{W}$. Other conversions are similar or straightforward and are therefore omitted. Note that $\mathcal{P}$ and $\Diamond\mathcal{W}$ are the strongest and weakest failure detector classes in this hierarchy, respectively. From Corollaries 6.1.9 and 7.1.8, and Observation A2.1, we have:

COROLLARY A2.2. Consensus and Atomic Broadcast are solvable using $\mathcal{WF}(0)$ in asynchronous systems with $f < n$.

Similarly, from Corollaries 6.2.4 and 7.1.8, and Observation A2.1, we have:

COROLLARY A2.3. Consensus and Atomic Broadcast are solvable using $\mathcal{WF}$ in asynchronous systems with $f < \lceil n/2 \rceil$.

A3. Tight Bounds on Fault-Tolerance

Since Consensus and Atomic Broadcast are equivalent in asynchronous systems with any number of faulty processes (Corollary 7.1.7), we can focus on establishing fault-tolerance bounds for Consensus. In Section 6, we showed that failure detectors with perpetual accuracy (i.e., in $\mathcal{P}$, $\mathcal{Q}$, $\mathcal{S}$, or $\mathcal{W}$) can be used to solve Consensus in asynchronous systems with any number of failures. In contrast, with failure detectors with eventual accuracy (i.e., in $\Diamond\mathcal{P}$, $\Diamond\mathcal{Q}$, $\Diamond\mathcal{S}$, or $\Diamond\mathcal{W}$), Consensus can be solved if and only if a majority of the processes are correct. We now refine this result by considering each failure detector class $\mathcal{C}$ in our infinite hierarchy, and determining how many correct processes are necessary to solve Consensus using $\mathcal{C}$. The results are illustrated in Figure A1.
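The bounds stated so far reduce to simple arithmetic on $n$ and $f$. The following Python sketch restates them for illustration only. The function names are hypothetical and do not come from the paper, and the pigeonhole check models each mistake as a wrongful suspicion of a single correct process, which is a simplifying assumption.

```python
from itertools import combinations_with_replacement


def max_k_implying_weak_accuracy(n: int, f: int) -> int:
    """Largest k for which SF(k) is contained in WF(0), namely (n - f) - 1."""
    if not 0 <= f < n:
        raise ValueError("expected 0 <= f < n")
    return (n - f) - 1


def solvable_with_perpetual_accuracy(n: int, f: int) -> bool:
    """Classes P, Q, S, W: Consensus is solvable for any number of failures."""
    return 0 <= f < n


def solvable_with_eventual_accuracy(n: int, f: int) -> bool:
    """Classes <>P, <>Q, <>S, <>W: solvable iff f < ceil(n / 2)."""
    # (n + 1) // 2 equals ceil(n / 2) for non-negative integers.
    return 0 <= f < (n + 1) // 2


def some_correct_process_never_suspected(n: int, f: int) -> bool:
    """Brute-force the pigeonhole step behind SF((n - f) - 1) being in WF(0).

    Assumption: each mistake wrongly suspects exactly one correct process.
    Enumerates every multiset of at most (n - f) - 1 mistaken targets and
    checks that at least one correct process is left out of each multiset.
    """
    correct = range(n - f)
    k = max_k_implying_weak_accuracy(n, f)
    for mistakes in range(k + 1):
        for targets in combinations_with_replacement(correct, mistakes):
            if set(targets) == set(correct):
                return False  # every correct process was suspected
    return True
```

For example, with $n = 5$ and $f = 2$ there are three correct processes, so any detector making at most two mistakes leaves at least one of them unsuspected, and a majority is correct because $2 < \lceil 5/2 \rceil = 3$.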
[Figure A1: the hierarchy of failure detector classes $\mathcal{SF}(0)$, $\mathcal{SF}(1)$, $\mathcal{SF}(2)$, ..., each annotated with the condition on $f$ under which Consensus is solvable.]

THEOREM. In asynchronous systems with $f \geq \lceil n/2 \rceil$, if $m > n - f$ then Consensus cannot be solved using $\mathcal{SF}(m)$.

PROOF (SKETCH). Consider an asynchronous system with $f \geq \lceil n/2 \rceil$ and assume $m > n - f$. We show that there is a failure detector $\mathcal{D} \in \mathcal{SF}(m)$ such that no algorithm solves Consensus using $\mathcal{D}$. We do so by describing the behavior of a Strongly $m$-Mistaken failure detector $\mathcal{D}$ such that for every algorithm $A$, there is a run $R_A$ of $A$ using $\mathcal{D}$ that violates the specification of Consensus.

Since $1 \leq n - f \leq \lfloor n/2 \rfloor$, we can partition the processes into three sets $\Pi_0$, $\Pi_1$, and $\Pi_{crashed}$, such that $\Pi_0$ and $\Pi_1$ are non-empty sets containing $n - f$ processes each, and $\Pi_{crashed}$ is a (possibly empty) set containing the remaining $n - 2(n - f)$ processes. Henceforth, we only consider runs in which all processes in $\Pi_{crashed}$ crash at the beginning of the run. Let $q_0 \in \Pi_0$ and $q_1 \in \Pi_1$. Consider the following two runs of $A$ using $\mathcal{D}$:

Run $R_0 = \langle F_0, H_0, I_0, S_0, T_0 \rangle$. All processes propose 0. All processes in $\Pi_0$ are correct in $F_0$, while all the $f$ processes in $\Pi_1 \cup \Pi_{crashed}$ crash in
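The partition used in the proof sketch can be checked with a few lines of arithmetic. The Python sketch below is illustrative only: the function name `partition_processes` and the choice of consecutive integer process identifiers are assumptions made here, not part of the paper.

```python
def partition_processes(n: int, f: int):
    """Split process ids 0..n-1 into (Pi_0, Pi_1, Pi_crashed).

    Pi_0 and Pi_1 each hold n - f processes; Pi_crashed holds the
    remaining n - 2(n - f) processes and may be empty.
    """
    correct = n - f
    # The construction requires 1 <= n - f <= floor(n / 2),
    # which is the same as ceil(n / 2) <= f < n.
    if not 1 <= correct <= n // 2:
        raise ValueError("requires 1 <= n - f <= floor(n / 2)")
    ids = list(range(n))
    pi_0 = ids[:correct]
    pi_1 = ids[correct:2 * correct]
    pi_crashed = ids[2 * correct:]
    return pi_0, pi_1, pi_crashed
```

With $n = 7$ and $f = 5$, this yields two sets of two processes each and a crashed set of three, so $\Pi_1 \cup \Pi_{crashed}$ contains exactly $f = 5$ processes, matching the count of crashed processes in run $R_0$.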
