12.07.2015 Views

Unreliable Failure Detectors for Reliable Distributed Systems

Unreliable Failure Detectors for Reliable Distributed Systems

Unreliable Failure Detectors for Reliable Distributed Systems

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

234 T. D. CHANDRAAND S. TOUEGFIG. 1.AccuracyCompleteness Strong Weak EventualStrong EventuaIWeakStrong Perfect Strung Eventuoilp Perfect Eventuall~ Strvng9 Y 09 Oy’Weak Wenk EventuaU~ Weak9 w 02 OwEight classes of failure detectors defined in terms of accuracy and completeness.definitions of algorithms and runs, based on the <strong>for</strong>mal definitions given inChandra et al. [19921.11An algorithm A is a collection of n deterministic automata, one <strong>for</strong> eachprocess in the system. Computation proceeds in steps of A. In each step, aprocess (1) may receive a message that was sent to it, (2) queries its failuredetector module, (3) undergoes a state transition, and (4) may send a message toa single process. 12Since we model asynchronous systems, messages may experiencearbitrary (but finite) delays. Furthermore, there is no bound on relativeprocess speeds.A run of algorithm A using a failure detector QJis a tuple R = (F, Ha, I, S, T)where F is a failure pattern, HQ G ~(F) is a history of failure detector ~ <strong>for</strong>failure pattern F, 1 is an initial configuration of A, S is an infinite sequence ofsteps of A, and T is a list of increasing time values indicating when each step inS occurred. A run must satisfy certain well-<strong>for</strong>medness and fairness properties.In particular, (1) a process cannot take a step after it crashes, (2) when a processtakes a step and queries its failure detector module, it gets the current valueoutput by its local failure detector module, and (3) every process that is correctin F takes an infinite number of steps in S and eventually receives every messagesent to it.In<strong>for</strong>mally, a problem P is defined by a set of properties that runs must satisfy.An algorithm A solves a problem P using a failure detector ~ if all the runs of Ausing $3 satisfy the properties required by P. Let % be a class of failure detectors.Algorithm A solves problem P using % if <strong>for</strong> all !2dE %, A solves P using !3.Finally, we say that problem P can be solved using% if <strong>for</strong> all failure detectors 9c & there is an algorithm A that solves P using Q.We use the following notation. Let v be a variable in algorithm A. We denoteby UPprocess p’s copy of v. The history of v in run R is denoted by ~, that is,#(p, t) is the value of VPat time t in run R. We denote by 3P process p’s localfailure detector module. Thus, the value of !21Pat time t in run R = (F, H%, 1, S, T)is HQ(p, t).2.6. REDUCIBILITY. We now define what it means <strong>for</strong> an algorithm TQ -Q, totrans<strong>for</strong>m a failure detector Q into another failure detector $3’ (T9+Q, is calleda reduction algotidtm). Algorithm Ta +9, uses ~ to maintain a variable outputp atevery process p. This variable, which is part of the local state of p, emulates theoutput of Q‘ at p. Algorithm TQ -Q, trans<strong>for</strong>ms !23into Qil’if and only if <strong>for</strong> everyrun R = (F, HQ, 1, S, T) of TQ+Q, using S3, outpu~ E $3’(F). Note thatII Forma] definitions are necessag in Chandra et al. [1992] to Provea subtlelowerbound.12Chandraet al. [1992]assumethat eaclrstep is atomic, that is, indivisible with reSpeCtto failures.Furthermore, each process can send a message to all processes during such a step. These assumptionswere made to strengthen the lower bound result of Chandra et al, [1992].

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!