12.07.2015 Views

Unreliable Failure Detectors for Reliable Distributed Systems

Unreliable Failure Detectors for Reliable Distributed Systems

Unreliable Failure Detectors for Reliable Distributed Systems

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

258 T. D. CHANDRA AND S. TOUEGdetectors must have in order to solve Consensus. The problem of implementing acertain type of failure detector in a specific model of partial synchrony becomesa separate issue; this separation af<strong>for</strong>ds greater modularity.Studying failure detectors rather than various models of partial synchrony hasother advantages as well. By showing that Consensus is solvable using a certaintype of failure detector we show that Consensus is solvable in all systems in whichthis type of failure detector can be implemented. An algorithm that relies on theaxiomatic properties of a failure detector is more general, more modular, andsimpler to understand than one that relies directly on specific operationalfeatures of partial synchrony (that can be used to implement this failuredetector).From this more abstract point of view, the question “What is the least amountof synchrony sufficient to solve Consensus?” translates to “What is the weakestfailure detector sufficient to solve Consensus?”. In contrast to Dolev et al.[1987], which identified a set of minimal models of partial synchrony in whichConsensus is solvable, Chandra et al. [1992] exhibit a single minimum failuredetector, OWO, that can be used to solve Consensus. The technical device thatmakes this possible is the notion of reduction between failure detectors.9.2. UNRELIABLE FAILURE DETECTION IN SHARED MEMORY SYSTEMS. Louiand Abu-Amara [1987] showed that in asynchronous shared memory systemswith atomic read/write registers, Consensus cannot be solved even if at most oneprocess may crash.26 This raises the following question: Can we use unreliablefailure detectors to circumvent this impossibility result?Lo and Hadzilacos [1994J showed that this is indeed possible: They gave analgorithm that solves Consensus using OW (in shared memory systems withregisters). This algorithm tolerates any number of faulty processes-in contrast toour result showing that in message-passing systems OW can be used to solveConsensus only if there is a majority of correct processes. Recently, Neiger [1995]extended the work of Lo and Hadzilacos by studying the conditions under whichunreliable failure detectors boost the Consensus power of shared objects.9.3. THE 1s[s TOOLKIT. With our approach, even if a correct process p isrepeatedly suspected to have crashed by the other processes, it is still required tobehave like every other correct process in the system. For example, with AtomicBroadcast, p is still required to A-deliver the same messages, in the same order,as all the other correct processes. Furthermore, p is not prevented fromA-broadcasting messages, and these messages must eventually be A-delivered byall correct processes (including those processes whose local failure detectormodules permanently suspect p to have crashed). In summary, applicationprograms that use unreliable failure detection are aware that the in<strong>for</strong>mationthey get from the failure detector may be incorrect: they only take thisin<strong>for</strong>mation as an imperfect “hint” about which processes have really crashed.Furthermore, processes are never “discriminated against” if they are falselysuspected to have crashed.26me proof in ~ui and Abrs.Arnara [1987] is similar to the proof that COnSenW5 Is impossibleinmessage-passing systems when send and receive are not part of the same atomic step [Dolev et al,1987],

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!