Unreliable Failure Detectors for Reliable Distributed Systems

266 T. D. CHANDRA AND S. TOUEG
BERMAN, P., GARAY, J. A., AND PERRY, K. J. 1989. Towards optimal distributed consensus. In Proceedings of the 30th Symposium on Foundations of Computer Science (Oct.). IEEE Computer Society Press, Washington, D.C., pp. 410-415.
BIRAN, O., MORAN, S., AND ZAKS, S. 1988. A combinatorial characterization of the distributed tasks which are solvable in the presence of one faulty processor. In Proceedings of the 7th ACM Symposium on Principles of Distributed Computing (Toronto, Ont., Canada, Aug. 15-17). ACM, New York, pp. 263-275.
BIRMAN, K. P., COOPER, R., JOSEPH, T. A., KANE, K. P., AND SCHMUCK, F. B. 1990. Isis: A Distributed Programming Environment.
BIRMAN, K. P., AND JOSEPH, T. A. 1987. Reliable communication in the presence of failures. ACM Trans. Comput. Syst. 5, 1 (Feb.), 47-76.
BRACHA, G., AND TOUEG, S. 1985. Asynchronous consensus and broadcast protocols. J. ACM 32, 4 (Oct.), 824-840.
BRIDGLAND, M. F., AND WATRO, R. J. 1987. Fault-tolerant decision making in totally asynchronous distributed systems. In Proceedings of the 6th ACM Symposium on Principles of Distributed Computing (Vancouver, B.C., Canada, Aug. 10-12). ACM, New York, pp. 52-63.
BUDHIRAJA, N., GOPAL, A., AND TOUEG, S. 1990. Early-stopping distributed bidding and applications. In Proceedings of the 4th International Workshop on Distributed Algorithms (Sept.). Springer-Verlag, New York, pp. 301-320.
CHANDRA, T. D., HADZILACOS, V., AND TOUEG, S. 1992. The weakest failure detector for solving consensus. Tech. Rep. 92-1293 (July), Department of Computer Science, Cornell University. Available from ftp://ftp.cs.cornell.edu/pub/chandra/failure.detectors.weakest.dvi.Z. A preliminary version appeared in Proceedings of the 11th ACM Symposium on Principles of Distributed Computing (Vancouver, B.C., Canada, Aug. 10-12). ACM, New York, pp. 147-158.
CHANDRA, T. D., HADZILACOS, V., AND TOUEG, S. 1995. Impossibility of group membership in asynchronous systems. Tech. Rep. 95-1533, Computer Science Department, Cornell University, Ithaca, N.Y.
CHANDRA, T. D., AND LARREA, M. 1994. E-mail correspondence. Showed that ◇W cannot be used to solve non-blocking atomic commit.
CHANDRA, T. D., AND TOUEG, S. 1990. Time and message efficient reliable broadcasts. In Proceedings of the 4th International Workshop on Distributed Algorithms (Sept.). Springer-Verlag, New York, pp. 289-300.
CHANG, J., AND MAXEMCHUK, N. 1984. Reliable broadcast protocols. ACM Trans. Comput. Syst. 2, 3 (Aug.), 251-273.
CHOR, B., AND DWORK, C. 1989. Randomization in Byzantine agreement. Adv. Comput. Res. 5, 443-497.
CRISTIAN, F. 1987. Issues in the design of highly available computing services. In Annual Symposium of the Canadian Information Processing Society (July), pp. 9-16. Also IBM Res. Rep. RJ5856, Thomas J. Watson Research Center, Hawthorne, N.Y.
CRISTIAN, F., AGHILI, H., STRONG, R., AND DOLEV, D. 1985/1989. Atomic broadcast: From simple message diffusion to Byzantine agreement. In Proceedings of the 15th International Symposium on Fault-Tolerant Computing (June 1985), pp. 200-206. A revised version appears as IBM Research Laboratory Tech. Rep. RJ5244 (April 1989), Thomas J. Watson Research Center, Hawthorne, N.Y.
CRISTIAN, F., DANCEY, R. D., AND DEHN, J. 1990. Fault-tolerance in the advanced automation system. Tech. Rep. RJ 7424 (April), IBM Research Laboratory, Thomas J. Watson Research Center, Hawthorne, N.Y.
DOLEV, D., DWORK, C., AND STOCKMEYER, L. 1987. On the minimal synchronism needed for distributed consensus. J. ACM 34, 1 (Jan.), 77-97.
DOLEV, D., LYNCH, N. A., PINTER, S. S., STARK, E. W., AND WEIHL, W. E. 1986. Reaching approximate agreement in the presence of faults. J. ACM 33, 3 (July), 499-516.
DWORK, C., LYNCH, N. A., AND STOCKMEYER, L. 1988. Consensus in the presence of partial synchrony. J. ACM 35, 2 (Apr.), 288-323.
FISCHER, M. J. 1983. The consensus problem in unreliable distributed systems (a brief survey). Tech. Rep. 273 (June), Department of Computer Science, Yale University, New Haven, Conn.
FISCHER, M. J., LYNCH, N. A., AND PATERSON, M. S. 1985. Impossibility of distributed consensus with one faulty process. J. ACM 32, 2 (Apr.), 374-382.
GOPAL, A., STRONG, R., TOUEG, S., AND CRISTIAN, F. 1990. Early-delivery atomic broadcast. In Proceedings of the 9th ACM Symposium on Principles of Distributed Computing (Quebec City, Que., Canada, Aug. 22-24). ACM, New York, pp. 297-310.
GUERRAOUI, R. 1995. Revisiting the relationship between non-blocking atomic commitment and consensus. In Proceedings of the 9th International Workshop on Distributed Algorithms (Sept.). Springer-Verlag, New York, pp. 87-100.
HADZILACOS, V., AND TOUEG, S. 1993. Fault-tolerant broadcasts and related problems. In Distributed Systems, Chap. 5, S. J. MULLENDER, Ed., Addison-Wesley, Reading, Mass., pp. 97-145.
HADZILACOS, V., AND TOUEG, S. 1994. A modular approach to fault-tolerant broadcasts and related problems. Tech. Rep. 94-1425 (May), Computer Science Department, Cornell University, Ithaca, N.Y. Available by anonymous ftp from ftp://ftp.db.toronto.edu/pub/vassos/fault.tolerant.broadcasts.dvi.Z. (An earlier version is also available in Hadzilacos and Toueg [1993].)
HALPERN, J. Y., AND MOSES, Y. 1990. Knowledge and common knowledge in a distributed environment. J. ACM 37, 3 (July), 549-587.
LAMPORT, L. 1978. The implementation of reliable distributed multiprocess systems. Comput. Netw. 2, 95-114.
LAMPORT, L., SHOSTAK, R., AND PEASE, M. 1982. The Byzantine generals problem. ACM Trans. Prog. Lang. Syst. 4, 3 (July), 382-401.
LO, W. K., AND HADZILACOS, V. 1994. Using failure detectors to solve consensus in asynchronous shared-memory systems. In Proceedings of the 8th International Workshop on Distributed Algorithms (Sept.). Springer-Verlag, New York, pp. 280-295. Available from ftp://ftp.db.toronto.edu/pub/vassos/failure.detectors.shared.memory.ps.Z.
LOUI, M., AND ABU-AMARA. 1987. Memory requirements for agreement among unreliable asynchronous processes. Adv. Comput. Res. 4, 163-183.
MOSES, Y., DOLEV, D., AND HALPERN, J. Y. 1986. Cheating husbands and other stories: A case study of knowledge, action, and communication. Distrib. Comput. 1, 3, 167-176.
MULLENDER, S. J., ED. 1987. The Amoeba Distributed Operating System: Selected Papers 1984-1987. Centre for Mathematics and Computer Science.
NEIGER, G. 1995. Failure detectors and the wait-free hierarchy. In Proceedings of the 14th ACM Symposium on Principles of Distributed Computing (Ottawa, Ont., Canada, Aug.). ACM, New York, pp. 100-109.
NEIGER, G., AND TOUEG, S. 1990. Automatically increasing the fault-tolerance of distributed algorithms. J. Algorithms 11, 3 (Sept.), 374-419.
PEASE, M., SHOSTAK, R., AND LAMPORT, L. 1980. Reaching agreement in the presence of faults. J. ACM 27, 2 (Apr.), 228-234.
PETERSON, L. L., BUCHOLZ, N. C., AND SCHLICHTING, R. D. 1989. Preserving and using context information in interprocess communication. ACM Trans. Comput. Syst. 7, 3 (Aug.), 217-246.
PITTELLI, F., AND GARCIA-MOLINA, H. 1989. Reliable scheduling in a TMR database system. ACM Trans. Comput. Syst. 7, 1 (Feb.), 25-60.
POWELL, D., ED. 1991. Delta-4: A Generic Architecture for Dependable Distributed Computing. Springer-Verlag, New York.
REISCHUK, R. 1982. A new solution for the Byzantine general's problem. Tech. Rep. RJ 3673 (Nov.), IBM Research Laboratory, Thomas J. Watson Research Center, Hawthorne, N.Y.
RICCIARDI, A., AND BIRMAN, K. P. 1991. Using process groups to implement failure detection in asynchronous environments. In Proceedings of the 10th ACM Symposium on Principles of Distributed Computing (Montreal, Que., Canada, Aug. 19-21). ACM, New York, pp. 341-354.
SABEL, L., AND MARZULLO, K. 1995. Election vs. consensus in asynchronous systems. Tech. Rep. TR95-411 (Feb.), Univ. California at San Diego, San Diego, Calif. Available at ftp://ftp.cs.cornell.edu/pub/sabel/tr94-1413.ps.
SCHNEIDER, F. B. 1990. Implementing fault-tolerant services using the state machine approach: A tutorial. ACM Comput. Surv. 22, 4 (Dec.), 299-319.
WENSLEY, J. H., LAMPORT, L., GOLDBERG, J., GREEN, M. W., LEVITT, K. N., MELLIAR-SMITH, P., SHOSTAK, R. E., AND WEINSTOCK, C. B. 1978. SIFT: Design and analysis of a fault-tolerant computer for aircraft control. Proc. IEEE 66, 10 (Oct.), 1240-1255.
RECEIVED JULY 1993; REVISED MARCH 1995; ACCEPTED OCTOBER 1995
Journal of the ACM, Vol. 43, No. 2, March 1996.