Approximating Perfect Failure Detectors in Asynchronous Distributed Systems

Other Titles


Systems that provide group membership services (e.g., Isis) can be used to solve many canonical problems in distributed systems, such as Consensus and Leader Election. However, solving these problems has been shown to require {\em failure detectors} of various strengths, all of which are impossible to implement in an asynchronous system. It has been shown that it is possible to solve Consensus in an asynchronous distributed system using a very weak failure detector. However, existing group membership systems do not appear to implement this type of failure detector. Furthermore, there are other problems that can be solved with group membership services, such as Leader Election, that are harder than Consensus and cannot be solved without a perfect failure detector. This leads to an apparent contradiction: how can provably impossible problems be solved in existing group membership systems? In this dissertation, we investigate {\em approximations} to perfect failure detectors in order to obtain acceptable and practical solutions to otherwise-impossible problems such as Election. We argue that approximating a perfect failure detector yields approximate solutions to these problems, and define a notion of approximation that yields solutions that are useful in practice. We give a formal specification of an approximately-perfect failure detector, give two protocols that implement our approximation, and derive upper bounds on the number of tolerable failures for any such protocol. The approximately-perfect failure detector presented in this dissertation is similar to that used in Isis; this implies that the failure detector implemented in group membership services such as Isis are actually approximately-perfect failure detectors, and that the resulting solutions to Consensus, Election, and other problems are approximate solutions. Furthermore, the protocols that implement our specification are very simple and can be easily implemented. Such an implementation could be used in existing or future group membership services, or by other systems programmers for whom group membership services are not necessary but for whom failure detection would be useful.

Journal / Series

Volume & Issue



Date Issued



Cornell University


computer science; technical report


Effective Date

Expiration Date




Union Local


Number of Workers

Committee Chair

Committee Co-Chair

Committee Member

Degree Discipline

Degree Name

Degree Level

Related Version

Related DOI

Related To

Related Part

Based on Related Item

Has Other Format(s)

Part of Related Item

Related To

Related Publication(s)

Link(s) to Related Publication(s)


Link(s) to Reference(s)

Previously Published As

Government Document




Other Identifiers


Rights URI


technical report

Accessibility Feature

Accessibility Hazard

Accessibility Summary

Link(s) to Catalog Record