A Gossip-Style Failure Detection Service
Author
Van Renesse, Robbert
Minsky, Yaron
Hayden, Mark
Abstract
Failure detection is valuable for system management, replication, load balancing, and other distributed services. To date, failure detection services scale poorly with the number of members being monitored. This paper describes a new protocol, based on gossiping, that does scale well and provides timely detection. We analyze the protocol and then extend it to discover and leverage the underlying network topology for much-improved resource utilization. We then combine it with another protocol, based on broadcast, that is used to handle partition failures.
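The abstract does not spell out how a gossip-based detector works. The following is a minimal Python sketch of a gossip-style heartbeat failure detector, offered as an illustration only and not as the report's exact protocol. It assumes the following: time advances in discrete rounds; every member starts out knowing the full membership; each live member increments its own heartbeat counter every round and sends its whole table to one uniformly random peer; and a receiver keeps the larger counter for each member. A peer is suspected once its counter has not increased for more than `t_fail` rounds. The names `GossipMember`, `Entry`, `simulate`, and the default parameter values are hypothetical. The sketch omits removing failed members from the table, the topology-aware extension, and the broadcast protocol for partitions.

```python
import random
from dataclasses import dataclass


@dataclass
class Entry:
    heartbeat: int    # highest heartbeat counter seen for this member
    last_update: int  # local round at which that counter last increased


class GossipMember:
    """One process in a gossip-style heartbeat failure detector (sketch)."""

    def __init__(self, name, members, t_fail):
        self.name = name
        self.t_fail = t_fail  # rounds without an increase before suspicion
        # Assumption: every member starts out knowing the full membership.
        self.table = {m: Entry(0, 0) for m in members}

    def tick(self, now):
        # A live member increments its own heartbeat counter every round.
        own = self.table[self.name]
        own.heartbeat += 1
        own.last_update = now

    def digest(self):
        # The gossip message: member name -> heartbeat counter.
        return {m: e.heartbeat for m, e in self.table.items()}

    def merge(self, digest, now):
        # Keep the larger counter; only an increase refreshes the timestamp.
        for m, hb in digest.items():
            entry = self.table.setdefault(m, Entry(hb, now))
            if hb > entry.heartbeat:
                entry.heartbeat = hb
                entry.last_update = now

    def suspected(self, now):
        # Peers whose counter has not increased for more than t_fail rounds.
        return {m for m, e in self.table.items()
                if m != self.name and now - e.last_update > self.t_fail}


def simulate(n=5, t_fail=30, crashed="m4", crash_round=10, rounds=100, seed=1):
    """Run n members for `rounds` rounds; `crashed` stops at `crash_round`."""
    rng = random.Random(seed)
    names = [f"m{i}" for i in range(n)]
    nodes = {m: GossipMember(m, names, t_fail) for m in names}

    def is_alive(m, now):
        return not (m == crashed and now >= crash_round)

    for now in range(1, rounds + 1):
        for m in names:
            if not is_alive(m, now):
                continue
            nodes[m].tick(now)
            # Gossip the whole table to one peer chosen uniformly at random;
            # a message sent to a crashed member is simply lost.
            peer = rng.choice([p for p in names if p != m])
            if is_alive(peer, now):
                nodes[peer].merge(nodes[m].digest(), now)

    # Each surviving member's view of which peers have failed.
    return {m: nodes[m].suspected(rounds) for m in names if is_alive(m, rounds)}


print(simulate())
```

With these parameters, each surviving member should end up suspecting only the crashed member `m4`. Choosing `t_fail` trades detection time against the probability of falsely suspecting a live member whose heartbeats happened not to reach an observer recently.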
Date Issued
1998-05
Publisher
Cornell University
Previously Published as
http://techreports.library.cornell.edu:8081/Dienst/UI/1.0/Display/cul.cs/TR98-1687
Type
technical report