The Primary-Backup Approach: Lower and Upper Bounds
The most widely used approach to building replicated, fault-tolerant services is the primary-backup approach. In this approach, the state of the service is replicated across multiple servers, with one server designated as the primary and the rest as backups. Clients send requests only to the primary; if the primary fails, one of the backups takes over as the new primary. Since its introduction in 1976 by Alsberg and Day, the primary-backup approach has become the basis for many practical fault-tolerant services. Despite this widespread use, the approach has not been studied systematically, and little is known about the fundamental costs and tradeoffs of using it under various kinds of failures. Thus, there is a gap between theory and practice. To close this gap, this thesis analyzes the primary-backup approach both from the theoretical perspective of specification, lower bounds, and upper bounds, and from the practical viewpoint of performance tradeoffs in protocols. We identify three key cost metrics of primary-backup protocols (degree of replication, blocking time, and failover time) and then show lower and upper bounds on these metrics for a hierarchy of failure models. We then implement an important subclass of our primary-backup protocols, called 0-blocking protocols, and give performance figures. We believe that the work in this thesis, in addition to leading to the development of new, more efficient protocols, has resulted in a better understanding of the properties of existing primary-backup protocols.
computer science; technical report