Designing Fault-Tolerant Algorithms for Distributed Systems Using Communication Primitives
Srikanth, T. K.
Fault-tolerance is an important requirement in distributed computing systems. However, designing applications for distributed systems is a difficult task, particularly when components of the system can fail. The difficulty of this task increases with the severity of failures encountered. Arbitrary process failures are generally much harder to overcome than failures that are restricted, e.g., where processes only fail by halting. Thus, techniques that restrict the disruptive behavior of faulty processes can greatly simplify the design of fault-tolerant algorithms. Such techniques effectively provide reduction mechanisms from one class of failures to a more benign class. Message authentication is an example of a technique that imposes restrictions on the behavior of faulty processes. This technique has been used to derive simple solutions to many problems of fault-tolerance for systems with arbitrary failures. To exploit the simplicity provided by authentication, we present communication primitives that provide properties of authentication without using digital signatures. These primitives can also be extended to provide properties beyond those of authentication, thereby further restricting the types of faults that have to be overcome. (ABRIDGED ABSTRACT)
computer science; technical report