Techniques for Simplifying the Design of Distributed Systems
Neiger, Gilbert A.
Distributed computing systems offer a number of advantages over centralized systems, such as the replication of data and functionality, which may result in increased performance and fault-tolerance. However, designing protocols for distributed systems is more complex than designing them for centralized systems, because coordination and cooperation between the different processors can be difficult to achieve. Among the factors complicating this design are the following: lack of processor synchronization, lack of common knowledge, and processor failures. This thesis presents techniques for simplifying the design of distributed systems by addressing these three complicating factors. Processor synchronization is provided by using logical clocks as if they were real-time (and hence perfectly synchronized) clocks. This substitution can be made in solutions to a large class of problems, which the thesis characterizes. Common knowledge is simulated by timestamped common knowledge, which is identical to true common knowledge in systems with perfectly synchronized clocks. A communication primitive called publication is defined that achieves timestamped common knowledge, and an implementation of publications that uses logical clocks is given. When solving problems in the class characterized earlier, publications can be used as if they achieved true common knowledge. The design of fault-tolerant protocols is simplified through methods that automatically translate protocols tolerant of benign failures into ones tolerant of more severe failures. The design task is thereby reduced to that of designing the simpler protocols that need only tolerate benign failures.
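As a rough illustration of the logical clocks mentioned in the abstract, the following minimal sketch shows a standard Lamport-style logical clock. It is not taken from the thesis: the class name `LamportClock` and its methods `tick`, `send`, and `receive` are hypothetical names chosen for this example, and the thesis's own clock construction and publication primitive may differ.

```python
from dataclasses import dataclass


@dataclass
class LamportClock:
    """Hypothetical helper: a Lamport-style logical clock for one processor."""

    time: int = 0

    def tick(self) -> int:
        # Local event: advance the clock by one.
        self.time += 1
        return self.time

    def send(self) -> int:
        # Sending is itself an event; the returned value timestamps the message.
        return self.tick()

    def receive(self, msg_time: int) -> int:
        # Move past the sender's timestamp so the receive is ordered after the send.
        self.time = max(self.time, msg_time) + 1
        return self.time


# Two processors, p and q, with no shared real-time clock.
p, q = LamportClock(), LamportClock()
p.tick()                   # a local event at p
stamp = p.send()           # p sends a message carrying its timestamp
q_time = q.receive(stamp)  # q's clock jumps ahead of the message timestamp
```

The only property this sketch is meant to show is that a receive is always timestamped later than the corresponding send, which is what allows timestamps to stand in for readings of a synchronized clock when ordering causally related events.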
computer science; technical report