Simulating Fail-Stop in Asynchronous Distributed Systems
The fail-stop failure model appears frequently in the distributed systems literature. However, in an asynchronous distributed system, the fail-stop model cannot be implemented. In particular, it is impossible to reliably detect crash failures in an asynchronous system. In this paper, we show that it is possible to specify and implement a failure model that is indistinguishable from the fail-stop model from the point of view of any process within an asynchronous system. We give necessary conditions for a failure model to be indistinguishable from the fail-stop model, and derive lower bounds on the amount of process replication needed to implement such a failure model. We present a simple one-round protocol for implementing one such failure model, which we call simulated fail-stop.