Tradeoffs in Implementing Primary-Backup Protocols
One way to implement a fault-tolerant service is to replicate the state of a server across a primary server and a set of backup servers. Clients send requests to the primary, which computes the response, informs the backups of the state change, and then replies to the client. If the primary subsequently fails, a backup takes over as the new primary. Informally, a primary-backup protocol is nonblocking if the primary need not wait for acknowledgements from the backups before responding to the client. While most primary-backup protocols are blocking, we argue that nonblocking protocols can be constructed for most of the process and communication failures that are expected to occur in future communication systems. We then implement and measure the performance of two kinds of nonblocking protocols--one based on point-to-point communication and one based on broadcast--and compare the results with conventional blocking primary-backup protocols.
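
To make the blocking/nonblocking distinction concrete, here is a minimal in-memory sketch in Python. It is an illustration only, not the protocols implemented and measured in this work: the names `Backup`, `Primary`, `handle`, and `deliver` are hypothetical, message passing is simulated with a per-backup inbox list, and failures, takeover, and the point-to-point versus broadcast choice are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Backup:
    """Hypothetical backup: a copy of the state plus an inbox of pending updates."""
    state: dict = field(default_factory=dict)
    inbox: list = field(default_factory=list)

    def deliver(self):
        """Apply all pending updates and return how many were acknowledged."""
        applied = len(self.inbox)
        for key, value in self.inbox:
            self.state[key] = value
        self.inbox.clear()
        return applied


@dataclass
class Primary:
    """Hypothetical primary; `blocking` selects which reply rule is used."""
    backups: list
    blocking: bool
    state: dict = field(default_factory=dict)

    def handle(self, key, value):
        """Handle a client write; return (reply, acknowledgements waited for)."""
        self.state[key] = value               # compute the state change
        for backup in self.backups:
            backup.inbox.append((key, value))  # inform each backup
        acks = 0
        if self.blocking:
            # Blocking: wait until every backup has applied the update.
            acks = sum(backup.deliver() for backup in self.backups)
        # Nonblocking: reply at once; backups apply the update later.
        return f"ok {key}={value}", acks
```

With `blocking=True` the reply is returned only after both backups hold the update; with `blocking=False` the reply is returned while the update is still sitting in each backup's inbox, which is the window a nonblocking protocol has to account for when the primary fails.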