Heartbeat: A Timeout-Free Failure Detector for Quiescent Reliable Communication 

Marcos Kawazoe Aguilera

Wei Chen

Sam Toueg

Department of Computer Science Upson Hall, Cornell University Ithaca, NY 14853-7501, USA. aguilera,weichen,sam@cs.cornell.edu

May 30, 1997

Abstract

We study the problem of achieving reliable communication with quiescent algorithms (i.e., algorithms that

eventually stop sending messages) in asynchronous systems with process crashes and lossy links. We ﬁrst show

that it is impossible to solve this problem without failure detectors. We then show how to solve it using a new

failure detector, called heartbeat. In contrast to previous failure detectors that have been used to circumvent

impossibility results, the heartbeat failure detector is implementable, and its implementation does not use

timeouts. These results have wide applicability: they can be used to transform many existing algorithms that

tolerate only process crashes into This can be applied to consensus,

aqtuoimesiccebnrtoaaldgcoarsitth, m¡ -ssetht aatgtroeleemraetnet,baottohmpircocceosmsmcriatmsheenst,aentdc.message

losses.

The heartbeat failure detector is novel: besides being implementable without timeouts, it does not output

lists of suspects as typical failure detectors do. If we restrict failure detectors to output only lists of suspects,
quiescent reliable communication requires ¢¤£ [ACT97a], which is not implementable. Combined with

the results of this paper, this shows that traditional failure detectors that output only lists of suspects have

fundamental limitations.

1 Motivation
This paper introduces heartbeat, a failure detector that can be implemented without timeouts, and shows how it can be used to solve the problem of quiescent reliable communication in asynchronous message-passing systems with process crashes and lossy links.
To illustrate this problem consider two processes, a sender ¥ and a receiver ¦ , connected by an asynchronous bidirectional link. Process ¥ wishes to send some message § to ¦ . Suppose ﬁrst that no process may crash, but the link between ¥ and ¦ may lose messages (in both directions). If we put no restrictions on message losses it is obviously impossible to ensure that ¦ receives § . An assumption commonly made to circumvent this problem is
that the link is fair: if a message is sent inﬁnitely often then it is received inﬁnitely often.
With such a link, ¥ could repeatedly send copies of § forever, and ¦ is guaranteed to eventually receive § . This is impractical, since ¥ never stops sending messages. The obvious ﬁx is the following protocol: (a) ¥ sends a copy of § repeatedly until it receives ¨©§ from ¦ , and (b) upon each receipt of § , ¦ sends ¨©§ back to ¥ .
Note that this protocol is quiescent: eventually no process sends or receives messages.
Research partially supported by NSF grant CCR-9402896, by ARPA/ONR grant N00014-96-1-1014, and by an Olin Fellowship.
1

The situation changes if, in addition to message losses, process crashes may also occur. The protocol above still
works, but it is not quiescent anymore: for example, if ¦ crashes before sending ¨!©"#§ , then ¥ will send copies of § forever. Is there a quiescent protocol ensuring that if neither ¥ nor ¦ crashes then ¦ eventually receives § ?
It turns out that the answer is no, even if one assumes that the link can only lose a ﬁnite number of messages.
Since process crashes and message losses are common types of failures, this negative result is an obstacle to the design of fault-tolerant distributed systems. In this paper, we explore the use of unreliable failure detectors to circumvent this obstacle. Roughly speaking, unreliable failure detectors provide (possibly erroneous) hints on the operational status of processes. Each process can query a local failure detector module that provides some information about which processes have crashed. This information is typically given in the form of a list of suspects. In general, failure detectors can make mistakes: a process that has crashed is not necessarily suspected and a process may be suspected even though it has not crashed. Moreover, the local lists of suspects dynamically change and lists of different processes do not have to agree (or even eventually agree). Introduced in [CT96], the abstraction of unreliable failure detectors has been used to solve several important problems such as consensus, atomic broadcast, group membership, non-blocking atomic commitment, and leader election [BDM97, DFKM96, Gue95, LH94, SM95].
Our goal is to use unreliable failure detectors to achieve quiescence, but before we do so we must address the following important question. Note that any reasonable implementation of a failure detector in a message-passing system is itself not quiescent: A process being monitored by a failure detector must periodically send a message to indicate that it is still alive, and it must do so forever (if it stops sending messages it cannot be distinguished from a process that has crashed). Given that failure detectors are not quiescent, does it still make sense to use them as a tool to achieve quiescent applications (such as quiescent reliable broadcast, consensus, or group membership)?
The answer is yes, for two reasons. First, a failure detector is intended to be a basic system service that is shared by many applications during the lifetime of the system, and so its cost is amortized over all these applications. Second, failure detection is a service that needs to be active forever — and so it is natural that it sends messages forever. In contrast, many applications (such as a single RPC call or the reliable broadcast of a single message) should not send messages forever, i.e., they should be quiescent. Thus, there is no conﬂict between the goal of achieving quiescent applications and the use of a (non-quiescent) failure detection service as a tool to achieve this goal.
How can we use an unreliable failure detector to achieve quiescent reliable communication in the presence of
process and link failures? Consider the Eventually Perfect failure detector $&% [CT96]. Intuitively, $&% satisﬁes
the following two properties: (a) if a process crashes then there is a time after which it is permanently suspected,
and (b) if a process does not crash then there is a time after which it is never suspected. Using $'% , the following obvious algorithm solves our sender/receiver example: (a) while ¥ has not received ¨©§ from ¦ , it periodically does the following: ¥ queries $&% and sends a copy of § to ¦ if ¦ is not currently suspected; (b) upon each receipt of § , ¦ sends ¨!©§ back to ¥ . Note that this algorithm is quiescent: eventually no process sends or receives
messages.
In [ACT97a], Aguilera et al. show that among all failure detectors that output lists of suspects, $&% is the weakest one that can be used to solve the above problem.1 Unfortunately, $'% is not implementable in asynchronous
systems with process crashes (this would violate a known impossibility result [FLP85, CT96]). Thus, at a ﬁrst glance, it seems that achieving quiescent reliable communication requires a failure detector that cannot be implemented. In this paper we show that this is not so.
1See [CHT96] for the concept of “weakest” for failure detectors.
2

2 The Heartbeat Failure Detector
We will show that quiescent reliable communication can be achieved with a failure detector that can be implemented without timeouts in systems with process crashes and lossy links. This failure detector, called heartbeat and
denoted (0) , is very simple. Roughly speaking, the failure detector module of (1) at a process 2 outputs a vector of counters, one for each neighbor 3 of 2 . If neighbor 3 does not crash, its counter increases with no bound. If 3 crashes, its counter eventually stops increasing. The basic idea behind an implementation of (0) is the
obvious one: each process periodically sends an I-am-alive message (a “heartbeat”) and every process receiving a heartbeat increases the corresponding counter.2
Note that (1) does not use timeouts on the heartbeats of a process in order to determine whether this process has failed or not. (1) just counts the total number of heartbeats received from each process, and outputs these “raw”
counters without any further processing or interpretation.
Thus, (1) should not be confused with existing implementations of failure detectors (some of which, such as those
in Ensemble and Phoenix, have modules that are also called heartbeat [vR97, Cha97]). Even though existing failure detectors are also based on the repeated sending of a heartbeat, they use timeouts on heartbeats in order
to derive lists of processes considered to be up or down; applications can only see these lists. In contrast, (0)
simply counts heartbeats, and shows these counts to applications.
A remark is now in order regarding the practicality of (1) . As we mentioned above, (0) outputs a vector of
unbounded counters. In practice, these unbounded counters are not a problem for the following reasons. First,
they are in local memory and not in messages — our (1) implementations use bounded messages (which are
actually quite short). Second, if we bound each local counter to 64 bits, and assume a rate of one heartbeat per
nanosecond, which is orders of magnitude higher than currently used in practice, then (0) will work for more
than 500 years.
(0) can be used to solve the problem of quiescent reliable communication and it is implementable, but its counters
are unbounded. Can we solve this problem using a failure detector that is both implementable and has bounded output? [ACT97a] proves that the answer is no: The weakest failure detector with bounded output that can be
used to solve quiescent reliable communication is $54 . Thus, the difference between (0) , whose output is unbounded, and existing failure detectors, whose output is
bounded, is more than “skin deep”. The results in this paper combined with those of [ACT97a], show that failure detectors with bounded output (including those that output lists of processes) are restricted in power and/or applicability.
3 Outline of the Results
We focus on two types of reliable communication mechanisms: quasi reliable send/receive and reliable broadcast. Roughly speaking, a pair of send/receive primitives is quasi reliable if it satisﬁes the following property: if
processes ¥ and ¦ are correct (i.e., they do not crash), then ¦ receives a message from ¥ exactly as many times as ¥ sent that message to ¦ . Reliable broadcast [HT94] ensures that if a correct process broadcasts a message § then all correct processes deliver § ; moreover, all correct processes deliver the same set of messages.
We ﬁrst show that there is no quiescent implementation of quasi reliable send/receive or of reliable broadcast in a network with process crashes and message losses. This holds even if we assume that links can lose only a ﬁnite number of messages. We then show how to use failure detectors to circumvent the above impossibility result. We describe failure
2As we will see, however, in some types of networks the actual implementation is not entirely trivial.
3

p

sq

t

p q

s

r r

(a) simple network case

(b) general network case

Figure 1: Examples of the simple and general network cases

link is fair link is not fair

detector (1) , and show that it is strong enough to achieve quiescent reliable communication, but weak enough to
be implementable, in each one of the following two types of communication networks. In both types of networks, we assume that each correct process is connected to every other correct process through a fair path, i.e., a path containing only fair links and correct processes.3 In the ﬁrst type, all links are bidirectional and fair (Fig. 1a). In the second one, some links are unidirectional, and some links have no restrictions on message losses, i.e., they are not fair (Fig. 1b). Examples of such networks are networks that contain several unidirectional rings that intersect.
For each network type, we ﬁrst describe quiescent protocols that use (0) to solve quasi reliable send/receive and reliable broadcast, and then show how to implement (1) . For the ﬁrst type of networks, a common one in practice, the implementation of (1) and the reliable communication protocols are very simple and efﬁcient. The
algorithms for the second type are signiﬁcantly more complex.
We also brieﬂy consider two stronger types of communication primitives, namely, reliable send and receive, and
uniform reliable broadcast, and give quiescent implementations that use (0) . These implementations assume
that a majority of processes are correct (a result in [BCBT96] shows that this assumption is necessary).
We then explain how (0) can be used to easily transform many existing algorithms that tolerate process crashes
into quiescent algorithms that tolerate both process crashes and message losses (fair links). This transformation can be applied to the algorithms for consensus in [Ben83, Rab83, BT85, CMS89, FM90, AT96, CT96], for
atomic broadcast in [CT96], for  -set agreement in [Cha93], for atomic commitment in [Gue95], for approximate agreement in [DLP6 86], etc.
Finally, we show that (1) can be used to extend the work in [BCBT96] to obtain the following result. Let 4 be a problem. Suppose 4 is correct-restricted (i.e., its speciﬁcation refers only to the behavior of correct processes) or a majority of processes are correct. If 4 is solvable with a quiescent protocol that tolerates only process crashes, then 4 is also solvable with a quiescent protocol that tolerates process crashes and message losses.4
To summarize, the main contributions of this paper are:
3This assumption precludes permanent network partitioning. 4The link failure model in [BCBT96] is slightly different from the one used here (cf. Section 11).

4

1. This is the ﬁrst work that explores the use of unreliable failure detectors to achieve quiescent reliable communication in the presence of process crashes and lossy links — a problem that cannot be solved without failure detection.
2. We describe a simple and implementable failure detector (0) that can be used to solve this problem.
3. (1) can be used to extend existing algorithms for many fundamental problems (e.g., consensus, atomic broadcast,  -set agreement, atomic commitment, approximate agreement) to tolerate message losses. It can
also be used to extend the results of [BCBT96].
4. (1) is novel: it is implementable without timeouts, and it does not output lists of suspects as typical failure
detectors do [BDM97, CT96, Gue95, GLS95, LH94, SM95]. The results of this paper, combined with those in [ACT97a], show that lists of suspects is not always the best failure detector output.5
Reliable communication is a fundamental problem that has been extensively studied, especially in the context of data link protocols (see Chapter 22 of [Lyn96] for a compendium). Our work differs from previous results by focusing on the use of unreliable failure detectors to achieve quiescent reliable communication in the presence of process crashes and link failures. The work by Basu et al. in [BCBT96] is the closest to ours, but their protocols
do not use failure detectors and are not quiescent. In Section 11, we use (1) to extend the results of [BCBT96]
and obtain quiescent protocols. The paper is organized as follows. Our model is given in Section 4. Section 5 deﬁnes the reliable communication primitives that we focus on. In Section 6, we show that, without failure detectors, quiescent reliable communication is impossible. To overcome this problem, we deﬁne heartbeat failure detectors in Section 7, we show how to use them to achieve quiescent reliable communication in Section 8, and show how to implement them in Section 9. In Section 10, we consider two stronger types of communication primitives. In Section 11, we explain how to use heartbeat failure detectors to extend several previous results. In Section 12, we mention a generalization of our results for the case where the network may partition. A brief discussion of protocol quiescence versus protocol termination concludes the paper.
4 Model
We consider asynchronous message-passing distributed systems in which there are no timing assumptions. In particular, we make no assumptions on the time it takes to deliver a message, or on relative process speeds. Processes can communicate with each other by sending messages through the network. We do not assume that the network is completely connected or that the links are bidirectional. The system can experience both process failures and link failures. Processes can fail by crashing, and links can fail by dropping messages. To simplify the presentation of our model, we assume the existence of a discrete global clock. This is merely a
ﬁctional device: the processes do not have access to it. We take the range 7 of the clock’s ticks to be the set of
natural numbers.
4.1 Processes and Process Failures
The system consists of a set of 8 processes, Π 9A@ 1BDCECECDBF8HG . Processes can fail by crashing, i.e., by prematurely halting. A failure pattern I is a function from 7 to 2Π, where IPQR denotes the set of processes that have crashed through time Q . Once a process crashes, it does not “recover”, i.e., STQ : IUQVXWYIUQa` 1 . We deﬁne
5The authors of [CHT96] anticipated this possibility: they put no restrictions on the output of unreliable failure detectors when they determine the weakest one necessary to solve consensus.
5

ibncedgI fehpaienq drIsift29v u&bewyxccVieIPbE rQRIs 

and we

beccViebE rIsa9 Π  say 2 is correct in

bcedfeh!ieq I.

I5

.

If 2

bDcVdfehpieq rI5 we say 2

crashes (or is faulty)

4.2 Links and Link Failures

Some pairs of processes in the network are connected through unidirectional links. If there is a link from process

2 to process
of neighbors

3

o,fw2 eisdedneontoetetdhibsylin kibyhdR2c e2#3

, .

and

if,

in

addition,

39 2

we say that 3

is a neighbor of 2

. The set

With every link 2f3 we associate two primitives: sendggh ig§ and receivei"h gp§ . We say that process 2 sends message § to process 3 if 2 invokes sendggh ig§ . We assume that if 2 is correct, it eventually returns from this invocation. We allow process 2 to send the same message § more than once through the same link. We say that process 3 receives message § from process 2 if 3 returns from the execution of receiveieh gd§ . We describe a link 2f3 by the properties that its sendggh i and receivei"h g primitives satisfy. We assume that links do not create messages, i.e., every link 20j3 in the network satisﬁes:

k Integrity: For all 0l 1, if 3 receives § from 2 times, then 2 previously sent § to 3 at least  times.

A lossy link can fail by dropping messages. A link 2mn3 is fair if sendggh i and receiveieh g satisfy Integrity and: k Fairness: If 3 is correct and 2 sends § to 3 an inﬁnite number of times, then 3 receives § from 2 an inﬁnite
number of times.

4.3 Network Connectivity
A path e2 1 BDCECDCEB2po is fair if processes 2 1 BECDCECBq2ro are correct and links 2 1 f2 2, CDCEC , 2roEs 1 f2ro are fair. We
assume that every pair of distinct correct processes is connected through a fair path. Without loss of generality, we can assume that this path is simple (i.e., no process appears twice in that path).
4.4 Failure Detectors
Each process has access to a local failure detector module that provides (possibly incorrect) information about the failure pattern that occurs in an execution. A process can query its local failure detector module at any time.
A failure detector history t with range u is a function from Π v7 to u . twe2HBxQV is the output value of the failure detector module of process 2 at time Q . A failure detector y is a function that maps each failure pattern I to a set of failure detector histories with range uz (where u{z denotes the range of failure detector outputs of y ). ymrIs denotes the set of possible failure detector histories permitted by y for the failure pattern I . We stress that the output of a failure detector depends only on the failure pattern I ; it cannot depend on the behavior
of applications. This means that failure detectors can neither obtain feedback from applications nor be used by applications to transmit information in any manner.
As an example, consider a Strong failure detector y [CT96]. Each failure detector module of y outputs a set of processes that are suspected to have crashed, i.e., uz|9 2Π. y satisﬁes the following two properties:
k Strong Completeness: Eventually every process that crashes is permanently suspected by every correct
process. More precisely:
SI}BxSt~{yrI5DBVQ07{BS2 bcedgfehpieq rIsB S#3s beccViebD rI5DBSTQxpQ : 2twr3BFQq
6

k Weak Accuracy: Some correct process is never suspected. More precisely: SI}BS#t0ymrIsBe20 beccViebE rIsB STQm7BxS3X Π IPQV : 2mtwr3BxQV
The class of all failure detectors that satisfy the above two properties is denoted  . Let  be a class of failure detectors. An algorithm solves a problem using  if it can solve this problem using any y . An algorithm implements  if it implements some y .

5 Quiescent Reliable Communication
In this paper, we focus on quasi reliable send and receive and reliable broadcast, because these communication primitives are sufﬁcient to solve many problems (see Section 11.1). We also brieﬂy consider stronger types of communication primitives — reliable send and receive, and uniform reliable broadcast — in Section 10.

5.1 Quasi Reliable Send and Receive
Consider any two distinct processes ¥ and ¦ . We deﬁne quasi reliable send and receive from ¥ to ¦ in terms of two primitives, sendRh  and receiveh  , that must satisfy Integrity and the following property:
k Quasi No Loss6: For all l 1, if both ¥ and ¦ are correct and ¥ sends § to ¦ exactly  times, then ¦ receives § from ¥ at least  times.
Note that Quasi No Loss together with Integrity implies that for all l 0, if both ¥ and ¦ are correct and ¥ sends § to ¦ exactly  times, then ¦ receives § from ¥ exactly  times.
We want to implement quasi reliable send/receive primitives using the (lossy) send/receive primitives that are provided by the network. In order to differentiate between these two, the ﬁrst set of primitives is henceforth
denoted by SEND/RECEIVE, and the second one, by send/receive. Informally, an implementation of SENDRh  and RECEIVEh  is quiescent if a ﬁnite number of invocations of SENDVh  cause only a ﬁnite number of
invocations of sends throughout the network.

5.2 Reliable Broadcast

Reliable broadcast [BT85] is deﬁned in terms of two primitives: broadcast§ and deliver§ . We say that

process 2 broadcasts message § if
the following ﬁelds: the identity

2ofinitvsoskeensdberro, addencoatsetd§fDi

.


WqiDec

assume that
§ , and a

every broadcast message §
sequence number, denoted

ifDniec lu§de s.

These ﬁelds make every message unique. We say that 3 delivers message § if 3 returns from the invocation of

deliver§ . Primitives broadcast and deliver satisfy the following properties[HT94]:

k Validity: If a correct process broadcasts a message § , then it eventually delivers § .

k Agreement: If a correct process delivers a message § , then all correct processes eventually deliver § .

k

Uniform Integrity: For previously broadcast by

fDeiv erqyic

message
§ .

§

, every process delivers §

at most once, and only if §

was

6A stronger property, called No Loss, is used in Section 10.1 to deﬁne reliable send and receive.

7

1 For every process 2 :

2

3 To execute broadcast§ :

4 5

deliver § for all 3X

iDghRc

e2#

do SENDggh i

§

6 return

7

8 upon RECEIVEggh iE§ do

9 if 2 has not previously executed deliver§ then

10 11

deliver § for all 3X

iqghRc

2

do SENDggh i§

Figure 2: Quiescent implementation of reliable broadcast using a quiescent implementation of SEND and RECEIVE primitives between neighbors

We want to implement reliable broadcast using the (lossy) send and receive primitives that are provided by the network. Informally, an implementation of reliable broadcast is quiescent if a ﬁnite number of invocations of broadcast cause only a ﬁnite number of invocations of sends throughout the network.

5.3 Relation between Reliable Broadcast and Quasi Reliable Send and Receive
From a quiescent implementation of quasi reliable send and receive one can easily obtain a quiescent implementation of reliable broadcast, and vice versa.

Remark 1 From any quiescent implementation of reliable broadcast, we can obtain a quiescent implementation
of the quasi reliable primitives SENDggh i and RECEIVEi"h g for every pair of processes 2 and 3 .

The implementation is trivial: to SEND a using the given quiescent implementation

message §
of reliable

btoroa3 d, c2 assitm, wplhyerberofDaid cqasiDtc srthem9es2 saagned

fDie9r§wBq2H9Be3 BV,Ta

sequence number that 2 has not used before. Upon the delivery of §B2Be3Be , a process ¦ RECEIVEs § from

2 if ¦93 , and discards § otherwise. This implementation of SENDggh i and RECEIVEi"h g is clearly correct and

quiescent.

Remark 2 Suppose that every pair of correct processes is connected through a path of correct processes. If we
have a quiescent implementation of quasi reliable primitives SENDggh i and RECEIVEieh g for all processes 2 and 3X neighbor e2# , then we can obtain a quiescent implementation of reliable broadcast.

The implementation of reliable broadcast, a simple ﬂooding algorithm taken from [HT94], is given in Figure 2 (the code consisting of lines 9 and 10 is executed atomically7). It is clear that this implementation is quiescent.
Indeed, for every message § , an invocation of broadcast§ can cause at most 8 1 invocations of SEND
per process. Moreover, since the implementation of SEND is quiescent, each invocation of SEND causes only a ﬁnite number of invocations of sends. Thus, a ﬁnite number of invocations of broadcast causes a ﬁnite number of invocations of sends.
 7A process executes a region of code atomically if at any time there is at most one thread of in this region.

8

6 Impossibility of Quiescent Reliable Communication
Quiescent reliable communication cannot be achieved in a network with process crashes and message losses. This holds even if the network is completely connected, only a ﬁnite number of messages can be lost, and processes have access to a Strong failure detector.
Theorem 1 Consider a network where every pair of processes is connected by a fair link and at most one process
may crash. Let ¥ and ¦ be any two distinct processes. There is no quiescent implementation of quasi reliable send and receive from ¥ to ¦ . This holds even if we assume that only a ﬁnite number of messages can be lost, and the implementation can use  . Proof (Sketch). Assume, by contradiction, that there exists a quiescent implementation   of quasi reliable SENDRh  and RECEIVEh  using  . We now construct three runs of   , namely, ¡ 0, ¡ 1 and ¡ 2, in which only ¥ may SEND a message  to ¦ and no other process invokes any SEND. In run ¡ 0, ¥ SENDs no messages, all processes are correct, all messages are received one time unit after they are sent, and the failure detector behaves perfectly (i.e., no process suspects any other process). Since   is quiescent, there is a time Q 0 after which no messages are sent or received. By the Integrity property of SEND and RECEIVE, process ¦ never RECEIVEs any message. Run ¡ 1 is identical to run ¡ 0 up to time Q 0; at time Q 0 ` 1, ¥ SENDs  to ¦ , and ¦ crashes; after time Q 0 ` 1, no
processes crash, and all messages are received one time unit after they are sent; at all times, the failure detector
behaves perfectly (i.e., ¦ is suspected by all processes from time Q 0 ` 1 on, and there are no other suspicions). Since   is quiescent, there is a time Q 1 Q 0 after which no messages are sent or received. In run ¡ 2, ¦ and its failure detector module behave exactly as in run ¡ 0 (in particular, ¦ does not crash and ¦ receives a message § in ¡ 2 whenever it receives § in ¡ 0); all other processes and their failure detector modules behave exactly as in run ¡ 1 (in particular, a process 29¢ ¦ receives a message § in ¡ 2 whenever it receives § in ¡ 1). Note that, in ¡ 2, if messages are sent to or from ¦ after time Q 0, then they are never received. We now show that in ¡ 2 the send and receive primitives satisfy the Integrity property. Assume that for some l 1, some process 3 receives § from some process 2 times. There are several cases. (1) If 39Y¦ then ¦ receives § from 2 times in ¡ 0 (since ¦ behaves in the same way in ¡ 0 and ¡ 2). In ¡ 0, by the Integrity property of send and receive, 2 sent § to ¦ at least  times. This happens by time Q 0, since there are no sends in ¡ 0 after time Q 0. Note that by time Q 0, 2 behaves exactly in the same way in ¡ 0 Be¡ 1 and ¡ 2. Thus 2 sent § to ¦ at least  times by time Q 0 in ¡ 2. (2) If 319v ¦ and 29v¦ , then 3 receives § from ¦X times in ¡ 1 (since 3 behaves in the same way in ¡ 1 and ¡ 2). In ¡ 1, by the Integrity property of send and receive, ¦ sent § to 3 at least  times. This happens by time Q 0, since ¦ crashes at time Q 0 ` 1 in ¡ 1. By time Q 0, ¦ behaves exactly in the same way in ¡ 0 BV¡ 1 and ¡ 2. Thus ¦ sent § to 3 at least  times by time Q 0 in ¡ 2. (3) If 39v ¦ and 2£9¤ ¦ , then 3 receives § from 21 times in ¡ 1 (since 3 behaves in the same way in ¡ 1 and ¡ 2). By the Integrity property of send and receive in ¡ 1, 2 sent § to 3 at least  times. Note that 2 behaves exactly in the same way in ¡ 1 and ¡ 2. Thus 2 sent § to 3 at least  times in ¡ 2. Therefore, the send and receive primitives in ¡ 2 satisfy the
Integrity property.
We next show that in ¡ 2 the send and receive primitives satisfy the Fairness property, and in fact only a ﬁnite number of messages are lost. Note that ¦ sends only a ﬁnite number of messages in ¡ 0 (since it does not send messages after time Q 0), and every process 29¢ ¦ sends only a ﬁnite number of messages in ¡ 1 (since it does not send messages after time Q 1). So, by construction of ¡ 2, all processes send only a ﬁnite number of messages in ¡ 2. Therefore, only a ﬁnite number of messages are lost, and the send and receive primitives satisfy the
Fairness property.
Finally, we show that in ¡ 2 the failure detector satisﬁes the properties of a Strong failure detector. Indeed, there are no crashes and therefore Strong Completeness holds vacuously; also there exists a process, namely ¥ , which
9

is never suspected by any process, and so Weak Accuracy holds.

We conclude that ¡ 2 is a possible run of   using  in a network with fair links that lose only a ﬁnite number of

messages. Note that in ¡ 2: (a) both ¥ and ¦ are correct; (b) ¥ SENDs  to ¦ ; and (c) ¦ does not RECEIVE  .

This violates the Quasi No Loss property of SENDRh  and RECEIVEh  , and so   is not an implementation of

SENDRh  and RECEIVEh  — a contradiction.

¥

Theorem 1 and Remark 1 immediately imply:

Corollary 2 There is no quiescent implementation of reliable broadcast, even if the implementation can use  .

To overcome these impossibility results, we now introduce the heartbeat failure detector.

7 Deﬁnition of ¦§
A heartbeat failure detector y has the following features. The output of y at each process 2 is a list e2 1BF8 1DBEe2 2BF8 2BECDCECBEe2roBx8#o¨ , where 2 1B2 2 BDCECDCEB2po are the neighbors of 2 , and each 8p© is a nonnegative integer. Intuitively, 8 © increases while 2 © has not crashed, and stops increasing if 2 © crashes. We say that 8 © is the heartbeat value of 2 © at 2 . The output of y at 2 at time Q , namely twe2HBxQV , will be regarded as a vector indexed by the set @F2 1 B2 2 BDCECDCEB2poG . Thus, twe2HBxQVDª 2 ©V« is 8 © . The heartbeat sequence of 2 © at 2 is the sequence of the heartbeat values of 2 © at 2 as time increases. y satisﬁes the following properties:
k (1) -Completeness: At each correct process, the heartbeat sequence of every faulty neighbor is bounded.
Formally:
SI}BSt~ymrIsBxS2m beccViebE rI5DBS3s bcedfeh!ieq rIs#¬ iqghRc 2DBV­~1®}BS¯Qs17 : twe2HBxQVDª°3 «± ­ k (1) -Accuracy:
– At each process, the heartbeat sequence of every neighbor is nondecreasing. Formally:
SI}BxSt~ymrIsBxS2m ΠB S3X ihRgc e2#BxSTQ²m7 : twe2HBFQRDª°3 «³± twe2HBxQ` 1Dª°3 «
– At each correct process, the heartbeat sequence of every correct neighbor is unbounded. Formally:
SI}BS#tymrIsBxS2m beccViebE rIsB S3X beccViebE x´5¬ ihRgc e2#B S­µ0®}BeQs17 : twe2HBFQREª¶3 « ¢­
The class of all heartbeat failure detectors is denoted (1) . By a slight abuse of notation, we sometimes use (1)
to refer to an arbitrary member of that class.
It is easy to generalize the deﬁnition of (1) so that the failure detector module at each process 2 outputs the heartbeat of every process in the system [ACT97b], rather than just the heartbeats of the neighbors of 2 , but we
do not need this generality here.

10

8 Quiescent Reliable Communication Using ¦§
The communication networks that we consider are not necessarily completely connected, but we assume that every pair of correct processes is connected through a fair path. We ﬁrst consider a simple type of such networks, in which every link is assumed to be bidirectional8 and fair (Fig. 1a). This assumption, a common one in practice, allows us to give efﬁcient and simple algorithms. We then drop this assumption and treat a more general type of networks, in which some links may be unidirectional and/or not fair (Fig. 1b). For both network types, we give
quiescent reliable communication algorithms that use (1) . Our algorithms have the following feature: processes
do not need to know the entire network topology or the number of processes in the system; they only need to know the identity of their neighbors.
In our algorithms, y5g denotes the current output of the failure detector y at process 2 .

8.1 The Simple Network Case

Wof¦ r,foepqmaruosatschsueiimssrseSel¥ iEtahﬁbNarltseDatS.lflToElraiNnkskskDstrVhienh epteathaanestdknsReertenEwpdCoera¦EktBxI§waVsreEeB nfDbdieh i d¦fiBFor,§erwcttBhhifDieociench aa lrsuwaennh¦PsdeirfneaitrshiDe(eFqgbihigasR.ca1kcagf)rrr.¥eosWuh(nsedseﬁe,eqrrFseutipeggen.iacv3teee).dnalTuqymousiSbeeesEnrc,dNeanDRnt·¹diam¸¯tmhpºaeleeBFns§msiateB gnrfDeetieat§ utironttnoos

¦ , where ·P¸pº is a tag. This send occurs

heartbeat value of ¦ has increased. The task

from ¦ .
the ﬁrst

This time

iatcrkencoewivleedsgRe·¹m¸¯enºatBF§is BsfDeien t

by .

every time ¥ queries its
repeat send terminates if
¦ every time it receives

fR¥ a·Prileu¸¯crºteeBFivd§eetBsefDacientoa r.cmkPnoroodwcuellseesdag¦ nedRmEneonCttiEcye»²IVs¼aEt½¤hsaB §tfDiethaet

The code consisting of lines 7 and 8 is executed atomically, as well as the code consisting of lines 23 and 24. If there are several concurrent executions of the repeat send task (lines 11–18), then each execution must have its
own private copy of all the local variables in this task, namely, ¦ , § , seq, hb and prev hb r.

We now show that When ambiguities

the algorithm of Fig. 3 is correct and quiescent. Note that
may arise, a variable local to process 2 is subscripted by

2al,lev.agr.i,ahbleg siasrtehelolcoaclatloveaarciahbplerohce sosf.

process 2 .

Lemma 3 (Integrity) For all l 1, if ¦ RECEIVEs § from ¥} times, then ¥ SENT § to ¦ at least  times.

PfbSFreioronformocoerefea¥.¦c¦ hfNRor¿0oEretcCeweaeEtci@ hvhI1aVeBEt¿sE,CDCEfsÀRoCD·¹§¾rBe@ ¯¸¯a1GnºtBD,ytCEBFiT§CDmfaCÁseBBVk,spf ¦tGrh©e.oep nnBeflarytyohtmetRshreeEen¥ Ca.dInrEeTt¦eIhBF gV§irsdEiitBsfsyffee§ pnr© redonfprcctoeaavmrnnatyloou¥ nneoollsfyynfsboteehc1necfBEduoCDﬁrrCEarkCDisnenBtddf tlpiirdmneo uecerseiu1niic6tvghreoeat,fnhc¥Taeitanisv¦sveekornsecrdaecRstpe·PieoiRv¸¯an·¹etºtos¸¯sBFfeºa§RSn·PBFdB§E¸¯fN¦pºtB BFfDBF§ §fR©rBh o¨B f mf t§o© ©¥

¦

.

. ,

and each such invocation forks only one task repeat send. Therefore, ¥ SENT § to ¦ at least  times.

¥

Let § ¿{lÂ e9|t9 f ¿



1
©

be a
BDCECECDBe

message , we can

and ¤l
associate

1, and consider a a sequence number

rfu n©

in which
with the ¿

¥

invokes SENDRh  § exactly  -th invocation of SENDRh g§ as

be then

tfh¯eÃ}v9 aluf e

©

of the global variable seq after , since during each invocation

line 7 is executed during the ¿ -th invocation.9 of SENDVh  the global variable seq is increased

times. For
follows: we Note that if by one, and

it can never be decreased.

&ÄvÅ ÅaÄ| Å¤Æ Çe¨È8In our model, this means that link

is in the network if and only if link

is in the network. In other words, neighbor

if and only if ÉÆ neighborÇÅÈ .

Ê Ë Ì9If crashes during the -th invocation before executing line 7, we let sn be equal to one plus the value of the global variable seq at

the time of the invocation.

11

1 For process ¥ :

2

3 4

InitiafDliiezÉatÍ ion0:

@ seq is the current sequence number G

5

6 7 8

To exfDeiecÉutÍÎe SfDEieN` DV1h g§ : fork task repeat send¦BF§B

fDie



9 return

10 11 12

task

ÏrecepieÐ aths enc5dÍ

¦BF§B 1

fDie



:

13 14 15 16 17 18

ruenpteilaihrfte&ÏpcÍceeeirsÏ iÐiveyÑcVoeindhgÐVdi hc¨Rhaych ¨»²llRÒÓy¼a·Pcs½¤hd¸¯ÍÎBºtfDª ¦BFieh§«  tª hB¦ ffD«erieonm ¦

@ query the heartbeat failure detector G

19
20 For process ¦ :

21 22 23 24

upon receiveh  R·P¸pºaBx§wB fDie  do isfetnhdisih syt»ah¼²e ½¤ﬁrB sfDtiet im e ¦ receives R·P¸¯ºtBF§B

fDie



from ¥

then RECEIVEh  §

F¦¹igureiD3g:hRSc imr¥p le network case — quiescent implementation of SENDVh  and RECEIVEh  using (0) for

Lemma 4 (Quasi No Loss) For all l 1, if both ¥ and ¦ are correct and ¥ SENDs § to ¦ exactly  times, then ¦ RECEIVEs § from ¥ at least  times.

afbnPi¦ srs.eerosvsoWmtooehocmrefhi¥.aseseteeneSe¿vqndu¥eudwprweisypnn@iovyctt1s»²hioemeBEk¼ana,CDeeu½¤CEbsuCDmiyBntSBefibr¯cqEeeoGu©NrcneseatDurtsimaovcsRdheoeh ¥¨iscs.ctsih§tfaBaiarotgoytenemdfV,t,oh·Pwt¦ re¥h¸¯itastIahºthtneemt¦BFnteh¿§degRe-srstB sih¿Etfay-yCt»agtih©pemE¼ario½ÕIenoiVf,vpsB Eioetfnthrcsefteayo©v§tfreiookororfsnfmrnrseaooleycmftRnae·¹Sisdfi¥vkE¸¯iaetlNºarendersdBFDeps§bcerVyteaehhBetc¦iÔ¨ave§nsfeeirtvsonhe.mdaVt,Sti·Ptm¥ioi¥ nt¸¯.nercdºtseeSei.vpdBFiaene§NlnraclooteBrteehttf deiecrcale©eeycf c ih t.vseh© eesiaW’vesnstqeeda¦yu»²rbceeRRe¼aon·¹dfEnc½¤oi¸¯cesCrBltºteunifE.ndBFuIc§Lem©Vt,etbEB thtfefhrsafroet§©mri¦e©s

troec¦ e. iTvehiss×t»aas¼ak½ÕwB if ll

be
©

referred to as task Ö from ¦ . Therefore,

. Task Ö
the loop

never in lines

terminates, because it can only do so if ¥ crashes or if ¥ 13–18 of task Ö is repeated inﬁnitely often. Moreover,

since ¦ is correct, the (1) -Accuracy property guarantees that the heartbeat sequence of ¦ at ¥ is nondecreasing and

usR·Pnebn¸podºtusnBxR§wd·PeBd¸¯f,º²pa©ÁBxn§w dftBrhof umds©t¥ htaeotc¦ loeanandstiitnoiﬁonnncieitne—lniunamec1bo5enrterovafadtliiucmatiteoesns..(BinyttahsekFÖ a)irtnoetsrsuepraonpeinrtﬁynoitfesneunmdbaenrdofreticmeeivse. ,T¦ hreerecfeoirvee¥ ¥s

We now show that the implementation in Fig. 3 is quiescent. In order to do so, we focus on a single invocation of SEND and show that it causes only a ﬁnite number of invocations of sends. This immediately implies that a ﬁnite number of invocations of SENDs cause only a ﬁnite number of invocations of sends. So consider one

12

p¥R·Padrot¸piecºtsuBxln§waortB ifcnrvaoschfar,ot imocna¥  u,soietfsiSinnvEvoNokcDeastRihosgne§sndo ,fhasgneyd»an¼alde½ÕRth ¨BsnRf ·Pb ¸pe. ºtthSBxe§wosB efqrmueaniynceatalnssoukmcrebapueersaeatsisnsoevncoidcaat¦etiBxdo§wnwsB iftoh#f 

. It is clear that, if
.seWnhdenh gy¦ »²r¼aec½ÕeB if vre s,

and it
sendVh

i¨sV·Pcle¸¯aºtrBFt§haB tf

  

dooressennodt ch aguy»²se¼aa½ÕnB yf

other
r .

invocations

of

sends.

Therefore, a send caused by  

is either

We next show that ¥ sends V·P¸¯ºaBF§B f  to ¦ only a ﬁnite number of times, and that ¦ sends y»²¼a½¤B f  to ¥

only a ﬁnite number of times. This implies that   causes only a ﬁnite number of sends.

Lemma 5 ¥ sends  MSGBF§B sn to ¦ an inﬁnite number of times if and only if ¦ sends  ACKB sn to ¥ an inﬁnite
number of times.

Proof. If ¥ sends V·P¸¯ºtBF§B f  to ¦ an inﬁnite number of times, then ¥ is correct, and the condition in line 15

ei(0nvﬁ)anl-uiCtaeotemnsuptmolebtteerunreeosfasnt,i¦miniesﬁsnc.oiStrerinenccute.m¦ Tbsheeernnodfbsytiymt»ahee¼²sF.½¤aBTirf hnpee rsetsofop¥rreoe,paetchrhteythiomefaesrteibtnerdeatcaseneidvqeureescnRec·Peiv¸poeºaf, Bx¦¦ §wraetB cf¥ e#iisv,eu¦ snsbRe·Ponu¸pdnºtdseBxyd§w»a.¼²B Sf½¤oB , f

by an


to ¥ an inﬁnite number of times.

Conversely, if ¦ sends y»a¼a½ÕB f p to ¥ an inﬁnite number of times, then ¦ number of times, so, by the Integrity property of send and receive, ¥ sends

V·Pre¸¯cºteBFiv§eBsf

R·P¸¯ºtBF§B f p an inﬁnite  to ¦ an inﬁnite number

of times .

¥

Corollary 6 ¥ sends  MSGBF§B sn to ¦ only a ﬁnite number of times.

tPeyr»avureo¼²eno.½¤tfuT.B afhFlleoyr rerftaeoocrce¥eo,niavttnheraesdinti×acﬁ»astnik¼aoit½Õnree,B pnsf euuramp tpbfsoreeosrnemdothf¦ ¦a.tBFti§mT¥ heB ssuf esbn, ydtehsvLeeRec·Pnmotu¸pnmaºtdalilBxty§wi5o.tneB rf Bimrnyilnittnaohteee¦ s1Fa8aanni(ridonnfeﬁssotnasisittpkersnroeeupnpmeedrbatsyet Rrso·Peofnf¸psdtºtiem¦Bxn§wBxed§wsB .afB nTrf d³h e tr)noebc¦¦ eecsoivoenemlny,deas¥s

ﬁnite number of times — a contradiction.

¥

Corollary 7 ¦ sends  ACKB sn to ¥ only a ﬁnite number of times.
Proof. From Lemma 5 and Corollary 6.

¥

Lemma 8 The algorithm of Fig. 3 is quiescent.
Proof. From Corollaries 6 and 7, and the remarks before Lemma 5. From Lemmata 3, 4, and 8 we have:

¥

Theorem 9 For the simple network case and any ¦¹ neighbor ¥ , Fig. 3 is a quiescent implementation of quasi reliable SENDVh  and RECEIVEh  that uses (1) .

From this theorem, and Remarks 1 and 2, we have:

Corollary 10 In the simple network case, quasi reliable send and receive between every pair of processes and
reliable broadcast can both be implemented with quiescent algorithms that use (0) .

13

8.2 The General Network Case

In this case (Fig. 1b), some links may be unidirectional, e.g., the network may contain several unidirectional rings that intersect with each other. Moreover, some links may not be fair (and processes do not know which ones are fair).

Achieving quiescent reliable communication in this type of network is signiﬁcantly more complex than before. For instance, suppose that we seek a quiescent implementation of quasi reliable send and receive. In order for the
sender ¥ to SEND a message § to the receiver ¦ , it has to use a diffusion mechanism, even if ¦ is a neighbor of ¥ (since link ¥Ø¦ may be unfair). Because of intermittent message losses, this diffusion mechanism needs to ensure that § is repeatedly sent over fair links. But when should this repeated send stop? One possibility is to
use an acknowledgement mechanism. Unfortunately, the link in the reverse direction may not be fair (or may not even be part of the network), and so the acknowledgement itself has to be “reliably” diffused — a chicken and egg problem.

Figure 4 shows a quiescent implementation of reliable broadcast (by Remark 1 it can be used to obtain quasi

2rtmthoerlaaiu@Ftia2Ùnnb3 tsG laheiiannansssedthndafedeovlbraaikvarnsiecdarktbeargldesrecko§ euqi¨nv. dÚ}eIg .nÛpbª §IfDeonitr«wdtch§eeoiresn nt;ttoaaﬁeisbvnnkreai,onrl2lyagydppa2ceaasrirseireotttoduaofirfcmnppasrelrolfosycrscoeaecsmgshseseeetsch§ s)ke..s,IiF2 ninfovt,ﬁurofrieocstaiartvctdseihooelymnlmi,voeeaefsnrpsbseraoirg§ gocehae;b§sdtoshcret3ahn3sias2tt ii§nisnbi¨ t.r¨ioag Tlagª ih§dzª §ece« as,t«astvitshf,ake2erihqaahce bÚ}ahalsreÛpptefDbrvioe¨iacd§tegesonª s§fca2 e3 t«

at to

2

has those

increased, and if who are already

sino,2 ¨sg eª n§ d« s.1a0

message The task

containing terminates

§

whteonaallllnneeigighhbboorsrswohfo2 searheeacrotnbteaaitneindcirneas¨ed g

—
ª§

even
«.

aAdaÏ nped¨ldlplihevmaaeiprerspesasdeatanm§gsdeesoqsasuintstesdeone,nnclicftefebtnooiyofntÏt,pÏhd¨ired¨tohdach .ele.glsiFosverieinsrt.ahslml§Uy,paa2ornendfootffhrowetrhkaresredftcosaerstmikhpetqo§nÚ}feÛpBwsfDui¨cm h§eÜPsa s.fRam TgeBxehÏsesd¨na§gh2 eB ,awdp¨dhr oseÜPcrteehfRsegsocBF2 otÏ nd¨mﬁtersh sng ttsitcsohoaefacsllke¨sti tosiÜPffnpifRetr iohgtcahoesbsoas¨relsrseg taaªh§dnaydt«

The code consisting of lines 18 through 26 is executed atomically. Each concurrent execution of the diffuse task
(lines 9 to 16) has its own private copy of all the local variables in this task, namely § , hb, and prev hb.

We now show that this implementation is correct and quiescent.

Lemma 11 (Uniform Integrity) For every message § , every process delivers message § at most once, and only if § was previously broadcast by sender§ .

Proof (Sketch). Let § be a message and 2 be a process. Line 19 guarantees that 2 delivers § at most once. Now

suppose process
and clearly 29

2fDi

d eqliivc er§s §

.

. It can do that either
In the second case, 2

in line 4 or must have

line 20. In the ﬁrst case, 2 previously broadcast § received a message of the form §wBVÔ!BeÔ¨ . By the

Integrity property of send and receive, a
shows that this can only happen if § was

message of the form previously broadcast

b§y BefDÔ!i BV Ô¨q iDwc a§s p .reviously

sent.

An

easy

induction
¥

Lemma 12 (Validity) If a correct process broadcasts a message § , then it eventually delivers § .

Proof. If a correct process broadcasts § then it eventually executes line 4 and delivers § .

¥

iWnietianleixzat tsiohnowoftha¨t

g

every process in
ª § « takes place.

¨ g ª §
We do

« delivered
not assume

§ ¨.

g

But ﬁrst, we should be concerned about when the
ª § « is initialized to the empty set at start-up. Doing

so would be impractical since the set of all possible messages § that can ever be broadcast can be inﬁnite. Instead,

 ÝÞ ß&à10It may appear that does not need to send this message to processes in got , since they already got it! The reader should verify
that this “optimization” would make the algorithm fail.

14

1 For every process 2 :

2

3 To execute broadcast§ :

4 5

de¨liªv§ e«rÍ § @F2³G

6 fork task diffuse§

7 return

8

9 10

task diffuse§ : for all 3X

iDghRc

e2#

do Ï

cViDÐ

h ª°3 « Í

1

11 12 13 14 15 16

urenpteilaihft&fiDpoÍegrrfÏhsoiyÑcVooRridmgÐac ieclel2#ha3X3XltlyÍáW hi¨i ª h§ hdRR«gc c

e2# , 3¹ ¨ ª § e2# such that Ï

« cVaiDÐndhdÏ

ceiÐ ª°3 «

Òvhh @ ª¶ 3 qª¶«3 u«Òvedryohtshª°e3e«nhtdehggaehrigntb§eaB tf¨ailª §ure« B2#de tector

G

17 18

upon receivegh ig§wB  ÜPfR BFÏ d¨h  do

19 if 2 has not previously executed deliver§ then

20 21

deliªv§ e«rÍ § @F2ÙG

22 23 24 25 26

fÏo¨d¨rhUªa§ fslÍloe« r3 nÍjkÏsdud¨tggach ighPhs¨k §ãtªh2§daBitf«df3Øuâ¨s eª § ¨§« iDBFÜPÏgd¨fRh Rh  c e2# and 3 appears at most once in Ï dh do

Figure 4: General network case — quiescent implementation of broadcast and deliver using (1)

each process 2 initializes ¨ g ª §

lin¨e g

21).
ª§ «,

This guarantees that but one should always

«¨eg itª h§ e«r

when it broadcasts §
is never used before

(see line 5), or when it ﬁrst “hears” about § (see
it is initialized. We next establish invariants for

keep in mind that these invariants only hold after initialization has occurred.

Lemma 13 For any processes 2 and 3 , (1) if at some time Q , 3U gotg ª § « then 3U gotg ª § « at every time Q  l£Q ; (2) When gotg ª § « is initialized, 2 gotg ª § « ; (3) if 3X gotg ª § « then 3 delivered § .

Proof (Sketch). (1) and (2) are clear from the algorithm and (3) follows from the Integrity property of send and

receive.

¥

Lemma 14 For every § and path, there is a ﬁnite number of distinct messages of the form §wBVÔ!B path . Proof. Any message of the form §BeÔ!BxÏ d¨h  is equal to §wBxäpBFÏ d¨h  for some äW Π, where Π is a ﬁnite set. ¥

Lemma 15 Suppose link 2 3 is fair, and 2 and 3 are correct processes. If 2 delivers a message § , then 3 eventually delivers § .

Proof. it forks

Stauspkpoqs Ú}e,ÛpbfDyi

contradiction, that
§ . Since 3 does

2

delivers §
not deliver

§

and 3
, by

never delivers §
Lemma 13 part

. Since 2 delivers § (3) 3 never belongs

toan¨d

it
gª

§

is

«

correct, . Since

15

2 is correct, this implies that 2 executes the loop in lines 11–16 an inﬁnite number of times. Since 3 is a correct

neighbor of 2 , the (1) -Accuracy property guarantees that the heartbeat sequence of 3 at 2 is nondecreasing and

unbounded. Thus, the condition in line 13 evaluates to true an inﬁnite number of times. Therefore, 2 executes

line 14 an times. By

inﬁnite Lemma

1n4u,mthbeerreoefxtiismtseas,suanbdsesto¨2å

sends a message of the form §wBVÔ!B2# to 3 an inﬁnite number of W Π such that 2 sends message §BFä 0 Bq2# inﬁnitely often to 3 . So,

by the Fairness property of send and receive, 3 eventually receives §BFä 0 Bq2# . Therefore, 3 delivers § . This

contradicts the assumption that 3 does not deliver § .

¥

Lemma 16 (Agreement) If a correct process delivers a message § , then all correct processes eventually deliver §.

Proof. Suppose that some correct process 2 delivers § . For every correct process 3 , there is a simple fair path

e2 1 BECDCECDB2rog from 2 to 3 with 2 1 9æ2 and 2ro9¾3 . By successive applications of Lemma 15, we conclude that

2 2 Bq2 3 BECDCECEBq2ro eventually deliver § . Therefore 3Ñ9w2o eventually delivers § .

¥

We now show that the implementation of Fig. 4 is quiescent. In order to do so, we focus on a single invocation of broadcast and show that it causes only a ﬁnite number of invocations of sends. This implies that a ﬁnite number of invocations of broadcast cause only a ﬁnite number of invocations of sends.

Let § be a message and consider an invocation of broadcast§ . This invocation can only cause the sending of messages of form §wBVÔ!BeÔ¨ . Thus, all we need to show is that every process eventually stops sending messages
of this form.

Lemma 17 Let 2 be a correct process and 3 be a correct neighbor of 2 . If 2 forks task diffuse § , then eventually condition 3U gotg ª § « holds forever.

Proof. By Lemma
contradiction, that 3

13 part (1), we never belongs to

o¨n lgyª

and 2 o 4 Ã 9ée2

çÑ9è3 1 B2 2

. Let
BDCECECDB2

Ã

e2 .

o ç×Bq2 o ç
Note

t6 h1aBDtCEfCDoCErB12po± 

bÂ eÒ

§

need to show that eventually 3 belongs to « . Let e2 1B2 2BDCECECDB2roeç be a simple fair path a simple fair path from 3 to 2 with 2#o9¾2  , process 2 Ã 6 1 appears at most once in 4 Ã .

¨ g ª § from 2
. For

«.
to 1

3 ±

Suppose, by

wÂ ithÒ 2

1 9w2  , let

We claim that for each ¿9 1BECDCECEBVÕ 1, there is a set ä © containing @F2 1 Bq2 2 BECDCECBq2 © G such that 2 © sends §BFä © Be4 © 

mrftoeacec2 tse©sti6ahvga1eeta,3inm2 nimenavdﬁeedndrsiitabettheenlelouyncmogimbnsetpterolniotesfsto tigfhmªaä§ etoEs2«s..o1Fto9or i¿2ts9éevva¹erina tbula1el,lyth¨irseg ccª § leaii«vm.eSstiong§wceetBxhääeoEoErss

with the Fairness property of send and
1 BV4³oEs 1 . Upon the receipt of such a 1 contains 2roeça93 , this contradicts the

We show the claim by induction on ¿ . For the base case note that, since 3 never belongs to ¨ g ª § « and 3 is a

neighbor of 2 1 9v2 , then 2 1 executes the loop in lines 11–16 an inﬁnite number of times. Since 3 is a correct

neighbor of 2 1, the (0) -Accuracy property guarantees that the heartbeat sequence of 3 at 2 1 is nondecreasing and

unbounded. Thus, the condition in line 13 evaluates to true an inﬁnite number of times. So 2 1 executes line 14

inﬁnitely often. Since 2 2 is a correct neighbor of 2 1, its heartbeat sequence is nondecreasing and unbounded, and

so 2 1 sends a message of the form §wBVÔ!B2 1 to 2 2 an inﬁnite number of times. By Lemma 14, there is some ä 1

such that 2 1 sends §BFä 1 B2 1 to 2 2 an inﬁnite number of times. Note that Lemma 13 parts (1) and (2) implies

that 2 1 ä 1. This shows the base case.

2msFtfhro©ooaem6 rmts1setä ha2r© äege6© ©eci1anecincdsoiovuinnnect§ﬁattsaiinoBFniinäpn§witnseBVsg4Bx@Fntäe2©u@F©p61m2 Be,B114 2bs B©ue22 fprBDo2,pCEorBDiCEoftCEsCDsCEtosBeiC"me2mBtn©2hee6 da©sä1t.sG G .fStaohaiBrmanntyc¿deecst2 Òoh2s© nae©6 gt6 aF1e2iasnoiiessrf1nnab,tedh2onsests©ehifpsgo§wä rehro© mnbBxpaäoden©rsr6d§toy1f§w2 BeBeo2©Ô!4 fBx6© BV©ä6 1s46© .1e©1Bea6nB4 n1d©tydo aLta2tnoope© dmp262 er©2©mea66 arca21nse.1aaiiInvn4ttﬁe,miin,nstohiﬁ2 etseen©atr6iesonty1enuecmrtxnoeeiubscimsteneesrbei4vä oet©e©rfh66 stao11itfm,§eWteeiaamsBFccΠ.äheh© ssstBe,iuu4 mfcco© ¥ hher

16

Corollary 18 If a correct process 2 forks task diffuse§ , then eventually 2 stops sending messages in task diffuse§ . Proof. For every neighbor 3 of 2 , there are two cases. If 3 is correct then eventually condition 3XädêDQgdª § « holds oafofftree3 rvawetrh2 bicyishLbtehomeungmduaeadr1d,7a.innIdflisn3 oeise1v3faeiunslttuayal,wltlhyayecnsoftnahdlesite(0i.o)nH-eÏ CncVociDemÐ , p2 hdle egtveª¶en3 n«etsul saplhlryogdpsª°te3or« ptyhsogsluedasnrdfaoinnrtegevemesre.thsTsaathgteherseefihonereata,rsttbkheeqarteÚ}siÛpesqfDaui et§inmc ee. ¥

Lemma 19 If some process sends a message of the form §BeÔ!B path , then no process appears more than twice

in path.

Proof (Sketch). By line 25 of the algorithm, a process sends a message §wBxäpBFÏ d¨h  to a process 3 only if 3

appears at most once in path. The result follows by an easy induction that uses this fact and the Integrity property

of send and receive.

¥

Lemma 20 (Quiescence) Eventually every process stops sending messages of the form §wBVÔ!BeÔ¨ .

Proof. Suppose for a contradiction that some process 2 never stops sending messages of the form §wBVÔ!BeÔ¨ .

ﬁpNBrnyooitctLeeeetsshmseat2mto2 iafnmvv1oa4ulk,suetfeosbsr.esseTconohmrdereggreehciäfto.§wWrBe,BxyΠäpfoLBF,Ïr2ed¨mssohmemnaead1nÏs9d¨ia,nnthﬁh,nien2 ittﬁehsnienritdnuemdcnosbumeamrnpbooienfnrﬁetionnmftiteomesf.neasusmmagbeesessraog§femoBFä¯feBFsthÏsaed¨gfeh osr.mofSot§h,efBeofÔ!orBVrsÔ¨mo mra§engpBeerÔ!osBFcÏoevd¨seshr3



a . ,

fmTrncaoeanuherncdmemsersolbeieacvetg§wcaereÏursesorBVd¨fÔ!atniwnhBFiemnÏ ioi9¾vtd¨ethsocsehacsetr2qsaaaeigts1nisakeBEo.nCDlnioqCEFnsifnCEieÚ}roBqﬁts2rh1Ûpfnteog4,fDistiieenf,foonw§Ïnrrdmud¨higgmen.hh rigbel§wSiie§snirnBVeeocBFÔ!ml2äpefBF6ÏpBxt21i.Ï td¨my.d¨FoheCwonhs rlo.eytrwBhiooimecnylhlcvmeasturoehreeryckesdoeÏ1Iinsaan8d¨tdttseeslh celgihynanroesi9rdwteey2,asi6psnc2 tu.rhho1ptaaEBDapptsCEeaocktCDrcshotCEheyenqBr2pttsoÚ}erhuoEafaÛpsicdstshfD1ieÏcia tind¨.int§odivhSmnoaocecnsoo2aidanrntficrsrioteeeinenscrctalseeciwnsiaiovvheenfeeinca2o,sdht6tchal,c2weeutmarhisrsetethetoonisospnesnsamlaayengnspepwetiiynnrnohoﬁﬁfdpecnnnaietiinhsttt2hgeees

sneumndbser§wofBxsä e BFnÏ dd¨sh

of


a message of this form to 2 . to 2 . Therefore, there exists

By Lemma 14, for
a correct process 2



some such

äW
that

Π, there is an
sendg ç h g §BFä 

BFinÏ ﬁd¨nhit e

taainnmdiencsﬁo, nrwriethecetnrepurmÏ od¨bceeh srësooeìfsit2³simëeomeì sap.ntByd.y2#rë eoEps e1aì tsiuncghththisatarsgeunmdegEníïîxtðyXh gDíñîRò

1
1ð

more
§wBxädë

toimì BFeÏ sd¨wh eë ocì o niscliundveoktheadt

there exist an inﬁnite

number of

is
ärë

ionì vWokΠed,

number of

¥

From Lemmata 11, 12, 16, and 20 we have:

Theorem 21 For the general network case, Fig. 4 is a quiescent implementation of reliable broadcast that uses
(0) .

From this theorem and Remark 1 we have:

Corollary 22 In the general network case, quasi reliable send and receive between every pair of processes can
be implemented with a quiescent algorithm that uses (1) .

9 Implementations of ¦Y§ We now give implementations of (1) for the two types of communication networks that we considered in the
previous sections. These implementations do not use timeouts. 17

1 For every process 2 :

2

3 4

Initialization:
for all 3X

iDghRc

e2#

do y

g

ª°3 «

Í

0

5

6 cobegin

7 óïó Task 1:

8 9

repeafot rpearllio3Xdica lilyhdRc e2# do sendggh i¨ô}õ!»}ör÷&øaõp»¯÷}

10

11 óñó Task 2:

12 13

uponyÑreg!ª°c3 e« iÍveggyÑh iggdª°ô}3 «õp`»}öp1÷'øaõ!»¯÷' do

14 coend

Figure 5: Simple network case — implementation of (1)

9.1 The Simple Network Case

We assume all links in the network are bidirectional and fair (Fig. 1a). In this case, the implementation is
obvious. Every process 2 executes two concurrent tasks (Fig. 5). In the ﬁrst one, 2 periodically sends message ôõp»}öp÷'øaõp»¯÷ to all its neighbors. The second task handles the receipt of ôõp»}öp÷'øaõp»¯÷ messages. Upon the receipt of such a message from process 3 , 2 increases the heartbeat value of 3 .
We now prove that the implementation is correct.

Lemma 23 ((1) -Completeness) At each correct process, the heartbeat sequence of every faulty neighbor is
bounded.

Proof. Obvious.

¥

Lemma 24 At each process 2 , the heartbeat sequence of every neighbor 3 is nondecreasing. Proof. This is clear since y5gpª°3 « can only be changed in line 13.

¥

Lemma 25 At each correct process 2 , the heartbeat sequence of every correct neighbor 3 is unbounded.

Proof. Since 3¤ iqghRc 2 and all links are bidirectional, we have 2|À iDghRc r3g . Moreover, since 3 is

correct, its Task 1 executes forever. Therefore, 3 sends an inﬁnite number of HEARTBEAT messages to 2 . By

the Fairness property of send and receive, 2 receives an inﬁnite number of HEARTBEAT messages from 3 .

Every time 2 receives HEARTBEAT from 3 , it increments ysgpª¶3 « in line 13. So, 2 increments y5gpª°3 « an inﬁnite

number of times. Moreover, by Lemma 24, ysgpª¶3 « can never be decremented. So, the heartbeat sequence of 3 at 2

is unbounded.

¥

Corollary 26 ((1) -Accuracy) At each process the heartbeat sequence of every neighbor is nondecreasing, and

at each correct process the heartbeat sequence of every correct neighbor is unbounded.

Proof. From Lemmata 24 and 25.

¥

From Lemma 23 and the above corollary, we have:

Theorem 27 For the simple network case, Fig. 5 implements (1) .

18

1 For every process 2 :

2

3 4

Initialization:
for all 3X

iDghRc

e2#

do y

g

ª°3 «

Í

0

5

6 cobegin

7 óïó Task 1:

8 9

repeafot rpearllio3Xdica lilyhdRc e2# do sendggh i¨ô}õ!»}ör÷&øaõp»¯÷ùBq2#

10

11 12 13 14 15 16 17

óñó

Tasukp2o:nffÏ oord¨errch¹aaeyÑsllÍllievgp33 neª°Ï3ssdgguu«d¨hggigccÍh highhô}ãttôyùõphh2 õp»}aagptt»}ª°öp33X3Xöp÷'« ÷'ww`øaøaõ!1õp»¯iDiD»¯÷5gg÷ÑBxhhÏ BFRRÏd¨ccd¨h ee2#2#h daaonndd

3 3

appears in Ï d¨h do does not appear in Ï

dh

do

18 coend

Figure 6: General network case — implementation of (0)

9.2 The General Network Case
In this case some links may be unidirectional and/or not fair (Fig. 1b). The implementation is more complex than before because each HEARTBEAT has to be diffused, and this introduces the following problem: when a process
2 receives a HEARTBEAT message it has to relay it even if this is not the ﬁrst time 2 receives such a message.
This is because this message could be a new “heartbeat” from the originating process. But this could also be an
“old” heartbeat that cycled around the network and came back, and 2 must avoid relaying such heartbeats. The implementation is given in Fig. 6. Every process 2 executes two concurrent tasks. In the ﬁrst task, 2 mphôeeeraõpsirost»}badöpgeicae÷'atsløalvoyõpafl»¯tsuh÷ÑeeesnBFfÏ dood¨sfrmhaml letsôiotssõpaag»}nlleeöpiitg÷'Rshôøanbõpeõpoi»}r»¯gsöph÷Ñbt÷'BFhoÏ øaard¨tsõpat»¯h hp÷É ap.teBU2#daropoitnnonoÏttahad¨llephpirt.eesacTrneheiinipegntphoa2bftohasr.psu.pchenTmdhseesisstsaeegcloefnftdrootmÏ adspkh rohacanendsdslfeo3 s,rw2 thaierndcrsreemcaeseeispsstatghoeef
We now proceed to prove the correctness of the implementation.

Lemma 28 At every process 2 , the heartbeat sequence of every neighbor 3 is nondecreasing. Proof. This is clear since y5gpª°3 « can only be changed in line 14.

¥

Lemma 29 At each correct process, the heartbeat sequence of every correct neighbor is unbounded.

Proof. Let 2 be a correct process, and 3 be a correct neighbor of 2 . Let 4ú9e2 1 BDCECECDB2rog be a simple fair path

from 3 to 2 with 2 1 93 and 2ro9¢2 . For ¿19 1BDCECECDBe , let 4 © 9e2 1BECECDCEBq2 ©  . For each ¿19 1BDCECDCEBe 1, we

claim that 2 © sends ô}õp»}öp÷'øaõ!»¯÷5BV4 For the base case (¿9 1), note that 2

©
1

to 2 93

©

i6s

1 an inﬁnite number correct, so its Task

of times. We show this by induction on ¿ . 1 executes forever and therefore 2 1 sends

ôõp»}öp÷'øaõp»¯÷ÑB2 1 to all its neighbors, and thus to 2 2, an inﬁnite number of times. For the induction step, let

19

¿ Ò  1 and assume that 2 2nscoue© 6rmnr2edbcdsetoraeôonsfdõpnt»}itomhtöpeea÷'lspiøa.pneTõpka»¯h2 ri÷Ñ©sinBeûs4 h4 ©o©2 6w6 © 1s16 

1tat©honiesds2 efc©2 al6na©ir2d6i,msi22 n.©isl6ô}ina1õpern»}e1ecöp7ige÷'.hiTvøabehoõ!esr»¯roRe÷5ôffoBV2 õp4r©e»}© 6 , öp12 ,t÷'© os6 øao21õpe© s»¯6aec÷É1nhBead4tnsim© i nô}eaﬁ2õ!nn©»}ii6tnöre1ﬁ÷&nnreøauitcmeõpe»¯bniv÷5euremBVso4bf©eRô6rti1oõpm f»}ettosöpim.2÷'e© Søa6si.õpn2»¯Mcae÷Ñno2Beirne4© ﬁo6© vn1,eitiries,t

For ¿£9èm 1 this claim shows that 2oEs 1 sends ôõp»}öp÷'øaõp»¯÷ÉBe4ÙoEs 1 to 2ro an inﬁnite number of times.

tPimroecse.ssN2roo teisthcaotrr3mecvt a niDdglihnRkc

2oEs 1 e2o

2po
(since

is fair,
2ro9v2

so 2o
) and

receives ôõp»}öp÷'øaõp»¯÷ÑBe4³oEs 1 an inﬁnite number of 31À4³oEs 1 (since 2 1 93 ). So every time 2ro receives

ôõp»}öp÷'øaõp»¯÷ÑBe4³oEs
that, by Lemma 28,

1 yÑg

,
î

it
ª°3

increments
« can never

y5g
be

î ª°3 « in line 14.
decremented.

So y5g î
So, the

ª°3 « is incremented an inﬁnite number heartbeat sequence of 3 at 2ÙoÑ9w2 is

of times. Note
unbounded. ¥

Corollary 30 ((1) -Accuracy) At each process the heartbeat sequence of every neighbor is nondecreasing, and
at each correct process the heartbeat sequence of every correct neighbor is unbounded.

Proof. From Lemmata 28 and 29.

¥

Lemma 31 If some process 2 sends  HEARTBEATB path then (1) 2 is the last process in path and (2) no
process appears twice in path.

Proof (Sketch). This follows from lines 9, 15 and 16, and a simple induction that uses the Integrity property of

send and receive.

¥

Lemma 32 Let
 HEARTBEATB

2 ,3
path

bã e3g

processes, an inﬁnite

and path be a non-empty sequence of processes. If 2 receives message number of times, then 3 receives message  HEARTBEATB path an inﬁnite

number of times.

Proof. Let  be the message ô}õ!»}ör÷&øaõp»¯÷5BxÏ dhmã 3 and let  0 be the message ô}õp»}öp÷'øaõ!»¯÷5BxÏ d¨h  .

Suppose 2 2  sends 

receives  an inﬁnite number of to 2 an inﬁnite number of times.

times. By the Integrity property of send
By Lemma 31 part (1), we have 3Ñ9w2  .

and receive, some Since the length of

pÏ rdochseãss3

is at least two, 3 can only send  in line 17. So 3 only sends  if it receives  0. Therefore 3 receives  0

an inﬁnite number of times.

¥

Lemma 33 ((1) -Completeness) At each correct process, the heartbeat sequence of every faulty neighbor is
bounded.

Proof (Sketch). Let 2 be a correct process and let 3 be a faulty neighbor of 2 . Suppose that the heartbeat sequence

of 3 at 2 is not bounded. Then 2 increments y5gpª°3 « an inﬁnite number of times. So, for an inﬁnite number of times,

2 receives messages of the form ô}õ!»}ör÷&øaõp»¯÷5BVÔ¨ with a second component that contains 3 . By Lemma 31

part (2), the second
Thus there exists a Ï

cd¨oh mcpoonnteanintionfg

a
3

message of
such that 2

the form receives

ôô}õpõ!»}»}öpör÷'÷&øaøaõpõp»¯»¯÷Ñ÷5BFBVÏ Ô¨d¨ rh a nagnesinoﬁvneirteanﬁunmitbeesreotfotfimveaslu. es.

arcLenoecdntetÏrriaevd¨dceihesctimvs9üeethsaes2 na1fdgaBDecCEbtCEyCDtô}hLBõ!2raet»}om3 ör.mi÷&saTøafha3õpeu1»¯nlt÷5,pya.fBEroeIt2rf(1s¿1BEo)CE,ÒmCD3 Ce Bs2p¿ te©ÁhnR±edna,s nb,yiô2 nr©õpﬁe»}np9~ieöptea÷'3tne.øaudõpmIaf»¯pb÷Ñ¿wperlBFi9ýÏcoad¨fttihiomtnhesteosno.2,fTLbahyneemrtiehnmfeﬁoanrIein3t,te2eb,gynwruitmtehyecbpoIenrnrotceoplgfuerdrtititeymytephosraf.otsp2 Tee© hrn6tidys1

of send and receive and Lemma 31 part (1), 2T© sends ô}õp»}öp÷'øaõ!»¯÷ùBe2 of times. Since 2d©É9¤3 , this contradicts the fact that 3 is faulty.

1 BECDCECEBq2d©R

to 2d©

6

1 an inﬁnite number
¥

By Corollary 30 and the above lemma, we have:

Theorem 34 For the general network case, Fig. 6 implements (1) .

20

1 For process ¥ :

2

3 4

InitiafDliiezÉatÍ ion0:

@ seq is the current sequence number G

5

6 7 8 9 10

To

exbfDþefDiercieoÉuÑatÍÎeÍdcRfDafD-iesSie tE` §N1DBþ fDVieh g Be§¥¨BF ¦:  wait until RECEIVEd y»²¼a½ÕBEþ

fDie



from Q#`

1 processes

11 return

12
13 For every process 2 :

14 15

upon deliver§wBEþ fDie BV¥Bx¦ do

16 SENDggh Ey»a¼a½ÕBRÿy¥¡ Á3g

17 if 209¢¦ then R-RECEIVEh D§

Figure 7: Quiescent implementation of R-SENDRh  and R-RECEIVEh  for 8 2Q

10 Stronger Communication Primitives

Quasi reliable send and receive and reliable broadcast are sufﬁcient to solve many problems (see Section 11.1).
However, stronger types of communication primitives, namely, reliable send and receive, and uniform reliable broadcast, are sometimes needed. We now give quiescent implementations of these primitives for systems with process crashes and message losses.

Let Q be the number of processes that may crash. [BCBT96] shows that if QXlÓ8Ù 2 (i.e., half of the processes

may crash) these primitives cannot be implemented, even if we assume that links may lose only a ﬁnite number

of messages and we do not require that the implementation be quiescent.

We now show that if Q Ò 8Ù 2 then there are quiescent implementations of these primitives for the two types

of network considered in this paper. The implementations that we give here are simple and modular but highly

inefﬁcient. assume that

MQ Òore8Ùe f2ﬁ.cient

ones

can

be

obtained

by

modifying

the

algorithms

in

Figures 3

and

4.

Hereafter,

we

10.1 Reliable Send and Receive
If a process ¥ returns from the invocation of sendRh ¨§ we say that ¥ completes the sending of message § to ¦ . With quasi reliable send and receive, it is possible that ¥ completes the sending of § to ¦ , then ¥ crashes, and ¦ never receives § (even though it does not crash). In contrast, with reliable send and receive primitives, if ¥ completes the sending of message § to a correct process ¦ then ¦ eventually receives § (even if ¥ crashes). More
precisely, reliable send and receive satisfy Integrity (Section 4.2) and:
k No Loss: For all l 1, if ¦ is correct and ¥ completes the sending of § to ¦ exactly  times, then ¦ receives § from ¥ at least  times.11
11The No Loss and Quasi No Loss properties are very similar to the Strong Validity and Validity properties in Section 6 of [HT94].
21

1 For every process 2 :
2
3 To execute uniform-broadcast(§ ): 4 broadcast§
5 return
6
7 upon deliver§ do 8 for all 3X Π do SENDggh iy»²¼a½¤BF§ 9 wait until RECEIVEd y»²¼a½ÕBx§ from Q#` 1 processes 10 uniform-deliver(§ )
Figure 8: Quiescent implementation of uniform reliable broadcast for 8 2Q
Reliable send and receive primitives are denoted R-SEND/R-RECEIVE. As before, SEND/RECEIVE denote the quasi reliable primitives. Figure 7 shows a quiescent implementation of R-SEND and R-RECEIVE (the code consisting of lines 7 and 8 is executed atomically). It uses reliable broadcast and SEND/RECEIVE between every pair of processes. We
have already shown that these primitives have quiescent implementations using (1) for the two types of network
in consideration.
Roughly speaking, when ¥ wishes to R-SEND § to ¦ , it broadcasts a message that contains § , ¥ , ¦ and a fresh sequence number, and then waits to RECEIVE Q#` 1 acknowledgements for that message before returning from this invocation of R-SEND. When a process 2 delivers this broadcast message, it SENDs an acknowledgement back to ¥ , and if 29À¦ then it also R-RECEIVEs § from ¥ . The proof of correctness is straightforward and thus
omitted.
10.2 Uniform Reliable Broadcast
The Agreement property of reliable broadcast states that if a correct process delivers a message § , then all correct processes eventually deliver § . This requirement allows a faulty process (i.e., one that subsequently crashes)
to deliver a message that is never delivered by the correct processes. This behavior is undesirable in some applications, such as atomic commitment in distributed databases [Gra78, Had86, BT93]. For such applications, a stronger version of reliable broadcast is more suitable, namely, uniform reliable broadcast which satisﬁes Uniform Integrity, Validity (Section 5.2) and:
k Uniform Agreement [NT90]: If any process delivers a message § , then all correct processes eventually deliver § .
Figure 8 shows a quiescent implementation of uniform reliable broadcast which uses reliable broadcast and SEND/RECEIVE between every pair of processes. The proof of correctness is straightforward and thus omitted.
11 Using ¦§ to Extend Previous Work
(0) can be used to extend previous work in order to solve problems with algorithms that are both quiescent and
tolerant of process crashes and messages losses.
22

11.1 Extending Existing Algorithms to Tolerate Link Failures
(0) can be used to transform many existing algorithms that tolerate process crashes into quiescent algorithms that
tolerate both process crashes and message losses. For example, consider the randomized consensus algorithms of [Ben83, Rab83, CMS89, FM90], the failure-detector based ones of [CT96, AT96], the probabilistic one of
[BT85], and the algorithms for atomic broadcast in [CT96],  -set agreement in [Cha93], atomic commitment in [Gue95], and approximate agreement in [DLP6 86]. These algorithms tolerate process crashes, and they use
quasi reliable send and receive, and/or reliable broadcast, as their sole communication primitives. All of these algorithms can be made to tolerate both process crashes and message losses (with fair links) in two simple steps:
(1) implement (1) as described in Section 9, and (2) plug in the quiescent communication primitives given in
Section 8.12 The resulting algorithms tolerate message losses and are quiescent.

11.2 Extending Results of [BCBT96]

Another way to solve problems with quiescent algorithms that tolerate both process crashes and message losses
is obtained by extending the results of [BCBT96]. That work addresses the following question: given a problem that can be solved in a system where the only possible failures are process crashes, is the problem still solvable if links can also fail by losing messages? One of the models of lossy links considered in [BCBT96] is called fair
lossy. Roughly speaking, a fair lossy link 20j3 satisﬁes the following property: If 2 sends an inﬁnite number of messages to 3 and 3 is correct, then 3 receives an inﬁnite number of messages from 2 . Fair lossy and fair links differ in a subtle way. For instance, if process 2 sends the sequence of distinct messages § 1 BF§ 2 BF§ 3 BECECDC to 3 and 2á3 is fair lossy, then 3 is guaranteed to receive an inﬁnite subsequence, whereas if 2Î3 is fair, 3 may receive nothing (because each distinct message is sent only once). On the other hand, if 2 sends the sequence § 1 BF§ 2 BF§ 1 BF§ 2 BECDCEC and 2 3 is fair lossy, 3 may never receive a copy of § 2 (while it receives § 1 inﬁnitely often), whereas if 2mn3 is fair, 3 is guaranteed to receive an inﬁnite number of copies of both § 1 and § 2.13

[BCBT96] establishes the following result: any problem 4 that can be solved in systems with process crashes can also be solved in systems with process crashes and fair lossy links, provided 4 is correct-restricted14 or a majority
of processes are correct. For each of these two cases, [BCBT96] shows how to transform any algorithm that
solves 4 in a system with process crashes, into one that solves 4 in a system with process crashes and fair lossy
links. The algorithms that result from these transformations, however, are not quiescent: each transformation requires processes to repeatedly send messages forever.

Given (0) , we can modify the transformations in [BCBT96] to ensure that if the original algorithm is quiescent then

so is the transformed one. Roughly speaking, the modiﬁcation consists of (1) adding message acknowledgements;

(2) suppressing the sending of a message from 2 to 3 if either (a) 2 has received an acknowledgement for that

message from (3) modifying that are not in

3,
t¢¥he£

or (b) the heartbeat of 3 has not
 m£ e  a2nianrge aocfttuhaellyopaeprpateinodne“datpope¢¤n£d

 i¢¤n£ c £ r2e  .a£ sT  e1hdetosrien¢¥scu£el  tt£sh ein2”l[aBssotCttBihmaTte9o62 n],lsyceotnhmtebaeinlmeemedseswnatigstehinttho¢¤e3 £

; a£ nd    1 above

modiﬁcation, show that if a problem 4 can be solved with a quiescent algorithm in a system with crash failures

only, and either 4 is correct-restricted or a majority of processes are correct, then 4 is solvable with a quiescent

algorithm that uses (1) in a system with crash failures and fair lossy links.

12Similar steps can be applied to algorithms that use reliable send/receive or uniform reliable broadcast, provided a majority of processes are correct, by plugging in the implementations given in Section 10.
13In [BCBT96], message piggybacking is used to overcome message losses. To avoid this piggybacking, in this paper we adopted the model of fair links: message losses can now be overcome by separately sending each message repeatedly.
14Intuitively, a problem ¦ is correct-restricted if its speciﬁcation does not refer to the behavior of faulty processes [Gop92, BN92].

23

12 Generalization to Networks that Partition
In this paper, we assumed that every pair of correct processes are reachable from each other through fair paths. In [ACT97b], we drop this assumption and consider the more general problem of quiescent reliable communication in networks that may partition. In particular, we (a) generalize the deﬁnitions of quasi reliable send and receive and of reliable broadcast, (b) generalize the deﬁnition of the heartbeat failure detector and implement it in networks that may partition, and (c) show that this failure detector can be used to achieve quiescent reliable communication in such networks. In [ACT97b] we also consider the problem of consensus for networks that may partition, and
we use (1) to solve this problem with a quiescent protocol (we also use a generalization of the Eventually Strong
failure detector [CT96]).

13 Quiescence versus Termination
In this paper we considered communication protocols that tolerate process crashes and message losses, and focused on achieving quiescence. What about achieving termination? A terminating protocol guarantees that every process eventually reaches a halting state from which it cannot take further actions. A terminating protocol is obviously quiescent, but the converse is not necessarily true. For example, consider the protocol described at
the beginning of Section 1. In this protocol, (a) ¥ sends a copy of § repeatedly until it receives ¨©§ from ¦ , and then it halts; and (b) upon each receipt of § , ¦ sends ¨©§ back to ¥ . In the absence of process crashes this protocol is quiescent. However, the protocol is not terminating because ¦ never halts: ¦ remains (forever) ready to reply to the receipt of a possible message from ¥ .
Can we use (0) to obtain reliable communication protocols that are terminating? The answer is no, even for
systems with no process crashes. This follows from the result in [KT88] which shows that in a system with message losses (fair links) and no process crashes there is no terminating protocol that guarantees knowledge gain.

Acknowledgments
We are grateful to Anindya Basu and Vassos Hadzilacos for having provided extensive comments that improved the presentation of this paper. We would also like to thank Tushar Deepak Chandra for suggesting the name Heartbeat.

References

[ACT97a] Marcos Kawazoe Aguilera, Wei Chen, and Sam Toueg. On the weakest failure detector to achieve quiescence. Manuscript, April 1997.

[ACT97b] Marcos Kawazoe Aguilera, Wei Chen, and Sam Toueg. Quiescent reliable communication and quiescent consensus in partitionable networks. Technical Report 97-1632, Department of Computer Science, Cornell University, June 1997.

[AT96]

Marcos Kawazoe Aguilera and Sam Toueg. Randomization and failure detection: a hybrid approach
to solve consensus. In Proceedings of the 10th International Workshop on Distributed Algorithms, Lecture Notes on Computer Science, pages 29–39. Springer-Verlag, October 1996.

24

[BCBT96] Anindya Basu, Bernadette Charron-Bost, and Sam Toueg. Simulating reliable links with unreliable links in the presence of process crashes. In Proceedings of the 10th International Workshop on Dis-
tributed Algorithms, Lecture Notes on Computer Science, pages 105–122. Springer-Verlag, October 1996.

[BDM97]

O¨ zalp Babaog˘lu, Renzo Davoli, and Alberto Montresor. Partitionable group membership: speciﬁcation and algorithms. Technical Report UBLCS-97-1, Dept. of Computer Science, University of
Bologna, Bologna, Italy, January 1997.

[Ben83]

Michael Ben-Or. Another advantage of free choice: Completely asynchronous agreement protocols. In Proceedings of the 2nd ACM Symposium on Principles of Distributed Computing, pages 27–30, August 1983.

[BN92]

R. Bazzi and G. Neiger. Simulating crash failures with many faulty processors. In A. Segal and
S. Zaks, editors, Proceedings of the 6th International Workshop on Distributed Algorithms, volume 647 of Lecture Notes on Computer Science, pages 166–184. Springer-Verlag, 1992.

[BT85] [BT93]

Gabriel Bracha and Sam Toueg. Asynchronous consensus and broadcast protocols. Journal of the ACM, 32(4):824–840, October 1985.
O¨ zalp Babaog˘lu and Sam Toueg. Non-blocking atomic commitment. In Sape J. Mullender, editor, Distributed Systems, chapter 6. Addison-Wesley, 1993.

[Cha93] Soma Chaudhuri. More choices allow more faults: Set consensus problems in totally asynchronous systems. Information and Computation, 105(1):132–158, July 1993.

[Cha97] Tushar Deepak Chandra, April 1997. Private Communication.

[CHT96] Tushar Deepak Chandra, Vassos Hadzilacos, and Sam Toueg. The weakest failure detector for solving consensus. Journal of the ACM, 43(4):685–722, July 1996.

[CMS89] Benny Chor, Michael Merritt, and David B. Shmoys. Simple constant-time consensus protocols in realistic failure models. Journal of the ACM, 36(3):591–614, 1989.

[CT96]

Tushar Deepak Chandra and Sam Toueg. Unreliable failure detectors for reliable distributed systems. Journal of the ACM, 43(2):225–267, March 1996.

[DFKM96] Danny Dolev, Roy Friedman, Idit Keidar, and Dahlia Malkhi. Failure detectors in omission failure environments. Technical Report 96-1608, Department of Computer Science, Cornell University,
Ithaca, New York, 1996.

[DLP6 86] Danny Dolev, Nancy A. Lynch, Shlomit S. Pinter, Eugene W. Stark, and William E. Weihl. Reaching
approximate agreement in the presence of faults. Journal of the ACM, 33(3):499–516, July 1986.

[FLP85] Michael J. Fischer, Nancy A. Lynch, and Michael S. Paterson. Impossibility of distributed consensus with one faulty process. Journal of the ACM, 32(2):374–382, April 1985.

[FM90]

Paul Feldman and Silvio Micali. An optimal algorithm for synchronous Byzantine agreement. Technical Report MIT/LCS/TM-425, Laboratory for Computer Science, Massachusetts Institute of Technology, June 1990.

[GLS95]

Rachid Guerraoui, Michael Larrea, and Andre´ Schiper. Non blocking atomic commitment with an unreliable failure detector. In Proceedings of the 14th IEEE Symposium on Reliable Distributed
Systems, pages 13–15, 1995.

25

[Gop92] [Gra78]
[Gue95]
[Had86]
[HT94]
[KT88] [LH94]
[Lyn96] [NT90] [Rab83] [SM95]
[vR97]

Ajei Gopal. Fault-Tolerant Broadcasts and Multicasts: The Problem of Inconsistency and Contamination. PhD thesis, Cornell University, January 1992.
James N. Gray. Notes on database operating systems. In R. Bayer, R. M. Graham, and G. Seegmuller, editors, Operating Systems: An Advanced Course, volume 66 of Lecture Notes on Computer Science. Springer-Verlag, 1978. Also appears as IBM Research Laboratory Technical report RJ2188.
Rachid Guerraoui. Revisiting the relationship between non-blocking atomic commitment and consensus. In Proceedings of the 9th International Workshop on Distributed Algorithms, pages 87–100, Le Mont-St-Michel, France, 1995. Springer Verlag, LNCS 972.
Vassos Hadzilacos. On the relationship between the atomic commitment and consensus problems. In Proceedings of the Workshop on Fault-Tolerant Distributed Computing, volume 448 of Lecture Notes on Computer Science, pages 201–208. Springer-Verlag, March 1986.
Vassos Hadzilacos and Sam Toueg. A modular approach to fault-tolerant broadcasts and related problems. Technical Report 94-1425, Department of Computer Science, Cornell University, Ithaca, New York, May 1994.
Richard Koo and Sam Toueg. Effects of message loss on the termination of distributed protocols. Information Processing Letters, 27(4):181–188, April 1988.
Wai-Kau Lo and Vassos Hadzilacos. Using failure detectors to solve consensus in asynchronous shared-memory systems. In Proceedings of the 8th International Workshop on Distributed Algorithms, pages 280–295, Terschelling, The Netherlands, 1994.
Nancy A. Lynch. Distributed Algorithms. Morgan Kaufmann Publishers, Inc., 1996.
Gil Neiger and Sam Toueg. Automatically increasing the fault-tolerance of distributed algorithms. Journal of Algorithms, 11(3):374–419, 1990.
Michael Rabin. Randomized Byzantine generals. In Proceedings of the 24th Symposium on Foundations of Computer Science, pages 403–409. IEEE Computer Society Press, November 1983.
Laura S. Sabel and Keith Marzullo. Election vs. consensus in asynchronous systems. Technical Report 95-1488, Department of Computer Science, Cornell University, Ithaca, New York, Febrary 1995.
Robbert van Renesse, April 1997. Private Communication.

26