PROBABILISTIC SEPARATION LOGICS FOR RANDOMIZED ALGORITHMS A Dissertation Presented to the Faculty of the Graduate School of Cornell University in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy by Jialu Bao August 2025 © 2025 Jialu Bao ALL RIGHTS RESERVED PROBABILISTIC SEPARATION LOGICS FOR RANDOMIZED ALGORITHMS Jialu Bao, Ph.D. Cornell University 2025 Randomized algorithms are hard to test, accentuating the need for formal methods to ensure their correctness. When probabilistic separation logic was first developed as a formal method for proving probabilistic independence between program variables, it was unclear whether the approach would generalize to weaker forms of probabilistic separation used in program analysis. We first overview existing work on bunched logic — the assertion logic underlying separation logic — and on probabilistic separation logic for independence in chapter 2. In chapter 3, we extend probabilistic separation logic to reason about negative dependence, a relation in which an increase in one variable makes others less likely to increase. We demonstrate the utility of this program logic by analyzing hash-based data structures, such as Bloom filters. In chapter 4, we introduce a variation of probabilistic separation logic for reasoning about dependence and independence. Specifically, we use it to establish conditional independence between program variables in simple programs. Last, in chapter 5, we present the unary fragment of BLUEBELL to provide a more ergonomic way to reason about conditional independence and independence. We illustrate its application through more intricate examples drawn from cryptography, security, and probabilistic graphical models. All the program logics developed in this thesis target imperative programs that can sample from probability distributions.
BIOGRAPHICAL SKETCH Jialu spent the first eleven years of her life in Ningbo, a coastal city with a long history of fishing and trade. She went on to attend middle school and high school in Hangzhou. After high school, she briefly enrolled at the University of Virginia for one semester before beginning her undergraduate studies at Cornell University as a spring admit. At Cornell, she earned a Bachelor of Arts in Math and Computer Science. She then moved to Wisconsin to start her Ph.D. at the University of Wisconsin–Madison. After two years, she transferred back to Cornell following her advisor’s move. Her doctoral research lies in the fields of programming languages, formal verification, and probabilistic programs. After completing her Ph.D. in Computer Science at Cornell, she will start as a postdoctoral researcher at Northeastern University with Prof. Steven Holtzen. To my family. ACKNOWLEDGEMENTS First and foremost, I would like to express my deepest gratitude to my advisor Justin Hsu, without whom this thesis would not have existed. Justin has been such an exceptional teacher, an insightful mentor, and an inspiring role model! Through thoughtful explanations and detailed feedback, he taught me how to think, write, and present more clearly and how to break down complex problems into manageable parts. I am also grateful to have Joseph Halpern, Dexter Kozen, and Alexandra Silva on my committee. I was blessed to meet many wonderful teachers, and Joe was one of them. His course “Reasoning about Knowledge” gave a fascinating introduction to epistemic logic and sparked my interest in modal logic. I also feel fortunate to have had the opportunity to learn from Dexter Kozen about Kleene algebra in class. Outside of the classroom, Dexter’s wisdom has also led to many inspiring discussions in PLDG and in the hallway.
I am immensely grateful to Alexandra Silva for collaborating with me and hosting me on various occasions, including my visit to her UCL group this year, during which part of this thesis was written. The lab’s friendly and intellectually engaging atmosphere made it an ideal place for me to reflect and learn. I am indebted to all my collaborators: Jessica Cho, Simon Docherty, Emanuele D’Osualdo, Azadeh Farzan, Marco Gaboardi, Tao Gu, Kun He, John Hopcroft, Justin Hsu, Drashti Pathak, Oliver Richardson, Subhajit Roy, Alexandra Silva, Joseph Tassarotti, Nitesh Trivedi, Xiaodong Xin, and Fabio Zanasi. In particular, I would like to thank Emanuele and Azadeh for their close mentorship during our collaboration. They gave me new perspectives on program logics and showed me fresh ways to approach research problems. I would also like to thank Shuchi Chawla for kindly mentoring me on a project, though it did not result in a publication. I am also grateful to have had Eli Bingham and Zenna Tavares as my mentors when I interned at Basis, and Ellie Cheng, Ayush Chopra, Poorva Garg, Palka Puri, Raffi Sanna, and Andy Zane as my intern cohort. During my undergraduate studies, the courses taught by Paul Ginsparg, Michael Clarkson, and Jon Kleinberg deeply influenced my career choice. I would like to thank them for giving intellectually stimulating lectures and thoughtful assignments. I am also grateful to Michael Macy, Chris Cameron, John Hopcroft, and Nate Foster for mentoring me and introducing me to the world of research. Although my years in Wisconsin were cast in the shadow of the Covid lockdowns, the wonderful people I met and the beautiful outdoor scenery colored my memories. I would like to thank Evangelia Gergatsouli, Yang Guo, Xiating Ouyang, Rojin Rezvan and Laura Stegner for many fun gatherings when we could meet in person.
I would also like to thank Kyrylo Chernyshov, Yuchen Han, Lu Yang, Yujia Zhang, and other friends for sharing their daily moments remotely and keeping me in good spirits during that period. After returning to Cornell, I had the great pleasure of being surrounded by fantastic friends and colleagues. Although we did not officially have a lab, other students of Justin (Noah Bertram, Max Fan, Karuna Grewal, Vaibhav Mehta, Kei Imada, Zachary Susag, and Laura Zielinski) and my officemates (Keri D’Angelo, Kangbo Li, Khonzoda Umarova, and Noam Zilberstein) filled that role, offering knowledge and support whenever it was needed. Their kindness, curiosity, and good humor made both the research and the everyday moments delightful. I am also especially thankful to Mark Moeller and Yulun Yao for being amazing PLDG co-czars – I hope this PL group tradition continues for many years to come! In my last year, the Sunday casual tennis organized by Ayaka Yorihiro and Nitika Saran became a cherished social routine. I had great fun hitting with Ayaka, Ethan Yang, Max, Nitika, Rebecca Liu, Yunxi Shen and others. I also feel fortunate to have shared my Ph.D. journey with Pedro de Amorim, Ryan Doenges, Ali Farahbakhsh, Wen-Ding Li, Yueying Li, Rishabh Madan, Anshuman Mohan, Rolph Recto, Oliver Richardson, Goktug Saatcioglu, Albert Tseng, Nathan Yan, and Alicia Yang. I am deeply thankful for the support they offered and the insights they generously shared. I hope our friendship will continue for many years. Outside of the CS department, I am fortunate to have Ning Duan, Lijun Zhang, and Yujia Zhang as my close friends locally in Ithaca. I am also grateful to have had Shi Tang and Yu Pan as best friends since high school — our enduring friendship offers a sanctuary for reflecting on my personal journey and sharing my feelings.
Last, I am infinitely grateful to my parents Fang and Guanzhen, wàipó Yazhen, wàigōng Meiding, and the rest of my family for their unconditional love, unwavering support, and the countless cherished moments we have shared. Hangzhou, China July, 2025 vii TABLE OF CONTENTS Biographical Sketch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii Dedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v Table of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi 1 Introduction 1 1.1 Probabilistic Programs . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 Independence and Dependencies in Programs . . . . . . . . . . . 3 1.3 Separation Logic for Independence and Dependencies . . . . . . . 7 1.4 Outline of the Thesis . . . . . . . . . . . . . . . . . . . . . . . . . . 9 2 Bunched Logic and Probabilistic Separation Logic 12 2.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 2.2 Bunched Logic (BI) . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 2.2.1 Syntax and Semantics . . . . . . . . . . . . . . . . . . . . . 15 2.2.2 Proof System . . . . . . . . . . . . . . . . . . . . . . . . . . 18 2.2.3 Soundness and Completeness of BI . . . . . . . . . . . . . 20 2.2.4 A Discrete Probabilistic Frame of BI . . . . . . . . . . . . . 32 2.3 Probabilistic Separation Logic . . . . . . . . . . . . . . . . . . . . . 36 2.3.1 A Simple Probabilistic Programming Language . . . . . . 38 2.3.2 A Concrete BI Model for Asserting Independence . . . . . 43 2.3.3 A Program Logic for Reasoning about Independence . . . 45 3 A Program Logic for Negative Dependence 52 3.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52 3.2 Negative Association . . . . . . . . . . . . . . . . . . . . . . . . . . 
54 3.3 A BI Frame for Negative Dependence . . . . . . . . . . . . . . . . 60 3.3.1 Initial Attempts at a BI Frame for Negative Association . . 61 3.3.2 Our BI Frame for Negative Association . . . . . . . . . . . 63 3.4 𝑀-BI: Combining BI Models . . . . . . . . . . . . . . . . . . . . . . 67 3.4.1 The Syntax and Proof Rules . . . . . . . . . . . . . . . . . . 68 3.4.2 Semantics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69 3.4.3 A 𝑀-BI Model for Independence and NA . . . . . . . . . . 71 3.5 Logic of Independence and Negative Association . . . . . . . . . 73 3.5.1 Assertion Logic . . . . . . . . . . . . . . . . . . . . . . . . . 73 3.5.2 Program Logic . . . . . . . . . . . . . . . . . . . . . . . . . 77 3.6 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81 3.6.1 Probability-related Axioms for Examples . . . . . . . . . . 81 3.6.2 Bloom filter, High-level . . . . . . . . . . . . . . . . . . . . 84 3.6.3 Bloom filter, Low-level . . . . . . . . . . . . . . . . . . . . . 93 3.6.4 Permutation Hashing . . . . . . . . . . . . . . . . . . . . . 95 viii 3.6.5 Fully-dynamic Dictionary . . . . . . . . . . . . . . . . . . . 97 3.6.6 Repeated Balls-into-bins Process . . . . . . . . . . . . . . . 105 3.7 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111 4 A Bunched Logic for Dependence and Independence 114 4.1 DIBI Logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118 4.1.1 Syntax and semantics . . . . . . . . . . . . . . . . . . . . . 118 4.1.2 Proof system . . . . . . . . . . . . . . . . . . . . . . . . . . . 122 4.1.3 Soundness and Completeness of DIBI . . . . . . . . . . . . 126 4.2 A Probabilistic Model of DIBI . . . . . . . . . . . . . . . . . . . . . 128 4.2.1 A Concrete Probabilistic Frame of DIBI . . . . . . . . . . . 130 4.2.2 Capturing Conditional Independence . . . . . . . . . . . . 133 4.2.3 Validating the Semi-graphoid Axioms . . . . . . . . . . . . 
135 4.3 Conditional Probabilistic Separation Logic . . . . . . . . . . . . . 137 4.3.1 CPSL: Assertion Logic . . . . . . . . . . . . . . . . . . . . . 138 4.3.2 Conditional Probabilistic Separation Logic (CPSL) . . . . . 144 4.3.3 Example: CPSL in Action . . . . . . . . . . . . . . . . . . . 147 4.4 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152 5 Bluebell: A Unifying Framework for Independence, Conditional In- dependence and Relational Reasoning 156 5.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156 5.2 Preliminaries: Programs and Probability Spaces . . . . . . . . . . 162 5.3 The BLUEBELL Logic . . . . . . . . . . . . . . . . . . . . . . . . . . 167 5.3.1 An Alternative Approach to Bunched Logic . . . . . . . . 167 5.3.2 A Model of Probabilistic Spaces . . . . . . . . . . . . . . . . 170 5.3.3 A Model of Mutable Probabilistic Stores . . . . . . . . . . . 174 5.3.4 Joint Conditioning . . . . . . . . . . . . . . . . . . . . . . . 178 5.3.5 The Rules of Conditioning and Independence . . . . . . . 179 5.4 Reasoning about Programs in BLUEBELL . . . . . . . . . . . . . . . 183 5.5 Case Studies for BLUEBELL . . . . . . . . . . . . . . . . . . . . . . 188 5.5.1 One Time Pad Revisited . . . . . . . . . . . . . . . . . . . . 188 5.5.2 Markov Blankets . . . . . . . . . . . . . . . . . . . . . . . . 192 5.5.3 Multi-party Secure Computation . . . . . . . . . . . . . . . 195 5.5.4 Von Neumann Extractor . . . . . . . . . . . . . . . . . . . . 200 5.6 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205 6 Discussion 209 6.1 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209 6.2 Directions for Future Work . . . . . . . . . . . . . . . . . . . . . . . 211 A Bunched Logic and Probabilistic Separation Logic 231 A.1 Proofs related to Bunched Logic . . . . . . . . . . . . . . . . . . . . 231 A.2 Proofs related to Probabilistic Separation Logic . . . . . . . . . . . 
235 ix B LINA: A Separation Logic for Negative Dependence 242 B.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242 B.2 A BI Frame for Negative Association . . . . . . . . . . . . . . . . . 246 B.2.1 Capturing Negative Association . . . . . . . . . . . . . . . 246 B.2.2 Omitted Proofs of Frame Conditions . . . . . . . . . . . . . 251 B.3 Soundness and Completeness of 𝑀-BI algebras . . . . . . . . . . . 253 B.3.1 Algebraic Soundness and Completeness . . . . . . . . . . . 253 B.3.2 Soundness of 𝑀-BI formulas . . . . . . . . . . . . . . . . . 255 B.3.3 Completeness of 𝑀-BI formulas . . . . . . . . . . . . . . . 257 B.4 A 𝑀-BI Model for Independence and Negative Association . . . . 258 B.4.1 Independence Implies PNA . . . . . . . . . . . . . . . . . . 258 B.4.2 Axioms of Negative Association . . . . . . . . . . . . . . . 261 B.4.3 The Restriction Property of 𝑀-BI Formulas . . . . . . . . . 263 C DIBI: A Bunched Logic for Conditional Independence 266 C.1 A Probabilistic Model of DIBI . . . . . . . . . . . . . . . . . . . . . 266 C.1.1 Well-definedness of the Structure . . . . . . . . . . . . . . . 266 C.1.2 Associativity of Parallel Composition . . . . . . . . . . . . 270 C.1.3 Commutativity of Parallel Composition . . . . . . . . . . . 273 C.1.4 Other Properties Used in Proving Frame Conditions . . . . 274 C.1.5 Main Theorem: Proving Frame Conditions . . . . . . . . . 276 C.2 Capturing Conditional Independence . . . . . . . . . . . . . . . . 279 C.2.1 Properties of the Probabilistic Frame . . . . . . . . . . . . . 279 C.2.2 Key Lemmas: Conditional Independence is Expressed . . 283 C.2.3 Validating Graphoid Axioms, Section 4.2.3 . . . . . . . . . 290 C.3 CPSL Assertion Logic . . . . . . . . . . . . . . . . . . . . . . . . . . 292 C.3.1 Restriction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294 C.3.2 Extra Axioms . . . . . . . . . . . . . . . . . . . . . . . . . . 298 C.4 CPSL Soundness . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . 308 D The Unary Fragment Bluebell for Reasoning About Independence and Conditional Independence 314 D.1 The Rules of BLUEBELL . . . . . . . . . . . . . . . . . . . . . . . . . 314 D.1.1 Program Semantics . . . . . . . . . . . . . . . . . . . . . . . 315 D.2 Measure Theory Lemmas . . . . . . . . . . . . . . . . . . . . . . . 316 D.3 Construction of the BLUEBELL Model . . . . . . . . . . . . . . . . 331 D.4 Characterizations of Joint Conditioning . . . . . . . . . . . . . . . 336 D.5 Soundness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 340 D.5.1 Soundness of Primitive Rules . . . . . . . . . . . . . . . . . 340 D.5.2 Soundness of Primitive WP Rules . . . . . . . . . . . . . . 360 D.5.3 Soundness of Derived Rules . . . . . . . . . . . . . . . . . . 371 x LIST OF FIGURES 2.2 BI frame requirements (with outermost universal quantification omitted). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 2.3 Satisfaction for BI . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 2.4 Hilbert system for BI . . . . . . . . . . . . . . . . . . . . . . . . . . 19 2.5 pWhile command syntax . . . . . . . . . . . . . . . . . . . . . . . 39 2.6 Semantics of Expressions and Distributions . . . . . . . . . . . . . 41 2.7 Program semantics . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 2.8 Rules of Probabilistic Separation Logic . . . . . . . . . . . . . . . 51 3.1 Hilbert system for 𝑀-BI . . . . . . . . . . . . . . . . . . . . . . . . 69 3.2 New LINA rules. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78 3.3 Bloom filter examples . . . . . . . . . . . . . . . . . . . . . . . . . 85 3.4 Check the membership of a new item . . . . . . . . . . . . . . . . 89 3.5 Permutation hashing . . . . . . . . . . . . . . . . . . . . . . . . . . 95 3.6 Fully-dynamic dictionary [Ding and König, 2011] . . . . . . . . . 98 3.7 Repeated balls-into-bins [Becchetti et al., 2019] . . . . . . . . . . . 
106 4.1 From probabilistic programs to kernels . . . . . . . . . . . . . . . 117 4.2 DIBI frame requirements (with outermost universal quantification omitted for readability). . . . . . . . . . . . . . . . . . . . . . 120 4.3 Satisfaction for DIBI . . . . . . . . . . . . . . . . . . . . . . . . . . 121 4.4 Hilbert system for DIBI . . . . . . . . . . . . . . . . . . . . . . . . 123 4.5 Proof rules: CPSL . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145 4.6 Example programs . . . . . . . . . . . . . . . . . . . . . . . . . . . 147 5.1 Program Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166 5.2 Satisfaction for BI formulas on RA . . . . . . . . . . . . . . . . . . 169 5.3 Primitive rules of BLUEBELL. . . . . . . . . . . . . . . . . . . . . . 180 5.4 Derived rules. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181 5.5 The primitive WP rules of BLUEBELL. . . . . . . . . . . . . . . . . 185 5.6 Derived WP rules. . . . . . . . . . . . . . . . . . . . . . . . . . . . 186 5.7 One time pad. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189 5.8 Von Neumann extractor. . . . . . . . . . . . . . . . . . . . . . . . . 201 5.9 Proof outline of the Von Neumann extractor example. . . . . . . 202 D.1 The assertions used in BLUEBELL. . . . . . . . . . . . . . . . . . . 314 xi CHAPTER 1 INTRODUCTION 1.1 Probabilistic Programs Whether one believes the world we live in is fundamentally deterministic or the result of some dice rolling, probability offers a useful lens to model and analyze various phenomena. For example: “Will it rain tomorrow?” “Who will win the US Open this year?” “How unlikely are these constituencies to be divided in such a biased way?” These are all scenarios where we can use probability to distill our uncertainty into quantities. For computer programs, probability again plays an important role.
For instance, when an algorithm’s efficiency can vary widely depending on the input, it makes sense to consider the average cost: we assume a distribution over inputs and then compute the expected value of the algorithm’s cost when it is executed on inputs drawn from that distribution. Here, the probability is not used by the program — we just use it to model our uncertainties. Moreover, we can also harness probabilistic mechanisms in the design of algorithms. For example, while the deterministic version of Quicksort [Hoare, 1961] needs 𝑂(𝑛²) comparisons to sort 𝑛 elements in the worst case, randomized Quicksort partitions the array around a uniformly random pivot, making the average cost for every scenario (including the worst case) 𝑂(𝑛 log 𝑛). Similarly, using random bits allows algorithms to guarantee better average performance against adversaries in distributed systems [Fischer et al., 1985, Lynch, 1996] and cache management [Psounis and Prabhakar, 2001, Suri, 2020]. Randomness also has many other uses in algorithms. Randomized assignment ensures fairness when we want different outcomes each to have the possibility of occurring, with a desired probability. In cryptography, randomness makes secrets hard to guess and thus leads to security guarantees. Randomness also allows us to trade accuracy for efficiency. For instance, although finding solutions to integer linear programs is NP-hard, randomized rounding [Raghavan and Thompson, 1987] finds solutions with good probability and runs in polynomial time. In primality testing, where the goal is to determine whether a given number 𝑛 is prime, the Miller–Rabin primality test [Rabin, 1980] randomly samples integers that may witness 𝑛 being composite and determines 𝑛’s primality in polynomial time, with an exponentially small probability of mistaking a composite number for a prime.
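To make the Quicksort example concrete, here is a minimal sketch of randomized Quicksort (not from the thesis; the function name and list-based partitioning are illustrative choices):

```python
import random

def randomized_quicksort(xs):
    """Sort a list by recursively partitioning around a uniformly random pivot.

    Because the pivot is random, no fixed input can consistently force
    unbalanced partitions, which yields O(n log n) expected comparisons
    on every input, including the adversarial worst case.
    """
    if len(xs) <= 1:
        return list(xs)
    pivot = random.choice(xs)
    smaller = [x for x in xs if x < pivot]
    equal = [x for x in xs if x == pivot]
    larger = [x for x in xs if x > pivot]
    return randomized_quicksort(smaller) + equal + randomized_quicksort(larger)

print(randomized_quicksort([3, 1, 4, 1, 5, 9, 2, 6]))  # → [1, 1, 2, 3, 4, 5, 6, 9]
```

This list-copying version trades the in-place partitioning of the classical algorithm for clarity; the probabilistic analysis is the same.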
Following early breakthroughs in randomized algorithms, the seminal work of Kozen [1981] gave a formal semantics for a programming language that allows the use of randomness. Roughly, probabilistic programs in Kozen [1981] extend a standard imperative programming language with a command for sampling from distributions. Kozen presents two natural and equivalent semantics of probabilistic programs: the first reflects the view of probabilistic programs as standard programs reading a tape of random bits, and the second directly interprets probabilistic programs as maps from distributions to distributions. Expressing randomized algorithms as probabilistic programs pins down their behaviors precisely through the formal semantics and thereby facilitates rigorous analysis of these algorithms. In the study of programming languages, researchers have developed various formal methods for systematically checking the correctness of programs. One kind of formal method is deductive verification, where we use an expressive logic to specify the desired behavior of a program and apply logical rules, i.e., deduction, to prove the validity of such specifications. This thesis will focus on the deductive verification of probabilistic programs through program logics. It is worth noting that, besides randomized algorithms, probabilistic programming languages have also been developed and implemented to describe complicated probabilistic processes succinctly [Gordon et al., 2014]. There, an important addition to the language is the conditioning operator, sometimes also called the observe statement, which transforms a distribution into a conditional distribution. Notably, the effect of the conditioning operator can be simulated using a while loop, but adding the conditioning operator to the language as a primitive facilitates potentially different implementations of this command, whose effect is computationally expensive to implement exactly and often only approximated.
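The while-loop encoding of conditioning mentioned above is rejection sampling: rerun the sampler until the observed predicate holds, and the accepted runs follow the conditional distribution. A minimal sketch (the helper names are illustrative, not from Kozen [1981]):

```python
import random

def observe_by_rejection(sample, predicate):
    """Simulate `observe predicate` with a while loop: keep re-running the
    sampler until the predicate holds. Accepted outcomes are distributed
    according to the conditional distribution given the predicate."""
    while True:
        outcome = sample()
        if predicate(outcome):
            return outcome

# Example: two fair dice conditioned on their sum being 7.
random.seed(0)
roll = lambda: (random.randint(1, 6), random.randint(1, 6))
d1, d2 = observe_by_rejection(roll, lambda p: p[0] + p[1] == 7)
assert d1 + d2 == 7
```

The loop terminates with probability 1 whenever the conditioning event has positive probability, but its expected running time grows as that probability shrinks — one reason exact conditioning is expensive and real implementations often approximate it.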
Historically, the design of randomized algorithms rarely uses the conditioning operator. Since we focus on verifying probabilistic programs for randomized algorithms in this thesis, we leave the conditioning operator out of our probabilistic programming language. 1.2 Independence and Dependencies in Programs In our discourse, we consider probabilistic program variables as a superset of deterministic program variables: each probabilistic program variable’s value is sampled from a distribution, and a deterministic program variable can be considered as sampling its value from a point-mass distribution. We also sometimes abbreviate “probabilistic program variables” as “variables.” An important and ubiquitous relation between two probabilistic program variables is probabilistic independence. Independence between two variables means that their values are unrelated, i.e., knowing the outcome of one of the variables does not change one’s knowledge of the distribution of the other, and vice versa. Intuitively (and somewhat tautologically), two probabilistic variables are independent if they are derived from fresh and distinct sources of randomness, like two coin flips. In contrast, a coin flip 𝑥 and the derived variable 𝑥 + 1 are clearly not independent, because the value of 𝑥 dictates the value 𝑥 + 1 gets. However, two variables that use a shared source of randomness, and even have a logical dependency, can still be probabilistically independent. For instance, in one-time-pad encryption, we assume an 𝑙-bit message 𝑚 is drawn from some distribution over binary strings; we draw a key 𝑘 from the uniform distribution over 𝑙-bit binary strings and encrypt the message 𝑚 into 𝑐, defined to be 𝑚 xor 𝑘. This ciphered message 𝑐 is probabilistically independent of the original message 𝑚, though it is also clearly derived from the original message.
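The one-time-pad claim can be checked by exhaustively enumerating the joint distribution. The sketch below uses a single bit and a hypothetical biased message distribution (Pr[𝑚 = 0] = 3/4 is an illustrative choice); it verifies that Pr[𝑚, 𝑐] = Pr[𝑚] · Pr[𝑐] even though 𝑐 is computed from 𝑚:

```python
from itertools import product
from fractions import Fraction

# One-bit one-time pad: message m from a biased distribution, key k uniform,
# ciphertext c = m xor k. The bias on m is an illustrative assumption.
msg_dist = {0: Fraction(3, 4), 1: Fraction(1, 4)}
key_dist = {0: Fraction(1, 2), 1: Fraction(1, 2)}

# Exact joint distribution of (m, c), computed by enumeration.
joint = {}
for (m, pm), (k, pk) in product(msg_dist.items(), key_dist.items()):
    c = m ^ k
    joint[(m, c)] = joint.get((m, c), Fraction(0)) + pm * pk

# Marginals of m and c.
p_m = {m: sum(p for (m2, _), p in joint.items() if m2 == m) for m in (0, 1)}
p_c = {c: sum(p for (_, c2), p in joint.items() if c2 == c) for c in (0, 1)}

# Pr[m, c] = Pr[m] * Pr[c] at every outcome: m and c are independent,
# even though c is a function of m (and k).
assert all(joint[(m, c)] == p_m[m] * p_c[c] for m in (0, 1) for c in (0, 1))
print(p_c)  # c is uniform regardless of the bias on m
```

Note that the check goes through for any message distribution: the uniform key "washes out" the bias, which is exactly the perfect-secrecy property.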
Also, in the degenerate case when a variable is deterministic, knowing its value does not give any information about how the outcomes of other variables are sampled. Because of that, deterministic variables are independent of all other variables. Sometimes, two variables 𝐴, 𝐵 are not exactly probabilistically independent, but when we fix the value of a third variable 𝐶, the values of 𝐴 and 𝐵 become unrelated. This is the case where 𝐴 and 𝐵 are conditionally independent given 𝐶. Or, two variables 𝐴, 𝐵 may be negatively associated, in that when 𝐴 attains a higher value, 𝐵 tends to attain a lower value. We will refer to such relations between program variables concerning their probabilistic dependencies and independence as (in)dependencies. Knowing the (in)dependencies between program variables can be extremely helpful in program analysis, for multiple reasons. First, sometimes ensuring the desired (in)dependencies is straightforwardly the goal. For example, in cryptography, perfect security means that the public information is independent of the secrets. In multi-party secure computation, multiple parties want to compute a value that depends on each party’s secrets without divulging their own secrets. Perfect security is not an appropriate goal here, because the different parties want the computed result to be made public, and that value is not independent of their secrets. A more appropriate goal is the conditional independence of each party’s view and the other parties’ secrets given the outcome of the computed result, so establishing that conditional independence proves the protocol correct. Second, (in)dependencies facilitate further analysis. For example, the law of large numbers says that, if we draw a large number of independent samples from a distribution, then the empirical average of the results converges to the expected value.
Various inequalities upper-bound the probability that the empirical average deviates from the expected value by more than a certain amount — these inequalities are called “concentration bounds.” Concentration bounds can be applied, for instance, to upper-bound the probability that randomized Quicksort fails to terminate within some desired time bound on an arbitrary instance, because each randomized pivot is chosen independently and it is unlikely that “bad” pivots are always chosen. Some concentration bounds also hold for negatively associated variables. Intuitively, if one variable getting a bigger outcome means the others get smaller outcomes, then their deviations from the expected value will likely cancel out. As an application, concentration bounds also help in analyzing the collision probability or overflow probability of hash algorithms: when we hash a fixed number of items into a set of buckets, one bucket getting more items means fewer items can go to the other buckets, so the numbers of items hashed to different buckets are negatively associated, and thus we can apply concentration bounds to deduce that it is unlikely for many buckets to get a lot of items. Other than program analysis, we can also leverage (in)dependencies to represent a probabilistic model more concisely [Koller and Friedman, 2009], identify parallelizable computations, and perform more efficient probabilistic inference [Holtzen, 2021]. Analyzing (in)dependencies between probabilistic program variables, however, is intricate. First of all, testing probabilistic properties is hard. For deterministic programs, we can run an implementation and test whether a property is violated by the implementation; for probabilistic behaviors, however, testing can only exhibit a finite number of execution traces, from which we cannot conclude with certainty any (in)dependencies between program variables in the distribution of execution traces.
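As an aside, the negative-association intuition for hashing above can be probed — though, per the point just made, never proved — by simulation. A minimal sketch, assuming items are hashed uniformly at random into buckets (the parameters are illustrative):

```python
import random

# Hash n_items into n_buckets uniformly, many times, and estimate the
# covariance of two bucket counts. For this multinomial setting the exact
# value is Cov(B_i, B_j) = -n * (1/m)^2 = -20/25 = -0.8, i.e., negative:
# one bucket filling up leaves fewer items for the others.
random.seed(1)
n_items, n_buckets, trials = 20, 5, 20000

samples = []
for _ in range(trials):
    counts = [0] * n_buckets
    for _ in range(n_items):
        counts[random.randrange(n_buckets)] += 1
    samples.append((counts[0], counts[1]))

mean0 = sum(a for a, _ in samples) / trials
mean1 = sum(b for _, b in samples) / trials
cov = sum((a - mean0) * (b - mean1) for a, b in samples) / trials
print(cov)  # empirical estimate, close to the exact -0.8
assert cov < 0
```

The simulation only suggests the negative dependence at two fixed indices; certifying the full negative-association property (which quantifies over all monotone functions of disjoint index sets) is exactly the kind of claim that calls for deductive verification.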
Second, our mental model of probabilistic (in)dependencies can be unreliable. As we illustrated above through the example of one-time-pad encryption, somewhat counter-intuitively, logically dependent variables can also be probabilistically independent. As another example, consider the Bloom filter [Bloom, 1970], a widely used randomized data structure for membership queries, which is highly space-efficient at the price of sometimes returning false positives. A Bloom filter stores a relatively small array of 0-1 bits; when an item is added, it is mapped to a set of indices in the array using distinct hash functions, and the corresponding bits are flipped to 1. When analyzing the Bloom filter’s false positive rate, many sources (e.g., Mullin [1983], Blustein and El-Maazawi [2002]) have mistakenly treated the values at different indices of the Bloom filter as independent,1 while they are not, because one index being flipped to 1 means other indices are more likely to be 0. A possible explanation for such confusion is that people may intuitively think that independence is preserved through arbitrary composition, while it is not. These difficulties all speak to the need for deductive verification of (in)dependencies in probabilistic programs, whose rigor allows us to confidently use (in)dependencies in analysis. 1This issue was first pointed out by Bose et al. [2008], which also attempted to fix it. Christensen et al. [2010] later identified an issue in the definition of Stirling numbers of the second kind in Bose et al. [2008]. Gopinathan and Sergey [2020] formally certified the analysis using the theorem prover Rocq. 1.3 Separation Logic for Independence and Dependencies Separation logic extends Hoare logic to reason about programs. Originally, it was developed to verify programs that manipulate pointer data structures, i.e., heaps. The core innovation is the introduction of the “separating conjunction” (symbolized by ∗), a logical connective that allows assertions about distinct, non-overlapping regions of memory to be combined. Unlike the traditional conjunction 𝑃 ∧ 𝑄, which only requires the validity of the two assertions 𝑃 and 𝑄, the separating conjunction 𝑃 ∗ 𝑄 also asserts the disjointness of the subheaps validating 𝑃 and 𝑄. At the program logic level, the signature frame rule allows local reasoning about heap manipulations while preserving propositions on disjoint pieces of memory. Using these new assertions and rules, Separation Logic addresses a critical limitation of classical Hoare logic: reasoning about pointer-manipulating programs was hindered by complex aliasing and interference between memory regions. The idea that “we can reason about separate components separately” makes no special assumptions about heaps, so Separation Logic can be a general tool for reasoning about resources that can be separated or shared among different entities. An influential extension of heap-based Separation Logic is Concurrent Separation Logic (CSL) [Brookes, 2007a, Vafeiadis and Parkinson, 2007, Brookes, 2007b], which leverages the separating conjunction to ensure that concurrent modifications to the heap are localized and do not interfere with each other. It has led to practical and scalable verification tools like Infer [Facebook] for automatically verifying properties important to security, concurrency, and other domains. More recently, probabilistic separation logic (PSL) by Barthe et al. [2019] reappropriates Separation Logic for reasoning about probabilistic programs, with the insight that independence is a separation between different components of a distribution. They do not make a distinction between the store and the heap — both are considered memories — and build their program logic for probabilistic programs interpreted as maps between distributions over memories.
In PSL, the separating conjunction 𝑃 ∗ 𝑄 asserts the independence of the formulas 𝑃 and 𝑄 by requiring 𝑃 and 𝑄 to use disjoint sets of variables and the two sets of variables to be independent. PSL enjoys a proof system analogous to Separation Logic's, also with a frame rule, but instead for establishing probabilistic independence of probabilistic program variables. While Barthe et al. [2019] demonstrate that their program logic, with the help of domain-specific axioms, can establish probabilistic independence in several cryptography-based examples, we want to know how much further we can push this idea. Concretely, we ask the following questions:

1. Can we also adapt separation logic for reasoning about "probabilistic separation" notions that are weaker than independence, such as conditional independence or negative association?

2. Can we make the assertion logic more expressive? For instance, existing PSL conflates probabilistic independence and variable disjointness; can we precisely assert probabilistic independence without assuming variable disjointness?

3. Can this style of "probabilistic separation logic" scale to bigger, more complicated programs?

1.4 Outline of the Thesis

In this thesis, we first overview the assertion logic underpinning separation logic, Bunched Logic (abbreviated as BI for "the logic of Bunched Implications"), in chapter 2. The original BI is an important stepping stone before we introduce its variations and other practical models of probabilistic separation logic. In chapter 3, we extend probabilistic separation logic to also support compositional reasoning about negative association and call the new logic LINA. In chapter 4, we introduce a new assertion logic DIBI, which extends BI with a non-commutative conjunction for modeling dependent resources, and design a program logic CPSL on top of DIBI for proving conditional independence in probabilistic programs.
Chapter 3 and Chapter 4 together give a positive answer to Question 1. Last, in chapter 5, we focus on the unary fragment of BLUEBELL, a program logic designed for integrating unary and relational reasoning about probabilistic programs. The unary fragment of BLUEBELL gives an alternative program logic for proving conditional independence and independence. While CPSL expresses conditional independence using two different conjunctions, BLUEBELL, inspired by Li et al. [2023a], introduces a modality to the logic for conditioning on distributions and expresses conditional independence using the modality and the usual separating conjunction for independence. This new modality also allows us to express probabilistic dependence such as: depending on the outcome 𝑣 of the variable 𝑥, the variable 𝑦 is distributed as some 𝜅(𝑣). Meanwhile, similar to LINA and CPSL, BLUEBELL is a program logic developed for imperative probabilistic programs. In BLUEBELL, we are able to decouple the assumption of variable disjointness from the assertion of probabilistic independence, using the probabilistic independence BI model proposed by Li et al. [2023a] and permissions, a concept developed in concurrent separation logic for tracking who can read from and write to a resource. This feature also answers Question 2 positively. In addition, we apply LINA and BLUEBELL to some non-trivial probabilistic programs, demonstrating their potential to scale. CPSL is only applied to smaller examples. One difficulty in applying CPSL to more complicated programs is that, as a result of our design choices, the program logic rules only apply to assertions satisfying certain syntactic restrictions. In designing BLUEBELL, we prioritize ergonomics and no longer impose syntactic restrictions on the assertion logic; instead, all assertions can be used in the program logic rules.
We also see this as a step towards a more scalable probabilistic separation logic, thus making progress in answering Question 3. We also want to note that chapter 2 is mainly based on the prior work of Docherty [2019] and Barthe et al. [2019]. Chapter 4 is based on Bao et al. [2021]; Chapter 3 is based on Bao et al. [2022]; Chapter 5 is based on Bao et al. [2025].

CHAPTER 2
BUNCHED LOGIC AND PROBABILISTIC SEPARATION LOGIC

2.1 Background

A key feature of Separation Logic is using bunched logic instead of the usual propositional logic or first-order logic for asserting program states. Bunched logic is a substructural logic formulated by O'Hearn and Pym [1999]. The usual propositional logic satisfies three structural rules — WEAKENING, CONTRACTION and EXCHANGE. Intuitively, WEAKENING allows one to add unused things to the context; CONTRACTION allows one to contract duplicated things in the context; and EXCHANGE allows one to exchange things in the context. Bunched logic does not require WEAKENING and CONTRACTION. The lack of contraction makes its contexts behave like non-duplicable resources; in addition, the lack of weakening makes its contexts behave like resources that have to be used. While this choice of structural rules is exactly the same as in linear logic, bunched logic also allows contexts joined by another connective ';' that satisfies all three structural rules. That is, in sequent-calculus-style presentation, bunched logic has the structural rules in fig. 2.1a:

WEAKENING: from Γ ⊢ 𝜓, infer Γ; 𝜙 ⊢ 𝜓
CONTRACTION: from Γ; 𝜙; 𝜙 ⊢ 𝜓, infer Γ; 𝜙 ⊢ 𝜓
EXCHANGE-1: from Γ1; 𝜙; Γ2; 𝜓; Γ3 ⊢ 𝜃, infer Γ1; 𝜓; Γ2; 𝜙; Γ3 ⊢ 𝜃
EXCHANGE-2: from Γ1, 𝜙, Γ2, 𝜓, Γ3 ⊢ 𝜃, infer Γ1, 𝜓, Γ2, 𝜙, Γ3 ⊢ 𝜃

Figure 2.1: (a) Substructural rules for bunched logic; (b) an example context (a tree whose internal nodes are ',' and ';' and whose leaves are 𝜑, 𝜓, and 𝑥).

Contexts that interleave these two connectives are tree-structured instead of list-structured, for example, as the context given in fig. 2.1b, thus giving the logic the name bunched logic.
Since it allows different ways to combine contexts, bunched logic provides a flexible foundation for reasoning about resources whose sharing and separation need careful accounting. We can already see this from the connectives in bunched logic. First, the two ways to combine contexts induce two conjunctions, the multiplicative conjunction ∗ and the additive conjunction ∧:

∗-I: from Γ ⊢ 𝜙 and Δ ⊢ 𝜓, infer Γ, Δ ⊢ 𝜙 ∗ 𝜓
∧-I: from Γ ⊢ 𝜙 and Δ ⊢ 𝜓, infer Γ; Δ ⊢ 𝜙 ∧ 𝜓

Informally, the assertion 𝜙 ∗ 𝜓 can be used to ensure that properties 𝜙, 𝜓 hold on separate resources, while 𝜙 ∧ 𝜓 allows us to assert the validity of facts 𝜙, 𝜓 without extra requirements. Analogously, bunched logic has a multiplicative implication −∗ as well as a standard implication →:

−∗-I: from Γ, 𝜙 ⊢ 𝜓, infer Γ ⊢ 𝜙 −∗ 𝜓
→-I: from Γ; 𝜙 ⊢ 𝜓, infer Γ ⊢ 𝜙 → 𝜓

The multiplicative version 𝜙 −∗ 𝜓 asserts that combining the current state with a separate resource satisfying 𝜙 would validate 𝜓, while 𝜙 → 𝜓 simply asserts that the fact 𝜙 implies 𝜓. In this chapter, we first give a formal overview of bunched logic, introducing its syntax, semantics, and proof system; we then show that the proof system is sound and complete. All the methodology and proofs of this part are taken from Docherty [2019]. What we aim for is to list the precise definitions and results needed for the rest of the chapters; we also detail some cases of induction proofs omitted in Docherty [2019] to illustrate how they are proved. It is worth noting that there are varied presentations of bunched logic's semantics in the literature: the original paper by O'Hearn and Pym [1999] interprets BI formulas over doubly closed categories; early works in separation logic often interpret BI over partial commutative monoids that satisfy extra conditions [Calcagno et al., 2007]; more recent works in higher-order concurrent separation logic use a customized resource algebra whose binary operation is total and may not have a single unit, with extra functions on elements [Jung et al., 2018], etc.
We adopt the system from Simon Docherty's thesis [Docherty, 2019], because it provides a uniform account of various bunched logics, accompanied by a completeness proof — we do not know whether the proof system is complete with respect to the other variations of the semantics. After introducing the metatheory of bunched logic, we introduce a probabilistic separation logic based on bunched logic. First, we describe a concrete bunched logic model XD based on probabilistic memories, i.e., distributions over program memories. In this model, the separating conjunction can be used to assert probabilistic independence. Then, we define an imperative probabilistic language pWhile that operates on probabilistic memories. Last, in this chapter, we describe a program logic that reasons about pWhile programs with specifications asserted using the bunched logic formulas of the concrete probabilistic model XD. This program logic is a simplified but also generalized version of the probabilistic separation logic in prior work [Barthe et al., 2019] for proving probabilistic independence.

2.2 Bunched Logic (BI)

2.2.1 Syntax and Semantics

The set of BI formulas, FormBI, extends propositional formulas with the multiplicative conjunction 𝑃 ∗ 𝑄, the implication 𝑃 −∗ 𝑄, and the unit 𝐼 associated with it:

𝑃,𝑄 ::= 𝑝 ∈ AP | ⊤ | ⊥ | 𝐼 | 𝑃 ∧ 𝑄 | 𝑃 ∨ 𝑄 | 𝑃 → 𝑄 | 𝑃 ∗ 𝑄 | 𝑃 −∗ 𝑄

BI formulas are interpreted on a kind of mathematical structure named BI frames.

Definition 2.2.1 (Downwards-Closed BI Frame). A Downwards-Closed BI frame is a structure X = (𝑋, ⊑, ◦, 𝐸) such that ⊑ is a preorder (i.e., a transitive and reflexive relation), 𝐸 ⊆ 𝑋, and ◦ : 𝑋 × 𝑋 → P(𝑋) is a non-deterministic binary operation, satisfying the rules in Figure 2.2.
𝑧 ∈ 𝑥 ◦ 𝑦 → 𝑧 ∈ 𝑦 ◦ 𝑥; (Commutativity)
𝑤 ∈ 𝑡 ◦ 𝑧 ∧ 𝑡 ∈ 𝑥 ◦ 𝑦 → ∃𝑠 (𝑠 ∈ 𝑦 ◦ 𝑧 ∧ 𝑤 ∈ 𝑥 ◦ 𝑠); (Associativity)
∃𝑒 ∈ 𝐸 (𝑥 ∈ 𝑒 ◦ 𝑥); (Unit Existence)
𝑒 ∈ 𝐸 ∧ 𝑒 ⊑ 𝑒′ → 𝑒′ ∈ 𝐸; (Unit Closure)
𝑒 ∈ 𝐸 ∧ 𝑦 ∈ 𝑥 ◦ 𝑒 → 𝑥 ⊑ 𝑦; (Unit Coherence)
𝑧 ∈ 𝑥 ◦ 𝑦 ∧ 𝑥′ ⊑ 𝑥 ∧ 𝑦′ ⊑ 𝑦 → ∃𝑧′ (𝑧′ ⊑ 𝑧 ∧ 𝑧′ ∈ 𝑥′ ◦ 𝑦′). (Down-Closed)

Figure 2.2: BI frame requirements (with outermost universal quantification omitted).

Intuitively, 𝑋 is a set of states, the preorder ⊑ describes when a smaller state can be extended to a larger state, the binary operator ◦ offers a way of combining states, and 𝐸 is a set of states that act like units with respect to ◦. The binary operator returns a set of states instead of a single state, and thus it can be deterministic (at most one state returned) or non-deterministic, and partial (empty set returned) or total. In alternative presentations of BI frames as partial commutative monoids, the binary operator is defined to be a partial map 𝑋 × 𝑋 ⇀ 𝑋. But the proof of bunched logic's completeness relies on the frame's admission of non-deterministic models [Docherty, 2019]. Furthermore, the non-deterministic combination is useful for reasoning about probabilistic states, as we showcase in Chapter 3 for negative dependence. For the preorder, there are two opposite but equally sensible readings of 𝑥 ⊑ 𝑦 where 𝑥 and 𝑦 are interpreted as resources: 1. 𝑦 as a resource is an extension of resource 𝑥, and we can convert 𝑦 to 𝑥 by using up some part of 𝑦; 2. Or, resource 𝑥 converts to resource 𝑦. To avoid confusion, in this thesis, we consistently use the first reading. Also, we sometimes write 𝑥 ⊒ 𝑦 as an interchangeable notation for 𝑦 ⊑ 𝑥. The frame conditions define properties that must hold for all models of BI. The first three properties, (Commutativity), (Associativity), and (Unit Existence), can be viewed as generalizations of familiar algebraic properties to non-deterministic operations.
(Unit Existence) also relaxes the usual unit existence axiom for monoids, which states that there is one element 𝑒 that is the unit for all other elements with respect to the binary operation, to allow different units 𝑒 ∈ 𝐸 to be chosen for different 𝑥. (Unit Closure) states that the set 𝐸 is closed under the preorder ⊑. (Unit Coherence) says that if 𝑦 can be obtained by composing 𝑥 with a unit 𝑒 ∈ 𝐸, then 𝑦 is an extension of 𝑥; roughly, this ensures that 𝐸 only has elements that behave like units. Last, (Down-Closed) is another coherence condition for the order ⊑ and the composition ◦, which says that if 𝑧 ∈ 𝑥 ◦ 𝑦, then the composition of any 𝑥′ smaller than 𝑥 and any 𝑦′ smaller than 𝑦 contains an element 𝑧′ smaller than 𝑧. Informally, it says that the resource conversion of the components 𝑥, 𝑦 translates into the resource conversion of the composition 𝑧.¹

We then use a Kripke-style semantics for BI. Given a BI frame, the semantics defines which states in the frame satisfy each formula. Since the semantics is defined inductively on formulas, we first need a specification of which states satisfy the atomic propositions.

Definition 2.2.2 (Valuation and model). A persistent valuation is an assignment V : AP → P(𝑋) of atomic propositions to subsets of states of a BI frame satisfying: if 𝑥 ∈ V(𝑝) and 𝑦 ⊒ 𝑥 then 𝑦 ∈ V(𝑝). A BI model (X, V) is a BI frame X together with a persistent valuation V. We now give a semantics to BI formulas in a BI model.

𝑥 |=V ⊤ always
𝑥 |=V ⊥ never
𝑥 |=V 𝐼 iff 𝑥 ∈ 𝐸
𝑥 |=V 𝑝 iff 𝑥 ∈ V(𝑝)
𝑥 |=V 𝑃 ∧ 𝑄 iff 𝑥 |=V 𝑃 and 𝑥 |=V 𝑄
𝑥 |=V 𝑃 ∨ 𝑄 iff 𝑥 |=V 𝑃 or 𝑥 |=V 𝑄
𝑥 |=V 𝑃 → 𝑄 iff for all 𝑦 ⊒ 𝑥, 𝑦 |=V 𝑃 implies 𝑦 |=V 𝑄
𝑥 |=V 𝑃 ∗ 𝑄 iff there exist 𝑥′, 𝑦, 𝑧 s.t. 𝑥 ⊒ 𝑥′ ∈ 𝑦 ◦ 𝑧, 𝑦 |=V 𝑃 and 𝑧 |=V 𝑄
𝑥 |=V 𝑃 −∗ 𝑄 iff for all 𝑦, 𝑧 s.t. 𝑧 ∈ 𝑥 ◦ 𝑦, 𝑦 |=V 𝑃 implies 𝑧 |=V 𝑄

Figure 2.3: Satisfaction for BI

Definition 2.2.3 (BI Satisfaction and Validity). Satisfaction at a state 𝑥 of a model (X, V) is inductively defined by the clauses in Figure 2.3.
𝑃 is valid in a model, X |=V 𝑃, iff 𝑥 |=V 𝑃 for all 𝑥 ∈ X. 𝑃 is valid, |= 𝑃, iff 𝑃 is valid in all models. 𝑃 |= 𝑄 iff, for all models (X, V), for any state 𝑥 ∈ X, 𝑥 |=V 𝑃 implies 𝑥 |=V 𝑄.

¹It is also possible to interpret BI formulas on structures without (Down-Closed) while still ensuring soundness and completeness with respect to the usual BI proof system and the persistence of formulas, but other axioms (e.g., (Associativity)) need to be more delicate. The assumption of (Down-Closed) is common in the presentation of BI models in the literature.

Where the context is clear, we omit the subscript V on the satisfaction relation. With the semantics in Figure 2.3, persistence on propositional atoms extends to all formulas:

Lemma 2.2.1 (Persistence Lemma). For all BI formulas 𝑃, if 𝑥 |= 𝑃 and 𝑦 ⊒ 𝑥 then 𝑦 |= 𝑃.

Remark The emphasis on properties being persistent is rooted in the history of intuitionistic logic. Classical logic has the law of excluded middle, ⊢ 𝜑 ∨ ¬𝜑, which says that for any property 𝜑, either 𝜑 holds or 𝜑 does not hold. However, with some readings of formula satisfaction, the law seems to be on precarious ground. For instance, suppose we interpret 𝑥 |= 𝑝 as saying that at state 𝑥, the fact 𝑝 has been verified to be true, so that 𝑥 |= ¬𝑝 says that at state 𝑥, the fact ¬𝑝 has been verified to be true; then we should not expect the law of excluded middle to be valid — it is possible that neither 𝑝 nor ¬𝑝 has been verified. This motivates non-classical logics without the law of excluded middle, and furthermore, many properties that motivate such readings of formulas are naturally persistent. For instance, suppose states are ordered temporally: if 𝑝 has been verified to be true at state 𝑥, then for every state 𝑥′ following 𝑥, 𝑝 has been verified to be true at 𝑥′ too.
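The frame conditions of fig. 2.2 can be sanity-checked by brute force on a small candidate structure. The following sketch is our own toy example (not taken from Docherty [2019]): states are subsets of {0, 1}, ◦ is disjoint union, ⊑ is subset inclusion, and every state is taken as a unit; each condition of Definition 2.2.1 is verified by exhaustive enumeration.

```python
from itertools import product

# Tiny candidate frame: states are subsets of {0, 1}; x ◦ y = {x ∪ y} when
# x, y are disjoint (else the empty set); x ⊑ y means y extends x; E = X.
X = [frozenset(s) for s in ([], [0], [1], [0, 1])]
E = set(X)
leq = lambda x, y: x <= y                            # x ⊑ y
comp = lambda x, y: [x | y] if not (x & y) else []   # x ◦ y

def check_frame():
    for x, y, z, w, t in product(X, repeat=5):
        if z in comp(x, y):
            assert z in comp(y, x)                        # Commutativity
        if w in comp(t, z) and t in comp(x, y):           # Associativity
            assert any(s in comp(y, z) and w in comp(x, s) for s in X)
    for x in X:
        assert any(e in E and x in comp(e, x) for e in X)  # Unit Existence
    for e, e2 in product(X, repeat=2):
        if e in E and leq(e, e2):
            assert e2 in E                                 # Unit Closure
    for e, x, y in product(X, repeat=3):
        if e in E and y in comp(x, e):
            assert leq(x, y)                               # Unit Coherence
    for x, y, z, x2, y2 in product(X, repeat=5):
        if z in comp(x, y) and leq(x2, x) and leq(y2, y):  # Down-Closed
            assert any(leq(z2, z) and z2 in comp(x2, y2) for z2 in X)
    return True

print(check_frame())  # True
```

Taking E to be all of X (rather than just the empty state) is forced by (Unit Closure), since the empty state sits below every state in this order.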
2.2.2 Proof System

In the study of logic, we are not only interested in when a formula holds, which is captured by the semantics, but also in how to prove that a formula holds — a useful approach is to derive new formulas from formulas known to hold, following syntactic rules in a proof system. We present a Hilbert-style proof system for BI in fig. 2.4:

AX: 𝑃 ⊢ 𝑃
TOP: 𝑃 ⊢ ⊤
BOT: ⊥ ⊢ 𝑃
∨-E: from 𝑃 ⊢ 𝑅 and 𝑄 ⊢ 𝑅, infer 𝑃 ∨ 𝑄 ⊢ 𝑅
∨-I: from 𝑃 ⊢ 𝑄𝑖, infer 𝑃 ⊢ 𝑄1 ∨ 𝑄2
∧-I-R: from 𝑃 ⊢ 𝑄 and 𝑃 ⊢ 𝑅, infer 𝑃 ⊢ 𝑄 ∧ 𝑅
∧-I-L: from 𝑄 ⊢ 𝑅, infer 𝑃 ∧ 𝑄 ⊢ 𝑅
∧-E: from 𝑃 ⊢ 𝑄1 ∧ 𝑄2, infer 𝑃 ⊢ 𝑄𝑖
→-I: from 𝑃 ∧ 𝑄 ⊢ 𝑅, infer 𝑃 ⊢ 𝑄 → 𝑅
→-E: from 𝑃 ⊢ 𝑄 → 𝑅 and 𝑃 ⊢ 𝑄, infer 𝑃 ⊢ 𝑅
∗-CONJ: from 𝑃 ⊢ 𝑅 and 𝑄 ⊢ 𝑆, infer 𝑃 ∗ 𝑄 ⊢ 𝑅 ∗ 𝑆
−∗-I: from 𝑃 ∗ 𝑄 ⊢ 𝑅, infer 𝑃 ⊢ 𝑄 −∗ 𝑅
−∗-E: from 𝑃 ⊢ 𝑄 −∗ 𝑅 and 𝑆 ⊢ 𝑄, infer 𝑃 ∗ 𝑆 ⊢ 𝑅
∗-UNIT: 𝑃 ⊣⊢ 𝑃 ∗ 𝐼
∗-COMM: 𝑃 ∗ 𝑄 ⊢ 𝑄 ∗ 𝑃
∗-ASSOC: (𝑃 ∗ 𝑄) ∗ 𝑅 ⊣⊢ 𝑃 ∗ (𝑄 ∗ 𝑅)

Figure 2.4: Hilbert system for BI

This calculus extends a system for propositional logic with additional rules governing the multiplicative connectives ∗ and −∗ and the multiplicative unit 𝐼. These rules say that the multiplicative conjunction ∗ is commutative and associative, the multiplicative unit 𝐼 interacts with ∗ as expected, and −∗ is adjoint to ∗ just as the regular → is adjoint to ∧. A useful proof system for a logic should be sound with respect to its semantics. That is, if a formula 𝜓 is derivable from another formula 𝜙 using the rules in the proof system, then 𝜓 should always hold when 𝜙 holds. On top of that, it is even better if the proof system is also complete with respect to its semantics. That requires that, if 𝜓 always holds when 𝜙 holds, then 𝜓 is derivable from 𝜙 as well.

2.2.3 Soundness and Completeness of BI

A methodology for proving the soundness and completeness of bunched logic is given by Docherty [2019], inspired by the duality-theoretic approach to modal logic [Goldblatt, 1989]. First, BI is proved sound and complete with respect to an algebraic semantics obtained by interpreting the rules of the proof system as algebraic axioms.
Next, the algebraic soundness is used to establish soundness of the proof system with respect to the Kripke semantics, and similarly, the algebraic completeness is used to establish overall completeness. Notably, a more straightforward proof of the soundness of the proof system is by induction on the proof rules; here, we instead present the duality-theoretic approach for proving soundness to illustrate the technique.

Algebraic Soundness and Completeness of the BI Proof System

The algebraic semantics interprets BI formulas as elements of a structure that we call a BI algebra.

Definition 2.2.4 (BI Algebra). A BI algebra is an algebra A = (𝐴, ∧A, ∨A, →A, ⊤A, ⊥A, ∗A, −∗A, 𝐼A) such that, for all 𝑎, 𝑏, 𝑐 ∈ 𝐴:
• (𝐴, ∧A, ∨A, →A, ⊤A, ⊥A) is a Heyting algebra, i.e., (𝐴, ∧A, ∨A, ⊤A, ⊥A) forms a bounded lattice (with join and meet operations written ∨A and ∧A, and with least element ⊥A and greatest element ⊤A), and →A is a binary operation such that 𝑎 ∧A 𝑏 ≤ 𝑐 is equivalent to 𝑎 ≤ 𝑏 →A 𝑐;
• (𝐴, ∗A, 𝐼A) is a commutative monoid;
• 𝑎 ∗A 𝑏 ≤ 𝑐 iff 𝑎 ≤ 𝑏 −∗A 𝑐, where ≤ is the ordering associated with the Heyting algebra.

In the following, we drop the subscripts A when it is clear that we are referring to elements and operations in the BI algebra, and overload the notations ⊤, ⊥, ∗, −∗, 𝐼, which are also used as connectives in BI formulas. By Goldblatt [1989], the residuation property 𝑎 ∗A 𝑏 ≤ 𝑐 iff 𝑎 ≤ 𝑏 −∗A 𝑐 implies the following useful properties.

Lemma 2.2.2. Given any BI algebra A, for any 𝑎, 𝑏, 𝑐 ∈ 𝐴, the following properties hold:
(𝑎 ∨ 𝑏) ∗ 𝑐 = (𝑎 ∗ 𝑐) ∨ (𝑏 ∗ 𝑐) (BI-Alg:Dist-1)
𝑎 ∗ (𝑏 ∨ 𝑐) = (𝑎 ∗ 𝑏) ∨ (𝑎 ∗ 𝑐) (BI-Alg:Dist-2)
𝑎 ≤ 𝑎′ and 𝑏 ≤ 𝑏′ implies 𝑎 ∗ 𝑏 ≤ 𝑎′ ∗ 𝑏′ (BI-Alg:Coh)
⊥ ∗ 𝑎 = ⊥ = 𝑎 ∗ ⊥ (BI-Alg:Bot)

We can interpret bunched logic formulas in a BI algebra A.
Given an assignment V from atomic propositions to the carrier set of A, we can extend it to an algebraic interpretation of bunched logic formulas ⟦−⟧A : FormBI → 𝐴 by taking the unique homomorphic extension of this assignment:

⟦𝑝⟧A = V(𝑝)   ⟦⊤⟧A = ⊤   ⟦𝐼⟧A = 𝐼A   ⟦⊥⟧A = ⊥
⟦𝑃 ∧ 𝑄⟧A = ⟦𝑃⟧A ∧ ⟦𝑄⟧A
⟦𝑃 ∨ 𝑄⟧A = ⟦𝑃⟧A ∨ ⟦𝑄⟧A
⟦𝑃 → 𝑄⟧A = ⟦𝑃⟧A → ⟦𝑄⟧A
⟦𝑃 ∗ 𝑄⟧A = ⟦𝑃⟧A ∗ ⟦𝑄⟧A
⟦𝑃 −∗ 𝑄⟧A = ⟦𝑃⟧A −∗ ⟦𝑄⟧A

Theorem 2.2.3 (Algebraic Soundness). If 𝑃 ⊢ 𝑄 is derivable, then ⟦𝑃⟧A ≤ ⟦𝑄⟧A for all algebraic interpretations ⟦−⟧A.

Proof. By induction on the derivation of 𝑃 ⊢ 𝑄. For instance, for the case of ∗-CONJ: if 𝑃 ⊢ 𝑅 and 𝑄 ⊢ 𝑆, then by the inductive hypothesis, we have ⟦𝑃⟧A ≤ ⟦𝑅⟧A and ⟦𝑄⟧A ≤ ⟦𝑆⟧A for all algebraic interpretations ⟦−⟧A. By BI-Alg:Coh, that means ⟦𝑃⟧A ∗ ⟦𝑄⟧A ≤ ⟦𝑅⟧A ∗ ⟦𝑆⟧A; therefore, for any algebraic interpretation ⟦−⟧A, ⟦𝑃 ∗ 𝑄⟧A = ⟦𝑃⟧A ∗ ⟦𝑄⟧A ≤ ⟦𝑅⟧A ∗ ⟦𝑆⟧A = ⟦𝑅 ∗ 𝑆⟧A. □

To prove algebraic completeness, we construct a term BI algebra by quotienting formulas by equiderivability.

Definition 2.2.5 (Lindenbaum-Tarski Algebra). The Lindenbaum-Tarski algebra corresponding to the bunched logic is the set of all equivalence classes of interprovable propositions. That is, define the equivalence relation 𝑃 ∼ 𝑄 as 𝑃 ⊢ 𝑄 and 𝑄 ⊢ 𝑃. Take 𝐼L, ⊤L, and ⊥L to be [𝐼]∼, [⊤]∼, and [⊥]∼, respectively. Then we define:
[𝑃]∼ ∧L [𝑄]∼ = [𝑃 ∧ 𝑄]∼ (Lindenbaum–Tarski–And)
[𝑃]∼ ∨L [𝑄]∼ = [𝑃 ∨ 𝑄]∼ (Lindenbaum–Tarski–Or)
[𝑃]∼ →L [𝑄]∼ = [𝑃 → 𝑄]∼ (Lindenbaum–Tarski–Imp)
[𝑃]∼ ∗L [𝑄]∼ = [𝑃 ∗ 𝑄]∼ (Lindenbaum–Tarski–SepAnd)
[𝑃]∼ −∗L [𝑄]∼ = [𝑃 −∗ 𝑄]∼ (Lindenbaum–Tarski–SepImp)

Lemma 2.2.4. The operations ∧L, ∨L, →L, ∗L, −∗L are well-defined, and the structure ({[𝑃]∼}𝑃∈FormBI, ∧L, ∨L, →L, ⊤L, ⊥L, ∗L, −∗L, 𝐼L) forms a BI algebra.

Furthermore, let ⟦−⟧L be the algebraic interpretation obtained by extending the assignment 𝑝 ↦→ [𝑝]∼ for each atomic proposition 𝑝.

Lemma 2.2.5. For any formula 𝑃 ∈ FormBI, ⟦𝑃⟧L = [𝑃]∼.
The proofs of lemma 2.2.4 and lemma 2.2.5 are straightforward, and we omit them here. The Lindenbaum-Tarski algebra is crucially used in the proof of algebraic completeness.

Theorem 2.2.6 (Algebraic Completeness). If ⟦𝑃⟧A ≤ ⟦𝑄⟧A for all algebraic interpretations ⟦−⟧A, then 𝑃 ⊢ 𝑄 is derivable.

Proof. For any 𝑃, 𝑄 ∈ FormBI, if ⟦𝑃⟧A ≤ ⟦𝑄⟧A for all algebraic interpretations, then ⟦𝑃⟧L ≤ ⟦𝑄⟧L in the Lindenbaum-Tarski algebra. By lemma 2.2.5, that means [𝑃]∼ ≤ [𝑄]∼. In addition,
[𝑃]∼ ≤ [𝑄]∼
⇔ [⊤]∼ ∧L [𝑃]∼ ≤ [𝑄]∼ (𝑃 ⊣⊢ ⊤ ∧ 𝑃 by TOP, ∧-E, ∧-I-R, and by definition Lindenbaum–Tarski–And)
⇔ [⊤]∼ = [𝑃]∼ →L [𝑄]∼ (by the residuation property of the Heyting algebra, noting [⊤]∼ is the greatest element)
⇔ [⊤]∼ = [𝑃 → 𝑄]∼ (by definition Lindenbaum–Tarski–Imp)
⇔ ⊤ ⊢ 𝑃 → 𝑄 (TOP gives the other direction 𝑃 → 𝑄 ⊢ ⊤ of equiderivability)
⇔ 𝑃 ⊢ 𝑄 (by ∧-I-R, ∧-I-L, →-E)
Thus, if ⟦𝑃⟧A ≤ ⟦𝑄⟧A for all BI algebras, then 𝑃 ⊢ 𝑄. □

Soundness of BI Proof Systems

Next, we establish the soundness and completeness of BI algebras with respect to the BI Kripke semantics. To show soundness, we first give a recipe to construct a BI algebra given a BI frame; in particular, the BI algebra's carrier set consists of upwards-closed subsets of states in the BI frame — we can think of these subsets as the sets of states satisfying specific formulas. This construction will help to prove that if there exists a BI model (X, V) in which 𝑃 ̸|=(X,V) 𝑄, then there exists a BI algebra and an algebraic interpretation ⟦−⟧A such that ⟦𝑃⟧A ≰ ⟦𝑄⟧A. The construction is called the complex algebra of a BI frame.

Definition 2.2.6 (Complex Algebra). If X is a BI frame, then the complex algebra of X, written Com(X), is the structure (P⊑(𝑋), ∩, ∪, →X, 𝑋, ∅, ∗, −∗, 𝐸) where
P⊑(𝑋) = {𝐴 ⊆ 𝑋 | 𝑎 ∈ 𝐴 ∧ 𝑎 ⊑ 𝑏 → 𝑏 ∈ 𝐴}
𝐴 →X 𝐵 = {𝑎 | ∀𝑏. 𝑎 ⊑ 𝑏 ∧ 𝑏 ∈ 𝐴 → 𝑏 ∈ 𝐵}
𝐴 ∗ 𝐵 = {𝑥 | ∃𝑤, 𝑦, 𝑧. 𝑤 ⊑ 𝑥 ∧ 𝑤 ∈ 𝑦 ◦ 𝑧 ∧ 𝑦 ∈ 𝐴 ∧ 𝑧 ∈ 𝐵}
𝐴 −∗ 𝐵 = {𝑥 | ∀𝑤, 𝑦, 𝑧. (𝑥 ⊑ 𝑤 ∧ 𝑧 ∈ 𝑤 ◦ 𝑦 ∧ 𝑦 ∈ 𝐴) → 𝑧 ∈ 𝐵}

The complex algebra of any BI frame forms a BI algebra.

Lemma 2.2.7.
If X = (𝑋, ⊑, ◦, 𝐸) is a BI frame, then Com(X) is a BI algebra.

Proof. Given X = (𝑋, ⊑, ◦, 𝐸), let us show that for any 𝐴 ∈ P⊑(𝑋), 𝐴 ∗ 𝐸 = 𝐸 ∗ 𝐴 = 𝐴, and omit the rest of the conditions. For the first part,
𝐴 ∗ 𝐸 = {𝑥 | ∃𝑤, 𝑦, 𝑧. 𝑤 ⊑ 𝑥 ∧ 𝑤 ∈ 𝑦 ◦ 𝑧 ∧ 𝑦 ∈ 𝐴 ∧ 𝑧 ∈ 𝐸}
= {𝑥 | ∃𝑤, 𝑦, 𝑧. 𝑤 ⊑ 𝑥 ∧ 𝑤 ∈ 𝑧 ◦ 𝑦 ∧ 𝑧 ∈ 𝐸 ∧ 𝑦 ∈ 𝐴} (by Commutativity)
= 𝐸 ∗ 𝐴 (2.1)
For the second part,
𝐸 ∗ 𝐴 = {𝑥 | ∃𝑤, 𝑦, 𝑧. 𝑤 ⊑ 𝑥 ∧ 𝑤 ∈ 𝑦 ◦ 𝑧 ∧ 𝑦 ∈ 𝐸 ∧ 𝑧 ∈ 𝐴}
⊇ {𝑥 | ∃𝑧, 𝑒𝑧. 𝑧 ⊑ 𝑥 ∧ 𝑧 ∈ 𝑒𝑧 ◦ 𝑧 ∧ 𝑒𝑧 ∈ 𝐸 ∧ 𝑧 ∈ 𝐴}
By Unit Existence, for any 𝑧 ∈ 𝑋, there exists 𝑒𝑧 ∈ 𝐸 such that 𝑧 ∈ 𝑒𝑧 ◦ 𝑧. Thus,
𝐸 ∗ 𝐴 ⊇ {𝑥 | ∃𝑧. 𝑧 ⊑ 𝑥 ∧ 𝑧 ∈ 𝐴} = 𝐴
On the other hand, by Unit Coherence and Commutativity, 𝑤 ∈ 𝑦 ◦ 𝑧 ∧ 𝑦 ∈ 𝐸 implies that 𝑧 ⊑ 𝑤, and thus,
𝐸 ∗ 𝐴 ⊆ {𝑥 | ∃𝑤, 𝑧. 𝑤 ⊑ 𝑥 ∧ 𝑧 ⊑ 𝑤 ∧ 𝑧 ∈ 𝐴} = {𝑥 | ∃𝑧. 𝑧 ⊑ 𝑥 ∧ 𝑧 ∈ 𝐴} = 𝐴
Therefore, 𝐴 ∗ 𝐸 = 𝐸 ∗ 𝐴 = 𝐴. □

The complex algebra is constructed in a way that allows us to regard any persistent valuation on the BI frame as an algebraic interpretation of the complex algebra.

Theorem 2.2.8. Let X = (𝑋, ⊑, ◦, 𝐸) be a BI frame and let Vf : AP → P(𝑋) be a persistent valuation on X. Define the algebraic assignment Va : AP → Com(X) by letting Va(𝑝) = Vf(𝑝) for all atomic propositions 𝑝, and define the algebraic interpretation ⟦−⟧a by taking the homomorphic extension of Va. Then we have: 𝑥 |=Vf 𝑃 if and only if 𝑥 ∈ ⟦𝑃⟧a.

Proof. We proceed by induction on 𝑃. We show the base case and one inductive case, and omit the rest of the inductive cases.
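Lemma 2.2.7's unit law can be checked concretely. Continuing with a toy frame of our own choosing (subsets of {0, 1} under disjoint union, with ⊑ as inclusion and every state a unit), the sketch below enumerates the upwards-closed sets P⊑(𝑋), computes 𝐴 ∗ 𝐸 as in Definition 2.2.6, and confirms 𝐴 ∗ 𝐸 = 𝐸 ∗ 𝐴 = 𝐴 for every 𝐴:

```python
from itertools import combinations, chain

# Toy BI frame: states are subsets of {0, 1}; x ◦ y = {x ∪ y} when disjoint;
# ⊑ is ⊆; E is the set of all states (every state acts as a unit here).
X = [frozenset(s) for s in ([], [0], [1], [0, 1])]
E = set(X)
comp = lambda x, y: [x | y] if not (x & y) else []

def upsets():
    # All upwards-closed subsets of X: the carrier P⊑(X) of the complex algebra.
    subsets = chain.from_iterable(combinations(X, r) for r in range(len(X) + 1))
    return [set(s) for s in subsets
            if all(b in s for a in s for b in X if a <= b)]

def star(A, B):
    # A ∗ B from Definition 2.2.6: {x | ∃w,y,z. w ⊑ x, w ∈ y ◦ z, y ∈ A, z ∈ B}
    return {x for x in X
            if any(w <= x and w in comp(y, z) for w in X for y in A for z in B)}

assert all(star(A, E) == A == star(E, A) for A in upsets())
print("A * E = E * A = A holds for all upwards-closed A")
```

The check mirrors the proof above: ⊇ uses the empty state as a unit for each element of 𝐴, and ⊆ uses upwards-closure of 𝐴.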
• Case 𝑃 = 𝑝: We have: 𝑥 |=Vf 𝑝 iff 𝑥 ∈ Vf(𝑝) iff 𝑥 ∈ Va(𝑝) iff 𝑥 ∈ ⟦𝑝⟧a.
• Case 𝑃 = 𝑄1 ∧ 𝑄2:
𝑥 |=Vf 𝑄1 ∧ 𝑄2
iff 𝑥 |=Vf 𝑄1 and 𝑥 |=Vf 𝑄2 (by the satisfaction rules)
iff 𝑥 ∈ ⟦𝑄1⟧a and 𝑥 ∈ ⟦𝑄2⟧a (inductive hypothesis)
iff 𝑥 ∈ ⟦𝑄1⟧a ∩ ⟦𝑄2⟧a
iff 𝑥 ∈ ⟦𝑄1 ∧ 𝑄2⟧a (by the ∧ operation in the complex algebra and the homomorphic definition of ⟦−⟧a) □

This equivalence between persistent valuations and algebraic interpretations of complex algebras bridges the remaining gap between the algebraic soundness we proved in theorem 2.2.3 and the overall soundness of the proof system with respect to BI models.

Theorem 2.2.9 (Soundness of BI). If 𝑃 ⊢ 𝑄 is derivable, then 𝑃 |= 𝑄.

Proof. We prove the contrapositive. If 𝑃 ̸|= 𝑄, then there exists a BI model (X, V) and a state 𝑥 ∈ 𝑋 such that 𝑥 |= 𝑃 but 𝑥 ̸|= 𝑄. By theorem 2.2.8, if we define Va : AP → Com(X) by Va(𝑝) = V(𝑝), then we can extend it into an algebraic interpretation ⟦−⟧a such that 𝑥 |=V 𝑃 if and only if 𝑥 ∈ ⟦𝑃⟧a. Thus, there exists an algebraic interpretation ⟦−⟧a of Com(X) such that 𝑥 ∈ ⟦𝑃⟧a and 𝑥 ∉ ⟦𝑄⟧a. So ⟦𝑃⟧a ⊈ ⟦𝑄⟧a; since the order ≤ in the complex algebra is exactly set inclusion ⊆, we have ⟦𝑃⟧a ≰ ⟦𝑄⟧a. By algebraic soundness, that implies 𝑃 ⊢ 𝑄 is not derivable. □

Completeness of BI Proof Systems

In the following, we show the completeness of the BI proof system. Dually to the approach for proving soundness, we show that if there exists an algebraic interpretation ⟦−⟧A and formulas 𝑃, 𝑄 such that ⟦𝑃⟧A ≰ ⟦𝑄⟧A, then there exists a BI model (X, V) such that 𝑃 ̸|=(X,V) 𝑄. To show that, we utilize a map dual to the complex algebra construction in the soundness proof: here, given an algebraic interpretation of BI formulas into a BI algebra, we construct a BI frame corresponding to the BI algebra and a valuation on that BI frame corresponding to the algebraic interpretation. We first recall a structure on a bounded distributive lattice, called a prime filter.

Definition 2.2.7 (Prime Filter).
If (𝐿, ∧, ∨) is a bounded distributive lattice, a filter 𝐹 on 𝐿 is a non-empty subset of 𝐿 such that:
• If 𝑥 ∈ 𝐹 and 𝑥 ≤ 𝑦 then 𝑦 ∈ 𝐹.
• If 𝑥 ∈ 𝐹 and 𝑦 ∈ 𝐹 then 𝑥 ∧ 𝑦 ∈ 𝐹.
A filter is proper if it is a proper subset of 𝐿, i.e., it does not contain ⊥. A prime filter is a proper filter that in addition satisfies: if 𝑥 ∨ 𝑦 ∈ 𝐹 then 𝑥 ∈ 𝐹 or 𝑦 ∈ 𝐹.

Given a BI algebra, we can construct a BI frame whose states are prime filters. We write Prf(𝐿) for the set of prime filters on 𝐿.

Definition 2.2.8 (Prime Filter Frame). If A = (𝐴, ∧, ∨, →, ⊤, ⊥, ∗, −∗, 𝐼) is a BI algebra, then the prime filter frame of A is defined as Prf(A) = (Prf(𝐴), ⊆, ◦, 𝐸) where
𝐹1 ◦ 𝐹2 = {𝐹 ∈ Prf(𝐴) | ∀𝑎1 ∈ 𝐹1. ∀𝑎2 ∈ 𝐹2. 𝑎1 ∗ 𝑎2 ∈ 𝐹}
𝐸 = {𝐹 ∈ Prf(𝐴) | 𝐼 ∈ 𝐹}

We need to check that the constructed structure is a BI frame.

Lemma 2.2.10. If A = (𝐴, ∧, ∨, →, ⊤, ⊥, ∗, −∗, 𝐼) is a BI algebra, then Prf(A) is a BI frame.

Proof. Let us show Unit Coherence and Down-Closed. For Unit Coherence, if 𝑒 ∈ 𝐸, then for any 𝑥 ∈ Prf(𝐴),
𝑥 ◦ 𝑒 = {𝐹 ∈ Prf(𝐴) | ∀𝑎1 ∈ 𝑥. ∀𝑎2 ∈ 𝑒. 𝑎1 ∗ 𝑎2 ∈ 𝐹}
⊆ {𝐹 ∈ Prf(𝐴) | ∀𝑎1 ∈ 𝑥. 𝑎1 ∗ 𝐼 ∈ 𝐹} (since 𝐼 ∈ 𝑒)
= {𝐹 ∈ Prf(𝐴) | ∀𝑎1 ∈ 𝑥. 𝑎1 ∈ 𝐹}
= {𝐹 ∈ Prf(𝐴) | 𝑥 ⊆ 𝐹}
Thus, 𝑦 ∈ 𝑥 ◦ 𝑒 implies 𝑥 ⊑ 𝑦. For Down-Closed, for any 𝑥, 𝑥′, 𝑦, 𝑦′, 𝑧 ∈ Prf(𝐴), if 𝑧 ∈ 𝑥 ◦ 𝑦 and 𝑥′ ⊆ 𝑥 and 𝑦′ ⊆ 𝑦, then for any 𝑎1 ∈ 𝑥′ and 𝑎2 ∈ 𝑦′, we have 𝑎1 ∈ 𝑥 and 𝑎2 ∈ 𝑦 as well, and hence 𝑎1 ∗ 𝑎2 ∈ 𝑧. Thus, 𝑧 ∈ 𝑥′ ◦ 𝑦′. □

Below, we show that any algebraic interpretation into A corresponds to a "morally equivalent" persistent valuation on the prime filter frame Prf(A). This result is dual to theorem 2.2.8 used in the soundness proof. In the theorem and its proof, we use the following notation: for any element 𝑎 in a lattice, we write [𝑎) := {𝑥 | 𝑎 ≤ 𝑥}. By construction, any such [𝑎) is upwards-closed and closed under meet, and thus a filter.

Theorem 2.2.11. Let A = (𝐴, . . .) be a BI algebra and let ⟦−⟧ : FormBI → 𝐴 be an algebraic interpretation that homomorphically extends the assignment Va : AP → 𝐴.
Define the persistent valuation Vf : AP → P(Prf(𝐴)) on the prime filter frame Prf(A) by:
Vf(𝑝) = {𝐹 ∈ Prf(𝐴) | Va(𝑝) ∈ 𝐹}
Then for 𝐹 ∈ Prf(𝐴), we have 𝐹 |=Vf 𝑃 if and only if ⟦𝑃⟧ ∈ 𝐹.

Proof. We proceed by induction on the formula 𝑃.
• Case 𝑃 = 𝑝: For any 𝐹 ∈ Prf(𝐴) and atomic proposition 𝑝, we have
𝐹 |=Vf 𝑝
iff 𝐹 ∈ Vf(𝑝) (by the Kripke semantics, fig. 2.3)
iff 𝐹 ∈ Prf(𝐴) and Va(𝑝) ∈ 𝐹 (by definition of Vf)
iff Va(𝑝) ∈ 𝐹 (by the assumption 𝐹 ∈ Prf(𝐴))
iff ⟦𝑝⟧ ∈ 𝐹. (⟦−⟧ extends Va(−))
• Case 𝑃 = 𝑄1 ∗ 𝑄2: For any 𝐹 ∈ Prf(𝐴) and formulas 𝑄1, 𝑄2,
𝐹 |=Vf 𝑄1 ∗ 𝑄2
iff there exist 𝐹′, 𝐹𝑦, 𝐹𝑧 s.t. 𝐹 ⊒ 𝐹′ ∈ 𝐹𝑦 ◦ 𝐹𝑧, 𝐹𝑦 |=Vf 𝑄1 and 𝐹𝑧 |=Vf 𝑄2 (by the Kripke semantics, fig. 2.3)
iff there exist 𝐹′, 𝐹𝑦, 𝐹𝑧 s.t. 𝐹 ⊇ 𝐹′ ∈ 𝐹𝑦 ◦ 𝐹𝑧, ⟦𝑄1⟧ ∈ 𝐹𝑦 and ⟦𝑄2⟧ ∈ 𝐹𝑧 (by the inductive hypothesis and the definition of the preorder in Prf(A))
For the forward direction, by the definition of prime filter frames, ⟦𝑄1⟧ ∈ 𝐹𝑦 and ⟦𝑄2⟧ ∈ 𝐹𝑧 imply that for any 𝐹′ ∈ 𝐹𝑦 ◦ 𝐹𝑧, we must have ⟦𝑄1⟧ ∗ ⟦𝑄2⟧ ∈ 𝐹′ ⊆ 𝐹; thus, ⟦𝑄1 ∗ 𝑄2⟧ ∈ 𝐹. For the other direction, if ⟦𝑄1 ∗ 𝑄2⟧ ∈ 𝐹, then ⟦𝑄1⟧ ∗ ⟦𝑄2⟧ ∈ 𝐹. We do a case analysis:
– Suppose some 𝑄𝑖 is ⊥. Then ⟦𝑄𝑖⟧ = ⊥A. By BI-Alg:Bot, ⟦𝑄1⟧ ∗ ⟦𝑄2⟧ = ⊥A, and then ⊥A ∈ 𝐹, contradicting 𝐹 being a proper filter. Thus, this case is impossible.
– 𝑄1 and 𝑄2 are both not ⊥. This means that [⟦𝑄1⟧A) and [⟦𝑄2⟧A) are both proper filters. At a high level, we first show that 𝐹 ∈ [⟦𝑄1⟧A) ◦ [⟦𝑄2⟧A), and then use that to show that there exist prime filters 𝐹𝑦, 𝐹𝑧 such that 𝐹 ∈ 𝐹𝑦 ◦ 𝐹𝑧 and ⟦𝑄1⟧A ∈ 𝐹𝑦 and ⟦𝑄2⟧A ∈ 𝐹𝑧, which would then be used to show 𝐹 |=Vf 𝑄1 ∗ 𝑄2. First,
[⟦𝑄1⟧A) ◦ [⟦𝑄2⟧A) = {𝐹 ∈ Prf(𝐴) | ∀𝑎 ∈ [⟦𝑄1⟧A). ∀𝑏 ∈ [⟦𝑄2⟧A). 𝑎 ∗ 𝑏 ∈ 𝐹}
= {𝐹 ∈ Prf(𝐴) | ∀𝑎, 𝑏. ⟦𝑄1⟧A ≤ 𝑎 ∧ ⟦𝑄2⟧A ≤ 𝑏 ⇒ 𝑎 ∗ 𝑏 ∈ 𝐹}
By BI-Alg:Coh, ⟦𝑄1⟧A ≤ 𝑎 and ⟦𝑄2⟧A ≤ 𝑏 imply ⟦𝑄1⟧ ∗ ⟦𝑄2⟧ ≤ 𝑎 ∗ 𝑏. Thus, our given 𝐹 being a filter and ⟦𝑄1⟧ ∗ ⟦𝑄2⟧ ∈ 𝐹 imply that 𝑎 ∗ 𝑏 ∈ 𝐹 for any such 𝑎, 𝑏. Therefore, 𝐹 ∈ [⟦𝑄1⟧A) ◦ [⟦𝑄2⟧A). Next, define a predicate 𝑃 such that 𝑃(𝐹1, 𝐹2) = 1 if and only if 𝐹 ∈ 𝐹1 ◦ 𝐹2 and ⟦𝑄1⟧A ∈ 𝐹1 and ⟦𝑄2⟧A ∈ 𝐹2.
Because 𝐹 ∈ [⟦𝑄1⟧A) ◦ [⟦𝑄2⟧A), we have 𝑃([⟦𝑄1⟧A), [⟦𝑄2⟧A)) = 1. This predicate 𝑃 is a prime predicate in the sense of Docherty [2019] (cf. Definition 5.5) — the proof follows from unfolding definitions and we omit it. Then, applying the Prime Extension Lemma (cf. Lemma 5.7 of Docherty [2019]), the existence of proper filters 𝐹1, 𝐹2 such that 𝑃(𝐹1, 𝐹2) = 1 implies that there exist prime filters 𝐹𝑦, 𝐹𝑧 such that 𝑃(𝐹𝑦, 𝐹𝑧) = 1. Therefore, there exist 𝐹𝑦, 𝐹𝑧 such that 𝐹 ∈ 𝐹𝑦 ◦ 𝐹𝑧 and ⟦𝑄1⟧A ∈ 𝐹𝑦 and ⟦𝑄2⟧A ∈ 𝐹𝑧 — and by the inductive hypothesis this means 𝐹𝑦 |=Vf 𝑄1 and 𝐹𝑧 |=Vf 𝑄2. The existence of such 𝐹𝑦 and 𝐹𝑧 validates that 𝐹 |=Vf 𝑄1 ∗ 𝑄2. □

Now we are ready to prove completeness.

Theorem 2.2.12 (BI Completeness). If 𝑃 |=V 𝑄 for all BI models (X, V), then 𝑃 ⊢ 𝑄.

Proof. We prove the contrapositive. Assume 𝑃 ⊢ 𝑄 is not derivable. By algebraic completeness, there exist an algebra A and an interpretation ⟦−⟧ such that ⟦𝑃⟧ ≰ ⟦𝑄⟧. Then, the element ⟦𝑄⟧ is not in [⟦𝑃⟧), the least filter containing ⟦𝑃⟧. Let 𝐹 = [⟦𝑃⟧).
• 𝑃 is ⊥. Then ⊥ ⊢ 𝑄 by the proof rule BOT, contradicting 𝑃 ⊬ 𝑄.
• 𝑃 is not ⊥. Then 𝐹 = [⟦𝑃⟧) is a proper filter. Define a predicate 𝑃 such that 𝑃(𝐹′) = 1 iff ⟦𝑃⟧ ∈ 𝐹′ and ⟦𝑄⟧ ∉ 𝐹′. Because ⟦𝑄⟧ ∉ [⟦𝑃⟧) and ⟦𝑃⟧ ∈ [⟦𝑃⟧), we have 𝑃(𝐹) = 1. This predicate is a prime predicate, and from the Prime Extension Lemma (cf. Lemma 5.7 of Docherty [2019]) it can be established that there is a prime filter 𝐹′ on A such that ⟦𝑃⟧ ∈ 𝐹′ and ⟦𝑄⟧ ∉ 𝐹′. Define a persistent valuation Vf on Prf(A) by Vf(𝑝) = {𝐹 ∈ Prf(𝐴) | Va(𝑝) ∈ 𝐹}, where Va is the assignment extended by ⟦−⟧. By theorem 2.2.11, we have 𝐹′ |=Vf 𝑃 and 𝐹′ ̸|=Vf 𝑄. Thus, 𝑃 ̸|= 𝑄. □

2.2.4 A Discrete Probabilistic Frame of BI

After presenting the metatheory of bunched logic, we next show a concrete example of a BI model. We present a model based on probability distributions over program memories, which will be useful later in reasoning about probabilistic programs.

Definition 2.2.9 (Discrete Distribution).
Given a set 𝑋, a discrete subdistribution on 𝑋 is a function 𝜇 : 𝑋 → [0, 1] with countable support satisfying ∑𝑥∈𝑋 𝜇(𝑥) ≤ 1. A (full) distribution is a subdistribution that in addition satisfies ∑𝑥∈𝑋 𝜇(𝑥) = 1. We use D(𝑋) to denote the set of discrete (full) distributions 𝜇 over 𝑋.

Now we can define program memories. Throughout this thesis, we fix a set of variables Var and a set of values Val that the variables can take.

Definition 2.2.10 (Program Memories). Let 𝑆 ⊆ Var be a set of variable names. We call any function 𝑚 : 𝑆 → Val a program memory because such a map 𝑚 assigns a value to each variable in 𝑆. Let Mem[𝑆] denote the set of program memories from 𝑆 to Val; and for each 𝑚 ∈ Mem[𝑆], define the domain of 𝑚 to be 𝑆 and denote it as dom(𝑚).

As an example, Mem[∅] = ∅ → Val contains exactly one element, which is the trivial map with an empty domain; we denote the trivial map by ⟨⟩. We need two operations on memories. First, a memory 𝑚 with domain 𝑆 can be projected to a memory 𝜋𝑇𝑚 with domain 𝑇 if 𝑇 ⊆ 𝑆, defined as 𝜋𝑇𝑚(𝑥) = 𝑚(𝑥) for any variable 𝑥 ∈ 𝑇. Second, two memories can be combined if they agree on the intersection of their domains.

Definition 2.2.11. Given memories 𝑚1 ∈ Mem[𝑆], 𝑚2 ∈ Mem[𝑇] such that 𝜋𝑆∩𝑇𝑚1 = 𝜋𝑆∩𝑇𝑚2, we define 𝑚1 ⊲⊳ 𝑚2 : 𝑆 ∪ 𝑇 → Val by
𝑚1 ⊲⊳ 𝑚2(𝑥) := 𝑚1(𝑥) if 𝑥 ∈ 𝑆 \ 𝑇; 𝑚2(𝑥) if 𝑥 ∈ 𝑇 \ 𝑆; 𝑚1(𝑥) = 𝑚2(𝑥) if 𝑥 ∈ 𝑆 ∩ 𝑇.

This operation is not defined when 𝑚1, 𝑚2 disagree on 𝑆 ∩ 𝑇; it is well-defined exactly because 𝑚1, 𝑚2 agree on 𝑆 ∩ 𝑇. We also lift the projection map to distributions. We define the projection 𝜋𝑆 to marginalize a distribution 𝜇 ∈ D(Mem[𝑆′]) to a distribution in D(Mem[𝑆 ∩ 𝑆′]): for any 𝑥 ∈ Mem[𝑆 ∩ 𝑆′],
𝜋𝑆𝜇(𝑥) := ∑𝑥′∈Mem[𝑆′\𝑆] 𝜇(𝑥′ ⊲⊳ 𝑥).
This gives us enough ingredients to define a probabilistic BI frame that will later be useful for reasoning about probabilistic independence.

Definition 2.2.12 (A Discrete Probabilistic BI Frame).
Define a discrete probabilistic BI frame to be a structure XD = (𝑋D, ⊑D, ⊗D, 𝐸D) where • 𝑋D := ∪𝑆⊆Var D(Mem[𝑆]); • Distributions 𝜇1 ⊑D 𝜇2 iff 𝜇1 = 𝜋dom(𝜇1)𝜇2, where dom(𝜇) denotes the domain of the memories that 𝜇 ranges over; • For distributions 𝜇1 ∈ D(Mem[𝑆]), 𝜇2 ∈ D(Mem[𝑇]), the binary operation ⊗D takes the independent product of them iff 𝑆 and 𝑇 are disjoint: 𝜇1 ⊗D 𝜇2 := {𝜇 | ∀𝑥 ∈ Mem[𝑆 ∪ 𝑇], 𝜇(𝑥) = 𝜇1(𝜋𝑆𝑥) · 𝜇2(𝜋𝑇𝑥)} if 𝑆, 𝑇 are disjoint, and 𝜇1 ⊗D 𝜇2 := ∅ otherwise; • 𝐸D := ∪𝑆⊆Var D(Mem[𝑆]). We check that the structure XD is a BI frame. Barthe et al. [2019] check that a very similar structure is a partial commutative monoid. That structure’s carrier set consists of pairs of deterministic memories and randomized memories; our states can be viewed as a degenerate case of their states with trivial deterministic memories. Meanwhile, BI frames can be viewed as a generalization of partial commutative monoids. So it intuitively follows from their result that XD is a BI frame. Theorem 2.2.13. XD = (𝑋D, ⊑D, ⊗D, 𝐸D) is a BI frame. Proof. We show that it satisfies all the frame conditions. For instance, Down-Closed If 𝜇𝑧 ∈ 𝜇𝑥 ⊗D 𝜇𝑦, and 𝜇′𝑥 ⊑D 𝜇𝑥 , 𝜇′𝑦 ⊑D 𝜇𝑦, then define 𝑋 = dom(𝜇𝑥), 𝑌 = dom(𝜇𝑦), 𝑋′ = dom(𝜇′𝑥), 𝑌′ = dom(𝜇′𝑦), and define 𝜇 = 𝜋𝑋′∪𝑌′𝜇𝑧. The fact that 𝜇𝑧 ∈ 𝜇𝑥 ⊗D 𝜇𝑦 implies that for any 𝑚 ∈ Mem[𝑋 ∪ 𝑌 ], 𝜇𝑧 (𝑚) = 𝜇𝑥 (𝜋𝑋𝑚) · 𝜇𝑦 (𝜋𝑌𝑚). Thus, for any 𝑚 ∈ Mem[𝑋′ ∪ 𝑌′], 𝜇(𝑚) = (𝜋𝑋′∪𝑌′𝜇𝑧) (𝑚) = ∑ 𝑚′∈Mem[𝑋∪𝑌\(𝑋′∪𝑌′)] 𝜇𝑧 (𝑚′ ⊲⊳ 𝑚) = ∑ 𝑚′∈Mem[𝑋∪𝑌\(𝑋′∪𝑌′)] 𝜇𝑥 (𝜋𝑋 (𝑚′ ⊲⊳ 𝑚)) · 𝜇𝑦 (𝜋𝑌 (𝑚′ ⊲⊳ 𝑚)) = ∑ 𝑚1∈Mem[𝑋\𝑋′] ∑ 𝑚2∈Mem[𝑌\𝑌′] 𝜇𝑥 (𝜋𝑋 (𝑚1 ⊲⊳ 𝑚2 ⊲⊳ 𝑚)) · 𝜇𝑦 (𝜋𝑌 (𝑚1 ⊲⊳ 𝑚2 ⊲⊳ 𝑚)) = ( ∑ 𝑚1∈Mem[𝑋\𝑋′] 𝜇𝑥 (𝜋𝑋 (𝑚1 ⊲⊳ 𝑚)) ) · ( ∑ 𝑚2∈Mem[𝑌\𝑌′] 𝜇𝑦 (𝜋𝑌 (𝑚2 ⊲⊳ 𝑚)) ) = 𝜋𝑋′𝜇𝑥 (𝜋𝑋′𝑚) · 𝜋𝑌′𝜇𝑦 (𝜋𝑌′𝑚) = 𝜇′𝑥 (𝜋𝑋′𝑚) · 𝜇′𝑦 (𝜋𝑌′𝑚) Hence, 𝜇 ∈ 𝜇′𝑥 ⊗D 𝜇′𝑦, and by definition, 𝜇 ⊑D 𝜇𝑧. We delay the full proof to appendix A. □ This theorem indicates that, when given a set of atomic propositions and a persistent valuation, we can interpret BI formulas on distributions over memories.
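To make these operations concrete, here is a small Python sketch (our own encoding, not part of the formal development): a memory is a frozenset of variable–value pairs, a distribution is a dictionary from memories to probabilities, and marginalizing an independent product recovers each factor, as the Down-Closed condition requires.

```python
# Our own encoding (illustrative): memory = frozenset of (variable, value) pairs,
# distribution = dict {memory: probability}.

def mem(**kv):
    return frozenset(kv.items())

def project(m, T):
    """pi_T m: restrict a memory to the variables in T."""
    return frozenset((x, v) for x, v in m if x in T)

def marginal(mu, T):
    """pi_T mu: marginalize a distribution over memories onto T."""
    out = {}
    for m, p in mu.items():
        key = project(m, T)
        out[key] = out.get(key, 0.0) + p
    return out

def indep_product(mu1, mu2):
    """mu1 (x) mu2 for disjoint domains: the unique product distribution."""
    return {m1 | m2: p1 * p2 for m1, p1 in mu1.items() for m2, p2 in mu2.items()}

mu_x = {mem(x=0): 0.5, mem(x=1): 0.5}
mu_y = {mem(y=0): 0.25, mem(y=1): 0.75}
mu = indep_product(mu_x, mu_y)
assert abs(mu[mem(x=1, y=1)] - 0.375) < 1e-12
# Marginalizing the product recovers each factor (the Down-Closed condition).
assert marginal(mu, {"x"}) == mu_x
assert marginal(mu, {"y"}) == mu_y
```

In this encoding, `marginal` plays the role of 𝜋𝑆 on distributions and `indep_product` the role of ⊗D restricted to disjoint domains.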
In the next section, we will use BI formulas to specify probabilistic programs and also give proof rules for reasoning about probabilistic programs. 2.3 Probabilistic Separation Logic In this section, we will overview probabilistic separation logic, which consists of a set of rules for analyzing probabilistic programs. Each rule describes how a probabilistic program transforms its input distribution into its output distribution; the input and output distributions are specified using BI formulas interpreted on the concrete probabilistic BI frame XD — we will introduce a set of atomic propositions and a valuation so that the BI formulas can effectively specify probabilistic programs. At a high level, probabilistic separation logic will utilize a useful and common property of probability distributions — independence — to reason about irrelevant parts of a probabilistic program modularly. Independence is often defined for events. For a discrete distribution 𝜇 : 𝑋 → [0, 1], an event EV is a map 𝑋 → {0, 1}, and the probability of event EV is ∑ 𝜔∈𝑋 𝜇(𝜔) · EV(𝜔), for which we overload notation and write 𝜇(EV). The independence of two events says that the occurrence of one event does not tell anything about the occurrence of the other event. Formally, Definition 2.3.1 (Probabilistic Independence of Events). Given any distribution 𝜇 : 𝑋 → [0, 1], two events EV1, EV2 are independent if and only if 𝜇(EV1 ∩ EV2) = 𝜇(EV1) · 𝜇(EV2). A set of events EV1, . . . , EV𝑛 are mutually independent if for any subset 𝑆 ⊆ [𝑛], 𝜇( ⋂ 𝑗∈𝑆 EV 𝑗 ) = ∏ 𝑗∈𝑆 𝜇(EV 𝑗 ). For program analysis, another useful notion is the probabilistic independence between two program variables, which says that knowing the value of one program variable does not tell anything about the value of the other. To formally define it, we need “a variable 𝑥 takes a value 𝑣” to be an event, which is the case for distributions over program memories.
Given a distribution 𝜇 ∈ D(Mem[𝑆]), for any 𝑥 ∈ 𝑆, we write the event {𝜔 ∈ Mem[𝑆] | 𝜔(𝑥) = 𝑣} as 𝑥 = 𝑣. We then define the probabilistic independence of variables as follows. Definition 2.3.2 (Probabilistic Independence of Program Variables). Given any distribution 𝜇 : 𝑋 → [0, 1] and two variables 𝑥, 𝑦 ∈ Var, if 𝑥 = 𝑣1, 𝑦 = 𝑣2 are events for any two values 𝑣1, 𝑣2 ∈ Val, then we define the variables 𝑥 and 𝑦 to be independent if and only if: for any 𝑣1, 𝑣2 ∈ Val, 𝜇(𝑥 = 𝑣1 ∧ 𝑦 = 𝑣2) = 𝜇(𝑥 = 𝑣1) · 𝜇(𝑦 = 𝑣2), i.e., the events 𝑥 = 𝑣1 and 𝑦 = 𝑣2 are independent. Similarly, a set of program variables 𝑥1, . . . , 𝑥𝑛 are mutually independent iff for any subset 𝑆 ⊆ [𝑛] and any set of values {𝑣𝑖 ∈ Val | 𝑖 ∈ 𝑆}, 𝜇( ⋂ 𝑖∈𝑆 𝑥𝑖 = 𝑣𝑖) = ∏ 𝑖∈𝑆 𝜇(𝑥𝑖 = 𝑣𝑖). In the following, when talking about a set of variables, we will abbreviate mutual independence as independence. Another commonly used notion is the independence between two sets of program variables. We can talk about value assignments on a set of variables in a similar way: given a distribution 𝜇 over Mem[𝑆], for any 𝑋 ⊆ 𝑆 and 𝑚 ∈ Mem[𝑋], we write 𝑋 = 𝑚 for {𝜔 ∈ Mem[𝑆] | ∀𝑥 ∈ 𝑋.𝜔(𝑥) = 𝑚(𝑥)}. Definition 2.3.3 (Probabilistic Independence of Two Sets of Program Variables). Given two sets of variables 𝑋,𝑌 ⊆ Var, we say 𝑋 and 𝑌 are independent in 𝜇 if for any 𝑚𝑋 ∈ Mem[𝑋], 𝑚𝑌 ∈ Mem[𝑌 ], 𝜇(𝑋 = 𝑚𝑋 ∩ 𝑌 = 𝑚𝑌 ) = 𝜇(𝑋 = 𝑚𝑋) · 𝜇(𝑌 = 𝑚𝑌 ). An equivalent condition is as follows: for any 𝑚𝑋 ∈ Mem[𝑋], 𝑚𝑌 ∈ Mem[𝑌 ] such that 𝑚𝑋 ⊲⊳ 𝑚𝑌 is defined, 𝜋𝑋∪𝑌 𝜇(𝑚𝑋 ⊲⊳ 𝑚𝑌 ) = 𝜋𝑋𝜇(𝑚𝑋) · 𝜋𝑌 𝜇(𝑚𝑌 ). The probabilistic separation logic presented in this section will help its users prove, track, and utilize the independence of (sets of) program variables. 2.3.1 A Simple Probabilistic Programming Language We work with an imperative language pWhile that allows sampling from a set of built-in primitive distributions.
We first define the set of valid expressions E and the set of allowed distributions D, and then define the formal grammar of commands C. We assume a fixed set of typed program variables; 𝑥 stands for a numeric variable, while 𝑏 stands for a boolean variable. The expression language is standard. Distribution terms 𝑑 ∈ D can be Bern𝑣 for a Bernoulli (coin-flip) distribution with bias 𝑣, Unif𝑆 for a uniform distribution over elements in 𝑆, or some other symbols interpreted into distributions — we will introduce them later when needed. E ∋ 𝑒 ::= 𝑣 ∈ Val | 𝑥, 𝑏 ∈ Var | 𝑒1 = 𝑒2 | 𝑒1 + 𝑒2 | 𝑒1 × 𝑒2 | . . . D ∋ 𝑑 ::= Bern𝑣 | Unif𝑆 | . . . C ∋ 𝑐 ::= skip | 𝑥 ← 𝑒 | 𝑥 $← 𝑑 | if 𝑏 then 𝑐 else 𝑐′ | 𝑐 ; 𝑐′ | while 𝑏 do 𝑐 Figure 2.5: pWhile command syntax We assume throughout that all expressions and distribution terms are well-typed; in particular, the value 𝑣 in Bern𝑣 is a number in the interval [0, 1], and 𝑆 in Unif𝑆 is a finite, nonempty set, whose size we write as |𝑆 |. For commands, pWhile has six kinds of commands: the no-op skip; assignments 𝑥 ← 𝑒, which assign the evaluated value of the expression 𝑒 to the program variable 𝑥; sampling 𝑥 $← 𝑑 for drawing a value from a distribution 𝑑 and assigning it to 𝑥; conditionals if 𝑏 then 𝑐 else 𝑐′ for branching on a (possibly randomized) condition 𝑏; sequencing 𝑐 ; 𝑐′; and loops while 𝑏 do 𝑐 for iterating a command 𝑐 until the condition 𝑏 becomes false. We also write if 𝑏 then 𝑐 as abbreviation of if 𝑏 then 𝑐 else skip. Probabilistic Monad To concisely describe the denotational semantics of these commands, we introduce operations on distributions and the distribution monad. Since D(𝑋) is the set of distributions over 𝑋 , we can view D as an operation that maps a set into distributions over that set. This operation on sets can be lifted to functions 𝑓 : 𝑋 → 𝑌 , resulting in a map of distributions D( 𝑓 ) : D(𝑋) → D(𝑌 ) given by D( 𝑓 ) (𝜇) (𝑦) := ∑ 𝑓 (𝑥)=𝑦 𝜇(𝑥).
Intuitively, D( 𝑓 ) (𝜇) (𝑦) takes the sum of the probabilities of all elements in the pre-image of 𝑦. These operations turn D into a functor on sets and, further, D is also a monad [Giry, 1982, Moggi, 1991]. Definition 2.3.4 (Distribution Monad). Define unit : 𝑋 → D(𝑋) as unit(𝑥) := 𝛿𝑥 where 𝛿𝑥 denotes the Dirac distribution on 𝑥: for any 𝑦 ∈ 𝑋 , we have 𝛿𝑥 (𝑦) = 1 if 𝑦 = 𝑥, otherwise 𝛿𝑥 (𝑦) = 0. Further, define bind : D(𝑋) × (𝑋 → D(𝑌 )) → D(𝑌 ) by bind(𝜇, 𝑓 ) (𝑦) := ∑ 𝑝∈D(𝑌 ) D( 𝑓 ) (𝜇) (𝑝) · 𝑝(𝑦). Intuitively, unit embeds a set into distributions over the set, and bind enables the sequential combination of probabilistic computations. These maps satisfy the following interaction laws, establishing that (D, unit, bind) is a monad: bind(unit(𝑥), 𝑓 ) = 𝑓 (𝑥), bind(𝜇, 𝑥 ↦→ unit(𝑥)) = 𝜇, bind(bind(𝜇, 𝑓 ), 𝑔) = bind(𝜇, 𝜆𝑥.bind( 𝑓 (𝑥), 𝑔)). The distribution monad has an equivalent presentation in which bind is replaced with a multiplication operation join : D(D(𝑋)) → D(𝑋), which flattens distributions by averaging: join(𝜇) (𝑥) := ∑ 𝜌∈D(𝑋) 𝜇(𝜌) · 𝜌(𝑥). Program Semantics Given a program memory containing all variables appearing in an expression, we interpret E terms as values in Val and interpret D terms as distributions in D(Val) as in fig. 2.6. We overload the notation and write ⟦𝑒⟧ for interpretation of expression 𝑒 and ⟦𝑑⟧ for interpretation of distribution 𝑑. We can also interpret expressions on probabilistic memories through a lifting. For any 𝜇 ∈ D(Mem[𝑆]), ⟦𝑒⟧(𝜇) := bind(𝜇, 𝑚 ↦→ unit(⟦𝑒⟧(𝑚))) ⟦𝑣⟧(𝑚) := 𝑣 ⟦𝑥⟧(𝑚) := 𝑚(𝑥) ⟦𝑥 = 𝑦⟧(𝑚) := 1 if 𝑚(𝑥) = 𝑚(𝑦) else 0 ⟦𝑥 + 𝑦⟧(𝑚) := 𝑚(𝑥) + 𝑚(𝑦) ⟦𝑥 × 𝑦⟧(𝑚) := 𝑚(𝑥) × 𝑚(𝑦) ⟦Bern𝑣⟧ := the distribution mapping 1 ↦→ 𝑣, 0 ↦→ 1 − 𝑣, and 𝜔 ↦→ 0 if 𝜔 ≠ 0 and 𝜔 ≠ 1 ⟦Unif𝑆⟧ := the distribution mapping 𝜔 ↦→ 1/|𝑆 | if 𝜔 ∈ 𝑆 and 𝜔 ↦→ 0 otherwise Figure 2.6: Semantics of Expressions and Distributions Then we can interpret programs in pWhile as distribution transformers D(Mem[Var]) → D(Mem[Var]), as in fig. 2.7. The interpretation is standard.
The command skip simply outputs the input distribution; 𝑥 ← 𝑒 and 𝑥 $← 𝑑 use the monadic operation bind to compose the input distribution 𝜇 with the updating map describing the output distribution corresponding to each deterministic input memory 𝑚; last, 𝑐 ; 𝑐′ composes the interpretation of 𝑐 and 𝑐′ using usual function composition. Because the conditional if 𝑏 then 𝑐 else 𝑐′ allows a randomized guard 𝑏, interpreting it requires two more operations on distributions: a conditioning operation 𝜇 | 𝑆 to split control flow, and convex combination ⊕𝑝 to merge control flow. Given any distribution 𝜇 ∈ D(𝐴) and event 𝑆 ⊆ 𝐴, if 𝜇(𝑆) > 0, the conditional distribution of 𝜇 given 𝑆 is: (𝜇 | 𝑆) (𝑎) := 𝜇(𝑎)/𝜇(𝑆) if 𝑎 ∈ 𝑆, and 0 if 𝑎 ∉ 𝑆. (2.2) ⟦skip⟧(𝜇) := 𝜇 ⟦𝑥 ← 𝑒⟧(𝜇) := bind(𝜇, 𝑚 ↦→ unit(𝑚 [𝑥 ↦→ ⟦𝑒⟧(𝑚)])) ⟦𝑥 $← 𝑑⟧(𝜇) := bind(𝜇, 𝑚 ↦→ bind(⟦𝑑⟧, 𝑣 ↦→ unit(𝑚 [𝑥 ↦→ 𝑣]))) ⟦𝑐 ; 𝑐′⟧(𝜇) := ⟦𝑐′⟧(⟦𝑐⟧(𝜇)) ⟦if 𝑏 then 𝑐 else 𝑐′⟧(𝜇) := ⟦𝑐⟧(𝜇 | 𝑏 = tt) ⊕𝑝 ⟦𝑐′⟧(𝜇 | 𝑏 = ff ) where 𝑝 := 𝜇(𝑏 = tt) ⟦abort⟧(𝜇) := 𝜆𝜔.0 ⟦while 𝑏 do 𝑐⟧(𝜇) := lim 𝑛→∞ ⟦(if 𝑏 then 𝑐)𝑛; if 𝑏 then abort⟧(𝜇) Figure 2.7: Program semantics When 𝜇(𝑆) = 0, we leave 𝜇 | 𝑆 undefined. For convex combination, for any 𝜇1, 𝜇2 ∈ D(𝐴), we define 𝜇1 ⊕0 𝜇2 := 𝜇2 and 𝜇1 ⊕1 𝜇2 := 𝜇1. When 𝑝 ∈ (0, 1), (𝜇1 ⊕𝑝 𝜇2) (𝑎) := 𝑝 · 𝜇1(𝑎) + (1 − 𝑝) · 𝜇2(𝑎). Conditioning and convex combination are inverses in the sense that 𝜇 = (𝜇 | 𝑆) ⊕𝜇(𝑆) (𝜇 | (𝐴 \ 𝑆)). The command abort is only used in the definition of while loops and is not accessible to users. It disregards the input distribution 𝜇 and returns the subdistribution that assigns 0 to all possible outcomes 𝜔. The semantics of the while loop while 𝑏 do 𝑐 is the limit of ⟦(if 𝑏 then 𝑐)𝑛; if 𝑏 then abort⟧ as 𝑛 approaches infinity. The limit, taken with the point-wise order, exists according to the monotone convergence theorem [Abbott, 2015, Strichartz, 2000] because the subdistribution’s mass is non-decreasing as 𝑛 increases and is upper bounded by 1.
Throughout, we assume that all loops terminate in finitely many steps; then the limit is always a full distribution, so all commands in pWhile can be interpreted as distribution transformers. 2.3.2 A Concrete BI Model for Asserting Independence We define some atomic propositions to describe distributions over program memories ∪𝑆⊆VarD(Mem[𝑆]). The BI frame XD defined in section 2.2.4 together with the valuation V∗ for atomic propositions defined below provide a BI model, on which we can assert properties such as distributions of variables, and independence between variables. Let atomic propositions APD ∋ 𝑝 ::= Own(E) | E $∼ 𝜇 | Detm⟨E⟩ | [E = E] | E[E] ⊲⊳ 𝑐 (2.3) where ⊲⊳ ∈ {=, ≤, ≥} and 𝑐 ∈ R is a constant. Roughly, Own(E) asserts that the distribution of the expression E is fully determined; E $∼ 𝜇 asserts that the expression E has distribution 𝜇; Detm⟨E⟩ asserts that the expression E is deterministic; [E1 = E2] asserts that the expressions E1 and E2 are always equal; last, E[𝑒] ⊲⊳ 𝑐 bounds the expected value of an expression 𝑒 with respect to a constant 𝑐. In particular, since events are maps from memories to {0, 1}, which is the same type as the interpretation of boolean expressions in the language, we assume the set of expressions contains events as well. We define the satisfaction of atomic propositions on program configurations as follows. Let FV(𝑒) be the set of free variables in expression 𝑒. Definition 2.3.5 (Valuation). For 𝜇 ∈ XD, define V∗ such that • 𝜇 ∈ V∗(Own(𝑒)) iff FV(𝑒) ⊆ dom(𝜇); • 𝜇 ∈ V∗(𝑒 $∼ 𝜇′) iff FV(𝑒) ⊆ dom(𝜇) and ⟦𝑒⟧(𝜇) = 𝜇′; • 𝜇 ∈ V∗(Detm⟨𝑒⟩) iff FV(𝑒) ⊆ dom(𝜇) and ⟦𝑒⟧(𝜇) is a Dirac distribution; • 𝜇 ∈ V∗( [𝑒 ⊲⊳ 𝑒′]) iff FV(𝑒) ∪ FV(𝑒′) ⊆ dom(𝜇) and ⟦𝑒⟧(𝑚) ⊲⊳ ⟦𝑒′⟧(𝑚) for any 𝑚 in the support of 𝜇; • 𝜇 ∈ V∗(E[𝑒] ⊲⊳ 𝑐) iff the expected value of expression 𝑒 in 𝜇, i.e., E[𝑒] := ∑ 𝑚∈Mem[dom(𝜇)] 𝜇(𝑚) · ⟦𝑒⟧(𝑚), satisfies E[𝑒] ⊲⊳ 𝑐. For an event 𝑒𝑣, we also write Pr[𝑒𝑣] ⊲⊳ 𝑐 for E[𝑒𝑣] ⊲⊳ 𝑐.
It is straightforward to show that V∗ defined for these atomic propositions is a persistent valuation. Proposition 2.3.1. (XD,V∗) forms a BI model. With these atomic propositions, we can now use bunched logic formulas to assert interesting probabilistic properties. For instance, we can assert the independence between two variables using the following assertion. Lemma 2.3.2. For any distribution 𝜇 ∈ XD, for a set of variables {𝑋𝑖}𝑖∈𝑆, 𝜇 |= ∗𝑖∈𝑆 Own(𝑋𝑖) iff the variables {𝑋𝑖}𝑖∈𝑆 are distinct and mutually independent. We present the proof in appendix A. The assertion logic has all the axioms for atomic formulas stated in Barthe et al. [2019, Lemma 3, 4]. While this set of axioms is not complete, it is useful for reasoning about a rich family of probabilistic properties. Lemma 2.3.3. The following axiom schemas are valid: |= [𝑒1 = 𝑒2] → [𝑒2 = 𝑒1] (Eq-Sym) |= [𝑒1 = 𝑒2] ∧ [𝑒2 = 𝑒3] → [𝑒1 = 𝑒3] (Eq-Tran) |= Own(𝑒1) → Own(𝑒2) whenever 𝐹𝑉 (𝑒2) ⊆ 𝐹𝑉 (𝑒1) (Own-Incl) Note that |= [𝑒1 = 𝑒1] is not an axiom — it is not sound, since it may not hold in a distribution over randomized memories with empty domain, i.e., in D(Mem[∅]). We also have axioms for uniformity propositions. Lemma 2.3.4. The following axiom schemas are valid: |= [𝑒1 = 𝑒2] ∧ Unif𝑆⟨𝑒1⟩ → Unif𝑆⟨𝑒2⟩ (Unif-Tran) |= Unif𝑆⟨𝑒1⟩ → [𝑒1 = 𝑒1] (Unif-Weak) |= Unif𝑆⟨𝑒1⟩ → Unif𝑆⟨ 𝑓 (𝑒1)⟩ for any bijection ⟦ 𝑓 ⟧ : 𝑆 → 𝑆 and FV( 𝑓 ) ⊆ FV(𝑒1) (Unif-Bij) 2.3.3 A Program Logic for Reasoning about Independence We now introduce the program logic layer of probabilistic separation logic. In the spirit of other separation logics (e.g., [Reynolds, 2002, Brookes, 2007a, Jung et al., 2018]), the logic is designed to prove separations and harness separations to prove other properties more easily; in this case, the separation is probabilistic independence. Similar to standard Hoare logic, it has judgments of the form {𝑃} prog {𝑄}, where prog is a probabilistic program command in C, and 𝑃, 𝑄 are BI formulas with atomic propositions in APD. Definition 2.3.6 (Validity).
A probabilistic separation logic judgment is valid, written |= {𝑃} prog {𝑄}, if for all 𝜇 ∈ D(Mem[Var]) such that 𝜇 |= 𝑃, we have ⟦prog⟧(𝜇) |= 𝑄. Next, we proceed to the proof system, which consists of program rules for each command and structural rules that match any program command. As before, we use FV(𝑒) to denote the set of free variables in an expression 𝑒. Program Rules The program rules are presented in fig. 2.8a. The rules RASSN and SAMP are for randomized assignment and random sampling. Both rules are presented with the trivial pre-condition ⊤; in practice, one would want to reason about assignments and sampling starting from general pre-conditions, and we will derive variants of these rules with other pre-conditions using the structural rules. There are two rules governing conditionals. In COND, the precondition implies that the randomized guard 𝑏 behaves deterministically, and thus either the guard 𝑏 is true and 𝑐 executes or the guard 𝑏 is false and 𝑐′ executes. If both branches guarantee 𝜓 as the post-condition, then 𝜓 is also the post-condition for the conditional. The rule RCOND, on the other hand, applies when the randomized guard 𝑏 is separate from the rest of the pre-condition — that is, it must be probabilistically independent of the portion of the randomized memory captured by 𝜑. This independence is crucial for ensuring 𝜑 remains valid as the pre-condition of both branches: each branch’s input distribution is obtained by conditioning on the guard’s value in the original distribution; notably, that conditioning operation can invalidate 𝜑 if the guard 𝑏 and variables in 𝜑 are correlated, even if they share no variables. To illustrate this, recall [Barthe et al., 2019, Example 1], Example 2.3.1. Suppose that 𝑥, 𝑦, 𝑧 are boolean program variables, and let 𝜇 be the output of: 𝑥 $← UnifB; 𝑦 $← UnifB; 𝑧 ← 𝑥 ∨ 𝑦 In other words, 𝑥 and 𝑦 store the results of two fair coin flips, and 𝑧 stores the value of 𝑥 ∨ 𝑦.
Then 𝑥 and 𝑦 are independent in 𝜇, i.e., Own(𝑥) ∗ Own(𝑦) holds in 𝜇. However, if 𝑀 ⊆ Mem[Var] is the set of all randomized memories where 𝑧 = tt, representing the event that 𝑧 is true, then Own(𝑥) ∗ Own(𝑦) does not hold in 𝜇 | 𝑀 . Intuitively, if we know 𝑧 = tt, then 𝑥 and 𝑦 are correlated: if one is false, then the other must be true. We also need to be more careful when formulating the post-condition for conditionals. Even when both the true branch and the false branch guarantee 𝜓 as the post-condition, it is in general unsound to conclude the post-condition 𝜓 for if 𝑏 then 𝑐 else 𝑐′. We also illustrate this through an example. Example 2.3.2. Suppose that 𝑥, 𝑦, 𝑧 are boolean program variables, and let 𝜇 be the output of: 𝑧 $← Bern1/2; if 𝑧 then 𝑥 $← Bern0.9; 𝑦 $← Bern0.9 else 𝑥 $← Bern0.1; 𝑦 $← Bern0.1 In both the true branch’s output and the false branch’s output, 𝑥 and 𝑦 are probabilistically independent, validating Own(𝑥) ∗ Own(𝑦) as a post-condition. However, in 𝜇, 𝑥 and 𝑦 are not independent: when 𝑥 is true, it is more likely that 𝑧 is true and the true branch was executed, and thus 𝑦 is also more likely to be true; the case is similar when 𝑥 is false. To make sure that the post-conditions from the branches can be combined into the post-condition of the conditional, the side condition of RCOND checks that the post-condition 𝜓 determines a unique portion of the distribution over randomized memories. Formally, we adapt the following class of assertions from separation logic [Reynolds, 2002]. Definition 2.3.7. A formula 𝜑 is supported (SP) if there exists a randomized memory 𝜇 such that if 𝜇′ |= 𝜑, then 𝜇 ⊑ 𝜇′. We can prove by induction that the following syntactic conditions ensure SP. Lemma 2.3.5. The following assertions are SP: 𝜂 ::= 𝑝𝑑 | [𝑥 = 𝑣] | 𝑥 $∼ 𝜇 | 𝜂 ∗ 𝜂 Last, the loop rule LOOP is in the same style as COND, which also requires the guard to be deterministic as a consequence of the precondition 𝜑.
This side condition essentially restricts the loop to run a deterministic number of iterations. In that case, if we have precondition 𝜑 and the program 𝑐 preserves 𝜑 as an invariant, then when the loop while 𝑏 do 𝑐 terminates, we have 𝜑 ∧ [𝑏 = ff ] as the post-condition. Structural Rules The structural rules are in fig. 2.8b and they apply to Hoare triples with any command 𝑐 as long as the pre- and post-conditions match. The rules WEAK, TRUE, CONJ, and CASE are standard. CONST is the rule of constancy from Hoare logic, which states that, if a formula 𝜂 does not mention any of 𝑐’s modified variables MV(𝑐), then it can be conjoined to the pre- and post-condition. This rule is not sound in standard separation logic — motivating the separating conjunction and the frame rule — but it is sound in Probabilistic Separation Logic because writes in pWhile cannot invalidate assertions about other variables. However, the post-condition in CONST does not ensure that 𝜓 and 𝜂 use probabilistically independent variables. For this stronger guarantee, we need FRAME, whose side conditions mention several classes of variables. Roughly speaking, RV(𝑐) is the set of variables that 𝑐 may read from, while WV(𝑐) is the set of variables that 𝑐 must write to (before possibly reading from). MV(𝑐) is the set of variables that 𝑐 may write to, so WV(𝑐) is a subset of MV(𝑐). Formally, Definition 2.3.8. RV,WV,MV are defined as follows: RV(𝑥𝑟 ← 𝑒𝑟) ≜ FV(𝑒𝑟) RV(𝑥𝑟 $← 𝜇) ≜ ∅ RV(𝑐 ; 𝑐′) ≜ RV(𝑐) ∪ (RV(𝑐′) \ WV(𝑐)) RV(if 𝑏 then 𝑐 else 𝑐′) ≜ FV(𝑏) ∪ RV(𝑐) ∪ RV(𝑐′) RV(while 𝑏 do 𝑐) ≜ FV(𝑏) ∪ RV(𝑐) WV(𝑥𝑟 ← 𝑒𝑟) ≜ {𝑥𝑟} \ FV(𝑒𝑟) WV(𝑥𝑟 $← 𝜇) ≜ {𝑥𝑟} WV(𝑐 ; 𝑐′) ≜ WV(𝑐) ∪ (WV(𝑐′) \ RV(𝑐)) WV(if 𝑏 then 𝑐 else 𝑐′) ≜ (WV(𝑐) ∩ WV(𝑐′)) \ FV(𝑏) WV(while 𝑏 do 𝑐) ≜ WV(𝑐) MV(𝑥𝑟 ← 𝑒) ≜ {𝑥𝑟} MV(𝑥𝑟 $← 𝜇) ≜ {𝑥𝑟} MV(𝑐 ; 𝑐′) ≜ MV(𝑐) ∪ MV(𝑐′) MV(if 𝑏 then 𝑐 else 𝑐′) ≜ MV(𝑐) ∪ MV(𝑐′) MV(while 𝑏 do 𝑐) ≜ MV(𝑐) Last, FRAME says that we can conjoin a formula 𝜂 to both the pre- and post-conditions if 1.
𝜂 does not use any variables modified by the program 𝑐; 2. the program 𝑐 only reads from the part of memories that the precondition 𝜑 describes; 3. the post-condition 𝜓 only talks about variables that the precondition 𝜑 already describes or variables the program 𝑐 writes to. The first condition is standard in separation logic — separation logics for reasoning about heaps or concurrency also need an analogous condition. The second and the third conditions are needed because our star ∗ asserts probabilistic independence: if 𝑐 reads from variables in 𝜂, then the post-condition 𝜓 may not be independent from 𝜂; if 𝜓 talks about variables that are neither written by the command 𝑐 nor described by 𝜑, those variables may be already correlated with variables in 𝜂. Together, this set of conditions guarantees that 𝜂 refers to variables that are probabilistically independent of 𝜓, thus validating 𝜓 ∗ 𝜂 as the post-condition.2 All these proof rules are sound. Theorem 2.3.6 (Soundness). If ⊢ {𝜑} 𝑐 {𝜓} is derivable, then |= {𝜑} 𝑐 {𝜓}. When proving the soundness of the program rules, we sometimes want to focus on a smaller distribution 𝜇′ inside a given distribution 𝜇 |= 𝜑, such that 𝜇′ satisfies some sub-formula of 𝜑. Specifically, such reasoning is used in the proofs for CASE, CONST, and FRAME. To ensure there exists such a smaller distribution, we require the assertion logic to satisfy a key condition called restriction, which says that to check whether a distribution satisfies 𝜑, it suffices to check whether its marginalization on FV(𝜑) satisfies 𝜑. BI formulas in (XD,V∗) satisfy restriction: Lemma 2.3.7 (Restriction). Let 𝜇 ∈ D(Mem[𝑆]) and let 𝜑 be a BI formula. Then: 𝜇 |= 𝜑 ⇔ 𝜋FV(𝜑) (𝜇) |= 𝜑. We leave the proofs of theorem 2.3.6 and lemma 2.3.7 to appendix A. Barthe et al.
[2019] demonstrates that probabilistic separation logic can be used to prove the correctness of various cryptographic schemes, where security relies on the independence of secrets and public information. 2There also exist other choices for the side conditions of FRAME — we stick with the choice by Barthe et al. [2019]. 50 SKIP ⊢ {𝜑} skip {𝜑} SEQN ⊢ {𝜑} 𝑐 {𝜓} ⊢ {𝜓} 𝑐′ {𝜂} ⊢ {𝜑} 𝑐 ; 𝑐′ {𝜂} DASSN ⊢ {Detm⟨𝑒⟩ ∧ 𝜑[𝑒/𝑥]} 𝑥 ← 𝑒 {Detm⟨𝑥⟩ ∧ 𝜑} RASSN 𝑥𝑟 ∉ FV(𝑒𝑟) ⊢ {⊤} 𝑥𝑟 ← 𝑒𝑟 {[𝑥𝑟 = 𝑒𝑟]} SAMP ⊢ {⊤} 𝑥𝑟 $← 𝜇 {𝑥𝑟 $∼ 𝜇} COND ⊢ {𝜑 ∧ [𝑏 = tt]} 𝑐 {𝜓} ⊢ {𝜑 ∧ [𝑏 = ff ]} 𝑐′ {𝜓} |= 𝜑→ Detm⟨𝑏⟩ ⊢ {𝜑} if 𝑏 then 𝑐 else 𝑐′ {𝜓} RCOND ⊢ {𝜑 ∗ [𝑏 = tt]} 𝑐 {𝜓 ∗ [𝑏 = tt]} ⊢ {𝜑 ∗ [𝑏 = ff ]} 𝑐′ {𝜓 ∗ [𝑏 = ff ]} 𝜓 ∈ SP ⊢ {𝜑 ∗ Own(𝑏)} if 𝑏 then 𝑐 else 𝑐′ {𝜓 ∗ Own(𝑏)} LOOP ⊢ {𝜑 ∧ [𝑏 = tt]} 𝑐 {𝜑} |= 𝜑→ Detm⟨𝑏⟩ ⊢ {𝜑} while 𝑏 do 𝑐 {𝜑 ∧ [𝑏 = ff ]} (a) Program Rules of Probabilistic Separation Logic WEAK ⊢ {𝜑} 𝑐 {𝜓} |= 𝜑′→ 𝜑 ∧ 𝜓 → 𝜓′ ⊢ {𝜑′} 𝑐 {𝜓′} TRUE ⊢ {⊤} 𝑐 {⊤} CONJ ⊢ {𝜑1} 𝑐 {𝜓1} ⊢ {𝜑2} 𝑐 {𝜓2} ⊢ {𝜑1 ∧ 𝜑2} 𝑐 {𝜓1 ∧ 𝜓2} CASE ⊢ {𝜑1} 𝑐 {𝜓1} ⊢ {𝜑2} 𝑐 {𝜓2} ⊢ {𝜑1 ∨ 𝜑2} 𝑐 {𝜓1 ∨ 𝜓2} CONST ⊢ {𝜑} 𝑐 {𝜓} FV(𝜂) ∩MV(𝑐) = ∅ ⊢ {𝜑 ∧ 𝜂} 𝑐 {𝜓 ∧ 𝜂} FRAME ⊢ {𝜑} 𝑐 {𝜓} FV(𝜂) ∩MV(𝑐) = ∅ |= 𝜑→ Own(𝑇 ∪ RV(𝑐)) FV(𝜓) ⊆ 𝑇 ∪ RV(𝑐) ∪WV(𝑐) ⊢ {𝜑 ∗ 𝜂} 𝑐 {𝜓 ∗ 𝜂} (b) Structural Rules of Probabilistic Separation Logic Figure 2.8: Rules of Probabilistic Separation Logic 51 CHAPTER 3 A PROGRAM LOGIC FOR NEGATIVE DEPENDENCE 3.1 Overview In the last chapter, we have seen a program logic for reasoning about proba- bilistic independence. While independence is useful for many applications, it is a strict requirement. A natural question is, what if we do not have perfect in- dependence? Can we use other kind of probabilistic dependencies in program analysis? Utilizing probabilistic dependencies is, for example, important when we an- alyze hashing-based probabilistic data structures such as hash tables and Bloom filters. 
In these applications, a hash function ℎ maps a universe of possible values, typically large, to a set of buckets, typically small, and items are looked up through their hashes. The performance of hash-based data structures is captured by a variety of probabilistic guarantees, e.g., the space usage, the amortized cost of insertion, the amortized cost of look-up, etc. One useful probabilistic guarantee is the false positive rate: the probability that a data structure mistakenly identifies an element as being stored in the data structure, when it was not inserted. We may also be interested in load measures, such as the probability that a bucket in the data structure overflows. A typical way to analyze these quantities is to treat random hash functions as balls-into-bins processes. For example, hashing unique elements into bins can be modeled as throwing balls into bins, where each bin is drawn uniformly at random. While this modeling is convenient, one complication is that the counts of the elements in the different buckets are not probabilistically independent: one bin containing many elements makes it more likely that other bins contain few elements. The lack of independence makes it difficult to reason about multiple bins, for instance, bounding the number of occupied bins. Moreover, many common tools for analyzing probabilistic processes, like concentration bounds, usually require independence. This subtlety has also been a source of problems in pen-and-paper analyses of probabilistic data structures (e.g., Mullin [1983], Blustein and El-Maazawi [2002]). After many attempts to correct the bounds for Bloom filter’s false positive rate [Bose et al., 2008, Christensen et al., 2010] using pen-and-paper proofs, recently, Gopinathan and Sergey [2020] certified its analysis using a complex proof in ROCQ.
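The negative correlation between bin loads can be seen in a small simulation (an illustrative sketch with our own helper names, using a fixed seed): averaged over many runs, the empirical covariance between the loads of two distinct bins is negative.

```python
# Illustrative balls-into-bins simulation (our own encoding, fixed seed).
import random

def throw_balls(n_balls, n_bins, rng):
    """Throw n_balls balls, each into a uniformly random bin; return bin loads."""
    counts = [0] * n_bins
    for _ in range(n_balls):
        counts[rng.randrange(n_bins)] += 1
    return counts

rng = random.Random(0)
counts = throw_balls(1000, 10, rng)
assert sum(counts) == 1000          # every ball lands in some bin

# Empirical covariance between the loads of bins 0 and 1 over many trials:
trials = [throw_balls(200, 10, rng) for _ in range(2000)]
xs = [t[0] for t in trials]
ys = [t[1] for t in trials]
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
assert cov < 0   # one bin holding many balls forces the others to hold fewer
```

For multinomial loads the true covariance between two bins is −𝑛𝑝𝑖𝑝𝑗 (here −2), so the empirical estimate is reliably negative.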
We aim to develop a simpler method to formally reason about hash-based data structures and balls-into-bins processes, drawing on a key concept in probability theory: negative dependence. While there are multiple incomparable definitions of negative dependence, Joag-Dev and Proschan [1983] proposed a notion called negative association (NA) that shares many of the good probabilistic properties of probabilistic independence. First, some standard theorems about sums of independent random variables apply more generally to sums of NA random variables. In particular, the widely-used Chernoff bound, which intuitively says that the sum of independent random variables is close to the expected value of the sum with high probability, holds also for NA variables. Intuitively, it is unlikely for all variables to attain high values compared to their expected value, and equally unlikely for all variables to attain low values; thus, their sum most likely stays close to the expected value of the sum. Second, negative association is preserved by some common operations on random variables. For instance, variables that use an
Inspired by this approach, we think of asserting negative association using another multiplica- tive conjunction, but that means we need to support multiple multiplicative conjunctions in the assertion logic. For that purpose, we propose 𝑀-BI, an ex- tension to bunched logic where each element in 𝑀 is associated with its own multiplicative conjunction and implication. In the following, we first present a BI model for asserting negative association, then combine it with the BI model for asserting probabilistic independence into an 𝑀-BI model, and last, we de- sign a program logic that incorporates compositional proof principles of NA. 3.2 Negative Association We now define negative association precisely and state its properties. Negative association is a property of a set of random variables, formalized as follows: Definition 3.2.1 (Negative Association (NA)). Let 𝑋1, . . . 𝑋𝑛 be random vari- ables. The set {𝑋𝑖}𝑖 is negatively associated (NA) if for every pair of subsets 54 𝐼, 𝐽 ⊆ {1, . . . , 𝑛} such that 𝐼 ∩ 𝐽 = ∅, and every pair of both monotone or both antitone functions1 𝑓 : R|𝐼 | → R and 𝑔 : R|𝐽 | → R, where 𝑓 , 𝑔 is either lower bounded or upper bounded,2 we have: E [ 𝑓 (𝑋𝑖, 𝑖 ∈ 𝐼) · 𝑔(𝑋 𝑗 , 𝑗 ∈ 𝐽) ] ≤ E[ 𝑓 (𝑋𝑖, 𝑖 ∈ 𝐼)] · E [ 𝑔(𝑋 𝑗 , 𝑗 ∈ 𝐽) ] . We can view NA as generalizing independence: a set of independent random variables is NA because equality holds. NA also strengthens negative covariance, a simpler notion of negative dependence that occurs frequently in statistics lit- erature. Negative correlation [Rice, 2007, Chapter 4.3] of 𝑋1, . . . , 𝑋𝑛 says that E  ∏ 𝑖∈[𝑛] 𝑋𝑖  ≤ ∏ 𝑖∈[𝑛] E[𝑋𝑖], which automatically holds if {𝑋1, . . . , 𝑋𝑛} are negatively associated. To see that, we show the following: Lemma 3.2.1. Let 𝑋1, . . . , 𝑋𝑛 be a sequence of NA random variables, then for any family of non-negative all monotone or all antitone functions 𝑓𝑖 : R→ R, E  ∏ 𝑖∈[𝑛] 𝑓𝑖 (𝑋𝑖)  ≤ ∏ 𝑖∈[𝑛] E[ 𝑓𝑖 (𝑋𝑖)] . Proof. We prove it by induction. 
The base case is when 𝑛 = 1, where trivially E [ ∏ 𝑖∈[1] 𝑓𝑖 (𝑋𝑖) ] = E[ 𝑓1(𝑋1)] = ∏ 𝑖∈[1] E[ 𝑓𝑖 (𝑋𝑖)] . 1In the following, we will consistently use monotone to mean monotonically non-decreasing and antitone to mean monotonically non-increasing. 2Technically, we slightly modify Dubhashi and Ranjan [1998]’s NA by in addition assuming that 𝑓 , 𝑔 are bounded from one side. We add the condition to have a cleaner version of theorem 3.3.1 and Theorem 3.3.5. All our other results and properties we state about NA in Section 3.2 hold with or without this condition. When 𝑛 > 1, note that the map (𝑋1, . . . , 𝑋𝑛−1) ↦→ ∏ 𝑖∈[𝑛−1] 𝑓𝑖 (𝑋𝑖) is also monotone if all 𝑓𝑖 are monotone, and antitone if all 𝑓𝑖 are antitone, so E [ ∏ 𝑖∈[𝑛] 𝑓𝑖 (𝑋𝑖) ] = E [ ( ∏ 𝑖∈[𝑛−1] 𝑓𝑖 (𝑋𝑖)) · 𝑓𝑛 (𝑋𝑛) ] ≤ E [ ∏ 𝑖∈[𝑛−1] 𝑓𝑖 (𝑋𝑖) ] · E[ 𝑓𝑛 (𝑋𝑛)] (because the variables are NA) ≤ ∏ 𝑖∈[𝑛] E[ 𝑓𝑖 (𝑋𝑖)] (by the inductive hypothesis) □ In particular, when we take all 𝑓𝑖 to be identity functions (and the 𝑋𝑖 are non-negative), we derive E [∏ 𝑖∈[𝑛] 𝑋𝑖 ] ≤ ∏ 𝑖∈[𝑛] E[𝑋𝑖] from the variables being NA. NA variables can arise from various mechanisms. Theorem 3.2.2 (See Dubhashi and Ranjan [1998]). We enumerate three scenarios: 1. The set of independent random variables {𝑋1, . . . , 𝑋𝑛} is negatively associated. 2. If {𝑋1, . . . , 𝑋𝑛} are Bernoulli random variables such that ∑ 𝑖∈[𝑛] 𝑋𝑖 = 1, then the set of variables is negatively associated. 3. Let 𝑋 be a uniformly random permutation of a finite, nonempty multi-set 𝐴, and for each 𝑖, let 𝑋𝑖 be the 𝑖-th entry in the vector 𝑋 . Then {𝑋1, . . . , 𝑋𝑛} is negatively associated. As an example of the third case, consider a deck of cards perfectly shuffled — so that the cards’ order is uniformly sampled from all possible permutations. If, for each 𝑖, 𝑋𝑖 gets the value on the 𝑖-th card, then the variables {𝑋𝑖}𝑖 are negatively associated. Also, the second case of this theorem implies that if we draw
Also, the third case of this theorem implies that if we draw a length-𝑛 one-hot vector, i.e., a vector that has one entry equal to one and all remaining entries equal to zero, uniformly at random, then the entries of the vector satisfy negative association.

The following theorem states three key closure properties of NA random variables.

Theorem 3.2.3 (See Dubhashi and Ranjan [1998]). We enumerate three scenarios:

1. For any negatively associated set of variables 𝑇, and for any non-empty subset 𝑆 ⊆ 𝑇, the set 𝑆 of random variables is negatively associated;
2. For any two sets of negatively associated random variables 𝑇, 𝑈 such that every 𝑋 ∈ 𝑇 and every 𝑌 ∈ 𝑈 are independent of each other, the union 𝑇 ∪ 𝑈 of random variables is negatively associated.
3. Let {𝑋1, . . . , 𝑋𝑛} be negatively associated, and let 𝐼1, . . . , 𝐼𝑚 be a partition of the set {1, . . . , 𝑛}. For each 1 ≤ 𝑗 ≤ 𝑚, let 𝑓𝑗 : R^|𝐼𝑗| → R be monotone. Let 𝑆 = { 𝑓1(𝑋𝑘 , 𝑘 ∈ 𝐼1), . . . , 𝑓𝑚 (𝑋𝑘 , 𝑘 ∈ 𝐼𝑚)}. Then 𝑆 is negatively associated.

The first case shows that NA is preserved if we discard random variables, while the second case allows us to join two independent sets of negatively associated random variables to form a larger negatively associated set. Finally, the third case guarantees that negative association is preserved under applying monotone maps to disjoint subsets of variables.

Chernoff's Bound and Negative Association

Another nice property of NA is that negatively associated random variables satisfy some frequently used tail bounds, including Chernoff's bound.

"Chernoff's bound is one of the most basic and versatile tools in the life of a theoretical computer scientist, with a seemingly endless amount of applications." — Mulzer [2019]

Qualitatively, Chernoff's bound says that the sum 𝑋1 + · · · + 𝑋𝑛 is usually close to its expected value, and it upper bounds the probability that the sum deviates from the mean by more than a tolerated amount.
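To make the bound concrete, here is a small Monte Carlo sketch (helper names are our own) comparing the empirical two-sided tail of a sum against the Hoeffding-style bound exp(−2𝛽²/𝑛) stated below. We use independent Bernoulli variables, which are a special case of NA by Theorem 3.2.2.

```python
import math
import random

def tail_vs_bound(n, p, beta, trials=20_000, seed=1):
    """Empirical Pr[|Y - E[Y]| >= beta] for Y a sum of n independent
    Bernoulli(p) variables (independent, hence NA), together with the
    Chernoff-Hoeffding bound exp(-2 * beta**2 / n)."""
    rng = random.Random(seed)
    mean = n * p
    hits = 0
    for _ in range(trials):
        y = sum(rng.random() < p for _ in range(n))
        hits += abs(y - mean) >= beta
    return hits / trials, math.exp(-2 * beta ** 2 / n)

emp, bound = tail_vs_bound(n=50, p=0.5, beta=10)
assert emp <= bound  # the analytic bound dominates the empirical tail
```

The point of the theorem below is that the same analytic bound remains valid when independence is relaxed to negative association.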
This kind of analysis is useful for establishing high-probability guarantees for randomized algorithms, e.g., showing that the error of a random estimate is at most 0.01 with probability at least 99%. There are various formulations of Chernoff's bound, with different assumptions on the random variables (e.g., {𝑋𝑖}𝑖 being independent Bernoulli random variables, or {𝑋𝑖}𝑖 simply being independent bounded random variables) and different ways to measure the error (e.g., the additive form uses the absolute difference between the realized value and the expected value, while the multiplicative form uses the error ratio). While the mainstream formulations of Chernoff's bound all require the variables {𝑋𝑖}𝑖 to be independent, Dubhashi and Ranjan [1998] observe that Chernoff's bound also holds for negatively associated random variables.

We state the result using a formulation in the additive form for [0, 1]-bounded random variables. This version is also known as Hoeffding's inequality.

Theorem 3.2.4 (Chernoff–Hoeffding Bound for NA variables [Dubhashi and Ranjan, 1998]). Let 𝑋1, . . . , 𝑋𝑛 be a sequence of NA random variables, each bounded in [0, 1], and let 𝑌 = ∑𝑖∈[𝑛] 𝑋𝑖. Then for any 𝛽 ∈ (0, 1], we have:

Pr[|𝑌 − E[𝑌 ]| ≥ 𝛽] ≤ 𝐹 (𝛽, 𝑛) where 𝐹 (𝛽, 𝑛) = e^{−2𝛽²/𝑛}.

Equivalently, for any failure probability 𝛽 ∈ (0, 1],

Pr[|𝑌 − E[𝑌 ]| ≥ 𝑇 (𝛽, 𝑛)] ≤ 𝛽 where 𝑇 (𝛽, 𝑛) = √((𝑛/2) ln(1/𝛽)).

Proof. The proof uses Hoeffding's lemma: for any real-valued random variable 𝑋 with 𝑋 ∈ [𝑎, 𝑏] almost surely, and for any 𝜆 ∈ R,

E[e^{𝜆(𝑋−E[𝑋])}] ≤ e^{𝜆²(𝑏−𝑎)²/8}.

(See, e.g., Romaní [2021] for a proof of Hoeffding's lemma.) For any 𝜆 > 0, the event 𝑌 − E[𝑌 ] ≥ 𝛽 is the same as the event e^{𝜆(𝑌−E[𝑌 ])} ≥ e^{𝜆𝛽}, so

Pr(𝑌 − E[𝑌 ] ≥ 𝛽) = Pr(e^{𝜆(𝑌−E[𝑌 ])} ≥ e^{𝜆𝛽}).

By Markov's inequality, for any non-negative random variable 𝑋 and any 𝑎 > 0, Pr(𝑋 ≥ 𝑎) ≤ E[𝑋]/𝑎.
By regarding e^{𝜆(𝑌−E[𝑌 ])} as the random variable, Markov's inequality gives us

Pr(e^{𝜆(𝑌−E[𝑌 ])} ≥ e^{𝜆𝛽}) ≤ E[e^{𝜆(𝑌−E[𝑌 ])}] / e^{𝜆𝛽}.

Now we analyze the numerator on the right:

E[e^{𝜆(𝑌−E[𝑌 ])}] = E[e^{𝜆(∑𝑖∈[𝑛] 𝑋𝑖 − E[∑𝑖∈[𝑛] 𝑋𝑖])}] (by definition of 𝑌)
= E[e^{𝜆 ∑𝑖∈[𝑛] (𝑋𝑖−E[𝑋𝑖])}] (by linearity of expectation)
= E[∏𝑖∈[𝑛] e^{𝜆(𝑋𝑖−E[𝑋𝑖])}]
≤ ∏𝑖∈[𝑛] E[e^{𝜆(𝑋𝑖−E[𝑋𝑖])}] (because {𝑋𝑖}𝑖 are NA, and by lemma 3.2.1)
≤ ∏𝑖∈[𝑛] e^{𝜆²/8}. (Hoeffding's lemma)

Therefore,

Pr(e^{𝜆(𝑌−E[𝑌 ])} ≥ e^{𝜆𝛽}) ≤ (∏𝑖∈[𝑛] e^{𝜆²/8}) / e^{𝜆𝛽} = e^{(𝑛𝜆²/8) − 𝜆𝛽}.

Similarly,

Pr(E[𝑌 ] − 𝑌 ≥ 𝛽) = Pr(e^{𝜆(E[𝑌 ]−𝑌 )} ≥ e^{𝜆𝛽})
≤ E[e^{𝜆 ∑𝑖∈[𝑛] (E[𝑋𝑖]−𝑋𝑖)}] / e^{𝜆𝛽} (by Markov's inequality and linearity of expectation)
≤ (∏𝑖∈[𝑛] E[e^{𝜆(E[𝑋𝑖]−𝑋𝑖)}]) / e^{𝜆𝛽} (because {𝑋𝑖}𝑖 are NA, and by lemma 3.2.1)
≤ (∏𝑖∈[𝑛] e^{𝜆²/8}) / e^{𝜆𝛽} (Hoeffding's lemma)
= e^{(𝑛𝜆²/8) − 𝜆𝛽}.

The 𝜆 that minimizes e^{(𝑛𝜆²/8) − 𝜆𝛽} is the 𝜆 that minimizes (𝑛𝜆²/8) − 𝜆𝛽, namely 𝜆 = 4𝛽/𝑛. Substituting 4𝛽/𝑛 for 𝜆 reduces the bound e^{(𝑛𝜆²/8) − 𝜆𝛽} to e^{−2𝛽²/𝑛}. □

Crucially, the step that previously relied on independence of {𝑋𝑖}𝑖 now follows from {𝑋𝑖}𝑖 being NA and lemma 3.2.1.

3.3 A BI Frame for Negative Dependence

Now that we have seen some nice properties of negative association, we start the quest of building a bunched logic that can assert both negative association and independence. Concretely, we construct a BI frame XPNA that can capture negative association and then combine it with our BI model for probabilistic independence (XD, V∗). To be compatible with (XD, V∗), we let XPNA have the same set of states and the same pre-order as XD. The important remaining piece of the puzzle is the binary operation ⊕, which must satisfy the frame conditions while capturing negative association.

The meaning of "capturing negative association" has so far been left ambiguous.
Previously, in the design process of (XD,V∗), we want the satisfaction of 𝑃 ∗ 𝑄 ensures that 𝑃 and 𝑄 hold on independent components of distributions, and more precisely, we choose to require all variables involved in 𝑃 are inde- pendent of all variables involved in 𝑄. Because 𝑃 ∗ 𝑄 is interpreted through the binary operation ⊗D, we define ⊗D to take the independent product of two dis- tributions when possible. Now, analogously, we want to interpret the formula 𝑃 ⊛ 𝑄 through a binary operation ⊕: 𝑥 |=V∗ 𝑃 ⊛ 𝑄 iff there exist 𝑥′, 𝑦, 𝑧 s.t. 𝑥 ⊒ 𝑥′ ∈ 𝑦 ⊕ 𝑧, 𝑦 |=V∗ 𝑃 and 𝑧 |=V∗ 𝑄 such that 𝑃 ⊛ 𝑄 ensures that 𝑃 and𝑄 hold on negatively associated components of distributions — we use ⊛ for the separating conjunction interpreted on XPNA to distinguish it from the separating conjunction for asserting independence. But negative association is defined for a set of variables instead of two (groups) of variables, and it is unclear what “𝑃,𝑄 holds on negatively associated compo- nents” should mean. In the following, we explore several plausible definitions of ⊕, with the goal that we can express a set of variables 𝑥1, . . . , 𝑥𝑛 is negatively associated using formulas involving ⊛. 3.3.1 Initial Attempts at a BI Frame for Negative Association One first attempt is to let 𝜇1⊕ 𝜇2 be the set of distributions that agree with 𝜇1, 𝜇2, and satisfy strong NA — we say 𝜇 satisfies strong NA if dom(𝜇) satisfies NA. 61 Definition 3.3.1. (Attempt 1: Strong NA model) Recall that 𝑋D = 𝐸D = ∪𝑆⊆VarD(Mem[𝑆]), and for 𝜇, 𝜇′ ∈ 𝑋D, we have 𝜇 ⊑D 𝜇′ iff dom(𝜇) ⊆ dom(𝜇′) and 𝜋dom(𝜇)𝜇 ′ = 𝜇. Define ⊕𝑠 : 𝑋D × 𝑋D → P(𝑋D): 𝜇1⊕𝑠𝜇2 = {𝜇 ∈ D(Mem[𝑆∪𝑇]) | 𝜇 satisfies strong NA, 𝜋𝑆𝜇 = 𝜇1, 𝜋𝑇𝜇 = 𝜇2, 𝑆∩𝑇 = ∅}. We call X𝑠 = (𝑋, ⊑, ⊕𝑠, 𝐸𝑠) the strong NA structure. 
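Strong NA over Boolean memories can be tested mechanically. The sketch below (function names are our own) checks the defining inequality over all disjoint coordinate sets and all pairs of monotone {0,1}-valued functions, i.e., indicators of upward-closed sets; for one-side-bounded monotone functions this restriction suffices by a standard layer-cake decomposition.

```python
from itertools import combinations, product

def upsets(k):
    """All upward-closed subsets of {0,1}^k; their indicator functions are
    exactly the monotone {0,1}-valued maps on k Boolean coordinates."""
    pts = list(product((0, 1), repeat=k))
    below = lambda a, b: all(x <= y for x, y in zip(a, b))
    for bits in product((0, 1), repeat=len(pts)):
        s = frozenset(p for p, b in zip(pts, bits) if b)
        if all(q in s for p in s for q in pts if below(p, q)):
            yield s

def strong_na(mu, n):
    """mu: dict from n-bit memories (tuples) to probabilities. Tests the
    NA inequality E[f*g] <= E[f]*E[g] over all disjoint coordinate sets
    I, J and all pairs of monotone indicator functions on them."""
    for r in range(1, n):
        for I in combinations(range(n), r):
            rest = [i for i in range(n) if i not in I]
            for s in range(1, len(rest) + 1):
                for J in combinations(rest, s):
                    for U in upsets(len(I)):
                        for V in upsets(len(J)):
                            ef = eg = efg = 0.0
                            for m, pr in mu.items():
                                fv = tuple(m[i] for i in I) in U
                                gv = tuple(m[j] for j in J) in V
                                ef += pr * fv
                                eg += pr * gv
                                efg += pr * fv * gv
                            if efg > ef * eg + 1e-12:
                                return False
    return True

# One-hot on two bits is NA (Theorem 3.2.2); perfectly correlated bits are not.
assert strong_na({(1, 0): 0.5, (0, 1): 0.5}, 2)
assert not strong_na({(0, 0): 0.5, (1, 1): 0.5}, 2)
```

This brute-force check is exponential in the number of variables, so it is a specification aid rather than an analysis technique; the program logic developed below is what scales.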
Unfortunately, the strong NA structure fails to satisfy the Unit Existence condition: if 𝜇 does not satisfy strong NA, then there exists no 𝜇′ that marginalizes to 𝜇 and satisfies strong NA; and because our definition of 𝑒 ⊕𝑠 𝜇 only includes distributions 𝜇′ that marginalize to 𝜇, there is no 𝑒 such that 𝑒 ⊕𝑠 𝜇 is non-empty. The failure of this property implies that whether or not two states can be combined depends not just on how the two states relate to each other, but also critically on properties of the single states in isolation (e.g., whether a distribution satisfies strong NA); this is hard to justify if we are to read ⊕ as describing which pairs of states can be safely combined.

Looking for a different way of capturing NA, we try working with a weaker notion of NA. We try letting 𝜇1 ⊕ 𝜇2 return the distributions that agree with 𝜇1, 𝜇2 in which any variable 𝑥 in dom(𝜇1) must be negatively associated with any variable 𝑦 in dom(𝜇2), but variables within dom(𝜇1) and variables within dom(𝜇2) need not be negatively associated. We call this notion weak NA.

Definition 3.3.2 (Weak NA). Let 𝑆 ⊆ Var be a set of variables, and let 𝐴, 𝐵 be two disjoint subsets of 𝑆. A distribution 𝜇 ∈ D(Mem[𝑆]) satisfies (𝐴, 𝐵)-NA if for every pair of both monotone or both antitone functions 𝑓 : Mem[𝐴] → R and 𝑔 : Mem[𝐵] → R, where we take the point-wise orders on Mem[𝐴] and Mem[𝐵], and where 𝑓 and 𝑔 are each either lower bounded or upper bounded, we have

E𝑚∼𝜇 [ 𝑓 (𝜋𝐴𝑚) · 𝑔(𝜋𝐵𝑚)] ≤ E𝑚∼𝜇 [ 𝑓 (𝜋𝐴𝑚)] · E𝑚∼𝜇 [𝑔(𝜋𝐵𝑚)].

By definition, being (𝐴, 𝐵)-NA for all disjoint 𝐴, 𝐵 ⊆ 𝑆 is equivalent to strong NA on 𝑆. Also, (𝐴, 𝐵)-NA is closed under projection in the sense that if 𝜇 satisfies (𝐴, 𝐵)-NA and 𝐴′ ⊆ 𝐴, 𝐵′ ⊆ 𝐵, then 𝜇 satisfies (𝐴′, 𝐵′)-NA as well. Now, we try defining a model based on weak NA.

Definition 3.3.3. (Attempt 2: Weak NA model) Define ⊕𝑤 : 𝑋D × 𝑋D → P(𝑋D):

𝜇1 ⊕𝑤 𝜇2 = {𝜇 ∈ D(Mem[𝑆 ∪ 𝑇]) | 𝜇 satisfies (𝑆, 𝑇)-NA, 𝜋𝑆𝜇 = 𝜇1, 𝜋𝑇𝜇 = 𝜇2, 𝑆 ∩ 𝑇 = ∅}.
We call X𝑤 = (𝑋D, ⊑D, ⊕𝑤, 𝐸D) the weak NA structure. This weak NA structure satisfies most BI frame conditions, except that Asso- ciativity is unclear. In short, the definition of ⊕𝑤 and Associativity requires that: if 𝑤 satisfies (𝑅 ∪ 𝑆, 𝑇)-NA and (𝑅, 𝑆)-NA, then 𝑤 also satisfies (𝑆, 𝑇)-NA and (𝑅, 𝑆 ∪ 𝑇)-NA. Now 𝑤 satisfies (𝑆, 𝑇)-NA by projection closure, but it is unclear whether (𝑅, 𝑆∪𝑇)-NA follows from these conditions. Failing to satisfy Associa- tivity would lead to a logic where separating conjunction is not associative, and significantly more difficult to use. Since we could not find a counter-example nor prove that X𝑤 satisfies Associativity and thus forms a BI frame, we will leave this question as an open problem and define another structure to capture negative association. 3.3.2 Our BI Frame for Negative Association Facing the problems with the strong NA structure and the weak NA structures, we define a BI model for negative association based on a new notion of negative association called S-partition negative association (S-PNA), where S is a partition of a set of random variables. This notion interpolates weak NA and strong NA in 63 the following sense: when 𝐴, 𝐵 are both sets of variables, {𝐴, 𝐵}-PNA is equiva- lent to (𝐴, 𝐵)-NA for disjoint 𝐴, 𝐵, and {{𝑥} | 𝑥 ∈ 𝑆}-PNA is equivalent to strong NA for distributions in D(Mem[𝑆]). We say a partition S′ coarsens a partition S if ∪S = ∪S′ and for any 𝑠′ ∈ S′, 𝑠′ = ∪R for some R ⊆ S. In particular, any partition S coarsens itself. Definition 3.3.4 (Partition Negative Association). A distribution 𝜇 is S-PNA if and only if for any T that coarsens S, for any family of non-negative mono- tone functions (or family of non-negative antitone functions) { 𝑓𝐴 : Mem[𝐴] → R+}𝐴∈T ,3 where for each 𝐴 ∈ T the order on Mem[𝐴] is taken to be the point- wise order, we have E𝑚∼𝜇 [∏ 𝐴∈T 𝑓𝐴 (𝜋𝐴𝑚) ] ≤ ∏ 𝐴∈T E𝑚∼𝜇 [ 𝑓𝐴 (𝜋𝐴𝑚)] . We can use PNA to encode NA: Theorem 3.3.1. 
Given a set of variables 𝑆, 𝑆 satisfies NA in 𝜇 iff 𝜇 satisfies S-PNA for any S partitioning 𝑆 iff 𝜇 satisfies {{𝑥} | 𝑥 ∈ 𝑆}-PNA. See appendix B.2.1 for the proof. We require PNA to be closed under coars- ening, which helps us to prove the structure defined next is a BI frame. Definition 3.3.5. Define the operation ⊕ : 𝑋D × 𝑋D → P(𝑋D): 𝜇1 ⊕ 𝜇2 = {𝜇 ∈D(Mem[𝑆 ∪ 𝑇]) | 𝜋𝑆𝜇 = 𝜇1, 𝜋𝑇𝜇 = 𝜇2, 𝜇 is (S ∪ T )-PNA for any partition S,T such that 𝜇1 is S-PNA, 𝜇2 is T -PNA, and (∪S) ∩ (∪T ) = ∅.} 3We restrict the family of functions to be non-negative: prior work like Joag-Dev and Proschan [1983] has assumed non-negativity when working with notions of NA on partitions; furthermore, without that requirement, for partitions with an odd number of components, PNA would be equivalent to independence, a strange property. 64 This definition of ⊕ interpolates ⊕𝑤 and ⊕𝑠, in the following sense. Theorem 3.3.2. For any two states 𝜇1, 𝜇2 ∈ 𝑋 , 𝜇1 ⊕𝑠 𝜇2 ⊆ 𝜇1 ⊕ 𝜇2 ⊆ 𝜇1 ⊕𝑤 𝜇2. The first inclusion is because 𝜇 satisfying strong NA implies 𝜇 is R-PNA for any partition R on dom(𝜇). The second inclusion is because 𝜇1 ∈ D(Mem[𝑆]) satisfies {𝑆}-PNA and 𝜇2 ∈ D(Mem[𝑇]) satisfies {𝑇}-PNA trivially, which im- plies any 𝜇 ∈ 𝜇1 ⊕ 𝜇2 would satisfy (𝑆, 𝑇)-NA. Note that ⊕ is non-deterministic, and not just partial. Theorem 3.3.3. There are distributions 𝜇1, 𝜇2 such that |𝜇1 ⊕ 𝜇2 | ≥ 2. Proof. Let 𝜇1 ∈ D(Mem[{𝑥}]) and 𝜇2 ∈ D(Mem[{𝑦}]) be uniform distribu- tion over memories over boolean variables 𝑥, 𝑦. Then the independent product 𝜇∗ ∈ 𝜇1 ⊗D 𝜇2 is in 𝜇1 ⊕ 𝜇2, because the projections to 𝑥 and to 𝑦 are 𝜇1 and 𝜇2 respectively, and 𝜇∗ satisfies PNA since independence implies PNA (we will see this shortly in Theorem 3.4.5). But the one-hot uniform distribution 𝜇𝑜ℎ over variables 𝑥 and 𝑦, i.e., 𝜇𝑜ℎ ( [𝑥 ↦→ 1, 𝑦 ↦→ 0]) = 𝜇𝑜ℎ ( [𝑥 ↦→ 0, 𝑦 ↦→ 1]) = 1/2, is also in 𝜇1 ⊕ 𝜇2, since again the projections match 𝜇1 and 𝜇2 and the one-hot distribution satisfies NA, and hence PNA. Since 𝜇𝑜ℎ ≠ 𝜇∗, we are done. 
□ Thus, we build the following BI frame that crucially uses a non-deterministic binary operation on states to capture negative association. Theorem 3.3.4. The structure XPNA = (𝑋D, ⊑D, ⊕, 𝐸D) is a Down-Closed BI frame. For the frame conditions where the previous attempts failed, (Unit Exis- tence) holds by letting the unit 𝑒 to always be the trivial distribution on the empty set, and (Associativity) can be proved using the facts that PNA is closed 65 under coarsening and coarsening commute with projections. See the full proof in Appendix B.2.2. Furthermore, the binary combination ⊕ in XPNA captures negative associa- tion: consider the atomic propositions introduced in eq. (2.3) and the valuation V∗ for them, clearly (XPNA,V∗) forms a BI model; when interpreting BI for- mulas on the model (XPNA,V∗), we can express negative association of a set of variables using the separating conjunction ⊛. We use the iterative version of the connective ∗, which is well-defined because it is associative. Definition 3.3.6. For any connective ⊙ ∈ {∧,∨,⊛, ∗}, we use the corresponding big-connective ⊙ ∈ {∧ , ∨ ,⊛,∗}. • For any constant or logical variable 𝑁 ≥ 1, let ⊙𝑁 𝑖=0 𝑃𝑖 = 𝑃0 abbreviate ((𝑃0⊙𝑃1)⊙· · · )⊙𝑃𝑁−1. Formally, let ⊙𝑁 𝑖=0 𝑃𝑖 = ⊤ if 𝑁 = 0, and let ⊙𝑁 𝑖=0 𝑃𝑖 ≜(⊙𝑁−1 𝑖=0 𝑃𝑖 ) ⊙ 𝑃𝑁 for 𝑁 > 0. • For a finite multiset of formula {𝑃𝑖}𝑖∈𝑆, let ⊙ 𝑠∈𝑆 𝑃𝑠 abbreviate ((𝑃𝑠0 ⊙𝑃𝑠1) ⊙ · · · ) ⊙ 𝑃𝑠𝑘 , where 𝑠0, . . . , 𝑠𝑘 is an arbitrary ordering of 𝑆. The satisfaction is not ambiguous since ⊙ is associative and commutative. • For any program variable 𝑣 ∈ Var, for any state 𝜇 |= [𝑣 = 𝑁], we want ⊙𝑣 𝑖=0 𝑃𝑖 to be equivalent to ⊙𝑁 𝑖=0 𝑃𝑖. Formally, ⊙𝑣 𝑖=0 𝑃𝑖 abbreviates∨ 𝑁∈Val( [𝑣 = 𝑁] ∧ ⊙𝑁 𝑖=0 𝑃𝑖). Theorem 3.3.5. Let 𝑆 be any subset of Var. A set of randomized program variables 𝑌 = {𝑦𝑖 | 0 ≤ 𝑖 < 𝐾} satisfies NA in the distribution 𝜇 ∈ D(Mem[𝑆]) if and only if we have 𝜇 |=⊛𝐾 𝑖=0 Own(𝑦𝑖). Proof. Forward direction: We denote {𝑦𝑖} as 𝑌 [𝑖], and denote {𝑦𝑖 | 0 ≤ 𝑖 < 𝑗} as 𝑌 [: 𝑗]. 
We prove by induction on 𝑗 that 𝜋𝑌 [: 𝑗]𝜇 |=⊛ 𝑗 𝑖=0 Own(𝑦𝑖). 66 Base case 𝑗 = 1 : Trivially, 𝜋𝑌 [:1]𝜇 |= Own(𝑦0), and then by persistence. Inductive case 𝑗 ≥ 1 : Assuming 𝜋𝑌 [: 𝑗]𝜇 |= ⊛ 𝑗 𝑖=0 Own(𝑦𝑖). Since 𝑌 satisfies NA in 𝜇, by Theorem 3.3.1, 𝜇 is T -PNA for any partition T of 𝑌 . In particu- lar, for any partition T1 on 𝑌 [: 𝑗] and any (trivial) partition T2 on 𝑌 [ 𝑗], 𝜇 must be T1 ∪ T2-PNA. Thus, 𝜋𝑌 [: 𝑗+1]𝜇 ∈ 𝜋𝑌 [: 𝑗]𝜇 ⊕ 𝜋𝑦[ 𝑗+1]𝜇. Since 𝜋𝑌 [ 𝑗]𝜇 |= ⊛ 𝑗 𝑖=0 Own(𝑦𝑖) and 𝜋𝑌 [ 𝑗]𝜇 |= Own(𝑦 𝑗 ), that implies 𝜋𝑌 [: 𝑗+1] |=⊛ 𝑗+1 𝑖=0 Own(𝑦𝑖). Thus, we have 𝜋𝑌 𝜇 |=⊛𝐾 𝑖=0 Own(𝑦𝑖). By persistence, 𝜇 |=⊛𝐾 𝑖=0 Own(𝑦𝑖). Backward direction: for any 𝐴, 𝐵 being disjoint subsets of [𝑛], by commuta- tivity and associativity of ⊛, we can reorder formula and get 𝜇 |= ( ⊛ 𝑖∈𝐴 Own(𝑦𝑖) ⊛⊛ 𝑖∈𝐵 Own(𝑦𝑖) ) ⊛ ⊛ 𝑦𝑖∈[𝑛]\(𝐴∪𝐵) Own(𝑦𝑖) By satisfaction rules, there exists 𝜇1, 𝜇2, 𝜇 ′ such that 𝜇 ⊒ 𝜇′ ∈ 𝜇1 ⊕ 𝜇2, and 𝜇1 |= ⊛𝑖∈𝐴 Own(𝑦𝑖), and 𝜇2 |= ⊛𝑖∈𝐵 Own(𝑦𝑖). Note that 𝜇1 is trivially {𝐴}-PNA, and 𝜇2 is trivially {𝐵}-PNA. Thus, 𝜇′ satisfies {𝐴, 𝐵}-PNA. Therefore, 𝜇 satisfies (𝐴, 𝐵)-NA for any 𝐴, 𝐵 being disjoint subsets of 𝑌 , i.e., 𝜇 satisfies NA on 𝑌 . □ 3.4 𝑀-BI: Combining BI Models Now that we have a BI model for capturing independence and a BI model for capturing negative association, we want to combine them and design an as- sertion logic that can express both independence and negative association; fur- thermore, it would be helpful to internalize the fact that independence implies 67 negative association in the assertion logic. To achieve that goal, we now ex- tend bunched logic to support multiple separating conjunctions related by a pre-order. While our motivation is to use one separating conjunction to assert independence, and use another to assert negative association, the logic poten- tially also has other interesting models. 3.4.1 The Syntax and Proof Rules Let AP be a set of atomic propositions, and (𝑀, ≤) be a finite pre-order. 
The formula in the logic of 𝑀-bunched implications (𝑀-BI) has the following gram- mar: 𝑃,𝑄 ::= 𝑝 ∈ AP | ⊤ | 𝐼𝑚∈𝑀 | ⊥ | 𝑃 ∧𝑄 | 𝑃 ∨𝑄 | 𝑃→ 𝑄 | 𝑃 ∗𝑚∈𝑀 𝑄 | 𝑃 −∗𝑚∈𝑀 𝑄. We associate each element of 𝑚 ∈ 𝑀 with a separating conjunction ∗𝑚, a corresponding multiplicative identity 𝐼𝑚 and a separating implication −∗𝑚. The proof system for M-BI is based on the proof system for BI, with an indexed copy of rules for each separation, and additionally has the ∗-WEAKENING rules. We present the full Hilbert-style proof system in fig. 3.1. The new rule ∗- WEAKENING simply says that the separation conjunction associated with a big- ger element in 𝑀 is weaker: if 𝑚1 ≤ 𝑚2, then the assertion 𝑃 ∗𝑚1 𝑄 implies 𝑃 ∗𝑚2 𝑄. We can derive analogous weakening rules for separating implications and multiplicative identities, in the reverse direction. 68 𝑃 ⊢ 𝑃 AX 𝑃 ⊢ ⊤ TOP ⊥ ⊢ 𝑃 BOT 𝑃 ⊢ 𝑅 𝑄 ⊢ 𝑅 𝑃 ∨𝑄 ⊢ 𝑅 ∨-E 𝑃 ⊢ 𝑄𝑖 𝑃 ⊢ 𝑄1 ∨𝑄2 ∨-I 𝑃 ⊢ 𝑄 𝑃 ⊢ 𝑅 𝑃 ⊢ 𝑄 ∧ 𝑅 ∧-I-R 𝑄 ⊢ 𝑅 𝑃 ∧𝑄 ⊢ 𝑅 ∧-I-L 𝑃 ⊢ 𝑄1 ∧𝑄2 𝑃 ⊢ 𝑄𝑖 ∧-E 𝑃 ∧𝑄 ⊢ 𝑅 𝑃 ⊢ 𝑄 → 𝑅 →-I 𝑃 ⊢ 𝑄 → 𝑅 𝑃 ⊢ 𝑄 𝑃 ⊢ 𝑅 →-E 𝑃 ⊢ 𝑅 𝑄 ⊢ 𝑆 𝑃 ∗𝑚 𝑄 ⊢ 𝑅 ∗𝑚 𝑆 ∗-CONJ 𝑃 ∗𝑚 𝑄 ⊢ 𝑅 𝑃 ⊢ 𝑄 −∗𝑚 𝑅 −∗-I 𝑃 ⊢ 𝑄 −∗𝑚 𝑅 𝑆 ⊢ 𝑄 𝑃 ∗𝑚 𝑆 ⊢ 𝑅 −∗-E 𝑃 ⊣⊢ 𝑃 ∗𝑚 𝐼𝑚 ∗-UNIT 𝑃 ∗𝑚 𝑄 ⊢ 𝑄 ∗𝑚 𝑃 ∗-COMM (𝑃 ∗𝑚 𝑄) ∗𝑚 𝑅 ⊣⊢ 𝑃 ∗𝑚 (𝑄 ∗𝑚 𝑅) ∗-ASSOC 𝑚1 ≤ 𝑚2 𝑃 ∗𝑚1 𝑄 ⊢ 𝑃 ∗𝑚2 𝑄 ∗-WEAKENING Figure 3.1: Hilbert system for 𝑀-BI Lemma 3.4.1. The following rules are derivable in 𝑀-BI: 𝑚1 ≤ 𝑚2 𝑃 −∗𝑚2 𝑄 ⊢ 𝑃 −∗𝑚1 𝑄 −∗-WEAKENING 𝑚1 ≤ 𝑚2 𝐼𝑚2 ⊢ 𝐼𝑚1 UNITWEAKENING 3.4.2 Semantics As is standard with bunched logics, we give a Kripke style semantics to 𝑀-BI. We will define a structure called 𝑀-BI frame, and then define 𝑀-BI models and the satisfaction rules on 𝑀-BI models. An 𝑀-BI frame is a collection of BI frames sharing the same set of states and pre-order, with ordered binary operations. 69 Definition 3.4.1 (𝑀-BI Frame). 
An 𝑀-BI frame is a structureX = (𝑋, ⊑, ⊕𝑚∈𝑀 , 𝐸𝑚) such that for each 𝑚, (𝑋, ⊑, ⊕𝑚, 𝐸𝑚) is a BI frame (see definition 2.2.1), and there is a preorder ≤ on 𝑀 satisfying: 𝑚1 ≤ 𝑚2 → 𝑥 ⊕𝑚1 𝑦 ⊆ 𝑥 ⊕𝑚2 𝑦 (Operation Inclusion) The Operation Inclusion condition together with the frame conditions of BI imply an inclusion on unit sets: Lemma 3.4.2. Let X be an 𝑀-BI frame. If 𝑚1 ≤ 𝑚2 then 𝐸𝑚2 ⊆ 𝐸𝑚1 . Proof. Let 𝑒2 ∈ 𝐸𝑚2 . By Unit Existence, there exists 𝑒1 ∈ 𝐸𝑚1 such that 𝑒2 ∈ 𝑒1 ⊕𝑚1 𝑒2. By Operation Inclusion, 𝑒2 ∈ 𝑒1 ⊕𝑚2 𝑒2, so Unit Coherence implies that 𝑒1 ⊑ 𝑒2, and then Unit Closure implies 𝑒2 ∈ 𝐸𝑚1 . So 𝐸𝑚2 ⊆ 𝐸𝑚1 . □ To obtain a 𝑀-BI model over a given 𝑀-BI frame, we need a valuation that defines which states in the 𝑀-BI frame satisfy each atomic proposition. Again, for the soundness of the proof system, the valuation must be persistent: any formula true at a state remains true at any larger state. Definition 3.4.2 (Valuation and model). An 𝑀-BI model (X,V) is an 𝑀-BI frame X = (𝑋, ⊑, ⊕𝑚, 𝐸𝑚) associated with a persistent valuationV on it. Next, we define the satisfaction of 𝑀-BI formula in a 𝑀-BI model. The defi- nition is almost the same as fig. 2.3, except that it supports the 𝑀-indexed sepa- ration conjunctions, implications, and units. Definition 3.4.3. On an 𝑀-BI model (X,V), we define the satisfaction relation |=V between states in X and 𝑀-BI formula: for any 𝑥 ∈ X, 70 𝑥 |=V ⊤ always 𝑥 |=V ⊥ never 𝑥 |=V 𝐼𝑚 iff 𝑥 ∈ 𝐸𝑚 𝑥 |=V p iff 𝑥 ∈ V(p) 𝑥 |=V 𝑃 ∧𝑄 iff 𝑥 |=V 𝑃 and 𝑥 |=V 𝑄 𝑥 |=V 𝑃 ∨𝑄 iff 𝑥 |=V 𝑃 or 𝑥 |=V 𝑄 𝑥 |=V 𝑃→ 𝑄 iff for all 𝑦 ⊒ 𝑥, 𝑦 |=V 𝑃 implies 𝑦 |=V 𝑄 𝑥 |=V 𝑃 ∗𝑚 𝑄 iff there exist 𝑥′, 𝑦, 𝑧 s.t. 𝑥 ⊒ 𝑥′ ∈ 𝑦 ⊕𝑚 𝑧, 𝑦 |=V 𝑃 and 𝑧 |=V 𝑄 𝑥 |=V 𝑃 −∗𝑚 𝑄 iff for all 𝑦, 𝑧 s.t. 𝑧 ∈ 𝑥 ⊕𝑚 𝑦, 𝑦 |=V 𝑃 implies 𝑧 |=V 𝑄 Analogous to the case in standard BI, we say 𝑃 |= 𝑄 iff, for all models (X,V), for any state 𝑥 ∈ X, 𝑥 |=V 𝑃 implies 𝑥 |=V 𝑄. 
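The effect of Operation Inclusion on the satisfaction relation can be illustrated on a deliberately tiny model. The sketch below (a hypothetical carrier, not the distribution model of this chapter) builds two combination operations with ⊕0 ⊆ ⊕1 and checks semantically that the entailment behind ∗-WEAKENING holds: whenever a state satisfies 𝑃 ∗0 𝑄, it also satisfies 𝑃 ∗1 𝑄.

```python
from itertools import chain, combinations

# States: subsets of {a, b}, ordered by inclusion (a toy carrier).
STATES = [frozenset(c) for c in chain.from_iterable(
    combinations("ab", r) for r in range(3))]

def oplus(m, x, y):
    """Two combinations with oplus(0, ...) a subset of oplus(1, ...)
    pointwise, mirroring Operation Inclusion for the preorder 0 <= 1."""
    if m == 0:
        return {x | y} if not (x & y) else set()
    return {x | y}

def sat_star(m, x, P, Q):
    """x |= P *_m Q: some x' below x with x' in y oplus_m z, y |= P, z |= Q."""
    return any(
        xp <= x and xp in oplus(m, y, z) and P(y) and Q(z)
        for xp in STATES for y in STATES for z in STATES
    )

owns_a = lambda s: "a" in s  # persistent: closed under growing the state
owns_b = lambda s: "b" in s

# *-WEAKENING, checked semantically on every state of the model.
for x in STATES:
    assert not sat_star(0, x, owns_a, owns_b) or sat_star(1, x, owns_a, owns_b)
```

The converse entailment fails in general, which is exactly why the logic keeps a family of conjunctions ordered by 𝑀 rather than a single one.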
We prove that the proof system for 𝑀-BI is sound and complete with respect to its semantics using the duality- theoretic framework proposed by Docherty [2019]. Theorem 3.4.3. Let 𝑃 and 𝑄 be any two 𝑀-BI formulas. Then 𝑃 |= 𝑄 iff 𝑃 ⊢ 𝑄. We show the proof in appendix B.3 3.4.3 A 𝑀-BI Model for Independence and NA We now combineXPNA with the BI frameXD to construct a 𝑀-BI frame. Since the separating conjunction in XD captures independence, and separating conjunc- tion in XPNA captures negative association, we can expect to use 𝑀-BI formulas interpreted on the combined model to express both probabilistic independence and negative association. We combine XD and XPNA into a 2̂-BI model where 2̂ denotes the set {0, 1} ordered as 0 ≤ 1, the index 0 is associated with the independent combination ⊗D and the index 1 is associated with the NA combination ⊕. 71 Theorem 3.4.4. The structure X𝑁𝐴 = (𝑋D, ⊑D, (⊗D, ⊕), (𝐸D, 𝐸D)) forms a 2̂-BI frame. Proving X𝑁𝐴 forms a 2̂-BI boils down to showing that for any 𝜇1, 𝜇2 ∈ 𝑋 , 𝜇1 ⊗D 𝜇2 ⊆ 𝜇1 ⊕ 𝜇2. The inclusion is implied by the following theorem generalizing the indepen- dence closure for NA (theorem 3.2.2). Its proof, however, is more involved be- cause PNA is more expressive and is closed under coarsening. Theorem 3.4.5 (Independence implies PNA). Let 𝑆, 𝑇 ⊆ Var be two disjoint sets of variables. Suppose 𝜇1 ∈ D(Mem[𝑆]), 𝜇2 ∈ D(Mem[𝑇]). If 𝜇1 satisfies S-PNA and 𝜇2 satisfies T -PNA, then any 𝜇 ∈ 𝜇𝑆 ⊗D 𝜇𝑇 satisfies (S ∪ T )-PNA. The proof is based on the observation that: for any coarsening R of S ∪ T , any block 𝑝 in R is the union of some blocks from S and some blocks from T . Intuitively, by the independence closure for NA, for any block 𝑝 in R, the blocks from S and T that are in 𝑝 are negatively associated with the rest of the blocks in S and T . Because any other block in R is formed by merging some remaining blocks in S and some remaining blocks in T , the block 𝑝 is also negatively asso- ciated with any other block in R. 
Formally, we prove this by induction on the number of blocks in the coarsening R (see appendix B.4.1).

Because X𝑁𝐴 has the same carrier set as XD, we can combine X𝑁𝐴 with the persistent valuation V∗ : AP → 𝑋D (Definition 2.3.5) to form a 2̂-BI model (X𝑁𝐴, V∗). In the remainder of this chapter, we take 2̂-BI formulas interpreted in this model as our assertion logic.

3.5 Logic of Independence and Negative Association

3.5.1 Assertion Logic

When designing a separation logic for reasoning about independence and negative association, we also want the assertion logic to satisfy restriction (lemma 2.3.7) so that, to check whether a distribution satisfies 𝜑, it suffices to check whether the distribution's projection onto FV(𝜑) satisfies 𝜑. Previously, to prove the soundness of the program logic in section 2.3.3, we showed that all BI formulas satisfy the restriction property when interpreted on XD. Here, not all 2̂-BI formulas satisfy the restriction property when interpreted on X𝑁𝐴; we identify a subset MBI+ that does.

Definition 3.5.1. We define MBI+ as

MBI+ ∋ 𝑃, 𝑄 ::= 𝑝 ∈ AP | ⊤ | ⊥ | 𝑃 ∧ 𝑄 | 𝑃 ∨ 𝑄 | 𝑃 → 𝑄 | 𝑃 ∗ 𝑄 | 𝑃 −∗ 𝑄 | 𝑃 ⊛ 𝑄

where AP is defined as in eq. (2.3). MBI+ omits the multiplicative identities 𝐼𝑚 because on (X𝑁𝐴, V∗) they are all equivalent to ⊤. The only limitation is that MBI+ excludes the use of −⊛.

Theorem 3.5.1 (Restriction). For any distribution 𝜇 ∈ 𝑋D, any MBI+ formula 𝜑 interpreted on (X𝑁𝐴, V∗), and any valuation V, we have 𝜇 |=V 𝜑 ⇔ 𝜋FV(𝜑)𝜇 |=V 𝜑.

We defer its proof to appendix B.4.3. Indeed, we can exhibit a counterexample showing that −⊛ does not satisfy restriction.

Theorem 3.5.2. There exist 𝜇 ∈ D(Mem[𝑆]) and a formula 𝜑 such that 𝜇 |= 𝜑 but 𝜋FV(𝜑)𝜇 ̸|= 𝜑.

We also defer its proof to appendix B.4.3. Below, we consider MBI+ formulas on the (X𝑁𝐴, V∗) model as the assertion logic. In this assertion logic, all the axioms for the independence BI model (section 2.3.2) and the NA BI model (section 3.3.2) still hold.
Also, because (X𝑁𝐴,V∗) is a conservative extension of (XPNA,V∗), the theorem theorem 3.3.5 that says NA is captured by separating conjunction ⊛ also still holds. We also have some new axioms for the nega- tive association conjunction, which involves two new distributions introduced below. Definition 3.5.2 (One-Hot Vectors). Let oh(𝑛) denote the set of one-hot vectors of length 𝑛, where a one-hot vector [. . . , 1, . . . ] has exactly one entry set to 1 and all other entries set to 0. We abbreviate Unifoh(𝑛) as OH𝑛. To describe the next distribution, we generalize the function Unif (−) so that it can also be used to describe uniform distributions over multi-sets. A multi- set is an unordered collection of items that allow an item to occur more than once. When 𝐴 is an multi-set, the distribution Unif𝐴 assigns the outcome 𝑥 with weight Unif𝐴 (𝑎) = Multiplicity of 𝑥∑ 𝑦∈𝐴 Multiplicity of 𝑦 . It is clear that, when 𝐴 is simply a set, this definition agrees with our definition of uniform distribution over a set in fig. 2.6. Definition 3.5.3 (Permutations). Given a finite multi-set of 𝐴, a permutation of 𝐴 is a bijective function 𝛼 : 𝐴 → 𝐴. We let permutation(𝐴) be the multi-set of 𝐴’s permutations. When 𝐴 has duplicates, we distinguish them using addi- 74 tional labels; so there are always |𝐴|! elements in permutation(𝐴). We abbreviate Unifpermutation(𝐴) as Permu𝐴. Then, we have the following axioms that introduce formulas that assert neg- ative association among variables. Lemma 3.5.3. Let 𝑥𝛾 be variables. The following axioms are valid in (X𝑁𝐴,V∗). |= OH𝑁 ⟨[𝑥0, . . . , 𝑥𝑁−1]⟩ → 𝑁 ⊛ 𝛾=0 Own(𝑥𝛾) (OH-PNA) |= Permu𝐴⟨[𝑥0, . . . , 𝑥𝑁−1]⟩ → 𝑁 ⊛ 𝛾=0 Own(𝑥𝛾) (Perm-PNA) The two axioms follow from Theorem 3.2.2, which shows that random vari- ables in one-hot distributions and permutation distributions are NA, and Theo- rem 3.3.5, which shows that ⊛ captures the NA of random variables. We can also encode the monotone map closure in Theorem 3.2.3 as an axiom in the logic. 
Lemma 3.5.4 (BINARY MONOTONE MAP). The following is valid in (X𝑁𝐴,V∗). |= (𝜑 ⊛ 𝜂 ∧ [𝑦 = 𝑓 (FV(𝜑))]) → Own(𝑦) ⊛ 𝜂 where 𝑓 is monotone (Binary-Mono-Map) Proof. For any 𝜇 |= 𝜑 ⊛ 𝜂 ∧ [𝑦 = 𝑓 (FV(𝜑))], there exists 𝜇1, 𝜇2, 𝜇 ′ such that 𝜇 ⊒ 𝜇′ ∈ 𝜇1 ⊕ 𝜇2, 𝜇1 |= 𝜑 and 𝜇2 |= 𝜂; furthermore, for any 𝑚 such that 𝜇(𝑚) > 0, ⟦𝑦⟧(𝑚) = ⟦ 𝑓 (𝑋)⟧(𝑚). Let 𝑆 = dom𝜇2. and let 𝜇′′ denote 𝜋𝑆∪{𝑦}𝜇. We want to show that 𝜇′′ ∈ (𝜋𝑦𝜇) ⊕ 𝜇2. For any partition {𝑆1, . . . , 𝑆𝑘 } of 𝑆, for any family of non-negative all 75 monotone or all antitone functions 𝑓0, 𝑓1, . . . , 𝑓𝑘 , E𝑚∈𝜇′′  𝑓0(𝜋𝑦𝑚) · ∏ 𝑖∈[𝑘] 𝑓𝑖 (𝜋𝑆𝑖𝑚)  = E𝑚∈𝜇  𝑓0(𝜋𝑦𝑚) · ∏ 𝑖∈[𝑘] 𝑓𝑖 (𝜋𝑆𝑖𝑚)  (𝜇′′ is a marginalization of 𝜇) = E𝑚∈𝜇  𝑓0( 𝑓 (𝜋𝑋𝑚)) · ∏ 𝑖∈[𝑘] 𝑓𝑖 (𝜋𝑆𝑖𝑚)  (Because ⟦𝑦⟧(𝑚) = ⟦ 𝑓 (𝑋)⟧(𝑚)) ≤ E𝑚∈𝜇 [ 𝑓0( 𝑓 (𝜋𝑋𝑚))] · ∏ 𝑖∈[𝑘] E𝑚∈𝜇′′ [ 𝑓𝑖 (𝜋𝑆𝑖𝑚) ] (Because 𝜇 ⊒ 𝜇′ ∈ 𝜇1 ⊕ 𝜇2) ≤ E𝑚∈𝜇 [ 𝑓0(𝑦)] · ∏ 𝑖∈[𝑘] E𝑚∈𝜇′′ [ 𝑓𝑖 (𝜋𝑆𝑖𝑚) ] (Because ⟦𝑦⟧(𝑚) = ⟦ 𝑓 (𝑋)⟧(𝑚)) ≤ E𝑚∈𝜇′′ [ 𝑓0(𝑦)] · ∏ 𝑖∈[𝑘] E𝑚∈𝜇′′ [ 𝑓𝑖 (𝜋𝑚𝑆𝑖)] (𝜇′′ is a marginalization of 𝜇) Also, because 𝜇′′ = 𝜋𝑆∪{𝑦}𝜇, we have 𝜋𝑦𝜇′′ = 𝜋𝑦𝜇 and 𝜋𝑆𝜇 ′′ = 𝜋𝑆𝜇 = 𝜇2. Thus, 𝜇′′ ∈ 𝜋𝑦𝜇⊕𝜇2. Because 𝜋𝑦𝜇 |= Own(𝑦), we have 𝜇′′ |= Own(𝑦) ⊛ 𝜂. By persistence, 𝜇′ |= Own(𝑦) ⊛ 𝜂. □ (Mono-Map) will play an important role in reasoning about negative associ- ation arising in probabilistic programs. Furthermore, we can prove an N-nary version of the monotone map axiom. Lemma 3.5.5 (N-NARY MONOTONE MAP). Let 𝑥, 𝑥𝛾,𝛼 and 𝑦𝛾 be program variables. Let 𝐾𝛾 be natural numbers. The following is valid in (X𝑁𝐴,V∗). |= 𝑁 ⊛ 𝛾=0 ©­« 𝐾𝛾∧ 𝛼=0 Own(𝑥𝛾,𝛼)ª®¬ ∧ 𝑁∧ 𝛾=0 [ 𝑦𝛾 = 𝑓𝛾 ( 𝑥𝛾,0, . . . , 𝑥𝛾,𝐾𝛾 )] → 𝑁 ⊛ 𝛾=0 Own(𝑦𝛾) when 𝑓1, . . . , 𝑓𝑁 all monotone or all antitone (Mono-Map) 76 We defer the proof to appendix B.4.2. We also have an axiom particular to permutation distributions. When we establish NA from permutation distributions, it is preserved under not only monotone/antitone maps but also any element-wise homogeneous maps. 
The reason is that, fixing a multi-set and a permutation, permuting first and then applying the same map to each element is equivalent to applying the map to each element and then permuting. So applying homogeneous maps to a permutation distribution gives another permutation distribution. We capture this property in the following axiom.

Lemma 3.5.6 (Permutation Map). Let 𝑥𝛾 be variables, and let 𝑓 (𝐴) be { 𝑓 (𝑎) | 𝑎 ∈ 𝐴}. The following axiom is valid in (X𝑁𝐴, V∗).

|= Permu𝐴⟨[𝑥1, . . . , 𝑥𝑁 ]⟩ ∧ [𝑦 = [ 𝑓 (𝑥1), . . . , 𝑓 (𝑥𝑁 )]] → Permu 𝑓 (𝐴)⟨𝑦⟩ (Perm-Map)

The proof is straightforward by unfolding the definitions, so we omit it here.

3.5.2 Program Logic

We now build upon the assertion logic and develop a program logic LINA for reasoning about independence and negative association in probabilistic programs. Judgements in LINA have the form {𝑃} 𝑐 {𝑄}, where 𝑐 ∈ C is a probabilistic program introduced in fig. 2.8, and the bunched formulas 𝑃, 𝑄 are restricted assertions in MBI+.

Definition 3.5.4 (Validity). A LINA judgment is valid, written |= {𝑃} 𝑐 {𝑄}, if for all 𝜇 ∈ D(Mem[Var]) such that 𝜇 |= 𝑃, we have ⟦𝑐⟧(𝜇) |= 𝑄.

Next, we present the proof system of LINA. Since our assertions are a conservative extension of assertions from probabilistic separation logic, all the rules from Figure 2.8 carry over unchanged. We have one new program rule NEGFRAME, which acts as the frame rule for the negative-association separating conjunction ⊛, and one new structural rule RCASE, which does case analysis where each case only has some probability of occurring.

NEGFRAME
|= 𝜑 → Own(𝑋)    FV(𝜂) ∩ MV(𝑐) = ∅    𝑋 ∩ MV(𝑐) = ∅
⊢ {𝜑} 𝑐 {[𝑦 = 𝑓 (𝑋)]}    𝑓 is a monotone function
──────────────────────────────────────────────
⊢ {𝜑 ⊛ 𝜂} 𝑐 {Own(𝑦) ⊛ 𝜂}

RCASE
𝜂 ∈ CC    ∀𝛼 ∈ 𝑆. ⊢ {𝜑 ∗ 𝜂(𝛼)} 𝑐 {𝜓}    |=Mem 𝜂 → ∨𝛼∈𝑆 𝜂(𝛼)    𝜓 ∈ CM
──────────────────────────────────────────────
⊢ {𝜑 ∗ 𝜂} 𝑐 {𝜓}

PROBBOUND
⊢ {𝑒𝑣1 = 1} 𝑐 {Pr[𝑒𝑣2] ≤ 𝛿}
──────────────────────────────────────────────
⊢ {Pr[𝑒𝑣1] ≥ 1 − 𝜖} 𝑐 {Pr[𝑒𝑣2] ≤ 𝛿 + 𝜖}

Figure 3.2: New LINA rules.
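Before walking through the rules, here is a small numeric sketch (helper names are our own) of the semantic fact NEGFRAME relies on: applying a monotone map to some NA variables leaves the result negatively associated with the untouched variables.

```python
def neg_gap(mu, f, g):
    """E[f*g] - E[f]*E[g] under mu, a dict from memories to probabilities."""
    ef = sum(p * f(m) for m, p in mu.items())
    eg = sum(p * g(m) for m, p in mu.items())
    efg = sum(p * f(m) * g(m) for m, p in mu.items())
    return efg - ef * eg

# One-hot uniform over (x0, x1, x2) is NA (Theorem 3.2.2).
one_hot = {(1, 0, 0): 1/3, (0, 1, 0): 1/3, (0, 0, 1): 1/3}

# y = max(x0, x1) is a monotone function of (x0, x1); as NEGFRAME predicts,
# y stays negatively associated with the untouched x2: in particular, their
# covariance is non-positive.
y = lambda m: max(m[0], m[1])
x2 = lambda m: m[2]
assert neg_gap(one_hot, y, x2) <= 0
```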
Informally, the NEGFRAME rule says that if a set of variables 𝑋 is negatively associated with another set of variables 𝑌 that satisfy 𝜂 in a program state, and the program 𝑐 performs a monotone operation 𝑓 on 𝑋 and stores the result in a variable 𝑦, then in the resulting program state, 𝑦 and the untouched variables 𝑌 will also be negatively associated, and 𝑌 will still satisfy 𝜂. Like the FRAME rule for independence ∗, the NEGFRAME rule uses syntactic restrictions to control which variables the program may read and write. The three sets of variables RV(𝑐),WV(𝑐),MV(𝑐) are the ones defined in definition 2.3.8. Roughly, the side conditions guarantee the program 𝑐 does not read from or modify 𝑌 , the set of variables satisfying 𝜂; they in addition guarantee that 𝑋 , the domain of the monotone map will not be modified by 𝑐, and 𝑦, the codomain of the monotone 78 map does not belong to 𝑌 . For RCASE, we write |=Mem 𝑃 iff ∀𝑚 ∈ ∪𝑆⊆VarMem[𝑆], 𝛿(𝑚) |= 𝑃. We say a formula 𝜂 is closed under conditioning (CC) iff for any 𝜇, 𝜇 |= 𝜂 implies that for any event 𝑆, we have 𝜇 | 𝑆 |= 𝜂. And as in RCOND, a formula 𝜂 in CM means that 𝜂 is closed under the mixture. At a high-level, RCASE allows us to first condition the input distribution on one specific case, reason about the post-condition with the conditioned input distribution, and then use the post-condition – we implicitly combined post-conditions from different cases by requiring the post-condition to be closed under the mixture. Last, we present the rule PROBBOUND to facilitate bounding probabilities. It says that if the pre-condition 𝑒𝑣1 = 1 guarantees that event 𝑒𝑣2 happens with at most 𝛿 probability after command 𝑐, then in general, event 𝑒𝑣2 happens with at most probability 𝛿 + 𝜖 after 𝑐, where 𝜖 upper bounds the probability that 𝑒𝑣1 is not true in the pre-condition. 
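The arithmetic behind PROBBOUND is the inequality Pr[𝑒𝑣2] ≤ Pr[𝑒𝑣2 | 𝑒𝑣1] + Pr[¬𝑒𝑣1], which holds for any joint distribution. A quick numeric check (with a hypothetical helper of our own):

```python
def probbound_check(p11, p10, p01, p00):
    """pij = Pr[ev1 = i, ev2 = j] for a joint distribution over two events.
    Returns (Pr[ev2], Pr[ev2 | ev1] + Pr[not ev1]); the first quantity
    should never exceed the second."""
    p_ev1 = p11 + p10
    p_ev2 = p11 + p01
    return p_ev2, p11 / p_ev1 + (1 - p_ev1)

# With Pr[ev1] = 0.95, the bound inflates Pr[ev2 | ev1] by at most 0.05.
p_ev2, bound = probbound_check(0.02, 0.93, 0.01, 0.04)
assert p_ev2 <= bound
```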
The validity of this rule uses the law of total probability, which says that for any two events 𝑒𝑣1 and 𝑒𝑣2, Pr(𝑒𝑣1) = Pr(𝑒𝑣1 | 𝑒𝑣2) · Pr(𝑒𝑣2) + Pr(𝑒𝑣1 | ¬𝑒𝑣2) · Pr(¬𝑒𝑣2) ≤ Pr(𝑒𝑣1 | 𝑒𝑣2) + Pr(¬𝑒𝑣2). As expected, the LINA proof system is sound. Theorem 3.5.7 (Soundness of LINA). If ⊢ {𝜑} 𝑐 {𝜓} is derivable, then it is valid: |= {𝜑} 𝑐 {𝜓}. Proof. We prove the soundness of each new rule in LINA. NEGFRAME. We show that NEGFRAME follows from (Binary-Mono-Map) and existing program rules. By CONST, the side conditions FV(𝜂) ∩ MV(𝑐) = ∅ and 𝑋 ∩ MV(𝑐) = ∅ imply that {Own(𝑋) ⊛ 𝜂} 𝑐 {Own(𝑋) ⊛ 𝜂}. Because |= 𝜑 → Own(𝑋), by ∗-CONJ, it must be that |= 𝜑 ⊛ 𝜂 → Own(𝑋) ⊛ 𝜂. Thus, by WEAK, {𝜑 ⊛ 𝜂} 𝑐 {Own(𝑋) ⊛ 𝜂}. Also by WEAK, the premise {𝜑} 𝑐 {[𝑦 = 𝑓 (𝑋)]} implies {𝜑 ⊛ 𝜂} 𝑐 {[𝑦 = 𝑓 (𝑋)]}. Thus, by CONJ, {𝜑 ⊛ 𝜂} 𝑐 {Own(𝑋) ⊛ 𝜂 ∧ [𝑦 = 𝑓 (𝑋)]}. By (Binary-Mono-Map), Own(𝑋) ⊛ 𝜂 ∧ [𝑦 = 𝑓 (𝑋)] implies Own(𝑦) ⊛ 𝜂. Thus, by WEAK again, {𝜑 ⊛ 𝜂} 𝑐 {Own(𝑦) ⊛ 𝜂}. RCASE. For any 𝜇 |= 𝜑 ∗ 𝜂, there exist 𝜇1, 𝜇2, 𝜇′ such that 𝜇 ⊇ 𝜇′ ∈ 𝜇1 ◦ 𝜇2, 𝜇1 |= 𝜑, and 𝜇2 |= 𝜂. The formula 𝜂 being CC means that for any 𝑚 in the support of 𝜇2, 𝛿𝑚 |= 𝜂 as well. Then, with the side condition |=Mem 𝜂 → ∨_{𝛼∈𝑆} 𝜂(𝛼), we have 𝛿𝑚 |= ∨_{𝛼∈𝑆} 𝜂(𝛼). Combining the side condition that {𝜂(𝛼)} 𝑐 {𝜓} for all 𝛼 with CASE, we get {∨_{𝛼∈𝑆} 𝜂(𝛼)} 𝑐 {𝜓}. Thus, for any 𝑚 ∈ supp(𝜇) we have ⟦𝑐⟧(𝛿𝑚) |= 𝜓. According to the semantics, ⟦𝑐⟧(𝜇) is a convex combination of ⟦𝑐⟧(𝛿𝑚) for different 𝑚, and thus ⟦𝑐⟧(𝜇) |= 𝜓. PROBBOUND. Denote the function 𝜆𝑥. 1 − 𝑒𝑣1(𝑥) as ¬𝑒𝑣1. For any program state 𝜇 |= Pr[𝑒𝑣1] ≥ 1 − 𝜖, let 𝜌 = 𝜇(𝑒𝑣1); it must be that 𝜌 ≥ 1 − 𝜖. Let 𝜇_{𝑒𝑣1} = ⟦𝑐⟧(𝜇 | 𝑒𝑣1) and let 𝜇_{¬𝑒𝑣1} = ⟦𝑐⟧(𝜇 | ¬𝑒𝑣1). By induction on the denotational semantics of the commands, we can prove that ⟦𝑐⟧(𝜇) = 𝜇_{𝑒𝑣1} ◦𝜌 𝜇_{¬𝑒𝑣1}. Also, by construction, ⟦𝑒𝑣1⟧(𝜇 | 𝑒𝑣1) = 1, so 𝜇 | 𝑒𝑣1 |= 𝑒𝑣1. By the rule's assumption and the inductive hypothesis, we have |= {𝑒𝑣1 = 1} 𝑐 {Pr[𝑒𝑣2] ≤ 𝛿}, which implies 𝜇_{𝑒𝑣1} |= Pr[𝑒𝑣2] ≤ 𝛿. Thus, we have 𝜇_{𝑒𝑣1}(𝑒𝑣2) ≤ 𝛿. 
Then, by definition and the law of total probability, ⟦𝑐⟧(𝜇)(𝑒𝑣2) = (𝜇_{𝑒𝑣1} ◦𝜌 𝜇_{¬𝑒𝑣1})(𝑒𝑣2) ≤ 𝜌 · 𝜇_{𝑒𝑣1}(𝑒𝑣2) + (1 − 𝜌) ≤ 𝜌 · 𝛿 + (1 − 𝜌) ≤ 𝜌 · 𝛿 + 𝜖 ≤ 𝛿 + 𝜖. This ensures ⟦𝑐⟧(𝜇) |= Pr[𝑒𝑣2] ≤ 𝛿 + 𝜖. □ 3.6 Examples Now that we have introduced LINA, we present a series of formalized case studies. Our examples are extracted from various algorithms using hashing and balls-into-bins processes. 3.6.1 Probability-related Axioms for Examples Our examples will use a handful of standard facts about probability distributions, encoded as axioms in the assertion logic. For completeness, we list the axioms used below. We also observe the following conventions throughout the examples: logical variables are denoted by Greek letters (𝛼, 𝛽, 𝛾, . . . ) and capital Roman letters (𝑀, 𝑁, 𝐾, . . . ), while program variables start with lower-case Roman letters (𝑥, 𝑦, 𝑧, . . . ). The most important axiom is the one encoding the Chernoff bound: in each of our examples, we establish negative dependence of a sequence of random variables {𝑋𝑖}𝑖 and apply the Chernoff bound to derive a tail bound. In our assertion logic, the Chernoff bound can be encoded as the following axiom schema. Theorem 3.6.1 (Chernoff bound, axiom). Let {𝑥𝛼} be a family of variables indexed by 𝛼, where each variable is bounded in [0, 1] and is a monotone function of its program variables. Then for any 𝛽 ∈ (0, 1], the following axiom schemas are sound in our model: |= ⊛_{𝛼=0}^{𝑁} Own(𝑥𝛼) → Pr[|∑_{𝛼=0}^{𝑁} 𝑥𝛼 − E[∑_{𝛼=0}^{𝑁} 𝑥𝛼]| ≥ 𝛽] ≤ 𝐹 (𝛽, 𝑁) (NA-Chernoff-1) |= ⊛_{𝛼=0}^{𝑁} Own(𝑥𝛼) → Pr[|∑_{𝛼=0}^{𝑁} 𝑥𝛼 − E[∑_{𝛼=0}^{𝑁} 𝑥𝛼]| ≥ 𝑇 (𝛽, 𝑁)] ≤ 𝛽 (NA-Chernoff-2) For the other axioms, we present the axioms in binary form for simplicity, though most extend directly to big operations. • Linearity of expectation. Let 𝑒, 𝑓 be bounded expressions. |= [E[𝛼 · 𝑒 + 𝛽 · 𝑓 ] = 𝛼 · E[𝑒] + 𝛽 · E[ 𝑓 ]] (LinExp) • Union bound. Let 𝑒𝑣1, 𝑒𝑣2 ∈ EV. |= Pr[𝑒𝑣1 ∨ 𝑒𝑣2] ≤ Pr[𝑒𝑣1] + Pr[𝑒𝑣2] (UnionBd) • Permutation marginal. 
Let 𝑥 be an array variable, and let 𝑆 be a finite set. |= Permu𝑆⟨𝑥⟩ → Unif𝑆⟨𝑥 [𝛼]⟩ (PermMarg) 82 • Expectation Indicator. Let 𝑒 be a 0/1 valued expression, |= [E[𝑒] = Pr[𝑒 = 1]] (ExpectInd) • Bernoulli variables probabilities. Let 𝑒 be an expression, |= Bern𝑝 ⟨𝑒⟩ → Pr[𝑒 = 1] = 𝑝 (BernProb) • Probability of uniform. Let 𝑆 be a finite set. |= [Pr[Unif𝑆⟨𝑥⟩ = 𝛼] = 1/|𝑆 |] (ProbUnif) • Bijection uniform. Let 𝑆 be a finite set, and let 𝑓 : 𝑆 → 𝑆 be a bijection. |= Unif𝑆⟨𝑥⟩ → Unif𝑆⟨ 𝑓 (𝑥)⟩ (BijectUnif) • One-hot marginal. Let 𝑥 be an array variable. |= OH𝑆⟨𝑥⟩ → Unif𝑆⟨𝑥 [𝛼]⟩ (OHMarg) • Independent product one-hot. |= OH[𝑀] ⟨𝑥⟩ ∗ OH[𝑁] ⟨𝑦⟩ → OH[𝑀]×[𝑁] 〈 𝑥⊤ · 𝑦 〉 (IndProdOH) • Independent map. Let 𝑥 be an array variable of length 𝑁 . |= 𝑁∗ 𝛼=0 𝑥 [𝛼] $∼→ 𝑁∗ 𝛼=0 𝑓 (𝑥 [𝛼]) $∼ (IndMap) • Deterministic independent. Let 𝑥 be a variable. |= Detm⟨𝑥⟩ → 𝑥 $∼∗ 𝑒 $∼ (DetInd) • Events happen only if they have probability one. Let 𝑒𝑣 ∈ EV, |= 𝑒𝑣 = 1→ Pr(𝑒𝑣) = 1 (ProbOne) 83 • Uniform sampling from a population. We represent a population as a bit- vector, where each entry is an individual and 1 indicates they have some feature and 0 indicates not. Then, if we uniformly sample from the popula- tion, the probability of getting one is equal to the population-level ratio of ones, regardless of how they are distributed in the population. Let 𝑁 ≥ 𝐽 be constants or logical variables, 𝑏 be an array variable of length 𝑁 , and 𝑥, ℎ𝑖𝑡 be variables: |= ( ( bv(𝑏, 𝐽, 𝑁) ∗ Unif [𝑁] ⟨𝑥⟩ ) ∧ [ℎ𝑖𝑡 = 𝑏[𝑥]] ) → Bernℎ𝑖𝑡 〈 𝐽 𝑁 〉 ∗ ©­« 𝑁∑︁ 𝛽=0 𝑏[𝛽] = 𝐽ª®¬. (UniformSamp) • Independent product probabilities. Let 𝑒𝑣1, 𝑒𝑣2 ∈ EV , 𝐽, 𝐾 be two real numbers, |= Pr[𝑒𝑣1] ≤ 𝐽 ∗ Pr[𝑒𝑣2] ≤ 𝐾 → Pr[𝑒𝑣1 ∧ 𝑒𝑣2] ≤ 𝐽 · 𝐾. (IndepProb) • Equal probabilities. Let 𝑏1, 𝑏2 be two boolean expressions. Recall that 𝑏1, 𝑏2 ∈ EV too. |= [𝑏1 = 𝑏2] → Pr[𝑏1] = Pr[𝑏2] (EqualProb) 3.6.2 Bloom filter, High-level We demonstrate how NA and its closure properties can be used to analyze Bloom filters. 
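Before the formal proofs, a small simulation illustrates the kind of guarantee that (NA-Chernoff-1) encodes. The logic leaves 𝐹 abstract; here we instantiate it with the Hoeffding form 𝐹(𝛽, 𝑁) = 2·exp(−2𝛽²/𝑁), one standard choice for [0, 1]-bounded negatively associated variables, and take draws without replacement as a classic negatively associated family (the constants are arbitrary):

```python
import math
import random

def empirical_tail(trials, rng):
    """Estimate Pr[|sum x - E[sum x]| >= beta] for N indicator variables
    drawn without replacement from a half-ones population, a classic
    negatively associated family; also return the Hoeffding-form bound."""
    POP, N, beta = 40, 20, 4.0
    population = [1] * (POP // 2) + [0] * (POP // 2)
    mean = N * 0.5                            # E[sum x] for this population
    hits = sum(abs(sum(rng.sample(population, N)) - mean) >= beta
               for _ in range(trials))
    return hits / trials, 2 * math.exp(-2 * beta ** 2 / N)

tail, bound = empirical_tail(20_000, random.Random(7))
# The bound 2*exp(-1.6) is about 0.40; the empirical tail is far below it.
assert tail <= bound
```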
A Bloom filter is a space-efficient probabilistic data structure for storing a set of items from a universe 𝑈. An 𝑁-bit Bloom filter consists of a length-𝑁 array 𝑏𝑙𝑜𝑜𝑚 holding zero-one entries. We assume there is a family 𝑆 of hash functions mapping𝑈 to {0, . . . , 𝑁 − 1} and a distributionH over 𝑆 such that for any 𝑥 ∈ 𝑈 and any bucket 𝑘 , Pr 𝑓∼H ( 𝑓 (𝑥) = 𝑘) = 1/𝑁 . Let 𝑙1, . . . , 𝑙𝐻 be a 84 BLOOM : 𝑏𝑙𝑜𝑜𝑚 ← zero(𝑁); 𝑚 ← 0; while 𝑚 < 𝑀 do ℎ← 0 while ℎ < 𝐻 do 𝑏𝑖𝑛 $← OH[𝑁] ; 𝑢𝑝𝑑 ← 𝑏𝑙𝑜𝑜𝑚 | | 𝑏𝑖𝑛; 𝑏𝑙𝑜𝑜𝑚 ← 𝑢𝑝𝑑; ℎ← ℎ + 1; 𝑚 ← 𝑚 + 1; (a) Higher-level version BLOOMARRAY : 𝑏𝑙𝑜𝑜𝑚 ← zero(𝑁); 𝑚 ← 0; while 𝑚 < 𝑀 do ℎ← 0 while ℎ < 𝐻 do 𝑏𝑖𝑛 $← OH[𝑁] ; 𝑛← 0; while 𝑛 < 𝑁 do 𝑢𝑝𝑑 ← 𝑏𝑙𝑜𝑜𝑚 [𝑛] | | 𝑏𝑖𝑛[𝑛]; 𝑏𝑙𝑜𝑜𝑚 [𝑛] ← 𝑢𝑝𝑑; 𝑛← 𝑛 + 1 ℎ← ℎ + 1; 𝑚 ← 𝑚 + 1 (b) Array version Figure 3.3: Bloom filter examples collection of hash functions drawn from H . We assume the hash functions are independent, meaning the collection of variables {𝑙𝑖 (𝑥) | 𝑥 ∈ 𝑈, 𝑖 ∈ {1, . . . , 𝐻}} are independent. To add an item 𝑥 ∈ 𝑈 to the filter, we compute 𝑙1(𝑥), . . . , 𝑙𝐻 (𝑥) to get 𝐻 positions in the bit array 𝑏𝑙𝑜𝑜𝑚 and then set the bits at each of these positions to 1. To check if an item 𝑦 is in the filter, we check whether the bits at positions 𝑙1(𝑦), . . . , 𝑙𝐻 (𝑦) in 𝑏𝑙𝑜𝑜𝑚 are all 1. If they are, the item is said to be in the filter, but if any is 0, then the item is not in the filter. This membership test may suffer from false positives, i.e., it may show that an item 𝑦 is in the filter even when 𝑦 was never added to the filter. This can happen because, with hash collisions, other items added to the Bloom filter could set all the bits at loca- tions 𝑙1(𝑦), . . . , 𝑙𝐻 (𝑦) to 1. A basic quantity of interest is the false positive rate: the probability that a Bloom filter reports a false positive. We model the process of adding 𝑀 distinct items into a Bloom filter as the program BLOOM in fig. 3.3a. 
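The insertion process can be transcribed almost line by line. This Python sketch, which is illustrative rather than part of the formal development, realizes the one-hot sampling 𝑏𝑖𝑛 $← OH[𝑁] as a uniformly random index (the hash model this relies on is made precise below):

```python
import random

def bloom_insert(N, M, H, rng):
    """Transcription of BLOOM (fig. 3.3a): insert M distinct items, each
    hashed by H functions; each hash selects a one-hot vector over [N],
    and the filter is updated by bitwise-or."""
    bloom = [0] * N
    for _m in range(M):
        for _h in range(H):
            hot = rng.randrange(N)                       # position of the hot bit
            bin_vec = [int(i == hot) for i in range(N)]  # bin <-$- OH_[N]
            bloom = [b | v for b, v in zip(bloom, bin_vec)]  # upd <- bloom || bin
    return bloom

filt = bloom_insert(N=64, M=10, H=3, rng=random.Random(0))
assert all(bit in (0, 1) for bit in filt)   # bloom stays a bit-array
assert sum(filt) <= 10 * 3                  # at most M * H bits can ever be set
```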
Because the 𝑀 items are distinct, we model the 85 hash functions as if they independently, randomly sample hash values for each item as they are added, a standard model used in the analysis of hashing data structures [Mitzenmacher and Upfal, 2005]. That is, we encode the hashing step as sampling a one-hot vector from the distribution OH[𝑁] and storing it in the variable 𝑏𝑖𝑛, where the hot bit of the vector 𝑏𝑖𝑛 represents the selected position. To set the corresponding position in the filter to 1, we update 𝑏𝑙𝑜𝑜𝑚, which is set to be an all-zero vector at the beginning of the program, to be 𝑏𝑙𝑜𝑜𝑚 | | 𝑏𝑖𝑛, the bitwise-or of the current array and the sampled one-hot array. Our goal is to bound an 𝑁-bit Bloom filter’s false positive rate after 𝑀 dis- tinct items are added. We split the analysis of the false positive rate into two steps. First, we will analyze BLOOM and prove that the entries in 𝑏𝑙𝑜𝑜𝑚 are negatively associated at the end of the process. By (NA-Chernoff-2), NA be- tween the entries of 𝑏𝑙𝑜𝑜𝑚 gives a tail bound of the fraction of bits in 𝑏𝑙𝑜𝑜𝑚 that are set to 1. Second, we analyze a program that checks the membership of a new item in a given Bloom filter, presented as CHECKMEM in fig. 3.4, and bound the probability that the 𝐻 hashed values of the new item are all already in the Bloom filter. Last, we combine them into one proof that bounds the false positive rate of a Bloom filter with 𝑀 elements. Proving NA of BLOOM Recall that the code models inserting 𝑀 distinct el- ements into a Bloom filter backed by an array 𝑏𝑙𝑜𝑜𝑚 of length 𝑁 , where each element is hashed by 𝐻 functions, each producing an element of [𝑁] uniformly at random. We refer to the outer loop as outer, and the inner loop as inner. For both the outer and the inner loop, we apply the rule LOOP with the loop invari- 86 ant: ⊛𝑁 𝛽=0 Own(𝑏𝑙𝑜𝑜𝑚 [𝛽]). We consider the inner loop first. We show that the invariant is preserved by the body of inner. 
After the sampling command 𝑏𝑖𝑛 $← OH[𝑁] , SAMP gives: ©­« 𝑁 ⊛ 𝛽=0 Own(𝑏𝑙𝑜𝑜𝑚 [𝛽])ª®¬ ∗ OH[𝑁] ⟨𝑏𝑖𝑛⟩ By negative association of the one-hot distribution (OH-PNA), we get ©­« 𝑁 ⊛ 𝛽=0 Own(𝑏𝑙𝑜𝑜𝑚 [𝛽])ª®¬ ∗ ©­« 𝑁 ⊛ 𝛾=0 𝑏𝑖𝑛[𝛾]ª®¬ which implies ©­« 𝑁 ⊛ 𝛽=0 Own(𝑏𝑙𝑜𝑜𝑚 [𝛽])ª®¬ ⊛ ©­« 𝑁 ⊛ 𝛾=0 𝑏𝑖𝑛[𝛾]ª®¬ using WEAK. By rearranging terms, this is equivalent to 𝑁 ⊛ 𝛽=0 Own(𝑏𝑙𝑜𝑜𝑚 [𝛽]) ⊛ Own(𝑏𝑖𝑛[𝛽]). After the assignment to 𝑢𝑝𝑑, we have: ©­« 𝑁 ⊛ 𝛽=0 Own(𝑏𝑙𝑜𝑜𝑚 [𝛽]) ⊛ Own(𝑏𝑖𝑛[𝛽])ª®¬ ∧ [𝑢𝑝𝑑 = 𝑏𝑙𝑜𝑜𝑚 | | 𝑏𝑖𝑛] . Because | | is monotone, applying the monotone mapping axiom (Mono-Map) gives us: 𝑁 ⊛ 𝛽=0 Own(𝑢𝑝𝑑 [𝛽]). Using the assignment rule (RASSN) on the assignment to bloom shows that the loop invariant is preserved by the inner loop. Thus, LOOP gives: { 𝑁 ⊛ 𝛽=0 Own(𝑏𝑙𝑜𝑜𝑚 [𝛽])} inner { 𝑁 ⊛ 𝛽=0 Own(𝑏𝑙𝑜𝑜𝑚 [𝛽])} Next, we turn to the outer loop. The argument showing that the invariant is preserved by the outer loop follows from a straightforward argument, since the 87 outer loop only modifies 𝑏𝑙𝑜𝑜𝑚 through the inner loop, so LOOP gives: { 𝑁 ⊛ 𝛽=0 Own(𝑏𝑙𝑜𝑜𝑚 [𝛽])} outer { 𝑁 ⊛ 𝛽=0 Own(𝑏𝑙𝑜𝑜𝑚 [𝛽])} Then, we have: {⊤} BLOOM { 𝑁 ⊛ 𝛽=0 Own(𝑏𝑙𝑜𝑜𝑚 [𝛽])} because initializing 𝑏𝑙𝑜𝑜𝑚 to the all-zeros vector, a deterministic value, estab- lishes the loop invariant. This judgment shows that the 𝑏𝑙𝑜𝑜𝑚 vector satisfies NA at the end of the program. We now apply the Chernoff bound to the NA variables (NA-Chernoff-2) to prove that, with high probability, the number of occupied bins in BLOOM is near its mean with high probability:{ ⊤ } BLOOM { Pr  ������ 𝑁∑︁𝛽=0 𝑏𝑙𝑜𝑜𝑚 [𝛽] − E  𝑁∑︁ 𝛽=0 𝑏𝑙𝑜𝑜𝑚 [𝛽]  ������ ≥ 𝑇 (𝛿, 𝑁)  ≤ 𝛿 } . This concentration bound implies that a tail bound, which says with high prob- ability ∑𝑁 𝛽=0 𝑏𝑙𝑜𝑜𝑚 [𝛽] is upper bounded by its expected value plus 𝑇 (𝛿, 𝑁),{ ⊤ } BLOOM { Pr  𝑁∑︁ 𝛽=0 𝑏𝑙𝑜𝑜𝑚 [𝛽] < E  𝑁∑︁ 𝛽=0 𝑏𝑙𝑜𝑜𝑚 [𝛽]  + 𝑇 (𝛿, 𝑁)  ≥ 1 − 𝛿 } . 
(3.1) Furthermore, ever since the first line of the program, 𝑏𝑙𝑜𝑜𝑚 has been a bit-array, i.e., all its entries are either 0 or 1. So it is easy to prove that {⊤} BLOOM {∧_{𝛽=0}^{𝑁} (𝑏𝑙𝑜𝑜𝑚[𝛽] = 0 ∨ 𝑏𝑙𝑜𝑜𝑚[𝛽] = 1)}. In the following, we will abbreviate the formula asserting that 𝑏 is a bit-array where exactly 𝐽 of its first 𝑁 entries are one, i.e., (∑_{𝛽=0}^{𝑁} 𝑏[𝛽] = 𝐽) ∧ ∧_{𝛽=0}^{𝑁} (𝑏[𝛽] = 0 ∨ 𝑏[𝛽] = 1), as bv(𝑏, 𝐽, 𝑁). Similarly, we will use bv(𝑏, < 𝐽, 𝑁) to abbreviate (∑_{𝛽=0}^{𝑁} 𝑏[𝛽] < 𝐽) ∧ ∧_{𝛽=0}^{𝑁} (𝑏[𝛽] = 0 ∨ 𝑏[𝛽] = 1). We can now state our goal as {bv(𝑏𝑙𝑜𝑜𝑚, < 𝐾, 𝑁)} CHECKMEM {Pr[𝑎𝑙𝑙ℎ𝑖𝑡] ≤ (𝐾/𝑁)^𝐻}. CHECKMEM(𝐻, 𝑏𝑙𝑜𝑜𝑚) : ℎ ← 0; 𝑎𝑙𝑙ℎ𝑖𝑡 ← 1; while ℎ < 𝐻 do 𝑏𝑖𝑛 $← Unif[𝑁]; ℎ𝑖𝑡 ← 𝑏𝑙𝑜𝑜𝑚[𝑏𝑖𝑛]; 𝑎𝑙𝑙ℎ𝑖𝑡 ← ℎ𝑖𝑡 && 𝑎𝑙𝑙ℎ𝑖𝑡; ℎ ← ℎ + 1; Figure 3.4: Check the membership of a new item. Bounding the false positive rate Now, we turn to verifying a bound on the false positive rate of the Bloom filter. Recall that a false positive occurs if the filter returns true when queried with an element that was not inserted. We can encode the membership check of a new element as a program CHECKMEM(𝐻, 𝑏𝑙𝑜𝑜𝑚), listed in Figure 3.4. The program hashes the new element into 𝐻 uniformly random positions and checks if these positions are all set to one in the filter. If so, the Bloom filter will report that the new element is in the set, even though it was never inserted: a false positive. To verify the false positive rate, we place the program CHECKMEM(𝐻, 𝑏𝑙𝑜𝑜𝑚) immediately after BLOOM, and then verify a bound on the probability that 𝑎𝑙𝑙ℎ𝑖𝑡 is 1 at the end of the combined program. CHECKMEM first initializes ℎ to 0 and 𝑎𝑙𝑙ℎ𝑖𝑡 to 1, both deterministically. Then, using RASSN and FRAME, we can show that ⊢ {⊤} ℎ ← 0; 𝑎𝑙𝑙ℎ𝑖𝑡 ← 1 {[ℎ = 0] ∗ [𝑎𝑙𝑙ℎ𝑖𝑡 = 1]}. Using the (ProbOne) axiom and the fact that 1 ≤ (𝐾/𝑁)^0 for any 𝐾 and 𝑁, we can show |= [ℎ = 0] ∗ [𝑎𝑙𝑙ℎ𝑖𝑡 = 1] → Pr[𝑎𝑙𝑙ℎ𝑖𝑡] ≤ (𝐾/𝑁)^ℎ. Thus, ⊢ {⊤} ℎ ← 0; 𝑎𝑙𝑙ℎ𝑖𝑡 ← 1 {Pr[𝑎𝑙𝑙ℎ𝑖𝑡] ≤ (𝐾/𝑁)^ℎ}. 
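As an empirical cross-check of the target bound (𝐾/𝑁)^𝐻, this sketch runs the membership check against a fixed filter with exactly 𝐾 of 𝑁 bits set, a hypothetical instance satisfying bv(𝑏𝑙𝑜𝑜𝑚, 𝐾, 𝑁); the constants are arbitrary:

```python
import random

def check_mem(bloom, H, rng):
    """Transcription of CHECKMEM (fig. 3.4): probe H uniformly random
    positions and report whether all of them are set."""
    allhit = 1
    for _h in range(H):
        bin_ = rng.randrange(len(bloom))   # bin <-$- Unif_[N]
        allhit = bloom[bin_] & allhit      # allhit <- hit && allhit
    return allhit

rng = random.Random(0)
N, K, H, trials = 32, 8, 4, 40_000
bloom = [1] * K + [0] * (N - K)            # exactly K bits set
fp_rate = sum(check_mem(bloom, H, rng) for _ in range(trials)) / trials

# The derived judgment bounds Pr[allhit] by (K/N)^H = (1/4)^4.
assert fp_rate <= (K / N) ** H + 0.01      # up to sampling noise
```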
Because the assignments ℎ← 0; 𝑎𝑙𝑙ℎ𝑖𝑡 ← 1 do not modify the Bloom filter array 𝑏𝑙𝑜𝑜𝑚, we can then apply FRAME to derive ⊢ {bv(𝑏𝑙𝑜𝑜𝑚, < 𝐾, 𝑁)} ℎ← 0; 𝑎𝑙𝑙ℎ𝑖𝑡 ← 1 {bv(𝑏𝑙𝑜𝑜𝑚, < 𝐾, 𝑁) ∗ Pr[𝑎𝑙𝑙ℎ𝑖𝑡] ≤ (𝐾/𝑁)ℎ} . (3.2) We will abbreviate bv(𝑏𝑙𝑜𝑜𝑚, < 𝐾, 𝑁) ∗ Pr[𝑎𝑙𝑙ℎ𝑖𝑡] ≤ (𝐾/𝑁)ℎ ∧ ℎ < 𝐻 as 𝜂. Be- cause ∑𝑁 𝛽=0 𝑏[𝛽] is an integer upper bounded by 𝑁 , |=Mem 𝜂→ ∨ 0≤𝐽<𝐾 𝜂𝐽 , where 𝜂𝐽 abbreviates 𝐽 < 𝐾 ∧ ( bv(𝑏𝑙𝑜𝑜𝑚, 𝐽, 𝑁) ∗ Pr[𝑎𝑙𝑙ℎ𝑖𝑡] ≤ (𝐾/𝑁)ℎ ) ∧ ℎ ≤ 𝐻. We will then prove that for each 𝐽, the formula 𝜂𝐽 is a loop invariant of CHECKMEM’s loop body. The loop body first uniformly samples an element from [𝑁], so by SAMP and FRAME, we have the following as the post-condition: 𝜂𝐽 ∗ Unif [𝑁] ⟨𝑏𝑖𝑛⟩. (3.3) Together with the axiom |= ((𝑃∧𝑄) ∗ 𝑅) → (𝑃∧ (𝑄 ∗ 𝑅)), the post-condition 3.3 implies 𝐽 < 𝐾 ∧ ℎ ≤ 𝐻 ∧ ( bv(𝑏𝑙𝑜𝑜𝑚, 𝐽, 𝑁) ∗ ( Pr[𝑎𝑙𝑙ℎ𝑖𝑡] ≤ (𝐾/𝑁)ℎ ) ∗ Unif [𝑁] ⟨𝑏𝑖𝑛⟩ ) . 90 Then, ℎ𝑖𝑡 gets assigned to 𝑏𝑙𝑜𝑜𝑚 [𝑏𝑖𝑛], so by RASSN and CONST, we have {bv(𝑏𝑙𝑜𝑜𝑚, 𝐽, 𝑁) ∗ Unif [𝑁] ⟨𝑏𝑖𝑛⟩} ℎ𝑖𝑡 ← 𝑏𝑙𝑜𝑜𝑚 [𝛽] { (bv(𝑏𝑙𝑜𝑜𝑚, 𝐽, 𝑁) ∗ Unif [𝑁] ⟨𝑏𝑖𝑛⟩ ) ∧ [ℎ𝑖𝑡 = 𝑏𝑙𝑜𝑜𝑚 [𝑏𝑖𝑛]]} . Since the array 𝑏𝑙𝑜𝑜𝑚 only contains zero-one entries, when the sum of its entries is 𝐽, an entry 𝑏𝑙𝑜𝑜𝑚 [𝑏𝑖𝑛] drawn uniformly at random has probability 𝐽 𝑁 to be 1. If the entry is in addition chosen independently from values in 𝑏𝑙𝑜𝑜𝑚, then the bit 𝑏𝑙𝑜𝑜𝑚 [𝑏𝑖𝑛] is distributed independent from the distribution of 𝑏𝑙𝑜𝑜𝑚. The (UniformSamp) axiom encodes this fact: |= ( ( bv(𝑏, 𝐽, 𝑁) ∗ Unif [𝑁] ⟨𝑥⟩ ) ∧ [ℎ𝑖𝑡 = 𝑏[𝑥]] ) → Bern 𝐽 𝑁 ⟨ℎ𝑖𝑡⟩ ∗ bv(𝑏, 𝐽, 𝑁). Thus, we have {bv(𝑏𝑙𝑜𝑜𝑚, 𝐽, 𝑁) ∗ Unif [𝑁] ⟨𝑏𝑖𝑛⟩} ℎ𝑖𝑡 ← 𝑏𝑙𝑜𝑜𝑚 [𝛽] {Bern 𝐽 𝑁 ⟨ℎ𝑖𝑡⟩ ∗ bv(𝑏𝑙𝑜𝑜𝑚, 𝐽, 𝑁)} . Because ℎ𝑖𝑡 ← 𝑏𝑙𝑜𝑜𝑚 [𝑏𝑖𝑛] does not modify 𝑎𝑙𝑙ℎ𝑖𝑡, we can apply FRAME and get{ bv(𝑏𝑙𝑜𝑜𝑚, 𝐽, 𝑁) ∗ Unif [𝑁] ⟨𝑏𝑖𝑛⟩ ∗ Pr[𝑎𝑙𝑙ℎ𝑖𝑡] ≤ ( 𝐾 𝑁 )ℎ} ℎ𝑖𝑡 ← 𝑏𝑙𝑜𝑜𝑚 [𝛽]{ Bern 𝐽 𝑁 ⟨ℎ𝑖𝑡⟩ ∗ bv(𝑏𝑙𝑜𝑜𝑚, 𝐽, 𝑁) ∗ Pr[𝑎𝑙𝑙ℎ𝑖𝑡] ≤ ( 𝐾 𝑁 )ℎ} . 
Next, with the assignment 𝑎𝑙𝑙ℎ𝑖𝑡 ← ℎ𝑖𝑡&& 𝑎𝑙𝑙ℎ𝑖𝑡, by applying the RASSN rule and the axioms (IndepProb), (EqualProb), we get:{ Bern 𝐽 𝑁 ⟨ℎ𝑖𝑡⟩ ∗ bv(𝑏𝑙𝑜𝑜𝑚, 𝐽, 𝑁) ∗ Pr[𝑎𝑙𝑙ℎ𝑖𝑡] ≤ ( 𝐾 𝑁 )ℎ} 𝑎𝑙𝑙ℎ𝑖𝑡 ← ℎ𝑖𝑡&& 𝑎𝑙𝑙ℎ𝑖𝑡{( Pr[𝑎𝑙𝑙ℎ𝑖𝑡] ≤ 𝐽 𝑁 · ( 𝐾 𝑁 )ℎ) ∗ bv(𝑏𝑙𝑜𝑜𝑚, 𝐽, 𝑁) } 91 We can then apply the rule of constancy CONST and get{ 𝐽 < 𝐾 ∧ ℎ ≤ 𝐻 ∧ ( bv(𝑏𝑙𝑜𝑜𝑚, 𝐽, 𝑁) ∗ Unif [𝑁] ⟨𝑏𝑖𝑛⟩ ∗ Pr[𝑎𝑙𝑙ℎ𝑖𝑡] ≤ ( 𝐾 𝑁 )ℎ)} ℎ𝑖𝑡 ← 𝑏𝑙𝑜𝑜𝑚 [𝛽]; 𝑎𝑙𝑙ℎ𝑖𝑡 ← ℎ𝑖𝑡&& 𝑎𝑙𝑙ℎ𝑖𝑡{ 𝐽 < 𝐾 ∧ ℎ ≤ 𝐻 ∧ (( Pr[𝑎𝑙𝑙ℎ𝑖𝑡] ≤ 𝐽 𝑁 · ( 𝐾 𝑁 )ℎ) ∗ bv(𝑏𝑙𝑜𝑜𝑚, 𝐽, 𝑁) )} When we have 𝐽 < 𝐾 , then (𝐾/𝑁)ℎ · 𝐽 𝑁 ≤ (𝐾/𝑁)ℎ+1, so the postcondition implies 𝐽 < 𝐾 ∧ ℎ ≤ 𝐻 ∧ (( Pr[𝑎𝑙𝑙ℎ𝑖𝑡] ≤ ( 𝐾 𝑁 )ℎ+1) ∗ bv(𝑏𝑙𝑜𝑜𝑚, 𝐽, 𝑁) ) The last step in the loop body is the assignment ℎ← ℎ + 1. By the deterministic assignment rule DASSN, we can establish the postcondition 𝜂𝐽 afterwards: 𝐽 < 𝐾 ∧ ℎ ≤ 𝐻 ( bv(𝑏𝑙𝑜𝑜𝑚, 𝐽, 𝑁) ∗ Pr[𝑎𝑙𝑙ℎ𝑖𝑡] ≤ ( 𝐾 𝑁 )ℎ) . Thus, we have {𝜂𝐽} loop body {𝜂𝐽} By LOOP rule, we can establish {𝜂𝐽} 𝑙𝑜𝑜𝑝 {𝜂𝐽 ∧ ℎ ≥ 𝐻}. Recall 𝜂 abbreviates bv(𝑏𝑙𝑜𝑜𝑚, < 𝐾, 𝑁) ∗ Pr[𝑎𝑙𝑙ℎ𝑖𝑡] ≤ (𝐾/𝑁)ℎ∧ℎ < 𝐻, so the post-condition 𝜂𝐽∧ℎ ≥ 𝐻 implies ℎ = 𝐻, which further implies Pr[𝑎𝑙𝑙ℎ𝑖𝑡] ≤ (𝐾/𝑁)𝐻 . We then have {𝜂𝐽} 𝑙𝑜𝑜𝑝 {Pr[𝑎𝑙𝑙ℎ𝑖𝑡] ≤ ( 𝐾 𝑁 )𝐻 } . Because Pr[𝑎𝑙𝑙ℎ𝑖𝑡] ≤ (𝐾/𝑁)𝐻 is closed under mixtures, and 𝜂 is closed under conditioning, we can then apply RCASE to prove that {𝜂} 𝑙𝑜𝑜𝑝 {Pr[𝑎𝑙𝑙ℎ𝑖𝑡] ≤ ( 𝐾 𝑁 )𝐻 } . (3.4) Using the SEQN rule to combine the proved judgments for CHECKMEM’s ini- tialization (3.2) and loop (3.4), we derive {bv(𝑏𝑙𝑜𝑜𝑚, < 𝐾, 𝑁)} CHECKMEM {Pr[𝑎𝑙𝑙ℎ𝑖𝑡] ≤ ( 𝐾 𝑁 )𝐻 } 92 Then, by the PROBBOUND rule and basic axioms about probabilities, we have {Pr  𝑁∑︁ 𝛽=0 𝑏𝑙𝑜𝑜𝑚 [𝛽] < 𝐾  ≥ 1 − 𝛿 ∧ 𝑁∧ 𝛽=0 (𝑏[𝛽] = 0 ∨ 𝑏[𝛽] = 1)} CHECKMEM {Pr[𝑎𝑙𝑙ℎ𝑖𝑡] ≤ (𝐾/𝑁)𝐻 + 𝛿} . (3.5) We then use SEQN to combine the proved judgements for BLOOM (3.1) and CHECKMEM (3.5) to derive that, for any 𝛿, {⊤} BLOOM; CHECKMEM {Pr[𝑎𝑙𝑙ℎ𝑖𝑡] ≤ ©­­« E [∑𝑁 𝛽=0 𝑏𝑙𝑜𝑜𝑚 [𝛽] ] + 𝑇 (𝛿, 𝑁) 𝑁 ª®®¬ 𝐻 + 𝛿} . 
Since 𝑎𝑙𝑙ℎ𝑖𝑡 is 1 exactly when there is a false positive, this judgment proves an upper bound on the false positive rate of the Bloom filter.4 3.6.3 Bloom filter, Low-level The previous Bloom filter uses a vector operation 𝑏𝑙𝑜𝑜𝑚 | | 𝑏𝑖𝑛 to transform an array of negatively associated values. We next consider a lower-level version of the previous example, BLOOMARRAY, in Figure 3.3b, where the vector operation is replaced by a loop that applies the Boolean-or. Let outer and mid be the outer-most and second outer-most loops, and let inner be the inner-most loop. Again, our goal is to show that the vector 𝑏𝑙𝑜𝑜𝑚 is negatively associated at the end of the program. We first prove the following 4The precise expected value is 𝑁 · (1 − (1 − 1/𝑁)𝑀 ·𝐻 ), a fact that can also be shown in our logic. Roughly speaking, this fact follows because each element of 𝑏𝑙𝑜𝑜𝑚 is the logical-or of 𝑀 ·𝐻 probabilistically independent bits, each 1 with probability 1/𝑁 and 0 otherwise. This argument does not rely on negative association. 93 judgment for inner: { 𝑁 ⊛ 𝛽=0 Own(𝑏𝑙𝑜𝑜𝑚 [𝛽]) ∗ 𝑁 ⊛ 𝛾=0 Own(𝑏𝑖𝑛[𝛾])} inner { 𝑁 ⊛ 𝛽=0 Own(𝑏𝑙𝑜𝑜𝑚 [𝛽]) ⊛ 𝑁 ⊛ 𝛾=𝑛 Own(𝑏𝑖𝑛[𝛾])} We will apply the rule LOOP on inner with the following loop invariant: 𝜑 = 𝑁 ⊛ 𝛽=0 Own(𝑏𝑙𝑜𝑜𝑚 [𝛽]) ⊛ 𝑁 ⊛ 𝛾=𝑛 Own(𝑏𝑖𝑛[𝛾]) To show that the loop invariant is preserved by the body, we can first show: {Own(𝑏𝑙𝑜𝑜𝑚 [𝑛], 𝑏𝑖𝑛[𝑛])} 𝑢𝑝𝑑 ← 𝑏𝑙𝑜𝑜𝑚 [𝑛] | | 𝑏𝑖𝑛[𝑛] {[𝑢𝑝𝑑 = 𝑏𝑙𝑜𝑜𝑚 [𝑛] | | 𝑏𝑖𝑛[𝑛]]} using RASSN. Noting that the boolean-or operator is a monotone operation, we may apply NEGFRAME to obtain: {Own(𝑏𝑙𝑜𝑜𝑚 [𝑛], 𝑏𝑖𝑛[𝑛]) ⊛ 𝜂} 𝑢𝑝𝑑 ← 𝑏𝑙𝑜𝑜𝑚 [𝑛] | | 𝑏𝑖𝑛[𝑛] {Own(𝑢𝑝𝑑) ⊛ 𝜂} with the framing condition 𝜂 = ©­« 𝑛−1 ⊛ 𝛽=0 Own(𝑏𝑙𝑜𝑜𝑚 [𝛽])ª®¬ ⊛ ©­« 𝑁 ⊛ 𝛽=𝑛+1 Own(𝑏𝑙𝑜𝑜𝑚 [𝛽])ª®¬ ⊛ ©­« 𝑁 ⊛ 𝛾=𝑛+1 Own(𝑏𝑖𝑛[𝛾])ª®¬. 
Thus, by re-associating the separating conjunction and applying DASSN for the remaining two assignments in the inner-most loop, we have: {𝜑} 𝑢𝑝𝑑 ← 𝑏𝑙𝑜𝑜𝑚 [𝑛] | | 𝑏𝑖𝑛[𝑛]; 𝑏𝑙𝑜𝑜𝑚 [𝑛] ← 𝑢𝑝𝑑; 𝑛← 𝑛 + 1 {𝜑} and thus by LOOP, we have: { 𝑁 ⊛ 𝛽=0 Own(𝑏𝑙𝑜𝑜𝑚 [𝛽]) ⊛ 𝑁 ⊛ 𝛾=𝑛 Own(𝑏𝑖𝑛[𝛾])} inner { 𝑁 ⊛ 𝛽=0 Own(𝑏𝑙𝑜𝑜𝑚 [𝛽]) ⊛ 𝑁 ⊛ 𝛾=𝑛 Own(𝑏𝑖𝑛[𝛾])} . Now for loop mid, we establish the same loop invariant as we took before: 𝜓 = 𝑁 ⊛ 𝛽=0 Own(𝑏𝑙𝑜𝑜𝑚 [𝛽]) 94 If 𝜓 holds at the beginning of mid, then invariant for the inner-most loop 𝜑 holds after assigning 0 to 𝑛 and sampling 𝑏𝑖𝑛, since 𝑏𝑖𝑛 is independent of 𝜓 and 𝑏𝑖𝑛 is distributed as OH𝑛, which implies entries in 𝑏𝑖𝑛 are negatively associated (OH-PNA). Furthermore, 𝜑 implies 𝜓 at the exit of inner, by dropping the con- junct describing 𝑏𝑖𝑛. Thus, 𝜓 is a valid invariant for mid, and the rest of the proof proceeds unchanged. 3.6.4 Permutation Hashing PERMHASH : 𝑔 $← Permu[𝐵·𝐾] ; 𝑛← 0; 𝑐𝑡 ← 0; while 𝑛 < 𝑁 do 𝑏𝑖𝑛[𝑛] ← 𝑚𝑜𝑑 (𝑔[𝑛], 𝐵); ℎ𝑖𝑡𝑍 [𝑛] ← [𝑏𝑖𝑛[𝑛] = 𝑍]; 𝑐𝑡 ← 𝑐𝑡 + ℎ𝑖𝑡𝑍 [𝑛]; 𝑛← 𝑛 + 1 Figure 3.5: Permutation hashing Our second example considers a scheme for hashing using a random per- mutation. Consider the program in Figure 3.5, from an algorithm for fast set intersection [Ding and König, 2011]. Letting 𝐵 be the number of bins, and the data universe be [𝐵 · 𝐾] = {1, . . . , 𝐵 · 𝐾} where 𝐵 · 𝐾 ≥ 𝑁 , we first draw a uni- formly random permutation 𝑔 of the data universe. Then, we hash the numbers 𝑛 ∈ [𝑁] into 𝑏𝑖𝑛[𝑛] by applying the hash function 𝑔 and then taking the result modulo 𝐵. Then, we record whether the item landed in a specific bucket 𝑍 by computing the indicator ℎ𝑖𝑡𝑍 [𝑛] = [𝑏𝑖𝑛[𝑛] = 𝑍], which is 1 if 𝑏𝑖𝑛[𝑛] = 𝑍 and 0 otherwise, and accumulate the result into the count 𝑐𝑡. 95 Our goal is to show that 𝑐𝑡 is usually not far from its expected value, which is 𝑁/𝐵. If the quantities {[𝑏𝑖𝑛[𝑛] = 𝑍]}𝑛 were independent, we would be able to apply a standard concentration bound to the sum 𝑐𝑡. 
However, {[𝑏𝑖𝑛[𝑛] = 𝑍]}𝑛 are not independent: for instance, since exactly 𝐾 elements from [𝐵 · 𝐾] map to 𝑍 , if 𝑏𝑖𝑛[𝑛] = 𝑍 for 𝑛 ∈ {0, 1, . . . , 𝐾 − 1}, then 𝑏𝑖𝑛[𝐾] = 𝑍 must be false. Nevertheless, we can show that {[𝑏𝑖𝑛[𝑛] = 𝑍]}𝑛 are negatively associated random variables. Intuitively, {𝑔[𝑛]}𝑛 are NA random variables because the result of a uniformly random permutation is NA. Then, {𝑏𝑖𝑛[𝑛]}𝑛 is computed by mapping the function𝑚𝑜𝑑 (−, 𝐵) over the array 𝑔; since this produces another uniform permutation distribution, the vector {𝑏𝑖𝑛[𝑛]}𝑛 is also NA. By similar reasoning {[𝑏𝑖𝑛[𝑛] = 𝑍]}𝑛 is also NA, as it is obtained by mapping the function [− = 𝑍] over {𝑏𝑖𝑛[𝑛]}𝑛. We formalize the reasoning using the program logic LINA. For the main loop, we apply the rule LOOP with the following loop invariant: 𝑛∧ 𝛼=0 [ℎ𝑖𝑡𝑍 [𝛼] = [𝑚𝑜𝑑 (𝑔[𝛼], 𝐵) = 𝑍]] ∧ Permu[𝐵·𝐾] ⟨𝑔⟩ ∧ 𝑐𝑡 = 𝑛∑︁ 𝛼=0 ℎ𝑖𝑡𝑍 [𝛼] ∧ ((𝑛 ≥ 𝑁) → [𝑛 = 𝑁]) The loop invariant is preserved by the body of the loop, using RASSN and CONST. Thus we can show the following judgment: {[𝑐𝑡 = 0] ∧ [𝑛 = 0]} 𝑙𝑜𝑜𝑝 { 𝑁∧ 𝛼=0 [ℎ𝑖𝑡𝑍 [𝛼] = [𝑚𝑜𝑑 (𝑔[𝛼], 𝐵) = 𝑍]] ∧ Permu[𝐵·𝐾] ⟨𝑔⟩ ∧ [ 𝑐𝑡 = 𝑁∑︁ 𝛼=0 ℎ𝑖𝑡𝑍 [𝛼] ] } 96 Applying (Perm-Map), the post-condition implies: 𝑁∧ 𝛼=0 [ℎ𝑖𝑡𝑍 [𝛼] = [𝑚𝑜𝑑 (𝑔[𝛼], 𝐵) = 𝑍]] ∧ 𝑁∗ 𝛼=0 Own(ℎ𝑖𝑡𝑍 [𝛼]) ∧ Permu[𝐵·𝐾] ⟨𝑔⟩ ∧ [ 𝑐𝑡 = 𝑁∑︁ 𝛼=0 ℎ𝑖𝑡𝑍 [𝛼] ] Applying basic axioms about expected value and the permutation distribution ((PermMarg) (ProbUnif) (BijectUnif)), we have: 𝑁∗ 𝛼=0 Own(ℎ𝑖𝑡𝑍 [𝛼]) ∧ [ 𝑐𝑡 = 𝑁∑︁ 𝛼=0 ℎ𝑖𝑡𝑍 [𝛼] ] ∧ [E[𝑐𝑡] = 𝑁/𝐵] And we can apply the negative-association Chernoff bound (NA-Chernoff-2) to conclude: {⊤} PERMHASH {Pr[|𝑐𝑡 − 𝑁/𝐵 | > 𝑇 (𝛽, 𝑁)] < 𝛽} This conclusion corresponds to Proposition A.2 in Ding and König [2011] algo- rithm for fast set intersection.5 3.6.5 Fully-dynamic Dictionary For our next example, we consider a hashing scheme for a fully-dynamic dic- tionary, a space-efficient data structure that supports insertions, deletions, and membership queries. 
The top level of the data structure by Bercea and Even [2022] uses a two-level hashing scheme: elements are first hashed into a crate, and then hashed into a pocket dictionary within each crate. As part of the space analysis of their scheme, Bercea and Even [2022] proves a high-probability 5Ding and König [2011] apply a variant of the Chernoff bound to obtain a multiplicative, rather than an additive, error guarantee. We present the additive version since the bound is a bit simpler, but there is no difficulty in handling the multiplicative version in our framework. 97 FDDICT : 𝑏𝑖𝑛𝐶𝑡 ← zero(𝐶, 𝑃); 𝑜𝑣𝑒𝑟𝐶𝑡 ← zero(𝐶); 𝑛← 0; while 𝑛 < 𝑁 do 𝑐𝑟𝑎𝑡𝑒[𝑛] $← OH[𝐶] ; 𝑝𝑜𝑐𝑘𝑒𝑡 [𝑛] $← OH[𝑃] ; 𝑏𝑖𝑛[𝑛] ← 𝑐𝑟𝑎𝑡𝑒[𝑛]⊤ · 𝑝𝑜𝑐𝑘𝑒𝑡 [𝑛]; 𝑐 ← 0; while 𝑐 < 𝐶 do 𝑝 ← 0; while 𝑝 < 𝑃 do 𝑢𝑝𝑑 ← 𝑏𝑖𝑛𝐶𝑡 [𝑐] [𝑝] + 𝑏𝑖𝑛[𝑛] [𝑐] [𝑝]; 𝑏𝑖𝑛𝐶𝑡 [𝑐] [𝑝] ← 𝑢𝑝𝑑; 𝑝 ← 𝑝 + 1; 𝑐 ← 𝑐 + 1; 𝑛← 𝑛 + 1; 𝑐 ← 0; while 𝑐 < 𝐶 do 𝑝 ← 0; while 𝑝 < 𝑃 do 𝑜𝑣𝑒𝑟 [𝑐] [𝑝] ← [𝑏𝑖𝑛𝐶𝑡 [𝑐] [𝑝] > 𝑇𝑏𝑖𝑛]; 𝑢𝑝𝑑 ← 𝑜𝑣𝑒𝑟𝐶𝑡 [𝑐] + 𝑜𝑣𝑒𝑟 [𝑐] [𝑝]; 𝑜𝑣𝑒𝑟𝐶𝑡 [𝑐] ← 𝑢𝑝𝑑; 𝑝 ← 𝑝 + 1; 𝑐 ← 𝑐 + 1 Figure 3.6: Fully-dynamic dictionary [Ding and König, 2011] bound on the number of pocket dictionaries that overflow after a given number of elements are inserted. We extract the program FDDICT in Figure 3.6 from the scheme in Bercea and Even [2022]. The program models the insertion of 𝑁 elements. Each ele- ment is first hashed into one of 𝐶 possible crates uniformly at random and then hashed into one of 𝑃 possible pocket dictionaries uniformly at random. The variable 𝑏𝑖𝑛[𝑛] is a 𝐶 by 𝑃 matrix, with all entries zero except for the entry at (𝑐𝑟𝑎𝑡𝑒[𝑛], 𝑝𝑜𝑐𝑘𝑒𝑡 [𝑛]), which is set to 1. Next, the program totals up the number 98 of elements hashing to each (crate, pocket) pair, storing the result in the 𝐶 by 𝑃 matrix 𝑏𝑖𝑛𝐶𝑡. Finally, the program checks which (𝑐𝑟𝑎𝑡𝑒, 𝑝𝑜𝑐𝑘𝑒𝑡) pairs have count larger than some concrete threshold 𝑇𝑏𝑖𝑛 and records that in 𝑜𝑣𝑒𝑟, and totals up the number of full pocket dictionaries in each crate (𝑜𝑣𝑒𝑟𝐶𝑡). 
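The counting phase of FDDICT can be condensed to a few lines to make these quantities concrete. This sketch uses arbitrary constants, not taken from Bercea and Even [2022]: each item lands in a uniformly random (crate, pocket) cell, and 𝑏𝑖𝑛𝐶𝑡 and 𝑜𝑣𝑒𝑟𝐶𝑡 are tallied directly:

```python
import random

def fddict_counts(N, C, P, rng):
    """Condensed counting phase of FDDICT (fig. 3.6): each of N items lands
    in a uniform (crate, pocket) cell; return the C-by-P count matrix binCt."""
    bin_ct = [[0] * P for _ in range(C)]
    for _n in range(N):
        bin_ct[rng.randrange(C)][rng.randrange(P)] += 1
    return bin_ct

rng = random.Random(11)
N, C, P, T_bin = 2_000, 4, 5, 150
bin_ct = fddict_counts(N, C, P, rng)
over_ct = [sum(count > T_bin for count in row) for row in bin_ct]  # overCt per crate

assert sum(sum(row) for row in bin_ct) == N   # every item lands in exactly one cell
# E[binCt[c][p]] = N/(C*P) = 100; a threshold of 150 sits well above the mean,
# so overflows should be rare, and each crate has at most P overfull pockets.
assert all(0 <= o <= P for o in over_ct)
```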
Our logic can prove a judgment of the following form for 𝑇𝑏𝑖𝑛 ≥ 𝑁/(𝑃 · 𝐶): {⊤} FDDICT { 𝐶∧ 𝛾=0 Pr[𝑜𝑣𝑒𝑟𝐶𝑡 [𝛾] ≥ 𝑃 · 𝐹 (𝑇𝑏𝑖𝑛 − 𝑁/(𝑃 · 𝐶), 𝑁) + 𝑇 (𝜌𝑜𝑣𝑒𝑟 , 𝑃)] ≤ 𝜌𝑜𝑣𝑒𝑟}, where the logical variables 𝜌𝑏𝑖𝑛 and 𝜌𝑜𝑣𝑒𝑟 represents the parametric overflow properties. This formalizes a result similar to Bercea and Even [2022, Claim 21], which states that except with probability 𝛽, all crates have at most 𝑇𝑜𝑣𝑒𝑟 overfull pocket dictionaries. The core of the proof shows that for every crate index 𝛾, the counts 𝑏𝑖𝑛𝐶𝑡 [𝛾] [𝛽] are negatively associated, using the NEGFRAME rule as in the array version of the Bloom filter example. Then, we show that vector 𝑜𝑣𝑒𝑟 [𝛾] [𝛽], which indicates whether each pocket dictionary 𝛽 in crate 𝛾 is overfull or not, is also negatively associated. This holds because 𝑜𝑣𝑒𝑟 [𝛾] [𝛽] is obtained from 𝑏𝑖𝑛𝐶𝑡 [𝛾] [𝛽] by applying a monotone function. Furthermore, the count of overflows 𝑜𝑣𝑒𝑟𝐶𝑡 [𝛾] is obtained by another monotone function on 𝑜𝑣𝑒𝑟 [𝛾] [𝛽] and thus its entries are also negatively associated. Now we prove each step using the program logic. We will refer to the two outer-most loops as (1) and (2), the next two outer-most loops as (1.1) and (2.1), and the inner-most loop as (1.1.1). Computing E[𝑏𝑖𝑛𝐶𝑡 [𝑐] [𝑝]]. For loop (1), we apply LOOP with the following loop invariant 𝜑: 𝐶∧ 𝛾=0 𝑃∧ 𝛽=0 [E[𝑏𝑖𝑛𝐶𝑡 [𝛾] [𝛽]] = 𝑛/(𝑃 · 𝐶)] ∧ ([𝑛 ≥ 𝑁] → [𝑛 = 𝑁]) ∧ Detm⟨𝑛⟩. 99 To show that this invariant is preserved by the loop, by applications of SAMP and RASSN and CONST and FRAME, the following holds after the sampling and the assignment: OH[𝑃] ⟨𝑝𝑜𝑐𝑘𝑒𝑡 [𝑛]⟩ ∗ OH[𝐶] ⟨𝑐𝑟𝑎𝑡𝑒[𝑛]⟩ ∧ [ 𝑏𝑖𝑛[𝑛] = 𝑐𝑟𝑎𝑡𝑒[𝑛]⊤ · 𝑝𝑜𝑐𝑘𝑒𝑡 [𝑛] ] . (3.6) Using an axiom about independence and products of one-hot vectors (IndProdOH), this implies: OH[𝐶]×[𝑃] ⟨𝑏𝑖𝑛[𝑛]⟩. Using an axiom about the one-hot encoding (OHMarg): E[𝑏𝑖𝑛[𝛼] [𝛾] [𝛽]] = 1/(𝑃 · 𝐶) for every 𝛼, 𝛾, and 𝛽. 
Standard loop invariants for loop (1.1) and (1.1.1) show that: [ 𝑏𝑖𝑛𝐶𝑡 [𝑐] [𝑝] = 𝑛∑︁ 𝛼=0 𝑏𝑖𝑛[𝛼] [𝑐] [𝑝] ] , and linearity of expectation establishes the invariant condition 3.6 for loop (1). The invariant holds at the start of the loop (1) since 𝑏𝑖𝑛𝐶𝑡 is zero-initialized, and it also holds at the end of the loop (1). Since 𝑏𝑖𝑛𝐶𝑡 is not modified further, the expectation equality remains valid at the end of the program due to CONST. Bounding Pr[𝑏𝑖𝑛𝐶𝑡 [𝑐] [𝑝] > 𝑇𝑏𝑖𝑛]. For loop (1), we also apply LOOP with the following loop invariant:( 𝑛∗ 𝛼=0 Own(𝑏𝑖𝑛[𝛼]) ) ∧ 𝐶∧ 𝛾=0 𝑃∧ 𝛽=0 [ 𝑏𝑖𝑛𝐶𝑡 [𝛾] [𝛽] = 𝑁∑︁ 𝛼=0 𝑏𝑖𝑛[𝛼] [𝛾] [𝛽] ] ∧ [𝑛 = 𝑁] ∧Detm⟨𝑛⟩. The first conjunction is an invariant, by applying SAMP and FRAME. The rest of the invariant is preserved, following standard invariants for loops (1.1) and 100 (1.1.1). By projection (IndMap), at the end of the loop (1) we can conclude: 𝐶∧ 𝛾=0 𝑃∧ 𝛽=0 ( 𝑁∗ 𝛼=0 Own(𝑏𝑖𝑛[𝛼] [𝛾] [𝛽]) ) ∧ [ 𝑏𝑖𝑛𝐶𝑡 [𝛾] [𝛽] = 𝑁∑︁ 𝛼=0 𝑏𝑖𝑛[𝛼] [𝛾] [𝛽] ] . Thus, a standard Chernoff bound gives (here, we apply NA-Chernoff-1 to in- dependent variables by first changing the independence star into NA star us- ing WEAK) : 𝐶∧ 𝛾=0 𝑃∧ 𝛽=0 Pr[𝑏𝑖𝑛𝐶𝑡 [𝛾] [𝛽] ≥ E[𝑏𝑖𝑛𝐶𝑡 [𝛾] [𝛽]] + 𝜌𝑏𝑖𝑛] ≤ 𝐹 (𝜌𝑏𝑖𝑛, 𝑁). where E[𝑏𝑖𝑛𝐶𝑡 [𝛾] [𝛽]] is 𝑁/(𝑃 ·𝐶) by the previous step. Thus, by applying CONJ, we can combine the post-conditions and derive 𝐶∧ 𝛾=0 𝑃∧ 𝛽=0 Pr[𝑏𝑖𝑛𝐶𝑡 [𝛾] [𝛽] ≥ 𝑁/(𝑃 · 𝐶) + 𝜌𝑏𝑖𝑛] ≤ 𝐹 (𝜌𝑏𝑖𝑛, 𝑁). (3.7) Again, the property holds until the end of the program since 𝑏𝑖𝑛𝐶𝑡 is not modi- fied further (CONST). Bounding E[𝑜𝑣𝑒𝑟𝐶𝑡 [𝑐]]. Using standard loop invariants, at the end of the loop (2) we have: 𝐶∧ 𝛾=0 𝑜𝑣𝑒𝑟𝐶𝑡 [𝛾] = 𝑃∑︁ 𝛽=0 𝑜𝑣𝑒𝑟 [𝛾] [𝛽]  ∧ 𝑃∧ 𝛽=0 [𝑜𝑣𝑒𝑟 [𝛾] [𝛽] = [𝑏𝑖𝑛𝐶𝑡 [𝛾] [𝛽] > 𝑇𝑏𝑖𝑛]] . Using linearity of expectation and the fact that 𝑜𝑣𝑒𝑟 [𝛾] [𝛽] is either zero or one, we have: E[𝑜𝑣𝑒𝑟𝐶𝑡 [𝛾]] = 𝑃∑︁ 𝛽=0 E[𝑜𝑣𝑒𝑟 [𝛾] [𝛽]] = 𝑃∑︁ 𝛽=0 E[𝑏𝑖𝑛𝐶𝑡 [𝛾] [𝛽] > 𝑇𝑏𝑖𝑛] = 𝑃∑︁ 𝛽=0 Pr[𝑏𝑖𝑛𝐶𝑡 [𝛾] [𝛽] > 𝑇𝑏𝑖𝑛] 101 Because the bound we obtained in eq. 
(3.7), 𝑃∑︁ 𝛽=0 Pr[𝑏𝑖𝑛𝐶𝑡 [𝛾] [𝛽] > 𝑇𝑏𝑖𝑛] ≤ 𝑃∑︁ 𝛽=0 𝐹 (𝑇𝑏𝑖𝑛 − 𝑁/(𝑃 · 𝐶), 𝑁) = 𝑃 · 𝐹 (𝑇𝑏𝑖𝑛 − 𝑁/(𝑃 · 𝐶), 𝑁) for 𝑇𝑏𝑖𝑛 > 𝑁/(𝑃 · 𝐶). Thus, E[𝑜𝑣𝑒𝑟𝐶𝑡 [𝛾]] = 𝑃 · 𝐹 (𝑇𝑏𝑖𝑛 − 𝑁/(𝑃 · 𝐶), 𝑁). Bounding Pr[𝑜𝑣𝑒𝑟𝐶𝑡 [𝑐]] > 𝑇𝑜𝑣𝑒𝑟]. At the high level, we want the following loop invariant for Loop (1): 𝐶∧ 𝛾=0 𝑃 ⊛ 𝛽=0 Own(𝑏𝑖𝑛𝐶𝑡 [𝛾] [𝛽]) We want the following loop invariant for (1.1): 𝐶∧ 𝛾=𝑐 𝑃 ⊛ 𝛽=0 Own(𝑏𝑖𝑛𝐶𝑡 [𝛾] [𝛽]) ⊛ 𝑃 ⊛ 𝛽=0 Own(𝑏𝑖𝑛[𝑛] [𝛾] [𝛽]) ∧ 𝑐∧ 𝛾=0 𝑃 ⊛ 𝛽=0 Own(𝑏𝑖𝑛𝐶𝑡 [𝛾] [𝛽]) ∧ [𝑐 ≤ 𝐶] ∧ Detm⟨𝑐⟩ And the following loop invariant for (1.1.1): 𝐶∧ 𝛾=𝑐+1 𝑃 ⊛ 𝛽=0 Own(𝑏𝑖𝑛𝐶𝑡 [𝛾] [𝛽]) ⊛ 𝑃 ⊛ 𝛽=0 Own(𝑏𝑖𝑛[𝑛] [𝛾] [𝛽]) ∧ 𝑃 ⊛ 𝛽=0 Own(𝑏𝑖𝑛𝐶𝑡 [𝑐] [𝛽]) ⊛ 𝑃 ⊛ 𝛽=𝑝 Own(𝑏𝑖𝑛[𝑛] [𝑐] [𝛽]) ∧ 𝑐∧ 𝛾=0 𝑃 ⊛ 𝛽=0 Own(𝑏𝑖𝑛𝐶𝑡 [𝛾] [𝛽]) ∧ [𝑐 ≤ 𝐶] ∧ Detm⟨𝑐⟩ ∧ [𝑝 ≤ 𝑃] ∧ Detm⟨𝑝⟩ We show the loops preserve the respective invariant for a fixed 𝛾; the big con- junction then follows by applying CONJ. Working from inside to outside, we 102 start with loop (1.1.1). To establish the invariant condition, the critical case is 𝛾 = 𝑐. We can pull out: 𝜑 := 𝑝 ⊛ 𝛽=0 Own(𝑏𝑖𝑛𝐶𝑡 [𝑐] [𝛽]) ⊛ 𝑃 ⊛ 𝛽=𝑝+1 Own(𝑏𝑖𝑛𝐶𝑡 [𝑐] [𝛽]) ⊛ 𝑃 ⊛ 𝛽=𝑝+1 Own(𝑏𝑖𝑛[𝑛] [𝑐] [𝛽]) ⊛ Own(𝑏𝑖𝑛𝐶𝑡 [𝑐] [𝑝]) ⊛ Own(𝑏𝑖𝑛[𝑛] [𝑐] [𝑝])︸ ︷︷ ︸ Φ Now, we can use the assignment rule to show: {Φ} 𝑢𝑝𝑑 ← 𝑏𝑖𝑛𝐶𝑡 [𝑐] [𝑝] + 𝑏𝑖𝑛[𝑛] [𝑐] [𝑝] {𝜑 ∧ [𝑢𝑝𝑑 = 𝑏𝑖𝑛𝐶𝑡 [𝑐] [𝑝] + 𝑏𝑖𝑛[𝑛]𝑐𝑝]} Since addition is a monotone function, the NA frame rule NEGFRAME gives: 𝑝 ⊛ 𝛽=0 Own(𝑏𝑖𝑛𝐶𝑡 [𝑐] [𝛽]) ⊛ 𝑃 ⊛ 𝛽=𝑝+1 Own(𝑏𝑖𝑛𝐶𝑡 [𝑐] [𝛽]) ⊛ 𝑃 ⊛ 𝛽=𝑝+1 Own(𝑏𝑖𝑛[𝑛] [𝑐] [𝛽]) ⊛ Own(𝑢𝑝𝑑) after the assignment to 𝑢𝑝𝑑. After the assignment to 𝑏𝑖𝑛[𝑐] [𝑝], we can fold it into 𝑃 ⊛ 𝛽=0 Own(𝑏𝑖𝑛𝐶𝑡 [𝑐] [𝛽]) ⊛ 𝑃 ⊛ 𝛽=𝑝+1 Own(𝑏𝑖𝑛[𝑛] [𝑐] [𝛽]). Then, by applying DASSN, we get 𝑃 ⊛ 𝛽=0 Own(𝑏𝑖𝑛𝐶𝑡 [𝑐] [𝛽]) ⊛ 𝑃 ⊛ 𝛽=𝑝 Own(𝑏𝑖𝑛[𝑛] [𝑐] [𝛽]). The assertion for all the other 𝛾 remains unchanged, so we establish the invari- ant for loop (1.1.1). Now we reason about the loop (1.1). Since 𝑏𝑖𝑛𝐶𝑡 is zero-initialized (DetInd), the invariant for loop (1.1.1) holds on loop entry. 
Then, apply LOOP with the 103 loop invariant established above for loop (1.1.1) gives us 𝐶∧ 𝛾=𝑐+1 𝑃 ⊛ 𝛽=0 Own(𝑏𝑖𝑛𝐶𝑡 [𝛾] [𝛽]) ⊛ 𝑃 ⊛ 𝛽=0 Own(𝑏𝑖𝑛[𝑛] [𝛾] [𝛽]) ∧ 𝑐+1∧ 𝛾=0 𝑃 ⊛ 𝛽=0 Own(𝑏𝑖𝑛𝐶𝑡 [𝛾] [𝛽]) ∧ [𝑐 < 𝐶] ∧ Detm⟨𝑐⟩ after the termination of loop (1.1.1). The program then deterministically in- creases 𝑐 by 1, and by DASSN, we can establish the loop invariant for loop 1.1. Similarly, loop invariant for loop (1) is established when loop (1.1) exits and we increase 𝑛 by 1. Thus, after loop (1) terminates, we have the postcondition: 𝐶∧ 𝛾=0 𝑃 ⊛ 𝛽=0 Own(𝑏𝑖𝑛𝐶𝑡 [𝛾] [𝛽]) Next, we tackle loop (2). We take the invariant: 𝑐∧ 𝛾=0 𝑃 ⊛ 𝛽=0 Own(𝑜𝑣𝑒𝑟 [𝛾] [𝛽]) ∧ 𝑜𝑣𝑒𝑟𝐶𝑡 [𝛾] = 𝑃∑︁ 𝛽=0 𝑜𝑣𝑒𝑟 [𝛾] [𝛽]  ∧ 𝐶∧ 𝛾=0 𝑃 ⊛ 𝛽=0 Own(𝑏𝑖𝑛𝐶𝑡 [𝛾] [𝛽]) ∧ [𝑐 ≤ 𝐶] ∧ Detm⟨𝑐⟩ For the inner loop (2.1), we take the invariant: 𝑐∧ 𝛾=0 𝑃 ⊛ 𝛽=0 Own(𝑜𝑣𝑒𝑟 [𝛾] [𝛽]) ∧ 𝑜𝑣𝑒𝑟𝐶𝑡 [𝛾] = 𝑃∑︁ 𝛽=0 𝑜𝑣𝑒𝑟 [𝛾] [𝛽]  ∧ 𝑝 ⊛ 𝛽=0 Own(𝑜𝑣𝑒𝑟 [𝑐] [𝛽]) ⊛ 𝑃 ⊛ 𝛽=𝑝 Own(𝑏𝑖𝑛𝐶𝑡 [𝑐] [𝛽]) ∧ 𝑜𝑣𝑒𝑟𝐶𝑡 [𝛾] = 𝑝∑︁ 𝛽=0 𝑜𝑣𝑒𝑟 [𝛾] [𝛽]  ∧ 𝐶∧ 𝛾=0 𝑃 ⊛ 𝛽=0 Own(𝑏𝑖𝑛𝐶𝑡 [𝛾] [𝛽]) ∧ [𝑐 ≤ 𝐶] ∧ Detm⟨𝑐⟩ 104 Again, we show the invariant post-conditions for a fixed 𝛾. For the critical itera- tion 𝛾 = 𝑐, we again isolate 𝑏𝑖𝑛𝐶𝑡 [𝑐] [𝑝], observe that addition is monotone and the function [𝑏𝑖𝑛𝐶𝑡 [𝑐] [𝑝] > 𝑇𝑏𝑖𝑛] is monotone in 𝑏𝑖𝑛𝐶𝑡 [𝛾] [𝑝], and apply the NA frame rule NEGFRAME. Finally, at the end of the program, we can show: 𝑃 ⊛ 𝛽=0 Own(𝑜𝑣𝑒𝑟 [𝛾] [𝛽]) along with the regular invariant𝑜𝑣𝑒𝑟𝐶𝑡 [𝛾] = 𝑃∑︁ 𝛽=0 𝑜𝑣𝑒𝑟 [𝛾] [𝛽]  . We can then apply the negative-dependence Chernoff bound (NA-Chernoff-2): Pr[𝑜𝑣𝑒𝑟𝐶𝑡 [𝛾] ≥ E[𝑜𝑣𝑒𝑟𝐶𝑡 [𝛾]] + 𝑇 (𝜌𝑜𝑣𝑒𝑟 , 𝑃)] ≤ 𝜌𝑜𝑣𝑒𝑟 . Using the expectation bound from the previous step and putting everything together, we conclude: {⊤} FDDICT { 𝐶∧ 𝛾=0 Pr[𝑜𝑣𝑒𝑟𝐶𝑡 [𝛾] ≥ 𝑃 · 𝐹 (𝑇𝑏𝑖𝑛 − 𝑁/(𝑃 · 𝐶), 𝑁) + 𝑇 (𝜌𝑜𝑣𝑒𝑟 , 𝑃)] ≤ 𝜌𝑜𝑣𝑒𝑟}, thus showing a high-probability upper-bound on the number of overfull pock- ets dictionaries within each crate. 
3.6.6 Repeated Balls-into-bins Process

Our final example considers a probabilistic protocol proposed by Becchetti et al. [2019], implemented as REPEATBIB in Figure 3.7. Intuitively, the program implements a repeated balls-into-bins process.

REPEATBIB :
𝑟 ← 0;
while 𝑟 < 𝑅 do
  𝑛 ← 0; 𝑟𝑒𝑚 ← 0;
  while 𝑛 < 𝑁 do
    𝑟𝑒𝑚 ← 𝑟𝑒𝑚 + [𝑐𝑡 [𝑛] > 0];
    𝑐𝑡 [𝑛] ← 𝑐𝑡 [𝑛] − [𝑐𝑡 [𝑛] > 0];
    𝑛 ← 𝑛 + 1;
  𝑗 ← 0;
  while 𝑗 < 𝑟𝑒𝑚 do
    𝑏𝑖𝑛[ 𝑗] $← OH[𝑁];
    𝑘 ← 0;
    while 𝑘 < 𝑁 do
      𝑢𝑝𝑑 ← 𝑐𝑡 [𝑘] + 𝑏𝑖𝑛[ 𝑗] [𝑘];
      𝑐𝑡 [𝑘] ← 𝑢𝑝𝑑;
      𝑘 ← 𝑘 + 1;
    𝑗 ← 𝑗 + 1;
  𝑛 ← 0;
  𝑒𝑚𝑝𝑡𝑦𝐶𝑡 [𝑟] ← 0;
  𝑒𝑚𝑝𝑡𝑦 ← 𝑖𝑠𝑍𝑒𝑟𝑜(𝑐𝑡);
  while 𝑛 < 𝑁 do
    𝑢𝑝𝑑 ← 𝑒𝑚𝑝𝑡𝑦𝐶𝑡 [𝑟] + 𝑒𝑚𝑝𝑡𝑦[𝑛];
    𝑒𝑚𝑝𝑡𝑦𝐶𝑡 [𝑟] ← 𝑢𝑝𝑑;
    𝑛 ← 𝑛 + 1;
  𝑟 ← 𝑟 + 1;

Figure 3.7: Repeated balls-into-bins [Becchetti et al., 2019]

Initially, 𝑁 balls are distributed among 𝑁 bins; 𝑐𝑡 [𝑛] holds the number of balls in bin 𝑛. For 𝑅 rounds, in each round, a ball is first removed from every non-empty bin. Then, the 𝑟𝑒𝑚 removed balls are randomly reassigned to bins. This process is useful for distributed protocols and scheduling algorithms, where the balls represent tasks and the bins represent computation nodes. Becchetti et al. [2019] proposed and analyzed this algorithm (e.g., bounding the maximum load, proving how long it takes for all balls to visit all bins). We can verify the following lower bound on the number of empty bins, analogous to Becchetti et al. [2019, Lemma 1 and Lemma 2]:

{𝑁 ≥ 2 ∧ [∑_{𝛼=0}^{𝑁} 𝑐𝑡 [𝛼] = 𝑁]} REPEATBIB {Pr[⋁_{𝛽=0}^{𝑅} (𝑒𝑚𝑝𝑡𝑦𝐶𝑡 [𝛽] < 𝑁/15 − 𝑇 (𝜌𝑒𝑚𝑝𝑡𝑦, 𝑁))] ≤ 𝑅 · 𝜌𝑒𝑚𝑝𝑡𝑦}

Two aspects of this program make it more difficult to verify. First, there is a loop with a randomized guard: the number of removed balls 𝑟𝑒𝑚 is a randomized quantity. Reasoning about such loops is challenging because our LOOP rule is not directly applicable and only far weaker rules are available for loops with general randomized guards. Becchetti et al.
[2019] sidestep this problem by conditioning on the number of balls in each bin, which also fixes 𝑟𝑒𝑚 to some value, proving the target property for every fixed setting, and then combining the proofs together. LINA can formalize this style of reasoning using the randomized case analysis rule RCASE to condition on 𝑟𝑒𝑚's value, and then applying the rule from section 2.3.3. However, a second difficulty arises: the post-condition of section 3.5.2 must be closed under mixtures, while independence and negative association are known not to satisfy this side-condition. Thus, it is not possible to prove negative association by first conditioning and then combining. To work around this second problem, we use a technique from Becchetti et al. [2019] and prove, on each conditional distribution, a high-probability bound using the Chernoff bound. The benefit of this approach is that high-probability bounds are closed under mixtures, so we can apply RCASE to combine the results.

In the formal proof, we will refer to the loops in Figure 3.7 using the same scheme we used before: the outer-most loop is loop (1), the three next-outer-most loops are loops (1.1), (1.2), and (1.3), and the inner-most loop is loop (1.2.1). Starting from the outside, we take the following invariant for loop (1):

Pr[⋁_{𝛽=0}^{𝑟} (𝑒𝑚𝑝𝑡𝑦𝐶𝑡 [𝛽] < 𝑇𝑒𝑚𝑝𝑡𝑦)] ≤ 𝑟 · 𝜌𝑒𝑚𝑝𝑡𝑦 ∧ [∑_{𝛼=0}^{𝑁} 𝑐𝑡 [𝛼] = 𝑁]

Showing the invariant condition requires some work. First, note that:

|=Mem [∑_{𝛼=0}^{𝑁} 𝑐𝑡 [𝛼] = 𝑁] → ⋁_{𝜎:[𝑁]→[𝑁]} ⋀_{𝛼=0}^{𝑁} [𝑐𝑡 [𝛼] = |𝜎−1 (𝛼)|]

where 𝜎 : [𝑁] → [𝑁] ranges over all assignments of 𝑁 balls to 𝑁 bins. We write 𝜏(𝛼) = |𝜎−1 (𝛼)| for the number of balls in bin 𝛼. We will show:

{⋀_{𝛼=0}^{𝑁} [𝑐𝑡 [𝛼] = 𝜏(𝛼)]} 𝑏𝑜𝑑𝑦 {Pr[𝑒𝑚𝑝𝑡𝑦𝐶𝑡 [𝑟] < 𝑇𝑒𝑚𝑝𝑡𝑦] ≤ 𝜌𝑒𝑚𝑝𝑡𝑦}

where 𝑏𝑜𝑑𝑦 is the body of loop (1).
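For reference, the process of Figure 3.7 can be transcribed into executable form. A sketch in Python (assuming OH[𝑁] samples a uniformly random one-hot vector of length 𝑁, and that 𝑟𝑒𝑚 counts the bins that were non-empty before the removal step):

```python
import random

def one_hot(n, rng):
    """Sample OH[n]: a uniformly random one-hot vector of length n."""
    v = [0] * n
    v[rng.randrange(n)] = 1
    return v

def repeat_bib(ct, rounds, seed=0):
    """Repeated balls-into-bins (after Figure 3.7): each round removes one
    ball from every non-empty bin, then reassigns the removed balls to
    uniformly random bins. Returns (final counts, per-round emptyCt)."""
    rng = random.Random(seed)
    ct = list(ct)
    n = len(ct)
    empty_ct = []
    for _ in range(rounds):
        rem = sum(1 for c in ct if c > 0)       # balls removed this round
        ct = [max(c - 1, 0) for c in ct]
        for _ in range(rem):                    # reassign removed balls
            b = one_hot(n, rng)
            ct = [c + d for c, d in zip(ct, b)]
        empty_ct.append(sum(1 for c in ct if c == 0))
    return ct, empty_ct
```

With 𝑁 balls in 𝑁 bins initially, the total number of balls is invariant across rounds, matching the precondition [∑ 𝑐𝑡 [𝛼] = 𝑁] that the proof conditions on.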
For loop (1.1), it is straightforward to show the invariant using RASSN:

⋀_{𝛼=𝑛}^{𝑁} [𝑐𝑡 [𝛼] = 𝜏(𝛼)] ∧ ⋀_{𝛼=0}^{𝑛} [𝑐𝑡 [𝛼] = 𝜏(𝛼) − [𝜏(𝛼) > 0]] ∧ [𝑟𝑒𝑚 = ∑_{𝛼=0}^{𝑛} [𝜏(𝛼) > 0]] ∧ [𝑛 ≤ 𝑁]

Using the loop rule LOOP, we derive the following at the exit of loop (1.1):

⋀_{𝛼=0}^{𝑁} [𝑐𝑡 [𝛼] = 𝜏(𝛼) − [𝜏(𝛼) > 0]] ∧ [𝑟𝑒𝑚 = ∑_{𝛼=0}^{𝑁} [𝜏(𝛼) > 0]]

Since the counts are all equal to expressions over logical variables, conditioned on the logical variable 𝜎 they are all deterministic; thus, we have

⋀_{𝛼=0}^{𝑁} Detm⟨𝑐𝑡 [𝛼]⟩ ∧ Detm⟨𝑟𝑒𝑚⟩

which implies ⊛_{𝛼=0}^{𝑁} Own(𝑐𝑡 [𝛼]) ∧ Detm⟨𝑟𝑒𝑚⟩. We take ⊛_{𝛼=0}^{𝑁} Own(𝑐𝑡 [𝛼]) ∧ Detm⟨𝑟𝑒𝑚⟩ to be the invariant for loop (1.2). To establish this, we reason much as in the previous examples. The sampling rule SAMP gives:

⊛_{𝛼=0}^{𝑁} Own(𝑐𝑡 [𝛼]) ∗ Own(𝑏𝑖𝑛[ 𝑗])

By negative association for one-hot encoding (OH-PNA):

⊛_{𝛼=0}^{𝑁} Own(𝑐𝑡 [𝛼]) ∗ ⊛_{𝛼=0}^{𝑁} Own(𝑏𝑖𝑛[ 𝑗] [𝛼]).

For the inner-most loop (1.2.1), we apply the same technique as for loop (1.2). Since loop (1.2) has a randomized guard, 𝑘 is a random variable, and loop (1.2.1) also has a randomized guard. However, under the conditioning, we may assume that 𝑘 is deterministic and apply LOOP on loop (1.2.1) with the following invariant:

⊛_{𝛼=0}^{𝑘} Own(𝑐𝑡 [𝛼]) ⊛ ⊛_{𝛼=𝑘}^{𝑁} (Own(𝑐𝑡 [𝛼]) ⊛ Own(𝑏𝑖𝑛[ 𝑗] [𝛼])) ∧ [𝑘 ≤ 𝑁]

Like in earlier examples, we can establish this invariant using NEGFRAME since 𝑐𝑡 [𝑘] + 𝑏𝑖𝑛[ 𝑗] [𝑘] is monotone. Thus, at the exit of loop (1.2.1), we have:

⊛_{𝛼=0}^{𝑁} Own(𝑐𝑡 [𝛼])

and this assertion is preserved to the end of loop (1.2). Next, three applications of the assignment rule RASSN give:

⊛_{𝛼=0}^{𝑁} Own(𝑐𝑡 [𝛼]) ∧ [𝑛 = 0] ∧ [𝑒𝑚𝑝𝑡𝑦𝐶𝑡 [𝑟] = 0] ∧ [𝑒𝑚𝑝𝑡𝑦 = 𝑖𝑠𝑍𝑒𝑟𝑜(𝑐𝑡)]

The function 𝑖𝑠𝑍𝑒𝑟𝑜(𝑣) takes a numerical vector 𝑣 and returns a vector whose index 𝑖 holds 1 if 𝑣 [𝑖] is zero, and 0 otherwise. This is an antitone function: it is non-increasing in its argument.
Thus, the monotone mapping axiom (Mono-Map) gives:

⊛_{𝛼=0}^{𝑁} Own(𝑒𝑚𝑝𝑡𝑦[𝛼])

Then, a standard loop invariant for loop (1.3) gives:

⊛_{𝛼=0}^{𝑁} Own(𝑒𝑚𝑝𝑡𝑦[𝛼]) ∧ [𝑒𝑚𝑝𝑡𝑦𝐶𝑡 [𝑟] = ∑_{𝛼=0}^{𝑁} 𝑒𝑚𝑝𝑡𝑦[𝛼]]

at the end of loop (1.3). Thus,

{⋀_{𝛼=0}^{𝑁} [𝑐𝑡 [𝛼] = 𝜏(𝛼)]} 𝑏𝑜𝑑𝑦 {⊛_{𝛼=0}^{𝑁} Own(𝑒𝑚𝑝𝑡𝑦[𝛼]) ∧ [𝑒𝑚𝑝𝑡𝑦𝐶𝑡 [𝑟] = ∑_{𝛼=0}^{𝑁} 𝑒𝑚𝑝𝑡𝑦[𝛼]]}.

Now, we are in a position to apply the negative association Chernoff bound (NA-Chernoff-2), giving the judgment:

{⋀_{𝛼=0}^{𝑁} [𝑐𝑡 [𝛼] = 𝜏(𝛼)]} 𝑏𝑜𝑑𝑦 {Pr[𝑒𝑚𝑝𝑡𝑦𝐶𝑡 [𝑟] ≤ E[𝑒𝑚𝑝𝑡𝑦𝐶𝑡 [𝑟]] − 𝑇 (𝜌𝑒𝑚𝑝𝑡𝑦, 𝑁)] ≤ 𝜌𝑒𝑚𝑝𝑡𝑦}

where 𝑏𝑜𝑑𝑦 is the body of loop (1). Next, we bound E[𝑒𝑚𝑝𝑡𝑦𝐶𝑡 [𝑟]]: translating an argument by Becchetti et al. [2019, Lemma 2] into our logic gives:

{⋀_{𝛼=0}^{𝑁} [𝑐𝑡 [𝛼] = 𝜏(𝛼)] ∧ [𝑁 ≥ 2]} 𝑏𝑜𝑑𝑦 {E[𝑒𝑚𝑝𝑡𝑦𝐶𝑡 [𝑟]] ≥ 𝑁/15}

The argument makes use of basic properties of expected values and the exponential function; we omit the details. Thus, we can conclude that

{⋀_{𝛼=0}^{𝑁} [𝑐𝑡 [𝛼] = 𝜏(𝛼)]} 𝑏𝑜𝑑𝑦 {Pr[𝑒𝑚𝑝𝑡𝑦𝐶𝑡 [𝑟] ≤ 𝑁/15 − 𝑇 (𝜌𝑒𝑚𝑝𝑡𝑦, 𝑁)] ≤ 𝜌𝑒𝑚𝑝𝑡𝑦}.

Note that this post-condition is closed under mixtures. So now we can apply the randomized case analysis rule RCASE to combine the proofs for different assignments 𝜎. We can take the trivial pre-condition 𝜑 = ⊤, and the case condition:

𝜂 := ⋁_{𝜎:[𝑁]→[𝑁]} ⋀_{𝛼=0}^{𝑁} [𝑐𝑡 [𝛼] = 𝜏(𝛼)].

Since 𝜂 asserts that one of the equalities holds with probability 1, it is closed under conditioning. Applying RCASE, we have:

{⋁_{𝜎:[𝑁]→[𝑁]} ⋀_{𝛼=0}^{𝑁} [𝑐𝑡 [𝛼] = 𝜏(𝛼)]} 𝑏𝑜𝑑𝑦 {Pr[𝑒𝑚𝑝𝑡𝑦𝐶𝑡 [𝑟] < 𝑁/15 − 𝑇 (𝜌𝑒𝑚𝑝𝑡𝑦, 𝑁)] ≤ 𝜌𝑒𝑚𝑝𝑡𝑦}

Also, since |= [∑_{𝛼=0}^{𝑁} 𝑐𝑡 [𝛼] = 𝑁] → ⋁_{𝜎:[𝑁]→[𝑁]} ⋀_{𝛼=0}^{𝑁} [𝑐𝑡 [𝛼] = 𝜏(𝛼)], we can replace the precondition with [∑_{𝛼=0}^{𝑁} 𝑐𝑡 [𝛼] = 𝑁].
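The per-round bounds are combined across rounds with the union bound (UnionBd), Pr[𝐴 ∪ 𝐵] ≤ Pr[𝐴] + Pr[𝐵]. A minimal empirical check on two hypothetical overlapping events (the counts satisfy the bound deterministically, since each sample in the union is counted at least once on the right):

```python
import random

random.seed(1)
trials = 100_000
count_a = count_b = count_union = 0
for _ in range(trials):
    u = random.random()
    a, b = u < 0.3, 0.2 < u < 0.5   # two overlapping events on the same draw
    count_a += a
    count_b += b
    count_union += (a or b)
# count_union / trials estimates Pr[A ∪ B]; it never exceeds
# count_a / trials + count_b / trials, the union-bound estimate.
```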
Recalling that we wanted the following invariant to get preserved by loop (1): Pr  𝑟∨ 𝛽=0 (𝑒𝑚𝑝𝑡𝑦𝐶𝑡 [𝛽] < 𝑇𝑒𝑚𝑝𝑡𝑦)  ≤ 𝑟 · 𝜌𝑒𝑚𝑝𝑡𝑦 ∧ 𝑁∑︁ 𝛼=0 [𝑐𝑡 [𝛼] = 𝑁] ∧ Detm⟨𝑟⟩ ∧ [𝑁 ≥ 2] We can use the rule of constancy CONST and the assignment rule DASSN to preserve the first conjunct to show: Pr  𝑟−1∨ 𝛽=0 (𝑒𝑚𝑝𝑡𝑦𝐶𝑡 [𝛽] < 𝑇𝑒𝑚𝑝𝑡𝑦)  ≤ (𝑟 − 1) · 𝜌𝑒𝑚𝑝𝑡𝑦 at the end of the body of loop (1). Combined with the probability bound for 𝑒𝑚𝑝𝑡𝑦𝐶𝑡 [𝑟], an application of the union bound (UnionBd) establishes the in- variant for loop (1). Putting everything together, we have: {𝑁 ≥ 2 ∧ 𝑁∑︁ 𝛼=0 [𝑐𝑡 [𝛼] = 𝑁]} REPEATBIB {Pr  𝑅∨ 𝛽=0 (𝑒𝑚𝑝𝑡𝑦𝐶𝑡 [𝛽] < 𝑁/15 − 𝑇 (𝜌𝑒𝑚𝑝𝑡𝑦, 𝑁))  ≤ 𝑅 · 𝜌𝑒𝑚𝑝𝑡𝑦} analogous to Becchetti et al. [2019, Lemma 1 and 2]. 3.7 Related Work Verifying approximate data structures and applying concentration bounds. Bloom filters are a data structure supporting approximate membership queries (AMQs). Ceramist [Gopinathan and Sergey, 2020] is a recent framework for 111 verifying hash-based AMQ structures in the Coq theorem prover. Besides han- dling Bloom filters, Ceramist supports subtle proofs of correctness for many other AMQs. Compared with our approach, Ceramist proofs are more precise but also more intricate, applying theorems about Stirling numbers to achieve a precise bound on the false positive probability. In contrast, our approach rea- sons about negative dependence to achieve a substantially simpler proof, albeit with less precise bounds. Prior works in verification have also applied the Chernoff bound to bound sums of independent random quantities (e.g., [Wang et al., 2021, Chakarov and Sankaranarayanan, 2013]). While independence is easier to establish, the nega- tive association property that we need is more subtle. Negative dependence. There are multiple definitions of negative dependence in the literature, each with their own strengths and weaknesses. 
We work with negative association (NA) [Joag-Dev and Proschan, 1983, Dubhashi and Ranjan, 1998], because it holds in many situations where negative dependence should hold and it is closed under various notions of composition. Recently, the notion of Strong Rayleigh (SR) [Borcea et al., 2009] distribution has been proposed as an ideal definition of negative dependence. The SR condition satisfies more closure properties than NA does; in particular, it is preserved under various forms of conditioning. However, SR distributions have mostly been studied for Boolean variables only, and we do not know if an analogue of the monotone maps property of NA holds for SR. Beyond theoretical investigations, negative dependence plays a useful role in many practical applications. In machine learning, negative dependence can 112 help ensure diversity in predictions by a model [Kulesza and Taskar, 2012], and fast algorithms are known to learn and sample from negatively-dependent distributions [Anari et al., 2016]. In algorithm design, negative dependence is a useful tool to randomly round solutions of linear programs to integral so- lutions [Srinivasan, 2001]. Negative dependence can ensure that certain con- straints are satisfied exactly after rounding, while still allowing concentration bounds to be applied to analyze the quality of the rounded solution. 113 CHAPTER 4 A BUNCHED LOGIC FOR DEPENDENCE AND INDEPENDENCE Conditional independence (CI) is a well-studied notion in probability the- ory and statistics [Dawid, 1979, Pearl et al., 1989, Dawid, 2001, Simpson, 2018]. While there are many interpretations of CI, a natural reading is in terms of ir- relevance: 𝑋 and 𝑌 are independent conditioned on 𝑍 if knowing the value of 𝑍 renders 𝑋 and 𝑌 unrelated; in other words, observing one gives no further information about the other. Conditional independence has a wide range of applications. For exam- ple, it enables distinguishing superfluous correlation from causation. 
For instance, suppose researchers found a strong positive correlation between a nation's number of Nobel laureates per capita and its chocolate consumption. A convenient (mis)interpretation would be that chocolate consumption makes people smarter and leads to more Nobel laureates. But the correlation is more likely due to other factors, e.g., a nation's economic status, and the two quantities are conditionally independent once that third factor is fixed.

Conditional independence can also succinctly encode interesting properties. As more and more life-changing decisions, e.g., job hiring, judicial decisions, and loan approvals, are automated using prediction algorithms, algorithmic fairness has gained increasing attention. To prevent algorithms from discriminating based on sensitive features (e.g., race and gender), researchers have formalized notions of fairness originating from different philosophies using conditional independence. For instance, one school of thought holds that an algorithm is fair if it satisfies equalized odds, i.e., the algorithm's predictions and the sensitive features are conditionally independent given the innate quality (i.e., the target label) that the algorithm aims to predict; another proposal for fairness is calibration, which says that, fixing the algorithm's prediction, the sensitive features and the target label are conditionally independent. (More details are presented in Barocas et al. [2023].)

Since we are studying probabilistic programs, we want to reason about conditional independence of (sets of) program variables, which is defined as follows:

Definition 4.0.1 (Conditional independence). Let 𝑋,𝑌, 𝑍 ⊆ Var. For any 𝑚 ∈ Mem[Var], we write the event {𝜔 ∈ Mem[Var] | ∀𝑥 ∈ 𝑋. 𝜔(𝑥) = 𝑚(𝑥)} as 𝑋 = 𝑚. Given a distribution 𝜇 over Mem[Var], the sets of variables 𝑋 and 𝑌 are independent conditioned on 𝑍 , written 𝑋 ⊥⊥ 𝑌 | 𝑍 , if for all 𝑥 ∈ Mem[𝑋], 𝑦 ∈ Mem[𝑌 ], and 𝑧 ∈ Mem[𝑍]:

𝜇(𝑋 = 𝑥 | 𝑍 = 𝑧) · 𝜇(𝑌 = 𝑦 | 𝑍 = 𝑧) = 𝜇(𝑋 = 𝑥, 𝑌 = 𝑦 | 𝑍 = 𝑧).
When 𝑍 = ∅, we say 𝑋 and 𝑌 are independent, written 𝑋 ⊥⊥ 𝑌 . Conditional independence of program variables allows for more efficient representation of distributions over program memories. For instance, if 𝑋 ⊥⊥ 𝑌 | 𝑍 , then instead of storing the joint distribution of 𝑋,𝑌, 𝑍 , one can store the distribution of 𝑍 , the marginal distribution of 𝑋 given 𝑍 , and the marginal dis- tribution of 𝑌 given 𝑍 ; when there are 𝑛 possible outcomes for each of 𝑋,𝑌, 𝑍 , storing the former takes 𝑂 (𝑛3) space, while storing the latter only takes 𝑂 (𝑛2) space. The factored representation also enables more efficient inference algo- rithms (e.g., Holtzen [2021]), which are developed to compute or approximate the distribution after conditioning on an observation. Thus, we want to extend probabilistic separation logic to prove conditional independence of program variables. To achieve that, we need an assertion logic 115 that can express conditional independence. The existing probabilistic BI model (Section 2.3.2) provides no means to describe the distribution over program memories conditioned on the values a set of variables takes. Accordingly, one cannot capture the basic statement of conditional independence, i.e., 𝑋 and 𝑌 are independent conditioned on any value of 𝑍 . To address that problem, we develop a novel assertion logic DIBI, short for Dependence and Independence BI. DIBI extends BI with new connectives: the conjunction 𝑃 # 𝑄 for modeling de- pendence between states and its adjoints 𝑃 � 𝑄 and 𝑃 ⊸ 𝑄. We then develop a probabilistic model of DIBI so that 𝑃 ∗ 𝑄 can assert probabilistic independence and 𝑃 # 𝑄 can assert dependence. 
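Definition 4.0.1 can be checked mechanically on small finite distributions. A sketch (the joint distribution is invented for illustration, echoing the confounder discussion above: a latent binary factor 𝑤 drives two binary variables 𝑥 and 𝑦, which are independent given 𝑤 but positively correlated marginally):

```python
from itertools import product

# Hypothetical joint distribution over (x, y, w): w ~ Bern(1/2), and given w,
# x and y are independent draws from Bern(0.2 + 0.6 * w).
def p(x, y, w):
    q = 0.2 + 0.6 * w                 # P(x = 1 | w) = P(y = 1 | w)
    f = lambda b: q if b else 1 - q
    return 0.5 * f(x) * f(y)

joint = {(x, y, w): p(x, y, w) for x, y, w in product([0, 1], repeat=3)}

def marginal(keep):
    """Marginalize the joint onto the coordinate indices in `keep`."""
    out = {}
    for outcome, pr in joint.items():
        key = tuple(outcome[i] for i in keep)
        out[key] = out.get(key, 0.0) + pr
    return out

def cond_indep():
    """Definition 4.0.1 for x and y given w: the conditional joint factors
    into the product of the conditional marginals, for every value of w."""
    pw, pxw, pyw = marginal((2,)), marginal((0, 2)), marginal((1, 2))
    return all(
        abs(pxw[(x, w)] / pw[(w,)] * pyw[(y, w)] / pw[(w,)]
            - joint[(x, y, w)] / pw[(w,)]) < 1e-12
        for x, y, w in product([0, 1], repeat=3)
    )
```

Here `cond_indep()` holds, while 𝑥 and 𝑦 are not independent unconditionally: Pr[𝑥 = 1, 𝑦 = 1] = 0.34 exceeds Pr[𝑥 = 1] · Pr[𝑦 = 1] = 0.25.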
Then, we express conditional independence of 𝑋 and 𝑌 given 𝑍 roughly as 𝑍 # (𝑋 ∗ 𝑌 ), which asserts the independence of 𝑋 and 𝑌 while they both depend on 𝑍.

Intuitively, to assert dependence with the conjunction 𝑃 # 𝑄, we want to interpret # through a binary operator ⊙, defined so that in the composed distribution 𝑓 ⊙ 𝑔, the variables described by 𝑔 depend on the variables described by 𝑓 ; however, it is unclear how to define such an operator ⊙ for distributions. To address this problem, we design a DIBI model whose states are not distributions but Markov kernels [Panangaden, 2009], which are essentially maps from a set 𝐴 to distributions over a set 𝐵; they get their name from their role in the theory of general Markov processes [Dynkin, 2012]. We will sometimes abbreviate them as kernels for convenience.

Crucially, Markov kernels can be composed sequentially using the bind operation of the distribution monad: given 𝑓 : 𝑋 → D(𝑌 ) and 𝑔 : 𝑌 → D(𝑍), the Kleisli composition 𝑓 ; 𝑔 : 𝑋 → D(𝑍) is:

( 𝑓 ; 𝑔) (𝑥) := bind( 𝑓 (𝑥), 𝑔) (4.1)

(a) Probabilistic program 𝑝:
𝑧 $← Bern1/2;
if 𝑧 then 𝑥 $← Bern3/4; 𝑦 $← Bern3/4;
else 𝑥 $← Bern1/2; 𝑦 $← Bern1/2

(b) Distribution 𝜇 generated by 𝑝:
𝑥 𝑦 𝑧 | 𝜇
0 0 0 | 1/8
0 0 1 | 1/32
1 0 0 | 1/8
1 0 1 | 3/32
0 1 0 | 1/8
0 1 1 | 3/32
1 1 0 | 1/8
1 1 1 | 9/32

(c) 𝜇 conditioned on 𝑧 = 0:
𝑥 𝑦 | 𝜇0
0 0 | 1/4
1 0 | 1/4
0 1 | 1/4
1 1 | 1/4

(d) 𝜇 conditioned on 𝑧 = 1:
𝑥 𝑦 | 𝜇1
0 0 | 1/16
1 0 | 3/16
0 1 | 3/16
1 1 | 9/16

Figure 4.1: From probabilistic programs to kernels

Markov kernels generalize distributions because we can lift any distribution 𝜇 ∈ D(𝑋) to a kernel 𝑓𝜇 : 1 → D(𝑋) by assigning 𝜇 to the single element of 1. Kernels can also encode conditional distributions, which play a key role in conditional independence. We show an example of how to encode conditional distributions using kernels below.

Example 4.0.1 (Kernels and Conditional Probabilities). Consider the program 𝑝 in Figure 4.1a, where 𝑥, 𝑦, and 𝑧 are Boolean variables.
First, the program flips a fair coin and stores the result in 𝑧. If 𝑧 = 0, it flips a fair coin twice and stores the results in 𝑥 and 𝑦, respectively. If 𝑧 = 1, it flips a coin with bias 3/4 twice and stores the results in 𝑥 and 𝑦. This program produces the distribution 𝜇 shown in Figure 4.1b. If we condition 𝜇 on 𝑧 = 0, then the resulting distribution 𝜇0 models two independent fair coin flips: 1/4 probability for each possible pair of outcomes (Figure 4.1c). If we condition on 𝑧 = 1, however, then the distribution 𝜇1 is skewed: there is a much higher probability that we observe (1, 1) than (0, 0), but 𝑥 and 𝑦 are still independent given 𝑧 (Figure 4.1d). To connect 𝜇0 and 𝜇1 to the original distribution 𝜇, we package 𝜇0 and 𝜇1
DIBI formulas are interpreted on DIBI frames, which extend BI frames. As in BI frames, we want to define one binary operator, denoted ⊕ here, to interpret 𝑃 ∗ 𝑄, which asserts the separation of resources validating 𝑃 and 𝑄. The main extension is a new binary operator ⊙ for interpreting the formulas 𝑃 #𝑄, 𝑃 ⊸ 𝑄 and 𝑃 � 𝑄. Informally, we want 𝑃 # 𝑄 to assert that the resource validating 𝑄 depends on the resource validating 𝑃. Because dependence in general is not commutative, we define ⊙ as a non-commutative operator. Definition 4.1.1 (DIBI Frame). A DIBI frame is a structure X = (𝑋, ⊑, ⊕, ⊙, 𝐸) such that ⊑ is a preorder, 𝐸 ⊆ 𝑋 , and ⊕ : 𝑋2 → P(𝑋) and ⊙ : 𝑋2 → P(𝑋) are binary operations, satisfying the rules in Figure 4.2. Similar to the case in BI frames, 𝑋 is a set of states, the preorder ⊑ describes when a smaller state can be extended to a larger state, the binary operators ⊙, ⊕ offer two ways of combining states, and 𝐸 is the set of states that act like units with respect to these operations. For instance, in our intended model for probabilistic programs, the states would be Markov kernels that preserve their input through to their output, which present conditional distributions. We would define 𝑓 ⊕ 𝑔 to return the set of independent products of two kernels — there is no standard definition for this but roughly it should generalize independent product of distributions, and define 𝑓 ⊙𝑔 to return the set of kernels obtained by the sequential composition of two kernels, which is based on the monadic bind. The definition of pre-order would generalize the pre-order in PSL’s assertion logic, which says 𝜇1 is smaller than 𝜇2 if 𝜇1 is a marginal distribution of 𝜇2. 
𝑧 ∈ 𝑥 ⊕ 𝑦 ∧ 𝑥 ⊒ 𝑥′ ∧ 𝑦 ⊒ 𝑦′ → ∃𝑧′(𝑧 ⊒ 𝑧′ ∧ 𝑧′ ∈ 𝑥′ ⊕ 𝑦′); (⊕ Down-Closed)
𝑧 ∈ 𝑥 ⊙ 𝑦 ∧ 𝑧′ ⊒ 𝑧 → ∃𝑥′, 𝑦′(𝑥′ ⊒ 𝑥 ∧ 𝑦′ ⊒ 𝑦 ∧ 𝑧′ ∈ 𝑥′ ⊙ 𝑦′); (⊙ Up-Closed)
𝑧 ∈ 𝑥 ⊕ 𝑦 → 𝑧 ∈ 𝑦 ⊕ 𝑥; (⊕ Commutativity)
𝑤 ∈ 𝑡 ⊕ 𝑧 ∧ 𝑡 ∈ 𝑥 ⊕ 𝑦 → ∃𝑠(𝑠 ∈ 𝑦 ⊕ 𝑧 ∧ 𝑤 ∈ 𝑥 ⊕ 𝑠); (⊕ Associativity)
∃𝑒 ∈ 𝐸 (𝑥 ∈ 𝑒 ⊕ 𝑥); (⊕ Unit Existence)
𝑒 ∈ 𝐸 ∧ 𝑥 ∈ 𝑦 ⊕ 𝑒 → 𝑥 ⊒ 𝑦; (⊕ Unit Coherence)
∃𝑡 (𝑤 ∈ 𝑡 ⊙ 𝑧 ∧ 𝑡 ∈ 𝑥 ⊙ 𝑦) ↔ ∃𝑠(𝑠 ∈ 𝑦 ⊙ 𝑧 ∧ 𝑤 ∈ 𝑥 ⊙ 𝑠); (⊙ Associativity)
∃𝑒 ∈ 𝐸 (𝑥 ∈ 𝑒 ⊙ 𝑥); (⊙ Unit ExistenceL)
∃𝑒 ∈ 𝐸 (𝑥 ∈ 𝑥 ⊙ 𝑒); (⊙ Unit ExistenceR)
𝑒 ∈ 𝐸 ∧ 𝑥 ∈ 𝑦 ⊙ 𝑒 → 𝑥 ⊒ 𝑦; (⊙ CoherenceR)
𝑒 ∈ 𝐸 ∧ 𝑒′ ⊒ 𝑒 → 𝑒′ ∈ 𝐸; (Unit Closure)
𝑥 ∈ 𝑦 ⊕ 𝑧 ∧ 𝑦 ∈ 𝑦1 ⊙ 𝑦2 ∧ 𝑧 ∈ 𝑧1 ⊙ 𝑧2 → ∃𝑢, 𝑣(𝑢 ∈ 𝑦1 ⊕ 𝑧1 ∧ 𝑣 ∈ 𝑦2 ⊕ 𝑧2 ∧ 𝑥 ∈ 𝑢 ⊙ 𝑣). (Reverse Exchange)

Figure 4.2: DIBI frame requirements (with outermost universal quantification omitted for readability).

The frame conditions define properties that must hold for all models of DIBI. The frame conditions required for ⊕ are exactly the frame conditions satisfied by the binary combination in a BI frame; that is, (𝑋, ⊑, ⊕, 𝐸) forms a BI frame. The binary combination ⊙, in contrast, is not commutative, but it is still associative and has units. Because ⊙ is non-commutative, each ⊙ analogue of a ⊕ axiom splits into a pair of axioms, although we exclude the left version of (⊙ Coherence) for reasons we explain in section 4.1.2. Also, while ⊕ is downwards-closed, as is the binary operation in BI frames, the new binary combination ⊙ is upwards-closed. These choices of closedness conditions match the desired interpretations of ⊕ as independence and ⊙ as dependence: independence should drop down to substates (which must necessarily be independent if the superstates were so), while dependence should be inherited by superstates (the source of dependence will still be present in any extensions).
Finally, the (Reverse Exchange) condition defines the interaction between ⊕ and ⊙: intuitively, if 𝑦2 depends on 120 𝑦1 and 𝑧2 depends on 𝑧1, and 𝑦1, 𝑦2 are independent from 𝑧1, 𝑧2, then the combi- nation of 𝑦2 and 𝑧2 depends on 𝑦1 and 𝑧1. We give a Kripke-style semantics for DIBI. Definition 4.1.2 (Valuation and model). A persistent valuation of DIBI is an as- signment V : AP → P(𝑋) of atomic propositions to subsets of states of a DIBI frame satisfying persistence: if 𝑥 ∈ V(𝑝) and 𝑦 ⊒ 𝑥 then 𝑦 ∈ V(𝑝). A DIBI model (X,V) is a DIBI frame X together with a persistent valuationV. We now inductively define satisfaction of DIBI formulas in a DIBI model. 𝑥 |=V ⊤ always 𝑥 |=V ⊥ never 𝑥 |=V 𝐼 iff 𝑥 ∈ 𝐸 𝑥 |=V 𝑝 iff 𝑥 ∈ V(𝑝) 𝑥 |=V 𝑃 ∧𝑄 iff 𝑥 |=V 𝑃 and 𝑥 |=V 𝑄 𝑥 |=V 𝑃 ∨𝑄 iff 𝑥 |=V 𝑃 or 𝑥 |=V 𝑄 𝑥 |=V 𝑃→ 𝑄 iff for all 𝑦 ⊒ 𝑥, 𝑦 |=V 𝑃 implies 𝑦 |=V 𝑄 𝑥 |=V 𝑃 ∗ 𝑄 iff there exist 𝑥′, 𝑦, 𝑧 s.t. 𝑥 ⊒ 𝑥′ ∈ 𝑦 ⊕ 𝑧, 𝑦 |=V 𝑃 and 𝑧 |=V 𝑄 𝑥 |=V 𝑃 #𝑄 iff there exist 𝑦, 𝑧 s.t. 𝑥 ∈ 𝑦 ⊙ 𝑧, 𝑦 |=V 𝑃 and 𝑧 |=V 𝑄 𝑥 |=V 𝑃 −∗ 𝑄 iff for all 𝑦, 𝑧 s.t. 𝑧 ∈ 𝑥 ⊕ 𝑦: 𝑦 |=V 𝑃 implies 𝑧 |=V 𝑄 𝑥 |=V 𝑃 ⊸ 𝑄 iff for all 𝑥′, 𝑦, 𝑧 s.t. 𝑥′ ⊒ 𝑥 and 𝑧 ∈ 𝑥′ ⊙ 𝑦: 𝑦 |=V 𝑃 implies 𝑧 |=V 𝑄 𝑥 |=V 𝑃 � 𝑄 iff for all 𝑥′, 𝑦, 𝑧 s.t. 𝑥′ ⊒ 𝑥 and 𝑧 ∈ 𝑦 ⊙ 𝑥′: 𝑦 |=V 𝑃 implies 𝑧 |=V 𝑄 Figure 4.3: Satisfaction for DIBI Definition 4.1.3 (DIBI Satisfaction and Validity). Satisfaction at a state 𝑥 in a model is inductively defined by the clauses in Figure 4.3. As before, we say 𝑃 is valid in a model, X |=V 𝑃, iff 𝑥 |=V 𝑃 for all 𝑥 ∈ X. 𝑃 is valid, |= 𝑃, iff 𝑃 is valid in all models. 𝑃 |= 𝑄 iff, for all models, 𝑥 |=V 𝑃 implies 𝑥 |=V 𝑄. Where the context is clear, we omit the subscript V on the satisfaction re- lation. With the semantics in Figure 4.3, persistence on propositional atoms indeed extends to all formulas: 121 Lemma 4.1.1 (Persistence Lemma). For all 𝑃 ∈ FormDIBI, if 𝑥 |= 𝑃 and 𝑥 ⊑ 𝑦, then 𝑦 |= 𝑃. Proof. We prove that induction on the syntax of the formulas. 
Specifically, the persistence of ⊤ and ⊥ is trivial, and the persistence of 𝐼 follows from (Unit Closure). 𝑃 ∧ 𝑄 and 𝑃 ∨ 𝑄 are persistent by the inductive hypothesis. For 𝑃 → 𝑄, 𝑃 ∗ 𝑄, 𝑃 � 𝑄, and 𝑃 ⊸ 𝑄, persistence is evident because their semantic clauses explicitly account for the order. □

Notably, in fig. 4.3, the semantic clauses for # and ∗ differ beyond using different binary operations: the clause for ∗ has an additional variable 𝑥′ under the existential quantifier and only requires 𝑥 ⊒ 𝑥′ ∈ 𝑦 ⊕ 𝑧 instead of 𝑥 ∈ 𝑦 ⊕ 𝑧. The semantic clauses for −∗ and ⊸ also differ: the clause for ⊸ has an additional universally quantified variable 𝑥′. This difference is due to the different frame axioms satisfied by ⊙ and ⊕, and to our goal of ensuring that lemma 4.1.1 holds: the (⊙ Up-Closed) frame axiom ensures the persistence of the simpler clause for #, and similarly (⊕ Down-Closed) ensures the persistence of −∗ [Cao et al., 2017].

4.1.2 Proof system

Now we describe how DIBI formulas can be derived. We give a Hilbert-style proof system for DIBI in Figure 4.4. This calculus extends the proof system for BI with additional rules governing the new connectives #, ⊸, and �. In section 4.1.3, we will prove this calculus is sound and complete. Here we comment on two important details in this proof system.
122 𝑃 ⊢ 𝑃 AX 𝑃 ⊢ ⊤ TOP ⊥ ⊢ 𝑃 BOT 𝑃 ⊢ 𝑅 𝑄 ⊢ 𝑅 𝑃 ∨𝑄 ⊢ 𝑅 ∨-E 𝑃 ⊢ 𝑄𝑖 𝑃 ⊢ 𝑄1 ∨𝑄2 ∨-I 𝑃 ⊢ 𝑄 𝑃 ⊢ 𝑅 𝑃 ⊢ 𝑄 ∧ 𝑅 ∧-I-R 𝑄 ⊢ 𝑅 𝑃 ∧𝑄 ⊢ 𝑅 ∧-I-L 𝑃 ⊢ 𝑄1 ∧𝑄2 𝑃 ⊢ 𝑄𝑖 ∧-E 𝑃 ∧𝑄 ⊢ 𝑅 𝑃 ⊢ 𝑄 → 𝑅 →-I 𝑃 ⊢ 𝑄 → 𝑅 𝑃 ⊢ 𝑄 𝑃 ⊢ 𝑅 →-E 𝑃 ⊢ 𝑅 𝑄 ⊢ 𝑆 𝑃 ∗ 𝑄 ⊢ 𝑅 ∗ 𝑆 ∗-CONJ 𝑃 ∗ 𝑄 ⊢ 𝑅 𝑃 ⊢ 𝑄 −∗ 𝑅 −∗-I 𝑃 ⊢ 𝑄 −∗ 𝑅 𝑆 ⊢ 𝑄 𝑃 ∗ 𝑆 ⊢ 𝑅 −∗-E 𝑃 ⊣⊢ 𝑃 ∗ 𝐼 ∗-UNIT 𝑃 ∗ 𝑄 ⊢ 𝑄 ∗ 𝑃 ∗-COMM (𝑃 ∗ 𝑄) ∗ 𝑅 ⊣⊢ 𝑃 ∗ (𝑄 ∗ 𝑅) ∗-ASSOC 𝑃 #𝑄 ⊢ 𝑅 𝑃 ⊢ 𝑄 ⊸ 𝑅 ⊸-I 𝑃 ⊢ 𝑄 ⊸ 𝑅 𝑆 ⊢ 𝑄 𝑃 # 𝑆 ⊢ 𝑅 ⊸ MP 𝑃 #𝑄 ⊢ 𝑅 𝑄 ⊢ 𝑃 � 𝑅 �-I 𝑃 ⊢ 𝑄 � 𝑅 𝑆 ⊢ 𝑄 𝑆 # 𝑃 ⊢ 𝑅 � MP 𝑃 ⊢ 𝐼 # 𝑃 #-LEFT UNIT 𝑃 ⊣⊢ 𝑃 # 𝐼 #-RIGHT UNIT 𝑃 ⊢ 𝑅 𝑄 ⊢ 𝑆 𝑃 #𝑄 ⊢ 𝑅 # 𝑆 #-CONJ (𝑃 #𝑄) # 𝑅 ⊣⊢ 𝑃 # (𝑄 # 𝑅) #-ASSOC (𝑃 #𝑄) ∗ (𝑅 # 𝑆) ⊢ (𝑃 ∗ 𝑅) # (𝑄 ∗ 𝑆) REVEX Figure 4.4: Hilbert system for DIBI 123 Reverse exchange The proof system of DIBI shares many similarities with Concurrent Kleene Bunched Logic (CKBI) [Docherty, 2019], which also extends BI with a non-commutative conjunction. Inspired by concurrent Kleene alge- bra (CKA) Hoare et al. [2011], CKBI supports the following exchange axiom, derived from CKA’s exchange law: (𝑃 ∗ 𝑅) # (𝑄 ∗ 𝑆) ⊢CKBI (𝑃 #𝑄) ∗ (𝑅 # 𝑆) EXCH In models of CKBI, ∗ describes interleaving concurrent composition, while # describes sequential composition. The exchange rule states that the process on the left has fewer behaviors than the process on the right — e.g., 𝑃 # 𝑄 allows fewer behaviors than 𝑃 ∗ 𝑄, so 𝑃 #𝑄 ⊢CKBI 𝑃 ∗ 𝑄 is derivable. In our models, ∗ has a different reading: it states that two computations can be combined because they are independent (i.e., non-interfering). Accordingly, DIBI replaces EXCH by the reversed version REVEX — the fact that the process on the left is safe to combine implies that the process on the right is also safe. 𝑃 ∗ 𝑄 is now stronger than 𝑃 #𝑄, and instead 𝑃 ∗ 𝑄 ⊢ 𝑃 #𝑄 is derivable (Lemma 4.1.2). Lemma 4.1.2. In the proof system given by fig. 4.4, 𝑃 ∗ 𝑄 ⊢ 𝑃 #𝑄. Proof. For better readability, we break the proof tree down into two components. 
The first component derives 𝑃 ∗ 𝑄 ⊢ (𝑃 ∗ 𝐼) # (𝐼 ∗ 𝑄): by #-RIGHT UNIT, 𝑃 ⊢ 𝑃 # 𝐼, and by #-LEFT UNIT, 𝑄 ⊢ 𝐼 # 𝑄, so ∗-CONJ gives 𝑃 ∗ 𝑄 ⊢ (𝑃 # 𝐼) ∗ (𝐼 # 𝑄); REVEX gives (𝑃 # 𝐼) ∗ (𝐼 # 𝑄) ⊢ (𝑃 ∗ 𝐼) # (𝐼 ∗ 𝑄); and CUT yields 𝑃 ∗ 𝑄 ⊢ (𝑃 ∗ 𝐼) # (𝐼 ∗ 𝑄).

With 𝑃 ∗ 𝑄 ⊢ (𝑃 ∗ 𝐼) # (𝐼 ∗ 𝑄), we construct the second component: by ∗-UNIT, 𝑃 ∗ 𝐼 ⊢ 𝑃; by ∗-COMM and ∗-UNIT (combined with CUT), 𝐼 ∗ 𝑄 ⊢ 𝑄 ∗ 𝐼 ⊢ 𝑄; so #-CONJ gives (𝑃 ∗ 𝐼) # (𝐼 ∗ 𝑄) ⊢ 𝑃 # 𝑄, and CUT yields 𝑃 ∗ 𝑄 ⊢ 𝑃 # 𝑄.

This proof uses the admissible rule CUT, which can be derived as follows: from 𝑄 ⊢ 𝑅, rule ∧-I-L gives 𝑃 ∧ 𝑄 ⊢ 𝑅, and →-I gives 𝑃 ⊢ 𝑄 → 𝑅; combining this with 𝑃 ⊢ 𝑄 by →-E yields 𝑃 ⊢ 𝑅. □

Left unit. While # has a right unit in our logic, it does not have a proper left unit. Semantically, this corresponds to the lack of a frame condition

𝑒 ∈ 𝐸 ∧ 𝑥 ∈ 𝑒 ⊙ 𝑦 → 𝑥 ⊒ 𝑦 (⊙ CoherenceL)

in our definition of DIBI frames. This difference can also be seen in our proof rules: while #-RIGHT UNIT gives entailment in both directions, #-LEFT UNIT only gives entailment in one direction; there is no axiom stating 𝐼 # 𝑃 ⊢ 𝑃. We make this relaxation to support our intended model, which we will see in Section 4.2. In a nutshell, states in our models are Markov kernels that preserve their input through to their output. Our models take ⊙ to be Kleisli composition, which exhibits an important asymmetry for such arrows: 𝑓 can always be recovered from 𝑓 ⊙ 𝑒, but not from an arbitrary 𝑒 ⊙ 𝑓 . As a result, the set of all kernels naturally serves as the set of right units, but these kernels cannot all serve as left units.¹

¹ In the special case that 𝑒 maps the input of 𝑓 to the Dirac distribution on it, 𝑒 ⊙ 𝑓 = 𝑓 . But because we also want (Unit Closure), which says the set of units is closed under the pre-order ⊑, our unit set 𝐸 contains other elements 𝑔 such that 𝑓 cannot be recovered from 𝑔 ⊙ 𝑓 .

4.1.3 Soundness and Completeness of DIBI

The soundness and completeness of DIBI follow the same recipe as before, using the methodology given by Docherty [2019]. First, DIBI is proved sound and complete with respect to an algebraic semantics obtained by interpreting the rules of the proof system as algebraic axioms.
We then establish a representa- tion theorem: every DIBI algebra A embeds into a DIBI algebra generated by a DIBI frame, that is in turn generated by A. Soundness and completeness of the algebraic semantics can then be transferred to the Kripke semantics. We prove algebraic soundness and completeness of DIBI proof systems with respect to a new structure called DIBI algebra. Definition 4.1.4 (DIBI Algebra). A DIBI algebra is an algebra A = (𝐴,∧,∨,→,⊤,⊥, ∗,−∗, #,⊸,�, 𝐼) such that, for all 𝑎, 𝑏, 𝑐, 𝑑 ∈ 𝐴: • (𝐴,∧,∨,→,⊤,⊥) is a Heyting algebra; • (𝐴, ∗, 𝐼) is a commutative monoid; • (𝐴, #, 𝐼) is a weak monoid: # is an associative operation with right unit 𝐼 and 𝑎 ≤ 𝐼 # 𝑎; • 𝑎 ∗ 𝑏 ≤ 𝑐 iff 𝑎 ≤ 𝑏 −∗ 𝑐; • 𝑎 # 𝑏 ≤ 𝑐 iff 𝑎 ≤ 𝑏 ⊸ 𝑐 iff 𝑏 ≤ 𝑎 � 𝑐; • (𝑎 # 𝑏) ∗ (𝑐 # 𝑑) ≤ (𝑎 ∗ 𝑐) # (𝑏 ∗ 𝑑). An algebraic interpretation of DIBI is specified by an assignment on the atomic propositions ⟦−⟧ : AP → 𝐴. The interpretation is obtained as the unique homomorphic extension of this assignment, and so we use the notation ⟦−⟧ 126 interchangeably for both assignment and interpretation. Soundness and com- pleteness can be established by constructing a term DIBI algebra by quotienting formulas by equiderivability. Theorem 4.1.3. 𝑃 ⊢ 𝑄 is derivable iff ⟦𝑃⟧ ≤ ⟦𝑄⟧ for all algebraic interpretations ⟦−⟧. We now connect these algebras to DIBI frames so we can transfer the sound- ness and completeness of DIBI proof systems with respect to these algebras to the DIBI frames. Again, we use the notion of complex algebras and prime filters. We denote the set of prime filters of a DIBI algebra A by Prf(A). Definition 4.1.5 (Prime Filter Frame). Given a DIBI algebra A, the prime filter frame of A is defined as 𝑃𝑟 (A) = (Prf(A), ⊆, ⊕A, ⊙A, 𝐸A), where 𝐹 ⊕A 𝐺 = {𝐻 ∈ Prf(A) | ∀𝑎 ∈ 𝐹, 𝑏 ∈ 𝐺 (𝑎 ∗ 𝑏 ∈ 𝐻)} 𝐹 ⊙A 𝐺 = {𝐻 ∈ Prf(A) | ∀𝑎 ∈ 𝐹, 𝑏 ∈ 𝐺 (𝑎 # 𝑏 ∈ 𝐻)} 𝐸A = {𝐹 ∈ Prf(A) | 𝐼 ∈ 𝐹}. Lemma 4.1.4. For any DIBI algebra A, the prime filter frame 𝑃𝑟 (A) is a DIBI frame. 
In the other direction, DIBI frames generate DIBI algebras. Definition 4.1.6 (Complex Algebra). Given a DIBI frame X = (𝑋, ⊑, ⊕, ⊙, 𝐸), the complex algebra ofX is Com(X) = (P⊑ (𝑋),∩,∪,⇒X , 𝑋, ∅, •X ,�X , ⊲X ,−⊲X , ⊲−X , 𝐸): P⊑ (𝑋) = {𝐴 ⊆ 𝑋 | if 𝑎 ∈ 𝐴 and 𝑎 ⊑ 𝑏 then 𝑏 ∈ 𝐴} 𝐴⇒X 𝐵 = {𝑎 | for all 𝑏, if 𝑏 ⊒ 𝑎 and 𝑏 ∈ 𝐴 then 𝑏 ∈ 𝐵} 𝐴 •X 𝐵 = {𝑥 | there exist 𝑥′, 𝑎, 𝑏 s.t 𝑥 ⊒ 𝑥′ ∈ 𝑎 ⊕ 𝑏, 𝑎 ∈ 𝐴 and 𝑏 ∈ 𝐵} 𝐴 �X 𝐵 = {𝑥 | for all 𝑎, 𝑏, if 𝑏 ∈ 𝑥 ⊕ 𝑎 and 𝑎 ∈ 𝐴 then 𝑏 ∈ 𝐵} 𝐴 ⊲X 𝐵 = {𝑥 | there exist 𝑎, 𝑏 s.t 𝑥 ∈ 𝑎 ⊙ 𝑏, 𝑎 ∈ 𝐴 and 𝑏 ∈ 𝐵} 𝐴 −⊲X 𝐵 = {𝑥 | for all 𝑥′, 𝑎, 𝑏, if 𝑥 ⊑ 𝑥′, 𝑏 ∈ 𝑥′ ⊙ 𝑎 and 𝑎 ∈ 𝐴 then 𝑏 ∈ 𝐵} 𝐴 ⊲−X 𝐵 = {𝑥 | for all 𝑥′, 𝑎, 𝑏, if 𝑥 ⊑ 𝑥′, 𝑏 ∈ 𝑎 ⊙ 𝑥′ and 𝑎 ∈ 𝐴 then 𝑏 ∈ 𝐵}. 127 Lemma 4.1.5. For any DIBI frame X, the complex algebra Com(X) is a DIBI algebra. The following main result facilitates the transference of soundness and com- pleteness. Theorem 4.1.6 (Representation of DIBI algebras). Every DIBI algebra is isomorphic to a subalgebra of a complex algebra: given a DIBI algebra A, the map 𝜃A : A → Com(Prf(A)) defined by 𝜃A(𝑎) = {𝐹 ∈ Prf(A) | 𝑎 ∈ 𝐹} is an embedding. Given the previous correspondence between DIBI algebras and frames, we only need to show that 𝜃 is a monomorphism: the necessary argument is iden- tical to that for similar bunched logics [Docherty, 2019, Theorems 6.11, 6.25]. Given ⟦−⟧ on A, the representation theorem establishes thatV⟦−⟧(𝑝) := 𝜃A(⟦𝑝⟧) is a persistent valuation on 𝑃𝑟 (A) such that 𝐹 |=V⟦−⟧ 𝑃 iff ⟦𝑃⟧ ∈ 𝐹, from which our main theorem can be proved. Theorem 4.1.7 (Soundness and Completeness). 𝑃 ⊢ 𝑄 is derivable iff 𝑃 |= 𝑄. 4.2 A Probabilistic Model of DIBI Now we develop a probabilistic model of DIBI where 𝑃 ∗ 𝑄 can assert proba- bilistic independence and 𝑃 #𝑄 can assert dependence. Because our DIBI model is designed to describe probabilistic programs’ pro- gram states, in the remainder of this chapter, we use the term (Markov) kernels to specifically refer to maps 𝑓 : Mem[𝑆] → D(Mem[𝑈]) with 𝑆,𝑈 ⊆ Var. 
For a kernel 𝑓 , we define its domain dom( 𝑓 ) = 𝑆 and its range range( 𝑓 ) = 𝑈. We can also project kernels to a smaller range.

Definition 4.2.1 (Marginalizing kernels). For a Markov kernel 𝑓 : Mem[𝑆] → D(Mem[𝑈]) and 𝑉 ⊆ 𝑈, the marginalization of 𝑓 to 𝑉 is the map 𝜋𝑉 𝑓 : Mem[𝑆] → D(Mem[𝑉]):

(𝜋𝑉 𝑓 ) (𝑑) (𝑟) := ∑𝑚∈Mem[𝑈\𝑉] 𝑓 (𝑑) (𝑟 ⊲⊳ 𝑚)   for 𝑑 ∈ Mem[𝑆], 𝑟 ∈ Mem[𝑉].

Now we define an important requirement for our DIBI model's states.

Definition 4.2.2. We use unit𝑆 to denote the kernel 𝑔 : Mem[𝑆] → D(Mem[𝑆]) defined by 𝑔(𝑚) = unit(𝑚) for all 𝑚 ∈ Mem[𝑆]. We say a kernel 𝑓 : Mem[𝑆] → D(Mem[𝑈]) preserves its input to its output if 𝑆 ⊆ 𝑈 and 𝜋𝑆 𝑓 = unit𝑆.

Intuitively, kernels that preserve their input to their output are suitable for encoding conditional distributions: once a variable has been conditioned on, its value should not change. We define two ways to compose these kernels.

Definition 4.2.3 (Composing Markov kernels on memories). Given 𝑓 : Mem[𝑆] → D(Mem[𝑇]) and 𝑔 : Mem[𝑈] → D(Mem[𝑉]) that preserve their inputs, we define their parallel composition, whenever 𝑆 ∩ 𝑈 = 𝑇 ∩ 𝑉 , as the map 𝑓 ⊕ 𝑔 : Mem[𝑆 ∪ 𝑈] → D(Mem[𝑇 ∪ 𝑉]) given by

( 𝑓 ⊕ 𝑔) (𝑑) (𝑚) := 𝑓 (𝑑𝑆) (𝑚𝑇 ) · 𝑔(𝑑𝑈) (𝑚𝑉 ).

If 𝑇 = 𝑈, the sequential composition 𝑓 ⊙ 𝑔 : Mem[𝑆] → D(Mem[𝑉]) is just Kleisli composition (eq. (4.1)).

Example 4.2.1 (Kernel decomposition). Recall the distribution 𝜇 on Mem[{𝑥, 𝑦, 𝑧}] from Example 4.0.1. Let 𝑘𝑥 : Mem[𝑧] → D(Mem[{𝑥, 𝑧}]) encode the conditional distribution of 𝑥 given 𝑧, and let 𝑘𝑦 : Mem[𝑧] → D(Mem[{𝑦, 𝑧}]) encode the conditional distribution of 𝑦 given 𝑧. Explicitly, for 𝛼 = 𝑥 or 𝑦,

𝑘𝛼 (𝑧 = 0) (𝛼 = 1, 𝑧 = 0) = 1/2   𝑘𝛼 (𝑧 = 0) (𝛼 = 0, 𝑧 = 0) = 1/2
𝑘𝛼 (𝑧 = 1) (𝛼 = 1, 𝑧 = 1) = 1/4   𝑘𝛼 (𝑧 = 1) (𝛼 = 0, 𝑧 = 1) = 3/4.

Since 𝑘𝑥 and 𝑘𝑦 overlap exactly on 𝑧, in both their domains and their ranges, 𝑘𝑥 ⊕ 𝑘𝑦 is defined. A small calculation shows that 𝑘𝑥 ⊕ 𝑘𝑦 = 𝑘 , where 𝑘 : Mem[𝑧] → D(Mem[{𝑥, 𝑦, 𝑧}]) encodes the conditional distribution of (𝑥, 𝑦, 𝑧) given 𝑧.
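As a concrete illustration of Definitions 4.2.1 and 4.2.3, the following sketch carries out the "small calculation" of Example 4.2.1 on finite kernels represented as Python dictionaries. The encoding (memories as frozensets of variable–value pairs, and a `par` helper specialized to kernels sharing the domain {𝑧}) is our own simplifying assumption, not notation from the chapter.

```python
def mem(**kw):
    """A memory is a frozenset of (variable, value) pairs."""
    return frozenset(kw.items())

def restrict(m, V):
    """Project a memory onto the variables in V."""
    return frozenset((x, v) for (x, v) in m if x in V)

def marginalize(f, V):
    """pi_V f (Def 4.2.1): sum the output mass over variables outside V."""
    g = {}
    for d, dist in f.items():
        proj = {}
        for m, p in dist.items():
            r = restrict(m, V)
            proj[r] = proj.get(r, 0.0) + p
        g[d] = proj
    return g

def par(f, g):
    """Parallel composition (Def 4.2.3), specialized to kernels whose
    domains coincide (here both are Mem[{z}])."""
    h = {}
    for d in f:
        dist = {}
        for m1, p1 in f[d].items():
            for m2, p2 in g[d].items():
                # (f ⊕ g)(d)(m) = f(d)(m|T) * g(d)(m|V): keep only pairs
                # that agree on the overlap of their ranges.
                if restrict(m1, dict(m2)) == restrict(m2, dict(m1)):
                    dist[m1 | m2] = p1 * p2
        h[d] = dist
    return h

# k_x and k_y from Example 4.2.1 (conditional distributions given z).
def k(var):
    return {
        mem(z=0): {mem(**{var: 1, "z": 0}): 0.5,  mem(**{var: 0, "z": 0}): 0.5},
        mem(z=1): {mem(**{var: 1, "z": 1}): 0.25, mem(**{var: 0, "z": 1}): 0.75},
    }

kx, ky = k("x"), k("y")
kxy = par(kx, ky)   # the conditional distribution of (x, y, z) given z
```

Marginalizing `kxy` back to {𝑥, 𝑧} recovers 𝑘𝑥, and marginalizing to {𝑧} recovers unit_{𝑧}, witnessing that the composite preserves its input to its output.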
This decomposition shows that 𝑥 and 𝑦 are independent conditioned on 𝑧. The correspondence between the decomposition of kernels and conditional independence is proved in theorem 4.2.2.

4.2.1 A Concrete Probabilistic Frame of DIBI

We now have all the ingredients to define a first concrete model: states are Markov kernels that preserve their input; the binary operation ⊕ behaves as a parallel composition, and the binary operation ⊙ serves as the sequential composition. While there is a canonical choice for the sequential composition of Markov kernels, i.e., Kleisli composition, there are many choices for the parallel composition. For instance, it is unclear whether we should only allow parallel composition of kernels with the same domain, or work with a more relaxed condition. Another difficulty is in the definition of the pre-order. We are going to define two very different binary operations, and not only do we need the unit set to be closed under the pre-order, we also need the coherence conditions for the pre-order and both binary operations (⊕ Down-Closed, ⊙ Up-Closed, ⊕ Unit Coherence, ⊙ CoherenceR) to hold.

Definition 4.2.4 (Probabilistic frame). We define the frame (X𝐶𝐼 , ⊑, ⊕̂, ⊙̂, X𝐶𝐼) as follows:
• X𝐶𝐼 is the set of Markov kernels that preserve their input to their output;
• ⊕̂ and ⊙̂ are defined through the parallel and sequential composition of kernels:

𝑓 ⊕̂ 𝑔 = { 𝑓 ⊕ 𝑔} if range( 𝑓 ) ∩ range(𝑔) = dom( 𝑓 ) ∩ dom(𝑔), and ∅ otherwise
𝑓 ⊙̂ 𝑔 = { 𝑓 ⊙ 𝑔} if range( 𝑓 ) = dom(𝑔), and ∅ otherwise

• Given 𝑓 , 𝑔 ∈ X𝐶𝐼 , 𝑓 ⊑ 𝑔 if there exist a set of variables 𝑅 ⊆ Var and another kernel ℎ ∈ X𝐶𝐼 such that 𝑔 = ( 𝑓 ⊕ unit𝑅) ⊙ ℎ.

We make three remarks. First, the binary combinations ⊕̂ and ⊙̂ return sets with at most one element.
So they are essentially a wrapper over their underlying operations ⊕ and ⊙, which are partial and deterministic; in the following, including when proving that the structure (X𝐶𝐼 , ⊑, ⊕̂, ⊙̂, X𝐶𝐼) is a DIBI frame, we will work directly with the underlying operations ⊕ and ⊙.

Second, the definition of 𝑓 ⊙ 𝑔 on X𝐶𝐼 can be simplified. Given 𝑓 : Mem[𝑆] → D(Mem[𝑇]) and 𝑔 : Mem[𝑇] → D(Mem[𝑉]), eq. (4.1) yields the formula:

( 𝑓 ⊙ 𝑔) (𝑑) (𝑚) := ∑𝑚′∈Mem[𝑇] 𝑓 (𝑑) (𝑚′) · 𝑔(𝑚′) (𝑚).

Since 𝑓 , 𝑔 ∈ X𝐶𝐼 preserve input to output, this reduces to

( 𝑓 ⊙ 𝑔) (𝑑) (𝑚) = 𝑓 (𝑑) (𝑚𝑇 ) · 𝑔(𝑚𝑇 ) (𝑚𝑉 ). (4.2)

Third, the preorder is defined so that 𝑓 ⊑ 𝑔 holds when 𝑔 can be obtained by extending 𝑓 . If 𝑔 is obtained by composing 𝑓 in parallel with unit𝑅, and then extending the range via composition with ℎ, then we can recover 𝑓 from 𝑔 by marginalizing 𝑔 to range( 𝑓 ) ∪ 𝑅, and then ignoring the 𝑅 portion.

We show that our probabilistic frame is indeed a DIBI frame.

Theorem 4.2.1. (X𝐶𝐼 , ⊑, ⊕̂, ⊙̂, X𝐶𝐼) is a DIBI frame.

Proof sketch. Since ⊕̂ and ⊙̂ return either a singleton set or the empty set, for any axiom that mentions 𝑥 ∈ 𝑦 ⊕̂ 𝑧 (resp. 𝑥 ∈ 𝑦 ⊙̂ 𝑧), we can always use the 𝑥 such that 𝑥 = 𝑦 ⊕ 𝑧 (resp. 𝑥 = 𝑦 ⊙ 𝑧).

We first show that X𝐶𝐼 is closed under ⊕ and ⊙, and that ⊑ is transitive and reflexive. Then we show the frame axioms, which are mostly straightforward. Several conditions rely on a property of our model that we call Exchange Equality: if both ( 𝑓1 ⊕ 𝑓2) ⊙ ( 𝑓3 ⊕ 𝑓4) and ( 𝑓1 ⊙ 𝑓3) ⊕ ( 𝑓2 ⊙ 𝑓4) are defined, then they are equal, and if the second is defined, then so is the first. While its connection with REVEX is the most obvious, the Exchange Equality is also useful for proving other conditions, since the preorder in X𝐶𝐼 is defined through the binary combinations ⊕ and ⊙.
For example:

(⊕ Unit Coherence): Since the unit set in this frame is the entire state space X𝐶𝐼 , we must show that for any 𝑓1, 𝑓2 ∈ X𝐶𝐼 , if 𝑓1 ⊕ 𝑓2 is defined, then 𝑓1 ⊑ 𝑓1 ⊕ 𝑓2:

𝑓1 ⊕ 𝑓2 = ( 𝑓1 ⊙ unitrange( 𝑓1)) ⊕ (unitdom( 𝑓2) ⊙ 𝑓2)
= ( 𝑓1 ⊕ unitdom( 𝑓2)) ⊙ (unitrange( 𝑓1) ⊕ 𝑓2) (By Exchange Equality)
= ( 𝑓1 ⊕ unitdom( 𝑓2)) ⊙ ( 𝑓2 ⊕ unitrange( 𝑓1)) (By ⊕ Commutativity)

The last line matches the definition of 𝑓1 ⊑ 𝑓1 ⊕ 𝑓2. Also, for the commutativity and associativity of the binary combinations, the main difficulty lies in showing that both terms are defined at the same time. In particular, the associativity of ⊕ requires that ( 𝑓 ⊕ 𝑔) ⊕ ℎ is defined iff 𝑓 ⊕ (𝑔 ⊕ ℎ) is defined, which takes some non-trivial set manipulations to prove. We present the complete proof in appendix C.1.5. □

4.2.2 Capturing Conditional Independence

Now we return to our original goal: expressing conditional independence of program variables. For that, we introduce some basic atomic propositions and interpret DIBI formulas on the probabilistic DIBI frame (X𝐶𝐼 , ⊑, ⊕̂, ⊙̂, X𝐶𝐼). If the only property we need to express is conditional independence of program variables, we only need atomic propositions of the form (𝐴 ⊲ 𝐵), which are intended to describe the domain and range of the current kernel.

Definition 4.2.5 (Basic atomic proposition). For sets of variables 𝐴, 𝐵 ⊆ Var, a basic atomic proposition has the form (𝐴 ⊲ 𝐵) and the semantics:

𝑓 |= (𝐴 ⊲ 𝐵) iff there exists 𝑓 ′ ⊑ 𝑓 such that dom( 𝑓 ′) = 𝐴 and range( 𝑓 ′) ⊇ 𝐵.

For example, 𝑓 : Mem[𝑦] → D(Mem[𝑦, 𝑧]) defined by 𝑓 (𝑦 ↦→ 𝑣) := unit(𝑦 ↦→ 𝑣, 𝑧 ↦→ 𝑣) satisfies (𝑦 ⊲ 𝑦), (𝑦 ⊲ 𝑧), (𝑦 ⊲ ∅), (𝑦 ⊲ 𝑦, 𝑧), (∅ ⊲ ∅), and no other atomic propositions. With these atomic propositions, we can assert conditional independence of program variables using a simple formula:

Theorem 4.2.2. Given a distribution 𝜇 ∈ D(Mem[Var]), for any 𝑋,𝑌, 𝑍 ⊆ Var,

𝑓𝜇 |= (∅ ⊲ 𝑍) # ((𝑍 ⊲ 𝑋) ∗ (𝑍 ⊲ 𝑌 )) (4.3)

if and only if 𝑋 ⊥⊥ 𝑌 | 𝑍 and 𝑋 ∩ 𝑌 ⊆ 𝑍 are both satisfied.
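Theorem 4.2.2 ties satisfaction of eq. (4.3) to the familiar product characterization of conditional independence. The sketch below checks that characterization directly on a joint distribution assembled from the conditionals of Example 4.2.1; since Example 4.0.1's 𝜇 is not reproduced in this excerpt, the uniform marginal for 𝑧 is an assumption we add for illustration.

```python
from itertools import product

# Joint distribution on (x, y, z): z uniform (an assumption added here),
# then x and y drawn independently given z, using the conditionals of
# Example 4.2.1.
px_given_z = {0: {1: 0.5, 0: 0.5}, 1: {1: 0.25, 0: 0.75}}
mu = {(x, y, z): 0.5 * px_given_z[z][x] * px_given_z[z][y]
      for x, y, z in product([0, 1], repeat=3)}

def cond_indep(mu, tol=1e-12):
    """x ⊥⊥ y | z: P(x,y,z) * P(z) = P(x,z) * P(y,z) at every point."""
    def m(pred):
        return sum(p for pt, p in mu.items() if pred(pt))
    for x, y, z in mu:
        pz  = m(lambda t: t[2] == z)
        pxz = m(lambda t: (t[0], t[2]) == (x, z))
        pyz = m(lambda t: (t[1], t[2]) == (y, z))
        if abs(mu[(x, y, z)] * pz - pxz * pyz) > tol:
            return False
    return True
```

Perturbing the distribution so that 𝑥 and 𝑦 become correlated beyond their shared dependence on 𝑧 makes the check fail, as it should.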
The restriction 𝑋 ∩ 𝑌 ⊆ 𝑍 is harmless: when 𝑋 ⊥⊥ 𝑌 | 𝑍 but 𝑋 ∩ 𝑌 ⊈ 𝑍 , the variables in 𝑋 ∩ 𝑌 must be determined by variables in 𝑍 (see lemma C.2.7), and it suffices to check 𝑋 ⊥⊥ 𝑌 | 𝑍 ∪ (𝑋 ∩ 𝑌 ). For simplicity, we abbreviate the formula (∅ ⊲ 𝑍) # ((𝑍 ⊲ 𝑋) ∗ (𝑍 ⊲ 𝑌 )) as [𝑍] # ( [𝑋] ∗ [𝑌 ]).

Proof sketch. For the forward direction, suppose 𝑓𝜇 satisfies eq. (4.3). We first show in lemma C.2.6 that this intuitionistic logic has some classical flavor: whenever 𝑓𝜇 satisfies eq. (4.3), there exist 𝑓 , 𝑔, and ℎ in X𝐶𝐼 with 𝑓 ⊙ (𝑔 ⊕ ℎ) ⊑ 𝑓𝜇, where 𝑓 : Mem[∅] → D(Mem[𝑍]), 𝑔 : Mem[𝑍] → D(Mem[𝑍 ∪ 𝑋]), and ℎ : Mem[𝑍] → D(Mem[𝑍 ∪ 𝑌 ]); we also have 𝑋 ∩ 𝑌 ⊆ 𝑍 because 𝑓 ⊙ (𝑔 ⊕ ℎ) is defined. Since dom( 𝑓𝜇) = ∅, 𝑓 ⊙ (𝑔 ⊕ ℎ) ⊑ 𝑓𝜇 implies:

𝑓 ⊙ (𝑔 ⊕ ℎ) = 𝜋𝑍∪𝑋∪𝑌 𝑓𝜇 and 𝑓 = 𝜋𝑍 𝑓𝜇 .

Further, we can show that 𝑓 ⊙ (𝑔 ⊕ ℎ) = 𝑓 ⊙ 𝑔 ⊙ (unit𝑋 ⊕ ℎ) = 𝑓 ⊙ ℎ ⊙ (unit𝑌 ⊕ 𝑔), and thus:

𝑓 ⊙ 𝑔 = 𝜋𝑍∪𝑋 𝑓𝜇 and 𝑓 ⊙ ℎ = 𝜋𝑍∪𝑌 𝑓𝜇 .

These imply that 𝑔 (resp. ℎ) encodes the conditional distribution of 𝑋 (resp. 𝑌 ) given 𝑍 , and 𝑔 ⊕ ℎ encodes the conditional distribution of (𝑋,𝑌 ) given 𝑍 . Hence, 𝑓 ⊙ (𝑔 ⊕ ℎ) ⊑ 𝑓𝜇 implies that the conditional distribution of (𝑋,𝑌 ) given 𝑍 is equal to the product of the conditional distributions of 𝑋 given 𝑍 and 𝑌 given 𝑍 , and so 𝑋 ⊥⊥ 𝑌 | 𝑍 holds in 𝜇.

For the reverse direction, suppose that 𝑋 ⊥⊥ 𝑌 | 𝑍 holds in 𝜇 and 𝑋 ∩ 𝑌 ⊆ 𝑍 . Consider 𝜋𝑋∪𝑌∪𝑍 𝑓𝜇, the marginal distribution on (𝑋,𝑌, 𝑍) encoded as a kernel, and observe that 𝜋𝑋∪𝑌∪𝑍 𝑓𝜇 = 𝑓 ⊙ 𝑓 ′, where 𝑓 encodes the marginal distribution of 𝑍 , and 𝑓 ′ the conditional distribution of (𝑋,𝑌 ) given values of 𝑍 . Since 𝑋 ⊥⊥ 𝑌 | 𝑍 holds, the conditional distribution of (𝑋,𝑌 ) given 𝑍 is the product of the conditional distributions of 𝑋 given 𝑍 and 𝑌 given 𝑍 , that is, 𝑓 ′ = 𝑔 ⊕ ℎ, where 𝑔 (resp. ℎ) encodes the conditional distribution of 𝑋 (resp. 𝑌 ) given 𝑍 . Then, since 𝑋 ∩ 𝑌 ⊆ 𝑍 , 𝑓 ⊙ (𝑔 ⊕ ℎ) is defined and 𝑓 ⊙ (𝑔 ⊕ ℎ) = 𝜋𝑋∪𝑌∪𝑍 𝑓𝜇 ⊑ 𝑓𝜇. It is straightforward to see that 𝑓 ⊙ (𝑔 ⊕ ℎ) satisfies [𝑍] # ( [𝑋] ∗ [𝑌 ]).
Hence, persistence shows that 𝑓𝜇 also satisfies [𝑍] # ( [𝑋] ∗ [𝑌 ]). See lemma C.2.8 for details. □

4.2.3 Validating the Semi-graphoid Axioms

Notions analogous to conditional independence are useful in several domains. For instance, in database theory [Abiteboul et al., 1995], join dependency, which can be seen as conditional independence for powersets instead of distributions, allows more efficient storage and querying of relational databases [Fagin and Vardi, 1984]. There is a long line of research on logical characterizations of conditional independence and join dependency. Graphoids are perhaps the most well-known approach [Pearl and Paz, 1985]; later, Dawid [2001] proposed a similar notion called separoids. Here, we focus on graphoids.

Definition 4.2.6 (Graphoids and semi-graphoids). Suppose that 𝐼 (𝑋, 𝑍,𝑌 ) is a ternary relation on subsets of Var (i.e., 𝑋, 𝑍,𝑌 ⊆ Var). Then the relation 𝐼 is a graphoid if it satisfies:

𝐼 (𝑋, 𝑍,𝑌 ) ⇔ 𝐼 (𝑌, 𝑍, 𝑋) (SYMMETRY)
𝐼 (𝑋, 𝑍,𝑌 ∪𝑊) ⇒ 𝐼 (𝑋, 𝑍,𝑌 ) ∧ 𝐼 (𝑋, 𝑍,𝑊) (DECOMPOSITION)
𝐼 (𝑋, 𝑍,𝑌 ∪𝑊) ⇒ 𝐼 (𝑋, 𝑍 ∪𝑊,𝑌 ) (WEAK UNION)
𝐼 (𝑋, 𝑍,𝑌 ) ∧ 𝐼 (𝑋, 𝑍 ∪ 𝑌,𝑊) ⇔ 𝐼 (𝑋, 𝑍,𝑌 ∪𝑊) (CONTRACTION)
𝐼 (𝑋, 𝑍 ∪𝑊,𝑌 ) ∧ 𝐼 (𝑋, 𝑍 ∪ 𝑌,𝑊) ⇒ 𝐼 (𝑋, 𝑍,𝑌 ∪𝑊) (INTERSECTION)

If 𝐼 satisfies the first four properties, then it is a semi-graphoid.

Because 𝐼 (𝑋, 𝑍,𝑌 ) is intended to capture CI-like notions, these conditions aim at axiomatizing the relation "knowing 𝑍 renders 𝑋 irrelevant to 𝑌 ." As an example, it is known that the conditional independence relation forms a semi-graphoid: if we fix a distribution 𝜇 ∈ D(Mem[Var]), then taking 𝐼 (𝑋, 𝑍,𝑌 ) to be the set of triples such that 𝑋 ⊥⊥ 𝑌 | 𝑍 holds in 𝜇 defines a semi-graphoid. Below, we show that the semi-graphoid axioms can be naturally translated into valid formulas in our probabilistic model.

Theorem 4.2.3. We abbreviate our probabilistic model as 𝑀 . Define 𝐼 (𝑋, 𝑍,𝑌 ) iff 𝑀 |= [𝑍] # ( [𝑋] ∗ [𝑌 ]). Then, SYMMETRY, DECOMPOSITION, WEAK UNION, and CONTRACTION are valid.
Furthermore, SYMMETRY is derivable in the proof system, and DECOMPOSITION is derivable given the following axiom, valid in 𝑀 :

(𝑍 ⊲ 𝑌 ∪𝑊) ↔ (𝑍 ⊲ 𝑌 ) ∧ (𝑍 ⊲ 𝑊) (SPLIT)

Proof sketch. We show the proof for the derivable axioms. To derive SYMMETRY, we use the ∗-COMM rule to commute the separating conjunction:

AX: 𝑃 ⊢ 𝑃    ∗-COMM: 𝑄 ∗ 𝑅 ⊢ 𝑅 ∗ 𝑄
#-CONJ: 𝑃 # (𝑄 ∗ 𝑅) ⊢ 𝑃 # (𝑅 ∗ 𝑄)
→: ⊢ 𝑃 # (𝑄 ∗ 𝑅) → 𝑃 # (𝑅 ∗ 𝑄)

The proof of DECOMPOSITION uses the axiom SPLIT to split up 𝑌 ∪ 𝑊 , and then uses proof rules to derive the following:

AX: 𝑃 ⊢ 𝑃    AX: 𝑄 ⊢ 𝑄    AX, ∧3: 𝑅 ∧ 𝑆 ⊢ 𝑅
∗-CONJ: 𝑄 ∗ (𝑅 ∧ 𝑆) ⊢ 𝑄 ∗ 𝑅
#-CONJ: 𝑃 # (𝑄 ∗ (𝑅 ∧ 𝑆)) ⊢ 𝑃 # (𝑄 ∗ 𝑅)
(similarly) 𝑃 # (𝑄 ∗ (𝑅 ∧ 𝑆)) ⊢ 𝑃 # (𝑄 ∗ 𝑆)
∧1: 𝑃 # (𝑄 ∗ (𝑅 ∧ 𝑆)) ⊢ 𝑃 # (𝑄 ∗ 𝑅) ∧ 𝑃 # (𝑄 ∗ 𝑆)
→: ⊢ 𝑃 # (𝑄 ∗ (𝑅 ∧ 𝑆)) → 𝑃 # (𝑄 ∗ 𝑅) ∧ 𝑃 # (𝑄 ∗ 𝑆)

Thus, as an instance,

⊢ (∅ ⊲ 𝑍) # ((𝑍 ⊲ 𝑋) ∗ ((𝑍 ⊲ 𝑌 ) ∧ (𝑍 ⊲ 𝑊))) → (∅ ⊲ 𝑍) # ((𝑍 ⊲ 𝑋) ∗ (𝑍 ⊲ 𝑌 )) ∧ (∅ ⊲ 𝑍) # ((𝑍 ⊲ 𝑋) ∗ (𝑍 ⊲ 𝑊))

Combining this with SPLIT, we have

⊢ (∅ ⊲ 𝑍) # ((𝑍 ⊲ 𝑋) ∗ (𝑍 ⊲ 𝑌 ∪𝑊)) → (∅ ⊲ 𝑍) # ((𝑍 ⊲ 𝑋) ∗ (𝑍 ⊲ 𝑌 )) ∧ (∅ ⊲ 𝑍) # ((𝑍 ⊲ 𝑋) ∗ (𝑍 ⊲ 𝑊))

We prove the validity of WEAK UNION and CONTRACTION in appendix C.2.3. □

Our conference paper [Bao et al., 2021] additionally introduces a relational model of DIBI, where [𝑍] # ( [𝑋] ∗ [𝑌 ]) asserts join dependency, and the semi-graphoid axioms can be translated into valid formulas in the relational model as well.

4.3 Conditional Probabilistic Separation Logic

Conditional independence of program variables can be subtle to reason about, motivating formal methods for proving it. We design a program logic CPSL for formally proving conditional independence in a simplified probabilistic imperative language. The language has assignments, sampling, sequencing, and conditionals, but no loops, which would make the reasoning even trickier. Here, our goal is simply to show how a DIBI-based program logic can work in a basic setting.
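The semi-graphoid axioms of Theorem 4.2.3 can also be sanity-checked numerically at the level of distributions, reading 𝐼 (𝑋, 𝑍, 𝑌 ) as 𝑋 ⊥⊥ 𝑌 | 𝑍. The sketch below does so on a small hand-built distribution; the distribution, the variable names, and the helper `I` are our own illustrative assumptions, not part of the chapter.

```python
import itertools

VARS = ("x", "y", "w", "z")

# A joint distribution built as P(z) * P(x|z) * P(y,w|z), so that
# x is independent of {y, w} given z by construction.  All of the
# numbers below are illustrative choices.
px = {0: {1: 0.3, 0: 0.7}, 1: {1: 0.8, 0: 0.2}}
pyw = {0: {(0, 0): 0.4, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.3},
       1: {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}}
mu = {(x, y, w, z): 0.5 * px[z][x] * pyw[z][(y, w)]
      for x, y, w, z in itertools.product([0, 1], repeat=4)}

def marg(mu, S):
    """Marginal distribution over the variables in S."""
    out = {}
    for pt, p in mu.items():
        key = tuple(v for v, n in zip(pt, VARS) if n in S)
        out[key] = out.get(key, 0.0) + p
    return out

def I(mu, X, Z, Y, tol=1e-9):
    """I(X, Z, Y): X independent of Y given Z, via the product test
    P(XYZ) * P(Z) = P(XZ) * P(YZ) at every point."""
    def proj(pt, S):
        return tuple(v for v, n in zip(pt, VARS) if n in S)
    pxyz, pz = marg(mu, X | Y | Z), marg(mu, Z)
    pxz, pyz = marg(mu, X | Z), marg(mu, Y | Z)
    return all(abs(pxyz[proj(pt, X | Y | Z)] * pz[proj(pt, Z)]
                   - pxz[proj(pt, X | Z)] * pyz[proj(pt, Y | Z)]) <= tol
               for pt in mu)
```

On this distribution, SYMMETRY, DECOMPOSITION, and WEAK UNION all hold for the base fact 𝐼 ({𝑥}, {𝑧}, {𝑦, 𝑤}), while 𝑥 and 𝑦 are not unconditionally independent, since both depend on 𝑧.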
4.3.1 CPSL: Assertion Logic

Like PSL and LINA, CPSL is constructed in two layers: the assertion logic describes program states — probability distributions here — while the program logic describes probabilistic programs, using the assertion logic to specify pre- and post-conditions. Our starting point for the assertion logic is the probabilistic model of DIBI introduced in section 4.2, with the atomic assertions we introduced to assert conditional independence in section 4.2.2. We encode distributions as Markov kernels with domain Mem[∅] in order to interpret DIBI on program states.

However, it turns out that the full logic DIBI is not suitable for a program logic. The main problem is that not all formulas in DIBI satisfy a key technical condition, the restriction property.

Definition 4.3.1 (Restriction). A formula 𝑃 satisfies restriction if: a Markov kernel 𝑓 satisfies 𝑃 iff there exists 𝑓 ′ ⊑ 𝑓 such that range( 𝑓 ′) ⊆ FV(𝑃) and 𝑓 ′ |= 𝑃.

A similar restriction property plays an important role in the soundness of Frame-like rules in PSL and LINA, because formulas satisfying restriction are preserved if the program does not modify variables appearing in the formula. Here, we also need it, not only to prove FRAME but also to reason about how the preconditions are preserved in ASSN, SAMP, and COND. Thus, we want to show that the restriction property holds for DIBI formulas. The reverse direction is immediate by persistence, but the forward direction is more delicate – there are simple formulas where restriction fails.

Example 4.3.1 (Failure of restriction). Consider the kernel 𝑓 : Mem[𝑧] → D(Mem[𝑥, 𝑧]) with 𝑓 (𝑧 ↦→ 𝑐) := unit(𝑥 ↦→ 𝑐, 𝑧 ↦→ 𝑐). We can show that 𝑓 satisfies the formula 𝜑 := ⊤ # (𝑥 ⊲ [𝑥]): letting 𝑓1 : Mem[𝑧] → D(Mem[𝑥, 𝑧]) and 𝑓2 : Mem[𝑥, 𝑧] → D(Mem[𝑥, 𝑧]) with

𝑓1(𝑧 ↦→ 𝑐) := unit(𝑥 ↦→ 𝑐, 𝑧 ↦→ 𝑐)
𝑓2 := unitMem[𝑥] ⊕ unitMem[𝑧]

then we have 𝑓1 |= ⊤ and 𝑓2 |= (𝑥 ⊲ [𝑥]). Also, 𝑓 = 𝑓1 ⊙ 𝑓2, so 𝑓 |= ⊤ # (𝑥 ⊲ [𝑥]).
Since FV(𝜑) = {𝑥}, any subkernel 𝑓 ′ ⊑ 𝑓 simultaneously satisfying 𝜑 and witnessing restriction must have type 𝑓 ′ : Mem[𝑧] → D(Mem[𝑥]), but there are no input-preserving kernels of that type.

To address this problem, we will identify a fragment of DIBI that satisfies restriction and is sufficiently rich to support an interesting program logic. Intuitively, restriction may fail for 𝜑 when the satisfaction of 𝜑 implicitly requires unexpected variables in the domain of the kernel, or when 𝜑 does not describe needed variables in its range. Thus, we employ syntactic conditions to over-approximate the variables that can appear in the domain of a kernel satisfying 𝜑 as FVD(𝜑), and to under-approximate the variables that can appear in the range as FVR(𝜑).

Definition 4.3.2 (FVD and FVR). For DIBI formulas generated by probabilistic atomic propositions, conjunctions (∧, ∗, #) and disjunction (∨), we define two sets of variables:

FVD(⊤) = FVD(⊥) := ∅                FVR(⊤) = FVR(⊥) := ∅
FVD(𝐴 ⊲ 𝐵) := FV(𝐴)                FVR(𝐴 ⊲ 𝐵) := FV(𝐴) ∪ FV(𝐵)
FVD(𝑃 ∧ 𝑄) := FVD(𝑃) ∪ FVD(𝑄)      FVR(𝑃 ∧ 𝑄) := FVR(𝑃) ∪ FVR(𝑄)
FVD(𝑃 ∗ 𝑄) := FVD(𝑃) ∪ FVD(𝑄)      FVR(𝑃 ∗ 𝑄) := FVR(𝑃) ∪ FVR(𝑄)
FVD(𝑃 # 𝑄) := FVD(𝑃) ∪ FVD(𝑄)      FVR(𝑃 # 𝑄) := FVR(𝑃) ∪ FVR(𝑄)
FVD(𝑃 ∨ 𝑄) := FVD(𝑃) ∪ FVD(𝑄)      FVR(𝑃 ∨ 𝑄) := FVR(𝑃) ∩ FVR(𝑄)

Now, we have all the ingredients to introduce our assertions. The logic DIBI+ is a fragment of DIBI with atomic propositions AP, with formulas defined by the following grammar:

𝑃, 𝑄 ::= AP | ⊤ | ⊥ | 𝑃 ∨ 𝑄 | 𝑃 ∗ 𝑄
       | 𝑃 # 𝑄   (FVD(𝑄) ⊆ FVR(𝑃))
       | 𝑃 ∧ 𝑄   (FVR(𝑃) = FVR(𝑄) = FV(𝑃) = FV(𝑄)).

The side-condition for 𝑃 # 𝑄 ensures that the variables used by 𝑄 are described by 𝑃. The side-condition for 𝑃 ∧ 𝑄 is the most restrictive — to understand why we need it, consider the following example.

Example 4.3.2 (Failure of restriction for And).
Consider the formula 𝑃 := (∅ ⊲ {𝑥}) ∧ (∅ ⊲ {𝑦}), and the kernel 𝑓 : Mem[𝑧] → D(Mem[𝑥, 𝑦, 𝑧]) with 𝑓 (𝑧 ↦→ tt) the distribution with 𝑥 a fair coin flip, 𝑦 = 𝑥, and 𝑧 = tt, and 𝑓 (𝑧 ↦→ ff ) the distribution with 𝑥 a fair coin flip, 𝑦 = ¬𝑥, and 𝑧 = ff . Then, there exist 𝑓1 : Mem[∅] → D(Mem[𝑥]) and 𝑓2 : Mem[∅] → D(Mem[𝑦]) such that 𝑓1 ⊑ 𝑓 and 𝑓2 ⊑ 𝑓 . Since 𝑓1 |= (∅ ⊲ {𝑥}) and 𝑓2 |= (∅ ⊲ {𝑦}), it follows that 𝑓 |= 𝑃. But, because 𝑧 is correlated with (𝑥, 𝑦), there is no kernel 𝑓 ′ : Mem[∅] → D(Mem[𝑥, 𝑦]) satisfying 𝑃 such that 𝑓 ′ ⊑ 𝑓 : that would mean 𝑓 can be obtained by parallel combination of 𝑓 ′ with another kernel with domain {𝑧}, which requires them to be independent.

With the atomic propositions introduced to express conditional independence, i.e., (𝐴 ⊲ 𝐵) where 𝐴, 𝐵 ⊆ Var, all formulas in DIBI+ satisfy the restriction property. But before proving the restriction property for DIBI+, we enrich the atomic propositions to describe more fine-grained information about the domain and range of kernels, and then show that DIBI+ with the enriched set of atomic propositions still satisfies the restriction property. In particular, we want to enrich the atomic propositions in the following ways.

Domain. Given a kernel 𝑓 , the existing atomic propositions (𝐴 ⊲ 𝐵) can only describe properties that hold for all (well-typed) inputs 𝑚 to 𝑓 . We would like to be able to describe properties that hold for only certain inputs, e.g., for memories 𝑚 where a variable 𝑧 is true.

Range. Given any input 𝑚 to a kernel 𝑓 , the existing atomic propositions can only guarantee the presence of variables in the output distribution 𝑓 (𝑚). We would like to describe more precise information about 𝑓 (𝑚), e.g., that certain variables are independent conditioned on a particular value of 𝑚, rather than on all values of 𝑚.
Thus, we extend atomic propositions to all pairs of logical formulas (𝜙 ⊲ 𝜓), where 𝜙 is a logical formula over the kernel domain (i.e., memories), while 𝜓 is a logical formula over the kernel range (i.e., distributions over memories). To describe memories, we take a simple propositional logic.

Definition 4.3.3 (Domain logic). The domain logic has formulas 𝜙 of the form 𝑆 : 𝑝𝑑 , where 𝑆 ⊆ Var is a subset of variables and

𝑝𝑑 ::= [𝑒1 = 𝑒2] | ⊤ | ⊥ | 𝑝𝑑 ∧ 𝑝′𝑑 | 𝑝𝑑 ∨ 𝑝′𝑑 .

A formula 𝑆 : 𝑝𝑑 is satisfied by a memory 𝑚, written 𝑚 |=𝑑 𝑆 : 𝑝𝑑 , if dom(𝑚) = 𝑆 and 𝑝𝑑 holds in 𝑚. In particular, [𝑒1 = 𝑒2] holds in 𝑚 iff ⟦𝑒1⟧(𝑚) = ⟦𝑒2⟧(𝑚). We read 𝑆 : 𝑝𝑑 as "memories over 𝑆 such that 𝑝𝑑" and abbreviate 𝑆 : ⊤ as 𝑆.

To describe distributions over memories, we adapt formulas from probabilistic BI for the range logic.

Definition 4.3.4 (Range logic). The range logic has the following formulas from probabilistic BI:

𝑝𝑟 ::= [𝑆] | 𝑥 $∼ 𝑑 | [𝑥 = 𝑒] | ⊤ | ⊥ | 𝑝𝑟 ∧ 𝑝′𝑟 | 𝑝𝑟 ∗ 𝑝′𝑟 .

We give a semantics where states are distributions over memories: 𝑀𝑟 = {𝜇 : D(Mem[𝑆]) | 𝑆 ⊆ Var}. We define a preorder on states via 𝜇1 ⊑𝑟 𝜇2 if and only if dom(𝜇1) ⊆ dom(𝜇2) and 𝜋dom(𝜇1)𝜇2 = 𝜇1, and we define a partial binary operation on states: for any 𝜇1 ∈ D(Mem[𝑆]) and 𝜇2 ∈ D(Mem[𝑇]),

𝜇1 ⊕𝑟 𝜇2 := {𝜋𝑆\𝑇𝜇1 ⊗ 𝛿𝑚 ⊗ 𝜋𝑇\𝑆𝜇2} if ∃𝑚 ∈ Mem[𝑆 ∩ 𝑇] s.t. 𝜋𝑆∩𝑇𝜇1 = 𝜋𝑆∩𝑇𝜇2 = 𝛿𝑚, and {} otherwise,

where ⊗ takes the independent product of two distributions over disjoint domains. That is, for any 𝑥 ∈ Mem[𝑆 ∪ 𝑇],

(𝜋𝑆\𝑇𝜇1 ⊗ 𝛿𝑚 ⊗ 𝜋𝑇\𝑆𝜇2) (𝑥) := 𝜋𝑆\𝑇𝜇1(𝜋𝑆\𝑇𝑥) · 1 · 𝜋𝑇\𝑆𝜇2(𝜋𝑇\𝑆𝑥)

This operation generalizes the monoid from the probabilistic BI frame to allow combining distributions with overlapping domains when the distributions on the overlap are deterministic and equal; this mild generalization is useful for our setting, where distributions often have deterministic variables (e.g., variables corresponding to the input of kernels).
Then, we define the semantics of the range logic as:

𝜇 |=𝑟 ⊤ always
𝜇 |=𝑟 ⊥ never
𝜇 |=𝑟 [𝑆] iff 𝑆 ⊆ dom(𝜇)
𝜇 |=𝑟 𝑒 $∼ 𝑑 iff FV(𝑒) ⊆ dom(𝜇) and ⟦𝑒⟧(𝜇) = 𝑑
𝜇 |=𝑟 [𝑒1 = 𝑒2] iff FV(𝑒1) ∪ FV(𝑒2) ⊆ dom(𝜇) and ⟦𝑒1⟧(𝑚) = ⟦𝑒2⟧(𝑚) for any 𝑚 in the support of 𝜇
𝜇 |=𝑟 𝑝𝑟 ∧ 𝑝′𝑟 iff 𝜇 |=𝑟 𝑝𝑟 and 𝜇 |=𝑟 𝑝′𝑟
𝜇 |=𝑟 𝑝𝑟 ∗ 𝑝′𝑟 iff there exists 𝜇1 ⊕𝑟 𝜇2 ⊑ 𝜇 with 𝜇1 |=𝑟 𝑝𝑟 and 𝜇2 |=𝑟 𝑝′𝑟 .

We only use domain formulas 𝜙 and range formulas 𝜓 inside the enriched atomic propositions of the form (𝜙 ⊲ 𝜓), so we do not need to show that formulas in the domain logic or in the range logic are themselves persistent. Now, we can give a semantics to our enriched atomic propositions.

Definition 4.3.5. Given a kernel 𝑓 and atomic proposition (𝜙 ⊲ 𝜓), we define 𝑓 |= (𝜙 ⊲ 𝜓) iff there exists 𝑓 ′ ⊑ 𝑓 such that 𝑚 |=𝑑 𝜙 implies 𝑚 ∈ Mem[dom( 𝑓 ′)] and 𝑓 ′(𝑚) |=𝑟 𝜓.

This valuation is persistent by construction. Furthermore, formulas in DIBI+ with these atomic propositions satisfy restriction.

Theorem 4.3.1 (Restriction in DIBI+). Let 𝑃 ∈ DIBI+ with atomic propositions (𝜙 ⊲ 𝜓), as described above. Then 𝑓 |= 𝑃 if and only if there exists 𝑓 ′ ⊑ 𝑓 such that range( 𝑓 ′) ⊆ FV(𝑃) and 𝑓 ′ |= 𝑃.

Proof sketch. We prove a stronger statement by induction on 𝑃: 𝑓 |= 𝑃 if and only if there exists 𝑓 ′ ⊑ 𝑓 such that dom( 𝑓 ′) ⊆ FVD(𝑃) and FVR(𝑃) ⊆ range( 𝑓 ′) ⊆ FV(𝑃). □

Last, the atomic propositions satisfy some axiom schemas, inspired by proof rules of BI.

Proposition 4.3.2. The following axiom schemas for atomic propositions are valid.

(𝑆 : 𝑝𝑑 ⊲ 𝑝𝑟) ∧ (𝑆 : 𝑝′𝑑 ⊲ 𝑝′𝑟) → (𝑆 : 𝑝𝑑 ∧ 𝑝′𝑑 ⊲ 𝑝𝑟 ∧ 𝑝′𝑟) if FV(𝑝𝑟) = FV(𝑝′𝑟) (AP-AND)
(𝑆 : 𝑝𝑑 ⊲ 𝑝𝑟) ∧ (𝑆 : 𝑝′𝑑 ⊲ 𝑝′𝑟) → (𝑆 : 𝑝𝑑 ∨ 𝑝′𝑑 ⊲ 𝑝𝑟 ∨ 𝑝′𝑟) (AP-OR)
(𝑆 : 𝑝𝑑 ⊲ 𝑝𝑟) ∗ (𝑆′ : 𝑝′𝑑 ⊲ 𝑝′𝑟) → (𝑆 ∪ 𝑆′ : 𝑝𝑑 ∧ 𝑝′𝑑 ⊲ 𝑝𝑟 ∗ 𝑝′𝑟) (AP-PAR)
|=𝑑 𝑝′𝑑 → 𝑝𝑑 and |=𝑟 𝑝𝑟 → 𝑝′𝑟 imply |= (𝑆 : 𝑝𝑑 ⊲ 𝑝𝑟) → (𝑆 : 𝑝′𝑑 ⊲ 𝑝′𝑟) (AP-IMP)

We defer the proofs to appendix C.3.2.
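Definition 4.3.2 and the DIBI+ grammar are directly executable. The sketch below implements FVD, FVR, and the DIBI+ side-conditions over a small formula AST; the tuple encoding (tags "seq" for #, "star" for ∗, and so on) is our own, not notation from the chapter. It re-checks that Example 4.3.1's formula 𝜑 = ⊤ # (𝑥 ⊲ [𝑥]), for which restriction fails, indeed falls outside DIBI+.

```python
# Formula AST: ("top",), ("bot",), ("atom", A, B) for (A ⊲ B),
# ("and", P, Q), ("or", P, Q), ("star", P, Q) for ∗, ("seq", P, Q) for #.

def FV(p):
    if p[0] in ("top", "bot"):
        return frozenset()
    if p[0] == "atom":
        return frozenset(p[1]) | frozenset(p[2])
    return FV(p[1]) | FV(p[2])

def FVD(p):
    if p[0] in ("top", "bot"):
        return frozenset()
    if p[0] == "atom":                 # FVD(A ⊲ B) := FV(A)
        return frozenset(p[1])
    return FVD(p[1]) | FVD(p[2])       # every connective takes the union

def FVR(p):
    if p[0] in ("top", "bot"):
        return frozenset()
    if p[0] == "atom":                 # FVR(A ⊲ B) := FV(A) ∪ FV(B)
        return frozenset(p[1]) | frozenset(p[2])
    if p[0] == "or":                   # disjunction takes the intersection
        return FVR(p[1]) & FVR(p[2])
    return FVR(p[1]) | FVR(p[2])

def in_DIBIplus(p):
    if p[0] in ("top", "bot", "atom"):
        return True
    ok = in_DIBIplus(p[1]) and in_DIBIplus(p[2])
    if p[0] == "seq":                  # P # Q needs FVD(Q) ⊆ FVR(P)
        return ok and FVD(p[2]) <= FVR(p[1])
    if p[0] == "and":                  # P ∧ Q needs FVR(P) = FVR(Q) = FV(P) = FV(Q)
        return ok and FVR(p[1]) == FVR(p[2]) == FV(p[1]) == FV(p[2])
    return ok                          # ∨ and ∗ carry no side-condition

atom = lambda A, B: ("atom", tuple(A), tuple(B))
# (∅ ⊲ z) # ((z ⊲ x) ∗ (z ⊲ y)): the shape used in Theorem 4.2.2.
P = ("seq", atom("", "z"), ("star", atom("z", "x"), atom("z", "y")))
# Example 4.3.1's φ = ⊤ # (x ⊲ [x]): FVD of the right conjunct is {x},
# which is not contained in FVR(⊤) = ∅, so φ is not in DIBI+.
phi = ("seq", ("top",), atom("x", "x"))
```

This also makes the implicit side conditions of the CPSL rules checkable mechanically: for instance, whether a candidate post-condition of ASSN is in DIBI+.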
4.3.2 Conditional Probabilistic Separation Logic (CPSL)

With the assertion logic set, we are now ready to introduce our program logic. We call it Conditional Probabilistic Separation Logic, abbreviated as CPSL. Judgments in CPSL have the form {𝑃} 𝑐 {𝑄}, where 𝑐 is a loopless probabilistic program in pWhile and 𝑃, 𝑄 ∈ DIBI+ are restricted assertions serving as the pre- and post-conditions. As usual, a judgment holds if the program in the judgment maps states satisfying the pre-condition to states satisfying the post-condition. One small difference is that DIBI+ formulas are interpreted on kernels while the program states are distributions — the mismatch is handled by the natural lifting of distributions to kernels.

ASSN: if 𝑥 ∉ FV(𝑒) ∪ FV(𝑃), then ⊢ {𝑃} 𝑥 ← 𝑒 {𝑃 # (FV(𝑒) ⊲ [𝑥 = 𝑒])}
SAMP: if 𝑥 ∉ FV(𝑃), then ⊢ {𝑃} 𝑥 $← 𝑑 {𝑃 # (∅ ⊲ 𝑥 $∼ 𝑑)}
SKIP: ⊢ {𝑃} skip {𝑃}
SEQ: if ⊢ {𝑃} 𝑐 {𝑄} and ⊢ {𝑄} 𝑐′ {𝑅}, then ⊢ {𝑃} 𝑐 ; 𝑐′ {𝑅}
COND: if ⊢ {(∅ ⊲ [𝑏 = tt]) # 𝑃} 𝑐 {(∅ ⊲ [𝑏 = tt]) # (𝑏 : [𝑏 = tt] ⊲ 𝑄1)} and ⊢ {(∅ ⊲ [𝑏 = ff ]) # 𝑃} 𝑐′ {(∅ ⊲ [𝑏 = ff ]) # (𝑏 : [𝑏 = ff ] ⊲ 𝑄2)}, then ⊢ {(∅ ⊲ [𝑏]) # 𝑃} if 𝑏 then 𝑐 else 𝑐′ {(∅ ⊲ [𝑏]) # ((𝑏 : [𝑏 = tt] ⊲ 𝑄1) ∧ (𝑏 : [𝑏 = ff ] ⊲ 𝑄2))}
WEAK: if ⊢ {𝑃} 𝑐 {𝑄}, |= 𝑃′ → 𝑃, and |= 𝑄 → 𝑄′, then ⊢ {𝑃′} 𝑐 {𝑄′}
FRAME: if ⊢ {𝑃} 𝑐 {𝑄}, FV(𝑅) ∩ MV(𝑐) = ∅, FV(𝑄) ⊆ FVR(𝑃) ∪ WV(𝑐), and RV(𝑐) ⊆ FVR(𝑃), then ⊢ {𝑃 ∗ 𝑅} 𝑐 {𝑄 ∗ 𝑅}

Figure 4.5: Proof rules: CPSL

Definition 4.3.6 (CPSL Validity). A CPSL judgment {𝑃} 𝑐 {𝑄} is valid, written |= {𝑃} 𝑐 {𝑄}, if for every input distribution 𝜇 ∈ D(Mem[Var]) such that the lifted input 𝑓𝜇 ≜ ⟨⟩ ↦→ 𝜇 satisfies 𝑓𝜇 |= 𝑃, the lifted output satisfies 𝑓⟦𝑐⟧𝜇 |= 𝑄.

The proof rules of CPSL are presented in Figure 4.5. Note that the requirement that the assertions in the judgments are in DIBI+ poses implicit side conditions. For example, the rule ASSN requires that the post-condition 𝑃 # (FV(𝑒) ⊲ [𝑥 = 𝑒]) is a formula in DIBI+, which in turn requires that FV(𝑒) ⊆ FVR(𝑃). The rules SKIP, SEQ, and WEAK are standard; we comment on the other, more interesting rules.
ASSN and SAMP allow forward reasoning across assignment and random sampling commands. In both cases, a pre-condition that does not mention the assigned variable 𝑥 is augmented with new information tracking the value or distribution of 𝑥, and the variables 𝑥 may depend on.

COND allows reasoning about probabilistic control flow, and the ensuing conditional dependence that may result. The main pre-condition 𝑃 is allowed to depend on the guard variable 𝑏 but nothing else — because we need FVD(𝑃) ⊆ FVR(∅ ⊲ [𝑏]) for the formula to be in DIBI+ — and 𝑃 is preserved as a pre-condition for both branches. The post-conditions allow introducing new facts (𝑏 : 𝑏 = tt ⊲ 𝑄1) and (𝑏 : 𝑏 = ff ⊲ 𝑄2), which are then combined in the post-condition of the entire conditional command. As in PSL, the rule for conditionals does not allow the branches to modify the guard 𝑏 — this restriction is needed to accurately associate each post-condition with each branch.

Finally, FRAME is the frame rule for CPSL. Much like in PSL, the rule involves three classes of variables: MV(𝑐) is the set of variables that 𝑐 may write to, RV(𝑐) is the set of variables that 𝑐 may read from the input, and WV(𝑐) is the set of variables that 𝑐 must write to; these variable sets are defined as in definition 2.3.8. The first side-condition FV(𝑅) ∩ MV(𝑐) = ∅ of FRAME ensures that the framed condition is not modified, which is a fairly standard condition in frame-like rules. The second and third side-conditions are more specialized. Observe that the variables described by 𝑄 in the post-condition are either already described by 𝑃 in the pre-condition, or are written by 𝑐. These two side conditions ensure that variables mentioned by 𝑄 that were not already independent of 𝑅 are freshly written, and that freshly written variables are computed using variables that were already independent of 𝑅 in the precondition, which can be guaranteed if the variables 𝑐 reads from are all in FVR(𝑃).
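The three variable sets used by FRAME can be computed by structural recursion on programs. The sketch below gives one standard formulation over a tuple-encoded AST; since definition 2.3.8 is not reproduced in this chapter, the precise clauses — in particular, that sampling from a literal distribution reads nothing, and that sequencing subtracts already-written variables from later reads — are our assumptions, as is the encoding itself.

```python
# Programs: ("skip",), ("assign", x, vars_of_e), ("sample", x),
# ("seq", c1, c2), ("if", b, c1, c2).

def MV(c):
    """Variables the program may write."""
    t = c[0]
    if t == "skip": return frozenset()
    if t in ("assign", "sample"): return frozenset([c[1]])
    if t == "seq": return MV(c[1]) | MV(c[2])
    return MV(c[2]) | MV(c[3])                      # if: either branch

def WV(c):
    """Variables the program must write, on every path."""
    t = c[0]
    if t == "skip": return frozenset()
    if t in ("assign", "sample"): return frozenset([c[1]])
    if t == "seq": return WV(c[1]) | WV(c[2])
    return WV(c[2]) & WV(c[3])                      # if: intersection of branches

def RV(c):
    """Variables the program may read from its input."""
    t = c[0]
    if t in ("skip", "sample"): return frozenset()  # d is a literal distribution
    if t == "assign": return frozenset(c[2])
    if t == "seq": return RV(c[1]) | (RV(c[2]) - WV(c[1]))
    return frozenset([c[1]]) | RV(c[2]) | RV(c[3])  # if: guard plus branches

# COMMONCAUSE and CONDSAMPLES, as in Figure 4.6.
cc = ("seq", ("sample", "z"),
      ("seq", ("sample", "x"),
       ("seq", ("sample", "y"),
        ("seq", ("assign", "a", ("x", "z")),
                ("assign", "b", ("y", "z"))))))
cs = ("seq", ("sample", "z"),
      ("if", "z",
       ("seq", ("sample", "x"), ("sample", "y")),
       ("seq", ("sample", "x"), ("sample", "y"))))
```

Both example programs read nothing from the initial memory (every variable they consult is written earlier in the program), which is what lets FRAME apply with an empty RV set.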
Theorem 4.3.3 (CPSL Soundness). CPSL is sound: derivable judgments are valid.

Proof sketch. By induction on the proof derivation. The restriction property is used repeatedly to constrain the domains and ranges of kernels witnessing different sub-assertions, ensuring that pre-conditions about unmodified variables continue to hold in the post-condition. □

We include the full proof in appendix C.4.

(a) COMMONCAUSE: 𝑧 $← Bern1/2; 𝑥 $← Bern1/2; 𝑦 $← Bern1/2; 𝑎 ← 𝑥 ∨ 𝑧; 𝑏 ← 𝑦 ∨ 𝑧
(b) CONDSAMPLES: 𝑧 $← Bern1/2; if 𝑧 then 𝑥 $← Bern𝑝; 𝑦 $← Bern𝑝 else 𝑥 $← Bern𝑞; 𝑦 $← Bern𝑞

Figure 4.6: Example programs

4.3.3 Example: CPSL in Action

Now, we demonstrate CPSL on two example programs.

Example 4.3.3. Figure 4.6 introduces two more example programs. The program COMMONCAUSE (Figure 4.6a) generates a distribution where two random observations share a common cause. Specifically, 𝑧, 𝑥, and 𝑦 are independent random samples, and 𝑎 and 𝑏 are values computed from (𝑥, 𝑧) and (𝑦, 𝑧), respectively. Intuitively, 𝑧, 𝑥, and 𝑦 could represent independent noisy measurements, while 𝑎 and 𝑏 could represent quantities derived from these measurements. Since 𝑎 and 𝑏 share a common source of randomness 𝑧, they are not independent. However, 𝑎 and 𝑏 are independent conditioned on the value of 𝑧; this is a textbook example of conditional independence.

The program CONDSAMPLES (Figure 4.6b) is a bit more complex: it branches on a random value 𝑧, and then assigns 𝑥 and 𝑦 two independent samples from Bern𝑝 in the true branch, and from Bern𝑞 in the false branch (𝑝, 𝑞 are constant values in [0, 1]). While we might think that 𝑥 and 𝑦 are independent at the end of the program since they are independent at the end of each branch, this is not true because their distributions are different in the two branches. For example, suppose that 𝑝 = 1 and 𝑞 = 0.
Then at the end of the first branch (𝑥, 𝑦) = (tt, tt) with probability 1, while at the end of the second branch (𝑥, 𝑦) = (ff , ff ) with probability 1. Thus, observing whether 𝑥 = tt or 𝑥 = ff determines the value of 𝑦 — clearly, 𝑥 and 𝑦 can’t be independent. However, 𝑥 and 𝑦 are independent conditioned on 𝑧.

In both cases, we will prove a conditional independence assertion as the post-condition. We will need some axioms for implications between formulas in DIBI+. The following axioms are valid in our probabilistic model X𝐶𝐼 .

Proposition 4.3.4 (AXIOMS FOR DIBI+). The following axioms are sound, assuming both the antecedent and the consequent are in DIBI+.

(𝑃 # 𝑄) # 𝑅 → 𝑃 # (𝑄 ∗ 𝑅) (INDEP-1)
𝑃 # 𝑄 → 𝑃 ∗ 𝑄 if FVD(𝑄) = ∅ (INDEP-2)
𝑃 # 𝑄 → 𝑃 # (𝑄 ∗ (𝑆 ⊲ [𝑆])) (PAD)
(𝑃 ∗ 𝑄) # (𝑅 ∗ 𝑆) → (𝑃 # 𝑅) ∗ (𝑄 # 𝑆) (RESTEXCH)

We briefly explain the axioms. INDEP-1 may look surprising, and it does not hold if we do not require the formulas to be in DIBI+. Under this assumption, it holds because 𝑃 # (𝑄 ∗ 𝑅) ∈ DIBI+ implies that 𝑅 only mentions variables that are guaranteed to be in 𝑃, and then, with some maneuvering, we can change one sequential composition into a parallel composition. INDEP-2 holds because any kernel witnessing 𝑄 depends on no variables and thus is independent of any kernel witnessing 𝑃. PAD allows conjoining (𝑆 ⊲ [𝑆]) to the second conjunct: since 𝑃 # (𝑄 ∗ (𝑆 ⊲ [𝑆])) is in DIBI+, 𝑆 can only mention variables that are already in 𝑃. Finally, RESTEXCH shows that the standard exchange law also holds for restricted assertions. We defer the proof to Appendix C.3.2.

We also need the following axioms for a particular form of atomic propositions, in addition to the axioms for general atomic propositions in Proposition 4.3.2.

Proposition 4.3.5 (AXIOMS FOR ATOMIC PROPOSITIONS). The following axioms are sound.
For any 𝑆, 𝐴, 𝐵, 𝐶 ⊆ Var,

(𝑆 ⊲ [𝐴] ∗ [𝐵]) → (𝑆 ⊲ [𝐴]) ∗ (𝑆 ⊲ [𝐵]) if 𝐴 ∩ 𝐵 ⊆ 𝑆 (REVPAR)
(𝑆 ⊲ [𝐴] ∗ [𝐵]) → (𝑆 ⊲ [𝐴 ∪ 𝐵]) (UNIONRAN)
(𝐴 ⊲ 𝐵) # (𝐵 ⊲ 𝐶) → (𝐴 ⊲ 𝐶) (ATOMSEQ)
(𝐴 ⊲ 𝐵) → (𝐴 ⊲ 𝐴) # (𝐴 ⊲ 𝐵) (UNITL)
(𝐴 ⊲ 𝐵) → (𝐴 ⊲ 𝐵) # (𝐵 ⊲ 𝐵) (UNITR)

We defer the proof to Appendix C.3.2.

Now, we have all the ingredients for verifying our example programs, COMMONCAUSE and CONDSAMPLES. Throughout, we must ensure that all formulas used in CPSL rules and DIBI+ axioms are in DIBI+. The conjunction # raises a tricky point: DIBI+ is not closed under reassociating #, so we add parentheses for formulas that must be in DIBI+. However, we may soundly use the full proof system of DIBI when proving implications between DIBI+ assertions, since DIBI+ is a fragment of DIBI.

Verification of COMMONCAUSE. We aim to prove the following judgment:

⊢ {⊤} COMMONCAUSE {(∅ ⊲ [𝑧]) # ((𝑧 ⊲ [𝑎]) ∗ (𝑧 ⊲ [𝑏]))}

By Theorem 4.2.2, this shows that 𝑎, 𝑏 are conditionally independent given 𝑧 at the end of the program. First, using SAMP to handle the sampling of 𝑧, 𝑥, 𝑦, we can prove the assertion (∅ ⊲ [𝑧]) # (∅ ⊲ [𝑥]) # (∅ ⊲ [𝑦]). Using Axioms PAD, AP-PAR, UNIONRAN, and # ASSOC, this assertion implies (∅ ⊲ [𝑧]) # (𝑧 ⊲ [𝑧, 𝑥]) # (𝑧 ⊲ [𝑧, 𝑦]):

( (∅ ⊲ [𝑧]) # (∅ ⊲ [𝑥]) ) # (∅ ⊲ [𝑦])
→ ( (∅ ⊲ [𝑧]) # ((∅ ⊲ [𝑥]) ∗ (𝑧 ⊲ [𝑧])) ) # ((∅ ⊲ [𝑦]) ∗ (𝑧 ⊲ [𝑧])) (PAD)
→ (∅ ⊲ [𝑧]) # (𝑧 ⊲ [𝑧] ∗ [𝑥]) # (𝑧 ⊲ [𝑧] ∗ [𝑦]) (AP-PAR)
→ (∅ ⊲ [𝑧]) # (𝑧 ⊲ [𝑧, 𝑥]) # (𝑧 ⊲ [𝑧, 𝑦]) (UNIONRAN)

We take the proved formula as the pre-condition before assigning to 𝑎 and to 𝑏. After the assignments, ASSN proves:

( ( (∅ ⊲ [𝑧]) # (𝑧 ⊲ [𝑧, 𝑥]) # (𝑧 ⊲ [𝑧, 𝑦]) ) # (𝑧, 𝑥 ⊲ [𝑎]) ) # (𝑧, 𝑦 ⊲ [𝑏]).

Then, we can reassociate and apply INDEP-1 to derive:

(∅ ⊲ [𝑧]) # ( ( (𝑧 ⊲ [𝑧, 𝑥]) # (𝑧, 𝑥 ⊲ [𝑎]) ) ∗ ( (𝑧 ⊲ [𝑧, 𝑦]) # (𝑧, 𝑦 ⊲ [𝑏]) ) ).

By Axiom ATOMSEQ, we obtain the desired post-condition:

(∅ ⊲ [𝑧]) # ((𝑧 ⊲ [𝑎]) ∗ (𝑧 ⊲ [𝑏])).
□

Verification of CONDSAMPLES We aim to show the following judgment:

⊢ {⊤} CONDSAMPLES {(∅ ⊲ [𝑧]) # ((𝑧 ⊲ [𝑥]) ∗ (𝑧 ⊲ [𝑦]))}

Again, by Theorem 4.2.2, this shows that 𝑥, 𝑦 are conditionally independent given 𝑧 at the end of the program. Starting with the sampling statement for 𝑧, applying SAMP, Axiom INDEP-2, ∗-UNIT and #-UNIT-R gives:

⊢ {⊤} 𝑧 $← Bern1/2 {(∅ ⊲ [𝑧]) # ⊤}.

To reason about the branching, we use COND. We start with the first branch. By SAMP, ASSN, SKIP, WEAK and SEQ, we have

⊢ {(∅ ⊲ 𝑧 = tt) # ⊤} 𝑥 $← Bern𝑝 ; 𝑦 $← Bern𝑝 {(∅ ⊲ 𝑧 = tt) # (∅ ⊲ [𝑥]) # (∅ ⊲ [𝑦])}.

As before, Axioms PAD, AP-PAR, UNIONRAN, together with # ASSOC, give the post-condition (∅ ⊲ 𝑧 = tt) # (𝑧 ⊲ [𝑧, 𝑥]) # (𝑧 ⊲ [𝑧, 𝑦]). Applying Axiom INDEP-1, we can show (∅ ⊲ 𝑧 = tt) # ((𝑧 ⊲ [𝑧, 𝑥]) ∗ (𝑧 ⊲ [𝑧, 𝑦])) at the end of the branch. Thus:

⊢ {(∅ ⊲ 𝑧 = tt) # ⊤} 𝑥 $← Bern𝑝 ; 𝑦 $← Bern𝑝 {(∅ ⊲ 𝑧 = tt) # (𝑧 : 𝑧 = tt ⊲ [𝑧, 𝑥] ∗ [𝑧, 𝑦])}.

The second branch is similar:

⊢ {(∅ ⊲ 𝑧 = ff) # ⊤} 𝑥 $← Bern𝑞 ; 𝑦 $← Bern𝑞 {(∅ ⊲ 𝑧 = ff) # (𝑧 : 𝑧 = ff ⊲ [𝑧, 𝑥] ∗ [𝑧, 𝑦])}.

Applying COND, we have:

⊢ {(∅ ⊲ [𝑧])} CONDSAMPLES {(∅ ⊲ [𝑧]) # ((𝑧 : 𝑧 = tt ⊲ [𝑧, 𝑥] ∗ [𝑧, 𝑦]) ∧ (𝑧 : 𝑧 = ff ⊲ [𝑧, 𝑥] ∗ [𝑧, 𝑦]))}

By AP-OR, the postcondition implies (∅ ⊲ [𝑧]) # ((𝑧 : 𝑧 = tt ∨ 𝑧 = ff) ⊲ [𝑧, 𝑥] ∗ [𝑧, 𝑦] ∨ [𝑧, 𝑥] ∗ [𝑧, 𝑦]). In the domain and range logic, we have:

|=𝑑 𝑧 : ⊤ → 𝑧 : (𝑧 = tt ∨ 𝑧 = ff) and |=𝑟 [𝑧, 𝑥] ∗ [𝑧, 𝑦] ∨ [𝑧, 𝑥] ∗ [𝑧, 𝑦] → [𝑧, 𝑥] ∗ [𝑧, 𝑦].

So AP-IMP implies (∅ ⊲ [𝑧]) # (𝑧 ⊲ [𝑧, 𝑥] ∗ [𝑧, 𝑦]). We can then apply REVPAR because {𝑧, 𝑥} ∩ {𝑧, 𝑦} = {𝑧}, deriving the postcondition (∅ ⊲ [𝑧]) # ((𝑧 ⊲ [𝑧, 𝑥]) ∗ (𝑧 ⊲ [𝑧, 𝑦])). By Axiom SPLIT, we obtain the desired post-condition: (∅ ⊲ [𝑧]) # ((𝑧 ⊲ [𝑥]) ∗ (𝑧 ⊲ [𝑦])). □

4.4 Related Work

While our program logic is the first separation logic for proving conditional independence, related work has explored other approaches to capture dependencies and independence, and it has the potential to lead to alternative formal methods for reasoning about conditional independence.
Other non-classical logics for modeling dependencies There are other non-classical logics that aim to model dependencies. Independence-friendly (IF) logic [Hintikka and Sandu, 1989] and dependence logic [Väänänen, 2007] introduce new quantifiers and propositional atoms to state that a variable logically depends, or does not depend, on another variable; these logics are each equivalent in expressivity to existential second-order logic. More recently, Durand et al. [2018] proposed a probabilistic team semantics for dependence logic, and Hannula et al. [2020] gave a descriptive complexity result connecting this logic to real-valued Turing machines. Under probabilistic team semantics, the universal and existential quantifiers bear a resemblance to our separating and dependent conjunctions, respectively. It would be interesting to understand the relation between these two logics, akin to how the semantics of propositional IF forms a model of BI [Abramsky and Väänänen, 2009].

Conditional independence, join dependency, and logic There is a long line of research on logical characterizations of conditional independence and join dependency. The literature is too vast to survey here. On the conditional independence side, we can point to work by Geiger and Pearl [1993] on graphical models; on the join dependency side, the survey by Fagin and Vardi [1984] describes the history of the area in database theory. There are several broadly similar approaches to axiomatizing the general properties of conditional dependence, including graphoids [Pearl and Paz, 1985] and separoids [Dawid, 2001].

Graphical Approach to Conditional Independence Probabilistic graphical models offer a powerful framework for representing probabilistic relationships [Koller and Friedman, 2009, Pearl, 2014].
In particular, Bayesian networks model joint distributions of program variables using directed acyclic graphs (DAGs), where edges represent conditional dependencies: each child node is associated with a conditional distribution given its parent nodes. This structure enables a more compact representation of the overall distribution by leveraging conditional independence between variables.

Bayesian networks are widely used in machine learning as a flexible and interpretable class of models for fitting data [Friedman et al., 1997, Murphy, 2012]. In many cases, the structure of the network is fixed, and the parameters—defining the conditional distributions along the edges—are learned from data via probabilistic inference. In such settings, the conditional independence encoded in the network can significantly improve the efficiency of inference [Obermeyer et al., 2019]. Moreover, the graphical structure makes it easier to identify and exploit these independence relations.

However, when the structure of a suitable Bayesian network is not known in advance, structure learning techniques are applied to discover it from data [Chickering, 2002, 1996, Kitson et al., 2023]. These methods often rely on identifying conditional independence relationships among variables—either through statistical tests or scoring criteria—to constrain the space of possible graph structures. Consequently, the accuracy and reliability of structure learning depend heavily on the ability to verify conditional independence in observed data.

Categorical probability The view of conditional independence as a factorization of Markov kernels has previously been explored [Jacobs and Zanasi, 2017, Cho and Jacobs, 2019, Fritz, 2020].
Taking a different approach, Simpson [2018] has recently introduced category-theoretic structures for modeling generalized conditional independence, capturing conditional independence and join dependency as well as analogues in heaps and nominal sets [Pitts, 2013]. Roughly speaking, conditional independence in heaps requires two portions that are disjoint except for a common overlap contained in the part that is conditioned; this notion can be smoothly accommodated in our framework as a DIBI model where kernels are Kleisli arrows for the identity monad (Brotherston and Calcagno [2009] also consider a similar notion of separation). Simpson [2018]'s notion of conditional independence in nominal sets suggests that there might be a DIBI model where kernels are Kleisli arrows for some monad in nominal sets, although the appropriate monad is unclear.

A recent work [Simpson, 2024] studies logical reasoning principles for generalized conditional independence and equality, when equality is a coarser notion than equivalence. It provides a semantic foundation for these reasoning principles based on atomic sheaves, and shows a category of probability sheaves as an instantiation. While that work does not concern probabilistic programs, and its reasoning principles derive new relations between variables from known relations in a fixed distribution, it could be interesting to explore an alternative assertion logic for capturing probabilistic programs based on its atomic sheaf logic.

A categorical model of DIBI logic The relational model of DIBI, introduced in the conference version [Bao et al., 2021], and the probabilistic model introduced above are similar, but Bao et al. [2021] does not provide a unifying way to construct such similar DIBI models. In our follow-up work Gu et al. [2024], we develop an abstract framework for systematically constructing DIBI models, using category theory as the unifying mathematical language.
In particular, we use string diagrams (a graphical presentation of monoidal categories) to give a uniform definition of the parallel composition and preorder in DIBI models. Our approach not only generalises known models, but also yields new models of interest, and it reduces properties of DIBI models to structures in the underlying categories. Furthermore, our categorical framework enables a logical notion of CI, in terms of the satisfaction of specific DIBI formulas.

CHAPTER 5
BLUEBELL: A UNIFYING FRAMEWORK FOR INDEPENDENCE, CONDITIONAL INDEPENDENCE AND RELATIONAL REASONING

5.1 Overview

In this chapter, we present BLUEBELL, another separation logic for reasoning about probabilistic programs. While BLUEBELL is designed to be a flexible framework for combining unary reasoning and relational reasoning of probabilistic programs, in this thesis we mostly focus on the unary part of BLUEBELL. The unary part of BLUEBELL shares functionality with CPSL in that both can be used to prove independence and conditional independence of program variables. However, BLUEBELL allows more expressive assertions and has more ergonomic rules, which enable us to prove conditional independence arising in more complicated programs. Below, we overview our motivation for designing a unifying framework for unary reasoning and relational reasoning, the prior work that we draw inspiration from, and our key design choices for making the logic more ergonomic.

Motivation: Independence Helps Relational Reasoning

Unary reasoning means analyzing the behavior of one target program directly. For example, the program logics introduced in previous chapters (PSL, LINA and CPSL) are all techniques for unary reasoning. This choice aligns with the fact that independence, negative dependence, and conditional independence are all properties of program variables in a single program. However, sometimes the properties of concern are naturally relational.
For instance, we may want to show that two probabilistic programs have the same behaviors, or that their outputs only differ up to a certain margin. To prove such relational goals, it can be easier to compare the two programs step by step, instead of individually characterizing their outputs and then comparing them.

Formalizing relational reasoning about probabilistic programs has been an active research area. One prominent line of work in this area is probabilistic relational Hoare logic (pRHL) [Barthe et al., 2013, Hsu, 2017], which formalizes a technique known as "proof by coupling" from the probability theory community. Conceptually, we can think of any probabilistic program as a distribution over different execution traces. To compare two probabilistic programs, it can be helpful to pair up the execution traces from the two programs and examine the pairs: for example, consider simple programs 𝐴 and 𝐵 that both make coin flips; then their coin flips have the same bias iff we can pair up their execution traces such that when 𝐴 flips heads, 𝐵 flips heads as well, and vice versa. This method works not only for proving program equivalence but also for a range of properties important for cryptography and differential privacy [Barthe et al., 2009, 2015, Hsu, 2017, Wang et al., 2019, Zhang and Kifer, 2017]. To describe the pairing, we can use a coupling, i.e., a joint distribution of the two distributions, thus making sense of the name "proof by coupling." Note that the pairing and the coupling here are just reasoning tools — the actual executions of the two programs can be either correlated or completely oblivious of each other.

Our motivation for designing BLUEBELL comes from the observation that independence allows one to decompose relational arguments.
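Before moving on, the coin-flip pairing described above can be sketched as a simulation. The sketch below (names are ours, not from the text) couples the two programs' flips through one shared uniform sample per paired trace: the coupled traces then agree on every run, while each flip alone is still Bernoulli-distributed with the common bias.

```python
import random

def coupled_flips(p, n=10000, seed=1):
    # Couple program A's and program B's coin flips: each paired trace
    # draws one uniform u, and both flips read it. Since A and B have the
    # same bias p, the coupled flips agree on every trace, while each
    # flip on its own is Bernoulli(p).
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        u = rng.random()
        a = u < p  # program A's flip in this trace
        b = u < p  # program B's flip in the paired trace
        pairs.append((a, b))
    return pairs

pairs = coupled_flips(0.3)
agreement = all(a == b for a, b in pairs)
```

The coupling is only a reasoning device: actual executions of the two programs need not share randomness.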
As an example, say we want to show that two probabilistic programs 𝐴 and 𝐵 are equivalent, where 𝐴1, 𝐴2 are two independent components of program 𝐴 and, similarly, 𝐵1, 𝐵2 are two independent components of program 𝐵. Then it is sufficient (though of course not necessary) to develop one relational argument showing 𝐴1 equivalent to 𝐵1 and another relational argument showing 𝐴2 equivalent to 𝐵2. Here, the key condition used is independence: when 𝐴1, 𝐴2 are not independent, or 𝐵1, 𝐵2 are not independent, component-wise equivalence does not guarantee overall equivalence, because the components may be correlated differently. The particular relation between the two programs does not matter, i.e., we can also replace program equivalence with other relations between 𝐴𝑖 and 𝐵𝑖.

Such decomposition can make relational reasoning more scalable, especially when the only other tool for building relational arguments is "proof by coupling," which until recently required rigid alignments between the two programs. Since both reasoning about probabilistic independence and reasoning by coupling can be subtle, we want to formalize such usage of probabilistic independence in relational proofs. Furthermore, since probabilistic independence is inherently a unary property, it is more natural to prove it with unary-style arguments, such as in probabilistic separation logic, prompting us to unify unary reasoning and relational reasoning about probabilistic programs in one framework.

Unary Fragment of BLUEBELL: A More Ergonomic Probabilistic Separation Logic

On the unary reasoning side, we want to present a program separation logic that can cleanly prove independence and conditional independence. Concretely, we want a logic that allows precise description of complicated program states and formalizes subtle probabilistic reasoning as easy-to-apply syntactic rules.
Ideally, the users of the logic can carry out all (or at least most of) the important steps using rules in the logic, instead of resorting to meta-level mathematics. Also, we want to relieve the users, as much as possible, from the burden of checking side conditions for applying a rule.

In the previous chapter, we showed one way to assert conditional independence in bunched logic, using the DIBI logic interpreted on the probabilistic kernels model, and also showed a program logic based on it for reasoning about conditional independence arising in programs. The approach, however, has some limitations. For one thing, DIBI+ excludes many formulas of DIBI to ensure the restriction property (theorem 4.3.1), which says that a formula 𝑃 holds in a kernel iff it holds in the subkernel restricted to the free variables of 𝑃. Although we demonstrated that conditional independence in small programs can be proved using CPSL, whose program rules only involve DIBI+ formulas, it is cumbersome to always have to check whether a formula is in DIBI+. In addition, in PSL, LINA and DIBI, we cannot assert that expressions 𝑒1, 𝑒2 are independent if 𝑒1 and 𝑒2 share variables. For example, while 𝑥 may be independent of 𝑥 xor 𝑦, the formula Own(𝑥) ∗ Own(𝑥 xor 𝑦) always implies false in their assertion logics.

When designing BLUEBELL, we take inspiration from Li et al. [2023a] (Lilac), which proposes a variant of probabilistic separation logic that addresses these limitations of DIBI for functional programs. We investigate whether we can design a program logic for an imperative probabilistic programming language with these same features. We work with an imperative probabilistic programming language both out of intellectual curiosity (investigating whether it would allow us to use less technical program semantics than Lilac) and because of our bigger goal to unify unary reasoning and relational reasoning in one framework: a lot of work in pRHL (e.g., Barthe et al.
[2013, 2015, 2016b,a, 2017]) also features an imperative probabilistic programming language. Allowing the program states to be mutable creates new challenges for validating the frame rule, which requires that any separate resource framed onto the current one is always preserved by program execution.

Quick Walkthrough of Lilac [Li et al., 2023a]

Lilac's key innovation is a new BI model based on measure-theoretic probability. In PSL and LINA, no state 𝜇 can satisfy assertions such as Own(𝑥) ∗ Own(𝑥 xor 𝑦), because evaluating 𝑥 and (𝑥 xor 𝑦) requires marginal distributions with domains {𝑥} and {𝑥, 𝑦} respectively, and the independent product of these two marginal distributions is undefined because their domains overlap on {𝑥}. However, measure-theoretic probability spaces are specified by a sigma-algebra describing the event space and a measure on the sigma-algebra, and it is possible to separate the event space of 𝑥 from the event space of (𝑥 xor 𝑦) such that their independent product recovers the original probability space. With measure-theoretic probabilities, they also give a rigorous treatment of continuous probabilities, which enables them to handle examples that sample from a uniform distribution over the interval [0, 1].

To assert conditional independence, Li et al. [2023a] introduce a modality C𝑥←𝑋 to the assertion logic. Their assertion logic model consists of distributions, represented using measure-theoretic probability spaces, instead of kernels as states; in their model, a distribution satisfies C𝑥←𝑋 𝑃(𝑥) iff, for every value 𝑥 that the variable 𝑋 can take, the distribution conditioned on 𝑋 = 𝑥 satisfies 𝑃(𝑥). Using this modality, they show that the conditional independence of variables 𝑌, 𝑍 given 𝑋 can be asserted as C𝑥←𝑋 (Own(𝑌) ∗ Own(𝑍)), and they encode several axioms regarding conditioning and independence.
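The event-space separation just described can be replayed concretely on a four-point outcome space. The sketch below (a minimal illustration with our own names, anticipating the independent product defined formally in section 5.2) builds the 𝜎-algebra generated by 𝑥 and the one generated by 𝑥 xor 𝑦 for two independent fair bits, and checks the defining equation 𝜇(𝑋1 ∩ 𝑋2) = 𝜇(𝑋1) · 𝜇(𝑋2) on all pairs of events.

```python
from fractions import Fraction
from itertools import product

# Outcomes are pairs (x, y) of fair coin flips; mu is uniform on the four.
omega = [(x, y) for x in (0, 1) for y in (0, 1)]
mu = {w: Fraction(1, 4) for w in omega}

def ev(pred):
    return frozenset(w for w in omega if pred(w))

def measure(e):
    return sum(mu[w] for w in e)

# The four-element sigma-algebras generated by "x = 0" and "(x xor y) = 0".
f_x = {frozenset(), ev(lambda w: w[0] == 0),
       ev(lambda w: w[0] == 1), ev(lambda w: True)}
f_xor = {frozenset(), ev(lambda w: w[0] ^ w[1] == 0),
         ev(lambda w: w[0] ^ w[1] == 1), ev(lambda w: True)}

# mu(X1 ∩ X2) = mu(X1) * mu(X2) for every X1 in f_x and X2 in f_xor: the
# two event spaces separate, even though the expressions x and (x xor y)
# share the program variable x.
indep = all(measure(x1 & x2) == measure(x1) * measure(x2)
            for x1, x2 in product(f_x, f_xor))
```

This is exactly why a measure-theoretic state can satisfy Own(𝑥) ∗ Own(𝑥 xor 𝑦) while a marginal-distribution state cannot.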
On the program logic level, following the convention adopted by the higher-order concurrent separation logic framework Iris [Jung et al., 2018], Lilac works with a functional probabilistic language and defines the validity of a Hoare triple differently. Their definition of Hoare triples implicitly requires that any frame conjoined with the current resource must be preserved. This allows the frame rule to be proven easily, without relying on side conditions or verifying that the formulas satisfy the restriction property. In exchange, one needs to inductively prove that each program preserves the frame. As they work with a functional probabilistic programming language, they fix a probability space ("ambient sampling space" in their terminology) and their program variables are simply random variables on that probability space. They show that everything works out when they fix the ambient sampling space to be the product of countably infinitely many copies of the [0, 1] interval, under a particular set of technical constraints. The choice of the ambient sampling space seems highly non-robust; e.g., a product of finitely many copies of the [0, 1] interval would not work in their proofs even if the probabilistic programs only make a finite number of sampling calls.

Bluebell's Design Choices

In BLUEBELL, we combine Lilac's measure-theoretic BI model with a BI model of permissions, which are used in the concurrent separation logic literature to track who can read from and write to a resource. In our model (section 5.3.3), two resources 𝑎, 𝑏 can be composed together only if their permissions can be combined as well, and this extra requirement plays a crucial role in ensuring that the Frame rule holds in our logic. We draw insights from both DIBI and Lilac to design a modified conditioning modality, introduced in section 5.3.4, that is more expressive and satisfies a richer family of axioms than Lilac's conditioning modality.
Our conditioning modality is the key to how we mix unary reasoning and relational reasoning. It supports conditioning on one program state as well as on two or more program states; using it, we can capture not only conditional independence but also couplings. Furthermore, from the axioms of the conditioning modality, we derive not only reasoning principles important for proving conditional independence, but also relational reasoning principles and their interactions. Even when we only focus on the unary reasoning functionality of BLUEBELL, we can formalize more interesting proof steps in the logic using the richer set of axioms enjoyed by this conditioning modality, as showcased by the examples in section 5.5.

5.2 Preliminaries: Programs and Probability Spaces

To formally define the model of BLUEBELL and validate its rules, we introduce a number of preliminary notions. Our starting point is the measure-theoretic approach of Li et al. [2023a] to defining probabilistic separation. We recall the main definitions below.

One crucial difference between elementary probability (as in how we introduced distributions in definition 2.2.9) and measure-theoretic probability [Rosenthal, 2006, Fristedt and Gray, 2013] is their treatment of event spaces. In elementary probability, we work with the set of outcomes directly: distributions are defined as maps from outcomes to numbers in [0, 1], and any subset of outcomes is considered an event. In measure-theoretic probability, events are specified by a structure called a 𝜎-algebra.

Definition 5.2.1 (𝜎-algebra). Given a set of possible outcomes Ω, a 𝜎-algebra F is a set of subsets of Ω that is closed under countable unions and complements, and such that Ω ∈ F. We call an element of a 𝜎-algebra an event. We denote the set of 𝜎-algebras over a set of outcomes Ω by A(Ω). The full 𝜎-algebra over Ω is ΣΩ = 𝒫(Ω), the powerset of Ω. For 𝐹 ⊆ 𝒫(Ω), we write 𝜎(𝐹) ∈ A(Ω) for the smallest 𝜎-algebra containing 𝐹.
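For a finite outcome space, 𝜎(𝐹) can be computed by closing the generators under complement and union (countable unions reduce to finite unions when Ω is finite). A minimal sketch, with function names of our own choosing:

```python
def generated_sigma_algebra(omega, gens):
    # Close the generator events under complement and binary union; on a
    # finite outcome space this fixed point is the smallest sigma-algebra
    # sigma(gens) containing the generators.
    omega = frozenset(omega)
    sigma = {frozenset(), omega} | {frozenset(g) for g in gens}
    changed = True
    while changed:
        changed = False
        for a in list(sigma):
            c = omega - a
            if c not in sigma:
                sigma.add(c)
                changed = True
            for b in list(sigma):
                u = a | b
                if u not in sigma:
                    sigma.add(u)
                    changed = True
    return sigma

# One generator {1} yields the 4-element algebra {∅, {1}, {2,3,4}, Ω};
# all singletons as generators yield the full 16-element powerset algebra.
f_single = generated_sigma_algebra({1, 2, 3, 4}, [{1}])
f_full = generated_sigma_algebra({1, 2, 3, 4}, [{1}, {2}, {3}, {4}])
```

Closure under complement and union also gives closure under intersection, by De Morgan's laws.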
The measure-theoretic notion of a distribution maps events in a 𝜎-algebra to a number between 0 and 1, which we call the measure of the event, or the probability of the event.

Definition 5.2.2 (Probability Distributions). Given F ∈ A(Ω), a probability distribution 𝜇 ∈ D(F) is a function 𝜇 : F → [0, 1] such that

• for any countable set of disjoint events {𝐸𝑖 | 𝑖 ∈ 𝐼}, 𝜇(⊎𝑖∈𝐼 𝐸𝑖) = ∑𝑖∈𝐼 𝜇(𝐸𝑖);
• 𝜇(Ω) = 1.

The support of a distribution 𝜇 ∈ D(ΣΩ) is the set of outcomes with non-zero probability, supp(𝜇) ≜ {𝑎 ∈ Ω | 𝜇(𝑎) > 0}, where 𝜇(𝑎) abbreviates 𝜇({𝑎}).

Probability spaces are given as a triple of the outcome space, the 𝜎-algebra, and the distribution.

Definition 5.2.3. A probability space P is a triple P = (Ω, F, 𝜇) of an outcome space Ω, a 𝜎-algebra F ∈ A(Ω), and a probability distribution 𝜇 ∈ D(F). We call the distribution 𝜇 the measure of the probability space. The trivial probability space 𝟙Ω ∈ P(Ω) is the trivial 𝜎-algebra {Ω, ∅} equipped with the trivial probability distribution that maps Ω to probability 1 and maps ∅ to probability 0. When the outcome space is clear, we omit it from the triple and write P = (F, 𝜇).

We define a pre-order on probability spaces to capture the intuition that a probability space 𝐴 is smaller than a probability space 𝐵 if 𝐴 is defined on a subset of the events on which 𝐵 is defined, and 𝐴 agrees with 𝐵 on that subset. This pre-order will be used in BLUEBELL's BI model over probability spaces.

Definition 5.2.4. Given F1 ⊆ F2 and 𝜇 ∈ D(F2), the distribution 𝜇|F1 ∈ D(F1) is the restriction of 𝜇 to F1. The extension pre-order (⊑) over probability spaces is defined as (F1, 𝜇1) ⊑ (F2, 𝜇2) ≜ F1 ⊆ F2 ∧ 𝜇1 = 𝜇2|F1.

Given two probability spaces, we identify a set of functions that transfer nicely between them, called measurable functions.

Definition 5.2.5. A function 𝑓 : Ω1 → Ω2 is measurable on F1 ∈ A(Ω1) and F2 ∈ A(Ω2) if for any event 𝑋 ∈ F2, we also have 𝑓⁻¹(𝑋) ∈ F1. When F2 = ΣΩ2, we simply say 𝑓 is measurable on F1.
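On finite spaces, both measurability and the extension pre-order are directly decidable. The sketch below (names are ours) checks measurability of a function against a finite sub-𝜎-algebra, and checks (F1, 𝜇1) ⊑ (F2, 𝜇2) by comparing the restricted measure event by event.

```python
from fractions import Fraction
from itertools import chain, combinations

def measurable(f, sigma, codomain):
    # f is measurable on the finite sigma-algebra `sigma` (a set of
    # frozenset events) if every preimage f^{-1}(b) is an event of sigma.
    omega = max(sigma, key=len)  # the full outcome set is the largest event
    return all(frozenset(w for w in omega if f(w) == b) in sigma
               for b in codomain)

def extends(space1, space2):
    # (F1, mu1) ⊑ (F2, mu2): F1 ⊆ F2, and mu2 restricted to F1 equals mu1.
    f1, mu1 = space1
    f2, mu2 = space2
    return f1 <= f2 and all(mu2[e] == mu1[e] for e in f1)

# A four-point outcome space with the "parity" sub-sigma-algebra.
omega = frozenset({0, 1, 2, 3})
even, odd = frozenset({0, 2}), frozenset({1, 3})
f_parity = {frozenset(), even, odd, omega}
f_full = {frozenset(s) for s in
          chain.from_iterable(combinations(omega, r) for r in range(5))}

# The uniform measure, given on each algebra by event size.
mu_parity = {e: Fraction(len(e), 4) for e in f_parity}
mu_full = {e: Fraction(len(e), 4) for e in f_full}
```

The parity function is measurable on `f_parity`, while the identity is not, since `f_parity` has no singleton events.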
Later we will want to decompose one probability space into two, and the decomposition is defined in terms of how we compose two probability spaces into one. Two natural ways to combine two 𝜎-algebras are taking the Cartesian product and taking the union.

Definition 5.2.6 (Product and union spaces). Given F1 ∈ A(Ω1), F2 ∈ A(Ω2), their product is the 𝜎-algebra F1 ⊗ F2 ∈ A(Ω1 × Ω2) defined as F1 ⊗ F2 ≜ 𝜎({𝑋1 × 𝑋2 | 𝑋1 ∈ F1, 𝑋2 ∈ F2}), and their union is the 𝜎-algebra F1 ⊕ F2 ∈ A(Ω1 × Ω2) defined as 𝜎(F1 ∪ F2).

We can take the product of two distributions to obtain a distribution over the product 𝜎-algebra.

Definition 5.2.7. The product of two probability distributions 𝜇1 ∈ D(F1) and 𝜇2 ∈ D(F2) is the distribution (𝜇1 ⊗ 𝜇2) ∈ D(F1 ⊗ F2) defined by (𝜇1 ⊗ 𝜇2)(𝑋1 × 𝑋2) = 𝜇1(𝑋1) · 𝜇2(𝑋2) for all 𝑋1 ∈ F1, 𝑋2 ∈ F2.

In this chapter, we will frequently use the independent product of two distributions, which lives over the union of their 𝜎-algebras and is not always defined.

Definition 5.2.8 (Independent product [Li et al., 2023a]). Given (F1, 𝜇1), (F2, 𝜇2) ∈ P(Ω), their independent product is the probability space (F1 ⊕ F2, 𝜇) ∈ P(Ω) where, for all 𝑋1 ∈ F1, 𝑋2 ∈ F2, 𝜇(𝑋1 ∩ 𝑋2) = 𝜇1(𝑋1) · 𝜇2(𝑋2). It is unique if it exists [Li et al., 2023a, Lemma 2.3]. Let P1 ⊛ P2 be the unique independent product of P1 and P2 when it exists, and be undefined otherwise.

Probabilistic Programming Language

Another important component in BLUEBELL is the probabilistic programming language. We use a simple first-order imperative language very similar to pWhile, except that it contains a different construct for loops. As in pWhile, we fix a finite set of program variables 𝑥 ∈ Var and a countable set of values 𝑣 ∈ Val ≜ Z, and define the program stores to be 𝑠 ∈ Mem[Var] ≜ Var → Val. For simplicity, booleans are encoded by using 0 ∈ Val as false and any other value as true.

T ∋ 𝑡 ::= skip | 𝑥 := 𝑒 | 𝑥 $← 𝑑 | if 𝑏 then 𝑡1 else 𝑡2 | 𝑡1 ; 𝑡2 | repeat 𝑒 𝑡

Figure 5.1: Program Syntax
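The term language of fig. 5.1 can be given a small executable sketch on discrete store distributions (the encoding below is ours, not from the text): terms are tagged tuples, stores are immutable maps encoded as sorted item-tuples, and each term denotes a map from distributions over stores to distributions over stores; `repeat e t` is unrolled by evaluating `e` at the current store.

```python
from fractions import Fraction

def store(d):
    return tuple(sorted(d.items()))

def bind(dist, f):
    # Monadic bind on finite distributions: run f at each store and sum up.
    out = {}
    for s, p in dist.items():
        for s2, q in f(dict(s)).items():
            out[s2] = out.get(s2, Fraction(0)) + p * q
    return out

def run(t, dist):
    tag = t[0]
    if tag == "skip":
        return dist
    if tag == "assign":                 # ("assign", x, e), e : store -> value
        _, x, e = t
        return bind(dist, lambda s: {store({**s, x: e(s)}): Fraction(1)})
    if tag == "sample":                 # ("sample", x, d), d : value -> prob
        _, x, d = t
        return bind(dist, lambda s: {store({**s, x: v}): p for v, p in d.items()})
    if tag == "if":                     # ("if", b, t1, t2); 0 encodes false
        _, b, t1, t2 = t
        tr = {s: p for s, p in dist.items() if b(dict(s)) != 0}
        fa = {s: p for s, p in dist.items() if b(dict(s)) == 0}
        out = run(t1, tr)
        for s, p in run(t2, fa).items():
            out[s] = out.get(s, Fraction(0)) + p
        return out
    if tag == "seq":                    # ("seq", t1, t2)
        return run(t[2], run(t[1], dist))
    if tag == "repeat":                 # ("repeat", e, t1): run t1 e-many times
        _, e, t1 = t
        def iterate(s):
            d1 = {store(s): Fraction(1)}
            for _ in range(max(0, e(s))):
                d1 = run(t1, d1)
            return d1
        return bind(dist, iterate)
    raise ValueError(tag)

bern = lambda p: {1: p, 0: 1 - p}

# z <- Bern(1/2); repeat z (x := x + 1), starting from x = 0.
prog = ("seq", ("sample", "z", bern(Fraction(1, 2))),
               ("repeat", lambda s: s["z"], ("assign", "x", lambda s: s["x"] + 1)))
result = run(prog, {store({"x": 0}): Fraction(1)})
```

Only a sketch: the thesis's actual semantics is the measure-theoretic map of definition D.1.2, of which this is the finite, discrete special case.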
Program terms 𝑡 ∈ T are formed according to the grammar in fig. 5.1. (We call them terms to follow the terminology in the conference version Bao et al. [2025] and to distinguish them from the commands of pWhile.) The expressions 𝑒 are interpreted into ⟦𝑒⟧ : Mem[Var] → Val following the standard definition (see definition D.1.1). As before, we write FV(𝑒) for the set of program variables that occur in 𝑒. The distributions 𝑑 are interpreted as measures over the full 𝜎-algebra Σ𝐴 for some type 𝐴; when 𝑑 : Σ𝐴 is used in the sampling statement 𝑥 $← 𝑑, we expect 𝐴 to be a subset of Z. An example distribution is Bern𝑣, the Bernoulli distribution with probability 𝑣 of yielding 1 and probability 1 − 𝑣 of yielding 0.

Though we do not allow general loops because of difficulties around reasoning about them, we allow iteration through a simpler construct repeat 𝑒 𝑡, which evaluates 𝑒 to a value 𝑛 ∈ Val and, if 𝑛 > 0, executes 𝑡 in sequence 𝑛 times. Only allowing this restrictive version of iteration means we only consider a subset of terminating programs.

For the semantics of programs, we interpret each term 𝑡 as a function ⟦𝑡⟧ : D(ΣMem[Var]) → D(ΣMem[Var]), i.e., a map from distributions of input stores to distributions of output stores. The interpretation of the terms is standard, and we defer the mathematical definition to definition D.1.2. Notably, working with a countable set of values Val means that the set of program stores is also countable, so distributions in D(ΣMem[Var]) are also discrete. In BLUEBELL, we work with discrete distributions because it is unclear how continuous distributions interact with some relational constructs in our logic. However, we still use the measure-theoretic definitions for more granular control over the event space.

5.3 The BLUEBELL Logic

We are now ready to define BLUEBELL's semantic model and show its laws.
5.3.1 An Alternative Approach to Bunched Logic

While the assertion logic of BLUEBELL extends bunched logic, we use a different presentation than the one used in PSL, LINA and DIBI. We adapt the approach to BI in Krebbers et al. [2018], which is motivated by efforts in mechanizing various separation logics in a ROCQ framework called Iris. Though BLUEBELL has not been mechanized yet in Bao et al. [2025], we look forward to mechanizing it in the future, and thus we lay the foundation of the logic in a style that aligns with the Iris framework and its follow-up works.

Specifically, instead of interpreting formulas in a structure similar to BI frames and DIBI frames, which combine two states using non-deterministic binary operators, we use a structure called an "ordered unital resource algebra" (henceforth RA). RAs allow their states to be combined using either partial or total binary operators: RAs are always equipped with a total binary operation and a predicate V indicating which elements of the carrier are considered valid resources; partiality of the operation then manifests as mapping some combinations of arguments to invalid elements.

Definition 5.3.1 (Ordered Unital Resource Algebra). An ordered unital resource algebra (RA) is a tuple (𝑀, ⪯, V, ·, 𝜀) where ⪯ is a pre-order on 𝑀 called the resource order, V : 𝑀 → Prop is the validity predicate, (·) : 𝑀 → 𝑀 → 𝑀 is the resource composition, a commutative and associative binary operation on 𝑀, and 𝜀 ∈ 𝑀 is the unit of 𝑀, satisfying, for all 𝑎, 𝑏, 𝑐 ∈ 𝑀:

V(𝜀); (Unit Validity)
𝜀 · 𝑎 = 𝑎; (Unit Existence)
V(𝑎 · 𝑏) → V(𝑎); (Element Validity)
𝑎 ⪯ 𝑏 → (V(𝑏) → V(𝑎)); (Validity Closure)
𝑎 ⪯ 𝑏 → 𝑎 · 𝑐 ⪯ 𝑏 · 𝑐. (Order Coherence)

BLUEBELL also differs from PSL, LINA and DIBI in that, in BLUEBELL, we take a semantic approach to assertions: we do not insist on a specific syntax and instead characterize what constitutes an assertion by its type.
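For small finite carriers, the RA axioms of definition 5.3.1 can be checked exhaustively. The sketch below is a toy example of our own (not from the text): the carrier is {0, ..., 5}, composition is addition capped at 5, the cap element 5 plays the role of the single invalid "overflow" element, the order is the usual ≤, and the unit is 0. Validity Closure is read here as: if 𝑎 ⪯ 𝑏 and 𝑏 is valid, then 𝑎 is valid.

```python
from itertools import product

M = list(range(6))
eps = 0
comp = lambda a, b: min(a + b, 5)   # composition: addition capped at 5
valid = lambda a: a != 5            # 5 is the invalid "overflow" element
leq = lambda a, b: a <= b           # resource order

def check_ra_axioms():
    ok = valid(eps)                                                # unit validity
    ok &= all(comp(eps, a) == a for a in M)                        # unit existence
    ok &= all(comp(a, b) == comp(b, a) for a, b in product(M, M))  # commutativity
    ok &= all(comp(comp(a, b), c) == comp(a, comp(b, c))
              for a, b, c in product(M, M, M))                     # associativity
    ok &= all(valid(a) for a, b in product(M, M)
              if valid(comp(a, b)))                                # element validity
    ok &= all(valid(a) for a, b in product(M, M)
              if leq(a, b) and valid(b))                           # validity closure
    ok &= all(leq(comp(a, c), comp(b, c))
              for a, b, c in product(M, M, M) if leq(a, b))        # order coherence
    return bool(ok)
```

Note how partiality is simulated exactly as the definition describes: the total operation sends "impossible" combinations to the invalid element.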
We embed our definitions in a standard first-order logic, which we will refer to as the meta-level logic. We overload ∧ and ∨ as the conjunction and disjunction, and write ⇒ for the implication in this meta-level logic. Following the convention in the ROCQ community, we use Prop to denote the type of propositions.

BLUEBELL uses an alternative definition of BI assertions. To disambiguate from the definition of BI assertions in previous chapters, we call the BLUEBELL version BI∗ assertions.

Definition 5.3.2. We define BI∗ assertions relative to some RA 𝑀 as the upward closed functions 𝑀 → Prop. A map 𝑃 : 𝑀 → Prop is upward closed if for all 𝑎, 𝑎′ ∈ 𝑀 such that 𝑎 ⪯𝑀 𝑎′, 𝑃(𝑎) ⇒ 𝑃(𝑎′) in the propositional logic.

The requirement that BI∗ assertions be upward closed maps is another way to express the persistence condition we imposed on assertions in previous chapters. In this chapter, we do not use the symbol |= for the satisfaction relation; instead, we say that a resource 𝑎 satisfies an assertion 𝑃 if 𝑃(𝑎). Entailment is defined as (𝑃 ⊢ 𝑄) ≜ ∀𝑎 ∈ 𝑀. V(𝑎) ⇒ (𝑃(𝑎) ⇒ 𝑄(𝑎)). Logical equivalence is defined as entailment in both directions: 𝑃 ⊣⊢ 𝑄 ≜ (𝑃 ⊢ 𝑄) ∧ (𝑄 ⊢ 𝑃).

We introduce two families of assertions useful in separation logic. First, pure assertions ⌜𝜙⌝ lift meta-level propositions 𝜙 to BI∗ assertions (by ignoring the resource). For example, a formula about specified distributions such as Bern0.5 = bind(Bern0.3, 𝑣 ↦→ Bern0.5) is pure and can be used in separation logic as ⌜Bern0.5 = bind(Bern0.3, 𝑣 ↦→ Bern0.5)⌝. Second, Own(𝑏) holds on resources that are greater than or equal to 𝑏 in the RA order; this means 𝑏 represents a lower bound on the available resources. Mathematically,

⌜𝜙⌝ ≜ λ_. 𝜙
Own(𝑏) ≜ λ𝑎. 𝑏 ⪯ 𝑎

We also use standard connectives from BI to produce new assertions from existing ones. We interpret these connectives relative to an RA, and the definition is standard:

𝑃 ∧ 𝑄 ≜ λ𝑎. 𝑃(𝑎) ∧ 𝑄(𝑎)
𝑃 ∨ 𝑄 ≜ λ𝑎. 𝑃(𝑎) ∨ 𝑄(𝑎)
𝑃 → 𝑄 ≜ λ𝑎. ∀𝑏 s.t.
𝑎 ⪯ 𝑏, 𝑃(𝑏) ⇒ 𝑄(𝑏)
𝑃 ∗ 𝑄 ≜ λ𝑎. ∃𝑏, 𝑐 s.t. 𝑏 · 𝑐 ⪯ 𝑎, 𝑃(𝑏) ∧ 𝑄(𝑐)
𝑃 −∗ 𝑄 ≜ λ𝑎. ∀𝑏, 𝑐 s.t. 𝑎 · 𝑏 ⪯ 𝑐, 𝑃(𝑏) ⇒ 𝑄(𝑐)
∀𝑥 : 𝑋. 𝑃(𝑥) ≜ λ𝑎. ∀𝑥 ∈ 𝑋. 𝑃(𝑥)(𝑎)
∃𝑥 : 𝑋. 𝑃(𝑥) ≜ λ𝑎. ∃𝑥 ∈ 𝑋. 𝑃(𝑥)(𝑎)

Figure 5.2: Satisfaction for BI formulas on an RA

5.3.2 A Model of Probabilistic Spaces

BLUEBELL's assertions will be interpreted over a specific RA, which we construct by combining more basic RAs. The main component is the probability spaces RA, which uses the independent product as the RA operation.

Definition 5.3.3 (Probability Spaces RA). The probability spaces RA PSpΩ is the resource algebra (P(Ω) ⊎ {⊥}, ⪯, V, ·, 𝟙Ω), where ⪯ is the extension pre-order (definition 5.2.4) with the invalid element ⊥ added as the top element, i.e., P1 ⪯ P2 ≜ P1 ⊑ P2 and ∀𝑎 ∈ PSpΩ. 𝑎 ⪯ ⊥; validity is V(𝑎) ≜ 𝑎 ≠ ⊥; and composition is the independent product:

𝑎 · 𝑏 ≜ P1 ⊛ P2 if 𝑎 = P1, 𝑏 = P2, and P1 ⊛ P2 is defined; ⊥ otherwise.

The fact that PSpΩ satisfies the axioms of RAs is established in appendix D.3 and builds on the analogous construction in Lilac.

We now introduce assertions that are specific to PSpΩ. We use the following two abbreviations so we do not need to write out the resource pedantically when using the BI∗ assertion Own(−):

Own(F, 𝜇, 𝑝) ≜ Own(((F, 𝜇), 𝑝))
Own(F, 𝜇) ≜ ∃𝑝. Own(F, 𝜇, 𝑝)

We also want to use expressions in assertions. Let 𝐴-typed expressions be maps 𝐸 of type Mem[Var] → 𝐴. We allow PSpΩ assertions to use 𝐴-typed expressions for any type 𝐴. As an example, the interpretation of any program expression ⟦𝑒⟧ : Mem[Var] → Val is a Val-typed expression. Thus, we seamlessly use program expressions in assertions by implicitly coercing them to their semantics.

The first kind of PSpΩ assertion we want to introduce is 𝐸 $∼ 𝜇.
Intuitively, we want it to assert that the expression 𝐸 has the distribution 𝜇 in the specified probability space; to evaluate the expression 𝐸, the probability space needs to have enough information — we refer to the condition needed to evaluate an expression 𝐸 as ownership over 𝐸 below. Lilac proposed to use measurability as the notion of ownership. Recall that a function 𝑓 : 𝐴 → 𝐵 is measurable in a sigma-algebra F : A(𝐴) if 𝑓 −1(𝑏) = {𝑎 ∈ 𝐴 | 𝑓(𝑎) = 𝑏} ∈ F for all 𝑏 ∈ 𝐵. An 𝐴-typed expression 𝐸 always defines a measurable function (i.e. a random variable) on ΣMem[Var] but might not be measurable on some sub-algebras of ΣMem[Var]. Their definition makes sense because any resource that makes 𝐸 measurable contains enough information to determine 𝐸's distribution. However, we discovered that this choice made axioms used in Lilac's proofs flawed. In short, axioms such as Own(𝑥) ∗ ⌈𝑥 = 𝑦⌉ |= Own(𝑦), which intuitively convey the idea that if 𝑥 is measurable and 𝑥, 𝑦 are equal in all plausible outcomes, then 𝑦 is also measurable, played a crucial role in Lilac's proofs of example programs but are not sound.¹ (¹A later revision Li et al. [2023b] corrected the issue, although with a different solution from ours.) Thus, we propose a slight weakening of the notion of measurability which solves those issues while still retaining the intent behind the notion of ownership. We call this weaker notion "almost measurability". Definition 5.3.4 (Almost-measurability). Given a probability space (F, 𝜇) ∈ P(Ω) and a set 𝑋 ⊆ Ω, we say 𝑋 is almost measurable in (F, 𝜇), written 𝑋 � (F, 𝜇), if ∃𝑋1, 𝑋2 ∈ F. 𝑋1 ⊆ 𝑋 ⊆ 𝑋2 ∧ 𝜇(𝑋1) = 𝜇(𝑋2). We say a function 𝐸 : Ω → 𝐴 is almost measurable in (F, 𝜇), written 𝐸 � (F, 𝜇), if 𝐸−1(𝑎) � (F, 𝜇) for all 𝑎 ∈ 𝐴. While almost-measurability does not imply measurability, it constrains the current probability space to uniquely determine the distribution of 𝐸 in any extension where 𝐸 becomes measurable. Example 5.3.1.
For example, let 𝑋 = {𝑠 | 𝑠(𝑥) = 42} and F = 𝜎({𝑋}) = {Mem[Var], ∅, 𝑋, Mem[Var] \ 𝑋}. If 𝜇(𝑋) = 1, then 𝑥 � (F, 𝜇) holds but 𝑥 is not measurable in F, as F lacks events for 𝑥 = 𝑣 for all 𝑣 except 42. Nevertheless, any extension (F′, 𝜇′) ⊒ (F, 𝜇) where 𝑥 is measurable would need to assign 𝜇′(𝑋) = 1 and 𝜇′(𝑥 = 𝑣) = 0 for every 𝑣 ≠ 42. In general, when 𝑋1 ⊆ 𝑋 ⊆ 𝑋2 and 𝜇(𝑋1) = 𝜇(𝑋2) = 𝑝, we can unambiguously assign probability 𝑝 to 𝑋, as any extension of 𝜇 to ΣΩ must assign 𝑝 to 𝑋; we then write 𝜇(𝑋) for 𝑝. When defining 𝐸 $∼ 𝜇, we require 𝐸 to be almost-measurable and to be distributed as 𝜇 in any extension of the local probability space. Formally, given 𝜇 : D(Σ𝐴) and 𝐸 : Mem[Var] → 𝐴, we define (writing 𝜇0 for the existentially quantified measure to keep it apart from 𝜇):
𝐸 $∼ 𝜇 ≜ ∃F, 𝜇0. Own(F, 𝜇0) ∗ ⌜𝐸 � (F, 𝜇0) ∧ 𝜇 = 𝜇0 ◦ 𝐸−1⌝
Notably, 𝐸 � (F, 𝜇0) ∧ 𝜇 = 𝜇0 ◦ 𝐸−1 is a pure fact that we can reason about without using the local probability space — the probability space (F, 𝜇0) is fixed by the existential quantifier and does not rely on the local probability space. Using the 𝐸 $∼ 𝜇 assertion, we can define a number of useful derived assertions. In their definitions, we use the following events of the outcome space Val: false ≜ {0} and true ≜ {𝑛 ∈ Val | 𝑛 ≠ 0}.
E[𝐸] = 𝑟 ≜ ∃𝜇. 𝐸 $∼ 𝜇 ∗ ⌜𝑟 = ∑𝑎∈supp(𝜇) 𝜇(𝑎) · 𝑎⌝
Pr(𝐸) = 𝑟 ≜ ∃𝜇. 𝐸 $∼ 𝜇 ∗ ⌜𝜇(true) = 𝑟⌝
⌈𝐸⌉ ≜ 𝐸 $∼ 𝛿true
Own(𝐸) ≜ ∃𝜇. 𝐸 $∼ 𝜇
Assertions about expectations (E[𝐸]) and probabilities (Pr(𝐸)) both assert that 𝐸 is distributed as 𝜇 for some distribution 𝜇, and that 𝜇 satisfies the desired pure property. To assert E[𝐸] = 𝑟, we implicitly assume 𝐸 is a numerically typed expression. The assertion holds if 𝐸 is uniquely determined to distribute as 𝜇 and the expected value in 𝜇 is 𝑟. To assert Pr(𝐸) = 𝑟, we implicitly assume 𝐸 is a Val-typed expression. The assertion Pr(𝐸) = 𝑟 holds on a probability space if the probability space uniquely determines 𝐸 to distribute as 𝜇, where 𝜇 assigns probability 𝑟 to the event true.
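Almost-measurability is easy to experiment with on a finite outcome space. The following is our own Python sketch of definition 5.3.4 and example 5.3.1; the concrete space, events, and names are our illustrative choices, not part of the formal development.

```python
# Our own finite sketch of (almost-)measurability, cf. Definition 5.3.4 and
# Example 5.3.1. The outcome is just the value of a single variable x.
Omega = frozenset({0, 1, 42})
# sigma({ {42} }): the four events generated by the single event "x = 42".
F = {frozenset(), Omega, frozenset({42}), frozenset({0, 1})}
# mu gives "x = 42" probability 1, as in the example.
mu = {frozenset(): 0, frozenset({42}): 1, frozenset({0, 1}): 0, Omega: 1}

def measurable(E):
    """E : Omega -> A is measurable iff every preimage E^-1(a) lies in F."""
    return all(frozenset(w for w in Omega if E(w) == a) in F
               for a in {E(w) for w in Omega})

def almost_measurable(E):
    """Each preimage is sandwiched by F-events of equal measure."""
    for a in {E(w) for w in Omega}:
        pre = frozenset(w for w in Omega if E(w) == a)
        if not any(X1 <= pre <= X2 and mu[X1] == mu[X2]
                   for X1 in F for X2 in F):
            return False
    return True

x = lambda w: w   # the variable x, read off the outcome directly

print(measurable(x), almost_measurable(x))  # False True
```

As the example predicts, `x` is not measurable (the preimage of 0 is not an event of `F`) but is almost measurable: each missing preimage is squeezed between a null event and `{0, 1}`, both of measure 0.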
The "almost surely" assertion ⌈𝐸⌉ takes an expression 𝐸 and asserts that 𝐸 always "evaluates to true." Because we encode booleans by treating 0 ∈ Val as false and any other value as true, we define it to assert that 𝐸 is distributed as the Dirac distribution 𝛿true — the handling of ownership over 𝐸 is baked into the definition of "distributed as". By this definition, an assertion like ⌈𝑥 = 𝑦⌉ owns the expression 𝑥 = 𝑦 but not necessarily 𝑥 itself: the only events needed to make the expression 𝑥 = 𝑦 almost measurable are {𝑠 | 𝑥 = 𝑦} and {𝑠 | 𝑥 ≠ 𝑦}, which are not enough to make 𝑥 itself almost measurable. Now we see an example formula that is not satisfiable in PSL's assertion logic, but is satisfiable in the PSpΩ model. Example 5.3.2. Assume there are only two variables 𝑥 and 𝑦. Let 𝑋𝑣 = {𝑠 | 𝑠(𝑥) = 𝑣} and P1 = (F1, 𝜇1) with F1 = 𝜎({𝑋𝑣 | 𝑣 ∈ Val}), and let 𝜇1 give 𝑥 the distribution of a fair coin, i.e. 𝜇1 is the extension to F1 of 𝜇1(𝑋0) = 𝜇1(𝑋1) = 1/2. Intuitively, the assertion 𝑥 $∼ Bern1/2 holds on P1. Similarly, ⌈𝑥 = 𝑦⌉ holds on P2 = (F2, 𝜇2) where F2 = {∅, Mem[Var], 𝐸, Mem[Var] \ 𝐸} with 𝐸 = {𝑠 | 𝑠(𝑥) = 𝑠(𝑦)} and 𝜇2(𝐸) = 1. Note that F2 is very coarse: it does not contain events that can pin down the value of 𝑥 precisely; thanks to this, 𝜇2 does not need to specify the distribution of 𝑥, but only that 𝑦 coincides with 𝑥 with probability 1. It is easy to see that the independent product of P1 and P2 exists and is P3 = (F1 ⊕ F2, 𝜇3), where 𝜇3 is determined by 𝜇3(𝑋0 ∩ 𝐸) = 𝜇3(𝑋1 ∩ 𝐸) = 1/2, i.e. it makes 𝑥, 𝑦 the outcomes of the same fair coin. This means P3 is a model of 𝑥 $∼ Bern1/2 ∗ ⌈𝑥 = 𝑦⌉.
5.3.3 A Model of Mutable Probabilistic Stores
In BLUEBELL, we want to develop a program logic to reason about an imperative probabilistic programming language. Ideally, we want a clean frame rule as in Lilac, which does not need side conditions as in PSL (see section 2.3.3), to make modular reasoning about independent components easy.
That means we want to allow assertions on any independent probability spaces to be framed onto the pre- and post-conditions of our program judgements simultaneously. Lilac shows that it is sound to do so in their model because their program variables are immutable: their program variables are essentially maps (i.e., random variables) on a fixed probability space over an infinite tape, and they can always perform some manipulations so that the random variables used in the frame assertion depend on a previously unused index of the tape. However, we work with a language with mutation — our program terms update the probability space over stores as they run, and it is problematic to allow such a frame rule in our setting. Example 5.3.3. To illustrate the problem, consider a simple assignment 𝑥 := 0. In the spirit of separation logic's local reasoning, we expect to be able to prove a small-footprint triple for the assignment, i.e., one where the precondition only involves ownership of the variable 𝑥, such as {Own(𝑥)} 𝑥 := 0 {⌈𝑥 = 0⌉}. However, we would run into problems when proving the Frame rule, which is the key to enabling modular reasoning in separation logics. As we remarked, an assertion like ⌈𝑥 = 𝑦⌉ is a valid frame of Own(𝑥), so the Frame rule would allow us to derive ⊢ {Own(𝑥) ∗ ⌈𝑥 = 𝑦⌉} 𝑥 := 0 {⌈𝑥 = 0⌉ ∗ ⌈𝑥 = 𝑦⌉}. Yet the Hoare triple {Own(𝑥) ∗ ⌈𝑥 = 𝑦⌉} 𝑥 := 0 {⌈𝑥 = 0⌉ ∗ ⌈𝑥 = 𝑦⌉} would be invalid because, as long as 𝑦 ≠ 0 in the input state, the formula ⌈𝑥 = 𝑦⌉ would not hold after the assignment. We solve this problem by combining PSpMem[Var], the RA of probability spaces over the outcome space Mem[Var], with an RA of permissions over variables. The idea is that in addition to information about the distribution, assertions can indicate which "write permissions" we own on variables. An assertion that owns write permissions on 𝑥 would be incompatible with any frame predicating on 𝑥.
Then a triple for assignment just needs to require write permission on the assigned variable. We model permissions using a standard fractional-permission RA. Definition 5.3.5. The permissions RA is defined as (Perm, ⪯, V, ·, 𝜀) where the carrier set Perm is defined to be the maps Var → Q+, where Q+ denotes the non-negative rational numbers. The resource pre-order is the point-wise order: for any two 𝑎, 𝑏 ∈ Perm, we have 𝑎 ⪯ 𝑏 iff ∀𝑥 ∈ Var. 𝑎(𝑥) ≤ 𝑏(𝑥). A permission is valid if it is upper-bounded by 1: for 𝑎 ∈ Perm, V(𝑎) iff ∀𝑥 ∈ Var. 𝑎(𝑥) ≤ 1. The composition of two permissions adds the two maps together point-wise: 𝑎1 · 𝑎2 ≜ λ𝑥. 𝑎1(𝑥) + 𝑎2(𝑥). The unit with respect to the composition is the constant zero permission: 𝜀 = λ_. 0. We now want to associate probability spaces with permissions. The goal is to make sure that, for any resource 𝑠 with permission 1 on a variable 𝑥, any resource that validly composes with 𝑠 must impose no constraints on the marginal distribution of 𝑥. Since resources that validly compose with 𝑠 must have zero permission on 𝑥, we only put restrictions on the probability spaces' information about variables with zero permission. For variables with strictly positive permission, whether the permission is 0.01 or 1, the probability space can specify their full distributions, or give no information, or anything in between. This gives rise to the following definition. Definition 5.3.6 (Compatibility). Given a probability space P ∈ P(Mem[Var]) and a permission map 𝑝 ∈ Perm, let 𝑆 = {𝑥 ∈ Var | 𝑝(𝑥) = 0}. We say that P is compatible with 𝑝, written P # 𝑝, if there exists P′ ∈ P((Var \ 𝑆) → Val) such that P is isomorphic to P′ ⊗ 𝟙𝑆→Val, witnessed by the isomorphism lifted from Mem[Var] � ((Var \ 𝑆) → Val) × (𝑆 → Val) on the outcome space. We extend the definition to PSpMem[Var] by declaring ⊤ # 𝑝. We now construct an RA that associates probability spaces with permissions. Definition 5.3.7. Let PSpPm ≜ {(P, 𝑝) | P ∈ PSpMem[Var], 𝑝 ∈ Perm, P # 𝑝}.
We define the Probability Spaces with Permissions RA (PSpPm, ⪯, V, ·, 𝜀) where
V((P, 𝑝)) ≜ P ≠ ⊤ ∧ ∀𝑥. 𝑝(𝑥) ≤ 1
(P, 𝑝) ⪯ (P′, 𝑝′) ≜ P ⪯ P′ and 𝑝 ⪯ 𝑝′
(P, 𝑝) · (P′, 𝑝′) ≜ (P · P′, 𝑝 · 𝑝′)
𝜀 ≜ (𝟙Mem[Var], λ𝑥. 0)
We define the following assertions specific to (PSpPm, ⪯, V, ·, 𝜀):
(𝑥:𝑞) ≜ ∃P, 𝑝. Own(P, 𝑝) ∗ ⌜𝑝(𝑥) = 𝑞⌝
𝑃@𝑝 ≜ ∃P. 𝑃(P) ∧ Own(P, 𝑝) (5.1)
The first assertion (𝑥:𝑞) states that the current resource (P′, 𝑝′) assigns permission at least 𝑞 to the variable 𝑥, i.e., 𝑝′(𝑥) ≥ 𝑞. In particular, any resource that can be composed with a resource satisfying (𝑥:1) must have a 𝜎-algebra which is trivial on 𝑥. Therefore, having (𝑥:1) hold forbids any frame from retaining information about 𝑥. We can also differentiate between an assertion (𝑥:1/2), which does not allow frames that mutate 𝑥 but allows frames that predicate on 𝑥 (e.g. ⌈𝑥 = 𝑦⌉), and an assertion (𝑥:1), which does not allow frames that predicate on 𝑥; consequently, having (𝑥:1/2) hold standalone does not allow mutation of 𝑥, but having (𝑥:1) enables mutation of 𝑥. The second assertion 𝑃@𝑝 states that 𝑃 holds in the probability space and that 𝑝 lower-bounds the permission. The assertion (𝑥:𝑞) is a special case of 𝑃@𝑝 where 𝑃 is set to ⊤ and 𝑝 is defined as: 𝑝(𝑥) = 𝑞 and 𝑝(𝑦) = 0 for any other variable 𝑦 ∈ Var. Also, in practice, preconditions of valid program logic triples are always of the form 𝑃@𝑝 where 𝑝 contains full permissions for every variable the relevant program mutates, and non-zero permissions for the other variables referenced in the assertions or program. For example, we define Hoare triples such that {𝑃@𝑝} 𝑥 := 𝑦 {𝑄@𝑞} is valid only if 𝑝(𝑥) = 1 and 𝑝(𝑦) > 0. While permissions allow for a clean semantic treatment of mutation and independence, they do incur some bookkeeping of permissions in practice. The necessary permissions are however easy to infer from the variables used in the assertions, as we will illustrate later in example 5.4.1.
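The fractional-permission discipline of definition 5.3.5 can be made concrete with a few lines of Python. This is our own illustration over an assumed three-variable store; the particular permission maps are arbitrary choices.

```python
from fractions import Fraction

# Our own sketch of the fractional-permissions RA (Definition 5.3.5).
VARS = ("x", "y", "z")

def compose(p, q):
    """RA composition: point-wise sum of permission maps."""
    return {v: p[v] + q[v] for v in VARS}

def valid(p):
    """V(p): every permission bounded by 1."""
    return all(p[v] <= 1 for v in VARS)

def leq(p, q):
    """The point-wise resource pre-order."""
    return all(p[v] <= q[v] for v in VARS)

half = Fraction(1, 2)
p   = {"x": Fraction(1), "y": half, "z": Fraction(1, 3)}  # owns x outright
fr  = {"x": Fraction(0), "y": half, "z": Fraction(1, 3)}  # a possible frame
bad = {"x": half, "y": Fraction(0), "z": Fraction(0)}     # also claims x

print(valid(compose(p, fr)))   # True: the frame holds no permission on x
print(valid(compose(p, bad)))  # False: permission on x would exceed 1
```

The second composition fails validity, which is exactly how full permission (𝑥:1) excludes any frame that claims even a fraction of 𝑥.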
Since we focus on BLUEBELL for unary reasoning in this thesis, our BI model is simply M ≜ PSpPm, and we use assertions of the form (𝑥:𝑞) and 𝑃@𝑝 to describe resources in this model. We write the type of assertions M → Prop as PA.
5.3.4 Joint Conditioning
To assert conditional independence, we want to assert independence of variables in conditional distributions. We thus introduce the joint conditioning modality C𝜇 𝐾 to assert on conditional distributions. Here we show the definition of C𝜇 restricted to the unary setting; a more general version, defined for tuples of program states (with permissions), is presented in the conference version [Bao et al., 2025]. Definition 5.3.8 (Joint conditioning modality). Let 𝜇′ ∈ D(Σ𝐴) and 𝐾 : 𝐴 → PA; then we define the assertion C𝜇′ 𝐾 : PA as follows:
C𝜇′ 𝐾 ≜ λ𝑎. ∃F, 𝜇, 𝑝, 𝜅. (F, 𝜇, 𝑝) ⪯ 𝑎 ∧ 𝜇 = bind(𝜇′, 𝜅) ∧ ∀𝑣 ∈ supp(𝜇′). 𝐾(𝑣)(F, 𝜅(𝑣), 𝑝)
Intuitively, C𝜇 𝐾 holds on resources whose probability spaces can be seen as the result of binding the given 𝜇 with some kernel 𝜅. Then, for each outcome 𝑣 in the support of 𝜇, the assertion 𝐾(𝑣) is required to hold on the distribution 𝜅(𝑣) (packaged with the original 𝜎-algebra and permission to make up a resource). Note that the definition is upward-closed by construction because of the part ∃F, 𝜇, 𝑝. (F, 𝜇, 𝑝) ⪯ 𝑎. As the name "conditioning modality" suggests, we want to use C𝜇 𝐾 to assert formulas on conditional distributions. To assert 𝑄(𝑣) on the conditional distribution fixing the value of a variable 𝑥 to 𝑣, we assert ∃𝜇′. C𝜇′ 𝑣. ⌈𝑥 = 𝑣⌉ ∗ 𝑄(𝑣), where we use the notation 𝑣. ⌈𝑥 = 𝑣⌉ ∗ 𝑄(𝑣) to denote the map from any outcome 𝑣 ∈ supp(𝜇′) to the assertion ⌈𝑥 = 𝑣⌉ ∗ 𝑄(𝑣).
This works because: if a resource (F, 𝜇, 𝑝) satisfies C𝜇′ 𝑣. ⌈𝑥 = 𝑣⌉ ∗ 𝑄(𝑣), then we can prove from C𝜇′ 𝑣. ⌈𝑥 = 𝑣⌉ that 𝑥 is distributed as 𝜇′ in 𝜇; furthermore, it says there must exist 𝜅 such that 𝜇 is the distribution of 𝑥 extended with 𝜅, i.e., 𝜇 = bind(𝜇′, 𝜅), with ⌈𝑥 = 𝑣⌉ holding in (F, 𝜅(𝑣), 𝑝) for every 𝑣 ∈ supp(𝜇′). These two conditions together constrain 𝜅(𝑣) to be the original distribution 𝜇 conditioned on 𝑥 = 𝑣. Because 𝑄(𝑣) is asserted on the distribution 𝜅(𝑣) for each 𝑣 ∈ supp(𝜇′), we have that 𝑄(𝑣) holds in the respective conditional distributions. As an example, the conditional independence of variables 𝑦 and 𝑧 given 𝑥 can be asserted as ∃𝜇′. C𝜇′ 𝑣. ⌈𝑥 = 𝑣⌉ ∗ Own(𝑦) ∗ Own(𝑧). In this particular case, 𝑄(𝑣) is invariant with respect to 𝑣.
5.3.5 The Rules of Conditioning and Independence
Although we adopt a "shallow embedding" approach to assertions in this chapter, the rules of BLUEBELL provide an axiomatic treatment of these assertions so that the user should never manipulate raw predicates over the semantic model. For brevity, we omit the rules that apply to the basic connectives of separation logic, as they are well-known and have been proven correct for any model that is an RA. For those we refer to Krebbers et al. [2018]. We make a distinction between "primitive" and "derived" rules. The primitive rules require proofs that manipulate the semantic model definitions directly. The derived rules can be proved sound by staying at the level of the logic, i.e. by using the primitive rules of BLUEBELL. Figure 5.3 presents the primitive rules and fig. 5.4 presents the derived rules.² (²We omit rules for relational reasoning here. They are presented in the appendix of the conference version Bao et al. [2025].)
Distribution ownership rules
DIST-INJ 𝐸 $∼ 𝜇 ∧ 𝐸 $∼ 𝜇′ ⊢ ⌜𝜇 = 𝜇′⌝
SURE-MERGE ⌈𝐸1⌉ ∗ ⌈𝐸2⌉ ⊣⊢ ⌈(𝐸1 ∧ 𝐸2)⌉
PROD-SPLIT (𝐸1, 𝐸2) $∼ 𝜇1 ⊗ 𝜇2 ⊢ 𝐸1 $∼ 𝜇1 ∗ 𝐸2 $∼ 𝜇2
Joint conditioning rules
C-TRUE ⊢ C𝜇 _. True
C-FALSE C𝜇 𝑣. False ⊢ False
C-CONS ∀𝑣. 𝐾1(𝑣) ⊢ 𝐾2(𝑣) ⟹ C𝜇 𝑣. 𝐾1(𝑣) ⊢ C𝜇 𝑣. 𝐾2(𝑣)
C-FRAME 𝑃 ∗ C𝜇 𝑣. 𝐾(𝑣) ⊢ C𝜇 𝑣. (𝑃 ∗ 𝐾(𝑣))
C-UNIT-L C𝛿𝑣0 𝑣. 𝐾(𝑣) ⊣⊢ 𝐾(𝑣0)
C-UNIT-R 𝐸 $∼ 𝜇 ⊣⊢ C𝜇 𝑣. ⌈𝐸 = 𝑣⌉
C-ASSOC 𝜇0 = bind(𝜇, λ𝑣. bind(𝜅(𝑣), λ𝑤. return(𝑣, 𝑤))) ⟹ C𝜇 𝑣. C𝜅(𝑣) 𝑤. 𝐾(𝑣, 𝑤) ⊢ C𝜇0 (𝑣, 𝑤). 𝐾(𝑣, 𝑤)
C-UNASSOC Cbind(𝜇,𝜅) 𝑤. 𝐾(𝑤) ⊢ C𝜇 𝑣. C𝜅(𝑣) 𝑤. 𝐾(𝑤)
C-SKOLEM 𝜇 : D(Σ𝐴) ⟹ C𝜇 𝑣. ∃𝑥 : 𝑋. 𝑄(𝑣, 𝑥) ⊢ ∃𝑓 : 𝐴 → 𝑋. C𝜇 𝑣. 𝑄(𝑣, 𝑓(𝑣))
C-TRANSF 𝑓 : supp(𝜇) → supp(𝜇′) bijective, ∀𝑏 ∈ supp(𝜇′). 𝜇′(𝑏) = 𝜇(𝑓 −1(𝑏)) ⟹ C𝜇 𝑎. 𝐾(𝑎) ⊢ C𝜇′ 𝑏. 𝐾(𝑓 −1(𝑏))
SURE-STR-CONVEX C𝜇 𝑣. (𝐾(𝑣) ∗ ⌈𝐸⌉) ⊢ ⌈𝐸⌉ ∗ C𝜇 𝑣. 𝐾(𝑣)
C-FOR-ALL C𝜇 𝑣. ∀𝑥 : 𝑋. 𝑄(𝑣, 𝑥) ⊢ ∀𝑥 : 𝑋. C𝜇 𝑣. 𝑄(𝑣, 𝑥)
C-PURE ⌜𝜇(𝑋) = 1⌝ ∗ C𝜇 𝑣. 𝐾(𝑣) ⊣⊢ C𝜇 𝑣. (⌜𝑣 ∈ 𝑋⌝ ∗ 𝐾(𝑣))
Figure 5.3: Primitive rules of BLUEBELL.
Ownership and distributions
SURE-DIRAC 𝐸 $∼ 𝛿𝑣 ⊣⊢ ⌈𝐸 = 𝑣⌉
SURE-EQ-INJ ⌈𝐸 = 𝑣⌉ ∗ ⌈𝐸 = 𝑣′⌉ ⊢ ⌜𝑣 = 𝑣′⌝
SURE-SUB 𝐸1 $∼ 𝜇 ∗ ⌈(𝐸2 = 𝑓(𝐸1))⌉ ⊢ 𝐸2 $∼ 𝜇 ◦ 𝑓 −1
DIST-FUN 𝐸 $∼ 𝜇 ⊢ (𝑓 ◦ 𝐸) $∼ 𝜇 ◦ 𝑓 −1
DIRAC-DUP 𝐸 $∼ 𝛿𝑣 ⊢ 𝐸 $∼ 𝛿𝑣 ∗ 𝐸 $∼ 𝛿𝑣
DIST-SUPP 𝐸 $∼ 𝜇 ⊢ 𝐸 $∼ 𝜇 ∗ ⌈𝐸 ∈ supp(𝜇)⌉
PROD-UNSPLIT 𝐸1 $∼ 𝜇1 ∗ 𝐸2 $∼ 𝜇2 ⊢ (𝐸1, 𝐸2) $∼ 𝜇1 ⊗ 𝜇2
Joint conditioning
C-FUSE C𝜇 𝑣. C𝜅(𝑣) 𝑤. 𝐾(𝑣, 𝑤) ⊣⊢ C𝜇⋉𝜅 (𝑣, 𝑤). 𝐾(𝑣, 𝑤)
C-SWAP C𝜇1 𝑣1. C𝜇2 𝑣2. 𝐾(𝑣1, 𝑣2) ⊢ C𝜇2 𝑣2. C𝜇1 𝑣1. 𝐾(𝑣1, 𝑣2)
SURE-CONVEX C𝜇 𝑣. ⌈𝐸⌉ ⊢ ⌈𝐸⌉
DIST-CONVEX C𝜇 𝑣. 𝐸 $∼ 𝜇′ ⊢ 𝐸 $∼ 𝜇′
C-SURE-PROJ C𝜇 (𝑣, 𝑤). ⌈𝐸(𝑣)⌉ ⊣⊢ C𝜇◦𝜋1−1 𝑣. ⌈𝐸(𝑣)⌉
C-EXTRACT C𝜇1 𝑣1. (⌈𝐸1 = 𝑣1⌉ ∗ 𝐸2 $∼ 𝜇2) ⊢ 𝐸1 $∼ 𝜇1 ∗ 𝐸2 $∼ 𝜇2
C-DIST-PROJ C𝜇 (𝑥, 𝑦). 𝐸(𝑥) $∼ 𝜇(𝑥) ⊢ C𝜇◦𝜋1−1 𝑥. 𝐸(𝑥) $∼ 𝜇(𝑥)
Figure 5.4: Derived rules.
We first present three primitive rules concerning distribution ownership. DIST-INJ allows us to conclude, from two assertions on an expression's distribution, that the two asserted distributions are the same. SURE-MERGE combines two sure assertions into one. PROD-SPLIT rewrites an assertion saying that two expressions are distributed as the independent product of 𝜇1, 𝜇2 using the independent conjunction ∗ in the logic. We then present the primitive rules for the conditioning modality.
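Several of the conditioning rules mirror the monad laws for distributions. A finite-support distribution monad (our own sketch; the particular distributions are illustrative) makes the laws behind C-UNIT-L/R and C-ASSOC/C-UNASSOC concrete:

```python
from fractions import Fraction

# Our own finite-support distribution monad; distributions are dicts from
# outcomes to exact probabilities.
def dirac(v):
    return {v: Fraction(1)}

def bind(mu, k):
    """Monadic bind: average the kernel k over mu."""
    out = {}
    for v, p in mu.items():
        for w, q in k(v).items():
            out[w] = out.get(w, Fraction(0)) + p * q
    return out

def bern(p):
    return {1: p, 0: 1 - p}

mu = bern(Fraction(3, 10))
k = lambda v: bern(Fraction(1, 2)) if v else dirac(0)
h = lambda w: bern(Fraction(1, 5)) if w else dirac(1)

# Left unit (cf. C-UNIT-L): binding a Dirac just applies the kernel.
assert bind(dirac(1), k) == k(1)
# Right unit (cf. C-UNIT-R): binding with Dirac is the identity.
assert bind(mu, dirac) == mu
# Associativity (cf. C-ASSOC / C-UNASSOC).
assert bind(bind(mu, k), h) == bind(mu, lambda v: bind(k(v), h))
print("monad laws hold")
```

Exact rationals keep the dictionary comparisons free of floating-point noise; the same three equalities are what the corresponding conditioning rules internalize at the level of assertions.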
Among the primitive rules, C-TRUE, C-FALSE, C-FRAME, C-SKOLEM and C-FOR-ALL describe how the conditioning modality interacts with other connectives in the logic — respectively True, False, ∗, ∃ and ∀. In particular, C-TRUE allows us to introduce a trivial modality; together with C-FRAME, this allows for the introduction of the modality around any assertion. Because distributions form a monad, and the definition of the conditioning modality uses the monadic bind, we also have rules corresponding to the three monad laws: C-UNIT-L (resp. C-UNIT-R) reflects the existence of the left unit (resp. right unit) for bind, and C-ASSOC and C-UNASSOC hold because the monadic bind is associative. Among the rest, C-CONS allows us to weaken the assertion under conditioning; C-TRANSF allows for the transformation of the convex combination using 𝜇 into one using 𝜇′, by applying a bijection between their supports in a way that does not affect the weight of each outcome; SURE-STR-CONVEX internalizes a stronger version of the convexity of ⌈𝐸⌉ assertions and allows us to pull ⌈𝐸⌉ out of the conditioning modality — it is the reversal of C-FRAME but only applies to sure assertions; C-PURE allows us to translate facts that hold with probability 1 in 𝜇 to predicates that hold on every 𝑣 bound by conditioning on 𝜇. The derived rules capture other useful reasoning patterns that follow from the primitive rules. For instance, C-FUSE is derived from C-ASSOC and C-UNASSOC. It concerns a particular distribution 𝜇 ⋉ 𝜅, defined by 𝜇 ⋉ 𝜅 := bind(𝜇, 𝑣 ↦→ unit(𝑣) ⊗ 𝜅(𝑣)). We explain the rest of these rules in section 5.5 when they are used.
5.4 Reasoning about Programs in BLUEBELL
To reason about programs, we introduce a weakest-precondition assertion (WP) wp 𝑡 {𝑄}. Our weakest-precondition assertion wp 𝑡 {𝑄} intuitively states: given the current input distribution, if we run the program 𝑡, we obtain an output distribution that satisfies 𝑄; furthermore, every frame is preserved.
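The WP definition that follows quantifies over the denotational semantics ⟦𝑡⟧, a map on distributions over memories. To ground that reading, here is our own finite sketch of assignment as a pushforward on distributions (not the thesis's formal semantics), which also replays example 5.3.3: running 𝑥 := 0 destroys the almost-sure fact 𝑥 = 𝑦.

```python
from fractions import Fraction

half = Fraction(1, 2)

# Our own sketch: memories are pairs (x, y); [[x := e]] pushes each memory
# forward through the update, accumulating probabilities of collisions.
def assign_x(mu, f):
    out = {}
    for (x, y), p in mu.items():
        key = (f(x, y), y)
        out[key] = out.get(key, Fraction(0)) + p
    return out

# x and y are two reads of one fair coin: [[x = y]] holds with probability 1.
mu0 = {(0, 0): half, (1, 1): half}

mu1 = assign_x(mu0, lambda x, y: 0)   # run  x := 0

pr_eq = sum(p for (x, y), p in mu1.items() if x == y)
print(pr_eq)   # 1/2: the frame ⌈x = y⌉ did not survive the assignment
```

After the assignment, 𝑥 = 𝑦 holds only with probability 1/2, which is why the WP definition must let the resource 𝑎 be updated to some 𝑏 while only the frames compatible with the write permission are preserved.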
Definition 5.4.1 (Weakest Precondition). For 𝑎 ∈ M and 𝜇 : D(ΣMem[Var]), let 𝑎 ⪯ 𝜇 abbreviate 𝑎 ⪯ (ΣMem[Var], 𝜇, λ𝑥. 1).
wp 𝑡 {𝑄} ≜ λ𝑎. ∀𝜇0. ∀𝑐. (𝑎 · 𝑐) ⪯ 𝜇0 ⇒ ∃𝑏. ((𝑏 · 𝑐) ⪯ ⟦𝑡⟧(𝜇0) ∧ 𝑄(𝑏))
The assertion holds on a resource 𝑎 such that if, together with some frame 𝑐, it can be seen as a fragment of the global distribution 𝜇0, then it is possible to update the resource to some 𝑏 which still composes with the frame 𝑐, such that 𝑏 · 𝑐 can be seen as a fragment of the output distribution ⟦𝑡⟧(𝜇0). Moreover, such a 𝑏 needs to satisfy the postcondition 𝑄. In the previous chapters, we used Hoare triples for reasoning about programs. We remark on two kinds of differences here. The first difference is between Hoare-style logic and weakest-precondition-style specifications. Previously, we treated Hoare triples as judgments in the program logic layer, which uses the assertion logic layer for specifications. Here, however, we consider WP as a modality of the logic, analogous to the conditioning modality. The WP modality only mentions the postcondition and the program, while a Hoare triple in addition takes a precondition. One can define Hoare triples on top of the WP by {𝑃} 𝑡 {𝑄} ≜ 𝑃 ⊢ wp 𝑡 {𝑄}; in this sense the WP computes a sufficient precondition. The second difference is our design choice to require every frame to be preserved in the definition of WP. While this sets us apart from the original notion of weakest preconditions in Dijkstra's seminal paper Dijkstra [1975], the choice of requiring the frame to be preserved is prevalent in the separation logic literature (e.g., Jung et al. [2015], Li et al. [2023a]). Crucially, this more complicated version of WP frees us from requiring the formulas to satisfy the restriction property, which is needed to isolate a part of the resource sufficient for validating a formula in LINA and DIBI. We present the full set of WP rules in fig. 5.5. The structural rules include the standard WP-CONS, which allows us to weaken the postcondition.
WP-FRAME, as we desired, does not need side conditions. C-WP-SWAP is a new rule, saying that we can commute the conditioning modality and the WP. This rule facilitates case analysis in program analysis: it implies that, if we can condition the current probability space on different scenarios 𝑣 ∼ 𝜇 and, for each scenario 𝑣, we have 𝑄(𝑣) after running 𝑡, then we can push the case analysis to the postcondition after running 𝑡. There is a side condition, however, that we need to own all the variables in Var, because of subtleties in the interaction of C-WP swapping and frame preservation. For the program rules, WP-SKIP and WP-SEQ are standard. We discuss the rules for assignments and sampling in more detail below. WP-IF-PRIM is also the standard rule for a conditional whose guard is simply a value; but we can reason about conditionals whose guard is a randomized variable as well, by first conditioning on the value of the guard, and then applying WP-IF-PRIM together with WP-BIND and C-WP-SWAP. We encapsulate this reasoning pattern as the derived rule WP-IF-UNARY. The loop rule WP-LOOP-UNF helps unfold a loop with (𝑛 + 1) iterations, and WP-LOOP reduces the task of reasoning about 𝑛 iterations to reasoning about each loop iteration.
Structural WP rules
WP-CONS 𝑄 ⊢ 𝑄′ ⟹ wp 𝑡 {𝑄} ⊢ wp 𝑡 {𝑄′}
WP-FRAME 𝑃 ∗ wp 𝑡 {𝑄} ⊢ wp 𝑡 {𝑃 ∗ 𝑄}
C-WP-SWAP C𝜇 𝑣. wp 𝑡 {𝑄(𝑣)} ∧ ownVar ⊢ wp 𝑡 {C𝜇 𝑣. 𝑄(𝑣)}
Program WP rules
WP-SKIP 𝑃 ⊢ wp [skip] {𝑃}
WP-SEQ wp [𝑡] {wp [𝑡′] {𝑄}} ⊢ wp [𝑡; 𝑡′] {𝑄}
WP-ASSIGN 𝑥 ∉ FV(𝑒), ∀𝑦 ∈ FV(𝑒). 𝑝(𝑦) > 0, 𝑝(𝑥) = 1 ⟹ (𝑝) ⊢ wp [x := 𝑒] {⌈𝑥 = 𝑒⌉@𝑝}
WP-SAMP (𝑥:1) ⊢ wp [x ← 𝑑(®𝑣)] {𝑥 $∼ 𝑑(®𝑣)}
WP-IF-PRIM if 𝑣 then wp [𝑡1] {𝑄(1)} else wp [𝑡2] {𝑄(0)} ⊢ wp [if 𝑣 then 𝑡1 else 𝑡2] {𝑄(𝑣)}
WP-BIND ⌈𝑒 = 𝑣⌉ ∗ wp [E[𝑣]] {𝑄} ⊢ wp [E[𝑒]] {𝑄}
WP-LOOP-UNF wp [repeat 𝑛 𝑡] {wp [𝑡] {𝑄}} ⊢ wp [repeat (𝑛 + 1) 𝑡] {𝑄}
WP-LOOP ∀𝑖 < 𝑛. 𝑃(𝑖) ⊢ wp [𝑡] {𝑃(𝑖 + 1)} ⟹ 𝑃(0) ⊢ wp [repeat 𝑛 𝑡] {𝑃(𝑛)} (𝑛 ∈ N)
Figure 5.5: The primitive WP rules of BLUEBELL.
Both loop rules are proved by straightforward inductions at the semantic level, and we can also derive WP-LOOP-0 from these two rules. We prove the soundness of each rule, using facts in first-order logic, in appendix D.5. Theorem 5.4.1. If 𝑃 ⊢ 𝑄, then 𝑃 ⇒ 𝑄 is derivable in first-order logic.
WP-LOOP-0 𝑃 ⊢ wp [repeat 0 𝑡] {𝑃}
WP-IF-UNARY 𝑃 ∗ ⌈𝑒 = 1⌉ ⊩ wp [𝑡1] {𝑄(1)}, 𝑃 ∗ ⌈𝑒 = 0⌉ ⊩ wp [𝑡2] {𝑄(0)} ⟹ 𝑃 ∗ 𝑒 $∼ 𝛽 ⊩ wp [if 𝑒 then 𝑡1 else 𝑡2] {C𝛽 𝑏. 𝑄(𝑏)}
Figure 5.6: Derived WP rules.
WP-SAMP is the expected "small footprint" rule for sampling; the precondition only requires full permission on the variable being assigned, forbidding any frame from recording information about it. WP-ASSIGN requires full permission on 𝑥, and non-zero permission on the variables on the right-hand side of the assignment. This allows the postcondition to assert that 𝑥 and the expression 𝑒 assigned to it are equal with probability 1. The condition 𝑥 ∉ FV(𝑒) ensures 𝑒 has the same meaning before and after the assignment, but is not restrictive: if needed, the old value of 𝑥 can be stored in a temporary variable, or the proof can condition on 𝑥 to work with its pure value. The assignment and sampling rules are the only ones that impose constraints on the owned permissions. In proofs, this means that most rule applications simply thread through permissions so that the needed permissions can reach the applications of the assignment rules. To avoid cluttering proof derivations with this bookkeeping, we mostly omit permission information from assertions. The appropriate permission annotations can be inferred, as we show in the following example. Example 5.4.1. Consider the following triple with an unknown permission 𝑝:
(𝑥 $∼ 𝜇1 ∗ ⌈𝑥 = 𝑦⌉ ∗ 𝑧 $∼ 𝜇2)@(𝑝) ⊢ wp [x := z] {(⌈𝑥 = 𝑧⌉ ∗ 𝑧 $∼ 𝜇2)@(𝑝)}
We want to determine a choice of 𝑝 that makes the proof derivation go through.
Because the assignment only changes the variable 𝑥, our proof strategy is to first apply WP-ASSIGN and WP-CONS to prove (𝑥 $∼ 𝜇1 ∗ ⌈𝑥 = 𝑦⌉)@(𝑝′) ⊢ wp [x := z] {⌈𝑥 = 𝑧⌉@(𝑝′)} for some suitable 𝑝′, and then apply WP-FRAME to frame 𝑧 $∼ 𝜇2 and prove the original goal. To apply WP-ASSIGN, we need to ensure 𝑝′(𝑥) = 1 and 𝑝′(𝑧) > 0. Because ⌈𝑥 = 𝑦⌉ is not trivial on 𝑦, it must be that 𝑝′(𝑦) > 0 as well. Also, to apply WP-FRAME to frame 𝑧 $∼ 𝜇2, we need to ensure 𝑝′ composes with another permission 𝑝′′ that is compatible with the probability space where 𝑧 $∼ 𝜇2 holds; because 𝑧 $∼ 𝜇2 asserts that 𝑧 is non-trivial, it must be that 𝑝′′(𝑧) > 0, indicating 𝑝′(𝑧) < 1. Thus, one reasonable way to distribute the permissions is 𝑝′(𝑥) = 1, 𝑝′(𝑦) = 1/2, 𝑝′(𝑧) = 1/3 and 𝑝′′(𝑥) = 0, 𝑝′′(𝑦) = 0, 𝑝′′(𝑧) = 1/3. We can thus prove that
(𝑥 $∼ 𝜇1 ∗ ⌈𝑥 = 𝑦⌉)@(𝑥:1, 𝑦:1/2, 𝑧:1/3) ∗ 𝑧 $∼ 𝜇2@(𝑧:1/3) ⊢ wp [x := z] {⌈𝑥 = 𝑧⌉@(𝑥:1, 𝑦:1/2, 𝑧:1/3) ∗ 𝑧 $∼ 𝜇2@(𝑧:1/3)}
The triple can be further composed with a frame that asserts fractional permissions for 𝑦 and 𝑧. Because permissions in the range (0, 1) essentially serve the same role, we can also pick different numbers for 𝑝′(𝑦), 𝑝′(𝑧) and 𝑝′′(𝑧), as long as 𝑝′(𝑦), 𝑝′(𝑧), 𝑝′′(𝑧) stay in (0, 1) and 𝑝′(𝑧) + 𝑝′′(𝑧) stays in (0, 1).
5.5 Case Studies for BLUEBELL
Our evaluation of BLUEBELL is based on two main lines of enquiry: (1) Are high-level principles about probabilistic reasoning provable from the core constructs of BLUEBELL? (2) Does BLUEBELL, through enabling new reasoning patterns, expand the horizon for verification of probabilistic programs beyond what was possible before? We include case studies that try to highlight the contribution of BLUEBELL to each question, and sometimes both at the same time. Specifically, our evaluation is guided by the following research questions: RQ1: Do joint conditioning and independence offer a good abstract interface over the underlying semantic model?
RQ2: Can known unary/relational principles be reconstructed from BLUEBELL's primitives? RQ3: Can new unary/relational principles be discovered (as new lemmas) and proved from BLUEBELL's primitives? RQ4: Can BLUEBELL's primitives be successfully incorporated in an effective program logic? Since we only introduced the unary part of BLUEBELL, we only show examples that exercise BLUEBELL's unary reasoning.
5.5.1 One Time Pad Revisited
def encrypt(): k ← Ber(1/2); m ← Ber(𝑝); c := k xor m
Figure 5.7: One time pad.
In fig. 5.7 we show a simple example adapted from Barthe et al. [2019]: the encrypt procedure uses a fair coin flip to generate an encryption key 𝑘, generates a plaintext message in the boolean variable 𝑚 (using a coin flip with some bias 𝑝), and produces the ciphertext 𝑐 by XORing the key and the message. One way of stating and proving the correctness of encrypt is to establish that in the output distribution 𝑐 and 𝑚 are independent, which can be expressed as the unary goal:
(𝑘:1, 𝑚:1, 𝑐:1) ⊢ wp [encrypt()] {𝑐 $∼ Bern1/2 ∗ 𝑚 $∼ Bern𝑝}
The triple states that after running encrypt, the ciphertext 𝑐 is distributed as a fair coin, and—importantly—is not correlated with the plaintext in 𝑚. The PSL proof in Barthe et al. [2019] performs some of the steps within the logic, but needs to carry out some crucial entailments at the meta-level, which shows some limitations of its abstractions (RQ1). The same applies to the Lilac proof in Li et al. [2023a], which requires ad-hoc lemmas proven on the semantic model. The stumbling block is proving the valid entailment:
𝑘 $∼ Bern1/2 ∗ 𝑚 $∼ Bern𝑝 ∗ ⌈𝑐 = 𝑘 xor 𝑚⌉ ⊢ 𝑚 $∼ Bern𝑝 ∗ 𝑐 $∼ Bern1/2
In BLUEBELL we can prove the entailment in two steps: (1) we condition on 𝑚 and 𝑘 to compute the result of the xor operation and obtain that 𝑐 is distributed as Bern1/2; (2) we carefully eliminate the conditioning while preserving the independence of 𝑚 and 𝑐.
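Before the logical derivation, the semantic content of the entailment can be confirmed by brute-force enumeration. This is our own sketch; the message bias 1/3 is an arbitrary stand-in for 𝑝.

```python
from fractions import Fraction
from itertools import product

# Our own brute-force check of the one-time-pad claim: with k ~ Bern(1/2)
# and m ~ Bern(p) independent, c = k xor m is a fair coin independent of m.
half = Fraction(1, 2)
p = Fraction(1, 3)                       # an arbitrary bias standing in for p

joint = {}                               # joint distribution of (m, c)
for k, m in product((0, 1), repeat=2):
    pr = half * (p if m else 1 - p)      # independent product of k and m
    c = k ^ m
    joint[(m, c)] = joint.get((m, c), Fraction(0)) + pr

pr_m = {m: sum(v for (m2, _), v in joint.items() if m2 == m) for m in (0, 1)}
pr_c = {c: sum(v for (_, c2), v in joint.items() if c2 == c) for c in (0, 1)}

assert pr_c == {0: half, 1: half}        # c ~ Bern(1/2)
assert all(joint[(m, c)] == pr_m[m] * pr_c[c]   # m and c independent
           for m in (0, 1) for c in (0, 1))
print("c is uniform and independent of m")
```

The same two facts are exactly what the BLUEBELL proof establishes within the logic, without ever touching the joint distribution directly.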
The first step starts by conditioning on the distribution of the message 𝑚 using C-UNIT-R.
𝑘 $∼ Bern1/2 ∗ 𝑚 $∼ Bern𝑝 ∗ ⌈𝑐 = 𝑘 xor 𝑚⌉
⊢ 𝑚 $∼ Bern𝑝 ∗ 𝑘 $∼ Bern1/2 ∗ ⌈𝑐 = 𝑘 xor 𝑚⌉
⊢ (CBern𝑝 𝑢. ⌈𝑚 = 𝑢⌉) ∗ 𝑘 $∼ Bern1/2 ∗ ⌈𝑐 = 𝑘 xor 𝑚⌉ (C-UNIT-R)
Because the key 𝑘 is sampled independently from the message 𝑚, and the almost-sure assertion ⌈𝑐 = 𝑘 xor 𝑚⌉ also holds on an independent probability space, conditioning on 𝑚 does not change the assertions about them — this idea is formalized by the rule C-FRAME.
(CBern𝑝 𝑢. ⌈𝑚 = 𝑢⌉) ∗ 𝑘 $∼ Bern1/2 ∗ ⌈𝑐 = 𝑘 xor 𝑚⌉
⊢ CBern𝑝 𝑢. (⌈𝑚 = 𝑢⌉ ∗ 𝑘 $∼ Bern1/2 ∗ ⌈𝑐 = 𝑘 xor 𝑚⌉) (C-FRAME)
Because ⌈𝑚 = 𝑢⌉ and ⌈𝑐 = 𝑘 xor 𝑚⌉ both hold under the conditioning, we can merge the two facts into ⌈𝑐 = 𝑘 xor 𝑢⌉ using SURE-MERGE. After that, we condition on 𝑘, again by first using C-UNIT-R and then moving the fact ⌈𝑐 = 𝑘 xor 𝑢⌉ under the conditioning using C-FRAME.
CBern𝑝 𝑢. (⌈𝑚 = 𝑢⌉ ∗ 𝑘 $∼ Bern1/2 ∗ ⌈𝑐 = 𝑘 xor 𝑚⌉)
⊢ CBern𝑝 𝑢. (⌈𝑚 = 𝑢⌉ ∗ 𝑘 $∼ Bern1/2 ∗ ⌈𝑐 = 𝑘 xor 𝑢⌉) (SURE-MERGE)
⊢ CBern𝑝 𝑢. (⌈𝑚 = 𝑢⌉ ∗ (CBern1/2 𝑣. ⌈𝑘 = 𝑣⌉) ∗ ⌈𝑐 = 𝑘 xor 𝑢⌉) (C-UNIT-R)
⊢ CBern𝑝 𝑢. (⌈𝑚 = 𝑢⌉ ∗ CBern1/2 𝑣. (⌈𝑘 = 𝑣⌉ ∗ ⌈𝑐 = 𝑘 xor 𝑢⌉)) (C-FRAME)
Under the conditioning of 𝑚 and 𝑘, we have the fact ⌈𝑘 = 𝑣⌉ ∗ ⌈𝑐 = 𝑘 xor 𝑢⌉. In the final goal, we do not care what the key 𝑘 is, as long as the ciphered message 𝑐 does not leak any information about the original message 𝑚. So next, we side-step assertions about 𝑘 to facilitate reasoning about 𝑐 and 𝑚. Formally, we merge ⌈𝑘 = 𝑣⌉ ∗ ⌈𝑐 = 𝑘 xor 𝑢⌉ into ⌈𝑘 = 𝑣 ∧ 𝑐 = 𝑘 xor 𝑢⌉ using SURE-MERGE, and then apply propositional reasoning to rewrite it into a case analysis, i.e. ⌈𝑐 = 𝑣⌉ when 𝑢 = 0 and ⌈𝑐 = ¬𝑣⌉ when 𝑢 = 1.
CBern𝑝 𝑢. (⌈𝑚 = 𝑢⌉ ∗ CBern1/2 𝑣. (⌈𝑘 = 𝑣⌉ ∗ ⌈𝑐 = 𝑘 xor 𝑢⌉))
⊢ CBern𝑝 𝑢. (⌈𝑚 = 𝑢⌉ ∗ CBern1/2 𝑣. ⌈𝑘 = 𝑣 ∧ 𝑐 = 𝑣 xor 𝑢⌉) (SURE-MERGE)
⊢ CBern𝑝 𝑢. (⌈𝑚 = 𝑢⌉ ∗ (CBern1/2 𝑣. ⌈𝑐 = 𝑣⌉ if 𝑢 = 0; CBern1/2 𝑣. ⌈𝑐 = ¬𝑣⌉ if 𝑢 = 1)) (C-CONS)
The next step is a crucial application of C-TRANSF.
Because Bern1/2 is uniform between 0 and 1, if we choose the bijection 𝑓 : 𝑣 ↦→ ¬𝑣, then the pushforward measure of Bern1/2 by 𝑓 is also Bern1/2. Thus, applying C-TRANSF, we can rewrite CBern1/2 𝑣. ⌈𝑐 = ¬𝑣⌉ into CBern1/2 𝑤. ⌈𝑐 = ¬¬𝑤⌉. Therefore,
CBern𝑝 𝑢. (⌈𝑚 = 𝑢⌉ ∗ (CBern1/2 𝑣. ⌈𝑐 = 𝑣⌉ if 𝑢 = 0; CBern1/2 𝑣. ⌈𝑐 = ¬𝑣⌉ if 𝑢 = 1))
⊢ CBern𝑝 𝑢. (⌈𝑚 = 𝑢⌉ ∗ (CBern1/2 𝑣. ⌈𝑐 = 𝑣⌉ if 𝑢 = 0; CBern1/2 𝑤. ⌈𝑐 = 𝑤⌉ if 𝑢 = 1)) (C-TRANSF)
Now, the formulas in the two cases of the case analysis are equivalent, so we can combine the two cases:
⊢ CBern𝑝 𝑢. (⌈𝑚 = 𝑢⌉ ∗ CBern1/2 𝑣. ⌈𝑐 = 𝑣⌉)
Now, we can almost read off that no matter what value 𝑚 takes, the variable 𝑐 is distributed as the Bernoulli distribution Bern1/2, and that implies that 𝑚 and 𝑐 are independent. Formally, we apply a sequence of steps to reach that conclusion. The first two steps are familiar applications of C-FRAME followed by SURE-MERGE. Then, C-ASSOC binds the two distributions together — because the second distribution Bern1/2 does not use the 𝑢 drawn from the first distribution, we get the independent product Bern𝑝 ⊗ Bern1/2 as the result. After that, C-UNIT-R eliminates the conditioning. Last, PROD-SPLIT helps us pull the independent product in the distribution Bern𝑝 ⊗ Bern1/2 into the independent conjunction asserted in our logic, 𝑚 $∼ Bern𝑝 ∗ 𝑐 $∼ Bern1/2.
CBern𝑝 𝑢. (⌈𝑚 = 𝑢⌉ ∗ CBern1/2 𝑣. ⌈𝑐 = 𝑣⌉)
⊢ CBern𝑝 𝑢. CBern1/2 𝑣. (⌈𝑚 = 𝑢⌉ ∗ ⌈𝑐 = 𝑣⌉) (C-FRAME)
⊢ CBern𝑝 𝑢. CBern1/2 𝑣. ⌈𝑚 = 𝑢 ∧ 𝑐 = 𝑣⌉ (SURE-MERGE)
⊢ CBern𝑝⊗Bern1/2 (𝑢, 𝑣). ⌈(𝑚, 𝑐) = (𝑢, 𝑣)⌉ (C-ASSOC)
⊢ (𝑚, 𝑐) $∼ (Bern𝑝 ⊗ Bern1/2) (C-UNIT-R)
⊢ 𝑚 $∼ Bern𝑝 ∗ 𝑐 $∼ Bern1/2 (PROD-SPLIT)
5.5.2 Markov Blankets
We next study Markov blankets [Pearl, 2014] — a useful concept in Bayesian reasoning — to illustrate BLUEBELL's expressiveness (RQ1 and RQ2).
Intuitively, a Markov blanket identifies a set of variables that contains all useful information about a target set of variables: once we know the values of the variables in a Markov blanket, we no longer need to worry about how the other variables influence the target set. For concreteness, consider the program

𝑥1 ← 𝑑1; 𝑥2 ← 𝑑2(𝑥1); 𝑥3 ← 𝑑3(𝑥2).

The program describes a Markov chain of three variables, where we first sample 𝑥1, then sample 𝑥2 from a distribution determined by the value of 𝑥1, and last sample 𝑥3 from a distribution determined by the value of 𝑥2. These kinds of dependencies are ubiquitous in, for instance, hidden Markov models and Bayesian network representations of distributions.

Clearly, 𝑥3 depends on 𝑥2 and, indirectly, on 𝑥1. However, Markov chains enjoy the memorylessness property: when fixing a variable in the chain, the variables that follow it are independent of the variables that precede it. For our example, this means that conditioned on 𝑥2, the variables 𝑥1 and 𝑥3 are independent (i.e. we can ignore the indirect dependencies). The memorylessness property is used in many analyses of Markov-chain-based algorithms, as well as in causal inference. In the following, we prove the memorylessness property for this specific program using BLUEBELL.

Using BLUEBELL’s program rules, we can prove that after the program execution the output distribution satisfies the assertion

C𝑑1 𝑣1. ( ⌈𝑥1 = 𝑣1⌉ ∗ C𝑑2(𝑣1) 𝑣2. ( ⌈𝑥2 = 𝑣2⌉ ∗ 𝑥3 $∼ 𝑑3(𝑣2) ) )

We want to transform the assertion into:

C𝜇2 𝑣2. ( ⌈𝑥2 = 𝑣2⌉ ∗ 𝑥1 $∼ 𝜇1(𝑣2) ∗ 𝑥3 $∼ 𝑑3(𝑣2) )

for appropriate 𝜇2 and 𝜇1.

In probability theory, the proof of memorylessness is an application of Bayes’ law: we compute the distribution of 𝑥1, 𝑥3 conditioned on 𝑥2, using the distribution of 𝑥2 conditioned on 𝑥1 and the distribution of 𝑥3 conditioned on 𝑥2. In BLUEBELL we can reproduce the transformation using the joint conditioning rules, in particular the right-to-left direction of C-FUSE and the primitive rule C-UNASSOC.
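The Bayes'-law computation behind 𝜇2 and 𝜇1 can be checked concretely on a small finite instance (a Python sketch; the kernels 𝑑1, 𝑑2, 𝑑3 below are toy choices of ours, used only for illustration):

```python
from fractions import Fraction
from itertools import product

F = Fraction
# Toy kernels: each is a dict (or function returning a dict) value -> probability.
d1 = {0: F(1, 3), 1: F(2, 3)}
d2 = lambda x1: {0: F(1, 2), 1: F(1, 2)} if x1 == 0 else {0: F(1, 4), 1: F(3, 4)}
d3 = lambda x2: {0: F(2, 3), 1: F(1, 3)} if x2 == 0 else {0: F(1, 5), 1: F(4, 5)}

# Joint distribution of (x1, x2, x3) for the chain x1 <- d1; x2 <- d2(x1); x3 <- d3(x2).
joint = {(a, b, c): d1[a] * d2(a)[b] * d3(b)[c]
         for a, b, c in product([0, 1], repeat=3)}

# mu2 = marginal of x2; mu1(v2) = distribution of x1 given x2 = v2 (Bayes' law).
mu2 = {b: sum(pr for (a, bb, c), pr in joint.items() if bb == b) for b in [0, 1]}
mu1 = {b: {a: sum(pr for (aa, bb, c), pr in joint.items() if aa == a and bb == b) / mu2[b]
           for a in [0, 1]}
       for b in [0, 1]}

# Memorylessness: conditioned on x2 = b, x1 and x3 are independent,
# with x1 ~ mu1(b) and x3 ~ d3(b).
for b in [0, 1]:
    for a, c in product([0, 1], repeat=2):
        assert joint[(a, b, c)] / mu2[b] == mu1[b][a] * d3(b)[c]
```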
Using these we can prove:

C𝑑1 𝑣1. ( ⌈𝑥1 = 𝑣1⌉ ∗ C𝑑2(𝑣1) 𝑣2. ( ⌈𝑥2 = 𝑣2⌉ ∗ 𝑥3 $∼ 𝑑3(𝑣2) ) )
⊢ C𝑑1 𝑣1. ( C𝑑2(𝑣1) 𝑣2. ( ⌈𝑥1 = 𝑣1⌉ ∗ ⌈𝑥2 = 𝑣2⌉ ∗ 𝑥3 $∼ 𝑑3(𝑣2) ) ) (C-FRAME)
⊢ C𝜇0 (𝑣1, 𝑣2). ( ⌈𝑥1 = 𝑣1⌉ ∗ ⌈𝑥2 = 𝑣2⌉ ∗ 𝑥3 $∼ 𝑑3(𝑣2) ) (C-FUSE)
⊢ C𝜇2 𝑣2. ( C𝜇1(𝑣2) 𝑣1. ( ⌈𝑥1 = 𝑣1⌉ ∗ ⌈𝑥2 = 𝑣2⌉ ∗ 𝑥3 $∼ 𝑑3(𝑣2) ) ) (C-UNASSOC)
⊢ C𝜇2 𝑣2. ( ⌈𝑥2 = 𝑣2⌉ ∗ C𝜇1(𝑣2) 𝑣1. ( ⌈𝑥1 = 𝑣1⌉ ∗ 𝑥3 $∼ 𝑑3(𝑣2) ) ) (SURE-STR-CONVEX)
⊢ C𝜇2 𝑣2. ( ⌈𝑥2 = 𝑣2⌉ ∗ 𝑥1 $∼ 𝜇1(𝑣2) ∗ 𝑥3 $∼ 𝑑3(𝑣2) ) (C-EXTRACT)

where 𝑑1 ⋉ 𝑑2 = 𝜇0 = 𝜇2 ⋉ 𝜇1. The existence of such 𝜇2 and 𝜇1 is a simple application of Bayes’ law: 𝜇2(𝑣2) = ∑𝑣1∈Val 𝜇0(𝑣1, 𝑣2), and 𝜇1(𝑣2)(𝑣1) = 𝜇0(𝑣1, 𝑣2) / 𝜇2(𝑣2). The second-to-last step pulls ⌈𝑥2 = 𝑣2⌉ out of the inner conditioning modality; it is sound to pull out almost-sure assertions like ⌈𝑥2 = 𝑣2⌉, but not general assertions. Last, we use the derived rule C-EXTRACT to eliminate the second conditioning and extract the distribution of 𝑥1 given 𝑥2.

We see the ability of BLUEBELL to perform these manipulations as evidence that joint conditioning and independence form a sturdy abstraction over the semantic model (RQ1). The meta-reasoning required to manipulate the distributions and the conditioning modality — here, showing the existence of 𝜇1, 𝜇2 such that 𝑑1 ⋉ 𝑑2 = 𝜇2 ⋉ 𝜇1 — is minimal and localized. Our abstraction and rules also offer a good way to inject facts about distributions without interfering with the rest of the proof context.

5.5.3 Multi-party Secure Computation

In multi-party secure computation [Goldreich, 1998], the goal is for 𝑁 parties to compute a function 𝑓 (𝑥1, . . . , 𝑥𝑁) of some private data 𝑥𝑖 owned by each party 𝑖, without revealing any more information about 𝑥𝑖 than the output of 𝑓 would reveal. For example, if 𝑓 is addition, a secure computation of 𝑓 can be used to compute the total number of votes without revealing who voted positively: some information would leak (e.g.
if the total is non-zero then somebody voted positively) but only what is revealed by knowing the total and nothing more.

To achieve this objective, multi-party secure addition (MPSAdd) works by having the parties break their secrets into 𝑁 secret shares which individually look random, but whose sum amounts to the original secret. These secret shares are then distributed to the other parties so that each party knows an incomplete set of shares of the other parties. Yet, each party can reliably compute the result of the function by computing a function of the received shares.

Barthe et al. [2019] analyze this example, proving independence between each party’s view and any other party’s secrets after the first round of communication. However, there are two rounds of communication in the protocol, and after the second round, each party gets more information about the other parties’ values. So the proof of Barthe et al. [2019] does not ensure the end-to-end security of the protocol.

We aim to formally establish the end-to-end security of the protocol. As is very often the case, there is no single “canonical” way of specifying this kind of security property. For MPSAdd between three parties, for instance, one way to formalize security is a unary specification saying that, conditionally on the secret of party 1 and the sum of the other secrets, all the values received by party 1 (we call this the view of party 1) are independent from the secrets of the other parties. Roughly:

(𝑥1, 𝑥2, 𝑥3) $∼ 𝜇0 ⊢ wp [𝑀𝑃𝑆𝐴𝑑𝑑] { ∃𝜇. C𝜇 (𝑣, 𝑠). ( ⌈𝑥1 = 𝑣 ∧ (𝑥2 + 𝑥3) = 𝑠⌉ ∗ Own(𝑣𝑖𝑒𝑤1) ∗ Own(𝑥2, 𝑥3) ) }

where 𝜇0 is an arbitrary distribution of the three secrets among the three parties.
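To make the share-splitting step concrete, here is a minimal sketch of additive secret sharing over Z𝑝 (in Python; the modulus and the function names are illustrative choices of ours, not the protocol's actual code):

```python
import random

P = 101  # a small modulus standing in for Z_p (illustrative choice)

def share(secret, n=3):
    """Split `secret` into n additive shares modulo P."""
    shares = [random.randrange(P) for _ in range(n - 1)]
    shares.append((secret - sum(shares)) % P)
    return shares

def mpsadd(secrets):
    """Each party shares its secret; party j sums the j-th shares it receives;
    the grand total of those partial sums equals the sum of all secrets mod P."""
    all_shares = [share(s) for s in secrets]
    s = [sum(all_shares[i][j] for i in range(3)) % P for j in range(3)]
    return sum(s) % P

secrets = [17, 42, 99]
assert sum(share(secrets[0])) % P == secrets[0] % P  # shares reconstruct the secret
assert mpsadd(secrets) == sum(secrets) % P           # the protocol computes the sum
# Any n-1 shares are uniformly random, so an incomplete set of shares
# reveals nothing about the underlying secret.
```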
The post-condition asserts that if we condition on party 1’s secret and on the sum of party 2’s and party 3’s secrets, which party 1 can infer from the sum of the three secrets, then what party 1 can view, captured by 𝑣𝑖𝑒𝑤1, is independent of party 2’s and party 3’s secrets. Here, the conditioning nicely expresses that the acceptable leakage is just the sum.

There is also a natural relational formulation of the security goal, and in the conference paper Bao et al. [2025] we provide BLUEBELL proofs for: 1. the unary specification; 2. the relational specification (not using the unary proof); 3. the equivalence of the two specifications. The relational specification can also be proved using probabilistic relational Hoare logic [Barthe et al., 2009]. In the following, we elaborate on the proof of the unary specification.

First, we can apply the program rules for loops and assignments to obtain the postcondition 𝑄:

𝑄 = 𝑋 ∗ 𝑅12 ∗ 𝑅3 ∗ 𝑆 ∗ 𝑆𝑢𝑚

where

𝑋 = (𝑥1, 𝑥2, 𝑥3) $∼ 𝜇0
𝑅12 = ∗𝑖∈{1,2,3} ( 𝑟[𝑖][1] $∼ U𝑝 ∗ 𝑟[𝑖][2] $∼ U𝑝 )
𝑅3 = ∗𝑖∈{1,2,3} ⌈𝑟[𝑖][3] = 𝑥𝑖 − 𝑟[𝑖][1] − 𝑟[𝑖][2]⌉
𝑆 = ⌈∧𝑖∈{1,2,3} 𝑠[𝑖] = 𝑟[1][𝑖] + 𝑟[2][𝑖] + 𝑟[3][𝑖]⌉
𝑆𝑢𝑚 = ⌈𝑠𝑢𝑚 = 𝑠[1] + 𝑠[2] + 𝑠[3]⌉

Now the goal is to show that 𝑄 entails the desired postcondition. As a first step, we transform 𝑋 into (𝑥1, 𝑥2 + 𝑥3) $∼ 𝜇 by DIST-FUN. Then we condition on (𝑥1, 𝑥2 + 𝑥3, 𝑥2, 𝑥3) and the variables in 𝑅12, obtaining: C𝜇′ (𝑣1, 𝑣23, 𝑣2, 𝑣3).  ⌈𝑥1 = 𝑣1 ∧ (𝑥2 + 𝑥3) = 𝑣23 ∧ (𝑥2, 𝑥3) = (𝑣2, 𝑣3)⌉ ∗ CU𝑝 𝑢11. CU𝑝 𝑢12. CU𝑝 𝑢21. CU𝑝 𝑢22. CU𝑝 𝑢31.
CU𝑝 𝑢32. 𝑟 [1] [1] = 𝑢11 ∧ 𝑟 [1] [2] = 𝑢12 ∧ 𝑟 [1] [3] = 𝑣1 − 𝑢11 − 𝑢12 𝑟 [2] [2] = 𝑢22 ∧ 𝑟 [2] [3] = 𝑣2 − 𝑢21 − 𝑢22 𝑟 [3] [2] = 𝑢32 ∧ 𝑟 [3] [3] = (𝑣23 − 𝑣2) − 𝑢31 − 𝑢32 𝑠[1] = 𝑢11 + 𝑢21 + 𝑢31 𝑠[2] = 𝑢12 + 𝑢22 + 𝑢32 𝑠[3] = 𝑣1 − 𝑢11 − 𝑢12 + 𝑣2 − 𝑢21 − 𝑢22 + (𝑣23 − 𝑣2) − 𝑢31 − 𝑢32 𝑠𝑢𝑚 = 𝑠[1] + 𝑠[2] + 𝑠[3]   Here 𝜇′ = bind(𝜇0, (𝑥1, 𝑥2, 𝑥3) ↦→ unit(𝑥1, 𝑥2 + 𝑥3, 𝑥2). We already weakened the assertion by forgetting the information about 𝑟 [2] [1] and 𝑟 [3] [1], which are not part of 𝑣𝑖𝑒𝑤1. Now we perform a change of variables using C-TRANSF, to express our equalities in terms of 𝑢′21 = 𝑢21 − 𝑣2 instead of 𝑢21 and 𝑢′31 = 𝑢31 − (𝑣23 − 𝑣2) instead of 𝑢31. To justify the change we simply observe that, for all 𝑛 ∈ Z𝑝, the function 𝑓𝑛 (𝑢) = (𝑢 − 𝑛) mod 𝑝 is a bijection and U𝑝 ◦ 𝑓 −1 𝑛 = U𝑝. This gives us, 198 with some simple arithmetic simplifications: C𝜇′ (𝑣1, 𝑣23, 𝑣2, 𝑣3).  ⌈𝑥1 = 𝑣1 ∧ (𝑥2 + 𝑥3) = 𝑣23 ∧ (𝑥2, 𝑥3) = (𝑣2, 𝑣3)⌉ ∗ CU𝑝 𝑢11. CU𝑝 𝑢12. CU𝑝 𝑢 ′ 21. CU𝑝 𝑢22. CU𝑝 𝑢 ′ 31. CU𝑝 𝑢32. 𝑟 [1] [1] = 𝑢11 ∧ 𝑟 [1] [2] = 𝑢12 ∧ 𝑟 [1] [3] = 𝑣1 − 𝑢11 − 𝑢12 𝑟 [2] [2] = 𝑢22 ∧ 𝑟 [2] [3] = −𝑢′21 − 𝑢22 𝑟 [3] [2] = 𝑢32 ∧ 𝑟 [3] [3] = −𝑢′31 − 𝑢32 𝑠[1] = 𝑢11 + 𝑢′21 + 𝑢 ′ 31 + 𝑣23 𝑠[2] = 𝑢12 + 𝑢22 + 𝑢32 𝑠[3] = 𝑣1 − 𝑢11 − 𝑢12 − 𝑢′21 − 𝑢22 − 𝑢′31 − 𝑢32 𝑠𝑢𝑚 = 𝑠[1] + 𝑠[2] + 𝑠[3]   In particular, we removed all dependencies on 𝑣2 from the inner formula. We can now apply C-ASSOC to collapse all the inner conditioning into a single one: C𝜇′ (𝑣1, 𝑣23, 𝑣2, 𝑣3).  ⌈𝑥1 = 𝑣1 ∧ (𝑥2 + 𝑥3) = 𝑣23 ∧ (𝑥2, 𝑥3) = (𝑣2, 𝑣3)⌉ ∗ C𝑈 (𝑣1,𝑣23) 𝑢. ⌈ ∗ ⌉𝑣𝑖𝑒𝑤1 = 𝑢  where 𝑈 (𝑣1, 𝑣23) = (𝑣 ← U𝑝 ⊗ . . . 
⊗ U𝑝; return 𝑔(𝑣)) takes the six independent samples from U𝑝 and returns the values of each of the components of 𝑣𝑖𝑒𝑤1 (which justifies the dependency on 𝑣1 and 𝑣23). Finally, we split 𝜇′ = bind(𝜇̃0, 𝜅), obtaining:

C𝜇′ (𝑣1, 𝑣23, 𝑣2, 𝑣3). ( ⌈𝑥1 = 𝑣1 ∧ (𝑥2 + 𝑥3) = 𝑣23 ∧ (𝑥2, 𝑥3) = (𝑣2, 𝑣3)⌉ ∗ C𝑈(𝑣1,𝑣23) 𝑢. ⌈𝑣𝑖𝑒𝑤1 = 𝑢⌉ )
⊢ C𝜇′ (𝑣1, 𝑣23, 𝑣2, 𝑣3). ( ⌈𝑥1 = 𝑣1 ∧ (𝑥2 + 𝑥3) = 𝑣23⌉ ∗ ⌈(𝑥2, 𝑥3) = (𝑣2, 𝑣3)⌉ ∗ 𝑣𝑖𝑒𝑤1 $∼ 𝑈(𝑣1, 𝑣23) ) (SURE-MERGE, C-UNIT-R)
⊢ C𝜇̃0 (𝑣1, 𝑣23). ( ⌈𝑥1 = 𝑣1 ∧ (𝑥2 + 𝑥3) = 𝑣23⌉ ∗ C𝜅(𝑣1,𝑣23) (𝑣2, 𝑣3). ( ⌈(𝑥2, 𝑥3) = (𝑣2, 𝑣3)⌉ ∗ 𝑣𝑖𝑒𝑤1 $∼ 𝑈(𝑣1, 𝑣23) ) ) (C-UNASSOC, SURE-STR-CONVEX)
⊢ C𝜇̃0 (𝑣1, 𝑣23). ( ⌈𝑥1 = 𝑣1 ∧ (𝑥2 + 𝑥3) = 𝑣23⌉ ∗ (𝑥2, 𝑥3) $∼ 𝜅(𝑣1, 𝑣23) ∗ 𝑣𝑖𝑒𝑤1 $∼ 𝑈(𝑣1, 𝑣23) ) (C-EXTRACT)
⊢ C𝜇̃0 (𝑣1, 𝑣23). ( ⌈𝑥1 = 𝑣1 ∧ (𝑥2 + 𝑥3) = 𝑣23⌉ ∗ 𝑣𝑖𝑒𝑤1 $∼ 𝑈(𝑣1, 𝑣23) ∗ ∃𝜇23. (𝑥2, 𝑥3) $∼ 𝜇23 )

This gets us the desired postcondition and concludes the proof.

5.5.4 Von Neumann Extractor

A randomness extractor is a mechanism that transforms a stream of “low-quality” randomness sources into a stream of “high-quality” randomness sources. The von Neumann extractor [von Neumann, 1951] is perhaps the earliest instance of such a mechanism; it converts a stream of independent coins with the same bias 𝑝 into a stream of independent fair coins. Verifying the correctness of the extractor requires careful reasoning under conditioning, and showcases the use of C-WP-SWAP in a unary setting, which gives a positive answer to RQ2 and RQ4.

We can model the extractor, up to 𝑁 ∈ N iterations, in our language3 as shown in fig. 5.8. The program repeatedly flips two biased coins, and outputs the outcome of the first coin if the outcomes were different; otherwise it retries.

def vn(𝑁):
    len := 0
    repeat 𝑁:
        coin1 ← Ber(𝑝)
        coin2 ← Ber(𝑝)
        if coin1 ≠ coin2 then:
            out[len] := coin1
            len := len + 1

Figure 5.8: Von Neumann extractor.
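As a sanity check on the extractor, its output distribution can be computed exactly by enumeration (a Python sketch; the function name and the concrete bias are our choices):

```python
from fractions import Fraction
from itertools import product

def vn_output_dist(p, n):
    """Exact joint distribution of the output bits of the von Neumann
    extractor after n rounds, keyed by the tuple of produced bits."""
    dist = {(): Fraction(1)}
    bern = {1: p, 0: 1 - p}
    for _ in range(n):
        new = {}
        for out, pr in dist.items():
            for c1, c2 in product([0, 1], repeat=2):
                ext = out + (c1,) if c1 != c2 else out  # emit coin1 only on a mismatch
                new[ext] = new.get(ext, Fraction(0)) + pr * bern[c1] * bern[c2]
        dist = new
    return dist

d = vn_output_dist(Fraction(1, 3), 2)
# Conditioned on producing two bits, the four bit patterns are equally likely,
# i.e. the produced bits are independent fair coins.
two_bit = {k: v for k, v in d.items() if len(k) == 2}
total = sum(two_bit.values())
assert all(v / total == Fraction(1, 4) for v in two_bit.values())
```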
As an example, we prove in BLUEBELL that the bits produced in out are independent fair coin flips. Formally, for ℓ produced bits, we want the following to hold:

Outℓ ≜ 𝑜𝑢𝑡[0] $∼ Bern 1/2 ∗ · · · ∗ 𝑜𝑢𝑡[ℓ−1] $∼ Bern 1/2.

To know how many bits were produced, however, we need to condition on len, obtaining the specification (recall 𝑃 ⊩ 𝑄 ≜ 𝑃 ∧ ownVar ⊢ 𝑄 ∧ ownVar):

⊩ wp [1: 𝑣𝑛(𝑁)] { ∃𝜇. C𝜇 ℓ. ( ⌈𝑙𝑒𝑛 = ℓ ≤ 𝑁⌉ ∗ Outℓ ) }

The BLUEBELL proof of this specification is shown in the outline in fig. 5.9. The postcondition straightforwardly generalizes to a loop invariant

𝑃(𝑖) = ∃𝜇. C𝜇 ℓ. ( ⌈𝑙𝑒𝑛 = ℓ ≤ 𝑖⌉ ∗ Outℓ )

At step (5.2) we show, by using C-UNIT-L and the definition of ⌈ · ⌉, that we can obtain the loop invariant with 𝑖 = 0:

𝑃(0) = ∃𝜇. C𝜇 ℓ. ( ⌈𝑙𝑒𝑛 = ℓ ≤ 0⌉ ∗ Out0 ) = ∃𝜇. C𝜇 ℓ. ( ⌈𝑙𝑒𝑛 = ℓ ≤ 0⌉ ).

3While technically our language does not support arrays, they can be easily encoded as a collection of 𝑁 variables.

[Figure 5.9 presents the proof outline: len := 0 establishes the invariant for 𝑖 = 0 (step 5.2); WP-LOOP is applied with invariant 𝑃(𝑖); within the loop body, C-WP-ELIM processes the conditional case by case (steps 5.3–5.6); and the resulting postcondition is folded back into the invariant (step 5.7).]
Figure 5.9: Proof outline of the Von Neumann extractor example.
For the proof of the body of the loop we can assume 𝑃(𝑖) and we need to prove the postcondition 𝑃(𝑖 + 1). After sampling the two coins, at step (5.3) we apply the crucial insight behind the extractor. The key idea is that with some probability 𝑞 the two coins will be different, in which case the outcomes of the two coins can be either (0, 1) or (1, 0), which both have the same probability 𝑝(1 − 𝑝). Therefore, if the coins are different, 𝑐𝑜𝑖𝑛1 = 0 and 𝑐𝑜𝑖𝑛1 = 1 have the same probability, i.e. 𝑐𝑜𝑖𝑛1 looks like a fair coin.

BLUEBELL is capable of representing this reasoning as follows. We start with two independent biased coins, which we can combine into a random variable (𝑐𝑜𝑖𝑛1 ≠ 𝑐𝑜𝑖𝑛2, 𝑐𝑜𝑖𝑛1) recording whether the two outcomes were different and the outcome of the first coin. We use PROD-UNSPLIT and DIST-FUN to derive:

𝑐𝑜𝑖𝑛1 $∼ Bern𝑝 ∗ 𝑐𝑜𝑖𝑛2 $∼ Bern𝑝 ⊢ (𝑐𝑜𝑖𝑛1 ≠ 𝑐𝑜𝑖𝑛2, 𝑐𝑜𝑖𝑛1) $∼ 𝜇0

where 𝜇0 ≜ bind(Bern𝑝 ⊗ Bern𝑝, (𝑐𝑜𝑖𝑛1, 𝑐𝑜𝑖𝑛2) ↦→ (𝑐𝑜𝑖𝑛1 ≠ 𝑐𝑜𝑖𝑛2, 𝑐𝑜𝑖𝑛1)). Now we can reformulate 𝜇0 to reflect our above-mentioned insight into why this extractor works: there exist probabilities 𝑞 and 𝑞′ such that

𝜇0 = 𝛽 ⋉ 𝜅   where   𝛽 ≜ Bern𝑞   𝜅(1) ≜ Bern 1/2   𝜅(0) ≜ Bern𝑞′

Here one first determines whether the two coins will be different or equal using a Bernoulli distribution that assigns probability 𝑞 for them to be different; here 𝑞 can be obtained as a function of 𝑝 (concretely, 𝑞 = 2𝑝(1 − 𝑝)). The process then generates 𝑐1 accordingly using 𝜅: in the “different” branch (𝑏 = 1) the first coin is distributed as Bern 1/2, while in the “equal” branch (𝑏 = 0) the first coin is distributed with some bias 𝑞′ (also a function of 𝑝). So using 𝜇0 = 𝛽 ⋉ 𝜅 we derive:

(𝑐𝑜𝑖𝑛1 ≠ 𝑐𝑜𝑖𝑛2, 𝑐𝑜𝑖𝑛1) $∼ (𝛽 ⋉ 𝜅)
⊢ C𝛽⋉𝜅 (𝑏, 𝑐1). ⌈(𝑐𝑜𝑖𝑛1 ≠ 𝑐𝑜𝑖𝑛2) = 𝑏 ∧ 𝑐𝑜𝑖𝑛1 = 𝑐1⌉ (C-UNIT-R)
⊢ C𝛽 𝑏. C𝜅(𝑏) 𝑐1. ⌈(𝑐𝑜𝑖𝑛1 ≠ 𝑐𝑜𝑖𝑛2) = 𝑏⌉ ∗ ⌈𝑐𝑜𝑖𝑛1 = 𝑐1⌉ (C-FUSE)
⊢ C𝛽 𝑏. ( ⌈(𝑐𝑜𝑖𝑛1 ≠ 𝑐𝑜𝑖𝑛2) = 𝑏⌉ ∗ C𝜅(𝑏) 𝑐1. ⌈𝑐𝑜𝑖𝑛1 = 𝑐1⌉ ) (SURE-STR-CONVEX)
⊢ C𝛽 𝑏. ( ⌈(𝑐𝑜𝑖𝑛1 ≠ 𝑐𝑜𝑖𝑛2) = 𝑏⌉ ∗ ⌜𝑏 = 1⌝ ⇒ CBern 1/2 𝑐1. ⌈𝑐𝑜𝑖𝑛1 = 𝑐1⌉ ) (C-CONS)
⊢ C𝛽 𝑏.
( ⌈(𝑐𝑜𝑖𝑛1 ≠ 𝑐𝑜𝑖𝑛2) = 𝑏⌉ ∗ ⌜𝑏 = 1⌝ ⇒ 𝑐𝑜𝑖𝑛1 $∼ Bern 1 2 ) (C-UNIT-R) 203 The application of C-FUSE allows us to first condition on 𝑐𝑜𝑖𝑛1 ≠ 𝑐𝑜𝑖𝑛2, and then the first coin. We can then weaken the case where 𝑏 = 0 and only record that if 𝑏 = 1 then 𝑐𝑜𝑖𝑛1 is a fair coin. This takes us through step (5.3) of fig. 5.9. Now the precondition of the if statement is conditional on len and 𝑐𝑜𝑖𝑛1 ≠ 𝑐𝑜𝑖𝑛2. Intuitively, we want to evalu- ate the effects of the if statement in the two possible outcomes and put together the results. This is precisely the purpose of the C-WP-SWAP rule, which together with C-CONS gives us the derived rule: C-WP-ELIM ∀𝑣 ∈ supp(𝜇). 𝑃(𝑣) ⊩ wp 𝑡 {𝑄(𝑣)} C𝜇 𝑣. 𝑃(𝑣) ⊩ wp 𝑡 {C𝜇 𝑣. 𝑄(𝑣)} By applying the rule twice (first on the conditioning on 𝑙𝑒𝑛, and then on the conditioning on 𝑐𝑜𝑖𝑛1 ≠ 𝑐𝑜𝑖𝑛2), we can process the if statement case by case, and then combine the postconditions obtained in each case. For the conditional with the guard 𝑐𝑜𝑖𝑛1 ≠ 𝑐𝑜𝑖𝑛2, the false branch is a skip (omitted), so it preserves the precondition with 𝑏 = 0. In the true branch, starting with the precondition at (5.4), we apply WP-ASSIGN to the assignments to obtain (5.5). Last, we com- bine the results from both branches by making the overall postcondition at (5.6) to be parametric on the value of 𝑏 (and ℓ). The last non-obvious step is (5.7) in fig. 5.9, where we show that the condi- tional postcondition of the if statement implies the loop invariant 𝑃(𝑖 + 1). Let 𝐾 (ℓ, 𝑏) =  ⌈𝑙𝑒𝑛 = ℓ + 1 ≤ 𝑖 + 1⌉ ∗ 𝑜𝑢𝑡 [𝑙𝑒𝑛] $∼ Bern 1 2 if 𝑏 = 1 ⌈𝑙𝑒𝑛 = ℓ ≤ 𝑖 + 1⌉ if 𝑏 = 0 204 then the step is proven as follows: C𝜇 ℓ. C𝛽 𝑏. ( ⌈(𝑐𝑜𝑖𝑛1 ≠ 𝑐𝑜𝑖𝑛2) = 𝑏⌉ ∗ Outℓ ∗ 𝐾 (ℓ, 𝑏) ) ⊢ C𝜇⊗𝛽 (ℓ, 𝑏). ( ⌈(𝑐𝑜𝑖𝑛1 ≠ 𝑐𝑜𝑖𝑛2) = 𝑏⌉ ∗ Outℓ ∗ 𝐾 (ℓ, 𝑏) ) (C-ASSOC) ⊢ C𝜇⊗𝛽 (ℓ, 𝑏). ( Outℓ ∗ 𝐾 (ℓ, 𝑏) ) (C-CONS) ⊢ C𝜇′′ (ℓ′, ℓ).  ⌈𝑙𝑒𝑛 = ℓ′ ≤ 𝑖 + 1⌉ ∗Outℓ′−1 ∗ 𝑜𝑢𝑡 [𝑙𝑒𝑛] $∼ Bern 1 2 if ℓ′ = ℓ + 1 ⌈𝑙𝑒𝑛 = ℓ′ ≤ 𝑖 + 1⌉ ∗ Outℓ′ if ℓ′ = ℓ (C-TRANSF) ⊢ C𝜇′′◦𝜋−1 1 ℓ′. ( ⌈𝑙𝑒𝑛 = ℓ′ ≤ 𝑖 + 1⌉ ∗ Outℓ′ ) (C-DIST-PROJ) ⊢ ∃𝜇′. C𝜇′ ℓ′. 
( ⌈𝑙𝑒𝑛 = ℓ′ ≤ 𝑖 + 1⌉ ∗ Outℓ′ )

The application of C-TRANSF uses the function 𝑓 (ℓ, 𝑏) = (ℓ + 𝑏, ℓ) to introduce the new ℓ′, and then we project away the unused ℓ using the derived C-DIST-PROJ (note that the rule applies to ⌈ · ⌉ assertions and multiple ownership assertions in a separating conjunction thanks to PROD-SPLIT and PROD-UNSPLIT).

5.6 Related Work

Research on deductive verification of probabilistic programs has developed a wide range of techniques that employ unary and relational styles of reasoning. BLUEBELL advances the state of the art in both styles by coherently unifying the strengths of both. Since this chapter focuses on the unary fragment of BLUEBELL, here we compare BLUEBELL with unary-style deductive techniques in more detail, and overview relational-style deductive techniques only at a high level. At the end, we briefly survey other relevant techniques for reasoning about probabilistic programs.

Other Unary Reasoning Techniques for Probabilistic Programs Outside of probabilistic separation logic, another line of unary deductive verification techniques is the expectation-based approach. The high-level idea is to reason about expected quantities of probabilistic programs via a weakest-pre-expectation operator that propagates information about expected values backwards through the program. The approach has been classically used to verify randomized algorithms [Kozen, 1983, Morgan et al., 1996, Kaminski et al., 2016, Kaminski, 2019, Aguirre et al., 2021, Bartocci et al., 2020]. These logics offer ergonomic principles for expectations, but do not aim at unifying principles for analyzing more general classes of properties or proof techniques, as we attempt here. Ellora [Barthe et al., 2018] proposes an assertion-based logic to overcome the limitation of working only with expectations, but it does not support reasoning about separation or conditioning.
Detailed Comparison with Lilac [Li et al., 2023a] Now that we have intro- duced the unary fragment of BLUEBELL thoroughly, we revisit its comparison with Lilac [Li et al., 2023a]. Lilac supports reasoning about independence and conditional independence in continuous distributions, thanks to their measure- theoretic model based on Borel spaces. BLUEBELL also uses a measure-theory based model, similar to Lilac, but is limited to discrete distributions. The lim- itation is imposed because we are only able to prove some key lemmas [Bao et al., 2025, Lemma C.1 - C.7] for discrete measures, though we speculate that they also hold for continuous measures. These lemmas are used in the proof of key rules such as C-WP-SWAP and C-FRAME, and Lilac’s proof system does not include similar rules. Also, while BLUEBELL uses Lilac’s independent product as a model of sep- 206 arating conjunction, it differs from Lilac in three aspects: (1) the treatment of ownership, (2) support for mutable state, and (3) the model of conditioning. In Lilac, ownership over program variables and expressions is de- fined as measurability. In BLUEBELL, however, we define ownership as almost-measurability, which is required to support inferences like Own(𝑥) ∗ ⌈𝑥 = 𝑦⌉ ⊢ Own(𝑦) These inferences were implicitly used in the first ver- sion of Lilac, but were not valid in its model. Their arxiv version Li et al. [2023b] fixes the issue by changing the meaning of ⌈𝑥 = 𝑦⌉, while our fix changes the meaning of ownership (and we define ⌈𝐸⌉ assertions based on regular owner- ship). Lilac works with immutable state [Staton, 2020], which simplifies reason- ing in certain contexts (e.g., the frame rule and the if rule). BLUEBELL’s model supports mutable state through a creative use of permissions, obtaining a clean frame rule, at the cost of some predictable bookkeeping. The more significant difference with Lilac is however in the definition of the conditioning modality. 
Lilac’s modality C𝑣←𝐸 𝑃(𝑣) roughly corresponds to the BLUEBELL assertion ∃𝜇. C𝜇 𝑣. (⌈𝐸 = 𝑣⌉ ∗ 𝑃(𝑣)). The difference is not merely syntactic, and requires changing the model of the modality. For example, Lilac’s modality satisfies C𝑣←𝐸 𝑃1(𝑣) ∧ C𝑣←𝐸 𝑃2(𝑣) ⊢ C𝑣←𝐸 (𝑃1(𝑣) ∧ 𝑃2(𝑣)), but the analogous rule C𝜇 𝑣. 𝐾1(𝑣) ∧ C𝜇 𝑣. 𝐾2(𝑣) ⊢ C𝜇 𝑣. (𝐾1(𝑣) ∧ 𝐾2(𝑣)) is unsound in BLUEBELL: the meaning of the modalities in the premise ensures the existence of two kernels 𝜅1 and 𝜅2 supporting 𝐾1 and 𝐾2 respectively, but the conclusion requires the existence of a single kernel supporting both 𝐾1 and 𝐾2. Lilac’s rule holds because when one conditions on a random variable, the corresponding kernels are unique. We did not find losing this rule limiting.

On the other hand, Lilac’s conditioning has two disadvantages: (i) it does not record the distribution of 𝐸, losing this information when conditioning; (ii) it does not generalize to the relational setting. Even considering only the unary setting, having access to the distribution 𝜇 unlocks a number of new rules (e.g. C-UNIT-R and C-FUSE). In particular, the rules of BLUEBELL provide more ways to convert a conditional assertion back into an unconditional one, which is crucial when the end goal is not under conditioning but conditioning is helpful in intermediate steps.

Relational Reasoning We summarize several extensions of pRHL-style relational reasoning, the vanilla version of which is overviewed in section 5.1.

Polaris [Tassarotti and Harper, 2019] is an early instance of a probabilistic relational (concurrent) separation logic. Its motivation is to reconcile relational-lifting-based reasoning with the semantics of concurrent programs. However, separation in Polaris is again classic disjointness of state and is not related to (conditional) independence in the style of PSL, Lilac, or BLUEBELL. Gregersen et al.
[2024] recently proposed Clutch, a logic to prove contextual refinement in a probabilistic higher-order language, where “out of order” couplings between samplings are achieved by using ghost code that pre-samples some assignments, a technique inspired by prophecy variables [Jung et al., 2019]. In the conference version Bao et al. [2025], we showed how BLUEBELL can resolve this issue without ghost code (in the context of first-order imperative programs) by using framing and probabilistic independence creatively. In contrast to BLUEBELL, Clutch can only express relational properties; it also uses separation, but with its classical interpretation as disjointness of deterministic state.

CHAPTER 6
DISCUSSION

In this thesis, we presented three extensions of probabilistic separation logic for reasoning about probabilistic programs involving not only independence, but also negative dependence or conditional independence. We demonstrated that these program logics can formalize the correctness of a range of randomized algorithms. With rules that utilize probabilistic (in)dependencies for modular reasoning, these program logics allow relatively compact and clean formal proofs. In this final chapter, we discuss more related work and directions for future work.

6.1 Related Work

Concurrent Developments in Probabilistic Separation Logic During the course of my doctoral study, various papers have explored other variations of probabilistic separation logic. Dal Lago et al. [2024] extend PSL to prove computational independence, which relaxes probabilistic independence by only requiring variables’ distributions to be computationally indistinguishable from independence. They demonstrate their logic by formalizing several simple cryptographic protocols developed to guard against adversaries with bounded computational power. Ho et al.
[2025] extend Lilac [Li et al., 2023a] to support a probabilistic programming language that can not only sample from distributions but also condition based on observations and soft constraints. They then apply the logic to reason about a range of classic examples in Bayesian reasoning, including the Bayesian coin flip, the collider Bayesian network, and the burglar alarm model. Yan et al. [2024] mechanize a variation of PSL in Isabelle/HOL and use it to verify the security of several probabilistic oblivious algorithms. Similar to the unary fragment of BLUEBELL, Jereb and Simpson [2025] also tackle the problem of formulating a clean frame rule for imperative probabilistic programs, using a model more similar to PSL’s than Lilac’s. However, they do not provide a full program logic with a soundness proof.

On the foundation side, Li et al. [2024] bridge the gap between Lilac’s independent product and the standard independent product in probability theory by showing that they are equivalent up to a suitable equivalence of categories.

Quantitative Separation Logic An alternative separation logic for probabilistic programs is Quantitative Separation Logic (QSL) by Batz et al. [2019]. QSL was developed to unify separation logic for heap-manipulating programs with weakest pre-expectations, capturing program states using expectations, i.e., numerical quantities, instead of assertions. QSL uses connectives from bunched logic to construct new expectations out of existing ones; for example, 𝑄1 ∗ 𝑄2 is the maximum, over splittings of the current heap into disjoint subheaps ℎ1 and ℎ2, of the numerical product of the quantities 𝑄1(ℎ1) and 𝑄2(ℎ2). As a result of these design choices, applications of QSL often involve reasoning about quantities relating to the heap (for example, the expected length of a list), which are quite different from the applications of PSL. QSL is automated in Batz et al.
[2022] by using a weakest pre-expectation calculus to reason about programs and reducing entailment checking to standard separation-logic entailment checking, following Echenim et al. [2020].

Automated Verification of Probabilistic Programs Automated verification of probabilistic programs is an active field of research. For example, there is a long line of work on automating weakest pre-expectation style calculi for probabilistic programs [Gretz et al., 2013, Chen et al., 2015, Feng et al., 2017, Batz et al., 2023, Bao et al., 2024], where the bottleneck is computing the weakest pre-expectation of probabilistic loops. Bartocci et al. [2019, 2020] also utilize moment-based analysis to develop automated tools for analyzing probabilistic programs. Probabilistic model checking provides verification tools [Dehnert et al., 2017, Kwiatkowska et al., 2002] of a different flavor. They typically target probabilistic models specified as discrete-time Markov chains (DTMCs), continuous-time Markov chains (CTMCs), or Markov decision processes (MDPs), and ask users to specify desired properties using temporal formulas involving probabilities. While probabilistic programs can be translated into such probabilistic models, it is unclear how to encode probabilistic (in)dependencies as a property in these tools, whose specification languages do not naturally support relating the probabilities of more than one event.

6.2 Directions for Future Work

While recent work shows the potential of probabilistic separation logic, more work has to be done to make this technique appealing to practitioners of probabilistic programming. Here we envision some directions for further development of probabilistic separation logic. First of all, constructing formal proofs currently requires expertise, so automating proof construction is important to make the approach easier to adopt.
Second, users may want to use a richer set of language features, e.g., concurrency, the conditioning operator, more general loops, or higher-order programs, and it would be helpful to enrich probabilistic separation logic to support these language features as well. Third, it would be great to have a unifying framework that is also extensible, so one does not need to apply a different program logic to prove each different property.

In my most recent project, joint with former Cornell undergraduate student Jessica Cho and my advisor Justin Hsu, I work on automating PSL for loop-free probabilistic programs. We draw inspiration from the automation of standard separation logic [Berdine et al., 2004, 2005, 2006, Appel, 2011, Piskac et al., 2014] to develop a syntax-directed version of PSL. However, we observe that probabilistic programs introduce several additional challenges. For example, automating PSL must account for conditionals and loops that branch on randomized guards, and apply custom axioms about independence at appropriate places. Moreover, various verification tasks demand the ability to infer the distribution of an expression given an assertion. We view our work as a pilot study that demonstrates one approach to automating probabilistic separation logic, with significant opportunities for further exploration. For instance, how can we effectively leverage the post-condition to prune the proof during construction? Is there a loop rule particularly well-suited to automation? Furthermore, can existing techniques for synthesizing probabilistic loop invariants be adapted to support the automation of probabilistic separation logic in programs with loops?

Advances in automation are also closely tied to the foundational development of probabilistic separation logic itself, which may lead to a cleaner and more automatable set of rules.
In addition, a unified and extensible framework would enable automation of new extensions to build upon the automation infrastructure developed for existing ones.

BIBLIOGRAPHY

Stephen Abbott. Understanding Analysis. Springer, 2015.

Serge Abiteboul, Richard Hull, and Victor Vianu. Foundations of Databases, volume 8. Addison-Wesley Reading, 1995.

Samson Abramsky and Jouko A. Väänänen. From IF to BI. Synthese, 167(2):207–230, 2009. URL https://doi.org/10.1007/s11229-008-9415-6.

Alejandro Aguirre, Gilles Barthe, Justin Hsu, Benjamin Lucien Kaminski, Joost-Pieter Katoen, and Christoph Matheja. A pre-expectation calculus for probabilistic sensitivity. Proceedings of the ACM on Programming Languages, 5(POPL), January 2021. URL https://doi.org/10.1145/3434333.

Nima Anari, Shayan Oveis Gharan, and Alireza Rezaei. Monte Carlo Markov chain algorithms for sampling strongly Rayleigh distributions and determinantal point processes. In Vitaly Feldman, Alexander Rakhlin, and Ohad Shamir, editors, 29th Annual Conference on Learning Theory, volume 49 of Proceedings of Machine Learning Research, pages 103–115, Columbia University, New York, New York, USA, 23–26 Jun 2016. PMLR. URL https://proceedings.mlr.press/v49/anari16.html.

Andrew W. Appel. VeriSmall: Verified Smallfoot shape analysis. In International Conference on Certified Programs and Proofs, pages 231–246. Springer, 2011.

Jialu Bao, Simon Docherty, Justin Hsu, and Alexandra Silva. A bunched logic for conditional independence. In Proceedings of the 36th Annual ACM/IEEE Symposium on Logic in Computer Science, LICS '21, New York, NY, USA, 2021. Association for Computing Machinery. ISBN 9781665448956. URL https://doi.org/10.1109/LICS52264.2021.9470712.
Jialu Bao, Marco Gaboardi, Justin Hsu, and Joseph Tassarotti. A separation logic for negative dependence. Proceedings of the ACM on Programming Languages, 6(POPL), January 2022. URL https://doi.org/10.1145/3498719.

Jialu Bao, Nitesh Trivedi, Drashti Pathak, Justin Hsu, and Subhajit Roy. Data-driven invariant learning for probabilistic programs. Formal Methods in System Design, pages 1–29, 2024.

Jialu Bao, Emanuele D’Osualdo, and Azadeh Farzan. Bluebell: An alliance of relational lifting and independence for probabilistic reasoning. Proceedings of the ACM on Programming Languages, 9(POPL), January 2025. URL https://doi.org/10.1145/3704894.

Solon Barocas, Moritz Hardt, and Arvind Narayanan. Fairness and machine learning: Limitations and opportunities. MIT Press, 2023.

Gilles Barthe, Benjamin Grégoire, and Santiago Zanella Béguelin. Formal certification of code-based cryptographic proofs. In Proceedings of the 36th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 90–101, 2009. URL https://doi.org/10.1145/1594834.1480894.

Gilles Barthe, Boris Köpf, Federico Olmedo, and Santiago Zanella-Béguelin. Probabilistic relational reasoning for differential privacy. ACM Transactions on Programming Languages and Systems (TOPLAS), 35(3):1–49, 2013.

Gilles Barthe, Thomas Espitau, Benjamin Grégoire, Justin Hsu, Léo Stefanesco, and Pierre-Yves Strub. Relational reasoning via probabilistic coupling. In International Conference on Logic for Programming, Artificial Intelligence and Reasoning (LPAR), pages 387–401. Springer, 2015.
Gilles Barthe, Noémie Fong, Marco Gaboardi, Benjamin Grégoire, Justin Hsu, and Pierre-Yves Strub. Advanced probabilistic couplings for differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pages 55–67, 2016a.

Gilles Barthe, Marco Gaboardi, Benjamin Grégoire, Justin Hsu, and Pierre-Yves Strub. Proving differential privacy via probabilistic couplings. In 2016 31st Annual ACM/IEEE Symposium on Logic in Computer Science (LICS), pages 1–10, 2016b. URL https://doi.org/10.1145/2933575.2934554.

Gilles Barthe, Benjamin Grégoire, Justin Hsu, and Pierre-Yves Strub. Coupling proofs are probabilistic product programs. SIGPLAN Not., 52(1):161–174, January 2017. ISSN 0362-1340. URL https://doi.org/10.1145/3093333.3009896.

Gilles Barthe, Thomas Espitau, Marco Gaboardi, Benjamin Grégoire, Justin Hsu, and Pierre-Yves Strub. An assertion-based program logic for probabilistic programs. In Programming Languages and Systems, pages 117–144, Cham, 2018. Springer International Publishing.

Gilles Barthe, Justin Hsu, and Kevin Liao. A probabilistic separation logic. Proc. ACM Program. Lang., 4(POPL), December 2019. URL https://doi.org/10.1145/3371123.

Ezio Bartocci, Laura Kovács, and Miroslav Stankovič. Automatic generation of moment-based invariants for prob-solvable loops. In Automated Technology for Verification and Analysis, Taipei, Taiwan, 2019. URL https://doi.org/10.1007/978-3-030-31784-3_15.

Ezio Bartocci, Laura Kovács, and Miroslav Stankovič. Mora: automatic generation of moment-based invariants.
In International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS), Dublin, Ireland, 2020. URL https://doi.org/10.1007/978-3-030-45190-5_28.

Kevin Batz, Benjamin Lucien Kaminski, Joost-Pieter Katoen, Christoph Matheja, and Thomas Noll. Quantitative separation logic: a logic for reasoning about probabilistic pointer programs. Proc. ACM Program. Lang., 3(POPL), January 2019. URL https://doi.org/10.1145/3290347.

Kevin Batz, Ira Fesefeldt, Marvin Jansen, Joost-Pieter Katoen, Florian Keßler, Christoph Matheja, and Thomas Noll. Foundations for entailment checking in quantitative separation logic (extended version). arXiv preprint arXiv:2201.11464, 2022.

Kevin Batz, Mingshuai Chen, Sebastian Junges, Benjamin Lucien Kaminski, Joost-Pieter Katoen, and Christoph Matheja. Probabilistic program verification via inductive synthesis of inductive invariants. In International Conference on Tools and Algorithms for the Construction and Analysis of Systems, pages 410–429. Springer, 2023.

Luca Becchetti, Andrea Clementi, Emanuele Natale, Francesco Pasquale, and Gustavo Posta. Self-stabilizing repeated balls-into-bins. International Symposium on Distributed Computing (DISC), 2019.

Ioana O. Bercea and Guy Even. Dynamic dictionaries for multisets and counting filters with constant time operations. Algorithmica, 85(6):1786–1804, December 2022. ISSN 0178-4617. URL https://doi.org/10.1007/s00453-022-01057-0.

Josh Berdine, Cristiano Calcagno, and Peter W. O’Hearn. A decidable fragment of separation logic.
In Proceedings of the 24th International Conference on Foundations of Software Technology and Theoretical Computer Science, FSTTCS ’04, pages 97–109, Berlin, Heidelberg, 2004. Springer-Verlag. ISBN 3540240586. URL https://doi.org/10.1007/978-3-540-30538-5_9.

Josh Berdine, Cristiano Calcagno, and Peter W. O’Hearn. Symbolic execution with separation logic. In Programming Languages and Systems: Third Asian Symposium, APLAS 2005, Tsukuba, Japan, November 2–5, 2005. Proceedings 3, pages 52–68. Springer, 2005.

Josh Berdine, Cristiano Calcagno, and Peter W. O’Hearn. Smallfoot: Modular automatic assertion checking with separation logic. In Formal Methods for Components and Objects: 4th International Symposium, FMCO 2005, Amsterdam, The Netherlands, November 1–4, 2005, Revised Lectures 4, pages 115–137. Springer, 2006.

Burton H. Bloom. Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 13(7):422–426, 1970. URL https://doi.org/10.1145/362686.362692.

James Blustein and Amal El-Maazawi. Bloom filters. A tutorial, analysis, and survey. Halifax, NS: Dalhousie University, pages 1–31, 2002.

Julius Borcea, Petter Brändén, and Thomas M. Liggett. Negative dependence and the geometry of polynomials. Journal of the American Mathematical Society, 22(2):521–567, 2009.

Prosenjit Bose, Hua Guo, Evangelos Kranakis, Anil Maheshwari, Pat Morin, Jason Morrison, Michiel Smid, and Yihui Tang. On the false-positive rate of Bloom filters. Information Processing Letters, 108(4):210–213, October 2008. ISSN 0020-0190. URL https://doi.org/10.1016/j.ipl.2008.05.018.

Stephen Brookes. A semantics for concurrent separation logic. Theoretical Computer Science, 375(1–3):227–270, 2007a.
URL https://doi.org/10.1016/j.tcs.2006.12.034.

Stephen Brookes. A semantics for concurrent separation logic. Theoretical Computer Science, 375(1):227–270, 2007b. ISSN 0304-3975. URL https://doi.org/10.1016/j.tcs.2006.12.034. Festschrift for John C. Reynolds’s 70th birthday.

James Brotherston and Cristiano Calcagno. Classical BI: A logic for reasoning about dualising resources. In Proceedings of the 36th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL ’09, pages 328–339, New York, NY, USA, 2009. Association for Computing Machinery. ISBN 9781605583792. URL https://doi.org/10.1145/1480881.1480923.

Cristiano Calcagno, Peter W. O’Hearn, and Hongseok Yang. Local action and abstract separation logic. In 22nd Annual IEEE Symposium on Logic in Computer Science (LICS 2007), pages 366–378, Wroclaw, Poland, 2007. IEEE. ISBN 978-0-7695-2908-0. URL https://doi.org/10.1109/LICS.2007.30.

Qinxiang Cao, Santiago Cuellar, and Andrew W. Appel. Bringing order to the separation logic jungle. In Asian Symposium on Programming Languages and Systems (APLAS), pages 190–211, Suzhou, China, 2017. Springer.

Aleksandar Chakarov and Sriram Sankaranarayanan. Probabilistic program analysis with martingales. In International Conference on Computer Aided Verification (CAV), pages 511–526, Saint Petersburg, Russia, 2013. Springer. URL https://doi.org/10.1007/978-3-642-39799-8_34.

Yu-Fang Chen, Chih-Duo Hong, Bow-Yaw Wang, and Lijun Zhang. Counterexample-guided polynomial loop invariant generation by Lagrange interpolation. In CAV, 2015. doi: 10.1007/978-3-319-21690-4_44.

David Maxwell Chickering. Learning Bayesian networks is NP-complete.
In Learning from data: Artificial intelligence and statistics V, pages 121–130. Springer, 1996.

David Maxwell Chickering. Optimal structure identification with greedy search. Journal of Machine Learning Research, 3(Nov):507–554, 2002.

Kenta Cho and Bart Jacobs. Disintegration and Bayesian inversion via string diagrams. Mathematical Structures in Computer Science, 29(7):938–971, 2019. URL https://doi.org/10.1017/S0960129518000488.

Ken Christensen, Allen Roginsky, and Miguel Jimeno. A new analysis of the false positive rate of a Bloom filter. Information Processing Letters, 110(21):944–949, 2010. ISSN 0020-0190. URL https://doi.org/10.1016/j.ipl.2010.07.024.

Ugo Dal Lago, Davide Davoli, and Bruce M. Kapron. On separation logic, computational independence, and pseudorandomness. In 2024 IEEE 37th Computer Security Foundations Symposium (CSF), pages 651–666. IEEE Computer Society, 2024.

A. Philip Dawid. Conditional independence in statistical theory. Journal of the Royal Statistical Society: Series B (Methodological), 41(1):1–15, 1979.

A. Philip Dawid. Separoids: A mathematical framework for conditional independence and irrelevance. Annals of Mathematics and Artificial Intelligence, 32(1–4):335–372, 2001.

Christian Dehnert, Sebastian Junges, Joost-Pieter Katoen, and Matthias Volk. A Storm is coming: A modern probabilistic model checker. In International Conference on Computer Aided Verification, pages 592–600. Springer, 2017.

Edsger W. Dijkstra. Guarded commands, nondeterminacy and formal derivation of programs. Communications of the ACM, 18(8):453–457, 1975.

Bolin Ding and Arnd Christian König. Fast set intersection in memory. Proceedings of the VLDB Endowment, 4(4):255–266, 2011. URL https://doi.org/10.14778/1938545.1938550.

Simon Docherty. Bunched Logics: A Uniform Approach. PhD thesis, UCL (University College London), 2019.

Devdatt P. Dubhashi and Desh Ranjan.
Balls and bins: A study in negative dependence. Random Structures and Algorithms, 13(2):99–124, 1998.

Arnaud Durand, Miika Hannula, Juha Kontinen, Arne Meier, and Jonni Virtema. Probabilistic team semantics. In International Symposium on Foundations of Information and Knowledge Systems (FoIKS), Budapest, Hungary, volume 10833 of Lecture Notes in Computer Science, pages 186–206. Springer, 2018. URL https://doi.org/10.1007/978-3-319-90050-6_11.

Evgenii Borisovich Dynkin. Theory of Markov processes. Courier Corporation, 2012.

Mnacho Echenim, Radu Iosif, and Nicolas Peltier. The Bernays–Schönfinkel–Ramsey class of separation logic with uninterpreted predicates. ACM Trans. Comput. Logic, 21(3), March 2020. ISSN 1529-3785. URL https://doi.org/10.1145/3380809.

Facebook. Infer static analyzer. URL https://fbinfer.com/.

Ronald Fagin and Moshe Y. Vardi. The theory of data dependencies: An overview. In International Colloquium on Automata, Languages and Programming (ICALP), pages 1–22, 1984. URL https://doi.org/10.1007/3-540-13345-3_1.

Yijun Feng, Lijun Zhang, David N. Jansen, Naijun Zhan, and Bican Xia. Finding polynomial loop invariants for probabilistic programs. In ATVA, 2017.

Michael J. Fischer, Nancy A. Lynch, and Michael S. Paterson. Impossibility of distributed consensus with one faulty process. Journal of the ACM (JACM), 32(2):374–382, 1985.

Nir Friedman, Dan Geiger, and Moises Goldszmidt. Bayesian network classifiers. Machine Learning, 29(2):131–163, 1997.

Bert E. Fristedt and Lawrence F. Gray. A modern approach to probability theory. Springer Science & Business Media, 2013.

Tobias Fritz. A synthetic approach to Markov kernels, conditional independence and theorems on sufficient statistics. Advances in Mathematics, 370:107239, 2020.
URL https://doi.org/10.1016/j.aim.2020.107239.

Dan Geiger and Judea Pearl. Logical and algorithmic properties of conditional independence and graphical models. The Annals of Statistics, 21(4):2001–2021, 1993.

Michele Giry. A categorical approach to probability theory. Categorical aspects of topology and analysis, pages 68–85, 1982.

Robert Goldblatt. Varieties of complex algebras. Annals of Pure and Applied Logic, 44(3):173–242, 1989. URL https://doi.org/10.1016/0168-0072(89)90032-8.

Oded Goldreich. Secure multi-party computation. Manuscript. Preliminary version, 78(110):1–108, 1998.

Kiran Gopinathan and Ilya Sergey. Certifying certainty and uncertainty in approximate membership query structures. In International Conference on Computer Aided Verification (CAV), volume 12225 of Lecture Notes in Computer Science, pages 279–303, Los Angeles, California, 2020. Springer. URL https://doi.org/10.1007/978-3-030-53291-8_16.

Andrew D. Gordon, Thomas A. Henzinger, Aditya V. Nori, and Sriram K. Rajamani. Probabilistic programming. In Future of Software Engineering Proceedings, pages 167–181. 2014.

Simon Oddershede Gregersen, Alejandro Aguirre, Philipp G. Haselwarter, Joseph Tassarotti, and Lars Birkedal. Asynchronous probabilistic couplings in higher-order separation logic. Proceedings of the ACM on Programming Languages, 8(POPL):753–784, 2024.

Friedrich Gretz, Joost-Pieter Katoen, and Annabelle McIver. Prinsys: On a quest for probabilistic loop invariants. In QEST, 2013. doi: 10.1007/978-3-642-40196-1_17.

Tao Gu, Jialu Bao, Justin Hsu, Alexandra Silva, and Fabio Zanasi. A categorical approach to DIBI models. In 9th International Conference on Formal Structures for Computation and Deduction, FSCD 2024, July 10–13, 2024, Tallinn, Estonia, volume 299 of LIPIcs, pages 17:1–17:20.
Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2024. URL https://doi.org/10.4230/LIPIcs.FSCD.2024.17.

Miika Hannula, Juha Kontinen, Jan Van den Bussche, and Jonni Virtema. Descriptive complexity of real computation and probabilistic independence logic. In IEEE Symposium on Logic in Computer Science (LICS), pages 550–563, 2020.

Jaakko Hintikka and Gabriel Sandu. Informational independence as a semantical phenomenon. In Logic, Methodology and Philosophy of Science VIII, volume 126 of Studies in Logic and the Foundations of Mathematics, pages 571–589. Elsevier, 1989. URL https://doi.org/10.1016/S0049-237X(08)70066-1.

Shing Hin Ho, Nicolas Wu, and Azalea Raad. Bayesian separation logic. arXiv preprint arXiv:2507.15530, 2025.

Charles Antony Richard Hoare. Algorithm 64: Quicksort. Communications of the ACM, 4(7):321, 1961.

Tony Hoare, Bernhard Möller, Georg Struth, and Ian Wehrman. Concurrent Kleene algebra and its foundations. The Journal of Logic and Algebraic Programming, 80(6):266–296, 2011.

Steven J. Holtzen. Exploiting Program Structure for Scaling Probabilistic Programming. University of California, Los Angeles, 2021.

Justin Hsu. Probabilistic Couplings for Probabilistic Reasoning. PhD thesis, University of Pennsylvania, 2017.

Janez Ignacij Jereb and Alex Simpson. Safety, relative tightness and the probabilistic frame rule. arXiv e-prints, pages arXiv–2506, 2025.

Bart Jacobs and Fabio Zanasi. A formal semantics of influence in Bayesian reasoning. In International Symposium on Mathematical Foundations of Computer Science (MFCS), Aalborg, Denmark, volume 83 of Leibniz International Proceedings in Informatics, pages 21:1–21:14. Schloss Dagstuhl, 2017. URL https://doi.org/10.4230/LIPIcs.MFCS.2017.21.

Kumar Joag-Dev and Frank Proschan.
Negative association of random variables with applications. The Annals of Statistics, 11(1):286–295, 1983. URL https://doi.org/10.1214/aos/1176346079.

Ralf Jung, David Swasey, Filip Sieczkowski, Kasper Svendsen, Aaron Turon, Lars Birkedal, and Derek Dreyer. Iris: Monoids and invariants as an orthogonal basis for concurrent reasoning. ACM SIGPLAN Notices, 50(1):637–650, 2015.

Ralf Jung, Robbert Krebbers, Jacques-Henri Jourdan, Ales Bizjak, Lars Birkedal, and Derek Dreyer. Iris from the ground up: A modular foundation for higher-order concurrent separation logic. Journal of Functional Programming, 28:e20, 2018. URL https://doi.org/10.1017/S0956796818000151.

Ralf Jung, Rodolphe Lepigre, Gaurav Parthasarathy, Marianna Rapoport, Amin Timany, Derek Dreyer, and Bart Jacobs. The future is ours: Prophecy variables in separation logic. Proc. ACM Program. Lang., 4(POPL), December 2019. URL https://doi.org/10.1145/3371113.

Benjamin Lucien Kaminski. Advanced Weakest Precondition Calculi for Probabilistic Programs. PhD thesis, RWTH Aachen University, 2019.

Benjamin Lucien Kaminski, Joost-Pieter Katoen, Christoph Matheja, and Federico Olmedo. Weakest precondition reasoning for expected run-times of probabilistic programs. In European Symposium on Programming (ESOP), pages 364–389. Springer, 2016.

Neville Kenneth Kitson, Anthony C. Constantinou, Zhigao Guo, Yang Liu, and Kiattikun Chobtham. A survey of Bayesian network structure learning. Artificial Intelligence Review, 56(8):8721–8814, 2023.

Daphne Koller and Nir Friedman. Probabilistic graphical models: Principles and techniques. MIT Press, 2009.

Dexter Kozen. Semantics of probabilistic programs.
Journal of Computer and System Sciences, 22(3):328–350, 1981. URL https://doi.org/10.1016/0022-0000(81)90036-2.

Dexter Kozen. A probabilistic PDL. In Proceedings of the Fifteenth Annual ACM Symposium on Theory of Computing, pages 291–297, 1983.

Robbert Krebbers, Jacques-Henri Jourdan, Ralf Jung, Joseph Tassarotti, Jan-Oliver Kaiser, Amin Timany, Arthur Charguéraud, and Derek Dreyer. MoSeL: A general, extensible modal framework for interactive proofs in separation logic. Proceedings of the ACM on Programming Languages, 2(ICFP):77:1–77:30, 2018. URL https://doi.org/10.1145/3236772.

Alex Kulesza and Ben Taskar. Determinantal point processes for machine learning. Found. Trends Mach. Learn., 5(2–3):123–286, 2012. URL https://doi.org/10.1561/2200000044.

Marta Kwiatkowska, Gethin Norman, and David Parker. PRISM: Probabilistic symbolic model checker. In International Conference on Modelling Techniques and Tools for Computer Performance Evaluation, pages 200–204. Springer, 2002.

John M. Li, Amal Ahmed, and Steven Holtzen. Lilac: A modal separation logic for conditional probability. Proc. ACM Program. Lang., 7(PLDI), June 2023a. URL https://doi.org/10.1145/3591226.

John M. Li, Amal Ahmed, and Steven Holtzen. Lilac: A modal separation logic for conditional probability, 2023b. URL https://arxiv.org/abs/2304.01339.

John M. Li, Jon Aytac, Philip Johnson-Freyd, Amal Ahmed, and Steven Holtzen. A nominal approach to probabilistic separation logic. In IEEE Symposium on Logic in Computer Science (LICS), pages 55:1–55:14. ACM, 2024.

Nancy A. Lynch. Distributed algorithms. Elsevier, 1996.

Michael Mitzenmacher and Eli Upfal. Probability and Computing: Randomized Algorithms and Probabilistic Analysis.
Cambridge University Press, 2005.

Eugenio Moggi. Notions of computation and monads. Information and Computation, 93(1):55–92, 1991. URL https://doi.org/10.1016/0890-5401(91)90052-4.

Carroll Morgan, Annabelle McIver, and Karen Seidel. Probabilistic predicate transformers. ACM Transactions on Programming Languages and Systems, 1996. URL https://doi.org/10.1145/229542.229547.

James K. Mullin. A second look at Bloom filters. Communications of the ACM, 26(8):570–571, 1983.

Wolfgang Mulzer. Five proofs of Chernoff’s bound with applications, May 2019. URL https://doi.org/10.48550/arXiv.1801.03365.

Kevin P. Murphy. Machine learning: A probabilistic perspective. MIT Press, 2012.

Fritz Obermeyer, Eli Bingham, Martin Jankowiak, Neeraj Pradhan, Justin Chiu, Alexander Rush, and Noah Goodman. Tensor variable elimination for plated factor graphs. In International Conference on Machine Learning, pages 4871–4880. PMLR, 2019.

Peter W. O’Hearn and David J. Pym. The logic of bunched implications. Bulletin of Symbolic Logic, 5(2):215–244, 1999.

Prakash Panangaden. Labelled Markov Processes. Imperial College Press, 2009. URL https://doi.org/10.1142/p595.

Judea Pearl. Probabilistic reasoning in intelligent systems: Networks of plausible inference. Elsevier, 2014.

Judea Pearl and Azaria Paz. Graphoids: A graph-based logic for reasoning about relevance relations. University of California (Los Angeles), Computer Science Department, 1985.

Judea Pearl, Dan Geiger, and Thomas Verma. Conditional independence and its representations. Kybernetika, 25(7):33–44, 1989.

Ruzica Piskac, Thomas Wies, and Damien Zufferey. Automating separation logic with trees and data. In Computer Aided Verification: 26th International Conference, CAV 2014, Held as Part of the Vienna Summer of Logic, VSL 2014, Vienna, Austria, July 18–22, 2014. Proceedings 26, pages 711–728. Springer, 2014.

Andrew M. Pitts.
Nominal Sets: Names and Symmetry in Computer Science. Cambridge Tracts in Theoretical Computer Science. Cambridge University Press, 2013. URL https://doi.org/10.1017/CBO9781139084673.

Konstantinos Psounis and Balaji Prabhakar. A randomized web-cache replacement scheme. In Proceedings IEEE INFOCOM 2001. Conference on Computer Communications. Twentieth Annual Joint Conference of the IEEE Computer and Communications Society (Cat. No.01CH37213), volume 3, pages 1407–1415, 2001. URL https://doi.org/10.1109/INFCOM.2001.916636.

Michael O. Rabin. Probabilistic algorithm for testing primality. Journal of Number Theory, 12(1):128–138, 1980.

Prabhakar Raghavan and Clark D. Thompson. Randomized rounding: A technique for provably good algorithms and algorithmic proofs. Combinatorica, 7(4):365–374, 1987.

John C. Reynolds. Separation logic: A logic for shared mutable data structures. In Proceedings 17th Annual IEEE Symposium on Logic in Computer Science, pages 55–74. IEEE, 2002.

John A. Rice. Mathematical statistics and data analysis, volume 371. Thomson/Brooks/Cole, Belmont, CA, 2007.

Marc Romaní. A short proof of Hoeffding’s lemma, May 2021.

Jeffrey S. Rosenthal. A First Look at Rigorous Probability Theory. World Scientific Publishing Company, 2006.

Alex Simpson. Category-theoretic structure for independence and conditional independence. In Conference on the Mathematical Foundations of Programming Semantics (MFPS), pages 281–297, 2018. URL https://doi.org/10.1016/j.entcs.2018.03.028.

Alex Simpson. Equivalence and conditional independence in atomic sheaf logic. In Proceedings of the 39th Annual ACM/IEEE Symposium on Logic in Computer Science, pages 1–14, 2024.

Aravind Srinivasan. Distributions on level-sets with applications to approximation algorithms.
In IEEE Symposium on Foundations of Computer Science (FOCS), pages 588–597, Las Vegas, Nevada, 2001. IEEE. URL https://doi.org/10.1109/SFCS.2001.959935.

Sam Staton. Probabilistic programs as measures. Foundations of Probabilistic Programming, page 43, 2020.

Robert S. Strichartz. The way of analysis. Jones & Bartlett Learning, 2000.

Subhash Suri. Caching, January 2020. URL https://sites.cs.ucsb.edu/~suri/ccs130a/Caching.pdf.

Joseph Tassarotti and Robert Harper. A separation logic for concurrent randomized programs. Proceedings of the ACM on Programming Languages, 3(POPL):64:1–64:30, 2019. URL https://doi.org/10.1145/3290377.

Jouko Väänänen. Dependence Logic: A New Approach to Independence Friendly Logic. London Mathematical Society Student Texts. Cambridge University Press, 2007. URL https://doi.org/10.1017/CBO9780511611193.

Viktor Vafeiadis and Matthew Parkinson. A marriage of rely/guarantee and separation logic. In CONCUR 2007, Lisbon, Portugal, September 3–8, 2007. Proceedings 18, pages 256–271. Springer, 2007.

John von Neumann. Various techniques used in connection with random digits. Journal of Research of the National Bureau of Standards, Applied Math Series, pages 36–38, 1951.

Jinyi Wang, Yican Sun, Hongfei Fu, Krishnendu Chatterjee, and Amir Kafshdar Goharshady. Quantitative analysis of assertion violations in probabilistic programs. In ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 1171–1186, Virtual, 2021. ACM. URL https://doi.org/10.1145/3453483.3454102.

Yuxin Wang, Zeyu Ding, Guanhong Wang, Daniel Kifer, and Danfeng Zhang. Proving differential privacy with shadow execution. Proceedings of the ACM on Programming Languages, (POPL, 19):655–669, 2019.
Pengbo Yan, Toby Murray, Olga Ohrimenko, Van-Thuan Pham, and Robert Sison. Combining classical and probabilistic independence reasoning to verify the security of oblivious algorithms. In International Symposium on Formal Methods, pages 188–205. Springer, 2024.

Danfeng Zhang and Daniel Kifer. LightDP: Towards automating differential privacy proofs. In ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL), pages 888–901, 2017.

APPENDIX A

BUNCHED LOGIC AND PROBABILISTIC SEPARATION LOGIC

A.1 Proofs related to Bunched Logic

Theorem 2.2.13. XD = (𝑋D, ⊑D, ⊗D, 𝐸D) is a BI frame.

Proof. We show that the defined structure satisfies all the frame conditions.

Commutativity For any 𝜇1 ∈ D(Mem[𝑆]), 𝜇2 ∈ D(Mem[𝑇]),

𝜇 ∈ 𝜇1 ⊗D 𝜇2
⇔ 𝜇(𝑚) = 𝜇1(𝜋𝑆𝑚) · 𝜇2(𝜋𝑇𝑚) for any 𝑚 ∈ Mem[𝑆 ∪ 𝑇]
⇔ 𝜇 ∈ 𝜇2 ⊗D 𝜇1.

Associativity Given 𝜇1 ∈ D(Mem[𝑆]), 𝜇2 ∈ D(Mem[𝑇]), 𝜇3 ∈ D(Mem[𝑅]), for any 𝜇0 ∈ 𝜇1 ⊗D 𝜇2 and 𝜇 ∈ 𝜇0 ⊗D 𝜇3,

𝜇 ∈ 𝜇0 ⊗D 𝜇3
⇔ 𝜇(𝑚) = 𝜇0(𝜋𝑆∪𝑇𝑚) · 𝜇3(𝜋𝑅𝑚) for any 𝑚 ∈ Mem[𝑆 ∪ 𝑇 ∪ 𝑅]
⇔ 𝜇(𝑚) = (𝜇1(𝜋𝑆𝑚) · 𝜇2(𝜋𝑇𝑚)) · 𝜇3(𝜋𝑅𝑚) for any 𝑚 ∈ Mem[𝑆 ∪ 𝑇 ∪ 𝑅]
⇔ 𝜇(𝑚) = 𝜇1(𝜋𝑆𝑚) · (𝜇2(𝜋𝑇𝑚) · 𝜇3(𝜋𝑅𝑚)) for any 𝑚 ∈ Mem[𝑆 ∪ 𝑇 ∪ 𝑅]

Define 𝜇′ = 𝜋𝑇∪𝑅𝜇. Then for any 𝑚 ∈ Mem[𝑇 ∪ 𝑅],

𝜇′(𝑚) = ∑_{𝑚′ ∈ Mem[𝑆]} 𝜇(𝑚′ ⊲⊳ 𝑚)
= ∑_{𝑚′ ∈ Mem[𝑆]} 𝜇1(𝜋𝑆(𝑚′ ⊲⊳ 𝑚)) · (𝜇2(𝜋𝑇(𝑚′ ⊲⊳ 𝑚)) · 𝜇3(𝜋𝑅(𝑚′ ⊲⊳ 𝑚)))
= (∑_{𝑚′ ∈ Mem[𝑆]} 𝜇1(𝑚′)) · (𝜇2(𝜋𝑇𝑚) · 𝜇3(𝜋𝑅𝑚))
= 𝜇2(𝜋𝑇𝑚) · 𝜇3(𝜋𝑅𝑚)

Thus, 𝜇′ ∈ 𝜇2 ⊗D 𝜇3, and furthermore,

𝜇(𝑚) = 𝜇1(𝜋𝑆𝑚) · (𝜇2(𝜋𝑇𝑚) · 𝜇3(𝜋𝑅𝑚)) for any 𝑚 ∈ Mem[𝑆 ∪ 𝑇 ∪ 𝑅]
⇒ 𝜇(𝑚) = 𝜇1(𝜋𝑆𝑚) · 𝜇′(𝜋𝑇∪𝑅𝑚) for any 𝑚 ∈ Mem[𝑆 ∪ 𝑇 ∪ 𝑅]
⇒ 𝜇 ∈ 𝜇1 ⊗D 𝜇′

Unit Existence Given any 𝜇 ∈ D(Mem[𝑆]), for any 𝑚 ∈ Mem[𝑆], (𝜇 ⊗D ⟨⟩)(𝑚) = 𝜇(𝜋𝑆𝑚) · ⟨⟩(𝜋∅𝑚) = 𝜇(𝑚). Thus, 𝜇 ∈ 𝜇 ⊗D ⟨⟩.

Unit Closure 𝐸D is closed under the pre-order as 𝐸D = 𝑋D.
Unit Coherence For any 𝜇𝑥 ∈ D(Mem[𝑆]), 𝜇𝑒 ∈ D(Mem[𝑇]), if 𝜇𝑦 ∈ 𝜇𝑥 ⊗D 𝜇𝑒, then for any 𝑚 ∈ Mem[𝑆 ∪ 𝑇], 𝜇𝑦(𝑚) = 𝜇𝑥(𝜋𝑆𝑚) · 𝜇𝑒(𝜋𝑇𝑚). Thus, for any 𝑚𝑆 ∈ Mem[𝑆], 𝑚𝑇 ∈ Mem[𝑇], 𝜇𝑦(𝑚𝑆 ⊲⊳ 𝑚𝑇) = 𝜇𝑥(𝑚𝑆) · 𝜇𝑒(𝑚𝑇), which implies that

∑_{𝑚′𝑇 ∈ Mem[𝑇]} 𝜇𝑦(𝑚𝑆 ⊲⊳ 𝑚′𝑇) = ∑_{𝑚′𝑇 ∈ Mem[𝑇]} 𝜇𝑥(𝑚𝑆) · 𝜇𝑒(𝑚′𝑇) = 𝜇𝑥(𝑚𝑆).

Therefore, 𝜇𝑥 ⊑D 𝜇𝑦.

Down-Closed If 𝜇𝑧 ∈ 𝜇𝑥 ⊗D 𝜇𝑦, and 𝜇′𝑥 ⊑D 𝜇𝑥, 𝜇′𝑦 ⊑D 𝜇𝑦, then define 𝑋 = dom(𝜇𝑥), 𝑌 = dom(𝜇𝑦), 𝑋′ = dom(𝜇′𝑥), 𝑌′ = dom(𝜇′𝑦), and define 𝜇 = 𝜋𝑋′∪𝑌′𝜇𝑧. The fact that 𝜇𝑧 ∈ 𝜇𝑥 ⊗D 𝜇𝑦 implies that for any 𝑚 ∈ Mem[𝑋 ∪ 𝑌], 𝜇𝑧(𝑚) = 𝜇𝑥(𝜋𝑋𝑚) · 𝜇𝑦(𝜋𝑌𝑚). Thus,

𝜇(𝑚) = (𝜋𝑋′∪𝑌′𝜇𝑧)(𝑚)
= ∑_{𝑚′ ∈ Mem[𝑋∪𝑌 \ (𝑋′∪𝑌′)]} 𝜇𝑧(𝑚′ ⊲⊳ 𝑚)
= ∑_{𝑚′ ∈ Mem[𝑋∪𝑌 \ (𝑋′∪𝑌′)]} 𝜇𝑥(𝜋𝑋(𝑚′ ⊲⊳ 𝑚)) · 𝜇𝑦(𝜋𝑌(𝑚′ ⊲⊳ 𝑚))
= ∑_{𝑚1 ∈ Mem[𝑋\𝑋′]} ∑_{𝑚2 ∈ Mem[𝑌\𝑌′]} 𝜇𝑥(𝜋𝑋(𝑚1 ⊲⊳ 𝑚2 ⊲⊳ 𝑚)) · 𝜇𝑦(𝜋𝑌(𝑚1 ⊲⊳ 𝑚2 ⊲⊳ 𝑚))
= (∑_{𝑚1 ∈ Mem[𝑋\𝑋′]} 𝜇𝑥(𝜋𝑋(𝑚1 ⊲⊳ 𝑚))) · (∑_{𝑚2 ∈ Mem[𝑌\𝑌′]} 𝜇𝑦(𝜋𝑌(𝑚2 ⊲⊳ 𝑚)))
= 𝜋𝑋′𝜇𝑥(𝑚) · 𝜋𝑌′𝜇𝑦(𝑚)
= 𝜇′𝑥(𝑚) · 𝜇′𝑦(𝑚)

Hence, 𝜇 ∈ 𝜇′𝑥 ⊗D 𝜇′𝑦, and by definition, 𝜇 ⊑D 𝜇𝑧. □

Lemma 2.3.2. For any distribution 𝜇 ∈ XD and any set of variables {𝑋𝑖}𝑖∈𝑆, 𝜇 |= ∗𝑖∈𝑆 Own(𝑋𝑖) iff the variables {𝑋𝑖}𝑖∈𝑆 are distinct and mutually independent.

Proof. For the forward direction: 𝜇 |= ∗𝑖∈𝑆 Own(𝑋𝑖) implies 𝜇 |= ∗𝑖∈𝑇 Own(𝑋𝑖) for any subset 𝑇 ⊆ 𝑆. By inductively unfolding the satisfaction definition and applying eq. (Down-Closed), there exists a set of distributions {𝜇𝑖}𝑖∈𝑇 such that

• 𝜇𝑖 |= Own(𝑋𝑖) for any 𝑖 ∈ 𝑇.
• dom(𝜇𝑖) are all disjoint.
• Let 𝑇′ = ∪𝑖∈𝑇 {𝑋𝑖}. For any 𝑚 ∈ Mem[𝑇′], 𝜋𝑇′𝜇(𝑚) = ∏_{𝑖∈𝑇} 𝜇𝑖(𝜋dom(𝜇𝑖)𝑚).

Thus, the first condition implies 𝑋𝑖 ∈ dom(𝜇𝑖) for each 𝑖, and so 𝜇𝑖(𝑋𝑖 = 𝑣) can be evaluated for any 𝑖 ∈ 𝑇 and 𝑣 ∈ Val. Combining this with the third condition, we have that for any set of values {𝑣𝑖 ∈ Val}𝑖∈𝑇,

𝜋𝑇′𝜇(𝑚) = ∏_{𝑖∈𝑇} 𝜇𝑖(𝑋𝑖 = 𝑣𝑖) = ∏_{𝑖∈𝑇} 𝜇(𝑋𝑖 = 𝑣𝑖).

Also, the second condition combined with the third one implies that all 𝑋𝑖 are distinct.

For the backwards direction, we define 𝜇𝑖 = 𝜋𝑋𝑖𝜇 for each 𝑖 ∈ 𝑇. Then clearly, each 𝜇𝑖 satisfies Own(𝑋𝑖).
For convenience, relabel the variables in 𝑇 as 𝑇1, . . . , 𝑇𝑚, and denote ∪_{𝑖=1}^{𝑘} {𝑇𝑖} as 𝑇[: 𝑘]. We prove by induction that 𝜋𝑇[:𝑘]𝜇 |= ∗_{𝑖=1}^{𝑘} Own(𝑇𝑖) for 1 ≤ 𝑘 ≤ 𝑚.

Base: 𝜋𝑇1𝜇 |= Own(𝑇1).

Inductive: The 𝑇𝑖 being mutually independent implies that for any set of values {𝑣𝑖 ∈ Val}1≤𝑖≤𝑚,

𝜇(∧_{𝑖≤𝑘} 𝑇𝑖 = 𝑣𝑖) = ∏_{𝑖≤𝑘} 𝜇(𝑇𝑖 = 𝑣𝑖) = 𝜇(∧_{1≤𝑖<𝑘} 𝑇𝑖 = 𝑣𝑖) · 𝜇(𝑇𝑘 = 𝑣𝑘)

Thus, for any 𝑚 ∈ Mem[∪_{1≤𝑖≤𝑘} {𝑇𝑖}],

𝜋𝑇[:𝑘]𝜇(𝑚) = 𝜋𝑇[:𝑘−1]𝜇(𝜋𝑇[:𝑘−1]𝑚) · 𝜋𝑇𝑘𝜇(𝜋𝑇𝑘𝑚)

and therefore 𝜋𝑇[:𝑘]𝜇 ∈ (𝜋𝑇[:𝑘−1]𝜇) ◦ 𝜋𝑇𝑘𝜇. By the inductive hypothesis, 𝜋𝑇[:𝑘−1]𝜇 |= ∗_{𝑖=1}^{𝑘−1} Own(𝑇𝑖), and by satisfaction, we have 𝜋𝑇[:𝑘]𝜇 |= ∗_{𝑖=1}^{𝑘} Own(𝑇𝑖). □

A.2 Proofs related to Probabilistic Separation Logic

Lemma 2.3.7 (Restriction). Let 𝜇 ∈ D(Mem[𝑆]) and let 𝜑 be a BI formula. Then: 𝜇 |= 𝜑 ⇔ 𝜋FV(𝜑)(𝜇) |= 𝜑.

Proof. The reverse direction follows by persistence. The forward direction follows by induction on 𝜑.

• 𝜑 ≡ ⊤, ⊥, and atomic propositions 𝑝. Trivial.

• 𝜑 ≡ 𝜑1 ∧ 𝜑2. By induction, we have 𝜋FV(𝜑1)𝜇 |= 𝜑1 and 𝜋FV(𝜑2)𝜇 |= 𝜑2. By persistence, we have 𝜋FV(𝜑1∧𝜑2)𝜇 |= 𝜑1 and 𝜋FV(𝜑1∧𝜑2)𝜇 |= 𝜑2, so 𝜋FV(𝜑1∧𝜑2)𝜇 |= 𝜑1 ∧ 𝜑2.

• 𝜑 ≡ 𝜑1 ∨ 𝜑2. By induction, we have 𝜋FV(𝜑𝑖)𝜇 |= 𝜑𝑖 for 𝑖 = 1 or 𝑖 = 2. By Kripke monotonicity, we have 𝜋FV(𝜑1∨𝜑2)𝜇 |= 𝜑𝑖, so 𝜋FV(𝜑1∨𝜑2)𝜇 |= 𝜑1 ∨ 𝜑2.

• 𝜑 ≡ 𝜑1 → 𝜑2. Take any 𝜇′ ⊒ 𝜋FV(𝜑1→𝜑2)𝜇 such that 𝜇′ |= 𝜑1. By the inductive hypothesis, 𝜋FV(𝜑1)𝜇′ |= 𝜑1. Because 𝜇′ ⊒ 𝜋FV(𝜑1→𝜑2)𝜇, we have 𝜋FV(𝜑1→𝜑2)𝜇′ = 𝜋FV(𝜑1→𝜑2)𝜇, and thus 𝜋FV(𝜑1)𝜇′ = 𝜋FV(𝜑1)𝜇. There exists a distribution 𝜇′′ such that dom(𝜇′′) = dom(𝜇) ∪ dom(𝜋FV(𝜑1)𝜇′), 𝜋dom(𝜇)(𝜇′′) = 𝜇, and 𝜋dom(𝜋FV(𝜑1)𝜇′)(𝜇′′) = 𝜋FV(𝜑1)𝜇′. In particular, 𝜇′′ ⊒ 𝜇. By persistence, we have 𝜇′′ |= 𝜑1, and by validity, we have 𝜇′′ |= 𝜑2. By induction, 𝜋FV(𝜑2)(𝜇′′) |= 𝜑2. Since 𝜋FV(𝜑2)(𝜇′′) ⊑ 𝜇′, persistence gives 𝜇′ |= 𝜑2. So, 𝜋FV(𝜑1→𝜑2)𝜇 |= 𝜑1 → 𝜑2 as desired.

• 𝜑 ≡ 𝜑1 ∗ 𝜑2. There exist 𝜇1 and 𝜇2 with 𝜇1 ◦ 𝜇2 ⊑ 𝜇 and 𝜇1 |= 𝜑1 and 𝜇2 |= 𝜑2. By induction, we have 𝜋FV(𝜑1)𝜇1 |= 𝜑1 and 𝜋FV(𝜑2)𝜇2 |= 𝜑2.
By persistence, we have 𝜋FV(𝜑1∗𝜑2)𝜇1 |= 𝜑1 and 𝜋FV(𝜑1∗𝜑2)𝜇2 |= 𝜑2. Now, since 𝜇1 ◦ 𝜇2 is defined, 𝜋FV(𝜑1∗𝜑2)𝜇1 ◦ 𝜋FV(𝜑1∗𝜑2)𝜇2 is defined as well and 𝜋FV(𝜑1∗𝜑2)𝜇1 ◦ 𝜋FV(𝜑1∗𝜑2)𝜇2 ⊑ 𝜋FV(𝜑1∗𝜑2)𝜇. So, 𝜋FV(𝜑1∗𝜑2)𝜇 |= 𝜑1 ∗ 𝜑2 as desired.
• 𝜑 ≡ 𝜑1 −∗ 𝜑2. Take any 𝜇′ such that 𝜇′ ◦ 𝜋FV(𝜑1−∗𝜑2)𝜇 ↓ and 𝜇′ |= 𝜑1. If 𝜇′ ◦ 𝜇 ↓, then 𝜇′ ◦ 𝜇 |= 𝜑2 and by induction, (𝜋FV(𝜑1−∗𝜑2)𝜇′) ◦ (𝜋FV(𝜑1−∗𝜑2)𝜇) |= 𝜑2. Persistence gives 𝜇′ ◦ 𝜋FV(𝜑1−∗𝜑2)𝜇 |= 𝜑2. Otherwise, suppose that 𝜇′ ◦ 𝜇 is not defined. Since 𝜇′ ◦ 𝜋FV(𝜑1−∗𝜑2)𝜇 ↓, it must be the case that ∅ ≠ dom(𝜇′) ∩ dom(𝜇) ⊆ Var \ FV(𝜑1 −∗ 𝜑2); thus, (𝜋FV(𝜑1) (𝜇′) ◦ 𝜇) ↓. By induction, 𝜋FV(𝜑1) (𝜇′) |= 𝜑1 and so 𝜋FV(𝜑1) (𝜇′) ◦ 𝜇 |= 𝜑2. By induction again, (𝜋FV(𝜑1)∩FV(𝜑2)𝜇′) ◦ (𝜋FV(𝜑2)𝜇) |= 𝜑2. By persistence and the fact that the extension is defined, we have 𝜇′ ◦ (𝜋FV(𝜑1−∗𝜑2)𝜇) |= 𝜑2. So, (𝜋FV(𝜑1−∗𝜑2)𝜇) |= 𝜑1 −∗ 𝜑2 as desired. □
Lemma A.2.1 (Soundness for RV, WV, MV [Barthe et al., 2019]). Let 𝜇′ = ⟦𝑐⟧𝜇, and let 𝑅 = RV(𝑐), 𝑊 = WV(𝑐), 𝑆 = Var \ MV(𝑐). Then:
1. Variables outside of MV(𝑐) are not modified: 𝜋𝑆 (𝜇′) = 𝜋𝑆 (𝜇).
2. The sets 𝑅 and 𝑊 are disjoint.
3. There exists 𝑓 : Mem[𝑅] → D(Mem[MV(𝑐)]) with 𝜇′ = bind(𝜇, 𝑚 ↦→ 𝑓 (𝜋𝑅 (𝑚)) ⊗ unit(𝜋𝑆 (𝑚))).
Theorem 2.3.6 (Soundness). If ⊢ {𝜑} 𝑐 {𝜓} is derivable, then |= {𝜑} 𝑐 {𝜓}.
Proof. By induction on the derivation. Let 𝜇 satisfy the pre-condition of the conclusion.
SKIP Trivial.
SEQ By the induction hypothesis.
DASSN By induction on the syntax of 𝜑.
RASSN The output distribution is ⟦𝑥 ← 𝑒⟧𝜇 = bind(𝜇, 𝑚 ↦→ unit(𝑚 [𝑥 ↦→ ⟦𝑒⟧(𝑚)])). Because 𝑥 ∉ FV(𝑒), for any 𝑚, ⟦𝑥⟧(𝑚 [𝑥 ↦→ ⟦𝑒⟧(𝑚)]) = ⟦𝑒⟧(𝑚 [𝑥 ↦→ ⟦𝑒⟧(𝑚)]). Thus, ⟦𝑥 ← 𝑒⟧𝜇 |= [𝑥 = 𝑒].
SAMP The output distribution is ⟦𝑥 $← 𝑑⟧(𝜇) = bind(𝜇, 𝑚 ↦→ bind(𝑑, 𝑣 ↦→ unit(𝑚 [𝑥 ↦→ 𝑣]))). Thus,
⟦𝑥⟧(⟦𝑥 $← 𝑑⟧(𝜇)) = bind(⟦𝑥 $← 𝑑⟧(𝜇), 𝑚 ↦→ ⟦𝑥⟧(𝑚))
= bind(bind(𝜇, 𝑚 ↦→ bind(𝑑, 𝑣 ↦→ unit(𝑚 [𝑥 ↦→ 𝑣]))), 𝑚 ↦→ ⟦𝑥⟧(𝑚))
= bind(𝜇, 𝑚 ↦→ bind(𝑑, 𝑣 ↦→ unit(𝑣)))
= bind(𝜇, 𝑚 ↦→ 𝑑) = 𝑑.
Therefore, ⟦𝑥 $← 𝑑⟧(𝜇) |= 𝑥 $∼ 𝑑.
COND Since 𝜇 |= 𝜑 and |= 𝜑 → Detm⟨𝑏⟩, we have 𝜇 |= Detm⟨𝑏⟩.
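The SAMP case above is a purely monadic computation: pushing ⟦𝑥 $← 𝑑⟧ through bind shows the marginal of 𝑥 in the output is exactly 𝑑, regardless of the input distribution. A small sketch of the distribution monad reproduces the calculation numerically (illustrative only; finite distributions as dicts, memories as sorted tuples of pairs, helper names hypothetical).

```python
def upd(m, x, v):
    """Functional update of a memory, kept as a sorted tuple so it is hashable."""
    d = dict(m)
    d[x] = v
    return tuple(sorted(d.items()))

def unit(m):
    return {m: 1.0}

def bind(mu, f):
    out = {}
    for m, p in mu.items():
        for m2, q in f(m).items():
            out[m2] = out.get(m2, 0.0) + p * q
    return out

def sample(mu, x, d):
    # [[ x <-$ d ]](mu) = bind(mu, m -> bind(d, v -> unit(m[x -> v])))
    return bind(mu, lambda m: bind(d, lambda v: unit(upd(m, x, v))))

def marg(mu, x):
    out = {}
    for m, p in mu.items():
        v = dict(m)[x]
        out[v] = out.get(v, 0.0) + p
    return out

mu0 = {(('x', 0), ('y', 0)): 0.7, (('x', 1), ('y', 1)): 0.3}  # x, y start correlated
coin = {0: 0.5, 1: 0.5}
mu1 = sample(mu0, 'x', coin)
x_marg = marg(mu1, 'x')  # the fair coin, regardless of mu0
```

After the resampling, 𝑥 is also independent of 𝑦: each joint mass in `mu1` is the product of the coin mass on 𝑥 and the original mass on 𝑦.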
Thus, either 𝜇 |= [𝑏 = tt] or 𝜇 |= [𝑏 = ff ]. Note that exactly one case holds. If 𝜇 |= [𝑏 = tt] holds, then 𝜇 |= 𝜑 ∧ [𝑏 = tt] and thus ⟦if 𝑏 then 𝑐 else 𝑐′⟧(𝜇) = ⟦𝑐⟧(𝜇), we can conclude by induction. The case 𝜇 |= [𝑏 = ff ] is similar. RCOND Because 𝜇 |= 𝜑 ∗ Own(𝑏), there exist 𝜇1, 𝜇2 such that 𝜇1 ◦ 𝜇2 ⊑ 𝜇, and 𝜇1 |= 𝜑 and 𝜇2 |= Own(𝑏). Let 𝜌 be the probability ⟦𝑏⟧(𝜇2) (tt). We may assume that 𝜌 ∈ (0, 1) — if 𝜌 is equal to zero or one then we can conclude by induction. By the semantics of commands, we have ⟦if 𝑏 then 𝑐 else 𝑐′⟧(𝜇) = 𝜌 · ⟦𝑐⟧(𝜇𝑡) + (1 − 𝜌) · ⟦𝑐′⟧(𝜇 𝑓 ) where 𝜇𝑡 is the distribution 𝜇 conditioned on 𝑏 = tt, and 𝜇 𝑓 is the distribu- tion 𝜇 conditioned on 𝑏 = ff . Recall that by the induction hypothesis, we have: ⟦𝑐⟧(𝜇𝑡) |= 𝜓 ∗ [𝑏 = tt] and ⟦𝑐′⟧(𝜇 𝑓 ) |= 𝜓 ∗ [𝑏 = ff ] . Thus, we can decompose the output states into 𝜈 ◦ 𝜈𝑡 ⊑ ⟦𝑐⟧(𝜇𝑡) and 𝜈 ◦ 𝜈 𝑓 ⊑ ⟦𝑐⟧(𝜇 𝑓 ) such that 𝜈 |= 𝜓 and 𝜈𝑡 |= [𝑏 = tt] and 𝜈 𝑓 |= [𝑏 = ff ] noting that 𝜈 can be taken to be the same in both branches since 𝜓 ∈ SP; by lemma 2.3.7, we may also assume that dom(𝜈𝑡) = dom(𝜈 𝑓 ). Thus, we 238 have: 𝜌 · 𝜈 ◦ 𝜈𝑡 + (1 − 𝜌) · 𝜈 ◦ 𝜈 𝑓 = 𝜌 · (𝜈 ⊗ 𝜈𝑡) + (1 − 𝜌) · (𝜈 ⊗ 𝜈 𝑓 ) = 𝜈 ⊗ (𝜈𝑡 ⊕𝜌 𝜈 𝑓 ) = 𝜈 ◦ (𝜈𝑡 ⊕𝜌 𝜈 𝑓 ) ⊑ ⟦if 𝑏 then 𝑐 else 𝑐′⟧(𝜇), and we can conclude since 𝜈 |= 𝜓 and 𝜈𝑡 ⊕𝜌 𝜈 𝑓 |= Own(𝑏). LOOP For any 𝜇 |= 𝜑, the side condition implies 𝜇 |= Detm⟨𝑏⟩. We show by induction that 𝜇 |= 𝜑 implies that for any 𝑛 > 0, ⟦(if 𝑏 then 𝑐)𝑛⟧(𝜇) |= 𝜑 ∧ Detm⟨𝑏⟩. Base case: ⟦(if 𝑏 then 𝑐)0⟧(𝜇) = 𝜇, so it satisfies 𝜑. By side condition that |= 𝜑→ Detm⟨𝑏⟩, we have ⟦(if 𝑏 then 𝑐)0⟧(𝜇) |= 𝜑 ∧ Detm⟨𝑏⟩. Inductive case: Say 𝜇′ = ⟦(if 𝑏 then 𝑐)𝑛⟧(𝜇). By inductive hypothesis, 𝜇′ |= 𝜑 ∧ Detm⟨𝑏⟩, there are two possibilities: • 𝜇′ |= 𝜑 ∧ [𝑏 = ff ], then ⟦(if 𝑏 then 𝑐)𝑛+1⟧(𝜇) = ⟦if 𝑏 then 𝑐⟧(𝜇′) = 𝜇′, which implies that ⟦(if 𝑏 then 𝑐)𝑛+1⟧(𝜇) |= 𝜑 ∧ [𝑏 = ff ]. • 𝜇′ |= 𝜑 ∧ [𝑏 = tt], then ⟦(if 𝑏 then 𝑐)𝑛+1⟧(𝜇) = ⟦if 𝑏 then 𝑐⟧(𝜇′) = ⟦𝑐⟧(𝜇′), which implies that ⟦(if 𝑏 then 𝑐)𝑛+1⟧(𝜇) |= 𝜑 because ⊢ {𝜑 ∧ [𝑏 = tt]} 𝑐 {𝜑}. 
Since |= 𝜑 → Detm⟨𝑏⟩, we also have ⟦(if 𝑏 then 𝑐)𝑛+1⟧(𝜇) |= 𝜑 ∧ Detm⟨𝑏⟩.
In both cases, ⟦(if 𝑏 then 𝑐)𝑛+1⟧(𝜇) |= 𝜑 ∧ Detm⟨𝑏⟩.
Since we assumed that the loop terminates in finitely many steps, there exists a finite number 𝑁 such that ⟦(if 𝑏 then 𝑐)𝑁⟧(𝜇) |= 𝜑 ∧ [𝑏 = ff ] (and also ⟦(if 𝑏 then 𝑐)𝑁−1⟧(𝜇) |= 𝜑 ∧ [𝑏 = tt] if 𝑁 > 1, but this fact is not used in this proof). Then ⟦while 𝑏 do 𝑐⟧(𝜇) = ⟦(if 𝑏 then 𝑐)𝑁⟧(𝜇) |= 𝜑 ∧ [𝑏 = ff ].
WEAK By the induction hypothesis and the semantics of implication.
TRUE Trivial.
CONJ By the induction hypothesis and the semantics of conjunction.
CASE By case analysis.
CONST The fact that ⟦𝑐⟧(𝜇) |= 𝜓 follows by induction. To show ⟦𝑐⟧(𝜇) |= 𝜂, by the restriction property we have 𝜋FV(𝜂) (𝜇) |= 𝜂 initially, and since the free variables of 𝜂 are disjoint from the modified variables of 𝑐, we have 𝜋FV(𝜂) (⟦𝑐⟧𝜇) |= 𝜂 as well. Thus, by monotonicity, ⟦𝑐⟧(𝜇) |= 𝜂 as desired.
FRAME There exist 𝜇1, 𝜇2 such that 𝜇1 ◦ 𝜇2 ⊑ 𝜇, and 𝜇1 |= 𝜑 and 𝜇2 |= 𝜂; let 𝑆1 ≜ dom(𝜇1), and note that 𝑇 ∪ RV(𝑐) ⊆ 𝑆1 by the last side-condition. By the restriction property we have 𝜋FV(𝜂) (𝜇2) |= 𝜂; let 𝑆2 ≜ dom(𝜇2) ∩ FV(𝜂) and note that 𝑆1 and 𝑆2 are disjoint. Let 𝑆3 be the set of all variables not contained in 𝑆1 or 𝑆2. Since WV(𝑐) is disjoint from 𝑆2 by the first side-condition, we must have WV(𝑐) ⊆ 𝑆1 ∪ 𝑆3.
By induction, we have ⟦𝑐⟧(𝜇) |= 𝜓. The restriction property gives 𝜋FV(𝜓) (⟦𝑐⟧(𝜇)) |= 𝜓. By the third side-condition, RV(𝑐) ⊆ 𝑆1. By soundness of RV and WV, all variables in WV(𝑐) must be written to before they are read, and there is a function 𝐹 : Mem[𝑆1] → D(Mem[WV(𝑐) ∪ 𝑆1]) such that:
𝜋WV(𝑐)∪𝑆1 (⟦𝑐⟧𝜇) = bind(𝜇, 𝑚 ↦→ 𝐹 (𝜋𝑆1 (𝑚))).
Since 𝑆2 ⊆ FV(𝜂), variables in 𝑆2 are not in MV(𝑐) by the first side-condition, and 𝑆2 is disjoint from WV(𝑐) ∪ 𝑆1. By soundness of MV, we have:
𝜋(WV(𝑐)∪𝑆1)∪𝑆2 (⟦𝑐⟧𝜇) = bind(𝜋(WV(𝑐)∪𝑆1)∪𝑆2 (𝜇), (𝑚1, 𝑚2) ↦→ 𝐹 (𝑚1) ⊗ unit(𝑚2)).
Since 𝑆1 and 𝑆2 are independent in 𝜇, we know that 𝑆1 ∪ WV(𝑐) and 𝑆2 are independent in ⟦𝑐⟧(𝜇) as well.
Hence: ⟦𝑐⟧𝜇 ⊒ (𝜋𝑆1∪WV(𝑐) (⟦𝑐⟧𝜇)) ◦ (𝜋𝑆2 (⟦𝑐⟧𝜇)). We know that 𝐹𝑉 (𝜓) ⊆ 𝑇 ∪ WV(𝑐) ⊆ 𝑆1 ∪ WV(𝑐) so since 𝜓 is valid in ⟦𝑐⟧(𝜇), it is valid in the first conjunct by the restriction property and the second side-condition. Since 𝜋𝑆2 (⟦𝑐⟧𝜇) = 𝜋𝑆2 (𝜇), and 𝜂 does not de- pend on modified deterministic variables, 𝜂 is valid in the second conjunct. Thus, we can conclude: ⟦𝑐⟧(𝜇) |= 𝜓 ∗ 𝜂. □ 241 APPENDIX B LINA: A SEPARATION LOGIC FOR NEGATIVE DEPENDENCE B.1 Preliminaries Lemma B.1.1. Say S = {𝑆𝑖 | 1 ≤ 𝑖 ≤ 𝑁} where 𝑆𝑖 are disjoint, 𝑆 = ∪S and 𝜇 ∈ Mem[𝑆], Then, 𝑆𝑖 are independent in 𝜇 if and only if for any family of all monotone or all antitone functions 𝑓𝑖 : Mem[𝑆𝑖] → R+, E𝑥∼𝜇 [∏ 𝑆𝑖∈S 𝑓𝑖 (𝜋𝑆𝑖𝑥) ] = ∏ 𝑆𝑖∈S E𝑥∼𝜇 [ 𝑓𝑖 (𝜋𝑆𝑖𝑥)] . (B.1) Proof. The forward direction is straightforward. The backward direction needs more careful analysis. In general, zero correlation does not imply independence, but here, we have the equality for all family of monotone or antitone functions, so that suffices for independence. We prove by induction on T = {𝑆𝑖 | 1 ≤ 𝑖 ≤ 𝐾} that for any family of 𝑣𝑖 ∈ Mem[𝑆𝑖], E𝑥∼𝜇  ( ∧ 𝑆𝑖∈T 𝜋𝑆𝑖𝑥 = 𝑣𝑖 ) ∧ ©­« ∧ 𝑆𝑖∈S\T 𝜋𝑆𝑖𝑥 < 𝑣𝑖 ª®¬  = ∏ 𝑆𝑖∈T E𝑥∼𝜇 [ 𝜋𝑆𝑖𝑥 = 𝑣𝑖 ] · ∏ 𝑆𝑖∈S\T E𝑥∼𝜇 [ 𝜋𝑆𝑖𝑥 < 𝑣𝑖 ] . (B.2) Case |T | = 1: Say T = {𝑆 𝑗 }. Since indicator functions 𝑆𝑖 < 𝑣𝑖 and 𝑆𝑖 ≤ 𝑣𝑖 are 242 both monotonically decreasing, E𝑥∼𝜇 𝜋𝑆 𝑗𝑥 = 𝑣 𝑗 ∧ ( ∧ 𝑆𝑖∈S\T 𝜋𝑆𝑖𝑥 < 𝑣𝑖)  = E𝑥∼𝜇 𝜋𝑆 𝑗𝑥 ≤ 𝑣 𝑗 ∧ ( ∧ 𝑆𝑖∈S\T 𝜋𝑆𝑖𝑥 < 𝑣𝑖)  − E𝑥∼𝜇 𝜋𝑆 𝑗𝑥 < 𝑣 𝑗 ∧ ( ∧ 𝑆𝑖∈S\T 𝜋𝑆𝑖𝑥 < 𝑣𝑖)  = E𝑥∼𝜇 [ 𝜋𝑆 𝑗𝑥 ≤ 𝑣 𝑗 ] · ∏ 𝑆𝑖∈S\T E𝑥∼𝜇 [ 𝜋𝑆𝑖𝑥 < 𝑣𝑖 ] − E𝑥∼𝜇 [ 𝜋𝑆 𝑗𝑥 < 𝑣 𝑗 ] · ∏ 𝑆𝑖∈S\T E𝑥∼𝜇 [ 𝜋𝑆𝑖𝑥 < 𝑣𝑖 ] (By Equation (B.1)) = (E𝑥∼𝜇 [ 𝜋𝑆 𝑗𝑥 ≤ 𝑣 𝑗 ] − E𝑥∼𝜇 [ 𝜋𝑆 𝑗𝑥 < 𝑣 𝑗 ] ) · ∏ 𝑆𝑖∈S\T E𝑥∼𝜇 [ 𝜋𝑆𝑖𝑥 < 𝑣𝑖 ] = E𝑥∼𝜇 [ 𝜋𝑆 𝑗𝑥 = 𝑣 𝑗 ] · ∏ 𝑆𝑖∈S\T E𝑥∼𝜇 [ 𝜋𝑆𝑖𝑥 < 𝑣𝑖 ] Case |T | > 1 Let 𝑆 𝑗 be an element in T . 
E𝑥∼𝜇 ( ∧ 𝑆𝑖∈T 𝜋𝑆𝑖𝑥 = 𝑣𝑖) ∧ ( ∧ 𝑆𝑖∈S\T 𝜋𝑆𝑖𝑥 < 𝑣𝑖)  = E𝑥∼𝜇 𝜋𝑆 𝑗𝑥 ≤ 𝑣 𝑗 ∧ ( ∧ 𝑆𝑖∈T\{𝑆 𝑗 } 𝜋𝑆𝑖𝑥 = 𝑣𝑖) ∧ ( ∧ 𝑆𝑖∈S\T 𝜋𝑆𝑖𝑥 < 𝑣𝑖)  − E𝑥∼𝜇 𝜋𝑆 𝑗𝑥 < 𝑣 𝑗 ∧ ( ∧ 𝑆𝑖∈T\{𝑆 𝑗 } 𝜋𝑆𝑖𝑥 = 𝑣𝑖) ∧ ( ∧ 𝑆𝑖∈S\T 𝜋𝑆𝑖𝑥 < 𝑣𝑖)  = E𝑥∼𝜇 [ 𝜋𝑆 𝑗𝑥 ≤ 𝑣 𝑗 ] · ∏ 𝑆𝑖∈T\{𝑆 𝑗 } E𝑥∼𝜇 [ 𝜋𝑆𝑖𝑥 = 𝑣𝑖 ] · ∏ 𝑆𝑖∈S\T E𝑥∼𝜇 [ 𝜋𝑆𝑖𝑥 < 𝑣𝑖 ] − E𝑥∼𝜇 [ 𝜋𝑆 𝑗𝑥 < 𝑣 𝑗 ] · ∏ 𝑆𝑖∈T\{𝑆 𝑗 } E𝑥∼𝜇 [ 𝜋𝑆𝑖𝑥 = 𝑣𝑖 ] · ∏ 𝑆𝑖∈S\T E𝑥∼𝜇 [ 𝜋𝑆𝑖𝑥 < 𝑣𝑖 ] = E𝑥∼𝜇 [ 𝜋𝑆 𝑗𝑥 = 𝑣 𝑗 ] · ∏ 𝑆𝑖∈T\{𝑆 𝑗 } E𝑥∼𝜇 [ 𝜋𝑆𝑖𝑥 = 𝑣𝑖 ] · ∏ 𝑆𝑖∈S\T E𝑥∼𝜇 [ 𝜋𝑆𝑖𝑥 < 𝑣𝑖 ] When T = S, Equation (B.2) implies E𝑥∼𝜇 [∧ 𝑆𝑖∈S 𝜋𝑆𝑖𝑥 = 𝑣𝑖 ] = ∏ 𝑆𝑖∈S E𝑥∼𝜇 [ 𝜋𝑆𝑖𝑥 = 𝑣𝑖 ] for any 𝑣𝑖’s. Thus, components in S are independent. □ 243 We prove some properties of coarsening. In the following we will use an alternative definition of coarsening, which will be shown to be equivalent to what we define in the main text. Definition B.1.1 (Alternative definition of coarsening). We first index any par- tition S as S1, . . . ,S|S|. Say |S′| = 𝑚, |S| = 𝑛. We say S′ coarsens a partition S there exists a function a 𝑓 : [𝑚] → P([𝑛]) such that 1) ∪𝑖∈[𝑚] 𝑓 (𝑖) = [𝑛]; 2) for any 𝑖, 𝑗 ∈ [𝑚], either 𝑖 = 𝑗 or 𝑓 (𝑖), 𝑓 ( 𝑗) are disjoint; 3) S′ = {∪{S 𝑗 | 𝑗 ∈ 𝑓 (𝑖)} | 𝑖 ∈ [𝑚]}. Lemma B.1.2. Let S, S′ be two partitions. Then S′ coarsens S according to Defini- tion B.1.1 if and only if S′ coarsens S according to Definition 3.3.4 . Proof. We index S as S1, . . . ,S𝑛 and S′ as S′1, . . . ,S ′ |𝑚 | . Backward direction: By that definition, we know a) for any S′ 𝑖 ∈ S′, S′ 𝑖 = ∪R for some R ⊆ S; b) ∪S = ∪S′. We define the function 𝑔 : [𝑚] → P([𝑛]) as 𝑔(𝑖) = { 𝑗 | S 𝑗 ⊆ S′𝑖 }. This 𝑔 would satisfies all the conditions required: 1. By substitution, ∪𝑖∈[𝑚]𝑔(𝑖) = ∪𝑖∈[𝑚]{ 𝑗 | S 𝑗 ⊆ S′𝑖 } = ∪𝑠′∈S′{ 𝑗 | S 𝑗 ⊆ 𝑠′}. By b), for any 𝑗 ∈ [𝑛], S 𝑗 ⊆ ∪S′. Then by a) and that S is a partition, if 𝑠′ covers any of S 𝑗 , it must covers all of S 𝑗 , then S 𝑗 ⊆ ∪S′ implies there exists 𝑠′ ∈ S′ such that S 𝑗 ⊆ 𝑠′. Thus, 𝑗 ∈ { 𝑗 | S 𝑗 ⊆ 𝑠′} ⊆ ∪𝑠′∈S′{ 𝑗 | S 𝑗 ⊆ 𝑠′}. 
For any 𝑗 ∉ [𝑛], S 𝑗 is undefined, so it is impossible that S 𝑗 ⊆ 𝑠′ for some 𝑠′ ⊆ S′. Therefore, ∪𝑠′∈S′{ 𝑗 | S 𝑗 ⊆ 𝑠′} = [𝑛]. 2. For any 𝑘 ∈ 𝑔(𝑖), S𝑘 ⊆ S′𝑖 . If 𝑖 ≠ 𝑗 , then S′ 𝑖 and S′ 𝑗 are disjoint since S′ is a partition. Thus, S𝑘 ⊈ S′ 𝑗 , and 𝑘 ∉ 𝑔( 𝑗). So for any 𝑖 ≠ 𝑗 , 𝑔(𝑖), 𝑔( 𝑗) are disjoint. 244 3. By substitution, {∪{S 𝑗 | 𝑗 ∈ 𝑔(𝑖)} | 𝑖 ∈ [𝑚]} = {∪{S 𝑗 | S 𝑗 ⊆ S′𝑖 } | 𝑖 ∈ [𝑚]} = {∪{S 𝑗 | S 𝑗 ⊆ 𝑠′} | 𝑠′ ∈ S′}. Again, by a) and that S is a partition, if 𝑠′ ∈ S covers any part of of S 𝑗 , it must covers all of S 𝑗 , so ∪{S 𝑗 | S 𝑗 ⊆ 𝑠′} = 𝑠′. Thus, {∪{S 𝑗 | S 𝑗 ⊆ 𝑠′} | 𝑠′ ∈ S′} = S′. Forward direction: By 3), we know that S′ = {∪{S 𝑗 | 𝑗 ∈ 𝑓 (𝑖)} | 𝑖 ∈ [𝑚]}. So for any S′ 𝑖 ∈ S′, we have 𝑠′ = ∪{S 𝑗 | 𝑗 ∈ 𝑓 (𝑖)}, which is a subset of S′ by construction. So we proved a). Also, ∪S′ = ∪{∪{S 𝑗 | 𝑗 ∈ 𝑓 (𝑖)} | 𝑖 ∈ [𝑚]} = ∪{S 𝑗 | 𝑗 ∈ 𝑓 (𝑖) | 𝑖 ∈ [𝑚]}, and by 1), that is equivalent to ∪{S 𝑗 | 𝑗 ∈ [𝑛]}, which is equivalent to ∪S. □ We can prove that coarsening commute with projections. Lemma B.1.3. Given a partition S = {S𝑖}𝑖 and a set 𝑋 , let S𝑋 = {S𝑖 ∩ 𝑋 | S𝑖 ∈ S}. For any T coarsening S𝑋 , there exists a coarsening S′ of S such that T = {S𝑖 ∩ 𝑋 | S𝑖 ∈ S′}; conversely, for any S′ coarsening S, and S′ 𝑋 = {S𝑖 ∩ 𝑋 | S𝑖 ∈ S′}, we have S′ 𝑋 coarsens S𝑋 . Proof. Forward direction: By Definition B.1.1, there exists a coarsening function 𝑓 such that T = {∪{(S𝑋) 𝑗 | 𝑗 ∈ 𝑓 (𝑖)} | 𝑖 ∈ [|T |]} = {∪{S 𝑗 ∩ 𝑋 | 𝑗 ∈ 𝑓 (𝑖)} | 𝑖 ∈ [|T |]} = {(∪{S 𝑗 | 𝑗 ∈ 𝑓 (𝑖)}) ∩ 𝑋 | 𝑖 ∈ [|T |]} = {𝑆′ ∩ 𝑋 | 𝑆′ ∈ S′} (where S′ = {∪{S 𝑗 | 𝑗 ∈ 𝑓 (𝑖)} | 𝑖 ∈ [|T |]}) 245 S′ has the same size as T , so S′ = {∪{S 𝑗 | 𝑗 ∈ 𝑓 (𝑖)} | 𝑖 ∈ [|S′|]}, and thus S′ coarsens S. Backward direction: S′ coarsens S, so there exists a coarsening function 𝑓 such that S′ = {∪{𝑆 𝑗 | 𝑗 ∈ 𝑓 (𝑖)} | 𝑖 ∈ [|S′|]}. Thus, S′𝑋 = {(∪{𝑆 𝑗 | 𝑗 ∈ 𝑓 (𝑖)}) ∩ 𝑋 | 𝑖 ∈ [|S′|]} = {∪{𝑆 𝑗 ∩ 𝑋 | 𝑗 ∈ 𝑓 (𝑖)} | 𝑖 ∈ [|S′|]} = {∪{𝑆𝑋 𝑗 | 𝑗 ∈ 𝑓 (𝑖)} | 𝑖 ∈ [|S′|]}. Therefore, S′ 𝑋 coarsens S𝑋 . 
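The index-level coarsening function of Definition B.1.1, used throughout the two lemmas above, is finitely checkable. A minimal sketch (illustrative only; partitions as lists of frozensets, indices 0-based, all helper names hypothetical):

```python
def is_coarsening(S, Sp, f):
    """Check Definition B.1.1: Sp coarsens S via f : index of Sp -> set of indices of S.
    Conditions: (1) the sets f(i) jointly cover all of [n];
    (2) distinct indices get disjoint index sets;
    (3) each block of Sp is the union of the blocks of S selected by f."""
    m, n = len(Sp), len(S)
    covered = set().union(*(f(i) for i in range(m)))
    cond1 = covered == set(range(n))
    cond2 = all(f(i).isdisjoint(f(j)) for i in range(m) for j in range(m) if i != j)
    cond3 = {frozenset().union(*(S[j] for j in f(i))) for i in range(m)} == set(Sp)
    return cond1 and cond2 and cond3

S = [frozenset({'a'}), frozenset({'b'}), frozenset({'c'})]
Sp = [frozenset({'a', 'b'}), frozenset({'c'})]
ok = is_coarsening(S, Sp, lambda i: {0, 1} if i == 0 else {2})
bad = is_coarsening(S, Sp, lambda i: {0} if i == 0 else {2})  # misses block 1
```

The second candidate function fails both the coverage condition (1) and the union condition (3), matching the role these conditions play in the proof of Lemma B.1.2.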
□ B.2 A BI Frame for Negative Association B.2.1 Capturing Negative Association Theorem 3.3.2. For any two states 𝜇1, 𝜇2 ∈ 𝑋 , 𝜇1 ⊕𝑠 𝜇2 ⊆ 𝜇1 ⊕ 𝜇2 ⊆ 𝜇1 ⊕𝑤 𝜇2. Proof. Let 𝑆 denote dom(𝜇1) and 𝑇 denote dom(𝜇2). For any 𝜇 ∈ 𝜇1 ⊕𝑠 𝜇2, we have 𝜋𝑆𝜇 = 𝜇1, 𝜋𝑇𝜇 = 𝜇2, and 𝜇 satisfies NA. 𝜇 being NA implies 𝜇 is R-PNA for any partition R on dom(𝜇) So for any partition S on 𝑆, partition T on 𝑇 , 𝜇 is S ∪ T -PNA. Therefore, 𝜇 ∈ 𝜇1 ⊕ 𝜇2. 246 For any 𝜇 ∈ 𝜇1 ⊕ 𝜇2, 𝜋𝑆𝜇 = 𝜇1, 𝜋𝑇𝜇 = 𝜇2, and 𝜇 is {𝑆, 𝑇}-PNA since 𝜇1 is {𝑆}-PNA, 𝜇2 is {𝑇}-PNA. Thus, 𝜇 ∈ 𝜇1 ⊕𝑤 𝜇2. □ Theorem 3.3.1. Given a set of variables 𝑆, 𝑆 satisfies NA in 𝜇 iff 𝜇 satisfies S-PNA for any S partitioning 𝑆 iff 𝜇 satisfies {{𝑥} | 𝑥 ∈ 𝑆}-PNA. Proof. The second equivalence is straightforward: • {{𝑠} | 𝑠 ∈ S} is a partition of 𝑆, so we have the backward direction. • Any S partitioning 𝑆 coarsens {{𝑠} | 𝑠 ∈ 𝑆}, so we have the first direction. For the forward direction of the first equivalence, it suffices to prove that for any partition S of 𝑆, any family of all monotone or all antitone functions 𝑓𝑖 : Mem[𝑆𝑖] → R+, E𝑚∼𝜇 [∏ 𝑆𝑖∈S 𝑓𝑖 (𝜋𝑆𝑖𝑚) ] ≤ ∏ 𝑆𝑖∈S E𝑚∼𝜇 [ 𝑓𝑖 (𝜋𝑆𝑖𝑚) ] . (B.3) We prove that by induction on the size of S. Base case |S| = 1: S-PNA is trivial. Base case |S| = 2: S-PNA is straightforward from NA. Inductive case: Assuming 𝜇 satisfies S-PNA for any partition with size less than 𝐾 , we want to show that 𝜇 satisfies S-PNA for any partition with size equals to 𝐾 . Say S = {𝑆1, . . . , 𝑆𝐾}. For any family of all monotone or all antitone func- tions 𝑓𝑖 : Mem[𝑆𝑖] → R+, either both 𝑚 ↦→∏𝐾−1 𝑖=1 𝑓𝑖 (𝜋𝑆𝑖𝑚) and 𝑓𝐾 are mono- tone, or both 𝑚 ↦→ ∏𝐾−1 𝑖=1 𝑓𝑖 (𝜋𝑆𝑖𝑚) and 𝑓𝐾 are antitone. Thus, by the induc- 247 tive hypothesis E𝑥∼𝜇 [ 𝐾∏ 𝑖=1 𝑓𝑖 (𝜋𝑆𝑖𝑚) ] ≤ E𝑥∼𝜇 [ 𝐾−1∏ 𝑖=1 𝑓𝑖 (𝜋𝑆𝑖𝑚) ] · E𝑥∼𝜇 [ 𝑓𝐾 (𝜋𝑆𝐾𝑚) ] (We can partition ∪𝐾 𝑖=1𝑆𝑖 into {∪𝐾−1 𝑖=1 𝑆𝑖, 𝑆𝐾}) ≤ 𝐾−1∏ 𝑖=1 E𝑥∼𝜇 [ 𝑓𝑖 (𝜋𝑆𝑖𝑚) ] · E𝑥∼𝜇 [ 𝑓𝐾 (𝜋𝑆𝐾𝑚) ] (B.4) (We can partition ∪𝐾 𝑖=1𝑆𝑖 into {𝑆1, . . . , 𝑆𝐾}) = 𝐾∏ 𝑖=1 E𝑥∼𝜇 [ 𝑓𝑖 (𝜋𝑆𝑖𝑚) ] . 
(B.5) The backward direction of the first equivalence is more involved. For any two disjoint 𝐴, 𝐵 ⊆ 𝑆, we know 𝜇 satisfies {𝐴, 𝐵}-PNA, so for every pair of both monotone or both antitone functions 𝑓 : Mem[𝐴] → R+, 𝑔 : Mem[𝐵] → R+, we have E𝑚∼𝜇 [ 𝑓 (𝜋𝐴𝑚) · 𝑔(𝜋𝐵𝑚)] ≤ E𝑚∼𝜇 [ 𝑓 (𝜋𝐴𝑚)] · E𝑚∼𝜇 [𝑔(𝜋𝐵𝑚)] . But the problem is to show this inequality when 𝑓 , 𝑔 are not both non-negative. We prove that in three steps: 1. If 𝑓 , 𝑔 are lower-bounded by −𝐿, i.e., 𝑓 (𝑥) ≥ −𝐿 and 𝑔(𝑥) ≥ −𝐿 for any 𝑥. Then 𝑥 → 𝑓 (𝑥) +𝐿 and 𝑥 → 𝑔(𝑥) +𝐿 are both non-negative functions. Thus, E𝑚∼𝜇 [( 𝑓 (𝜋𝐴𝑚) + 𝐿) · (𝑔(𝜋𝐵𝑚) + 𝐿)] ≤ E𝑚∼𝜇 [ 𝑓 (𝜋𝐴𝑚) + 𝐿] · E𝑚∼𝜇 [𝑔(𝜋𝐵𝑚) + 𝐿] . (B.6) 248 Meanwhile, E[( 𝑓 (𝜋𝐴𝑚) + 𝐿) · (𝑔(𝜋𝐵𝑚) + 𝐿)] = E[ 𝑓 (𝜋𝐴𝑚) · 𝑔(𝜋𝐵𝑚)] + 𝐿 · E[ 𝑓 (𝜋𝐴𝑚)] + 𝐿 · E[𝑔(𝜋𝐵𝑚)] + 𝐿2 E[ 𝑓 (𝜋𝐴𝑚) + 𝐿] · E[𝑔(𝜋𝐵𝑚) + 𝐿] = (E[ 𝑓 (𝜋𝐴𝑚)] + 𝐿) · (E[𝑔(𝜋𝐵𝑚)] + 𝐿) = E[ 𝑓 (𝜋𝐴𝑚)] · E[𝑔(𝜋𝐵𝑚)] + 𝐿 · E[ 𝑓 (𝜋𝐴𝑚)] + 𝐿 · E[𝑔(𝜋𝐵𝑚)] + 𝐿2. So Equation (B.6) implies that E𝑚∼𝜇 [ 𝑓 (𝜋𝐴𝑚) · 𝑔(𝜋𝐵𝑚)] ≤ E𝑚∼𝜇 [ 𝑓 (𝜋𝐴𝑚)] · E𝑚∼𝜇 [𝑔(𝜋𝐵𝑚)] . 2. If the codomain of 𝑓 or 𝑔 does not range across both negative and posi- tive numbers, then we can also prove the desired inequality by applying the monotone convergence theorem on the result for lower-bounded func- tions. • Say 𝑓 is non-negative and 𝑔 is non-positive. For any natural number 𝑛, 𝑚 ∈ Mem[𝐴 ∪ 𝐵], we define 𝑔𝑛 (𝜋𝐵𝑚) = max(𝑔(𝜋𝐵𝑚),−𝑛), ℎ𝑛 (𝑚) = 𝑓 (𝜋𝐴𝑚) · 𝑔𝑛 (𝜋𝐵𝑚). Then for any 𝑛, 𝑔𝑛 and ℎ𝑛 are lower-bounded non- positive functions; and for any 𝑚, {𝑔𝑛 (𝑚)}𝑛∈N is a monotonically de- creasing sequence converging to 𝑔(𝑚), {ℎ𝑛 (𝑚)}𝑛∈N is a monotonically decreasing sequence converging to 𝑓 (𝜋𝐴𝑚) ·𝑔(𝜋𝐵𝑚). By the monotone convergence theorem, E𝑚∼𝜇 𝑓 (𝜋𝐴𝑚) · 𝑔(𝜋𝐵𝑚) = lim 𝑛→∞ E𝑚∼𝜇ℎ𝑛 (𝑚) E𝑚∼𝜇𝑔(𝜋𝐵𝑚) = lim 𝑛→∞ E𝑚∼𝜇𝑔𝑛 ( 𝑝𝑖𝐵𝑚). 
249 By what we proved above, for any 𝑛, we have E𝑚∼𝜇 [ℎ𝑛 (𝑚)] ≤ E𝑚∼𝜇 [ 𝑓 (𝜋𝐴𝑚)] · E𝑚∼𝜇 [𝑔𝑛 (𝜋𝐵𝑚)] Taking that to the limit 𝑛→∞, lim 𝑛→∞ E𝑚∼𝜇 [ℎ𝑛 (𝑚)] ≤ lim 𝑛→∞ ( E𝑚∼𝜇 [ 𝑓 (𝜋𝐴𝑚)] · E𝑚∼𝜇 [𝑔(𝜋𝐵𝑚)] ) = lim 𝑛→∞ E𝑚∼𝜇 [ 𝑓 (𝜋𝐴𝑚)] · lim 𝑛→∞ E𝑚∼𝜇 [𝑔(𝜋𝐵𝑚)] Therefore, for any distribution 𝜇 ∈ D(Mem[𝐴 ∪ 𝐵]), E𝑚∼𝜇 [ 𝑓 (𝜋𝐴𝑚) · 𝑔(𝜋𝐵𝑚)] ≤ E𝑚∼𝜇 [ 𝑓 (𝜋𝐴𝑚)] · E𝑚∼𝜇 [𝑔(𝜋𝐵𝑚)] . • The case where 𝑓 is non-positive and 𝑔 is non-negative is symmetric. • The case where 𝑓 and 𝑔 are both non-positive is also similar. We will define 𝑓𝑛 (𝜋𝐴𝑚) = max( 𝑓 (𝜋𝐴𝑚),−𝑛), 𝑔𝑛 (𝜋𝐵𝑚) = max(𝑔(𝜋𝐵𝑚),−𝑛), ℎ𝑛 (𝑚) = 𝑓𝑛 (𝜋𝐴𝑚) · 𝑔𝑛 (𝜋𝐵𝑚). Then we have E𝑚∼𝜇 𝑓 (𝜋𝐴𝑚) · 𝑔(𝜋𝐵𝑚) = lim 𝑛→∞ E𝑚∼𝜇ℎ𝑛 (𝑚) E𝑚∼𝜇𝑔(𝜋𝐵𝑚) = lim 𝑛→∞ E𝑚∼𝜇𝑔𝑛 ( 𝑝𝑖𝐵𝑚) E𝑚∼𝜇 𝑓 (𝜋𝐵𝑚) = lim 𝑛→∞ E𝑚∼𝜇 𝑓𝑛 ( 𝑝𝑖𝐴𝑚). And the rest follows. 3. Now we consider the general case where we only know both 𝑓 and 𝑔 are either lower-bounded or upper bounded. • If both 𝑓 and 𝑔 are lower-bounded, reduce to the first case. 250 • If 𝑓 is lower-bounded by 𝐿, 𝑔 is upper-bounded by 𝑈, then we can consider function 𝑓 ′ = 𝑓 + 𝐿 and 𝑔′ = 𝑔 −𝑈. Then 𝑓 ′ is non-negative and 𝑔′ is non-positive, so by step 2, we have E𝑚∼𝜇 [ 𝑓 ′(𝜋𝐴𝑚) · 𝑔′(𝜋𝐵𝑚)] ≤ E𝑚∼𝜇 [ 𝑓 ′(𝜋𝐴𝑚)] · E𝑚∼𝜇 [𝑔′(𝜋𝐵𝑚)] . By calculations analogous to what we did in step 1, that implies E𝑚∼𝜇 [ 𝑓 (𝜋𝐴𝑚) · 𝑔(𝜋𝐵𝑚)] ≤ E𝑚∼𝜇 [ 𝑓 (𝜋𝐴𝑚)] · E𝑚∼𝜇 [𝑔(𝜋𝐵𝑚)] . • If 𝑓 is upper-bounded and 𝑔 is lower-bounded: analogous to above. • If both 𝑓 and 𝑔 are upper-bounded: also, analogous to above. Thus, 𝜇 satisfies {𝐴, 𝐵}-PNA implies 𝜇 satisfies (𝐴, 𝐵)-NA. And therefore, 𝜇 satisfies {𝐴, 𝐵}-PNA for any 𝐴, 𝐵 ⊆ 𝑆 implies 𝑆 satisfies strong NA in 𝜇. □ B.2.2 Omitted Proofs of Frame Conditions Theorem 3.3.4. The structure XPNA = (𝑋D, ⊑D, ⊕, 𝐸D) is a Down-Closed BI frame. Proof. We sketch the conditions, using the notation from the definition: Down-Closed. Let dom(𝑥) = 𝑆,dom(𝑥′) = 𝑆′,dom(𝑦) = 𝑇,dom(𝑦′) = 𝑇 ′. We claim that we can take 𝑧′ = 𝜋𝑆′∪𝑇 ′𝑧. We evidently have 𝑧 ⊒ 𝑧′, and 𝜋𝑆′𝑧 ′ = 𝜋𝑆′𝜋𝑆𝑧 = 𝑥 ′ and 𝜋𝑇 ′𝑧′ = 𝜋𝑇 ′𝜋𝑇 𝑧 = 𝑦′. 
What remains to show is that 𝑧′ is S ∪ T -PNA for any S, T such that 𝑥′ is S-PNA, 𝑦′ is T -PNA, and (∪S) ∩ (∪T ) = ∅. 251 If 𝑥′ is S-PNA, then 𝑥 is S-PNA; if 𝑦′ is T -PNA, then 𝑦 is T -PNA; then 𝑧 ∈ 𝑥 ⊕ 𝑦 must be S∪T -PNA. Since 𝑧′ := 𝜋𝑆′∪𝑇 ′𝑧, and (∪S) ∪ (∪T ) ⊆ 𝑆′∪𝑇 ′, we have 𝑧′ is S∪T -PNA too. And evidently, dom(𝑧′) = 𝑆′∪𝑇 ′ = dom(𝑥′) ∪ dom(𝑦′). So 𝑧′ ∈ 𝑥′ ⊕ 𝑦′. Commutativity Immediate. Associativity Let dom(𝑥) = 𝑅,dom(𝑦) = 𝑆,dom(𝑧) = 𝑇 . We can assume that these sets are all disjoint, otherwise there is nothing to prove. We claim that we can take 𝑠 = 𝜋𝑆∪𝑇𝑤. For any 𝑤 in 𝑡 ⊕ 𝑧, 𝑡 ∈ 𝑥 ⊕ 𝑦, we want to show that 𝑤 ∈ 𝑥 ⊕ 𝑠 and 𝑠 ∈ 𝑦 ⊕ 𝑧. • For any partition R,S such that (∪R) ∩ (∪S) = ∅ and 𝑥 is R-PNA, 𝑠 is S-PNA. For set 𝑋 ⊆ Var, write {𝑌 ∩ 𝑋 | 𝑌 ∈ S} as S𝑋 . Then, by Lemma B.1.3, 𝑠 is S-PNA implies 𝑦 must be S𝑆-PNA. Similarly, 𝑠 is S-PNA implies 𝑧 must be S𝑇 -PNA. Then, 𝑡 ∈ 𝑥⊕𝑦 must be R∪(S𝑆)-PNA, and 𝑤 ∈ 𝑡⊕𝑧 must be R∪S𝑆∪S𝑇 - PNA. Note that S coarsens S𝑆 ∪ S𝑇 so 𝑤 is R ∪ S𝑆 ∪ S𝑇 -PNA implies that 𝑤 is R ∪ S-PNA. Also, 𝜋𝑅𝑤 = 𝜋𝑅𝜋𝑅∪𝑆𝑤 = 𝜋𝑅𝑡 = 𝑥, and dom(𝑤) = 𝑅 ∪ 𝑆 ∪ 𝑇 = dom(𝑥) ∪ dom(𝑠). Hence, 𝑤 ∈ 𝑥 ⊕ 𝑠. • Note that 𝑥 is trivially {𝑅}-PNA. Then, for any partition S,T such that 𝑅 ∩ (∪S) ∩ (∪T ) = ∅ and 𝑦 is S-PNA and 𝑧 is T -PNA, first 𝑡 must be ({𝑅} ∪ S)-PNA, and then 𝑤 must be ({𝑅} ∪ S ∪ T)-PNA. By projection, 𝑠 = 𝜋𝑆∪𝑇 must be S ∪ T 𝑧-PNA. Also, 𝜋𝑆𝑠 = 𝜋𝑆𝜋𝑆∪𝑇𝑤 = 𝜋𝑆𝑤 = 𝜋𝑆𝜋𝑅∪𝑆𝑤 = 𝜋𝑆𝑡 = 𝑦, and similarly, 𝜋𝑇 𝑠 = 𝑧. Also, dom(𝑠) = 𝑆 ∪ 𝑇 = dom(𝑦) ∪ dom(𝑧). 252 Hence, 𝑠 ∈ 𝑦 ⊕ 𝑧. Unit Existence Take 𝑒 to be 𝜇 where 𝜇 is the (unique) distribution in D(Mem[∅]). Unit Closure Immediate as we take 𝐸 = 𝑀 . Unit Coherence 𝑥 ∈ 𝑦 ⊕ 𝑒 entails 𝑦 = 𝜋dom(𝑦)𝑥, which implies 𝑦 ⊑ 𝑥. 
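Theorem 3.3.1 and the frame conditions above rest on the PNA inequality E[∏ᵢ fᵢ] ≤ ∏ᵢ E[fᵢ] for all-monotone (or all-antitone) families, with equality characterizing independence (Lemma B.1.1). The canonical NA example, the uniform distribution over one-hot vectors, makes the inequality strict for some family, while a product distribution attains equality everywhere. The sketch below checks this over all monotone threshold indicators on two binary coordinates (illustrative only; helper names hypothetical).

```python
import math
from itertools import product

def expect(mu, f):
    return sum(p * f(m) for m, p in mu.items())

def pna_gaps(mu, n):
    """Slack prod_i E[f_i] - E[prod_i f_i] for every family of monotone
    threshold indicators f_i(m) = [m_i >= t_i], t_i in {0, 1}."""
    gaps = []
    for ts in product((0, 1), repeat=n):
        fs = [lambda m, i=i, t=t: float(m[i] >= t) for i, t in enumerate(ts)]
        joint = expect(mu, lambda m: math.prod(f(m) for f in fs))
        sep = math.prod(expect(mu, f) for f in fs)
        gaps.append(sep - joint)
    return gaps

onehot = {(1, 0): 0.5, (0, 1): 0.5}                      # uniform one-hot: NA
indep = {(a, b): 0.25 for a in (0, 1) for b in (0, 1)}   # independent fair bits

na_gaps = pna_gaps(onehot, 2)   # all >= 0 (PNA); strictly > 0 at t = (1, 1)
eq_gaps = pna_gaps(indep, 2)    # all zero: equality for every family
```

For the one-hot distribution the pair of indicators [𝑚₀ ≥ 1], [𝑚₁ ≥ 1] witnesses strict negative correlation, which is exactly the behavior exploited in the counterexample of Appendix B.4.3.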
□ B.3 Soundness and Completeness of 𝑀-BI algebras B.3.1 Algebraic Soundness and Completeness The proof is very similar to the proof of BI soundness and completeness in sec- tion 2.2.3: we first construct a new algebra – “𝑀-BI algebra,” prove the algebraic soundness and completeness and then establish the overall theorem. Definition B.3.1 (𝑀-BI algebra). An 𝑀-BI algebra is an algebra A𝑀 = (𝐴,∧,∨,→ ,⊤,⊥, ∗𝑚∈𝑀 ,−∗𝑚∈𝑀 ,⊤∗𝑚∈𝑀) such that • For each 𝑚 ∈ 𝑀 , the structure (𝑎,∧,∨,→,⊤,⊥, ∗𝑚,−∗𝑚,⊤∗𝑚) is a BI algebra; • If 𝑚1 ≤ 𝑚2 then 𝑎 ∗𝑚1 𝑏 ≤ 𝑎 ∗𝑚2 𝑏. We can interpret 𝑀-BI in an 𝑀-BI algebra A𝑀 . Let V : AP → A𝑀 be a map assigning atomic propositions to elements of A𝑀 . We extend V to an interpre- tation ⟦−⟧A mapping 𝑀-BI propositions to elements of A𝑀 , defined by: 253 ⟦𝑃⟧A = V(𝑃) ⟦⊤⟧A = ⊤ ⟦𝐼𝑚⟧A = ⊤∗𝑚 ⟦⊥⟧A = ⊥ ⟦𝑃 ∧𝑄⟧A = ⟦𝑃⟧A ∧ ⟦𝑄⟧A ⟦𝑃 ∨𝑄⟧A = ⟦𝑃⟧A ∨ ⟦𝑄⟧A ⟦𝑃→ 𝑄⟧A = ⟦𝑃⟧A → ⟦𝑄⟧A ⟦𝑃 ∗𝑚 𝑄⟧A = ⟦𝑃⟧A ∗𝑚 ⟦𝑄⟧A ⟦𝑃 −∗𝑚 𝑄⟧A = ⟦𝑃⟧A −∗𝑚 ⟦𝑄⟧A Theorem B.3.1 (Algebraic Soundness). If 𝑃 ⊢ 𝑄 is provable, then ⟦𝑃⟧A ≤ ⟦𝑄⟧A for any algebraic interpretation ⟦−⟧A. Proof. By induction on the derivation of 𝑃 ⊢ 𝑄. The cases for everything ex- cept ∗-WEAKENING follow from the exact same argument as for standard BI and BI-algebra, as in theorem 2.2.3. For the remaining case of ∗-WEAKENING, which derives 𝑃 ∗𝑚2 𝑄 from 𝑃 ∗𝑚1 𝑄 if 𝑚1 ≤ 𝑚2. We have ⟦𝑃 ∗𝑚1 𝑄⟧A = ⟦𝑃⟧A ∗𝑚1 ⟦𝑄⟧A (By definition of ⟦−⟧A) ≤ ⟦𝑃⟧A ∗𝑚2 ⟦𝑄⟧A (By definition of 𝑀-BI algebra) = ⟦𝑃 ∗𝑚2 𝑄⟧A. (By definition of ⟦−⟧A) □ Next, we want to show the algebraic completeness. Analogous to before the- orem 2.2.6, we construct the Lindenbaum-Tarski algebra corresponding to 𝑀-BI. 254 Definition B.3.2 (Lindenbaum-Tarski Algebra). Define the equivalence relation 𝑃 ∼ 𝑄 as 𝑃 ⊢ 𝑄 and 𝑄 ⊢ 𝑃. Let [𝑃]∼ be the equivalence class of 𝑃 under ∼. Take 𝐼𝑚, ⊤, and ⊥ to be [𝐼𝑚]∼, [⊤]∼, and [⊥]∼, respectively. Then we define: ... 
[𝑃]∼ ∗𝑚 [𝑄]∼ = [𝑃 ∗𝑚 𝑄]∼ [𝑃]∼ −∗𝑚 [𝑄]∼ = [𝑃 −∗𝑚 𝑄]∼ The fact that these operations are well-defined and form a 𝑀-BI algebra follows almost entirely from lemma 2.2.4. The only remaining case is to check that if 𝑚1 ≤ 𝑚2 then [𝑃]∼ ∗𝑚1 [𝑄]∼ ≤ [𝑃]∼ ∗𝑚2 [𝑄]∼. We have [𝑃]∼ ∗𝑚1 [𝑄]∼ = [𝑃 ∗𝑚1 𝑄]∼ ≤ [𝑃 ∗𝑚2 𝑄]∼ (Since 𝑃 ∗𝑚1 𝑄 ⊢ 𝑃 ∗𝑚2 𝑄) = [𝑃]∼ ∗𝑚2 [𝑄]∼ Then, we can construct an algebraic interpretation into Lindenbaum-Tarski algebra, ⟦−⟧L, and use it to prove algebraic completeness. Theorem B.3.2 (Algebraic Completeness). If ⟦𝑃⟧A ≤ ⟦𝑄⟧A for all algebraic inter- pretations ⟦−⟧A, then 𝑃 ⊢ 𝑄. The proof is identical to the proof for theorem 2.2.6. B.3.2 Soundness of 𝑀-BI formulas 𝑀-BI formulas are interpreted on 𝑀-BI frames. We define a structure called complex algebra on 𝑀-BI frames and show that the complex algebra of every 𝑀- 255 BI frame is an 𝑀-BI algebra. Definition B.3.3 (Complex Algebra). If X is an 𝑀-BI frame, then the complex algebra of X, written Com(X) is the structure (P⊑ (𝑋),∩,∪,→X , 𝑋, ∅, ∗𝑚∈𝑀 ,−∗𝑚∈𝑀 , 𝐸𝑚∈𝑀) where P⊑ (𝑋) = {𝐴 ⊆ 𝑋 | 𝑎 ∈ 𝐴 ∧ 𝑎 ⊑ 𝑏 → 𝑏 ∈ 𝐴} 𝐴→X 𝐵 = {𝑎 | ∀𝑏. 𝑎 ⊑ 𝑏 ∧ 𝑏 ∈ 𝐴→ 𝑏 ∈ 𝐵} 𝐴 ∗𝑚 𝐵 = {𝑥 | ∃𝑤, 𝑦, 𝑧. 𝑤 ⊑ 𝑥 ∧ 𝑤 ∈ 𝑦 ⊕𝑚 𝑧 ∧ 𝑦 ∈ 𝐴 ∧ 𝑧 ∈ 𝐵} 𝐴 −∗𝑚 𝐵 = {𝑥 | ∀𝑤, 𝑦, 𝑧. (𝑥 ⊑ 𝑤 ∧ 𝑧 ∈ 𝑤 ⊕𝑚 𝑦 ∧ 𝑦 ∈ 𝐴) → 𝑧 ∈ 𝐵} Lemma B.3.3. If X is an 𝑀-BI frame, then Com(X) is an 𝑀-BI algebra. Proof. Each (𝑋, ⊑, ⊕𝑚, 𝐸𝑚) is a BI frame. Lemma 2.2.7 shows that the complex of a BI frame is a BI algebra. Thus the only thing to check is that the ordering on ∗ respects the ordering on 𝑀 . Let 𝑚1 ≤ 𝑚2. We must show that 𝐴 ∗𝑚1 𝐵 ⊆ 𝐴 ∗𝑚2 𝐵. Let 𝑥 ∈ 𝐴 ∗𝑚1 𝐵. Then there exists 𝑤, 𝑦, 𝑧 such that 𝑤 ⊑ 𝑥 and 𝑤 ∈ 𝑦 ⊕𝑚1 𝑧, with 𝑦 ∈ 𝐴 and 𝑧 ∈ 𝐵. by Operation Inclusion property, we have that 𝑤 ∈ 𝑦 ⊕𝑚2 𝑧, hence 𝑥 ∈ 𝐴 ∗𝑚2 𝐵. □ Theorem B.3.4. Let X = (𝑋, ⊑, ◦𝑚, 𝐸𝑚) be a 𝑀-BI frame and letVf : AP → P(𝑋) be a persistent valuation on X. Define the algebraic assignmentVa : AP → Com(X) by lettingVa(𝑝) = Vf(𝑝) for all atomic proposition 𝑝. 
Define the algebraic interpretation ⟦−⟧𝑎 by taking the homomorphic extension ofV𝑎 Then we have: 𝑥 |=Vf 𝑃 if and only if 𝑥 ∈ ⟦𝑃⟧a. Proof. The proof is almost identical to the proof for theorem 2.2.8. We show that by induction on the syntax of the formula 𝑃. 𝑀-BI formula only differs from BI formula by having indexed version of the ∗, 𝐼,−∗, so the only difference in the 256 proof is that: in the induction proof for formula ∗𝑚, 𝐼𝑚,−∗𝑚, we use the indexed version of the operations in the complex algebra. □ Theorem B.3.5 (Soundness of 𝑀-BI). In 𝑀-BI logic, if 𝑃 ⊢ 𝑄 is derivable, then 𝑃 |= 𝑄. The proof is identical to the proof of theorem 2.2.9. B.3.3 Completeness of 𝑀-BI formulas We reverse the direction now; we define a prime filter frame for every 𝑀-BI alge- bra and show that a prime filter frame of any 𝑀-BI algebra is an 𝑀-BI frame. Definition B.3.4 (Prime Filter Frame). If A is an 𝑀-BI algebra, then the prime filter 𝑀-frame of A is defined as Prf(A) = (Prf(𝐴), ⊆, ⊕𝑚∈𝑀 , 𝐸𝑚∈𝑀) where 𝐹1 ⊕𝑚 𝐹2 = {𝐹 ∈ Prf(𝐴) | ∀𝑎1 ∈ 𝐹1.∀𝑎2 ∈ 𝐹2. 𝑎1 ∗𝑚 𝑎2 ∈ 𝐹} 𝐸𝑚 = {𝐹 ∈ Prf(𝐴) | ⊤∗𝑚 ∈ 𝐹} Lemma B.3.6. If A is an 𝑀-BI algebra, then Prf(A) is an 𝑀-BI frame. Proof. Lemma 2.2.10 shows that for each𝑚 ∈ 𝑀 , (Prf(𝐴), ⊆, ⊕𝑚, 𝐸𝑚) is a BI frame. Therefore, we only need to check the Operation Inclusion property. Let 𝑚1 ≤ 𝑚2 and let 𝐹, 𝐺, 𝐻 ∈ Prf(𝐴) with 𝐹 ∈ 𝐺 ⊕𝑚1 𝐻. Let 𝑎 ∈ 𝐺 and 𝑏 ∈ 𝐻. Then 𝑎 ∗𝑚1 𝑏 ∈ 𝐹. Since 𝑎 ∗𝑚1 𝑏 ≤ 𝑎 ∗𝑚2 𝑏, and filters are upward-closed, 𝑎 ∗𝑚2 𝑏 ∈ 𝐹, hence 𝐹 ∈ 𝐺 ⊕𝑚2 𝐻. □ Theorem B.3.7. Let A = (𝐴, . . . ) be a 𝑀-BI algebra and let ⟦−⟧ : FormBI → 𝐴 be an algebraic interpretation that homomorphically extends the assignmentVa : AP → 𝐴. 257 Define the persistent valuationVf : AP → P(Prf(𝐴)) on the prime filter frame Prf(A) by: Vf(𝑝) = {𝐹 ∈ Prf(𝐴) | Va(𝑝) ∈ 𝐹} Then for 𝐹 ∈ Prf(𝐴), we have 𝐹 |=Vf 𝑃 if and only if ⟦𝑃⟧ ∈ 𝐹 . The proof is almost identical to the proof of theorem 2.2.11. Then, we can prove the completeness of 𝑀-BI using the same argument as for theorem 2.2.12. 
Theorem B.3.8 (Completeness of 𝑀-BI). In 𝑀-BI logic, if 𝑃 |= 𝑄, then 𝑃 ⊢ 𝑄 is derivable. B.4 A 𝑀-BI Model for Independence and Negative Association B.4.1 Independence Implies PNA The proof that independence implies PNA will use the following lemma. Lemma B.4.1. In a distribution 𝜇, if 𝜇 satisfies {𝑆1, 𝑆2}-PNA, 𝜇 satisfies {𝑇1, 𝑇2}- PNA, and 𝑆1 ∪ 𝑆2 is independent from 𝑇1 ∪ 𝑇2 in 𝜇 then 𝜇 is {𝑆1 ∪ 𝑇1, 𝑆2 ∪ 𝑇2}-PNA. Proof. By the definition of PNA and independence, 𝑆1, 𝑆2 are disjoint, 𝑇1, 𝑇2 are disjoint, and 𝑆1 ∪ 𝑇1, 𝑆2 ∪ 𝑇2 are disjoint. For any monotonically decreasing/in- 258 creasing functions 𝑓 : Mem[𝑆1 ∪ 𝑇1] → R+, 𝑔 : Mem[𝑆2 ∪ 𝑇2] → R+, E𝑚∼𝜇 [ 𝑓 (𝜋𝑆1∪𝑇1𝑚) · 𝑔(𝜋𝑆2∪𝑇2𝑚)] = E𝑠∼𝜋𝑆1∪𝑆2 𝜇 E𝑡∼𝜋𝑇1∪𝑇2 𝜇 [ 𝑓 (𝜋𝑆1𝑠 ⊲⊳ 𝜋𝑇1𝑡) · 𝑔(𝜋𝑆2𝑠 ⊲⊳ 𝜋𝑇2𝑡)] (By independence of 𝑆1 ∪ 𝑆2 and 𝑇1 ∪ 𝑇2) ≤ E𝑠∼𝜋𝑆1∪𝑆2 𝜇 ( E𝑡1∼𝜋𝑇1 𝜇 [ 𝑓 (𝜋𝑆1𝑠, 𝑡1)] · E𝑡2∼𝜋𝑇2 𝜇 [𝑔(𝜋𝑆2𝑠, 𝑡2)] ) (♦) ≤ E𝑠1∼𝜋𝑆1 𝜇 E𝑡1∼𝜋𝑇1 𝜇 [ 𝑓 (𝑠1, 𝑡1)] · E𝑠2∼𝜋𝑆2 𝜇 E𝑡2∼𝜋𝑇2 𝜇 [𝑔(𝑠2, 𝑡2)] (♥) ≤ E𝑚∼𝜇 [ 𝑓 (𝜋𝑆1∪𝑇1𝑚)] · E𝑚∼𝜇 [𝑔(𝜋𝑆2∪𝑇2𝑚)] (♣) where the step ♦ is because 𝜋𝑇1∪𝑇2𝜇 is 𝑇1, 𝑇2-PNA and 𝑓 (𝜋𝑆1𝑠, 𝑡1), 𝑔(𝜋𝑆2𝑠, 𝑡2) are both monotonically decreasing/increasing in 𝑇1, 𝑇2; the step ♥ is because 𝜋𝑆1∪𝑆2𝜇 is 𝑆1, 𝑆2-PNA and that E𝑡1∼𝜋𝑇1 𝜇 [ 𝑓 (𝜋𝑆1𝑠, 𝑡1)], and E𝑡2∼𝜋𝑇2 𝜇 [𝑔(𝜋𝑆2𝑠, 𝑡2)] are both monotonically decreasing/increasing in 𝑆1, 𝑆2; and the step ♣ is by indepen- dence of 𝑆1 and 𝑇1 and the independence of 𝑆2 and 𝑇2 in 𝜇. □ Theorem 3.4.5 (Independence implies PNA). Let 𝑆, 𝑇 ⊆ Var be two disjoint sets of variables. Suppose 𝜇1 ∈ D(Mem[𝑆]), 𝜇2 ∈ D(Mem[𝑇]). If 𝜇1 satisfies S-PNA and 𝜇2 satisfies T -PNA, then any 𝜇 ∈ 𝜇𝑆 ⊗D 𝜇𝑇 satisfies (S ∪ T )-PNA. Proof. Fix S and T . Say S = {𝑆1, . . . , 𝑆𝑝} and T = {𝑇1, . . . , 𝑇𝑞}. For any R coarsening S ∪ T , indexing S ∪ T as {𝑈1, . . . ,𝑈𝑝+𝑞}, indexing R as {𝑅1, . . . , 𝑅𝑛}, we have: R = {∪{𝑈 𝑗 | 𝑗 ∈ 𝑓 (𝑖)} | 𝑖 ∈ [𝑛]}. Then, given a family of monotonically increasing/decreasing functions 𝑔𝑖 : 𝑅𝑖 → R+ E𝑚∼𝜇 [ ∏ 𝑅𝑖∈R 𝑔𝑖 (𝜋𝑅𝑖𝑚) ] = E𝑚∼𝜇  ∏ 𝑖∈[𝑛] 𝑔𝑖 (𝜋∪{𝑈 𝑗 | 𝑗∈ 𝑓 (𝑖)}𝑚)  . 
259 For each 𝑖, ∪{𝑈 𝑗 | 𝑗 ∈ 𝑓 (𝑖)} can be divided into the part in 𝑆 and the part in 𝑇 . We refer to them as 𝑆′ 𝑖 and 𝑇 ′ 𝑖 . (Some of 𝑆′ 𝑖 and 𝑇 ′ 𝑖 may be empty). Thus, for each 𝑖, 𝑔𝑖 (𝜋∪{𝑈 𝑗 | 𝑗∈ 𝑓 (𝑖)}𝑚) = 𝑔𝑖 (𝜋𝑆′𝑖∪𝑇 ′𝑖𝑚). By Lemma B.1.3, S′ = {𝑆′1, . . . , 𝑆 ′ 𝑛} coarsens S, and T ′ = {𝑇 ′1, . . . , 𝑇 ′ 𝑛} coarsens T . So 𝜇 is S′-PNA and T ′-PNA. We prove by induction on 𝑘 ∈ [𝑛] that E𝑚∼𝜇  ∏ 𝑖∈[𝑘] 𝑔𝑖 (𝜋𝑆′ 𝑖 ∪𝑇 ′ 𝑖 𝑚)  ≤ ∏ 𝑖∈[𝑘] E𝑚∼𝜇 [ 𝑔𝑖 (𝜋𝑆′ 𝑖 ∪𝑇 ′ 𝑖 𝑚) ] . Base case When 𝑘 = 1, trivial. Inductive case For 𝑘 < 𝑛, assume E𝑚∼𝜇  ∏ 𝑖∈[𝑘] 𝑔𝑖 (𝜋𝑆′ 𝑖 ∪𝑇 ′ 𝑖 𝑚)  ≤ ∏ 𝑖∈[𝑘] E𝑚∼𝜇 [ 𝑔𝑖 (𝜋𝑆′ 𝑖 ∪𝑇 ′ 𝑖 𝑚) ] . Note that 𝜇 is S′-PNA implies that 𝜇 is {∪𝑖∈[𝑘] (𝑆′𝑖), 𝑆′𝑘+1}-PNA, and 𝜇 is T ′- PNA implies that {∪𝑖∈[𝑘] (𝑇 ′𝑖 ), 𝑇 ′𝑘+1}-NA. Thus, by Lemma B.4.1, 𝜇 is also {{∪𝑖∈[𝑘] (𝑆′𝑖) ∪ {∪𝑖∈[𝑘] (𝑇 ′𝑖 ), 𝑆′𝑘+1 ∪ 𝑇 ′ 𝑘+1}-NA. Also, since all 𝑔𝑖 is monotoni- cally increasing (decreasing) and non-negative, 𝑚 ↦→ ∏ 𝑖∈[𝑘] 𝑔𝑖 (𝑚) is also a monotonically increasing (decreasing) function from ∪𝑖∈[𝑘]𝑆′𝑖 ∪ ∪𝑖∈[𝑘]𝑇 ′𝑖 to R+. Therefore, E𝑚∼𝜇  ∏ 𝑖∈[𝑘+1] 𝑔𝑖 (𝜋𝑆′ 𝑖 ∪𝑇 ′ 𝑖 𝑚)  ≤ E𝑚∼𝜇  ∏ 𝑖∈[𝑘] 𝑔𝑖 (𝜋𝑆′ 𝑖 ∪𝑇 ′ 𝑖 𝑚)  · E𝑚∼𝜇 [ 𝑔𝑘+1(𝜋𝑆′ 𝑘+1∪𝑇 ′ 𝑘+1 𝑚) ] ≤ ∏ 𝑖∈[𝑘+1] E𝑚∼𝜇 [ 𝑔𝑖 (𝜋𝑆′ 𝑖 ∪𝑇 ′ 𝑖 𝑚) ] , where the second inequality follows from the inductive hypothesis. 260 Thus, the desired inequality holds for any R coarseningS∪T and any family of monotonically increasing (decreasing) functions on R. Thus, 𝜇 is S∪T -PNA. □ B.4.2 Axioms of Negative Association Lemma 3.5.5 (N-NARY MONOTONE MAP). Let 𝑥, 𝑥𝛾,𝛼 and 𝑦𝛾 be program variables. Let 𝐾𝛾 be natural numbers. The following is valid in (X𝑁𝐴,V∗). |= 𝑁 ⊛ 𝛾=0 ©­« 𝐾𝛾∧ 𝛼=0 Own(𝑥𝛾,𝛼)ª®¬ ∧ 𝑁∧ 𝛾=0 [ 𝑦𝛾 = 𝑓𝛾 ( 𝑥𝛾,0, . . . , 𝑥𝛾,𝐾𝛾 )] → 𝑁 ⊛ 𝛾=0 Own(𝑦𝛾) when 𝑓1, . . . , 𝑓𝑁 all monotone or all antitone (Mono-Map) Proof. Abbreviate the partition of variables {⋃𝐾𝛾 𝛼=0{𝑥𝛾,𝛼} | 𝐿 ≤ 𝛾 ≤ 𝑀 } as 𝑋 [𝐿 : 𝑀]. 
Intuitively, we group all the 𝑥𝛾,𝛼 with the same 𝛾 as a block in the partition, and different blocks in the partition are separated by the separating conjunction ⊛. For any 𝜇 |=⊛𝑁 𝛾=0 (∧𝐾𝛾 𝛼=0 Own(𝑥𝛾,𝛼) ) , we use induction and definition unfold- ing to show that 𝜇 satisfies 𝑋𝑁 -PNA. We choose the inductive hypothesis 𝑃(𝑀) to be: 𝜇 |=⊛𝑀 𝛾=0 (∧𝐾𝛾 𝛼=0 Own(𝑥𝛾,𝛼) ) implies that 𝜇 is 𝑋 [0 : 𝑀]-PNA. Base case: 𝑋 [: 0] = {⋃𝐾𝛾 𝛼=0{𝑥𝛾,𝛼} } is partition that contains a single block. Thus, 𝜇 is trivially 𝑋 [0 : 0]-PNA. Inductive case: For any 0 < 𝑀 ≤ 𝑀 , by satisfaction rules, 𝜇 |= ⊛𝑀 𝛾=0 (∧𝐾𝛾 𝛼=0 Own(𝑥𝛾,𝛼) ) implies there exists 𝜇′, 𝜇1, 𝜇2 such that 𝜇 ⊒ 𝜇′ ∈ 261 𝜇1 ⊕ 𝜇2, 𝜇1 |= 𝑀−1 ⊛ 𝛾=0 ( 𝐾𝛾∧ 𝛼=0 Own(𝑥𝛾,𝛼)) and 𝜇2 |= 𝐾𝑀∧ 𝛼=0 Own(𝑥𝑀,𝛼) By inductive hypothesis, 𝜇1 satisfies 𝑋 [0 : 𝑀 − 1]-PNA. And 𝑋 [𝑀 : 𝑀] is a partition that contains a single block, so trivially, 𝜇2 satisfies 𝑋 [𝑀 : 𝑀]- PNA. Therefore, 𝜇′ ∈ 𝜇1 ⊕ 𝜇2 implies that 𝜇′ is 𝑋 [0 : 𝑀]-PNA Therefore, we can conclude 𝜇 is 𝑋 [0 : 𝑁]-PNA from 𝜇 |=⊛𝑁 𝛾=0 (∧𝐾𝛾 𝛼=0 Own(𝑥𝛾,𝛼) ) . If additionally 𝜇 |= ∧𝑁 𝛾=0 𝑦𝛾 = 𝑓𝛾 (𝑥𝛾,1, . . . , 𝑥𝛾,𝐾𝛾 ) and 𝑓𝛾 are all monotone or antitone, then we can show that 𝜇 is { {𝑦𝛾} | 1 ≤ 𝛾 ≤ 𝑁 } -PNA. For any family of non-negative monotone functions 𝑔𝛾, note that the composed function 𝑔𝛾 ◦ 𝑓𝛾 are either all monotone or all antitone. Thus, E𝑚∼𝜇  ∏ 1≤𝛾≤𝑁 𝑔𝛾 (𝑦𝛾)  = E𝑚∼𝜇  ∏ 1≤𝛾≤𝑁 𝑔𝛾 ( 𝑓𝛾 (𝑥𝛾,1, . . . , 𝑥𝛾,𝐾𝛾 ))  = E𝑚∼𝜇  ∏ 1≤𝛾≤𝑁 (𝑔𝛾 ◦ 𝑓𝛾) (𝑥𝛾,1, . . . , 𝑥𝛾,𝐾𝛾 )  ≤ ∏ 1≤𝛾≤𝑁 E𝑚∼𝜇 [ (𝑔𝛾 ◦ 𝑓𝛾) (𝑥𝛾,1, . . . , 𝑥𝛾,𝐾𝛾 ) ] (Because 𝜇 is 𝑋 [0, 𝑁]-PNA) = ∏ 1≤𝛾≤𝑁 E𝑚∼𝜇 [ 𝑔𝛾 (𝑦𝛾) ] . (B.7) That is, 𝜇 is { {𝑦𝛾} | 1 ≤ 𝛾 ≤ 𝑁 } -PNA. And by Theorem 3.3.1 this implies that {{𝑦𝛾} | 1 ≤ 𝛾 ≤ 𝑁} satisfies NA in 𝜇. Then, by Theorem 3.3.5, (𝜎, 𝜇) |= ⊛𝑁 𝛾=1 Own(𝑦𝛾). □ 262 B.4.3 The Restriction Property of 𝑀-BI Formulas For the counterexample of the restriction property, we prove a lemma. Lemma B.4.2. Let 𝜇 be the uniform distribution over one hot vectors on 𝐴, 𝐵. 
Then, 𝜇 |= (Unif{0,1}⟨𝐶⟩) −⊛ (Own(𝐵) ∗ Own(𝐶)). Proof. Fix any 𝜇𝐶 such that 𝜇𝐶 |= Unif{0,1}⟨𝐶⟩, which implies that 𝜋𝐶𝜇𝐶 (0) = 0.5 and 𝜋𝐶𝜇𝐶 (1) = 0.5. Fix 𝜇𝑒 ∈ 𝜇 ⊕ 𝜇𝐶 . Since 𝐵 ∈ dom(𝜇), 𝜇 is trivially {{𝐵}}-PNA. Similarly, 𝜇𝐶 is trivially {{𝐶}}- PNA. Thus, 𝜇𝑒 ∈ 𝜇 ⊕ 𝜇𝐶 must be {{𝐵}, {𝐶}}-PNA. Then for any two both monotone or antitone functions 𝑓 : Mem[𝐵] → R+, 𝑔 : Mem[𝐶] → R+, E𝑚∼𝜇𝑒 [ 𝑓 (𝜋𝐵𝑚) · 𝑔(𝜋𝐶𝑚)] ≤ E𝑚∼𝜇𝑒 [ 𝑓 (𝜋𝐵𝑚)] · E𝑚∼𝜇𝑒 [𝑔(𝜋𝐶𝑚)] . Similarly, 𝜇𝑒 ∈ 𝜇 ⊕ 𝜇𝐶 must be {{𝐴}, {𝐶}}-PNA, and thus, for any two both monotone or antitone functions 𝑓 : Mem[𝐴] → R+, 𝑔 : Mem[𝐶] → R+, E𝑚∼𝜇𝑒 [ 𝑓 (𝜋𝐴𝑚) · 𝑔(𝜋𝐶𝑚)] ≤ E𝑚∼𝜇𝑒 [ 𝑓 (𝜋𝐴𝑚)] · E𝑚∼𝜇𝑒 [𝑔(𝑝𝑖𝐶𝑚)] . (B.8) Next, we want to prove that 𝜇𝑒 |= Own(𝐵) ∗ Own(𝐶). We prove by contradiction. Suppose variables 𝐵 and 𝐶 are not independent in 𝜇𝑒, then by Lemma B.1.1 that says NA definition with equality instead of inequality asserts independence, there must exists some both monotone or both antitone functions 𝑓 : Mem[𝐵] → R+, 𝑔 : Mem[𝐶] → R+ such that E𝑚∼𝜇𝑒 [ 𝑓 (𝜋𝐵𝑚) · 𝑔(𝜋𝐶𝑚)] < E𝑚∼𝜇𝑒 [ 𝑓 (𝜋𝐵𝑚)] · E𝑚∼𝜇𝑒 [𝑔(𝜋𝐶𝑚)] Since 𝜇𝑒 ∈ 𝜇 ⊕ 𝜇𝐶 , we have 𝜇𝑒 ⊒ 𝜇, and 𝜇 being a uniform distribution over one-hot vectors on 𝐴, 𝐵 indicates that for any 𝑚 in the support of 𝜇𝑒, 𝐴 = 1 iff 263 𝐵 = 0, and 𝐴 = 0 iff 𝐵 = 1. Therefore, E𝑚∼𝜇𝑒 [ 𝑓 (−𝜋𝐴𝑚) · (−𝑔(𝜋𝐶𝑚))] = E𝑚∼𝜇𝑒 [ 𝑓 (𝜋𝐵𝑚) · (−𝑔(𝜋𝐶𝑚))] = −E𝑚∼𝜇𝑒 [ 𝑓 (𝜋𝐵𝑚) · 𝑔(𝜋𝐶𝑚)] > −E𝑚∼𝜇𝑒 [ 𝑓 (−𝜋𝐴𝑚)] · E𝑚∼𝜇𝑒 [𝑔(𝜋𝐶𝑚)] = E𝑚∼𝜇𝑒 [ 𝑓 (−𝜋𝐴𝑚)] · E𝑚∼𝜇𝑒 [−𝑔(𝜋𝐶𝑚)] where 𝑥 ↦→ 𝑓 (−𝑥) and 𝑥 → −𝑔(𝑥) are two both monotone or both antitone func- tions because 𝑓 , 𝑔 are so. Thus, this inequality contradicts Equation (B.8). Therefore, 𝐵 and 𝐶 must be independent in 𝜇𝑒. Hence, 𝜇𝑒 |= Own(𝐵) ∗ Own(𝐶), and 𝜇 |= (Unif{0,1}⟨𝐶⟩) −⊛ (Own(𝐵) ∗ Own(𝐶)). □ Theorem 3.5.2. There exists 𝜇 ∈ D(Mem[𝑆]) and formula 𝜑 such that 𝜇 |= 𝜑 but 𝜋FV(𝜑) ̸ |= 𝜑. Proof. Let 𝐴, 𝐵, 𝐶 be three variables in Var. Let 𝜑 = (Unif{0,1}⟨𝐶⟩) −⊛ (Own(𝐵) ∗ Own(𝐶)). Let 𝜇 be the uniform distribution over one hot vectors on 𝐴, 𝐵. Then, we claim 𝜇 |= 𝜑 but 𝜋{𝐵,𝐶}𝜇 ̸ |= 𝜑. 
For 𝜇 |= 𝜑, it suffices to show that for any 𝜇𝐶 in which 𝐶's value is uniformly distributed on {0, 1}, any 𝜇′ ⊒ 𝜇, and any 𝜇𝑒 ∈ 𝜇′ ⊕ 𝜇𝐶, the variables 𝐵 and 𝐶 are independent in 𝜇𝑒, which holds by Lemma B.4.2.

To show 𝜋{𝐵,𝐶}𝜇 ̸|= 𝜑, we first note that 𝜋{𝐵,𝐶}𝜇 = 𝜋{𝐵}𝜇 is a uniform distribution over 0 and 1 on 𝐵. Let 𝜇′𝐶 ∈ D(Mem[{𝐶}]) be the uniform distribution on {0, 1}, and let 𝜇′ ∈ D(Mem[{𝐵,𝐶}]) be the uniform distribution over one-hot vectors on 𝐵,𝐶. Clearly, 𝐵,𝐶 are not independent in 𝜇′, so 𝜇′ ̸|= Own(𝐵) ∗ Own(𝐶). Also, 𝜇′ is in 𝜋{𝐵,𝐶}𝜇 ⊕ 𝜇′𝐶. So 𝜋{𝐵,𝐶}𝜇 ̸|= Unif{0,1}⟨𝐶⟩ −⊛ (Own(𝐵) ∗ Own(𝐶)). □

Theorem 3.5.1 (Restriction). For any distribution 𝜇 ∈ 𝑋D, any MBI+ formula 𝜑 interpreted on (X𝑁𝐴,V∗), and any valuation V, we have 𝜇 |=V 𝜑 ⇔ 𝜋FV(𝜑)𝜇 |=V 𝜑.

Proof. We prove it by induction on the syntax of the formula. Most cases are the same as in lemma 2.3.7, so we only show the additional case 𝑃 ⊛ 𝑄.

𝜑 = 𝑃 ⊛ 𝑄: Assuming 𝜇 |= 𝑃 ⊛ 𝑄, there exist 𝜇′, 𝜇1, 𝜇2 such that 𝜇 ⊒ 𝜇′ ∈ 𝜇1 ⊕ 𝜇2, 𝜇1 |= 𝑃, and 𝜇2 |= 𝑄. By the inductive hypothesis, 𝜋FV(𝜑)𝜇1 |= 𝑃 and 𝜋FV(𝜑)𝜇2 |= 𝑄. Also, by eq. (Down-Closed), there exists 𝜇′′ ⊑ 𝜇′ such that 𝜇′′ ∈ 𝜋FV(𝜑)𝜇1 ⊕ 𝜋FV(𝜑)𝜇2. So 𝜇′′ |= 𝑃 ⊛ 𝑄, and by persistence, 𝜇 |= 𝑃 ⊛ 𝑄. □

APPENDIX C

DIBI: A BUNCHED LOGIC FOR CONDITIONAL INDEPENDENCE

C.1 A Probabilistic Model of DIBI

Remark. In the following, we sometimes abbreviate dom( 𝑓𝑖) as 𝐷𝑖 and range( 𝑓𝑖) as 𝑅𝑖.

C.1.1 Well-definedness of the Structure

To facilitate proving that (X𝐶𝐼, ⊑, ⊕̂, ⊙̂, 𝐸) is a DIBI frame, we first show some properties of the binary operations and the order. First, we prove that X𝐶𝐼 is closed under ⊕ and ⊙.

Lemma C.1.1. X𝐶𝐼 is closed under ⊕ and ⊙.

Proof. For any 𝑓1, 𝑓2 ∈ X𝐶𝐼, we need to show that

• If 𝑓1 ⊕ 𝑓2 is defined, then 𝑓1 ⊕ 𝑓2 ∈ X𝐶𝐼. Recall that 𝑓1 ⊕ 𝑓2 is defined if and only if 𝑅1 ∩ 𝑅2 = 𝐷1 ∩ 𝐷2, which implies that (𝑅1 ∪ 𝑅2) \ (𝐷1 ∪ 𝐷2) = (𝑅1 \ 𝐷1) ⊎ (𝑅2 \ 𝐷2).
State 𝑓1 ⊕ 𝑓2 preserves the input because for any 𝑑 ∈ Mem[𝐷1 ∪ 𝐷2], we 266 can obtain the following (we will refer to this as ★): (𝜋𝐷1∪𝐷2 ( 𝑓1 ⊕ 𝑓2)) (𝑑) (𝑑) = ∑︁ 𝑥∈Mem[(𝑅1∪𝑅2)\(𝐷1∪𝐷2)] ( 𝑓1 ⊕ 𝑓2) (𝑑) (𝑑 ⊲⊳ 𝑥) = ∑︁ 𝑥1∈Mem[𝑅1\𝐷1], 𝑥2∈Mem[𝑅2\𝐷2] 𝑓1(𝑑𝐷1) (𝑑𝐷1 ⊲⊳ 𝑥1) · 𝑓2(𝑑𝐷2) (𝑑𝐷2 ⊲⊳ 𝑥2) = ©­« ∑︁ 𝑥1∈Mem[𝑅1\𝐷1] 𝑓1(𝑑𝐷1) (𝑑𝐷1 ⊲⊳ 𝑥1)ª®¬ · ©­« ∑︁ 𝑥2∈Mem[𝑅2\𝐷2] 𝑓2(𝑑𝐷2) (𝑑𝐷2 ⊲⊳ 𝑥2)ª®¬ = 1 · 1 = 1 (Using 𝑓1, 𝑓2 ∈ X𝐶𝐼) Then, for any input 𝑑 ∈ Mem[𝐷1 ∪ 𝐷2], ( 𝑓1 ⊕ 𝑓2) (𝑑) is a distribution since:∑︁ 𝑚∈Mem[𝑅1∪𝑅2] ( 𝑓1 ⊕ 𝑓2) (𝑑) (𝑚) = ∑︁ 𝑚∈Mem[𝑅1∪𝑅2] 𝑓1(𝑑𝐷1) (𝑚𝑅1) · 𝑓2(𝑑𝐷2) (𝑚𝑅2) ‡ = ∑︁ 𝑥1∈Mem[𝑅1\𝐷1], 𝑥2∈Mem[𝑅2\𝐷2] 𝑓1(𝑑𝐷1) (𝑑𝐷1 ⊲⊳ 𝑥1) · 𝑓2(𝑑𝐷2) (𝑑𝐷2 ⊲⊳ 𝑥2) = 1 (Using the last two steps of (★)) Step ‡ follows 𝑓1 and 𝑓2 being input-preserving, which means that the term 𝑓𝑖 (𝜋𝑑𝐷𝑖) (𝜋𝑚𝑅𝑖) is 0 when 𝑑𝐷𝑖 ≠ 𝑚𝐷𝑖 . Thus, 𝑓1 ⊕ 𝑓2 is a kernel in X𝐶𝐼 . • If 𝑓1 ⊙ 𝑓2 is defined, then 𝑓1 ⊙ 𝑓2 ∈ X𝐶𝐼 . Recall that 𝑓1 ⊙ 𝑓2 : Mem[𝐷1] → D(Mem[𝑅2]) is defined iff 𝑅1 = 𝐷2. The composition 𝑓1 ⊙ 𝑓2 preserves the input because for any 𝑑 ∈ Mem[𝐷1], we 267 can obtain (♠): (𝜋𝐷1 𝑓1 ⊙ 𝑓2) (𝑑) (𝑑) = ∑︁ 𝑥∈Mem[𝑅2\𝐷1] ( 𝑓1 ⊙ 𝑓2) (𝑑) (𝑑 ⊲⊳ 𝑥) = ∑︁ 𝑥∈Mem[𝑅2\𝐷1] 𝑓1(𝑑) (𝑑 ⊲⊳ 𝑥𝑅1\𝐷1) · 𝑓2(𝑑 ⊲⊳ 𝑥𝑅1\𝐷1) (𝑑 ⊲⊳ 𝑥) = ∑︁ 𝑥1∈Mem[𝑅1\𝐷1] 𝑓1(𝑑) (𝑑 ⊲⊳ 𝑥1) · ©­« ∑︁ 𝑥2∈Mem[𝑅2\𝑅1] 𝑓2(𝑑 ⊲⊳ 𝑥1) (𝑑 ⊲⊳ 𝑥1 ⊲⊳ 𝑥2)ª®¬ = ∑︁ 𝑥1∈Mem[𝑅1\𝐷1] ( 𝑓1(𝑑) (𝑑 ⊲⊳ 𝑥1) · 1) (Using 𝑓2 ∈ X𝐶𝐼) = 1 Then, for any 𝑑 ∈ 𝐷1, ( 𝑓1 ⊙ 𝑓2) (𝑑) is a distribution as∑︁ 𝑚∈Mem[𝑅2] ( 𝑓1 ⊙ 𝑓2) (𝑑) (𝑚) = ∑︁ 𝑚∈Mem[𝑅2] 𝑓1(𝑑) (𝑚𝑅1) · 𝑓2(𝑚𝑅1) (𝑚) (Equation (4.2)) ♥ = ∑︁ 𝑥∈Mem[𝑅2\𝐷1] 𝑓1(𝑑) (𝑑 ⊲⊳ 𝑥𝑅1\𝐷1) · 𝑓2(𝑑 ⊲⊳ 𝑥𝑅1\𝐷1) (𝑑 ⊲⊳ 𝑥) = 1 (Using the last three steps of (♠)) Step ♥ follows from 𝑓1, 𝑓2 being input-preserving, so the 𝑓𝑖 terms are 0 when 𝑑𝐷𝑖 ≠ 𝑚𝐷𝑖 . Thus 𝑓1 ⊙ 𝑓2 is a kernel in X𝐶𝐼 . □ Lemma C.1.2 (Reflexivity and transitivity of order). The order ⊑ defined in X𝐶𝐼 is transitive and reflexive. 268 Proof. Let 𝑥 : Mem[𝐴] → D(Mem[𝑋]) ∈ 𝑀 , 𝑆 = ∅, 𝑣 = unit𝑋 . 
Then (𝑥 ⊕ unit𝑆) ⊙ 𝑣 = (𝑥 ⊕ unit∅) ⊙ unit𝑋 = 𝑥 ⊙ unit𝑋 (By lemma C.1.7) = 𝑥 Thus, we have 𝑥 ⊑ 𝑥, and the order is reflexive. For any 𝑥, 𝑦, 𝑧 ∈ 𝑀 , if 𝑥 ⊑ 𝑦 and 𝑦 ⊑ 𝑧, then by definition of ⊑, there exist 𝑆1 and 𝑣1 such that 𝑦 = (𝑥 ⊕ unit𝑆1) ⊙ 𝑣1, and there exist 𝑆2 and 𝑣2 such that 𝑧 = (𝑦 ⊕ unit𝑆2) ⊙ 𝑣2. We can now calculate: 𝑧 = (𝑦 ⊕ unit𝑆2) ⊙ 𝑣2 = (((𝑥 ⊕ unit𝑆1) ⊙ 𝑣1) ⊕ unit𝑆2) ⊙ 𝑣2 = (((𝑥 ⊕ unit𝑆1) ⊙ 𝑣1) ⊕ (unit𝑆2 ⊙ unit𝑆2)) ⊙ 𝑣2 = (𝑥 ⊕ unit𝑆1 ⊕ unit𝑆2) ⊙ (𝑣1 ⊕ unit𝑆2) ⊙ 𝑣2 (By C.1.8 and lemma C.1.9) = (𝑥 ⊕ unit𝑆1∪𝑆2) ⊙ ((𝑣1 ⊕ unit𝑆2) ⊙ 𝑣2) X𝐶𝐼 is closed under ⊕, ⊙, so (𝑣1 ⊕ unit𝑆2) ⊙ 𝑣2 ∈ X𝐶𝐼 . Thus, 𝑧 = (𝑥 ⊕ unit𝑆1∪𝑆2) ⊙ (𝑣1 ⊕ unit𝑆2) ⊙ 𝑣2 showing that 𝑥 ⊑ 𝑧. So the order is transitive. □ Next we prove that the parallel composition ⊕ is associative, commutative and identify its identity. 269 C.1.2 Associativity of Parallel Composition Lemma C.1.3 (⊕ - Associativity). We show that when ( 𝑓 ⊕ 𝑔) ⊕ ℎ and 𝑓 ⊕ (𝑔 ⊕ ℎ) are defined, ( 𝑓 ⊕ 𝑔) ⊕ ℎ = 𝑓 ⊕ (𝑔 ⊕ ℎ). Proof. Consider 𝑓 : Mem[𝑆] → D(Mem[𝑆∪𝑇]), 𝑔 : Mem[𝑈] → D(Mem[𝑈∪𝑉]), and ℎ : Mem[𝑊] → D(Mem[𝑊 ∪ 𝑋]). For any 𝑑 ∈ Mem[𝑆 ∪ 𝑈 ∪𝑊], and 𝑚 ∈ Mem[𝑆 ∪ 𝑇 ∪𝑈 ∪𝑉 ∪𝑊 ∪ 𝑋], (( 𝑓 ⊕ 𝑔) ⊕ ℎ) (𝑑) (𝑚) = ( 𝑓 (𝑑𝑆) (𝑚𝑆∪𝑇 ) · 𝑔(𝑑𝑈) (𝑚𝑈∪𝑉 ) ) · ℎ(𝑑𝑊 ) (𝑚𝑊∪𝑋) (def. ⊕) = 𝑓 (𝑑𝑆) (𝑚𝑆∪𝑇 ) · ( 𝑔(𝑑𝑈) (𝑚𝑈∪𝑉 ) · ℎ(𝑑𝑊 ) (𝑚𝑊∪𝑋) ) = ( 𝑓 ⊕ (𝑔 ⊕ ℎ)) (𝑑) (𝑚) □ Lemma C.1.4 (Standard associativity of ⊕). For any 𝑓1, 𝑓2, 𝑓3 ∈ 𝑀 , ( 𝑓1 ⊕ 𝑓2) ⊕ 𝑓3 is defined if and only if 𝑓1 ⊕ ( 𝑓2 ⊕ 𝑓3) is defined and they are equal. Proof. To show that ( 𝑓1 ⊕ 𝑓2) ⊕ 𝑓3 is defined if and only if 𝑓1 ⊕ ( 𝑓2 ⊕ 𝑓3) is defined, it suffices to show that 𝑅1 ∩ 𝑅2 = 𝐷1 ∩ 𝐷2 (C.1) (𝑅1 ∪ 𝑅2) ∩ 𝑅3 = (𝐷1 ∪ 𝐷2) ∩ 𝐷3 (C.2) if and only if 𝑅2 ∩ 𝑅3 = 𝐷2 ∩ 𝐷3 (C.3) 𝑅1 ∩ (𝑅2 ∪ 𝑅3) = 𝐷1 ∩ (𝐷2 ∪ 𝐷3) (C.4) We show that eq. (C.3) and eq. (C.4) follows from eq. (C.1) and eq. (C.2): Recall that 𝐷1 ⊆ 𝑅1, 𝐷2 ⊆ 𝑅2, 𝐷3 ⊆ 𝑅3, so 270 • Equation (C.3) follows from 𝐷2 ∩ 𝐷3 ⊆ 𝑅2 ∩ 𝑅3 and 𝐷2 ∩ 𝐷3 ⊇ 𝑅2 ∩ 𝑅3, which holds because 𝑅2 ∩ 𝑅3 = 𝑅2 ∩ ((𝑅1 ∪ 𝑅2) ∩ 𝑅3) = 𝑅2 ∩ ((𝐷1 ∪ 𝐷2) ∩ 𝐷3) (By eq. 
(C.2)) = 𝑅2 ∩ ((𝐷1 ∩ 𝐷3) ∪ (𝐷2 ∩ 𝐷3)) ⊆ ((𝑅2 ∩ 𝑅1) ∩ 𝐷3) ∪ (𝐷2 ∩ 𝐷3) (By 𝐷1 ⊆ 𝑅1) = ((𝐷2 ∩ 𝐷1) ∩ 𝐷3) ∪ (𝐷2 ∩ 𝐷3) (By eq. (C.1)) ⊆ 𝐷2 ∩ 𝐷3 • Equation (C.4) follows from (𝐷1 ∪ 𝐷2) ∩ 𝐷3 ⊆ (𝑅1 ∪ 𝑅2) ∩ 𝑅3 and (𝐷1 ∪ 𝐷2) ∩ 𝐷3 ⊇ (𝑅1 ∪ 𝑅2) ∩ 𝑅3, which holds because 𝑅1 ∩ (𝑅2 ∪ 𝑅3) = (𝑅1 ∩ 𝑅2) ∪ (𝑅1 ∩ 𝑅3) ⊆ (𝑅1 ∩ 𝑅2) ∪ (𝑅1 ∩ (𝑅1 ∪ 𝑅2) ∩ 𝑅3) = (𝐷1 ∩ 𝐷2) ∪ (𝑅1 ∩ (𝐷1 ∪ 𝐷2) ∩ 𝐷3) (By eq. (C.1) and eq. (C.2)) = (𝐷1 ∩ 𝐷2) ∪ ((𝑅1 ∩ 𝐷1 ∩ 𝐷3) ∪ (𝑅1 ∩ 𝐷2 ∩ 𝐷3)) ⊆ (𝐷1 ∩ 𝐷2) ∪ ((𝐷1 ∩ 𝐷3) ∪ (𝑅1 ∩ 𝑅2 ∩ 𝐷3)) (By 𝐷2 ⊆ 𝑅2) ⊆ (𝐷1 ∩ 𝐷2) ∪ ((𝐷1 ∩ 𝐷3) ∪ (𝐷1 ∩ 𝐷2 ∩ 𝐷3)) (By eq. (C.1)) ⊆ (𝐷1 ∩ 𝐷2) ∪ (𝐷1 ∩ 𝐷3) = 𝐷1 ∩ (𝐷2 ∪ 𝐷3) We show that eq. (C.1) and eq. (C.2) follows from eq. (C.3) and eq. (C.4): • Equation (C.1) follows from 𝐷1 ∩ 𝐷2 ⊆ 𝑅1 ∩ 𝑅2 and 𝐷1 ∩ 𝐷2 ⊇ 𝑅1 ∩ 𝑅2, 271 which holds because 𝑅1 ∩ 𝑅2 = (𝑅1 ∩ (𝑅2 ∪ 𝑅3)) ∩ 𝑅2 = (𝐷1 ∩ (𝐷2 ∪ 𝐷3)) ∩ 𝑅2 (By eq. (C.4)) = 𝐷1 ∩ ((𝐷2 ∩ 𝑅2) ∪ (𝐷3 ∩ 𝑅2)) = 𝐷1 ∩ (𝐷2 ∪ (𝐷3 ∩ 𝑅2)) ⊆ 𝐷1 ∩ (𝐷2 ∪ (𝑅3 ∩ 𝑅2)) (By 𝐷3 ⊆ 𝑅3) = 𝐷1 ∩ (𝐷2 ∪ (𝐷3 ∩ 𝐷2)) (By eq. (C.3)) = 𝐷1 ∩ 𝐷2 • Equation (C.2) follows from (𝐷1 ∪ 𝐷2) ∩ 𝐷3 ⊆ (𝑅1 ∪ 𝑅2) ∩ 𝑅3 and (𝐷1 ∪ 𝐷2) ∩ 𝐷3 ⊇ (𝑅1 ∪ 𝑅2) ∩ 𝑅3, which holds because (𝑅1 ∪ 𝑅2) ∩ 𝑅3 = (𝑅1 ∩ 𝑅3) ∪ (𝑅2 ∩ 𝑅3) = (𝑅1 ∩ (𝑅2 ∪ 𝑅3) ∩ 𝑅3) ∪ (𝑅2 ∩ 𝑅3) = (𝐷1 ∩ (𝐷2 ∪ 𝐷3) ∩ 𝑅3) ∪ (𝐷2 ∩ 𝐷3) (By eq. (C.4)) = (𝐷1 ∩ ((𝐷2 ∩ 𝑅3) ∪ (𝐷3 ∩ 𝑅3))) ∪ (𝐷2 ∩ 𝐷3) ⊆ (𝐷1 ∩ ((𝑅2 ∩ 𝑅3) ∪ 𝐷3)) ∪ (𝐷2 ∩ 𝐷3) (By 𝐷2 ⊆ 𝑅2, 𝐷3 ⊆ 𝑅3) = (𝐷1 ∩ ((𝐷2 ∩ 𝐷3) ∪ 𝐷3)) ∪ (𝐷2 ∩ 𝐷3) (By eq. (C.3)) = (𝐷1 ∩ 𝐷3) ∪ (𝐷2 ∩ 𝐷3) = (𝐷1 ∪ 𝐷2) ∩ 𝐷3 Thus, eq. (C.1) and eq. (C.2) hold if and only if eq. (C.3) and eq. (C.4) hold. Therefore, ( 𝑓1 ⊕ 𝑓2) ⊕ 𝑓3 is defined if and only if 𝑓1 ⊕ ( 𝑓2 ⊕ 𝑓3) is defined. By lemma C.1.3, they are equal when both defined. □ 272 C.1.3 Commutativity of Parallel Composition Lemma C.1.5 (⊕ - Commutativity). When 𝑓1 ⊕ 𝑓2 and 𝑓2 ⊕ 𝑓1 are both defined, 𝑓1 ⊕ 𝑓2 = 𝑓2 ⊕ 𝑓1. Proof. For any 𝑑 ∈ Mem[𝐷1 ∪ 𝐷2], 𝑚 ∈ D(Mem[𝑅1 ∪ 𝑅2]) such that 𝑑 ⊲⊳ 𝑚 is defined, ( 𝑓1 ⊕ 𝑓2) (𝑑) (𝑚) = 𝑓1(𝑑𝐷1) (𝑚𝑅1) · 𝑓2(𝑑𝐷2) (𝑚𝑅2) = 𝑓2(𝑑𝐷2) (𝑚𝑅2) · 𝑓1(𝑑𝐷1) (𝑚𝑅1) = ( 𝑓2 ⊕ 𝑓1) (𝑑) (𝑚) Thus, 𝑓1 ⊕ 𝑓2 = 𝑓2 ⊕ 𝑓1. 
□ Lemma C.1.6 (⊕ - Identity). For any 𝑓 : Mem[𝐴] → D(Mem[𝐴 ∪ 𝑋]) ∈ 𝑀 , and any 𝑆 ⊆ 𝐴, we must show 𝑓 ⊕ unit𝑆 = 𝑓 Proof. Since 𝑆 ⊆ 𝐴, we have dom( 𝑓 ⊕unit𝑆) = 𝐴∪𝑆 = 𝐴 = dom( 𝑓 ) and range( 𝑓 ⊕ unit𝑆) = 𝐴 ∪ 𝑋 ∪ 𝑆 = 𝐴 ∪ 𝑋 = range( 𝑓 ). For any 𝑑 ∈ Mem[𝐴], and any 𝑟 ∈ Mem[𝐴 ∪ 𝑋] such that 𝑑 ⊲⊳ 𝑟 is defined, we have ( 𝑓 ⊕ unit𝑆) (𝑑) (𝑟) = 𝑓 (𝑑) (𝑟) · unit(𝑑𝑆) (𝑟𝑆) = 𝑓 (𝑑) (𝑟) · 1 = 𝑓 (𝑑) (𝑟) If 𝑑 ⊲⊳ 𝑟 is not defined, then ( 𝑓 ⊕ unit𝑆) (𝑑) (𝑟) = 𝑓 (𝑑) (𝑟). Hence, 𝑓 ⊕ unit𝑆 = 𝑓 . □ 273 C.1.4 Other Properties Used in Proving Frame Conditions Lemma C.1.7. For any 𝑓 : Mem[𝐴] → D(Mem[𝐴 ∪ 𝑋]) ∈ 𝑀 , and any 𝑆 ⊆ 𝐴, we have 𝑓 ⊕ unit𝑆 = 𝑓 Proof. Since 𝑆 ⊆ 𝐴, we have dom( 𝑓 ⊕unit𝑆) = 𝐴∪𝑆 = 𝐴 = dom( 𝑓 ) and range( 𝑓 ⊕ unit𝑆) = 𝐴 ∪ 𝑋 ∪ 𝑆 = 𝐴 ∪ 𝑋 = range( 𝑓 ). For any 𝑑 ∈ Mem[𝐴], and any 𝑟 ∈ Mem[𝐴 ∪ 𝑋] such that 𝑑 ⊗ 𝑟 is defined, we have ( 𝑓 ⊕ unit𝑆) (𝑑) (𝑟) = 𝑓 (𝑑) (𝑟) · unit(𝑑𝑆) (𝑟𝑆) = 𝑓 (𝑑) (𝑟) · 1 = 𝑓 (𝑑) (𝑟) Hence, 𝑓 ⊕ unit𝑆 = 𝑓 . □ Lemma C.1.8 (Reverse Exchange Equality). We show that when both ( 𝑓1 ⊕ 𝑓2) ⊙ ( 𝑓3 ⊕ 𝑓4) and ( 𝑓1 ⊙ 𝑓3) ⊕ ( 𝑓2 ⊙ 𝑓4) are defined, it holds that ( 𝑓1 ⊕ 𝑓2) ⊙ ( 𝑓3 ⊕ 𝑓4) = ( 𝑓1 ⊙ 𝑓3) ⊕ ( 𝑓2 ⊙ 𝑓4). (C.5) Proof. First, the well-definedness of 𝑓1 ◦ 𝑓3 implies that 𝐷1 ⊆ 𝑅1 = 𝐷3 ⊆ 𝑅3, and the well-definedness of 𝑓2 ◦ 𝑓4 implies that 𝐷2 ⊆ 𝑅2 = 𝐷4 ⊆ 𝑅4. Moreover, both terms are of type Mem[𝐷1 ∪ 𝐷2] → D(Mem[𝑅3 ∪ 𝑅4]), and, for any 𝑑 ∈ 274 Mem[𝐷1 ∪ 𝐷2] and 𝑚 ∈ Mem[𝑅3 ∪ 𝑅4], we have: ( ( 𝑓1 ⊕ 𝑓2) ⊙ ( 𝑓3 ⊕ 𝑓4) ) (𝑑) (𝑚) = ( 𝑓1 ⊕ 𝑓2) (𝑑) (𝑚𝑅1∪𝑅2) · ( 𝑓3 ⊕ 𝑓4) (𝑚𝐷3∪𝐷4) (𝑚) (Equation (4.2)) = ( 𝑓1(𝑑𝐷1) (𝑚𝑅1) · 𝑓2(𝑑𝐷2) (𝑚𝑅2) ) · ( 𝑓3(𝑚𝐷3) (𝑚𝑅3) · 𝑓4(𝑚𝐷4) (𝑚𝑅4) ) ( ( 𝑓1 ⊙ 𝑓3) ⊕ ( 𝑓2 ⊙ 𝑓4) ) (𝑑) (𝑚) = ( 𝑓1 ⊙ 𝑓3) (𝑑𝐷1) (𝑚𝑅3) · ( 𝑓2 ⊙ 𝑓4) (𝑑𝐷2) (𝑚𝑅3) = ( 𝑓1(𝑑𝐷1) (𝑚𝑅1) · 𝑓3(𝑑𝐷3) (𝑚𝑅3) ) · ( 𝑓2(𝑑𝐷2) (𝑚𝑅2) · 𝑓4(𝑑𝐷4) (𝑚𝑅4) ) = ( 𝑓1(𝑑𝐷1) (𝑚𝑅1) · 𝑓2(𝑑𝐷2) (𝑚𝑅2) ) · ( 𝑓3(𝑚𝐷3) (𝑚𝑅3) · 𝑓4(𝑚𝐷4) (𝑚𝑅4) ) Thus, ( 𝑓1 ⊙ 𝑓3) ⊕ ( 𝑓2 ⊙ 𝑓4) = ( 𝑓1 ⊕ 𝑓2) ⊙ ( 𝑓3 ⊕ 𝑓4). □ Lemma C.1.9. 
For any 𝑓1, 𝑓2, 𝑓3, 𝑓4 in X𝐶𝐼 , ( 𝑓1 ⊙ 𝑓3) ⊕ ( 𝑓2 ⊙ 𝑓4) is defined implies ( 𝑓1 ⊕ 𝑓2) ⊙ ( 𝑓3 ⊕ 𝑓4) is also defined. The converse does not always hold, but if in addition, 𝑓1 ⊙ 𝑓3 and 𝑓2 ⊙ 𝑓4 are defined, then ( 𝑓1 ⊕ 𝑓2) ⊙ ( 𝑓3 ⊕ 𝑓4) is defined implies ( 𝑓1 ⊙ 𝑓3) ⊕ ( 𝑓2 ⊙ 𝑓4) is defined too. Proof. We prove each direction individually: • Given ( 𝑓1 ⊙ 𝑓3) ⊕ ( 𝑓2 ⊙ 𝑓4) is defined, it must that 𝑅1 = 𝐷3, 𝑅2 = 𝐷4, and 𝑅3 ∩ 𝑅4 = 𝐷1 ∩ 𝐷2. Thus, 𝑅1 ∩ 𝑅2 = 𝐷3 ∩ 𝐷4 ⊆ 𝑅3 ∩ 𝑅4 = 𝐷1 ∩ 𝐷2, ensuring that 𝑓1 ⊕ 𝑓2 is defined; 𝑅3 ∩ 𝑅4 = 𝐷1 ∩ 𝐷2 ⊆ 𝑅1 ∩ 𝑅2 = 𝐷3 ∩ 𝐷4, ensuring that 𝑓3 ⊕ 𝑓4 is defined; range( 𝑓1⊕ 𝑓2) = 𝑅1∪𝑅2 = 𝐷3∪𝐷4 = dom( 𝑓3⊕ 𝑓4), ensuring ( 𝑓1⊕ 𝑓2)⊙( 𝑓3⊕ 𝑓4) is defined. • Given 𝑓1 ⊙ 𝑓3 and 𝑓2 ⊙ 𝑓4 are defined, ( 𝑓1 ⊙ 𝑓3) ⊕ ( 𝑓2 ⊙ 𝑓4) is defined if 275 𝑅3 ∩ 𝑅4 = 𝐷1 ∩ 𝐷2. When ( 𝑓1 ⊕ 𝑓2) ⊙ ( 𝑓3 ⊕ 𝑓4) is defined, 𝑅3 ∩ 𝑅4 = 𝐷3 ∩ 𝐷4 (Because 𝑓3 ⊕ 𝑓4 is defined) = 𝑅1 ∩ 𝑅2 (Because 𝑓1 ⊙ 𝑓3 and 𝑓2 ⊙ 𝑓4 are defined) = 𝐷1 ∩ 𝐷2 (Because 𝑓1 ⊕ 𝑓2 is defined) So ( 𝑓1 ⊙ 𝑓3) ⊕ ( 𝑓2 ⊙ 𝑓4) is also defined. □ C.1.5 Main Theorem: Proving Frame Conditions Theorem 4.2.1. (X𝐶𝐼 , ⊑, ⊕̂, ⊙̂,X𝐶𝐼) is a DIBI frame. Proof. We restate the frame conditions using concrete definitions of ⊕ and ⊙ and then check that they hold. ⊕ Down-Closed We want to show that for any 𝑥′, 𝑥, 𝑦′, 𝑦 ∈ 𝑀 , if 𝑥′ ⊑ 𝑥 and 𝑦′ ⊑ 𝑦 and 𝑥 ⊕ 𝑦 = 𝑧, then 𝑥′ ⊕ 𝑦′ is defined, and 𝑥′ ⊕ 𝑦′ = 𝑧′ ⊑ 𝑧. Since 𝑥′ ⊑ 𝑥 and 𝑦′ ⊑ 𝑦, there exist sets 𝑆1, 𝑆2, and 𝑣1, 𝑣2 ∈ 𝑀 such that 𝑥 = (𝑥′ ⊕ unit𝑆1) ⊙ 𝑣1, and 𝑦 = (𝑦′ ⊕ unit𝑆2) ⊙ 𝑣2. Thus, 𝑥 ⊕ 𝑦 = ((𝑥′ ⊕ unit𝑆1) ⊙ 𝑣1) ⊕ ((𝑦′ ⊕ unit𝑆2) ⊙ 𝑣2) = ( (𝑥′ ⊕ unit𝑆1) ⊕ (𝑦′ ⊕ unit𝑆2) ) ⊙ (𝑣1 ⊕ 𝑣2) (By lemma C.1.9 and C.1.8) = ( (𝑥′ ⊕ 𝑦′) ⊕ (unit𝑆1 ⊕ unit𝑆2) ) ⊙ (𝑣1 ⊕ 𝑣2) (By commutativity and associativity) = ( (𝑥′ ⊕ 𝑦′) ⊕ (unit𝑆1∪𝑆2) ) ⊙ (𝑣1 ⊕ 𝑣2) This derivation proved that 𝑥′ ⊕ 𝑦′ is defined, and 𝑥′ ⊕ 𝑦′ ⊑ 𝑥 ⊕ 𝑦 = 𝑧. 276 ⊙ Up-Closed We want to show that for any 𝑧′, 𝑧, 𝑥, 𝑦 ∈ 𝑀 , if 𝑧 = 𝑥 ⊙ 𝑦 and 𝑧′ ⊒ 𝑧, then there exists 𝑥′, 𝑦′ such that 𝑥′ ⊒ 𝑥, 𝑦′ ⊒ 𝑦, and 𝑧′ = 𝑥′ ⊙ 𝑦′. 
Since 𝑧′ ⊒ 𝑧, there exist a set 𝑆 and 𝑣 ∈ 𝑀 such that 𝑧′ = (𝑧 ⊕ unit𝑆) ⊙ 𝑣. Thus,

𝑧′ = (𝑧 ⊕ unit𝑆) ⊙ 𝑣
= ((𝑥 ⊙ 𝑦) ⊕ unit𝑆) ⊙ 𝑣
= ((𝑥 ⊙ 𝑦) ⊕ (unit𝑆 ⊙ unit𝑆)) ⊙ 𝑣
= ((𝑥 ⊕ unit𝑆) ⊙ (𝑦 ⊕ unit𝑆)) ⊙ 𝑣 (By lemma C.1.9 and C.1.8)
= (𝑥 ⊕ unit𝑆) ⊙ ((𝑦 ⊕ unit𝑆) ⊙ 𝑣) (By standard associativity of ⊙)

Thus, for 𝑥′ = 𝑥 ⊕ unit𝑆 and 𝑦′ = (𝑦 ⊕ unit𝑆) ⊙ 𝑣, we have 𝑧′ = 𝑥′ ⊙ 𝑦′.

⊕ Commutativity We want to show that 𝑧 = 𝑥 ⊕ 𝑦 implies that 𝑧 = 𝑦 ⊕ 𝑥. First, 𝑥 ⊕ 𝑦 is defined iff range(𝑥) ∩ range(𝑦) = dom(𝑥) ∩ dom(𝑦) iff 𝑦 ⊕ 𝑥 is defined; second, when 𝑥 ⊕ 𝑦 and 𝑦 ⊕ 𝑥 are both defined, they are equal by lemma C.1.5. Thus, the ⊕ commutativity frame condition is satisfied.

⊕ Associativity We want to show that 𝑤 = (𝑥 ⊕ 𝑦) ⊕ 𝑧 implies that 𝑤 = 𝑥 ⊕ (𝑦 ⊕ 𝑧). We show this in lemma C.1.4.

⊕ Unit Existence We want to show that for any 𝑥 ∈ 𝑀, there exists 𝑒 ∈ 𝐸 such that 𝑥 = 𝑒 ⊕ 𝑥. We show that 𝑒 = unit∅ serves as the unit under ⊕ for any 𝑥. For any 𝑥 : Mem[𝐴] → D(Mem[𝐵]), 𝑥 ⊕ unit∅ is defined because 𝐵 ∩ ∅ = ∅ = 𝐴 ∩ ∅, and by lemma C.1.7, 𝑥 ⊕ unit∅ = 𝑥.

⊕ Unit Coherence We want to show that for any 𝑦 ∈ 𝑀 and 𝑒 ∈ 𝐸 = 𝑀, if 𝑥 = 𝑦 ⊕ 𝑒, then 𝑥 ⊒ 𝑦. We calculate:

𝑥 = 𝑦 ⊕ 𝑒
= (𝑦 ⊙ unitrange(𝑦)) ⊕ (unitdom(𝑒) ⊙ 𝑒)
= (𝑦 ⊕ unitdom(𝑒)) ⊙ (unitrange(𝑦) ⊕ 𝑒) (By lemma C.1.8 and lemma C.1.9)
= (𝑦 ⊕ unitdom(𝑒)) ⊙ (𝑒 ⊕ unitrange(𝑦)) (⊕ Commutativity)

Thus, 𝑥 ⊒ 𝑦.

⊙ Associativity The frame axiom reduces to the standard associativity of ⊙. Kleisli composition is associative, so ⊙ satisfies standard associativity.

⊙ Unit ExistenceL and ⊙ Unit ExistenceR We need to show that, for any 𝑥 ∈ 𝑀, there exists 𝑒 ∈ 𝐸 such that 𝑒 ⊙ 𝑥 = 𝑥, and there exists 𝑒′ ∈ 𝐸 such that 𝑥 ⊙ 𝑒′ = 𝑥. Since ⊙ is the Kleisli composition, for any morphism 𝑥 : Mem[𝐴] → D(Mem[𝐵]), unit𝐴 is the left unit and unit𝐵 is the right unit. In addition, for all 𝑆, unit𝑆 ∈ 𝑀 = 𝐸.

⊙ CoherenceR For any 𝑦 ∈ 𝑀 and 𝑒 ∈ 𝐸 such that 𝑥 = 𝑦 ⊙ 𝑒, we want to show that 𝑥 ⊒ 𝑦.
We proved in lemma C.1.7 that 𝑦 ⊕ unit∅ = 𝑦 for any 𝑦, so 𝑥 = 𝑦 ⊙ 𝑒 = (𝑦 ⊕ unit∅) ⊙ 𝑒, and 𝑥 ⊒ 𝑦 as desired.

Unit Closure We want to show that for any 𝑒 ∈ 𝐸 and 𝑒′ ⊒ 𝑒, we have 𝑒′ ∈ 𝐸. This is evident because 𝐸 = 𝑀 and 𝑀 is closed under ⊕ and ⊙.

Reverse Exchange Given 𝑥 = 𝑦 ⊕ 𝑧 and 𝑦 = 𝑦1 ⊙ 𝑦2, 𝑧 = 𝑧1 ⊙ 𝑧2, we want to show that there exist 𝑢 = 𝑦1 ⊕ 𝑧1 and 𝑣 = 𝑦2 ⊕ 𝑧2 such that 𝑥 = 𝑢 ⊙ 𝑣. After substitution, we get (𝑦1 ⊙ 𝑦2) ⊕ (𝑧1 ⊙ 𝑧2) = 𝑦 ⊕ 𝑧 = 𝑥. By lemma C.1.8 and lemma C.1.9, when (𝑦1 ⊙ 𝑦2) ⊕ (𝑧1 ⊙ 𝑧2) is defined, (𝑦1 ⊕ 𝑧1) ⊙ (𝑦2 ⊕ 𝑧2) is also defined, and (𝑦1 ⊙ 𝑦2) ⊕ (𝑧1 ⊙ 𝑧2) = (𝑦1 ⊕ 𝑧1) ⊙ (𝑦2 ⊕ 𝑧2). Thus (𝑦1 ⊕ 𝑧1) ⊙ (𝑦2 ⊕ 𝑧2) = 𝑦 ⊕ 𝑧 = 𝑥, and so 𝑢 = 𝑦1 ⊕ 𝑧1 and 𝑣 = 𝑦2 ⊕ 𝑧2 complete the proof. □

C.2 Capturing Conditional Independence

C.2.1 Properties of the Probabilistic Frame

We prove some properties of the model that are useful for proving lemma C.2.8.

Lemma C.2.1 (Disintegration). If 𝑓 = 𝑓1 ⊙ 𝑓2, then 𝜋𝑅1 𝑓 = 𝑓1. Conversely, if 𝜋𝑅1 𝑓 = 𝑓1, then there exists 𝑔 such that 𝑓 = 𝑓1 ⊙ 𝑔.

Proof. In short, this follows from properties of the Kleisli category of the discrete probability monad: that category is a Markov category that has conditionals [Fritz, 2020, Example 11.2]; since our kernels are morphisms in this category and the operator ⊙ is morphism composition, the lemma follows. We spell out the detailed proof in the following.

For the forwards direction, suppose that 𝑓 = 𝑓1 ⊙ 𝑓2. Then,

𝜋𝑅1 𝑓 = 𝜋𝑅1 ( 𝑓1 ⊙ 𝑓2) = 𝑓1 ⊙ (𝜋𝑅1 𝑓2) = 𝑓1 ⊙ unit𝑅1 = 𝑓1.

Thus, 𝜋𝑅1 𝑓 = 𝑓1.

For the converse, assume 𝜋𝑅1 𝑓 = 𝑓1. Denote range( 𝑓 ) as 𝑅. Define 𝑔 : Mem[𝑅1] → D(Mem[𝑅]) such that for any 𝑟 ∈ Mem[𝑅1] and 𝑚 ∈ Mem[𝑅] with 𝑓1(𝑟𝐷1)(𝑟) ≠ 0,

𝑔(𝑟)(𝑚) := 𝑓 (𝑟𝐷1)(𝑚) / 𝑓1(𝑟𝐷1)(𝑟) if 𝑟 ⊲⊳ 𝑚 is defined, and 𝑔(𝑟)(𝑚) := 0 otherwise.

We need to check that 𝑔 ∈ X𝐶𝐼.
Fixing any 𝑟 ∈ Mem[𝑅1],∑︁ 𝑚∈Mem[𝑅] 𝑔(𝑟) (𝑚) = ∑︁ 𝑚∈Mem[𝑅] and 𝑚⊲⊳𝑟 is defined 𝑓 (𝑟𝐷1) (𝑚) 𝑓1(𝑟𝐷1) (𝑟) (By definition of 𝑔) = ∑︁ 𝑦∈Mem[𝑅\𝑅′] 𝑓 (𝑟𝐷1) (𝑚) 𝑓1(𝑟𝐷1) (𝑟) = ∑︁ 𝑦∈Mem[𝑅\𝑅′] 𝑓 (𝑟𝐷1) (𝑦)∑ 𝑥∈Mem[𝑅\𝑅1] 𝑓 (𝑟𝐷1) (𝑟 ⊲⊳ 𝑥) (Because 𝜋𝑅1 𝑓 = 𝑓1) = 1 so 𝑔 does map any input to a distribution, and 𝑔 preserves the input. By their types, 𝑓1 ⊙ 𝑔 is defined. For any 𝑑 ∈ Mem[𝐷1], 𝑚 ∈ Mem[𝑅], if 𝑓1(𝑑) (𝑚𝑅1) ≠ 0, then ( 𝑓1 ⊙ 𝑔) (𝑑) (𝑚) = 𝑓1(𝑑) (𝑚𝑅1) · 𝑔(𝑚𝑅1) (𝑚) = 𝑓1(𝑑) (𝑚𝑅1) · 𝑓 (𝑚𝐷1) (𝑚) 𝑓1(𝑚𝐷1) (𝑚𝑅1) = 𝑓 (𝑑) (𝑚) (𝑑 ⊲⊳ 𝑚 is defined iff 𝑑 = 𝑚𝐷1) If (𝜋𝑅1 𝑓 ) (𝑑) (𝑚𝑅1) = 0, then 𝑓 (𝑑) (𝑚) = 0, and ( 𝑓1 ⊙ 𝑔) (𝑑) (𝑚) = 𝑓1(𝑑) (𝑚𝑅1) · 𝑔(𝑚𝑅1) (𝑚) = 0 = 𝑓 (𝑑) (𝑚). Thus, 𝑓1 ⊙ 𝑔 = 𝑓 . □ Lemma C.2.2 (Uniqueness). For any 𝑓 , 𝑔 : Mem[𝑋] → D(Mem[𝑋 ∪𝑌 ]) in 𝑀 , and arbitrary ℎ ∈ 𝑀 , if 𝑓 ⊑ ℎ and 𝑔 ⊑ ℎ, then 𝑓 = 𝑔. Proof. 𝑓 ⊑ ℎ implies that there exists 𝑣1, 𝑆1 such that ( 𝑓 ⊕ unit𝑆1) ⊙ 𝑣1 = ℎ; 𝑔 ⊑ ℎ implies that there exists 𝑣2, 𝑆2 such that (𝑔 ⊕unit𝑆2) ⊙ 𝑣2 = ℎ. Take ℎ : Mem[𝑊] → 280 D(Mem[𝑍 ∪𝑊]), and then 𝑓 ⊕ unit𝑆1 = 𝜋range( 𝑓 ⊕unit𝑆1 )ℎ = 𝜋𝑋∪𝑌∪dom(ℎ)ℎ 𝑔 ⊕ unit𝑆2 = 𝜋range(𝑔⊕unit𝑆2 )ℎ = 𝜋𝑋∪𝑌∪dom(ℎ)ℎ Thus, 𝑓 ⊕ unit𝑆1 = 𝑔 ⊕ unit𝑆2 . Now, suppose 𝑓 ≠ 𝑔. This would imply 𝑓 ⊕ unit𝑆1 ≠ 𝑔 ⊕ unit𝑆2 which is a contradiction. Thus, 𝑓 = 𝑔. □ Lemma C.2.3 ( ⊙ elimination). For any 𝑓 , 𝑔 ∈ X𝐶𝐼 , if 𝑓 ⊙ (𝑔 ⊕ unit𝑋) is defined and dom(𝑔) ⊆ dom( 𝑓 ), then 𝑓 ⊙ (𝑔 ⊕ unit𝑋) = 𝑔 ⊕ 𝑓 . Proof. Let 𝑓 : Mem[𝑆] → D(Mem[𝑆 ∪ 𝑇]) and 𝑔 : Mem[𝑈] → D(Mem[𝑈 ∪ 𝑉]) be in 𝑀 . When𝑈 ⊆ 𝑆, 𝑓 ⊙ (𝑔 ⊕ unit𝑋) = ( 𝑓 ⊕ unit𝑈) ⊙ (𝑔 ⊕ unit𝑋 ⊕ unit𝑆∪𝑇 ) (By C.1.7) = (unit𝑈 ⊕ 𝑓 ) ⊙ (𝑔 ⊕ unit𝑋 ⊕ unit𝑆∪𝑇 ) (By commutativity) = (unit𝑈 ⊕ 𝑓 ) ⊙ (𝑔 ⊕ unit𝑆∪𝑇 ) (†) = (unit𝑈 ⊙ 𝑔) ⊕ ( 𝑓 ⊙ unit𝑆∪𝑇 ) (By lemma C.1.9 and C.1.8) = 𝑔 ⊕ 𝑓 □ where † follows from 𝑋 ⊆ 𝑆 ∪ 𝑇 , which holds as 𝑓 ⊙ (𝑔 ⊕ unit𝑋) defined implies 𝑆 ∪ 𝑇 = 𝑋 ∪𝑈. Lemma C.2.4 (Converting ⊕ to ⊙). For any kernel 𝑓 : Mem[𝑆] → D(Mem[𝑆 ∪𝑇]) and 𝑔 : Mem[𝑈] → D(Mem[𝑈 ∪ 𝑉]) in X𝐶𝐼 . If 𝑓 ⊕ 𝑔 is defined, then 𝑓 ⊕ 𝑔 = ( 𝑓 ⊕ unit𝑈) ⊙ (unit𝑆∪𝑇 ⊕ 𝑔). 281 Proof. 
𝑓 ⊕ 𝑔 = ( 𝑓 ⊙ unit𝑆∪𝑇 ) ⊕ (unit𝑈 ⊙ 𝑔) = ( 𝑓 ⊕ unit𝑈) ⊙ (unit𝑆∪𝑇 ⊕ 𝑔) (By lemma C.1.9 and C.1.8) □ Lemma C.2.5 (Quasi-Downwards-closure of ⊙). For any 𝑓 , 𝑔, ℎ, 𝑖 ∈ X𝐶𝐼 , if 𝑓 ⊑ ℎ, 𝑔 ⊑ 𝑖, and 𝑓 ⊙ 𝑔, ℎ ⊙ 𝑖 are all defined, then 𝑓 ⊙ 𝑔 ⊑ ℎ ⊙ 𝑖. Proof. Since 𝑓 ⊑ ℎ, 𝑔 ⊑ 𝑖, there must exist sets 𝑆1, 𝑆2 and 𝑣1, 𝑣2 ∈ 𝑀 such that ℎ = ( 𝑓 ⊕unit𝑆1) ⊙ 𝑣1, 𝑖 = (𝑔⊕unit𝑆2) ⊙ 𝑣2. 𝑓 ⊙𝑔 is defined, so dom(𝑔) = range( 𝑓 ) ⊆ range( 𝑓 ⊕ unit𝑆1) = dom(𝑣1). Thus, ℎ ⊙ 𝑖 = ( 𝑓 ⊕ unit𝑆1) ⊙ 𝑣1 ⊙ (𝑔 ⊕ unit𝑆2) ⊙ 𝑣2 = ( 𝑓 ⊕ unit𝑆1) ⊙ (𝑔 ⊕ 𝑣1) ⊙ 𝑣2 (By lemma C.2.3 and dom(𝑔) ⊆ dom(𝑣1)) = ( 𝑓 ⊕ unit𝑆1) ⊙ (𝑔 ⊕ unitdom(𝑣1)) ⊙ (unitrange(𝑔) ⊕ 𝑣1) ⊙ 𝑣2 (By lemma C.2.4) = ( 𝑓 ⊕ unit𝑆1) ⊙ (𝑔 ⊕ unit𝑆1) ⊙ (unitrange(𝑔) ⊕ 𝑣1) ⊙ 𝑣2 (†) = (( 𝑓 ⊙ 𝑔) ⊕ (unit𝑆1 ⊙ unit𝑆1)) ⊙ (unitrange(𝑔) ⊕ 𝑣1) ⊙ 𝑣2 (♥) = (( 𝑓 ⊙ 𝑔) ⊕ unit𝑆1) ⊙ (unitrange(𝑔) ⊕ 𝑣1) ⊙ 𝑣2 where † follows from dom(𝑔) = range( 𝑓 ) and lemma C.1.7, and ♥ follows from lemma C.1.9 and C.1.8. Therefore, 𝑓 ⊙ 𝑔 ⊑ ℎ ⊙ 𝑖. □ 282 C.2.2 Key Lemmas: Conditional Independence is Expressed Lemma C.2.6 (Classical flavor in intuitionistic model). For any 𝑓 ∈ 𝑀 , 𝑓 |= (∅ ⊲ 𝑍) # ((𝑍 ⊲ 𝑋) ∗ (𝑍 ⊲ 𝑌 )) if and only if there exist 𝑔, ℎ, 𝑖 ∈ 𝑀 , such that 𝑔 : Mem[∅] → D(Mem[𝑍]), ℎ : Mem[𝑍] → D(Mem[𝑍∪𝑋]), 𝑖 : Mem[𝑍] → D(Mem[𝑍∪𝑌 ]), and 𝑔⊙(ℎ⊕𝑖) ⊑ 𝑓 . Proof. The backwards direction trivially follows from persistence. We detail the proof for the forward direction here. Suppose 𝑓 |= (∅ ⊲ 𝑍) # ((𝑍 ⊲ 𝑋) ∗ (𝑍 ⊲ 𝑌 )). Then, there exist 𝑓1, 𝑓2, 𝑓3, 𝑓4 such that 𝑓1 ⊙ 𝑓2 = 𝑓 , 𝑓3 ⊕ 𝑓4 ⊑ 𝑓2, 𝑓1 |= (∅ ⊲ 𝑍), 𝑓3 |= (𝑍 ⊲ 𝑋) and 𝑓4 |= (𝑍 ⊲ 𝑌 ). • 𝑓1 |= (∅ ⊲ 𝑍) implies that there exists 𝑓 ′′1 ⊑ 𝑓1 such that dom( 𝑓 ′′1 ) = ∅, and range( 𝑓 ′′1 ) ⊇ 𝑍 . Let 𝑓 ′1 = 𝜋𝑍 𝑓 ′′ 1 . Note that 𝑓 ′1 : Mem[∅] → D(Mem[𝑍]) and 𝑓 ′1 ⊑ 𝑓 ′′1 ⊑ 𝑓1. Hence, there exists some set 𝑆1 and 𝑣1 ∈ 𝑀 such that 𝑓1 = ( 𝑓 ′1 ⊕ unit𝑆1) ⊙ 𝑣1. • 𝑓3 |= (𝑍 ⊲ 𝑋) implies that there exists 𝑓 ′′3 ⊑ 𝑓3 such that dom( 𝑓 ′′3 ) = 𝑍 , and range( 𝑓 ′′3 ) ⊇ 𝑋 . Define 𝑓 ′3 = 𝜋𝑍∪𝑋 𝑓 ′′3 . 
Then 𝑓 ′3 ⊑ 𝑓 ′′3 ⊑ 𝑓3, and 𝑓 ′3 : Mem[𝑍] → D(Mem[𝑋 ∪ 𝑍]). • 𝑓4 |= (𝑍 ⊲ 𝑌 ) implies that there exists 𝑓 ′′4 ⊑ 𝑓4 such that dom( 𝑓 ′′4 ) = 𝑍 , and range( 𝑓 ′′4 ) ⊇ 𝑌 . Define 𝑓 ′4 = 𝜋𝑍∪𝑌 𝑓 ′′4 and note that 𝑓 ′4 : Mem[𝑍] → D(Mem[𝑌 ∪ 𝑍]). • By ⊕ Down-Closed, having 𝑓3 ⊕ 𝑓4 defined implies that 𝑓 ′3 ⊕ 𝑓 ′ 4 is also de- fined and 𝑓 ′3 ⊕ 𝑓 ′ 4 ⊑ 𝑓3 ⊕ 𝑓4 ⊑ 𝑓2. Thus, there exists some 𝑣2 ∈ 𝑀 and finite set 𝑆2 such that 𝑓2 = ( 𝑓 ′3 ⊕ 𝑓 ′ 4 ⊕ unit𝑆2) ⊙ 𝑣2. 283 Using these observations, we can now calculate and show that 𝑓 ′1 ⊙ ( 𝑓 ′ 3 ⊕ 𝑓 ′4 ⊕ unit𝑍 ) ⊑ 𝑓1 ⊕ 𝑓2: 𝑓1 ⊙ 𝑓2 = ( 𝑓 ′1 ⊕ unit𝑆1) ⊙ 𝑣1 ⊙ ( 𝑓 ′3 ⊕ 𝑓 ′ 4 ⊕ unit𝑆2) ⊙ 𝑣2 = ( 𝑓 ′1 ⊕ unit𝑆1) ⊙ ( 𝑓 ′3 ⊕ 𝑓 ′ 4 ⊕ 𝑣1 ) ⊙ 𝑣2 (By lemma C.2.3 and dom( 𝑓 ′3 ⊕ 𝑓 ′ 4 ) = 𝑍 ⊆ range( 𝑓 ′1 ⊕ unit𝑆1)) = ( 𝑓 ′1 ⊕ unit𝑆1) ⊙ ( ( 𝑓 ′3 ⊕ 𝑓 ′ 4 ⊕ unitdom(𝑣1 ) ) ⊙ (unit𝑋∪𝑌∪𝑍 ⊕ 𝑣1) ) ⊙ 𝑣2 (By lemma C.2.4) = ( 𝑓 ′1 ⊕ unit𝑆1) ⊙ ( 𝑓 ′3 ⊕ 𝑓 ′ 4 ⊕ unit𝑍 ⊕ unit𝑆1) ⊙ (unit𝑋∪𝑌∪𝑍 ⊕ 𝑣1) ⊙ 𝑣2 (By dom(𝑣1) = 𝑍 ∪ 𝑆1) = ( ( 𝑓 ′1 ⊙ ( 𝑓 ′ 3 ⊕ 𝑓 ′ 4 ⊕ unit𝑍 )) ⊕ (unit𝑆1 ⊙ unit𝑆1) ) ⊙ (unit𝑋∪𝑌∪𝑍 ⊕ 𝑣1) ⊙ 𝑣2 (By lemma C.1.8 and lemma C.1.9) = ( ( 𝑓 ′1 ⊙ ( 𝑓 ′ 3 ⊕ 𝑓 ′ 4 ⊕ unit𝑍 )) ⊕ unit𝑆1 ) ⊙ (unit𝑋∪𝑌∪𝑍 ⊕ 𝑣1) ⊙ 𝑣2 = ( ( 𝑓 ′1 ⊙ ( 𝑓 ′ 3 ⊕ 𝑓 ′ 4 )) ⊕ unit𝑆1 ) ⊙ (unit𝑋∪𝑌∪𝑍 ⊕ 𝑣1) ⊙ 𝑣2 (By lemma C.1.7) To finish, take 𝑔= 𝑓 ′1 : Mem[∅] → D(Mem[𝑍]), ℎ= 𝑓 ′3 : Mem[𝑍] → D(Mem[𝑍 ∪ 𝑋]), 𝑖= 𝑓 ′4 : Mem[𝑍] → D(Mem[𝑍 ∪𝑌 ]), and note that 𝑔⊙ (ℎ⊕ 𝑖) = 𝑓 ′1 ⊙ ( 𝑓 ′ 3 ⊕ 𝑓 ′ 4) ⊑ 𝑓1 ⊕ 𝑓2 ⊑ 𝑓 . □ Lemma C.2.7. If 𝑋,𝑌 are conditionally independent given 𝑆, then values on 𝑋 ∩ 𝑌 is determined given values on 𝑆. Proof. In short, conditional independence is closed under projection (of the con- ditionally independent components). Thus, 𝑋 ∩ 𝑌 is conditionally independent to itself given 𝑆. Any random variable independent to itself must be determin- istic. Thus, 𝑋 ∩𝑌 is deterministic given 𝑆. We spell out the detailed proof below. Let 𝑋′ = 𝑋 \𝑌 , 𝑌 ′ = 𝑌 \ 𝑋 . 
By assumption, 𝑋,𝑌 are conditionally independent 284 given 𝑆 , so 𝑥 ∈ Mem[𝑋], 𝑦 ∈ Mem[𝑌 ], 𝑠 ∈ Mem[𝑆], 𝜇(𝑋 = 𝑥 | 𝑆 = 𝑠) · 𝜇(𝑌 = 𝑦 | 𝑆 = 𝑠) = 𝜇(𝑋 = 𝑥 ∩ 𝑌 = 𝑦 | 𝑆 = 𝑠). Thus, if we denote 𝑥′ = 𝜋𝑋 ′𝑥, 𝑦′ = 𝜋𝑌 ′𝑦 and let 𝑀 = 𝑋 ∩ 𝑌 , then for any 𝑚 ∈ Mem[𝑀], 𝜇(𝑋′ = 𝑥′ ∩ 𝑌 ′ = 𝑦′ ∩ 𝑀 = 𝑚 | 𝑆 = 𝑠) = 𝜇(𝑋′ = 𝑥′ ∩ 𝑀 = 𝑚 | 𝑆 = 𝑠) · 𝜇(𝑌 ′ = 𝑦′ ∩ 𝑀 = 𝑚 | 𝑆 = 𝑠) (C.6) For any probabilistic events 𝐸1, 𝐸2, 𝐸3, 𝜇(𝐸1 ∩ 𝐸2 | 𝐸3) = 𝜇(𝐸1 | 𝐸2, 𝐸3) · 𝜇(𝐸2 | 𝐸3). Thus, eq. (C.6) implies that 𝜇(𝑋 ′ = 𝑥′ | 𝑀 = 𝑚 ∩ 𝑆 = 𝑠) · 𝜇(𝑌 ′ = 𝑦′ | 𝑀 = 𝑚 ∩ 𝑆 = 𝑠) · 𝜇(𝑀 = 𝑚 | 𝑆 = 𝑠) = 𝜇(𝑋 ′ = 𝑥′ ∩ 𝑌 ′ = 𝑦′ | 𝑀 = 𝑚 ∩ 𝑆 = 𝑠) (C.7) Then, for any 𝑠 ∈ Mem[𝑆], 𝑚 ∈ Mem[𝑀] such that 𝜇(𝑀 = 𝑚 ∩ 𝑆 = 𝑠) ≠ 0,∑︁ 𝑥′∈Mem[𝑋 ′]∩𝑦′∈Mem[𝑌 ′] 𝜇(𝑋′ = 𝑥′ | 𝑀 = 𝑚 ∩ 𝑆 = 𝑠) · 𝜇(𝑌 ′ = 𝑦′ | 𝑀 = 𝑚, 𝑆 = 𝑠) · 𝜇(𝑀 = 𝑚 | 𝑆 = 𝑠) = ∑︁ 𝑥′∈Mem[𝑋 ′],𝑦′∈Mem[𝑌 ′] 𝜇(𝑋′ = 𝑥′, 𝑌 ′ = 𝑦′ | 𝑀 = 𝑚, 𝑆 = 𝑠) (Because of eq. (C.7)) = 1 (C.8) Meanwhile, for any 𝑠 ∈ Mem[𝑆], 𝑚 ∈ Mem[𝑀] such that 𝑚 ⊲⊳ 𝑠 is defined and 𝜇(𝑀 = 𝑚, 𝑆 = 𝑠) ≠ 0,∑︁ 𝑥′∈Mem[𝑋′ ],𝑦′∈Mem[𝑌 ′ ] 𝜇(𝑋 ′ = 𝑥′ | 𝑀 = 𝑚 ∩ 𝑆 = 𝑠) · 𝜇(𝑌 ′ = 𝑦′ | 𝑀 = 𝑚 ∩ 𝑆 = 𝑠) · 𝜇(𝑀 = 𝑚 | 𝑆 = 𝑠) = ©­« ∑︁ 𝑥′∈Mem[𝑋′ ],𝑦′∈Mem[𝑌 ′ ] 𝜇(𝑋 ′ = 𝑥′ | 𝑀 = 𝑚, 𝑆 = 𝑠) · 𝜇(𝑌 ′ = 𝑦′ | 𝑀 = 𝑚 ∩ 𝑆 = 𝑠)ª®¬ · 𝜇(𝑀 = 𝑚 | 𝑆 = 𝑠) = ©­« ∑︁ 𝑥′∈Mem[𝑋′ ] 𝜇(𝑋 ′ = 𝑥′ | 𝑀 = 𝑚 ∩ 𝑆 = 𝑠)ª®¬ · ©­« ∑︁ 𝑦′∈Mem[𝑌 ′ ] 𝜇(𝑌 ′ = 𝑦′ | 𝑀 = 𝑚 ∩ 𝑆 = 𝑠)ª®¬ · 𝜇(𝑀 = 𝑚 | 𝑆 = 𝑠) = 1 · 𝜇(𝑀 = 𝑚 | 𝑆 = 𝑠) (C.9) 285 Combining eq. (C.9) and eq. (C.8), we derive 𝜇(𝑀 = 𝑚 | 𝑆 = 𝑠) = 1. That is, when 𝑋 ⊥⊥ 𝑌 | 𝑆, whether 𝑀 ⊇ 𝑆 or not, 𝜇(𝑀 = 𝑚, 𝑆 = 𝑠) ≠ 0 implies 𝜇(𝑀 = 𝑚 | 𝑆 = 𝑠) = 1. Thus, 𝑋 ⊥⊥ 𝑌 | 𝑆 renders values on 𝑋 ∩ 𝑌 deterministic given values on 𝑆. □ Lemma C.2.8. For a distribution 𝜇 on Var, let 𝑓𝜇 denote the kernel ⟨⟩ ↦→ 𝜇. Then, there exist 𝑆, 𝑋,𝑌 ⊆ Var, 𝑓1 : Mem[∅] → D(Mem[𝑆]), 𝑓2 : Mem[𝑆] → D(Mem[𝑆 ∪ 𝑋]), 𝑓3 : Mem[𝑆] → D(Mem[𝑆∪𝑌 ]) such that 𝑓1⊙ ( 𝑓2⊕ 𝑓3) ⊑ 𝑓𝜇, if and only if 𝑋 ⊥⊥ 𝑌 | 𝑆 and also 𝑋 ∩ 𝑌 ⊆ 𝑆. Proof. Forward direction: Assume the existence of 𝑓1, 𝑓2, 𝑓3 satisfying 𝑓1 ⊙ ( 𝑓2 ⊕ 𝑓3) ⊑ 𝑓𝜇. We must prove 𝑋 ⊥⊥ 𝑌 | 𝑆 and 𝑋 ∩ 𝑌 ⊆ 𝑆. 1. 
𝑋 ∩ 𝑌 ⊆ 𝑆: 𝑓2 ⊕ 𝑓3 defined implies (𝑋 ∪ 𝑆) ∩ (𝑌 ∪ 𝑆) ⊆ 𝑆∩ 𝑆. Thus, 𝑋 ∩𝑌 ⊆ 𝑆. 2. 𝑋 ⊥⊥ 𝑌 | 𝑆: By assumption, 𝑓1 ⊙ ( 𝑓2 ⊕ 𝑓3) ⊑ 𝑓𝜇. lemma C.2.1 gives us 𝑓1 ⊙ ( 𝑓2 ⊕ 𝑓3) = 𝜋𝑆∪𝑋∪𝑌 ( 𝑓𝜇), and 𝑓1 = 𝜋𝑆 ( 𝑓𝜇). Thus, for any 𝑚 ∈ Mem[𝑋 ∪𝑌 ∪ 𝑆], 𝜇(𝑋 = 𝑚𝑋 , 𝑌 = 𝑚𝑌 , 𝑆 = 𝑚𝑆) = (𝜋𝑋∪𝑌∪𝑆𝜇) (𝑚𝑋 ⊲⊳ 𝑚𝑌 ⊲⊳ 𝑚𝑆) = 𝜋𝑋∪𝑌∪𝑆 ( 𝑓𝜇) (⟨⟩)(𝑚𝑋 ⊲⊳ 𝑚𝑌 ⊲⊳ 𝑚𝑆) = 𝑓1 ⊙ ( 𝑓2 ⊕ 𝑓3) (⟨⟩)(𝑚𝑋 ⊲⊳ 𝑚𝑌 ⊲⊳ 𝑚𝑆) Similarly, since 𝑓1 = 𝜋𝑆 ( 𝑓𝜇), we have 𝜇(𝑆 = 𝑚𝑆) = (𝜋𝑆𝜇) (𝑚𝑆) = ( 𝜋𝑆 ( 𝑓𝜇) ) (⟨⟩)(𝑚𝑆) = 𝑓1(⟨⟩)(𝑚𝑆) (C.10) 286 By definition of conditional probability, when 𝜇(𝑆 = 𝑚𝑆) ≠ 0, 𝜇(𝑋 = 𝑚𝑋 , 𝑌 = 𝑚𝑌 | 𝑆 = 𝑚𝑆) = 𝜇(𝑋 = 𝑚𝑋 , 𝑌 = 𝑚𝑌 , 𝑆 = 𝑚𝑆) 𝜇(𝑆 = 𝑚𝑆) = 𝑓1 ⊙ ( 𝑓2 ⊕ 𝑓3) (⟨⟩)(𝑚𝑆 ⊲⊳ 𝑚𝑋 ⊲⊳ 𝑚𝑌 ) 𝑓1(⟨⟩)(𝑚𝑆) = 𝑓1(⟨⟩)(𝑚𝑆) · ( 𝑓2 ⊕ 𝑓3) (𝑚𝑆) (𝑚𝑆 ⊲⊳ 𝑚𝑋 ⊲⊳ 𝑚𝑌 ) 𝑓1(⟨⟩)(𝑚𝑆) (By eq. (4.2)) = ( 𝑓2 ⊕ 𝑓3) (𝑚𝑆) (𝑚𝑆 ⊲⊳ 𝑚𝑋 ⊲⊳ 𝑚𝑌 ) = 𝑓2(𝑚𝑆) (𝑚𝑋∪𝑆) · 𝑓3(𝑚𝑆) (𝑚𝑌∪𝑆) (C.11) Let 𝑓 ′2 = 𝑓2 ⊕ unitMem[𝑌 ] , 𝑓 ′3 = 𝑓3 ⊕ unitMem[𝑋] . By lemma C.2.4, 𝑓1 ⊙ ( 𝑓2 ⊕ 𝑓3) = 𝑓1 ⊙ 𝑓2 ⊙ ( 𝑓3 ⊕ unitMem[𝑋]) = 𝑓1 ⊙ 𝑓2 ⊙ 𝑓 ′3 𝑓1 ⊙ ( 𝑓2 ⊕ 𝑓3) = 𝑓1 ⊙ ( 𝑓3 ⊕ 𝑓2) = 𝑓1 ⊙ 𝑓3 ⊙ ( 𝑓2 ⊕ unitMem[𝑌 ]) = 𝑓1 ⊙ 𝑓3 ⊙ 𝑓 ′2 Lemma C.2.1 gives us 𝜋𝑋∪𝑆 ( 𝑓𝜇) = 𝑓1 ⊙ 𝑓2, and 𝜋𝑌∪𝑆 ( 𝑓𝜇) = 𝑓1 ⊙ 𝑓3, Therefore, 𝜇(𝑋 = 𝑚𝑋 , 𝑆 = 𝑚𝑆) = (𝜋𝑋∪𝑆 ( 𝑓𝜇)) (⟨⟩)(𝑚𝑆 ⊲⊳ 𝑚𝑋) = ( 𝑓1 ⊙ 𝑓2) (⟨⟩)(𝑚𝑆 ⊲⊳ 𝑚𝑋) = 𝑓1(⟨⟩)(𝑚𝑆) · 𝑓2(𝑚𝑆) (𝑚𝑆 ⊲⊳ 𝑚𝑋) 𝜇(𝑌 = 𝑚𝑌 , 𝑆 = 𝑚𝑆) = (𝜋𝑌∪𝑆 ( 𝑓𝜇) (⟨⟩)(𝑚𝑆 ⊲⊳ 𝑚𝑌 ) = ( 𝑓1 ⊙ 𝑓3) (⟨⟩)(𝑚𝑆 ⊲⊳ 𝑚𝑌 ) = 𝑓1(⟨⟩)(𝑚𝑆) · 𝑓3(𝑚𝑆) (𝑚𝑆 ⊲⊳ 𝑚𝑌 ) 287 Thus, by definition of conditional probability. 𝜇(𝑋 = 𝑚𝑋 | 𝑆 = 𝑚𝑆) = 𝜇(𝑋 = 𝑚𝑋 , 𝑆 = 𝑚𝑆) 𝜇(𝑆 = 𝑚𝑆) = 𝑓1(⟨⟩)(𝑚𝑆) · 𝑓2(𝑚𝑆) (𝑚𝑆∪𝑋) 𝑓1(⟨⟩)(𝑚𝑆) = 𝑓2(𝑚𝑆) (𝑚𝑆∪𝑋) (C.12) 𝜇(𝑋 = 𝑚𝑌 | 𝑆 = 𝑚𝑆) = 𝜇(𝑋 = 𝑚𝑋 , 𝑆 = 𝑚𝑆) 𝜇(𝑆 = 𝑚𝑆) = 𝑓1(⟨⟩)(𝑚𝑆) · 𝑓3(𝑚𝑆) (𝑚𝑆∪𝑌 ) 𝑓1(⟨⟩)(𝑚𝑆) = 𝑓3(𝑚𝑆) (𝑚𝑆∪𝑌 ) (C.13) Substituting eq. (C.12) and eq. (C.13) into the equation eq. (C.11), we have 𝜇(𝑋 = 𝑚𝑋 , 𝑌 = 𝑚𝑌 | 𝑆 = 𝑚𝑆) = 𝜇(𝑋 = 𝑚𝑋 | 𝑆 = 𝑚𝑆) · 𝜇(𝑋 = 𝑚𝑌 | 𝑆 = 𝑚𝑆)) Thus, 𝑋,𝑌 are conditionally independent given 𝑆. This completes the proof for the first direction. Backward direction: We want to show that if 𝑋 ⊥⊥ 𝑌 | 𝑆 and 𝑋 ∩ 𝑌 ⊆ 𝑆 then there exists such 𝑓1, 𝑓2, 𝑓3 that 𝑓1 ⊙ ( 𝑓2 ⊕ 𝑓3) ⊑ 𝑓𝜇. 
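Before the formal construction, the factorization it produces can be checked numerically. The following is a minimal Python sketch on a small hypothetical joint distribution over (𝑆, 𝑋, 𝑌) in which 𝑋 and 𝑌 are noisy copies of 𝑆, so that 𝑋 ⊥⊥ 𝑌 | 𝑆 holds by construction; all numbers are illustrative and not from the text.

```python
from itertools import product

# Hypothetical joint distribution mu over (S, X, Y): X and Y are
# conditionally independent given S (each is a noisy copy of S).
mu = {}
for s, x, y in product((0, 1), repeat=3):
    p_x = 0.9 if x == s else 0.1
    p_y = 0.7 if y == s else 0.3
    mu[(s, x, y)] = 0.5 * p_x * p_y

# In the spirit of the construction: f1 = pi_S(f_mu), while f2 and f3
# condition X (resp. Y) on the value of S.
f1 = {s: sum(mu[(s, x, y)] for x, y in product((0, 1), repeat=2)) for s in (0, 1)}
f2 = {(s, x): sum(mu[(s, x, y)] for y in (0, 1)) / f1[s]
      for s, x in product((0, 1), repeat=2)}
f3 = {(s, y): sum(mu[(s, x, y)] for x in (0, 1)) / f1[s]
      for s, y in product((0, 1), repeat=2)}

# Sequencing f1 with the parallel composition of f2 and f3, unfolded here
# into pointwise multiplication, recovers mu on S, X, Y.
recomposed = {(s, x, y): f1[s] * f2[(s, x)] * f3[(s, y)]
              for s, x, y in product((0, 1), repeat=3)}
print(max(abs(recomposed[k] - mu[k]) for k in mu) < 1e-9)  # True
```

Here f1, f2, f3 play the roles of the kernels 𝑓1, 𝑓2, 𝑓3 in the proof, with ⊙ and ⊕ unfolded into pointwise multiplication over the finite memories.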
Given 𝜇, we define 𝑓1 = 𝜋𝑆 ( 𝑓𝜇) and construct 𝑓2, 𝑓3 as follows: Let 𝑓2 : Mem[𝑆] → D(Mem[𝑆 ∪ 𝑋]). For any 𝑠 ∈ Mem[𝑆], 𝑥 ∈ Mem[𝑋], when 𝑓1(⟨⟩)(𝑠) ≠ 0, let 𝑓2(𝑠) (𝑠 ⊲⊳ 𝑥) :=  (𝜋𝑆∪𝑋 𝑓𝜇) (⟨⟩)(𝑠⊲⊳𝑥) 𝑓1 (⟨⟩)(𝑠) if 𝑠 ⊲⊳ 𝑥 is defined 0 otherwise When 𝑓1(⟨⟩)(𝑠) = 0, we can define 𝑓2(𝑠) (𝑠 ⊲⊳ 𝑥) arbitrarily as long as 𝑓2(𝑠) is a distribution, because that distribution will be zeroed out in 𝑓1⊙ ( 𝑓2⊕ 𝑓3) anyway. 288 Similarly, let 𝑓3 : Mem[𝑆] → D(Mem[𝑆 ∪ 𝑌 ]). For any 𝑠 ∈ Mem[𝑆], 𝑥 ∈ Mem[𝑌 ] such that 𝑠 ⊲⊳ 𝑦 is defined, when 𝑓1(⟨⟩)(𝑠) ≠ 0, let 𝑓3(𝑠) (𝑠 ⊲⊳ 𝑦) :=  (𝜋𝑆∪𝑌 𝑓𝜇) (𝑠⊲⊳𝑦) 𝑓1 (⟨⟩)(𝑠) if 𝑠 ⊲⊳ 𝑦 is defined 0 otherwise By construction, 𝑓1, 𝑓2, 𝑓3 each has the type needed for the lemma. We are left to prove that given any 𝑠 ∈ Mem[𝑆], 𝑓2 and 𝑓3 are kernels in X𝐶𝐼 , 𝑓1 ⊙ ( 𝑓2 ⊕ 𝑓3) is defined, and 𝑓1 ⊙ ( 𝑓2 ⊕ 𝑓3) ⊑ 𝑓𝜇. • State 𝑓2 is in X𝐶𝐼 , which boils down to show that: for any 𝑠 ∈ Mem[𝑆], 𝑓2(𝑠) forms a distribution, and also 𝑓2 preserves the input. It can be shown through by mechanical calculation and we omit it here. • State 𝑓3 is in X𝐶𝐼 . Similar as above. • State 𝑓1 ⊙ ( 𝑓2 ⊕ 𝑓3) is defined. 𝑓2 ⊕ 𝑓3 is defined because 𝑅2 ∩ 𝑅3 = (𝑆 ∪ 𝑋) ∩ (𝑆 ∪𝑌 ) = 𝑆 ∪ (𝑋 ∩𝑌 ), and by assumption, 𝑋 ∩𝑌 ⊆ 𝑆, so 𝑆 ∪ (𝑋 ∩𝑌 ) = 𝑆 = 𝐷2 ∩ 𝐷3. Then 𝑓1 ⊙ ( 𝑓2 ⊕ 𝑓3) is defined because dom( 𝑓2 ⊕ 𝑓3) = 𝐷2 ∪ 𝐷3 = 𝑆 ∪ 𝑆 = 𝑆 = range( 𝑓1). • State 𝑓1 ⊙ ( 𝑓2 ⊕ 𝑓3) ⊑ 𝑓𝜇. It suffices to show that there exists 𝑔 such that ( 𝑓1 ⊙ ( 𝑓2 ⊕ 𝑓3)) ⊙ 𝑔 = 𝑓𝜇. For any 𝑠 ∈ Mem[𝑆], 𝑥 ∈ Mem[𝑋], 𝑦 ∈ Mem[𝑌 ] such that 𝑠 ⊲⊳ 𝑥 ⊲⊳ 𝑦 is defined, 𝑓1 ⊙ ( 𝑓2 ⊕ 𝑓3) (⟨⟩)(𝑠 ⊲⊳ 𝑥 ⊲⊳ 𝑦) (C.14) = 𝑓1(⟨⟩)(𝑠) · 𝑓2 ⊕ 𝑓3(𝑠) (𝑠 ⊲⊳ 𝑥 ⊲⊳ 𝑦) = 𝑓1(⟨⟩)(𝑠) · ( 𝑓2(𝑠) (𝑠 ⊲⊳ 𝑥) · 𝑓3(𝑠) (𝑠 ⊲⊳ 𝑦)) = 𝜇(𝑆 = 𝑠) · (𝜇(𝑋 = 𝑥 | 𝑆 = 𝑠) · 𝜇(𝑌 = 𝑦 | 𝑆 = 𝑠)) (C.15) 289 Because 𝑋,𝑌 are conditionally independent given 𝑆 in the distribution 𝑞, 𝜇(𝑋 = 𝑥 | 𝑆 = 𝑠) · 𝜇(𝑌 = 𝑦 | 𝑆 = 𝑠) = 𝜇(𝑋 = 𝑥,𝑌 = 𝑦 | 𝑆 = 𝑠) (C.16) Substituting eq. (C.16) into eq. 
(C.15), we have 𝑓1 ⊙ ( 𝑓2 ⊕ 𝑓3) (⟨⟩)(𝑠 ⊲⊳ 𝑥 ⊲⊳ 𝑦) = 𝜇(𝑆 = 𝑠) · 𝜇(𝑋 = 𝑥,𝑌 = 𝑦 | 𝑆 = 𝑠) = 𝜇(𝑋 = 𝑥,𝑌 = 𝑦, 𝑆 = 𝑠) Let 𝑔 : Mem[𝑋 ∪ 𝑌 ∪ 𝑆] → D(Mem[Val]) such that for any 𝑑 ∈ Mem[𝑋 ∪ 𝑌 ∪ 𝑆], 𝑚 ∈ Mem[Val] such that 𝑑 ⊲⊳ 𝑚 is defined, let 𝑔(𝑑) (𝑚) = 𝜇(Val = 𝑚 | 𝑋 ∪ 𝑌 ∪ 𝑆 = 𝑑) Then, ( 𝑓1 ⊙ ( 𝑓2 ⊕ 𝑓3)) ⊙ 𝑔 is defined, and ( 𝑓1 ⊙ ( 𝑓2 ⊕ 𝑓3) ⊙ 𝑔) (⟨⟩)(𝑚) = ( 𝑓1 ⊙ ( 𝑓2 ⊕ 𝑓3)) (⟨⟩)(𝑚𝑋∪𝑌∪𝑆) · 𝑔(𝑚𝑋∪𝑌∪𝑆) (𝑚) = 𝜇(Val = 𝑚) Thus, ( 𝑓1 ⊙ ( 𝑓2 ⊕ 𝑓3)) ⊙ 𝑔 = 𝑓𝜇, and therefore 𝑓1 ⊙ ( 𝑓2 ⊕ 𝑓3) ⊑ 𝑓𝜇. This completes the proof for the backwards direction. □ C.2.3 Validating Graphoid Axioms, Section 4.2.3 Lemma C.2.9 (Weak Union). The following judgment is valid in X𝐶𝐼 : |= [𝑍] # ( [𝑋] ∗ [𝑌 ∪𝑊]) → [𝑍 ∪𝑊] # ( [𝑋] ∗ [𝑌 ]) Proof. For any 𝑓 ∈ X𝐶𝐼 , if 𝑓 |= [𝑍] # ( [𝑋] ∗ [𝑌 ∪ 𝑊]), by lemma C.2.6, there exist 𝑓1, 𝑓2, 𝑓3 ∈ 𝑀 such that 𝑓1 ⊙ ( 𝑓2 ⊕ 𝑓3) ⊑ 𝑓 , 𝑓1 : Mem[∅] → D(Mem[𝑍]), 𝑓2 : Mem[𝑍] → D(Mem[𝑍 ∪ 𝑋]), 𝑓3 : Mem[𝑍] → D(Mem[𝑍 ∪ 𝑌 ∪𝑊]). 290 Let 𝑓 1 3 = 𝜋𝑍∪𝑊 𝑓3, then by Disintegration there exists 𝑓 2 3 ∈ 𝑀 such that 𝑓3 = 𝑓 1 3 ⊙ 𝑓 2 3 . We note that 𝑓1 ⊙ ( 𝑓2 ⊕ 𝑓3) = 𝑓1 ⊙ 𝑓3 ⊙ (unit𝑍∪𝑌∪𝑊 ⊕ 𝑓2) (By lemma C.2.4) = 𝑓1 ⊙ 𝑓3 ⊙ (unit𝑌∪𝑊 ⊕ 𝑓2) (By dom( 𝑓2) = 𝑍) = 𝑓1 ⊙ ( 𝑓 1 3 ⊙ 𝑓 2 3 ) ⊙ (unit𝑌∪𝑊 ⊕ 𝑓2) = 𝑓1 ⊙ 𝑓 1 3 ⊙ ( 𝑓 2 3 ⊙ (unit𝑌∪𝑊 ⊕ 𝑓2)) = 𝑓1 ⊙ 𝑓 1 3 ⊙ (( 𝑓2 ⊕ unit𝑊 ) ⊕ 𝑓 2 3 ) (†) where † follows from lemma C.2.3 and dom( 𝑓2 ⊕ unit𝑊 ) = 𝑍 ∪𝑊 ⊆ range( 𝑓 1 3 ). Thus, 𝑓1 ⊙ 𝑓 1 3 ⊙ (( 𝑓2 ⊕ unit𝑊 ) ⊕ 𝑓 2 3 ) ⊑ 𝑓 . Note that 𝑓1 ⊙ 𝑓 1 3 has type Mem[∅] → D(Mem[𝑍 ∪𝑊]), so 𝑓1 ⊙ 𝑓 1 3 |= (∅ ⊲ 𝑍 ∪𝑊). State 𝑓2 ⊕ unit𝑊 has type Mem[𝑍 ∪ 𝑊] → D(Mem[𝑍 ∪ 𝑊 ∪ 𝑋]), so 𝑓2 ⊕unit𝑊 |= (𝑍 ∪𝑊 ⊲ 𝑋). State 𝑓 2 3 has type Mem[𝑍 ∪𝑊] → D(Mem[𝑍 ∪𝑊 ∪𝑌 ]), so 𝑓 2 3 |= (𝑍 ∪𝑊 ⊲ 𝑌 ). Therefore, 𝑓1 ⊙ 𝑓 1 3 ⊙ (( 𝑓2 ⊕ unit𝑊 ) ⊕ 𝑓 2 3 ) |= (∅ ⊲ 𝑍 ∪𝑊) # (𝑍 ∪𝑊 ⊲ 𝑋) ∗ (𝑍 ∪𝑊 ⊲ 𝑌 ). By persistence, 𝑓 |= [𝑍 ∪𝑊] # ( [𝑋] ∗ [𝑌 ]), and Weak Union is valid. □ Lemma C.2.10 (Contraction). The following judgment is valid in X𝐶𝐼 : |= ( [𝑍] # ( [𝑋] ∗ [𝑌 ])) ∧ ([𝑍 ∪ 𝑌 ] # ( [𝑋] ∗ [𝑊])) → [𝑍] # ( [𝑋] ∗ [𝑌 ∪𝑊]) Proof. 
If ℎ |= ( [𝑍] # ( [𝑋] ∗ [𝑌 ])) ∧ ([𝑍 ∪ 𝑌 ] # ( [𝑋] ∗ [𝑊])), then • ℎ |= [𝑍] # ( [𝑋] ∗ [𝑌 ]). By the Classical flavor in intuitionistic model lemma, there exists 𝑓1, 𝑓2, 𝑓3 such that 𝑓1 : Mem[∅] → D(Mem[𝑍]), 𝑓2 : Mem[𝑍] → D(Mem[𝑍 ∪ 𝑋]), 𝑓3 : Mem[𝑍] → D(Mem[𝑍 ∪ 𝑌 ]), and 𝑓1 ⊙ ( 𝑓2 ⊕ 𝑓3) ⊑ ℎ. Note 𝑓1 ⊙ ( 𝑓2 ⊕ 𝑓3) has type Mem[∅] → D(Mem[𝑍 ∪ 𝑌 ∪ 𝑍]). 291 • ℎ |= [𝑍 ∪ 𝑌 ] # ( [𝑋] ∗ [𝑊]). By lemma C.2.6, there exists 𝑔1, 𝑔2, 𝑔3 such that 𝑔1 : Mem[∅] → D(Mem[𝑍 ∪𝑌 ]), 𝑔2 : Mem[𝑍 ∪𝑌 ] → D(Mem[𝑍 ∪𝑌 ∪ 𝑋]), 𝑔3 : Mem[𝑍 ∪ 𝑌 ] → D(Mem[𝑍 ∪ 𝑌 ∪𝑊]), and 𝑔1 ⊙ (𝑔2 ⊕ 𝑔3) ⊑ ℎ. Note 𝑔1 ⊙ 𝑔2 has type Mem[∅] → D(Mem[𝑍 ∪ 𝑌 ∪ 𝑋]). By lemma C.2.2, 𝑓1 ⊙ ( 𝑓2 ⊕ 𝑓3) = 𝑔1 ⊙ 𝑔2. 𝑔1 ⊙ (𝑔2 ⊕ 𝑔3) = 𝑔1 ⊙ (𝑔2 ⊕ unit𝑍∪𝑌 ) ⊙ (unit𝑍∪𝑌∪𝑋 ⊕ 𝑔3) (By lemma C.2.4) = 𝑔1 ⊙ 𝑔2 ⊙ (unit𝑍∪𝑋 ⊕ 𝑔3) (Because 𝑍 ∪ 𝑌 ⊆ dom(𝑔2), 𝑌 ⊆ dom(𝑔3)) = 𝑓1 ⊙ ( 𝑓2 ⊕ 𝑓3) ⊙ (unit𝑍∪𝑋 ⊕ 𝑔3) ( 𝑓1 ⊙ ( 𝑓2 ⊕ 𝑓3) = 𝑔1 ⊙ 𝑔2) = 𝑓1 ⊙ ( ( 𝑓2 ⊙ unit𝑍∪𝑋) ⊕ ( 𝑓3 ⊙ 𝑔3) ) (By C.1.8) = 𝑓1 ⊙ ( 𝑓2 ⊕ ( 𝑓3 ⊙ 𝑔3) ) By their types, it is easy to see that 𝑓1 |= (∅ ⊲ 𝑍), 𝑓2 |= (𝑍 ⊲ 𝑋), 𝑓3 ⊙ 𝑔3 |= (𝑍 ⊲ 𝑌 ∪𝑊). So, 𝑓1 ⊙ ( 𝑓2 ⊕ ( 𝑓3 ⊙ 𝑔3)) |= [𝑍] # ( [𝑋] ∗ [𝑌 ∪𝑊]). Also, note that ℎ ⊒ 𝑔1 ⊙ (𝑔2 ⊕ 𝑔3) = 𝑓1 ⊙ ( 𝑓2 ⊕ ( 𝑓3 ⊙ 𝑔3)), so by persistence, ℎ |= (∅ ⊲ 𝑍) # ((𝑍 ⊲ 𝑋) ∗ (𝑍 ⊲ 𝑌 ∪𝑊)). □ C.3 CPSL Assertion Logic For the proof of Theorem 4.3.1, we need the following characterization of 𝑔 ⊑ 𝑓 . Proposition C.3.1. Let 𝑓 be a Markov kernel, and let 𝐷 ⊆ dom( 𝑓 ) ⊆ 𝑅 ⊆ range( 𝑓 ). Then we have 𝜋𝑅 ( 𝑓 (𝑚)) = 𝑔(𝑚′) for all 𝑚′ ∈ Mem[𝐷], 𝑚 ∈ Mem[dom( 𝑓 )] such that 𝑚𝐷 = 𝑚′ if and only if 𝑔 ⊑ 𝑓 and dom(𝑔) = 𝐷, range(𝑔) = 𝑅. 292 Proof. For the reverse direction, suppose that 𝑓 = (𝑔 ⊕ unit𝑆) ⊙ 𝑣, with 𝑆 disjoint from dom(𝑔). Since range(𝑔) ⊆ dom(𝑣), we have: 𝜋𝑅 ( 𝑓 (𝑚)) = 𝜋𝑅 ((𝑔 ⊕ unit𝑆) (𝑚)) = 𝜋𝑅 (𝑔(𝑚𝐷) ⊕ unit𝑆 (𝑚𝑆)) = 𝜋𝑅 (𝑔(𝑚𝐷)) ⊗ 𝜋𝑅 (unit𝑆 (𝑚𝑆)) = 𝑔(𝑚𝐷) = 𝑔(𝑚′). For the forward direction, evidently dom(𝑔) = 𝐷 and range(𝑔) = 𝑅. Since 𝑓 preserves input to output, we have 𝜋dom( 𝑓 ) (𝑔(𝑚′)) = 𝜋dom( 𝑓 ) ( 𝑓 (𝑚)) = unit(𝑚′) so 𝑔 preserves input to output and 𝑔 is a Markov kernel. 
We claim that 𝑔 ⊑ 𝑓 . First, consider 𝑔 ⊕ unitdom( 𝑓 )\𝐷 ; write 𝐷′ = dom( 𝑓 ) \ 𝐷. For any 𝑚 ∈ Mem[dom( 𝑓 )], we have: 𝜋𝐷′∪𝑅 ( 𝑓 (𝑚)) = 𝜋𝑅 ( 𝑓 (𝑚)) ⊗ 𝜋𝐷′ ( 𝑓 (𝑚)) = 𝑔(𝑚𝐷) ⊗ unit𝐷′ (𝑚𝐷′) = (𝑔 ⊕ unit𝐷′) (𝑚). So by lemma C.2.1, for every 𝑚 ∈ Mem[dom( 𝑓 )] there exists a family of kernels 𝑔′𝑚 : Mem[𝐷′ ∪ 𝑅] → D(Mem[range( 𝑓 )]) such that 𝑓 (𝑚) = bind((𝑔 ⊕ unit𝐷′) (𝑚), 𝑔′𝑚) Defining 𝑔′(𝑚) ≜ 𝑔′ 𝑚dom( 𝑓 ) (𝑚), we have: 𝑓 (𝑚) = ((𝑔 ⊕ unit𝐷′) ⊙ 𝑔′) (𝑚) and so 𝑔 ⊑ 𝑓 . □ 293 C.3.1 Restriction Theorem 4.3.1 (Restriction in DIBI+). Let 𝑃 ∈ DIBI+ with atomic propositions (𝜙 ⊲ 𝜓), as described above. Then 𝑓 |= 𝑃 if and only if there exists 𝑓 ′ ⊑ 𝑓 such that range( 𝑓 ′) ⊆ FV(𝑃) and 𝑓 ′ |= 𝑃. Proof. The reverse direction is immediate from persistence. For the forward direction, we argue by induction with a stronger hypothesis. If 𝑓 |= 𝑃, we call a state 𝑓 ′ a witness of 𝑓 |= 𝑃 if 𝑓 ′ ⊑ 𝑓 , FVR(𝑃) ⊆ range( 𝑓 ′) ⊆ FV(𝑃), dom( 𝑓 ′) ⊆ FVD(𝑃), and 𝑓 ′ |= 𝑃. We show that 𝑓 |= 𝑃 implies that there is a witness 𝑓 ′ |= 𝑃, by induction on 𝑃. Case (𝐷 ⊲ 𝑅): We will use two basic facts, both following from the form of the domain and range assertions: 1. If 𝑚 |=𝑑 𝐷, then 𝑑𝑜𝑚(𝑚) = FV(𝐷). 2. If 𝜇 |=𝑟 𝑅, then 𝑑𝑜𝑚(𝜇) ⊇ FV(𝐷). 𝑓 |= (𝐷 ⊲ 𝑅) implies that there exists 𝑓 ′ ⊑ 𝑓 such that for any 𝑚 ∈ 𝑀𝑑 such that 𝑚 |=𝑑 𝐷, 𝑓 ′(𝑚) is defined and 𝑓 ′(𝑚) |=𝑟 𝑅. Let 𝑇 = range( 𝑓 ′) ∩ (FV(𝐷) ∪ FV(𝑅)). We claim that 𝜋𝑇 𝑓 ′ is the desired witness for 𝑓 |= 𝑃. • 𝜋𝑇 𝑓 ′ is defined and 𝜋𝑇 𝑓 ′ ⊑ 𝑓 because: dom( 𝑓 ′) = 𝑑𝑜𝑚(𝑚) (for any 𝑚 ∈ 𝑀𝑑 such that 𝑚 |=𝑑 𝐷) = FV(𝐷) ⊆ 𝑇. Thus 𝜋𝑇 𝑓 ′ is defined, and 𝜋𝑇 𝑓 ′ ⊑ 𝑓 ′ ⊑ 𝑓 . 294 • range(𝜋𝑇 𝑓 ′) = 𝑇 ⊆ FV(𝐷) ∪ FV(𝑅) = FV(𝑃). • 𝜋𝑇 𝑓 ′ |= (𝐷 ⊲ 𝑅): For any 𝑚 ∈ 𝑀𝑑 such that 𝑚 |=𝑑 𝐷, 𝑓 ′(𝑚) is a distribution. Based on the restriction theorem for probabilistic BI, 𝜋FV(𝑅)∩range( 𝑓 ′) ( 𝑓 ′(𝑚)) |= 𝑅 too. Since 𝑇 ⊇ FV(𝑅) ∩ range( 𝑓 ′), persis- tence in 𝑀𝑟 , implies 𝜋𝑇 ( 𝑓 ′(𝑚)) |= 𝑅. By definition of marginalization on kernels, (𝜋𝑇 𝑓 ′) (𝑚) = 𝜋𝑇 ( 𝑓 ′(𝑚)). 
Since (𝜋𝑇 𝑓 ′) (𝑚) |= 𝑅, we have 𝜋𝑇 𝑓 ′ |= (𝐷 ⊲ 𝑅) as well. • FVD(𝑃) = FV(𝐷), so dom(𝜋𝑇 𝑓 ′) = dom(𝑚) = FV(𝐷) = FVD(𝑃). • FVR(𝑃) = FV(𝐷 ⊲ 𝑅) = FV(𝐷) ∪ FV(𝑅), so range(𝜋𝑇 𝑓 ′) ⊇ dom((𝜋𝑇 𝑓 ′) (𝑚)) (for any 𝑚 ∈ 𝑀𝑑 such that 𝑚 |=𝑑 𝐷) ⊇ FV(𝐷) ∪ FV(𝑅) (By (𝜋𝑇 𝑓 ′) (𝑚) |= 𝑅) = FVR(𝑃). so 𝜋𝑇 𝑓 ′ is a desired witness for 𝑓 |= 𝑃. Case 𝑄 ∧ 𝑅: Assuming FVR(𝑄) = FV(𝑄) = FVR(𝑅) = FV(𝑅). By definition, 𝑓 |= 𝑄 ∧ 𝑅 implies that 𝑓 |= 𝑄 and 𝑓 |= 𝑅. By induction, there exists 𝑓 ′ ⊑ 𝑓 such that FVR(𝑄) = range( 𝑓 ′) = FV(𝑄), dom( 𝑓 ′) ⊆ FVD(𝑄), and 𝑓 ′ |= 𝑄, and there exists 𝑓 ′′ ⊑ 𝑓 such that FVR(𝑅) = range( 𝑓 ′′) = FV(𝑅), dom( 𝑓 ′′) ⊆ FVD(𝑅) and 𝑓 ′′ |= 𝑅. Thus, range( 𝑓 ′) = range( 𝑓 ′′). Note that dom( 𝑓 ′) = dom( 𝑓 ) ∩ range( 𝑓 ′) because in our models, 𝑓 ′ ⊑ 𝑓 implies that there exists 𝑆 and some 𝑣 such that 𝑓 = ( 𝑓 ′ ⊕ 𝜂𝑆) ⊙ 𝑣, and we can make 𝑆 disjoint of dom( 𝑓 ′) and range( 𝑓 ′) wolog. Then, dom( 𝑓 ) = dom( 𝑓 ′ ⊕ 𝑆) = dom( 𝑓 ′) ∪ 𝑆, and range( 𝑓 ′) = range( 𝑓 ′ ⊕ 𝑆) \ 𝑆, so dom( 𝑓 ) ∪ range( 𝑓 ′) ⊆ dom( 𝑓 ′). Meanwhile, since dom( 𝑓 ′) ⊆ dom( 𝑓 ) and dom( 𝑓 ′) ⊆ range( 𝑓 ′), dom( 𝑓 ′) ⊆ dom( 𝑓 ) ∩ range( 𝑓 ′). So dom( 𝑓 ′) = 295 dom( 𝑓 ) ∩ range( 𝑓 ′). Similarly, dom( 𝑓 ′′) ⊆ dom( 𝑓 ) ∩ range( 𝑓 ′′), so range( 𝑓 ′) = range( 𝑓 ′′) implies that dom( 𝑓 ′) = dom( 𝑓 ′). Since dom( 𝑓 ′) = dom( 𝑓 ′′) and range( 𝑓 ′) = range( 𝑓 ′′), proposition C.3.1 implies that 𝑓 ′ = 𝑓 ′′. This is the desired witness: 𝑓 ′ = 𝑓 ′′ |= 𝑄 and 𝑓 ′ = 𝑓 ′′ |= 𝑅. Case 𝑄 ∨ 𝑅: 𝑓 |= 𝑄 ∨ 𝑅 implies that 𝑓 |= 𝑄 or 𝑓 |= 𝑅. Without loss of generality, suppose 𝑓 |= 𝑄. By induction, there exists 𝑓 ′ ⊑ 𝑓 such that FVR(𝑄) ⊆ range( 𝑓 ′) ⊆ FV(𝑄), dom( 𝑓 ′) ⊆ FVD(𝑄). Then: range( 𝑓 ′) ⊆ FV(𝑄) ∪ FV(𝑅) = FV(𝑃) range( 𝑓 ′) ⊇ FVR(𝑄) ∩ FVR(𝑅) = FVR(𝑃) dom( 𝑓 ′) ⊆ FV(𝑄) ∪ FV(𝑅) = FVD(𝑃). Thus, 𝑓 ′ is a desired witness. Case 𝑄 # 𝑅: Assuming FVD(𝑅) ⊆ FVR(𝑄). 𝑓 |= 𝑄 # 𝑅 implies that there exists 𝑓1, 𝑓2 such that 𝑓1 ⊙ 𝑓2 = 𝑓 , 𝑓1 |= 𝑄, and 𝑓2 |= 𝑅. 𝑓1 ⊙ 𝑓2 is defined so range( 𝑓1) = dom( 𝑓2). 
By induction, there exists 𝑓 ′1 ⊑ 𝑓1 such that 𝑓 ′1 |= 𝑄, FVR(𝑄) ⊆ range( 𝑓 ′1) ⊆ FV(𝑄) and dom( 𝑓 ′1) ⊆ FVD(𝑄), and there exists 𝑓 ′2 ⊑ 𝑓2 such that 𝑓 ′2 |= 𝑄, FVR(𝑅) ⊆ range( 𝑓 ′2) ⊆ FV(𝑅), and dom( 𝑓 ′2) ⊆ FVD(𝑅). Now, 𝑓̂ = 𝑓 ′1 ⊙ ( 𝑓 ′ 2 ⊕ unitrange( 𝑓 ′1 )\dom( 𝑓 ′2 )) is defined because dom( 𝑓 ′2) ⊆ FVD(𝑅) ⊆ FVR(𝑄) ⊆ range( 𝑓 ′1). Then, we have 𝑓̂ |= 𝑄 # 𝑅 range( 𝑓̂ ) = range( 𝑓 ′1) ∪ range( 𝑓 ′2) ⊆ FV(𝑄) ∪ FV(𝑅) = FV(𝑃) range( 𝑓̂ ) = range( 𝑓 ′1) ∪ range( 𝑓 ′2) ⊇ FVR(𝑄) ∪ FVR(𝑅) = FVR(𝑃) dom( 𝑓̂ ) = dom( 𝑓 ′1) ⊆ FVD(𝑄) = FVD(𝑃). 296 𝑓 ′1 ⊑ 𝑓 , 𝑓 ′2 ⊕ unitrange( 𝑓 ′1 )\dom( 𝑓 ′2 )⊕ ⊑ 𝑓2, so by lemma C.2.5, 𝑓̂ = 𝑓 ′1 ⊙ ( 𝑓 ′ 2 ⊕ unitrange( 𝑓 ′1 )\dom( 𝑓 ′2 )) ⊑ 𝑓1 ⊙ 𝑓2 = 𝑓 . Thus, 𝑓̂ is a desired witness. Case 𝑄 ∗ 𝑅: 𝑓 |= 𝑄 ∗ 𝑅 implies that there exists 𝑓1, 𝑓2 such that 𝑓1 ⊕ 𝑓2 ⊑ 𝑓 , 𝑓1 |= 𝑄, and 𝑓2 |= 𝑅. By induction, there exists 𝑓 ′1 ⊑ 𝑓1 such that 𝑓 ′1 |= 𝑄, FVR(𝑄) ⊆ range( 𝑓 ′1) ⊆ FV(𝑄) and dom( 𝑓 ′1) ⊆ FVD(𝑄), and there exists 𝑓 ′2 ⊑ 𝑓2 such that 𝑓 ′2 |= 𝑄, FVR(𝑅) ⊆ range( 𝑓 ′2) ⊆ FV(𝑅), and dom( 𝑓 ′2) ⊆ FVD(𝑅). By downwards closure of ⊕, 𝑓 ′1 ⊕ 𝑓 ′ 2 is defined and 𝑓 ′1 ⊕ 𝑓 ′ 2 ⊑ 𝑓1 ⊕ 𝑓2 ⊑ 𝑓 . We have 𝑓 ′1 ⊕ 𝑓 ′ 2 |= 𝑄 ∗ 𝑅, and range( 𝑓 ′1 ⊕ 𝑓 ′ 2) = range( 𝑓 ′1) ∪ range( 𝑓 ′2) ⊆ FV(𝑄) ∪ FV(𝑅) = FV(𝑃) range( 𝑓 ′1 ⊕ 𝑓 ′ 2) = range( 𝑓 ′1) ∪ range( 𝑓 ′2) ⊇ FVR(𝑄) ∪ FVR(𝑅) = FVR(𝑃) dom( 𝑓 ′1 ⊕ 𝑓 ′ 2) = dom( 𝑓 ′1) ∪ dom( 𝑓 ′2) ⊆ FVD(𝑄) ∪ FVD(𝑅) = FVD(𝑃). Thus, 𝑓 ′1 ⊕ 𝑓 ′ 2 is a desired witness. □ 297 C.3.2 Extra Axioms Proposition 4.3.2. The following axiom schemas for atomic propositions are valid. (𝑆 : 𝑝𝑑 ⊲ 𝑝𝑟) ∧ (𝑆 : 𝑝′𝑑 ⊲ 𝑝 ′ 𝑟) → (𝑆 : 𝑝𝑑 ∧ 𝑝′𝑑 ⊲ 𝑝𝑟 ∧ 𝑝 ′ 𝑟) if FV(𝑝𝑟) = FV(𝑝′𝑟) (AP-AND) (𝑆 : 𝑝𝑑 ⊲ 𝑝𝑟) ∧ (𝑆 : 𝑝′𝑑 ⊲ 𝑝 ′ 𝑟) → (𝑆 : 𝑝𝑑 ∨ 𝑝′𝑑 ⊲ 𝑝𝑟 ∨ 𝑝 ′ 𝑟) (AP-OR) (𝑆 : 𝑝𝑑 ⊲ 𝑝𝑟) ∗ (𝑆′ : 𝑝′𝑑 ⊲ 𝑝 ′ 𝑟) → (𝑆 ∪ 𝑆′ : 𝑝𝑑 ∧ 𝑝′𝑑 ⊲ 𝑝𝑟 ∗ 𝑝 ′ 𝑟) (AP-PAR) 𝑝′𝑑 → 𝑝𝑑 and |=𝑟 𝑝𝑟 → 𝑝′𝑟 implies |= (𝑆 : 𝑝𝑑 ⊲ 𝑝𝑟) → (𝑆 : 𝑝′𝑑 ⊲ 𝑝 ′ 𝑟) (AP-IMP) Proof. We check each of the axioms. Case: AP-AND. Suppose that 𝑤 |= (𝑆 : 𝑝𝑑 ⊲ 𝑝𝑟) ∧ (𝑆 : 𝑝′ 𝑑 ⊲ 𝑝′𝑟). 
By semantics of atomic propositions, there exists 𝑤1 ⊑𝑘 𝑤 and 𝑤2 ⊑𝑘 𝑤 such that for all 𝑚 ∈ Mem[𝑆] such that 𝑚 |=𝑑 𝑝𝑑 ∧ 𝑝′𝑑 , we have 𝑤1(𝑚) |=𝑟 𝑝𝑟 and 𝑤2(𝑚) |=𝑟 𝑝′𝑟 . By restriction (theorem 4.3.1), we may assume that range(𝑤1) = FV(𝑝𝑟) = FV(𝑝′𝑟) = range(𝑤2). Thus, proposition C.3.1 implies that 𝑤1 = 𝑤2, and so 𝑤 |= (𝑆 : 𝑝𝑑 ∧ 𝑝′𝑑 ⊲ 𝑝𝑟 ∧ 𝑝 ′ 𝑟). Case: AP-OR. Immediate, by semantics of ∨. Case: AP-PAR. Suppose that 𝑤 |= (𝑆 : 𝑝𝑑 ⊲ 𝑝𝑟) ∗ (𝑆′ : 𝑝′ 𝑑 ⊲ 𝑝′𝑟). We will show that 𝑤 |= (𝑆 ∪ 𝑆′ : 𝑝𝑑 ∗ 𝑝′𝑑 ⊲ 𝑝𝑟 ∗ 𝑝 ′ 𝑟). By semantics of atomic propositions, there exists 𝑤1 ⊑𝑘 𝑤 and 𝑤2 ⊑𝑘 𝑤 such that 𝑤1 ⊕ 𝑤2 ⊑ 𝑤, and for all 𝑚1 ∈ Mem[𝑆] such that 𝑚1 |=𝑑 𝑝𝑑 , we have 𝑤1(𝑚1) |=𝑟 𝑝𝑟 , and for all 𝑚2 ∈ Mem[𝑆′] such that 𝑚2 |=𝑑 𝑝′𝑑 , we have 𝑤2(𝑚2) |=𝑟 𝑝′𝑟 . Now for any 𝑚 ∈ Mem[𝑆 ∪ 𝑆′] such that 𝑚 |=𝑑 𝑝𝑑 ∧ 𝑝′𝑑 , we have 𝑚𝑆 |=𝑑 𝑝𝑑 and 𝑚𝑆′ |=𝑑 𝑝′𝑑 . Thus 𝑤1(𝑚𝑆) |=𝑟 𝑝𝑟 and 𝑤2(𝑚𝑆′) |=𝑟 𝑝′𝑟 . Letting 𝑇 = 𝑆 ∩ 𝑆′ 298 and 𝑇1 = 𝑆 \ 𝑇 ; 𝑇2 = 𝑆′ \ 𝑇 be disjoint sets, and noting that 𝑤1, 𝑤2 both preserve inputs on 𝑇 , we have: 𝑤1 ⊕ 𝑤2(𝑚) = 𝜋𝑇1𝑤1(𝑚𝑆) ⊗ unit(𝑚𝑇 ) ⊗ 𝜋𝑇2𝑤2(𝑚𝑆′) = (𝜋𝑇1𝑤1(𝑚𝑆) ⊗ unit(𝑚𝑇 )) ⊕𝑟 (unit(𝑚𝑇 ) ⊗ 𝜋𝑇2𝑤2(𝑚𝑆′)) = 𝑤1(𝑚𝑆) ⊕𝑟 𝑤2(𝑚𝑆′) |=𝑟 𝑝𝑟 ∗ 𝑝′𝑟 Thus, 𝑤 |= (𝑆 ∪ 𝑆′ : 𝑝𝑑 ∗ 𝑝′𝑑 ⊲ 𝑝𝑟 ∗ 𝑝 ′ 𝑟). Case: AP-IMP. Immediate, by semantics of→. □ Proposition 4.3.4. (AXIOMS FOR DIBI+) The following axioms are sound, assuming both precedent and antecedent are in DIBI+. (𝑃 #𝑄) # 𝑅 → 𝑃 # (𝑄 ∗ 𝑅) (INDEP-1) 𝑃 #𝑄 → 𝑃 ∗ 𝑄 if FVD(𝑄) = ∅ (INDEP-2) 𝑃 #𝑄 → 𝑃 # (𝑄 ∗ (𝑆 ⊲ [𝑆])) (PAD) (𝑃 ∗ 𝑄) # (𝑅 ∗ 𝑆) → (𝑃 # 𝑅) ∗ (𝑄 # 𝑆) (RESTEXCH) Proof. We prove them one by one. INDEP-1 We want to show that when (𝑃 # 𝑄) # 𝑅, 𝑃 # (𝑄 ∗ 𝑅) are both formula in DIBI+, 𝑓 |= (𝑃 #𝑄) # 𝑅 implies 𝑓 |= 𝑃 # (𝑄 ∗ 𝑅). By proof system of DIBI, 𝑓 |= (𝑃 #𝑄) # 𝑅 implies that 𝑓 |= 𝑃 # ( 𝑄 # 𝑅 ) . While 𝑃 # ( 𝑄 # 𝑅 ) may not satisfy the restriction property, that is okay because we will only used conditions guaranteed by the fact that (𝑃 # 𝑄) # 𝑅, 𝑃 # (𝑄 ∗ 299 𝑅) ∈ DIBI+. 
In particular, we rely on 𝑃,𝑄, 𝑅 each satisfies restriction, and FVD(𝑄 ∗ 𝑅) ⊆ FVR(𝑃), which implies that FVD(𝑅) ⊆ FVD(𝑄 ∗ 𝑅) ⊆ FVR(𝑃) (C.17) 𝑓 |= 𝑃 # ( 𝑄 # 𝑅 ) implies there exists 𝑓𝑝, 𝑓𝑞, 𝑓𝑟 such that 𝑓𝑝 |= 𝑃, 𝑓𝑞 |= 𝑄, and 𝑓𝑟 |= 𝑅, and 𝑓𝑝 ⊙ ( 𝑓𝑞 ⊙ 𝑓𝑟) = 𝑓 . By restriction property theorem 4.3.1, 𝑓𝑞 |= 𝑄 implies that there exists 𝑓 ′𝑞 ⊑ 𝑓𝑞 such that FVR(𝑄) ⊆ range( 𝑓 ′𝑞) ⊆ FV(𝑄) and dom( 𝑓 ′𝑞) ⊆ FVD(𝑄). 𝑓 ′𝑞 ⊑ 𝑓𝑞 so there exists 𝑣, 𝑇 such that 𝑓𝑞 = ( 𝑓 ′𝑞 ⊕𝑘 unit𝑇 ) ⊙ 𝑣. Similarly, 𝑓𝑟 |= 𝑅, by theorem 4.3.1, there exists 𝑓 ′𝑟 ⊑ 𝑓𝑟 such that FVR(𝑅) ⊆ range( 𝑓 ′𝑟 ) ⊆ FV(𝑅) and dom( 𝑓 ′𝑟 ) ⊆ FVD(𝑅). 𝑓 ′𝑟 ⊑ 𝑓𝑟 so there exists 𝑢, 𝑆 such that 𝑓𝑟 = ( 𝑓 ′𝑟 ⊕𝑘 unit𝑆) ⊙ 𝑢. Now, we claim that FVD(𝑅) ⊆ dom( 𝑓 ′𝑞 ⊕ unit𝑇 ): By theorem 4.3.1 𝑓𝑝 |= 𝑃 implies that there exists 𝑓 ′𝑝 ⊑ 𝑓𝑝 such that FVR(𝑃) ⊆ range( 𝑓 ′𝑝) ⊆ FV(𝑃), dom( 𝑓 ′𝑝) ⊆ 𝐹FV(𝑃), and 𝑓 ′𝑝 |= 𝑃. Thus, FVR(𝑃) ⊆ range( 𝑓𝑝) = dom( 𝑓𝑞). Recall that FVD(𝑅) ⊆ FVR(𝑃), so FVD(𝑅) ⊆ dom 𝑓𝑞 = dom 𝑓 ′𝑞 ⊕ unit𝑇 . As a corollary, we have dom( 𝑓 ′𝑟 ) ⊆ FVD(𝑅) ⊆ dom( 𝑓 ′𝑞 ⊕ unit𝑇 ) ⊆ dom(𝑣), 300 and dom( 𝑓 ′𝑟 ) ⊆ FVD(𝑅) ⊆ dom( 𝑓 ′𝑞 ⊕ unit𝑇 ). Then, 𝑓𝑞 ⊙ 𝑓𝑟 = ( ( 𝑓 ′𝑞 ⊕ unit𝑇 ) ⊙ 𝑣 ) ⊙ ( ( 𝑓 ′𝑟 ⊕ unit𝑆) ⊙ 𝑢 ) = ( 𝑓 ′𝑞 ⊕ unit𝑇 ) ⊙ ( 𝑣 ⊙ ( 𝑓 ′𝑟 ⊕ unit𝑆) ) ⊙ 𝑢 (By standard associativity of ⊙) = ( 𝑓 ′𝑞 ⊕ unit𝑇 ) ⊙ ( 𝑓 ′𝑟 ⊕ 𝑣) ⊙ 𝑢 (By lemma C.2.3 and dom( 𝑓 ′𝑟 ) ⊆ dom(𝑣)) = ( 𝑓 ′𝑞 ⊕ unit𝑇 ) ⊙ (( 𝑓 ′𝑟 ⊙ unitrange( 𝑓 ′𝑟 )) ⊕ (unitdom(𝑣) ⊙ 𝑣) ⊙ 𝑢 = ( 𝑓 ′𝑞 ⊕ unit𝑇 ) ⊙ ( 𝑓 ′𝑟 ⊕ unitdom(𝑣)) ⊙ (unitrange( 𝑓 ′𝑟 ) ⊕ 𝑣) ⊙ 𝑢 (♥) = (( 𝑓 ′𝑞 ⊕ unit𝑇 ) ⊕ 𝑓 ′𝑟 ) ⊙ (𝑣 ⊕ unitrange( 𝑓 ′𝑟 )) ⊙ 𝑢 (†) = (( 𝑓 ′𝑞 ⊕ unit𝑇 ) ⊙ 𝑣) ⊕ ( 𝑓 ′𝑟 ⊙ unitrange( 𝑓 ′𝑟 )) ⊙ 𝑢 (♥) = 𝑓𝑞 ⊕ 𝑓𝑟 where † follows from lemma C.2.3, dom( 𝑓 ′𝑟 ) ⊆ dom( 𝑓 ′𝑞 ⊕ unit𝑇 ) and exact commutativity, ♥ follows from lemma C.1.8 and lemma C.1.9. Thus, 𝑓𝑞 ⊙ 𝑓𝑟 |= 𝑄 ∗ 𝑅. And by satisfaction rules, 𝑓 |= 𝑃 # (𝑄 ∗ 𝑅) INDEP-2 We want to show that under the special condition FVD(𝑄) = ∅, 𝑓 |= 𝑃 #𝑄 implies that 𝑓 |= 𝑃 ∗ 𝑄. If 𝑓 |= 𝑃 #𝑄, then there exists 𝑓𝑝, 𝑓𝑞 such that 𝑓𝑝 ⊙ 𝑓𝑞 = 𝑓 and 𝑓𝑝 |= 𝑃, 𝑓𝑞 |= 𝑄. 
By restriction property theorem 4.3.1, 𝑓𝑞 |= 𝑄 implies that there exists 𝑓 ′𝑞 ⊑ 𝑓𝑞 such that FVR(𝑄) ⊆ range( 𝑓 ′𝑞) ⊆ FV(𝑄) and dom( 𝑓 ′𝑞) ⊆ FVD(𝑄). 𝑓 ′𝑞 ⊑ 𝑓𝑞 so there exists 𝑣, 𝑇 such that 𝑓𝑞 = ( 𝑓 ′𝑞 ⊕𝑘 unit𝑇 ) ⊙ 𝑣. Since dom( 𝑓 ′𝑞) ⊆ FVD(𝑄) and FVD(𝑄) = ∅, it must dom( 𝑓 ′𝑞) = ∅, and thus 301 no matter what the domain of 𝑓𝑝 is, dom( 𝑓 ′𝑞) ⊆ dom( 𝑓𝑝). Thus, 𝑓𝑝 ⊙ 𝑓𝑞 = 𝑓𝑝 ⊙ ( 𝑓 ′𝑞 ⊕ unit𝑇 ) ⊙ 𝑣 = ( 𝑓𝑝 ⊕ 𝑓 ′𝑞) ⊕ 𝑣 (By lemma C.2.3 and dom( 𝑓 ′𝑞) ⊆ dom( 𝑓𝑝)) Thus, 𝑓𝑝 ⊕ 𝑓 ′𝑞 ⊑ 𝑓𝑝 ⊙ 𝑓𝑞 = 𝑓 . By satisfaction rules, 𝑓𝑝 |= 𝑃 and 𝑓 ′𝑞 |= 𝑄 implies that 𝑓𝑝 ⊕ 𝑓 ′𝑞 |= 𝑃 ∗ 𝑄. Thus, by persistence, 𝑓 |= 𝑃 ∗ 𝑄 PAD We want to show that when 𝑃 # 𝑄, 𝑃 # (𝑄 ∗ (𝑆 ⊲ 𝑆)) are both in DIBI+, 𝑓 |= 𝑃 #𝑄 implies 𝑓 |= 𝑃 # (𝑄 ∗ (𝑆 ⊲ 𝑆)). One key guarantee we rely on from the grammar of DIBI+ is that FVD(𝑄) ∪ 𝑆 = FVD(𝑄 ∗ (𝑆 ⊲ 𝑆)) ⊆ FVR(𝑃). When 𝑓 |= 𝑃 #𝑄, there exists 𝑓𝑝, 𝑓𝑞 such that 𝑓𝑝 ⊙ 𝑓𝑞 = 𝑓 and 𝑓𝑝 |= 𝑃, 𝑓𝑞 |= 𝑄, By theorem 4.3.1, 𝑓𝑝 |= 𝑃 implies that there exists 𝑓 ′𝑝 ⊑ 𝑓𝑝 such that FVR(𝑃) ⊆ range( 𝑓 ′𝑝) ⊆ FV(𝑃), dom( 𝑓 ′𝑝) ⊆ 𝐹FV(𝑃), and 𝑓 ′𝑝 |= 𝑃. By the fact that 𝑓𝑝 ⊙ 𝑓𝑞 is defined, and that the definition of preorder in our con- crete models, 𝑓 ′𝑝 ⊑ 𝑓𝑝 implies dom( 𝑓𝑞) = range( 𝑓𝑝) ⊇ range( 𝑓 ′𝑝) ⊇ FVR(𝑃) ⊇ 𝑆 Since 𝑓𝑞 preserves input, 𝑆 ⊆ dom( 𝑓𝑞) implies that 𝑓𝑞 = 𝑓𝑞 ⊕unit𝑆, and thus 𝑓𝑝 ⊙ 𝑓𝑞 = 𝑓𝑝 ⊙ ( 𝑓𝑞 ⊕ unit𝑆). Note that unit𝑆 |= (𝑆 ⊲ [𝑆]), and 𝑓𝑞 |= 𝑄. Thus, 𝑓𝑞 ⊕ unit𝑆 |= 𝑄 ∗ (𝑆 ⊲ [𝑆]). Since 𝑓𝑝 |= 𝑃, it follows that 𝑓𝑝 ⊙ ( 𝑓𝑞 ⊕ unit𝑆) |= 𝑃 # (𝑄 ∗ (𝑆 ⊲ [𝑆])) Since 𝑓 = 𝑓𝑝 ⊙ 𝑓𝑞 = 𝑓𝑝 ⊙ ( 𝑓𝑞 ⊕ unit𝑆), 𝑓 |= 𝑃 # (𝑄 ∗ (𝑆 ⊲ [𝑆])) 302 RESTEXCH We want to show that when (𝑃 ∗ 𝑄) # (𝑅 ∗ 𝑆) and (𝑃 #𝑅) ∗ (𝑄 #𝑆) are both formula in DIBI+, 𝑓 |= (𝑃 ∗ 𝑄) # (𝑅 ∗ 𝑆) implies 𝑓 |= (𝑃 ∗ 𝑅) ∗ (𝑄 ∗ 𝑆). The key properties that being in DIBI+ guarantees us are that FVD(𝑅) ⊆ FVR(𝑃) FVD(𝑆) ⊆ FVR(𝑄) FVD(𝑅 ∗ 𝑆) = FVD(𝑅) ∪ FVD(𝑆) ⊆ FVR(𝑃 ∗ 𝑄) = FVR(𝑃) ∪ FVR(𝑄) If 𝑓 |= (𝑃 ∗ 𝑄) # (𝑅 ∗ 𝑆), then there exists 𝑓1, 𝑓2 such that 𝑓1 ⊙ 𝑓2 = 𝑓 , 𝑓1 |= 𝑃 ∗ 𝑄, 𝑓2 |= 𝑅 ∗ 𝑆. 
That is, there exist 𝑢1, 𝑣1 such that 𝑢1 ⊕ 𝑣1 ⊑ 𝑓1, 𝑢1 |= 𝑃, and 𝑣1 |= 𝑄; there exist 𝑢2, 𝑣2 such that 𝑢2 ⊕ 𝑣2 ⊑ 𝑓2, 𝑢2 |= 𝑅, 𝑣2 |= 𝑆. By theorem 4.3.1, • 𝑢1 |= 𝑃 implies there exists 𝑢′1 ⊑ 𝑢1 such that FVR(𝑃) ⊆ range(𝑢′1) ⊆ FV(𝑃), dom(𝑢′1) ⊆ FVD(𝑃), and 𝑢′1 |= 𝑃. • 𝑣1 |= 𝑄 implies there exists 𝑣′1 ⊑ 𝑣1 such that FVR(𝑄) ⊆ range(𝑣′1) ⊆ FV(𝑄), dom(𝑣′1) ⊆ FVD(𝑄), and 𝑣′1 |= 𝑄. • 𝑢2 |= 𝑅 implies there exists 𝑢′2 ⊑ 𝑢2 such that FVR(𝑅) ⊆ range(𝑢′2) ⊆ FV(𝑅), dom(𝑢′2) ⊆ FVD(𝑅), and 𝑢′2 |= 𝑅. • 𝑣2 |= 𝑆 implies there exists 𝑣′2 ⊑ 𝑣2 such that FVR(𝑆) ⊆ range(𝑣′2) ⊆ FV(𝑆), dom(𝑣′2) ⊆ FVD(𝑆), and 𝑣′2 |= 𝑆. By Downwards closure property of ⊕, 𝑢′2 ⊕ 𝑣 ′ 2 is defined and 𝑢′2 ⊕ 𝑣 ′ 2 ⊑ 𝑢2 ⊕ 𝑣2 ⊑ 𝑓2. Say that 𝑓1 = (𝑢1 ⊕ 𝑣1 ⊕ unit𝑆1) ⊙ ℎ1, 𝑓2 = (𝑢′2 ⊕ 𝑣 ′ 2 ⊕ unit𝑆2) ⊙ ℎ2. Also, dom(𝑢′2 ⊕ 𝑣 ′ 2) = dom(𝑢′2) ∪ dom(𝑣′2) ⊆ FVD(𝑅) ∪ FVD(𝑆) ⊆ FVR(𝑃) ∪ FVD(𝑄) ⊆ range(𝑢′1) ∪ range(𝑣′1) ⊆ range(𝑢1) ∪ range(𝑣1) = range(𝑢1 ⊕ 𝑣1) 303 Then 𝑓1 ⊙ 𝑓2 = (𝑢1 ⊕ 𝑣1 ⊕ unit𝑆1) ⊙ ℎ1 ⊙ (𝑢′2 ⊕ 𝑣 ′ 2 ⊕ unit𝑆2) ⊙ ℎ2 = (𝑢1 ⊕ 𝑣1 ⊕ unit𝑆1) ⊙ ((𝑢′2 ⊕ 𝑣 ′ 2) ⊕ ℎ1) ⊙ ℎ2 (♥) = (𝑢1 ⊕ 𝑣1 ⊕ unit𝑆1) ⊙ ((𝑢′2 ⊕ 𝑣 ′ 2) ⊙ unitrange(𝑢′2⊕𝑣 ′ 2)) ⊕ (unitdom(ℎ1) ⊙ ℎ1) ⊙ ℎ2 = (𝑢1 ⊕ 𝑣1 ⊕ unit𝑆1) ⊙ (𝑢′2 ⊕ 𝑣 ′ 2 ⊕ unitdom(ℎ1)) ⊙ (unitrange(𝑢′2⊕𝑣 ′ 2) ⊕ ℎ1) ⊙ ℎ2 (†) = (𝑢1 ⊕ 𝑣1 ⊕ unit𝑆1) ⊙ (𝑢′2 ⊕ 𝑣 ′ 2 ⊕ unitrange(𝑢1⊕𝑣1) ⊕ unit𝑆1) ⊙ (unitrange(𝑢′2⊕𝑣 ′ 2) ⊕ ℎ1) ⊙ ℎ2 = ( ((𝑢1 ⊕ 𝑣1) ⊙ (𝑢′2 ⊕ 𝑣 ′ 2 ⊕ unitrange(𝑢1⊕𝑣1))) ⊕ unit𝑆1 ) ⊙ (unitrange(𝑢′2⊕𝑣 ′ 2) ⊕ ℎ1) ⊙ ℎ2 (†) = ( (𝑢1 ⊙ (𝑢′2 ⊕ unitrange(𝑢1))) ⊕ (𝑣1 ⊙ (𝑣′2 ⊕ unitrange(𝑣1))) ⊕ unit𝑆1 ) ⊙ (unitrange(𝑢′2⊕𝑣 ′ 2) ⊕ ℎ1) ⊙ ℎ2 († and exact commutativity, associativity) where ♥ follows from lemma C.2.3, dom(𝑢′2 ⊕ 𝑣 ′ 2) ⊆ range(𝑢1 ⊕ 𝑣1) ⊆ dom(ℎ1), and † follows from lemma C.1.8 and lemma C.1.9. Thus, (𝑢1 ⊙ (𝑢′2 ⊕ unitrange(𝑢1))) ⊕ (𝑣1 ⊙ (𝑣′2 ⊕ unitrange(𝑣1))) ⊑ 𝑓1 ⊙ 𝑓2. Recall that 𝑢′2 |= 𝑅. By persistence, 𝑢′2 ⊕ unitrange(𝑢1) |= 𝑅. Similarly, 𝑣′2 |= 𝑆, so by persistence, 𝑣′2 ⊕ unitrange(𝑣1) |= 𝑆. 
Therefore, (𝑢1 ⊙ (𝑢′2 ⊕ unitrange(𝑢1))) ⊕ (𝑣1 ⊙ (𝑣′2 ⊕ unitrange(𝑣1))) |= (𝑃 # 𝑅) ∗ (𝑄 # 𝑆) Then, by persistence, 𝑓 |= (𝑃 # 𝑅) ∗ (𝑄 # 𝑆). □ Proposition 4.3.5. (AXIOMS FOR ATOMIC PROPOSITIONS) The following axioms 304 are sound. For any 𝑆, 𝐴, 𝐵, 𝐶 ⊆ Var, (𝑆 ⊲ [𝐴] ∗ [𝐵]) → (𝑆 ⊲ [𝐴]) ∗ (𝑆 ⊲ [𝐵]) if 𝐴 ∩ 𝐵 ⊆ 𝑆 (REVPAR) (𝑆 ⊲ [𝐴] ∗ [𝐵]) → (𝑆 ⊲ [𝐴 ∪ 𝐵]) (UNIONRAN) (𝐴 ⊲ 𝐵) # (𝐵 ⊲ 𝐶) → (𝐴 ⊲ 𝐶) (ATOMSEQ) (𝐴 ⊲ 𝐵) → (𝐴 ⊲ 𝐴) # (𝐴 ⊲ 𝐵) (UNITL) (𝐴 ⊲ 𝐵) → (𝐴 ⊲ 𝐵) # (𝐵 ⊲ 𝐵) (UNITR) Proof. We prove it one by one. REVPAR Given any 𝑓 |= (𝑆 ⊲ [𝐴] ∗ [𝐵]), by satisfaction rules and semantic of atomic propositions, there exists 𝑓 ′ ⊑ 𝑓 such that for all 𝑚 ∈ 𝑀𝑑 such that 𝑚 |=𝑑 𝑆, 𝑓 ′(𝑚) |=𝑟 [𝐴] ∗ [𝐵]. Since 𝑓 ′(𝑚) is defined and 𝑓 ′(𝑚) |=𝑟 [𝐴] ∗ [𝐵], it follows that dom( 𝑓 ′) = 𝑆 and range( 𝑓 ′) ⊇ 𝑆 ∪ 𝐴 ∪ 𝐵. Thus, we can define 𝑓1 = 𝜋𝑆∪𝐴 𝑓 ′, 𝑓2 = 𝜋𝑆∪𝐵 𝑓 ′. Note that 𝑓1 |= (𝑆 ⊲ 𝐴), 𝑓2 |= (𝑆 ⊲ 𝐵). Also, because 𝐴 ∩ 𝐵 ⊆ 𝑆, range( 𝑓1) ∩ range( 𝑓2) = (𝑆 ∪ 𝐴) ∩ (𝑆 ∪ 𝐵) = 𝑆, and thus 𝑓1 ⊕ 𝑓2 is defined. We now want to show that 𝑓1 ⊕ 𝑓2 ⊑ 𝑓 . Note 𝑓 ′(𝑚) |=𝑟 [𝐴] ∗ [𝐵] implies that there exists 𝜇1, 𝜇2 such that 𝜇1 ⊕𝑟 𝜇2 ⊑ 𝑓 ′(𝑚), and 𝑑𝑜𝑚(𝜇1) ⊇ 𝐴, 𝑑𝑜𝑚(𝜇2) ⊇ 𝐵. Since 𝑓 ′ preserves input on its domain 𝑆, 𝜋𝑆 𝑓 ′(𝑚) = unit(𝑚), so (𝜇1 ⊕𝑟 unit(𝑚)) ⊕𝑟 (𝜇2 ⊕𝑟 unit(𝑚)) ⊑ 𝑓 ′(𝑚) ⊕𝑟 unit(𝑚) ⊕𝑟 unit(𝑚) = 𝑓 ′(𝑚) too. Let 𝜇′1 = 𝜋𝐴∪𝑆 (𝜇1 ⊕𝑟 unit(𝑚)) and 𝜇′2 = 𝜋𝐵∪𝑆 (𝜇2 ⊕𝑟 unit(𝑚)). Then due to Downwards closure in 𝑀𝑑 , 𝜇′1 ⊕𝑟 𝜇 ′ 2 will also be defined, and 𝜇′1 ⊕𝑟 𝜇 ′ 2 ⊑ (𝜇1 ⊕𝑟 unit(𝑚)) ⊕𝑟 (𝜇2 ⊕𝑟 unit(𝑚)) ⊑ 𝑓 ′(𝑚), 305 which implies that 𝜇′1 ⊕𝑟 𝜇 ′ 2 = 𝜋𝑆∪𝐴∪𝐵 𝑓 ′(𝑚). In the range model, this means that 𝜇′1 = 𝜋𝑆∪𝐴 𝑓 ′(𝑚), 𝜇′2 = 𝜋𝑆∪𝐵 𝑓 ′(𝑚). Then for any 𝑚′ ∈ Mem[𝑆], any 𝑟 ∈ Mem[𝐴 ∪ 𝐵 ∪ 𝑆], (𝜋𝑆∪𝐴∪𝐵 𝑓 ′) (𝑚′) (𝑟) = (𝜋𝑆∪𝐴∪𝐵 𝑓 ′(𝑚′)) (𝑟) = 𝜇′1 ⊕𝑟 𝜇 ′ 2(𝑟) = 𝜇 ′ 1(𝑟 𝑆∪𝐴) · 𝜇′2(𝑟 𝑆∪𝐵) ( 𝑓1 ⊕ 𝑓2) (𝑚′) (𝑟) = 𝑓1(𝑚′) (𝑟𝑆∪𝐴) · 𝑓2(𝑚′) (𝑟𝑆∪𝐵) = (𝜋𝑆∪𝐴 𝑓 ′) (𝑚′) (𝑟𝑆∪𝐴) · (𝜋𝑆∪𝐵 𝑓 ′(𝑚′) (𝑟𝑆∪𝐵) = 𝜇′1(𝑟 𝑆∪𝐴) · 𝜇′2(𝑟 𝑆∪𝐵) Thus, 𝑓1 ⊕ 𝑓2 = 𝜋𝑆∪𝐴∪𝐵 𝑓 ′, which implies that 𝑓1 ⊕ 𝑓2 ⊑ 𝑓 . By their types, 𝑓1 ⊕ 𝑓2 |= (𝑆 ⊲ 𝐴) ∗ (𝑆 ⊲ 𝐵). 
By persistence, 𝑓 |= (𝑆 ⊲ 𝐴) ∗ (𝑆 ⊲ 𝐵). UNIONRAN Obvious from the semantics of atomic proposition and the range logic. ATOMSEQ Given any 𝑓 |= (𝐴 ⊲ 𝐵) # (𝐵 ⊲ 𝐶), by satisfaction rules and semantic of atomic propositions, there exists • 𝑓1, 𝑓2 such that 𝑓1 ⊙ 𝑓2 = 𝑓 ; • 𝑓 ′1 ⊑ 𝑓1 such that for any 𝑚 ∈ 𝑀𝑑 such that 𝑚 |=𝑑 𝐴, 𝑓 ′1 (𝑚) |=𝑟 [𝐵]. • 𝑓 ′2 ⊑ 𝑓2 such that for any 𝑚 ∈ 𝑀𝑑 such that 𝑚 |=𝑑 𝐵, 𝑓 ′2 (𝑚) |=𝑟 D[𝐶]. Note that 𝑓 ′1 (𝑚) |=𝑟 [𝐵] implies that 𝐵 ⊆ range( 𝑓 ′1), so 𝜋𝐵 𝑓 ′1 is defined. Let 𝑓 ′′1 = 𝜋𝐵 𝑓 ′ 1. Note that for any 𝑚 ∈ 𝑀𝑑 such that 𝑚 |=𝑑 𝐴, 𝑓 ′′1 (𝑚) |=𝑟 [𝐵] too, so 𝑓 ′′ |= (𝐴 ⊲ 𝐵) too. Also, by transitivity, 𝑓 ′′1 ⊑ 𝑓 ′1 ⊑ 𝑓1. 306 Say 𝑓1 = ( 𝑓 ′′1 ⊕ 𝜂𝑆1) ⊙ 𝑣1, 𝑓2 = ( 𝑓 ′2 ⊕ 𝜂𝑆2) ⊙ 𝑣2, then since range( 𝑓 ′′1 ) = 𝐵 = dom( 𝑓 ′2), 𝑓1 ⊙ 𝑓2 = ( 𝑓 ′′1 ⊕ 𝜂𝑆1) ⊙ 𝑣1 ⊙ ( 𝑓 ′2 ⊕ 𝜂𝑆2) ⊙ 𝑣2 = ( 𝑓 ′′1 ⊕ 𝜂𝑆1) ⊙ ( 𝑓 ′2 ⊕ 𝑣1) ⊙ 𝑣2 (By lemma C.2.3 and dom( 𝑓 ′2) = 𝐵 = range( 𝑓 ′′1 ) ⊆ dom(𝑣1)) = ( 𝑓 ′′1 ⊕ 𝜂𝑆1) ⊙ ( 𝑓 ′2 ⊕ 𝜂dom(𝑣1)) ⊙ (𝑣1 ⊕ 𝜂range( 𝑓1)) ⊙ 𝑣2 (By lemma C.2.4) = ( 𝑓 ′′1 ⊕ 𝜂𝑆1) ⊙ ( 𝑓 ′2 ⊕ 𝜂𝑆) ⊙ (𝑣1 ⊕ 𝜂range( 𝑓1)) ⊙ 𝑣2 = (( 𝑓 ′′1 ⊙ 𝑓 ′ 2) ⊕ 𝜂𝑆1) ⊙ (𝑣1 ⊕ 𝜂range( 𝑓1)) ⊙ 𝑣2 So 𝑓 ′′1 ⊙ 𝑓 ′ 2 ⊑ 𝑓1 ⊙ 𝑓2 = 𝑓 . 𝑓 ′′1 : Mem[𝐴] → D(Mem[𝐵]), 𝑓 ′2 : Mem[𝐵] → D(Mem[range( 𝑓 ′2)])𝐴, so 𝑓 ′′1 ⊙ 𝑓 ′ 2 : Mem[𝐴] → D(Mem[range( 𝑓 ′2)]). Since range( 𝑓 ′2) ⊇ 𝐶, it follows that 𝑓 ′′1 ⊙ 𝑓 ′ 2 |= (𝐴 ⊲ 𝐶), and thus 𝑓 |= (𝐴 ⊲ 𝐶) by persistence. UNITL If 𝑓 |= (𝐴 ⊲ 𝐵), then there must exists 𝑓 ′ ⊑ 𝑓 such that for all 𝑚 ∈ 𝑀𝑑 such that 𝑚 |= 𝐴, 𝑓 ′(𝑚) |=𝑟 [𝐵]. Given any witness 𝑓 ′, 𝑓 ′ = unitMem[𝐴] ⊙ 𝑓 ′, and also 𝑓 ′ |=𝑟 (𝐴 ⊲ 𝐵). Note that unitMem[𝐴] |=𝑟 (𝐴 ⊲ 𝐴), so 𝑓 ′ = unitMem[𝐴] ⊙ 𝑓 ′ |= (𝐴 ⊲ 𝐴) # (𝐴 ⊲ 𝐵). UNITR Analogous as the UNITL case, except that now using the fact 𝑓 ′ = 𝑓 ′ ⊙ unitMem[𝐵] for any 𝑓 ′ : Mem[𝐴] → D(Mem[𝐵]). □ 307 C.4 CPSL Soundness Definition 4.3.6 (CPSL Validity). 
A CPSL judgment {𝑃} 𝑐 {𝑄} is valid, written |= {𝑃} 𝑐 {𝑄}, if for every input distribution 𝜇 ∈ D(Mem[Var]) such that the lifted input 𝑓𝜇 ≜ ⟨⟩ ↦→ 𝜇 satisfies 𝑓𝜇 |= 𝑃, the lifted output satisfies 𝑓⟦𝑐⟧𝜇 |= 𝑄. Now, we are ready to prove soundness of CPSL. Theorem 4.3.3 (CPSL Soundness). CPSL is sound: derivable judgments are valid. Proof. By induction on the derivation. Throughout, we write 𝜇 : Mem[Var] for the input distribution and 𝑓 : Mem[∅] → D(Mem[Var]) for the kernel obtained by lifting the input distribution, and we assume that 𝑓 satisfies the pre-condition of the conclusion. ASSN By restriction (theorem 4.3.1), there exists 𝑘1 ⊑ 𝑓 such that FV(𝑒) ⊆ FVR(𝑃) ⊆ range(𝑘1) ⊆ FV(𝑃). Since 𝑓 has empty domain, we have 𝑓 = 𝑘1 ⊙ 𝑘2 for some 𝑘2 : Mem[range(𝑘1)] → D(Mem[Var]). Let 𝑓 ′ = ⟨⟩ ↦→ ⟦𝑥 ← 𝑒⟧(𝜇) be the lifted output. By the semantics of the program and associativity, we have: 𝑓 ′ = ⟨⟩ ↦→ bind(𝜇, 𝑚 ↦→ unit(𝑚 [𝑥 ↦→ ⟦𝑒⟧(𝑚)])) = 𝜋Var\{𝑥} 𝑓 ⊙ (𝑚1 ↦→ unit(𝑚1 ∪ (𝑥 ↦→ ⟦𝑒⟧(𝑚1)))︸ ︷︷ ︸ 𝑔1 ⊕𝑚2 ↦→ unit(𝑚2)︸ ︷︷ ︸ 𝑔2 ) where 𝑚 : Mem[Var], 𝑚1 : Mem[FV(𝑒)], and 𝑚2 : Mem[(Var \ {𝑥}) \ FV(𝑒)]. The maps 𝑔1 and 𝑔2 evidently preserves input to output and are thus kernels. Also, because range(𝑘1) ⊆ FV(𝑃) ⊆ (Var \ {𝑥}) and 𝑘1 ⊑ 𝑓 , 308 we have 𝑘1 ⊑ 𝜋Var\{𝑥} 𝑓 ; in addition, since 𝑘1 |= 𝑃, we have 𝑔 |= 𝑃 by persistence. Since 𝑔1 ⊑ 𝑔1 ⊕ 𝑔2 and 𝑔1 |= (FV(𝑒) ⊲ 𝑥 = 𝑒), we have 𝑔1 ⊕ 𝑔2 |= (FV(𝑒) ⊲ 𝑥 = 𝑒) as well. Thus, we conclude 𝑓 ′ |= 𝑃 # (FV(𝑒) ⊲ 𝑥 = 𝑒). SAMP By restriction (theorem 4.3.1), there exists 𝑘1 ⊑ 𝑓 such that FVR(𝑃) ⊆ range(𝑘1) ⊆ FV(𝑃); let 𝐾 = range(𝑘1). Since 𝑓 has empty domain, we have 𝑓 = 𝑘1 ⊙ 𝑘2 for some 𝑘2 : Mem[𝐾] → D(Mem[Var]). Let 𝑓 ′ = ∅ ↦→ ⟦𝑥 $← 𝑑⟧(𝜇) be the lifted output. We have: 𝑓 ′ = ⟨⟩ ↦→ bind(𝜇, 𝑚 ↦→ bind(⟦𝑑⟧, 𝑣 ↦→ unit(𝑚 [𝑥 ↦→ 𝑣]))) = 𝑓 ⊙ (𝑚 ↦→ bind(⟦𝑑⟧, 𝑣 ↦→ unit(𝑚 [𝑥 ↦→ 𝑣]))) = 𝜋Var\{𝑥} 𝑓 ⊙ ((𝑚1 ↦→ unit(𝑚1)) ⊕ bind(⟦𝑑⟧, 𝑣 ↦→ unit(𝑥 ↦→ 𝑣))) where 𝑚 : Mem[Var], 𝑚1 : Mem[FV(𝑑)], and 𝑚2 : Mem[(Var \ FV(𝑑)) \ {𝑥}]. 
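As an aside, the lifted outputs computed in the ASSN and SAMP cases are plain unit/bind calculations on finite distributions, and can be replayed concretely. The sketch below uses our own ad-hoc encoding (distributions as dicts from outcomes to probabilities, memories as sorted tuples of variable–value pairs); the fair coin standing in for 𝑑 is an assumed example distribution:

```python
# Distributions are dicts from outcomes to probabilities; memories are
# sorted tuples of (variable, value) pairs. The encoding is ours, and the
# fair coin standing in for d is an assumed example distribution.

def unit(x):
    """Dirac distribution on x."""
    return {x: 1.0}

def bind(dist, k):
    """Monadic bind: sample a from dist, then sample from k(a)."""
    out = {}
    for a, p in dist.items():
        for b, q in k(a).items():
            out[b] = out.get(b, 0.0) + p * q
    return out

def upd(mem, x, v):
    """m[x -> v]: functional update of a memory."""
    d = dict(mem)
    d[x] = v
    return tuple(sorted(d.items()))

mu = unit((('y', 0),))  # input distribution: y = 0 with probability 1

# ASSN: lifted semantics of the assignment x <- y + 1.
assign = bind(mu, lambda m: unit(upd(m, 'x', dict(m)['y'] + 1)))

# SAMP: lifted semantics of the sampling x $<- coin, for a fair coin.
coin = {0: 0.5, 1: 0.5}
samp = bind(mu, lambda m: bind(coin, lambda v: unit(upd(m, 'x', v))))
```

Both results are distributions over extended memories, matching the shape of the lifted outputs 𝑓′ used in the proof.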
Again, 𝑔1 and 𝑔2 evidently preserves input to the output and thus are kernels . Because range(𝑘1) ⊆ Var\ {𝑥} and 𝑘1 ⊑ 𝑓 , we have 𝑘1 ⊑ 𝜋Var\{𝑥} 𝑓 . Because 𝑘1 ⊑ 𝜋Var\{𝑥} 𝑓 and 𝑘1 |= 𝑃, we have 𝜋Var\{𝑥} 𝑓 |= 𝑃 by persistence. Since 𝑔1 ⊑ 𝑔1 ⊕ 𝑔2 and 𝑔1 |= (∅ ⊲ 𝑥 $∼ 𝑑), we have 𝑔1 ⊕ 𝑔2 |= (∅ ⊲ 𝑥 $∼ 𝑑) as well. Thus, we conclude 𝑓 ′ |= 𝑃 # (∅ ⊲ 𝑥 $∼ 𝑑). SKIP Trivial. SEQ Trivial. COND At the high level, we proceed the proof in three steps: first, we show that for any 𝑓 satisfying (∅ ⊲ [𝑏]); 𝑃, there exists 𝑗1, 𝑗2 such that 𝑓 = 𝑗1 ⊙ 𝑗2 and range( 𝑗1) = {𝑏}; second, we describe exactly two kernels 𝑙tt and 𝑙ff such that 𝑓𝜇 |⟦𝑏=tt⟧ = 𝑙tt ⊙ 𝑗2 and 𝑓𝜇 |⟦𝑏=ff⟧ = 𝑙ff ⊙ 𝑗2; last, we compute 𝑓⟦if 𝑏 then 𝑐 else 𝑐′⟧𝜇 and show that it satisfies the post-condition. Since all assertions are in DIBI+, we have FVD(𝑃) ⊆ FVR(∅ ⊲ [𝑏]) = {𝑏}. Since 𝑓 |= (∅ ⊲ [𝑏]) # 𝑃, there exists 𝑘1, 𝑘2 such that 𝑘1 ⊙ 𝑘2 = 𝑓 , with 309 𝑘1 |= (∅ ⊲ [𝑏]) and 𝑘2 |= 𝑃. By restriction (theorem 4.3.1), there exists 𝑗1 such that 𝑗1 ⊑ 𝑘1 and dom( 𝑗1) ⊆ FVD(∅ ⊲ [𝑏]) = ∅ {𝑏} = FVR(∅ ⊲ [𝑏]) ⊆ range( 𝑗1) ⊆ FV(∅ ⊲ [𝑏]) = {𝑏}. By restriction (theorem 4.3.1), there exists 𝑗2 such that 𝑗2 ⊑ 𝑘2 and 𝑗2 |= 𝑃, and dom( 𝑗2) ⊆ FVD(𝑃) ⊆ FVR(∅ ⊲ [𝑏]) = {𝑏}. Since dom(𝑘2) = range(𝑘1) ⊇ {𝑏}, we may assume without loss of generality that 𝑗2 |= 𝑃, 𝑗2 ⊑ 𝑘2, and dom( 𝑗2) = {𝑏}. Thus 𝑗1 ⊙ 𝑗2 is defined, and so 𝑗1 ⊙ 𝑗2 ⊑ 𝑘1 ⊙ 𝑘2 ⊑ 𝑓 by lemma C.2.5. By lemma C.2.1, there exists 𝑗 : Mem[range( 𝑗2)] → D(Mem[Var]) such that 𝑗1 ⊙ ( 𝑗2 ⊙ 𝑗) = ( 𝑗1 ⊙ 𝑗2) ⊙ 𝑗 = 𝑓 . Since 𝑗2 ⊑ 𝑗2 ⊙ 𝑗 , we have 𝑗2 ⊙ 𝑗 |= 𝑃. Thus, we may assume without loss of generality that range( 𝑗2) = Var and 𝑗1 ⊙ 𝑗2 = 𝑓 = ⟨⟩ ↦→ 𝜇. Let 𝑙tt, 𝑙ff : Mem[∅] → D(Mem[𝑏]) be defined by 𝑙tt (⟨⟩) = unit(𝑏 ↦→ tt) and 𝑙ff (⟨⟩) = unit(𝑏 = ff ); evidently, 𝑙tt |= (∅ ⊲ 𝑏 = tt) and 𝑙ff |= (∅ ⊲ 𝑏 = ff ). Now, we have: 𝑓𝜇 |⟦𝑏=tt⟧ = 𝑙tt ⊙ 𝑗2 𝑓𝜇 |⟦𝑏=ff⟧ = 𝑙ff ⊙ 𝑗2 where each equality holds if the left side is defined. 
Regardless of whether the conditional distributions are defined, we always have: 𝑙tt ⊙ 𝑗2 |= (∅ ⊲ 𝑏 = tt) # 𝑃 𝑙ff ⊙ 𝑗2 |= (∅ ⊲ 𝑏 = ff ) # 𝑃. Since both of these kernels have empty domain, we have 𝑙tt ⊙ 𝑗2 = 𝜈tt and 𝑙ff ⊙ 𝑗2 = 𝜈ff for two distributions 𝜈tt, 𝜈ff ∈ D(Mem[Var]). By induction, we 310 have: 𝑓⟦𝑐⟧𝜈tt |= (∅ ⊲ 𝑏 = tt) # (𝑏 : 𝑏 = tt ⊲ 𝑄1) 𝑓⟦𝑐⟧𝜈ff |= (∅ ⊲ 𝑏 = ff ) # (𝑏 : 𝑏 = ff ⊲ 𝑄2). By similar reasoning as for the pre-conditions, there exists 𝑘′1, 𝑘 ′ 2 : Mem[𝑏] → D(Mem[Var]) such that 𝑘′1 |= (𝑏 : 𝑏 = tt ⊲ 𝑄1) and 𝑘′2 |= (𝑏 : 𝑏 = ff ⊲ 𝑄2), and: 𝑓⟦𝑐⟧𝜈tt = 𝑙tt ⊙ 𝑘 ′ 1 𝑓⟦𝑐⟧𝜈ff = 𝑙ff ⊙ 𝑘 ′ 2. Let 𝑘′ : Mem[𝑏] → D(Mem[Var]) be the composite kernel defined as follows: 𝑘′( [𝑏 ↦→ 𝑣]) ≜  𝑘′1( [𝑏 ↦→ tt]) : 𝑣 = tt 𝑘′2( [𝑏 ↦→ ff ]) : 𝑣 = ff . By assumption, 𝑘′ |= ((𝑏 : 𝑏 = tt ⊲ 𝑄1) ∧ (𝑏 : 𝑏 = ff ⊲ 𝑄2)). Now, let 𝑝 ≜ 𝜇(⟦𝑏 = tt⟧) be the probability of taking the first branch. Then we can conclude: 𝑓⟦if 𝑏 then 𝑐 else 𝑐′⟧𝜇 = 𝑓⟦𝑐⟧(𝜇 |⟦𝑏=tt⟧)⊕𝑝⟦𝑐′⟧(𝜇 |⟦𝑏=tt⟧) = 𝑓⟦𝑐⟧𝜈tt⊕𝑝⟦𝑐⟧𝜈ff = 𝑓⟦𝑐⟧𝜈tt ⊕𝑝 𝑓⟦𝑐⟧𝜈ff = (𝑙tt ⊙ 𝑘′1) ⊕𝑝 (𝑙ff ⊙ 𝑘 ′ 2) = (𝑙tt ⊙ 𝑘′) ⊕𝑝 (𝑙ff ⊙ 𝑘′) = (𝑙tt ⊕𝑝 𝑙ff ) ⊙ 𝑘′ |= (∅ ⊲ [𝑏]) # ((𝑏 : 𝑏 = tt ⊲ 𝑄1) ∧ (𝑏 : 𝑏 = ff ⊲ 𝑄2)). Above, 𝑘1⊕𝑝𝑘2 lifts the convex combination operator ⊕𝑝 from distributions to kernels from Mem[∅]. We show the last equality in more detail. For any 311 𝑟 ∈ Mem[Var]: (𝑙tt ⊙ 𝑘′) ⊕𝑝 (𝑙ff ⊙ 𝑘′) (⟨⟩)(𝑟) = 𝑝 · (𝑙tt ⊙ 𝑘′) (⟨⟩)(𝑟) + (1 − 𝑝) · (𝑙ff ⊙ 𝑘′) (⟨⟩)(𝑟) = 𝑝 · (𝑙tt ⊙ 𝑘′) (⟨⟩)(𝑟) + (1 − 𝑝) · (𝑙ff ⊙ 𝑘′) (⟨⟩)(𝑟) = 𝑝 · 𝑙tt (⟨⟩)(𝑏 ↦→ tt) · 𝑘′(𝑏 ↦→ tt) (𝑟) + (1 − 𝑝) · 𝑙ff (⟨⟩)(𝑏 ↦→ ff ) · 𝑘′(𝑏 ↦→ ff ) (𝑟) = ((𝑙tt ⊕𝑝 𝑙ff ) ⊙ 𝑘′) (⟨⟩)(𝑟). where the penultimate equality holds because 𝑙tt and 𝑙ff are deterministic. WEAK Trivial. FRAME The proof for this case follows the argument for the frame rule in PSL, with a few minor changes. There exists 𝑘1, 𝑘2 such that 𝑘1 ⊕ 𝑘2 ⊑ 𝑓 , and 𝑘1 |= 𝑃 and 𝑘2 |= 𝑅; let 𝑆1 ≜ range(𝑘1). Also, by restriction, there exists 𝑘′2 ⊑ 𝑘2 such that 𝑘′2 |= 𝑅 and range(𝑘′2) ⊆ FV(𝑅); let 𝑆2 ≜ range(𝑘′2). 
Since 𝑘1 and 𝑘2 have empty domains, 𝑆1 and 𝑆2 must be disjoint. Let 𝑆3 = Var \ (𝑆2 ∪ 𝑆1). Since MV(𝑐) is disjoint from 𝑆2 by the first side-condition, we have WV(𝑐) ⊆ MV(𝑐) ⊆ 𝑆1 ∪ 𝑆3. Let 𝑓 ′ = 𝑓⟦𝑐⟧𝜇 be the lifted output. By induction, we have 𝑓 ′ |= 𝑄; by re- striction (theorem 4.3.1), there exists 𝑘′1 ⊑ 𝑓 ′ such that range(𝑘′1) ⊆ FV(𝑄) and 𝑘′1 |= 𝑄. By the last side condition, RV(𝑐) ⊆ FVR(𝑃) ⊆ 𝑆1. By soundness of RV and WV (lemma A.2.1), all variables in WV(𝑐) must be written before they are read and there is a function 𝐹 : Mem[𝑆1] → D(Mem[WV(𝑐) ∪ 𝑆1]) such that: 𝜋WV(𝑐)∪𝑆1⟦𝑐⟧𝜇 = bind(𝜇, 𝑚 ↦→ 𝐹 (𝑚𝑆1)). 312 Since 𝑆2 ⊆ FV(𝑅), variables in 𝑆2 are not in MV(𝑐) by the first side- condition, and 𝑆2 is disjoint from WV(𝑐) ∪ 𝑆1. By soundness of MV, we have: 𝜋WV(𝑐)∪𝑆1∪𝑆2⟦𝑐⟧𝜇 = bind(𝜋WV(𝑐)∪𝑆1∪𝑆2𝜇, 𝐹 ⊕ unitMem[WV(𝑐)∪𝑆2]). Since 𝑆1 and 𝑆2 are independent in 𝜇, we know that 𝑆1 ∪WV(𝑐) and 𝑆2 are independent in ⟦𝑐⟧𝜇. Hence: 𝑓𝜋𝑆1∪WV(𝑐)⟦𝑐⟧𝜇 ⊕ 𝑓𝜋𝑆2⟦𝑐⟧𝜇 ⊑ 𝑓 ′. By induction, 𝑓 ′ |= 𝑄. Furthermore, FV(𝑄) ⊆ FVR(𝑃) ∪ WV(𝑐) ⊆ 𝑆1 ∪WV(𝑐) by the second side-condition. By restriction (theorem 4.3.1), 𝑓𝜋𝑆1∪WV(𝑐)⟦𝑐⟧𝜇 |= 𝑄. Furthermore, 𝜋𝑆2⟦𝑐⟧𝜇 = 𝜋𝑆2𝜇, so 𝜋𝑆2⟦𝑐⟧𝜇 |= 𝑅 as well. Thus, 𝑓 ′ |= 𝑄 ∗ 𝑅 as desired. □ 313 APPENDIX D THE UNARY FRAGMENT BLUEBELL FOR REASONING ABOUT INDEPENDENCE AND CONDITIONAL INDEPENDENCE D.1 The Rules of BLUEBELL In fig. D.1 we summarize the notation we use for assertions over BLUEBELL’s model. Recall that BLUEBELL’s assertions 𝑃 ∈ PA𝐼 ≜M𝐼 u−→ Prop are the upward- closed predicates over elements of the RAM𝐼 . ⌜𝜙⌝ ≜ λ . 𝜙 Own(𝑏) ≜ λ𝑎. 𝑏 ⪯ 𝑎 𝑃 ∧𝑄 ≜ λ𝑎. 𝑃(𝑎) ∧𝑄(𝑎) 𝑃 ∗𝑄 ≜ λ𝑎. ∃𝑏1, 𝑏2. (𝑏1 · 𝑏2) ⪯ 𝑎 ∧ 𝑃(𝑏1) ∧𝑄(𝑏2) ∃𝑥 : 𝑋. 𝐾 ≜ λ𝑎. ∃𝑥 : 𝑋. 𝐾 (𝑥) (𝑎) (𝐾 : 𝑋 → PA𝐼 ) ∀𝑥 : 𝑋. 𝐾 ≜ λ𝑎.∀𝑥 : 𝑋. 𝐾 (𝑥) (𝑎) (𝐾 : 𝑋 → PA𝐼 ) Own(F , 𝜇) ≜ ∃𝑝.Own(F , 𝜇, 𝑝) 𝐸 $∼ 𝜇 ≜ ∃F , 𝜇.Own(F , 𝜇) ∗ ⌜𝐸 � (F (𝑖), 𝜇(𝑖)) ∧ 𝜇 = 𝜇(𝑖) ◦ 𝐸−1⌝ C𝜇 𝐾 ≜ λ𝑎. ∃F , 𝜇, 𝑝, 𝜅. (F , 𝜇, 𝑝) ⪯ 𝑎 ∧ ∀𝑖 ∈ 𝐼 . 𝜇(𝑖) = bind(𝜇, 𝜅(𝑖)) ∧ ∀𝑣 ∈ supp(𝜇).𝐾 (𝑣) (F , 𝜅(𝐼) (𝑣), 𝑝) (𝜇 :D(𝐴), 𝐾 : 𝐴→ PA𝐼 ) wp 𝑡 {𝑄} ≜ λ𝑎.∀𝜇0.∀𝑐. (𝑎 · 𝑐) ⪯ 𝜇0 ⇒ ∃𝑏. 
( (𝑏 · 𝑐) ⪯ ⟦𝑡⟧(𝜇0) ∧𝑄(𝑏) ) ⌈𝐸⌉ ≜ (𝐸 ∈ true) $∼ 𝛿True Own(𝐸) ≜ ∃𝜇. 𝐸 $∼ 𝜇 (𝑥:𝑞) ≜ ∃P, 𝑝.Own(P, 𝑝) ∗ ⌜𝑝(𝑖) (𝑥) = 𝑞⌝ 𝑃@𝑝 ≜ 𝑃 ∧ ∃P .Own(P, 𝑝) ⌊𝑅⌋ ≜ ∃𝜇 :D(Val𝑋). ⌜𝜇(𝑅) = 1⌝ ∗ C𝜇 𝑣. ⌈𝑥 = 𝑣(𝑥)⌉𝑥∈𝑋 (𝑅 ⊆ Val𝑋, 𝑋 ⊆ 𝐼 × Var) Figure D.1: The assertions used in BLUEBELL. Proposition D.1.1 (Upward-closure). All the assertions in fig. D.1 are upward- closed. Proof. Easy by inspection of the definitions. The definitions where upward- closedness is less obvious (e.g. joint conditioning) are made upward-closed by 314 construction by explicit use of the order ⪯ in the definition. □ Lemma D.1.2. For all 𝜇 :D(𝐴 × 𝐵), there exists a 𝜅 : 𝐴 → D(𝐵) such that 𝜇 = (𝜇 ◦ 𝜋−1 1 ) � 𝜅. Proof. Let 𝜇1 = 𝜇 ◦ 𝜋−1 1 . Then the result is immediate by letting 𝜅(𝑎) (𝑏) =  𝜇0 (𝑎,𝑏) 𝜇1 (𝑎) if𝜇1(𝑎) > 0 0 otherwise □ D.1.1 Program Semantics We assume each primitive operator 𝜑 ∈ {+,−, <, . . .} has an associated arity ar(𝜑) ∈ N, and is given semantics as some function ⟦𝜑⟧ : Valar(𝜑) → Val. Definition D.1.1. Expressions 𝑒 ∈ E are given semantics as a function ⟦𝑒⟧ : Mem[Var] → Val as standard: ⟦𝑣⟧(𝑠) ≜ 𝑣 ⟦𝑥⟧(𝑠) ≜ 𝑠(𝑥) ⟦𝜑(𝑒1, . . . , 𝑒ar(𝜑))⟧(𝑠) ≜ ⟦𝜑⟧(⟦𝑒1⟧, . . . , ⟦𝑒ar(𝜑)⟧) Definition D.1.2 (Term semantics). Given 𝑡 ∈ T we define its kernel semantics K⟦𝑡⟧ : Mem[Var] → D(ΣMem[Var]) as follows: K⟦skip⟧(𝑠) ≜ unit(𝑠) K⟦𝑥 := 𝑒⟧(𝑠) ≜ unit(𝑠[𝑥 ↦→ ⟦𝑒⟧(𝑠) ]) K⟦𝑥 𝑑⟧(𝑠) ≜ bind(⟦𝑑⟧(⟦𝑒1⟧(𝑠), . . . , ⟦𝑒𝑛⟧(𝑠)), 𝑣 ↦→ return(𝑠[𝑥 ↦→ 𝑣 ])) K⟦𝑡1; 𝑡2⟧(𝑠) ≜ bind(K⟦𝑡1⟧(𝑠), 𝑠′ ↦→ K⟦𝑡2⟧(𝑠′)) K⟦if 𝑒 then 𝑡1 else 𝑡2⟧(𝑠) ≜ if ⟦𝑒⟧(𝑠) ≠ 0 then K⟦𝑡1⟧(𝑠) else K⟦𝑡2⟧(𝑠) K⟦repeat 𝑒 𝑡⟧(𝑠) ≜ loop𝑡 (⟦𝑒⟧(𝑠), 𝑠) 315 where loop𝑡 simply iterates 𝑡: loop𝑡 (𝑛, 𝑠) ≜  unit(𝑠) 𝑛 ≤ 0 bind(loop𝑡 (𝑛 − 1, 𝑠), 𝑠′ ↦→ K⟦𝑡⟧(𝑠′)) Otherwise The semantics of a term is then defined as: ⟦𝑡⟧ : D(ΣMem[Var]) → D(ΣMem[Var]) ⟦𝑡⟧(𝜇) ≜ bind(𝜇, 𝑠 ↦→ K⟦𝑡⟧(𝑠)) Evaluation contexts E are defined by the following grammar: E F 𝑥 := 𝐸 | 𝑥 𝑑 | if 𝐸 then 𝑡1 else 𝑡2 | repeat 𝐸 𝑡 𝐸 F [ · ] | 𝜑( ®𝑒1, 𝐸, ®𝑒2) A simple property holds for evaluation contexts. 
Lemma D.1.3. K⟦E[𝑒]⟧(𝑠) = K⟦E[⟦𝑒⟧(𝑠)]⟧(𝑠).
Proof. Easy by induction on the structure of evaluation contexts. □
D.2 Measure Theory Lemmas
Notation In what follows, given 𝑛 ∈ N with 𝑛 > 1, we write [𝑛] to denote the set {1, . . . , 𝑛}. Moreover, for iterated summation we use the notation ∑𝑖∈𝐼 |Φ(𝑖) 𝑓(𝑖), where 𝐼 = {𝑖0, 𝑖1, . . .} is countable and Φ is a predicate on elements of 𝐼, to denote the sum 𝑓(𝑗0) + 𝑓(𝑗1) + . . . where 𝑗0, 𝑗1, . . . is the sublist of 𝑖0, 𝑖1, . . . consisting of the elements that satisfy Φ. A similar convention is used for other commutative and associative operators, e.g. ∪. A countable partition of Ω is a partition of Ω, 𝑆 ⊆ 𝒫(Ω), with countably many sets. For uniformity, we represent countable partitions as 𝑆 = {𝐴𝑖}𝑖∈N, with the convention that when the partition has finitely many sets, say 𝑛, all the 𝐴𝑖 with 𝑖 ≥ 𝑛 are empty.
As mentioned, BLUEBELL is only concerned with discrete distributions, i.e., distributions over a countable set of outcomes. The following lemma expresses the key property of 𝜎-algebras over countable outcomes that we exploit for proving the other results.
Lemma D.2.1. Let Ω be a countable set, and let F be an arbitrary 𝜎-algebra on Ω. Then there exists a countable partition 𝑆 of Ω such that F = 𝜎(𝑆).
Proof. For every element 𝑥 ∈ Ω, we identify the smallest event 𝐸𝑥 ∈ F such that 𝑥 ∈ 𝐸𝑥, and show that for 𝑥, 𝑧 ∈ Ω, either 𝐸𝑥 = 𝐸𝑧 or 𝐸𝑥 ∩ 𝐸𝑧 = ∅. Then the set 𝑆 = {𝐸𝑥 | 𝑥 ∈ Ω} is a partition of Ω, and any event 𝐸 ∈ F can be represented as ⋃𝑥∈𝐸 𝐸𝑥, which suffices to show that F = 𝜎(𝑆).
For every 𝑥, 𝑦, let 𝐴𝑥,𝑦 = Ω if every 𝐸 ∈ F contains either both of 𝑥, 𝑦 or neither of them; otherwise, let 𝐴𝑥,𝑦 be any fixed 𝐸 ∈ F such that 𝑥 ∈ 𝐸 and 𝑦 ∉ 𝐸.
Then we show that, for all 𝑥, 𝐸𝑥 = ∩𝑦∈Ω 𝐴𝑥,𝑦 is the smallest event in F such that 𝑥 ∈ 𝐸𝑥, as follows. If there exists 𝐸′𝑥 such that 𝑥 ∈ 𝐸′𝑥 and 𝐸′𝑥 ⊂ 𝐸𝑥, then 𝐸𝑥 \ 𝐸′𝑥 is not empty. Let 𝑦 be an element of 𝐸𝑥 \ 𝐸′𝑥; by the definition of 𝐴𝑥,𝑦 (instantiated with the separating event 𝐸′𝑥), we have 𝑦 ∉ 𝐴𝑥,𝑦.
Thus, 𝑦 ∉ ∩𝑦∈Ω 𝐴𝑥,𝑦 = 𝐸𝑥, which contradicts 𝑦 ∈ 𝐸𝑥 \ 𝐸′𝑥.
Next, for any 𝑥, 𝑧 ∈ Ω: since 𝐸𝑧 is the smallest event containing 𝑧 and 𝐸𝑧 \ 𝐸𝑥 ∈ F is a subset of 𝐸𝑧, the event 𝐸𝑧 \ 𝐸𝑥 is either equal to 𝐸𝑧 or does not contain 𝑧. If 𝐸𝑧 \ 𝐸𝑥 = 𝐸𝑧, then 𝐸𝑥 and 𝐸𝑧 are disjoint. If 𝑧 ∉ 𝐸𝑧 \ 𝐸𝑥, then it must be that 𝑧 ∈ 𝐸𝑥, which implies that there exists no 𝐸 ∈ F such that 𝑥 ∈ 𝐸 and 𝑧 ∉ 𝐸. Because F is closed under complement, there also exists no 𝐸 ∈ F such that 𝑥 ∉ 𝐸 and 𝑧 ∈ 𝐸. Therefore, we have 𝑥 ∈ ⋂𝑦∈Ω 𝐴𝑧,𝑦 = 𝐸𝑧 as well. Furthermore, because 𝐸𝑧 is the smallest event in F that contains 𝑧 and 𝐸𝑥 also contains 𝑧, we have 𝐸𝑧 ⊆ 𝐸𝑥; symmetrically, we have 𝐸𝑥 ⊆ 𝐸𝑧. Thus, 𝐸𝑥 = 𝐸𝑧.
Hence, the set 𝑆 = {𝐸𝑥 | 𝑥 ∈ Ω} is a countable partition of Ω. □
Lemma D.2.2. If 𝑆 = {𝐴𝑖}𝑖∈N is a partition of Ω, and F = 𝜎(𝑆), then every event 𝐸 ∈ F can be written as 𝐸 = ⊎𝑖∈𝐼 𝐴𝑖 for some 𝐼 ⊆ N. In other words, 𝜎(𝑆) = {⊎𝑖∈𝐼 𝐴𝑖 | 𝐼 ⊆ N}.
Proof. Because 𝜎-algebras are closed under countable union, for any 𝐼 ⊆ N, ⊎𝑖∈𝐼 𝐴𝑖 ∈ 𝜎(𝑆). Thus, 𝜎(𝑆) ⊇ {⊎𝑖∈𝐼 𝐴𝑖 | 𝐼 ⊆ N}. Also, {⊎𝑖∈𝐼 𝐴𝑖 | 𝐼 ⊆ N} is a 𝜎-algebra:
• Ω = ⊎𝑖∈N 𝐴𝑖.
• Given a countable sequence of events 𝐸1 = ⊎𝑖∈𝐼1 𝐴𝑖, 𝐸2 = ⊎𝑖∈𝐼2 𝐴𝑖, . . ., let 𝐼 = ⋃𝑗∈N 𝐼𝑗; then we have ⋃𝑗∈N 𝐸𝑗 = ⊎𝑖∈𝐼 𝐴𝑖.
• If 𝐸 = ⊎𝑖∈𝐼 𝐴𝑖, then the complement of 𝐸 is (Ω \ 𝐸) = ⊎𝑖∈(N\𝐼) 𝐴𝑖.
Then, {⊎𝑖∈𝐼 𝐴𝑖 | 𝐼 ⊆ N} is a 𝜎-algebra that contains 𝑆. Therefore, 𝜎(𝑆) = {⊎𝑖∈𝐼 𝐴𝑖 | 𝐼 ⊆ N}. □
Lemma D.2.3. Let Ω be a countable set. If 𝑆1 = {𝐴𝑖}𝑖∈N and 𝑆2 = {𝐵𝑗}𝑗∈N are both countable partitions of Ω, then 𝜎(𝑆1) ⊆ 𝜎(𝑆2) implies that for any 𝐵𝑗 ∈ 𝑆2 with 𝐵𝑗 ≠ ∅, we can find a unique 𝐴𝑖 ∈ 𝑆1 such that 𝐵𝑗 ⊆ 𝐴𝑖.
Proof. For any 𝐵𝑗 ∈ 𝑆2 with 𝐵𝑗 ≠ ∅, pick an arbitrary element 𝑠 ∈ 𝐵𝑗 and denote the unique element of 𝑆1 that contains 𝑠 as 𝐴𝑖. Because 𝐴𝑖 ∈ 𝑆1 and 𝑆1 ⊆ 𝜎(𝑆1) ⊆ 𝜎(𝑆2), we have 𝐴𝑖 ∈ 𝜎(𝑆2).
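As an aside, the atom construction 𝐸𝑥 of lemma D.2.1 and the characterization of lemma D.2.2 can be checked executably on a small finite example. The encoding below (events as frozensets, the 𝜎-algebra listed explicitly) is ours and purely illustrative:

```python
from itertools import combinations

# Omega = {1, 2, 3, 4}; F is the sigma-algebra generated by the partition
# {{1, 2}, {3}, {4}}: all unions of blocks, including the empty union.
omega = frozenset({1, 2, 3, 4})
blocks = [frozenset({1, 2}), frozenset({3}), frozenset({4})]
F = {frozenset().union(*comb)
     for r in range(len(blocks) + 1)
     for comb in combinations(blocks, r)}

def atom(x):
    """E_x: the smallest event of F containing x (intersection of all such)."""
    out = omega
    for E in F:
        if x in E:
            out = out & E
    return out

atoms = {atom(x) for x in omega}
print(sorted(sorted(a) for a in atoms))  # recovers the generating partition

# Lemma D.2.2: every event of F is the disjoint union of the atoms it contains.
assert all(E == frozenset().union(*(a for a in atoms if a <= E)) for E in F)
```

The recovered atoms are exactly the generating blocks {1, 2}, {3}, {4}, as the lemmas predict.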
Note that 𝑠 ∈ 𝐵 𝑗 and 𝐵 𝑗 is an element of the partition 𝑆2 that generates 𝜎(𝑆2), 𝐵 𝑗 must be the smallest event in 𝜎(𝑆2) that contains 𝑠. Because 𝑠 ∈ 𝐴𝑖 as well, 𝐵 𝑗 being the smallest event containing 𝑠 implies that 𝐵 𝑗 ⊆ 𝐴𝑖. □ Lemma D.2.4. Assume we are given a 𝜎-algebra F1 over a countable set Ω, measure 𝜇1 ∈ D(F1), a countable set 𝐴, a distribution 𝜇 ∈ Σ𝐴, and a function 𝜅1 : 𝐴→ D(F1) such that 𝜇1 = bind(𝜇, 𝜅1). Then, for any probability space (F2, 𝜇2) such that (F1, 𝜇1) ⊑ (F2, 𝜇2), there exists 𝜅2 such that 𝜇2 = bind(𝜇, 𝜅2), and that for any 𝑎 ∈ supp(𝜇), (F1, 𝜅1(𝑎)) ⊑ (F2, 𝜅2(𝑎)). Proof. By lemma D.2.1, F𝑖 = 𝜎(𝑆𝑖) for some countable partition 𝑆𝑖. Also, (F1, 𝜇1) ⊑ (F2, 𝜇2) implies that F1 ⊆ F2. So we have 𝜎(𝑆1) ⊆ 𝜎(𝑆2), which by lemma D.2.3 implies that for any 𝐵 ∈ 𝑆2 with 𝐵 ≠ ∅, we can find a unique 𝐴 ∈ 𝑆1 such that 𝐵 ⊆ 𝐴. Let 𝑓 be the mapping associating to any 𝐵 ≠ ∅ the corresponding 𝐴 = 𝑓 (𝐵), and 𝑓 (𝐵) = ∅ when 𝐵 = ∅. Then, we define 𝜅2 as follows: for any 𝑎 ∈ 𝐴, 𝐸 ∈ F2, there exists 𝑆 ⊆ 𝑆2 such that 𝐸 = ⊎ 𝐵∈𝑆 𝐵, then define 𝜅2(𝑎) (𝐸) = ∑︁ 𝐵∈𝑆 𝜅1(𝑎) ( 𝑓 (𝐵)) · ℎ(𝐵), where ℎ(𝐵) = 𝜇2(𝐵)/𝜇2( 𝑓 (𝐵)) if 𝜇2( 𝑓 (𝐵)) ≠ 0 and ℎ(𝐵) = 0 otherwise. 319 Then we calculate: bind(𝜇, 𝜅2) (𝐸) = ∑︁ 𝑎∈𝐴 𝜇(𝑎) · 𝜅2(𝐸) = ∑︁ 𝑎∈𝐴 𝜇(𝑎) · ∑︁ 𝐵∈𝑆 𝜅1(𝑎) ( 𝑓 (𝐵)) · ℎ(𝐵) = ∑︁ 𝐵∈𝑆 ∑︁ 𝑎∈𝐴 𝜇(𝑎) · 𝜅1(𝑎) ( 𝑓 (𝐵)) · ℎ(𝐵) = ∑︁ 𝐵∈𝑆 bind(𝜇, 𝜅1) ( 𝑓 (𝐵)) · ℎ(𝐵) = ∑︁ 𝐵∈𝑆 𝜇1( 𝑓 (𝐵)) · ℎ(𝐵) = ∑︁ 𝐵∈𝑆 |𝜇2 ( 𝑓 (𝐵))≠0 𝜇1( 𝑓 (𝐵)) · 𝜇2(𝐵) 𝜇2( 𝑓 (𝐵)) = ∑︁ 𝐵∈𝑆 |𝜇2 ( 𝑓 (𝐵))≠0 𝜇2( 𝑓 (𝐵)) · 𝜇2(𝐵) 𝜇2( 𝑓 (𝐵)) (𝜇1(𝐸′) = 𝜇2(𝐸′) for any 𝐸′ ∈ F1) = ∑︁ 𝐵∈𝑆 |𝜇2 ( 𝑓 (𝐵))≠0 𝜇2(𝐵) = ∑︁ 𝐵∈𝑆 |𝜇2 ( 𝑓 (𝐵))≠0 𝜇2(𝐵) + ∑︁ 𝐵∈𝑆 |𝜇2 ( 𝑓 (𝐵))=0 𝜇2(𝐵) (Because 𝜇2( 𝑓 (𝐵)) = 0 implies 𝜇2(𝐵) = 0) = ∑︁ 𝐵∈𝑆 𝜇2(𝐵) = 𝜇2( ⊎ 𝐵∈𝑆 𝐵) = 𝜇2(𝐸) Thus, bind(𝜇, 𝜅2) = 𝜇2. Also, for any 𝑎 ∈ 𝐴𝜇, for any 𝐸 ∈ F1, there exists 𝑆′ ⊆ 𝑆1 such that 𝐸 = 320 ⊎ 𝐴∈𝑆′ 𝐴. 
𝜅2(𝑎) (𝐸) = 𝜅2(𝑎) (⊎ 𝐴∈𝑆′ 𝐴 ) = ∑︁ 𝐴∈𝑆′ 𝜅2(𝑎) (𝐴) = ∑︁ 𝐴∈𝑆′ ∑︁ 𝐵⊆𝐴|𝐵∈F2 𝜅2(𝑎) (𝐵) = ∑︁ 𝐴∈𝑆′ ∑︁ 𝐵⊆𝐴|𝐵∈F2,𝜇2 ( 𝑓 (𝐵))≠0 𝜅1(𝑎) ( 𝑓 (𝐵)) · 𝜇2(𝐵) 𝜇2( 𝑓 (𝐵)) = ∑︁ 𝐴∈𝑆′ |𝜇2 (𝐴)≠0 𝜅1(𝑎) (𝐴) · (∑ 𝐵⊆𝐴|𝐵∈F2 𝜇2(𝐵) ) 𝜇2(𝐴) = ∑︁ 𝐴∈𝑆′ |𝜇2 (𝐴)≠0 𝜅1(𝑎) (𝐴) · 𝜇2(𝐴) 𝜇2(𝐴) = ∑︁ 𝐴∈𝑆′ |𝜇2 (𝐴)≠0 𝜅1(𝑎) (𝐴) = ∑︁ 𝐴∈𝑆′ 𝜅1(𝑎) (𝐴) = 𝜅1(𝑎) (⊎ 𝐴∈𝑆′ 𝐴 ) = 𝜅1(𝑎) (𝐸) Thus, for any 𝑎, (𝜎1, 𝜅1(𝑎)) ⊑ (𝜎2, 𝜅2(𝑎)). □ Lemma D.2.5. Given two 𝜎-algebras F1 and F2 over two countable underlying sets Ω1,Ω2, then a general element in the product 𝜎-algebra F1 ⊗ F2 can be expressed as⊎ (𝑖, 𝑗)∈𝐼 (𝐴𝑖 × 𝐵 𝑗 ) for some 𝐼 ⊆ N2 and 𝐴𝑖 ∈ F1, 𝐵 𝑗 ∈ F2 for (𝑖, 𝑗) ∈ 𝐼. Proof. By lemma D.2.1, each 𝜎-algebra F𝑖 is generated by a countable partition over Ω𝑖. Let 𝑆1 = {𝐴𝑖}𝑖∈N be the countable partition that generates F1, 𝑆2 = {𝐵𝑖}𝑖∈N be the countable partition that generates F2. By lemma D.2.2, a general element in F1 can be written as ⊎ 𝑗∈𝐽 𝐴 𝑗 for some 𝐽 ⊆ N, and similarly, a general element in F2 can be written as ⊎ 𝑘∈𝐾 𝐵𝑘 for some 𝐾 ⊆ N. 321 Note that {𝐴 𝑗 × 𝐵𝑘 } 𝑗 ,𝑘∈N is a partition because: if (𝐴 𝑗 × 𝐵𝑘 ) ∩ (𝐴 𝑗 ′ × 𝐵𝑘 ′) ≠ ∅ for some 𝑗 ≠ 𝑗 ′ and 𝑘 ≠ 𝑘′, then it must 𝐴 𝑗 ∩ 𝐴 𝑗 ′ ≠ ∅ and 𝐵𝑘 ∩ 𝐵𝑘 ′ ≠ ∅, and that imply that 𝐴 𝑗 = 𝐴 𝑗 ′ and 𝐵 𝑗 = 𝐵 𝑗 ′ ; therefore, 𝐴 𝑗 × 𝐵𝑘 = 𝐴 𝑗 ′ × 𝐵𝑘 ′ . We next show that F1 ⊗ F2 is generated by the partition {𝐴 𝑗 × 𝐵𝑘 } 𝑗 ,𝑘∈N. F1 ⊗ F2 = 𝜎(F1 × F2) = 𝜎 ( {∗}⊎ 𝑗∈𝐽1 𝐴 𝑗 × ⊎ 𝑗∈𝐽2 𝐵 𝑗 |𝐽1, 𝐽2 ⊆ N ) = 𝜎 ( {∗}⊎ 𝑗∈𝐽1,𝑘∈𝐽2 (𝐴 𝑗 × 𝐵𝑘 ) |𝐽1, 𝐽2 ⊆ N ) = 𝜎 ( {∗}𝐴 𝑗 × 𝐵𝑘 | 𝑗 , 𝑘 ⊆ N ) Since each 𝐴 𝑗 ∈ 𝑆1 ⊆ F1 and 𝐵𝑘 ∈ 𝑆2 ⊆ F2 a general element in F1 ⊗ F2 can be expressed as {∗}⊎ 𝑗 ,𝑘⊆𝐼 (𝐴 𝑗 × 𝐵𝑘 ) | 𝐴 𝑗 ∈ F1, 𝐵𝑘 ∈ F2, 𝐼 ⊆ N2 according to lemma D.2.1. □ Lemma D.2.6. Given two probability spaces (F𝑎, 𝜇𝑎), (F𝑏, 𝜇𝑏) ∈ P(Ω), their indepen- dent product (F𝑎, 𝜇𝑎)⊛ (F𝑏, 𝜇𝑏) exists if 𝜇𝑎 (𝐸𝑎) · 𝜇𝑏 (𝐸𝑏) = 0 for any 𝐸𝑎 ∈ F𝑎, 𝐸𝑏 ∈ F𝑏 such that 𝐸𝑎 ∩ 𝐸𝑏 = ∅. Proof. 
We first define 𝜇 : {𝐸𝑎 ∩ 𝐸𝑏 | 𝐸𝑎 ∈ F𝑎, 𝐸𝑏 ∈ F𝑏} → [0, 1] by 𝜇(𝐸𝑎 ∩ 𝐸𝑏) = 𝜇𝑎 (𝐸𝑎) ·𝜇𝑏 (𝐸𝑏) for any 𝐸𝑎 ∈ F𝑎, 𝐸𝑏 ∈ F𝑏, and then show that 𝜇 could be extended to a probability measure on F𝑎 ⊕ F𝑏. • We first need to show that 𝜇 is well-defined. That is, 𝐸𝑎 ∩ 𝐸𝑏 = 𝐸′𝑎 ∩ 𝐸′𝑏 implies 𝜇𝑎 (𝐸𝑎) · 𝜇𝑏 (𝐸𝑏) = 𝜇𝑎 (𝐸′𝑎) · 𝜇𝑏 (𝐸′𝑏). When 𝐸𝑎 ∩ 𝐸𝑏 = 𝐸′𝑎 ∩ 𝐸′𝑏, it must 𝐸𝑎 ∩ 𝐸′𝑎 ⊇ 𝐸𝑎 ∩ 𝐸𝑏 = 𝐸′𝑎 ∩ 𝐸′𝑏, Thus, 𝐸𝑎 \ 𝐸′𝑎 ⊆ 𝐸𝑎 \ 𝐸𝑏, and then 𝐸𝑎 \ 𝐸′𝑎 is disjoint from 𝐸𝑏; symmetrically, 𝐸′𝑎 \ 𝐸𝑎 is disjoint from 𝐸′ 𝑏 . Since 𝐸𝑎, 𝐸′𝑎 are both in F𝑎, we have 𝐸𝑎 \ 𝐸′𝑎 322 and 𝐸′𝑎 \ 𝐸𝑎 both measurable in F𝑎. Their disjointness and the result above implies that 𝜇𝑎 (𝐸𝑎 \𝐸′𝑎) ·𝜇𝑏 (𝐸𝑏) = 0 and 𝜇𝑎 (𝐸′𝑎 \𝐸𝑎) ·𝜇𝑏 (𝐸′𝑏) = 0. Symmetric reasoning can also show that 𝐸′ 𝑏 \ 𝐸𝑏 is disjoint from 𝐸′𝑎 ∩ 𝐸𝑎, and 𝐸𝑏 \ 𝐸′𝑏 is disjoint from 𝐸′𝑎 ∩ 𝐸𝑎, which implies 𝜇𝑎 (𝐸𝑏 \ 𝐸′𝑏) · 𝜇𝑏 (𝐸 ′ 𝑎 ∩ 𝐸𝑎) = 0 and 𝜇𝑎 (𝐸′𝑏 \ 𝐸𝑏) · 𝜇𝑏 (𝐸 ′ 𝑎) = 0. Then there are four possibilities: – If 𝜇𝑏 (𝐸𝑏) = 0 and 𝜇𝑏 (𝐸′𝑏) = 0, then 𝜇𝑎 (𝐸𝑎)·𝜇𝑏 (𝐸𝑏) = 0 = 𝜇𝑎 (𝐸′𝑎)·𝜇𝑏 (𝐸′𝑏). – If 𝜇𝑎 (𝐸𝑎 \ 𝐸′𝑎) = 0 and 𝜇𝑏 (𝐸′𝑎 \ 𝐸𝑎) = 0. Then 𝜇𝑎 (𝐸𝑎) · 𝜇𝑏 (𝐸𝑏) = 𝜇𝑎 ((𝐸′𝑎 \ 𝐸𝑎) ⊎ (𝐸′𝑎 ∩ 𝐸𝑎)) · 𝜇𝑏 (𝐸𝑏) = (𝜇𝑎 (𝐸′𝑎 \ 𝐸𝑎) + 𝜇𝑎 (𝐸′𝑎 ∩ 𝐸𝑎)) · 𝜇𝑏 (𝐸𝑏) = 𝜇𝑎 (𝐸′𝑎 ∩ 𝐸𝑎) · 𝜇𝑏 (𝐸𝑏) = (𝜇𝑎 (𝐸𝑎 \ 𝐸′𝑎) + 𝜇𝑎 (𝐸′𝑎 ∩ 𝐸𝑎)) · 𝜇𝑏 (𝐸𝑏) = 𝜇𝑎 (𝐸′𝑎) · 𝜇𝑏 (𝐸𝑏) Thus, either 𝜇𝑎 (𝐸′𝑎 ∩ 𝐸𝑎) = 0, which implies that 𝜇𝑎 (𝐸𝑎) · 𝜇𝑏 (𝐸𝑏) = (0+0) · 𝜇𝑏 (𝐸𝑏) = 0 = (0+0) · 𝜇𝑏 (𝐸𝑏) = 𝜇𝑎 (𝐸′𝑎) · 𝜇𝑏 (𝐸′𝑏), or we have both 𝜇𝑏 (𝐸′𝑏 \𝐸𝑏) = 0 and 𝜇𝑏 (𝐸𝑏 \𝐸′𝑏) = 0, which imply that 𝜇𝑎 (𝐸𝑎) · 𝜇𝑏 (𝐸𝑏) = 𝜇𝑎 (𝐸′𝑎) · 𝜇𝑏 (𝐸𝑏) = 𝜇𝑎 (𝐸′𝑎) · 𝜇𝑏 ((𝐸𝑏 ∩ 𝐸′𝑏) ⊎ (𝐸𝑏 \ 𝐸 ′ 𝑏)) = 𝜇𝑎 (𝐸′𝑎) · (𝜇𝑏 (𝐸𝑏 ∩ 𝐸′𝑏) + 0) = 𝜇𝑎 (𝐸′𝑎) · (𝜇𝑏 (𝐸𝑏 ∩ 𝐸′𝑏) + 𝜇𝑏 (𝐸 ′ 𝑏 \ 𝐸𝑏)) = 𝜇𝑎 (𝐸′𝑎) · 𝜇𝑏 (𝐸′𝑏). 323 – If 𝜇𝑏 (𝐸′𝑏) = 0 and 𝜇𝑏 (𝐸𝑎 \ 𝐸′𝑎) = 0, then 𝜇𝑎 (𝐸𝑎) · 𝜇𝑏 (𝐸𝑏) = (𝜇𝑎 (𝐸𝑎 ∩ 𝐸′𝑎) + 𝜇𝑎 (𝐸𝑎 \ 𝐸′𝑎)) · (𝜇𝑏 (𝐸𝑏 ∩ 𝐸′𝑏) + 𝜇𝑏 (𝐸𝑏 \ 𝐸 ′ 𝑏)) = 𝜇𝑎 (𝐸𝑎 ∩ 𝐸′𝑎) · 𝜇𝑏 (𝐸𝑏 \ 𝐸′𝑏) Because 𝜇𝑎 (𝐸𝑏 \ 𝐸′𝑏) · 𝜇𝑏 (𝐸 ′ 𝑎 ∩ 𝐸𝑎) = 0 and 𝜇𝑎 (𝐸′𝑏 \ 𝐸𝑏) · 𝜇𝑏 (𝐸 ′ 𝑎) = 0. Thus, 𝜇𝑎 (𝐸𝑎) · 𝜇𝑏 (𝐸𝑏) = 0 = 𝜇𝑎 (𝐸′𝑎) · 𝜇𝑏 (𝐸′𝑏). 
– If 𝜇𝑏 (𝐸𝑏) = 0 and 𝜇𝑏 (𝐸′𝑎 \ 𝐸𝑎) = 0, then symmetric as above. In all these cases, 𝜇𝑎 (𝐸𝑎) · 𝜇𝑏 (𝐸𝑏) = 𝜇𝑎 (𝐸′𝑎) · 𝜇𝑏 (𝐸′𝑏) as desired. • Show that 𝜇 satisfy countable additivity in {𝐸𝑎 ∩ 𝐸𝑏 | 𝐸𝑎 ∈ F𝑎, 𝐸𝑏 ∈ F𝑏}. We start with showing that 𝜇 is finite-additive. Suppose 𝐸𝑛𝑎 ∩ 𝐸𝑛𝑏 =⊎ 𝑖∈[𝑛] (𝐴𝑖 ∩ 𝐵𝑖) where each 𝐴𝑖 ∈ F𝑎 and 𝐵𝑖 ∈ F𝑏. Fix any 𝐴𝑖 ∩ 𝐵𝑖, there is unique minimal 𝐴 ∈ F𝑎 containing 𝐴𝑖 ∩ 𝐵𝑖, because if 𝐴 ⊇ 𝐴𝑖 ∩ 𝐵𝑖 and 𝐴′ ⊇ 𝐴𝑖 ∩𝐵𝑖, then 𝐴∩ 𝐴′ ⊇ 𝐴𝑖 ∩𝐵𝑖 and 𝐴∩ 𝐴′ ∈ F𝐴 too, and 𝐴∩ 𝐴′ is smaller. Because we have shown that 𝜇 is well-defined, in the following proof, we can assume without loss of generality that 𝐴𝑖 is the smallest set in F𝑎 con- taining 𝐴𝑖 ∩ 𝐵𝑖. Similarly, we let 𝐵𝑖 to be the smallest set in F𝑏 containing 𝐴𝑖 ∩ 𝐵𝑖. Thus, 𝐸𝑛𝑎 ∩ 𝐸𝑛𝑏 = ⊎ 𝑖∈[𝑛] (𝐴𝑖 ∩ 𝐵𝑖) implies every 𝐴𝑖 is smaller than 𝐸𝑛𝑎 and every 𝐵𝑖 is smaller than 𝐸𝑛 𝑏 . Therefore, 𝐸𝑛𝑎 ⊇ ∪𝑖∈[𝑛]𝐴𝑖 and 𝐸𝑛 𝑏 ⊇ ∪𝑖∈[𝑛]𝐵𝑖, which implies that 𝐸𝑛𝑎 ∩ 𝐸𝑛𝑏 ⊇ (∪𝑖∈[𝑛]𝐴𝑖) ∩ (∪𝑖∈[𝑛]𝐵𝑖) ⊇ ∪𝑖∈[𝑛] (𝐴𝑖 ∩ 𝐵𝑖) = 𝐸 𝑛 𝑎 ∩ 𝐸𝑛𝑏 , which implies that the ⊇ in the inequalities all collapse to =. For any 𝐼 ⊆ [𝑛], define 𝛼𝐼 = ∩𝑖∈𝐼𝐴𝑖\(∪𝑖∈[𝑛]\𝐼𝐴𝑖), and 𝛽𝐼 = ∩𝑖∈𝐼𝐵𝑖\(∪𝑖∈[𝑛]\𝐼𝐵𝑖). For any 𝐼 ≠ 𝐼′, 𝛼𝐼∩𝛼𝐼′ = ∅. Thus, {𝛼𝐼}𝐼⊆[𝑛] is a set of disjoint sets in ∪𝑖∈[𝑛]𝐴𝑖, and similarly, {𝛽𝐼}𝐼⊆[𝑛] is a set of disjoint sets in ∪𝑖∈[𝑛]𝐵𝑖. Also, for any 324 𝑖 ∈ [𝑛], we have 𝐴𝑖 = ∪𝐼⊆[𝑛] |𝑖∈𝐼𝛼𝐼 and 𝐵𝑖 = ∪𝐼⊆[𝑛] |𝑖∈𝐼𝛽𝐼 . 
Furthermore, for any 𝐼, 𝛼𝐼 ∩ ∪𝑖∈[𝑛]𝐵𝑖 ⊆ (∪𝑖∈[𝑛]𝐴𝑖) ∩ (∪𝑖∈[𝑛]𝐵𝑖) = ⊎ 𝑖∈[𝑛] 𝐴𝑖 ∩ 𝐵𝑖, and thus, 𝛼𝐼 ∩ ∪𝑖∈[𝑛]𝐵𝑖 = ( ⊎ 𝑖∈[𝑛] 𝐴𝑖 ∩ 𝐵𝑖) ∩ (𝛼𝐼 ∩ ∪𝑖∈[𝑛]𝐵𝑖) = ⊎ 𝑖∈[𝑛] ( 𝐴𝑖 ∩ 𝐵𝑖 ∩ 𝛼𝐼 ∩ ∪ 𝑗∈[𝑛]𝐵 𝑗 ) = ⊎ 𝑖∈𝐼 ( 𝐴𝑖 ∩ 𝐵𝑖 ∩ 𝛼𝐼 ∩ ∪ 𝑗∈[𝑛]𝐵 𝑗 ) (𝐴𝑖 ∩ 𝛼𝐼 = ∅ if 𝑖 ∉ 𝐼) = ⊎ 𝑖∈𝐼 (𝐴𝑖 ∩ 𝐵𝑖 ∩ 𝛼𝐼) (𝐵𝑖 ∩ ∪ 𝑗∈[𝑛]𝐵 𝑗 = 𝐵𝑖 for any 𝑖) = ⊎ 𝑖∈𝐼 (𝐵𝑖 ∩ 𝛼𝐼) (𝐴𝑖 ∩ 𝛼𝐼 = 𝛼𝐼 for any 𝑖 ∈ 𝐼) = 𝛼𝐼 ∩ ∪𝑖∈𝐼𝐵𝑖 (D.1) 325 Now, 𝜇(𝐸𝑛𝑎 ∩ 𝐸𝑛𝑏) = 𝜇((∪𝑖∈[𝑛]𝐴𝑖) ∩ (∪𝑖∈[𝑛]𝐵𝑖)) = 𝜇((⊎𝐼⊆[𝑛] 𝛼𝐼) ∩ (∪𝑖∈[𝑛]𝐵𝑖)) (By definition of 𝛼𝐼) = 𝜇𝑎 ( ⊎ 𝐼⊆[𝑛] 𝛼𝐼) · 𝜇𝑏 (∪𝑖∈[𝑛]𝐵𝑖) (By definition of 𝜇) =  ∑︁ 𝐼⊆[𝑛] 𝜇𝑎 (𝛼𝐼)  · 𝜇𝑏 (∪𝑖∈[𝑛]𝐵𝑖) (By finite-additivity of 𝜇𝑎) = ∑︁ 𝐼⊆[𝑛] 𝜇𝑎 (𝛼𝐼) · 𝜇𝑏 (∪𝑖∈[𝑛]𝐵𝑖) = ∑︁ 𝐼⊆[𝑛] 𝜇(𝛼𝐼 ∩ (∪𝑖∈[𝑛]𝐵𝑖)) (By definition of 𝜇) = ∑︁ 𝐼⊆[𝑛] 𝜇(𝛼𝐼 ∩ (∪𝑖∈𝐼𝐵𝑖)) (By eq. (D.1)) = ∑︁ 𝐼⊆[𝑛] 𝜇𝑎 (𝛼𝐼) · 𝜇𝑏 (∪𝑖∈𝐼𝐵𝑖) (By definition of 𝜇) = ∑︁ 𝐼⊆[𝑛] 𝜇𝑎 (𝛼𝐼) · 𝜇𝑏 (∪𝑖∈𝐼 (⊎𝐼′⊆[𝑛] |𝑖∈𝐼′𝛽𝐼′)) (By definition of 𝛽𝐼) = ∑︁ 𝐼⊆[𝑛] 𝜇𝑎 (𝛼𝐼) · 𝜇𝑏 (⊎𝐼′⊆[𝑛] |𝐼∩𝐼′≠∅𝛽𝐼′) = ∑︁ 𝐼⊆[𝑛] 𝜇𝑎 (𝛼𝐼) · ∑︁ 𝐼′⊆[𝑛] |𝐼∩𝐼′≠∅ 𝜇𝑏 (𝛽𝐼′) = ∑︁ 𝐼⊆[𝑛] ∑︁ 𝐼′⊆[𝑛] |𝐼∩𝐼′≠∅ 𝜇𝑎 (𝛼𝐼) · 𝜇𝑏 (𝛽𝐼′) Meanwhile, for any 𝐼, 𝐼′, if |𝐼 ∩ 𝐼′| ≥ 2, then there exists some 𝑗 , 𝑘 such that 326 𝑗 ∈ 𝐼 ∩ 𝐼′ and 𝑘 ∈ 𝐼 ∩ 𝐼′, so 𝜇𝑎 (𝛼𝐼) · 𝜇𝑏 (𝛽𝐼′) = 𝜇𝑎 (∩𝑖∈𝐼𝐴𝑖 \ (∪𝑖∈[𝑛]\𝐼𝐴𝑖)) · 𝜇𝑏 (∩𝑖∈𝐼𝐵𝑖 \ (∪𝑖∈[𝑛]\𝐼𝐵𝑖)) ≤ 𝜇𝑎 (𝐴 𝑗 ∩ 𝐴𝑘 ) · 𝜇𝑏 (𝐵 𝑗 ∩ 𝐵𝑘 ) = 𝜇(𝐴 𝑗 ∩ 𝐴𝑘 ∩ 𝐵 𝑗 ∩ 𝐵𝑘 ) = 𝜇((𝐴 𝑗 ∩ 𝐵 𝑗 ) ∩ (𝐴𝑘 ∩ 𝐵𝑘 )) = 𝜇(∅) = 0. Thus, continuing our previous derivation, 𝜇(𝐸𝑛𝑎 ∩ 𝐸𝑛𝑏) = ∑︁ 𝐼⊆[𝑛] ∑︁ 𝐼′⊆[𝑛] |𝐼∩𝐼′≠∅ 𝜇𝑎 (𝛼𝐼) · 𝜇𝑏 (𝛽𝐼′) = ∑︁ 𝐼⊆[𝑛] ∑︁ 𝐼′⊆[𝑛] |1=|𝐼∩𝐼′ | 𝜇𝑎 (𝛼𝐼) · 𝜇𝑏 (𝛽𝐼′) (Because 𝜇𝑎 (𝛼𝐼) · 𝜇𝑏 (𝛽𝐼′) = 0 if |𝐼 ∩ 𝐼′| ≥ 2) = ∑︁ 𝑖∈[𝑛] ∑︁ 𝐼⊆[𝑛] |𝑖∈𝐼 ∑︁ 𝐼′⊆[𝑛] |𝐼∩𝐼′={𝑖} 𝜇𝑎 (𝛼𝐼) · 𝜇𝑏 (𝛽𝐼′) = ∑︁ 𝑖∈[𝑛] ∑︁ 𝐼⊆[𝑛] |𝑖∈𝐼 ∑︁ 𝐼′⊆[𝑛] |𝑖∈𝐼′ 𝜇𝑎 (𝛼𝐼) · 𝜇𝑏 (𝛽𝐼′) (Because 𝜇𝑎 (𝛼𝐼) · 𝜇𝑏 (𝛽𝐼′) = 0 if |𝐼 ∩ 𝐼′| ≥ 2) = ∑︁ 𝑖∈[𝑛]  ∑︁ 𝐼⊆[𝑛] |𝑖∈𝐼 𝜇𝑎 (𝛼𝐼) · ∑︁ 𝐼′⊆[𝑛] |𝑖∈𝐼′ 𝜇𝑏 (𝛽𝐼′)  = ∑︁ 𝑖∈[𝑛] 𝜇𝑎 (𝐴𝑖) · 𝜇𝑏 (𝐵𝑖) = ∑︁ 𝑖∈[𝑛] 𝜇(𝐴𝑖 ∩ 𝐵𝑖) Thus, we established the finite additivity. For countable additivity, sup- pose 𝐸𝑎 ∩ 𝐸𝑏 = ⊎ 𝑖∈N(𝐴𝑖 ∩ 𝐵𝑖). 
By the same reason as above, we also have 𝐸𝑎 ∩ 𝐸𝑏 = (∪𝑖∈N𝐴𝑖) ∩ (∪𝑖∈N𝐵𝑖) = ∪𝑖∈N(𝐴𝑖 ∩ 𝐵𝑖) = 𝐸𝑎 ∩ 𝐸𝑏 . 327 Then, 𝜇(𝐸𝑎 ∩ 𝐸𝑏) = 𝜇((∪𝑖∈N𝐴𝑖) ∩ (∪𝑖∈N𝐵𝑖)) = 𝜇𝑎 (∪𝑖∈N𝐴𝑖) · 𝜇𝑏 (∪𝑖∈N𝐵𝑖) = 𝜇𝑎 ( lim 𝑛→∞ ∪𝑖∈[𝑛]𝐴𝑖) · 𝜇𝑏 ( lim 𝑛→∞ ∪𝑖∈[𝑛]𝐵𝑖) = lim 𝑛→∞ 𝜇𝑎 (∪𝑖∈[𝑛]𝐴𝑖) · lim 𝑛→∞ 𝜇𝑏 (∪𝑖∈[𝑛]𝐵𝑖) (By continuity of 𝜇𝑎 and 𝜇𝑏) = lim 𝑛→∞ 𝜇𝑎 (∪𝑖∈[𝑛]𝐴𝑖) · 𝜇𝑏 (∪𝑖∈[𝑛]𝐵𝑖) (†) = lim 𝑛→∞ ∑︁ 𝑖∈[𝑛] 𝜇𝑏 (𝐵𝑖) · 𝜇𝑎 (𝐴𝑖) (By eq. (D.1)) = ∑︁ 𝑖∈N 𝜇𝑏 (𝐵𝑖) · 𝜇𝑎 (𝐴𝑖), (D.2) where (†) holds because that the product of limits equals to the limit of the product when both lim𝑛→∞ 𝜇𝑎 (∪𝑖∈[𝑛]𝐴𝑖) and lim𝑛→∞ 𝜇𝑏 (∪𝑖∈[𝑛]𝐵𝑖) are finite. Thus, we proved countable additivity as well. • Next we show that we can extend 𝜇 to a measure on F𝑎 ⊕ F𝑏. So far, we proved that 𝜇 is a sub-additive measure on the {𝐸𝑎 ∩ 𝐸𝑏 |𝐸𝑎 ∈ F𝑎, 𝐸𝑏 ∈ F𝑏}, which forms a 𝜋-system. By a known theorem in probability theory (e.g. [Rosenthal, 2006, Corollary 2.5.4]), we can extend a sub-additive mea- sure on a 𝜋-system to the 𝜎-algebra it generates if the 𝜋-system is a semi- algebra. Thus, we can extend 𝜇 to a measure on 𝜎({𝐸𝑎∩𝐸𝑏 | 𝐸𝑎 ∈ F𝑎, 𝐸𝑏 ∈ F𝑏}) if we can prove 𝐽 = {𝐸𝑎 ∩ 𝐸𝑏 | 𝐸𝑎 ∈ F𝑎, 𝐸𝑏 ∈ F𝑏} is a semi-algebra. – 𝐽 contains ∅ and Ω: trivial. – 𝐽 is closed under finite intersection: (𝐸𝑎∩𝐸𝑏) ∩ (𝐸′𝑎∩𝐸′𝑏) = (𝐸𝑎∩𝐸 ′ 𝑎) ∩ (𝐸𝑏 ∩ 𝐸′𝑏), where 𝐸𝑎 ∩ 𝐸′𝑎 ∈ F𝑎, and 𝐸𝑏 ∩ 𝐸′𝑏 ∈ F𝑏. 328 – The complement of any element of 𝐽 is equal to a finite disjoint union of elements of 𝐽: (𝐸𝑎 ∩ 𝐸𝑏)𝐶 = 𝐸𝐶𝑎 ∪ 𝐸𝐶𝑏 = (𝐸𝐶𝑎 ∩Ω) ⊎ (𝐸𝑎 ∩ 𝐸𝐶𝑏 ) where 𝐸𝐶𝑎 , 𝐸𝑎 ∈ F𝑎, and 𝐸𝐶 𝑏 ,Ω ∈ F𝑏. As shown in Li et al. [2023a], 𝜎({𝐸𝑎 ∩ 𝐸𝑏 | 𝐸𝑎 ∈ F𝑎, 𝐸𝑏 ∈ F𝑏}) = F𝑎 ⊕ F𝑏 (D.3) Thus, the extension of 𝜇 is a measure on F𝑎 ⊕ F𝑏. • Last, we show that 𝜇 is a probability measure on F𝑎 ⊕ F𝑏: 𝜇(Ω) = 𝜇𝑎 (Ω) · 𝜇𝑏 (Ω) = 1. □ Lemma D.2.7. Consider two probability spaces (F1, 𝜇1), (F2, 𝜇2) ∈ P(Ω), and some other probability space (Σ𝐴, 𝜇) and kernel 𝜅 such that 𝜇1 = bind(𝜇, 𝜅). 
Then, the independent product (F1, 𝜇1) ⊛ (F2, 𝜇2) exists if and only if for any 𝑎 ∈ supp(𝜇), the independent product (F1, 𝜅(𝑎)) ⊛ (F2, 𝜇2) exists. When they both exist,

(F1, 𝜇1) ⊛ (F2, 𝜇2) = (F1 ⊕ F2, bind(𝜇, λ𝑎. 𝜅(𝑎) ⊛ 𝜇2))

Proof. We first show the backwards direction. By lemma D.2.6, for any 𝑎 ∈ supp(𝜇), to show that the independent product (F1, 𝜅(𝑎)) ⊛ (F2, 𝜇2) exists, it suffices to show that for any 𝐸1 ∈ F1 and 𝐸2 ∈ F2 such that 𝐸1 ∩ 𝐸2 = ∅, we have 𝜅(𝑎)(𝐸1) · 𝜇2(𝐸2) = 0. Fix any such 𝐸1, 𝐸2. Because (F1, 𝜇1) ⊛ (F2, 𝜇2) is defined, we have 𝜇1(𝐸1) · 𝜇2(𝐸2) = 0, so either 𝜇1(𝐸1) = 0 or 𝜇2(𝐸2) = 0.

• If 𝜇1(𝐸1) = 0: Recall that

𝜇1(𝐸1) = bind(𝜇, 𝜅)(𝐸1) = ∑_{𝑎∈𝐴} 𝜇(𝑎) · 𝜅(𝑎)(𝐸1) = ∑_{𝑎∈supp(𝜇)} 𝜇(𝑎) · 𝜅(𝑎)(𝐸1)

Because 𝜇(𝑎) > 0 and 𝜅(𝑎)(𝐸1) ≥ 0 for all 𝑎 ∈ supp(𝜇), the equation ∑_{𝑎∈supp(𝜇)} 𝜇(𝑎) · 𝜅(𝑎)(𝐸1) = 0 implies that 𝜇(𝑎) · 𝜅(𝑎)(𝐸1) = 0 for all 𝑎 ∈ supp(𝜇). Thus, for all 𝑎 ∈ supp(𝜇), it must be that 𝜅(𝑎)(𝐸1) = 0. Therefore, 𝜅(𝑎)(𝐸1) · 𝜇2(𝐸2) = 0 for all 𝑎 ∈ supp(𝜇) with this 𝐸1, 𝐸2.

• If 𝜇2(𝐸2) = 0, then it is also clear that 𝜅(𝑎)(𝐸1) · 𝜇2(𝐸2) = 0 for all 𝑎 ∈ supp(𝜇).

Thus, we have 𝜅(𝑎)(𝐸1) · 𝜇2(𝐸2) = 0 for any 𝐸1 ∩ 𝐸2 = ∅ and 𝑎 ∈ supp(𝜇). By lemma D.2.6, the independent product (F1, 𝜅(𝑎)) ⊛ (F2, 𝜇2) exists.

For the forward direction: for any 𝐸1 ∈ F1 and 𝐸2 ∈ F2 such that 𝐸1 ∩ 𝐸2 = ∅, the existence of the independent product (F1, 𝜅(𝑎)) ⊛ (F2, 𝜇2) implies that 𝜅(𝑎)(𝐸1) · 𝜇2(𝐸2) = 0. Thus,

𝜇1(𝐸1) · 𝜇2(𝐸2) = bind(𝜇, 𝜅)(𝐸1) · 𝜇2(𝐸2)
= (∑_{𝑎∈𝐴} 𝜇(𝑎) · 𝜅(𝑎)(𝐸1)) · 𝜇2(𝐸2)
= ∑_{𝑎∈supp(𝜇)} 𝜇(𝑎) · (𝜅(𝑎)(𝐸1) · 𝜇2(𝐸2))
= ∑_{𝑎∈supp(𝜇)} 𝜇(𝑎) · 0 = 0

Thus, by lemma D.2.6, the independent product (F1, 𝜇1) ⊛ (F2, 𝜇2) exists. For any 𝐸1 ∈ F1 and 𝐸2 ∈ F2,

bind(𝜇, λ𝑎. 𝜅(𝑎) ⊛ 𝜇2)(𝐸1 ∩ 𝐸2) = ∑_{𝑎∈supp(𝜇)} 𝜇(𝑎) · (𝜅(𝑎) ⊛ 𝜇2)(𝐸1 ∩ 𝐸2)
= ∑_{𝑎∈supp(𝜇)} 𝜇(𝑎) · 𝜅(𝑎)(𝐸1) · 𝜇2(𝐸2)
= (∑_{𝑎∈supp(𝜇)} 𝜇(𝑎) · 𝜅(𝑎)(𝐸1)) · 𝜇2(𝐸2)
= bind(𝜇, 𝜅)(𝐸1) · 𝜇2(𝐸2)
= 𝜇1(𝐸1) · 𝜇2(𝐸2)
= (𝜇1 ⊛ 𝜇2)(𝐸1 ∩ 𝐸2)

Thus, (F1, 𝜇1) ⊛ (F2, 𝜇2) = (F1 ⊕ F2, bind(𝜇, λ𝑎. 𝜅(𝑎) ⊛ 𝜇2)).
□ D.3 Construction of the BLUEBELL Model Lemma D.3.1. The structure PSp is an ordered unital resource algebra (RA) as defined in definition 5.3.1. Proof. We defined · and ⪯ the same way as in Li et al. [2023a], and they have proved that · is associative and commutative, and ⪯ is transitive and reflexive. We check the rest of conditions one by one. Condition 𝑎 · 𝑏 = 𝑏 · 𝑎 The independent product is proved to be commutative in Li et al. [2023a]. Condition (𝑎 · 𝑏) · 𝑐 = 𝑎 · (𝑏 · 𝑐) The independent product is proved to be asso- ciative in Li et al. [2023a]. 331 Condition 𝑎 ⪯ 𝑏 ⇒ 𝑏 ⪯ 𝑐 ⇒ 𝑎 ⪯ 𝑐 The order ⪯ is proved to be transitive in Li et al. [2023a]. Condition 𝑎 ⪯ 𝑎 The order ⪯ is proved to be reflexive in Li et al. [2023a]. Condition V(𝑎 · 𝑏) ⇒ V(𝑎) Pattern matching on 𝑎 · 𝑏, either there exists prob- ability spaces P1,P2 such that 𝑎 = P1, 𝑏 = P2 and P1 ⊛ P2 is defined, or 𝑎 · 𝑏 = . Case: 𝑎 · 𝑏 = Note that V(𝑎 · 𝑏) does not hold when 𝑎 · 𝑏 = , so we can eliminate this case by ex falso quodlibet. Case: 𝑎 · 𝑏 = P1 ⊛ P2 Then 𝑎 = P1, and thus V(𝑎). Condition V(𝜀) Clear because 𝜀 ≠ . Condition 𝑎 ⪯ 𝑏 ⇒ V(𝑏) ⇒ V(𝑎) Pattern matching on 𝑎 and 𝑏, either there ex- ists probability spaces P1,P2 such that 𝑎 = P1, 𝑏 = P2 and P1 ⊑ P2 is defined, or 𝑏 = . Case: 𝑏 = Then V(𝑏) does not hold, and we can eliminate this case by ex falso quodlibet. Case: 𝑎 = P1, 𝑏 = P2 and P1 ⊑ P2 We clearly have V(𝑎). Condition 𝜀 · 𝑎 = 𝑎 Pattern matching on 𝑎, either 𝑎 = or there exists some probability space P such that 𝑎 = P. Case: 𝑎 = Then 𝜀 · 𝑎 = = 𝑎. Case: 𝑎 = P Then 𝜀 · 𝑎 = 𝑎. Condition 𝑎 ⪯ 𝑏 ⇒ 𝑎 · 𝑐 ⪯ 𝑏 · 𝑐 Pattern matching on 𝑎 and 𝑏. If 𝑎 ⪯ 𝑏, then either 𝑏 = or there exists P,P′ such that 𝑎 = P and 𝑏 = P′. Case: 𝑏 = Then 𝑏 · 𝑐 = is the top element, and then 𝑎 · 𝑐 ⪯ 𝑏 · 𝑐. 332 Otherwise 𝑎 ⪯ 𝑏 iff P ⪯ P′, then either 𝑏 · 𝑐 = and 𝑎 · 𝑐 ⪯ 𝑏 · 𝑐 follows, or 𝑏 · 𝑐 = P′ ⊛ P′′ for some probability space 𝑐 = P′′. Then P ⪯ P′ implies that P ·P′′ is also defined and P ·P′ ⪯ P ·P′′. Thus, 𝑎 ·𝑐 ⪯ 𝑏 ·𝑐 too. 
□ Lemma D.3.2 (RA composition preserves compatibility). F1 # 𝑝1 ⇒ F2 # 𝑝2 ⇒ (F1 ⊕ F2) # (𝑝1 · 𝑝2) Proof. Let 𝑆1 = {𝑥 ∈ Var | 𝑝1(𝑥) = 0}, 𝑆2 = {𝑥 ∈ Var | 𝑝2(𝑥) = 0}. If F1 # 𝑝1, then there exists P′1 ∈ P((Var \ 𝑆1) → Val) such that P1 = P′1 ⊗ 𝟙𝑆1→Val In addition, if F2 # 𝑝2, then there exists P′2 ∈ P((Var \ 𝑆2) → Val) such that P2 = P′2 ⊗ 𝟙𝑆2→Val. Then, P1 · P2 = P1 ⊛ P2 = (P′1 ⊗ 𝟙𝑆1→Val) ⊛ (P′2 ⊗ 𝟙𝑆2→Val) Say (F ′1 , 𝜇 ′ 1) = P ′ 1, and (F ′2 , 𝜇 ′ 2) = P ′ 2. Then the sigma algebra of P1 · P2 is 𝜎( { (𝐸1 × 𝑆1 → Val) ∩ (𝐸2 × 𝑆2 → Val) | 𝐸1 ∈ F ′1 , 𝐸2 ∈ F ′2 } ) =𝜎( { ((𝐸1 × (𝑆1 \ 𝑆2) → Val) ∩ (𝐸2 × (𝑆2 \ 𝐸1) → Val)) × (𝑆1 ∩ 𝑆2) | 𝐸1 ∈ F ′1 , 𝐸2 ∈ F ′2 } ) Then, there exists P′′ ∈ P((Var \ (𝑆1 ∩ 𝑆2)) → Val) such that P1 · P2 = P′′ ⊗ 𝟙(𝑆1∩𝑆2)→Val). Also, {𝑥 ∈ Var | (𝑝1 · 𝑝2) (𝑥) = 0} ={𝑥 ∈ Var | 𝑝1(𝑥) + 𝑝2(𝑥) = 0} ={𝑥 ∈ Var | 𝑝1(𝑥) = 0 and 𝑝2(𝑥) = 0} =𝑆1 ∩ 𝑆2 Therefore, F1 ⊕ F2 is compatible with 𝑝1 · 𝑝2 □ 333 Lemma D.3.3. The structure (Perm, ⪯,V, ·, 𝜀) is an ordered unital resource algebra (RA) as defined in definition 5.3.1. Proof. We check the conditions one by one. Condition 𝑎 · 𝑏 = 𝑏 · 𝑎 Follows from the commutativity of addition. Condition (𝑎 · 𝑏) · 𝑐 = 𝑎 · (𝑏 · 𝑐) Follows from the associativity of addition. Condition 𝑎 ⪯ 𝑏 ⇒ 𝑏 ⪯ 𝑐 ⇒ 𝑎 ⪯ 𝑐 ⪯ is a point-wise lifting of the order ≤ on arithmetics, so it follows from the transitivity of ≤. Condition 𝑎 ⪯ 𝑎 ⪯ is a point-wise lifting of the order ≤ on arithmetics, so it follows from the reflexivity of ≤. Condition V(𝑎 · 𝑏) ⇒ V(𝑎) By definition, V(𝑎 · 𝑏) ⇒ ∀𝑥 ∈ Var, (𝑎 · 𝑏) (𝑥) ≤ 1 ⇒ ∀𝑥 ∈ Var, 𝑎(𝑥) + 𝑏(𝑥) ≤ 1 ⇒ ∀𝑥 ∈ Var, 𝑎(𝑥) ≤ 1 ⇒ V(𝑎) Condition V(𝜀) Note that 𝜀 = λ . 0 satisfies that ∀𝑥 ∈ Var, 𝜀(𝑥) ≤ 1, so V(𝜀). Condition 𝑎 ⪯ 𝑏 ⇒ V(𝑏) ⇒ V(𝑎) By definition, 𝑎 ⪯ 𝑏 means ∀𝑥 ∈ Var.𝑎(𝑥) ≤ 𝑏(𝑥), and V(𝑏) means that ∀𝑥 ∈ Var.𝑏(𝑥) ≤ 1. Thus, 𝑎 ⪯ 𝑏 and V(𝑏) implies that ∀𝑥 ∈ Var.𝑎(𝑥) ≤ 𝑏(𝑥) ≤ 1, which implies V(𝑎). Condition 𝜀 · 𝑎 = 𝑎 By definition, 𝜀 · 𝑎 = λ𝑥. (λ . 0) (𝑥) + 𝑎(𝑥) = λ𝑥. 0 + 𝑎(𝑥) = 𝑎. 
Condition 𝑎 ⪯ 𝑏 ⇒ 𝑎 · 𝑐 ⪯ 𝑏 · 𝑐 By definition,

𝑎 ⪯ 𝑏 ⇔ ∀𝑥 ∈ Var. 𝑎(𝑥) ≤ 𝑏(𝑥)
⇒ ∀𝑥 ∈ Var. 𝑎(𝑥) + 𝑐(𝑥) ≤ 𝑏(𝑥) + 𝑐(𝑥)
⇒ 𝑎 · 𝑐 ⪯ 𝑏 · 𝑐 □

Lemma D.3.4. The structure PSpPm is an ordered unital resource algebra (RA) as defined in definition 5.3.1.

Proof. We want to check that PSpPm satisfies all the requirements of an ordered unital resource algebra (RA). Because PSpPm is essentially a product of PSp and Perm, the proof below closely follows the standard proof that a product of RAs is an RA. First, lemma D.3.2 implies that · is well-defined. Then we need to check that all the RA axioms are satisfied. Fix any 𝑎, 𝑏 ∈ PSpPm and any P1, 𝑝1, P2, 𝑝2 such that 𝑎 = (P1, 𝑝1) and 𝑏 = (P2, 𝑝2). We check the conditions one by one.

Condition V(𝑎 · 𝑏) ⇒ V(𝑎) By definition, 𝑎 · 𝑏 = (P1, 𝑝1) · (P2, 𝑝2) = (P1 · P2, 𝑝1 · 𝑝2). And V(P1 · P2, 𝑝1 · 𝑝2) implies V(P1 · P2) and V(𝑝1 · 𝑝2). Because PSp and Perm are both RAs, we have V(P1) and V(𝑝1). Thus, V(P1, 𝑝1).

Condition V(𝜀) Clear because 𝜀 = (𝟙Mem[Var], λ𝑥. 0), where 𝟙Mem[Var] is not the invalid element and ∀𝑥. (λ𝑥. 0)(𝑥) ≤ 1.

Condition 𝑎 ⪯ 𝑏 ⇒ V(𝑏) ⇒ V(𝑎) 𝑎 ⪯ 𝑏 implies that P1 ⪯ P2 and 𝑝1 ⪯ 𝑝2. V(𝑏) implies that P2 is not the invalid element and ∀𝑥. 𝑝2(𝑥) ≤ 1. Thus, P1 is not the invalid element and ∀𝑥. 𝑝1(𝑥) ≤ 1, and therefore V(𝑎).

Condition 𝜀 · 𝑎 = 𝑎 𝜀 · 𝑎 = (𝟙Mem[Var], λ𝑥. 0) · (P1, 𝑝1) = (𝟙Mem[Var] · P1, (λ𝑥. 0) · 𝑝1) = (P1, 𝑝1) = 𝑎.

Condition 𝑎 ⪯ 𝑏 ⇒ 𝑎 · 𝑐 ⪯ 𝑏 · 𝑐 𝑎 ⪯ 𝑏 implies that P1 ⪯ P2 and 𝑝1 ⪯ 𝑝2. Say 𝑐 = (P3, 𝑝3). Then 𝑎 · 𝑐 = (P1 · P3, 𝑝1 · 𝑝3) and 𝑏 · 𝑐 = (P2 · P3, 𝑝2 · 𝑝3). Because P1 ⪯ P2, we have P1 · P3 ⪯ P2 · P3; similarly, 𝑝1 · 𝑝3 ⪯ 𝑝2 · 𝑝3. Thus, 𝑎 · 𝑐 ⪯ 𝑏 · 𝑐. □

Lemma D.3.5. If 𝑀 is an RA, then 𝑀𝐼 is also an RA.

Proof. RAs are known to be closed under products, and 𝑀𝐼 can be obtained as a product of copies of 𝑀, so we omit the proof. □

Lemma D.3.6. M𝐼 is an RA.

Proof. By lemma D.3.4, PSpPm is an RA. By lemma D.3.5, M𝐼 = PSpPm𝐼 is also an RA. □

D.4 Characterizations of Joint Conditioning

Interestingly, it is possible to characterize the conditioning modality using the other connectives of the logic.
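Throughout this section, conditioning is phrased through decompositions of the form 𝜇(𝑖) = bind(𝜇, 𝜅(𝑖)), where the kernel 𝜅 conditions the ambient measure on the value of an observed expression (this is exactly the construction used in the proof of C-UNIT-R). As a purely illustrative sanity check, the discrete case of this decomposition can be sketched in a few lines of Python; the helper names bind, pushforward, and condition below are hypothetical and not part of any BLUEBELL artifact.

```python
from fractions import Fraction
from collections import defaultdict

def bind(mu, kappa):
    # Monadic bind for finite distributions: (bind(mu, kappa))(w) = sum_v mu(v) * kappa(v)(w).
    out = defaultdict(Fraction)
    for v, p in mu.items():
        for w, q in kappa(v).items():
            out[w] += p * q
    return dict(out)

def pushforward(mu, E):
    # Law of the random variable E under mu, i.e. mu composed with E^{-1}.
    out = defaultdict(Fraction)
    for w, p in mu.items():
        out[E(w)] += p
    return dict(out)

def condition(mu, E, v):
    # mu conditioned on the event E = v (assumes the event has positive mass).
    mass = sum(p for w, p in mu.items() if E(w) == v)
    return {w: p / mass for w, p in mu.items() if E(w) == v}

# A toy distribution over pairs (x, y), observing E = the first component.
mu = {(0, 0): Fraction(1, 4), (0, 1): Fraction(1, 4),
      (1, 0): Fraction(1, 6), (1, 1): Fraction(1, 3)}
E = lambda w: w[0]

law_E = pushforward(mu, E)            # the outer distribution, mu o E^{-1}
kappa = lambda v: condition(mu, E, v) # kernel of conditional distributions

# Binding the law of E with the conditional kernel recovers mu.
assert bind(law_E, kappa) == mu
```

The assertion checks the defining equation of the decomposition: binding the law of 𝐸 with the kernel of conditional distributions recovers the original measure, i.e. 𝜇 = bind(𝜇 ∘ 𝐸⁻¹, 𝜅).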
Proposition D.4.1 (Alternative Characterization of Joint conditioning). The fol- lowing is a logically equivalent characterization of the joint conditioning modality: C𝜇 𝐾 ⊣⊢ ∃F , 𝜇, 𝑝, 𝜅.Own(F , 𝜇, 𝑝) ∗ ⌜∀𝑖 ∈ 𝐼 . 𝜇(𝑖) = bind(𝜇, 𝜅(𝑖))⌝ ∗ ∀𝑣 ∈ supp(𝜇).Own(F , 𝜅(𝐼) (𝑣), 𝑝) −∗ 𝐾 (𝑣) 336 Proof. In the following, we sometimes abbreviate ∀𝑖 ∈ 𝐼 . 𝜇(𝑖) = bind(𝜇, 𝜅(𝑖)) by writing just 𝜇 = bind(𝜇, 𝜅). We start with the embedding: ∃F , 𝜇, 𝑝, 𝜅.Own(F , 𝜇, 𝑝) ∗ ⌜∀𝑖 ∈ 𝐼 . 𝜇(𝑖) = bind(𝜇, 𝜅(𝑖))⌝ ∗ ∀𝑎 ∈ supp(𝜇).Own(F , 𝜅(𝐼) (𝑎), 𝑝) −∗ 𝐾 (𝑎) ⊣⊢ λ𝑟. ∃F , 𝜇, 𝑝, 𝜅. ( Own(F , 𝜇′, 𝑝) ∗ ⌜𝜇 = bind(𝜇, 𝜅)⌝ ∗ (∀𝑎 ∈ supp(𝜇).Own(F , 𝜅𝑎, 𝑝) −∗ 𝐾 (𝑎)) ) (𝑟) ⊣⊢ λ𝑟. ∃F , 𝜇, 𝑝, 𝜅, F1, 𝜇1, 𝑝1, F2, 𝜇2, 𝑝2, F3, 𝜇3, 𝑝3, 𝑟 ⊒ (F1, 𝜇1, 𝑝1) · (F2, 𝜇2, 𝑝2) · (F3, 𝜇3, 𝑝3)∧ (F1, 𝜇1, 𝑝1) ⊒ (F , 𝜇, 𝑝) ∧ ⌜𝜇 = bind(𝜇, 𝜅)⌝∧ (∀𝑎 ∈ supp(𝜇).∀𝑟1, 𝑟2. 𝑟1 · (F3, 𝜇3, 𝑝3) = 𝑟2 ∧ 𝑟1 ⊒ (F , 𝜅𝑎, 𝑝) ⇒ 𝐾 (𝑎) (𝑟2)) ⊣⊢ λ𝑟. ∃F , 𝜇, 𝑝, F3, 𝜇3, 𝑝3, 𝜅. 𝑟 ⊒ (F , 𝜇, 𝑝) · (F3, 𝜇3) ∧ ⌜𝜇 = bind(𝜇, 𝜅)⌝∧ (∀𝑎 ∈ supp(𝜇).∀𝑟1, 𝑟2. 𝑟1 · (F3, 𝜇3, 𝑝3) = 𝑟2 ∧ 𝑟1 ⊒ (F , 𝜅𝑎, 𝑝) ⇒ 𝐾 (𝑎) (𝑟2)) For the last equivalence, the forward direction holds because 𝑟 ⊒ (F1, 𝜇1, 𝑝1) · (F2, 𝜇2, 𝑝2) · (F3, 𝜇3, 𝑝3) ⊒ (F1, 𝜇1, 𝑝1) · (F3, 𝜇3, 𝑝3) ⊒ (F , 𝜇, 𝑝) · (F3, 𝜇3, 𝑝3). The backward direction holds because we can pick (F1, 𝜇1, 𝑝1) = (F , 𝜇, 𝑝), (F2, 𝜇2) be the trivial probability space on 𝑠 and 𝑝2 = λ . 0. • To show that the embedding implies the original assertion C𝜇 𝐾 , we start 337 with 𝜇(𝑖) ⊛ 𝜇3(𝑖). For any 𝑖, we have 𝜇(𝑖) = bind(𝜇, 𝜅(𝑖)), and thus 𝜇(𝑖) ⊛ 𝜇3(𝑖) = bind(𝜇, 𝜅(𝑖)) ⊛ 𝜇3(𝑖). According to lemma D.2.7, 𝜇(𝑖)⊛𝜇3(𝑖) is defined implies that 𝜅(𝑖) (𝑎)⊛𝜇3(𝑖) is defined for any 𝑎 ∈. Furthermore, 𝜇(𝑖) ⊛ 𝜇3(𝑖) = bind(𝜇, λ𝑎. 𝜅(𝑖) (𝑎) ⊛ 𝜇3(𝑖)) We abbreviate the hyperkernel [𝑖: λ𝑎. 𝜅(𝑖) (𝑎) ⊛ 𝜇3(𝑖) | 𝑖 ∈ 𝐼] as 𝜅′. 
For any 𝑎 ∈ supp(𝜇), the assertion ∀𝑎 ∈ supp(𝜇).∀𝑟1, 𝑟2.𝑟1 ⊛ (F3, 𝜇3, 𝑝3) = 𝑟2 ∧ 𝑟1 ⊒ (F , 𝜅(𝐼)𝑎, 𝑝) ⇒ 𝐾 (𝑎) (𝑟2) applies with the specific case 𝑟1 = (F , 𝜅(𝐼) (𝑎), 𝑝), gives us 𝐾 (𝑎) ((F , 𝜅(𝐼) (𝑎), 𝑝) · (F3, 𝜇3, 𝑝3)]) By the definition of composition in our resource algebra, we have that 𝐾 (𝑎) holds on (F ⊕ F3, 𝜅 ′(𝐼) (𝑎), 𝑝 + 𝑝3). For any 𝑟, – If V(𝑟), then there exists F ′, 𝜇′, 𝑝′ such that 𝑟 = (F ′, 𝜇′, 𝑝′). Note that 𝑟 = (F ′, 𝜇′, 𝑝′) ⊒ (F , 𝜇, 𝑝) · (F3, 𝜇3, 𝑝3) = (F ⊕ F3, 𝜇 ⊛ 𝜇3, 𝑝 + 𝑝3) By lemma D.2.4, 𝜇 ⊛ 𝜇3 = bind(𝜇, 𝜅′) implies that there exists 𝜅′′ such that 𝜇(𝑖) = bind(𝜇, 𝜅′′(𝑖)), and that for any 𝑎 ∈ supp 𝜇, (F ⊕ F3, 𝜅 ′(𝐼) (𝑎)) ⊑ (F ′, 𝜅′′(𝐼) (𝑎)). Thus, by monotonicity with respect to the extension order, that would imply 𝐾 (𝑎) holds on (F ′, 𝜅′′(𝐼) (𝑎), 𝑝′). And 𝐾 (𝑎) holds on (F ′, 𝜅′′(𝐼) (𝑎), 𝑝′) for any 𝑎 ∈ supp 𝜇 together with 𝜇(𝑖) = bind(𝜇, 𝜅′′(𝑖)) implies that 𝑟 satisfy the original assertion of conditioning modality. 338 – If not V(𝑟), then 𝑟 satisfies any assertions, so 𝑟 satisfy the original assertion of conditioning modality. • To show the other direction that having the original assertion implies the embedded assertion. Assume C𝜇 𝐾 (𝑟), that is, ∃F , 𝜇, 𝑝, 𝜅. (F , 𝜇, 𝑝) ⪯ 𝑟 ∧ ∀𝑖 ∈ 𝐼 . 𝜇(𝑖) = bind(𝜇, 𝜅(𝑖)) ∧ ∀𝑣 ∈ supp(𝜇).𝐾 (𝑣) (F , 𝜅(𝐼) (𝑣), 𝑝) (𝑟) To show that 𝑟 also satisfy the embedding, we pick the witness for the existential quantifier as follows: let (F3, 𝜇3) be the trivial probabil- ity space on Mem[Var]; let 𝑝3 = λ . 0; pick (Fembd, 𝜇embd, 𝑝embd) be the (Forig, 𝜇orig, 𝑝orig) that witness C𝜇 𝐾 (𝑟), and 𝜅embd = 𝜅orig. Then: – First we show 𝑟 ⪰ (Forig, 𝜇orig, 𝑝orig) = (Forig, 𝜇orig, 𝑝orig) · (F3, 𝜇3, 𝑝3) = (Fembd, 𝜇embd, 𝑝embd) · (F3, 𝜇3, 𝑝3) – 𝜇orig = bind(𝜇, 𝜅orig(𝐼) (𝑎)) implies 𝜇embd = bind(𝜇, 𝜅embd(𝐼) (𝑎)). – For any 𝑟1, 𝑟2, 𝑟1 · (F3, 𝜇3, 𝑝3) = 𝑟2 ∧ 𝑟1 ⊒ (Fembd, 𝜅embd(𝐼) (𝑎), 𝑝embd) implies that 𝑟2 = 𝑟1 ⊒ (Forig, 𝜅orig(𝐼) (𝑎), 𝑝orig). 
By the assumption that the orig assertion holds, we have 𝐾 (𝑎) (Forig, 𝜅orig(𝐼) (𝑎), 𝑝orig), which implies 𝐾 (𝑎) (𝑟2). Therefore, 𝑟 also satisfy the embedding. □ 339 D.5 Soundness D.5.1 Soundness of Primitive Rules Soundness of Distribution Ownership Rules Lemma D.5.1. DIST-INJ is sound. Proof. Assume a valid 𝑎 ∈ M𝐼 is such that both 𝐸 $∼ 𝜇(𝑎) and 𝐸 $∼ 𝜇′(𝑎) hold. Let 𝑎 = (F , 𝜇0, 𝑝), then we know 𝜇 = 𝜇0 ◦ 𝐸−1 = 𝜇′, which proves the claim. □ Lemma D.5.2. SURE-MERGE is sound. Proof. The proof for the forward direction is very similar to the one for sec- tion 5.3.5. For 𝑎 ∈ M𝐼 , if (⌈𝐸1⌉ ∗ ⌈𝐸2⌉)(𝑎). Then there exists 𝑎1, 𝑎2 such that 𝑎1 · 𝑎2 ⪯ 𝑎 and ⌈𝐸1⌉ (𝑎1), ⌈𝐸2⌉ (𝑎2). Say 𝑎 = (F , 𝜇, 𝑝), 𝑎1 = (F1, 𝜇1, 𝑝1) and 𝑎2 = (F2, 𝜇2, 𝑝2). Then ⌈𝐸1⌉ (𝑎1) implies that 𝜇1(𝐸−1 1 (True)) = 1 And similarly, 𝜇2(𝐸−1 2 (True)) = 1 Thus, 𝜇(𝐸−1 1 (True) ∩ 𝐸−1 2 (True)) = 𝜇1(𝐸−1 1 (True)) · 𝜇2(𝐸−1 2 (True)) = 1. Hence, 𝜇(𝐸1 ∧ 𝐸−1 2 (True)) = 𝜇(𝐸−1 1 (True) ∩ 𝐸−1 2 (True)) = 1 340 Thus, ⌈𝐸1 ∧ 𝐸2⌉ (𝑎). Now we prove the backwards direction: Say 𝑎 = (F , 𝜇, 𝑝). if ⌈𝐸1 ∧ 𝐸2⌉ (𝑎), then 𝜇(𝐸1 ∧ 𝐸−1 2 (True)) = 1, and then 𝜇(𝐸−1 1 (True)) ≥ 𝜇(𝐸1 ∧ 𝐸−1 2 (True)) = 1 𝜇(𝐸−1 2 (True)) ≥ 𝜇(𝐸1 ∧ 𝐸−1 2 (True)) = 1 Let F1 = 𝜎(𝐸−1 1 (True)) and F2 = 𝜎(𝐸−1 2 (True)). Then, ⌈𝐸1⌉ (F1, 𝜇 |F1 , λ . 0) ⌈𝐸2⌉ (F2, 𝜇 |F2 , λ . 0) (F1, 𝜇 |F1 , λ . 0) ∗ (F2, 𝜇 |F2 , λ . 0) ⪯ 𝑎 Thus, ⌈𝐸1⌉ ∗ ⌈𝐸2⌉ holds on 𝑎. □ Lemma D.5.3. PROD-SPLIT is sound. Proof. For any (F , 𝜇, 𝑝) such that ((𝐸1, 𝐸2) $∼ 𝜇1 ⊗ 𝜇2) (F , 𝜇, 𝑝), by definition, it must ∃F ′, 𝜇′. (Own(F ′, 𝜇′)) (F , 𝜇, 𝑝) ∗ (𝐸1, 𝐸2) � (F ′(𝑖), 𝜇′(𝑖)) ∧ 𝜇1 ⊗ 𝜇2 = 𝜇′(𝑖) ◦ (𝐸1, 𝐸2)−1. We can derive from it that ∃F ′, 𝜇′, 𝑝′.(F ′, 𝜇′) ⪯ (F , 𝜇, 𝑝)∗( ∀𝑎, 𝑏 ∈ 𝐴.∃𝐿𝑎,𝑏,𝑈𝑎,𝑏 ∈ F ′(𝑖). 𝐿𝑎,𝑏 ⊆ (𝐸1, 𝐸2)−1(𝑎, 𝑏) ⊆ 𝑈𝑎,𝑏 ∧ 𝜇′(𝐿𝑎,𝑏) = 𝜇′(𝑈𝑎,𝑏)∧ 𝜇1 ⊗ 𝜇2(𝑎, 𝑏) = 𝜇′(𝑖) (𝐿𝑎,𝑏) = 𝜇′(𝑖) (𝑈𝑎,𝑏) ) Also, for any 𝑎, 𝑏, 𝑎′, 𝑏′ ∈ 𝐴 such that 𝑎 ≠ 𝑎′ or 𝑏 ≠ 𝑏′, we have 𝐿𝑎,𝑏 disjoint from 𝐿𝑎′,𝑏′ because on 𝐿𝑎,𝑏 ∩ 𝐿𝑎′,𝑏′ , the random variable (𝐸1, 𝐸2) maps to both (𝑎, 𝑏) and (𝑎′, 𝑏′). 
341 Define F1(𝑖) = 𝜎( { (⋃𝑏∈𝐴 𝐿𝑎,𝑏) | 𝑎 ∈ 𝐴 } ∪ { (⋃𝑏∈𝐴𝑈𝑎,𝑏) | 𝑎 ∈ 𝐴 } ), and similarly define F2(𝑖) = 𝜎( { (⋃𝑎∈𝐴 𝐿𝑎,𝑏) | 𝑏 ∈ 𝐴 } ∪ { (⋃𝑎∈𝐴𝑈𝑎,𝑏) | 𝑏 ∈ 𝐴 } ). Denote 𝜇′ restricted to F1 as 𝜇′1 and 𝜇′ restricted to F2 as 𝜇′2. We want to show that (F1(𝑖), 𝜇′1(𝑖)) ⊛ (F2(𝑖), 𝜇′2(𝑖)) ⊑ (F ′(𝑖), 𝜇′(𝑖)), which boils down to show that for any 𝑋1 ∈ F1(𝑖), any 𝑋2 ∈ F2(𝑖), 𝜇′(𝑋1 ∩ 𝑋2) = 𝜇′1(𝑋1) · 𝜇′2(𝑋2) For convenience, we will denote ∪𝑏∈𝐴𝐿𝑎,𝑏 as 𝐿𝑎, denote ∪𝑎∈𝐴𝐿𝑎,𝑏 as 𝐿𝑏, de- note ∪𝑏∈𝐴𝑈𝑎,𝑏 as𝑈𝑎, and denote ∪𝑎∈𝐴𝑈𝑎,𝑏 as𝑈𝑏. First, using a standard construction in measure theory proofs, we rewrite F1 and F2 as sigma algebra generated by sets of partitions. Specifically, F1 is equivalent to 𝜎( {⋂ 𝑎∈𝑆1 𝐿𝑎 ∩ ⋂ 𝑎∈𝑆2 𝑈𝑎 \ ( ⋃ 𝑎∈𝐴\𝑆1 𝐿𝑎 ∪ ⋃ 𝑎∈𝐴\𝑆2 𝑈𝑎) | 𝑆1, 𝑆2 ⊆ 𝐴 } ) and similarly, F2 is equivalent to 𝜎( {⋂ 𝑏∈𝑇1 𝐿𝑏 ∩ ⋂ 𝑏∈𝑇2 𝑈𝑏 \ ( ⋃ 𝑏∈𝐴\𝑇1 𝐿𝑏 ∪ ⋃ 𝑏∈𝐴\𝑇2 𝑈𝑏) | 𝑇1, 𝑇2 ⊆ 𝐴 } ). Thus, by lemma D.2.2, any event 𝑋1 in F1 can be represented by⊎ 𝑆1∈𝐼1,𝑆2∈𝐼2 ⋂ 𝑎∈𝑆1 𝐿𝑎 ∩ ⋂ 𝑎∈𝑆2 𝑈𝑎 \ ( ⋃ 𝑎∈𝐴\𝑆1 𝐿𝑎 ∪ ⋃ 𝑎∈𝐴\𝑆2 𝑈𝑎) for some 𝐼1, 𝐼2 ⊆ P(𝐴), where P is the powerset over 𝐴. Similarly, any event 𝑋2 in F2 can be represented by⊎ 𝑆3∈𝐼3,𝑆4∈𝐼4 ⋂ 𝑏∈𝑆3 𝐿𝑏 ∩ ⋂ 𝑏∈𝑆4 𝑈𝑏 \ ( ⋃ 𝑏∈𝐴\𝑆3 𝐿𝑏 ∪ ⋃ 𝑏∈𝐴\𝑆2 𝑈𝑏) 342 for some 𝐼3, 𝐼4 ⊆ P(𝐴). Thus, 𝑋1 ∩ 𝑋2 can be represented as 𝑋1 ∩ 𝑋2 = (⊎𝑆1∈𝐼1,𝑆2∈𝐼2 ⋂ 𝑎∈𝑆1 𝐿𝑎 ∩ ⋂ 𝑎∈𝑆2 𝑈𝑎 \ ( ⋃ 𝑎∈𝐴\𝑆1 𝐿𝑎 ∪ ⋃ 𝑎∈𝐴\𝑆2 𝑈𝑎))⋂(⊎𝑆3∈𝐼3,𝑆4∈𝐼4 ⋂ 𝑏∈𝑆3 𝐿𝑏 ∩ ⋂ 𝑏∈𝑆4 𝑈𝑏 \ ( ⋃ 𝑏∈𝐴\𝑆3 𝐿𝑏 ∪ ⋃ 𝑏∈𝐴\𝑆2 𝑈𝑏)) = ⊎ 𝑆1∈𝐼1,𝑆2∈𝐼2,𝑆3∈𝐼3,𝑆4∈𝐼4 ( ⋂ 𝑎∈𝑆1 𝐿𝑎 ∩ ⋂ 𝑎∈𝑆2 𝑈𝑎 \ ( ⋃ 𝑎∈𝐴\𝑆1 𝐿𝑎 ∪ ⋃ 𝑎∈𝐴\𝑆2 𝑈𝑎)) ∩ (⋂𝑏∈𝑆3 𝐿𝑏 ∩ ⋂ 𝑏∈𝑆4 𝑈𝑏 \ ( ⋃ 𝑏∈𝐴\𝑆3 𝐿𝑏 ∪ ⋃ 𝑏∈𝐴\𝑆2 𝑈𝑏)) Because 𝐿𝑎,𝑏 and 𝐿𝑎′,𝑏′ are disjoint as long as not 𝑎 = 𝑎′ and 𝑏 = 𝑏′, we have 𝐿𝑎 disjoint from 𝐿𝑎′ if 𝑎 ≠ 𝑎′. Thus, ⋂ 𝑎∈𝑆1 𝐿𝑎∩ ⋂ 𝑎∈𝑆2 𝑈𝑎 \ ( ⋃ 𝑎∈𝐴\𝑆1 𝐿𝑎∪ ⋃ 𝑎∈𝐴\𝑆2 𝑈𝑎) is not empty only when 𝑆1 is singleton and empty. • If 𝑆1 is empty, then⋂ 𝑎∈𝑆1 𝐿𝑎∩ ⋂ 𝑎∈𝑆2 𝑈𝑎\( ⋃ 𝑎∈𝐴\𝑆1 𝐿𝑎∪ ⋃ 𝑎∈𝐴\𝑆2 𝑈𝑎) = ⋂ 𝑎∈𝑆2 𝑈𝑎\( ⋃ 𝑎∈𝐴 𝐿𝑎∪ ⋃ 𝑎∈𝐴\𝑆2 𝑈𝑎) has measure 0 because ⋃ 𝑎∈𝐴 𝐿𝑎 has measure 1. 
• Otherwise, if 𝑆1 is singleton, say 𝑆1 = {𝑎′}, then⋂ 𝑎∈𝑆1 𝐿𝑎 ∩ ⋂ 𝑎∈𝑆2 𝑈𝑎 \ ( ⋃ 𝑎∈𝐴\𝑆1 𝐿𝑎 ∪ ⋃ 𝑎∈𝐴\𝑆2 𝑈𝑎) = 𝐿𝑎′ ∩ ⋂ 𝑎∈𝑆2 𝑈𝑎 \ ⋃ 𝑎∈𝐴\𝑆2 𝑈𝑎). Furthermore, 𝜇′(⋂𝑎∈𝑆2 𝑈𝑎) = 𝜇′( ⋂ 𝑎∈𝑆2 𝐿𝑎 ⊎ (𝑈𝑎 \ 𝐿𝑎)) = 𝜇′(⋂𝑎∈𝑆2 𝐿𝑎) + 0 And ⋂ 𝑎∈𝑆2 𝐿𝑎 is non-empty only if 𝑆2 is a singleton set or empty set. Thus, 𝐿𝑎′ ∩ ⋂ 𝑎∈𝑆2 𝑈𝑎 \ ⋃ 𝑎∈𝐴\𝑆2 𝑈𝑎) ⊆ ⋂ 𝑎∈𝑆2 𝑈𝑎 has non-zero measure only if 𝑆2 is empty or a singleton set. – When 𝑆2 is empty, 𝐿𝑎′ ∩ ⋂ 𝑎∈𝑆2 𝑈𝑎 \ ⋃ 𝑎∈𝐴\𝑆2 𝑈𝑎 = 𝐿𝑎′ \ ⋃ 𝑎∈𝐴𝑈𝑎 ⊆ 𝐿𝑎′ \𝑈𝑎′ = ∅ 343 – When 𝑆2 = {𝑎′}, 𝐿𝑎′ ∩ ⋂ 𝑎∈𝑆2 𝑈𝑎 \ ⋃ 𝑎∈𝐴\𝑆2 𝑈𝑎 = 𝐿𝑎′ \ ⋃ 𝑎∈𝐴,𝑎≠𝑎′ 𝑈𝑎 . – When 𝑆2 = {𝑎′′} for some 𝑎′′ ≠ 𝑎′ 𝐿𝑎′ ∩ ⋂ 𝑎∈𝑆2 𝑈𝑎 \ ⋃ 𝑎∈𝐴\𝑆2 𝑈𝑎 = 𝐿𝑎′ ∩𝑈𝑎′′ \ ⋃ 𝑎∈𝐴,𝑎≠𝑎′′ 𝑈𝑎 = ∅ Thus, 𝜇′(𝑋1) =𝜇′ ( ⋃ 𝑆1∈𝐼1,𝑆2∈𝐼2 ⋂ 𝑎∈𝑆1 𝐿𝑎 ∩ ⋂ 𝑎∈𝑆2 𝑈𝑎 \ ( ⋃ 𝑎∈𝐴\𝑆1 𝐿𝑎 ∪ ⋃ 𝑎∈𝐴\𝑆2 𝑈𝑎)∩) =𝜇′ ( ⋃ {𝑎′}∈𝐼1,𝑆2∈𝐼2 (𝐿𝑎′ ∩ ⋂ 𝑎∈𝑆2 𝑈𝑎 \ ⋃ 𝑎∈𝐴\𝑆2 𝑈𝑎) ) =𝜇′ ( ⋃ {𝑎′}∈𝐼1∩𝐼2 𝐿𝑎′ ∩𝑈𝑎′ \ ⋃ 𝑎∈𝐴,𝑎≠𝑎′ 𝑈𝑎 ) =𝜇′ ( ⋃ {𝑎′}∈𝐼1∩𝐼2 (𝐿𝑎′ \ ⋃ 𝑎∈𝐴,𝑎≠𝑎′ 𝑈𝑎) ) =𝜇′ ( ⋃ {𝑎′}∈𝐼1∩𝐼2 (𝐿𝑎′ \ ⋃ 𝑎∈𝐴,𝑎≠𝑎′ (𝐿𝑎 ⋃(𝑈𝑎 \ 𝐿𝑎)))) =𝜇′ ( ⋃ {𝑎′}∈𝐼1∩𝐼2 (𝐿𝑎′ \ ⋃ 𝑎∈𝐴,𝑎≠𝑎′ (𝐿𝑎)) ) =𝜇′ ( ⋃ {𝑎′}∈𝐼1∩𝐼2 𝐿𝑎′ ) Denote ⋃ {𝑎′}∈𝐼1∩𝐼2 𝐿𝑎′ as 𝑋′1. And 𝑋1 \ 𝑋′1 and 𝑋′1 \ 𝑋1 both have measure 0. Similar results hold for 𝑋2 as well, and we can show that 𝜇′(𝑋2) =𝜇′ ( ⋃ {𝑏′}∈𝐼3∩𝐼4 𝐿𝑏′ ) Denote ⋃ {𝑏′}∈𝐼3∩𝐼4 𝐿𝑏′ as 𝑋′2. And 𝑋2 \ 𝑋′2 and 𝑋′2 \ 𝑋2 both have measure 0. 344 Thus, 𝜇′(𝑋1 ∩ 𝑋2) =𝜇′(𝑋1 ∩ 𝑋2 ∩ 𝑋′1) + 𝜇 ′((𝑋1 ∩ 𝑋2) \ 𝑋′1) =𝜇′(𝑋1 ∩ 𝑋2 ∩ 𝑋′1) + 0 =𝜇′(𝑋1 ∩ 𝑋2 ∩ 𝑋′1 ∩ 𝑋 ′ 2) + 𝜇 ′((𝑋1 ∩ 𝑋2 ∩ 𝑋′1) \ 𝑋 ′ 2) + 0 =𝜇′(𝑋1 ∩ 𝑋2 ∩ 𝑋′1 ∩ 𝑋 ′ 2) + 0 + 0 =𝜇′(𝑋1 ∩ 𝑋2 ∩ 𝑋′1 ∩ 𝑋 ′ 2) + 𝜇 ′((𝑋2 ∩ 𝑋′1 ∩ 𝑋 ′ 2) \ 𝑋1) =𝜇′(𝑋2 ∩ 𝑋′1 ∩ 𝑋 ′ 2) =𝜇′(𝑋2 ∩ 𝑋′1 ∩ 𝑋 ′ 2) + 𝜇 ′((𝑋′1 ∩ 𝑋 ′ 2) \ 𝑋2) =𝜇′(𝑋′1 ∩ 𝑋 ′ 2) =𝜇′ ( (⋃{𝑎′}∈𝐼1∩𝐼2 𝐿𝑎′) ∩ (⋃{𝑏′}∈𝐼3∩𝐼4 𝐿𝑏′)) =𝜇′ (⋃ {𝑎′}∈𝐼1∩𝐼2,{𝑏′}∈𝐼3∩𝐼4 𝐿𝑎′,𝑏′ ) = ∑︁ {𝑎′}∈𝐼1∩𝐼2 {𝑏′}∈𝐼3∩𝐼4 𝜇′(𝐿𝑎′,𝑏′) Next we show that 𝜇′(𝑖) (𝐿𝑎,𝑏) = 𝜇′(𝑖) (𝑋1) · 𝜇′(𝑖) (𝑋2). Note that 𝜇′(𝐿𝑎) =∑ 𝑏 𝜇 ′(𝐿𝑎,𝑏) = 𝜇′(𝐸−1 1 (𝑎)), and 𝜇′(𝐿𝑏) = ∑ 𝑎 𝜇 ′(𝐿𝑎,𝑏) = 𝜇′(𝐸−1 2 (𝑏)). 
And 𝜇1 ⊗ 𝜇2 = 𝜇′(𝑖) ◦ (𝐸1, 𝐸2)−1 implies that 𝜇′(𝑖) (𝐿𝑎,𝑏) = 𝜇1 ⊗ 𝜇2(𝑎, 𝑏) = 𝜇1(𝑎) · 𝜇2(𝑏) 345 Then 𝜇1(𝑎) = 𝜇1(𝑎) · ∑︁ 𝑏∈𝐴 𝜇2(𝑏) = ∑︁ 𝑏∈𝐴 𝜇1(𝑎) · 𝜇2(𝑏) = ∑︁ 𝑏∈𝐴 𝜇′(𝑖) (𝐿𝑎,𝑏) = 𝜇′(𝑖) (∑︁ 𝑏∈𝐴 𝐿𝑎,𝑏 ) = 𝜇′(𝑖) (𝐿𝑎), and similarly, 𝜇2(𝑏) = (∑︁ 𝑎∈𝐴 𝜇1(𝑎) ) · 𝜇2(𝑏) = ∑︁ 𝑎∈𝐴 (𝜇1(𝑎) · 𝜇2(𝑏)) = ∑︁ 𝑎∈𝐴 𝜇′(𝑖) (𝐿𝑎,𝑏) = 𝜇′(𝑖) (∑︁ 𝑎∈𝐴 𝐿𝑎,𝑏 ) = 𝜇′(𝑖) (𝐿𝑏). Thus, 𝜇′(𝑖) (𝐿𝑎,𝑏) = 𝜇1(𝑎) · 𝜇2(𝑏) = 𝜇′(𝑖) (𝐿𝑎) · 𝜇′(𝑖) (𝐿𝑏) 346 Therefore, 𝜇′(𝑋1 ∩ 𝑋2) = ∑︁ {𝑎′}∈𝐼1∩𝐼2 {𝑏′}∈𝐼3∩𝐼4 𝜇′(𝐿𝑎′,𝑏′) = ∑︁ {𝑎′}∈𝐼1∩𝐼2 {𝑏′}∈𝐼3∩𝐼4 𝜇′(𝐿𝑎′) · 𝜇′(𝐿𝑏′) = ∑︁ {𝑎′}∈𝐼1∩𝐼2 𝜇′(𝐿𝑎′) · ∑︁ {𝑏′}∈𝐼3∩𝐼4 𝜇′(𝐿𝑏′) =𝜇′(𝑋1) · 𝜇′(𝑋2) =𝜇′1(𝑋1) · 𝜇′2(𝑋2) Thus we have (F1, 𝜇 ′ 1) ⊛ (F2, 𝜇 ′ 2) ⊑ (F ′, 𝜇′). Let 𝑝1 = 𝑝2 = λ𝑥. 𝑝′(𝑥)/2. Next we show that 𝐸1 $∼ 𝜇1(F1, 𝜇 ′ 1, 𝑝1) and 𝐸2 $∼ 𝜇2(F2, 𝜇 ′ 2, 𝑝2). By definition, 𝐸1 $∼ 𝜇1(F1, 𝜇 ′ 1, 𝑝1) is equivalent to ∃F ′′, 𝜇′′. (Own(F ′′, 𝜇′′)) (F1, 𝜇 ′ 1, 𝑝1) ∗ 𝐸1 � (F ′′(𝑖), 𝜇′′(𝑖)) ∧ 𝜇1 = 𝜇′′(𝑖) ◦ 𝐸−1 1 , which is equivalent to ∃F ′′, 𝜇′′. (F ′′, 𝜇′′) ⪯ (F1, 𝜇 ′ 1) ∗ ( ∀𝑎 ∈ 𝐴.∃𝑆𝑎, 𝑇𝑎 ∈ F ′′(𝑖). 𝑆𝑎 ⊆ 𝐸−1 1 (𝑎) ⊆ 𝑇𝑎 ∧ 𝜇 ′′(𝑖) (𝑆𝑎) = 𝜇′′(𝑖) (𝑆𝑎) ∧ 𝜇1(𝑎) = 𝜇′′(𝑖) (𝑆𝑎) = 𝜇′′(𝑖) (𝑇𝑎) ) We can pick the existential witness to be F1, 𝜇 ′ 1. For any 𝑎 ∈ 𝐴, 𝐸−1 1 (𝑎) =⋃ 𝑏∈𝐴 (𝐸1, 𝐸2)−1(𝑎, 𝑏). Because we have 𝐿𝑎,𝑏 ⊆ (𝐸1, 𝐸2)−1(𝑎, 𝑏) ⊆ 𝑈𝑎,𝑏, then ⋃ 𝑏∈𝐴 𝐿𝑎,𝑏 ⊆ 𝐸−1 1 (𝑎) = ⋃ 𝑏∈𝐴 (𝐸1, 𝐸2)−1(𝑎, 𝑏) ⊆ ⋃ 𝑏∈𝐴𝑈𝑎,𝑏 . By definition, for each 𝑎, ⋃ 𝑏∈𝐴 𝐿𝑎,𝑏 ∈ F1(𝑖) and ⋃ 𝑏∈𝐴𝑈𝑎,𝑏 ∈ F1(𝑖), and we also 347 have 𝜇′1(𝑖) ( ⋃ 𝑏∈𝐴 𝐿𝑎,𝑏) = ∑︁ 𝑏∈𝐴 𝜇′1(𝑖) (𝐿𝑎,𝑏) = ∑︁ 𝑏∈𝐴 𝜇′1(𝑖) (𝑈𝑎,𝑏) = 𝜇′1(𝑖) (⋃ 𝑏∈𝐴𝑈𝑎,𝑏 ) = 𝜇1(𝑎) Thus, 𝑆𝑎 = ⋃ 𝑏∈𝐴 𝐿𝑎,𝑏 and 𝑇𝑎 = ⋃ 𝑏∈𝐴𝑈𝑎,𝑏 witnesses the conditions needed for 𝐸1 $∼ 𝜇1(F1, 𝜇 ′ 1, 𝑝1). And similarly, we have 𝐸2 $∼ 𝜇2(F2, 𝜇 ′ 2, 𝑝2). □ Soundness of Conditioning Rules Lemma D.5.4. C-TRUE is sound. Proof. Let 𝜀 = (F𝜀, 𝜇𝜀, 𝑝𝜀) ∈ M𝐼 be the unit ofM𝐼 and 𝜅 = λ𝑣. 𝜇𝜀. Then, True ⊢ Own(F𝜀, 𝜇𝜀) ⊢ Own(F𝜀, 𝜇𝜀) ∗ ⌜∀𝑖 ∈ 𝐼 . 𝜇𝜀 (𝑖) = bind(𝜇, 𝜅(𝑖))⌝ ⊢ Own(F𝜀, 𝜇𝜀) ∗ ⌜∀𝑖 ∈ 𝐼 . 𝜇𝜀 (𝑖) = bind(𝜇, 𝜅(𝑖))⌝ ∗ True ⊢ ∃F𝜀, 𝜇𝜀, 𝜅.Own(F𝜀, 𝜇𝜀) ∗ ⌜∀𝑖 ∈ 𝐼 . 
𝜇𝜀 (𝑖) = bind(𝜇, 𝜅(𝑖))⌝ ∗ (∀𝑣 ∈ supp(𝜇).Own(F𝜀, 𝜅(𝐼) (𝑣), 𝑝𝜀) −∗ True) ⊢ C𝜇 .True □ Lemma D.5.5. C-FALSE is sound. Proof. Assume 𝑎 ∈ M𝐼 is such that V(𝑎) and that it satisfies C𝜇 𝑣. False. By 348 definition, this means that, for some F0, 𝜇0, 𝑝0, and 𝜅0: (F0, 𝜇0, 𝑝0) ⪯ 𝑎 (D.4) ∀𝑖 ∈ 𝐼 . 𝜇0(𝑖) = bind(𝜇, 𝜅0(𝑖)) (D.5) ∀𝑣 ∈ supp(𝜇). False(F0, 𝜅0(𝐼) (𝑣), 𝑝0) (D.6) Let 𝑣0 ∈ supp(𝜇)—we know one exists because 𝜇 is a (discrete) probability distri- bution. Then by (D.6) on 𝑣0 we get False(F0, 𝜅0(𝐼) (𝑣0), 𝑝0) holds. Since False( ) is by definition false, we get False(𝑎) holds ex falso. □ Lemma D.5.6. C-CONS is sound. Proof. Assume 𝑎 ∈ M𝐼 is such that V(𝑎) and that it satisfies C𝜇 𝑣. 𝐾 (𝑣). By defi- nition, this means that, for some F0, 𝜇0, 𝑝0, and 𝜅0: (F0, 𝜇0, 𝑝0) ⪯ 𝑎 (D.7) ∀𝑖 ∈ 𝐼 . 𝜇0(𝑖) = bind(𝜇, 𝜅0(𝑖)) (D.8) ∀𝑣 ∈ supp(𝜇). 𝐾 (𝑣) (F0, 𝜅0(𝐼) (𝑣), 𝑝0) (D.9) Then by the premise ∀𝑣. 𝐾 (𝑣) ⊢ 𝐾′(𝑣) and (D.9) we obtain ∀𝑣 ∈ supp(𝜇). 𝐾′(𝑣) (F0, 𝜅0(𝐼) (𝑣), 𝑝0) (D.10) By (D.7), (D.8), and (D.10) we get C𝜇 𝑣. 𝐾′(𝑣) as desired. □ Lemma D.5.7. C-FRAME is sound. Proof. Assume 𝑎 ∈ M𝐼 is such that V(𝑎) and that it satisfies 𝑃 ∗ C𝜇 𝑣. 𝐾 (𝑣). By definition, this means that there exist some (F1, 𝜇1, 𝑝1), (F2, 𝜇2, 𝑝2), and 𝜅 such 349 that (F1, 𝜇1, 𝑝1) · (F2, 𝜇2, 𝑝2) ⪯ 𝑎 (D.11) 𝑃(F1, 𝜇1, 𝑝1) (D.12) ∀𝑖 ∈ 𝐼 .𝜇2(𝑖) = bind(𝜇, 𝜅(𝑖)) (D.13) ∀𝑣 ∈ supp(𝜇). 𝐾 (𝑣) (F2, 𝜅(𝐼) (𝑣), 𝑝2) (D.14) Now let: (F ′, 𝜇′, 𝑝′) = (F1(𝑖), 𝜇1(𝑖)) ⊛ (F2(𝑖), 𝜇2(𝑖)) 𝜅′(𝑖) = λ𝑣. 𝜇1(𝑖) ⊛ 𝜅(𝑖) (𝑣) By lemma D.2.7, for each 𝑖 ∈ 𝐼: (F ′, 𝜇′, 𝑝′) = (F1(𝑖), 𝜇1(𝑖)) ⊛ (F2(𝑖), 𝜇2(𝑖)) = (F1(𝑖) ⊕ F2(𝑖), bind(𝜇, λ𝑣. 𝜇1(𝑖) ⊛ 𝜅(𝑖) (𝑣))) (By lemma D.2.7) = (F1(𝑖) ⊕ F2(𝑖), bind(𝜇, 𝜅′(𝑖))) Notice that 𝜅′(𝐼) (𝑣) = 𝜇1 ⊛ 𝜅(𝐼) (𝑣). Thus we obtain: (F ′, 𝜇′, 𝑝′) ⪯ 𝑎 (D.15) ∀𝑖 ∈ 𝐼 .𝜇′(𝑖) = bind(𝜇, 𝜅′(𝑖)) (D.16) and for all 𝑣 ∈ supp(𝜇), (F1, 𝜇1, 𝑝1) ⊛ (F2, 𝜅(𝐼) (𝑣), 𝑝2) = (F ′, 𝜇1 ⊛ 𝜅(𝐼) (𝑣), 𝑝′) ⪯ (F ′, 𝜅′(𝐼) (𝑣), 𝑝′) (D.17) 𝑃(F1, 𝜇1, 𝑝1) (D.18) 𝐾 (𝑣) (F2, 𝜅(𝐼) (𝑣), 𝑝2) (D.19) which gives us that 𝑎 satisfies C𝜇 𝑣. (𝑃 ∗ 𝐾 (𝑣)) as desired. 
□ Lemma D.5.8. C-UNIT-L is sound. Proof. Straightforward. □ 350 Lemma D.5.9. C-UNIT-R is sound. Proof. We prove the two directions separately. Forward direction 𝐸 $∼ 𝜇 ⊢ C𝜇 𝑣. ⌈𝐸 = 𝑣⌉ By unfolding the assumption 𝐸 $∼ 𝜇 we get that there exist F , 𝜇 such that: Own(F , 𝜇) ∗ ⌜𝐸 � (F (𝑖), 𝜇(𝑖))⌝ ∗ ⌜𝜇 = 𝜇(𝑖) ◦ 𝐸−1⌝ holds. Let 𝜅 ≜ λ 𝑗 .  λ𝑣. 𝜇( 𝑗) if 𝑗 ≠ 𝑖 λ𝑣. 𝛾𝑣 if 𝑗 = 𝑖 𝛾𝑣 ≜ λ𝑋 :F (𝑖). 𝜇(𝑖) (𝑋 ∩ (𝐸 = 𝑣)−1) 𝜇(𝑖) ((𝐸 = 𝑣)−1) That is, 𝜅( 𝑗) maps every 𝑣 to 𝜇( 𝑗) when 𝑖 ≠ 𝑗 , while when 𝑖 = 𝑗 it maps 𝑣 to the distribution 𝜇(𝑖) conditioned on 𝐸 = 𝑣. Note that 𝜅 is well defined because 1. although the events 𝑋 ∩ (𝐸 = 𝑣)−1 and (𝐸 = 𝑣)−1 might not belong to F (𝑖), their probability is uniquely determined by almost measurabil- ity of 𝐸 ; 2. we are only interested in the cases where 𝑣 ∈ supp(𝜇), which implies that the denominator is not zero: 𝜇(𝑖) ((𝐸 = 𝑣)−1) = 𝜇(𝑣) > 0. By construction we obtain that ∀ 𝑗 ∈ 𝐼 . 𝜇( 𝑗) = bind(𝜇, 𝜅( 𝑗)) (D.20) ∀𝑣 ∈ supp(𝜇). 𝜅(𝑖) (𝑣) ((𝐸 = 𝑣)−1) = 1 (D.21) From (D.21) we get that ⌈𝐸 = 𝑣⌉ holds on (F (𝑖), 𝜅(𝑖) (𝑣), 𝑝(𝑖)), from which it follows that: Own(F , 𝜅(𝐼) (𝑣), 𝑝) −∗ ⌈𝐸 = 𝑣⌉ 351 Therefore we obtain ∃F , 𝜇, 𝜅, 𝑝.Own(F , 𝜇, 𝑝) ∗ ⌜∀ 𝑗 ∈ 𝐼 .𝜇( 𝑗) = bind(𝜇, 𝜅( 𝑗))⌝ ∗ (∀𝑣 ∈ 𝐴𝜇 .Own(F , 𝜅(𝐼) (𝑣), 𝑝) −∗ ⌈𝐸 = 𝑣⌉) which gives us C𝜇 𝑣. ⌈𝐸 = 𝑣⌉ by proposition D.4.1. Backward direction C𝜇 𝑣. ⌈𝐸 = 𝑣⌉ ⊢ 𝐸 $∼ 𝜇 First note that ⌈𝐸 = 𝑣⌉ (F , 𝜅(𝑣), 𝑝) ⇔ ( ((𝐸 = 𝑣) ∈ true) $∼ 𝛿True ) (F , 𝜅(𝐼) (𝑣), 𝑝) ⇔ ((𝐸 = 𝑣) ∈ true) � (F (𝑖), 𝜅(𝑖) (𝑣)) ∧ 𝛿True = 𝜅(𝑖) (𝑣) ◦ ((𝐸 = 𝑣) ∈ true)−1 ⇔ ((𝐸 = 𝑣) ∈ true) � (F (𝑖), 𝜅(𝑖) (𝑣)) ∧ 𝛿𝑣 = 𝜅(𝑖) (𝑣) ◦ 𝐸−1 for some 𝜅. This implies ⌜𝐸 � F (𝑖), 𝜅(𝑖) (𝑣)⌝. Then, for any value 𝑣 ∈ supp(𝜇), 𝜇(𝑖) ◦ 𝐸−1(𝑣) = (bind(𝜇, 𝜅(𝑖)) ◦ 𝐸−1) (𝑣) = bind(𝜇, 𝜅(𝑖)) (𝐸−1(𝑣)) = ∑︁ 𝑣′∈supp(𝜇) 𝜇(𝑣′) · 𝜅(𝑖) (𝑣′) (𝐸−1(𝑣)) = ∑︁ 𝑣′∈supp(𝜇) 𝜇(𝑣′) · (𝜅(𝑖) (𝑣′) ◦ 𝐸−1) (𝑣) = ∑︁ 𝑣′∈supp(𝜇) 𝜇(𝑣′) · 𝛿𝑣′ (𝑣) = 𝜇(𝑣) This implies the pure facts that 𝐸 � (F (𝑖), 𝜇(𝑖)) and 𝜇 = 𝜇(𝑖) ◦ 𝐸−1. There- 352 fore: C𝜇 𝑣. 
⌈𝐸 = 𝑣⌉ ⊢ ∃F , 𝜇, 𝜅, 𝑝.Own(F , 𝜇, 𝑝) ∗ ⌜∀ 𝑗 ∈ 𝐼 .𝜇( 𝑗) = bind(𝜇, 𝜅( 𝑗))⌝ ∗ (∀𝑣 ∈ 𝐴𝜇 .Own(F , 𝜅(𝐼) (𝑣), 𝑝) −∗ ⌈𝐸 = 𝑣⌉) ⊢ ∃F , 𝜇.Own(F , 𝜇) ∗ ⌜𝐸 � (F (𝑖), 𝜇(𝑖))⌝ ∗ ⌜𝜇 = 𝜇(𝑖) ◦ 𝐸−1⌝ ⊢ 𝐸 $∼ 𝜇 □ Lemma D.5.10. C-ASSOC is sound. Proof. Define 𝜅′ = λ𝑣. bind(𝜅(𝑣), λ𝑤. return(𝑣, 𝑤)). We start by rewriting the as- sumption C𝜇 𝑣. C𝜅(𝑣) 𝑤. 𝐾 (𝑣, 𝑤) so that 𝑘′ is used and 𝐾 depends only on the binding of the innermost modality: C𝜇 𝑣. C𝜅(𝑣) 𝑤. 𝐾 (𝑣, 𝑤) ⊢ C𝜇 𝑣. C𝜅′ (𝑣) (𝑣′, 𝑤). 𝐾 (𝑣, 𝑤) (C-TRANSF, C-CONS) ⊢ C𝜇 𝑣. C𝜅′ (𝑣) (𝑣′, 𝑤). 𝐾 (𝑣′, 𝑤) (C-PURE, C-CONS) C-TRANSF is applied to the innermost modality by using the bijection 𝑓𝑣 (𝑤) = (𝑣, 𝑤). Then, since (𝑣′, 𝑤) ∈ supp(𝑘′(𝑣)) ⇒ 𝑣 = 𝑣′, we can replace 𝑣′ for 𝑣 in 𝐾 . Our goal is now to prove: C𝜇 𝑣. C𝜅′ (𝑣) (𝑣′, 𝑤). 𝐾 (𝑣′, 𝑤) ⊢ Cbind(𝜇,𝜅′) (𝑣′, 𝑤). 𝐾 (𝑣′, 𝑤) Let 𝑎 ∈ M𝐼 be such that V(𝑎) and that it satisfies C𝜇 𝑣. C𝜅′ (𝑣) (𝑣′, 𝑤). 𝐾 (𝑣′, 𝑤). From this assumption we know that, for some F0, 𝜇0, 𝑝0, and 𝜅0: (F0, 𝜇0, 𝑝0) ⪯ 𝑎 (D.22) ∀𝑖 ∈ 𝐼 . 𝜇0(𝑖) = bind(𝜇, 𝜅0(𝑖)) (D.23) 353 such that ∀𝑣 ∈ supp(𝜇), there are some F 𝑣1 , 𝜇 𝑣 1, 𝑝 𝑣 1, and 𝜅𝑣1 satisfying: (F 𝑣1 , 𝜇 𝑣 1, 𝑝 𝑣 1) ⪯ (F0, 𝜅0(𝐼) (𝑣), 𝑝0) (D.24) ∀𝑖 ∈ 𝐼 . 𝜇𝑣1(𝑖) = bind(𝜅′(𝑣), 𝜅𝑣1 (𝑖)) (D.25) ∀(𝑣′, 𝑤) ∈ supp(𝜅′(𝑣)). 𝐾 (𝑣′, 𝑤) (F 𝑣1 , 𝜅 𝑣 1 (𝐼) (𝑣 ′, 𝑤), 𝑝𝑣1) (D.26) Our goal is to prove Cbind(𝜇,𝜅′) (𝑣′, 𝑤). 𝐾 (𝑣′, 𝑤) holds on 𝑎. To this end, we want to show that there exists 𝜅′2 such that: ∀𝑖 ∈ 𝐼 . 𝜇0(𝑖) = bind(bind(𝜇, 𝜅′), 𝜅′2(𝑖)) (D.27) ∀(𝑣′, 𝑤) ∈ supp(bind(𝜇, 𝜅′)). 𝐾 (𝑣′, 𝑤) (F0, 𝜅 ′ 2(𝐼) (𝑣 ′), 𝑝0) (D.28) Now let 𝜅2(𝑖) = λ(𝑣′, 𝑤). 𝜅𝑣′1 (𝑖) (𝑣 ′, 𝑤). which by construction and eq. (D.25) gives us 𝜇𝑣1(𝑖) = bind(𝜅′(𝑣), 𝜅𝑣1 (𝑖)) = bind(𝜅′(𝑣), 𝜅2(𝑖)) Therefore, by eq. (D.24), we can apply lemma D.2.4 and obtain that there exists a 𝜅′2 such that 𝜅0(𝑖) (𝑣) = bind(𝜅′(𝑣), 𝜅′2(𝑖)) (D.29)( F0, 𝜅 ′ 2(𝑖) (𝑣 ′, 𝑤) ) ⊒ ( F 𝑣′1 , 𝜅2(𝑖) (𝑣′, 𝑤) ) = ( F 𝑣′1 , 𝜅𝑣 ′ 1 (𝑖) (𝑣 ′, 𝑤) ) (D.30) By eqs. (D.23) and (D.29) we have: 𝜇0(𝑖) = bind(𝜇, 𝜅0(𝑖)) = bind(𝜇, λ𝑣. 
bind(𝜅′(𝑣), 𝜅′2(𝑖))) By associativity of bind = bind(bind(𝜇, 𝜅′), 𝜅′2(𝑖)) 354 which proves eq. (D.27). Finally, to prove eq. (D.28), we can observe that (𝑣′, 𝑤) ∈ supp(bind(𝜇, 𝜅′)) implies 𝑣′ ∈ supp(𝜇); therefore, by (D.26), upward closure of 𝐾 (𝑣′, 𝑤), and (D.30) and (D.24), we can conclude 𝐾 (𝑣′, 𝑤) holds on (F0, 𝜅 ′ 2(𝐼) (𝑣 ′), 𝑝0), as desired. □ Lemma D.5.11. C-UNASSOC is sound. Proof. Assume 𝑎 ∈ M𝐼 is such that V(𝑎) and that it satisfies Cbind(𝜇,𝜅) 𝑤. 𝐾 (𝑤). By definition, this means that, for some F0, 𝜇0, 𝑝0, and 𝜅0: (F0, 𝜇0, 𝑝0) ⪯ 𝑎 (D.31) ∀𝑖 ∈ 𝐼 . 𝜇0(𝑖) = bind(bind(𝜇, 𝜅), 𝜅0(𝑖)) (D.32) ∀𝑤 ∈ supp(bind(𝜇, 𝜅)). 𝐾 (𝑤) (F0, 𝜅0(𝐼) (𝑤), 𝑝0) (D.33) Our goal is to show that 𝑎 satisfies C𝜇 𝑣. C𝜅(𝑣) 𝑤. 𝐾 (𝑤), for which it would suffice to show that there is a 𝜅1 such that: ∀𝑖 ∈ 𝐼 . 𝜇0(𝑖) = bind(𝜇, 𝜅1(𝑖)) (D.34) and for all 𝑣 ∈ supp(𝜇) there is a 𝜅𝑣2 with ∀𝑖 ∈ 𝐼 . 𝜅1(𝑖) (𝑣) = bind(𝜅(𝑣), 𝜅𝑣2 (𝑖)) (D.35) ∀𝑤 ∈ supp(𝜅(𝑣)). 𝐾 (𝑤) (F0, 𝜅 𝑣 2 (𝐼) (𝑤), 𝑝0) (D.36) To prove this we let 𝜅1(𝑖) = λ𝑣. bind(𝜅(𝑣), 𝜅0(𝑖)) 𝜅𝑣2 (𝑖) = 𝜅0(𝑖) By the associativity of bind we have 𝜇0(𝑖) = bind(bind(𝜇, 𝜅), 𝜅0(𝑖)) = bind(𝜇, λ𝑣. bind(𝜅(𝑣), 𝜅0(𝑖))) = bind(𝜇, 𝜅1(𝑖)) 355 which proves (D.34). By construction, 𝜅1(𝑖) (𝑣) = bind(𝜅(𝑣), 𝜅0(𝑖)) = bind(𝜅(𝑣), 𝜅𝑣2 (𝑖)) proving (D.35). Finally, 𝑣 ∈ supp(𝜇) and 𝑤 ∈ supp(𝜅(𝑣)) imply 𝑤 ∈ supp(bind(𝜇, 𝜅)), so by (D.33) we proved (D.36), concluding the proof. □ Lemma D.5.12. C-SKOLEM is sound. Proof. For any resource 𝑟 = (F , 𝜇, 𝑝), ( C𝜇 𝑣. ∃𝑥 : Var. 𝑄(𝑣, 𝑥) ) (F , 𝜇, 𝑝) ⇔ ∃𝜅.∀𝑖 ∈ 𝐼 .𝜇(𝑖) = bind(𝜇, 𝜅(𝑖)) ∧ ∀𝑣 ∈ supp(𝜇). (∃𝑥 : 𝑋. 𝑄(𝑣, 𝑥)) (F , 𝜅(𝐼) (𝑣), 𝑝) For all 𝑣 ∈ supp(𝜇), ∃𝑥 : 𝑋. 𝑄(𝑣, 𝑥) holds on (F , 𝜅(𝐼) (𝑣), 𝑝). Thus, 𝑄(𝑣, 𝑥𝑣) (F , 𝜅(𝐼) (𝑣), 𝑝) holds for some 𝑥𝑣. Then define 𝑓 : 𝐴 → Var by letting 𝑓 (𝑣) = 𝑥𝑣 for 𝑣 ∈ supp(𝜇). Then, ∃𝜅.∀𝑖 ∈ 𝐼 .𝜇(𝑖) = bind(𝜇, 𝜅(𝑖)) ∧ ∀𝑣 ∈ supp(𝜇). 𝑄(𝑣, 𝑓 (𝑣)) (F , 𝜅(𝐼) (𝑣), 𝑝) And therefore F , 𝜇, 𝑝 satisfies ∃ 𝑓 : 𝐴→ Var. C𝜇 𝑣. 𝑄(𝑣, 𝑥). □ Lemma D.5.13. C-TRANSF is sound. Proof. For any resource 𝑎 = (F , 𝜇, 𝑝), if ( C𝜇 𝑣. 
𝐾 (𝑣) ) ((F , 𝜇, 𝑝)), then ∃𝜅. (F , 𝜇, 𝑝) ⪯ 𝑎 ∧ ∀𝑖 ∈ 𝐼 . 𝜇(𝑖) = bind(𝜇, 𝜅(𝑖)) ∧ ∀𝑣 ∈ supp(𝜇).(𝐾 (𝑣)) ((F , 𝜅(𝐼) (𝑣), 𝑝)) 356 𝜇 = bind(𝜇, 𝜅) says that for any 𝐸 ∈ F , 𝜇(𝐸) = ∑︁ 𝑣∈supp(𝜇) 𝜇(𝑣) · 𝜅(𝐼) (𝑣) (𝐸) = ∑︁ 𝑣 | 𝑓 (𝑣)∈supp(𝜇) 𝜇( 𝑓 (𝑣)) · 𝜅(𝐼) ( 𝑓 (𝑣)) (𝐸) (Because 𝑓 is bijective) = ∑︁ 𝑣∈supp(𝜇′) 𝜇′(𝑣) · 𝜅(𝐼) ( 𝑓 (𝑣)) (𝐸) (Because 𝜇′(𝑣) = 𝜇( 𝑓 (𝑣))) = bind(𝜇′, λ𝑣. 𝜅(𝐼) ( 𝑓 (𝑣))) (𝐸) Thus, 𝜇 = bind(𝜇′, λ𝑣. 𝜅(𝐼) ( 𝑓 (𝑣))). Furthermore, (𝐾 ( 𝑓 (𝑣))) ((F , 𝜅(𝐼) ( 𝑓 (𝑣)), 𝑝)). Thus, if we denote λ𝑣. 𝜅(𝐼) ( 𝑓 (𝑣)) as 𝜅′, it satisfies (F , 𝜇, 𝑝) ⪯ 𝑎 ∧ ∀𝑖 ∈ 𝐼 . 𝜇(𝑖) = bind(𝜇′, 𝜅′(𝑖)) ∧ ∀𝑣 ∈ supp(𝜇).(𝐾 (𝑣)) ((F , 𝜅′(𝐼) (𝑣), 𝑝)) Thus, ( C′𝜇 𝑣. 𝐾 ( 𝑓 (𝑣)) ) ((F , 𝜇, 𝑝)). □ Lemma D.5.14. SURE-STR-CONVEX is sound. Proof. Assume 𝑎 ∈ M𝐼 is a valid resource that satisfies C𝜇 𝑣.(𝐾 (𝑣) ∗ ⌈𝐸⌉). Then, by definition, we know that, for some (F0, 𝜇0, 𝑝0) and 𝜅0: (F0, 𝜇0, 𝑝0) ⪯ 𝑎 (D.37) ∀𝑖 ∈ 𝐼 . 𝜇0(𝑖) = bind(𝜇, 𝜅0(𝑖)) (D.38) and, for all 𝑣 ∈ supp(𝜇), there are (F 𝑣1 , 𝜇 𝑣 1, 𝑝 𝑣 1), (F 𝑣 2 , 𝜇 𝑣 2, 𝑝 𝑣 2) such that (F 𝑣1 , 𝜇 𝑣 1, 𝑝 𝑣 1) · (F 𝑣 2 , 𝜇 𝑣 2, 𝑝 𝑣 2) ⪯ (F0, 𝜅0(𝐼) (𝑣), 𝑝0) (D.39) 𝐾 (𝑣) (F 𝑣1 , 𝜇 𝑣 1, 𝑝 𝑣 1) (D.40) ⌈𝐸⌉ (F 𝑣2 , 𝜇 𝑣 2, 𝑝 𝑣 2) (D.41) 357 From (D.41) we know that for all 𝑣 ∈ supp(𝜇) there are 𝐿𝑣1, 𝐿 𝑣 0,𝑈 𝑣 1 ,𝑈 𝑣 0 ∈ F 𝑣 2 (𝑖) such that: 𝐿𝑣0 ⊆ 𝐸 −1(False) ⊆ 𝑈𝑣 0 𝜇𝑣2(𝐿 𝑣 0) = 𝜇 𝑣 2(𝑈 𝑣 0) = 0 𝐿𝑣1 ⊆ 𝐸 −1(True) ⊆ 𝑈𝑣 1 𝜇𝑣2(𝐿 𝑣 1) = 𝜇 𝑣 2(𝑈 𝑣 1) = 1 Without loss of generality, all 𝐿𝑣0, 𝐿 𝑣 1,𝑈 𝑣 0 ,𝑈 𝑣 1 can be assumed to be only non-trivial on FV(𝐸). Consequently, we can also assume that 𝑝𝑣2(𝑥) < 1 for every 𝑥, and in addition 𝑝𝑣2(𝑥) > 0 if and only if 𝑥 ∈ FV 𝐸 and 𝑗 = 𝑖. From these components we can construct a new resource: F3( 𝑗) ≜  𝜎 ( {⋂𝑣∈supp(𝜇) 𝐿 𝑣 1, ⋃ 𝑣∈supp(𝜇)𝑈 𝑣 1} ) if 𝑗 = 𝑖 {Mem[Var], ∅} if 𝑗 ≠ 𝑖 𝜇3 ≜ 𝜇0 |F3 𝑝3 ≜ λ𝑥.  min { 𝑝𝑣2(𝑥) |𝑣 ∈ supp(𝜇) } if 𝑗 = 𝑖 ∧ 𝑥 ∈ FV(𝐸) 0 otherwise By construction we obtain that ∀ 𝑗 ∈ 𝐼 . F3( 𝑗) ⊆ F0( 𝑗), and that V(F3, 𝜇3, 𝑝3). Now letting 𝑝′1 = 𝑝0 − 𝑝3, we obtain a valid resource (F0, 𝜇0, 𝑝 ′ 1). 
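The construction of (F3, 𝜇3, 𝑝3) works because 𝜇3 assigns every generating event probability 0 or 1, and such events are independent of every other event. A minimal numeric check of that fact (the four-point sample space and names here are hypothetical, purely for illustration):

```python
from itertools import chain, combinations

# Hypothetical four-point probability space; A has probability 1.
Omega = ["a", "b", "c", "d"]
mu = {"a": 0.5, "b": 0.5, "c": 0.0, "d": 0.0}
A = {"a", "b"}  # mu(A) = 1

def prob(X):
    return sum(mu[w] for w in X)

def powerset(xs):
    return chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))

# A sure event factors against every event: mu(A ∩ B) = mu(A) * mu(B),
# and its (null) complement contributes probability 0 to every intersection.
for B in map(set, powerset(Omega)):
    assert abs(prob(A & B) - prob(A) * prob(B)) < 1e-9
    assert abs(prob((set(Omega) - A) & B)) < 1e-9
```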
Moreover, we have F0 = F0⊕F3 and ∀ 𝑗 ∈ 𝐼 .∀𝑋 ∈ F3( 𝑗). 𝜇3(𝑋) ∈ {0, 1}, which means that for any 𝑋 ∈ F3 and 𝑌 ∈ F0, 𝜇3(𝑋) · 𝜇0(𝑌 ) = 𝜇0(𝑋∩𝑌 ). Then, by (D.38): (F0, bind(𝜇, 𝜅0), 𝑝′1) ⊛ (F3, 𝜇3, 𝑝3) ⪯ (F0, 𝜇0, 𝑝0) = 𝑎 To close the proof it would then suffice to show that C𝜇 𝑣.𝐾 (𝑣) holds on (F0, bind(𝜇, 𝜅0), 𝑝′1) and that ⌈𝐸⌉ holds on (F3( 𝑗), 𝜇3, 𝑝3). The latter is obvious. The former follows from the fact that 𝜅0( 𝑗) (𝑣) |F 𝑣1 = 𝜇𝑣1( 𝑗); by upward-closure and (D.40) this means that, for all 𝑣 ∈ supp(𝜇): 𝐾 (𝑣) (F 𝑣1 , 𝜇 𝑣 1, 𝑝 𝑣 1) ⇒ 𝐾 (𝑣) (F0, 𝜅0(𝐼) (𝑣), 𝑝′1) 358 which proves our claim. □ Lemma D.5.15. C-FOR-ALL is sound. Proof. By unfolding the definitions, C𝜇 𝑣.∀𝑥 : 𝑋.𝑄(𝑣) ⇔ ∃F , 𝜇0, 𝜅.Own((F , 𝜇0)) ∗ ⌜∀𝑖 ∈ 𝐼 . 𝜇0(𝑖) = bind(𝜇, 𝜅(𝑖))⌝ ∗ (∀𝑎 ∈ 𝐴𝜇 .Own((F , [𝑖: 𝜅(𝑖) (𝑎) | 𝑖 ∈ 𝐼])) −∗ ∀𝑥 : 𝑋.𝑄(𝑣)) ⇒ ∀𝑥 : 𝑋.∃F , 𝜇0, 𝜅.Own((F , 𝜇0)) ∗ ⌜∀𝑖 ∈ 𝐼 . 𝜇0(𝑖) = bind(𝜇, 𝜅(𝑖))⌝ ∗ (∀𝑎 ∈ 𝐴𝜇 .Own((F , [𝑖: 𝜅(𝑖) (𝑎) | 𝑖 ∈ 𝐼])) −∗ 𝑄(𝑣)) ⇔∀𝑥 : 𝑋. C𝜇 𝑣.𝑄(𝑣) □ Lemma D.5.16. C-PURE is sound. Proof. We first prove the forward direction: For any 𝑎 ∈ M𝐼 , if( ⌜𝜇(𝑋) = 1⌝ ∗ C𝜇 .𝐾 (𝑣) ) ((𝑎)), then there exists some F0, 𝜇0, 𝑝0, and 𝜅0: (F0, 𝜇0, 𝑝0) ⪯ 𝑎 ∀𝑖 ∈ 𝐼 . 𝜇0(𝑖) = bind(𝜇, 𝜅0(𝑖)) ∀𝑣 ∈ supp(𝜇). (𝐾 (𝑣)) ((F0, 𝜅0(𝐼) (𝑣), 𝑝0)) The pure fact ⌜𝜇(𝑋) = 1⌝ implies that 𝑋 ⊇ supp(𝜇) , and thus for every 𝑣 ∈ supp(𝜇), ⌜𝑣 ∈ 𝑋⌝. Therefore, (𝐾 (𝑣)) ((F0, 𝜅0(𝐼) (𝑣), 𝑝0)), which witnesses that( C𝜇 .⌜𝑣 ∈ 𝑋⌝ ∗ 𝐾 (𝑣) ) ((𝑎)). We then prove the backward direction: if C𝜇 .⌜𝑣 ∈ 𝑋⌝∗𝐾 (𝑣), then there exists 359 F0, 𝜇0, 𝑝0, and 𝜅0: (F0, 𝜇0, 𝑝0) ⪯ 𝑎 ∀𝑖 ∈ 𝐼 . 𝜇0(𝑖) = bind(𝜇, 𝜅0(𝑖)) ∀𝑣 ∈ supp(𝜇). (⌜𝑣 ∈ 𝑋⌝ ∗ 𝐾 (𝑣)) ((F0, 𝜅0(𝐼) (𝑣), 𝑝0)) Then it must 𝑋 ⊇ supp(𝜇), which implies that ⌜𝜇(𝑋) = 1⌝. Meanwhile, ⌜𝑣 ∈ 𝑋⌝ ∗ 𝐾 (𝑣) holding on (F0, 𝜅0(𝐼) (𝑣), 𝑝0) implies that 𝐾 (𝑣) holds on (F0, 𝜅0(𝐼) (𝑣), 𝑝0) Therefore, ⌜𝜇(𝑋) = 1⌝ ∗ C𝜇 .𝐾 (𝑣) holds on 𝑎. □ D.5.2 Soundness of Primitive WP Rules Structural Rules Lemma D.5.17. WP-CONS is sound. Proof. For any resource 𝑎, if (wp 𝑡 {𝑄})(𝑎), then ∀𝜇0.∀𝑐. (𝑎 · 𝑐) ⪯ 𝜇0 ⇒ ∃𝑏. 
( (𝑏 · 𝑐) ⪯ ⟦𝑡⟧(𝜇0) ∧ (𝑄) ((𝑏)) ) From the premise 𝑄 ⊢ 𝑄′, and the fact that 𝑏 must be valid for (𝑏 · 𝑐) ⪯ ⟦𝑡⟧(𝜇0) to hold, we have that 𝑄(𝑏) implies 𝑄′(𝑏). Thus, it must ∀𝜇0.∀𝑐. (𝑎 · 𝑐) ⪯ 𝜇0 ⇒ ∃𝑏. ( (𝑏 · 𝑐) ⪯ ⟦𝑡⟧(𝜇0) ∧𝑄′(𝑏) ) , which says (wp 𝑡 {𝑄′})(𝑎). □ Lemma D.5.18. WP-FRAME is sound. 360 Proof. Let 𝑎 ∈ M𝐼 be a valid resource such that it satisfies 𝑃 ∗ wp 𝑡 {𝑄}. By definition, this means that, for some 𝑎1, 𝑎2: 𝑎1 · 𝑎2 ⪯ 𝑎 (D.42) 𝑃(𝑎1) (D.43) ∀𝜇0, 𝑐. (𝑎2 · 𝑐) ⪯ 𝜇0 ⇒ ∃𝑏. ( (𝑏 · 𝑐) ⪯ ⟦𝑡⟧(𝜇0) ∧𝑄(𝑏) ) (D.44) Our goal is to prove 𝑎 satisfies wp 𝑡 {𝑃 ∗𝑄}, which, by unfolding the definitions, amounts to: ∃𝑎′ ⪯ 𝑎.∀𝜇0, 𝑐 ′. (𝑎′·𝑐′) ⪯ 𝜇0 ⇒ ∃𝑏1, 𝑏. ((𝑏1·𝑏)·𝑐′) ⪯ ⟦𝑡⟧(𝜇0)∧𝑃(𝑏1)∧𝑄(𝑏) (D.45) Our goal can be proven by instantiating 𝑎′ = (𝑎1 ·𝑎2) and 𝑏1 = 𝑎1, from which we reduce the goal to proving, for all 𝜇0, 𝑐 ′: ((𝑎1 · 𝑎2) · 𝑐′) ⪯ 𝜇0 ⇒ ∃𝑏. ((𝑎1 · 𝑏) · 𝑐′) ⪯ ⟦𝑡⟧(𝜇0) ∧ 𝑃(𝑎1) ∧𝑄(𝑏) (D.46) We have that 𝑃(𝑎1) holds by (D.43). By associativity and commutativity of the RA operation, we reduce the goal to: (𝑎2 · (𝑎1 · 𝑐′)) ⪯ 𝜇0 ⇒ ∃𝑏. (𝑏 · (𝑎1 · 𝑐′)) ⪯ ⟦𝑡⟧(𝜇0) ∧𝑄(𝑏) (D.47) This follows by applying assumption (D.44) with 𝑐 = (𝑎1 · 𝑐′). □ Lemma D.5.19. C-WP-SWAP is sound. Proof. By the meaning of conditioning modality and weakest precondition transformer, (ownVar ∧ C𝜇 𝑣.wp 𝑡 {𝑄(𝑣)})(𝑎) ⇔ ownVar(𝑎) ∧ ∃F , 𝜇, 𝑝, 𝜅. (F , 𝜇, 𝑝) ⪯ 𝑎 ∧ ∀𝑖 ∈ 𝐼 . 𝜇(𝑖) = bind(𝜇, 𝜅(𝑖)) ∧ ∀𝑣 ∈ supp(𝜇).(wp 𝑡 {𝑄(𝑣)})(F , 𝜅(𝐼) (𝑣), 𝑝) 361 Intuitively, for each 𝑣, running 𝑡 on each fibre (F , 𝜅(𝐼) (𝑣), 𝑝) gives a output re- source that satisfies 𝑄(𝑣). Assume V(𝑎) holds and let 𝑎 = (F𝑎, 𝜇𝑎, 𝑝𝑎). By lemma D.2.4, when (F , 𝜇, 𝑝) ⪯ 𝑎, 𝜇 = bind(𝜇, 𝜅) iff that there exists 𝜅′′ such that 𝜇𝑎 = bind(𝜇, 𝜅′′) and 𝜅(𝐼) (𝑣) ⊑ 𝜅′′(𝐼) (𝑣) for every 𝑣. Thus, (C𝜇 𝑣.wp 𝑡 {𝑄(𝑣)})(F𝑎, 𝜇𝑎, 𝑝𝑎) ⇔ ∃𝜅.∀𝑖 ∈ 𝐼 . 𝜇𝑎 (𝑖) = bind(𝜇, 𝜅′′(𝑖)) ∧ ∀𝑣 ∈ supp(𝜇).(wp 𝑡 {𝑄(𝑣)})(F , 𝜅(𝐼) (𝑣), 𝑝) We want to show that wp 𝑡 {C𝜇 𝑣.𝑄(𝑣)}(𝑎) which is equivalent to ∀𝜇′.∀𝑐. 𝑎 · 𝑐 ⪯ 𝜇′⇒ ∃𝑎′. 𝑎′ · 𝑐 ⪯ ⟦𝑡⟧(𝜇′) ∧ (C𝜇 𝑄(𝑣)) (𝑎). 
Let’s fix an arbitrary 𝜇′, 𝑐 that satisfy V(𝑎 · 𝑐) ∧ 𝑎 · 𝑐 ⪯ 𝑎𝜇′ , we try to construct a corresponding 𝑎′. The high-level approach that we will take is to show that running 𝑡 on 𝑎 takes us to a resource that is equivalent to bind the set of output resource satisfying 𝑄(𝑣) to 𝜇. Recall that 𝑎 = (F𝑎, 𝜇𝑎, 𝑝𝑎) also satisfies ownVar, which says F𝑎 = ΣVar. We claim that 𝑎 ·𝑐 ⪯ (ΣVar, 𝜇 ′, 𝑝1) holds implies that the probability space 𝑐 is trivial. Say 𝑐 = (F𝑐, 𝜇𝑐, 𝑝𝑐), then for any 𝐸 ∈ F𝑐, the event 𝐸 must also in F𝑎 and ΣVar because they are the full sigma algebra. By definition of 𝑎 · 𝑐 ⪯ (ΣVar, 𝜇 ′, 𝑝1), we have 𝜇𝑐 (𝐸) · 𝜇𝑎 (𝐸) = 𝜇′(𝐸 ∩ 𝐸) = 𝜇′(𝐸). (D.48) Another implication of 𝑎 · 𝑐 ⪯ (ΣVar, 𝜇 ′, 𝑝1) is that we have 𝜇𝑐 (𝐸) = 𝜇′(𝐸) and 362 𝜇𝑎 (𝐸) = 𝜇′(𝐸). Combining with eq. (D.48), we can conclude 𝜇′(𝐸) · 𝜇′(𝐸) = 𝜇′(𝐸), which implies that 𝜇𝑐 (𝐸) = 𝜇′(𝐸) ∈ {0, 1}. Therefore, 𝑐 is a trivial probability space and (F𝑎, 𝜅(𝐼) (𝑣), 𝑝𝑎) · 𝑐 ⪯ (F𝑎, 𝜅(𝐼) (𝑣), 𝑝𝑎) Furthermore, for every 𝑣 ∈ supp(𝜇), we have (wp 𝑡 {𝑄(𝑣)})(F , 𝜅(𝐼) (𝑣), 𝑝) which implies ∀𝜅′.(F𝑎, 𝜅(𝐼) (𝑣), 𝑝𝑎) · 𝑐 ⪯ 𝜅′(𝐼) (𝑣) (D.49) ⇒ ∃𝑎𝑣 . (𝑎𝑣 · 𝑐 ⪯ ⟦𝑡⟧(𝜅′(𝐼) (𝑣))) ∧𝑄(𝑣) (𝑎𝑣). (D.50) Therefore, 𝑎 · 𝑐 ⪯ 𝑎𝜇′ ⇒ ∀𝑣 ∈ supp(𝜇).(V((F𝑎, 𝜅(𝐼) (𝑣), 𝑝𝑎) · 𝑐) ∧ (F𝑎, 𝜅(𝐼) (𝑣), 𝑝𝑎) · 𝑐 ⪯ (ΣVar, 𝜅(𝐼) (𝑣), 1) (By D.2.7 and D.2.4) ⇒ ∀𝑣 ∈ supp(𝜇).∃𝑎𝑣 .V(𝑎𝑣 · 𝑐) ∧ (𝑎𝑣 · 𝑐 ⪯ (ΣVar, ⟦𝑡⟧(𝜅(𝐼) (𝑣)), 1)) ∧𝑄(𝑣) (𝑎𝑣) (By eq. (D.49)) ⇒ ∀𝑣 ∈ supp(𝜇).𝑝𝑎𝑣 + 𝑝𝑐 ⪯ 1 ∧𝑄(𝑣) (ΣVar, ⟦𝑡⟧(𝜅′(𝐼) (𝑣)), 1). (By upwards closure) Let 𝑎′𝑣 = (ΣVar, ⟦𝑡⟧(𝜅′(𝐼) (𝑣)), 𝑝𝑎). Because 𝜇𝑐 (𝐸) ∈ {0, 1} for any 𝐸 ∈ F𝑐, for every 𝑣, we have (ΣVar, ⟦𝑡⟧(𝜅′(𝐼) (𝑣))) · (F𝑐, 𝜇𝑐) defined and thus 𝑎′𝑣 · 𝑐 valid. Define 𝑎′ = (ΣVar, bind(𝜇, λ𝑣. ⟦𝑡⟧(𝜅′(𝐼) (𝑣)), 𝑝𝑎) By lemma D.2.7, V(𝑎′𝑣 ·𝑐) for all 𝑣 ∈ supp𝜇 implies V(𝑎′·𝑐). Also, because𝑄(𝑣) (𝑎𝑣) for all 𝑣 ∈ 𝐴𝜇, (C𝜇 𝑣.𝑄(𝑣)) (𝑎′). Thus, (wp 𝑡 {C𝜇 𝑣.𝑄(𝑣)})(𝑎). □ 363 Program Rules Lemma D.5.20. WP-SKIP is sound. Proof. Assume 𝑎 ∈ M𝐼 is valid and such that 𝑃(𝑎) holds. By unfolding the definition of WP, we need to prove ∀𝜇0.∀𝑐. 
(𝑎 · 𝑐) ⪯ 𝜇0 ⇒ ∃𝑏. ((𝑏 · 𝑐) ⪯ ⟦𝑡⟧(𝜇0) ∧ 𝑃(𝑏))

which follows trivially by ⟦[𝑖:skip]⟧(𝜇0) = 𝜇0 and picking 𝑏 = 𝑎. □

Lemma D.5.21. WP-SEQ is sound.

Proof. Assume 𝑎0 ∈ M𝐼 is a valid resource such that (wp [𝑖: 𝑡] {wp [𝑖: 𝑡′] {𝑄}})(𝑎0) holds. Our goal is to prove (wp ([𝑖: 𝑡; 𝑡′]) {𝑄})(𝑎0) holds, which unfolds by definition of WP into:

∀𝜇0. ∀𝑐0. (𝑎0 · 𝑐0) ⪯ 𝜇0 ⇒ ∃𝑎2. ((𝑎2 · 𝑐0) ⪯ ⟦[𝑖: 𝑡; 𝑡′]⟧(𝜇0) ∧ 𝑄(𝑎2))  (D.51)

Take an arbitrary 𝜇0 and 𝑐0 such that (𝑎0 · 𝑐0) ⪯ 𝜇0. By unfolding the WPs in the assumption, we have that there exists an 𝑎1 ∈ M𝐼 such that:

(𝑎1 · 𝑐0) ⪯ ⟦[𝑖: 𝑡]⟧(𝜇0)  (D.52)
∀𝜇1. ∀𝑐1. (𝑎1 · 𝑐1) ⪯ 𝜇1 ⇒ ∃𝑎2. ((𝑎2 · 𝑐1) ⪯ ⟦[𝑖: 𝑡′]⟧(𝜇1) ∧ 𝑄(𝑎2))  (D.53)

We can apply (D.53) to (D.52) by instantiating 𝜇1 with ⟦[𝑖: 𝑡]⟧(𝜇0), and 𝑐1 with 𝑐0, obtaining:

∃𝑎2. ((𝑎2 · 𝑐0) ⪯ ⟦[𝑖: 𝑡′]⟧(⟦[𝑖: 𝑡]⟧(𝜇0)) ∧ 𝑄(𝑎2))

Since by definition ⟦𝑡; 𝑡′⟧(𝜇0) = ⟦𝑡′⟧(⟦𝑡⟧(𝜇0)), we obtain the goal (D.51) as desired. □

Lemma D.5.22. WP-ASSIGN is sound.

Proof. Let 𝑎 ∈ M𝐼 be a valid resource, and let 𝑎(𝑖) = (F, 𝜇, 𝑝). By assumption we have 𝑝(𝑥) = 1 and 𝑝(𝑦) > 0 for all 𝑦 ∈ FV(𝑒). We want to show that 𝑎 satisfies wp [𝑖: x := 𝑒] {⌈𝑥 = 𝑒⌉}. This is equivalent to

∀𝜇0. ∀𝑐. (𝑎 · 𝑐 ⪯ 𝜇0) ⇒ ∃𝑏. (𝑏 · 𝑐 ⪯ ⟦[𝑖: x := 𝑒]⟧(𝜇0) ∧ ⌈𝑥 = 𝑒⌉(𝑏))

We show this holds by picking 𝑏 as follows:

𝑏 ≜ 𝑎[𝑖: (F𝑏, 𝜇𝑏, 𝑝)]
F𝑏 ≜ {Mem[Var], ∅, 𝐴, Mem[Var] \ 𝐴}
𝐴 ≜ {𝑠[𝑥 ↦→ ⟦𝑒⟧(𝑠)] | 𝑠 ∈ Mem[Var]}

where 𝜇𝑏 is determined by setting 𝜇𝑏(𝐴) = 1. By construction we have that ⌈𝑥 = 𝑒⌉(𝑏) holds. To close the proof we then need to show that (𝑏 · 𝑐) ⪯ ⟦[𝑖: x := 𝑒]⟧(𝜇0). Let 𝑐(𝑖) = (F𝑐, 𝜇𝑐, 𝑝𝑐). Observe that by the assumptions on 𝑝, we have V(𝑏) since F𝑏 is only non-trivial on FV(𝑒) ∪ {𝑥}; moreover, by the assumption V(𝑎 · 𝑐) we have that V(𝑝 + 𝑝𝑐) holds, which means that 𝑝𝑐(𝑥) = 0, and thus F𝑐 is trivial on 𝑥. Let us define the function pre : 𝒫(Mem[Var]) → 𝒫(Mem[Var]) as:

pre(𝑋) ≜ {𝑠 | 𝑠[𝑥 ↦→ ⟦𝑒⟧(𝑠)] ∈ 𝑋}

That is, pre(𝑋) is the weakest precondition (in the standard sense) of the assignment.
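As a sanity check on this definition, here is a minimal discrete sketch verifying that the pushforward semantics of x := 𝑒 agrees with 𝜇0 ∘ pre. The two-variable state space, the expression e(s) = y, and all names are illustrative assumptions, not part of the development:

```python
from itertools import product

# Hypothetical two-variable state space: memories map {"x","y"} to {0,1}.
States = [dict(x=vx, y=vy) for vx, vy in product([0, 1], repeat=2)]

def key(s):  # hashable view of a memory
    return (s["x"], s["y"])

def assign_x(s, e):
    """Memory after x := e(s)."""
    t = dict(s)
    t["x"] = e(s)
    return t

def pushforward(mu, e):
    """Distribution semantics of x := e: push each memory forward."""
    out = {}
    for s in States:
        t = key(assign_x(s, e))
        out[t] = out.get(t, 0.0) + mu[key(s)]
    return out

def pre(X, e):
    """pre(X) = { s | s[x := e(s)] in X }, the classical weakest precondition."""
    return {key(s) for s in States if key(assign_x(s, e)) in X}

# Uniform input distribution and the expression e(s) = y.
mu = {key(s): 0.25 for s in States}
e = lambda s: s["y"]
nu = pushforward(mu, e)

# Check mu0(pre(X)) = [[x := e]](mu0)(X) on a few events X.
for X in [set(), {(0, 0)}, {(1, 1), (0, 1)}, {key(s) for s in States}]:
    lhs = sum(mu[s] for s in pre(X, e))
    rhs = sum(nu.get(s, 0.0) for s in X)
    assert abs(lhs - rhs) < 1e-9
```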
By construction, we have:

pre(𝐴) = Mem[Var]
pre(𝑋1 ∩ 𝑋2) = pre(𝑋1) ∩ pre(𝑋2)
pre(Mem[Var] \ 𝐴) = ∅
pre(𝑋𝑐) = 𝑋𝑐 for all 𝑋𝑐 ∈ F𝑐

In particular, the latter holds because F𝑐 is trivial in 𝑥. By unfolding the definition of ⟦ · ⟧, it is easy to check that for every 𝑋 ∈ ΣMem[Var]:

⟦x := 𝑒⟧(𝜇0)(𝑋) = 𝜇0(pre(𝑋))

We are now ready to show (𝑏 · 𝑐) ⪯ ⟦[𝑖: x := 𝑒]⟧(𝜇0) by showing that (F𝑏, 𝜇𝑏) ⊛ (F𝑐, 𝜇𝑐) = (F𝑏 ⊕ F𝑐, ⟦x := 𝑒⟧(𝜇0)|(F𝑏⊕F𝑐)), where 𝜇0 = 𝜇0(𝑖). To show this it suffices to prove that for every 𝑋𝑏 ∈ F𝑏 and every 𝑋𝑐 ∈ F𝑐, ⟦x := 𝑒⟧(𝜇0)(𝑋𝑏 ∩ 𝑋𝑐) = 𝜇𝑏(𝑋𝑏) · 𝜇𝑐(𝑋𝑐). We proceed by case analysis on 𝑋𝑏:

Case 𝑋𝑏 = 𝐴. Then:

⟦x := 𝑒⟧(𝜇0)(𝐴 ∩ 𝑋𝑐) = 𝜇0(pre(𝐴 ∩ 𝑋𝑐)) = 𝜇0(pre(𝐴) ∩ pre(𝑋𝑐)) = 𝜇0(Mem[Var] ∩ pre(𝑋𝑐)) = 𝜇0(pre(𝑋𝑐)) = 𝜇𝑏(𝐴) · 𝜇0(𝑋𝑐) = 𝜇𝑏(𝐴) · 𝜇𝑐(𝑋𝑐)

Case 𝑋𝑏 = Mem[Var] \ 𝐴. Then:

⟦x := 𝑒⟧(𝜇0)((Mem[Var] \ 𝐴) ∩ 𝑋𝑐) = 𝜇0(pre((Mem[Var] \ 𝐴) ∩ 𝑋𝑐)) = 𝜇0(pre(Mem[Var] \ 𝐴) ∩ pre(𝑋𝑐)) = 𝜇0(∅ ∩ pre(𝑋𝑐)) = 0 = 𝜇𝑏(Mem[Var] \ 𝐴) · 𝜇𝑐(𝑋𝑐)

Case 𝑋𝑏 = Mem[Var] or 𝑋𝑏 = ∅. Analogous to the previous cases. □

Lemma D.5.23. WP-SAMP is sound.

Proof. Assume 𝑎 ∈ M𝐼 is valid and such that 𝑎(𝑖) = (F, 𝜇, 𝑝), with 𝑝(𝑥) = 1. Our goal is to show that 𝑎 satisfies wp [𝑖: x 𝑑(®𝑣)] {𝑥 $∼ 𝑑(®𝑣)}, which is equivalent to proving, for all 𝜇0 and for all 𝑐:

(𝑎 · 𝑐 ⪯ 𝜇0) ⇒ ∃𝑏. (𝑏 · 𝑐 ⪯ ⟦[𝑖: x 𝑑(®𝑣)]⟧(𝜇0) ∧ (𝑥 $∼ 𝑑(®𝑣))(𝑏))  (D.54)

Let 𝜇0 = 𝜇0(𝑖) and 𝜇1 = ⟦x 𝑑(®𝑣)⟧(𝜇0). Moreover, let 𝑐(𝑖) = (F𝑐, 𝜇𝑐, 𝑝𝑐). Observe that by the assumptions on 𝑝 and validity of 𝑎 · 𝑐, we have 𝑝𝑐(𝑥) = 0, which means F𝑐 is trivial on 𝑥. We aim to prove (D.54) by letting

𝑏 ≜ 𝑎[𝑖: (F𝑏, 𝜇𝑏, 𝑝𝑏)]
𝜇𝑏 ≜ 𝜇1|F𝑏
F𝑏 ≜ 𝜎({ {𝑠 ∈ Mem[Var] | 𝑠(𝑥) = 𝑣} | 𝑣 ∈ Val })
𝑝𝑏 ≜ (𝑥: 1)

Note that by construction V(𝑝𝑏 + 𝑝𝑐), and V(𝑏) since F𝑏 is only non-trivial in 𝑥. Similarly to the proof of lemma D.5.22, we define the function pre : 𝒫(Mem[Var]) → 𝒫(Mem[Var]) as:

pre(𝑋) ≜ {𝑠 | ∃𝑣 ∈ Val. 𝑠[𝑥 ↦→ 𝑣] ∈ 𝑋}

Since F𝑐 is trivial on 𝑥, for all 𝑋𝑐 ∈ F𝑐, pre(𝑋𝑐) = 𝑋𝑐.
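The semantics of sampling can likewise be checked on a small discrete instance: overwriting 𝑥 with a fresh sample from 𝑑 marginalizes out the old value of 𝑥, and the resulting value of 𝑥 is independent of events that do not mention 𝑥. The two-variable state space, the prior mu0, and the biased distribution d below are hypothetical, purely for illustration:

```python
from itertools import product

# Hypothetical state space: memories over {"x","y"} with values in {0,1}.
States = [(vx, vy) for vx, vy in product([0, 1], repeat=2)]  # (x, y)

mu0 = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}   # arbitrary prior
d = {0: 0.25, 1: 0.75}                                        # sampled distribution

# Semantics of sampling into x: overwrite x with a fresh sample from d.
mu1 = {s: 0.0 for s in States}
for (_, vy), p in mu0.items():
    for v, q in d.items():
        mu1[(v, vy)] += p * q

# pre({s}) = all states agreeing with s off x (any value at x).
def pre_prob(s):
    _, vy = s
    return sum(p for (ux, uy), p in mu0.items() if uy == vy)

# Pointwise closed form: mu1({s}) = mu0(pre({s})) * d(s(x)).
for s in States:
    assert abs(mu1[s] - pre_prob(s) * d[s[0]]) < 1e-9

# After sampling, x is independent of events not mentioning x:
px1 = sum(p for (vx, _), p in mu1.items() if vx == 1)   # mu1(x = 1)
py1 = sum(p for (_, vy), p in mu1.items() if vy == 1)   # mu1(y = 1)
assert abs(mu1[(1, 1)] - px1 * py1) < 1e-9
```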
Moreover, for all 𝑋𝑏 ∈ F𝑏 \ {∅}, pre(𝑋𝑏) = Mem[Var], since 𝑋𝑏 is trivial on every variable except 𝑥. By unfolding the definitions, we have: 𝜇1(𝑋) = ⟦x 𝑑(®𝑣)⟧(𝜇0) (𝑋) = ∑︁ 𝑠∈𝑋 𝜇0(pre(𝑠)) · ⟦𝑑⟧(®𝑣) (𝑠(𝑥)) We now show that (F𝑏, 𝜇𝑏)⊛(F𝑐, 𝜇𝑐) = (F𝑏⊕F𝑐, 𝜇1 | (F𝑏⊕F𝑐)) by showing that for all 𝑋𝑏 ∈ F𝑏 and 𝑋𝑐 ∈ F𝑐: 𝜇1(𝑋𝑏 ∩ 𝑋𝑐) = 𝜇𝑏 (𝑋𝑏) · 𝜇𝑐 (𝑋𝑐). To prove this we first define V: 𝒫(Mem[Var]) → 𝒫(Val) as V(𝑋) ≜ {𝑠(𝑥) |𝑠 ∈ 𝑋}, and 𝑆𝑤 ≜ {𝑠 |𝑠(𝑥) = 𝑤}. We 367 observe that 𝑋𝑏 = ⊎ 𝑤∈V(𝑋𝑏) 𝑆𝑤, and thus 𝑋𝑏 ∩ 𝑋𝑐 = ⊎ 𝑤∈V(𝑋𝑏) (𝑋𝑐 ∩ 𝑆𝑤); moreover, pre(𝑋𝑐 ∩ 𝑆𝑤) = {𝑠 |𝑠[𝑥 ↦→ 𝑤 ] ∈ 𝑋𝑐} = 𝑋𝑐. Thus, we can calculate: 𝜇1(𝑋𝑏 ∩ 𝑋𝑐) = ∑︁ 𝑠∈𝑋𝑏∩𝑋𝑐 𝜇0(pre(𝑠)) · ⟦𝑑⟧(®𝑣) (𝑠(𝑥)) = ∑︁ 𝑤∈V(𝑋𝑏) ∑︁ 𝑠∈𝑋𝑐∩𝑆𝑤 𝜇0(pre(𝑠)) · ⟦𝑑⟧(®𝑣) (𝑤) = ∑︁ 𝑤∈V(𝑋𝑏) ( ⟦𝑑⟧(®𝑣) (𝑤) · ∑︁ 𝑠∈𝑋𝑐∩𝑆𝑤 𝜇0(pre(𝑠)) ) = ©­« ∑︁ 𝑤∈V(𝑋𝑏) ⟦𝑑⟧(®𝑣) (𝑤) · 𝜇0(pre(𝑋𝑐 ∩ 𝑆𝑤))ª®¬ = ©­« ∑︁ 𝑤∈V(𝑋𝑏) ⟦𝑑⟧(®𝑣) (𝑤)ª®¬ · 𝜇0(𝑋𝑐) = 𝜇𝑏 (𝑋𝑏) · 𝜇𝑐 (𝑋𝑐) The last equation is given by 𝑎 · 𝑐 ⪯ 𝜇0 which implies that 𝜇𝑐 = 𝜇0 |F𝑐 , and by: 𝜇𝑏 (𝑋𝑏) = 𝜇1(𝑋𝑏) = ∑︁ 𝑠∈𝑋𝑏 𝜇0(pre(𝑠)) · ⟦𝑑⟧(®𝑣) (𝑠(𝑥)) = ∑︁ 𝑤∈V(𝑋𝑏) ∑︁ 𝑠∈𝑆𝑤 𝜇0(pre(𝑠)) · ⟦𝑑⟧(®𝑣) (𝑤) = ∑︁ 𝑤∈V(𝑋𝑏) ⟦𝑑⟧(®𝑣) (𝑤) Finally, we need to show (𝑥 $∼ 𝑑 (®𝑣)) (𝑏) which amounts to proving 𝑥� (F𝑏, 𝜇𝑏) and ⟦𝑑⟧(®𝑣) = 𝜇𝑏 ◦𝑥−1. The former holds because by construction 𝑥 is measurable in F𝑏. For the latter, for all𝑊 ⊆ Val: (𝜇𝑏 ◦ 𝑥−1) (𝑊) = 𝜇𝑏 (𝑥−1(𝑊)) = ∑︁ 𝑤∈V(𝑥−1 (𝑊)) ⟦𝑑⟧(®𝑣) (𝑤) = ∑︁ 𝑤∈𝑊 ⟦𝑑⟧(®𝑣) (𝑤) = ⟦𝑑⟧(®𝑣) (𝑊). □ Lemma D.5.24. WP-IF-PRIM is sound. 368 Proof. For any valid resource 𝑎, (if 𝑣 then wp [𝑖: 𝑡1] {𝑄(1)} else wp [𝑖: 𝑡2] {𝑄(0)})(𝑎) ⇔  (wp [𝑖: 𝑡1] {𝑄(1)})(𝑎) if 𝑣 � 1 (wp [𝑖: 𝑡2] {𝑄(0)})(𝑎) otherwise ⇔ ∀𝜇0.∀𝑐. (𝑎 · 𝑐) ⪯ 𝜇0 ⇒  ∃𝑏. (𝑏 · 𝑐) ⪯ ⟦𝑖 : 𝑡1⟧(𝜇0) ∧𝑄(1) (𝑏) if 𝑣 � 1 ∃𝑏. (𝑏 · 𝑐) ⪯ ⟦𝑖 : 𝑡2⟧(𝜇0) ∧𝑄(0) (𝑏) otherwise ⇔ ∀𝜇0.∀𝑐. (𝑎 · 𝑐) ⪯ 𝜇0 ⇒ ∃𝑏. (𝑏 · 𝑐) ⪯ ⟦𝑖 : if 𝑣 then 𝑡1 else 𝑡2⟧(𝜇0) ∧𝑄(𝑣 � 1) (𝑏) ⇒(wp [𝑖: if 𝑣 then 𝑡1 else 𝑡2] {𝑄(𝑣 � 1)})(𝑎) □ Lemma D.5.25. WP-BIND is sound. Proof. 
For any resource 𝑎 = (F , 𝜇, 𝑝), (⌈𝑒 = 𝑣⌉∗wp [ 𝑖: E[𝑣] ] {𝑄})(F , 𝜇, 𝑝) iff there exists (F1, 𝜇1, 𝑝1), (F2, 𝜇2, 𝑝2) such that (⌈𝑒 = 𝑣⌉)(F1, 𝜇1, 𝑝1) (wp [ 𝑖: E[𝑣] ] {𝑄})(F2, 𝜇2, 𝑝2) (F1, 𝜇1, 𝑝1) · (F2, 𝜇2, 𝑝2) ⪯ (F , 𝜇, 𝑝) By the upwards closure, we also have (⌈𝑒 = 𝑣⌉)(F , 𝜇, 𝑝) (wp [ 𝑖: E[𝑣] ] {𝑄})(F , 𝜇, 𝑝) The fact that (⌈𝑒 = 𝑣⌉)(F1, 𝜇1, 𝑝1) implies that 𝜇1((𝑒 = 𝑣)−1(True)) = 1, which implies that ⟦𝑒⟧(𝑠) = 𝑣 for all 𝑠 ∈ supp(𝜇1(𝑖)). 369 By lemma D.1.3, we have for any 𝑠 ∈ Mem[Var], K⟦E[𝑒]⟧(𝑠) = K⟦E[⟦𝑒⟧(𝑠)]⟧(𝑠), which implies that for any 𝜇0 over ΣMem[Var] ⟦E[𝑒]⟧(𝜇0) = 𝑠← 𝜇0; K⟦E[𝑒]⟧(𝑠) = 𝑠← 𝜇0; K⟦E[⟦𝑒⟧(𝑠)]⟧(𝑠) = 𝑠← 𝜇0; K⟦E[𝑣]⟧(𝑠) = ⟦E[𝑣]⟧(𝜇0). Define 𝜇′0 = ⟦[𝑖: E[𝑣]]⟧𝜇0. Thus, (wp [ 𝑖: E[𝑣] ] {𝑄})(𝑎) iff ∀𝜇0.∀𝑐. (V(𝑎 · 𝑐) ∧ 𝑎 · 𝑐 ⪯ 𝑎𝜇0) ⇒ ∃𝑎′. (V(𝑎′ · 𝑐) ∧ 𝑎′ · 𝑐 ⪯ 𝑎𝜇′0 ∧𝑄(𝑎 ′)) iff ∀𝜇0.∀𝑐. (V(𝑎 · 𝑐) ∧ 𝑎 · 𝑐 ⪯ 𝑎𝜇0) ⇒ ∃𝑎′. (V(𝑎′ · 𝑐) ∧ 𝑎′ · 𝑐 ⪯ 𝑎𝜇′0 ∧𝑄(𝑎 ′)) iff ( wp [ 𝑖: E[𝑒] ] {𝑄} ) ((𝑎)). □ Lemma D.5.26. WP-LOOP-UNF is sound. Proof. By definition, ⟦repeat (𝑛 + 1) 𝑡⟧(𝜇) = ( 𝑠← 𝜇; 𝑠′← loop𝑡 (𝑛, 𝑠); K⟦𝑡⟧(𝑠′) ) = ⟦(repeat 𝑛 𝑡); 𝑡⟧(𝜇) thus the rule follows from the argument of lemma D.5.21. □ Lemma D.5.27. WP-LOOP is sound. Proof. By induction on 𝑛. 370 Base case 𝑛 = 0 Analogously to lemma D.5.20 since, by definition, ⟦repeat 0 𝑡⟧(𝜇0) = 𝜇0. Induction step 𝑛 > 0 By induction hypothesis 𝑃(0) ⊢ wp [ 𝑗 :repeat (𝑛 − 1) 𝑡] {𝑃(𝑛− 1)} holds, and we want to show that 𝑃(0) ⊢ wp [ 𝑗 :repeat 𝑛 𝑡] {𝑃(𝑛)}. By lemma D.5.26, it suffices to show 𝑃(0) ⊢ wp [ 𝑗 :repeat (𝑛 − 1) 𝑡] {wp [ 𝑗 : 𝑡] {𝑃(𝑛)}}. By applying the induction hypothesis and lemma D.5.17 we are left with proving 𝑃(𝑛 − 1) ⊢ wp [ 𝑗 : 𝑡] {𝑃(𝑛)} which is implied by the premise of the rule with 𝑖 = 𝑛 − 1 < 𝑛. □ D.5.3 Soundness of Derived Rules In this section we provide derivations for the rules we claim are derivable in BLUEBELL. Ownership and Distributions Lemma D.5.28. SURE-DIRAC is sound. Proof. 𝐸 $∼ 𝛿𝑣 ⊣⊢ ∃F , 𝜇.Own((F , 𝜇)) ∗ ⌜𝜇 ◦ 𝐸−1 = 𝛿𝑣⌝ ⊣⊢ ∃F , 𝜇.Own((F , 𝜇)) ∗ ⌜𝜇 ◦ (𝐸 = 𝑣)−1 = 𝛿True⌝ ⊣⊢ ⌈𝐸 = 𝑣⌉ □ Lemma D.5.29. 
SURE-EQ-INJ is sound. 371 Proof. ⌈𝐸 = 𝑣⌉ ∗ ⌈𝐸 = 𝑣′⌉ ⊢ 𝐸 $∼ 𝛿𝑣 ∗ 𝐸 $∼ 𝛿𝑣′ (SURE-DIRAC) ⊢ 𝐸 $∼ 𝛿𝑣 ∧ 𝐸 $∼ 𝛿𝑣′ ⊢ ⌜𝛿𝑣 = 𝛿𝑣′⌝ (DIST-INJ) ⊢ ⌜𝑣 = 𝑣′⌝ □ Lemma D.5.30. SURE-SUB is sound. Proof. 𝐸1 $∼ 𝜇 ∗ ⌈(𝐸2 = 𝑓 (𝐸1))⌉ ⊢ C𝜇 𝑣. ⌈𝐸1 = 𝑣⌉ ∗ ⌈(𝐸2 = 𝑓 (𝐸1))⌉ (C-UNIT-R, C-FRAME) ⊢ C𝜇 𝑣. ⌈𝐸1 = 𝑣 ∧ 𝐸2 = 𝑓 (𝐸1)⌉ (SURE-MERGE) ⊢ C𝜇 𝑣. ⌈𝐸2 = 𝑓 (𝑣)⌉ (C-CONS) ⊢ C𝜇 𝑣. C𝛿 𝑓 (𝑣) 𝑣 ′. ⌈𝐸2 = 𝑣′⌉ (C-UNIT-L) ⊢ C𝜇′ 𝑣 ′. ⌈𝐸2 = 𝑣′⌉ (C-ASSOC, C-SURE-PROJ) where 𝜇′= bind(𝜇, λ𝑥. 𝛿 𝑓 (𝑥)) = 𝜇 ◦ 𝑓 −1. By C-UNIT-R we thus get 𝐸2 $∼ 𝜇 ◦ 𝑓 −1. □ Lemma D.5.31. DIST-FUN is sound. Proof. Assume 𝐸 : Mem[Var] → 𝐴 and 𝑓 : 𝐴→ 𝐵, then: 𝐸 $∼ 𝜇 ⊢ C𝜇 𝑣. ⌈(𝐸 = 𝑣)⌉ (C-UNIT-R) ⊢ C𝜇 𝑣. ⌈( 𝑓 ◦ 𝐸) = 𝑓 (𝑣)⌉ (C-CONS) ⊢ C𝜇 𝑣. C𝛿 𝑓 (𝑣) 𝑣 ′. ⌈( 𝑓 ◦ 𝐸) = 𝑣′⌉ (C-UNIT-L) ⊢ C𝜇′ 𝑣 ′. ⌈( 𝑓 ◦ 𝐸) = 𝑣′⌉ (C-ASSOC, C-SURE-PROJ) where 𝜇′ = bind(𝜇, λ𝑥. 𝛿 𝑓 (𝑥)) = 𝜇 ◦ 𝑓 −1. By C-UNIT-R we thus get ( 𝑓 ◦ 𝐸) $∼ 𝜇 ◦ 𝑓 −1. □ 372 Lemma D.5.32. DIRAC-DUP is sound. Proof. 𝐸 $∼ 𝛿𝑣 ⊢ ⌈𝐸 = 𝑣⌉ (SURE-DIRAC) ⊢ ⌈𝐸 = 𝑣⌉ ∗ ⌈𝐸 = 𝑣⌉ (SURE-MERGE) ⊢ 𝐸 $∼ 𝛿𝑣 ∗ 𝐸 $∼ 𝛿𝑣 (SURE-DIRAC) □ Lemma D.5.33. DIST-SUPP is sound. Proof. 𝐸 $∼ 𝜇 ⊢ C𝜇 𝑣.⌈𝐸 = 𝑣⌉ (C-UNIT-R) ⊢ ⌜𝜇(supp(𝜇)) = 1⌝ ∗ C𝜇 𝑣.⌈𝐸 = 𝑣⌉ ⊢ C𝜇 𝑣. ( ⌜𝑣 ∈ supp(𝜇)⌝ ∗ ⌈𝐸 = 𝑣⌉ ) (C-PURE) ⊢ C𝜇 𝑣. ( ⌈𝐸 = 𝑣⌉ ∗ ⌈𝐸 ∈ supp(𝜇)⌉ ) ⊢ ( C𝜇 𝑣.⌈𝐸 = 𝑣⌉ ) ∗ ⌈𝐸 ∈ supp(𝜇)⌉ (SURE-STR-CONVEX) ⊢ 𝐸 $∼ 𝜇 ∗ ⌈𝐸 ∈ supp(𝜇)⌉ (C-UNIT-R) □ Lemma D.5.34. PROD-UNSPLIT is sound. Proof. 𝐸1 $∼ 𝜇1 ∗ 𝐸2 $∼ 𝜇2 ⊢ C𝜇1 𝑣1. C𝜇2 𝑣2. ( ⌈𝐸1 = 𝑣1⌉ ∗ ⌈𝐸2 = 𝑣2⌉ ) (C-UNIT-R, C-FRAME) ⊢ C𝜇1 𝑣1. C𝜇2 𝑣2. ⌈(𝐸1, 𝐸2) = (𝑣1, 𝑣2)⌉ (SURE-MERGE) ⊢ C𝜇1⊗𝜇2 (𝑣1, 𝑣2). ⌈(𝐸1, 𝐸2) = (𝑣1, 𝑣2)⌉ (C-ASSOC) ⊢ (𝐸1, 𝐸2) $∼ 𝜇1 ⊗ 𝜇2 (C-UNIT-R) 373 □ Joint conditioning Lemma D.5.35. C-FUSE is sound. Proof. Recall that 𝜇 � 𝜅 ≜ λ(𝑣, 𝑤). 𝜇(𝑣)𝜅(𝑣) (𝑤). which can be reformulated as 𝜇 � 𝜅 = bind(𝜇, λ𝑣. (bind(𝜅(𝑣), λ𝑤. return(𝑣, 𝑤)))). The (⊢) direction is an instance of C-ASSOC. The (⊣) direction follows from C-UNASSOC: C𝜇�𝜅 (𝑣′, 𝑤′). 𝐾 (𝑣′, 𝑤′) ⊢ C𝜇 𝑣. Cbind(𝜅(𝑣),λ𝑤.𝛿 (𝑣,𝑤) ) (𝑣 ′, 𝑤′). 𝐾 (𝑣′, 𝑤′) (C-UNASSOC) ⊢ C𝜇 𝑣. C𝜅(𝑣) 𝑤. C𝛿 (𝑣,𝑤) (𝑣′, 𝑤′). 𝐾 (𝑣′, 𝑤′) (C-UNASSOC) ⊢ C𝜇 𝑣. C𝜅(𝑣) 𝑤. 
𝐾 (𝑣, 𝑤) (C-UNIT-L) □ Lemma D.5.36. C-SWAP is sound. Proof. C𝜇1 𝑣1. C𝜇2 𝑣2. 𝐾 (𝑣1, 𝑣2) ⊢ C𝜇1⊗𝜇2 (𝑣1, 𝑣2). 𝐾 (𝑣1, 𝑣2) (C-FUSE) ⊢ C𝜇2 𝑣2. C𝜇1 𝑣1. 𝐾 (𝑣1, 𝑣2) (C-FUSE) Where 𝜇1 ⊗ 𝜇2 = 𝜇1 � (λ . 𝜇2) = 𝜇2 � (λ . 𝜇1) justifies the applications of C-FUSE. □ Lemma D.5.37. SURE-CONVEX is sound. 374 Proof. By SURE-STR-CONVEX with 𝐾 = True. □ Lemma D.5.38. Section 5.3.5 is sound. Proof. C𝜇 𝑣.𝐸 $∼ 𝜇′ ⊢ C𝜇 𝑣. C𝜇′ 𝑤.⌈𝐸 = 𝑤⌉ (C-UNIT-R) ⊢ C𝜇′ 𝑤. C𝜇 𝑣.⌈𝐸 = 𝑤⌉ (C-SWAP) ⊢ C𝜇′ 𝑤.⌈𝐸 = 𝑤⌉ (SURE-CONVEX) ⊢ 𝐸 $∼ 𝜇′ (C-UNIT-R) □ Lemma D.5.39. The following rule is sound: ∀(𝑣, ) ∈ supp(𝜇).∀𝜇′. C𝜇′ 𝑤. 𝑃(𝑣) ⊢ 𝑃(𝑣) C𝜇 (𝑣, 𝑤). 𝑃(𝑣) ⊣⊢ C𝜇◦𝜋−1 𝑣. 𝑃(𝑣) Proof. Assume that for all (𝑣, ) ∈ supp(𝜇), ∀𝜇′. C𝜇′ 𝑤. 𝑃(𝑣) ⊢ 𝑃(𝑣) (i.e. 𝑃(𝑣) is convex). By lemma D.1.2 there is some 𝜅 such that 𝜇 = (𝜇 ◦ 𝜋−1) � 𝜅. Then: C𝜇 (𝑣, 𝑤). 𝑃(𝑣) ⊣⊢ C𝜇◦𝜋−1 𝑣. C𝜅(𝑣) 𝑤. 𝑃(𝑣) (C-FUSE) ⊣⊢ C𝜇◦𝜋−1 𝑣. 𝑃(𝑣) The last step is justified by the convexity assumption in the (⊢) direction, and by C-TRUE and C-FRAME in the (⊣) direction. □ Lemma D.5.40. C-SURE-PROJ is sound. Proof. By lemma D.5.39 and lemma D.5.37. □ Lemma D.5.41. Section 5.3.5 is sound. 375 Proof. C𝜇1 𝑣1. ( ⌈𝐸1 = 𝑣1⌉ ∗ 𝐸2 $∼ 𝜇2 ) ⊢ C𝜇1 𝑣1. ( ⌈𝐸1 = 𝑣1⌉ ∗ C𝜇2 𝑣2. ⌈𝐸2 = 𝑣2⌉ ) (C-UNIT-R) ⊢ C𝜇1 𝑣1. C𝜇2 𝑣2. ( ⌈𝐸1 = 𝑣1⌉ ∗ ⌈𝐸2 = 𝑣2⌉ ) (C-FRAME) ⊢ C𝜇1 𝑣1. C𝜇2 𝑣2. ⌈𝐸1 = 𝑣1 ∧ 𝐸2 = 𝑣2⌉ (SURE-MERGE) ⊢ C𝜇1⊗𝜇2 (𝑣1, 𝑣2). ⌈(𝐸1, 𝐸2) = (𝑣1, 𝑣2)⌉ (C-ASSOC) ⊢ (𝐸1, 𝐸2) $∼ (𝜇1 ⊗ 𝜇2) (C-UNIT-R) ⊢ 𝐸1 $∼ 𝜇1 ∗ 𝐸2 $∼ 𝜇2 (PROD-SPLIT) □ Lemma D.5.42. Section 5.3.5 is sound. Proof. By lemma D.5.39 and lemma D.5.38. □ Weakest Precondition Lemma D.5.43. Section 5.4 is sound. Proof. Special case of WP-LOOP with 𝑛 = 0, which makes the premises trivial. □ Lemma D.5.44. Section 5.4 is sound. 376 Proof. From the premises, we derive: 𝑃 ∗ ⌈𝑒 = 1⌉ ⊩ wp [1: 𝑡1] {𝑄(1)} 𝑃 ∗ ⌈𝑒 = 0⌉ ⊩ wp [1: 𝑡2] {𝑄(0)} ∀𝑏 ∈ {0, 1}. 𝑃 ∗ ⌈𝑒 = 1⌉ ⊩ if 𝑏 then wp [1: 𝑡1] {𝑄(1)} else wp [1: 𝑡2] {𝑄(0)} ∀𝑏 ∈ {0, 1}. 𝑃 ∗ ⌈𝑒 = 𝑏⌉ ⊩ wp [1: (if 𝑏 then 𝑡1 else 𝑡2)] {𝑄(𝑏 � 1)} WP-IF-PRIM ∀𝑏 ∈ {0, 1}. 
𝑃 ∗ ⌈𝑒 = 𝑏⌉ ⊩ wp [1: (if e then 𝑡1 else 𝑡2)] {𝑄(𝑏 � 1)} (WP-BIND)
C𝛽 𝑏. (𝑃 ∗ ⌈𝑒 = 𝑏⌉) ⊩ C𝛽 𝑏. wp [1: (if e then 𝑡1 else 𝑡2)] {𝑄(𝑏 � 1)} (C-CONS)
𝑃 ∗ 𝑒 $∼ 𝛽 ⊩ C𝛽 𝑏. wp [1: (if e then 𝑡1 else 𝑡2)] {𝑄(𝑏 � 1)} (C-UNIT-R, C-FRAME)
𝑃 ∗ 𝑒 $∼ 𝛽 ⊩ wp [1: (if e then 𝑡1 else 𝑡2)] {C𝛽 𝑏. 𝑄(𝑏 � 1)} (C-WP-SWAP)

□