PROBABILISTIC SEPARATION LOGICS FOR RANDOMIZED ALGORITHMS A Dissertation Presented to the Faculty of the Graduate School of Cornell University in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy by Jialu Bao August 2025 © 2025 Jialu Bao ALL RIGHTS RESERVED PROBABILISTIC SEPARATION LOGICS FOR RANDOMIZED ALGORITHMS Jialu Bao, Ph.D. Cornell University 2025 Randomized algorithms are hard to test, accentuating the need for formal methods to ensure their correctness. When probabilistic separation logic was first developed as a formal method for proving probabilistic independence between program variables, it was unclear whether the approach would generalize to weaker forms of probabilistic separation used in program analysis. We first overview existing work on bunched logic — the assertion logic underlying separation logic — and on probabilistic separation logic for independence in chapter 2. In chapter 3, we extend probabilistic separation logic to reason about negative dependence, a relation in which an increase in one variable makes others less likely to increase. We demonstrate the utility of this program logic by analyzing hash-based data structures, such as Bloom filters. In chapter 4, we introduce a variation of probabilistic separation logic for reasoning about dependence and independence. Specifically, we use it to establish conditional independence between program variables in simple programs. Last, in chapter 5, we present the unary fragment of BLUEBELL to provide a more ergonomic way to reason about conditional independence and independence. We illustrate its application through more intricate examples drawn from cryptography, security, and probabilistic graphical models. All the program logics developed in this thesis target imperative programs that can sample from probability distributions.
BIOGRAPHICAL SKETCH Jialu spent the first eleven years of her life in Ningbo, a coastal city with a long history of fishing and trade. She went on to attend middle school and high school in Hangzhou. After high school, she briefly enrolled at the University of Virginia for one semester before beginning her undergraduate studies at Cornell University as a spring admit. At Cornell, she earned a Bachelor of Arts in Math and Computer Science. She then moved to Wisconsin to start her Ph.D. at the University of Wisconsin–Madison. After two years, she transferred back to Cornell following her advisor’s move. Her doctoral research lies in the fields of programming languages, formal verification, and probabilistic programs. After completing her Ph.D. in Computer Science at Cornell, she will start as a postdoctoral researcher at Northeastern University with Prof. Steven Holtzen. To my family. ACKNOWLEDGEMENTS First and foremost, I would like to express my deepest gratitude to my advisor Justin Hsu, without whom this thesis would not have existed. Justin has been such an exceptional teacher, an insightful mentor, and an inspiring role model! Through thoughtful explanations and detailed feedback, he taught me how to think, write, and present more clearly and how to break down complex problems into manageable parts. I am also grateful to have Joseph Halpern, Dexter Kozen, and Alexandra Silva on my committee. I was blessed to meet many wonderful teachers, and Joe was one of them. His course “Reasoning about Knowledge” gave a fascinating introduction to epistemic logic and sparked my interest in modal logic. I also feel fortunate to have had the opportunity to learn from Dexter Kozen about Kleene algebra in class. Outside of the classroom, Dexter’s wisdom has also led to many inspiring discussions in PLDG and in the hallway.
I am immensely grateful to Alexandra Silva for collaborating with me and hosting me on various occasions, including my visit to her UCL group this year, during which part of this thesis was written. The lab’s friendly and intellectually engaging atmosphere made it an ideal place for me to reflect and learn. I am indebted to all my collaborators: Jessica Cho, Simon Docherty, Emanuele D’Osualdo, Azadeh Farzan, Marco Gaboardi, Tao Gu, Kun He, John Hopcroft, Justin Hsu, Drashti Pathak, Oliver Richardson, Subhajit Roy, Alexandra Silva, Joseph Tassarotti, Nitesh Trivedi, Xiaodong Xin, and Fabio Zanasi. In particular, I would like to thank Emanuele and Azadeh for their close mentorship during our collaboration. They gave me new perspectives on program logics and showed me fresh ways to approach research problems. I would also like to thank Shuchi Chawla for kindly mentoring me on a project, though it did not result in a publication. I am also grateful to have had Eli Bingham and Zenna Tavares as my mentors when I interned at Basis, and Ellie Cheng, Ayush Chopra, Poorva Garg, Palka Puri, Raffi Sanna, and Andy Zane as my intern cohort. During my undergraduate studies, the courses taught by Paul Ginsparg, Michael Clarkson, and Jon Kleinberg deeply influenced my career choice. I would like to thank them for giving intellectually stimulating lectures and thoughtful assignments. I am also grateful to Michael Macy, Chris Cameron, John Hopcroft, and Nate Foster for mentoring me and introducing me to the world of research. Although my years in Wisconsin were cast in the shadow of the Covid lockdowns, the wonderful people I met and the beautiful outdoor scenery colored my memories. I would like to thank Evangelia Gergatsouli, Yang Guo, Xiating Ouyang, Rojin Rezvan and Laura Stegner for many fun gatherings when we could meet in person.
I would also like to thank Kyrylo Chernyshov, Yuchen Han, Lu Yang, Yujia Zhang, and other friends for sharing their daily moments remotely and keeping me in good spirits during that period. After returning to Cornell, I had the great pleasure of being surrounded by fantastic friends and colleagues. Although we did not officially have a lab, other students of Justin (Noah Bertram, Max Fan, Karuna Grewal, Vaibhav Mehta, Kei Imada, Zachary Susag, and Laura Zielinski) and my officemates (Keri D’Angelo, Kangbo Li, Khonzoda Umarova, and Noam Zilberstein) filled that role, offering knowledge and support whenever it was needed. Their kindness, curiosity, and good humor made both the research and the everyday moments delightful. I am also especially thankful to Mark Moeller and Yulun Yao for being amazing PLDG co-czars – I hope this PL group tradition continues for many years to come! In my last year, the Sunday casual tennis organized by Ayaka Yorihiro and Nitika Saran became a cherished social routine. I had great fun hitting with Ayaka, Ethan Yang, Max, Nitika, Rebecca Liu, Yunxi Shen and others. I also feel fortunate to have shared my Ph.D. journey with Pedro de Amorim, Ryan Doenges, Ali Farahbakhsh, Wen-Ding Li, Yueying Li, Rishabh Madan, Anshuman Mohan, Rolph Recto, Oliver Richardson, Goktug Saatcioglu, Albert Tseng, Nathan Yan, and Alicia Yang. I am deeply thankful for the support they offered and the insights they generously shared. I hope our friendship will continue for many years. Outside of the CS department, I am fortunate to have Ning Duan, Lijun Zhang, and Yujia Zhang as my close friends locally in Ithaca. I am also grateful to have had Shi Tang and Yu Pan as best friends since high school — our enduring friendship offers a sanctuary for reflecting on my personal journey and sharing my feelings.
Last, I am infinitely grateful to my parents Fang and Guanzhen, wàipó Yazhen, wàigōng Meiding, and the rest of my family for their unconditional love, unwavering support, and the countless cherished moments we have shared. Hangzhou, China July, 2025 vii TABLE OF CONTENTS Biographical Sketch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii Dedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v Table of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi 1 Introduction 1 1.1 Probabilistic Programs . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 Independence and Dependencies in Programs . . . . . . . . . . . 3 1.3 Separation Logic for Independence and Dependencies . . . . . . . 7 1.4 Outline of the Thesis . . . . . . . . . . . . . . . . . . . . . . . . . . 9 2 Bunched Logic and Probabilistic Separation Logic 12 2.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 2.2 Bunched Logic (BI) . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 2.2.1 Syntax and Semantics . . . . . . . . . . . . . . . . . . . . . 15 2.2.2 Proof System . . . . . . . . . . . . . . . . . . . . . . . . . . 18 2.2.3 Soundness and Completeness of BI . . . . . . . . . . . . . 20 2.2.4 A Discrete Probabilistic Frame of BI . . . . . . . . . . . . . 32 2.3 Probabilistic Separation Logic . . . . . . . . . . . . . . . . . . . . . 36 2.3.1 A Simple Probabilistic Programming Language . . . . . . 38 2.3.2 A Concrete BI Model for Asserting Independence . . . . . 43 2.3.3 A Program Logic for Reasoning about Independence . . . 45 3 A Program Logic for Negative Dependence 52 3.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52 3.2 Negative Association . . . . . . . . . . . . . . . . . . . . . . . . . . 
54 3.3 A BI Frame for Negative Dependence . . . . . . . . . . . . . . . . 60 3.3.1 Initial Attempts at a BI Frame for Negative Association . . 61 3.3.2 Our BI Frame for Negative Association . . . . . . . . . . . 63 3.4 𝑀-BI: Combining BI Models . . . . . . . . . . . . . . . . . . . . . . 67 3.4.1 The Syntax and Proof Rules . . . . . . . . . . . . . . . . . . 68 3.4.2 Semantics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69 3.4.3 A 𝑀-BI Model for Independence and NA . . . . . . . . . . 71 3.5 Logic of Independence and Negative Association . . . . . . . . . 73 3.5.1 Assertion Logic . . . . . . . . . . . . . . . . . . . . . . . . . 73 3.5.2 Program Logic . . . . . . . . . . . . . . . . . . . . . . . . . 77 3.6 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81 3.6.1 Probability-related Axioms for Examples . . . . . . . . . . 81 3.6.2 Bloom filter, High-level . . . . . . . . . . . . . . . . . . . . 84 3.6.3 Bloom filter, Low-level . . . . . . . . . . . . . . . . . . . . . 93 3.6.4 Permutation Hashing . . . . . . . . . . . . . . . . . . . . . 95 viii 3.6.5 Fully-dynamic Dictionary . . . . . . . . . . . . . . . . . . . 97 3.6.6 Repeated Balls-into-bins Process . . . . . . . . . . . . . . . 105 3.7 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111 4 A Bunched Logic for Dependence and Independence 114 4.1 DIBI Logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118 4.1.1 Syntax and semantics . . . . . . . . . . . . . . . . . . . . . 118 4.1.2 Proof system . . . . . . . . . . . . . . . . . . . . . . . . . . . 122 4.1.3 Soundness and Completeness of DIBI . . . . . . . . . . . . 126 4.2 A Probabilistic Model of DIBI . . . . . . . . . . . . . . . . . . . . . 128 4.2.1 A Concrete Probabilistic Frame of DIBI . . . . . . . . . . . 130 4.2.2 Capturing Conditional Independence . . . . . . . . . . . . 133 4.2.3 Validating the Semi-graphoid Axioms . . . . . . . . . . . . 
135 4.3 Conditional Probabilistic Separation Logic . . . . . . . . . . . . . 137 4.3.1 CPSL: Assertion Logic . . . . . . . . . . . . . . . . . . . . . 138 4.3.2 Conditional Probabilistic Separation Logic (CPSL) . . . . . 144 4.3.3 Example: CPSL in Action . . . . . . . . . . . . . . . . . . . 147 4.4 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152 5 Bluebell: A Unifying Framework for Independence, Conditional In- dependence and Relational Reasoning 156 5.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156 5.2 Preliminaries: Programs and Probability Spaces . . . . . . . . . . 162 5.3 The BLUEBELL Logic . . . . . . . . . . . . . . . . . . . . . . . . . . 167 5.3.1 An Alternative Approach to Bunched Logic . . . . . . . . 167 5.3.2 A Model of Probabilistic Spaces . . . . . . . . . . . . . . . . 170 5.3.3 A Model of Mutable Probabilistic Stores . . . . . . . . . . . 174 5.3.4 Joint Conditioning . . . . . . . . . . . . . . . . . . . . . . . 178 5.3.5 The Rules of Conditioning and Independence . . . . . . . 179 5.4 Reasoning about Programs in BLUEBELL . . . . . . . . . . . . . . . 183 5.5 Case Studies for BLUEBELL . . . . . . . . . . . . . . . . . . . . . . 188 5.5.1 One Time Pad Revisited . . . . . . . . . . . . . . . . . . . . 188 5.5.2 Markov Blankets . . . . . . . . . . . . . . . . . . . . . . . . 192 5.5.3 Multi-party Secure Computation . . . . . . . . . . . . . . . 195 5.5.4 Von Neumann Extractor . . . . . . . . . . . . . . . . . . . . 200 5.6 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205 6 Discussion 209 6.1 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209 6.2 Directions for Future Work . . . . . . . . . . . . . . . . . . . . . . . 211 A Bunched Logic and Probabilistic Separation Logic 231 A.1 Proofs related to Bunched Logic . . . . . . . . . . . . . . . . . . . . 231 A.2 Proofs related to Probabilistic Separation Logic . . . . . . . . . . . 
235 ix B LINA: A Separation Logic for Negative Dependence 242 B.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242 B.2 A BI Frame for Negative Association . . . . . . . . . . . . . . . . . 246 B.2.1 Capturing Negative Association . . . . . . . . . . . . . . . 246 B.2.2 Omitted Proofs of Frame Conditions . . . . . . . . . . . . . 251 B.3 Soundness and Completeness of 𝑀-BI algebras . . . . . . . . . . . 253 B.3.1 Algebraic Soundness and Completeness . . . . . . . . . . . 253 B.3.2 Soundness of 𝑀-BI formulas . . . . . . . . . . . . . . . . . 255 B.3.3 Completeness of 𝑀-BI formulas . . . . . . . . . . . . . . . 257 B.4 A 𝑀-BI Model for Independence and Negative Association . . . . 258 B.4.1 Independence Implies PNA . . . . . . . . . . . . . . . . . . 258 B.4.2 Axioms of Negative Association . . . . . . . . . . . . . . . 261 B.4.3 The Restriction Property of 𝑀-BI Formulas . . . . . . . . . 263 C DIBI: A Bunched Logic for Conditional Independence 266 C.1 A Probabilistic Model of DIBI . . . . . . . . . . . . . . . . . . . . . 266 C.1.1 Well-definedness of the Structure . . . . . . . . . . . . . . . 266 C.1.2 Associativity of Parallel Composition . . . . . . . . . . . . 270 C.1.3 Commutativity of Parallel Composition . . . . . . . . . . . 273 C.1.4 Other Properties Used in Proving Frame Conditions . . . . 274 C.1.5 Main Theorem: Proving Frame Conditions . . . . . . . . . 276 C.2 Capturing Conditional Independence . . . . . . . . . . . . . . . . 279 C.2.1 Properties of the Probabilistic Frame . . . . . . . . . . . . . 279 C.2.2 Key Lemmas: Conditional Independence is Expressed . . 283 C.2.3 Validating Graphoid Axioms, Section 4.2.3 . . . . . . . . . 290 C.3 CPSL Assertion Logic . . . . . . . . . . . . . . . . . . . . . . . . . . 292 C.3.1 Restriction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294 C.3.2 Extra Axioms . . . . . . . . . . . . . . . . . . . . . . . . . . 298 C.4 CPSL Soundness . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . 308 D The Unary Fragment Bluebell for Reasoning About Independence and Conditional Independence 314 D.1 The Rules of BLUEBELL . . . . . . . . . . . . . . . . . . . . . . . . . 314 D.1.1 Program Semantics . . . . . . . . . . . . . . . . . . . . . . . 315 D.2 Measure Theory Lemmas . . . . . . . . . . . . . . . . . . . . . . . 316 D.3 Construction of the BLUEBELL Model . . . . . . . . . . . . . . . . 331 D.4 Characterizations of Joint Conditioning . . . . . . . . . . . . . . . 336 D.5 Soundness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 340 D.5.1 Soundness of Primitive Rules . . . . . . . . . . . . . . . . . 340 D.5.2 Soundness of Primitive WP Rules . . . . . . . . . . . . . . 360 D.5.3 Soundness of Derived Rules . . . . . . . . . . . . . . . . . . 371 x LIST OF FIGURES 2.2 BI frame requirements (with outermost universal quantification omitted). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 2.3 Satisfaction for BI . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 2.4 Hilbert system for BI . . . . . . . . . . . . . . . . . . . . . . . . . . 19 2.5 pWhile command syntax . . . . . . . . . . . . . . . . . . . . . . . 39 2.6 Semantics of Expressions and Distributions . . . . . . . . . . . . . 41 2.7 Program semantics . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 2.8 Rules of Probabilistic Separation Logic . . . . . . . . . . . . . . . 51 3.1 Hilbert system for 𝑀-BI . . . . . . . . . . . . . . . . . . . . . . . . 69 3.2 New LINA rules. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78 3.3 Bloom filter examples . . . . . . . . . . . . . . . . . . . . . . . . . 85 3.4 Check the membership of a new item . . . . . . . . . . . . . . . . 89 3.5 Permutation hashing . . . . . . . . . . . . . . . . . . . . . . . . . . 95 3.6 Fully-dynamic dictionary [Ding and König, 2011] . . . . . . . . . 98 3.7 Repeated balls-into-bins [Becchetti et al., 2019] . . . . . . . . . . . 
106 4.1 From probabilistic programs to kernels . . . . . . . . . . . . . . . 117 4.2 DIBI frame requirements (with outermost universal quantification omitted for readability). . . . . . . . . . . . . . . . . . . . . . 120 4.3 Satisfaction for DIBI . . . . . . . . . . . . . . . . . . . . . . . . . . 121 4.4 Hilbert system for DIBI . . . . . . . . . . . . . . . . . . . . . . . . 123 4.5 Proof rules: CPSL . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145 4.6 Example programs . . . . . . . . . . . . . . . . . . . . . . . . . . . 147 5.1 Program Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166 5.2 Satisfaction for BI formulas on RA . . . . . . . . . . . . . . . . . . 169 5.3 Primitive rules of BLUEBELL. . . . . . . . . . . . . . . . . . . . . . 180 5.4 Derived rules. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181 5.5 The primitive WP rules of BLUEBELL. . . . . . . . . . . . . . . . . 185 5.6 Derived WP rules. . . . . . . . . . . . . . . . . . . . . . . . . . . . 186 5.7 One time pad. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189 5.8 Von Neumann extractor. . . . . . . . . . . . . . . . . . . . . . . . . 201 5.9 Proof outline of the Von Neumann extractor example. . . . . . . 202 D.1 The assertions used in BLUEBELL. . . . . . . . . . . . . . . . . . . 314 xi CHAPTER 1 INTRODUCTION 1.1 Probabilistic Programs Whether one believes the world we live in is fundamentally deterministic or the result of some dice rolling, probability offers a useful lens to model and analyze various phenomena. For example: “Will it rain tomorrow?” “Who will win the US Open this year?” “How unlikely are these constituencies to be divided in such a biased way?” These are all scenarios where we can use probability to distill our uncertainty into quantities. For computer programs, probability again plays an important role.
For instance, when an algorithm’s efficiency can vary widely depending on the input, it makes sense to consider the average cost: we assume a distribution over inputs and then compute the expected value of the algorithm’s cost when it is executed on inputs drawn from that distribution. Here, the probability is not used by the program — we just use it to model our uncertainties. Moreover, we can also harness probabilistic mechanisms in the design of algorithms. For example, while the deterministic version of Quicksort [Hoare, 1961] needs 𝑂(𝑛²) comparisons to sort 𝑛 elements in the worst case, randomized Quicksort partitions the array around a uniformly random pivot, making the average cost for every scenario (including the worst case) 𝑂(𝑛 log 𝑛). Similarly, using random bits allows algorithms to guarantee better average performance against adversaries in distributed systems [Fischer et al., 1985, Lynch, 1996] and cache management [Psounis and Prabhakar, 2001, Suri, 2020]. Randomness also has many other uses in algorithms. Randomized assignment ensures fairness when we want different outcomes each to have the possibility of occurring, with a desired probability. In cryptography, randomness makes secrets hard to guess and thus leads to security guarantees. Randomness also allows us to trade accuracy for efficiency. For instance, although finding solutions to integer linear programs is NP-hard, randomized rounding [Raghavan and Thompson, 1987] finds solutions with good probability and runs in polynomial time. In primality testing, where the goal is to determine whether a given number 𝑛 is prime, the Miller–Rabin primality test [Rabin, 1980] randomly samples integers that may witness 𝑛 being composite and determines 𝑛’s primality in polynomial time, with an exponentially small probability of mistaking a composite number for a prime.
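To make the Quicksort example concrete, here is a minimal sketch of randomized Quicksort (not from the thesis; the function name and list-based partitioning are illustrative choices):

```python
import random

def randomized_quicksort(xs):
    """Sort a list by recursively partitioning around a uniformly random pivot.

    Because the pivot is random, no fixed input can consistently force
    unbalanced partitions, which yields O(n log n) expected comparisons
    on every input, including the adversarial worst case.
    """
    if len(xs) <= 1:
        return list(xs)
    pivot = random.choice(xs)
    smaller = [x for x in xs if x < pivot]
    equal = [x for x in xs if x == pivot]
    larger = [x for x in xs if x > pivot]
    return randomized_quicksort(smaller) + equal + randomized_quicksort(larger)

print(randomized_quicksort([3, 1, 4, 1, 5, 9, 2, 6]))  # → [1, 1, 2, 3, 4, 5, 6, 9]
```

This list-copying version trades the in-place partitioning of the classical algorithm for clarity; the probabilistic analysis is the same.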
Following early breakthroughs in randomized algorithms, the seminal work of Kozen [1981] gave a formal semantics for a programming language that allows the use of randomness. Roughly, probabilistic programs in Kozen [1981] extend a standard imperative programming language with a command for sampling from distributions. Kozen presents two natural and equivalent semantics of probabilistic programs: the first reflects the view of probabilistic programs as standard programs reading a tape of random bits, and the second directly interprets probabilistic programs as maps from distributions to distributions. Expressing randomized algorithms as probabilistic programs pins down their behaviors precisely through the formal semantics and thereby facilitates rigorous analysis of these algorithms. In the study of programming languages, researchers have developed various formal methods for systematically checking the correctness of programs. One kind of formal method is deductive verification, where we use an expressive logic to specify the desired behavior of a program and apply logical rules, i.e., deduction, to prove the validity of such specifications. This thesis will focus on the deductive verification of probabilistic programs through program logics. It is worth noting that, besides randomized algorithms, probabilistic programming languages have also been developed and implemented to describe complicated probabilistic processes succinctly [Gordon et al., 2014]. There, an important addition to the language is the conditioning operator, sometimes also called the observe statement, which transforms a distribution into a conditional distribution. Notably, the effect of the conditioning operator can be simulated using a while loop, but adding the conditioning operator to the language as a primitive facilitates potentially different implementations of this command, whose effect is computationally expensive to implement exactly and often only approximated.
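The while-loop encoding of conditioning mentioned above is rejection sampling: rerun the sampler until the observed predicate holds, and the accepted runs follow the conditional distribution. A minimal sketch (the helper names are illustrative, not from Kozen [1981]):

```python
import random

def observe_by_rejection(sample, predicate):
    """Simulate `observe predicate` with a while loop: keep re-running the
    sampler until the predicate holds. Accepted outcomes are distributed
    according to the conditional distribution given the predicate."""
    while True:
        outcome = sample()
        if predicate(outcome):
            return outcome

# Example: two fair dice conditioned on their sum being 7.
random.seed(0)
roll = lambda: (random.randint(1, 6), random.randint(1, 6))
d1, d2 = observe_by_rejection(roll, lambda p: p[0] + p[1] == 7)
assert d1 + d2 == 7
```

The loop terminates with probability 1 whenever the conditioning event has positive probability, but its expected running time grows as that probability shrinks — one reason exact conditioning is expensive and real implementations often approximate it.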
Historically, the design of randomized algorithms rarely uses the conditioning operator. Since we focus on verifying probabilistic programs for randomized algorithms in this thesis, we leave the conditioning operator out of our probabilistic programming language. 1.2 Independence and Dependencies in Programs In our discourse, we consider probabilistic program variables as a superset of deterministic program variables: each probabilistic program variable’s value is sampled from a distribution, and a deterministic program variable can be considered as sampling its value from a point-mass distribution. We also sometimes abbreviate “probabilistic program variables” as “variables.” An important and ubiquitous relation between two probabilistic program variables is probabilistic independence. Independence between two variables means that their values are unrelated, i.e., knowing the outcome of one of the variables does not change one’s knowledge of the distribution of the other, and vice versa. Intuitively (and somewhat tautologically), two probabilistic variables are independent if they are derived from fresh and distinct sources of randomness, like two coin flips. In contrast, a coin flip 𝑥 and the derived variable 𝑥 + 1 are clearly not independent, because the value of 𝑥 dictates the value 𝑥 + 1 gets. However, two variables that use a shared source of randomness, and even have a logical dependency, can still be probabilistically independent. For instance, in one-time-pad encryption, we assume an 𝑙-bit message 𝑚 is drawn from some distribution over binary strings; we draw a key 𝑘 from the uniform distribution over 𝑙-bit binary strings and encrypt the message 𝑚 into 𝑐, defined to be 𝑚 xor 𝑘. This ciphered message 𝑐 is probabilistically independent of the original message 𝑚, though it is also clearly derived from the original message.
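The one-time-pad claim can be checked by exhaustively enumerating the joint distribution. The sketch below uses a single bit and a hypothetical biased message distribution (Pr[𝑚 = 0] = 3/4 is an illustrative choice); it verifies that Pr[𝑚, 𝑐] = Pr[𝑚] · Pr[𝑐] even though 𝑐 is computed from 𝑚:

```python
from itertools import product
from fractions import Fraction

# One-bit one-time pad: message m from a biased distribution, key k uniform,
# ciphertext c = m xor k. The bias on m is an illustrative assumption.
msg_dist = {0: Fraction(3, 4), 1: Fraction(1, 4)}
key_dist = {0: Fraction(1, 2), 1: Fraction(1, 2)}

# Exact joint distribution of (m, c), computed by enumeration.
joint = {}
for (m, pm), (k, pk) in product(msg_dist.items(), key_dist.items()):
    c = m ^ k
    joint[(m, c)] = joint.get((m, c), Fraction(0)) + pm * pk

# Marginals of m and c.
p_m = {m: sum(p for (m2, _), p in joint.items() if m2 == m) for m in (0, 1)}
p_c = {c: sum(p for (_, c2), p in joint.items() if c2 == c) for c in (0, 1)}

# Pr[m, c] = Pr[m] * Pr[c] at every outcome: m and c are independent,
# even though c is a function of m (and k).
assert all(joint[(m, c)] == p_m[m] * p_c[c] for m in (0, 1) for c in (0, 1))
print(p_c)  # c is uniform regardless of the bias on m
```

Note that the check goes through for any message distribution: the uniform key "washes out" the bias, which is exactly the perfect-secrecy property.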
Also, in the degenerate case when a variable is deterministic, knowing its value does not give any information about how the outcomes of other variables are sampled. Because of that, deterministic variables are independent of all other variables. Sometimes, two variables 𝐴, 𝐵 are not exactly probabilistically independent, but when we fix the value of a third variable 𝐶, the values of 𝐴 and 𝐵 become unrelated. This is the case where 𝐴 and 𝐵 are conditionally independent given 𝐶. Or, two variables 𝐴, 𝐵 may be negatively associated, in that when 𝐴 attains a higher value, 𝐵 tends to attain a lower value. We will refer to such relations between program variables concerning their probabilistic dependencies and independence as (in)dependencies. Knowing the (in)dependencies between program variables can be extremely helpful in program analysis, for multiple reasons. First, sometimes ensuring the desired (in)dependencies is straightforwardly the goal. For example, in cryptography, perfect security means that the public information is independent of the secrets. In multi-party secure computation, multiple parties want to compute a value that depends on each party’s secrets without divulging their own secrets. Perfect security is not an appropriate goal here, because the different parties want the computed result to be made public, and that value is not independent of their secrets. A more appropriate goal is the conditional independence of each party’s view and the other parties’ secrets given the outcome of the computed result, so establishing that conditional independence proves the protocol correct. Second, (in)dependencies facilitate further analysis. For example, the law of large numbers says that, if we draw a large number of independent samples from a distribution, then the empirical average of the results converges to the expected value.
Various inequalities upper-bound the probability that the empirical average deviates from the expected value by more than a certain amount — these inequalities are called “concentration bounds.” Concentration bounds can be applied, for instance, to upper-bound the probability that randomized Quicksort fails to terminate within some desired time bound on an arbitrary instance, because each randomized pivot is chosen independently and it is unlikely that “bad” pivots are always chosen. Some concentration bounds also hold for negatively associated variables. Intuitively, if one variable getting a bigger outcome means the others get smaller outcomes, then their deviations from the expected value will likely cancel out. As an application, concentration bounds also help in analyzing the collision probability or overflow probability of hash algorithms: when we hash a fixed number of items into a set of buckets, one bucket getting more items means fewer items can go to the other buckets, so the numbers of items hashed to different buckets are negatively associated, and thus we can apply concentration bounds to deduce that it is unlikely for many buckets to get a lot of items. Other than program analysis, we can also leverage (in)dependencies to represent a probabilistic model more concisely [Koller and Friedman, 2009], identify parallelizable computations, and perform more efficient probabilistic inference [Holtzen, 2021]. Analyzing (in)dependencies between probabilistic program variables, however, is intricate. First of all, testing probabilistic properties is hard. For deterministic programs, we can run an implementation and test whether a property is violated by the implementation; for probabilistic behaviors, however, testing can only exhibit a finite number of execution traces, from which we cannot conclude with certainty any (in)dependencies between program variables in the distribution of execution traces.
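As an aside, the negative-association intuition for hashing above can be probed — though, per the point just made, never proved — by simulation. A minimal sketch, assuming items are hashed uniformly at random into buckets (the parameters are illustrative):

```python
import random

# Hash n_items into n_buckets uniformly, many times, and estimate the
# covariance of two bucket counts. For this multinomial setting the exact
# value is Cov(B_i, B_j) = -n * (1/m)^2 = -20/25 = -0.8, i.e., negative:
# one bucket filling up leaves fewer items for the others.
random.seed(1)
n_items, n_buckets, trials = 20, 5, 20000

samples = []
for _ in range(trials):
    counts = [0] * n_buckets
    for _ in range(n_items):
        counts[random.randrange(n_buckets)] += 1
    samples.append((counts[0], counts[1]))

mean0 = sum(a for a, _ in samples) / trials
mean1 = sum(b for _, b in samples) / trials
cov = sum((a - mean0) * (b - mean1) for a, b in samples) / trials
print(cov)  # empirical estimate, close to the exact -0.8
assert cov < 0
```

The simulation only suggests the negative dependence at two fixed indices; certifying the full negative-association property (which quantifies over all monotone functions of disjoint index sets) is exactly the kind of claim that calls for deductive verification.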
Second, our mental model of probabilistic (in)dependencies can be unreliable. As we illustrated above through the example of one-time-pad encryption, somewhat counter-intuitively, logically dependent variables can also be probabilistically independent. As another example, consider the Bloom filter [Bloom, 1970], a widely used randomized data structure for membership queries, which is highly space-efficient at the price of sometimes returning false positives. A Bloom filter stores a relatively small array of 0-1 bits; when an item is added, it is mapped to a set of indices in the array using distinct hash functions, and the corresponding bits are flipped to 1. When analyzing the Bloom filter’s false positive rate, many sources (e.g., Mullin [1983], Blustein and El-Maazawi [2002]) have mistakenly treated the values at different indices of the Bloom filter as independent,1 while they are not, because one index being flipped to 1 means other indices are more likely to be 0. A possible explanation for such confusion is that people may intuitively think that independence is preserved through arbitrary composition, while it is not. These difficulties all speak to the need for deductive verification of (in)dependencies in probabilistic programs, whose rigor allows us to confidently use (in)dependencies in analysis. 1This issue was first pointed out by Bose et al. [2008], which also attempted to fix it. Christensen et al. [2010] later identified an issue in the definition of Stirling numbers of the second kind in Bose et al. [2008]. Gopinathan and Sergey [2020] formally certified the analysis using the theorem prover Rocq. 1.3 Separation Logic for Independence and Dependencies Separation logic extends Hoare logic to reason about programs. Originally, it was developed to verify programs that manipulate pointer data structures, i.e., heaps. The core innovation is the introduction of the “separating conjunction” (symbolized by ∗), a logical connective that allows assertions about distinct, non-overlapping regions of memory to be combined. Unlike the traditional conjunction 𝑃 ∧ 𝑄, which only requires the validity of the two assertions 𝑃 and 𝑄, the separating conjunction 𝑃 ∗ 𝑄 also asserts the disjointness of the subheaps validating 𝑃 and 𝑄. At the program logic level, the signature frame rule allows local reasoning about heap manipulations while preserving propositions on disjoint pieces of memory. Using these new assertions and rules, Separation Logic addresses a critical limitation of classical Hoare logic: reasoning about pointer-manipulating programs was hindered by complex aliasing and interference between memory regions. The idea that “we can reason about separate components separately” makes no special assumptions about heaps, so Separation Logic can be a general tool for reasoning about resources that can be separated or shared among different entities. An influential extension of heap-based Separation Logic is Concurrent Separation Logic (CSL) [Brookes, 2007a, Vafeiadis and Parkinson, 2007, Brookes, 2007b], which leverages the separating conjunction to ensure that concurrent modifications to the heap are localized and do not interfere with each other. It has led to practical and scalable verification tools like Infer [Facebook] for automatically verifying properties important to security, concurrency, and other domains. More recently, probabilistic separation logic (PSL) by Barthe et al. [2019] reappropriates Separation Logic for reasoning about probabilistic programs, with the insight that independence is a separation between different components of a distribution. They do not make a distinction between the store and the heap — both are considered memories — and build their program logic for probabilistic programs interpreted as maps between distributions over memories.
In PSL, the separating conjunction 𝑃 ∗ 𝑄 asserts the independence of the formulas 𝑃 and 𝑄 by requiring 𝑃 and 𝑄 to use disjoint sets of variables and the two sets of variables to be independent. PSL enjoys a proof system analogous to Separation Logic's, also with a frame rule, but instead for establishing probabilistic independence of probabilistic program variables. While Barthe et al. [2019] demonstrate that their program logic, with the help of domain-specific axioms, can establish probabilistic independence in several cryptography-based examples, we want to know how much further we can push this idea. Concretely, we ask the following questions:

1. Can we also adapt separation logic for reasoning about "probabilistic separation" notions that are weaker than independence, such as conditional independence or negative association?

2. Can we make the assertion logic more expressive? For instance, existing PSL conflates probabilistic independence and variable disjointness; can we precisely assert probabilistic independence without assuming variable disjointness?

3. Can this style of "probabilistic separation logic" scale to bigger, more complicated programs?

1.4 Outline of the Thesis

In this thesis, we first overview the assertion logic underpinning separation logic, Bunched Logic (abbreviated as BI for "the logic of Bunched Implications"), in chapter 2. The original BI is an important stepping stone before we introduce its variations and other practical models of probabilistic separation logic. In chapter 3, we extend probabilistic separation logic to also support compositional reasoning about negative association and call the new logic LINA. In chapter 4, we introduce a new assertion logic DIBI, which extends BI with a non-commutative conjunction for modeling dependent resources, and design a program logic CPSL on top of DIBI for proving conditional independence in probabilistic programs.
Chapter 3 and Chapter 4 together give a positive answer to Question 1. Last, in chapter 5, we focus on the unary fragment of BLUEBELL, a program logic designed for integrating unary and relational reasoning about probabilistic programs. The unary fragment of BLUEBELL gives an alternative program logic for proving conditional independence and independence. While CPSL expresses conditional independence using two different conjunctions, BLUEBELL, inspired by Li et al. [2023a], introduces a modality to the logic for conditioning on distributions and expresses conditional independence using the modality and the usual separating conjunction for independence. This new modality also allows us to express probabilistic dependence such as: depending on the outcome 𝑣 of the variable 𝑥, the variable 𝑦 is distributed as some 𝜅(𝑣). Meanwhile, similar to LINA and CPSL, BLUEBELL is a program logic developed for imperative probabilistic programs. In BLUEBELL, we are able to decouple the assumption of variable disjointness from the assertion of probabilistic independence, using the probabilistic independence BI model proposed by Li et al. [2023a] and permissions, a concept developed in concurrent separation logic for tracking who can read from and write to a resource. This feature also answers Question 2 positively. In addition, we apply LINA and BLUEBELL to some non-trivial probabilistic programs, demonstrating their potential to scale. CPSL is only applied to smaller examples. One difficulty in applying CPSL to more complicated programs is that, as a result of our design choices, the program logic rules only apply to assertions satisfying certain syntactic restrictions. In designing BLUEBELL, we prioritize ergonomics and no longer impose syntactic restrictions on the assertion logic; instead, all assertions can be used in the program logic rules.
We also see this as a step towards a more scalable probabilistic separation logic, thus making progress in answering Question 3. We also want to note that chapter 2 is mainly based on the prior work of Docherty [2019] and Barthe et al. [2019]. Chapter 4 is based on Bao et al. [2021]; Chapter 3 is based on Bao et al. [2022]; Chapter 5 is based on Bao et al. [2025].

CHAPTER 2
BUNCHED LOGIC AND PROBABILISTIC SEPARATION LOGIC

2.1 Background

A key feature of Separation Logic is using bunched logic instead of the usual propositional logic or first-order logic for asserting program states. Bunched logic is a substructural logic formulated by O'Hearn and Pym [1999]. The usual propositional logic satisfies three structural rules — WEAKENING, CONTRACTION and EXCHANGE. Intuitively, WEAKENING allows one to add unused things to the context; CONTRACTION allows one to contract duplicated things in the context; and EXCHANGE allows one to exchange things in the context. Bunched logic does not require WEAKENING and CONTRACTION. The lack of contraction makes its contexts behave like non-duplicable resources; in addition, the lack of weakening makes its contexts behave like resources that have to be used. While this choice of structural rules is exactly the same as in linear logic, bunched logic also allows contexts joined by another connective ';' that satisfies all three structural rules. That is, in sequent-calculus-style presentation, bunched logic has the structural rules in fig. 2.1a:

WEAKENING: from Γ ⊢ 𝜓, infer Γ; 𝜙 ⊢ 𝜓
CONTRACTION: from Γ; 𝜙; 𝜙 ⊢ 𝜓, infer Γ; 𝜙 ⊢ 𝜓
EXCHANGE-1: from Γ1; 𝜙; Γ2; 𝜓; Γ3 ⊢ 𝜃, infer Γ1; 𝜓; Γ2; 𝜙; Γ3 ⊢ 𝜃
EXCHANGE-2: from Γ1, 𝜙, Γ2, 𝜓, Γ3 ⊢ 𝜃, infer Γ1, 𝜓, Γ2, 𝜙, Γ3 ⊢ 𝜃

Figure 2.1: (a) Substructural rules for bunched logic; (b) an example context (a tree whose internal nodes are ',' and ';' and whose leaves are 𝜑, 𝜓, and 𝑥).

Contexts that interleave these two connectives are tree-structured instead of list-structured, for example, as the context given in fig. 2.1b, thus giving the logic the name bunched logic.
Since it allows different ways to combine contexts, bunched logic provides a flexible foundation for reasoning about resources whose sharing and separation need careful accounting. We can already see this from the connectives in bunched logic. First, the two ways to combine contexts induce two conjunctions, the multiplicative conjunction ∗ and the additive conjunction ∧:

∗-I: from Γ ⊢ 𝜙 and Δ ⊢ 𝜓, infer Γ, Δ ⊢ 𝜙 ∗ 𝜓
∧-I: from Γ ⊢ 𝜙 and Δ ⊢ 𝜓, infer Γ; Δ ⊢ 𝜙 ∧ 𝜓

Informally, the assertion 𝜙 ∗ 𝜓 can be used to ensure that properties 𝜙, 𝜓 hold on separate resources, while 𝜙 ∧ 𝜓 allows us to assert the validity of facts 𝜙, 𝜓 without extra requirements. Analogously, bunched logic has a multiplicative implication −∗ as well as a standard implication →:

−∗-I: from Γ, 𝜙 ⊢ 𝜓, infer Γ ⊢ 𝜙 −∗ 𝜓
→-I: from Γ; 𝜙 ⊢ 𝜓, infer Γ ⊢ 𝜙 → 𝜓

The multiplicative version 𝜙 −∗ 𝜓 asserts that combining the current state with a separate resource satisfying 𝜙 would validate 𝜓, while 𝜙 → 𝜓 simply asserts that the fact 𝜙 implies 𝜓. In this chapter, we first give a formal overview of bunched logic, introducing its syntax, semantics, and proof system; we then show that the proof system is sound and complete. All the methodology and proofs of this part are taken from Docherty [2019]. What we aim for is to list the precise definitions and results needed for the rest of the chapters; we also detail some cases of induction proofs omitted in Docherty [2019] to illustrate how they are proved. It is worth noting that there are varied presentations of bunched logic's semantics in the literature: the original paper by O'Hearn and Pym [1999] interprets BI formulas over doubly closed categories; early works in separation logic often interpret BI over partial commutative monoids that satisfy extra conditions [Calcagno et al., 2007]; more recent works in higher-order concurrent separation logic use a customized resource algebra whose binary operation is total and may not have a single unit, with extra functions on elements [Jung et al., 2018], etc.
We adopt the system from Simon Docherty's thesis [Docherty, 2019], because it provides a uniform account of various bunched logics, accompanied by a completeness proof — we do not know whether the proof system is complete with respect to the other variations of the semantics. After introducing the metatheory of bunched logic, we introduce a probabilistic separation logic based on bunched logic. First, we describe a concrete bunched logic model XD based on probabilistic memories, i.e., distributions over program memories. In this model, the separating conjunction can be used to assert probabilistic independence. Then, we define an imperative probabilistic language pWhile that operates on probabilistic memories. Last, in this chapter, we describe a program logic that reasons about pWhile programs with specifications asserted using the bunched logic formulas of the concrete probabilistic model XD. This program logic is a simplified but also generalized version of the probabilistic separation logic in prior work [Barthe et al., 2019] for proving probabilistic independence.

2.2 Bunched Logic (BI)

2.2.1 Syntax and Semantics

The set of BI formulas, FormBI, extends propositional formulas with the multiplicative conjunction 𝑃 ∗ 𝑄, the implication 𝑃 −∗ 𝑄, and the unit 𝐼 associated with it:

𝑃,𝑄 ::= 𝑝 ∈ AP | ⊤ | ⊥ | 𝐼 | 𝑃 ∧ 𝑄 | 𝑃 ∨ 𝑄 | 𝑃 → 𝑄 | 𝑃 ∗ 𝑄 | 𝑃 −∗ 𝑄

BI formulas are interpreted on a kind of mathematical structure named BI frames.

Definition 2.2.1 (Downwards-Closed BI Frame). A Downwards-Closed BI frame is a structure X = (𝑋, ⊑, ◦, 𝐸) such that ⊑ is a preorder (i.e., a transitive and reflexive relation), 𝐸 ⊆ 𝑋, and ◦ : 𝑋 × 𝑋 → P(𝑋) is a non-deterministic binary operation, satisfying the rules in Figure 2.2.
𝑧 ∈ 𝑥 ◦ 𝑦 → 𝑧 ∈ 𝑦 ◦ 𝑥; (Commutativity)
𝑤 ∈ 𝑡 ◦ 𝑧 ∧ 𝑡 ∈ 𝑥 ◦ 𝑦 → ∃𝑠 (𝑠 ∈ 𝑦 ◦ 𝑧 ∧ 𝑤 ∈ 𝑥 ◦ 𝑠); (Associativity)
∃𝑒 ∈ 𝐸 (𝑥 ∈ 𝑒 ◦ 𝑥); (Unit Existence)
𝑒 ∈ 𝐸 ∧ 𝑒 ⊑ 𝑒′ → 𝑒′ ∈ 𝐸; (Unit Closure)
𝑒 ∈ 𝐸 ∧ 𝑦 ∈ 𝑥 ◦ 𝑒 → 𝑥 ⊑ 𝑦; (Unit Coherence)
𝑧 ∈ 𝑥 ◦ 𝑦 ∧ 𝑥′ ⊑ 𝑥 ∧ 𝑦′ ⊑ 𝑦 → ∃𝑧′ (𝑧′ ⊑ 𝑧 ∧ 𝑧′ ∈ 𝑥′ ◦ 𝑦′). (Down-Closed)

Figure 2.2: BI frame requirements (with outermost universal quantification omitted).

Intuitively, 𝑋 is a set of states, the preorder ⊑ describes when a smaller state can be extended to a larger state, the binary operator ◦ offers a way of combining states, and 𝐸 is a set of states that act like units with respect to ◦. The binary operator returns a set of states instead of a single state, and thus it can be deterministic (at most one state returned) or non-deterministic, and partial (empty set returned) or total. In alternative presentations of BI frames as partial commutative monoids, the binary operator is defined to be a partial map 𝑋 × 𝑋 ⇀ 𝑋. But the proof of bunched logic's completeness relies on the frame's admission of non-deterministic models [Docherty, 2019]. Furthermore, the non-deterministic combination is useful for reasoning about probabilistic states, as we showcase in Chapter 3 for negative dependence. For the preorder, there are two opposite but equally sensible readings of 𝑥 ⊑ 𝑦 where 𝑥 and 𝑦 are interpreted as resources: 1. 𝑦 as a resource is an extension of resource 𝑥, and we can convert 𝑦 to 𝑥 by using up some part of 𝑦; 2. Or, resource 𝑥 converts to resource 𝑦. To avoid confusion, in this thesis, we consistently use the first reading. Also, we sometimes write 𝑥 ⊒ 𝑦 as an interchangeable notation for 𝑦 ⊑ 𝑥. The frame conditions define properties that must hold for all models of BI. The first three properties, (Commutativity), (Associativity), and (Unit Existence), can be viewed as generalizations of familiar algebraic properties to non-deterministic operations.
(Unit Existence) also relaxes the usual unit existence axiom for monoids, which states that there is one element 𝑒 that is the unit for all other elements with respect to the binary operation, to allow different units 𝑒 ∈ 𝐸 to be chosen for different 𝑥. (Unit Closure) states that the set 𝐸 is closed under the preorder ⊑. (Unit Coherence) says that if 𝑦 can be obtained by composing 𝑥 with a unit 𝑒 ∈ 𝐸, then 𝑦 is an extension of 𝑥; roughly, this ensures that 𝐸 only has elements that behave like units. Last, (Down-Closed) is another coherence condition for the order ⊑ and the composition ◦, which says that if 𝑧 ∈ 𝑥 ◦ 𝑦, then the composition of any 𝑥′ smaller than 𝑥 and any 𝑦′ smaller than 𝑦 contains an element 𝑧′ smaller than 𝑧. Informally, it says that the resource conversion of the components 𝑥, 𝑦 translates into the resource conversion of the composition 𝑧.¹

We then use a Kripke-style semantics for BI. Given a BI frame, the semantics defines which states in the frame satisfy each formula. Since the semantics is defined inductively on formulas, we first need a specification of which states satisfy the atomic propositions.

Definition 2.2.2 (Valuation and model). A persistent valuation is an assignment V : AP → P(𝑋) of atomic propositions to subsets of states of a BI frame satisfying: if 𝑥 ∈ V(𝑝) and 𝑦 ⊒ 𝑥 then 𝑦 ∈ V(𝑝). A BI model (X, V) is a BI frame X together with a persistent valuation V. We now give a semantics to BI formulas in a BI model.

𝑥 |=V ⊤ always
𝑥 |=V ⊥ never
𝑥 |=V 𝐼 iff 𝑥 ∈ 𝐸
𝑥 |=V 𝑝 iff 𝑥 ∈ V(𝑝)
𝑥 |=V 𝑃 ∧ 𝑄 iff 𝑥 |=V 𝑃 and 𝑥 |=V 𝑄
𝑥 |=V 𝑃 ∨ 𝑄 iff 𝑥 |=V 𝑃 or 𝑥 |=V 𝑄
𝑥 |=V 𝑃 → 𝑄 iff for all 𝑦 ⊒ 𝑥, 𝑦 |=V 𝑃 implies 𝑦 |=V 𝑄
𝑥 |=V 𝑃 ∗ 𝑄 iff there exist 𝑥′, 𝑦, 𝑧 s.t. 𝑥 ⊒ 𝑥′ ∈ 𝑦 ◦ 𝑧, 𝑦 |=V 𝑃 and 𝑧 |=V 𝑄
𝑥 |=V 𝑃 −∗ 𝑄 iff for all 𝑦, 𝑧 s.t. 𝑧 ∈ 𝑥 ◦ 𝑦, 𝑦 |=V 𝑃 implies 𝑧 |=V 𝑄

Figure 2.3: Satisfaction for BI

Definition 2.2.3 (BI Satisfaction and Validity). Satisfaction at a state 𝑥 of a model (X, V) is inductively defined by the clauses in Figure 2.3.
𝑃 is valid in a model, X |=V 𝑃, iff 𝑥 |=V 𝑃 for all 𝑥 ∈ X. 𝑃 is valid, |= 𝑃, iff 𝑃 is valid in all models. 𝑃 |= 𝑄 iff, for all models (X, V), for any state 𝑥 ∈ X, 𝑥 |=V 𝑃 implies 𝑥 |=V 𝑄.

¹It is also possible to interpret BI formulas on structures without (Down-Closed) while still ensuring soundness and completeness with respect to the usual BI proof system and the persistence of formulas, but other axioms (e.g., (Associativity)) need to be more delicate. The assumption of (Down-Closed) is common in the presentation of BI models in the literature.

Where the context is clear, we omit the subscript V on the satisfaction relation. With the semantics in Figure 2.3, persistence on propositional atoms extends to all formulas:

Lemma 2.2.1 (Persistence Lemma). For all BI formulas 𝑃, if 𝑥 |= 𝑃 and 𝑦 ⊒ 𝑥 then 𝑦 |= 𝑃.

Remark The emphasis on properties being persistent is rooted in the history of intuitionistic logic. Classical logic has the law of excluded middle, ⊢ 𝜑 ∨ ¬𝜑, which says that for any property 𝜑, either 𝜑 holds or 𝜑 does not hold. However, with some readings of formula satisfaction, the law seems to be on precarious ground. For instance, suppose we interpret 𝑥 |= 𝑝 as saying that at state 𝑥, the fact 𝑝 has been verified to be true, so that 𝑥 |= ¬𝑝 says that at state 𝑥, the fact ¬𝑝 has been verified to be true; then we should not expect the law of excluded middle to be valid — it is possible that neither 𝑝 nor ¬𝑝 has been verified. This motivates non-classical logics without the law of excluded middle, and furthermore, many properties that motivate such readings of formulas are naturally persistent. For instance, suppose states are ordered temporally: if 𝑝 has been verified to be true at state 𝑥, then for every state 𝑥′ following 𝑥, 𝑝 has been verified to be true at 𝑥′ too.
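The frame conditions of fig. 2.2 can be sanity-checked by brute force on a small candidate structure. The following sketch is our own toy example (not taken from Docherty [2019]): states are subsets of {0, 1}, ◦ is disjoint union, ⊑ is subset inclusion, and every state is taken as a unit; each condition of Definition 2.2.1 is verified by exhaustive enumeration.

```python
from itertools import product

# Tiny candidate frame: states are subsets of {0, 1}; x ◦ y = {x ∪ y} when
# x, y are disjoint (else the empty set); x ⊑ y means y extends x; E = X.
X = [frozenset(s) for s in ([], [0], [1], [0, 1])]
E = set(X)
leq = lambda x, y: x <= y                            # x ⊑ y
comp = lambda x, y: [x | y] if not (x & y) else []   # x ◦ y

def check_frame():
    for x, y, z, w, t in product(X, repeat=5):
        if z in comp(x, y):
            assert z in comp(y, x)                        # Commutativity
        if w in comp(t, z) and t in comp(x, y):           # Associativity
            assert any(s in comp(y, z) and w in comp(x, s) for s in X)
    for x in X:
        assert any(e in E and x in comp(e, x) for e in X)  # Unit Existence
    for e, e2 in product(X, repeat=2):
        if e in E and leq(e, e2):
            assert e2 in E                                 # Unit Closure
    for e, x, y in product(X, repeat=3):
        if e in E and y in comp(x, e):
            assert leq(x, y)                               # Unit Coherence
    for x, y, z, x2, y2 in product(X, repeat=5):
        if z in comp(x, y) and leq(x2, x) and leq(y2, y):  # Down-Closed
            assert any(leq(z2, z) and z2 in comp(x2, y2) for z2 in X)
    return True

print(check_frame())  # True
```

Taking E to be all of X (rather than just the empty state) is forced by (Unit Closure), since the empty state sits below every state in this order.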
2.2.2 Proof System

In the study of logic, we are not only interested in when a formula holds, which is captured by the semantics, but also in how to prove that a formula holds — a useful approach is to derive new formulas from formulas known to hold, following syntactic rules in a proof system. We present a Hilbert-style proof system for BI in fig. 2.4:

AX: 𝑃 ⊢ 𝑃
TOP: 𝑃 ⊢ ⊤
BOT: ⊥ ⊢ 𝑃
∨-E: from 𝑃 ⊢ 𝑅 and 𝑄 ⊢ 𝑅, infer 𝑃 ∨ 𝑄 ⊢ 𝑅
∨-I: from 𝑃 ⊢ 𝑄𝑖, infer 𝑃 ⊢ 𝑄1 ∨ 𝑄2
∧-I-R: from 𝑃 ⊢ 𝑄 and 𝑃 ⊢ 𝑅, infer 𝑃 ⊢ 𝑄 ∧ 𝑅
∧-I-L: from 𝑄 ⊢ 𝑅, infer 𝑃 ∧ 𝑄 ⊢ 𝑅
∧-E: from 𝑃 ⊢ 𝑄1 ∧ 𝑄2, infer 𝑃 ⊢ 𝑄𝑖
→-I: from 𝑃 ∧ 𝑄 ⊢ 𝑅, infer 𝑃 ⊢ 𝑄 → 𝑅
→-E: from 𝑃 ⊢ 𝑄 → 𝑅 and 𝑃 ⊢ 𝑄, infer 𝑃 ⊢ 𝑅
∗-CONJ: from 𝑃 ⊢ 𝑅 and 𝑄 ⊢ 𝑆, infer 𝑃 ∗ 𝑄 ⊢ 𝑅 ∗ 𝑆
−∗-I: from 𝑃 ∗ 𝑄 ⊢ 𝑅, infer 𝑃 ⊢ 𝑄 −∗ 𝑅
−∗-E: from 𝑃 ⊢ 𝑄 −∗ 𝑅 and 𝑆 ⊢ 𝑄, infer 𝑃 ∗ 𝑆 ⊢ 𝑅
∗-UNIT: 𝑃 ⊣⊢ 𝑃 ∗ 𝐼
∗-COMM: 𝑃 ∗ 𝑄 ⊢ 𝑄 ∗ 𝑃
∗-ASSOC: (𝑃 ∗ 𝑄) ∗ 𝑅 ⊣⊢ 𝑃 ∗ (𝑄 ∗ 𝑅)

Figure 2.4: Hilbert system for BI

This calculus extends a system for propositional logic with additional rules governing the multiplicative connectives ∗ and −∗ and the multiplicative unit 𝐼. These rules say that the multiplicative conjunction ∗ is commutative and associative, the multiplicative unit 𝐼 interacts with ∗ as expected, and −∗ is adjoint to ∗ just as the regular → is adjoint to ∧. A useful proof system for a logic should be sound with respect to its semantics. That is, if a formula 𝜓 is derivable from another formula 𝜙 using the rules in the proof system, then 𝜓 should always hold when 𝜙 holds. On top of that, it is even better if the proof system is also complete with respect to its semantics. That requires that, if 𝜓 always holds when 𝜙 holds, then 𝜓 is derivable from 𝜙 as well.

2.2.3 Soundness and Completeness of BI

A methodology for proving the soundness and completeness of bunched logic is given by Docherty [2019], inspired by the duality-theoretic approach to modal logic [Goldblatt, 1989]. First, BI is proved sound and complete with respect to an algebraic semantics obtained by interpreting the rules of the proof system as algebraic axioms.
Next, the algebraic soundness is used to establish soundness of the proof system with respect to the Kripke semantics, and similarly, the algebraic completeness is used to establish overall completeness. Notably, a more straightforward proof of the soundness of the proof system is by induction on the proof rules; here, we instead present the duality-theoretic approach for proving soundness to illustrate the technique.

Algebraic Soundness and Completeness of the BI Proof System

The algebraic semantics interprets BI formulas as elements of a structure that we call a BI algebra.

Definition 2.2.4 (BI Algebra). A BI algebra is an algebra A = (𝐴, ∧A, ∨A, →A, ⊤A, ⊥A, ∗A, −∗A, 𝐼A) such that, for all 𝑎, 𝑏, 𝑐 ∈ 𝐴:
• (𝐴, ∧A, ∨A, →A, ⊤A, ⊥A) is a Heyting algebra, i.e., (𝐴, ∧A, ∨A, ⊤A, ⊥A) forms a bounded lattice (with join and meet operations written ∨A and ∧A, and with least element ⊥A and greatest element ⊤A), and →A is a binary operation such that 𝑎 ∧A 𝑏 ≤ 𝑐 is equivalent to 𝑎 ≤ 𝑏 →A 𝑐;
• (𝐴, ∗A, 𝐼A) is a commutative monoid;
• 𝑎 ∗A 𝑏 ≤ 𝑐 iff 𝑎 ≤ 𝑏 −∗A 𝑐, where ≤ is the ordering associated with the Heyting algebra.

In the following, we drop the subscripts A when it is clear that we are referring to elements and operations in the BI algebra, and overload the notations ⊤, ⊥, ∗, −∗, 𝐼, which are also used as connectives in BI formulas. By Goldblatt [1989], the residuation property 𝑎 ∗A 𝑏 ≤ 𝑐 iff 𝑎 ≤ 𝑏 −∗A 𝑐 implies the following useful properties.

Lemma 2.2.2. Given any BI algebra A, for any 𝑎, 𝑏, 𝑐 ∈ 𝐴, the following properties hold:
(𝑎 ∨ 𝑏) ∗ 𝑐 = (𝑎 ∗ 𝑐) ∨ (𝑏 ∗ 𝑐) (BI-Alg:Dist-1)
𝑎 ∗ (𝑏 ∨ 𝑐) = (𝑎 ∗ 𝑏) ∨ (𝑎 ∗ 𝑐) (BI-Alg:Dist-2)
𝑎 ≤ 𝑎′ and 𝑏 ≤ 𝑏′ implies 𝑎 ∗ 𝑏 ≤ 𝑎′ ∗ 𝑏′ (BI-Alg:Coh)
⊥ ∗ 𝑎 = ⊥ = 𝑎 ∗ ⊥ (BI-Alg:Bot)

We can interpret bunched logic formulas in a BI algebra A.
Given an assignment V from atomic propositions to the carrier set of A, we can extend it to an algebraic interpretation of bunched logic formulas ⟦−⟧A : FormBI → 𝐴 by taking the unique homomorphic extension of this assignment:

⟦𝑝⟧A = V(𝑝)   ⟦⊤⟧A = ⊤   ⟦𝐼⟧A = 𝐼A   ⟦⊥⟧A = ⊥
⟦𝑃 ∧ 𝑄⟧A = ⟦𝑃⟧A ∧ ⟦𝑄⟧A
⟦𝑃 ∨ 𝑄⟧A = ⟦𝑃⟧A ∨ ⟦𝑄⟧A
⟦𝑃 → 𝑄⟧A = ⟦𝑃⟧A → ⟦𝑄⟧A
⟦𝑃 ∗ 𝑄⟧A = ⟦𝑃⟧A ∗ ⟦𝑄⟧A
⟦𝑃 −∗ 𝑄⟧A = ⟦𝑃⟧A −∗ ⟦𝑄⟧A

Theorem 2.2.3 (Algebraic Soundness). If 𝑃 ⊢ 𝑄 is derivable, then ⟦𝑃⟧A ≤ ⟦𝑄⟧A for all algebraic interpretations ⟦−⟧A.

Proof. By induction on the derivation of 𝑃 ⊢ 𝑄. For instance, for the case of ∗-CONJ: if 𝑃 ⊢ 𝑅 and 𝑄 ⊢ 𝑆, then by the inductive hypothesis, we have ⟦𝑃⟧A ≤ ⟦𝑅⟧A and ⟦𝑄⟧A ≤ ⟦𝑆⟧A for all algebraic interpretations ⟦−⟧A. By BI-Alg:Coh, that means ⟦𝑃⟧A ∗ ⟦𝑄⟧A ≤ ⟦𝑅⟧A ∗ ⟦𝑆⟧A; therefore, for any algebraic interpretation ⟦−⟧A, ⟦𝑃 ∗ 𝑄⟧A = ⟦𝑃⟧A ∗ ⟦𝑄⟧A ≤ ⟦𝑅⟧A ∗ ⟦𝑆⟧A = ⟦𝑅 ∗ 𝑆⟧A. □

To prove algebraic completeness, we construct a term BI algebra by quotienting formulas by equiderivability.

Definition 2.2.5 (Lindenbaum-Tarski Algebra). The Lindenbaum-Tarski algebra corresponding to the bunched logic is the set of all equivalence classes of interprovable propositions. That is, define the equivalence relation 𝑃 ∼ 𝑄 as 𝑃 ⊢ 𝑄 and 𝑄 ⊢ 𝑃. Take 𝐼L, ⊤L, and ⊥L to be [𝐼]∼, [⊤]∼, and [⊥]∼, respectively. Then we define:
[𝑃]∼ ∧L [𝑄]∼ = [𝑃 ∧ 𝑄]∼ (Lindenbaum–Tarski–And)
[𝑃]∼ ∨L [𝑄]∼ = [𝑃 ∨ 𝑄]∼ (Lindenbaum–Tarski–Or)
[𝑃]∼ →L [𝑄]∼ = [𝑃 → 𝑄]∼ (Lindenbaum–Tarski–Imp)
[𝑃]∼ ∗L [𝑄]∼ = [𝑃 ∗ 𝑄]∼ (Lindenbaum–Tarski–SepAnd)
[𝑃]∼ −∗L [𝑄]∼ = [𝑃 −∗ 𝑄]∼ (Lindenbaum–Tarski–SepImp)

Lemma 2.2.4. The operations ∧L, ∨L, →L, ∗L, −∗L are well-defined, and the structure ({[𝑃]∼}𝑃∈FormBI, ∧L, ∨L, →L, ⊤L, ⊥L, ∗L, −∗L, 𝐼L) forms a BI algebra.

Furthermore, let ⟦−⟧L be the algebraic interpretation obtained by extending the assignment 𝑝 ↦→ [𝑝]∼ for each atomic proposition 𝑝.

Lemma 2.2.5. For any formula 𝑃 ∈ FormBI, ⟦𝑃⟧L = [𝑃]∼.
The proofs of lemma 2.2.4 and lemma 2.2.5 are straightforward, and we omit them here. The Lindenbaum-Tarski algebra is crucially used in the proof of algebraic completeness.

Theorem 2.2.6 (Algebraic Completeness). If ⟦𝑃⟧A ≤ ⟦𝑄⟧A for all algebraic interpretations ⟦−⟧A, then 𝑃 ⊢ 𝑄 is derivable.

Proof. For any 𝑃, 𝑄 ∈ FormBI, if ⟦𝑃⟧A ≤ ⟦𝑄⟧A for all algebraic interpretations, then ⟦𝑃⟧L ≤ ⟦𝑄⟧L in the Lindenbaum-Tarski algebra. By lemma 2.2.5, that means [𝑃]∼ ≤ [𝑄]∼. In addition,
[𝑃]∼ ≤ [𝑄]∼
⇔ [⊤]∼ ∧L [𝑃]∼ ≤ [𝑄]∼ (𝑃 ⊣⊢ ⊤ ∧ 𝑃 by TOP, ∧-E, ∧-I-R, and by definition Lindenbaum–Tarski–And)
⇔ [⊤]∼ = [𝑃]∼ →L [𝑄]∼ (by the residuation property of the Heyting algebra, noting [⊤]∼ is the greatest element)
⇔ [⊤]∼ = [𝑃 → 𝑄]∼ (by definition Lindenbaum–Tarski–Imp)
⇔ ⊤ ⊢ 𝑃 → 𝑄 (TOP gives the other direction 𝑃 → 𝑄 ⊢ ⊤ of equiderivability)
⇔ 𝑃 ⊢ 𝑄 (by ∧-I-R, ∧-I-L, →-E)
Thus, if ⟦𝑃⟧A ≤ ⟦𝑄⟧A for all BI algebras, then 𝑃 ⊢ 𝑄. □

Soundness of BI Proof Systems

Next, we establish the soundness and completeness of BI algebras with respect to the BI Kripke semantics. To show soundness, we first give a recipe to construct a BI algebra given a BI frame; in particular, the BI algebra's carrier set consists of upwards-closed subsets of states in the BI frame — we can think of these subsets as the sets of states satisfying specific formulas. This construction will help to prove that if there exists a BI model (X, V) in which 𝑃 ̸|=(X,V) 𝑄, then there exists a BI algebra and an algebraic interpretation ⟦−⟧A such that ⟦𝑃⟧A ≰ ⟦𝑄⟧A. The construction is called the complex algebra of a BI frame.

Definition 2.2.6 (Complex Algebra). If X is a BI frame, then the complex algebra of X, written Com(X), is the structure (P⊑(𝑋), ∩, ∪, →X, 𝑋, ∅, ∗, −∗, 𝐸) where
P⊑(𝑋) = {𝐴 ⊆ 𝑋 | 𝑎 ∈ 𝐴 ∧ 𝑎 ⊑ 𝑏 → 𝑏 ∈ 𝐴}
𝐴 →X 𝐵 = {𝑎 | ∀𝑏. 𝑎 ⊑ 𝑏 ∧ 𝑏 ∈ 𝐴 → 𝑏 ∈ 𝐵}
𝐴 ∗ 𝐵 = {𝑥 | ∃𝑤, 𝑦, 𝑧. 𝑤 ⊑ 𝑥 ∧ 𝑤 ∈ 𝑦 ◦ 𝑧 ∧ 𝑦 ∈ 𝐴 ∧ 𝑧 ∈ 𝐵}
𝐴 −∗ 𝐵 = {𝑥 | ∀𝑤, 𝑦, 𝑧. (𝑥 ⊑ 𝑤 ∧ 𝑧 ∈ 𝑤 ◦ 𝑦 ∧ 𝑦 ∈ 𝐴) → 𝑧 ∈ 𝐵}

The complex algebra of any BI frame forms a BI algebra.

Lemma 2.2.7.
If X = (𝑋, ⊑, ◦, 𝐸) is a BI frame, then Com(X) is a BI algebra.

Proof. Given X = (𝑋, ⊑, ◦, 𝐸), let us show that for any 𝐴 ∈ P⊑(𝑋), 𝐴 ∗ 𝐸 = 𝐸 ∗ 𝐴 = 𝐴, and omit the rest of the conditions. For the first part,
𝐴 ∗ 𝐸 = {𝑥 | ∃𝑤, 𝑦, 𝑧. 𝑤 ⊑ 𝑥 ∧ 𝑤 ∈ 𝑦 ◦ 𝑧 ∧ 𝑦 ∈ 𝐴 ∧ 𝑧 ∈ 𝐸}
= {𝑥 | ∃𝑤, 𝑦, 𝑧. 𝑤 ⊑ 𝑥 ∧ 𝑤 ∈ 𝑧 ◦ 𝑦 ∧ 𝑧 ∈ 𝐸 ∧ 𝑦 ∈ 𝐴} (by Commutativity)
= 𝐸 ∗ 𝐴 (2.1)
For the second part,
𝐸 ∗ 𝐴 = {𝑥 | ∃𝑤, 𝑦, 𝑧. 𝑤 ⊑ 𝑥 ∧ 𝑤 ∈ 𝑦 ◦ 𝑧 ∧ 𝑦 ∈ 𝐸 ∧ 𝑧 ∈ 𝐴}
⊇ {𝑥 | ∃𝑧, 𝑒𝑧. 𝑧 ⊑ 𝑥 ∧ 𝑧 ∈ 𝑒𝑧 ◦ 𝑧 ∧ 𝑒𝑧 ∈ 𝐸 ∧ 𝑧 ∈ 𝐴}
By Unit Existence, for any 𝑧 ∈ 𝑋, there exists 𝑒𝑧 ∈ 𝐸 such that 𝑧 ∈ 𝑒𝑧 ◦ 𝑧. Thus,
𝐸 ∗ 𝐴 ⊇ {𝑥 | ∃𝑧. 𝑧 ⊑ 𝑥 ∧ 𝑧 ∈ 𝐴} = 𝐴
On the other hand, by Unit Coherence and Commutativity, 𝑤 ∈ 𝑦 ◦ 𝑧 ∧ 𝑦 ∈ 𝐸 implies that 𝑧 ⊑ 𝑤, and thus,
𝐸 ∗ 𝐴 ⊆ {𝑥 | ∃𝑤, 𝑧. 𝑤 ⊑ 𝑥 ∧ 𝑧 ⊑ 𝑤 ∧ 𝑧 ∈ 𝐴} = {𝑥 | ∃𝑧. 𝑧 ⊑ 𝑥 ∧ 𝑧 ∈ 𝐴} = 𝐴
Therefore, 𝐴 ∗ 𝐸 = 𝐸 ∗ 𝐴 = 𝐴. □

The complex algebra is constructed in a way that allows us to regard any persistent valuation on the BI frame as an algebraic interpretation of the complex algebra.

Theorem 2.2.8. Let X = (𝑋, ⊑, ◦, 𝐸) be a BI frame and let Vf : AP → P(𝑋) be a persistent valuation on X. Define the algebraic assignment Va : AP → Com(X) by letting Va(𝑝) = Vf(𝑝) for all atomic propositions 𝑝, and define the algebraic interpretation ⟦−⟧a by taking the homomorphic extension of Va. Then we have: 𝑥 |=Vf 𝑃 if and only if 𝑥 ∈ ⟦𝑃⟧a.

Proof. We proceed by induction on 𝑃. We show the base case and one inductive case, and omit the rest of the inductive cases.
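Lemma 2.2.7's unit law can be checked concretely. Continuing with a toy frame of our own choosing (subsets of {0, 1} under disjoint union, with ⊑ as inclusion and every state a unit), the sketch below enumerates the upwards-closed sets P⊑(𝑋), computes 𝐴 ∗ 𝐸 as in Definition 2.2.6, and confirms 𝐴 ∗ 𝐸 = 𝐸 ∗ 𝐴 = 𝐴 for every 𝐴:

```python
from itertools import combinations, chain

# Toy BI frame: states are subsets of {0, 1}; x ◦ y = {x ∪ y} when disjoint;
# ⊑ is ⊆; E is the set of all states (every state acts as a unit here).
X = [frozenset(s) for s in ([], [0], [1], [0, 1])]
E = set(X)
comp = lambda x, y: [x | y] if not (x & y) else []

def upsets():
    # All upwards-closed subsets of X: the carrier P⊑(X) of the complex algebra.
    subsets = chain.from_iterable(combinations(X, r) for r in range(len(X) + 1))
    return [set(s) for s in subsets
            if all(b in s for a in s for b in X if a <= b)]

def star(A, B):
    # A ∗ B from Definition 2.2.6: {x | ∃w,y,z. w ⊑ x, w ∈ y ◦ z, y ∈ A, z ∈ B}
    return {x for x in X
            if any(w <= x and w in comp(y, z) for w in X for y in A for z in B)}

assert all(star(A, E) == A == star(E, A) for A in upsets())
print("A * E = E * A = A holds for all upwards-closed A")
```

The check mirrors the proof above: ⊇ uses the empty state as a unit for each element of 𝐴, and ⊆ uses upwards-closure of 𝐴.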
• Case 𝑃 = 𝑝: We have: 𝑥 |=Vf 𝑝 iff 𝑥 ∈ Vf(𝑝) iff 𝑥 ∈ Va(𝑝) iff 𝑥 ∈ ⟦𝑝⟧a.
• Case 𝑃 = 𝑄1 ∧ 𝑄2:
𝑥 |=Vf 𝑄1 ∧ 𝑄2
iff 𝑥 |=Vf 𝑄1 and 𝑥 |=Vf 𝑄2 (by the satisfaction rules)
iff 𝑥 ∈ ⟦𝑄1⟧a and 𝑥 ∈ ⟦𝑄2⟧a (inductive hypothesis)
iff 𝑥 ∈ ⟦𝑄1⟧a ∩ ⟦𝑄2⟧a
iff 𝑥 ∈ ⟦𝑄1 ∧ 𝑄2⟧a (by the ∧ operation in the complex algebra and the homomorphic definition of ⟦−⟧a) □

This equivalence between persistent valuations and algebraic interpretations of complex algebras bridges the remaining gap between the algebraic soundness we proved in theorem 2.2.3 and the overall soundness of the proof system with respect to BI models.

Theorem 2.2.9 (Soundness of BI). If 𝑃 ⊢ 𝑄 is derivable, then 𝑃 |= 𝑄.

Proof. We prove the contrapositive. If 𝑃 ̸|= 𝑄, then there exists a BI model (X, V) and a state 𝑥 ∈ 𝑋 such that 𝑥 |= 𝑃 but 𝑥 ̸|= 𝑄. By theorem 2.2.8, if we define Va : AP → Com(X) by Va(𝑝) = V(𝑝), then we can extend it into an algebraic interpretation ⟦−⟧a such that 𝑥 |=V 𝑃 if and only if 𝑥 ∈ ⟦𝑃⟧a. Thus, there exists an algebraic interpretation ⟦−⟧a of Com(X) such that 𝑥 ∈ ⟦𝑃⟧a and 𝑥 ∉ ⟦𝑄⟧a. So ⟦𝑃⟧a ⊈ ⟦𝑄⟧a; since the order ≤ in the complex algebra is exactly set inclusion ⊆, we have ⟦𝑃⟧a ≰ ⟦𝑄⟧a. By algebraic soundness, that implies 𝑃 ⊢ 𝑄 is not derivable. □

Completeness of BI Proof Systems

In the following, we show the completeness of the BI proof system. Dually to the approach for proving soundness, we show that if there exists an algebraic interpretation ⟦−⟧A and formulas 𝑃, 𝑄 such that ⟦𝑃⟧A ≰ ⟦𝑄⟧A, then there exists a BI model (X, V) such that 𝑃 ̸|=(X,V) 𝑄. To show that, we utilize a map dual to the complex algebra construction in the soundness proof: here, given an algebraic interpretation of BI formulas into a BI algebra, we construct a BI frame corresponding to the BI algebra and a valuation on that BI frame corresponding to the algebraic interpretation. We first recall a structure on a bounded distributive lattice, called a prime filter.

Definition 2.2.7 (Prime Filter).
If (𝐿, ∧, ∨) is a bounded distributive lattice, a filter 𝐹 on 𝐿 is a non-empty subset of 𝐿 such that:
• If 𝑥 ∈ 𝐹 and 𝑥 ≤ 𝑦 then 𝑦 ∈ 𝐹.
• If 𝑥 ∈ 𝐹 and 𝑦 ∈ 𝐹 then 𝑥 ∧ 𝑦 ∈ 𝐹.
A filter is proper if it is a proper subset of 𝐿, i.e., it does not contain ⊥. A prime filter is a proper filter that in addition satisfies: if 𝑥 ∨ 𝑦 ∈ 𝐹 then 𝑥 ∈ 𝐹 or 𝑦 ∈ 𝐹.

Given a BI algebra, we can construct a BI frame whose states are prime filters. We write Prf(𝐿) for the set of prime filters on 𝐿.

Definition 2.2.8 (Prime Filter Frame). If A = (𝐴, ∧, ∨, →, ⊤, ⊥, ∗, −∗, 𝐼) is a BI algebra, then the prime filter frame of A is defined as Prf(A) = (Prf(𝐴), ⊆, ◦, 𝐸) where
𝐹1 ◦ 𝐹2 = {𝐹 ∈ Prf(𝐴) | ∀𝑎1 ∈ 𝐹1. ∀𝑎2 ∈ 𝐹2. 𝑎1 ∗ 𝑎2 ∈ 𝐹}
𝐸 = {𝐹 ∈ Prf(𝐴) | 𝐼 ∈ 𝐹}

We need to check that the constructed structure is a BI frame.

Lemma 2.2.10. If A = (𝐴, ∧, ∨, →, ⊤, ⊥, ∗, −∗, 𝐼) is a BI algebra, then Prf(A) is a BI frame.

Proof. Let us show Unit Coherence and Down-Closed. For Unit Coherence, if 𝑒 ∈ 𝐸, then for any 𝑥 ∈ Prf(𝐴),
𝑥 ◦ 𝑒 = {𝐹 ∈ Prf(𝐴) | ∀𝑎1 ∈ 𝑥. ∀𝑎2 ∈ 𝑒. 𝑎1 ∗ 𝑎2 ∈ 𝐹}
⊆ {𝐹 ∈ Prf(𝐴) | ∀𝑎1 ∈ 𝑥. 𝑎1 ∗ 𝐼 ∈ 𝐹} (since 𝐼 ∈ 𝑒)
= {𝐹 ∈ Prf(𝐴) | ∀𝑎1 ∈ 𝑥. 𝑎1 ∈ 𝐹}
= {𝐹 ∈ Prf(𝐴) | 𝑥 ⊆ 𝐹}
Thus, 𝑦 ∈ 𝑥 ◦ 𝑒 implies 𝑥 ⊑ 𝑦. For Down-Closed, for any 𝑥, 𝑥′, 𝑦, 𝑦′, 𝑧 ∈ Prf(𝐴), if 𝑧 ∈ 𝑥 ◦ 𝑦 and 𝑥′ ⊆ 𝑥 and 𝑦′ ⊆ 𝑦, then for any 𝑎1 ∈ 𝑥′ and 𝑎2 ∈ 𝑦′, we have 𝑎1 ∈ 𝑥 and 𝑎2 ∈ 𝑦 as well, and hence 𝑎1 ∗ 𝑎2 ∈ 𝑧. Thus, 𝑧 ∈ 𝑥′ ◦ 𝑦′. □

Below, we show that any algebraic interpretation into A corresponds to a "morally equivalent" persistent valuation on the prime filter frame Prf(A). This result is dual to theorem 2.2.8 used in the soundness proof. In the theorem and its proof, we use the following notation: for any element 𝑎 in a lattice, we write [𝑎) := {𝑥 | 𝑎 ≤ 𝑥}. By construction, any such [𝑎) is upwards-closed and closed under meet, and thus a filter.

Theorem 2.2.11. Let A = (𝐴, . . .) be a BI algebra and let ⟦−⟧ : FormBI → 𝐴 be an algebraic interpretation that homomorphically extends the assignment Va : AP → 𝐴.
Define the persistent valuation Vf : AP → P(Prf(𝐴)) on the prime filter frame Prf(A) by:
Vf(𝑝) = {𝐹 ∈ Prf(𝐴) | Va(𝑝) ∈ 𝐹}
Then for 𝐹 ∈ Prf(𝐴), we have 𝐹 |=Vf 𝑃 if and only if ⟦𝑃⟧ ∈ 𝐹.

Proof. We proceed by induction on the formula 𝑃.
• Case 𝑃 = 𝑝: For any 𝐹 ∈ Prf(𝐴) and atomic proposition 𝑝, we have
𝐹 |=Vf 𝑝
iff 𝐹 ∈ Vf(𝑝) (by the Kripke semantics, fig. 2.3)
iff 𝐹 ∈ Prf(𝐴) and Va(𝑝) ∈ 𝐹 (by definition of Vf)
iff Va(𝑝) ∈ 𝐹 (by the assumption 𝐹 ∈ Prf(𝐴))
iff ⟦𝑝⟧ ∈ 𝐹. (⟦−⟧ extends Va(−))
• Case 𝑃 = 𝑄1 ∗ 𝑄2: For any 𝐹 ∈ Prf(𝐴) and formulas 𝑄1, 𝑄2,
𝐹 |=Vf 𝑄1 ∗ 𝑄2
iff there exist 𝐹′, 𝐹𝑦, 𝐹𝑧 s.t. 𝐹 ⊒ 𝐹′ ∈ 𝐹𝑦 ◦ 𝐹𝑧, 𝐹𝑦 |=Vf 𝑄1 and 𝐹𝑧 |=Vf 𝑄2 (by the Kripke semantics, fig. 2.3)
iff there exist 𝐹′, 𝐹𝑦, 𝐹𝑧 s.t. 𝐹 ⊇ 𝐹′ ∈ 𝐹𝑦 ◦ 𝐹𝑧, ⟦𝑄1⟧ ∈ 𝐹𝑦 and ⟦𝑄2⟧ ∈ 𝐹𝑧 (by the inductive hypothesis and the definition of the preorder in Prf(A))
For the forward direction, by the definition of prime filter frames, ⟦𝑄1⟧ ∈ 𝐹𝑦 and ⟦𝑄2⟧ ∈ 𝐹𝑧 imply that for any 𝐹′ ∈ 𝐹𝑦 ◦ 𝐹𝑧, we must have ⟦𝑄1⟧ ∗ ⟦𝑄2⟧ ∈ 𝐹′ ⊆ 𝐹; thus, ⟦𝑄1 ∗ 𝑄2⟧ ∈ 𝐹. For the other direction, if ⟦𝑄1 ∗ 𝑄2⟧ ∈ 𝐹, then ⟦𝑄1⟧ ∗ ⟦𝑄2⟧ ∈ 𝐹. We do a case analysis:
– Suppose some 𝑄𝑖 is ⊥. Then ⟦𝑄𝑖⟧ = ⊥A. By BI-Alg:Bot, ⟦𝑄1⟧ ∗ ⟦𝑄2⟧ = ⊥A, and then ⊥A ∈ 𝐹, contradicting 𝐹 being a proper filter. Thus, this case is impossible.
– 𝑄1 and 𝑄2 are both not ⊥. This means that [⟦𝑄1⟧A) and [⟦𝑄2⟧A) are both proper filters. At a high level, we first show that 𝐹 ∈ [⟦𝑄1⟧A) ◦ [⟦𝑄2⟧A), and then use that to show that there exist prime filters 𝐹𝑦, 𝐹𝑧 such that 𝐹 ∈ 𝐹𝑦 ◦ 𝐹𝑧 and ⟦𝑄1⟧A ∈ 𝐹𝑦 and ⟦𝑄2⟧A ∈ 𝐹𝑧, which would then be used to show 𝐹 |=Vf 𝑄1 ∗ 𝑄2. First,
[⟦𝑄1⟧A) ◦ [⟦𝑄2⟧A) = {𝐹 ∈ Prf(𝐴) | ∀𝑎 ∈ [⟦𝑄1⟧A). ∀𝑏 ∈ [⟦𝑄2⟧A). 𝑎 ∗ 𝑏 ∈ 𝐹}
= {𝐹 ∈ Prf(𝐴) | ∀𝑎, 𝑏. ⟦𝑄1⟧A ≤ 𝑎 ∧ ⟦𝑄2⟧A ≤ 𝑏 ⇒ 𝑎 ∗ 𝑏 ∈ 𝐹}
By BI-Alg:Coh, ⟦𝑄1⟧A ≤ 𝑎 and ⟦𝑄2⟧A ≤ 𝑏 imply ⟦𝑄1⟧ ∗ ⟦𝑄2⟧ ≤ 𝑎 ∗ 𝑏. Thus, our given 𝐹 being a filter and ⟦𝑄1⟧ ∗ ⟦𝑄2⟧ ∈ 𝐹 imply that 𝑎 ∗ 𝑏 ∈ 𝐹 for any such 𝑎, 𝑏. Therefore, 𝐹 ∈ [⟦𝑄1⟧A) ◦ [⟦𝑄2⟧A). Next, define a predicate 𝑃 such that 𝑃(𝐹1, 𝐹2) = 1 if and only if 𝐹 ∈ 𝐹1 ◦ 𝐹2 and ⟦𝑄1⟧A ∈ 𝐹1 and ⟦𝑄2⟧A ∈ 𝐹2.
Because 𝐹 ∈ [⟦𝑄1⟧A) ◦ [⟦𝑄2⟧A), we have 𝑃([⟦𝑄1⟧A), [⟦𝑄2⟧A)) = 1. This predicate 𝑃 is a prime predicate in the sense of Docherty [2019] (cf. Definition 5.5) — the proof follows from unfolding definitions and we omit it. Then, applying the Prime Extension Lemma (cf. Lemma 5.7 of Docherty [2019]), the existence of proper filters 𝐹1, 𝐹2 such that 𝑃(𝐹1, 𝐹2) = 1 implies that there exist prime filters 𝐹𝑦, 𝐹𝑧 such that 𝑃(𝐹𝑦, 𝐹𝑧) = 1. Therefore, there exist 𝐹𝑦, 𝐹𝑧 such that 𝐹 ∈ 𝐹𝑦 ◦ 𝐹𝑧 and ⟦𝑄1⟧A ∈ 𝐹𝑦 and ⟦𝑄2⟧A ∈ 𝐹𝑧 — and by the inductive hypothesis this means 𝐹𝑦 |=Vf 𝑄1 and 𝐹𝑧 |=Vf 𝑄2. The existence of such 𝐹𝑦 and 𝐹𝑧 validates that 𝐹 |=Vf 𝑄1 ∗ 𝑄2. □

Now we are ready to prove completeness.

Theorem 2.2.12 (BI Completeness). If 𝑃 |=V 𝑄 for all BI models (X, V), then 𝑃 ⊢ 𝑄.

Proof. We prove the contrapositive. Assume 𝑃 ⊢ 𝑄 is not derivable. By algebraic completeness, there exist an algebra A and an interpretation ⟦−⟧ such that ⟦𝑃⟧ ≰ ⟦𝑄⟧. Then, the element ⟦𝑄⟧ is not in [⟦𝑃⟧), the least filter containing ⟦𝑃⟧. Let 𝐹 = [⟦𝑃⟧).
• 𝑃 is ⊥. Then ⊥ ⊢ 𝑄 by the proof rule BOT, contradicting 𝑃 ⊬ 𝑄.
• 𝑃 is not ⊥. Then 𝐹 = [⟦𝑃⟧) is a proper filter. Define a predicate 𝑃 such that 𝑃(𝐹′) = 1 iff ⟦𝑃⟧ ∈ 𝐹′ and ⟦𝑄⟧ ∉ 𝐹′. Because ⟦𝑄⟧ ∉ [⟦𝑃⟧) and ⟦𝑃⟧ ∈ [⟦𝑃⟧), we have 𝑃(𝐹) = 1. This predicate is a prime predicate, and from the Prime Extension Lemma (cf. Lemma 5.7 of Docherty [2019]) it can be established that there is a prime filter 𝐹′ on A such that ⟦𝑃⟧ ∈ 𝐹′ and ⟦𝑄⟧ ∉ 𝐹′. Define a persistent valuation Vf on Prf(A) by Vf(𝑝) = {𝐹 ∈ Prf(𝐴) | Va(𝑝) ∈ 𝐹}, where Va is the assignment extended by ⟦−⟧. By theorem 2.2.11, we have 𝐹′ |=Vf 𝑃 and 𝐹′ ̸|=Vf 𝑄. Thus, 𝑃 ̸|= 𝑄. □

2.2.4 A Discrete Probabilistic Frame of BI

After presenting the metatheory of bunched logic, we next show a concrete example of a BI model. We present a model based on probability distributions over program memories, which will be useful later in reasoning about probabilistic programs.

Definition 2.2.9 (Discrete Distribution).
Given a set 𝑋, a discrete subdistribution on 𝑋 is a function 𝜇 : 𝑋 → [0, 1] with countable support satisfying ∑𝑥∈𝑋 𝜇(𝑥) ≤ 1. A (full) distribution is a subdistribution that in addition satisfies ∑𝑥∈𝑋 𝜇(𝑥) = 1. We use D(𝑋) to denote the set of discrete (full) distributions 𝜇 over 𝑋.

Now we can define program memories. Throughout this thesis, we fix a set of variables Var and a set of values Val that the variables can take.

Definition 2.2.10 (Program Memories). Let 𝑆 ⊆ Var be a set of variable names. We call any function 𝑚 : 𝑆 → Val a program memory because such a map 𝑚 assigns a value to each variable in 𝑆. Let Mem[𝑆] denote the set of program memories from 𝑆 to Val; and for each 𝑚 ∈ Mem[𝑆], define the domain of 𝑚 to be 𝑆 and denote it as dom(𝑚).

As an example, Mem[∅] = ∅ → Val contains exactly one element, which is the trivial map with an empty domain; we denote the trivial map by ⟨⟩. We need two operations on memories. First, a memory 𝑚 with domain 𝑆 can be projected to a memory 𝜋𝑇𝑚 with domain 𝑇 if 𝑇 ⊆ 𝑆, defined as 𝜋𝑇𝑚(𝑥) = 𝑚(𝑥) for any variable 𝑥 ∈ 𝑇. Second, two memories can be combined if they agree on the intersection of their domains.

Definition 2.2.11. Given memories 𝑚1 ∈ Mem[𝑆], 𝑚2 ∈ Mem[𝑇] such that 𝜋𝑆∩𝑇𝑚1 = 𝜋𝑆∩𝑇𝑚2, we define 𝑚1 ⊲⊳ 𝑚2 : 𝑆 ∪ 𝑇 → Val by
𝑚1 ⊲⊳ 𝑚2(𝑥) := 𝑚1(𝑥) if 𝑥 ∈ 𝑆 \ 𝑇; 𝑚2(𝑥) if 𝑥 ∈ 𝑇 \ 𝑆; 𝑚1(𝑥) = 𝑚2(𝑥) if 𝑥 ∈ 𝑆 ∩ 𝑇.

This operation is not defined when 𝑚1, 𝑚2 disagree on 𝑆 ∩ 𝑇; it is well-defined exactly because 𝑚1, 𝑚2 agree on 𝑆 ∩ 𝑇. We also lift the projection map to distributions. We define the projection 𝜋𝑆 to marginalize a distribution 𝜇 ∈ D(Mem[𝑆′]) to a distribution in D(Mem[𝑆 ∩ 𝑆′]): for any 𝑥 ∈ Mem[𝑆 ∩ 𝑆′],
𝜋𝑆𝜇(𝑥) := ∑𝑥′∈Mem[𝑆′\𝑆] 𝜇(𝑥′ ⊲⊳ 𝑥).
This gives us enough ingredients to define a probabilistic BI frame that will later be useful for reasoning about probabilistic independence.

Definition 2.2.12 (A Discrete Probabilistic BI Frame).
Define a discrete probabilistic BI frame to be a structure XD = (𝑋D, ⊑D, ⊗D, 𝐸D) where • 𝑋D := ∪𝑆⊆Var D(Mem[𝑆]); • Distributions 𝜇1 ⊑D 𝜇2 iff 𝜇1 = 𝜋dom(𝜇1)𝜇2, where dom(𝜇) denotes the domain of the memories that 𝜇 ranges over; • For distributions 𝜇1 ∈ D(Mem[𝑆]), 𝜇2 ∈ D(Mem[𝑇]), the binary operation ⊗D takes the independent product of them iff 𝑆 and 𝑇 are disjoint: 𝜇1 ⊗D 𝜇2 := {𝜇 | ∀𝑥 ∈ Mem[𝑆 ∪ 𝑇], 𝜇(𝑥) = 𝜇1(𝜋𝑆𝑥) · 𝜇2(𝜋𝑇𝑥)} if 𝑆, 𝑇 are disjoint, and 𝜇1 ⊗D 𝜇2 := ∅ otherwise; • 𝐸D := ∪𝑆⊆Var D(Mem[𝑆]). We check that the structure XD is a BI frame. Barthe et al. [2019] check that a very similar structure is a partial commutative monoid. That structure’s carrier set consists of pairs of deterministic memories and randomized memories; our states can be viewed as a degenerate case of their states with trivial deterministic memories. Meanwhile, BI frames can be viewed as a generalization of partial commutative monoids. So it intuitively follows from their result that XD is a BI frame. Theorem 2.2.13. XD = (𝑋D, ⊑D, ⊗D, 𝐸D) is a BI frame. Proof. We show that it satisfies all the frame conditions. For instance, Down-Closed If 𝜇𝑧 ∈ 𝜇𝑥 ⊗D 𝜇𝑦, and 𝜇′𝑥 ⊑D 𝜇𝑥 , 𝜇′𝑦 ⊑D 𝜇𝑦, then define 𝑋 = dom(𝜇𝑥), 𝑌 = dom(𝜇𝑦), 𝑋′ = dom(𝜇′𝑥), 𝑌′ = dom(𝜇′𝑦), and define 𝜇 = 𝜋𝑋′∪𝑌′𝜇𝑧. The fact that 𝜇𝑧 ∈ 𝜇𝑥 ⊗D 𝜇𝑦 implies that for any 𝑚 ∈ Mem[𝑋 ∪ 𝑌 ], 𝜇𝑧 (𝑚) = 𝜇𝑥 (𝜋𝑋𝑚) · 𝜇𝑦 (𝜋𝑌𝑚). Thus, for any 𝑚 ∈ Mem[𝑋′ ∪ 𝑌′], 𝜇(𝑚) = (𝜋𝑋′∪𝑌′𝜇𝑧) (𝑚) = ∑ 𝑚′∈Mem[𝑋∪𝑌\(𝑋′∪𝑌′)] 𝜇𝑧 (𝑚′ ⊲⊳ 𝑚) = ∑ 𝑚′∈Mem[𝑋∪𝑌\(𝑋′∪𝑌′)] 𝜇𝑥 (𝜋𝑋 (𝑚′ ⊲⊳ 𝑚)) · 𝜇𝑦 (𝜋𝑌 (𝑚′ ⊲⊳ 𝑚)) = ∑ 𝑚1∈Mem[𝑋\𝑋′] ∑ 𝑚2∈Mem[𝑌\𝑌′] 𝜇𝑥 (𝜋𝑋 (𝑚1 ⊲⊳ 𝑚2 ⊲⊳ 𝑚)) · 𝜇𝑦 (𝜋𝑌 (𝑚1 ⊲⊳ 𝑚2 ⊲⊳ 𝑚)) = ( ∑ 𝑚1∈Mem[𝑋\𝑋′] 𝜇𝑥 (𝜋𝑋 (𝑚1 ⊲⊳ 𝑚)) ) · ( ∑ 𝑚2∈Mem[𝑌\𝑌′] 𝜇𝑦 (𝜋𝑌 (𝑚2 ⊲⊳ 𝑚)) ) = 𝜋𝑋′𝜇𝑥 (𝜋𝑋′𝑚) · 𝜋𝑌′𝜇𝑦 (𝜋𝑌′𝑚) = 𝜇′𝑥 (𝜋𝑋′𝑚) · 𝜇′𝑦 (𝜋𝑌′𝑚) Hence, 𝜇 ∈ 𝜇′𝑥 ⊗D 𝜇′𝑦, and by definition, 𝜇 ⊑D 𝜇𝑧. We delay the full proof to appendix A. □ This theorem indicates that, when given a set of atomic propositions and a persistent valuation, we can interpret BI formulas on distributions over memories.
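To make these operations concrete, here is a small Python sketch (our own encoding, not part of the formal development): a memory is a frozenset of variable–value pairs, a distribution is a dictionary from memories to probabilities, and marginalizing an independent product recovers each factor, as the Down-Closed condition requires.

```python
# Our own encoding (illustrative): memory = frozenset of (variable, value) pairs,
# distribution = dict {memory: probability}.

def mem(**kv):
    return frozenset(kv.items())

def project(m, T):
    """pi_T m: restrict a memory to the variables in T."""
    return frozenset((x, v) for x, v in m if x in T)

def marginal(mu, T):
    """pi_T mu: marginalize a distribution over memories onto T."""
    out = {}
    for m, p in mu.items():
        key = project(m, T)
        out[key] = out.get(key, 0.0) + p
    return out

def indep_product(mu1, mu2):
    """mu1 (x) mu2 for disjoint domains: the unique product distribution."""
    return {m1 | m2: p1 * p2 for m1, p1 in mu1.items() for m2, p2 in mu2.items()}

mu_x = {mem(x=0): 0.5, mem(x=1): 0.5}
mu_y = {mem(y=0): 0.25, mem(y=1): 0.75}
mu = indep_product(mu_x, mu_y)
assert abs(mu[mem(x=1, y=1)] - 0.375) < 1e-12
# Marginalizing the product recovers each factor (the Down-Closed condition).
assert marginal(mu, {"x"}) == mu_x
assert marginal(mu, {"y"}) == mu_y
```

In this encoding, `marginal` plays the role of 𝜋𝑆 on distributions and `indep_product` the role of ⊗D restricted to disjoint domains.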
In the next section, we will use BI formulas to specify probabilistic programs and also give proof rules for reasoning about probabilistic programs. 2.3 Probabilistic Separation Logic In this section, we will overview probabilistic separation logic, which consists of a set of rules for analyzing probabilistic programs. Each rule describes how a probabilistic program transforms its input distribution into its output distribution; the input and output distributions are specified using BI formulas interpreted on the concrete probabilistic BI frame XD — we will introduce a set of atomic propositions and a valuation so that the BI formulas can effectively specify probabilistic programs. At a high level, probabilistic separation logic will utilize a useful and common property of probability distributions — independence — to reason about irrelevant parts of a probabilistic program modularly. Independence is often defined for events. For a discrete distribution 𝜇 : 𝑋 → [0, 1], an event EV is a map 𝑋 → {0, 1}, and the probability of event EV is ∑ 𝜔∈𝑋 𝜇(𝜔) · EV(𝜔), for which we overload notation and write 𝜇(EV). The independence of two events says that the occurrence of one event does not tell anything about the occurrence of the other event. Formally, Definition 2.3.1 (Probabilistic Independence of Events). Given any distribution 𝜇 : 𝑋 → [0, 1], two events EV1, EV2 are independent if and only if 𝜇(EV1 ∩ EV2) = 𝜇(EV1) · 𝜇(EV2). A set of events EV1, . . . , EV𝑛 are mutually independent if for any subset 𝑆 ⊆ [𝑛], 𝜇( ⋂ 𝑗∈𝑆 EV 𝑗 ) = ∏ 𝑗∈𝑆 𝜇(EV 𝑗 ). For program analysis, another useful notion is the probabilistic independence between two program variables, which says that knowing the value of one program variable does not tell anything about the value of the other. To formally define it, we need “a variable 𝑥 takes a value 𝑣” to be an event, which is the case for distributions over program memories.
Given a distribution 𝜇 ∈ D(Mem[𝑆]), for any 𝑥 ∈ 𝑆, we write the event {𝜔 ∈ Mem[𝑆] | 𝜔(𝑥) = 𝑣} as 𝑥 = 𝑣. We then define the probabilistic independence of variables as follows. Definition 2.3.2 (Probabilistic Independence of Program Variables). Given any distribution 𝜇 : 𝑋 → [0, 1] and two variables 𝑥, 𝑦 ∈ Var, if 𝑥 = 𝑣1, 𝑦 = 𝑣2 are events for any two values 𝑣1, 𝑣2 ∈ Val, then we define the variables 𝑥 and 𝑦 to be independent if and only if: for any 𝑣1, 𝑣2 ∈ Val, 𝜇(𝑥 = 𝑣1 ∧ 𝑦 = 𝑣2) = 𝜇(𝑥 = 𝑣1) · 𝜇(𝑦 = 𝑣2), i.e., the events 𝑥 = 𝑣1 and 𝑦 = 𝑣2 are independent. Similarly, a set of program variables 𝑥1, . . . , 𝑥𝑛 are mutually independent iff for any subset 𝑆 ⊆ [𝑛] and any set of values {𝑣𝑖 ∈ Val | 𝑖 ∈ 𝑆}, 𝜇( ⋂ 𝑖∈𝑆 𝑥𝑖 = 𝑣𝑖) = ∏ 𝑖∈𝑆 𝜇(𝑥𝑖 = 𝑣𝑖). In the following, when talking about a set of variables, we will abbreviate mutual independence as independence. Another commonly used notion is the independence between two sets of program variables. We can talk about value assignments on a set of variables in a similar way: given a distribution 𝜇 over Mem[𝑆], for any 𝑋 ⊆ 𝑆 and 𝑚 ∈ Mem[𝑋], we write 𝑋 = 𝑚 for {𝜔 ∈ Mem[𝑆] | ∀𝑥 ∈ 𝑋.𝜔(𝑥) = 𝑚(𝑥)}. Definition 2.3.3 (Probabilistic Independence of Two Sets of Program Variables). Given two sets of variables 𝑋,𝑌 ⊆ Var, we say 𝑋 and 𝑌 are independent in 𝜇 if for any 𝑚𝑋 ∈ Mem[𝑋], 𝑚𝑌 ∈ Mem[𝑌 ], 𝜇(𝑋 = 𝑚𝑋 ∩ 𝑌 = 𝑚𝑌 ) = 𝜇(𝑋 = 𝑚𝑋) · 𝜇(𝑌 = 𝑚𝑌 ). An equivalent condition is as follows: for any 𝑚𝑋 ∈ Mem[𝑋], 𝑚𝑌 ∈ Mem[𝑌 ] such that 𝑚𝑋 ⊲⊳ 𝑚𝑌 is defined, 𝜋𝑋∪𝑌 𝜇(𝑚𝑋 ⊲⊳ 𝑚𝑌 ) = 𝜋𝑋𝜇(𝑚𝑋) · 𝜋𝑌 𝜇(𝑚𝑌 ). The probabilistic separation logic presented in this section will help its users prove, track, and utilize the independence of (sets of) program variables. 2.3.1 A Simple Probabilistic Programming Language We work with an imperative language pWhile that allows sampling from a set of built-in primitive distributions.
We first define the set of valid expressions E and the set of allowed distributions D, and then define the formal grammar of commands C. We assume a fixed set of typed program variables; 𝑥 stands for a numeric variable, while 𝑏 stands for a boolean variable. The expression language is standard. Distribution terms 𝑑 ∈ D can be Bern𝑣 for a Bernoulli (coin-flip) distribution with bias 𝑣, Unif𝑆 for a uniform distribution over elements in 𝑆, or some other symbols interpreted into distributions — we will introduce them later when needed. E ∋ 𝑒 ::= 𝑣 ∈ Val | 𝑥, 𝑏 ∈ Var | 𝑒1 = 𝑒2 | 𝑒1 + 𝑒2 | 𝑒1 × 𝑒2 | . . . D ∋ 𝑑 ::= Bern𝑣 | Unif𝑆 | . . . C ∋ 𝑐 ::= skip | 𝑥 ← 𝑒 | 𝑥 $← 𝑑 | if 𝑏 then 𝑐 else 𝑐′ | 𝑐 ; 𝑐′ | while 𝑏 do 𝑐 Figure 2.5: pWhile command syntax We assume throughout that all expressions and distribution terms are well-typed; in particular, the value 𝑣 in Bern𝑣 is a number in the interval [0, 1], and 𝑆 in Unif𝑆 is a finite, nonempty set, whose size we write as |𝑆 |. For commands, pWhile has six kinds of commands: the no-op skip; assignments 𝑥 ← 𝑒, which assign the evaluated value of the expression 𝑒 to the program variable 𝑥; sampling 𝑥 $← 𝑑 for drawing a value from a distribution 𝑑 and assigning it to 𝑥; conditionals if 𝑏 then 𝑐 else 𝑐′ for branching on a (possibly randomized) condition 𝑏; sequencing 𝑐 ; 𝑐′; and loops while 𝑏 do 𝑐 for iterating a command 𝑐 until the condition 𝑏 becomes false. We also write if 𝑏 then 𝑐 as abbreviation of if 𝑏 then 𝑐 else skip. Probabilistic Monad To concisely describe the denotational semantics of these commands, we introduce operations on distributions and the distribution monad. Since D(𝑋) is the set of distributions over 𝑋 , we can view D as an operation that maps a set into distributions over that set. This operation on sets can be lifted to functions 𝑓 : 𝑋 → 𝑌 , resulting in a map of distributions D( 𝑓 ) : D(𝑋) → D(𝑌 ) given by D( 𝑓 ) (𝜇) (𝑦) := ∑ 𝑓 (𝑥)=𝑦 𝜇(𝑥).
Intuitively, D( 𝑓 ) (𝜇) (𝑦) takes the sum of the probabilities of all elements in the pre-image of 𝑦. These operations turn D into a functor on sets and, further, D is also a monad [Giry, 1982, Moggi, 1991]. Definition 2.3.4 (Distribution Monad). Define unit : 𝑋 → D(𝑋) as unit(𝑥) := 𝛿𝑥 where 𝛿𝑥 denotes the Dirac distribution on 𝑥: for any 𝑦 ∈ 𝑋 , we have 𝛿𝑥 (𝑦) = 1 if 𝑦 = 𝑥, otherwise 𝛿𝑥 (𝑦) = 0. Further, define bind : D(𝑋) × (𝑋 → D(𝑌 )) → D(𝑌 ) by bind(𝜇, 𝑓 ) (𝑦) := ∑ 𝑝∈D(𝑌 ) D( 𝑓 ) (𝜇) (𝑝) · 𝑝(𝑦). Intuitively, unit embeds a set into distributions over the set, and bind enables the sequential combination of probabilistic computations. These maps satisfy the following interaction laws, establishing that (D, unit, bind) is a monad: bind(unit(𝑥), 𝑓 ) = 𝑓 (𝑥), bind(𝜇, 𝑥 ↦→ unit(𝑥)) = 𝜇, bind(bind(𝜇, 𝑓 ), 𝑔) = bind(𝜇, 𝜆𝑥.bind( 𝑓 (𝑥), 𝑔)). The distribution monad has an equivalent presentation in which bind is replaced with a multiplication operation join : D(D(𝑋)) → D(𝑋), which flattens distributions by averaging: join(𝜇) (𝑥) := ∑ 𝜌∈D(𝑋) 𝜇(𝜌) · 𝜌(𝑥). Program Semantics Given a program memory containing all variables appearing in an expression, we interpret E terms as values in Val and interpret D terms as distributions in D(Val) as in fig. 2.6. We overload the notation and write ⟦𝑒⟧ for interpretation of expression 𝑒 and ⟦𝑑⟧ for interpretation of distribution 𝑑. We can also interpret expressions on probabilistic memories through a lifting. For any 𝜇 ∈ D(Mem[𝑆]), ⟦𝑒⟧(𝜇) := bind(𝜇, 𝑚 ↦→ unit(⟦𝑒⟧(𝑚))) ⟦𝑣⟧(𝑚) := 𝑣 ⟦𝑥⟧(𝑚) := 𝑚(𝑥) ⟦𝑥 = 𝑦⟧(𝑚) := 1 if 𝑚(𝑥) = 𝑚(𝑦) else 0 ⟦𝑥 + 𝑦⟧(𝑚) := 𝑚(𝑥) + 𝑚(𝑦) ⟦𝑥 × 𝑦⟧(𝑚) := 𝑚(𝑥) × 𝑚(𝑦) ⟦Bern𝑣⟧ := the distribution mapping 1 ↦→ 𝑣, 0 ↦→ 1 − 𝑣, and 𝜔 ↦→ 0 if 𝜔 ≠ 0 and 𝜔 ≠ 1 ⟦Unif𝑆⟧ := the distribution mapping 𝜔 ↦→ 1/|𝑆 | if 𝜔 ∈ 𝑆 and 𝜔 ↦→ 0 otherwise Figure 2.6: Semantics of Expressions and Distributions Then we can interpret programs in pWhile as distribution transformers D(Mem[Var]) → D(Mem[Var]), as in fig. 2.7. The interpretation is standard.
The command skip simply outputs the input distribution; 𝑥 ← 𝑒 and 𝑥 $← 𝑑 use the monadic operation bind to compose the input distribution 𝜇 with the updating map describing the output distribution corresponding to each deterministic input memory 𝑚; last, 𝑐 ; 𝑐′ composes the interpretation of 𝑐 and 𝑐′ using usual function composition. Because the conditional if 𝑏 then 𝑐 else 𝑐′ allows a randomized guard 𝑏, interpreting it requires two more operations on distributions: a conditioning operation 𝜇 | 𝑆 to split control flow, and convex combination ⊕𝑝 to merge control flow. Given any distribution 𝜇 ∈ D(𝐴) and event 𝑆 ⊆ 𝐴, if 𝜇(𝑆) > 0, the conditional distribution of 𝜇 given 𝑆 is: (𝜇 | 𝑆) (𝑎) := 𝜇(𝑎)/𝜇(𝑆) if 𝑎 ∈ 𝑆, and 0 if 𝑎 ∉ 𝑆. (2.2) ⟦skip⟧(𝜇) := 𝜇 ⟦𝑥 ← 𝑒⟧(𝜇) := bind(𝜇, 𝑚 ↦→ unit(𝑚 [𝑥 ↦→ ⟦𝑒⟧(𝑚)])) ⟦𝑥 $← 𝑑⟧(𝜇) := bind(𝜇, 𝑚 ↦→ bind(⟦𝑑⟧, 𝑣 ↦→ unit(𝑚 [𝑥 ↦→ 𝑣]))) ⟦𝑐 ; 𝑐′⟧(𝜇) := ⟦𝑐′⟧(⟦𝑐⟧(𝜇)) ⟦if 𝑏 then 𝑐 else 𝑐′⟧(𝜇) := ⟦𝑐⟧(𝜇 | 𝑏 = tt) ⊕𝑝 ⟦𝑐′⟧(𝜇 | 𝑏 = ff ) where 𝑝 := 𝜇(𝑏 = tt) ⟦abort⟧(𝜇) := 𝜆𝜔.0 ⟦while 𝑏 do 𝑐⟧(𝜇) := lim 𝑛→∞ ⟦(if 𝑏 then 𝑐)𝑛; if 𝑏 then abort⟧(𝜇) Figure 2.7: Program semantics When 𝜇(𝑆) = 0, we leave 𝜇 | 𝑆 undefined. For convex combination, for any 𝜇1, 𝜇2 ∈ D(𝐴), we define 𝜇1 ⊕0 𝜇2 := 𝜇2 and 𝜇1 ⊕1 𝜇2 := 𝜇1. When 𝑝 ∈ (0, 1), (𝜇1 ⊕𝑝 𝜇2) (𝑎) := 𝑝 · 𝜇1(𝑎) + (1 − 𝑝) · 𝜇2(𝑎). Conditioning and convex combination are inverses in the sense that 𝜇 = (𝜇 | 𝑆) ⊕𝜇(𝑆) (𝜇 | (𝐴 \ 𝑆)). The command abort is only used in the definition of while loops and is not accessible to users. It disregards the input distribution 𝜇 and returns the subdistribution that assigns 0 to all possible outcomes 𝜔. The semantics of the while loop while 𝑏 do 𝑐 is the limit of ⟦(if 𝑏 then 𝑐)𝑛; if 𝑏 then abort⟧ as 𝑛 approaches infinity. The limit, taken with the point-wise order, exists according to the monotone convergence theorem [Abbott, 2015, Strichartz, 2000] because the subdistribution’s mass is non-decreasing as 𝑛 increases and is upper bounded by 1.
Throughout, we assume that all loops terminate in finitely many steps; then the limit is always a full distribution, so all commands in pWhile can be interpreted as distribution transformers. 2.3.2 A Concrete BI Model for Asserting Independence We define some atomic propositions to describe distributions over program memories ∪𝑆⊆VarD(Mem[𝑆]). The BI frame XD defined in section 2.2.4 together with the valuation V∗ for atomic propositions defined below provide a BI model, on which we can assert properties such as distributions of variables, and independence between variables. Let atomic propositions APD ∋ 𝑝 ::= Own(E) | E $∼ 𝜇 | Detm⟨E⟩ | [E = E] | E[E] ⊲⊳ 𝑐 (2.3) where ⊲⊳ ∈ {=, ≤, ≥} and 𝑐 ∈ R is a constant. Roughly, Own(E) asserts that the distribution of the expression E is fully determined; E $∼ 𝜇 asserts that the expression E has distribution 𝜇; Detm⟨E⟩ asserts that the expression E is deterministic; [E1 = E2] asserts that the expressions E1 and E2 are always equal; last, E[𝑒] ⊲⊳ 𝑐 bounds the expected value of an expression 𝑒 with respect to a constant 𝑐. In particular, since events are maps from memories to {0, 1}, which is the same type as the interpretation of boolean expressions in the language, we assume the set of expressions contains events as well. We define the satisfaction of atomic propositions on program configurations as follows. Let FV(𝑒) be the set of free variables in expression 𝑒. Definition 2.3.5 (Valuation). For 𝜇 ∈ XD, define V∗ such that • 𝜇 ∈ V∗(Own(𝑒)) iff FV(𝑒) ⊆ dom(𝜇); • 𝜇 ∈ V∗(𝑒 $∼ 𝜇′) iff FV(𝑒) ⊆ dom(𝜇) and ⟦𝑒⟧(𝜇) = 𝜇′; • 𝜇 ∈ V∗(Detm⟨𝑒⟩) iff FV(𝑒) ⊆ dom(𝜇) and ⟦𝑒⟧(𝜇) is a Dirac distribution; • 𝜇 ∈ V∗( [𝑒 ⊲⊳ 𝑒′]) iff FV(𝑒) ∪ FV(𝑒′) ⊆ dom(𝜇) and ⟦𝑒⟧(𝑚) ⊲⊳ ⟦𝑒′⟧(𝑚) for any 𝑚 in the support of 𝜇; • 𝜇 ∈ V∗(E[𝑒] ⊲⊳ 𝑐) iff the expected value of expression 𝑒 in 𝜇, i.e., E[𝑒] := ∑ 𝑚∈Mem[dom(𝜇)] 𝜇(𝑚) · ⟦𝑒⟧(𝑚), satisfies E[𝑒] ⊲⊳ 𝑐. For an event 𝑒𝑣, we also write Pr[𝑒𝑣] ⊲⊳ 𝑐 for E[𝑒𝑣] ⊲⊳ 𝑐.
It is straightforward to show that V∗ defined for these atomic propositions is a persistent valuation. Proposition 2.3.1. (XD,V∗) forms a BI model. With these atomic propositions, we can now use bunched logic formulas to assert interesting probabilistic properties. For instance, we can assert the independence between two variables using the following assertion. Lemma 2.3.2. For any distribution 𝜇 ∈ XD, for a set of variables {𝑋𝑖}𝑖∈𝑆, 𝜇 |= ∗𝑖∈𝑆 Own(𝑋𝑖) iff the variables {𝑋𝑖}𝑖∈𝑆 are distinct and mutually independent. We present the proof in appendix A. The assertion logic has all the axioms for atomic formulas stated in Barthe et al. [2019, Lemma 3, 4]. While this set of axioms is not complete, it is useful for reasoning about a rich family of probabilistic properties. Lemma 2.3.3. The following axiom schemas are valid: |= [𝑒1 = 𝑒2] → [𝑒2 = 𝑒1] (Eq-Sym) |= [𝑒1 = 𝑒2] ∧ [𝑒2 = 𝑒3] → [𝑒1 = 𝑒3] (Eq-Tran) |= Own(𝑒1) → Own(𝑒2) whenever 𝐹𝑉 (𝑒2) ⊆ 𝐹𝑉 (𝑒1) (Own-Incl) Note that |= [𝑒1 = 𝑒1] is not an axiom — it is not sound, since it may not hold in a distribution over randomized memories with empty domain, i.e., in D(Mem[∅]). We also have axioms for uniformity propositions. Lemma 2.3.4. The following axiom schemas are valid: |= [𝑒1 = 𝑒2] ∧ Unif𝑆⟨𝑒1⟩ → Unif𝑆⟨𝑒2⟩ (Unif-Tran) |= Unif𝑆⟨𝑒1⟩ → [𝑒1 = 𝑒1] (Unif-Weak) |= Unif𝑆⟨𝑒1⟩ → Unif𝑆⟨ 𝑓 (𝑒1)⟩ for any bijection ⟦ 𝑓 ⟧ : 𝑆 → 𝑆 and FV( 𝑓 ) ⊆ FV(𝑒1) (Unif-Bij) 2.3.3 A Program Logic for Reasoning about Independence We now introduce the program logic layer of probabilistic separation logic. In the spirit of other separation logics (e.g., [Reynolds, 2002, Brookes, 2007a, Jung et al., 2018]), the logic is designed to prove separations and harness separations to prove other properties more easily; in this case, the separation is probabilistic independence. Similar to standard Hoare logic, it has judgments of the form {𝑃} prog {𝑄}, where prog is a probabilistic program command in C, and 𝑃, 𝑄 are BI formulas with atomic propositions in APD. Definition 2.3.6 (Validity).
A probabilistic separation logic judgment is valid, written |= {𝑃} prog {𝑄}, if for all 𝜇 ∈ D(Mem[Var]) such that 𝜇 |= 𝑃, we have ⟦prog⟧(𝜇) |= 𝑄. Next, we proceed to the proof system, which consists of program rules for each command and structural rules that match any program command. As before, we use FV(𝑒) to denote the set of free variables in an expression 𝑒. Program Rules The program rules are presented in fig. 2.8a. The rules RASSN and SAMP are for randomized assignment and random sampling. Both rules are presented with the trivial pre-condition ⊤; in practice, one would want to reason about assignments and sampling starting from general pre-conditions, and we will derive variants of these rules with other pre-conditions using the structural rules. There are two rules governing conditionals. In COND, the precondition implies that the randomized guard 𝑏 behaves deterministically, and thus either the guard 𝑏 is true and 𝑐 executes or the guard 𝑏 is false and 𝑐′ executes. If both branches guarantee 𝜓 as the post-condition, then 𝜓 is also the post-condition for the conditional. The rule RCOND, on the other hand, applies when the randomized guard 𝑏 is separate from the rest of the pre-condition — that is, it must be probabilistically independent of the portion of the randomized memory captured by 𝜑. This independence is crucial for ensuring 𝜑 remains valid as the pre-condition of both branches: each branch’s input distribution is obtained by conditioning on the guard’s value in the original distribution; notably, that conditioning operation can invalidate 𝜑 if the guard 𝑏 and variables in 𝜑 are correlated, even if they share no variables. To illustrate this, recall [Barthe et al., 2019, Example 1], Example 2.3.1. Suppose that 𝑥, 𝑦, 𝑧 are boolean program variables, and let 𝜇 be the output of: 𝑥 $← UnifB; 𝑦 $← UnifB; 𝑧 ← 𝑥 ∨ 𝑦 In other words, 𝑥 and 𝑦 store the results of two fair coin flips, and 𝑧 stores the value of 𝑥 ∨ 𝑦.
Then 𝑥 and 𝑦 are independent in 𝜇, i.e., Own(𝑥) ∗ Own(𝑦) holds in 𝜇. However, if 𝑀 ⊆ Mem[Var] is the set of all randomized memories where 𝑧 = tt, representing the event that 𝑧 is true, then Own(𝑥) ∗ Own(𝑦) does not hold in 𝜇 | 𝑀 . Intuitively, if we know 𝑧 = tt, then 𝑥 and 𝑦 are correlated: if one is false, then the other must be true. We also need to be more careful when formulating the post-condition for conditionals. Even when both the true branch and the false branch guarantee 𝜓 as the post-condition, it is in general unsound to conclude the post-condition 𝜓 for if 𝑏 then 𝑐 else 𝑐′. We also illustrate this through an example. Example 2.3.2. Suppose that 𝑥, 𝑦, 𝑧 are boolean program variables, and let 𝜇 be the output of: 𝑧 $← Bern1/2; if 𝑧 then 𝑥 $← Bern0.9; 𝑦 $← Bern0.9 else 𝑥 $← Bern0.1; 𝑦 $← Bern0.1 In both the true branch’s output and the false branch’s output, 𝑥 and 𝑦 are probabilistically independent, validating Own(𝑥) ∗ Own(𝑦) as a post-condition. However, in 𝜇, 𝑥 and 𝑦 are not independent: when 𝑥 is true, it is more likely that 𝑧 is true and the true branch was executed, and thus 𝑦 is also more likely to be true; the case is similar when 𝑥 is false. To make sure that the post-conditions from the branches can be combined into the post-condition of the conditional, the side condition of RCOND checks that the post-condition 𝜓 determines a unique portion of the distribution over randomized memories. Formally, we adapt the following class of assertions from separation logic [Reynolds, 2002]. Definition 2.3.7. A formula 𝜑 is supported (SP) if there exists a randomized memory 𝜇 such that if 𝜇′ |= 𝜑, then 𝜇 ⊑ 𝜇′. We can prove by induction that the following syntactic conditions ensure SP. Lemma 2.3.5. The following assertions are SP: 𝜂 ::= 𝑝𝑑 | [𝑥 = 𝑣] | 𝑥 $∼ 𝜇 | 𝜂 ∗ 𝜂 Last, the loop rule LOOP is in the same style as COND, which also requires the guard to be deterministic as a consequence of the precondition 𝜑.
This side condition essentially restricts the loop to run a deterministic number of iterations. In that case, if we have precondition 𝜑 and the program 𝑐 preserves 𝜑 as an invariant, then when the loop while 𝑏 do 𝑐 terminates, we have 𝜑 ∧ [𝑏 = ff ] as the post-condition. Structural Rules The structural rules are in fig. 2.8b and they apply to Hoare triples with any command 𝑐 as long as the pre- and post-conditions match. The rules WEAK, TRUE, CONJ, and CASE are standard. CONST is the rule of constancy from Hoare logic, which states that, if a formula 𝜂 does not mention any of 𝑐’s modified variables MV(𝑐), then it can be conjoined to the pre- and post-condition. This rule is not sound in standard separation logic — motivating the separating conjunction and the frame rule — but it is sound in Probabilistic Separation Logic because writes in pWhile cannot invalidate assertions about other variables. However, the post-condition in CONST does not ensure that 𝜓 and 𝜂 use probabilistically independent variables. For this stronger guarantee, we need FRAME, whose side conditions mention several classes of variables. Roughly speaking, RV(𝑐) is the set of variables that 𝑐 may read from, while WV(𝑐) is the set of variables that 𝑐 must write to (before possibly reading from). MV(𝑐) is the set of variables that 𝑐 may write to, so WV(𝑐) is a subset of MV(𝑐). Formally, Definition 2.3.8. RV,WV,MV are defined as follows: RV(𝑥𝑟 ← 𝑒𝑟) ≜ FV(𝑒𝑟) RV(𝑥𝑟 $← 𝜇) ≜ ∅ RV(𝑐 ; 𝑐′) ≜ RV(𝑐) ∪ (RV(𝑐′) \ WV(𝑐)) RV(if 𝑏 then 𝑐 else 𝑐′) ≜ FV(𝑏) ∪ RV(𝑐) ∪ RV(𝑐′) RV(while 𝑏 do 𝑐) ≜ FV(𝑏) ∪ RV(𝑐) WV(𝑥𝑟 ← 𝑒𝑟) ≜ {𝑥𝑟} \ FV(𝑒𝑟) WV(𝑥𝑟 $← 𝜇) ≜ {𝑥𝑟} WV(𝑐 ; 𝑐′) ≜ WV(𝑐) ∪ (WV(𝑐′) \ RV(𝑐)) WV(if 𝑏 then 𝑐 else 𝑐′) ≜ (WV(𝑐) ∩ WV(𝑐′)) \ FV(𝑏) WV(while 𝑏 do 𝑐) ≜ WV(𝑐) MV(𝑥𝑟 ← 𝑒) ≜ {𝑥𝑟} MV(𝑥𝑟 $← 𝜇) ≜ {𝑥𝑟} MV(𝑐 ; 𝑐′) ≜ MV(𝑐) ∪ MV(𝑐′) MV(if 𝑏 then 𝑐 else 𝑐′) ≜ MV(𝑐) ∪ MV(𝑐′) MV(while 𝑏 do 𝑐) ≜ MV(𝑐) Last, FRAME says that we can conjoin a formula 𝜂 to both the pre- and post-conditions if 1.
𝜂 does not use any variables modified by the program 𝑐; 2. the program 𝑐 only reads from the part of memories that the precondition 𝜑 describes; 3. the post-condition 𝜓 only talks about variables that the precondition 𝜑 already describes or variables the program 𝑐 writes to. The first condition is standard in separation logic — separation logics for reasoning about heaps or concurrency also need an analogous condition. The second and the third conditions are needed because our star ∗ asserts probabilistic independence: if 𝑐 reads from variables in 𝜂, then the post-condition 𝜓 may not be independent from 𝜂; if 𝜓 talks about variables that are neither written by the command 𝑐 nor described by 𝜑, those variables may be already correlated with variables in 𝜂. Together, this set of conditions guarantees that 𝜂 refers to variables that are probabilistically independent of 𝜓, thus validating 𝜓 ∗ 𝜂 as the post-condition.2 All these proof rules are sound. Theorem 2.3.6 (Soundness). If ⊢ {𝜑} 𝑐 {𝜓} is derivable, then |= {𝜑} 𝑐 {𝜓}. When proving the soundness of the program rules, we sometimes want to focus on a smaller distribution 𝜇′ inside a given distribution 𝜇 |= 𝜑, such that 𝜇′ satisfies some sub-formula of 𝜑. Specifically, such reasoning is used in the proofs for CASE, CONST, and FRAME. To ensure there exists such a smaller distribution, we require the assertion logic to satisfy a key condition called restriction, which says that to check whether a distribution satisfies 𝜑, it suffices to check whether its marginalization on FV(𝜑) satisfies 𝜑. BI formulas in (XD,V∗) satisfy restriction: Lemma 2.3.7 (Restriction). Let 𝜇 ∈ D(Mem[𝑆]) and let 𝜑 be a BI formula. Then: 𝜇 |= 𝜑 ⇔ 𝜋FV(𝜑) (𝜇) |= 𝜑. We leave the proofs of theorem 2.3.6 and lemma 2.3.7 to appendix A. Barthe et al.
[2019] demonstrates that probabilistic separation logic can be used to prove the correctness of various cryptographic schemes, where security relies on the independence of secrets and public information. 2There also exist other choices for the side conditions of FRAME — we stick with the choice by Barthe et al. [2019]. 50 SKIP ⊢ {𝜑} skip {𝜑} SEQN ⊢ {𝜑} 𝑐 {𝜓} ⊢ {𝜓} 𝑐′ {𝜂} ⊢ {𝜑} 𝑐 ; 𝑐′ {𝜂} DASSN ⊢ {Detm⟨𝑒⟩ ∧ 𝜑[𝑒/𝑥]} 𝑥 ← 𝑒 {Detm⟨𝑥⟩ ∧ 𝜑} RASSN 𝑥𝑟 ∉ FV(𝑒𝑟) ⊢ {⊤} 𝑥𝑟 ← 𝑒𝑟 {[𝑥𝑟 = 𝑒𝑟]} SAMP ⊢ {⊤} 𝑥𝑟 $← 𝜇 {𝑥𝑟 $∼ 𝜇} COND ⊢ {𝜑 ∧ [𝑏 = tt]} 𝑐 {𝜓} ⊢ {𝜑 ∧ [𝑏 = ff ]} 𝑐′ {𝜓} |= 𝜑→ Detm⟨𝑏⟩ ⊢ {𝜑} if 𝑏 then 𝑐 else 𝑐′ {𝜓} RCOND ⊢ {𝜑 ∗ [𝑏 = tt]} 𝑐 {𝜓 ∗ [𝑏 = tt]} ⊢ {𝜑 ∗ [𝑏 = ff ]} 𝑐′ {𝜓 ∗ [𝑏 = ff ]} 𝜓 ∈ SP ⊢ {𝜑 ∗ Own(𝑏)} if 𝑏 then 𝑐 else 𝑐′ {𝜓 ∗ Own(𝑏)} LOOP ⊢ {𝜑 ∧ [𝑏 = tt]} 𝑐 {𝜑} |= 𝜑→ Detm⟨𝑏⟩ ⊢ {𝜑} while 𝑏 do 𝑐 {𝜑 ∧ [𝑏 = ff ]} (a) Program Rules of Probabilistic Separation Logic WEAK ⊢ {𝜑} 𝑐 {𝜓} |= 𝜑′→ 𝜑 ∧ 𝜓 → 𝜓′ ⊢ {𝜑′} 𝑐 {𝜓′} TRUE ⊢ {⊤} 𝑐 {⊤} CONJ ⊢ {𝜑1} 𝑐 {𝜓1} ⊢ {𝜑2} 𝑐 {𝜓2} ⊢ {𝜑1 ∧ 𝜑2} 𝑐 {𝜓1 ∧ 𝜓2} CASE ⊢ {𝜑1} 𝑐 {𝜓1} ⊢ {𝜑2} 𝑐 {𝜓2} ⊢ {𝜑1 ∨ 𝜑2} 𝑐 {𝜓1 ∨ 𝜓2} CONST ⊢ {𝜑} 𝑐 {𝜓} FV(𝜂) ∩MV(𝑐) = ∅ ⊢ {𝜑 ∧ 𝜂} 𝑐 {𝜓 ∧ 𝜂} FRAME ⊢ {𝜑} 𝑐 {𝜓} FV(𝜂) ∩MV(𝑐) = ∅ |= 𝜑→ Own(𝑇 ∪ RV(𝑐)) FV(𝜓) ⊆ 𝑇 ∪ RV(𝑐) ∪WV(𝑐) ⊢ {𝜑 ∗ 𝜂} 𝑐 {𝜓 ∗ 𝜂} (b) Structural Rules of Probabilistic Separation Logic Figure 2.8: Rules of Probabilistic Separation Logic 51 CHAPTER 3 A PROGRAM LOGIC FOR NEGATIVE DEPENDENCE 3.1 Overview In the last chapter, we have seen a program logic for reasoning about proba- bilistic independence. While independence is useful for many applications, it is a strict requirement. A natural question is, what if we do not have perfect in- dependence? Can we use other kind of probabilistic dependencies in program analysis? Utilizing probabilistic dependencies is, for example, important when we an- alyze hashing-based probabilistic data structures such as hash tables and Bloom filters. 
In these applications, a hash function ℎ maps a universe of possible values, typically large, to a set of buckets, typically small, and items are looked up through their hashes. The performance of hash-based data structures is captured by a variety of probabilistic guarantees, e.g., the space usage, the amortized cost of insertion, the amortized cost of look-up, etc. One useful probabilistic guarantee is the false positive rate: the probability that a data structure mistakenly identifies an element as being stored in the data structure, when it was not inserted. We may also be interested in load measures, such as the probability that a bucket in the data structure overflows. A typical way to analyze these quantities is to treat random hash functions as balls-into-bins processes. For example, hashing unique elements into bins can be modeled as throwing balls into bins, where each bin is drawn uniformly at random. While this modeling is convenient, one complication is that the counts of the elements in the different buckets are not probabilistically independent: one bin containing many elements makes it more likely that other bins contain few elements. The lack of independence makes it difficult to reason about multiple bins, for instance, bounding the number of occupied bins. Moreover, many common tools for analyzing probabilistic processes, like concentration bounds, usually require independence. This subtlety has also been a source of problems in pen-and-paper analyses of probabilistic data structures (e.g., Mullin [1983], Blustein and El-Maazawi [2002]). After many attempts to correct the bounds for Bloom filter’s false positive rate [Bose et al., 2008, Christensen et al., 2010] using pen-and-paper proofs, recently, Gopinathan and Sergey [2020] certified its analysis using a complex proof in ROCQ.
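The negative correlation between bin loads can be seen in a small simulation (an illustrative sketch with our own helper names, using a fixed seed): averaged over many runs, the empirical covariance between the loads of two distinct bins is negative.

```python
# Illustrative balls-into-bins simulation (our own encoding, fixed seed).
import random

def throw_balls(n_balls, n_bins, rng):
    """Throw n_balls balls, each into a uniformly random bin; return bin loads."""
    counts = [0] * n_bins
    for _ in range(n_balls):
        counts[rng.randrange(n_bins)] += 1
    return counts

rng = random.Random(0)
counts = throw_balls(1000, 10, rng)
assert sum(counts) == 1000          # every ball lands in some bin

# Empirical covariance between the loads of bins 0 and 1 over many trials:
trials = [throw_balls(200, 10, rng) for _ in range(2000)]
xs = [t[0] for t in trials]
ys = [t[1] for t in trials]
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
assert cov < 0   # one bin holding many balls forces the others to hold fewer
```

For multinomial loads the true covariance between two bins is −𝑛𝑝𝑖𝑝𝑗 (here −2), so the empirical estimate is reliably negative.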
We aim to develop a simpler method to formally reason about hash-based data structures and balls-into-bins processes, drawing on a key concept in probability theory: negative dependence. While there are multiple incomparable definitions of negative dependence, Joag-Dev and Proschan [1983] proposed a notion called negative association (NA) that shares many of the good probabilistic properties of probabilistic independence. First, some standard theorems about sums of independent random variables apply more generally to sums of NA random variables. In particular, the widely-used Chernoff bound, which intuitively says that the sum of independent random variables is close to the expected value of the sum with high probability, holds also for NA variables. Intuitively, it is unlikely for all variables to attain high values compared to their expected value, and equally unlikely for all variables to attain low values; thus, their sum most likely stays close to the expected value of the sum. Second, negative association is preserved by some common operations on random variables. For instance, variables that use an
Inspired by this approach, we think of asserting negative association using another multiplica- tive conjunction, but that means we need to support multiple multiplicative conjunctions in the assertion logic. For that purpose, we propose 𝑀-BI, an ex- tension to bunched logic where each element in 𝑀 is associated with its own multiplicative conjunction and implication. In the following, we first present a BI model for asserting negative association, then combine it with the BI model for asserting probabilistic independence into an 𝑀-BI model, and last, we de- sign a program logic that incorporates compositional proof principles of NA. 3.2 Negative Association We now define negative association precisely and state its properties. Negative association is a property of a set of random variables, formalized as follows: Definition 3.2.1 (Negative Association (NA)). Let 𝑋1, . . . 𝑋𝑛 be random vari- ables. The set {𝑋𝑖}𝑖 is negatively associated (NA) if for every pair of subsets 54 𝐼, 𝐽 ⊆ {1, . . . , 𝑛} such that 𝐼 ∩ 𝐽 = ∅, and every pair of both monotone or both antitone functions1 𝑓 : R|𝐼 | → R and 𝑔 : R|𝐽 | → R, where 𝑓 , 𝑔 is either lower bounded or upper bounded,2 we have: E [ 𝑓 (𝑋𝑖, 𝑖 ∈ 𝐼) · 𝑔(𝑋 𝑗 , 𝑗 ∈ 𝐽) ] ≤ E[ 𝑓 (𝑋𝑖, 𝑖 ∈ 𝐼)] · E [ 𝑔(𝑋 𝑗 , 𝑗 ∈ 𝐽) ] . We can view NA as generalizing independence: a set of independent random variables is NA because equality holds. NA also strengthens negative covariance, a simpler notion of negative dependence that occurs frequently in statistics lit- erature. Negative correlation [Rice, 2007, Chapter 4.3] of 𝑋1, . . . , 𝑋𝑛 says that E  ∏ 𝑖∈[𝑛] 𝑋𝑖  ≤ ∏ 𝑖∈[𝑛] E[𝑋𝑖], which automatically holds if {𝑋1, . . . , 𝑋𝑛} are negatively associated. To see that, we show the following: Lemma 3.2.1. Let 𝑋1, . . . , 𝑋𝑛 be a sequence of NA random variables, then for any family of non-negative all monotone or all antitone functions 𝑓𝑖 : R→ R, E  ∏ 𝑖∈[𝑛] 𝑓𝑖 (𝑋𝑖)  ≤ ∏ 𝑖∈[𝑛] E[ 𝑓𝑖 (𝑋𝑖)] . Proof. We prove it by induction. 
The base case is when 𝑛 = 1, where trivially E [ ∏ 𝑖∈[1] 𝑓𝑖 (𝑋𝑖) ] = E[ 𝑓1(𝑋1)] = ∏ 𝑖∈[1] E[ 𝑓𝑖 (𝑋𝑖)] . 1In the following, we will consistently use monotone to mean monotonically non-decreasing and antitone to mean monotonically non-increasing. 2Technically, we slightly modify Dubhashi and Ranjan [1998]’s NA by in addition assuming that 𝑓 , 𝑔 are bounded from one side. We add the condition to have a cleaner version of theorem 3.3.1 and Theorem 3.3.5. All our other results and properties we state about NA in Section 3.2 hold with or without this condition. When 𝑛 > 1, note that the map (𝑋1, . . . , 𝑋𝑛−1) ↦→ ∏ 𝑖∈[𝑛−1] 𝑓𝑖 (𝑋𝑖) is also monotone if all 𝑓𝑖 are monotone, and antitone if all 𝑓𝑖 are antitone, so E [ ∏ 𝑖∈[𝑛] 𝑓𝑖 (𝑋𝑖) ] = E [ ( ∏ 𝑖∈[𝑛−1] 𝑓𝑖 (𝑋𝑖)) · 𝑓𝑛 (𝑋𝑛) ] ≤ E [ ∏ 𝑖∈[𝑛−1] 𝑓𝑖 (𝑋𝑖) ] · E[ 𝑓𝑛 (𝑋𝑛)] (because the variables are NA) ≤ ∏ 𝑖∈[𝑛] E[ 𝑓𝑖 (𝑋𝑖)] (by the inductive hypothesis) □ In particular, when we take all 𝑓𝑖 to be identity functions (and the 𝑋𝑖 are non-negative), we derive E [∏ 𝑖∈[𝑛] 𝑋𝑖 ] ≤ ∏ 𝑖∈[𝑛] E[𝑋𝑖] from the variables being NA. NA variables can arise from various mechanisms. Theorem 3.2.2 (See Dubhashi and Ranjan [1998]). We enumerate three scenarios: 1. The set of independent random variables {𝑋1, . . . , 𝑋𝑛} is negatively associated. 2. If {𝑋1, . . . , 𝑋𝑛} are Bernoulli random variables such that ∑ 𝑖∈[𝑛] 𝑋𝑖 = 1, then the set of variables is negatively associated. 3. Let 𝑋 be a uniformly random permutation of a finite, nonempty multi-set 𝐴, and for each 𝑖, let 𝑋𝑖 be the 𝑖-th entry in the vector 𝑋 . Then {𝑋1, . . . , 𝑋𝑛} is negatively associated. As an example of the third case, consider a deck of cards perfectly shuffled — so that the cards’ order is uniformly sampled from all possible permutations. If, for each 𝑖, 𝑋𝑖 gets the value on the 𝑖-th card, then the variables {𝑋𝑖}𝑖 are negatively associated. Also, the second case of this theorem implies that if we draw
Also, the third case of this theorem implies that if we draw a length-𝑛 one-hot vector, i.e., a vector that has one entry equal to one and all remaining entries equal to zero, uniformly at random, then the entries of the vector satisfy negative association.

The following theorem states three key closure properties of NA random variables.

Theorem 3.2.3 (See Dubhashi and Ranjan [1998]). We enumerate three scenarios:

1. For any negatively associated set of variables 𝑇, and for any non-empty subset 𝑆 ⊆ 𝑇, the set 𝑆 of random variables is negatively associated;
2. For any two sets of negatively associated random variables 𝑇, 𝑈 such that every 𝑋 ∈ 𝑇 and every 𝑌 ∈ 𝑈 are independent of each other, the union 𝑇 ∪ 𝑈 of random variables is negatively associated.
3. Let {𝑋1, . . . , 𝑋𝑛} be negatively associated, and let 𝐼1, . . . , 𝐼𝑚 be a partition of the set {1, . . . , 𝑛}. For each 1 ≤ 𝑗 ≤ 𝑚, let 𝑓𝑗 : R^|𝐼𝑗| → R be monotone. Let 𝑆 = { 𝑓1(𝑋𝑘 , 𝑘 ∈ 𝐼1), . . . , 𝑓𝑚 (𝑋𝑘 , 𝑘 ∈ 𝐼𝑚)}. Then 𝑆 is negatively associated.

The first case shows that NA is preserved if we discard random variables, while the second case allows us to join two independent sets of negatively associated random variables to form a larger negatively associated set. Finally, the third case guarantees that negative association is preserved under applying monotone maps to disjoint subsets of variables.

Chernoff's Bound and Negative Association

Another nice property of NA is that negatively associated random variables satisfy some frequently used tail bounds, including Chernoff's bound.

"Chernoff's bound is one of the most basic and versatile tools in the life of a theoretical computer scientist, with a seemingly endless amount of applications." — Mulzer [2019]

Qualitatively, Chernoff's bound says that the sum 𝑋1 + · · · + 𝑋𝑛 is usually close to its expected value, and it upper bounds the probability that the sum deviates from the mean by more than a tolerated amount.
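To make the bound concrete, here is a small Monte Carlo sketch (helper names are our own) comparing the empirical two-sided tail of a sum against the Hoeffding-style bound exp(−2𝛽²/𝑛) stated below. We use independent Bernoulli variables, which are a special case of NA by Theorem 3.2.2.

```python
import math
import random

def tail_vs_bound(n, p, beta, trials=20_000, seed=1):
    """Empirical Pr[|Y - E[Y]| >= beta] for Y a sum of n independent
    Bernoulli(p) variables (independent, hence NA), together with the
    Chernoff-Hoeffding bound exp(-2 * beta**2 / n)."""
    rng = random.Random(seed)
    mean = n * p
    hits = 0
    for _ in range(trials):
        y = sum(rng.random() < p for _ in range(n))
        hits += abs(y - mean) >= beta
    return hits / trials, math.exp(-2 * beta ** 2 / n)

emp, bound = tail_vs_bound(n=50, p=0.5, beta=10)
assert emp <= bound  # the analytic bound dominates the empirical tail
```

The point of the theorem below is that the same analytic bound remains valid when independence is relaxed to negative association.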
This kind of analysis is useful for establishing high-probability guarantees for randomized algorithms, e.g., showing that the error of a random estimate is at most 0.01 with probability at least 99%. There are various formulations of Chernoff's bound, with different assumptions on the random variables (e.g., {𝑋𝑖}𝑖 being independent Bernoulli random variables, or {𝑋𝑖}𝑖 simply being independent bounded random variables) and different ways to measure the error (e.g., the additive form uses the absolute difference between the realized value and the expected value, while the multiplicative form uses the error ratio). While the mainstream formulations of Chernoff's bound all require the variables {𝑋𝑖}𝑖 to be independent, Dubhashi and Ranjan [1998] observe that Chernoff's bound also holds for negatively associated random variables.

We state the result using a formulation in the additive form for [0, 1]-bounded random variables. This version is also known as Hoeffding's inequality.

Theorem 3.2.4 (Chernoff–Hoeffding Bound for NA variables [Dubhashi and Ranjan, 1998]). Let 𝑋1, . . . , 𝑋𝑛 be a sequence of NA random variables, each bounded in [0, 1], and let 𝑌 = ∑𝑖∈[𝑛] 𝑋𝑖. Then for any 𝛽 ∈ (0, 1], we have:

Pr[|𝑌 − E[𝑌 ]| ≥ 𝛽] ≤ 𝐹 (𝛽, 𝑛) where 𝐹 (𝛽, 𝑛) = e^{−2𝛽²/𝑛}.

Equivalently, for any failure probability 𝛽 ∈ (0, 1],

Pr[|𝑌 − E[𝑌 ]| ≥ 𝑇 (𝛽, 𝑛)] ≤ 𝛽 where 𝑇 (𝛽, 𝑛) = √((𝑛/2) ln(1/𝛽)).

Proof. The proof uses Hoeffding's lemma: for any real-valued random variable 𝑋 with 𝑋 ∈ [𝑎, 𝑏] almost surely, and for any 𝜆 ∈ R,

E[e^{𝜆(𝑋−E[𝑋])}] ≤ e^{𝜆²(𝑏−𝑎)²/8}.

(See, e.g., Romaní [2021] for a proof of Hoeffding's lemma.) For any 𝜆 > 0, the event 𝑌 − E[𝑌 ] ≥ 𝛽 is the same as the event e^{𝜆(𝑌−E[𝑌 ])} ≥ e^{𝜆𝛽}, so

Pr(𝑌 − E[𝑌 ] ≥ 𝛽) = Pr(e^{𝜆(𝑌−E[𝑌 ])} ≥ e^{𝜆𝛽}).

By Markov's inequality, for any non-negative random variable 𝑋 and any 𝑎 > 0, Pr(𝑋 ≥ 𝑎) ≤ E[𝑋]/𝑎.
By regarding e^{𝜆(𝑌−E[𝑌 ])} as the random variable, Markov's inequality gives us

Pr(e^{𝜆(𝑌−E[𝑌 ])} ≥ e^{𝜆𝛽}) ≤ E[e^{𝜆(𝑌−E[𝑌 ])}] / e^{𝜆𝛽}.

Now we analyze the numerator on the right:

E[e^{𝜆(𝑌−E[𝑌 ])}] = E[e^{𝜆(∑𝑖∈[𝑛] 𝑋𝑖 − E[∑𝑖∈[𝑛] 𝑋𝑖])}] (by definition of 𝑌)
= E[e^{𝜆 ∑𝑖∈[𝑛] (𝑋𝑖−E[𝑋𝑖])}] (by linearity of expectation)
= E[∏𝑖∈[𝑛] e^{𝜆(𝑋𝑖−E[𝑋𝑖])}]
≤ ∏𝑖∈[𝑛] E[e^{𝜆(𝑋𝑖−E[𝑋𝑖])}] (because {𝑋𝑖}𝑖 are NA, and by lemma 3.2.1)
≤ ∏𝑖∈[𝑛] e^{𝜆²/8}. (Hoeffding's lemma)

Therefore,

Pr(e^{𝜆(𝑌−E[𝑌 ])} ≥ e^{𝜆𝛽}) ≤ (∏𝑖∈[𝑛] e^{𝜆²/8}) / e^{𝜆𝛽} = e^{(𝑛𝜆²/8) − 𝜆𝛽}.

Similarly,

Pr(E[𝑌 ] − 𝑌 ≥ 𝛽) = Pr(e^{𝜆(E[𝑌 ]−𝑌 )} ≥ e^{𝜆𝛽})
≤ E[e^{𝜆 ∑𝑖∈[𝑛] (E[𝑋𝑖]−𝑋𝑖)}] / e^{𝜆𝛽} (by Markov's inequality and linearity of expectation)
≤ (∏𝑖∈[𝑛] E[e^{𝜆(E[𝑋𝑖]−𝑋𝑖)}]) / e^{𝜆𝛽} (because {𝑋𝑖}𝑖 are NA, and by lemma 3.2.1)
≤ (∏𝑖∈[𝑛] e^{𝜆²/8}) / e^{𝜆𝛽} (Hoeffding's lemma)
= e^{(𝑛𝜆²/8) − 𝜆𝛽}.

The 𝜆 that minimizes e^{(𝑛𝜆²/8) − 𝜆𝛽} is the 𝜆 that minimizes (𝑛𝜆²/8) − 𝜆𝛽, namely 𝜆 = 4𝛽/𝑛. Substituting 4𝛽/𝑛 for 𝜆 reduces the bound e^{(𝑛𝜆²/8) − 𝜆𝛽} to e^{−2𝛽²/𝑛}. □

Crucially, the step that previously relied on independence of {𝑋𝑖}𝑖 now follows from {𝑋𝑖}𝑖 being NA and lemma 3.2.1.

3.3 A BI Frame for Negative Dependence

Now that we have seen some nice properties of negative association, we start the quest of building a bunched logic that can assert both negative association and independence. Concretely, we construct a BI frame XPNA that can capture negative association and then combine it with our BI model for probabilistic independence (XD, V∗). To be compatible with (XD, V∗), we let XPNA have the same set of states and the same pre-order as XD. The important remaining piece of the puzzle is the binary operation ⊕, which must satisfy the frame conditions while capturing negative association.

The meaning of "capturing negative association" has so far been left ambiguous.
Previously, in the design process of (XD,V∗), we want the satisfaction of 𝑃 ∗ 𝑄 ensures that 𝑃 and 𝑄 hold on independent components of distributions, and more precisely, we choose to require all variables involved in 𝑃 are inde- pendent of all variables involved in 𝑄. Because 𝑃 ∗ 𝑄 is interpreted through the binary operation ⊗D, we define ⊗D to take the independent product of two dis- tributions when possible. Now, analogously, we want to interpret the formula 𝑃 ⊛ 𝑄 through a binary operation ⊕: 𝑥 |=V∗ 𝑃 ⊛ 𝑄 iff there exist 𝑥′, 𝑦, 𝑧 s.t. 𝑥 ⊒ 𝑥′ ∈ 𝑦 ⊕ 𝑧, 𝑦 |=V∗ 𝑃 and 𝑧 |=V∗ 𝑄 such that 𝑃 ⊛ 𝑄 ensures that 𝑃 and𝑄 hold on negatively associated components of distributions — we use ⊛ for the separating conjunction interpreted on XPNA to distinguish it from the separating conjunction for asserting independence. But negative association is defined for a set of variables instead of two (groups) of variables, and it is unclear what “𝑃,𝑄 holds on negatively associated compo- nents” should mean. In the following, we explore several plausible definitions of ⊕, with the goal that we can express a set of variables 𝑥1, . . . , 𝑥𝑛 is negatively associated using formulas involving ⊛. 3.3.1 Initial Attempts at a BI Frame for Negative Association One first attempt is to let 𝜇1⊕ 𝜇2 be the set of distributions that agree with 𝜇1, 𝜇2, and satisfy strong NA — we say 𝜇 satisfies strong NA if dom(𝜇) satisfies NA. 61 Definition 3.3.1. (Attempt 1: Strong NA model) Recall that 𝑋D = 𝐸D = ∪𝑆⊆VarD(Mem[𝑆]), and for 𝜇, 𝜇′ ∈ 𝑋D, we have 𝜇 ⊑D 𝜇′ iff dom(𝜇) ⊆ dom(𝜇′) and 𝜋dom(𝜇)𝜇 ′ = 𝜇. Define ⊕𝑠 : 𝑋D × 𝑋D → P(𝑋D): 𝜇1⊕𝑠𝜇2 = {𝜇 ∈ D(Mem[𝑆∪𝑇]) | 𝜇 satisfies strong NA, 𝜋𝑆𝜇 = 𝜇1, 𝜋𝑇𝜇 = 𝜇2, 𝑆∩𝑇 = ∅}. We call X𝑠 = (𝑋, ⊑, ⊕𝑠, 𝐸𝑠) the strong NA structure. 
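Strong NA over Boolean memories can be tested mechanically. The sketch below (function names are our own) checks the defining inequality over all disjoint coordinate sets and all pairs of monotone {0,1}-valued functions, i.e., indicators of upward-closed sets; for one-side-bounded monotone functions this restriction suffices by a standard layer-cake decomposition.

```python
from itertools import combinations, product

def upsets(k):
    """All upward-closed subsets of {0,1}^k; their indicator functions are
    exactly the monotone {0,1}-valued maps on k Boolean coordinates."""
    pts = list(product((0, 1), repeat=k))
    below = lambda a, b: all(x <= y for x, y in zip(a, b))
    for bits in product((0, 1), repeat=len(pts)):
        s = frozenset(p for p, b in zip(pts, bits) if b)
        if all(q in s for p in s for q in pts if below(p, q)):
            yield s

def strong_na(mu, n):
    """mu: dict from n-bit memories (tuples) to probabilities. Tests the
    NA inequality E[f*g] <= E[f]*E[g] over all disjoint coordinate sets
    I, J and all pairs of monotone indicator functions on them."""
    for r in range(1, n):
        for I in combinations(range(n), r):
            rest = [i for i in range(n) if i not in I]
            for s in range(1, len(rest) + 1):
                for J in combinations(rest, s):
                    for U in upsets(len(I)):
                        for V in upsets(len(J)):
                            ef = eg = efg = 0.0
                            for m, pr in mu.items():
                                fv = tuple(m[i] for i in I) in U
                                gv = tuple(m[j] for j in J) in V
                                ef += pr * fv
                                eg += pr * gv
                                efg += pr * fv * gv
                            if efg > ef * eg + 1e-12:
                                return False
    return True

# One-hot on two bits is NA (Theorem 3.2.2); perfectly correlated bits are not.
assert strong_na({(1, 0): 0.5, (0, 1): 0.5}, 2)
assert not strong_na({(0, 0): 0.5, (1, 1): 0.5}, 2)
```

This brute-force check is exponential in the number of variables, so it is a specification aid rather than an analysis technique; the program logic developed below is what scales.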
Unfortunately, the strong NA structure fails to satisfy the Unit Existence condition: if 𝜇 does not satisfy strong NA, then there exists no 𝜇′ that marginalizes to 𝜇 and satisfies strong NA; and because our definition of 𝑒 ⊕𝑠 𝜇 only includes distributions 𝜇′ that marginalize to 𝜇, there is no 𝑒 such that 𝑒 ⊕𝑠 𝜇 is non-empty. The failure of this property implies that whether or not two states can be combined depends not just on how the two states relate to each other, but also critically on properties of the single states in isolation (e.g., whether a distribution satisfies strong NA); this is hard to justify if we are to read ⊕ as describing which pairs of states can be safely combined.

Looking for a different way of capturing NA, we try working with a weaker notion of NA. We try letting 𝜇1 ⊕ 𝜇2 return the distributions that agree with 𝜇1, 𝜇2 in which any variable 𝑥 in dom(𝜇1) must be negatively associated with any variable 𝑦 in dom(𝜇2), but variables within dom(𝜇1) and variables within dom(𝜇2) need not be negatively associated. We call this notion weak NA.

Definition 3.3.2 (Weak NA). Let 𝑆 ⊆ Var be a set of variables, and let 𝐴, 𝐵 be two disjoint subsets of 𝑆. A distribution 𝜇 ∈ D(Mem[𝑆]) satisfies (𝐴, 𝐵)-NA if for every pair of both monotone or both antitone functions 𝑓 : Mem[𝐴] → R and 𝑔 : Mem[𝐵] → R, where we take the point-wise orders on Mem[𝐴] and Mem[𝐵], and where 𝑓 and 𝑔 are each either lower bounded or upper bounded, we have

E𝑚∼𝜇 [ 𝑓 (𝜋𝐴𝑚) · 𝑔(𝜋𝐵𝑚)] ≤ E𝑚∼𝜇 [ 𝑓 (𝜋𝐴𝑚)] · E𝑚∼𝜇 [𝑔(𝜋𝐵𝑚)].

By definition, being (𝐴, 𝐵)-NA for all disjoint 𝐴, 𝐵 ⊆ 𝑆 is equivalent to strong NA on 𝑆. Also, (𝐴, 𝐵)-NA is closed under projection in the sense that if 𝜇 satisfies (𝐴, 𝐵)-NA and 𝐴′ ⊆ 𝐴, 𝐵′ ⊆ 𝐵, then 𝜇 satisfies (𝐴′, 𝐵′)-NA as well. Now, we try defining a model based on weak NA.

Definition 3.3.3. (Attempt 2: Weak NA model) Define ⊕𝑤 : 𝑋D × 𝑋D → P(𝑋D):

𝜇1 ⊕𝑤 𝜇2 = {𝜇 ∈ D(Mem[𝑆 ∪ 𝑇]) | 𝜇 satisfies (𝑆, 𝑇)-NA, 𝜋𝑆𝜇 = 𝜇1, 𝜋𝑇𝜇 = 𝜇2, 𝑆 ∩ 𝑇 = ∅}.
We call X𝑤 = (𝑋D, ⊑D, ⊕𝑤, 𝐸D) the weak NA structure. This weak NA structure satisfies most BI frame conditions, except that Asso- ciativity is unclear. In short, the definition of ⊕𝑤 and Associativity requires that: if 𝑤 satisfies (𝑅 ∪ 𝑆, 𝑇)-NA and (𝑅, 𝑆)-NA, then 𝑤 also satisfies (𝑆, 𝑇)-NA and (𝑅, 𝑆 ∪ 𝑇)-NA. Now 𝑤 satisfies (𝑆, 𝑇)-NA by projection closure, but it is unclear whether (𝑅, 𝑆∪𝑇)-NA follows from these conditions. Failing to satisfy Associa- tivity would lead to a logic where separating conjunction is not associative, and significantly more difficult to use. Since we could not find a counter-example nor prove that X𝑤 satisfies Associativity and thus forms a BI frame, we will leave this question as an open problem and define another structure to capture negative association. 3.3.2 Our BI Frame for Negative Association Facing the problems with the strong NA structure and the weak NA structures, we define a BI model for negative association based on a new notion of negative association called S-partition negative association (S-PNA), where S is a partition of a set of random variables. This notion interpolates weak NA and strong NA in 63 the following sense: when 𝐴, 𝐵 are both sets of variables, {𝐴, 𝐵}-PNA is equiva- lent to (𝐴, 𝐵)-NA for disjoint 𝐴, 𝐵, and {{𝑥} | 𝑥 ∈ 𝑆}-PNA is equivalent to strong NA for distributions in D(Mem[𝑆]). We say a partition S′ coarsens a partition S if ∪S = ∪S′ and for any 𝑠′ ∈ S′, 𝑠′ = ∪R for some R ⊆ S. In particular, any partition S coarsens itself. Definition 3.3.4 (Partition Negative Association). A distribution 𝜇 is S-PNA if and only if for any T that coarsens S, for any family of non-negative mono- tone functions (or family of non-negative antitone functions) { 𝑓𝐴 : Mem[𝐴] → R+}𝐴∈T ,3 where for each 𝐴 ∈ T the order on Mem[𝐴] is taken to be the point- wise order, we have E𝑚∼𝜇 [∏ 𝐴∈T 𝑓𝐴 (𝜋𝐴𝑚) ] ≤ ∏ 𝐴∈T E𝑚∼𝜇 [ 𝑓𝐴 (𝜋𝐴𝑚)] . We can use PNA to encode NA: Theorem 3.3.1. 
Given a set of variables 𝑆, 𝑆 satisfies NA in 𝜇 iff 𝜇 satisfies S-PNA for any S partitioning 𝑆 iff 𝜇 satisfies {{𝑥} | 𝑥 ∈ 𝑆}-PNA. See appendix B.2.1 for the proof. We require PNA to be closed under coars- ening, which helps us to prove the structure defined next is a BI frame. Definition 3.3.5. Define the operation ⊕ : 𝑋D × 𝑋D → P(𝑋D): 𝜇1 ⊕ 𝜇2 = {𝜇 ∈D(Mem[𝑆 ∪ 𝑇]) | 𝜋𝑆𝜇 = 𝜇1, 𝜋𝑇𝜇 = 𝜇2, 𝜇 is (S ∪ T )-PNA for any partition S,T such that 𝜇1 is S-PNA, 𝜇2 is T -PNA, and (∪S) ∩ (∪T ) = ∅.} 3We restrict the family of functions to be non-negative: prior work like Joag-Dev and Proschan [1983] has assumed non-negativity when working with notions of NA on partitions; furthermore, without that requirement, for partitions with an odd number of components, PNA would be equivalent to independence, a strange property. 64 This definition of ⊕ interpolates ⊕𝑤 and ⊕𝑠, in the following sense. Theorem 3.3.2. For any two states 𝜇1, 𝜇2 ∈ 𝑋 , 𝜇1 ⊕𝑠 𝜇2 ⊆ 𝜇1 ⊕ 𝜇2 ⊆ 𝜇1 ⊕𝑤 𝜇2. The first inclusion is because 𝜇 satisfying strong NA implies 𝜇 is R-PNA for any partition R on dom(𝜇). The second inclusion is because 𝜇1 ∈ D(Mem[𝑆]) satisfies {𝑆}-PNA and 𝜇2 ∈ D(Mem[𝑇]) satisfies {𝑇}-PNA trivially, which im- plies any 𝜇 ∈ 𝜇1 ⊕ 𝜇2 would satisfy (𝑆, 𝑇)-NA. Note that ⊕ is non-deterministic, and not just partial. Theorem 3.3.3. There are distributions 𝜇1, 𝜇2 such that |𝜇1 ⊕ 𝜇2 | ≥ 2. Proof. Let 𝜇1 ∈ D(Mem[{𝑥}]) and 𝜇2 ∈ D(Mem[{𝑦}]) be uniform distribu- tion over memories over boolean variables 𝑥, 𝑦. Then the independent product 𝜇∗ ∈ 𝜇1 ⊗D 𝜇2 is in 𝜇1 ⊕ 𝜇2, because the projections to 𝑥 and to 𝑦 are 𝜇1 and 𝜇2 respectively, and 𝜇∗ satisfies PNA since independence implies PNA (we will see this shortly in Theorem 3.4.5). But the one-hot uniform distribution 𝜇𝑜ℎ over variables 𝑥 and 𝑦, i.e., 𝜇𝑜ℎ ( [𝑥 ↦→ 1, 𝑦 ↦→ 0]) = 𝜇𝑜ℎ ( [𝑥 ↦→ 0, 𝑦 ↦→ 1]) = 1/2, is also in 𝜇1 ⊕ 𝜇2, since again the projections match 𝜇1 and 𝜇2 and the one-hot distribution satisfies NA, and hence PNA. Since 𝜇𝑜ℎ ≠ 𝜇∗, we are done. 
□ Thus, we build the following BI frame that crucially uses a non-deterministic binary operation on states to capture negative association. Theorem 3.3.4. The structure XPNA = (𝑋D, ⊑D, ⊕, 𝐸D) is a Down-Closed BI frame. For the frame conditions where the previous attempts failed, (Unit Exis- tence) holds by letting the unit 𝑒 to always be the trivial distribution on the empty set, and (Associativity) can be proved using the facts that PNA is closed 65 under coarsening and coarsening commute with projections. See the full proof in Appendix B.2.2. Furthermore, the binary combination ⊕ in XPNA captures negative associa- tion: consider the atomic propositions introduced in eq. (2.3) and the valuation V∗ for them, clearly (XPNA,V∗) forms a BI model; when interpreting BI for- mulas on the model (XPNA,V∗), we can express negative association of a set of variables using the separating conjunction ⊛. We use the iterative version of the connective ∗, which is well-defined because it is associative. Definition 3.3.6. For any connective ⊙ ∈ {∧,∨,⊛, ∗}, we use the corresponding big-connective ⊙ ∈ {∧ , ∨ ,⊛,∗}. • For any constant or logical variable 𝑁 ≥ 1, let ⊙𝑁 𝑖=0 𝑃𝑖 = 𝑃0 abbreviate ((𝑃0⊙𝑃1)⊙· · · )⊙𝑃𝑁−1. Formally, let ⊙𝑁 𝑖=0 𝑃𝑖 = ⊤ if 𝑁 = 0, and let ⊙𝑁 𝑖=0 𝑃𝑖 ≜(⊙𝑁−1 𝑖=0 𝑃𝑖 ) ⊙ 𝑃𝑁 for 𝑁 > 0. • For a finite multiset of formula {𝑃𝑖}𝑖∈𝑆, let ⊙ 𝑠∈𝑆 𝑃𝑠 abbreviate ((𝑃𝑠0 ⊙𝑃𝑠1) ⊙ · · · ) ⊙ 𝑃𝑠𝑘 , where 𝑠0, . . . , 𝑠𝑘 is an arbitrary ordering of 𝑆. The satisfaction is not ambiguous since ⊙ is associative and commutative. • For any program variable 𝑣 ∈ Var, for any state 𝜇 |= [𝑣 = 𝑁], we want ⊙𝑣 𝑖=0 𝑃𝑖 to be equivalent to ⊙𝑁 𝑖=0 𝑃𝑖. Formally, ⊙𝑣 𝑖=0 𝑃𝑖 abbreviates∨ 𝑁∈Val( [𝑣 = 𝑁] ∧ ⊙𝑁 𝑖=0 𝑃𝑖). Theorem 3.3.5. Let 𝑆 be any subset of Var. A set of randomized program variables 𝑌 = {𝑦𝑖 | 0 ≤ 𝑖 < 𝐾} satisfies NA in the distribution 𝜇 ∈ D(Mem[𝑆]) if and only if we have 𝜇 |=⊛𝐾 𝑖=0 Own(𝑦𝑖). Proof. Forward direction: We denote {𝑦𝑖} as 𝑌 [𝑖], and denote {𝑦𝑖 | 0 ≤ 𝑖 < 𝑗} as 𝑌 [: 𝑗]. 
We prove by induction on 𝑗 that 𝜋𝑌 [: 𝑗]𝜇 |=⊛ 𝑗 𝑖=0 Own(𝑦𝑖). 66 Base case 𝑗 = 1 : Trivially, 𝜋𝑌 [:1]𝜇 |= Own(𝑦0), and then by persistence. Inductive case 𝑗 ≥ 1 : Assuming 𝜋𝑌 [: 𝑗]𝜇 |= ⊛ 𝑗 𝑖=0 Own(𝑦𝑖). Since 𝑌 satisfies NA in 𝜇, by Theorem 3.3.1, 𝜇 is T -PNA for any partition T of 𝑌 . In particu- lar, for any partition T1 on 𝑌 [: 𝑗] and any (trivial) partition T2 on 𝑌 [ 𝑗], 𝜇 must be T1 ∪ T2-PNA. Thus, 𝜋𝑌 [: 𝑗+1]𝜇 ∈ 𝜋𝑌 [: 𝑗]𝜇 ⊕ 𝜋𝑦[ 𝑗+1]𝜇. Since 𝜋𝑌 [ 𝑗]𝜇 |= ⊛ 𝑗 𝑖=0 Own(𝑦𝑖) and 𝜋𝑌 [ 𝑗]𝜇 |= Own(𝑦 𝑗 ), that implies 𝜋𝑌 [: 𝑗+1] |=⊛ 𝑗+1 𝑖=0 Own(𝑦𝑖). Thus, we have 𝜋𝑌 𝜇 |=⊛𝐾 𝑖=0 Own(𝑦𝑖). By persistence, 𝜇 |=⊛𝐾 𝑖=0 Own(𝑦𝑖). Backward direction: for any 𝐴, 𝐵 being disjoint subsets of [𝑛], by commuta- tivity and associativity of ⊛, we can reorder formula and get 𝜇 |= ( ⊛ 𝑖∈𝐴 Own(𝑦𝑖) ⊛⊛ 𝑖∈𝐵 Own(𝑦𝑖) ) ⊛ ⊛ 𝑦𝑖∈[𝑛]\(𝐴∪𝐵) Own(𝑦𝑖) By satisfaction rules, there exists 𝜇1, 𝜇2, 𝜇 ′ such that 𝜇 ⊒ 𝜇′ ∈ 𝜇1 ⊕ 𝜇2, and 𝜇1 |= ⊛𝑖∈𝐴 Own(𝑦𝑖), and 𝜇2 |= ⊛𝑖∈𝐵 Own(𝑦𝑖). Note that 𝜇1 is trivially {𝐴}-PNA, and 𝜇2 is trivially {𝐵}-PNA. Thus, 𝜇′ satisfies {𝐴, 𝐵}-PNA. Therefore, 𝜇 satisfies (𝐴, 𝐵)-NA for any 𝐴, 𝐵 being disjoint subsets of 𝑌 , i.e., 𝜇 satisfies NA on 𝑌 . □ 3.4 𝑀-BI: Combining BI Models Now that we have a BI model for capturing independence and a BI model for capturing negative association, we want to combine them and design an as- sertion logic that can express both independence and negative association; fur- thermore, it would be helpful to internalize the fact that independence implies 67 negative association in the assertion logic. To achieve that goal, we now ex- tend bunched logic to support multiple separating conjunctions related by a pre-order. While our motivation is to use one separating conjunction to assert independence, and use another to assert negative association, the logic poten- tially also has other interesting models. 3.4.1 The Syntax and Proof Rules Let AP be a set of atomic propositions, and (𝑀, ≤) be a finite pre-order. 
The formula in the logic of 𝑀-bunched implications (𝑀-BI) has the following gram- mar: 𝑃,𝑄 ::= 𝑝 ∈ AP | ⊤ | 𝐼𝑚∈𝑀 | ⊥ | 𝑃 ∧𝑄 | 𝑃 ∨𝑄 | 𝑃→ 𝑄 | 𝑃 ∗𝑚∈𝑀 𝑄 | 𝑃 −∗𝑚∈𝑀 𝑄. We associate each element of 𝑚 ∈ 𝑀 with a separating conjunction ∗𝑚, a corresponding multiplicative identity 𝐼𝑚 and a separating implication −∗𝑚. The proof system for M-BI is based on the proof system for BI, with an indexed copy of rules for each separation, and additionally has the ∗-WEAKENING rules. We present the full Hilbert-style proof system in fig. 3.1. The new rule ∗- WEAKENING simply says that the separation conjunction associated with a big- ger element in 𝑀 is weaker: if 𝑚1 ≤ 𝑚2, then the assertion 𝑃 ∗𝑚1 𝑄 implies 𝑃 ∗𝑚2 𝑄. We can derive analogous weakening rules for separating implications and multiplicative identities, in the reverse direction. 68 𝑃 ⊢ 𝑃 AX 𝑃 ⊢ ⊤ TOP ⊥ ⊢ 𝑃 BOT 𝑃 ⊢ 𝑅 𝑄 ⊢ 𝑅 𝑃 ∨𝑄 ⊢ 𝑅 ∨-E 𝑃 ⊢ 𝑄𝑖 𝑃 ⊢ 𝑄1 ∨𝑄2 ∨-I 𝑃 ⊢ 𝑄 𝑃 ⊢ 𝑅 𝑃 ⊢ 𝑄 ∧ 𝑅 ∧-I-R 𝑄 ⊢ 𝑅 𝑃 ∧𝑄 ⊢ 𝑅 ∧-I-L 𝑃 ⊢ 𝑄1 ∧𝑄2 𝑃 ⊢ 𝑄𝑖 ∧-E 𝑃 ∧𝑄 ⊢ 𝑅 𝑃 ⊢ 𝑄 → 𝑅 →-I 𝑃 ⊢ 𝑄 → 𝑅 𝑃 ⊢ 𝑄 𝑃 ⊢ 𝑅 →-E 𝑃 ⊢ 𝑅 𝑄 ⊢ 𝑆 𝑃 ∗𝑚 𝑄 ⊢ 𝑅 ∗𝑚 𝑆 ∗-CONJ 𝑃 ∗𝑚 𝑄 ⊢ 𝑅 𝑃 ⊢ 𝑄 −∗𝑚 𝑅 −∗-I 𝑃 ⊢ 𝑄 −∗𝑚 𝑅 𝑆 ⊢ 𝑄 𝑃 ∗𝑚 𝑆 ⊢ 𝑅 −∗-E 𝑃 ⊣⊢ 𝑃 ∗𝑚 𝐼𝑚 ∗-UNIT 𝑃 ∗𝑚 𝑄 ⊢ 𝑄 ∗𝑚 𝑃 ∗-COMM (𝑃 ∗𝑚 𝑄) ∗𝑚 𝑅 ⊣⊢ 𝑃 ∗𝑚 (𝑄 ∗𝑚 𝑅) ∗-ASSOC 𝑚1 ≤ 𝑚2 𝑃 ∗𝑚1 𝑄 ⊢ 𝑃 ∗𝑚2 𝑄 ∗-WEAKENING Figure 3.1: Hilbert system for 𝑀-BI Lemma 3.4.1. The following rules are derivable in 𝑀-BI: 𝑚1 ≤ 𝑚2 𝑃 −∗𝑚2 𝑄 ⊢ 𝑃 −∗𝑚1 𝑄 −∗-WEAKENING 𝑚1 ≤ 𝑚2 𝐼𝑚2 ⊢ 𝐼𝑚1 UNITWEAKENING 3.4.2 Semantics As is standard with bunched logics, we give a Kripke style semantics to 𝑀-BI. We will define a structure called 𝑀-BI frame, and then define 𝑀-BI models and the satisfaction rules on 𝑀-BI models. An 𝑀-BI frame is a collection of BI frames sharing the same set of states and pre-order, with ordered binary operations. 69 Definition 3.4.1 (𝑀-BI Frame). 
An 𝑀-BI frame is a structureX = (𝑋, ⊑, ⊕𝑚∈𝑀 , 𝐸𝑚) such that for each 𝑚, (𝑋, ⊑, ⊕𝑚, 𝐸𝑚) is a BI frame (see definition 2.2.1), and there is a preorder ≤ on 𝑀 satisfying: 𝑚1 ≤ 𝑚2 → 𝑥 ⊕𝑚1 𝑦 ⊆ 𝑥 ⊕𝑚2 𝑦 (Operation Inclusion) The Operation Inclusion condition together with the frame conditions of BI imply an inclusion on unit sets: Lemma 3.4.2. Let X be an 𝑀-BI frame. If 𝑚1 ≤ 𝑚2 then 𝐸𝑚2 ⊆ 𝐸𝑚1 . Proof. Let 𝑒2 ∈ 𝐸𝑚2 . By Unit Existence, there exists 𝑒1 ∈ 𝐸𝑚1 such that 𝑒2 ∈ 𝑒1 ⊕𝑚1 𝑒2. By Operation Inclusion, 𝑒2 ∈ 𝑒1 ⊕𝑚2 𝑒2, so Unit Coherence implies that 𝑒1 ⊑ 𝑒2, and then Unit Closure implies 𝑒2 ∈ 𝐸𝑚1 . So 𝐸𝑚2 ⊆ 𝐸𝑚1 . □ To obtain a 𝑀-BI model over a given 𝑀-BI frame, we need a valuation that defines which states in the 𝑀-BI frame satisfy each atomic proposition. Again, for the soundness of the proof system, the valuation must be persistent: any formula true at a state remains true at any larger state. Definition 3.4.2 (Valuation and model). An 𝑀-BI model (X,V) is an 𝑀-BI frame X = (𝑋, ⊑, ⊕𝑚, 𝐸𝑚) associated with a persistent valuationV on it. Next, we define the satisfaction of 𝑀-BI formula in a 𝑀-BI model. The defi- nition is almost the same as fig. 2.3, except that it supports the 𝑀-indexed sepa- ration conjunctions, implications, and units. Definition 3.4.3. On an 𝑀-BI model (X,V), we define the satisfaction relation |=V between states in X and 𝑀-BI formula: for any 𝑥 ∈ X, 70 𝑥 |=V ⊤ always 𝑥 |=V ⊥ never 𝑥 |=V 𝐼𝑚 iff 𝑥 ∈ 𝐸𝑚 𝑥 |=V p iff 𝑥 ∈ V(p) 𝑥 |=V 𝑃 ∧𝑄 iff 𝑥 |=V 𝑃 and 𝑥 |=V 𝑄 𝑥 |=V 𝑃 ∨𝑄 iff 𝑥 |=V 𝑃 or 𝑥 |=V 𝑄 𝑥 |=V 𝑃→ 𝑄 iff for all 𝑦 ⊒ 𝑥, 𝑦 |=V 𝑃 implies 𝑦 |=V 𝑄 𝑥 |=V 𝑃 ∗𝑚 𝑄 iff there exist 𝑥′, 𝑦, 𝑧 s.t. 𝑥 ⊒ 𝑥′ ∈ 𝑦 ⊕𝑚 𝑧, 𝑦 |=V 𝑃 and 𝑧 |=V 𝑄 𝑥 |=V 𝑃 −∗𝑚 𝑄 iff for all 𝑦, 𝑧 s.t. 𝑧 ∈ 𝑥 ⊕𝑚 𝑦, 𝑦 |=V 𝑃 implies 𝑧 |=V 𝑄 Analogous to the case in standard BI, we say 𝑃 |= 𝑄 iff, for all models (X,V), for any state 𝑥 ∈ X, 𝑥 |=V 𝑃 implies 𝑥 |=V 𝑄. 
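The effect of Operation Inclusion on the satisfaction relation can be illustrated on a deliberately tiny model. The sketch below (a hypothetical carrier, not the distribution model of this chapter) builds two combination operations with ⊕0 ⊆ ⊕1 and checks semantically that the entailment behind ∗-WEAKENING holds: whenever a state satisfies 𝑃 ∗0 𝑄, it also satisfies 𝑃 ∗1 𝑄.

```python
from itertools import chain, combinations

# States: subsets of {a, b}, ordered by inclusion (a toy carrier).
STATES = [frozenset(c) for c in chain.from_iterable(
    combinations("ab", r) for r in range(3))]

def oplus(m, x, y):
    """Two combinations with oplus(0, ...) a subset of oplus(1, ...)
    pointwise, mirroring Operation Inclusion for the preorder 0 <= 1."""
    if m == 0:
        return {x | y} if not (x & y) else set()
    return {x | y}

def sat_star(m, x, P, Q):
    """x |= P *_m Q: some x' below x with x' in y oplus_m z, y |= P, z |= Q."""
    return any(
        xp <= x and xp in oplus(m, y, z) and P(y) and Q(z)
        for xp in STATES for y in STATES for z in STATES
    )

owns_a = lambda s: "a" in s  # persistent: closed under growing the state
owns_b = lambda s: "b" in s

# *-WEAKENING, checked semantically on every state of the model.
for x in STATES:
    assert not sat_star(0, x, owns_a, owns_b) or sat_star(1, x, owns_a, owns_b)
```

The converse entailment fails in general, which is exactly why the logic keeps a family of conjunctions ordered by 𝑀 rather than a single one.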
We prove that the proof system for 𝑀-BI is sound and complete with respect to its semantics using the duality- theoretic framework proposed by Docherty [2019]. Theorem 3.4.3. Let 𝑃 and 𝑄 be any two 𝑀-BI formulas. Then 𝑃 |= 𝑄 iff 𝑃 ⊢ 𝑄. We show the proof in appendix B.3 3.4.3 A 𝑀-BI Model for Independence and NA We now combineXPNA with the BI frameXD to construct a 𝑀-BI frame. Since the separating conjunction in XD captures independence, and separating conjunc- tion in XPNA captures negative association, we can expect to use 𝑀-BI formulas interpreted on the combined model to express both probabilistic independence and negative association. We combine XD and XPNA into a 2̂-BI model where 2̂ denotes the set {0, 1} ordered as 0 ≤ 1, the index 0 is associated with the independent combination ⊗D and the index 1 is associated with the NA combination ⊕. 71 Theorem 3.4.4. The structure X𝑁𝐴 = (𝑋D, ⊑D, (⊗D, ⊕), (𝐸D, 𝐸D)) forms a 2̂-BI frame. Proving X𝑁𝐴 forms a 2̂-BI boils down to showing that for any 𝜇1, 𝜇2 ∈ 𝑋 , 𝜇1 ⊗D 𝜇2 ⊆ 𝜇1 ⊕ 𝜇2. The inclusion is implied by the following theorem generalizing the indepen- dence closure for NA (theorem 3.2.2). Its proof, however, is more involved be- cause PNA is more expressive and is closed under coarsening. Theorem 3.4.5 (Independence implies PNA). Let 𝑆, 𝑇 ⊆ Var be two disjoint sets of variables. Suppose 𝜇1 ∈ D(Mem[𝑆]), 𝜇2 ∈ D(Mem[𝑇]). If 𝜇1 satisfies S-PNA and 𝜇2 satisfies T -PNA, then any 𝜇 ∈ 𝜇𝑆 ⊗D 𝜇𝑇 satisfies (S ∪ T )-PNA. The proof is based on the observation that: for any coarsening R of S ∪ T , any block 𝑝 in R is the union of some blocks from S and some blocks from T . Intuitively, by the independence closure for NA, for any block 𝑝 in R, the blocks from S and T that are in 𝑝 are negatively associated with the rest of the blocks in S and T . Because any other block in R is formed by merging some remaining blocks in S and some remaining blocks in T , the block 𝑝 is also negatively asso- ciated with any other block in R. 
Formally, we prove this by induction on the number of blocks in the coarsening R (see appendix B.4.1).

Because X𝑁𝐴 has the same carrier set as XD, we can combine X𝑁𝐴 with the persistent valuation V∗ : AP → 𝑋D (Definition 2.3.5) to form a 2̂-BI model (X𝑁𝐴, V∗). In the remainder of this chapter, we take 2̂-BI formulas interpreted in this model as our assertion logic.

3.5 Logic of Independence and Negative Association

3.5.1 Assertion Logic

When designing a separation logic for reasoning about independence and negative association, we also want the assertion logic to satisfy restriction (lemma 2.3.7) so that, to check whether a distribution satisfies 𝜑, it suffices to check whether the distribution's projection onto FV(𝜑) satisfies 𝜑. Previously, to prove the soundness of the program logic in section 2.3.3, we showed that all BI formulas satisfy the restriction property when interpreted on XD. Here, not all 2̂-BI formulas satisfy the restriction property when interpreted on X𝑁𝐴; we identify a subset MBI+ that does.

Definition 3.5.1. We define MBI+ as

MBI+ ∋ 𝑃, 𝑄 ::= 𝑝 ∈ AP | ⊤ | ⊥ | 𝑃 ∧ 𝑄 | 𝑃 ∨ 𝑄 | 𝑃 → 𝑄 | 𝑃 ∗ 𝑄 | 𝑃 −∗ 𝑄 | 𝑃 ⊛ 𝑄

where AP is defined as in eq. (2.3). MBI+ omits the multiplicative identities 𝐼𝑚 because on (X𝑁𝐴, V∗) they are all equivalent to ⊤. The only limitation is that MBI+ excludes the use of −⊛.

Theorem 3.5.1 (Restriction). For any distribution 𝜇 ∈ 𝑋D, any MBI+ formula 𝜑 interpreted on (X𝑁𝐴, V∗), and any valuation V, we have 𝜇 |=V 𝜑 ⇔ 𝜋FV(𝜑)𝜇 |=V 𝜑.

We defer its proof to appendix B.4.3. Indeed, we can exhibit a counterexample showing that −⊛ does not satisfy restriction.

Theorem 3.5.2. There exist 𝜇 ∈ D(Mem[𝑆]) and a formula 𝜑 such that 𝜇 |= 𝜑 but 𝜋FV(𝜑)𝜇 ̸|= 𝜑.

We also defer its proof to appendix B.4.3. Below, we consider MBI+ formulas on the (X𝑁𝐴, V∗) model as the assertion logic. In this assertion logic, all the axioms for the independence BI model (section 2.3.2) and the NA BI model (section 3.3.2) still hold.
Also, because (X𝑁𝐴,V∗) is a conservative extension of (XPNA,V∗), the theorem theorem 3.3.5 that says NA is captured by separating conjunction ⊛ also still holds. We also have some new axioms for the nega- tive association conjunction, which involves two new distributions introduced below. Definition 3.5.2 (One-Hot Vectors). Let oh(𝑛) denote the set of one-hot vectors of length 𝑛, where a one-hot vector [. . . , 1, . . . ] has exactly one entry set to 1 and all other entries set to 0. We abbreviate Unifoh(𝑛) as OH𝑛. To describe the next distribution, we generalize the function Unif (−) so that it can also be used to describe uniform distributions over multi-sets. A multi- set is an unordered collection of items that allow an item to occur more than once. When 𝐴 is an multi-set, the distribution Unif𝐴 assigns the outcome 𝑥 with weight Unif𝐴 (𝑎) = Multiplicity of 𝑥∑ 𝑦∈𝐴 Multiplicity of 𝑦 . It is clear that, when 𝐴 is simply a set, this definition agrees with our definition of uniform distribution over a set in fig. 2.6. Definition 3.5.3 (Permutations). Given a finite multi-set of 𝐴, a permutation of 𝐴 is a bijective function 𝛼 : 𝐴 → 𝐴. We let permutation(𝐴) be the multi-set of 𝐴’s permutations. When 𝐴 has duplicates, we distinguish them using addi- 74 tional labels; so there are always |𝐴|! elements in permutation(𝐴). We abbreviate Unifpermutation(𝐴) as Permu𝐴. Then, we have the following axioms that introduce formulas that assert neg- ative association among variables. Lemma 3.5.3. Let 𝑥𝛾 be variables. The following axioms are valid in (X𝑁𝐴,V∗). |= OH𝑁 ⟨[𝑥0, . . . , 𝑥𝑁−1]⟩ → 𝑁 ⊛ 𝛾=0 Own(𝑥𝛾) (OH-PNA) |= Permu𝐴⟨[𝑥0, . . . , 𝑥𝑁−1]⟩ → 𝑁 ⊛ 𝛾=0 Own(𝑥𝛾) (Perm-PNA) The two axioms follow from Theorem 3.2.2, which shows that random vari- ables in one-hot distributions and permutation distributions are NA, and Theo- rem 3.3.5, which shows that ⊛ captures the NA of random variables. We can also encode the monotone map closure in Theorem 3.2.3 as an axiom in the logic. 
Lemma 3.5.4 (BINARY MONOTONE MAP). The following is valid in (X𝑁𝐴,V∗). |= (𝜑 ⊛ 𝜂 ∧ [𝑦 = 𝑓 (FV(𝜑))]) → Own(𝑦) ⊛ 𝜂 where 𝑓 is monotone (Binary-Mono-Map) Proof. For any 𝜇 |= 𝜑 ⊛ 𝜂 ∧ [𝑦 = 𝑓 (FV(𝜑))], there exists 𝜇1, 𝜇2, 𝜇 ′ such that 𝜇 ⊒ 𝜇′ ∈ 𝜇1 ⊕ 𝜇2, 𝜇1 |= 𝜑 and 𝜇2 |= 𝜂; furthermore, for any 𝑚 such that 𝜇(𝑚) > 0, ⟦𝑦⟧(𝑚) = ⟦ 𝑓 (𝑋)⟧(𝑚). Let 𝑆 = dom𝜇2. and let 𝜇′′ denote 𝜋𝑆∪{𝑦}𝜇. We want to show that 𝜇′′ ∈ (𝜋𝑦𝜇) ⊕ 𝜇2. For any partition {𝑆1, . . . , 𝑆𝑘 } of 𝑆, for any family of non-negative all 75 monotone or all antitone functions 𝑓0, 𝑓1, . . . , 𝑓𝑘 , E𝑚∈𝜇′′  𝑓0(𝜋𝑦𝑚) · ∏ 𝑖∈[𝑘] 𝑓𝑖 (𝜋𝑆𝑖𝑚)  = E𝑚∈𝜇  𝑓0(𝜋𝑦𝑚) · ∏ 𝑖∈[𝑘] 𝑓𝑖 (𝜋𝑆𝑖𝑚)  (𝜇′′ is a marginalization of 𝜇) = E𝑚∈𝜇  𝑓0( 𝑓 (𝜋𝑋𝑚)) · ∏ 𝑖∈[𝑘] 𝑓𝑖 (𝜋𝑆𝑖𝑚)  (Because ⟦𝑦⟧(𝑚) = ⟦ 𝑓 (𝑋)⟧(𝑚)) ≤ E𝑚∈𝜇 [ 𝑓0( 𝑓 (𝜋𝑋𝑚))] · ∏ 𝑖∈[𝑘] E𝑚∈𝜇′′ [ 𝑓𝑖 (𝜋𝑆𝑖𝑚) ] (Because 𝜇 ⊒ 𝜇′ ∈ 𝜇1 ⊕ 𝜇2) ≤ E𝑚∈𝜇 [ 𝑓0(𝑦)] · ∏ 𝑖∈[𝑘] E𝑚∈𝜇′′ [ 𝑓𝑖 (𝜋𝑆𝑖𝑚) ] (Because ⟦𝑦⟧(𝑚) = ⟦ 𝑓 (𝑋)⟧(𝑚)) ≤ E𝑚∈𝜇′′ [ 𝑓0(𝑦)] · ∏ 𝑖∈[𝑘] E𝑚∈𝜇′′ [ 𝑓𝑖 (𝜋𝑚𝑆𝑖)] (𝜇′′ is a marginalization of 𝜇) Also, because 𝜇′′ = 𝜋𝑆∪{𝑦}𝜇, we have 𝜋𝑦𝜇′′ = 𝜋𝑦𝜇 and 𝜋𝑆𝜇 ′′ = 𝜋𝑆𝜇 = 𝜇2. Thus, 𝜇′′ ∈ 𝜋𝑦𝜇⊕𝜇2. Because 𝜋𝑦𝜇 |= Own(𝑦), we have 𝜇′′ |= Own(𝑦) ⊛ 𝜂. By persistence, 𝜇′ |= Own(𝑦) ⊛ 𝜂. □ (Mono-Map) will play an important role in reasoning about negative associ- ation arising in probabilistic programs. Furthermore, we can prove an N-nary version of the monotone map axiom. Lemma 3.5.5 (N-NARY MONOTONE MAP). Let 𝑥, 𝑥𝛾,𝛼 and 𝑦𝛾 be program variables. Let 𝐾𝛾 be natural numbers. The following is valid in (X𝑁𝐴,V∗). |= 𝑁 ⊛ 𝛾=0 ©­« 𝐾𝛾∧ 𝛼=0 Own(𝑥𝛾,𝛼)ª®¬ ∧ 𝑁∧ 𝛾=0 [ 𝑦𝛾 = 𝑓𝛾 ( 𝑥𝛾,0, . . . , 𝑥𝛾,𝐾𝛾 )] → 𝑁 ⊛ 𝛾=0 Own(𝑦𝛾) when 𝑓1, . . . , 𝑓𝑁 all monotone or all antitone (Mono-Map) 76 We defer the proof to appendix B.4.2. We also have an axiom particular to permutation distributions. When we establish NA from permutation distributions, it is preserved under not only monotone/antitone maps but also any element-wise homogeneous maps. 
The reason is that, fixing a multi-set and a permutation, permuting first and then applying the same map to each element is equivalent to applying the map to each element and then permuting. So applying homogeneous maps to a permutation distribution gives another permutation distribution. We capture this property in the following axiom.

Lemma 3.5.6 (Permutation Map). Let 𝑥𝛾 be variables, and let 𝑓 (𝐴) be { 𝑓 (𝑎) | 𝑎 ∈ 𝐴}. The following axiom is valid in (X𝑁𝐴, V∗).

|= Permu𝐴⟨[𝑥1, . . . , 𝑥𝑁 ]⟩ ∧ [𝑦 = [ 𝑓 (𝑥1), . . . , 𝑓 (𝑥𝑁 )]] → Permu 𝑓 (𝐴)⟨𝑦⟩ (Perm-Map)

The proof is straightforward by unfolding the definitions, so we omit it here.

3.5.2 Program Logic

We now build upon the assertion logic and develop a program logic LINA for reasoning about independence and negative association in probabilistic programs. Judgements in LINA have the form {𝑃} 𝑐 {𝑄}, where 𝑐 ∈ C is a probabilistic program introduced in fig. 2.8, and the bunched formulas 𝑃, 𝑄 are restricted assertions in MBI+.

Definition 3.5.4 (Validity). A LINA judgment is valid, written |= {𝑃} 𝑐 {𝑄}, if for all 𝜇 ∈ D(Mem[Var]) such that 𝜇 |= 𝑃, we have ⟦𝑐⟧(𝜇) |= 𝑄.

Next, we present the proof system of LINA. Since our assertions are a conservative extension of assertions from probabilistic separation logic, all the rules from Figure 2.8 carry over unchanged. We have one new program rule NEGFRAME, which acts as the frame rule for the negative-association separating conjunction ⊛, and one new structural rule RCASE, which does case analysis where each case only has some probability of occurring.

NEGFRAME
|= 𝜑 → Own(𝑋)    FV(𝜂) ∩ MV(𝑐) = ∅    𝑋 ∩ MV(𝑐) = ∅
⊢ {𝜑} 𝑐 {[𝑦 = 𝑓 (𝑋)]}    𝑓 is a monotone function
──────────────────────────────────────────────
⊢ {𝜑 ⊛ 𝜂} 𝑐 {Own(𝑦) ⊛ 𝜂}

RCASE
𝜂 ∈ CC    ∀𝛼 ∈ 𝑆. ⊢ {𝜑 ∗ 𝜂(𝛼)} 𝑐 {𝜓}    |=Mem 𝜂 → ∨𝛼∈𝑆 𝜂(𝛼)    𝜓 ∈ CM
──────────────────────────────────────────────
⊢ {𝜑 ∗ 𝜂} 𝑐 {𝜓}

PROBBOUND
⊢ {𝑒𝑣1 = 1} 𝑐 {Pr[𝑒𝑣2] ≤ 𝛿}
──────────────────────────────────────────────
⊢ {Pr[𝑒𝑣1] ≥ 1 − 𝜖} 𝑐 {Pr[𝑒𝑣2] ≤ 𝛿 + 𝜖}

Figure 3.2: New LINA rules.
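Before walking through the rules, here is a small numeric sketch (helper names are our own) of the semantic fact NEGFRAME relies on: applying a monotone map to some NA variables leaves the result negatively associated with the untouched variables.

```python
def neg_gap(mu, f, g):
    """E[f*g] - E[f]*E[g] under mu, a dict from memories to probabilities."""
    ef = sum(p * f(m) for m, p in mu.items())
    eg = sum(p * g(m) for m, p in mu.items())
    efg = sum(p * f(m) * g(m) for m, p in mu.items())
    return efg - ef * eg

# One-hot uniform over (x0, x1, x2) is NA (Theorem 3.2.2).
one_hot = {(1, 0, 0): 1/3, (0, 1, 0): 1/3, (0, 0, 1): 1/3}

# y = max(x0, x1) is a monotone function of (x0, x1); as NEGFRAME predicts,
# y stays negatively associated with the untouched x2: in particular, their
# covariance is non-positive.
y = lambda m: max(m[0], m[1])
x2 = lambda m: m[2]
assert neg_gap(one_hot, y, x2) <= 0
```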
Informally, the NEGFRAME rule says that if a set of variables 𝑋 is negatively associated with another set of variables 𝑌 that satisfy 𝜂 in a program state, and the program 𝑐 performs a monotone operation 𝑓 on 𝑋 and stores the result in a variable 𝑦, then in the resulting program state, 𝑦 and the untouched variables 𝑌 will also be negatively associated, and 𝑌 will still satisfy 𝜂. Like the FRAME rule for independence ∗, the NEGFRAME rule uses syntactic restrictions to control which variables the program may read and write. The three sets of variables RV(𝑐),WV(𝑐),MV(𝑐) are the ones defined in definition 2.3.8. Roughly, the side conditions guarantee the program 𝑐 does not read from or modify 𝑌 , the set of variables satisfying 𝜂; they in addition guarantee that 𝑋 , the domain of the monotone map will not be modified by 𝑐, and 𝑦, the codomain of the monotone 78 map does not belong to 𝑌 . For RCASE, we write |=Mem 𝑃 iff ∀𝑚 ∈ ∪𝑆⊆VarMem[𝑆], 𝛿(𝑚) |= 𝑃. We say a formula 𝜂 is closed under conditioning (CC) iff for any 𝜇, 𝜇 |= 𝜂 implies that for any event 𝑆, we have 𝜇 | 𝑆 |= 𝜂. And as in RCOND, a formula 𝜂 in CM means that 𝜂 is closed under the mixture. At a high-level, RCASE allows us to first condition the input distribution on one specific case, reason about the post-condition with the conditioned input distribution, and then use the post-condition – we implicitly combined post-conditions from different cases by requiring the post-condition to be closed under the mixture. Last, we present the rule PROBBOUND to facilitate bounding probabilities. It says that if the pre-condition 𝑒𝑣1 = 1 guarantees that event 𝑒𝑣2 happens with at most 𝛿 probability after command 𝑐, then in general, event 𝑒𝑣2 happens with at most probability 𝛿 + 𝜖 after 𝑐, where 𝜖 upper bounds the probability that 𝑒𝑣1 is not true in the pre-condition. 
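The arithmetic behind PROBBOUND is the inequality Pr[𝑒𝑣2] ≤ Pr[𝑒𝑣2 | 𝑒𝑣1] + Pr[¬𝑒𝑣1], which holds for any joint distribution. A quick numeric check (with a hypothetical helper of our own):

```python
def probbound_check(p11, p10, p01, p00):
    """pij = Pr[ev1 = i, ev2 = j] for a joint distribution over two events.
    Returns (Pr[ev2], Pr[ev2 | ev1] + Pr[not ev1]); the first quantity
    should never exceed the second."""
    p_ev1 = p11 + p10
    p_ev2 = p11 + p01
    return p_ev2, p11 / p_ev1 + (1 - p_ev1)

# With Pr[ev1] = 0.95, the bound inflates Pr[ev2 | ev1] by at most 0.05.
p_ev2, bound = probbound_check(0.02, 0.93, 0.01, 0.04)
assert p_ev2 <= bound
```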
The validity of this rule uses the law of total probability, which says that for any two events 𝑒𝑣1 and 𝑒𝑣2, Pr(𝑒𝑣1) = Pr(𝑒𝑣1 | 𝑒𝑣2) · Pr(𝑒𝑣2) + Pr(𝑒𝑣1 | ¬𝑒𝑣2) · Pr(¬𝑒𝑣2) ≤ Pr(𝑒𝑣1 | 𝑒𝑣2) + Pr(¬𝑒𝑣2). As expected, the LINA proof system is sound. Theorem 3.5.7 (Soundness of LINA). If ⊢ {𝜑} 𝑐 {𝜓} is derivable, then it is valid: |= {𝜑} 𝑐 {𝜓}. Proof. We prove the soundness of each new rule in LINA. NEGFRAME. We show that NEGFRAME follows from (Binary-Mono-Map) and existing program rules. By CONST, the side conditions FV(𝜂) ∩ MV(𝑐) = ∅ and 𝑋 ∩ MV(𝑐) = ∅ imply that {Own(𝑋) ⊛ 𝜂} 𝑐 {Own(𝑋) ⊛ 𝜂}. Because |= 𝜑 → Own(𝑋), by ∗-CONJ, it must be that |= 𝜑 ⊛ 𝜂 → Own(𝑋) ⊛ 𝜂. Thus, by WEAK, {𝜑 ⊛ 𝜂} 𝑐 {Own(𝑋) ⊛ 𝜂}. Also by WEAK, the premise {𝜑} 𝑐 {[𝑦 = 𝑓 (𝑋)]} implies {𝜑 ⊛ 𝜂} 𝑐 {[𝑦 = 𝑓 (𝑋)]}. Thus, by CONJ, {𝜑 ⊛ 𝜂} 𝑐 {Own(𝑋) ⊛ 𝜂 ∧ [𝑦 = 𝑓 (𝑋)]}. By (Binary-Mono-Map), Own(𝑋) ⊛ 𝜂 ∧ [𝑦 = 𝑓 (𝑋)] implies Own(𝑦) ⊛ 𝜂. Thus, by WEAK again, {𝜑 ⊛ 𝜂} 𝑐 {Own(𝑦) ⊛ 𝜂}. RCASE. For any 𝜇 |= 𝜑 ∗ 𝜂, there exist 𝜇1, 𝜇2, 𝜇′ such that 𝜇 ⊇ 𝜇′ ∈ 𝜇1 ◦ 𝜇2, 𝜇1 |= 𝜑, and 𝜇2 |= 𝜂. The formula 𝜂 being CC means that for any 𝑚 in the support of 𝜇2, 𝛿𝑚 |= 𝜂 as well. Then, with the side condition |=Mem 𝜂 → ∨_{𝛼∈𝑆} 𝜂(𝛼), we have 𝛿𝑚 |= ∨_{𝛼∈𝑆} 𝜂(𝛼). Combining the side condition that {𝜂(𝛼)} 𝑐 {𝜓} for all 𝛼 with CASE, we get {∨_{𝛼∈𝑆} 𝜂(𝛼)} 𝑐 {𝜓}. Thus, for any 𝑚 ∈ supp(𝜇) we have ⟦𝑐⟧(𝛿𝑚) |= 𝜓. According to the semantics, ⟦𝑐⟧(𝜇) is a convex combination of ⟦𝑐⟧(𝛿𝑚) for different 𝑚, and thus ⟦𝑐⟧(𝜇) |= 𝜓. PROBBOUND. Denote the function 𝜆𝑥. 1 − 𝑒𝑣1(𝑥) as ¬𝑒𝑣1. For any program state 𝜇 |= Pr[𝑒𝑣1] ≥ 1 − 𝜖, let 𝜌 = 𝜇(𝑒𝑣1); it must be that 𝜌 ≥ 1 − 𝜖. Let 𝜇_{𝑒𝑣1} = ⟦𝑐⟧(𝜇 | 𝑒𝑣1) and let 𝜇_{¬𝑒𝑣1} = ⟦𝑐⟧(𝜇 | ¬𝑒𝑣1). By induction on the denotational semantics of the commands, we can prove that ⟦𝑐⟧(𝜇) = 𝜇_{𝑒𝑣1} ◦𝜌 𝜇_{¬𝑒𝑣1}. Also, by construction, ⟦𝑒𝑣1⟧(𝜇 | 𝑒𝑣1) = 1, so 𝜇 | 𝑒𝑣1 |= 𝑒𝑣1. By the rule's assumption and the inductive hypothesis, we have |= {𝑒𝑣1 = 1} 𝑐 {Pr[𝑒𝑣2] ≤ 𝛿}, which implies 𝜇_{𝑒𝑣1} |= Pr[𝑒𝑣2] ≤ 𝛿. Thus, we have 𝜇_{𝑒𝑣1}(𝑒𝑣2) ≤ 𝛿. 
Then, by definition and the law of total probability, ⟦𝑐⟧(𝜇)(𝑒𝑣2) = (𝜇_{𝑒𝑣1} ◦𝜌 𝜇_{¬𝑒𝑣1})(𝑒𝑣2) ≤ 𝜌 · 𝜇_{𝑒𝑣1}(𝑒𝑣2) + (1 − 𝜌) ≤ 𝜌 · 𝛿 + (1 − 𝜌) ≤ 𝜌 · 𝛿 + 𝜖 ≤ 𝛿 + 𝜖. This ensures ⟦𝑐⟧(𝜇) |= Pr[𝑒𝑣2] ≤ 𝛿 + 𝜖. □ 3.6 Examples Now that we have introduced LINA, we present a series of formalized case studies. Our examples are extracted from various algorithms using hashing and balls-into-bins processes. 3.6.1 Probability-related Axioms for Examples Our examples will use a handful of standard facts about probability distributions, encoded as axioms in the assertion logic. For completeness, we list the axioms used below. We also observe the following conventions throughout the examples: logical variables are denoted by Greek letters (𝛼, 𝛽, 𝛾, . . . ) and capital Roman letters (𝑀, 𝑁, 𝐾, . . . ), while program variables start with lower-case Roman letters (𝑥, 𝑦, 𝑧, . . . ). The most important axiom is the one encoding the Chernoff bound: in each of our examples, we establish negative dependence of a sequence of random variables {𝑋𝑖}𝑖 and apply the Chernoff bound to derive a tail bound. In our assertion logic, the Chernoff bound can be encoded as the following axiom schema. Theorem 3.6.1 (Chernoff bound, axiom). Let {𝑥𝛼} be a family of variables indexed by 𝛼, where each variable is bounded in [0, 1] and is a monotone function of its program variables. Then for any 𝛽 ∈ (0, 1], the following axiom schemas are sound in our model: |= ⊛_{𝛼=0}^{𝑁} Own(𝑥𝛼) → Pr[|∑_{𝛼=0}^{𝑁} 𝑥𝛼 − E[∑_{𝛼=0}^{𝑁} 𝑥𝛼]| ≥ 𝛽] ≤ 𝐹 (𝛽, 𝑁) (NA-Chernoff-1) |= ⊛_{𝛼=0}^{𝑁} Own(𝑥𝛼) → Pr[|∑_{𝛼=0}^{𝑁} 𝑥𝛼 − E[∑_{𝛼=0}^{𝑁} 𝑥𝛼]| ≥ 𝑇 (𝛽, 𝑁)] ≤ 𝛽 (NA-Chernoff-2) For the other axioms, we present the axioms in binary form for simplicity, though most extend directly to big operations. • Linearity of expectation. Let 𝑒, 𝑓 be bounded expressions. |= [E[𝛼 · 𝑒 + 𝛽 · 𝑓 ] = 𝛼 · E[𝑒] + 𝛽 · E[ 𝑓 ]] (LinExp) • Union bound. Let 𝑒𝑣1, 𝑒𝑣2 ∈ EV. |= Pr[𝑒𝑣1 ∨ 𝑒𝑣2] ≤ Pr[𝑒𝑣1] + Pr[𝑒𝑣2] (UnionBd) • Permutation marginal. 
Let 𝑥 be an array variable, and let 𝑆 be a finite set. |= Permu𝑆⟨𝑥⟩ → Unif𝑆⟨𝑥 [𝛼]⟩ (PermMarg) 82 • Expectation Indicator. Let 𝑒 be a 0/1 valued expression, |= [E[𝑒] = Pr[𝑒 = 1]] (ExpectInd) • Bernoulli variables probabilities. Let 𝑒 be an expression, |= Bern𝑝 ⟨𝑒⟩ → Pr[𝑒 = 1] = 𝑝 (BernProb) • Probability of uniform. Let 𝑆 be a finite set. |= [Pr[Unif𝑆⟨𝑥⟩ = 𝛼] = 1/|𝑆 |] (ProbUnif) • Bijection uniform. Let 𝑆 be a finite set, and let 𝑓 : 𝑆 → 𝑆 be a bijection. |= Unif𝑆⟨𝑥⟩ → Unif𝑆⟨ 𝑓 (𝑥)⟩ (BijectUnif) • One-hot marginal. Let 𝑥 be an array variable. |= OH𝑆⟨𝑥⟩ → Unif𝑆⟨𝑥 [𝛼]⟩ (OHMarg) • Independent product one-hot. |= OH[𝑀] ⟨𝑥⟩ ∗ OH[𝑁] ⟨𝑦⟩ → OH[𝑀]×[𝑁] 〈 𝑥⊤ · 𝑦 〉 (IndProdOH) • Independent map. Let 𝑥 be an array variable of length 𝑁 . |= 𝑁∗ 𝛼=0 𝑥 [𝛼] $∼→ 𝑁∗ 𝛼=0 𝑓 (𝑥 [𝛼]) $∼ (IndMap) • Deterministic independent. Let 𝑥 be a variable. |= Detm⟨𝑥⟩ → 𝑥 $∼∗ 𝑒 $∼ (DetInd) • Events happen only if they have probability one. Let 𝑒𝑣 ∈ EV, |= 𝑒𝑣 = 1→ Pr(𝑒𝑣) = 1 (ProbOne) 83 • Uniform sampling from a population. We represent a population as a bit- vector, where each entry is an individual and 1 indicates they have some feature and 0 indicates not. Then, if we uniformly sample from the popula- tion, the probability of getting one is equal to the population-level ratio of ones, regardless of how they are distributed in the population. Let 𝑁 ≥ 𝐽 be constants or logical variables, 𝑏 be an array variable of length 𝑁 , and 𝑥, ℎ𝑖𝑡 be variables: |= ( ( bv(𝑏, 𝐽, 𝑁) ∗ Unif [𝑁] ⟨𝑥⟩ ) ∧ [ℎ𝑖𝑡 = 𝑏[𝑥]] ) → Bernℎ𝑖𝑡 〈 𝐽 𝑁 〉 ∗ ©­« 𝑁∑︁ 𝛽=0 𝑏[𝛽] = 𝐽ª®¬. (UniformSamp) • Independent product probabilities. Let 𝑒𝑣1, 𝑒𝑣2 ∈ EV , 𝐽, 𝐾 be two real numbers, |= Pr[𝑒𝑣1] ≤ 𝐽 ∗ Pr[𝑒𝑣2] ≤ 𝐾 → Pr[𝑒𝑣1 ∧ 𝑒𝑣2] ≤ 𝐽 · 𝐾. (IndepProb) • Equal probabilities. Let 𝑏1, 𝑏2 be two boolean expressions. Recall that 𝑏1, 𝑏2 ∈ EV too. |= [𝑏1 = 𝑏2] → Pr[𝑏1] = Pr[𝑏2] (EqualProb) 3.6.2 Bloom filter, High-level We demonstrate how NA and its closure properties can be used to analyze Bloom filters. 
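Before the formal proofs, a small simulation illustrates the kind of guarantee that (NA-Chernoff-1) encodes. The logic leaves 𝐹 abstract; here we instantiate it with the Hoeffding form 𝐹(𝛽, 𝑁) = 2·exp(−2𝛽²/𝑁), one standard choice for [0, 1]-bounded negatively associated variables, and take draws without replacement as a classic negatively associated family (the constants are arbitrary):

```python
import math
import random

def empirical_tail(trials, rng):
    """Estimate Pr[|sum x - E[sum x]| >= beta] for N indicator variables
    drawn without replacement from a half-ones population, a classic
    negatively associated family; also return the Hoeffding-form bound."""
    POP, N, beta = 40, 20, 4.0
    population = [1] * (POP // 2) + [0] * (POP // 2)
    mean = N * 0.5                            # E[sum x] for this population
    hits = sum(abs(sum(rng.sample(population, N)) - mean) >= beta
               for _ in range(trials))
    return hits / trials, 2 * math.exp(-2 * beta ** 2 / N)

tail, bound = empirical_tail(20_000, random.Random(7))
# The bound 2*exp(-1.6) is about 0.40; the empirical tail is far below it.
assert tail <= bound
```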
A Bloom filter is a space-efficient probabilistic data structure for storing a set of items from a universe 𝑈. An 𝑁-bit Bloom filter consists of a length-𝑁 array 𝑏𝑙𝑜𝑜𝑚 holding zero-one entries. We assume there is a family 𝑆 of hash functions mapping𝑈 to {0, . . . , 𝑁 − 1} and a distributionH over 𝑆 such that for any 𝑥 ∈ 𝑈 and any bucket 𝑘 , Pr 𝑓∼H ( 𝑓 (𝑥) = 𝑘) = 1/𝑁 . Let 𝑙1, . . . , 𝑙𝐻 be a 84 BLOOM : 𝑏𝑙𝑜𝑜𝑚 ← zero(𝑁); 𝑚 ← 0; while 𝑚 < 𝑀 do ℎ← 0 while ℎ < 𝐻 do 𝑏𝑖𝑛 $← OH[𝑁] ; 𝑢𝑝𝑑 ← 𝑏𝑙𝑜𝑜𝑚 | | 𝑏𝑖𝑛; 𝑏𝑙𝑜𝑜𝑚 ← 𝑢𝑝𝑑; ℎ← ℎ + 1; 𝑚 ← 𝑚 + 1; (a) Higher-level version BLOOMARRAY : 𝑏𝑙𝑜𝑜𝑚 ← zero(𝑁); 𝑚 ← 0; while 𝑚 < 𝑀 do ℎ← 0 while ℎ < 𝐻 do 𝑏𝑖𝑛 $← OH[𝑁] ; 𝑛← 0; while 𝑛 < 𝑁 do 𝑢𝑝𝑑 ← 𝑏𝑙𝑜𝑜𝑚 [𝑛] | | 𝑏𝑖𝑛[𝑛]; 𝑏𝑙𝑜𝑜𝑚 [𝑛] ← 𝑢𝑝𝑑; 𝑛← 𝑛 + 1 ℎ← ℎ + 1; 𝑚 ← 𝑚 + 1 (b) Array version Figure 3.3: Bloom filter examples collection of hash functions drawn from H . We assume the hash functions are independent, meaning the collection of variables {𝑙𝑖 (𝑥) | 𝑥 ∈ 𝑈, 𝑖 ∈ {1, . . . , 𝐻}} are independent. To add an item 𝑥 ∈ 𝑈 to the filter, we compute 𝑙1(𝑥), . . . , 𝑙𝐻 (𝑥) to get 𝐻 positions in the bit array 𝑏𝑙𝑜𝑜𝑚 and then set the bits at each of these positions to 1. To check if an item 𝑦 is in the filter, we check whether the bits at positions 𝑙1(𝑦), . . . , 𝑙𝐻 (𝑦) in 𝑏𝑙𝑜𝑜𝑚 are all 1. If they are, the item is said to be in the filter, but if any is 0, then the item is not in the filter. This membership test may suffer from false positives, i.e., it may show that an item 𝑦 is in the filter even when 𝑦 was never added to the filter. This can happen because, with hash collisions, other items added to the Bloom filter could set all the bits at loca- tions 𝑙1(𝑦), . . . , 𝑙𝐻 (𝑦) to 1. A basic quantity of interest is the false positive rate: the probability that a Bloom filter reports a false positive. We model the process of adding 𝑀 distinct items into a Bloom filter as the program BLOOM in fig. 3.3a. 
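The insertion process can be transcribed almost line by line. This Python sketch, which is illustrative rather than part of the formal development, realizes the one-hot sampling 𝑏𝑖𝑛 $← OH[𝑁] as a uniformly random index (the hash model this relies on is made precise below):

```python
import random

def bloom_insert(N, M, H, rng):
    """Transcription of BLOOM (fig. 3.3a): insert M distinct items, each
    hashed by H functions; each hash selects a one-hot vector over [N],
    and the filter is updated by bitwise-or."""
    bloom = [0] * N
    for _m in range(M):
        for _h in range(H):
            hot = rng.randrange(N)                       # position of the hot bit
            bin_vec = [int(i == hot) for i in range(N)]  # bin <-$- OH_[N]
            bloom = [b | v for b, v in zip(bloom, bin_vec)]  # upd <- bloom || bin
    return bloom

filt = bloom_insert(N=64, M=10, H=3, rng=random.Random(0))
assert all(bit in (0, 1) for bit in filt)   # bloom stays a bit-array
assert sum(filt) <= 10 * 3                  # at most M * H bits can ever be set
```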
Because the 𝑀 items are distinct, we model the 85 hash functions as if they independently, randomly sample hash values for each item as they are added, a standard model used in the analysis of hashing data structures [Mitzenmacher and Upfal, 2005]. That is, we encode the hashing step as sampling a one-hot vector from the distribution OH[𝑁] and storing it in the variable 𝑏𝑖𝑛, where the hot bit of the vector 𝑏𝑖𝑛 represents the selected position. To set the corresponding position in the filter to 1, we update 𝑏𝑙𝑜𝑜𝑚, which is set to be an all-zero vector at the beginning of the program, to be 𝑏𝑙𝑜𝑜𝑚 | | 𝑏𝑖𝑛, the bitwise-or of the current array and the sampled one-hot array. Our goal is to bound an 𝑁-bit Bloom filter’s false positive rate after 𝑀 dis- tinct items are added. We split the analysis of the false positive rate into two steps. First, we will analyze BLOOM and prove that the entries in 𝑏𝑙𝑜𝑜𝑚 are negatively associated at the end of the process. By (NA-Chernoff-2), NA be- tween the entries of 𝑏𝑙𝑜𝑜𝑚 gives a tail bound of the fraction of bits in 𝑏𝑙𝑜𝑜𝑚 that are set to 1. Second, we analyze a program that checks the membership of a new item in a given Bloom filter, presented as CHECKMEM in fig. 3.4, and bound the probability that the 𝐻 hashed values of the new item are all already in the Bloom filter. Last, we combine them into one proof that bounds the false positive rate of a Bloom filter with 𝑀 elements. Proving NA of BLOOM Recall that the code models inserting 𝑀 distinct el- ements into a Bloom filter backed by an array 𝑏𝑙𝑜𝑜𝑚 of length 𝑁 , where each element is hashed by 𝐻 functions, each producing an element of [𝑁] uniformly at random. We refer to the outer loop as outer, and the inner loop as inner. For both the outer and the inner loop, we apply the rule LOOP with the loop invari- 86 ant: ⊛𝑁 𝛽=0 Own(𝑏𝑙𝑜𝑜𝑚 [𝛽]). We consider the inner loop first. We show that the invariant is preserved by the body of inner. 
After the sampling command 𝑏𝑖𝑛 $← OH[𝑁] , SAMP gives: ©­« 𝑁 ⊛ 𝛽=0 Own(𝑏𝑙𝑜𝑜𝑚 [𝛽])ª®¬ ∗ OH[𝑁] ⟨𝑏𝑖𝑛⟩ By negative association of the one-hot distribution (OH-PNA), we get ©­« 𝑁 ⊛ 𝛽=0 Own(𝑏𝑙𝑜𝑜𝑚 [𝛽])ª®¬ ∗ ©­« 𝑁 ⊛ 𝛾=0 𝑏𝑖𝑛[𝛾]ª®¬ which implies ©­« 𝑁 ⊛ 𝛽=0 Own(𝑏𝑙𝑜𝑜𝑚 [𝛽])ª®¬ ⊛ ©­« 𝑁 ⊛ 𝛾=0 𝑏𝑖𝑛[𝛾]ª®¬ using WEAK. By rearranging terms, this is equivalent to 𝑁 ⊛ 𝛽=0 Own(𝑏𝑙𝑜𝑜𝑚 [𝛽]) ⊛ Own(𝑏𝑖𝑛[𝛽]). After the assignment to 𝑢𝑝𝑑, we have: ©­« 𝑁 ⊛ 𝛽=0 Own(𝑏𝑙𝑜𝑜𝑚 [𝛽]) ⊛ Own(𝑏𝑖𝑛[𝛽])ª®¬ ∧ [𝑢𝑝𝑑 = 𝑏𝑙𝑜𝑜𝑚 | | 𝑏𝑖𝑛] . Because | | is monotone, applying the monotone mapping axiom (Mono-Map) gives us: 𝑁 ⊛ 𝛽=0 Own(𝑢𝑝𝑑 [𝛽]). Using the assignment rule (RASSN) on the assignment to bloom shows that the loop invariant is preserved by the inner loop. Thus, LOOP gives: { 𝑁 ⊛ 𝛽=0 Own(𝑏𝑙𝑜𝑜𝑚 [𝛽])} inner { 𝑁 ⊛ 𝛽=0 Own(𝑏𝑙𝑜𝑜𝑚 [𝛽])} Next, we turn to the outer loop. The argument showing that the invariant is preserved by the outer loop follows from a straightforward argument, since the 87 outer loop only modifies 𝑏𝑙𝑜𝑜𝑚 through the inner loop, so LOOP gives: { 𝑁 ⊛ 𝛽=0 Own(𝑏𝑙𝑜𝑜𝑚 [𝛽])} outer { 𝑁 ⊛ 𝛽=0 Own(𝑏𝑙𝑜𝑜𝑚 [𝛽])} Then, we have: {⊤} BLOOM { 𝑁 ⊛ 𝛽=0 Own(𝑏𝑙𝑜𝑜𝑚 [𝛽])} because initializing 𝑏𝑙𝑜𝑜𝑚 to the all-zeros vector, a deterministic value, estab- lishes the loop invariant. This judgment shows that the 𝑏𝑙𝑜𝑜𝑚 vector satisfies NA at the end of the program. We now apply the Chernoff bound to the NA variables (NA-Chernoff-2) to prove that, with high probability, the number of occupied bins in BLOOM is near its mean with high probability:{ ⊤ } BLOOM { Pr  ������ 𝑁∑︁𝛽=0 𝑏𝑙𝑜𝑜𝑚 [𝛽] − E  𝑁∑︁ 𝛽=0 𝑏𝑙𝑜𝑜𝑚 [𝛽]  ������ ≥ 𝑇 (𝛿, 𝑁)  ≤ 𝛿 } . This concentration bound implies that a tail bound, which says with high prob- ability ∑𝑁 𝛽=0 𝑏𝑙𝑜𝑜𝑚 [𝛽] is upper bounded by its expected value plus 𝑇 (𝛿, 𝑁),{ ⊤ } BLOOM { Pr  𝑁∑︁ 𝛽=0 𝑏𝑙𝑜𝑜𝑚 [𝛽] < E  𝑁∑︁ 𝛽=0 𝑏𝑙𝑜𝑜𝑚 [𝛽]  + 𝑇 (𝛿, 𝑁)  ≥ 1 − 𝛿 } . 
(3.1) Furthermore, ever since the first line of the program, 𝑏𝑙𝑜𝑜𝑚 has been a bit-array, i.e., all its entries are either 0 or 1. So it is easy to prove that {⊤} BLOOM {∧_{𝛽=0}^{𝑁} (𝑏𝑙𝑜𝑜𝑚[𝛽] = 0 ∨ 𝑏𝑙𝑜𝑜𝑚[𝛽] = 1)}. In the following, we will abbreviate the formula asserting that 𝑏 is a bit-array where exactly 𝐽 of its first 𝑁 entries are one, i.e., (∑_{𝛽=0}^{𝑁} 𝑏[𝛽] = 𝐽) ∧ ∧_{𝛽=0}^{𝑁} (𝑏[𝛽] = 0 ∨ 𝑏[𝛽] = 1), as bv(𝑏, 𝐽, 𝑁). Similarly, we will use bv(𝑏, < 𝐽, 𝑁) to abbreviate (∑_{𝛽=0}^{𝑁} 𝑏[𝛽] < 𝐽) ∧ ∧_{𝛽=0}^{𝑁} (𝑏[𝛽] = 0 ∨ 𝑏[𝛽] = 1). We can now state our goal as {bv(𝑏𝑙𝑜𝑜𝑚, < 𝐾, 𝑁)} CHECKMEM {Pr[𝑎𝑙𝑙ℎ𝑖𝑡] ≤ (𝐾/𝑁)^𝐻}. CHECKMEM(𝐻, 𝑏𝑙𝑜𝑜𝑚) : ℎ ← 0; 𝑎𝑙𝑙ℎ𝑖𝑡 ← 1; while ℎ < 𝐻 do 𝑏𝑖𝑛 $← Unif[𝑁]; ℎ𝑖𝑡 ← 𝑏𝑙𝑜𝑜𝑚[𝑏𝑖𝑛]; 𝑎𝑙𝑙ℎ𝑖𝑡 ← ℎ𝑖𝑡 && 𝑎𝑙𝑙ℎ𝑖𝑡; ℎ ← ℎ + 1; Figure 3.4: Check the membership of a new item. Bounding the false positive rate Now, we turn to verifying a bound on the false positive rate of the Bloom filter. Recall that a false positive occurs if the filter returns true when queried with an element that was not inserted. We can encode the membership check of a new element as a program CHECKMEM(𝐻, 𝑏𝑙𝑜𝑜𝑚), listed in Figure 3.4. The program hashes the new element into 𝐻 uniformly random positions and checks if these positions are all set to one in the filter. If so, the Bloom filter will report that the new element is in the set, even though it was never inserted: a false positive. To verify the false positive rate, we place the program CHECKMEM(𝐻, 𝑏𝑙𝑜𝑜𝑚) immediately after BLOOM, and then verify a bound on the probability that 𝑎𝑙𝑙ℎ𝑖𝑡 is 1 at the end of the combined program. CHECKMEM first initializes ℎ to 0 and 𝑎𝑙𝑙ℎ𝑖𝑡 to 1, both deterministically. Then, using RASSN and FRAME, we can show that ⊢ {⊤} ℎ ← 0; 𝑎𝑙𝑙ℎ𝑖𝑡 ← 1 {[ℎ = 0] ∗ [𝑎𝑙𝑙ℎ𝑖𝑡 = 1]}. Using the (ProbOne) axiom and the fact that 1 ≤ (𝐾/𝑁)^0 for any 𝐾 and 𝑁, we can show |= [ℎ = 0] ∗ [𝑎𝑙𝑙ℎ𝑖𝑡 = 1] → Pr[𝑎𝑙𝑙ℎ𝑖𝑡] ≤ (𝐾/𝑁)^ℎ. Thus, ⊢ {⊤} ℎ ← 0; 𝑎𝑙𝑙ℎ𝑖𝑡 ← 1 {Pr[𝑎𝑙𝑙ℎ𝑖𝑡] ≤ (𝐾/𝑁)^ℎ}. 
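As an empirical cross-check of the target bound (𝐾/𝑁)^𝐻, this sketch runs the membership check against a fixed filter with exactly 𝐾 of 𝑁 bits set, a hypothetical instance satisfying bv(𝑏𝑙𝑜𝑜𝑚, 𝐾, 𝑁); the constants are arbitrary:

```python
import random

def check_mem(bloom, H, rng):
    """Transcription of CHECKMEM (fig. 3.4): probe H uniformly random
    positions and report whether all of them are set."""
    allhit = 1
    for _h in range(H):
        bin_ = rng.randrange(len(bloom))   # bin <-$- Unif_[N]
        allhit = bloom[bin_] & allhit      # allhit <- hit && allhit
    return allhit

rng = random.Random(0)
N, K, H, trials = 32, 8, 4, 40_000
bloom = [1] * K + [0] * (N - K)            # exactly K bits set
fp_rate = sum(check_mem(bloom, H, rng) for _ in range(trials)) / trials

# The derived judgment bounds Pr[allhit] by (K/N)^H = (1/4)^4.
assert fp_rate <= (K / N) ** H + 0.01      # up to sampling noise
```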
Because the assignments ℎ← 0; 𝑎𝑙𝑙ℎ𝑖𝑡 ← 1 do not modify the Bloom filter array 𝑏𝑙𝑜𝑜𝑚, we can then apply FRAME to derive ⊢ {bv(𝑏𝑙𝑜𝑜𝑚, < 𝐾, 𝑁)} ℎ← 0; 𝑎𝑙𝑙ℎ𝑖𝑡 ← 1 {bv(𝑏𝑙𝑜𝑜𝑚, < 𝐾, 𝑁) ∗ Pr[𝑎𝑙𝑙ℎ𝑖𝑡] ≤ (𝐾/𝑁)ℎ} . (3.2) We will abbreviate bv(𝑏𝑙𝑜𝑜𝑚, < 𝐾, 𝑁) ∗ Pr[𝑎𝑙𝑙ℎ𝑖𝑡] ≤ (𝐾/𝑁)ℎ ∧ ℎ < 𝐻 as 𝜂. Be- cause ∑𝑁 𝛽=0 𝑏[𝛽] is an integer upper bounded by 𝑁 , |=Mem 𝜂→ ∨ 0≤𝐽<𝐾 𝜂𝐽 , where 𝜂𝐽 abbreviates 𝐽 < 𝐾 ∧ ( bv(𝑏𝑙𝑜𝑜𝑚, 𝐽, 𝑁) ∗ Pr[𝑎𝑙𝑙ℎ𝑖𝑡] ≤ (𝐾/𝑁)ℎ ) ∧ ℎ ≤ 𝐻. We will then prove that for each 𝐽, the formula 𝜂𝐽 is a loop invariant of CHECKMEM’s loop body. The loop body first uniformly samples an element from [𝑁], so by SAMP and FRAME, we have the following as the post-condition: 𝜂𝐽 ∗ Unif [𝑁] ⟨𝑏𝑖𝑛⟩. (3.3) Together with the axiom |= ((𝑃∧𝑄) ∗ 𝑅) → (𝑃∧ (𝑄 ∗ 𝑅)), the post-condition 3.3 implies 𝐽 < 𝐾 ∧ ℎ ≤ 𝐻 ∧ ( bv(𝑏𝑙𝑜𝑜𝑚, 𝐽, 𝑁) ∗ ( Pr[𝑎𝑙𝑙ℎ𝑖𝑡] ≤ (𝐾/𝑁)ℎ ) ∗ Unif [𝑁] ⟨𝑏𝑖𝑛⟩ ) . 90 Then, ℎ𝑖𝑡 gets assigned to 𝑏𝑙𝑜𝑜𝑚 [𝑏𝑖𝑛], so by RASSN and CONST, we have {bv(𝑏𝑙𝑜𝑜𝑚, 𝐽, 𝑁) ∗ Unif [𝑁] ⟨𝑏𝑖𝑛⟩} ℎ𝑖𝑡 ← 𝑏𝑙𝑜𝑜𝑚 [𝛽] { (bv(𝑏𝑙𝑜𝑜𝑚, 𝐽, 𝑁) ∗ Unif [𝑁] ⟨𝑏𝑖𝑛⟩ ) ∧ [ℎ𝑖𝑡 = 𝑏𝑙𝑜𝑜𝑚 [𝑏𝑖𝑛]]} . Since the array 𝑏𝑙𝑜𝑜𝑚 only contains zero-one entries, when the sum of its entries is 𝐽, an entry 𝑏𝑙𝑜𝑜𝑚 [𝑏𝑖𝑛] drawn uniformly at random has probability 𝐽 𝑁 to be 1. If the entry is in addition chosen independently from values in 𝑏𝑙𝑜𝑜𝑚, then the bit 𝑏𝑙𝑜𝑜𝑚 [𝑏𝑖𝑛] is distributed independent from the distribution of 𝑏𝑙𝑜𝑜𝑚. The (UniformSamp) axiom encodes this fact: |= ( ( bv(𝑏, 𝐽, 𝑁) ∗ Unif [𝑁] ⟨𝑥⟩ ) ∧ [ℎ𝑖𝑡 = 𝑏[𝑥]] ) → Bern 𝐽 𝑁 ⟨ℎ𝑖𝑡⟩ ∗ bv(𝑏, 𝐽, 𝑁). Thus, we have {bv(𝑏𝑙𝑜𝑜𝑚, 𝐽, 𝑁) ∗ Unif [𝑁] ⟨𝑏𝑖𝑛⟩} ℎ𝑖𝑡 ← 𝑏𝑙𝑜𝑜𝑚 [𝛽] {Bern 𝐽 𝑁 ⟨ℎ𝑖𝑡⟩ ∗ bv(𝑏𝑙𝑜𝑜𝑚, 𝐽, 𝑁)} . Because ℎ𝑖𝑡 ← 𝑏𝑙𝑜𝑜𝑚 [𝑏𝑖𝑛] does not modify 𝑎𝑙𝑙ℎ𝑖𝑡, we can apply FRAME and get{ bv(𝑏𝑙𝑜𝑜𝑚, 𝐽, 𝑁) ∗ Unif [𝑁] ⟨𝑏𝑖𝑛⟩ ∗ Pr[𝑎𝑙𝑙ℎ𝑖𝑡] ≤ ( 𝐾 𝑁 )ℎ} ℎ𝑖𝑡 ← 𝑏𝑙𝑜𝑜𝑚 [𝛽]{ Bern 𝐽 𝑁 ⟨ℎ𝑖𝑡⟩ ∗ bv(𝑏𝑙𝑜𝑜𝑚, 𝐽, 𝑁) ∗ Pr[𝑎𝑙𝑙ℎ𝑖𝑡] ≤ ( 𝐾 𝑁 )ℎ} . 
Next, with the assignment 𝑎𝑙𝑙ℎ𝑖𝑡 ← ℎ𝑖𝑡&& 𝑎𝑙𝑙ℎ𝑖𝑡, by applying the RASSN rule and the axioms (IndepProb), (EqualProb), we get:{ Bern 𝐽 𝑁 ⟨ℎ𝑖𝑡⟩ ∗ bv(𝑏𝑙𝑜𝑜𝑚, 𝐽, 𝑁) ∗ Pr[𝑎𝑙𝑙ℎ𝑖𝑡] ≤ ( 𝐾 𝑁 )ℎ} 𝑎𝑙𝑙ℎ𝑖𝑡 ← ℎ𝑖𝑡&& 𝑎𝑙𝑙ℎ𝑖𝑡{( Pr[𝑎𝑙𝑙ℎ𝑖𝑡] ≤ 𝐽 𝑁 · ( 𝐾 𝑁 )ℎ) ∗ bv(𝑏𝑙𝑜𝑜𝑚, 𝐽, 𝑁) } 91 We can then apply the rule of constancy CONST and get{ 𝐽 < 𝐾 ∧ ℎ ≤ 𝐻 ∧ ( bv(𝑏𝑙𝑜𝑜𝑚, 𝐽, 𝑁) ∗ Unif [𝑁] ⟨𝑏𝑖𝑛⟩ ∗ Pr[𝑎𝑙𝑙ℎ𝑖𝑡] ≤ ( 𝐾 𝑁 )ℎ)} ℎ𝑖𝑡 ← 𝑏𝑙𝑜𝑜𝑚 [𝛽]; 𝑎𝑙𝑙ℎ𝑖𝑡 ← ℎ𝑖𝑡&& 𝑎𝑙𝑙ℎ𝑖𝑡{ 𝐽 < 𝐾 ∧ ℎ ≤ 𝐻 ∧ (( Pr[𝑎𝑙𝑙ℎ𝑖𝑡] ≤ 𝐽 𝑁 · ( 𝐾 𝑁 )ℎ) ∗ bv(𝑏𝑙𝑜𝑜𝑚, 𝐽, 𝑁) )} When we have 𝐽 < 𝐾 , then (𝐾/𝑁)ℎ · 𝐽 𝑁 ≤ (𝐾/𝑁)ℎ+1, so the postcondition implies 𝐽 < 𝐾 ∧ ℎ ≤ 𝐻 ∧ (( Pr[𝑎𝑙𝑙ℎ𝑖𝑡] ≤ ( 𝐾 𝑁 )ℎ+1) ∗ bv(𝑏𝑙𝑜𝑜𝑚, 𝐽, 𝑁) ) The last step in the loop body is the assignment ℎ← ℎ + 1. By the deterministic assignment rule DASSN, we can establish the postcondition 𝜂𝐽 afterwards: 𝐽 < 𝐾 ∧ ℎ ≤ 𝐻 ( bv(𝑏𝑙𝑜𝑜𝑚, 𝐽, 𝑁) ∗ Pr[𝑎𝑙𝑙ℎ𝑖𝑡] ≤ ( 𝐾 𝑁 )ℎ) . Thus, we have {𝜂𝐽} loop body {𝜂𝐽} By LOOP rule, we can establish {𝜂𝐽} 𝑙𝑜𝑜𝑝 {𝜂𝐽 ∧ ℎ ≥ 𝐻}. Recall 𝜂 abbreviates bv(𝑏𝑙𝑜𝑜𝑚, < 𝐾, 𝑁) ∗ Pr[𝑎𝑙𝑙ℎ𝑖𝑡] ≤ (𝐾/𝑁)ℎ∧ℎ < 𝐻, so the post-condition 𝜂𝐽∧ℎ ≥ 𝐻 implies ℎ = 𝐻, which further implies Pr[𝑎𝑙𝑙ℎ𝑖𝑡] ≤ (𝐾/𝑁)𝐻 . We then have {𝜂𝐽} 𝑙𝑜𝑜𝑝 {Pr[𝑎𝑙𝑙ℎ𝑖𝑡] ≤ ( 𝐾 𝑁 )𝐻 } . Because Pr[𝑎𝑙𝑙ℎ𝑖𝑡] ≤ (𝐾/𝑁)𝐻 is closed under mixtures, and 𝜂 is closed under conditioning, we can then apply RCASE to prove that {𝜂} 𝑙𝑜𝑜𝑝 {Pr[𝑎𝑙𝑙ℎ𝑖𝑡] ≤ ( 𝐾 𝑁 )𝐻 } . (3.4) Using the SEQN rule to combine the proved judgments for CHECKMEM’s ini- tialization (3.2) and loop (3.4), we derive {bv(𝑏𝑙𝑜𝑜𝑚, < 𝐾, 𝑁)} CHECKMEM {Pr[𝑎𝑙𝑙ℎ𝑖𝑡] ≤ ( 𝐾 𝑁 )𝐻 } 92 Then, by the PROBBOUND rule and basic axioms about probabilities, we have {Pr  𝑁∑︁ 𝛽=0 𝑏𝑙𝑜𝑜𝑚 [𝛽] < 𝐾  ≥ 1 − 𝛿 ∧ 𝑁∧ 𝛽=0 (𝑏[𝛽] = 0 ∨ 𝑏[𝛽] = 1)} CHECKMEM {Pr[𝑎𝑙𝑙ℎ𝑖𝑡] ≤ (𝐾/𝑁)𝐻 + 𝛿} . (3.5) We then use SEQN to combine the proved judgements for BLOOM (3.1) and CHECKMEM (3.5) to derive that, for any 𝛿, {⊤} BLOOM; CHECKMEM {Pr[𝑎𝑙𝑙ℎ𝑖𝑡] ≤ ©­­« E [∑𝑁 𝛽=0 𝑏𝑙𝑜𝑜𝑚 [𝛽] ] + 𝑇 (𝛿, 𝑁) 𝑁 ª®®¬ 𝐻 + 𝛿} . 
Since 𝑎𝑙𝑙ℎ𝑖𝑡 is 1 exactly when there is a false positive, this judgment proves an upper bound on the false positive rate of the Bloom filter.4 3.6.3 Bloom filter, Low-level The previous Bloom filter uses a vector operation 𝑏𝑙𝑜𝑜𝑚 | | 𝑏𝑖𝑛 to transform an array of negatively associated values. We next consider a lower-level version of the previous example, BLOOMARRAY, in Figure 3.3b, where the vector operation is replaced by a loop that applies the Boolean-or. Let outer and mid be the outer-most and second outer-most loops, and let inner be the inner-most loop. Again, our goal is to show that the vector 𝑏𝑙𝑜𝑜𝑚 is negatively associated at the end of the program. We first prove the following 4The precise expected value is 𝑁 · (1 − (1 − 1/𝑁)𝑀 ·𝐻 ), a fact that can also be shown in our logic. Roughly speaking, this fact follows because each element of 𝑏𝑙𝑜𝑜𝑚 is the logical-or of 𝑀 ·𝐻 probabilistically independent bits, each 1 with probability 1/𝑁 and 0 otherwise. This argument does not rely on negative association. 93 judgment for inner: { 𝑁 ⊛ 𝛽=0 Own(𝑏𝑙𝑜𝑜𝑚 [𝛽]) ∗ 𝑁 ⊛ 𝛾=0 Own(𝑏𝑖𝑛[𝛾])} inner { 𝑁 ⊛ 𝛽=0 Own(𝑏𝑙𝑜𝑜𝑚 [𝛽]) ⊛ 𝑁 ⊛ 𝛾=𝑛 Own(𝑏𝑖𝑛[𝛾])} We will apply the rule LOOP on inner with the following loop invariant: 𝜑 = 𝑁 ⊛ 𝛽=0 Own(𝑏𝑙𝑜𝑜𝑚 [𝛽]) ⊛ 𝑁 ⊛ 𝛾=𝑛 Own(𝑏𝑖𝑛[𝛾]) To show that the loop invariant is preserved by the body, we can first show: {Own(𝑏𝑙𝑜𝑜𝑚 [𝑛], 𝑏𝑖𝑛[𝑛])} 𝑢𝑝𝑑 ← 𝑏𝑙𝑜𝑜𝑚 [𝑛] | | 𝑏𝑖𝑛[𝑛] {[𝑢𝑝𝑑 = 𝑏𝑙𝑜𝑜𝑚 [𝑛] | | 𝑏𝑖𝑛[𝑛]]} using RASSN. Noting that the boolean-or operator is a monotone operation, we may apply NEGFRAME to obtain: {Own(𝑏𝑙𝑜𝑜𝑚 [𝑛], 𝑏𝑖𝑛[𝑛]) ⊛ 𝜂} 𝑢𝑝𝑑 ← 𝑏𝑙𝑜𝑜𝑚 [𝑛] | | 𝑏𝑖𝑛[𝑛] {Own(𝑢𝑝𝑑) ⊛ 𝜂} with the framing condition 𝜂 = ©­« 𝑛−1 ⊛ 𝛽=0 Own(𝑏𝑙𝑜𝑜𝑚 [𝛽])ª®¬ ⊛ ©­« 𝑁 ⊛ 𝛽=𝑛+1 Own(𝑏𝑙𝑜𝑜𝑚 [𝛽])ª®¬ ⊛ ©­« 𝑁 ⊛ 𝛾=𝑛+1 Own(𝑏𝑖𝑛[𝛾])ª®¬. 
Thus, by re-associating the separating conjunction and applying DASSN for the remaining two assignments in the inner-most loop, we have: {𝜑} 𝑢𝑝𝑑 ← 𝑏𝑙𝑜𝑜𝑚 [𝑛] | | 𝑏𝑖𝑛[𝑛]; 𝑏𝑙𝑜𝑜𝑚 [𝑛] ← 𝑢𝑝𝑑; 𝑛← 𝑛 + 1 {𝜑} and thus by LOOP, we have: { 𝑁 ⊛ 𝛽=0 Own(𝑏𝑙𝑜𝑜𝑚 [𝛽]) ⊛ 𝑁 ⊛ 𝛾=𝑛 Own(𝑏𝑖𝑛[𝛾])} inner { 𝑁 ⊛ 𝛽=0 Own(𝑏𝑙𝑜𝑜𝑚 [𝛽]) ⊛ 𝑁 ⊛ 𝛾=𝑛 Own(𝑏𝑖𝑛[𝛾])} . Now for loop mid, we establish the same loop invariant as we took before: 𝜓 = 𝑁 ⊛ 𝛽=0 Own(𝑏𝑙𝑜𝑜𝑚 [𝛽]) 94 If 𝜓 holds at the beginning of mid, then invariant for the inner-most loop 𝜑 holds after assigning 0 to 𝑛 and sampling 𝑏𝑖𝑛, since 𝑏𝑖𝑛 is independent of 𝜓 and 𝑏𝑖𝑛 is distributed as OH𝑛, which implies entries in 𝑏𝑖𝑛 are negatively associated (OH-PNA). Furthermore, 𝜑 implies 𝜓 at the exit of inner, by dropping the con- junct describing 𝑏𝑖𝑛. Thus, 𝜓 is a valid invariant for mid, and the rest of the proof proceeds unchanged. 3.6.4 Permutation Hashing PERMHASH : 𝑔 $← Permu[𝐵·𝐾] ; 𝑛← 0; 𝑐𝑡 ← 0; while 𝑛 < 𝑁 do 𝑏𝑖𝑛[𝑛] ← 𝑚𝑜𝑑 (𝑔[𝑛], 𝐵); ℎ𝑖𝑡𝑍 [𝑛] ← [𝑏𝑖𝑛[𝑛] = 𝑍]; 𝑐𝑡 ← 𝑐𝑡 + ℎ𝑖𝑡𝑍 [𝑛]; 𝑛← 𝑛 + 1 Figure 3.5: Permutation hashing Our second example considers a scheme for hashing using a random per- mutation. Consider the program in Figure 3.5, from an algorithm for fast set intersection [Ding and König, 2011]. Letting 𝐵 be the number of bins, and the data universe be [𝐵 · 𝐾] = {1, . . . , 𝐵 · 𝐾} where 𝐵 · 𝐾 ≥ 𝑁 , we first draw a uni- formly random permutation 𝑔 of the data universe. Then, we hash the numbers 𝑛 ∈ [𝑁] into 𝑏𝑖𝑛[𝑛] by applying the hash function 𝑔 and then taking the result modulo 𝐵. Then, we record whether the item landed in a specific bucket 𝑍 by computing the indicator ℎ𝑖𝑡𝑍 [𝑛] = [𝑏𝑖𝑛[𝑛] = 𝑍], which is 1 if 𝑏𝑖𝑛[𝑛] = 𝑍 and 0 otherwise, and accumulate the result into the count 𝑐𝑡. 95 Our goal is to show that 𝑐𝑡 is usually not far from its expected value, which is 𝑁/𝐵. If the quantities {[𝑏𝑖𝑛[𝑛] = 𝑍]}𝑛 were independent, we would be able to apply a standard concentration bound to the sum 𝑐𝑡. 
However, {[𝑏𝑖𝑛[𝑛] = 𝑍]}𝑛 are not independent: for instance, since exactly 𝐾 elements from [𝐵 · 𝐾] map to 𝑍 , if 𝑏𝑖𝑛[𝑛] = 𝑍 for 𝑛 ∈ {0, 1, . . . , 𝐾 − 1}, then 𝑏𝑖𝑛[𝐾] = 𝑍 must be false. Nevertheless, we can show that {[𝑏𝑖𝑛[𝑛] = 𝑍]}𝑛 are negatively associated random variables. Intuitively, {𝑔[𝑛]}𝑛 are NA random variables because the result of a uniformly random permutation is NA. Then, {𝑏𝑖𝑛[𝑛]}𝑛 is computed by mapping the function𝑚𝑜𝑑 (−, 𝐵) over the array 𝑔; since this produces another uniform permutation distribution, the vector {𝑏𝑖𝑛[𝑛]}𝑛 is also NA. By similar reasoning {[𝑏𝑖𝑛[𝑛] = 𝑍]}𝑛 is also NA, as it is obtained by mapping the function [− = 𝑍] over {𝑏𝑖𝑛[𝑛]}𝑛. We formalize the reasoning using the program logic LINA. For the main loop, we apply the rule LOOP with the following loop invariant: 𝑛∧ 𝛼=0 [ℎ𝑖𝑡𝑍 [𝛼] = [𝑚𝑜𝑑 (𝑔[𝛼], 𝐵) = 𝑍]] ∧ Permu[𝐵·𝐾] ⟨𝑔⟩ ∧ 𝑐𝑡 = 𝑛∑︁ 𝛼=0 ℎ𝑖𝑡𝑍 [𝛼] ∧ ((𝑛 ≥ 𝑁) → [𝑛 = 𝑁]) The loop invariant is preserved by the body of the loop, using RASSN and CONST. Thus we can show the following judgment: {[𝑐𝑡 = 0] ∧ [𝑛 = 0]} 𝑙𝑜𝑜𝑝 { 𝑁∧ 𝛼=0 [ℎ𝑖𝑡𝑍 [𝛼] = [𝑚𝑜𝑑 (𝑔[𝛼], 𝐵) = 𝑍]] ∧ Permu[𝐵·𝐾] ⟨𝑔⟩ ∧ [ 𝑐𝑡 = 𝑁∑︁ 𝛼=0 ℎ𝑖𝑡𝑍 [𝛼] ] } 96 Applying (Perm-Map), the post-condition implies: 𝑁∧ 𝛼=0 [ℎ𝑖𝑡𝑍 [𝛼] = [𝑚𝑜𝑑 (𝑔[𝛼], 𝐵) = 𝑍]] ∧ 𝑁∗ 𝛼=0 Own(ℎ𝑖𝑡𝑍 [𝛼]) ∧ Permu[𝐵·𝐾] ⟨𝑔⟩ ∧ [ 𝑐𝑡 = 𝑁∑︁ 𝛼=0 ℎ𝑖𝑡𝑍 [𝛼] ] Applying basic axioms about expected value and the permutation distribution ((PermMarg) (ProbUnif) (BijectUnif)), we have: 𝑁∗ 𝛼=0 Own(ℎ𝑖𝑡𝑍 [𝛼]) ∧ [ 𝑐𝑡 = 𝑁∑︁ 𝛼=0 ℎ𝑖𝑡𝑍 [𝛼] ] ∧ [E[𝑐𝑡] = 𝑁/𝐵] And we can apply the negative-association Chernoff bound (NA-Chernoff-2) to conclude: {⊤} PERMHASH {Pr[|𝑐𝑡 − 𝑁/𝐵 | > 𝑇 (𝛽, 𝑁)] < 𝛽} This conclusion corresponds to Proposition A.2 in Ding and König [2011] algo- rithm for fast set intersection.5 3.6.5 Fully-dynamic Dictionary For our next example, we consider a hashing scheme for a fully-dynamic dic- tionary, a space-efficient data structure that supports insertions, deletions, and membership queries. 
The top level of the data structure by Bercea and Even [2022] uses a two-level hashing scheme: elements are first hashed into a crate, and then hashed into a pocket dictionary within each crate. As part of the space analysis of their scheme, Bercea and Even [2022] proves a high-probability 5Ding and König [2011] apply a variant of the Chernoff bound to obtain a multiplicative, rather than an additive, error guarantee. We present the additive version since the bound is a bit simpler, but there is no difficulty in handling the multiplicative version in our framework. 97 FDDICT : 𝑏𝑖𝑛𝐶𝑡 ← zero(𝐶, 𝑃); 𝑜𝑣𝑒𝑟𝐶𝑡 ← zero(𝐶); 𝑛← 0; while 𝑛 < 𝑁 do 𝑐𝑟𝑎𝑡𝑒[𝑛] $← OH[𝐶] ; 𝑝𝑜𝑐𝑘𝑒𝑡 [𝑛] $← OH[𝑃] ; 𝑏𝑖𝑛[𝑛] ← 𝑐𝑟𝑎𝑡𝑒[𝑛]⊤ · 𝑝𝑜𝑐𝑘𝑒𝑡 [𝑛]; 𝑐 ← 0; while 𝑐 < 𝐶 do 𝑝 ← 0; while 𝑝 < 𝑃 do 𝑢𝑝𝑑 ← 𝑏𝑖𝑛𝐶𝑡 [𝑐] [𝑝] + 𝑏𝑖𝑛[𝑛] [𝑐] [𝑝]; 𝑏𝑖𝑛𝐶𝑡 [𝑐] [𝑝] ← 𝑢𝑝𝑑; 𝑝 ← 𝑝 + 1; 𝑐 ← 𝑐 + 1; 𝑛← 𝑛 + 1; 𝑐 ← 0; while 𝑐 < 𝐶 do 𝑝 ← 0; while 𝑝 < 𝑃 do 𝑜𝑣𝑒𝑟 [𝑐] [𝑝] ← [𝑏𝑖𝑛𝐶𝑡 [𝑐] [𝑝] > 𝑇𝑏𝑖𝑛]; 𝑢𝑝𝑑 ← 𝑜𝑣𝑒𝑟𝐶𝑡 [𝑐] + 𝑜𝑣𝑒𝑟 [𝑐] [𝑝]; 𝑜𝑣𝑒𝑟𝐶𝑡 [𝑐] ← 𝑢𝑝𝑑; 𝑝 ← 𝑝 + 1; 𝑐 ← 𝑐 + 1 Figure 3.6: Fully-dynamic dictionary [Ding and König, 2011] bound on the number of pocket dictionaries that overflow after a given number of elements are inserted. We extract the program FDDICT in Figure 3.6 from the scheme in Bercea and Even [2022]. The program models the insertion of 𝑁 elements. Each ele- ment is first hashed into one of 𝐶 possible crates uniformly at random and then hashed into one of 𝑃 possible pocket dictionaries uniformly at random. The variable 𝑏𝑖𝑛[𝑛] is a 𝐶 by 𝑃 matrix, with all entries zero except for the entry at (𝑐𝑟𝑎𝑡𝑒[𝑛], 𝑝𝑜𝑐𝑘𝑒𝑡 [𝑛]), which is set to 1. Next, the program totals up the number 98 of elements hashing to each (crate, pocket) pair, storing the result in the 𝐶 by 𝑃 matrix 𝑏𝑖𝑛𝐶𝑡. Finally, the program checks which (𝑐𝑟𝑎𝑡𝑒, 𝑝𝑜𝑐𝑘𝑒𝑡) pairs have count larger than some concrete threshold 𝑇𝑏𝑖𝑛 and records that in 𝑜𝑣𝑒𝑟, and totals up the number of full pocket dictionaries in each crate (𝑜𝑣𝑒𝑟𝐶𝑡). 
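The counting phase of FDDICT can be condensed to a few lines to make these quantities concrete. This sketch uses arbitrary constants, not taken from Bercea and Even [2022]: each item lands in a uniformly random (crate, pocket) cell, and 𝑏𝑖𝑛𝐶𝑡 and 𝑜𝑣𝑒𝑟𝐶𝑡 are tallied directly:

```python
import random

def fddict_counts(N, C, P, rng):
    """Condensed counting phase of FDDICT (fig. 3.6): each of N items lands
    in a uniform (crate, pocket) cell; return the C-by-P count matrix binCt."""
    bin_ct = [[0] * P for _ in range(C)]
    for _n in range(N):
        bin_ct[rng.randrange(C)][rng.randrange(P)] += 1
    return bin_ct

rng = random.Random(11)
N, C, P, T_bin = 2_000, 4, 5, 150
bin_ct = fddict_counts(N, C, P, rng)
over_ct = [sum(count > T_bin for count in row) for row in bin_ct]  # overCt per crate

assert sum(sum(row) for row in bin_ct) == N   # every item lands in exactly one cell
# E[binCt[c][p]] = N/(C*P) = 100; a threshold of 150 sits well above the mean,
# so overflows should be rare, and each crate has at most P overfull pockets.
assert all(0 <= o <= P for o in over_ct)
```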
Our logic can prove a judgment of the following form for 𝑇𝑏𝑖𝑛 ≥ 𝑁/(𝑃 · 𝐶): {⊤} FDDICT { 𝐶∧ 𝛾=0 Pr[𝑜𝑣𝑒𝑟𝐶𝑡 [𝛾] ≥ 𝑃 · 𝐹 (𝑇𝑏𝑖𝑛 − 𝑁/(𝑃 · 𝐶), 𝑁) + 𝑇 (𝜌𝑜𝑣𝑒𝑟 , 𝑃)] ≤ 𝜌𝑜𝑣𝑒𝑟}, where the logical variables 𝜌𝑏𝑖𝑛 and 𝜌𝑜𝑣𝑒𝑟 represents the parametric overflow properties. This formalizes a result similar to Bercea and Even [2022, Claim 21], which states that except with probability 𝛽, all crates have at most 𝑇𝑜𝑣𝑒𝑟 overfull pocket dictionaries. The core of the proof shows that for every crate index 𝛾, the counts 𝑏𝑖𝑛𝐶𝑡 [𝛾] [𝛽] are negatively associated, using the NEGFRAME rule as in the array version of the Bloom filter example. Then, we show that vector 𝑜𝑣𝑒𝑟 [𝛾] [𝛽], which indicates whether each pocket dictionary 𝛽 in crate 𝛾 is overfull or not, is also negatively associated. This holds because 𝑜𝑣𝑒𝑟 [𝛾] [𝛽] is obtained from 𝑏𝑖𝑛𝐶𝑡 [𝛾] [𝛽] by applying a monotone function. Furthermore, the count of overflows 𝑜𝑣𝑒𝑟𝐶𝑡 [𝛾] is obtained by another monotone function on 𝑜𝑣𝑒𝑟 [𝛾] [𝛽] and thus its entries are also negatively associated. Now we prove each step using the program logic. We will refer to the two outer-most loops as (1) and (2), the next two outer-most loops as (1.1) and (2.1), and the inner-most loop as (1.1.1). Computing E[𝑏𝑖𝑛𝐶𝑡 [𝑐] [𝑝]]. For loop (1), we apply LOOP with the following loop invariant 𝜑: 𝐶∧ 𝛾=0 𝑃∧ 𝛽=0 [E[𝑏𝑖𝑛𝐶𝑡 [𝛾] [𝛽]] = 𝑛/(𝑃 · 𝐶)] ∧ ([𝑛 ≥ 𝑁] → [𝑛 = 𝑁]) ∧ Detm⟨𝑛⟩. 99 To show that this invariant is preserved by the loop, by applications of SAMP and RASSN and CONST and FRAME, the following holds after the sampling and the assignment: OH[𝑃] ⟨𝑝𝑜𝑐𝑘𝑒𝑡 [𝑛]⟩ ∗ OH[𝐶] ⟨𝑐𝑟𝑎𝑡𝑒[𝑛]⟩ ∧ [ 𝑏𝑖𝑛[𝑛] = 𝑐𝑟𝑎𝑡𝑒[𝑛]⊤ · 𝑝𝑜𝑐𝑘𝑒𝑡 [𝑛] ] . (3.6) Using an axiom about independence and products of one-hot vectors (IndProdOH), this implies: OH[𝐶]×[𝑃] ⟨𝑏𝑖𝑛[𝑛]⟩. Using an axiom about the one-hot encoding (OHMarg): E[𝑏𝑖𝑛[𝛼] [𝛾] [𝛽]] = 1/(𝑃 · 𝐶) for every 𝛼, 𝛾, and 𝛽. 
Standard loop invariants for loop (1.1) and (1.1.1) show that: [ 𝑏𝑖𝑛𝐶𝑡 [𝑐] [𝑝] = 𝑛∑︁ 𝛼=0 𝑏𝑖𝑛[𝛼] [𝑐] [𝑝] ] , and linearity of expectation establishes the invariant condition 3.6 for loop (1). The invariant holds at the start of the loop (1) since 𝑏𝑖𝑛𝐶𝑡 is zero-initialized, and it also holds at the end of the loop (1). Since 𝑏𝑖𝑛𝐶𝑡 is not modified further, the expectation equality remains valid at the end of the program due to CONST. Bounding Pr[𝑏𝑖𝑛𝐶𝑡 [𝑐] [𝑝] > 𝑇𝑏𝑖𝑛]. For loop (1), we also apply LOOP with the following loop invariant:( 𝑛∗ 𝛼=0 Own(𝑏𝑖𝑛[𝛼]) ) ∧ 𝐶∧ 𝛾=0 𝑃∧ 𝛽=0 [ 𝑏𝑖𝑛𝐶𝑡 [𝛾] [𝛽] = 𝑁∑︁ 𝛼=0 𝑏𝑖𝑛[𝛼] [𝛾] [𝛽] ] ∧ [𝑛 = 𝑁] ∧Detm⟨𝑛⟩. The first conjunction is an invariant, by applying SAMP and FRAME. The rest of the invariant is preserved, following standard invariants for loops (1.1) and 100 (1.1.1). By projection (IndMap), at the end of the loop (1) we can conclude: 𝐶∧ 𝛾=0 𝑃∧ 𝛽=0 ( 𝑁∗ 𝛼=0 Own(𝑏𝑖𝑛[𝛼] [𝛾] [𝛽]) ) ∧ [ 𝑏𝑖𝑛𝐶𝑡 [𝛾] [𝛽] = 𝑁∑︁ 𝛼=0 𝑏𝑖𝑛[𝛼] [𝛾] [𝛽] ] . Thus, a standard Chernoff bound gives (here, we apply NA-Chernoff-1 to in- dependent variables by first changing the independence star into NA star us- ing WEAK) : 𝐶∧ 𝛾=0 𝑃∧ 𝛽=0 Pr[𝑏𝑖𝑛𝐶𝑡 [𝛾] [𝛽] ≥ E[𝑏𝑖𝑛𝐶𝑡 [𝛾] [𝛽]] + 𝜌𝑏𝑖𝑛] ≤ 𝐹 (𝜌𝑏𝑖𝑛, 𝑁). where E[𝑏𝑖𝑛𝐶𝑡 [𝛾] [𝛽]] is 𝑁/(𝑃 ·𝐶) by the previous step. Thus, by applying CONJ, we can combine the post-conditions and derive 𝐶∧ 𝛾=0 𝑃∧ 𝛽=0 Pr[𝑏𝑖𝑛𝐶𝑡 [𝛾] [𝛽] ≥ 𝑁/(𝑃 · 𝐶) + 𝜌𝑏𝑖𝑛] ≤ 𝐹 (𝜌𝑏𝑖𝑛, 𝑁). (3.7) Again, the property holds until the end of the program since 𝑏𝑖𝑛𝐶𝑡 is not modi- fied further (CONST). Bounding E[𝑜𝑣𝑒𝑟𝐶𝑡 [𝑐]]. Using standard loop invariants, at the end of the loop (2) we have: 𝐶∧ 𝛾=0 𝑜𝑣𝑒𝑟𝐶𝑡 [𝛾] = 𝑃∑︁ 𝛽=0 𝑜𝑣𝑒𝑟 [𝛾] [𝛽]  ∧ 𝑃∧ 𝛽=0 [𝑜𝑣𝑒𝑟 [𝛾] [𝛽] = [𝑏𝑖𝑛𝐶𝑡 [𝛾] [𝛽] > 𝑇𝑏𝑖𝑛]] . Using linearity of expectation and the fact that 𝑜𝑣𝑒𝑟 [𝛾] [𝛽] is either zero or one, we have: E[𝑜𝑣𝑒𝑟𝐶𝑡 [𝛾]] = 𝑃∑︁ 𝛽=0 E[𝑜𝑣𝑒𝑟 [𝛾] [𝛽]] = 𝑃∑︁ 𝛽=0 E[𝑏𝑖𝑛𝐶𝑡 [𝛾] [𝛽] > 𝑇𝑏𝑖𝑛] = 𝑃∑︁ 𝛽=0 Pr[𝑏𝑖𝑛𝐶𝑡 [𝛾] [𝛽] > 𝑇𝑏𝑖𝑛] 101 Because the bound we obtained in eq. 
(3.7), 𝑃∑︁ 𝛽=0 Pr[𝑏𝑖𝑛𝐶𝑡 [𝛾] [𝛽] > 𝑇𝑏𝑖𝑛] ≤ 𝑃∑︁ 𝛽=0 𝐹 (𝑇𝑏𝑖𝑛 − 𝑁/(𝑃 · 𝐶), 𝑁) = 𝑃 · 𝐹 (𝑇𝑏𝑖𝑛 − 𝑁/(𝑃 · 𝐶), 𝑁) for 𝑇𝑏𝑖𝑛 > 𝑁/(𝑃 · 𝐶). Thus, E[𝑜𝑣𝑒𝑟𝐶𝑡 [𝛾]] = 𝑃 · 𝐹 (𝑇𝑏𝑖𝑛 − 𝑁/(𝑃 · 𝐶), 𝑁). Bounding Pr[𝑜𝑣𝑒𝑟𝐶𝑡 [𝑐]] > 𝑇𝑜𝑣𝑒𝑟]. At the high level, we want the following loop invariant for Loop (1): 𝐶∧ 𝛾=0 𝑃 ⊛ 𝛽=0 Own(𝑏𝑖𝑛𝐶𝑡 [𝛾] [𝛽]) We want the following loop invariant for (1.1): 𝐶∧ 𝛾=𝑐 𝑃 ⊛ 𝛽=0 Own(𝑏𝑖𝑛𝐶𝑡 [𝛾] [𝛽]) ⊛ 𝑃 ⊛ 𝛽=0 Own(𝑏𝑖𝑛[𝑛] [𝛾] [𝛽]) ∧ 𝑐∧ 𝛾=0 𝑃 ⊛ 𝛽=0 Own(𝑏𝑖𝑛𝐶𝑡 [𝛾] [𝛽]) ∧ [𝑐 ≤ 𝐶] ∧ Detm⟨𝑐⟩ And the following loop invariant for (1.1.1): 𝐶∧ 𝛾=𝑐+1 𝑃 ⊛ 𝛽=0 Own(𝑏𝑖𝑛𝐶𝑡 [𝛾] [𝛽]) ⊛ 𝑃 ⊛ 𝛽=0 Own(𝑏𝑖𝑛[𝑛] [𝛾] [𝛽]) ∧ 𝑃 ⊛ 𝛽=0 Own(𝑏𝑖𝑛𝐶𝑡 [𝑐] [𝛽]) ⊛ 𝑃 ⊛ 𝛽=𝑝 Own(𝑏𝑖𝑛[𝑛] [𝑐] [𝛽]) ∧ 𝑐∧ 𝛾=0 𝑃 ⊛ 𝛽=0 Own(𝑏𝑖𝑛𝐶𝑡 [𝛾] [𝛽]) ∧ [𝑐 ≤ 𝐶] ∧ Detm⟨𝑐⟩ ∧ [𝑝 ≤ 𝑃] ∧ Detm⟨𝑝⟩ We show the loops preserve the respective invariant for a fixed 𝛾; the big con- junction then follows by applying CONJ. Working from inside to outside, we 102 start with loop (1.1.1). To establish the invariant condition, the critical case is 𝛾 = 𝑐. We can pull out: 𝜑 := 𝑝 ⊛ 𝛽=0 Own(𝑏𝑖𝑛𝐶𝑡 [𝑐] [𝛽]) ⊛ 𝑃 ⊛ 𝛽=𝑝+1 Own(𝑏𝑖𝑛𝐶𝑡 [𝑐] [𝛽]) ⊛ 𝑃 ⊛ 𝛽=𝑝+1 Own(𝑏𝑖𝑛[𝑛] [𝑐] [𝛽]) ⊛ Own(𝑏𝑖𝑛𝐶𝑡 [𝑐] [𝑝]) ⊛ Own(𝑏𝑖𝑛[𝑛] [𝑐] [𝑝])︸ ︷︷ ︸ Φ Now, we can use the assignment rule to show: {Φ} 𝑢𝑝𝑑 ← 𝑏𝑖𝑛𝐶𝑡 [𝑐] [𝑝] + 𝑏𝑖𝑛[𝑛] [𝑐] [𝑝] {𝜑 ∧ [𝑢𝑝𝑑 = 𝑏𝑖𝑛𝐶𝑡 [𝑐] [𝑝] + 𝑏𝑖𝑛[𝑛]𝑐𝑝]} Since addition is a monotone function, the NA frame rule NEGFRAME gives: 𝑝 ⊛ 𝛽=0 Own(𝑏𝑖𝑛𝐶𝑡 [𝑐] [𝛽]) ⊛ 𝑃 ⊛ 𝛽=𝑝+1 Own(𝑏𝑖𝑛𝐶𝑡 [𝑐] [𝛽]) ⊛ 𝑃 ⊛ 𝛽=𝑝+1 Own(𝑏𝑖𝑛[𝑛] [𝑐] [𝛽]) ⊛ Own(𝑢𝑝𝑑) after the assignment to 𝑢𝑝𝑑. After the assignment to 𝑏𝑖𝑛[𝑐] [𝑝], we can fold it into 𝑃 ⊛ 𝛽=0 Own(𝑏𝑖𝑛𝐶𝑡 [𝑐] [𝛽]) ⊛ 𝑃 ⊛ 𝛽=𝑝+1 Own(𝑏𝑖𝑛[𝑛] [𝑐] [𝛽]). Then, by applying DASSN, we get 𝑃 ⊛ 𝛽=0 Own(𝑏𝑖𝑛𝐶𝑡 [𝑐] [𝛽]) ⊛ 𝑃 ⊛ 𝛽=𝑝 Own(𝑏𝑖𝑛[𝑛] [𝑐] [𝛽]). The assertion for all the other 𝛾 remains unchanged, so we establish the invari- ant for loop (1.1.1). Now we reason about the loop (1.1). Since 𝑏𝑖𝑛𝐶𝑡 is zero-initialized (DetInd), the invariant for loop (1.1.1) holds on loop entry. 
Then, apply LOOP with the 103 loop invariant established above for loop (1.1.1) gives us 𝐶∧ 𝛾=𝑐+1 𝑃 ⊛ 𝛽=0 Own(𝑏𝑖𝑛𝐶𝑡 [𝛾] [𝛽]) ⊛ 𝑃 ⊛ 𝛽=0 Own(𝑏𝑖𝑛[𝑛] [𝛾] [𝛽]) ∧ 𝑐+1∧ 𝛾=0 𝑃 ⊛ 𝛽=0 Own(𝑏𝑖𝑛𝐶𝑡 [𝛾] [𝛽]) ∧ [𝑐 < 𝐶] ∧ Detm⟨𝑐⟩ after the termination of loop (1.1.1). The program then deterministically in- creases 𝑐 by 1, and by DASSN, we can establish the loop invariant for loop 1.1. Similarly, loop invariant for loop (1) is established when loop (1.1) exits and we increase 𝑛 by 1. Thus, after loop (1) terminates, we have the postcondition: 𝐶∧ 𝛾=0 𝑃 ⊛ 𝛽=0 Own(𝑏𝑖𝑛𝐶𝑡 [𝛾] [𝛽]) Next, we tackle loop (2). We take the invariant: 𝑐∧ 𝛾=0 𝑃 ⊛ 𝛽=0 Own(𝑜𝑣𝑒𝑟 [𝛾] [𝛽]) ∧ 𝑜𝑣𝑒𝑟𝐶𝑡 [𝛾] = 𝑃∑︁ 𝛽=0 𝑜𝑣𝑒𝑟 [𝛾] [𝛽]  ∧ 𝐶∧ 𝛾=0 𝑃 ⊛ 𝛽=0 Own(𝑏𝑖𝑛𝐶𝑡 [𝛾] [𝛽]) ∧ [𝑐 ≤ 𝐶] ∧ Detm⟨𝑐⟩ For the inner loop (2.1), we take the invariant: 𝑐∧ 𝛾=0 𝑃 ⊛ 𝛽=0 Own(𝑜𝑣𝑒𝑟 [𝛾] [𝛽]) ∧ 𝑜𝑣𝑒𝑟𝐶𝑡 [𝛾] = 𝑃∑︁ 𝛽=0 𝑜𝑣𝑒𝑟 [𝛾] [𝛽]  ∧ 𝑝 ⊛ 𝛽=0 Own(𝑜𝑣𝑒𝑟 [𝑐] [𝛽]) ⊛ 𝑃 ⊛ 𝛽=𝑝 Own(𝑏𝑖𝑛𝐶𝑡 [𝑐] [𝛽]) ∧ 𝑜𝑣𝑒𝑟𝐶𝑡 [𝛾] = 𝑝∑︁ 𝛽=0 𝑜𝑣𝑒𝑟 [𝛾] [𝛽]  ∧ 𝐶∧ 𝛾=0 𝑃 ⊛ 𝛽=0 Own(𝑏𝑖𝑛𝐶𝑡 [𝛾] [𝛽]) ∧ [𝑐 ≤ 𝐶] ∧ Detm⟨𝑐⟩ 104 Again, we show the invariant post-conditions for a fixed 𝛾. For the critical itera- tion 𝛾 = 𝑐, we again isolate 𝑏𝑖𝑛𝐶𝑡 [𝑐] [𝑝], observe that addition is monotone and the function [𝑏𝑖𝑛𝐶𝑡 [𝑐] [𝑝] > 𝑇𝑏𝑖𝑛] is monotone in 𝑏𝑖𝑛𝐶𝑡 [𝛾] [𝑝], and apply the NA frame rule NEGFRAME. Finally, at the end of the program, we can show: 𝑃 ⊛ 𝛽=0 Own(𝑜𝑣𝑒𝑟 [𝛾] [𝛽]) along with the regular invariant𝑜𝑣𝑒𝑟𝐶𝑡 [𝛾] = 𝑃∑︁ 𝛽=0 𝑜𝑣𝑒𝑟 [𝛾] [𝛽]  . We can then apply the negative-dependence Chernoff bound (NA-Chernoff-2): Pr[𝑜𝑣𝑒𝑟𝐶𝑡 [𝛾] ≥ E[𝑜𝑣𝑒𝑟𝐶𝑡 [𝛾]] + 𝑇 (𝜌𝑜𝑣𝑒𝑟 , 𝑃)] ≤ 𝜌𝑜𝑣𝑒𝑟 . Using the expectation bound from the previous step and putting everything together, we conclude: {⊤} FDDICT { 𝐶∧ 𝛾=0 Pr[𝑜𝑣𝑒𝑟𝐶𝑡 [𝛾] ≥ 𝑃 · 𝐹 (𝑇𝑏𝑖𝑛 − 𝑁/(𝑃 · 𝐶), 𝑁) + 𝑇 (𝜌𝑜𝑣𝑒𝑟 , 𝑃)] ≤ 𝜌𝑜𝑣𝑒𝑟}, thus showing a high-probability upper-bound on the number of overfull pock- ets dictionaries within each crate. 
3.6.6 Repeated Balls-into-bins Process

Our final example considers a probabilistic protocol proposed by Becchetti et al. [2019], implemented as REPEATBIB in Figure 3.7. Intuitively, the program implements a repeated balls-into-bins process.

REPEATBIB :
𝑟 ← 0;
while 𝑟 < 𝑅 do
  𝑛 ← 0; 𝑟𝑒𝑚 ← 0;
  while 𝑛 < 𝑁 do
    𝑟𝑒𝑚 ← 𝑟𝑒𝑚 + [𝑐𝑡 [𝑛] > 0];
    𝑐𝑡 [𝑛] ← 𝑐𝑡 [𝑛] − [𝑐𝑡 [𝑛] > 0];
    𝑛 ← 𝑛 + 1;
  𝑗 ← 0;
  while 𝑗 < 𝑟𝑒𝑚 do
    𝑏𝑖𝑛[ 𝑗] $← OH[𝑁];
    𝑘 ← 0;
    while 𝑘 < 𝑁 do
      𝑢𝑝𝑑 ← 𝑐𝑡 [𝑘] + 𝑏𝑖𝑛[ 𝑗] [𝑘];
      𝑐𝑡 [𝑘] ← 𝑢𝑝𝑑;
      𝑘 ← 𝑘 + 1;
    𝑗 ← 𝑗 + 1;
  𝑛 ← 0;
  𝑒𝑚𝑝𝑡𝑦𝐶𝑡 [𝑟] ← 0;
  𝑒𝑚𝑝𝑡𝑦 ← 𝑖𝑠𝑍𝑒𝑟𝑜(𝑐𝑡);
  while 𝑛 < 𝑁 do
    𝑢𝑝𝑑 ← 𝑒𝑚𝑝𝑡𝑦𝐶𝑡 [𝑟] + 𝑒𝑚𝑝𝑡𝑦[𝑛];
    𝑒𝑚𝑝𝑡𝑦𝐶𝑡 [𝑟] ← 𝑢𝑝𝑑;
    𝑛 ← 𝑛 + 1;
  𝑟 ← 𝑟 + 1;

Figure 3.7: Repeated balls-into-bins [Becchetti et al., 2019]

Initially, 𝑁 balls are distributed among 𝑁 bins; 𝑐𝑡 [𝑛] holds the number of balls in bin 𝑛. For 𝑅 rounds, in each round, a ball is first removed from every non-empty bin. Then, the 𝑟𝑒𝑚 removed balls are randomly reassigned to bins. This process is useful for distributed protocols and scheduling algorithms, where the balls represent tasks and the bins represent computation nodes. Becchetti et al. [2019] proposed and analyzed this algorithm (e.g., bounding the maximum load, proving how long it takes for all balls to visit all bins). We can verify the following lower bound on the number of empty bins, analogous to Becchetti et al. [2019, Lemma 1 and Lemma 2]:

{𝑁 ≥ 2 ∧ [∑_{𝛼=0}^{𝑁} 𝑐𝑡 [𝛼] = 𝑁]} REPEATBIB {Pr[⋁_{𝛽=0}^{𝑅} (𝑒𝑚𝑝𝑡𝑦𝐶𝑡 [𝛽] < 𝑁/15 − 𝑇 (𝜌𝑒𝑚𝑝𝑡𝑦, 𝑁))] ≤ 𝑅 · 𝜌𝑒𝑚𝑝𝑡𝑦}

Two aspects of this program make it more difficult to verify. First, there is a loop with a randomized guard: the number of removed balls 𝑟𝑒𝑚 is a randomized quantity. Reasoning about such loops is challenging because our LOOP rule is not directly applicable and only far weaker rules are available for loops with general randomized guards. Becchetti et al.
[2019] sidestep this problem by conditioning on the number of balls in each bin, which also fixes 𝑟𝑒𝑚 to some value, proving the target property for every fixed setting, and then combining the proofs together. LINA can formalize this style of reasoning using the randomized case analysis rule RCASE to condition on 𝑟𝑒𝑚's value, and then applying the rule from section 2.3.3. However, a second difficulty arises: the post-condition of section 3.5.2 must be closed under mixtures, while independence and negative association are known not to satisfy this side-condition. Thus, it is not possible to prove negative association by first conditioning and then combining. To work around this second problem, we use a technique from Becchetti et al. [2019] and prove, on each conditional distribution, a high-probability bound using the Chernoff bound. The benefit of this approach is that high-probability bounds are closed under mixtures, so we can apply RCASE to combine the results.

In the formal proof, we will refer to the loops in Figure 3.7 using the same scheme we used before: the outer-most loop is loop (1), the three next-outer-most loops are loops (1.1), (1.2), and (1.3), and the inner-most loop is loop (1.2.1). Starting from the outside, we take the following invariant for loop (1):

Pr[⋁_{𝛽=0}^{𝑟} (𝑒𝑚𝑝𝑡𝑦𝐶𝑡 [𝛽] < 𝑇𝑒𝑚𝑝𝑡𝑦)] ≤ 𝑟 · 𝜌𝑒𝑚𝑝𝑡𝑦 ∧ [∑_{𝛼=0}^{𝑁} 𝑐𝑡 [𝛼] = 𝑁]

Showing the invariant condition requires some work. First, note that:

|=Mem [∑_{𝛼=0}^{𝑁} 𝑐𝑡 [𝛼] = 𝑁] → ⋁_{𝜎:[𝑁]→[𝑁]} ⋀_{𝛼=0}^{𝑁} [𝑐𝑡 [𝛼] = |𝜎−1 (𝛼)|]

where 𝜎 : [𝑁] → [𝑁] ranges over all assignments of 𝑁 balls to 𝑁 bins. We write 𝜏(𝛼) = |𝜎−1 (𝛼)| for the number of balls in bin 𝛼. We will show:

{⋀_{𝛼=0}^{𝑁} [𝑐𝑡 [𝛼] = 𝜏(𝛼)]} 𝑏𝑜𝑑𝑦 {Pr[𝑒𝑚𝑝𝑡𝑦𝐶𝑡 [𝑟] < 𝑇𝑒𝑚𝑝𝑡𝑦] ≤ 𝜌𝑒𝑚𝑝𝑡𝑦}

where 𝑏𝑜𝑑𝑦 is the body of loop (1).
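For reference, the process of Figure 3.7 can be transcribed into executable form. A sketch in Python (assuming OH[𝑁] samples a uniformly random one-hot vector of length 𝑁, and that 𝑟𝑒𝑚 counts the bins that were non-empty before the removal step):

```python
import random

def one_hot(n, rng):
    """Sample OH[n]: a uniformly random one-hot vector of length n."""
    v = [0] * n
    v[rng.randrange(n)] = 1
    return v

def repeat_bib(ct, rounds, seed=0):
    """Repeated balls-into-bins (after Figure 3.7): each round removes one
    ball from every non-empty bin, then reassigns the removed balls to
    uniformly random bins. Returns (final counts, per-round emptyCt)."""
    rng = random.Random(seed)
    ct = list(ct)
    n = len(ct)
    empty_ct = []
    for _ in range(rounds):
        rem = sum(1 for c in ct if c > 0)       # balls removed this round
        ct = [max(c - 1, 0) for c in ct]
        for _ in range(rem):                    # reassign removed balls
            b = one_hot(n, rng)
            ct = [c + d for c, d in zip(ct, b)]
        empty_ct.append(sum(1 for c in ct if c == 0))
    return ct, empty_ct
```

With 𝑁 balls in 𝑁 bins initially, the total number of balls is invariant across rounds, matching the precondition [∑ 𝑐𝑡 [𝛼] = 𝑁] that the proof conditions on.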
For loop (1.1), it is straightforward to show the invariant using RASSN:

⋀_{𝛼=𝑛}^{𝑁} [𝑐𝑡 [𝛼] = 𝜏(𝛼)] ∧ ⋀_{𝛼=0}^{𝑛} [𝑐𝑡 [𝛼] = 𝜏(𝛼) − [𝜏(𝛼) > 0]] ∧ [𝑟𝑒𝑚 = ∑_{𝛼=0}^{𝑛} [𝜏(𝛼) > 0]] ∧ [𝑛 ≤ 𝑁]

Using the loop rule LOOP, we derive the following at the exit of loop (1.1):

⋀_{𝛼=0}^{𝑁} [𝑐𝑡 [𝛼] = 𝜏(𝛼) − [𝜏(𝛼) > 0]] ∧ [𝑟𝑒𝑚 = ∑_{𝛼=0}^{𝑁} [𝜏(𝛼) > 0]]

Since the counts are all equal to expressions over logical variables, conditioned on the logical variable 𝜎 they are all deterministic; thus, we have

⋀_{𝛼=0}^{𝑁} Detm⟨𝑐𝑡 [𝛼]⟩ ∧ Detm⟨𝑟𝑒𝑚⟩

which implies ⊛_{𝛼=0}^{𝑁} Own(𝑐𝑡 [𝛼]) ∧ Detm⟨𝑟𝑒𝑚⟩. We take ⊛_{𝛼=0}^{𝑁} Own(𝑐𝑡 [𝛼]) ∧ Detm⟨𝑟𝑒𝑚⟩ to be the invariant for loop (1.2). To establish this, we reason much as in the previous examples. The sampling rule SAMP gives:

⊛_{𝛼=0}^{𝑁} Own(𝑐𝑡 [𝛼]) ∗ Own(𝑏𝑖𝑛[ 𝑗])

By negative association for one-hot encoding (OH-PNA):

⊛_{𝛼=0}^{𝑁} Own(𝑐𝑡 [𝛼]) ∗ ⊛_{𝛼=0}^{𝑁} Own(𝑏𝑖𝑛[ 𝑗] [𝛼]).

For the inner-most loop (1.2.1), we apply the same technique as for loop (1.2). Since loop (1.2) has a randomized guard, 𝑘 is a random variable, and loop (1.2.1) also has a randomized guard. However, under the conditioning, we may assume that 𝑘 is deterministic and apply LOOP on loop (1.2.1) with the following invariant:

⊛_{𝛼=0}^{𝑘} Own(𝑐𝑡 [𝛼]) ⊛ ⊛_{𝛼=𝑘}^{𝑁} (Own(𝑐𝑡 [𝛼]) ⊛ Own(𝑏𝑖𝑛[ 𝑗] [𝛼])) ∧ [𝑘 ≤ 𝑁]

Like in earlier examples, we can establish this invariant using NEGFRAME since 𝑐𝑡 [𝑘] + 𝑏𝑖𝑛[ 𝑗] [𝑘] is monotone. Thus, at the exit of loop (1.2.1), we have:

⊛_{𝛼=0}^{𝑁} Own(𝑐𝑡 [𝛼])

and this assertion is preserved to the end of loop (1.2). Next, three applications of the assignment rule RASSN give:

⊛_{𝛼=0}^{𝑁} Own(𝑐𝑡 [𝛼]) ∧ [𝑛 = 0] ∧ [𝑒𝑚𝑝𝑡𝑦𝐶𝑡 [𝑟] = 0] ∧ [𝑒𝑚𝑝𝑡𝑦 = 𝑖𝑠𝑍𝑒𝑟𝑜(𝑐𝑡)]

The function 𝑖𝑠𝑍𝑒𝑟𝑜(𝑣) takes a numerical vector 𝑣 and returns a vector whose index 𝑖 holds 1 if 𝑣 [𝑖] is zero, and 0 otherwise. This is an antitone function: it is non-increasing in its argument.
Thus, the monotone mapping axiom (Mono-Map) gives:

⊛_{𝛼=0}^{𝑁} Own(𝑒𝑚𝑝𝑡𝑦[𝛼])

Then, a standard loop invariant for loop (1.3) gives:

⊛_{𝛼=0}^{𝑁} Own(𝑒𝑚𝑝𝑡𝑦[𝛼]) ∧ [𝑒𝑚𝑝𝑡𝑦𝐶𝑡 [𝑟] = ∑_{𝛼=0}^{𝑁} 𝑒𝑚𝑝𝑡𝑦[𝛼]]

at the end of loop (1.3). Thus,

{⋀_{𝛼=0}^{𝑁} [𝑐𝑡 [𝛼] = 𝜏(𝛼)]} 𝑏𝑜𝑑𝑦 {⊛_{𝛼=0}^{𝑁} Own(𝑒𝑚𝑝𝑡𝑦[𝛼]) ∧ [𝑒𝑚𝑝𝑡𝑦𝐶𝑡 [𝑟] = ∑_{𝛼=0}^{𝑁} 𝑒𝑚𝑝𝑡𝑦[𝛼]]}.

Now, we are in a position to apply the negative association Chernoff bound (NA-Chernoff-2), giving the judgment:

{⋀_{𝛼=0}^{𝑁} [𝑐𝑡 [𝛼] = 𝜏(𝛼)]} 𝑏𝑜𝑑𝑦 {Pr[𝑒𝑚𝑝𝑡𝑦𝐶𝑡 [𝑟] ≤ E[𝑒𝑚𝑝𝑡𝑦𝐶𝑡 [𝑟]] − 𝑇 (𝜌𝑒𝑚𝑝𝑡𝑦, 𝑁)] ≤ 𝜌𝑒𝑚𝑝𝑡𝑦}

where 𝑏𝑜𝑑𝑦 is the body of loop (1). Next, we bound E[𝑒𝑚𝑝𝑡𝑦𝐶𝑡 [𝑟]]: translating an argument by Becchetti et al. [2019, Lemma 2] into our logic gives:

{⋀_{𝛼=0}^{𝑁} [𝑐𝑡 [𝛼] = 𝜏(𝛼)] ∧ [𝑁 ≥ 2]} 𝑏𝑜𝑑𝑦 {E[𝑒𝑚𝑝𝑡𝑦𝐶𝑡 [𝑟]] ≥ 𝑁/15}

The argument makes use of basic properties of expected values and the exponential function; we omit the details. Thus, we can conclude that

{⋀_{𝛼=0}^{𝑁} [𝑐𝑡 [𝛼] = 𝜏(𝛼)]} 𝑏𝑜𝑑𝑦 {Pr[𝑒𝑚𝑝𝑡𝑦𝐶𝑡 [𝑟] ≤ 𝑁/15 − 𝑇 (𝜌𝑒𝑚𝑝𝑡𝑦, 𝑁)] ≤ 𝜌𝑒𝑚𝑝𝑡𝑦}.

Note that this post-condition is closed under mixtures. So now we can apply the randomized case analysis rule RCASE to combine the proofs for different assignments 𝜎. We can take the trivial pre-condition 𝜑 = ⊤, and the case condition:

𝜂 := ⋁_{𝜎:[𝑁]→[𝑁]} ⋀_{𝛼=0}^{𝑁} [𝑐𝑡 [𝛼] = 𝜏(𝛼)].

Since 𝜂 asserts that one of the equalities holds with probability 1, it is closed under conditioning. Applying RCASE, we have:

{⋁_{𝜎:[𝑁]→[𝑁]} ⋀_{𝛼=0}^{𝑁} [𝑐𝑡 [𝛼] = 𝜏(𝛼)]} 𝑏𝑜𝑑𝑦 {Pr[𝑒𝑚𝑝𝑡𝑦𝐶𝑡 [𝑟] < 𝑁/15 − 𝑇 (𝜌𝑒𝑚𝑝𝑡𝑦, 𝑁)] ≤ 𝜌𝑒𝑚𝑝𝑡𝑦}

Also, since |= [∑_{𝛼=0}^{𝑁} 𝑐𝑡 [𝛼] = 𝑁] → ⋁_{𝜎:[𝑁]→[𝑁]} ⋀_{𝛼=0}^{𝑁} [𝑐𝑡 [𝛼] = 𝜏(𝛼)], we can replace the precondition with [∑_{𝛼=0}^{𝑁} 𝑐𝑡 [𝛼] = 𝑁].
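The per-round bounds are combined across rounds with the union bound (UnionBd), Pr[𝐴 ∪ 𝐵] ≤ Pr[𝐴] + Pr[𝐵]. A minimal empirical check on two hypothetical overlapping events (the counts satisfy the bound deterministically, since each sample in the union is counted at least once on the right):

```python
import random

random.seed(1)
trials = 100_000
count_a = count_b = count_union = 0
for _ in range(trials):
    u = random.random()
    a, b = u < 0.3, 0.2 < u < 0.5   # two overlapping events on the same draw
    count_a += a
    count_b += b
    count_union += (a or b)
# count_union / trials estimates Pr[A ∪ B]; it never exceeds
# count_a / trials + count_b / trials, the union-bound estimate.
```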
Recalling that we wanted the following invariant to get preserved by loop (1): Pr  𝑟∨ 𝛽=0 (𝑒𝑚𝑝𝑡𝑦𝐶𝑡 [𝛽] < 𝑇𝑒𝑚𝑝𝑡𝑦)  ≤ 𝑟 · 𝜌𝑒𝑚𝑝𝑡𝑦 ∧ 𝑁∑︁ 𝛼=0 [𝑐𝑡 [𝛼] = 𝑁] ∧ Detm⟨𝑟⟩ ∧ [𝑁 ≥ 2] We can use the rule of constancy CONST and the assignment rule DASSN to preserve the first conjunct to show: Pr  𝑟−1∨ 𝛽=0 (𝑒𝑚𝑝𝑡𝑦𝐶𝑡 [𝛽] < 𝑇𝑒𝑚𝑝𝑡𝑦)  ≤ (𝑟 − 1) · 𝜌𝑒𝑚𝑝𝑡𝑦 at the end of the body of loop (1). Combined with the probability bound for 𝑒𝑚𝑝𝑡𝑦𝐶𝑡 [𝑟], an application of the union bound (UnionBd) establishes the in- variant for loop (1). Putting everything together, we have: {𝑁 ≥ 2 ∧ 𝑁∑︁ 𝛼=0 [𝑐𝑡 [𝛼] = 𝑁]} REPEATBIB {Pr  𝑅∨ 𝛽=0 (𝑒𝑚𝑝𝑡𝑦𝐶𝑡 [𝛽] < 𝑁/15 − 𝑇 (𝜌𝑒𝑚𝑝𝑡𝑦, 𝑁))  ≤ 𝑅 · 𝜌𝑒𝑚𝑝𝑡𝑦} analogous to Becchetti et al. [2019, Lemma 1 and 2]. 3.7 Related Work Verifying approximate data structures and applying concentration bounds. Bloom filters are a data structure supporting approximate membership queries (AMQs). Ceramist [Gopinathan and Sergey, 2020] is a recent framework for 111 verifying hash-based AMQ structures in the Coq theorem prover. Besides han- dling Bloom filters, Ceramist supports subtle proofs of correctness for many other AMQs. Compared with our approach, Ceramist proofs are more precise but also more intricate, applying theorems about Stirling numbers to achieve a precise bound on the false positive probability. In contrast, our approach rea- sons about negative dependence to achieve a substantially simpler proof, albeit with less precise bounds. Prior works in verification have also applied the Chernoff bound to bound sums of independent random quantities (e.g., [Wang et al., 2021, Chakarov and Sankaranarayanan, 2013]). While independence is easier to establish, the nega- tive association property that we need is more subtle. Negative dependence. There are multiple definitions of negative dependence in the literature, each with their own strengths and weaknesses. 
We work with negative association (NA) [Joag-Dev and Proschan, 1983, Dubhashi and Ranjan, 1998], because it holds in many situations where negative dependence should hold and it is closed under various notions of composition. Recently, the notion of Strong Rayleigh (SR) [Borcea et al., 2009] distribution has been proposed as an ideal definition of negative dependence. The SR condition satisfies more closure properties than NA does; in particular, it is preserved under various forms of conditioning. However, SR distributions have mostly been studied for Boolean variables only, and we do not know if an analogue of the monotone maps property of NA holds for SR. Beyond theoretical investigations, negative dependence plays a useful role in many practical applications. In machine learning, negative dependence can 112 help ensure diversity in predictions by a model [Kulesza and Taskar, 2012], and fast algorithms are known to learn and sample from negatively-dependent distributions [Anari et al., 2016]. In algorithm design, negative dependence is a useful tool to randomly round solutions of linear programs to integral so- lutions [Srinivasan, 2001]. Negative dependence can ensure that certain con- straints are satisfied exactly after rounding, while still allowing concentration bounds to be applied to analyze the quality of the rounded solution. 113 CHAPTER 4 A BUNCHED LOGIC FOR DEPENDENCE AND INDEPENDENCE Conditional independence (CI) is a well-studied notion in probability the- ory and statistics [Dawid, 1979, Pearl et al., 1989, Dawid, 2001, Simpson, 2018]. While there are many interpretations of CI, a natural reading is in terms of ir- relevance: 𝑋 and 𝑌 are independent conditioned on 𝑍 if knowing the value of 𝑍 renders 𝑋 and 𝑌 unrelated; in other words, observing one gives no further information about the other. Conditional independence has a wide range of applications. For exam- ple, it enables distinguishing superfluous correlation from causation. 
For instance, suppose researchers found a strong positive correlation between a nation's number of Nobel laureates per capita and its chocolate consumption. A convenient (mis)interpretation would be that chocolate consumption makes people smarter and leads to more Nobel laureates. But the correlation is more likely due to other factors, e.g., a nation's economic status, and the two quantities are conditionally independent once that third factor is fixed.

Conditional independence can also succinctly encode interesting properties. As more and more life-changing decisions, e.g., job hiring, judicial decisions, and loan approvals, are automated using prediction algorithms, algorithmic fairness has gained increasing attention. To prevent algorithms from discriminating based on sensitive features (e.g., race and gender), researchers have formalized notions of fairness originating from different philosophies using conditional independence. For instance, one school of thought holds that an algorithm is fair if it satisfies equalized odds, i.e., the algorithm's predictions and the sensitive features are conditionally independent given the innate quality (i.e., the target label) that the algorithm aims to predict; another proposal for fairness is calibration, which says that, fixing the algorithm's prediction, the sensitive features and the target label are conditionally independent. (More details are presented in Barocas et al. [2023].)

Since we are studying probabilistic programs, we want to reason about conditional independence of (sets of) program variables, which is defined as follows:

Definition 4.0.1 (Conditional independence). Let 𝑋,𝑌, 𝑍 ⊆ Var. For any 𝑚 ∈ Mem[Var], we write the event {𝜔 ∈ Mem[Var] | ∀𝑥 ∈ 𝑋. 𝜔(𝑥) = 𝑚(𝑥)} as 𝑋 = 𝑚. Given a distribution 𝜇 over Mem[Var], the sets of variables 𝑋 and 𝑌 are independent conditioned on 𝑍 , written 𝑋 ⊥⊥ 𝑌 | 𝑍 , if for all 𝑥 ∈ Mem[𝑋], 𝑦 ∈ Mem[𝑌 ], and 𝑧 ∈ Mem[𝑍]:

𝜇(𝑋 = 𝑥 | 𝑍 = 𝑧) · 𝜇(𝑌 = 𝑦 | 𝑍 = 𝑧) = 𝜇(𝑋 = 𝑥, 𝑌 = 𝑦 | 𝑍 = 𝑧).
When 𝑍 = ∅, we say 𝑋 and 𝑌 are independent, written 𝑋 ⊥⊥ 𝑌 . Conditional independence of program variables allows for more efficient representation of distributions over program memories. For instance, if 𝑋 ⊥⊥ 𝑌 | 𝑍 , then instead of storing the joint distribution of 𝑋,𝑌, 𝑍 , one can store the distribution of 𝑍 , the marginal distribution of 𝑋 given 𝑍 , and the marginal dis- tribution of 𝑌 given 𝑍 ; when there are 𝑛 possible outcomes for each of 𝑋,𝑌, 𝑍 , storing the former takes 𝑂 (𝑛3) space, while storing the latter only takes 𝑂 (𝑛2) space. The factored representation also enables more efficient inference algo- rithms (e.g., Holtzen [2021]), which are developed to compute or approximate the distribution after conditioning on an observation. Thus, we want to extend probabilistic separation logic to prove conditional independence of program variables. To achieve that, we need an assertion logic 115 that can express conditional independence. The existing probabilistic BI model (Section 2.3.2) provides no means to describe the distribution over program memories conditioned on the values a set of variables takes. Accordingly, one cannot capture the basic statement of conditional independence, i.e., 𝑋 and 𝑌 are independent conditioned on any value of 𝑍 . To address that problem, we develop a novel assertion logic DIBI, short for Dependence and Independence BI. DIBI extends BI with new connectives: the conjunction 𝑃 # 𝑄 for modeling de- pendence between states and its adjoints 𝑃 � 𝑄 and 𝑃 ⊸ 𝑄. We then develop a probabilistic model of DIBI so that 𝑃 ∗ 𝑄 can assert probabilistic independence and 𝑃 # 𝑄 can assert dependence. 
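Definition 4.0.1 can be checked mechanically on small finite distributions. A sketch (the joint distribution is invented for illustration, echoing the confounder discussion above: a latent binary factor 𝑤 drives two binary variables 𝑥 and 𝑦, which are independent given 𝑤 but positively correlated marginally):

```python
from itertools import product

# Hypothetical joint distribution over (x, y, w): w ~ Bern(1/2), and given w,
# x and y are independent draws from Bern(0.2 + 0.6 * w).
def p(x, y, w):
    q = 0.2 + 0.6 * w                 # P(x = 1 | w) = P(y = 1 | w)
    f = lambda b: q if b else 1 - q
    return 0.5 * f(x) * f(y)

joint = {(x, y, w): p(x, y, w) for x, y, w in product([0, 1], repeat=3)}

def marginal(keep):
    """Marginalize the joint onto the coordinate indices in `keep`."""
    out = {}
    for outcome, pr in joint.items():
        key = tuple(outcome[i] for i in keep)
        out[key] = out.get(key, 0.0) + pr
    return out

def cond_indep():
    """Definition 4.0.1 for x and y given w: the conditional joint factors
    into the product of the conditional marginals, for every value of w."""
    pw, pxw, pyw = marginal((2,)), marginal((0, 2)), marginal((1, 2))
    return all(
        abs(pxw[(x, w)] / pw[(w,)] * pyw[(y, w)] / pw[(w,)]
            - joint[(x, y, w)] / pw[(w,)]) < 1e-12
        for x, y, w in product([0, 1], repeat=3)
    )
```

Here `cond_indep()` holds, while 𝑥 and 𝑦 are not independent unconditionally: Pr[𝑥 = 1, 𝑦 = 1] = 0.34 exceeds Pr[𝑥 = 1] · Pr[𝑦 = 1] = 0.25.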
Then, we express conditional independence of 𝑋 and 𝑌 given 𝑍 roughly as 𝑍 # (𝑋 ∗ 𝑌 ), which asserts the independence of 𝑋 and 𝑌 while they both depend on 𝑍.

Intuitively, to assert dependence with the conjunction 𝑃 # 𝑄, we want to interpret # through a binary operator ⊙, defined so that in the composed distribution 𝑓 ⊙ 𝑔, the variables described by 𝑔 depend on the variables described by 𝑓 ; however, it is unclear how to define such an operator ⊙ for distributions. To address this problem, we design a DIBI model whose states are not distributions but Markov kernels [Panangaden, 2009], which are essentially maps from a set 𝐴 to distributions over a set 𝐵; they get their name from their role in the theory of general Markov processes [Dynkin, 2012]. We will sometimes abbreviate them as kernels for convenience.

Crucially, Markov kernels can be composed sequentially using the bind operation of the distribution monad: given 𝑓 : 𝑋 → D(𝑌 ) and 𝑔 : 𝑌 → D(𝑍), the Kleisli composition 𝑓 ; 𝑔 : 𝑋 → D(𝑍) is:

( 𝑓 ; 𝑔) (𝑥) := bind( 𝑓 (𝑥), 𝑔) (4.1)

(a) Probabilistic program 𝑝:
𝑧 $← Bern1/2;
if 𝑧 then 𝑥 $← Bern3/4; 𝑦 $← Bern3/4;
else 𝑥 $← Bern1/2; 𝑦 $← Bern1/2

(b) Distribution 𝜇 generated by 𝑝:
𝑥 𝑦 𝑧 | 𝜇
0 0 0 | 1/8
0 0 1 | 1/32
1 0 0 | 1/8
1 0 1 | 3/32
0 1 0 | 1/8
0 1 1 | 3/32
1 1 0 | 1/8
1 1 1 | 9/32

(c) 𝜇 conditioned on 𝑧 = 0:
𝑥 𝑦 | 𝜇0
0 0 | 1/4
1 0 | 1/4
0 1 | 1/4
1 1 | 1/4

(d) 𝜇 conditioned on 𝑧 = 1:
𝑥 𝑦 | 𝜇1
0 0 | 1/16
1 0 | 3/16
0 1 | 3/16
1 1 | 9/16

Figure 4.1: From probabilistic programs to kernels

Markov kernels generalize distributions because we can lift any distribution 𝜇 ∈ D(𝑋) to a kernel 𝑓𝜇 : 1 → D(𝑋) by assigning 𝜇 to the single element of 1. Kernels can also encode conditional distributions, which play a key role in conditional independence. We show an example of how to encode conditional distributions using kernels below.

Example 4.0.1 (Kernels and Conditional Probabilities). Consider the program 𝑝 in Figure 4.1a, where 𝑥, 𝑦, and 𝑧 are Boolean variables.
First, the program flips a fair coin and stores the result in 𝑧. If 𝑧 = 0, it flips a fair coin twice and stores the results in 𝑥 and 𝑦, respectively. If 𝑧 = 1, it flips a coin with bias 3/4 twice and stores the results in 𝑥 and 𝑦. This program produces the distribution 𝜇 shown in Figure 4.1b. If we condition 𝜇 on 𝑧 = 0, then the resulting distribution 𝜇0 models two independent fair coin flips: 1/4 probability for each possible pair of outcomes (Figure 4.1c). If we condition on 𝑧 = 1, however, then the distribution 𝜇1 is skewed: there is a much higher probability that we observe (1, 1) than (0, 0), but 𝑥 and 𝑦 are still independent given 𝑧 (Figure 4.1d). To connect 𝜇0 and 𝜇1 to the original distribution 𝜇, we package 𝜇0 and 𝜇1
DIBI formulas are interpreted on DIBI frames, which extend BI frames. As in BI frames, we want to define one binary operator, denoted ⊕ here, to interpret 𝑃 ∗ 𝑄, which asserts the separation of resources validating 𝑃 and 𝑄. The main extension is a new binary operator ⊙ for interpreting the formulas 𝑃 #𝑄, 𝑃 ⊸ 𝑄 and 𝑃 � 𝑄. Informally, we want 𝑃 # 𝑄 to assert that the resource validating 𝑄 depends on the resource validating 𝑃. Because dependence in general is not commutative, we define ⊙ as a non-commutative operator. Definition 4.1.1 (DIBI Frame). A DIBI frame is a structure X = (𝑋, ⊑, ⊕, ⊙, 𝐸) such that ⊑ is a preorder, 𝐸 ⊆ 𝑋 , and ⊕ : 𝑋2 → P(𝑋) and ⊙ : 𝑋2 → P(𝑋) are binary operations, satisfying the rules in Figure 4.2. Similar to the case in BI frames, 𝑋 is a set of states, the preorder ⊑ describes when a smaller state can be extended to a larger state, the binary operators ⊙, ⊕ offer two ways of combining states, and 𝐸 is the set of states that act like units with respect to these operations. For instance, in our intended model for probabilistic programs, the states would be Markov kernels that preserve their input through to their output, which present conditional distributions. We would define 𝑓 ⊕ 𝑔 to return the set of independent products of two kernels — there is no standard definition for this but roughly it should generalize independent product of distributions, and define 𝑓 ⊙𝑔 to return the set of kernels obtained by the sequential composition of two kernels, which is based on the monadic bind. The definition of pre-order would generalize the pre-order in PSL’s assertion logic, which says 𝜇1 is smaller than 𝜇2 if 𝜇1 is a marginal distribution of 𝜇2. 
𝑧 ∈ 𝑥 ⊕ 𝑦 ∧ 𝑥 ⊒ 𝑥′ ∧ 𝑦 ⊒ 𝑦′ → ∃𝑧′(𝑧 ⊒ 𝑧′ ∧ 𝑧′ ∈ 𝑥′ ⊕ 𝑦′); (⊕ Down-Closed)
𝑧 ∈ 𝑥 ⊙ 𝑦 ∧ 𝑧′ ⊒ 𝑧 → ∃𝑥′, 𝑦′(𝑥′ ⊒ 𝑥 ∧ 𝑦′ ⊒ 𝑦 ∧ 𝑧′ ∈ 𝑥′ ⊙ 𝑦′); (⊙ Up-Closed)
𝑧 ∈ 𝑥 ⊕ 𝑦 → 𝑧 ∈ 𝑦 ⊕ 𝑥; (⊕ Commutativity)
𝑤 ∈ 𝑡 ⊕ 𝑧 ∧ 𝑡 ∈ 𝑥 ⊕ 𝑦 → ∃𝑠(𝑠 ∈ 𝑦 ⊕ 𝑧 ∧ 𝑤 ∈ 𝑥 ⊕ 𝑠); (⊕ Associativity)
∃𝑒 ∈ 𝐸 (𝑥 ∈ 𝑒 ⊕ 𝑥); (⊕ Unit Existence)
𝑒 ∈ 𝐸 ∧ 𝑥 ∈ 𝑦 ⊕ 𝑒 → 𝑥 ⊒ 𝑦; (⊕ Unit Coherence)
∃𝑡 (𝑤 ∈ 𝑡 ⊙ 𝑧 ∧ 𝑡 ∈ 𝑥 ⊙ 𝑦) ↔ ∃𝑠(𝑠 ∈ 𝑦 ⊙ 𝑧 ∧ 𝑤 ∈ 𝑥 ⊙ 𝑠); (⊙ Associativity)
∃𝑒 ∈ 𝐸 (𝑥 ∈ 𝑒 ⊙ 𝑥); (⊙ Unit ExistenceL)
∃𝑒 ∈ 𝐸 (𝑥 ∈ 𝑥 ⊙ 𝑒); (⊙ Unit ExistenceR)
𝑒 ∈ 𝐸 ∧ 𝑥 ∈ 𝑦 ⊙ 𝑒 → 𝑥 ⊒ 𝑦; (⊙ CoherenceR)
𝑒 ∈ 𝐸 ∧ 𝑒′ ⊒ 𝑒 → 𝑒′ ∈ 𝐸; (Unit Closure)
𝑥 ∈ 𝑦 ⊕ 𝑧 ∧ 𝑦 ∈ 𝑦1 ⊙ 𝑦2 ∧ 𝑧 ∈ 𝑧1 ⊙ 𝑧2 → ∃𝑢, 𝑣(𝑢 ∈ 𝑦1 ⊕ 𝑧1 ∧ 𝑣 ∈ 𝑦2 ⊕ 𝑧2 ∧ 𝑥 ∈ 𝑢 ⊙ 𝑣). (Reverse Exchange)

Figure 4.2: DIBI frame requirements (with outermost universal quantification omitted for readability).

The frame conditions define properties that must hold for all models of DIBI. The frame conditions required for ⊕ are exactly the frame conditions satisfied by the binary combination in a BI frame; that is, (𝑋, ⊑, ⊕, 𝐸) forms a BI frame. The binary combination ⊙, in contrast, is not commutative, but it is still associative and has units. Because ⊙ is non-commutative, each ⊙ analogue of a ⊕ axiom splits into a pair of axioms, although we exclude the left version of (⊙ Coherence) for reasons we explain in section 4.1.2. Also, while ⊕ is downwards-closed, as is the binary operation in BI frames, the new binary combination ⊙ is upwards-closed. These choices of closedness conditions match the desired interpretations of ⊕ as independence and ⊙ as dependence: independence should drop down to substates (which must necessarily be independent if the superstates were so), while dependence should be inherited by superstates (the source of dependence will still be present in any extensions).
Finally, the (Reverse Exchange) condition defines the interaction between ⊕ and ⊙: intuitively, if 𝑦2 depends on 120 𝑦1 and 𝑧2 depends on 𝑧1, and 𝑦1, 𝑦2 are independent from 𝑧1, 𝑧2, then the combi- nation of 𝑦2 and 𝑧2 depends on 𝑦1 and 𝑧1. We give a Kripke-style semantics for DIBI. Definition 4.1.2 (Valuation and model). A persistent valuation of DIBI is an as- signment V : AP → P(𝑋) of atomic propositions to subsets of states of a DIBI frame satisfying persistence: if 𝑥 ∈ V(𝑝) and 𝑦 ⊒ 𝑥 then 𝑦 ∈ V(𝑝). A DIBI model (X,V) is a DIBI frame X together with a persistent valuationV. We now inductively define satisfaction of DIBI formulas in a DIBI model. 𝑥 |=V ⊤ always 𝑥 |=V ⊥ never 𝑥 |=V 𝐼 iff 𝑥 ∈ 𝐸 𝑥 |=V 𝑝 iff 𝑥 ∈ V(𝑝) 𝑥 |=V 𝑃 ∧𝑄 iff 𝑥 |=V 𝑃 and 𝑥 |=V 𝑄 𝑥 |=V 𝑃 ∨𝑄 iff 𝑥 |=V 𝑃 or 𝑥 |=V 𝑄 𝑥 |=V 𝑃→ 𝑄 iff for all 𝑦 ⊒ 𝑥, 𝑦 |=V 𝑃 implies 𝑦 |=V 𝑄 𝑥 |=V 𝑃 ∗ 𝑄 iff there exist 𝑥′, 𝑦, 𝑧 s.t. 𝑥 ⊒ 𝑥′ ∈ 𝑦 ⊕ 𝑧, 𝑦 |=V 𝑃 and 𝑧 |=V 𝑄 𝑥 |=V 𝑃 #𝑄 iff there exist 𝑦, 𝑧 s.t. 𝑥 ∈ 𝑦 ⊙ 𝑧, 𝑦 |=V 𝑃 and 𝑧 |=V 𝑄 𝑥 |=V 𝑃 −∗ 𝑄 iff for all 𝑦, 𝑧 s.t. 𝑧 ∈ 𝑥 ⊕ 𝑦: 𝑦 |=V 𝑃 implies 𝑧 |=V 𝑄 𝑥 |=V 𝑃 ⊸ 𝑄 iff for all 𝑥′, 𝑦, 𝑧 s.t. 𝑥′ ⊒ 𝑥 and 𝑧 ∈ 𝑥′ ⊙ 𝑦: 𝑦 |=V 𝑃 implies 𝑧 |=V 𝑄 𝑥 |=V 𝑃 � 𝑄 iff for all 𝑥′, 𝑦, 𝑧 s.t. 𝑥′ ⊒ 𝑥 and 𝑧 ∈ 𝑦 ⊙ 𝑥′: 𝑦 |=V 𝑃 implies 𝑧 |=V 𝑄 Figure 4.3: Satisfaction for DIBI Definition 4.1.3 (DIBI Satisfaction and Validity). Satisfaction at a state 𝑥 in a model is inductively defined by the clauses in Figure 4.3. As before, we say 𝑃 is valid in a model, X |=V 𝑃, iff 𝑥 |=V 𝑃 for all 𝑥 ∈ X. 𝑃 is valid, |= 𝑃, iff 𝑃 is valid in all models. 𝑃 |= 𝑄 iff, for all models, 𝑥 |=V 𝑃 implies 𝑥 |=V 𝑄. Where the context is clear, we omit the subscript V on the satisfaction re- lation. With the semantics in Figure 4.3, persistence on propositional atoms indeed extends to all formulas: 121 Lemma 4.1.1 (Persistence Lemma). For all 𝑃 ∈ FormDIBI, if 𝑥 |= 𝑃 and 𝑥 ⊑ 𝑦, then 𝑦 |= 𝑃. Proof. We prove that induction on the syntax of the formulas. 
Specifically, the persistence of ⊤ and ⊥ is trivial, and the persistence of 𝐼 follows from (Unit Closure). 𝑃 ∧ 𝑄 and 𝑃 ∨ 𝑄 are persistent by the inductive hypothesis. For 𝑃 → 𝑄, 𝑃 ∗ 𝑄, 𝑃 � 𝑄, and 𝑃 ⊸ 𝑄, persistence is evident because their semantic clauses explicitly account for the order. □

Notably, in fig. 4.3, the semantic clauses for # and ∗ differ beyond using different binary operations: the clause for ∗ has an additional variable 𝑥′ under the existential quantifier and only requires 𝑥 ⊒ 𝑥′ ∈ 𝑦 ⊕ 𝑧 instead of 𝑥 ∈ 𝑦 ⊕ 𝑧. The semantic clauses for −∗ and ⊸ also differ: the clause for ⊸ has an additional universally quantified variable 𝑥′. This difference is due to the different frame axioms satisfied by ⊙ and ⊕, and to our goal of ensuring that lemma 4.1.1 holds: the (⊙ Up-Closed) frame axiom ensures the persistence of the simpler clause for #, and similarly (⊕ Down-Closed) ensures the persistence of −∗ [Cao et al., 2017].

4.1.2 Proof system

Now we describe how DIBI formulas can be derived. We give a Hilbert-style proof system for DIBI in Figure 4.4. This calculus extends the proof system for BI with additional rules governing the new connectives #, ⊸, and �. In section 4.1.3, we will prove this calculus is sound and complete. Here we comment on two important details in this proof system.
122 𝑃 ⊢ 𝑃 AX 𝑃 ⊢ ⊤ TOP ⊥ ⊢ 𝑃 BOT 𝑃 ⊢ 𝑅 𝑄 ⊢ 𝑅 𝑃 ∨𝑄 ⊢ 𝑅 ∨-E 𝑃 ⊢ 𝑄𝑖 𝑃 ⊢ 𝑄1 ∨𝑄2 ∨-I 𝑃 ⊢ 𝑄 𝑃 ⊢ 𝑅 𝑃 ⊢ 𝑄 ∧ 𝑅 ∧-I-R 𝑄 ⊢ 𝑅 𝑃 ∧𝑄 ⊢ 𝑅 ∧-I-L 𝑃 ⊢ 𝑄1 ∧𝑄2 𝑃 ⊢ 𝑄𝑖 ∧-E 𝑃 ∧𝑄 ⊢ 𝑅 𝑃 ⊢ 𝑄 → 𝑅 →-I 𝑃 ⊢ 𝑄 → 𝑅 𝑃 ⊢ 𝑄 𝑃 ⊢ 𝑅 →-E 𝑃 ⊢ 𝑅 𝑄 ⊢ 𝑆 𝑃 ∗ 𝑄 ⊢ 𝑅 ∗ 𝑆 ∗-CONJ 𝑃 ∗ 𝑄 ⊢ 𝑅 𝑃 ⊢ 𝑄 −∗ 𝑅 −∗-I 𝑃 ⊢ 𝑄 −∗ 𝑅 𝑆 ⊢ 𝑄 𝑃 ∗ 𝑆 ⊢ 𝑅 −∗-E 𝑃 ⊣⊢ 𝑃 ∗ 𝐼 ∗-UNIT 𝑃 ∗ 𝑄 ⊢ 𝑄 ∗ 𝑃 ∗-COMM (𝑃 ∗ 𝑄) ∗ 𝑅 ⊣⊢ 𝑃 ∗ (𝑄 ∗ 𝑅) ∗-ASSOC 𝑃 #𝑄 ⊢ 𝑅 𝑃 ⊢ 𝑄 ⊸ 𝑅 ⊸-I 𝑃 ⊢ 𝑄 ⊸ 𝑅 𝑆 ⊢ 𝑄 𝑃 # 𝑆 ⊢ 𝑅 ⊸ MP 𝑃 #𝑄 ⊢ 𝑅 𝑄 ⊢ 𝑃 � 𝑅 �-I 𝑃 ⊢ 𝑄 � 𝑅 𝑆 ⊢ 𝑄 𝑆 # 𝑃 ⊢ 𝑅 � MP 𝑃 ⊢ 𝐼 # 𝑃 #-LEFT UNIT 𝑃 ⊣⊢ 𝑃 # 𝐼 #-RIGHT UNIT 𝑃 ⊢ 𝑅 𝑄 ⊢ 𝑆 𝑃 #𝑄 ⊢ 𝑅 # 𝑆 #-CONJ (𝑃 #𝑄) # 𝑅 ⊣⊢ 𝑃 # (𝑄 # 𝑅) #-ASSOC (𝑃 #𝑄) ∗ (𝑅 # 𝑆) ⊢ (𝑃 ∗ 𝑅) # (𝑄 ∗ 𝑆) REVEX Figure 4.4: Hilbert system for DIBI 123 Reverse exchange The proof system of DIBI shares many similarities with Concurrent Kleene Bunched Logic (CKBI) [Docherty, 2019], which also extends BI with a non-commutative conjunction. Inspired by concurrent Kleene alge- bra (CKA) Hoare et al. [2011], CKBI supports the following exchange axiom, derived from CKA’s exchange law: (𝑃 ∗ 𝑅) # (𝑄 ∗ 𝑆) ⊢CKBI (𝑃 #𝑄) ∗ (𝑅 # 𝑆) EXCH In models of CKBI, ∗ describes interleaving concurrent composition, while # describes sequential composition. The exchange rule states that the process on the left has fewer behaviors than the process on the right — e.g., 𝑃 # 𝑄 allows fewer behaviors than 𝑃 ∗ 𝑄, so 𝑃 #𝑄 ⊢CKBI 𝑃 ∗ 𝑄 is derivable. In our models, ∗ has a different reading: it states that two computations can be combined because they are independent (i.e., non-interfering). Accordingly, DIBI replaces EXCH by the reversed version REVEX — the fact that the process on the left is safe to combine implies that the process on the right is also safe. 𝑃 ∗ 𝑄 is now stronger than 𝑃 #𝑄, and instead 𝑃 ∗ 𝑄 ⊢ 𝑃 #𝑄 is derivable (Lemma 4.1.2). Lemma 4.1.2. In the proof system given by fig. 4.4, 𝑃 ∗ 𝑄 ⊢ 𝑃 #𝑄. Proof. For better readability, we break the proof tree down into two components. 
The first component derives 𝑃 ∗ 𝑄 ⊢ (𝑃 ∗ 𝐼) # (𝐼 ∗ 𝑄): by #-RIGHT UNIT, 𝑃 ⊢ 𝑃 # 𝐼, and by #-LEFT UNIT, 𝑄 ⊢ 𝐼 # 𝑄, so ∗-CONJ gives 𝑃 ∗ 𝑄 ⊢ (𝑃 # 𝐼) ∗ (𝐼 # 𝑄); REVEX gives (𝑃 # 𝐼) ∗ (𝐼 # 𝑄) ⊢ (𝑃 ∗ 𝐼) # (𝐼 ∗ 𝑄); and CUT yields 𝑃 ∗ 𝑄 ⊢ (𝑃 ∗ 𝐼) # (𝐼 ∗ 𝑄).

With 𝑃 ∗ 𝑄 ⊢ (𝑃 ∗ 𝐼) # (𝐼 ∗ 𝑄), we construct the second component: by ∗-UNIT, 𝑃 ∗ 𝐼 ⊢ 𝑃; by ∗-COMM and ∗-UNIT (combined with CUT), 𝐼 ∗ 𝑄 ⊢ 𝑄 ∗ 𝐼 ⊢ 𝑄; so #-CONJ gives (𝑃 ∗ 𝐼) # (𝐼 ∗ 𝑄) ⊢ 𝑃 # 𝑄, and CUT yields 𝑃 ∗ 𝑄 ⊢ 𝑃 # 𝑄.

This proof uses the admissible rule CUT, which can be derived as follows: from 𝑄 ⊢ 𝑅, rule ∧-I-L gives 𝑃 ∧ 𝑄 ⊢ 𝑅, and →-I gives 𝑃 ⊢ 𝑄 → 𝑅; combining this with 𝑃 ⊢ 𝑄 by →-E yields 𝑃 ⊢ 𝑅. □

Left unit. While # has a right unit in our logic, it does not have a proper left unit. Semantically, this corresponds to the lack of a frame condition

𝑒 ∈ 𝐸 ∧ 𝑥 ∈ 𝑒 ⊙ 𝑦 → 𝑥 ⊒ 𝑦 (⊙ CoherenceL)

in our definition of DIBI frames. This difference can also be seen in our proof rules: while #-RIGHT UNIT gives entailment in both directions, #-LEFT UNIT only gives entailment in one direction; there is no axiom stating 𝐼 # 𝑃 ⊢ 𝑃. We make this relaxation to support our intended model, which we will see in Section 4.2. In a nutshell, states in our models are Markov kernels that preserve their input through to their output. Our models take ⊙ to be Kleisli composition, which exhibits an important asymmetry for such arrows: 𝑓 can always be recovered from 𝑓 ⊙ 𝑒, but not from an arbitrary 𝑒 ⊙ 𝑓 . As a result, the set of all kernels naturally serves as the set of right units, but these kernels cannot all serve as left units.¹

¹ In the special case that 𝑒 maps the input of 𝑓 to the Dirac distribution on it, 𝑒 ⊙ 𝑓 = 𝑓 . But because we also want (Unit Closure), which says the set of units is closed under the pre-order ⊑, our unit set 𝐸 contains other elements 𝑔 such that 𝑓 cannot be recovered from 𝑔 ⊙ 𝑓 .

4.1.3 Soundness and Completeness of DIBI

The soundness and completeness of DIBI follow the same recipe as before, using the methodology given by Docherty [2019]. First, DIBI is proved sound and complete with respect to an algebraic semantics obtained by interpreting the rules of the proof system as algebraic axioms.
We then establish a representa- tion theorem: every DIBI algebra A embeds into a DIBI algebra generated by a DIBI frame, that is in turn generated by A. Soundness and completeness of the algebraic semantics can then be transferred to the Kripke semantics. We prove algebraic soundness and completeness of DIBI proof systems with respect to a new structure called DIBI algebra. Definition 4.1.4 (DIBI Algebra). A DIBI algebra is an algebra A = (𝐴,∧,∨,→,⊤,⊥, ∗,−∗, #,⊸,�, 𝐼) such that, for all 𝑎, 𝑏, 𝑐, 𝑑 ∈ 𝐴: • (𝐴,∧,∨,→,⊤,⊥) is a Heyting algebra; • (𝐴, ∗, 𝐼) is a commutative monoid; • (𝐴, #, 𝐼) is a weak monoid: # is an associative operation with right unit 𝐼 and 𝑎 ≤ 𝐼 # 𝑎; • 𝑎 ∗ 𝑏 ≤ 𝑐 iff 𝑎 ≤ 𝑏 −∗ 𝑐; • 𝑎 # 𝑏 ≤ 𝑐 iff 𝑎 ≤ 𝑏 ⊸ 𝑐 iff 𝑏 ≤ 𝑎 � 𝑐; • (𝑎 # 𝑏) ∗ (𝑐 # 𝑑) ≤ (𝑎 ∗ 𝑐) # (𝑏 ∗ 𝑑). An algebraic interpretation of DIBI is specified by an assignment on the atomic propositions ⟦−⟧ : AP → 𝐴. The interpretation is obtained as the unique homomorphic extension of this assignment, and so we use the notation ⟦−⟧ 126 interchangeably for both assignment and interpretation. Soundness and com- pleteness can be established by constructing a term DIBI algebra by quotienting formulas by equiderivability. Theorem 4.1.3. 𝑃 ⊢ 𝑄 is derivable iff ⟦𝑃⟧ ≤ ⟦𝑄⟧ for all algebraic interpretations ⟦−⟧. We now connect these algebras to DIBI frames so we can transfer the sound- ness and completeness of DIBI proof systems with respect to these algebras to the DIBI frames. Again, we use the notion of complex algebras and prime filters. We denote the set of prime filters of a DIBI algebra A by Prf(A). Definition 4.1.5 (Prime Filter Frame). Given a DIBI algebra A, the prime filter frame of A is defined as 𝑃𝑟 (A) = (Prf(A), ⊆, ⊕A, ⊙A, 𝐸A), where 𝐹 ⊕A 𝐺 = {𝐻 ∈ Prf(A) | ∀𝑎 ∈ 𝐹, 𝑏 ∈ 𝐺 (𝑎 ∗ 𝑏 ∈ 𝐻)} 𝐹 ⊙A 𝐺 = {𝐻 ∈ Prf(A) | ∀𝑎 ∈ 𝐹, 𝑏 ∈ 𝐺 (𝑎 # 𝑏 ∈ 𝐻)} 𝐸A = {𝐹 ∈ Prf(A) | 𝐼 ∈ 𝐹}. Lemma 4.1.4. For any DIBI algebra A, the prime filter frame 𝑃𝑟 (A) is a DIBI frame. 
In the other direction, DIBI frames generate DIBI algebras. Definition 4.1.6 (Complex Algebra). Given a DIBI frame X = (𝑋, ⊑, ⊕, ⊙, 𝐸), the complex algebra ofX is Com(X) = (P⊑ (𝑋),∩,∪,⇒X , 𝑋, ∅, •X ,�X , ⊲X ,−⊲X , ⊲−X , 𝐸): P⊑ (𝑋) = {𝐴 ⊆ 𝑋 | if 𝑎 ∈ 𝐴 and 𝑎 ⊑ 𝑏 then 𝑏 ∈ 𝐴} 𝐴⇒X 𝐵 = {𝑎 | for all 𝑏, if 𝑏 ⊒ 𝑎 and 𝑏 ∈ 𝐴 then 𝑏 ∈ 𝐵} 𝐴 •X 𝐵 = {𝑥 | there exist 𝑥′, 𝑎, 𝑏 s.t 𝑥 ⊒ 𝑥′ ∈ 𝑎 ⊕ 𝑏, 𝑎 ∈ 𝐴 and 𝑏 ∈ 𝐵} 𝐴 �X 𝐵 = {𝑥 | for all 𝑎, 𝑏, if 𝑏 ∈ 𝑥 ⊕ 𝑎 and 𝑎 ∈ 𝐴 then 𝑏 ∈ 𝐵} 𝐴 ⊲X 𝐵 = {𝑥 | there exist 𝑎, 𝑏 s.t 𝑥 ∈ 𝑎 ⊙ 𝑏, 𝑎 ∈ 𝐴 and 𝑏 ∈ 𝐵} 𝐴 −⊲X 𝐵 = {𝑥 | for all 𝑥′, 𝑎, 𝑏, if 𝑥 ⊑ 𝑥′, 𝑏 ∈ 𝑥′ ⊙ 𝑎 and 𝑎 ∈ 𝐴 then 𝑏 ∈ 𝐵} 𝐴 ⊲−X 𝐵 = {𝑥 | for all 𝑥′, 𝑎, 𝑏, if 𝑥 ⊑ 𝑥′, 𝑏 ∈ 𝑎 ⊙ 𝑥′ and 𝑎 ∈ 𝐴 then 𝑏 ∈ 𝐵}. 127 Lemma 4.1.5. For any DIBI frame X, the complex algebra Com(X) is a DIBI algebra. The following main result facilitates the transference of soundness and com- pleteness. Theorem 4.1.6 (Representation of DIBI algebras). Every DIBI algebra is isomorphic to a subalgebra of a complex algebra: given a DIBI algebra A, the map 𝜃A : A → Com(Prf(A)) defined by 𝜃A(𝑎) = {𝐹 ∈ Prf(A) | 𝑎 ∈ 𝐹} is an embedding. Given the previous correspondence between DIBI algebras and frames, we only need to show that 𝜃 is a monomorphism: the necessary argument is iden- tical to that for similar bunched logics [Docherty, 2019, Theorems 6.11, 6.25]. Given ⟦−⟧ on A, the representation theorem establishes thatV⟦−⟧(𝑝) := 𝜃A(⟦𝑝⟧) is a persistent valuation on 𝑃𝑟 (A) such that 𝐹 |=V⟦−⟧ 𝑃 iff ⟦𝑃⟧ ∈ 𝐹, from which our main theorem can be proved. Theorem 4.1.7 (Soundness and Completeness). 𝑃 ⊢ 𝑄 is derivable iff 𝑃 |= 𝑄. 4.2 A Probabilistic Model of DIBI Now we develop a probabilistic model of DIBI where 𝑃 ∗ 𝑄 can assert proba- bilistic independence and 𝑃 #𝑄 can assert dependence. Because our DIBI model is designed to describe probabilistic programs’ pro- gram states, in the remainder of this chapter, we use the term (Markov) kernels to specifically refer to maps 𝑓 : Mem[𝑆] → D(Mem[𝑈]) with 𝑆,𝑈 ⊆ Var. 
For a kernel 𝑓 , we define its domain dom( 𝑓 ) = 𝑆 and its range range( 𝑓 ) = 𝑈. We can also project kernels to a smaller range.

Definition 4.2.1 (Marginalizing kernels). For a Markov kernel 𝑓 : Mem[𝑆] → D(Mem[𝑈]) and 𝑉 ⊆ 𝑈, the marginalization of 𝑓 to 𝑉 is the map 𝜋𝑉 𝑓 : Mem[𝑆] → D(Mem[𝑉]):

(𝜋𝑉 𝑓 ) (𝑑) (𝑟) := ∑𝑚∈Mem[𝑈\𝑉] 𝑓 (𝑑) (𝑟 ⊲⊳ 𝑚)   for 𝑑 ∈ Mem[𝑆], 𝑟 ∈ Mem[𝑉].

Now we define an important requirement for our DIBI model's states.

Definition 4.2.2. We use unit𝑆 to denote the kernel 𝑔 : Mem[𝑆] → D(Mem[𝑆]) defined by 𝑔(𝑚) = unit(𝑚) for all 𝑚 ∈ Mem[𝑆]. We say a kernel 𝑓 : Mem[𝑆] → D(Mem[𝑈]) preserves its input to its output if 𝑆 ⊆ 𝑈 and 𝜋𝑆 𝑓 = unit𝑆.

Intuitively, kernels that preserve their input to their output are suitable for encoding conditional distributions: once a variable has been conditioned on, its value should not change. We define two ways to compose these kernels.

Definition 4.2.3 (Composing Markov kernels on memories). Given 𝑓 : Mem[𝑆] → D(Mem[𝑇]) and 𝑔 : Mem[𝑈] → D(Mem[𝑉]) that preserve their inputs, we define their parallel composition, whenever 𝑆 ∩ 𝑈 = 𝑇 ∩ 𝑉 , as the map 𝑓 ⊕ 𝑔 : Mem[𝑆 ∪ 𝑈] → D(Mem[𝑇 ∪ 𝑉]) given by

( 𝑓 ⊕ 𝑔) (𝑑) (𝑚) := 𝑓 (𝑑𝑆) (𝑚𝑇 ) · 𝑔(𝑑𝑈) (𝑚𝑉 ).

If 𝑇 = 𝑈, the sequential composition 𝑓 ⊙ 𝑔 : Mem[𝑆] → D(Mem[𝑉]) is just Kleisli composition (eq. (4.1)).

Example 4.2.1 (Kernel decomposition). Recall the distribution 𝜇 on Mem[{𝑥, 𝑦, 𝑧}] from Example 4.0.1. Let 𝑘𝑥 : Mem[𝑧] → D(Mem[{𝑥, 𝑧}]) encode the conditional distribution of 𝑥 given 𝑧, and let 𝑘𝑦 : Mem[𝑧] → D(Mem[{𝑦, 𝑧}]) encode the conditional distribution of 𝑦 given 𝑧. Explicitly, for 𝛼 = 𝑥 or 𝑦,

𝑘𝛼 (𝑧 = 0) (𝛼 = 1, 𝑧 = 0) = 1/2   𝑘𝛼 (𝑧 = 0) (𝛼 = 0, 𝑧 = 0) = 1/2
𝑘𝛼 (𝑧 = 1) (𝛼 = 1, 𝑧 = 1) = 1/4   𝑘𝛼 (𝑧 = 1) (𝛼 = 0, 𝑧 = 1) = 3/4.

Since 𝑘𝑥 and 𝑘𝑦 overlap exactly on 𝑧, in both their domains and their ranges, 𝑘𝑥 ⊕ 𝑘𝑦 is defined. A small calculation shows that 𝑘𝑥 ⊕ 𝑘𝑦 = 𝑘 , where 𝑘 : Mem[𝑧] → D(Mem[{𝑥, 𝑦, 𝑧}]) encodes the conditional distribution of (𝑥, 𝑦, 𝑧) given 𝑧.
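As a concrete illustration of Definitions 4.2.1 and 4.2.3, the following sketch carries out the "small calculation" of Example 4.2.1 on finite kernels represented as Python dictionaries. The encoding (memories as frozensets of variable–value pairs, and a `par` helper specialized to kernels sharing the domain {𝑧}) is our own simplifying assumption, not notation from the chapter.

```python
def mem(**kw):
    """A memory is a frozenset of (variable, value) pairs."""
    return frozenset(kw.items())

def restrict(m, V):
    """Project a memory onto the variables in V."""
    return frozenset((x, v) for (x, v) in m if x in V)

def marginalize(f, V):
    """pi_V f (Def 4.2.1): sum the output mass over variables outside V."""
    g = {}
    for d, dist in f.items():
        proj = {}
        for m, p in dist.items():
            r = restrict(m, V)
            proj[r] = proj.get(r, 0.0) + p
        g[d] = proj
    return g

def par(f, g):
    """Parallel composition (Def 4.2.3), specialized to kernels whose
    domains coincide (here both are Mem[{z}])."""
    h = {}
    for d in f:
        dist = {}
        for m1, p1 in f[d].items():
            for m2, p2 in g[d].items():
                # (f ⊕ g)(d)(m) = f(d)(m|T) * g(d)(m|V): keep only pairs
                # that agree on the overlap of their ranges.
                if restrict(m1, dict(m2)) == restrict(m2, dict(m1)):
                    dist[m1 | m2] = p1 * p2
        h[d] = dist
    return h

# k_x and k_y from Example 4.2.1 (conditional distributions given z).
def k(var):
    return {
        mem(z=0): {mem(**{var: 1, "z": 0}): 0.5,  mem(**{var: 0, "z": 0}): 0.5},
        mem(z=1): {mem(**{var: 1, "z": 1}): 0.25, mem(**{var: 0, "z": 1}): 0.75},
    }

kx, ky = k("x"), k("y")
kxy = par(kx, ky)   # the conditional distribution of (x, y, z) given z
```

Marginalizing `kxy` back to {𝑥, 𝑧} recovers 𝑘𝑥, and marginalizing to {𝑧} recovers unit_{𝑧}, witnessing that the composite preserves its input to its output.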
This decomposition shows that 𝑥 and 𝑦 are independent conditioned on 𝑧. The correspondence between the decomposition of kernels and conditional independence is proved in theorem 4.2.2.

4.2.1 A Concrete Probabilistic Frame of DIBI

We now have all the ingredients to define a first concrete model: states are Markov kernels that preserve their input; the binary operation ⊕ behaves as a parallel composition, and the binary operation ⊙ serves as the sequential composition. While there is a canonical choice for the sequential composition of Markov kernels, i.e., Kleisli composition, there are many choices for the parallel composition. For instance, it is unclear whether we should only allow parallel composition of kernels with the same domain, or work with a more relaxed condition. Another difficulty is in the definition of the pre-order. We are going to define two very different binary operations, and not only do we need the unit set to be closed under the pre-order, we also need the coherence conditions for the pre-order and both binary operations (⊕ Down-Closed, ⊙ Up-Closed, ⊕ Unit Coherence, ⊙ CoherenceR) to hold.

Definition 4.2.4 (Probabilistic frame). We define the frame (X𝐶𝐼 , ⊑, ⊕̂, ⊙̂, X𝐶𝐼) as follows:
• X𝐶𝐼 is the set of Markov kernels that preserve their input to their output;
• ⊕̂ and ⊙̂ are defined through the parallel and sequential composition of kernels:

𝑓 ⊕̂ 𝑔 = { 𝑓 ⊕ 𝑔} if range( 𝑓 ) ∩ range(𝑔) = dom( 𝑓 ) ∩ dom(𝑔), and ∅ otherwise
𝑓 ⊙̂ 𝑔 = { 𝑓 ⊙ 𝑔} if range( 𝑓 ) = dom(𝑔), and ∅ otherwise

• Given 𝑓 , 𝑔 ∈ X𝐶𝐼 , 𝑓 ⊑ 𝑔 if there exist a set of variables 𝑅 ⊆ Var and another kernel ℎ ∈ X𝐶𝐼 such that 𝑔 = ( 𝑓 ⊕ unit𝑅) ⊙ ℎ.

We make three remarks. First, the binary combinations ⊕̂ and ⊙̂ return sets with at most one element.
So they are essentially a wrapper over their underlying operations ⊕ and ⊙, which are partial and deterministic; in the following, including when proving that the structure (X𝐶𝐼 , ⊑, ⊕̂, ⊙̂, X𝐶𝐼) is a DIBI frame, we will work directly with the underlying operations ⊕ and ⊙.

Second, the definition of 𝑓 ⊙ 𝑔 on X𝐶𝐼 can be simplified. Given 𝑓 : Mem[𝑆] → D(Mem[𝑇]) and 𝑔 : Mem[𝑇] → D(Mem[𝑉]), eq. (4.1) yields the formula:

( 𝑓 ⊙ 𝑔) (𝑑) (𝑚) := ∑𝑚′∈Mem[𝑇] 𝑓 (𝑑) (𝑚′) · 𝑔(𝑚′) (𝑚).

Since 𝑓 , 𝑔 ∈ X𝐶𝐼 preserve input to output, this reduces to

( 𝑓 ⊙ 𝑔) (𝑑) (𝑚) = 𝑓 (𝑑) (𝑚𝑇 ) · 𝑔(𝑚𝑇 ) (𝑚𝑉 ). (4.2)

Third, the preorder is defined so that 𝑓 ⊑ 𝑔 holds when 𝑔 can be obtained by extending 𝑓 . If 𝑔 is obtained by composing 𝑓 in parallel with unit𝑅, and then extending the range via composition with ℎ, then we can recover 𝑓 from 𝑔 by marginalizing 𝑔 to range( 𝑓 ) ∪ 𝑅, and then ignoring the 𝑅 portion.

We show that our probabilistic frame is indeed a DIBI frame.

Theorem 4.2.1. (X𝐶𝐼 , ⊑, ⊕̂, ⊙̂, X𝐶𝐼) is a DIBI frame.

Proof sketch. Since ⊕̂ and ⊙̂ return either a singleton set or the empty set, for any axiom that mentions 𝑥 ∈ 𝑦 ⊕̂ 𝑧 (resp. 𝑥 ∈ 𝑦 ⊙̂ 𝑧), we can always use the 𝑥 such that 𝑥 = 𝑦 ⊕ 𝑧 (resp. 𝑥 = 𝑦 ⊙ 𝑧).

We first show that X𝐶𝐼 is closed under ⊕ and ⊙, and that ⊑ is transitive and reflexive. Then we show the frame axioms, which are mostly straightforward. Several conditions rely on a property of our model that we call Exchange Equality: if both ( 𝑓1 ⊕ 𝑓2) ⊙ ( 𝑓3 ⊕ 𝑓4) and ( 𝑓1 ⊙ 𝑓3) ⊕ ( 𝑓2 ⊙ 𝑓4) are defined, then they are equal, and if the second is defined, then so is the first. While its connection with REVEX is the most obvious, the Exchange Equality is also useful for proving other conditions, since the preorder in X𝐶𝐼 is defined through the binary combinations ⊕ and ⊙.
For example:

(⊕ Unit Coherence): Since the unit set in this frame is the entire state space X𝐶𝐼 , we must show that for any 𝑓1, 𝑓2 ∈ X𝐶𝐼 , if 𝑓1 ⊕ 𝑓2 is defined, then 𝑓1 ⊑ 𝑓1 ⊕ 𝑓2:

𝑓1 ⊕ 𝑓2 = ( 𝑓1 ⊙ unitrange( 𝑓1)) ⊕ (unitdom( 𝑓2) ⊙ 𝑓2)
= ( 𝑓1 ⊕ unitdom( 𝑓2)) ⊙ (unitrange( 𝑓1) ⊕ 𝑓2) (By Exchange Equality)
= ( 𝑓1 ⊕ unitdom( 𝑓2)) ⊙ ( 𝑓2 ⊕ unitrange( 𝑓1)) (By ⊕ Commutativity)

The last line matches the definition of 𝑓1 ⊑ 𝑓1 ⊕ 𝑓2. Also, for the commutativity and associativity of the binary combinations, the main difficulty lies in showing that both terms are defined at the same time. In particular, the associativity of ⊕ requires that ( 𝑓 ⊕ 𝑔) ⊕ ℎ is defined iff 𝑓 ⊕ (𝑔 ⊕ ℎ) is defined, which takes some non-trivial set manipulations to prove. We present the complete proof in appendix C.1.5. □

4.2.2 Capturing Conditional Independence

Now we return to our original goal: expressing conditional independence of program variables. For that, we introduce some basic atomic propositions and interpret DIBI formulas on the probabilistic DIBI frame (X𝐶𝐼 , ⊑, ⊕̂, ⊙̂, X𝐶𝐼). If the only property we need to express is conditional independence of program variables, we only need atomic propositions of the form (𝐴 ⊲ 𝐵), which are intended to describe the domain and range of the current kernel.

Definition 4.2.5 (Basic atomic proposition). For sets of variables 𝐴, 𝐵 ⊆ Var, a basic atomic proposition has the form (𝐴 ⊲ 𝐵) and the semantics:

𝑓 |= (𝐴 ⊲ 𝐵) iff there exists 𝑓 ′ ⊑ 𝑓 such that dom( 𝑓 ′) = 𝐴 and range( 𝑓 ′) ⊇ 𝐵.

For example, 𝑓 : Mem[𝑦] → D(Mem[𝑦, 𝑧]) defined by 𝑓 (𝑦 ↦→ 𝑣) := unit(𝑦 ↦→ 𝑣, 𝑧 ↦→ 𝑣) satisfies (𝑦 ⊲ 𝑦), (𝑦 ⊲ 𝑧), (𝑦 ⊲ ∅), (𝑦 ⊲ 𝑦, 𝑧), (∅ ⊲ ∅), and no other atomic propositions. With these atomic propositions, we can assert conditional independence of program variables using a simple formula:

Theorem 4.2.2. Given a distribution 𝜇 ∈ D(Mem[Var]), for any 𝑋,𝑌, 𝑍 ⊆ Var,

𝑓𝜇 |= (∅ ⊲ 𝑍) # ((𝑍 ⊲ 𝑋) ∗ (𝑍 ⊲ 𝑌 )) (4.3)

if and only if 𝑋 ⊥⊥ 𝑌 | 𝑍 and 𝑋 ∩ 𝑌 ⊆ 𝑍 are both satisfied.
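Theorem 4.2.2 ties satisfaction of eq. (4.3) to the familiar product characterization of conditional independence. The sketch below checks that characterization directly on a joint distribution assembled from the conditionals of Example 4.2.1; since Example 4.0.1's 𝜇 is not reproduced in this excerpt, the uniform marginal for 𝑧 is an assumption we add for illustration.

```python
from itertools import product

# Joint distribution on (x, y, z): z uniform (an assumption added here),
# then x and y drawn independently given z, using the conditionals of
# Example 4.2.1.
px_given_z = {0: {1: 0.5, 0: 0.5}, 1: {1: 0.25, 0: 0.75}}
mu = {(x, y, z): 0.5 * px_given_z[z][x] * px_given_z[z][y]
      for x, y, z in product([0, 1], repeat=3)}

def cond_indep(mu, tol=1e-12):
    """x ⊥⊥ y | z: P(x,y,z) * P(z) = P(x,z) * P(y,z) at every point."""
    def m(pred):
        return sum(p for pt, p in mu.items() if pred(pt))
    for x, y, z in mu:
        pz  = m(lambda t: t[2] == z)
        pxz = m(lambda t: (t[0], t[2]) == (x, z))
        pyz = m(lambda t: (t[1], t[2]) == (y, z))
        if abs(mu[(x, y, z)] * pz - pxz * pyz) > tol:
            return False
    return True
```

Perturbing the distribution so that 𝑥 and 𝑦 become correlated beyond their shared dependence on 𝑧 makes the check fail, as it should.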
The restriction 𝑋 ∩ 𝑌 ⊆ 𝑍 is harmless: when 𝑋 ⊥⊥ 𝑌 | 𝑍 but 𝑋 ∩ 𝑌 ⊈ 𝑍 , the variables in 𝑋 ∩ 𝑌 must be determined by variables in 𝑍 (see lemma C.2.7), and it suffices to check 𝑋 ⊥⊥ 𝑌 | 𝑍 ∪ (𝑋 ∩ 𝑌 ). For simplicity, we abbreviate the formula (∅ ⊲ 𝑍) # ((𝑍 ⊲ 𝑋) ∗ (𝑍 ⊲ 𝑌 )) as [𝑍] # ( [𝑋] ∗ [𝑌 ]).

Proof sketch. For the forward direction, suppose 𝑓𝜇 satisfies eq. (4.3). We first show in lemma C.2.6 that this intuitionistic logic has some classical flavor: whenever 𝑓𝜇 satisfies eq. (4.3), there exist 𝑓 , 𝑔, and ℎ in X𝐶𝐼 with 𝑓 ⊙ (𝑔 ⊕ ℎ) ⊑ 𝑓𝜇, where 𝑓 : Mem[∅] → D(Mem[𝑍]), 𝑔 : Mem[𝑍] → D(Mem[𝑍 ∪ 𝑋]), and ℎ : Mem[𝑍] → D(Mem[𝑍 ∪ 𝑌 ]); we also have 𝑋 ∩ 𝑌 ⊆ 𝑍 because 𝑓 ⊙ (𝑔 ⊕ ℎ) is defined. Since dom( 𝑓𝜇) = ∅, 𝑓 ⊙ (𝑔 ⊕ ℎ) ⊑ 𝑓𝜇 implies:

𝑓 ⊙ (𝑔 ⊕ ℎ) = 𝜋𝑍∪𝑋∪𝑌 𝑓𝜇 and 𝑓 = 𝜋𝑍 𝑓𝜇 .

Further, we can show that 𝑓 ⊙ (𝑔 ⊕ ℎ) = 𝑓 ⊙ 𝑔 ⊙ (unit𝑋 ⊕ ℎ) = 𝑓 ⊙ ℎ ⊙ (unit𝑌 ⊕ 𝑔), and thus:

𝑓 ⊙ 𝑔 = 𝜋𝑍∪𝑋 𝑓𝜇 and 𝑓 ⊙ ℎ = 𝜋𝑍∪𝑌 𝑓𝜇 .

These imply that 𝑔 (resp. ℎ) encodes the conditional distribution of 𝑋 (resp. 𝑌 ) given 𝑍 , and 𝑔 ⊕ ℎ encodes the conditional distribution of (𝑋,𝑌 ) given 𝑍 . Hence, 𝑓 ⊙ (𝑔 ⊕ ℎ) ⊑ 𝑓𝜇 implies that the conditional distribution of (𝑋,𝑌 ) given 𝑍 is equal to the product of the conditional distributions of 𝑋 given 𝑍 and 𝑌 given 𝑍 , and so 𝑋 ⊥⊥ 𝑌 | 𝑍 holds in 𝜇.

For the reverse direction, suppose that 𝑋 ⊥⊥ 𝑌 | 𝑍 holds in 𝜇 and 𝑋 ∩ 𝑌 ⊆ 𝑍 . Consider 𝜋𝑋∪𝑌∪𝑍 𝑓𝜇, the marginal distribution on (𝑋,𝑌, 𝑍) encoded as a kernel, and observe that 𝜋𝑋∪𝑌∪𝑍 𝑓𝜇 = 𝑓 ⊙ 𝑓 ′, where 𝑓 encodes the marginal distribution of 𝑍 , and 𝑓 ′ the conditional distribution of (𝑋,𝑌 ) given values of 𝑍 . Since 𝑋 ⊥⊥ 𝑌 | 𝑍 holds, the conditional distribution of (𝑋,𝑌 ) given 𝑍 is the product of the conditional distributions of 𝑋 given 𝑍 and 𝑌 given 𝑍 , that is, 𝑓 ′ = 𝑔 ⊕ ℎ, where 𝑔 (resp. ℎ) encodes the conditional distribution of 𝑋 (resp. 𝑌 ) given 𝑍 . Then, since 𝑋 ∩ 𝑌 ⊆ 𝑍 , 𝑓 ⊙ (𝑔 ⊕ ℎ) is defined and 𝑓 ⊙ (𝑔 ⊕ ℎ) = 𝜋𝑋∪𝑌∪𝑍 𝑓𝜇 ⊑ 𝑓𝜇. It is straightforward to see that 𝑓 ⊙ (𝑔 ⊕ ℎ) satisfies [𝑍] # ( [𝑋] ∗ [𝑌 ]).
Hence, persistence shows that 𝑓𝜇 also satisfies [𝑍] # ( [𝑋] ∗ [𝑌 ]). See lemma C.2.8 for details. □

4.2.3 Validating the Semi-graphoid Axioms

Notions analogous to conditional independence are useful in several domains. For instance, in database theory [Abiteboul et al., 1995], join dependency, which can be seen as conditional independence for powersets instead of distributions, allows more efficient storage and querying of relational databases [Fagin and Vardi, 1984]. There is a long line of research on logical characterizations of conditional independence and join dependency. Graphoids are perhaps the most well-known approach [Pearl and Paz, 1985]; later, Dawid [2001] proposed a similar notion called separoids. Here, we focus on graphoids.

Definition 4.2.6 (Graphoids and semi-graphoids). Suppose that 𝐼 (𝑋, 𝑍,𝑌 ) is a ternary relation on subsets of Var (i.e., 𝑋, 𝑍,𝑌 ⊆ Var). Then the relation 𝐼 is a graphoid if it satisfies:

𝐼 (𝑋, 𝑍,𝑌 ) ⇔ 𝐼 (𝑌, 𝑍, 𝑋) (SYMMETRY)
𝐼 (𝑋, 𝑍,𝑌 ∪𝑊) ⇒ 𝐼 (𝑋, 𝑍,𝑌 ) ∧ 𝐼 (𝑋, 𝑍,𝑊) (DECOMPOSITION)
𝐼 (𝑋, 𝑍,𝑌 ∪𝑊) ⇒ 𝐼 (𝑋, 𝑍 ∪𝑊,𝑌 ) (WEAK UNION)
𝐼 (𝑋, 𝑍,𝑌 ) ∧ 𝐼 (𝑋, 𝑍 ∪ 𝑌,𝑊) ⇔ 𝐼 (𝑋, 𝑍,𝑌 ∪𝑊) (CONTRACTION)
𝐼 (𝑋, 𝑍 ∪𝑊,𝑌 ) ∧ 𝐼 (𝑋, 𝑍 ∪ 𝑌,𝑊) ⇒ 𝐼 (𝑋, 𝑍,𝑌 ∪𝑊) (INTERSECTION)

If 𝐼 satisfies the first four properties, then it is a semi-graphoid.

Because 𝐼 (𝑋, 𝑍,𝑌 ) is intended to capture CI-like notions, these conditions aim at axiomatizing the relation "knowing 𝑍 renders 𝑋 irrelevant to 𝑌 ." As an example, it is known that the conditional independence relation forms a semi-graphoid: if we fix a distribution 𝜇 ∈ D(Mem[Var]), then taking 𝐼 (𝑋, 𝑍,𝑌 ) to be the set of triples such that 𝑋 ⊥⊥ 𝑌 | 𝑍 holds in 𝜇 defines a semi-graphoid. Below, we show that the semi-graphoid axioms can be naturally translated into valid formulas in our probabilistic model.

Theorem 4.2.3. We abbreviate our probabilistic model as 𝑀 . Define 𝐼 (𝑋, 𝑍,𝑌 ) iff 𝑀 |= [𝑍] # ( [𝑋] ∗ [𝑌 ]). Then, SYMMETRY, DECOMPOSITION, WEAK UNION, and CONTRACTION are valid.
Furthermore, SYMMETRY is derivable in the proof system, and DECOMPOSITION is derivable given the following axiom, valid in 𝑀 :

(𝑍 ⊲ 𝑌 ∪𝑊) ↔ (𝑍 ⊲ 𝑌 ) ∧ (𝑍 ⊲ 𝑊) (SPLIT)

Proof sketch. We show the proof for the derivable axioms. To derive SYMMETRY, we use the ∗-COMM rule to commute the separating conjunction:

AX: 𝑃 ⊢ 𝑃    ∗-COMM: 𝑄 ∗ 𝑅 ⊢ 𝑅 ∗ 𝑄
#-CONJ: 𝑃 # (𝑄 ∗ 𝑅) ⊢ 𝑃 # (𝑅 ∗ 𝑄)
→: ⊢ 𝑃 # (𝑄 ∗ 𝑅) → 𝑃 # (𝑅 ∗ 𝑄)

The proof of DECOMPOSITION uses the axiom SPLIT to split up 𝑌 ∪ 𝑊 , and then uses proof rules to derive the following:

AX: 𝑃 ⊢ 𝑃    AX: 𝑄 ⊢ 𝑄    AX, ∧3: 𝑅 ∧ 𝑆 ⊢ 𝑅
∗-CONJ: 𝑄 ∗ (𝑅 ∧ 𝑆) ⊢ 𝑄 ∗ 𝑅
#-CONJ: 𝑃 # (𝑄 ∗ (𝑅 ∧ 𝑆)) ⊢ 𝑃 # (𝑄 ∗ 𝑅)
(similarly) 𝑃 # (𝑄 ∗ (𝑅 ∧ 𝑆)) ⊢ 𝑃 # (𝑄 ∗ 𝑆)
∧1: 𝑃 # (𝑄 ∗ (𝑅 ∧ 𝑆)) ⊢ 𝑃 # (𝑄 ∗ 𝑅) ∧ 𝑃 # (𝑄 ∗ 𝑆)
→: ⊢ 𝑃 # (𝑄 ∗ (𝑅 ∧ 𝑆)) → 𝑃 # (𝑄 ∗ 𝑅) ∧ 𝑃 # (𝑄 ∗ 𝑆)

Thus, as an instance,

⊢ (∅ ⊲ 𝑍) # ((𝑍 ⊲ 𝑋) ∗ ((𝑍 ⊲ 𝑌 ) ∧ (𝑍 ⊲ 𝑊))) → (∅ ⊲ 𝑍) # ((𝑍 ⊲ 𝑋) ∗ (𝑍 ⊲ 𝑌 )) ∧ (∅ ⊲ 𝑍) # ((𝑍 ⊲ 𝑋) ∗ (𝑍 ⊲ 𝑊))

Combining this with SPLIT, we have

⊢ (∅ ⊲ 𝑍) # ((𝑍 ⊲ 𝑋) ∗ (𝑍 ⊲ 𝑌 ∪𝑊)) → (∅ ⊲ 𝑍) # ((𝑍 ⊲ 𝑋) ∗ (𝑍 ⊲ 𝑌 )) ∧ (∅ ⊲ 𝑍) # ((𝑍 ⊲ 𝑋) ∗ (𝑍 ⊲ 𝑊))

We prove the validity of WEAK UNION and CONTRACTION in appendix C.2.3. □

Our conference paper [Bao et al., 2021] additionally introduces a relational model of DIBI, where [𝑍] # ( [𝑋] ∗ [𝑌 ]) asserts join dependency, and the semi-graphoid axioms can be translated into valid formulas in the relational model as well.

4.3 Conditional Probabilistic Separation Logic

Conditional independence of program variables can be subtle to reason about, motivating formal methods for proving it. We design a program logic CPSL for formally proving conditional independence in a simplified probabilistic imperative language. The language has assignments, sampling, sequencing, and conditionals, but no loops, which would make the reasoning even trickier. Here, our goal is simply to show how a DIBI-based program logic can work in a basic setting.
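The semi-graphoid axioms of Theorem 4.2.3 can also be sanity-checked numerically at the level of distributions, reading 𝐼 (𝑋, 𝑍, 𝑌 ) as 𝑋 ⊥⊥ 𝑌 | 𝑍. The sketch below does so on a small hand-built distribution; the distribution, the variable names, and the helper `I` are our own illustrative assumptions, not part of the chapter.

```python
import itertools

VARS = ("x", "y", "w", "z")

# A joint distribution built as P(z) * P(x|z) * P(y,w|z), so that
# x is independent of {y, w} given z by construction.  All of the
# numbers below are illustrative choices.
px = {0: {1: 0.3, 0: 0.7}, 1: {1: 0.8, 0: 0.2}}
pyw = {0: {(0, 0): 0.4, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.3},
       1: {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}}
mu = {(x, y, w, z): 0.5 * px[z][x] * pyw[z][(y, w)]
      for x, y, w, z in itertools.product([0, 1], repeat=4)}

def marg(mu, S):
    """Marginal distribution over the variables in S."""
    out = {}
    for pt, p in mu.items():
        key = tuple(v for v, n in zip(pt, VARS) if n in S)
        out[key] = out.get(key, 0.0) + p
    return out

def I(mu, X, Z, Y, tol=1e-9):
    """I(X, Z, Y): X independent of Y given Z, via the product test
    P(XYZ) * P(Z) = P(XZ) * P(YZ) at every point."""
    def proj(pt, S):
        return tuple(v for v, n in zip(pt, VARS) if n in S)
    pxyz, pz = marg(mu, X | Y | Z), marg(mu, Z)
    pxz, pyz = marg(mu, X | Z), marg(mu, Y | Z)
    return all(abs(pxyz[proj(pt, X | Y | Z)] * pz[proj(pt, Z)]
                   - pxz[proj(pt, X | Z)] * pyz[proj(pt, Y | Z)]) <= tol
               for pt in mu)
```

On this distribution, SYMMETRY, DECOMPOSITION, and WEAK UNION all hold for the base fact 𝐼 ({𝑥}, {𝑧}, {𝑦, 𝑤}), while 𝑥 and 𝑦 are not unconditionally independent, since both depend on 𝑧.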
4.3.1 CPSL: Assertion Logic

Like PSL and LINA, CPSL is constructed in two layers: the assertion logic describes program states — probability distributions here — while the program logic describes probabilistic programs, using the assertion logic to specify pre- and post-conditions. Our starting point for the assertion logic is the probabilistic model of DIBI introduced in section 4.2, with the atomic assertions we introduced to assert conditional independence in section 4.2.2. We encode distributions as Markov kernels with domain Mem[∅] in order to interpret DIBI on program states.

However, it turns out that the full logic DIBI is not suitable for a program logic. The main problem is that not all formulas in DIBI satisfy a key technical condition, the restriction property.

Definition 4.3.1 (Restriction). A formula 𝑃 satisfies restriction if: a Markov kernel 𝑓 satisfies 𝑃 iff there exists 𝑓 ′ ⊑ 𝑓 such that range( 𝑓 ′) ⊆ FV(𝑃) and 𝑓 ′ |= 𝑃.

A similar restriction property plays an important role in the soundness of Frame-like rules in PSL and LINA, because formulas satisfying restriction are preserved if the program does not modify variables appearing in the formula. Here, we also need it, not only to prove FRAME but also to reason about how the preconditions are preserved in ASSN, SAMP, and COND. Thus, we want to show that the restriction property holds for DIBI formulas. The reverse direction is immediate by persistence, but the forward direction is more delicate – there are simple formulas where restriction fails.

Example 4.3.1 (Failure of restriction). Consider the kernel 𝑓 : Mem[𝑧] → D(Mem[𝑥, 𝑧]) with 𝑓 (𝑧 ↦→ 𝑐) := unit(𝑥 ↦→ 𝑐, 𝑧 ↦→ 𝑐). We can show that 𝑓 satisfies the formula 𝜑 := ⊤ # (𝑥 ⊲ [𝑥]): letting 𝑓1 : Mem[𝑧] → D(Mem[𝑥, 𝑧]) and 𝑓2 : Mem[𝑥, 𝑧] → D(Mem[𝑥, 𝑧]) with

𝑓1(𝑧 ↦→ 𝑐) := unit(𝑥 ↦→ 𝑐, 𝑧 ↦→ 𝑐)
𝑓2 := unitMem[𝑥] ⊕ unitMem[𝑧]

then we have 𝑓1 |= ⊤ and 𝑓2 |= (𝑥 ⊲ [𝑥]). Also, 𝑓 = 𝑓1 ⊙ 𝑓2, so 𝑓 |= ⊤ # (𝑥 ⊲ [𝑥]).
Since FV(𝜑) = {𝑥}, any subkernel 𝑓 ′ ⊑ 𝑓 simultaneously satisfying 𝜑 and witnessing restriction must have type 𝑓 ′ : Mem[𝑧] → D(Mem[𝑥]), but there are no input-preserving kernels of that type.

To address this problem, we will identify a fragment of DIBI that satisfies restriction and is sufficiently rich to support an interesting program logic. Intuitively, restriction may fail for 𝜑 when the satisfaction of 𝜑 implicitly requires unexpected variables in the domain of the kernel, or when 𝜑 does not describe needed variables in its range. Thus, we employ syntactic conditions to over-approximate the variables that can appear in the domain of a kernel satisfying 𝜑 as FVD(𝜑), and to under-approximate the variables that can appear in the range as FVR(𝜑).

Definition 4.3.2 (FVD and FVR). For DIBI formulas generated by probabilistic atomic propositions, conjunctions (∧, ∗, #) and disjunction (∨), we define two sets of variables:

FVD(⊤) = FVD(⊥) := ∅                FVR(⊤) = FVR(⊥) := ∅
FVD(𝐴 ⊲ 𝐵) := FV(𝐴)                FVR(𝐴 ⊲ 𝐵) := FV(𝐴) ∪ FV(𝐵)
FVD(𝑃 ∧ 𝑄) := FVD(𝑃) ∪ FVD(𝑄)      FVR(𝑃 ∧ 𝑄) := FVR(𝑃) ∪ FVR(𝑄)
FVD(𝑃 ∗ 𝑄) := FVD(𝑃) ∪ FVD(𝑄)      FVR(𝑃 ∗ 𝑄) := FVR(𝑃) ∪ FVR(𝑄)
FVD(𝑃 # 𝑄) := FVD(𝑃) ∪ FVD(𝑄)      FVR(𝑃 # 𝑄) := FVR(𝑃) ∪ FVR(𝑄)
FVD(𝑃 ∨ 𝑄) := FVD(𝑃) ∪ FVD(𝑄)      FVR(𝑃 ∨ 𝑄) := FVR(𝑃) ∩ FVR(𝑄)

Now, we have all the ingredients to introduce our assertions. The logic DIBI+ is a fragment of DIBI with atomic propositions AP, with formulas defined by the following grammar:

𝑃, 𝑄 ::= AP | ⊤ | ⊥ | 𝑃 ∨ 𝑄 | 𝑃 ∗ 𝑄
       | 𝑃 # 𝑄   (FVD(𝑄) ⊆ FVR(𝑃))
       | 𝑃 ∧ 𝑄   (FVR(𝑃) = FVR(𝑄) = FV(𝑃) = FV(𝑄)).

The side-condition for 𝑃 # 𝑄 ensures that the variables used by 𝑄 are described by 𝑃. The side-condition for 𝑃 ∧ 𝑄 is the most restrictive — to understand why we need it, consider the following example.

Example 4.3.2 (Failure of restriction for And).
Consider the formula 𝑃 := (∅ ⊲ {𝑥}) ∧ (∅ ⊲ {𝑦}), and the kernel 𝑓 : Mem[𝑧] → D(Mem[𝑥, 𝑦, 𝑧]) with 𝑓 (𝑧 ↦→ tt) the distribution with 𝑥 a fair coin flip, 𝑦 = 𝑥, and 𝑧 = tt, and 𝑓 (𝑧 ↦→ ff ) the distribution with 𝑥 a fair coin flip, 𝑦 = ¬𝑥, and 𝑧 = ff . Then, there exist 𝑓1 : Mem[∅] → D(Mem[𝑥]) and 𝑓2 : Mem[∅] → D(Mem[𝑦]) such that 𝑓1 ⊑ 𝑓 and 𝑓2 ⊑ 𝑓 . Since 𝑓1 |= (∅ ⊲ {𝑥}) and 𝑓2 |= (∅ ⊲ {𝑦}), it follows that 𝑓 |= 𝑃. But, because 𝑧 is correlated with (𝑥, 𝑦), there is no kernel 𝑓 ′ : Mem[∅] → D(Mem[𝑥, 𝑦]) satisfying 𝑃 such that 𝑓 ′ ⊑ 𝑓 : that would mean 𝑓 can be obtained by parallel combination of 𝑓 ′ with another kernel with domain {𝑧}, which requires them to be independent.

With the atomic propositions introduced to express conditional independence, i.e., (𝐴 ⊲ 𝐵) where 𝐴, 𝐵 ⊆ Var, all formulas in DIBI+ satisfy the restriction property. But before proving the restriction property for DIBI+, we enrich the atomic propositions to describe more fine-grained information about the domain and range of kernels, and then show that DIBI+ with the enriched set of atomic propositions still satisfies the restriction property. In particular, we want to enrich the atomic propositions in the following ways.

Domain. Given a kernel 𝑓 , the existing atomic propositions (𝐴 ⊲ 𝐵) can only describe properties that hold for all (well-typed) inputs 𝑚 to 𝑓 . We would like to be able to describe properties that hold for only certain inputs, e.g., for memories 𝑚 where a variable 𝑧 is true.

Range. Given any input 𝑚 to a kernel 𝑓 , the existing atomic propositions can only guarantee the presence of variables in the output distribution 𝑓 (𝑚). We would like to describe more precise information about 𝑓 (𝑚), e.g., that certain variables are independent conditioned on a particular value of 𝑚, rather than on all values of 𝑚.
Thus, we extend atomic propositions to all pairs of logical formulas (𝜙 ⊲ 𝜓), where 𝜙 is a logical formula over the kernel domain (i.e., memories), while 𝜓 is a logical formula over the kernel range (i.e., distributions over memories). To describe memories, we take a simple propositional logic.

Definition 4.3.3 (Domain logic). The domain logic has formulas 𝜙 of the form 𝑆 : 𝑝𝑑 , where 𝑆 ⊆ Var is a subset of variables and

𝑝𝑑 ::= [𝑒1 = 𝑒2] | ⊤ | ⊥ | 𝑝𝑑 ∧ 𝑝′𝑑 | 𝑝𝑑 ∨ 𝑝′𝑑 .

A formula 𝑆 : 𝑝𝑑 is satisfied by a memory 𝑚, written 𝑚 |=𝑑 𝑆 : 𝑝𝑑 , if dom(𝑚) = 𝑆 and 𝑝𝑑 holds in 𝑚. In particular, [𝑒1 = 𝑒2] holds in 𝑚 iff ⟦𝑒1⟧(𝑚) = ⟦𝑒2⟧(𝑚). We read 𝑆 : 𝑝𝑑 as "memories over 𝑆 such that 𝑝𝑑" and abbreviate 𝑆 : ⊤ as 𝑆.

To describe distributions over memories, we adapt formulas from probabilistic BI for the range logic.

Definition 4.3.4 (Range logic). The range logic has the following formulas from probabilistic BI:

𝑝𝑟 ::= [𝑆] | 𝑥 $∼ 𝑑 | [𝑥 = 𝑒] | ⊤ | ⊥ | 𝑝𝑟 ∧ 𝑝′𝑟 | 𝑝𝑟 ∗ 𝑝′𝑟 .

We give a semantics where states are distributions over memories: 𝑀𝑟 = {𝜇 : D(Mem[𝑆]) | 𝑆 ⊆ Var}. We define a preorder on states via 𝜇1 ⊑𝑟 𝜇2 if and only if dom(𝜇1) ⊆ dom(𝜇2) and 𝜋dom(𝜇1)𝜇2 = 𝜇1, and we define a partial binary operation on states: for any 𝜇1 ∈ D(Mem[𝑆]) and 𝜇2 ∈ D(Mem[𝑇]),

𝜇1 ⊕𝑟 𝜇2 := {𝜋𝑆\𝑇𝜇1 ⊗ 𝛿𝑚 ⊗ 𝜋𝑇\𝑆𝜇2} if ∃𝑚 ∈ Mem[𝑆 ∩ 𝑇] s.t. 𝜋𝑆∩𝑇𝜇1 = 𝜋𝑆∩𝑇𝜇2 = 𝛿𝑚, and {} otherwise,

where ⊗ takes the independent product of two distributions over disjoint domains. That is, for any 𝑥 ∈ Mem[𝑆 ∪ 𝑇],

(𝜋𝑆\𝑇𝜇1 ⊗ 𝛿𝑚 ⊗ 𝜋𝑇\𝑆𝜇2) (𝑥) := 𝜋𝑆\𝑇𝜇1(𝜋𝑆\𝑇𝑥) · 1 · 𝜋𝑇\𝑆𝜇2(𝜋𝑇\𝑆𝑥)

This operation generalizes the monoid from the probabilistic BI frame to allow combining distributions with overlapping domains when the distributions on the overlap are deterministic and equal; this mild generalization is useful for our setting, where distributions often have deterministic variables (e.g., variables corresponding to the input of kernels).
Then, we define the semantics of the range logic as:

𝜇 |=𝑟 ⊤ always
𝜇 |=𝑟 ⊥ never
𝜇 |=𝑟 [𝑆] iff 𝑆 ⊆ dom(𝜇)
𝜇 |=𝑟 𝑒 $∼ 𝑑 iff FV(𝑒) ⊆ dom(𝜇) and ⟦𝑒⟧(𝜇) = 𝑑
𝜇 |=𝑟 [𝑒1 = 𝑒2] iff FV(𝑒1) ∪ FV(𝑒2) ⊆ dom(𝜇) and ⟦𝑒1⟧(𝑚) = ⟦𝑒2⟧(𝑚) for any 𝑚 in the support of 𝜇
𝜇 |=𝑟 𝑝𝑟 ∧ 𝑝′𝑟 iff 𝜇 |=𝑟 𝑝𝑟 and 𝜇 |=𝑟 𝑝′𝑟
𝜇 |=𝑟 𝑝𝑟 ∗ 𝑝′𝑟 iff there exists 𝜇1 ⊕𝑟 𝜇2 ⊑ 𝜇 with 𝜇1 |=𝑟 𝑝𝑟 and 𝜇2 |=𝑟 𝑝′𝑟 .

We only use domain formulas 𝜙 and range formulas 𝜓 inside the enriched atomic propositions of the form (𝜙 ⊲ 𝜓), so we do not need to show that formulas in the domain logic or in the range logic are themselves persistent. Now, we can give a semantics to our enriched atomic propositions.

Definition 4.3.5. Given a kernel 𝑓 and atomic proposition (𝜙 ⊲ 𝜓), we define 𝑓 |= (𝜙 ⊲ 𝜓) iff there exists 𝑓 ′ ⊑ 𝑓 such that 𝑚 |=𝑑 𝜙 implies 𝑚 ∈ Mem[dom( 𝑓 ′)] and 𝑓 ′(𝑚) |=𝑟 𝜓.

This valuation is persistent by construction. Furthermore, formulas in DIBI+ with these atomic propositions satisfy restriction.

Theorem 4.3.1 (Restriction in DIBI+). Let 𝑃 ∈ DIBI+ with atomic propositions (𝜙 ⊲ 𝜓), as described above. Then 𝑓 |= 𝑃 if and only if there exists 𝑓 ′ ⊑ 𝑓 such that range( 𝑓 ′) ⊆ FV(𝑃) and 𝑓 ′ |= 𝑃.

Proof sketch. We prove a stronger statement by induction on 𝑃: 𝑓 |= 𝑃 if and only if there exists 𝑓 ′ ⊑ 𝑓 such that dom( 𝑓 ′) ⊆ FVD(𝑃) and FVR(𝑃) ⊆ range( 𝑓 ′) ⊆ FV(𝑃). □

Last, the atomic propositions satisfy some axiom schemas, inspired by proof rules of BI.

Proposition 4.3.2. The following axiom schemas for atomic propositions are valid.

(𝑆 : 𝑝𝑑 ⊲ 𝑝𝑟) ∧ (𝑆 : 𝑝′𝑑 ⊲ 𝑝′𝑟) → (𝑆 : 𝑝𝑑 ∧ 𝑝′𝑑 ⊲ 𝑝𝑟 ∧ 𝑝′𝑟) if FV(𝑝𝑟) = FV(𝑝′𝑟) (AP-AND)
(𝑆 : 𝑝𝑑 ⊲ 𝑝𝑟) ∧ (𝑆 : 𝑝′𝑑 ⊲ 𝑝′𝑟) → (𝑆 : 𝑝𝑑 ∨ 𝑝′𝑑 ⊲ 𝑝𝑟 ∨ 𝑝′𝑟) (AP-OR)
(𝑆 : 𝑝𝑑 ⊲ 𝑝𝑟) ∗ (𝑆′ : 𝑝′𝑑 ⊲ 𝑝′𝑟) → (𝑆 ∪ 𝑆′ : 𝑝𝑑 ∧ 𝑝′𝑑 ⊲ 𝑝𝑟 ∗ 𝑝′𝑟) (AP-PAR)
|=𝑑 𝑝′𝑑 → 𝑝𝑑 and |=𝑟 𝑝𝑟 → 𝑝′𝑟 imply |= (𝑆 : 𝑝𝑑 ⊲ 𝑝𝑟) → (𝑆 : 𝑝′𝑑 ⊲ 𝑝′𝑟) (AP-IMP)

We defer the proofs to appendix C.3.2.
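Definition 4.3.2 and the DIBI+ grammar are directly executable. The sketch below implements FVD, FVR, and the DIBI+ side-conditions over a small formula AST; the tuple encoding (tags "seq" for #, "star" for ∗, and so on) is our own, not notation from the chapter. It re-checks that Example 4.3.1's formula 𝜑 = ⊤ # (𝑥 ⊲ [𝑥]), for which restriction fails, indeed falls outside DIBI+.

```python
# Formula AST: ("top",), ("bot",), ("atom", A, B) for (A ⊲ B),
# ("and", P, Q), ("or", P, Q), ("star", P, Q) for ∗, ("seq", P, Q) for #.

def FV(p):
    if p[0] in ("top", "bot"):
        return frozenset()
    if p[0] == "atom":
        return frozenset(p[1]) | frozenset(p[2])
    return FV(p[1]) | FV(p[2])

def FVD(p):
    if p[0] in ("top", "bot"):
        return frozenset()
    if p[0] == "atom":                 # FVD(A ⊲ B) := FV(A)
        return frozenset(p[1])
    return FVD(p[1]) | FVD(p[2])       # every connective takes the union

def FVR(p):
    if p[0] in ("top", "bot"):
        return frozenset()
    if p[0] == "atom":                 # FVR(A ⊲ B) := FV(A) ∪ FV(B)
        return frozenset(p[1]) | frozenset(p[2])
    if p[0] == "or":                   # disjunction takes the intersection
        return FVR(p[1]) & FVR(p[2])
    return FVR(p[1]) | FVR(p[2])

def in_DIBIplus(p):
    if p[0] in ("top", "bot", "atom"):
        return True
    ok = in_DIBIplus(p[1]) and in_DIBIplus(p[2])
    if p[0] == "seq":                  # P # Q needs FVD(Q) ⊆ FVR(P)
        return ok and FVD(p[2]) <= FVR(p[1])
    if p[0] == "and":                  # P ∧ Q needs FVR(P) = FVR(Q) = FV(P) = FV(Q)
        return ok and FVR(p[1]) == FVR(p[2]) == FV(p[1]) == FV(p[2])
    return ok                          # ∨ and ∗ carry no side-condition

atom = lambda A, B: ("atom", tuple(A), tuple(B))
# (∅ ⊲ z) # ((z ⊲ x) ∗ (z ⊲ y)): the shape used in Theorem 4.2.2.
P = ("seq", atom("", "z"), ("star", atom("z", "x"), atom("z", "y")))
# Example 4.3.1's φ = ⊤ # (x ⊲ [x]): FVD of the right conjunct is {x},
# which is not contained in FVR(⊤) = ∅, so φ is not in DIBI+.
phi = ("seq", ("top",), atom("x", "x"))
```

This also makes the implicit side conditions of the CPSL rules checkable mechanically: for instance, whether a candidate post-condition of ASSN is in DIBI+.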
4.3.2 Conditional Probabilistic Separation Logic (CPSL)

With the assertion logic set, we are now ready to introduce our program logic. We call it Conditional Probabilistic Separation Logic, abbreviated as CPSL. Judgments in CPSL have the form {𝑃} 𝑐 {𝑄}, where 𝑐 is a loopless probabilistic program in pWhile and 𝑃, 𝑄 ∈ DIBI+ are restricted assertions serving as the pre- and post-conditions. As usual, a judgment holds if the program in the judgment maps states satisfying the pre-condition to states satisfying the post-condition. One small difference is that DIBI+ formulas are interpreted on kernels while the program states are distributions — the mismatch is handled by the natural lifting of distributions to kernels.

ASSN: if 𝑥 ∉ FV(𝑒) ∪ FV(𝑃), then ⊢ {𝑃} 𝑥 ← 𝑒 {𝑃 # (FV(𝑒) ⊲ [𝑥 = 𝑒])}
SAMP: if 𝑥 ∉ FV(𝑃), then ⊢ {𝑃} 𝑥 $← 𝑑 {𝑃 # (∅ ⊲ 𝑥 $∼ 𝑑)}
SKIP: ⊢ {𝑃} skip {𝑃}
SEQ: if ⊢ {𝑃} 𝑐 {𝑄} and ⊢ {𝑄} 𝑐′ {𝑅}, then ⊢ {𝑃} 𝑐 ; 𝑐′ {𝑅}
COND: if ⊢ {(∅ ⊲ [𝑏 = tt]) # 𝑃} 𝑐 {(∅ ⊲ [𝑏 = tt]) # (𝑏 : [𝑏 = tt] ⊲ 𝑄1)} and ⊢ {(∅ ⊲ [𝑏 = ff ]) # 𝑃} 𝑐′ {(∅ ⊲ [𝑏 = ff ]) # (𝑏 : [𝑏 = ff ] ⊲ 𝑄2)}, then ⊢ {(∅ ⊲ [𝑏]) # 𝑃} if 𝑏 then 𝑐 else 𝑐′ {(∅ ⊲ [𝑏]) # ((𝑏 : [𝑏 = tt] ⊲ 𝑄1) ∧ (𝑏 : [𝑏 = ff ] ⊲ 𝑄2))}
WEAK: if ⊢ {𝑃} 𝑐 {𝑄}, |= 𝑃′ → 𝑃, and |= 𝑄 → 𝑄′, then ⊢ {𝑃′} 𝑐 {𝑄′}
FRAME: if ⊢ {𝑃} 𝑐 {𝑄}, FV(𝑅) ∩ MV(𝑐) = ∅, FV(𝑄) ⊆ FVR(𝑃) ∪ WV(𝑐), and RV(𝑐) ⊆ FVR(𝑃), then ⊢ {𝑃 ∗ 𝑅} 𝑐 {𝑄 ∗ 𝑅}

Figure 4.5: Proof rules: CPSL

Definition 4.3.6 (CPSL Validity). A CPSL judgment {𝑃} 𝑐 {𝑄} is valid, written |= {𝑃} 𝑐 {𝑄}, if for every input distribution 𝜇 ∈ D(Mem[Var]) such that the lifted input 𝑓𝜇 ≜ ⟨⟩ ↦→ 𝜇 satisfies 𝑓𝜇 |= 𝑃, the lifted output satisfies 𝑓⟦𝑐⟧𝜇 |= 𝑄.

The proof rules of CPSL are presented in Figure 4.5. Note that the requirement that the assertions in the judgments are in DIBI+ poses implicit side conditions. For example, the rule ASSN requires that the post-condition 𝑃 # (FV(𝑒) ⊲ [𝑥 = 𝑒]) is a formula in DIBI+, which in turn requires that FV(𝑒) ⊆ FVR(𝑃). The rules SKIP, SEQ, and WEAK are standard; we comment on the other, more interesting rules.
ASSN and SAMP allow forward reasoning across assignment and random sampling commands. In both cases, a pre-condition that does not mention the assigned variable 𝑥 is augmented with new information tracking the value or distribution of 𝑥, and the variables 𝑥 may depend on.

COND allows reasoning about probabilistic control flow, and the ensuing conditional dependence that may result. The main pre-condition 𝑃 is allowed to depend on the guard variable 𝑏 but nothing else — because we need FVD(𝑃) ⊆ FVR(∅ ⊲ [𝑏]) for the formula to be in DIBI+ — and 𝑃 is preserved as a pre-condition for both branches. The post-conditions allow introducing new facts (𝑏 : 𝑏 = tt ⊲ 𝑄1) and (𝑏 : 𝑏 = ff ⊲ 𝑄2), which are then combined in the post-condition of the entire conditional command. As in PSL, the rule for conditionals does not allow the branches to modify the guard 𝑏 — this restriction is needed to accurately associate each post-condition with each branch.

Finally, FRAME is the frame rule for CPSL. Much like in PSL, the rule involves three classes of variables: MV(𝑐) is the set of variables that 𝑐 may write to, RV(𝑐) is the set of variables that 𝑐 may read from the input, and WV(𝑐) is the set of variables that 𝑐 must write to; these variable sets are defined as in definition 2.3.8. The first side-condition FV(𝑅) ∩ MV(𝑐) = ∅ of FRAME ensures that the framed condition is not modified, which is a fairly standard condition in frame-like rules. The second and third side-conditions are more specialized. Observe that the variables described by 𝑄 in the post-condition are either already described by 𝑃 in the pre-condition, or are written by 𝑐. These two side conditions ensure that variables mentioned by 𝑄 that were not already independent of 𝑅 are freshly written, and that freshly written variables are computed using variables that were already independent of 𝑅 in the precondition, which can be guaranteed if the variables 𝑐 reads from are all in FVR(𝑃).
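The three variable sets used by FRAME can be computed by structural recursion on programs. The sketch below gives one standard formulation over a tuple-encoded AST; since definition 2.3.8 is not reproduced in this chapter, the precise clauses — in particular, that sampling from a literal distribution reads nothing, and that sequencing subtracts already-written variables from later reads — are our assumptions, as is the encoding itself.

```python
# Programs: ("skip",), ("assign", x, vars_of_e), ("sample", x),
# ("seq", c1, c2), ("if", b, c1, c2).

def MV(c):
    """Variables the program may write."""
    t = c[0]
    if t == "skip": return frozenset()
    if t in ("assign", "sample"): return frozenset([c[1]])
    if t == "seq": return MV(c[1]) | MV(c[2])
    return MV(c[2]) | MV(c[3])                      # if: either branch

def WV(c):
    """Variables the program must write, on every path."""
    t = c[0]
    if t == "skip": return frozenset()
    if t in ("assign", "sample"): return frozenset([c[1]])
    if t == "seq": return WV(c[1]) | WV(c[2])
    return WV(c[2]) & WV(c[3])                      # if: intersection of branches

def RV(c):
    """Variables the program may read from its input."""
    t = c[0]
    if t in ("skip", "sample"): return frozenset()  # d is a literal distribution
    if t == "assign": return frozenset(c[2])
    if t == "seq": return RV(c[1]) | (RV(c[2]) - WV(c[1]))
    return frozenset([c[1]]) | RV(c[2]) | RV(c[3])  # if: guard plus branches

# COMMONCAUSE and CONDSAMPLES, as in Figure 4.6.
cc = ("seq", ("sample", "z"),
      ("seq", ("sample", "x"),
       ("seq", ("sample", "y"),
        ("seq", ("assign", "a", ("x", "z")),
                ("assign", "b", ("y", "z"))))))
cs = ("seq", ("sample", "z"),
      ("if", "z",
       ("seq", ("sample", "x"), ("sample", "y")),
       ("seq", ("sample", "x"), ("sample", "y"))))
```

Both example programs read nothing from the initial memory (every variable they consult is written earlier in the program), which is what lets FRAME apply with an empty RV set.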
Theorem 4.3.3 (CPSL Soundness). CPSL is sound: derivable judgments are valid.

Proof sketch. By induction on the proof derivation. The restriction property is used repeatedly to constrain the domains and ranges of kernels witnessing different sub-assertions, ensuring that pre-conditions about unmodified variables continue to hold in the post-condition. □

We include the full proof in appendix C.4.

(a) COMMONCAUSE: 𝑧 $← Bern1/2; 𝑥 $← Bern1/2; 𝑦 $← Bern1/2; 𝑎 ← 𝑥 ∨ 𝑧; 𝑏 ← 𝑦 ∨ 𝑧
(b) CONDSAMPLES: 𝑧 $← Bern1/2; if 𝑧 then 𝑥 $← Bern𝑝; 𝑦 $← Bern𝑝 else 𝑥 $← Bern𝑞; 𝑦 $← Bern𝑞

Figure 4.6: Example programs

4.3.3 Example: CPSL in Action

Now, we demonstrate CPSL on two example programs.

Example 4.3.3. Figure 4.6 introduces two more example programs. The program COMMONCAUSE (Figure 4.6a) generates a distribution where two random observations share a common cause. Specifically, 𝑧, 𝑥, and 𝑦 are independent random samples, and 𝑎 and 𝑏 are values computed from (𝑥, 𝑧) and (𝑦, 𝑧), respectively. Intuitively, 𝑧, 𝑥, and 𝑦 could represent independent noisy measurements, while 𝑎 and 𝑏 could represent quantities derived from these measurements. Since 𝑎 and 𝑏 share a common source of randomness 𝑧, they are not independent. However, 𝑎 and 𝑏 are independent conditioned on the value of 𝑧; this is a textbook example of conditional independence.

The program CONDSAMPLES (Figure 4.6b) is a bit more complex: it branches on a random value 𝑧, and then assigns 𝑥 and 𝑦 two independent samples from Bern𝑝 in the true branch, and from Bern𝑞 in the false branch (𝑝, 𝑞 are constant values in [0, 1]). While we might think that 𝑥 and 𝑦 are independent at the end of the program since they are independent at the end of each branch, this is not true because their distributions are different in the two branches. For example, suppose that 𝑝 = 1 and 𝑞 = 0.
Then at the end of the first branch (𝑥, 𝑦) = (tt, tt) with probability 1, while at the end of the second branch (𝑥, 𝑦) = (ff , ff ) with probability 1. Thus, observing whether 𝑥 = tt or 𝑥 = ff determines the value of 𝑦 — clearly, 𝑥 and 𝑦 can’t be independent. However, 𝑥 and 𝑦 are independent conditioned on 𝑧.

In both cases, we will prove a conditional independence assertion as the post-condition. We will need some axioms for implications between formulas in DIBI+. The following axioms are valid in our probabilistic model X𝐶𝐼 .

Proposition 4.3.4 (AXIOMS FOR DIBI+). The following axioms are sound, assuming both the antecedent and the consequent are in DIBI+.

(𝑃 # 𝑄) # 𝑅 → 𝑃 # (𝑄 ∗ 𝑅) (INDEP-1)
𝑃 # 𝑄 → 𝑃 ∗ 𝑄 if FVD(𝑄) = ∅ (INDEP-2)
𝑃 # 𝑄 → 𝑃 # (𝑄 ∗ (𝑆 ⊲ [𝑆])) (PAD)
(𝑃 ∗ 𝑄) # (𝑅 ∗ 𝑆) → (𝑃 # 𝑅) ∗ (𝑄 # 𝑆) (RESTEXCH)

We briefly explain the axioms. INDEP-1 may look surprising, and it does not hold if we do not require the formulas to be in DIBI+. Under this assumption, it holds because 𝑃 # (𝑄 ∗ 𝑅) ∈ DIBI+ implies that 𝑅 only mentions variables that are guaranteed to be in 𝑃, and then, with some maneuvering, we can change one sequential composition into a parallel composition. INDEP-2 holds because any kernel witnessing 𝑄 depends on no variables and thus is independent of any kernel witnessing 𝑃. PAD allows conjoining (𝑆 ⊲ [𝑆]) to the second conjunct: since 𝑃 # (𝑄 ∗ (𝑆 ⊲ [𝑆])) is in DIBI+, 𝑆 can only mention variables that are already in 𝑃. Finally, RESTEXCH shows that the standard exchange law also holds for restricted assertions. We defer the proof to Appendix C.3.2.

We also need the following axioms for a particular form of atomic propositions, in addition to the axioms for general atomic propositions in Proposition 4.3.2.

Proposition 4.3.5 (AXIOMS FOR ATOMIC PROPOSITIONS). The following axioms are sound.
For any 𝑆, 𝐴, 𝐵, 𝐶 ⊆ Var,

(𝑆 ⊲ [𝐴] ∗ [𝐵]) → (𝑆 ⊲ [𝐴]) ∗ (𝑆 ⊲ [𝐵]) if 𝐴 ∩ 𝐵 ⊆ 𝑆 (REVPAR)
(𝑆 ⊲ [𝐴] ∗ [𝐵]) → (𝑆 ⊲ [𝐴 ∪ 𝐵]) (UNIONRAN)
(𝐴 ⊲ 𝐵) # (𝐵 ⊲ 𝐶) → (𝐴 ⊲ 𝐶) (ATOMSEQ)
(𝐴 ⊲ 𝐵) → (𝐴 ⊲ 𝐴) # (𝐴 ⊲ 𝐵) (UNITL)
(𝐴 ⊲ 𝐵) → (𝐴 ⊲ 𝐵) # (𝐵 ⊲ 𝐵) (UNITR)

We defer the proof to Appendix C.3.2.

Now, we have all the ingredients for verifying our example programs, COMMONCAUSE and CONDSAMPLES. Throughout, we must ensure that all formulas used in CPSL rules and DIBI+ axioms are in DIBI+. The conjunction # raises a tricky point: DIBI+ is not closed under reassociating #, so we add parentheses for formulas that must be in DIBI+. However, we may soundly use the full proof system of DIBI when proving implications between DIBI+ assertions, since DIBI+ is a fragment of DIBI.

Verification of COMMONCAUSE. We aim to prove the following judgment:

⊢ {⊤} COMMONCAUSE {(∅ ⊲ [𝑧]) # ((𝑧 ⊲ [𝑎]) ∗ (𝑧 ⊲ [𝑏]))}

By Theorem 4.2.2, this shows that 𝑎, 𝑏 are conditionally independent given 𝑧 at the end of the program. First, using SAMP to handle the sampling of 𝑧, 𝑥, 𝑦, we can prove the assertion (∅ ⊲ [𝑧]) # (∅ ⊲ [𝑥]) # (∅ ⊲ [𝑦]). Using Axioms PAD, AP-PAR, UNIONRAN, and # ASSOC, this assertion implies (∅ ⊲ [𝑧]) # (𝑧 ⊲ [𝑧, 𝑥]) # (𝑧 ⊲ [𝑧, 𝑦]):

( (∅ ⊲ [𝑧]) # (∅ ⊲ [𝑥]) ) # (∅ ⊲ [𝑦])
→ ( (∅ ⊲ [𝑧]) # ((∅ ⊲ [𝑥]) ∗ (𝑧 ⊲ [𝑧])) ) # ((∅ ⊲ [𝑦]) ∗ (𝑧 ⊲ [𝑧])) (PAD)
→ (∅ ⊲ [𝑧]) # (𝑧 ⊲ [𝑧] ∗ [𝑥]) # (𝑧 ⊲ [𝑧] ∗ [𝑦]) (AP-PAR)
→ (∅ ⊲ [𝑧]) # (𝑧 ⊲ [𝑧, 𝑥]) # (𝑧 ⊲ [𝑧, 𝑦]) (UNIONRAN)

We take the proved formula as the pre-condition before assigning to 𝑎 and to 𝑏. After the assignments, ASSN proves:

( ( (∅ ⊲ [𝑧]) # (𝑧 ⊲ [𝑧, 𝑥]) # (𝑧 ⊲ [𝑧, 𝑦]) ) # (𝑧, 𝑥 ⊲ [𝑎]) ) # (𝑧, 𝑦 ⊲ [𝑏]).

Then, we can reassociate and apply INDEP-1 to derive:

(∅ ⊲ [𝑧]) # ( ( (𝑧 ⊲ [𝑧, 𝑥]) # (𝑧, 𝑥 ⊲ [𝑎]) ) ∗ ( (𝑧 ⊲ [𝑧, 𝑦]) # (𝑧, 𝑦 ⊲ [𝑏]) ) ).

By Axiom ATOMSEQ, we obtain the desired post-condition:

(∅ ⊲ [𝑧]) # ((𝑧 ⊲ [𝑎]) ∗ (𝑧 ⊲ [𝑏])).
□

Verification of CONDSAMPLES We aim to show the following judgment:

⊢ {⊤} CONDSAMPLES {(∅ ⊲ [𝑧]) # ((𝑧 ⊲ [𝑥]) ∗ (𝑧 ⊲ [𝑦]))}

Again, by Theorem 4.2.2, this shows that 𝑥, 𝑦 are conditionally independent given 𝑧 at the end of the program. Starting with the sampling statement for 𝑧, applying SAMP, Axiom INDEP-2, ∗-UNIT and #-UNIT-R gives:

⊢ {⊤} 𝑧 $← Bern1/2 {(∅ ⊲ [𝑧]) # ⊤}.

To reason about the branching, we use COND. We start with the first branch. By SAMP, ASSN, SKIP, WEAK and SEQ, we have

⊢ {(∅ ⊲ 𝑧 = tt) # ⊤} 𝑥 $← Bern𝑝 ; 𝑦 $← Bern𝑝 {(∅ ⊲ 𝑧 = tt) # (∅ ⊲ [𝑥]) # (∅ ⊲ [𝑦])}.

As before, Axioms PAD, AP-PAR, UNIONRAN, together with # ASSOC, give the post-condition (∅ ⊲ 𝑧 = tt) # (𝑧 ⊲ [𝑧, 𝑥]) # (𝑧 ⊲ [𝑧, 𝑦]). Applying Axiom INDEP-1, we can show (∅ ⊲ 𝑧 = tt) # ((𝑧 ⊲ [𝑧, 𝑥]) ∗ (𝑧 ⊲ [𝑧, 𝑦])) at the end of the branch. Thus:

⊢ {(∅ ⊲ 𝑧 = tt) # ⊤} 𝑥 $← Bern𝑝 ; 𝑦 $← Bern𝑝 {(∅ ⊲ 𝑧 = tt) # (𝑧 : 𝑧 = tt ⊲ [𝑧, 𝑥] ∗ [𝑧, 𝑦])}.

The second branch is similar:

⊢ {(∅ ⊲ 𝑧 = ff) # ⊤} 𝑥 $← Bern𝑞 ; 𝑦 $← Bern𝑞 {(∅ ⊲ 𝑧 = ff) # (𝑧 : 𝑧 = ff ⊲ [𝑧, 𝑥] ∗ [𝑧, 𝑦])}.

Applying COND, we have:

⊢ {(∅ ⊲ [𝑧])} CONDSAMPLES {(∅ ⊲ [𝑧]) # ((𝑧 : 𝑧 = tt ⊲ [𝑧, 𝑥] ∗ [𝑧, 𝑦]) ∧ (𝑧 : 𝑧 = ff ⊲ [𝑧, 𝑥] ∗ [𝑧, 𝑦]))}

By AP-OR, the postcondition implies (∅ ⊲ [𝑧]) # ((𝑧 : 𝑧 = tt ∨ 𝑧 = ff) ⊲ [𝑧, 𝑥] ∗ [𝑧, 𝑦] ∨ [𝑧, 𝑥] ∗ [𝑧, 𝑦]). In the domain and range logic, we have:

|=𝑑 𝑧 : ⊤ → 𝑧 : (𝑧 = tt ∨ 𝑧 = ff) and |=𝑟 [𝑧, 𝑥] ∗ [𝑧, 𝑦] ∨ [𝑧, 𝑥] ∗ [𝑧, 𝑦] → [𝑧, 𝑥] ∗ [𝑧, 𝑦].

So AP-IMP implies (∅ ⊲ [𝑧]) # (𝑧 ⊲ [𝑧, 𝑥] ∗ [𝑧, 𝑦]). We can then apply REVPAR because {𝑧, 𝑥} ∩ {𝑧, 𝑦} = {𝑧}, deriving the postcondition (∅ ⊲ [𝑧]) # ((𝑧 ⊲ [𝑧, 𝑥]) ∗ (𝑧 ⊲ [𝑧, 𝑦])). By Axiom SPLIT, we obtain the desired post-condition: (∅ ⊲ [𝑧]) # ((𝑧 ⊲ [𝑥]) ∗ (𝑧 ⊲ [𝑦])). □

4.4 Related Work

While our program logic is the first separation logic for proving conditional independence, related work has explored other approaches to capture dependencies and independence, and it has the potential to lead to alternative formal methods for reasoning about conditional independence.
Other non-classical logics for modeling dependencies There are other non-classical logics that aim to model dependencies. Independence-friendly (IF) logic [Hintikka and Sandu, 1989] and dependence logic [Väänänen, 2007] introduce new quantifiers and propositional atoms to state that a variable logically depends, or does not depend, on another variable; these logics are each equivalent in expressivity to existential second-order logic. More recently, Durand et al. [2018] proposed a probabilistic team semantics for dependence logic, and Hannula et al. [2020] gave a descriptive complexity result connecting this logic to real-valued Turing machines. Under probabilistic team semantics, the universal and existential quantifiers bear a resemblance to our separating and dependent conjunctions, respectively. It would be interesting to understand the relation between these two logics, akin to how the semantics of propositional IF forms a model of BI [Abramsky and Väänänen, 2009].

Conditional independence, join dependency, and logic There is a long line of research on logical characterizations of conditional independence and join dependency. The literature is too vast to survey here. On the conditional independence side, we can point to work by Geiger and Pearl [1993] on graphical models; on the join dependency side, the survey by Fagin and Vardi [1984] describes the history of the area in database theory. There are several broadly similar approaches to axiomatizing the general properties of conditional dependence, including graphoids [Pearl and Paz, 1985] and separoids [Dawid, 2001].

Graphical Approach to Conditional Independence Probabilistic graphical models offer a powerful framework for representing probabilistic relationships [Koller and Friedman, 2009, Pearl, 2014].
In particular, Bayesian networks model joint distributions of program variables using directed acyclic graphs (DAGs), where edges represent conditional dependencies: each child node is associated with a conditional distribution given its parent nodes. This structure enables a more compact representation of the overall distribution by leveraging conditional independence between variables.

Bayesian networks are widely used in machine learning as a flexible and interpretable class of models for fitting data [Friedman et al., 1997, Murphy, 2012]. In many cases, the structure of the network is fixed, and the parameters—defining the conditional distributions along the edges—are learned from data via probabilistic inference. In such settings, the conditional independence encoded in the network can significantly improve the efficiency of inference [Obermeyer et al., 2019]. Moreover, the graphical structure makes it easier to identify and exploit these independence relations.

However, when the structure of a suitable Bayesian network is not known in advance, structure learning techniques are applied to discover it from data [Chickering, 2002, 1996, Kitson et al., 2023]. These methods often rely on identifying conditional independence relationships among variables—either through statistical tests or scoring criteria—to constrain the space of possible graph structures. Consequently, the accuracy and reliability of structure learning depend heavily on the ability to verify conditional independence in observed data.

Categorical probability The view of conditional independence as a factorization of Markov kernels has previously been explored [Jacobs and Zanasi, 2017, Cho and Jacobs, 2019, Fritz, 2020].
Taking a different approach, Simpson [2018] has recently introduced category-theoretic structures for modeling generalized conditional independence, capturing conditional independence and join dependency as well as analogues in heaps and nominal sets [Pitts, 2013]. Roughly speaking, conditional independence in heaps requires two portions that are disjoint except for a common overlap contained in the part that is conditioned; this notion can be smoothly accommodated in our framework as a DIBI model where kernels are Kleisli arrows for the identity monad (Brotherston and Calcagno [2009] also consider a similar notion of separation). Simpson [2018]'s notion of conditional independence in nominal sets suggests that there might be a DIBI model where kernels are Kleisli arrows for some monad in nominal sets, although the appropriate monad is unclear.

A recent work [Simpson, 2024] studies logical reasoning principles for generalized conditional independence and equality, when equality is a coarser notion than equivalence. It provides a semantic foundation for these reasoning principles based on atomic sheaves, and shows a category of probability sheaves as an instantiation. While that work does not concern probabilistic programs, and its reasoning principles derive new relations between variables from known relations in a fixed distribution, it could be interesting to explore an alternative assertion logic for capturing probabilistic programs based on its atomic sheaf logic.

A categorical model of DIBI logic The relational model of DIBI, introduced in the conference version [Bao et al., 2021], and the probabilistic model introduced above are similar, but Bao et al. [2021] does not provide a unifying way to construct such similar DIBI models. In our follow-up work Gu et al. [2024], we develop an abstract framework for systematically constructing DIBI models, using category theory as the unifying mathematical language.
In particular, we use string diagrams (a graphical presentation of monoidal categories) to give a uniform definition of the parallel composition and preorder in DIBI models. Our approach not only generalises known models, but also yields new models of interest, and it reduces properties of DIBI models to structures in the underlying categories. Furthermore, our categorical framework enables a logical notion of CI, in terms of the satisfaction of specific DIBI formulas.

CHAPTER 5
BLUEBELL: A UNIFYING FRAMEWORK FOR INDEPENDENCE, CONDITIONAL INDEPENDENCE AND RELATIONAL REASONING

5.1 Overview

In this chapter, we present BLUEBELL, another separation logic for reasoning about probabilistic programs. While BLUEBELL is designed to be a flexible framework for combining unary reasoning and relational reasoning of probabilistic programs, in this thesis we mostly focus on the unary part of BLUEBELL. The unary part of BLUEBELL shares functionality with CPSL in that both can be used to prove independence and conditional independence of program variables. However, BLUEBELL allows more expressive assertions and has more ergonomic rules, which enable us to prove conditional independence arising in more complicated programs. Below, we overview our motivation for designing a unifying framework for unary reasoning and relational reasoning, the prior work that we draw inspiration from, and our key design choices for making the logic more ergonomic.

Motivation: Independence Helps Relational Reasoning

Unary reasoning means analyzing the behavior of one target program directly. For example, the program logics introduced in previous chapters (PSL, LINA and CPSL) are all techniques for unary reasoning. This choice aligns with the fact that independence, negative dependence, and conditional independence are all properties of program variables in a single program. However, sometimes the properties of concern are naturally relational.
For instance, we may want to show that two probabilistic programs have the same behaviors, or that their outputs only differ up to a certain margin. To prove such relational goals, it can be easier to compare the two programs step by step, instead of individually characterizing their outputs and then comparing them.

Formalizing relational reasoning about probabilistic programs has been an active research area. One prominent line of work in this area is probabilistic relational Hoare logic (pRHL) [Barthe et al., 2013, Hsu, 2017], which formalizes a technique known as "proof by coupling" from the probability theory community. Conceptually, we can think of any probabilistic program as a distribution over different execution traces. To compare two probabilistic programs, it can be helpful to pair up the execution traces from the two programs and examine the pairs: for example, consider simple programs 𝐴 and 𝐵 that both make coin flips; then their coin flips have the same bias iff we can pair up their execution traces such that when 𝐴 flips heads, 𝐵 flips heads as well, and vice versa. This method works not only for proving program equivalence but also for a range of properties important for cryptography and differential privacy [Barthe et al., 2009, 2015, Hsu, 2017, Wang et al., 2019, Zhang and Kifer, 2017]. To describe the pairing, we can use a coupling, i.e., a joint distribution of the two distributions, thus making sense of the name "proof by coupling." Note that the pairing and the coupling here are just reasoning tools — the actual executions of the two programs can be either correlated or completely oblivious of each other.

Our motivation for designing BLUEBELL comes from the observation that independence allows one to decompose relational arguments.
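Before moving on, the coin-flip pairing described above can be sketched as a simulation. The sketch below (names are ours, not from the text) couples the two programs' flips through one shared uniform sample per paired trace: the coupled traces then agree on every run, while each flip alone is still Bernoulli-distributed with the common bias.

```python
import random

def coupled_flips(p, n=10000, seed=1):
    # Couple program A's and program B's coin flips: each paired trace
    # draws one uniform u, and both flips read it. Since A and B have the
    # same bias p, the coupled flips agree on every trace, while each
    # flip on its own is Bernoulli(p).
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        u = rng.random()
        a = u < p  # program A's flip in this trace
        b = u < p  # program B's flip in the paired trace
        pairs.append((a, b))
    return pairs

pairs = coupled_flips(0.3)
agreement = all(a == b for a, b in pairs)
```

The coupling is only a reasoning device: actual executions of the two programs need not share randomness.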
As an example, say we want to show that two probabilistic programs 𝐴 and 𝐵 are equivalent, where 𝐴1, 𝐴2 are two independent components of program 𝐴 and, similarly, 𝐵1, 𝐵2 are two independent components of program 𝐵. Then it is sufficient (though of course not necessary) to develop one relational argument showing 𝐴1 equivalent to 𝐵1 and another relational argument showing 𝐴2 equivalent to 𝐵2. Here, the key condition used is independence: when 𝐴1, 𝐴2 are not independent, or 𝐵1, 𝐵2 are not independent, component-wise equivalence does not guarantee overall equivalence, because the components may be correlated differently. The particular relation between the two programs does not matter, i.e., we can also replace program equivalence with other relations between 𝐴𝑖 and 𝐵𝑖.

Such decomposition can make relational reasoning more scalable, especially when the only other tool for building relational arguments is "proof by coupling," which until recently required rigid alignments between the two programs. Since both reasoning about probabilistic independence and reasoning by coupling can be subtle, we want to formalize such usage of probabilistic independence in relational proofs. Furthermore, since probabilistic independence is inherently a unary property, it is more natural to prove it with unary-style arguments, such as in probabilistic separation logic, prompting us to unify unary reasoning and relational reasoning about probabilistic programs in one framework.

Unary Fragment of BLUEBELL: A More Ergonomic Probabilistic Separation Logic

On the unary reasoning side, we want to present a program separation logic that can cleanly prove independence and conditional independence. Concretely, we want a logic that allows precise description of complicated program states and formalizes subtle probabilistic reasoning as easy-to-apply syntactic rules.
Ideally, the users of the logic can carry out all (or at least most of) the important steps using rules in the logic, instead of resorting to meta-level mathematics. Also, we want to relieve the users, as much as possible, from the burden of checking side conditions for applying a rule.

In the previous chapter, we showed one way to assert conditional independence in bunched logic, using the DIBI logic interpreted on the probabilistic kernels model, and also showed a program logic based on it for reasoning about conditional independence arising in programs. The approach, however, has some limitations. For one thing, DIBI+ excludes many formulas of DIBI to ensure the restriction property (theorem 4.3.1), which says that a formula 𝑃 holds in a kernel iff it holds in the subkernel restricted to the free variables of 𝑃. Although we demonstrated that conditional independence in small programs can be proved using CPSL, whose program rules only involve DIBI+ formulas, it is cumbersome to always have to check whether a formula is in DIBI+. In addition, in PSL, LINA and DIBI, we cannot assert that expressions 𝑒1, 𝑒2 are independent if 𝑒1 and 𝑒2 share variables. For example, while 𝑥 may be independent of 𝑥 xor 𝑦, the formula Own(𝑥) ∗ Own(𝑥 xor 𝑦) always implies false in their assertion logics.

When designing BLUEBELL, we take inspiration from Li et al. [2023a] (Lilac), which proposes a variant of probabilistic separation logic that addresses these limitations of DIBI for functional programs. We investigate whether we can design a program logic for an imperative probabilistic programming language with these same features. We work with an imperative probabilistic programming language both out of intellectual curiosity (investigating whether it would allow us to use less technical program semantics than Lilac) and because of our bigger goal to unify unary reasoning and relational reasoning in one framework: a lot of work in pRHL (e.g., Barthe et al.
[2013, 2015, 2016b,a, 2017]) also features an imperative probabilistic programming language. Allowing the program states to be mutable creates new challenges for validating the frame rule, which requires that any separate resource framed onto the current one is always preserved by program execution.

Quick Walkthrough of Lilac [Li et al., 2023a]

Lilac's key innovation is a new BI model based on measure-theoretic probability. In PSL and LINA, no state 𝜇 can satisfy assertions such as Own(𝑥) ∗ Own(𝑥 xor 𝑦), because evaluating 𝑥 and (𝑥 xor 𝑦) requires marginal distributions with domains {𝑥} and {𝑥, 𝑦} respectively, and the independent product of these two marginal distributions is undefined because their domains overlap on {𝑥}. However, measure-theoretic probability spaces are specified by a sigma-algebra describing the event space and a measure on the sigma-algebra, and it is possible to separate the event space of 𝑥 from the event space of (𝑥 xor 𝑦) such that their independent product recovers the original probability space. With measure-theoretic probabilities, they also give a rigorous treatment of continuous probabilities, which enables them to handle examples that sample from a uniform distribution over the interval [0, 1].

To assert conditional independence, Li et al. [2023a] introduce a modality C𝑥←𝑋 to the assertion logic. Their assertion logic model consists of distributions, represented using measure-theoretic probability spaces, instead of kernels as states; in their model, a distribution satisfies C𝑥←𝑋 𝑃(𝑥) iff, for every value 𝑥 that the variable 𝑋 can take, the distribution conditioned on 𝑋 = 𝑥 satisfies 𝑃(𝑥). Using this modality, they show that the conditional independence of variables 𝑌, 𝑍 given 𝑋 can be asserted as C𝑥←𝑋 (Own(𝑌) ∗ Own(𝑍)), and they encode several axioms regarding conditioning and independence.
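The event-space separation just described can be replayed concretely on a four-point outcome space. The sketch below (a minimal illustration with our own names, anticipating the independent product defined formally in section 5.2) builds the 𝜎-algebra generated by 𝑥 and the one generated by 𝑥 xor 𝑦 for two independent fair bits, and checks the defining equation 𝜇(𝑋1 ∩ 𝑋2) = 𝜇(𝑋1) · 𝜇(𝑋2) on all pairs of events.

```python
from fractions import Fraction
from itertools import product

# Outcomes are pairs (x, y) of fair coin flips; mu is uniform on the four.
omega = [(x, y) for x in (0, 1) for y in (0, 1)]
mu = {w: Fraction(1, 4) for w in omega}

def ev(pred):
    return frozenset(w for w in omega if pred(w))

def measure(e):
    return sum(mu[w] for w in e)

# The four-element sigma-algebras generated by "x = 0" and "(x xor y) = 0".
f_x = {frozenset(), ev(lambda w: w[0] == 0),
       ev(lambda w: w[0] == 1), ev(lambda w: True)}
f_xor = {frozenset(), ev(lambda w: w[0] ^ w[1] == 0),
         ev(lambda w: w[0] ^ w[1] == 1), ev(lambda w: True)}

# mu(X1 ∩ X2) = mu(X1) * mu(X2) for every X1 in f_x and X2 in f_xor: the
# two event spaces separate, even though the expressions x and (x xor y)
# share the program variable x.
indep = all(measure(x1 & x2) == measure(x1) * measure(x2)
            for x1, x2 in product(f_x, f_xor))
```

This is exactly why a measure-theoretic state can satisfy Own(𝑥) ∗ Own(𝑥 xor 𝑦) while a marginal-distribution state cannot.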
On the program logic level, following the convention adopted by the higher-order concurrent separation logic framework Iris [Jung et al., 2018], Lilac works with a functional probabilistic language and defines the validity of a Hoare triple differently. Their definition of Hoare triples implicitly requires that any frame conjoined with the current resource must be preserved. This allows the frame rule to be proven easily, without relying on side conditions or verifying that the formulas satisfy the restriction property. In exchange, one needs to inductively prove that each program preserves the frame. As they work with a functional probabilistic programming language, they fix a probability space ("ambient sampling space" in their terminology) and their program variables are simply random variables on that probability space. They show that everything works out when they fix the ambient sampling space to be the product of countably infinitely many copies of the [0, 1] interval, under a particular set of technical constraints. The choice of the ambient sampling space seems highly non-robust; e.g., a product of finitely many copies of the [0, 1] interval would not work in their proofs even if the probabilistic programs only make a finite number of sampling calls.

Bluebell's Design Choices

In BLUEBELL, we combine Lilac's measure-theoretic BI model with a BI model of permissions, which are used in the concurrent separation logic literature to track who can read from and write to a resource. In our model (section 5.3.3), two resources 𝑎, 𝑏 can be composed together only if their permissions can be combined as well, and this extra requirement plays a crucial role in ensuring that the Frame rule holds in our logic. We draw insights from both DIBI and Lilac to design a modified conditioning modality, introduced in section 5.3.4, that is more expressive and satisfies a richer family of axioms than Lilac's conditioning modality.
Our conditioning modality is the key to how we mix unary reasoning and relational reasoning. It supports conditioning on one program state as well as on two or more program states; using it, we can capture not only conditional independence but also couplings. Furthermore, from the axioms of the conditioning modality, we derive not only reasoning principles important for proving conditional independence, but also relational reasoning principles and their interactions. Even when we only focus on the unary reasoning functionality of BLUEBELL, we can formalize more interesting proof steps in the logic using the richer set of axioms enjoyed by this conditioning modality, as showcased by the examples in section 5.5.

5.2 Preliminaries: Programs and Probability Spaces

To formally define the model of BLUEBELL and validate its rules, we introduce a number of preliminary notions. Our starting point is the measure-theoretic approach of Li et al. [2023a] to defining probabilistic separation. We recall the main definitions below.

One crucial difference between elementary probability (as in how we introduced distributions in definition 2.2.9) and measure-theoretic probability [Rosenthal, 2006, Fristedt and Gray, 2013] is their treatment of event spaces. In elementary probability, we work with the set of outcomes directly: distributions are defined as maps from outcomes to numbers in [0, 1], and any subset of outcomes is considered an event. In measure-theoretic probability, events are specified by a structure called a 𝜎-algebra.

Definition 5.2.1 (𝜎-algebra). Given a set of possible outcomes Ω, a 𝜎-algebra F is a set of subsets of Ω that is closed under countable unions and complements, and such that Ω ∈ F. We call an element of a 𝜎-algebra an event. We denote the set of 𝜎-algebras over a set of outcomes Ω by A(Ω). The full 𝜎-algebra over Ω is ΣΩ = 𝒫(Ω), the powerset of Ω. For 𝐹 ⊆ 𝒫(Ω), we write 𝜎(𝐹) ∈ A(Ω) for the smallest 𝜎-algebra containing 𝐹.
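For a finite outcome space, 𝜎(𝐹) can be computed by closing the generators under complement and union (countable unions reduce to finite unions when Ω is finite). A minimal sketch, with function names of our own choosing:

```python
def generated_sigma_algebra(omega, gens):
    # Close the generator events under complement and binary union; on a
    # finite outcome space this fixed point is the smallest sigma-algebra
    # sigma(gens) containing the generators.
    omega = frozenset(omega)
    sigma = {frozenset(), omega} | {frozenset(g) for g in gens}
    changed = True
    while changed:
        changed = False
        for a in list(sigma):
            c = omega - a
            if c not in sigma:
                sigma.add(c)
                changed = True
            for b in list(sigma):
                u = a | b
                if u not in sigma:
                    sigma.add(u)
                    changed = True
    return sigma

# One generator {1} yields the 4-element algebra {∅, {1}, {2,3,4}, Ω};
# all singletons as generators yield the full 16-element powerset algebra.
f_single = generated_sigma_algebra({1, 2, 3, 4}, [{1}])
f_full = generated_sigma_algebra({1, 2, 3, 4}, [{1}, {2}, {3}, {4}])
```

Closure under complement and union also gives closure under intersection, by De Morgan's laws.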
The measure-theoretic notion of a distribution maps events in a 𝜎-algebra to a number between 0 and 1, which we call the measure of the event, or the probability of the event.

Definition 5.2.2 (Probability Distributions). Given F ∈ A(Ω), a probability distribution 𝜇 ∈ D(F) is a function 𝜇 : F → [0, 1] such that

• for any countable set of disjoint events {𝐸𝑖 | 𝑖 ∈ 𝐼}, 𝜇(⊎𝑖∈𝐼 𝐸𝑖) = ∑𝑖∈𝐼 𝜇(𝐸𝑖);
• 𝜇(Ω) = 1.

The support of a distribution 𝜇 ∈ D(ΣΩ) is the set of outcomes with non-zero probability, supp(𝜇) ≜ {𝑎 ∈ Ω | 𝜇(𝑎) > 0}, where 𝜇(𝑎) abbreviates 𝜇({𝑎}).

Probability spaces are given as a triple of the outcome space, the 𝜎-algebra, and the distribution.

Definition 5.2.3. A probability space P is a triple P = (Ω, F, 𝜇) of an outcome space Ω, a 𝜎-algebra F ∈ A(Ω), and a probability distribution 𝜇 ∈ D(F). We call the distribution 𝜇 the measure of the probability space. The trivial probability space 𝟙Ω ∈ P(Ω) is the trivial 𝜎-algebra {Ω, ∅} equipped with the trivial probability distribution that maps Ω to probability 1 and maps ∅ to probability 0. When the outcome space is clear, we omit it from the triple and write P = (F, 𝜇).

We define a pre-order on probability spaces to capture the intuition that a probability space 𝐴 is smaller than a probability space 𝐵 if 𝐴 is defined on a subset of the events on which 𝐵 is defined, and 𝐴 agrees with 𝐵 on that subset. This pre-order will be used in BLUEBELL's BI model over probability spaces.

Definition 5.2.4. Given F1 ⊆ F2 and 𝜇 ∈ D(F2), the distribution 𝜇|F1 ∈ D(F1) is the restriction of 𝜇 to F1. The extension pre-order (⊑) over probability spaces is defined as (F1, 𝜇1) ⊑ (F2, 𝜇2) ≜ F1 ⊆ F2 ∧ 𝜇1 = 𝜇2|F1.

Given two probability spaces, we identify a set of functions that transfer nicely between them, called measurable functions.

Definition 5.2.5. A function 𝑓 : Ω1 → Ω2 is measurable on F1 ∈ A(Ω1) and F2 ∈ A(Ω2) if for any event 𝑋 ∈ F2, we also have 𝑓⁻¹(𝑋) ∈ F1. When F2 = ΣΩ2, we simply say 𝑓 is measurable on F1.
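On finite spaces, both measurability and the extension pre-order are directly decidable. The sketch below (names are ours) checks measurability of a function against a finite sub-𝜎-algebra, and checks (F1, 𝜇1) ⊑ (F2, 𝜇2) by comparing the restricted measure event by event.

```python
from fractions import Fraction
from itertools import chain, combinations

def measurable(f, sigma, codomain):
    # f is measurable on the finite sigma-algebra `sigma` (a set of
    # frozenset events) if every preimage f^{-1}(b) is an event of sigma.
    omega = max(sigma, key=len)  # the full outcome set is the largest event
    return all(frozenset(w for w in omega if f(w) == b) in sigma
               for b in codomain)

def extends(space1, space2):
    # (F1, mu1) ⊑ (F2, mu2): F1 ⊆ F2, and mu2 restricted to F1 equals mu1.
    f1, mu1 = space1
    f2, mu2 = space2
    return f1 <= f2 and all(mu2[e] == mu1[e] for e in f1)

# A four-point outcome space with the "parity" sub-sigma-algebra.
omega = frozenset({0, 1, 2, 3})
even, odd = frozenset({0, 2}), frozenset({1, 3})
f_parity = {frozenset(), even, odd, omega}
f_full = {frozenset(s) for s in
          chain.from_iterable(combinations(omega, r) for r in range(5))}

# The uniform measure, given on each algebra by event size.
mu_parity = {e: Fraction(len(e), 4) for e in f_parity}
mu_full = {e: Fraction(len(e), 4) for e in f_full}
```

The parity function is measurable on `f_parity`, while the identity is not, since `f_parity` has no singleton events.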
Later we will want to decompose one probability space into two, and the decomposition is defined in terms of how we compose two probability spaces into one. Two natural ways to combine two 𝜎-algebras are taking the Cartesian product and taking the union.

Definition 5.2.6 (Product and union spaces). Given F1 ∈ A(Ω1), F2 ∈ A(Ω2), their product is the 𝜎-algebra F1 ⊗ F2 ∈ A(Ω1 × Ω2) defined as F1 ⊗ F2 ≜ 𝜎({𝑋1 × 𝑋2 | 𝑋1 ∈ F1, 𝑋2 ∈ F2}), and their union is the 𝜎-algebra F1 ⊕ F2 ∈ A(Ω1 × Ω2) defined as 𝜎(F1 ∪ F2).

We can take the product of two distributions to obtain a distribution over the product 𝜎-algebra.

Definition 5.2.7. The product of two probability distributions 𝜇1 ∈ D(F1) and 𝜇2 ∈ D(F2) is the distribution (𝜇1 ⊗ 𝜇2) ∈ D(F1 ⊗ F2) defined by (𝜇1 ⊗ 𝜇2)(𝑋1 × 𝑋2) = 𝜇1(𝑋1) · 𝜇2(𝑋2) for all 𝑋1 ∈ F1, 𝑋2 ∈ F2.

In this chapter, we will frequently use the independent product of two distributions, which lives over the union of their 𝜎-algebras and is not always defined.

Definition 5.2.8 (Independent product [Li et al., 2023a]). Given (F1, 𝜇1), (F2, 𝜇2) ∈ P(Ω), their independent product is the probability space (F1 ⊕ F2, 𝜇) ∈ P(Ω) where, for all 𝑋1 ∈ F1, 𝑋2 ∈ F2, 𝜇(𝑋1 ∩ 𝑋2) = 𝜇1(𝑋1) · 𝜇2(𝑋2). It is unique if it exists [Li et al., 2023a, Lemma 2.3]. Let P1 ⊛ P2 be the unique independent product of P1 and P2 when it exists, and be undefined otherwise.

Probabilistic Programming Language

Another important component in BLUEBELL is the probabilistic programming language. We use a simple first-order imperative language very similar to pWhile, except that it contains a different construct for loops. As in pWhile, we fix a finite set of program variables 𝑥 ∈ Var and a countable set of values 𝑣 ∈ Val ≜ Z, and define the program stores to be 𝑠 ∈ Mem[Var] ≜ Var → Val. For simplicity, booleans are encoded by using 0 ∈ Val as false and any other value as true.

T ∋ 𝑡 ::= skip | 𝑥 := 𝑒 | 𝑥 $← 𝑑 | if 𝑏 then 𝑡1 else 𝑡2 | 𝑡1 ; 𝑡2 | repeat 𝑒 𝑡

Figure 5.1: Program Syntax
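The term language of fig. 5.1 can be given a small executable sketch on discrete store distributions (the encoding below is ours, not from the text): terms are tagged tuples, stores are immutable maps encoded as sorted item-tuples, and each term denotes a map from distributions over stores to distributions over stores; `repeat e t` is unrolled by evaluating `e` at the current store.

```python
from fractions import Fraction

def store(d):
    return tuple(sorted(d.items()))

def bind(dist, f):
    # Monadic bind on finite distributions: run f at each store and sum up.
    out = {}
    for s, p in dist.items():
        for s2, q in f(dict(s)).items():
            out[s2] = out.get(s2, Fraction(0)) + p * q
    return out

def run(t, dist):
    tag = t[0]
    if tag == "skip":
        return dist
    if tag == "assign":                 # ("assign", x, e), e : store -> value
        _, x, e = t
        return bind(dist, lambda s: {store({**s, x: e(s)}): Fraction(1)})
    if tag == "sample":                 # ("sample", x, d), d : value -> prob
        _, x, d = t
        return bind(dist, lambda s: {store({**s, x: v}): p for v, p in d.items()})
    if tag == "if":                     # ("if", b, t1, t2); 0 encodes false
        _, b, t1, t2 = t
        tr = {s: p for s, p in dist.items() if b(dict(s)) != 0}
        fa = {s: p for s, p in dist.items() if b(dict(s)) == 0}
        out = run(t1, tr)
        for s, p in run(t2, fa).items():
            out[s] = out.get(s, Fraction(0)) + p
        return out
    if tag == "seq":                    # ("seq", t1, t2)
        return run(t[2], run(t[1], dist))
    if tag == "repeat":                 # ("repeat", e, t1): run t1 e-many times
        _, e, t1 = t
        def iterate(s):
            d1 = {store(s): Fraction(1)}
            for _ in range(max(0, e(s))):
                d1 = run(t1, d1)
            return d1
        return bind(dist, iterate)
    raise ValueError(tag)

bern = lambda p: {1: p, 0: 1 - p}

# z <- Bern(1/2); repeat z (x := x + 1), starting from x = 0.
prog = ("seq", ("sample", "z", bern(Fraction(1, 2))),
               ("repeat", lambda s: s["z"], ("assign", "x", lambda s: s["x"] + 1)))
result = run(prog, {store({"x": 0}): Fraction(1)})
```

Only a sketch: the thesis's actual semantics is the measure-theoretic map of definition D.1.2, of which this is the finite, discrete special case.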
Program terms 𝑡 ∈ T are formed according to the grammar in fig. 5.1. (We call them terms to follow the terminology in the conference version Bao et al. [2025] and to distinguish them from the commands of pWhile.) The expressions 𝑒 are interpreted into ⟦𝑒⟧ : Mem[Var] → Val following the standard definition (see definition D.1.1). As before, we write FV(𝑒) for the set of program variables that occur in 𝑒. The distributions 𝑑 are interpreted as measures over the full 𝜎-algebra Σ𝐴 for some type 𝐴; when 𝑑 : Σ𝐴 is used in the sampling statement 𝑥 $← 𝑑, we expect 𝐴 to be a subset of Z. An example distribution is Bern𝑣, the Bernoulli distribution with probability 𝑣 of yielding 1 and probability 1 − 𝑣 of yielding 0.

Though we do not allow general loops because of difficulties around reasoning about them, we allow iteration through a simpler construct repeat 𝑒 𝑡, which evaluates 𝑒 to a value 𝑛 ∈ Val and, if 𝑛 > 0, executes 𝑡 in sequence 𝑛 times. Only allowing this restrictive version of iteration means we only consider a subset of terminating programs.

For the semantics of programs, we interpret each term 𝑡 as a function ⟦𝑡⟧ : D(ΣMem[Var]) → D(ΣMem[Var]), i.e., a map from distributions of input stores to distributions of output stores. The interpretation of the terms is standard, and we defer the mathematical definition to definition D.1.2. Notably, working with a countable set of values Val means that the set of program stores is also countable, so distributions in D(ΣMem[Var]) are also discrete. In BLUEBELL, we work with discrete distributions because it is unclear how continuous distributions interact with some relational constructs in our logic. However, we still use the measure-theoretic definitions for more granular control over the event space.

5.3 The BLUEBELL Logic

We are now ready to define BLUEBELL's semantic model and show its laws.
5.3.1 An Alternative Approach to Bunched Logic

While the assertion logic of BLUEBELL extends bunched logic, we use a different presentation than the one used in PSL, LINA and DIBI. We adapt the approach to BI in Krebbers et al. [2018], which is motivated by efforts in mechanizing various separation logics in a ROCQ framework called Iris. Though BLUEBELL has not been mechanized yet in Bao et al. [2025], we look forward to mechanizing it in the future, and thus we lay the foundation of the logic in a style that aligns with the Iris framework and its follow-up works.

Specifically, instead of interpreting formulas in a structure similar to BI frames and DIBI frames, which combine two states using non-deterministic binary operators, we use a structure called an "ordered unital resource algebra" (henceforth RA). RAs allow their states to be combined using either partial or total binary operators: RAs are always equipped with a total binary operation and a predicate V indicating which elements of the carrier are considered valid resources; partiality of the operation then manifests as mapping some combinations of arguments to invalid elements.

Definition 5.3.1 (Ordered Unital Resource Algebra). An ordered unital resource algebra (RA) is a tuple (𝑀, ⪯, V, ·, 𝜀) where ⪯ is a pre-order on 𝑀 called the resource order, V : 𝑀 → Prop is the validity predicate, (·) : 𝑀 → 𝑀 → 𝑀 is the resource composition, a commutative and associative binary operation on 𝑀, and 𝜀 ∈ 𝑀 is the unit of 𝑀, satisfying, for all 𝑎, 𝑏, 𝑐 ∈ 𝑀:

V(𝜀); (Unit Validity)
𝜀 · 𝑎 = 𝑎; (Unit Existence)
V(𝑎 · 𝑏) → V(𝑎); (Element Validity)
𝑎 ⪯ 𝑏 → (V(𝑏) → V(𝑎)); (Validity Closure)
𝑎 ⪯ 𝑏 → 𝑎 · 𝑐 ⪯ 𝑏 · 𝑐. (Order Coherence)

BLUEBELL also differs from PSL, LINA and DIBI in that, in BLUEBELL, we take a semantic approach to assertions: we do not insist on a specific syntax and instead characterize what constitutes an assertion by its type.
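For small finite carriers, the RA axioms of definition 5.3.1 can be checked exhaustively. The sketch below is a toy example of our own (not from the text): the carrier is {0, ..., 5}, composition is addition capped at 5, the cap element 5 plays the role of the single invalid "overflow" element, the order is the usual ≤, and the unit is 0. Validity Closure is read here as: if 𝑎 ⪯ 𝑏 and 𝑏 is valid, then 𝑎 is valid.

```python
from itertools import product

M = list(range(6))
eps = 0
comp = lambda a, b: min(a + b, 5)   # composition: addition capped at 5
valid = lambda a: a != 5            # 5 is the invalid "overflow" element
leq = lambda a, b: a <= b           # resource order

def check_ra_axioms():
    ok = valid(eps)                                                # unit validity
    ok &= all(comp(eps, a) == a for a in M)                        # unit existence
    ok &= all(comp(a, b) == comp(b, a) for a, b in product(M, M))  # commutativity
    ok &= all(comp(comp(a, b), c) == comp(a, comp(b, c))
              for a, b, c in product(M, M, M))                     # associativity
    ok &= all(valid(a) for a, b in product(M, M)
              if valid(comp(a, b)))                                # element validity
    ok &= all(valid(a) for a, b in product(M, M)
              if leq(a, b) and valid(b))                           # validity closure
    ok &= all(leq(comp(a, c), comp(b, c))
              for a, b, c in product(M, M, M) if leq(a, b))        # order coherence
    return bool(ok)
```

Note how partiality is simulated exactly as the definition describes: the total operation sends "impossible" combinations to the invalid element.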
We embed our definitions in a standard first-order logic, which we will refer to as the meta-level logic. We overload ∧ and ∨ as the conjunction and disjunction, and write ⇒ for the implication in this meta-level logic. Following the convention in the ROCQ community, we use Prop to denote the type of propositions.

BLUEBELL uses an alternative definition of BI assertions. To disambiguate from the definition of BI assertions in previous chapters, we call the BLUEBELL version BI∗ assertions.

Definition 5.3.2. We define BI∗ assertions relative to some RA 𝑀 as the upward closed functions 𝑀 → Prop. A map 𝑃 : 𝑀 → Prop is upward closed if for all 𝑎, 𝑎′ ∈ 𝑀 such that 𝑎 ⪯𝑀 𝑎′, 𝑃(𝑎) ⇒ 𝑃(𝑎′) in the propositional logic.

The requirement that BI∗ assertions be upward closed maps is another way to express the persistence condition we imposed on assertions in previous chapters. In this chapter, we do not use the symbol |= for the satisfaction relation; instead, we say that a resource 𝑎 satisfies an assertion 𝑃 if 𝑃(𝑎). Entailment is defined as (𝑃 ⊢ 𝑄) ≜ ∀𝑎 ∈ 𝑀. V(𝑎) ⇒ (𝑃(𝑎) ⇒ 𝑄(𝑎)). Logical equivalence is defined as entailment in both directions: 𝑃 ⊣⊢ 𝑄 ≜ (𝑃 ⊢ 𝑄) ∧ (𝑄 ⊢ 𝑃).

We introduce two families of assertions useful in separation logic. First, pure assertions ⌜𝜙⌝ lift meta-level propositions 𝜙 to BI∗ assertions (by ignoring the resource). For example, a formula about specified distributions such as Bern0.5 = bind(Bern0.3, 𝑣 ↦→ Bern0.5) is pure and can be used in separation logic as ⌜Bern0.5 = bind(Bern0.3, 𝑣 ↦→ Bern0.5)⌝. Second, Own(𝑏) holds on resources that are greater than or equal to 𝑏 in the RA order; this means 𝑏 represents a lower bound on the available resources. Mathematically,

⌜𝜙⌝ ≜ λ_. 𝜙
Own(𝑏) ≜ λ𝑎. 𝑏 ⪯ 𝑎

We also use standard connectives from BI to produce new assertions from existing ones. We interpret these connectives relative to an RA, and the definition is standard:

𝑃 ∧ 𝑄 ≜ λ𝑎. 𝑃(𝑎) ∧ 𝑄(𝑎)
𝑃 ∨ 𝑄 ≜ λ𝑎. 𝑃(𝑎) ∨ 𝑄(𝑎)
𝑃 → 𝑄 ≜ λ𝑎. ∀𝑏 s.t.
𝑎 ⪯ 𝑏, 𝑃(𝑏) ⇒ 𝑄(𝑏)
𝑃 ∗ 𝑄 ≜ λ𝑎. ∃𝑏, 𝑐 s.t. 𝑏 · 𝑐 ⪯ 𝑎, 𝑃(𝑏) ∧ 𝑄(𝑐)
𝑃 −∗ 𝑄 ≜ λ𝑎. ∀𝑏, 𝑐 s.t. 𝑎 · 𝑏 ⪯ 𝑐, 𝑃(𝑏) ⇒ 𝑄(𝑐)
∀𝑥 : 𝑋. 𝑃(𝑥) ≜ λ𝑎. ∀𝑥 ∈ 𝑋. 𝑃(𝑥)(𝑎)
∃𝑥 : 𝑋. 𝑃(𝑥) ≜ λ𝑎. ∃𝑥 ∈ 𝑋. 𝑃(𝑥)(𝑎)

Figure 5.2: Satisfaction for BI formulas on an RA

5.3.2 A Model of Probabilistic Spaces

BLUEBELL's assertions will be interpreted over a specific RA, which we construct by combining more basic RAs. The main component is the probability spaces RA, which uses the independent product as the RA operation.

Definition 5.3.3 (Probability Spaces RA). The probability spaces RA PSpΩ is the resource algebra (P(Ω) ⊎ {⊥}, ⪯, V, ·, 𝟙Ω), where ⪯ is the extension pre-order (definition 5.2.4) with the invalid element ⊥ added as the top element, i.e., P1 ⪯ P2 ≜ P1 ⊑ P2 and ∀𝑎 ∈ PSpΩ. 𝑎 ⪯ ⊥; validity is V(𝑎) ≜ 𝑎 ≠ ⊥; and composition is the independent product:

𝑎 · 𝑏 ≜ P1 ⊛ P2 if 𝑎 = P1, 𝑏 = P2, and P1 ⊛ P2 is defined; ⊥ otherwise.

The fact that PSpΩ satisfies the axioms of RAs is established in appendix D.3 and builds on the analogous construction in Lilac.

We now introduce assertions that are specific to PSpΩ. We use the following two abbreviations so we do not need to write out the resource pedantically when using the BI∗ assertion Own(−):

Own(F, 𝜇, 𝑝) ≜ Own(((F, 𝜇), 𝑝))
Own(F, 𝜇) ≜ ∃𝑝. Own(F, 𝜇, 𝑝)

We also want to use expressions in assertions. Let 𝐴-typed expressions be maps 𝐸 of type Mem[Var] → 𝐴. We allow PSpΩ assertions to use 𝐴-typed expressions for any type 𝐴. As an example, the interpretation of any program expression ⟦𝑒⟧ : Mem[Var] → Val is a Val-typed expression. Thus, we seamlessly use program expressions in assertions by implicitly coercing them to their semantics.

The first kind of PSpΩ assertion we want to introduce is 𝐸 $∼ 𝜇.
Intuitively, we want it to assert that the expression 𝐸 has the distribution 𝜇 in the specified probability space; to evaluate the expression 𝐸, the probability space needs to have enough information — we refer to the condition needed to evaluate an expression 𝐸 as ownership over 𝐸 below. Lilac proposed to use measurability as the notion of ownership. Recall that a function 𝑓 : 𝐴 → 𝐵 is measurable in a sigma-algebra F : A(𝐴) if 𝑓 −1(𝑏) = {𝑎 ∈ 𝐴 | 𝑓(𝑎) = 𝑏} ∈ F for all 𝑏 ∈ 𝐵. An 𝐴-typed expression 𝐸 always defines a measurable function (i.e. a random variable) on ΣMem[Var] but might not be measurable on some sub-algebras of ΣMem[Var]. Their definition makes sense because any resource that makes 𝐸 measurable contains enough information to determine 𝐸's distribution. However, we discovered that this choice made axioms used in Lilac's proofs flawed. In short, axioms such as Own(𝑥) ∗ ⌈𝑥 = 𝑦⌉ |= Own(𝑦), which intuitively convey the idea that if 𝑥 is measurable and 𝑥, 𝑦 are equal in all plausible outcomes, then 𝑦 is also measurable, played a crucial role in Lilac's proofs of example programs but are not sound.¹ (¹A later revision Li et al. [2023b] corrected the issue, although with a different solution from ours.) Thus, we propose a slight weakening of the notion of measurability which solves those issues while still retaining the intent behind the notion of ownership. We call this weaker notion "almost measurability". Definition 5.3.4 (Almost-measurability). Given a probability space (F, 𝜇) ∈ P(Ω) and a set 𝑋 ⊆ Ω, we say 𝑋 is almost measurable in (F, 𝜇), written 𝑋 � (F, 𝜇), if ∃𝑋1, 𝑋2 ∈ F. 𝑋1 ⊆ 𝑋 ⊆ 𝑋2 ∧ 𝜇(𝑋1) = 𝜇(𝑋2). We say a function 𝐸 : Ω → 𝐴 is almost measurable in (F, 𝜇), written 𝐸 � (F, 𝜇), if 𝐸−1(𝑎) � (F, 𝜇) for all 𝑎 ∈ 𝐴. While almost-measurability does not imply measurability, it constrains the current probability space to uniquely determine the distribution of 𝐸 in any extension where 𝐸 becomes measurable. Example 5.3.1.
For example, let 𝑋 = {𝑠 | 𝑠(𝑥) = 42} and F = 𝜎({𝑋}) = {Mem[Var], ∅, 𝑋, Mem[Var] \ 𝑋}. If 𝜇(𝑋) = 1, then 𝑥 � (F, 𝜇) holds but 𝑥 is not measurable in F, as F lacks events for 𝑥 = 𝑣 for all 𝑣 except 42. Nevertheless, any extension (F′, 𝜇′) ⊒ (F, 𝜇) where 𝑥 is measurable would need to assign 𝜇′(𝑋) = 1 and 𝜇′(𝑥 = 𝑣) = 0 for every 𝑣 ≠ 42. In general, when 𝑋1 ⊆ 𝑋 ⊆ 𝑋2 and 𝜇(𝑋1) = 𝜇(𝑋2) = 𝑝, we can unambiguously assign probability 𝑝 to 𝑋, as any extension of 𝜇 to ΣΩ must assign 𝑝 to 𝑋; we then write 𝜇(𝑋) for 𝑝. When defining 𝐸 $∼ 𝜇, we require 𝐸 to be almost-measurable and to be distributed as 𝜇 in any extension of the local probability space. Formally, given 𝜇 : D(Σ𝐴) and 𝐸 : Mem[Var] → 𝐴, we define (writing 𝜇0 for the existentially quantified measure to keep it apart from 𝜇):
𝐸 $∼ 𝜇 ≜ ∃F, 𝜇0. Own(F, 𝜇0) ∗ ⌜𝐸 � (F, 𝜇0) ∧ 𝜇 = 𝜇0 ◦ 𝐸−1⌝
Notably, 𝐸 � (F, 𝜇0) ∧ 𝜇 = 𝜇0 ◦ 𝐸−1 is a pure fact that we can reason about without using the local probability space — the probability space (F, 𝜇0) is fixed by the existential quantifier and does not rely on the local probability space. Using the 𝐸 $∼ 𝜇 assertion, we can define a number of useful derived assertions. In their definitions, we use the following events of the outcome space Val: false ≜ {0} and true ≜ {𝑛 ∈ Val | 𝑛 ≠ 0}.
E[𝐸] = 𝑟 ≜ ∃𝜇. 𝐸 $∼ 𝜇 ∗ ⌜𝑟 = ∑𝑎∈supp(𝜇) 𝜇(𝑎) · 𝑎⌝
Pr(𝐸) = 𝑟 ≜ ∃𝜇. 𝐸 $∼ 𝜇 ∗ ⌜𝜇(true) = 𝑟⌝
⌈𝐸⌉ ≜ 𝐸 $∼ 𝛿true
Own(𝐸) ≜ ∃𝜇. 𝐸 $∼ 𝜇
Assertions about expectations (E[𝐸]) and probabilities (Pr(𝐸)) both assert that 𝐸 is distributed as 𝜇 for some distribution 𝜇, and that 𝜇 satisfies the desired pure property. To assert E[𝐸] = 𝑟, we implicitly assume 𝐸 is a numerically typed expression. The assertion holds if 𝐸 is uniquely determined to distribute as 𝜇 and the expected value in 𝜇 is 𝑟. To assert Pr(𝐸) = 𝑟, we implicitly assume 𝐸 is a Val-typed expression. The assertion Pr(𝐸) = 𝑟 holds on a probability space if the probability space uniquely determines 𝐸 to distribute as 𝜇, where 𝜇 assigns probability 𝑟 to the event true.
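Almost-measurability is easy to experiment with on a finite outcome space. The following is our own Python sketch of definition 5.3.4 and example 5.3.1; the concrete space, events, and names are our illustrative choices, not part of the formal development.

```python
# Our own finite sketch of (almost-)measurability, cf. Definition 5.3.4 and
# Example 5.3.1. The outcome is just the value of a single variable x.
Omega = frozenset({0, 1, 42})
# sigma({ {42} }): the four events generated by the single event "x = 42".
F = {frozenset(), Omega, frozenset({42}), frozenset({0, 1})}
# mu gives "x = 42" probability 1, as in the example.
mu = {frozenset(): 0, frozenset({42}): 1, frozenset({0, 1}): 0, Omega: 1}

def measurable(E):
    """E : Omega -> A is measurable iff every preimage E^-1(a) lies in F."""
    return all(frozenset(w for w in Omega if E(w) == a) in F
               for a in {E(w) for w in Omega})

def almost_measurable(E):
    """Each preimage is sandwiched by F-events of equal measure."""
    for a in {E(w) for w in Omega}:
        pre = frozenset(w for w in Omega if E(w) == a)
        if not any(X1 <= pre <= X2 and mu[X1] == mu[X2]
                   for X1 in F for X2 in F):
            return False
    return True

x = lambda w: w   # the variable x, read off the outcome directly

print(measurable(x), almost_measurable(x))  # False True
```

As the example predicts, `x` is not measurable (the preimage of 0 is not an event of `F`) but is almost measurable: each missing preimage is squeezed between a null event and `{0, 1}`, both of measure 0.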
The "almost surely" assertion ⌈𝐸⌉ takes an expression 𝐸 and asserts that 𝐸 always "evaluates to true." Because we encode booleans by treating 0 ∈ Val as false and any other value as true, we define it to assert that 𝐸 is distributed as the Dirac distribution 𝛿true — the handling of ownership over 𝐸 is baked into the definition of "distributed as". By this definition, an assertion like ⌈𝑥 = 𝑦⌉ owns the expression 𝑥 = 𝑦 but not necessarily 𝑥 itself: the only events needed to make the expression 𝑥 = 𝑦 almost measurable are {𝑠 | 𝑥 = 𝑦} and {𝑠 | 𝑥 ≠ 𝑦}, which are not enough to make 𝑥 itself almost measurable. Now we see an example formula that is not satisfiable in PSL's assertion logic, but is satisfiable in the PSpΩ model. Example 5.3.2. Assume there are only two variables 𝑥 and 𝑦. Let 𝑋𝑣 = {𝑠 | 𝑠(𝑥) = 𝑣} and P1 = (F1, 𝜇1) with F1 = 𝜎({𝑋𝑣 | 𝑣 ∈ Val}), and let 𝜇1 give 𝑥 the distribution of a fair coin, i.e. 𝜇1 is the extension to F1 of 𝜇1(𝑋0) = 𝜇1(𝑋1) = 1/2. Intuitively, the assertion 𝑥 $∼ Bern1/2 holds on P1. Similarly, ⌈𝑥 = 𝑦⌉ holds on P2 = (F2, 𝜇2) where F2 = {∅, Mem[Var], 𝐸, Mem[Var] \ 𝐸} with 𝐸 = {𝑠 | 𝑠(𝑥) = 𝑠(𝑦)} and 𝜇2(𝐸) = 1. Note that F2 is very coarse: it does not contain events that can pin down the value of 𝑥 precisely; thanks to this, 𝜇2 does not need to specify the distribution of 𝑥, but only that 𝑦 coincides with 𝑥 with probability 1. It is easy to see that the independent product of P1 and P2 exists and is P3 = (F1 ⊕ F2, 𝜇3), where 𝜇3 is determined by 𝜇3(𝑋0 ∩ 𝐸) = 𝜇3(𝑋1 ∩ 𝐸) = 1/2, i.e. it makes 𝑥, 𝑦 the outcomes of the same fair coin. This means P3 is a model of 𝑥 $∼ Bern1/2 ∗ ⌈𝑥 = 𝑦⌉.
5.3.3 A Model of Mutable Probabilistic Stores
In BLUEBELL, we want to develop a program logic to reason about an imperative probabilistic programming language. Ideally, we want a clean frame rule as in Lilac, which does not need side conditions as in PSL (see section 2.3.3), to make modular reasoning about independent components easy.
That means we want to allow assertions on any independent probability spaces to be framed onto the pre- and post-conditions of our program judgements simultaneously. Lilac shows that it is sound to do so in their model because their program variables are immutable: their program variables are essentially maps (i.e., random variables) on a fixed probability space over an infinite tape, and they can always perform some manipulations so that the random variables used in the frame assertion depend on a previously unused index of the tape. However, we work with a language with mutation — our program terms update the probability space over stores as they run, and it is problematic to allow such a frame rule in our setting. Example 5.3.3. To illustrate the problem, consider a simple assignment 𝑥 := 0. In the spirit of separation logic's local reasoning, we expect to be able to prove a small-footprint triple for the assignment, i.e., one where the precondition only involves ownership of the variable 𝑥, such as {Own(𝑥)} 𝑥 := 0 {⌈𝑥 = 0⌉}. However, we would run into problems when proving the Frame rule, which is the key to enabling modular reasoning in separation logics. As we remarked, an assertion like ⌈𝑥 = 𝑦⌉ is a valid frame of Own(𝑥), so the Frame rule would allow us to derive ⊢ {Own(𝑥) ∗ ⌈𝑥 = 𝑦⌉} 𝑥 := 0 {⌈𝑥 = 0⌉ ∗ ⌈𝑥 = 𝑦⌉}. Yet the Hoare triple {Own(𝑥) ∗ ⌈𝑥 = 𝑦⌉} 𝑥 := 0 {⌈𝑥 = 0⌉ ∗ ⌈𝑥 = 𝑦⌉} would be invalid because, as long as 𝑦 ≠ 0 in the input state, the formula ⌈𝑥 = 𝑦⌉ would not hold after the assignment. We solve this problem by combining PSpMem[Var], the RA of probability spaces over the outcome space Mem[Var], with an RA of permissions over variables. The idea is that in addition to information about the distribution, assertions can indicate which "write permissions" we own on variables. An assertion that owns write permissions on 𝑥 would be incompatible with any frame predicating on 𝑥.
Then a triple for assignment just needs to require write permission on the assigned variable. We model permissions using a standard fractional-permission RA. Definition 5.3.5. The permissions RA is defined as (Perm, ⪯, V, ·, 𝜀) where the carrier set Perm is defined to be the maps Var → Q+, where Q+ denotes the non-negative rational numbers. The resource pre-order is the point-wise order: for any two 𝑎, 𝑏 ∈ Perm, we have 𝑎 ⪯ 𝑏 iff ∀𝑥 ∈ Var. 𝑎(𝑥) ≤ 𝑏(𝑥). A permission is valid if it is upper-bounded by 1: for 𝑎 ∈ Perm, V(𝑎) iff ∀𝑥 ∈ Var. 𝑎(𝑥) ≤ 1. The composition of two permissions adds the two maps together point-wise: 𝑎1 · 𝑎2 ≜ λ𝑥. 𝑎1(𝑥) + 𝑎2(𝑥). The unit with respect to the composition is the constant zero permission: 𝜀 = λ_. 0. We now want to associate probability spaces with permissions. The goal is to make sure that, for any resource 𝑠 with permission 1 on a variable 𝑥, any resource that validly composes with 𝑠 must impose no constraints on the marginal distribution of 𝑥. Since resources that validly compose with 𝑠 must have zero permission on 𝑥, we only put restrictions on the probability spaces' information about variables with zero permission. For variables with strictly positive permission, whether the permission is 0.01 or 1, the probability space can specify their full distributions, or give no information, or anything in between. This gives rise to the following definition. Definition 5.3.6 (Compatibility). Given a probability space P ∈ P(Mem[Var]) and a permission map 𝑝 ∈ Perm, let 𝑆 = {𝑥 ∈ Var | 𝑝(𝑥) = 0}. We say that P is compatible with 𝑝, written P # 𝑝, if there exists P′ ∈ P((Var \ 𝑆) → Val) such that P is isomorphic to P′ ⊗ 𝟙𝑆→Val, witnessed by the isomorphism lifted from Mem[Var] � ((Var \ 𝑆) → Val) × (𝑆 → Val) on the outcome space. We extend the definition to PSpMem[Var] by declaring ⊤ # 𝑝. We now construct an RA that associates probability spaces with permissions. Definition 5.3.7. Let PSpPm ≜ {(P, 𝑝) | P ∈ PSpMem[Var], 𝑝 ∈ Perm, P # 𝑝}.
We define the Probability Spaces with Permissions RA (PSpPm, ⪯, V, ·, 𝜀) where
V((P, 𝑝)) ≜ P ≠ ⊤ ∧ ∀𝑥. 𝑝(𝑥) ≤ 1
(P, 𝑝) ⪯ (P′, 𝑝′) ≜ P ⪯ P′ and 𝑝 ⪯ 𝑝′
(P, 𝑝) · (P′, 𝑝′) ≜ (P · P′, 𝑝 · 𝑝′)
𝜀 ≜ (𝟙Mem[Var], λ𝑥. 0)
We define the following assertions specific to (PSpPm, ⪯, V, ·, 𝜀):
(𝑥:𝑞) ≜ ∃P, 𝑝. Own(P, 𝑝) ∗ ⌜𝑝(𝑥) = 𝑞⌝
𝑃@𝑝 ≜ ∃P. 𝑃(P) ∧ Own(P, 𝑝) (5.1)
The first assertion (𝑥:𝑞) states that the current resource (P′, 𝑝′) assigns permission at least 𝑞 to the variable 𝑥, i.e., 𝑝′(𝑥) ≥ 𝑞. In particular, any resource that can be composed with a resource satisfying (𝑥:1) must have a 𝜎-algebra which is trivial on 𝑥. Therefore, having (𝑥:1) hold forbids any frame from retaining information about 𝑥. We can also differentiate between an assertion (𝑥:1/2), which does not allow frames that mutate 𝑥 but allows frames that predicate on 𝑥 (e.g. ⌈𝑥 = 𝑦⌉), and an assertion (𝑥:1), which does not allow frames that predicate on 𝑥; consequently, having (𝑥:1/2) hold standalone does not allow mutation of 𝑥, but having (𝑥:1) enables mutation of 𝑥. The second assertion 𝑃@𝑝 states that 𝑃 holds in the probability space and that 𝑝 lower-bounds the permission. The assertion (𝑥:𝑞) is a special case of 𝑃@𝑝 where 𝑃 is set to ⊤ and 𝑝 is defined as: 𝑝(𝑥) = 𝑞 and 𝑝(𝑦) = 0 for any other variable 𝑦 ∈ Var. Also, in practice, preconditions of valid program logic triples are always of the form 𝑃@𝑝 where 𝑝 contains full permissions for every variable the relevant program mutates, and non-zero permissions for the other variables referenced in the assertions or program. For example, we define Hoare triples such that {𝑃@𝑝} 𝑥 := 𝑦 {𝑄@𝑞} is valid only if 𝑝(𝑥) = 1 and 𝑝(𝑦) > 0. While permissions allow for a clean semantic treatment of mutation and independence, they do incur some bookkeeping of permissions in practice. The necessary permissions are however easy to infer from the variables used in the assertions, as we will illustrate later in example 5.4.1.
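The fractional-permission discipline of definition 5.3.5 can be made concrete with a few lines of Python. This is our own illustration over an assumed three-variable store; the particular permission maps are arbitrary choices.

```python
from fractions import Fraction

# Our own sketch of the fractional-permissions RA (Definition 5.3.5).
VARS = ("x", "y", "z")

def compose(p, q):
    """RA composition: point-wise sum of permission maps."""
    return {v: p[v] + q[v] for v in VARS}

def valid(p):
    """V(p): every permission bounded by 1."""
    return all(p[v] <= 1 for v in VARS)

def leq(p, q):
    """The point-wise resource pre-order."""
    return all(p[v] <= q[v] for v in VARS)

half = Fraction(1, 2)
p   = {"x": Fraction(1), "y": half, "z": Fraction(1, 3)}  # owns x outright
fr  = {"x": Fraction(0), "y": half, "z": Fraction(1, 3)}  # a possible frame
bad = {"x": half, "y": Fraction(0), "z": Fraction(0)}     # also claims x

print(valid(compose(p, fr)))   # True: the frame holds no permission on x
print(valid(compose(p, bad)))  # False: permission on x would exceed 1
```

The second composition fails validity, which is exactly how full permission (𝑥:1) excludes any frame that claims even a fraction of 𝑥.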
Since we focus on BLUEBELL for unary reasoning in this thesis, our BI model is simply M ≜ PSpPm, and we use assertions of the form (𝑥:𝑞) and 𝑃@𝑝 to describe resources in this model. We write the type of assertions M → Prop as PA.
5.3.4 Joint Conditioning
To assert conditional independence, we want to assert independence of variables in conditional distributions. We thus introduce the joint conditioning modality C𝜇 𝐾 to assert on conditional distributions. Here we show the definition of C𝜇 restricted to the unary setting; a more general version, defined for tuples of program states (with permissions), is presented in the conference version [Bao et al., 2025]. Definition 5.3.8 (Joint conditioning modality). Let 𝜇′ ∈ D(Σ𝐴) and 𝐾 : 𝐴 → PA; then we define the assertion C𝜇′ 𝐾 : PA as follows:
C𝜇′ 𝐾 ≜ λ𝑎. ∃F, 𝜇, 𝑝, 𝜅. (F, 𝜇, 𝑝) ⪯ 𝑎 ∧ 𝜇 = bind(𝜇′, 𝜅) ∧ ∀𝑣 ∈ supp(𝜇′). 𝐾(𝑣)(F, 𝜅(𝑣), 𝑝)
Intuitively, C𝜇 𝐾 holds on resources whose probability spaces can be seen as the result of binding the given 𝜇 with some kernel 𝜅. Then, for each outcome 𝑣 in the support of 𝜇, the assertion 𝐾(𝑣) is required to hold on the distribution 𝜅(𝑣) (packaged with the original 𝜎-algebra and permission to make up a resource). Note that the definition is upward-closed by construction because of the part ∃F, 𝜇, 𝑝. (F, 𝜇, 𝑝) ⪯ 𝑎. As the name "conditioning modality" suggests, we want to use C𝜇 𝐾 to assert formulas on conditional distributions. To assert 𝑄(𝑣) on the conditional distribution fixing the value of a variable 𝑥 to 𝑣, we assert ∃𝜇′. C𝜇′ 𝑣. ⌈𝑥 = 𝑣⌉ ∗ 𝑄(𝑣), where we use the notation 𝑣. ⌈𝑥 = 𝑣⌉ ∗ 𝑄(𝑣) to denote the map from any outcome 𝑣 ∈ supp(𝜇′) to the assertion ⌈𝑥 = 𝑣⌉ ∗ 𝑄(𝑣).
This works because: if a resource (F, 𝜇, 𝑝) satisfies C𝜇′ 𝑣. ⌈𝑥 = 𝑣⌉ ∗ 𝑄(𝑣), then we can prove from C𝜇′ 𝑣. ⌈𝑥 = 𝑣⌉ that 𝑥 is distributed as 𝜇′ in 𝜇; furthermore, it says there must exist 𝜅 such that 𝜇 is the distribution of 𝑥 extended with 𝜅, i.e., 𝜇 = bind(𝜇′, 𝜅), with ⌈𝑥 = 𝑣⌉ holding in (F, 𝜅(𝑣), 𝑝) for every 𝑣 ∈ supp(𝜇′). These two conditions together constrain 𝜅(𝑣) to be the original distribution 𝜇 conditioned on 𝑥 = 𝑣. Because 𝑄(𝑣) is asserted on the distribution 𝜅(𝑣) for each 𝑣 ∈ supp(𝜇′), we have that 𝑄(𝑣) holds in the respective conditional distributions. As an example, the conditional independence of variables 𝑦 and 𝑧 given 𝑥 can be asserted as ∃𝜇′. C𝜇′ 𝑣. ⌈𝑥 = 𝑣⌉ ∗ Own(𝑦) ∗ Own(𝑧). In this particular case, 𝑄(𝑣) is invariant with respect to 𝑣.
5.3.5 The Rules of Conditioning and Independence
Although we adopt a "shallow embedding" approach to assertions in this chapter, the rules of BLUEBELL provide an axiomatic treatment of these assertions so that the user should never manipulate raw predicates over the semantic model. For brevity, we omit the rules that apply to the basic connectives of separation logic, as they are well-known and have been proven correct for any model that is an RA. For those we refer to Krebbers et al. [2018]. We make a distinction between "primitive" and "derived" rules. The primitive rules require proofs that manipulate the semantic model definitions directly. The derived rules can be proved sound by staying at the level of the logic, i.e. by using the primitive rules of BLUEBELL. Figure 5.3 presents the primitive rules and fig. 5.4 presents the derived rules.² (²We omit rules for relational reasoning here. They are presented in the appendix of the conference version Bao et al. [2025].)
Distribution ownership rules
DIST-INJ 𝐸 $∼ 𝜇 ∧ 𝐸 $∼ 𝜇′ ⊢ ⌜𝜇 = 𝜇′⌝
SURE-MERGE ⌈𝐸1⌉ ∗ ⌈𝐸2⌉ ⊣⊢ ⌈(𝐸1 ∧ 𝐸2)⌉
PROD-SPLIT (𝐸1, 𝐸2) $∼ 𝜇1 ⊗ 𝜇2 ⊢ 𝐸1 $∼ 𝜇1 ∗ 𝐸2 $∼ 𝜇2
Joint conditioning rules
C-TRUE ⊢ C𝜇 _. True
C-FALSE C𝜇 𝑣. False ⊢ False
C-CONS ∀𝑣. 𝐾1(𝑣) ⊢ 𝐾2(𝑣) ⟹ C𝜇 𝑣. 𝐾1(𝑣) ⊢ C𝜇 𝑣. 𝐾2(𝑣)
C-FRAME 𝑃 ∗ C𝜇 𝑣. 𝐾(𝑣) ⊢ C𝜇 𝑣. (𝑃 ∗ 𝐾(𝑣))
C-UNIT-L C𝛿𝑣0 𝑣. 𝐾(𝑣) ⊣⊢ 𝐾(𝑣0)
C-UNIT-R 𝐸 $∼ 𝜇 ⊣⊢ C𝜇 𝑣. ⌈𝐸 = 𝑣⌉
C-ASSOC 𝜇0 = bind(𝜇, λ𝑣. bind(𝜅(𝑣), λ𝑤. return(𝑣, 𝑤))) ⟹ C𝜇 𝑣. C𝜅(𝑣) 𝑤. 𝐾(𝑣, 𝑤) ⊢ C𝜇0 (𝑣, 𝑤). 𝐾(𝑣, 𝑤)
C-UNASSOC Cbind(𝜇,𝜅) 𝑤. 𝐾(𝑤) ⊢ C𝜇 𝑣. C𝜅(𝑣) 𝑤. 𝐾(𝑤)
C-SKOLEM 𝜇 : D(Σ𝐴) ⟹ C𝜇 𝑣. ∃𝑥 : 𝑋. 𝑄(𝑣, 𝑥) ⊢ ∃𝑓 : 𝐴 → 𝑋. C𝜇 𝑣. 𝑄(𝑣, 𝑓(𝑣))
C-TRANSF 𝑓 : supp(𝜇) → supp(𝜇′) bijective, ∀𝑏 ∈ supp(𝜇′). 𝜇′(𝑏) = 𝜇(𝑓 −1(𝑏)) ⟹ C𝜇 𝑎. 𝐾(𝑎) ⊢ C𝜇′ 𝑏. 𝐾(𝑓 −1(𝑏))
SURE-STR-CONVEX C𝜇 𝑣. (𝐾(𝑣) ∗ ⌈𝐸⌉) ⊢ ⌈𝐸⌉ ∗ C𝜇 𝑣. 𝐾(𝑣)
C-FOR-ALL C𝜇 𝑣. ∀𝑥 : 𝑋. 𝑄(𝑣, 𝑥) ⊢ ∀𝑥 : 𝑋. C𝜇 𝑣. 𝑄(𝑣, 𝑥)
C-PURE ⌜𝜇(𝑋) = 1⌝ ∗ C𝜇 𝑣. 𝐾(𝑣) ⊣⊢ C𝜇 𝑣. (⌜𝑣 ∈ 𝑋⌝ ∗ 𝐾(𝑣))
Figure 5.3: Primitive rules of BLUEBELL.
Ownership and distributions
SURE-DIRAC 𝐸 $∼ 𝛿𝑣 ⊣⊢ ⌈𝐸 = 𝑣⌉
SURE-EQ-INJ ⌈𝐸 = 𝑣⌉ ∗ ⌈𝐸 = 𝑣′⌉ ⊢ ⌜𝑣 = 𝑣′⌝
SURE-SUB 𝐸1 $∼ 𝜇 ∗ ⌈(𝐸2 = 𝑓(𝐸1))⌉ ⊢ 𝐸2 $∼ 𝜇 ◦ 𝑓 −1
DIST-FUN 𝐸 $∼ 𝜇 ⊢ (𝑓 ◦ 𝐸) $∼ 𝜇 ◦ 𝑓 −1
DIRAC-DUP 𝐸 $∼ 𝛿𝑣 ⊢ 𝐸 $∼ 𝛿𝑣 ∗ 𝐸 $∼ 𝛿𝑣
DIST-SUPP 𝐸 $∼ 𝜇 ⊢ 𝐸 $∼ 𝜇 ∗ ⌈𝐸 ∈ supp(𝜇)⌉
PROD-UNSPLIT 𝐸1 $∼ 𝜇1 ∗ 𝐸2 $∼ 𝜇2 ⊢ (𝐸1, 𝐸2) $∼ 𝜇1 ⊗ 𝜇2
Joint conditioning
C-FUSE C𝜇 𝑣. C𝜅(𝑣) 𝑤. 𝐾(𝑣, 𝑤) ⊣⊢ C𝜇⋉𝜅 (𝑣, 𝑤). 𝐾(𝑣, 𝑤)
C-SWAP C𝜇1 𝑣1. C𝜇2 𝑣2. 𝐾(𝑣1, 𝑣2) ⊢ C𝜇2 𝑣2. C𝜇1 𝑣1. 𝐾(𝑣1, 𝑣2)
SURE-CONVEX C𝜇 𝑣. ⌈𝐸⌉ ⊢ ⌈𝐸⌉
DIST-CONVEX C𝜇 𝑣. 𝐸 $∼ 𝜇′ ⊢ 𝐸 $∼ 𝜇′
C-SURE-PROJ C𝜇 (𝑣, 𝑤). ⌈𝐸(𝑣)⌉ ⊣⊢ C𝜇◦𝜋1−1 𝑣. ⌈𝐸(𝑣)⌉
C-EXTRACT C𝜇1 𝑣1. (⌈𝐸1 = 𝑣1⌉ ∗ 𝐸2 $∼ 𝜇2) ⊢ 𝐸1 $∼ 𝜇1 ∗ 𝐸2 $∼ 𝜇2
C-DIST-PROJ C𝜇 (𝑥, 𝑦). 𝐸(𝑥) $∼ 𝜇(𝑥) ⊢ C𝜇◦𝜋1−1 𝑥. 𝐸(𝑥) $∼ 𝜇(𝑥)
Figure 5.4: Derived rules.
We first present three primitive rules concerning distribution ownership. DIST-INJ allows us to conclude, from two assertions on an expression's distribution, that the two asserted distributions are the same. SURE-MERGE combines two sure assertions into one. PROD-SPLIT rewrites an assertion saying that two expressions are distributed as the independent product of 𝜇1, 𝜇2 using the independent conjunction ∗ in the logic. We then present the primitive rules for the conditioning modality.
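Several of the conditioning rules mirror the monad laws for distributions. A finite-support distribution monad (our own sketch; the particular distributions are illustrative) makes the laws behind C-UNIT-L/R and C-ASSOC/C-UNASSOC concrete:

```python
from fractions import Fraction

# Our own finite-support distribution monad; distributions are dicts from
# outcomes to exact probabilities.
def dirac(v):
    return {v: Fraction(1)}

def bind(mu, k):
    """Monadic bind: average the kernel k over mu."""
    out = {}
    for v, p in mu.items():
        for w, q in k(v).items():
            out[w] = out.get(w, Fraction(0)) + p * q
    return out

def bern(p):
    return {1: p, 0: 1 - p}

mu = bern(Fraction(3, 10))
k = lambda v: bern(Fraction(1, 2)) if v else dirac(0)
h = lambda w: bern(Fraction(1, 5)) if w else dirac(1)

# Left unit (cf. C-UNIT-L): binding a Dirac just applies the kernel.
assert bind(dirac(1), k) == k(1)
# Right unit (cf. C-UNIT-R): binding with Dirac is the identity.
assert bind(mu, dirac) == mu
# Associativity (cf. C-ASSOC / C-UNASSOC).
assert bind(bind(mu, k), h) == bind(mu, lambda v: bind(k(v), h))
print("monad laws hold")
```

Exact rationals keep the dictionary comparisons free of floating-point noise; the same three equalities are what the corresponding conditioning rules internalize at the level of assertions.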
Among the primitive rules, C-TRUE, C-FALSE, C-FRAME, C-SKOLEM and C-FOR-ALL describe how the conditioning modality interacts with other connectives in the logic — respectively True, False, ∗, ∃ and ∀. In particular, C-TRUE allows us to introduce a trivial modality; together with C-FRAME, this allows for the introduction of the modality around any assertion. Because distributions form a monad, and the definition of the conditioning modality uses the monadic bind, we also have rules corresponding to the three monad laws: C-UNIT-L (resp. C-UNIT-R) reflects the existence of the left unit (resp. right unit) for bind, and C-ASSOC and C-UNASSOC hold because the monadic bind is associative. Among the rest, C-CONS allows us to weaken the assertion under conditioning; C-TRANSF allows for the transformation of the convex combination using 𝜇 into one using 𝜇′, by applying a bijection between their supports in a way that does not affect the weight of each outcome; SURE-STR-CONVEX internalizes a stronger version of the convexity of ⌈𝐸⌉ assertions and allows us to pull ⌈𝐸⌉ out of the conditioning modality — it is the reversal of C-FRAME but only applies to sure assertions; C-PURE allows us to translate facts that hold with probability 1 in 𝜇 to predicates that hold on every 𝑣 bound by conditioning on 𝜇. The derived rules capture other useful reasoning patterns that follow from the primitive rules. For instance, C-FUSE is derived from C-ASSOC and C-UNASSOC. It concerns a particular distribution 𝜇 ⋉ 𝜅, defined by 𝜇 ⋉ 𝜅 := bind(𝜇, 𝑣 ↦→ unit(𝑣) ⊗ 𝜅(𝑣)). We explain the rest of these rules in section 5.5 when they are used.
5.4 Reasoning about Programs in BLUEBELL
To reason about programs, we introduce a weakest-precondition assertion (WP) wp 𝑡 {𝑄}. Our weakest-precondition assertion wp 𝑡 {𝑄} intuitively states: given the current input distribution, if we run the program 𝑡, we obtain an output distribution that satisfies 𝑄; furthermore, every frame is preserved.
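The WP definition that follows quantifies over the denotational semantics ⟦𝑡⟧, a map on distributions over memories. To ground that reading, here is our own finite sketch of assignment as a pushforward on distributions (not the thesis's formal semantics), which also replays example 5.3.3: running 𝑥 := 0 destroys the almost-sure fact 𝑥 = 𝑦.

```python
from fractions import Fraction

half = Fraction(1, 2)

# Our own sketch: memories are pairs (x, y); [[x := e]] pushes each memory
# forward through the update, accumulating probabilities of collisions.
def assign_x(mu, f):
    out = {}
    for (x, y), p in mu.items():
        key = (f(x, y), y)
        out[key] = out.get(key, Fraction(0)) + p
    return out

# x and y are two reads of one fair coin: [[x = y]] holds with probability 1.
mu0 = {(0, 0): half, (1, 1): half}

mu1 = assign_x(mu0, lambda x, y: 0)   # run  x := 0

pr_eq = sum(p for (x, y), p in mu1.items() if x == y)
print(pr_eq)   # 1/2: the frame ⌈x = y⌉ did not survive the assignment
```

After the assignment, 𝑥 = 𝑦 holds only with probability 1/2, which is why the WP definition must let the resource 𝑎 be updated to some 𝑏 while only the frames compatible with the write permission are preserved.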
Definition 5.4.1 (Weakest Precondition). For 𝑎 ∈ M and 𝜇 : D(ΣMem[Var]), let 𝑎 ⪯ 𝜇 abbreviate 𝑎 ⪯ (ΣMem[Var], 𝜇, λ𝑥. 1).
wp 𝑡 {𝑄} ≜ λ𝑎. ∀𝜇0. ∀𝑐. (𝑎 · 𝑐) ⪯ 𝜇0 ⇒ ∃𝑏. ((𝑏 · 𝑐) ⪯ ⟦𝑡⟧(𝜇0) ∧ 𝑄(𝑏))
The assertion holds on a resource 𝑎 such that if, together with some frame 𝑐, it can be seen as a fragment of the global distribution 𝜇0, then it is possible to update the resource to some 𝑏 which still composes with the frame 𝑐, such that 𝑏 · 𝑐 can be seen as a fragment of the output distribution ⟦𝑡⟧(𝜇0). Moreover, such a 𝑏 needs to satisfy the postcondition 𝑄. In the previous chapters, we used Hoare triples for reasoning about programs. We remark on two kinds of differences here. The first difference is between Hoare-style logic and weakest-precondition-style specifications. Previously, we treated Hoare triples as judgments in the program logic layer, which uses the assertion logic layer for specifications. Here, however, we consider WP as a modality of the logic, analogous to the conditioning modality. The WP modality only mentions the postcondition and the program, while a Hoare triple in addition takes a precondition. One can define Hoare triples on top of the WP by {𝑃} 𝑡 {𝑄} ≜ 𝑃 ⊢ wp 𝑡 {𝑄}; in this sense the WP computes a sufficient precondition. The second difference is our design choice to require every frame to be preserved in the definition of WP. While this sets us apart from the original notion of weakest preconditions in Dijkstra's seminal paper Dijkstra [1975], the choice of requiring the frame to be preserved is prevalent in the separation logic literature (e.g., Jung et al. [2015], Li et al. [2023a]). Crucially, this more complicated version of WP frees us from requiring the formulas to satisfy the restriction property, which is needed to isolate a part of the resource sufficient for validating a formula in LINA and DIBI. We present the full set of WP rules in fig. 5.5. The structural rules include the standard WP-CONS, which allows us to weaken the postcondition.
WP-FRAME, as we desired, does not need side conditions. C-WP-SWAP is a new rule, saying that we can commute the conditioning modality and the WP. This rule facilitates case analysis in program analysis: it implies that, if we can condition the current probability space on different scenarios 𝑣 ∼ 𝜇 and, for each scenario 𝑣, we have 𝑄(𝑣) after running 𝑡, then we can push the case analysis to the postcondition after running 𝑡. There is a side condition, however, that we need to own all the variables in Var, because of subtleties in the interaction of C-WP swapping and frame preservation. For the program rules, WP-SKIP and WP-SEQ are standard. We discuss the rules for assignments and sampling in more detail below. WP-IF-PRIM is also the standard rule for a conditional whose guard is simply a value; but we can reason about conditionals whose guard is a randomized variable as well, by first conditioning on the value of the guard, and then applying WP-IF-PRIM together with WP-BIND and C-WP-SWAP. We encapsulate this reasoning pattern as the derived rule WP-IF-UNARY. The loop rule WP-LOOP-UNF helps unfold a loop with (𝑛 + 1) iterations, and WP-LOOP reduces the task of reasoning about 𝑛 iterations to reasoning about each loop iteration.
Structural WP rules
WP-CONS 𝑄 ⊢ 𝑄′ ⟹ wp 𝑡 {𝑄} ⊢ wp 𝑡 {𝑄′}
WP-FRAME 𝑃 ∗ wp 𝑡 {𝑄} ⊢ wp 𝑡 {𝑃 ∗ 𝑄}
C-WP-SWAP C𝜇 𝑣. wp 𝑡 {𝑄(𝑣)} ∧ ownVar ⊢ wp 𝑡 {C𝜇 𝑣. 𝑄(𝑣)}
Program WP rules
WP-SKIP 𝑃 ⊢ wp [skip] {𝑃}
WP-SEQ wp [𝑡] {wp [𝑡′] {𝑄}} ⊢ wp [𝑡; 𝑡′] {𝑄}
WP-ASSIGN 𝑥 ∉ FV(𝑒), ∀𝑦 ∈ FV(𝑒). 𝑝(𝑦) > 0, 𝑝(𝑥) = 1 ⟹ (𝑝) ⊢ wp [x := 𝑒] {⌈𝑥 = 𝑒⌉@𝑝}
WP-SAMP (𝑥:1) ⊢ wp [x ← 𝑑(®𝑣)] {𝑥 $∼ 𝑑(®𝑣)}
WP-IF-PRIM if 𝑣 then wp [𝑡1] {𝑄(1)} else wp [𝑡2] {𝑄(0)} ⊢ wp [if 𝑣 then 𝑡1 else 𝑡2] {𝑄(𝑣)}
WP-BIND ⌈𝑒 = 𝑣⌉ ∗ wp [E[𝑣]] {𝑄} ⊢ wp [E[𝑒]] {𝑄}
WP-LOOP-UNF wp [repeat 𝑛 𝑡] {wp [𝑡] {𝑄}} ⊢ wp [repeat (𝑛 + 1) 𝑡] {𝑄}
WP-LOOP ∀𝑖 < 𝑛. 𝑃(𝑖) ⊢ wp [𝑡] {𝑃(𝑖 + 1)} ⟹ 𝑃(0) ⊢ wp [repeat 𝑛 𝑡] {𝑃(𝑛)} (𝑛 ∈ N)
Figure 5.5: The primitive WP rules of BLUEBELL.
Both loop rules are proved by straightforward inductions at the semantic level, and we can also derive WP-LOOP-0 from these two rules. We prove the soundness of each rule, using facts in first-order logic, in appendix D.5. Theorem 5.4.1. If 𝑃 ⊢ 𝑄, then 𝑃 ⇒ 𝑄 is derivable in first-order logic.
WP-LOOP-0 𝑃 ⊢ wp [repeat 0 𝑡] {𝑃}
WP-IF-UNARY 𝑃 ∗ ⌈𝑒 = 1⌉ ⊩ wp [𝑡1] {𝑄(1)}, 𝑃 ∗ ⌈𝑒 = 0⌉ ⊩ wp [𝑡2] {𝑄(0)} ⟹ 𝑃 ∗ 𝑒 $∼ 𝛽 ⊩ wp [if 𝑒 then 𝑡1 else 𝑡2] {C𝛽 𝑏. 𝑄(𝑏)}
Figure 5.6: Derived WP rules.
WP-SAMP is the expected "small footprint" rule for sampling; the precondition only requires full permission on the variable being assigned, forbidding any frame from recording information about it. WP-ASSIGN requires full permission on 𝑥, and non-zero permission on the variables on the right-hand side of the assignment. This allows the postcondition to assert that 𝑥 and the expression 𝑒 assigned to it are equal with probability 1. The condition 𝑥 ∉ FV(𝑒) ensures 𝑒 has the same meaning before and after the assignment, but is not restrictive: if needed, the old value of 𝑥 can be stored in a temporary variable, or the proof can condition on 𝑥 to work with its pure value. The assignment and sampling rules are the only ones that impose constraints on the owned permissions. In proofs, this means that most rule applications simply thread through permissions so that the needed permissions can reach the applications of the assignment rules. To avoid cluttering proof derivations with this bookkeeping, we mostly omit permission information from assertions. The appropriate permission annotations can be inferred, as we show in the following example. Example 5.4.1. Consider the following triple with an unknown permission 𝑝:
(𝑥 $∼ 𝜇1 ∗ ⌈𝑥 = 𝑦⌉ ∗ 𝑧 $∼ 𝜇2)@(𝑝) ⊢ wp [x := z] {(⌈𝑥 = 𝑧⌉ ∗ 𝑧 $∼ 𝜇2)@(𝑝)}
We want to determine a choice of 𝑝 that makes the proof derivation go through.
Because the assignment only changes the variable 𝑥, our proof strategy is to first apply WP-ASSIGN and WP-CONS to prove (𝑥 $∼ 𝜇1 ∗ ⌈𝑥 = 𝑦⌉)@(𝑝′) ⊢ wp [x := z] {⌈𝑥 = 𝑧⌉@(𝑝′)} for some suitable 𝑝′, and then apply WP-FRAME to frame 𝑧 $∼ 𝜇2 and prove the original goal. To apply WP-ASSIGN, we need to ensure 𝑝′(𝑥) = 1 and 𝑝′(𝑧) > 0. Because ⌈𝑥 = 𝑦⌉ is not trivial on 𝑦, it must be that 𝑝′(𝑦) > 0 as well. Also, to apply WP-FRAME to frame 𝑧 $∼ 𝜇2, we need to ensure 𝑝′ composes with another permission 𝑝′′ that is compatible with the probability space where 𝑧 $∼ 𝜇2 holds; because 𝑧 $∼ 𝜇2 asserts that 𝑧 is non-trivial, it must be that 𝑝′′(𝑧) > 0, indicating 𝑝′(𝑧) < 1. Thus, one reasonable way to distribute the permissions is 𝑝′(𝑥) = 1, 𝑝′(𝑦) = 1/2, 𝑝′(𝑧) = 1/3 and 𝑝′′(𝑥) = 0, 𝑝′′(𝑦) = 0, 𝑝′′(𝑧) = 1/3. We can thus prove that
(𝑥 $∼ 𝜇1 ∗ ⌈𝑥 = 𝑦⌉)@(𝑥:1, 𝑦:1/2, 𝑧:1/3) ∗ 𝑧 $∼ 𝜇2@(𝑧:1/3) ⊢ wp [x := z] {⌈𝑥 = 𝑧⌉@(𝑥:1, 𝑦:1/2, 𝑧:1/3) ∗ 𝑧 $∼ 𝜇2@(𝑧:1/3)}
The triple can be further composed with a frame that asserts fractional permissions for 𝑦 and 𝑧. Because permissions in the range (0, 1) essentially serve the same role, we can also pick different numbers for 𝑝′(𝑦), 𝑝′(𝑧) and 𝑝′′(𝑧), as long as 𝑝′(𝑦), 𝑝′(𝑧), 𝑝′′(𝑧) stay in (0, 1) and 𝑝′(𝑧) + 𝑝′′(𝑧) stays in (0, 1).
5.5 Case Studies for BLUEBELL
Our evaluation of BLUEBELL is based on two main lines of enquiry: (1) Are high-level principles about probabilistic reasoning provable from the core constructs of BLUEBELL? (2) Does BLUEBELL, through enabling new reasoning patterns, expand the horizon for verification of probabilistic programs beyond what was possible before? We include case studies that try to highlight the contribution of BLUEBELL to each question, and sometimes both at the same time. Specifically, our evaluation is guided by the following research questions: RQ1: Do joint conditioning and independence offer a good abstract interface over the underlying semantic model?
RQ2: Can known unary/relational principles be reconstructed from BLUEBELL's primitives? RQ3: Can new unary/relational principles be discovered (as new lemmas) and proved from BLUEBELL's primitives? RQ4: Can BLUEBELL's primitives be successfully incorporated in an effective program logic? Since we only introduced the unary part of BLUEBELL, we only show examples that exercise BLUEBELL's unary reasoning.
5.5.1 One Time Pad Revisited
def encrypt(): k ← Ber(1/2); m ← Ber(𝑝); c := k xor m
Figure 5.7: One time pad.
In fig. 5.7 we show a simple example adapted from Barthe et al. [2019]: the encrypt procedure uses a fair coin flip to generate an encryption key 𝑘, generates a plaintext message in the boolean variable 𝑚 (using a coin flip with some bias 𝑝), and produces the ciphertext 𝑐 by XORing the key and the message. One way of stating and proving the correctness of encrypt is to establish that in the output distribution 𝑐 and 𝑚 are independent, which can be expressed as the unary goal:
(𝑘:1, 𝑚:1, 𝑐:1) ⊢ wp [encrypt()] {𝑐 $∼ Bern1/2 ∗ 𝑚 $∼ Bern𝑝}
The triple states that after running encrypt, the ciphertext 𝑐 is distributed as a fair coin, and—importantly—is not correlated with the plaintext in 𝑚. The PSL proof in Barthe et al. [2019] performs some of the steps within the logic, but needs to carry out some crucial entailments at the meta-level, which shows some limitations of its abstractions (RQ1). The same applies to the Lilac proof in Li et al. [2023a], which requires ad-hoc lemmas proven on the semantic model. The stumbling block is proving the valid entailment:
𝑘 $∼ Bern1/2 ∗ 𝑚 $∼ Bern𝑝 ∗ ⌈𝑐 = 𝑘 xor 𝑚⌉ ⊢ 𝑚 $∼ Bern𝑝 ∗ 𝑐 $∼ Bern1/2
In BLUEBELL we can prove the entailment in two steps: (1) we condition on 𝑚 and 𝑘 to compute the result of the xor operation and obtain that 𝑐 is distributed as Bern1/2; (2) we carefully eliminate the conditioning while preserving the independence of 𝑚 and 𝑐.
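Before the logical derivation, the semantic content of the entailment can be confirmed by brute-force enumeration. This is our own sketch; the message bias 1/3 is an arbitrary stand-in for 𝑝.

```python
from fractions import Fraction
from itertools import product

# Our own brute-force check of the one-time-pad claim: with k ~ Bern(1/2)
# and m ~ Bern(p) independent, c = k xor m is a fair coin independent of m.
half = Fraction(1, 2)
p = Fraction(1, 3)                       # an arbitrary bias standing in for p

joint = {}                               # joint distribution of (m, c)
for k, m in product((0, 1), repeat=2):
    pr = half * (p if m else 1 - p)      # independent product of k and m
    c = k ^ m
    joint[(m, c)] = joint.get((m, c), Fraction(0)) + pr

pr_m = {m: sum(v for (m2, _), v in joint.items() if m2 == m) for m in (0, 1)}
pr_c = {c: sum(v for (_, c2), v in joint.items() if c2 == c) for c in (0, 1)}

assert pr_c == {0: half, 1: half}        # c ~ Bern(1/2)
assert all(joint[(m, c)] == pr_m[m] * pr_c[c]   # m and c independent
           for m in (0, 1) for c in (0, 1))
print("c is uniform and independent of m")
```

The same two facts are exactly what the BLUEBELL proof establishes within the logic, without ever touching the joint distribution directly.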
The first step starts by conditioning on the distribution of the message 𝑚 using C-UNIT-R.
𝑘 $∼ Bern1/2 ∗ 𝑚 $∼ Bern𝑝 ∗ ⌈𝑐 = 𝑘 xor 𝑚⌉
⊢ 𝑚 $∼ Bern𝑝 ∗ 𝑘 $∼ Bern1/2 ∗ ⌈𝑐 = 𝑘 xor 𝑚⌉
⊢ (CBern𝑝 𝑢. ⌈𝑚 = 𝑢⌉) ∗ 𝑘 $∼ Bern1/2 ∗ ⌈𝑐 = 𝑘 xor 𝑚⌉ (C-UNIT-R)
Because the key 𝑘 is sampled independently from the message 𝑚, and the almost-sure assertion ⌈𝑐 = 𝑘 xor 𝑚⌉ also holds on an independent probability space, conditioning on 𝑚 does not change the assertions about them — this idea is formalized by the rule C-FRAME.
(CBern𝑝 𝑢. ⌈𝑚 = 𝑢⌉) ∗ 𝑘 $∼ Bern1/2 ∗ ⌈𝑐 = 𝑘 xor 𝑚⌉
⊢ CBern𝑝 𝑢. (⌈𝑚 = 𝑢⌉ ∗ 𝑘 $∼ Bern1/2 ∗ ⌈𝑐 = 𝑘 xor 𝑚⌉) (C-FRAME)
Because ⌈𝑚 = 𝑢⌉ and ⌈𝑐 = 𝑘 xor 𝑚⌉ both hold under the conditioning, we can merge the two facts into ⌈𝑐 = 𝑘 xor 𝑢⌉ using SURE-MERGE. After that, we condition on 𝑘, again by first using C-UNIT-R and then moving the fact ⌈𝑐 = 𝑘 xor 𝑢⌉ under the conditioning using C-FRAME.
CBern𝑝 𝑢. (⌈𝑚 = 𝑢⌉ ∗ 𝑘 $∼ Bern1/2 ∗ ⌈𝑐 = 𝑘 xor 𝑚⌉)
⊢ CBern𝑝 𝑢. (⌈𝑚 = 𝑢⌉ ∗ 𝑘 $∼ Bern1/2 ∗ ⌈𝑐 = 𝑘 xor 𝑢⌉) (SURE-MERGE)
⊢ CBern𝑝 𝑢. (⌈𝑚 = 𝑢⌉ ∗ (CBern1/2 𝑣. ⌈𝑘 = 𝑣⌉) ∗ ⌈𝑐 = 𝑘 xor 𝑢⌉) (C-UNIT-R)
⊢ CBern𝑝 𝑢. (⌈𝑚 = 𝑢⌉ ∗ CBern1/2 𝑣. (⌈𝑘 = 𝑣⌉ ∗ ⌈𝑐 = 𝑘 xor 𝑢⌉)) (C-FRAME)
Under the conditioning of 𝑚 and 𝑘, we have the fact ⌈𝑘 = 𝑣⌉ ∗ ⌈𝑐 = 𝑘 xor 𝑢⌉. In the final goal, we do not care what the key 𝑘 is, as long as the ciphered message 𝑐 does not leak any information about the original message 𝑚. So next, we side-step assertions about 𝑘 to facilitate reasoning about 𝑐 and 𝑚. Formally, we merge ⌈𝑘 = 𝑣⌉ ∗ ⌈𝑐 = 𝑘 xor 𝑢⌉ into ⌈𝑘 = 𝑣 ∧ 𝑐 = 𝑘 xor 𝑢⌉ using SURE-MERGE, and then apply propositional reasoning to rewrite it into a case analysis, i.e. ⌈𝑐 = 𝑣⌉ when 𝑢 = 0 and ⌈𝑐 = ¬𝑣⌉ when 𝑢 = 1.
CBern𝑝 𝑢. (⌈𝑚 = 𝑢⌉ ∗ CBern1/2 𝑣. (⌈𝑘 = 𝑣⌉ ∗ ⌈𝑐 = 𝑘 xor 𝑢⌉))
⊢ CBern𝑝 𝑢. (⌈𝑚 = 𝑢⌉ ∗ CBern1/2 𝑣. ⌈𝑘 = 𝑣 ∧ 𝑐 = 𝑣 xor 𝑢⌉) (SURE-MERGE)
⊢ CBern𝑝 𝑢. (⌈𝑚 = 𝑢⌉ ∗ (CBern1/2 𝑣. ⌈𝑐 = 𝑣⌉ if 𝑢 = 0; CBern1/2 𝑣. ⌈𝑐 = ¬𝑣⌉ if 𝑢 = 1)) (C-CONS)
The next step is a crucial application of C-TRANSF.
Because Bern1/2 is uniform between 0 and 1, if we choose the bijection 𝑓 : 𝑣 ↦→ ¬𝑣, then the pushforward measure of Bern1/2 by 𝑓 is also Bern1/2. Thus, applying C-TRANSF, we can rewrite CBern1/2 𝑣. ⌈𝑐 = ¬𝑣⌉ into CBern1/2 𝑤. ⌈𝑐 = ¬¬𝑤⌉. Therefore,
CBern𝑝 𝑢. (⌈𝑚 = 𝑢⌉ ∗ (CBern1/2 𝑣. ⌈𝑐 = 𝑣⌉ if 𝑢 = 0; CBern1/2 𝑣. ⌈𝑐 = ¬𝑣⌉ if 𝑢 = 1))
⊢ CBern𝑝 𝑢. (⌈𝑚 = 𝑢⌉ ∗ (CBern1/2 𝑣. ⌈𝑐 = 𝑣⌉ if 𝑢 = 0; CBern1/2 𝑤. ⌈𝑐 = 𝑤⌉ if 𝑢 = 1)) (C-TRANSF)
Now, the formulas in the two cases of the case analysis are equivalent, so we can combine the two cases:
⊢ CBern𝑝 𝑢. (⌈𝑚 = 𝑢⌉ ∗ CBern1/2 𝑣. ⌈𝑐 = 𝑣⌉)
Now, we can almost read off that no matter what value 𝑚 takes, the variable 𝑐 is distributed as the Bernoulli distribution Bern1/2, and that implies that 𝑚 and 𝑐 are independent. Formally, we apply a sequence of steps to reach that conclusion. The first two steps are familiar applications of C-FRAME followed by SURE-MERGE. Then, C-ASSOC binds the two distributions together — because the second distribution Bern1/2 does not use the 𝑢 drawn from the first distribution, we get the independent product Bern𝑝 ⊗ Bern1/2 as the result. After that, C-UNIT-R eliminates the conditioning. Last, PROD-SPLIT helps us pull the independent product in the distribution Bern𝑝 ⊗ Bern1/2 into the independent conjunction asserted in our logic, 𝑚 $∼ Bern𝑝 ∗ 𝑐 $∼ Bern1/2.
CBern𝑝 𝑢. (⌈𝑚 = 𝑢⌉ ∗ CBern1/2 𝑣. ⌈𝑐 = 𝑣⌉)
⊢ CBern𝑝 𝑢. CBern1/2 𝑣. (⌈𝑚 = 𝑢⌉ ∗ ⌈𝑐 = 𝑣⌉) (C-FRAME)
⊢ CBern𝑝 𝑢. CBern1/2 𝑣. ⌈𝑚 = 𝑢 ∧ 𝑐 = 𝑣⌉ (SURE-MERGE)
⊢ CBern𝑝⊗Bern1/2 (𝑢, 𝑣). ⌈(𝑚, 𝑐) = (𝑢, 𝑣)⌉ (C-ASSOC)
⊢ (𝑚, 𝑐) $∼ (Bern𝑝 ⊗ Bern1/2) (C-UNIT-R)
⊢ 𝑚 $∼ Bern𝑝 ∗ 𝑐 $∼ Bern1/2 (PROD-SPLIT)
5.5.2 Markov Blankets
We next study Markov blankets [Pearl, 2014] — a useful concept in Bayesian reasoning — to illustrate BLUEBELL's expressiveness (RQ1 and RQ2).
Intuitively, a Markov blanket identifies a set of variables that contains all useful information about a target set of variables: once we know the values of the variables in a Markov blanket, we no longer need to worry about how the other variables influence the target set. For concreteness, consider the program

𝑥1 ← 𝑑1; 𝑥2 ← 𝑑2(𝑥1); 𝑥3 ← 𝑑3(𝑥2).

The program describes a Markov chain of three variables, where we first sample 𝑥1, then sample 𝑥2 from a distribution determined by the value of 𝑥1, and last sample 𝑥3 from a distribution determined by the value of 𝑥2. These kinds of dependencies are ubiquitous in, for instance, hidden Markov models and Bayesian network representations of distributions.

Clearly, 𝑥3 depends on 𝑥2 and, indirectly, on 𝑥1. However, Markov chains enjoy the memorylessness property: when fixing a variable in the chain, the variables that follow it are independent of the variables that precede it. For our example, this means that conditioned on 𝑥2, the variables 𝑥1 and 𝑥3 are independent (i.e. we can ignore the indirect dependencies). The memorylessness property is used in many analyses of Markov-chain-based algorithms, as well as in causal inference. In the following, we prove the memorylessness property for this specific program using BLUEBELL.

Using BLUEBELL’s program rules, we can prove that after the program execution the output distribution satisfies the assertion

C𝑑1 𝑣1. ( ⌈𝑥1 = 𝑣1⌉ ∗ C𝑑2(𝑣1) 𝑣2. ( ⌈𝑥2 = 𝑣2⌉ ∗ 𝑥3 $∼ 𝑑3(𝑣2) ) )

We want to transform the assertion into:

C𝜇2 𝑣2. ( ⌈𝑥2 = 𝑣2⌉ ∗ 𝑥1 $∼ 𝜇1(𝑣2) ∗ 𝑥3 $∼ 𝑑3(𝑣2) )

for appropriate 𝜇2 and 𝜇1.

In probability theory, the proof of memorylessness is an application of Bayes’ law: we compute the distribution of 𝑥1, 𝑥3 conditioned on 𝑥2, using the distribution of 𝑥2 conditioned on 𝑥1 and the distribution of 𝑥3 conditioned on 𝑥2. In BLUEBELL we can reproduce the transformation using the joint conditioning rules, in particular the right-to-left direction of C-FUSE and the primitive rule C-UNASSOC.
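The Bayes'-law computation behind 𝜇2 and 𝜇1 can be checked concretely on a small finite instance (a Python sketch; the kernels 𝑑1, 𝑑2, 𝑑3 below are toy choices of ours, used only for illustration):

```python
from fractions import Fraction
from itertools import product

F = Fraction
# Toy kernels: each is a dict (or function returning a dict) value -> probability.
d1 = {0: F(1, 3), 1: F(2, 3)}
d2 = lambda x1: {0: F(1, 2), 1: F(1, 2)} if x1 == 0 else {0: F(1, 4), 1: F(3, 4)}
d3 = lambda x2: {0: F(2, 3), 1: F(1, 3)} if x2 == 0 else {0: F(1, 5), 1: F(4, 5)}

# Joint distribution of (x1, x2, x3) for the chain x1 <- d1; x2 <- d2(x1); x3 <- d3(x2).
joint = {(a, b, c): d1[a] * d2(a)[b] * d3(b)[c]
         for a, b, c in product([0, 1], repeat=3)}

# mu2 = marginal of x2; mu1(v2) = distribution of x1 given x2 = v2 (Bayes' law).
mu2 = {b: sum(pr for (a, bb, c), pr in joint.items() if bb == b) for b in [0, 1]}
mu1 = {b: {a: sum(pr for (aa, bb, c), pr in joint.items() if aa == a and bb == b) / mu2[b]
           for a in [0, 1]}
       for b in [0, 1]}

# Memorylessness: conditioned on x2 = b, x1 and x3 are independent,
# with x1 ~ mu1(b) and x3 ~ d3(b).
for b in [0, 1]:
    for a, c in product([0, 1], repeat=2):
        assert joint[(a, b, c)] / mu2[b] == mu1[b][a] * d3(b)[c]
```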
Using these we can prove:

C𝑑1 𝑣1. ( ⌈𝑥1 = 𝑣1⌉ ∗ C𝑑2(𝑣1) 𝑣2. ( ⌈𝑥2 = 𝑣2⌉ ∗ 𝑥3 $∼ 𝑑3(𝑣2) ) )
⊢ C𝑑1 𝑣1. ( C𝑑2(𝑣1) 𝑣2. ( ⌈𝑥1 = 𝑣1⌉ ∗ ⌈𝑥2 = 𝑣2⌉ ∗ 𝑥3 $∼ 𝑑3(𝑣2) ) ) (C-FRAME)
⊢ C𝜇0 (𝑣1, 𝑣2). ( ⌈𝑥1 = 𝑣1⌉ ∗ ⌈𝑥2 = 𝑣2⌉ ∗ 𝑥3 $∼ 𝑑3(𝑣2) ) (C-FUSE)
⊢ C𝜇2 𝑣2. ( C𝜇1(𝑣2) 𝑣1. ( ⌈𝑥1 = 𝑣1⌉ ∗ ⌈𝑥2 = 𝑣2⌉ ∗ 𝑥3 $∼ 𝑑3(𝑣2) ) ) (C-UNASSOC)
⊢ C𝜇2 𝑣2. ( ⌈𝑥2 = 𝑣2⌉ ∗ C𝜇1(𝑣2) 𝑣1. ( ⌈𝑥1 = 𝑣1⌉ ∗ 𝑥3 $∼ 𝑑3(𝑣2) ) ) (SURE-STR-CONVEX)
⊢ C𝜇2 𝑣2. ( ⌈𝑥2 = 𝑣2⌉ ∗ 𝑥1 $∼ 𝜇1(𝑣2) ∗ 𝑥3 $∼ 𝑑3(𝑣2) ) (C-EXTRACT)

where 𝑑1 ⋉ 𝑑2 = 𝜇0 = 𝜇2 ⋉ 𝜇1. The existence of such 𝜇2 and 𝜇1 is a simple application of Bayes’ law: 𝜇2(𝑣2) = ∑𝑣1∈Val 𝜇0(𝑣1, 𝑣2), and 𝜇1(𝑣2)(𝑣1) = 𝜇0(𝑣1, 𝑣2) / 𝜇2(𝑣2). The second-to-last step pulls ⌈𝑥2 = 𝑣2⌉ out of the inner conditioning modality; it is sound to pull out almost-sure assertions like ⌈𝑥2 = 𝑣2⌉, but not general assertions. Last, we use the derived rule C-EXTRACT to eliminate the second conditioning and extract the distribution of 𝑥1 given 𝑥2.

We see the ability of BLUEBELL to perform these manipulations as evidence that joint conditioning and independence form a sturdy abstraction over the semantic model (RQ1). The meta-reasoning required to manipulate the distributions and the conditioning modality — here, showing the existence of 𝜇1, 𝜇2 such that 𝑑1 ⋉ 𝑑2 = 𝜇2 ⋉ 𝜇1 — is minimal and localized. Our abstraction and rules also offer a good way to inject facts about distributions without interfering with the rest of the proof context.

5.5.3 Multi-party Secure Computation

In multi-party secure computation [Goldreich, 1998], the goal is for 𝑁 parties to compute a function 𝑓 (𝑥1, . . . , 𝑥𝑁) of some private data 𝑥𝑖 owned by each party 𝑖, without revealing any more information about 𝑥𝑖 than the output of 𝑓 would reveal. For example, if 𝑓 is addition, a secure computation of 𝑓 can be used to compute the total number of votes without revealing who voted positively: some information would leak (e.g.
if the total is non-zero then somebody voted positively) but only what is revealed by knowing the total and nothing more.

To achieve this objective, multi-party secure addition (MPSAdd) works by having the parties break their secrets into 𝑁 secret shares which individually look random, but whose sum amounts to the original secret. These secret shares are then distributed to the other parties so that each party knows an incomplete set of shares of the other parties. Yet, each party can reliably compute the result of the function by computing a function of the received shares.

Barthe et al. [2019] analyze this example, proving independence between each party’s view and any other party’s secrets after the first round of communication. However, there are two rounds of communication in the protocol, and after the second round, each party gets more information about the other parties’ values. So the proof of Barthe et al. [2019] does not ensure the end-to-end security of the protocol.

We aim to formally establish the end-to-end security of the protocol. As is very often the case, there is no single “canonical” way of specifying this kind of security property. For MPSAdd between three parties, for instance, one way to formalize security is a unary specification saying that, conditionally on the secret of party 1 and the sum of the other secrets, all the values received by party 1 (we call this the view of party 1) are independent from the secrets of the other parties. Roughly:

(𝑥1, 𝑥2, 𝑥3) $∼ 𝜇0 ⊢ wp [𝑀𝑃𝑆𝐴𝑑𝑑] { ∃𝜇. C𝜇 (𝑣, 𝑠). ( ⌈𝑥1 = 𝑣 ∧ (𝑥2 + 𝑥3) = 𝑠⌉ ∗ Own(𝑣𝑖𝑒𝑤1) ∗ Own(𝑥2, 𝑥3) ) }

where 𝜇0 is an arbitrary distribution of the three secrets among the three parties.
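To make the share-splitting step concrete, here is a minimal sketch of additive secret sharing over Z𝑝 (in Python; the modulus and the function names are illustrative choices of ours, not the protocol's actual code):

```python
import random

P = 101  # a small modulus standing in for Z_p (illustrative choice)

def share(secret, n=3):
    """Split `secret` into n additive shares modulo P."""
    shares = [random.randrange(P) for _ in range(n - 1)]
    shares.append((secret - sum(shares)) % P)
    return shares

def mpsadd(secrets):
    """Each party shares its secret; party j sums the j-th shares it receives;
    the grand total of those partial sums equals the sum of all secrets mod P."""
    all_shares = [share(s) for s in secrets]
    s = [sum(all_shares[i][j] for i in range(3)) % P for j in range(3)]
    return sum(s) % P

secrets = [17, 42, 99]
assert sum(share(secrets[0])) % P == secrets[0] % P  # shares reconstruct the secret
assert mpsadd(secrets) == sum(secrets) % P           # the protocol computes the sum
# Any n-1 shares are uniformly random, so an incomplete set of shares
# reveals nothing about the underlying secret.
```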
The post-condition asserts that if we condition on party 1’s secret and on the sum of party 2’s and party 3’s secrets, which party 1 can infer from the sum of the three secrets, then what party 1 can view, captured by 𝑣𝑖𝑒𝑤1, is independent of party 2’s and party 3’s secrets. Here, the conditioning nicely expresses that the acceptable leakage is just the sum.

There is also a natural relational formulation of the security goal, and in the conference paper Bao et al. [2025] we provide BLUEBELL proofs for: 1. the unary specification; 2. the relational specification (not using the unary proof); 3. the equivalence of the two specifications. The relational specification can also be proved using probabilistic relational Hoare logic [Barthe et al., 2009]. In the following, we elaborate on the proof of the unary specification.

First, we can apply the program rules for loops and assignments to obtain the postcondition 𝑄:

𝑄 = 𝑋 ∗ 𝑅12 ∗ 𝑅3 ∗ 𝑆 ∗ 𝑆𝑢𝑚

where

𝑋 = (𝑥1, 𝑥2, 𝑥3) $∼ 𝜇0
𝑅12 = ∗𝑖∈{1,2,3} ( 𝑟[𝑖][1] $∼ U𝑝 ∗ 𝑟[𝑖][2] $∼ U𝑝 )
𝑅3 = ∗𝑖∈{1,2,3} ⌈𝑟[𝑖][3] = 𝑥𝑖 − 𝑟[𝑖][1] − 𝑟[𝑖][2]⌉
𝑆 = ⌈∧𝑖∈{1,2,3} 𝑠[𝑖] = 𝑟[1][𝑖] + 𝑟[2][𝑖] + 𝑟[3][𝑖]⌉
𝑆𝑢𝑚 = ⌈𝑠𝑢𝑚 = 𝑠[1] + 𝑠[2] + 𝑠[3]⌉

Now the goal is to show that 𝑄 entails the desired postcondition. As a first step, we transform 𝑋 into (𝑥1, 𝑥2 + 𝑥3) $∼ 𝜇 by DIST-FUN. Then we condition on (𝑥1, 𝑥2 + 𝑥3, 𝑥2, 𝑥3) and the variables in 𝑅12, obtaining: C𝜇′ (𝑣1, 𝑣23, 𝑣2, 𝑣3).  ⌈𝑥1 = 𝑣1 ∧ (𝑥2 + 𝑥3) = 𝑣23 ∧ (𝑥2, 𝑥3) = (𝑣2, 𝑣3)⌉ ∗ CU𝑝 𝑢11. CU𝑝 𝑢12. CU𝑝 𝑢21. CU𝑝 𝑢22. CU𝑝 𝑢31.
CU𝑝 𝑢32. 𝑟 [1] [1] = 𝑢11 ∧ 𝑟 [1] [2] = 𝑢12 ∧ 𝑟 [1] [3] = 𝑣1 − 𝑢11 − 𝑢12 𝑟 [2] [2] = 𝑢22 ∧ 𝑟 [2] [3] = 𝑣2 − 𝑢21 − 𝑢22 𝑟 [3] [2] = 𝑢32 ∧ 𝑟 [3] [3] = (𝑣23 − 𝑣2) − 𝑢31 − 𝑢32 𝑠[1] = 𝑢11 + 𝑢21 + 𝑢31 𝑠[2] = 𝑢12 + 𝑢22 + 𝑢32 𝑠[3] = 𝑣1 − 𝑢11 − 𝑢12 + 𝑣2 − 𝑢21 − 𝑢22 + (𝑣23 − 𝑣2) − 𝑢31 − 𝑢32 𝑠𝑢𝑚 = 𝑠[1] + 𝑠[2] + 𝑠[3]   Here 𝜇′ = bind(𝜇0, (𝑥1, 𝑥2, 𝑥3) ↦→ unit(𝑥1, 𝑥2 + 𝑥3, 𝑥2). We already weakened the assertion by forgetting the information about 𝑟 [2] [1] and 𝑟 [3] [1], which are not part of 𝑣𝑖𝑒𝑤1. Now we perform a change of variables using C-TRANSF, to express our equalities in terms of 𝑢′21 = 𝑢21 − 𝑣2 instead of 𝑢21 and 𝑢′31 = 𝑢31 − (𝑣23 − 𝑣2) instead of 𝑢31. To justify the change we simply observe that, for all 𝑛 ∈ Z𝑝, the function 𝑓𝑛 (𝑢) = (𝑢 − 𝑛) mod 𝑝 is a bijection and U𝑝 ◦ 𝑓 −1 𝑛 = U𝑝. This gives us, 198 with some simple arithmetic simplifications: C𝜇′ (𝑣1, 𝑣23, 𝑣2, 𝑣3).  ⌈𝑥1 = 𝑣1 ∧ (𝑥2 + 𝑥3) = 𝑣23 ∧ (𝑥2, 𝑥3) = (𝑣2, 𝑣3)⌉ ∗ CU𝑝 𝑢11. CU𝑝 𝑢12. CU𝑝 𝑢 ′ 21. CU𝑝 𝑢22. CU𝑝 𝑢 ′ 31. CU𝑝 𝑢32. 𝑟 [1] [1] = 𝑢11 ∧ 𝑟 [1] [2] = 𝑢12 ∧ 𝑟 [1] [3] = 𝑣1 − 𝑢11 − 𝑢12 𝑟 [2] [2] = 𝑢22 ∧ 𝑟 [2] [3] = −𝑢′21 − 𝑢22 𝑟 [3] [2] = 𝑢32 ∧ 𝑟 [3] [3] = −𝑢′31 − 𝑢32 𝑠[1] = 𝑢11 + 𝑢′21 + 𝑢 ′ 31 + 𝑣23 𝑠[2] = 𝑢12 + 𝑢22 + 𝑢32 𝑠[3] = 𝑣1 − 𝑢11 − 𝑢12 − 𝑢′21 − 𝑢22 − 𝑢′31 − 𝑢32 𝑠𝑢𝑚 = 𝑠[1] + 𝑠[2] + 𝑠[3]   In particular, we removed all dependencies on 𝑣2 from the inner formula. We can now apply C-ASSOC to collapse all the inner conditioning into a single one: C𝜇′ (𝑣1, 𝑣23, 𝑣2, 𝑣3).  ⌈𝑥1 = 𝑣1 ∧ (𝑥2 + 𝑥3) = 𝑣23 ∧ (𝑥2, 𝑥3) = (𝑣2, 𝑣3)⌉ ∗ C𝑈 (𝑣1,𝑣23) 𝑢. ⌈ ∗ ⌉𝑣𝑖𝑒𝑤1 = 𝑢  where 𝑈 (𝑣1, 𝑣23) = (𝑣 ← U𝑝 ⊗ . . . 
⊗ U𝑝; return 𝑔(𝑣)) takes the six independent samples from U𝑝 and returns the values of each of the components of 𝑣𝑖𝑒𝑤1 (which justifies the dependency on 𝑣1 and 𝑣23). Finally, we split 𝜇′ = bind(𝜇̃0, 𝜅), obtaining:

C𝜇′ (𝑣1, 𝑣23, 𝑣2, 𝑣3). ( ⌈𝑥1 = 𝑣1 ∧ (𝑥2 + 𝑥3) = 𝑣23 ∧ (𝑥2, 𝑥3) = (𝑣2, 𝑣3)⌉ ∗ C𝑈(𝑣1,𝑣23) 𝑢. ⌈𝑣𝑖𝑒𝑤1 = 𝑢⌉ )
⊢ C𝜇′ (𝑣1, 𝑣23, 𝑣2, 𝑣3). ( ⌈𝑥1 = 𝑣1 ∧ (𝑥2 + 𝑥3) = 𝑣23⌉ ∗ ⌈(𝑥2, 𝑥3) = (𝑣2, 𝑣3)⌉ ∗ 𝑣𝑖𝑒𝑤1 $∼ 𝑈(𝑣1, 𝑣23) ) (SURE-MERGE, C-UNIT-R)
⊢ C𝜇̃0 (𝑣1, 𝑣23). ( ⌈𝑥1 = 𝑣1 ∧ (𝑥2 + 𝑥3) = 𝑣23⌉ ∗ C𝜅(𝑣1,𝑣23) (𝑣2, 𝑣3). ( ⌈(𝑥2, 𝑥3) = (𝑣2, 𝑣3)⌉ ∗ 𝑣𝑖𝑒𝑤1 $∼ 𝑈(𝑣1, 𝑣23) ) ) (C-UNASSOC, SURE-STR-CONVEX)
⊢ C𝜇̃0 (𝑣1, 𝑣23). ( ⌈𝑥1 = 𝑣1 ∧ (𝑥2 + 𝑥3) = 𝑣23⌉ ∗ (𝑥2, 𝑥3) $∼ 𝜅(𝑣1, 𝑣23) ∗ 𝑣𝑖𝑒𝑤1 $∼ 𝑈(𝑣1, 𝑣23) ) (C-EXTRACT)
⊢ C𝜇̃0 (𝑣1, 𝑣23). ( ⌈𝑥1 = 𝑣1 ∧ (𝑥2 + 𝑥3) = 𝑣23⌉ ∗ 𝑣𝑖𝑒𝑤1 $∼ 𝑈(𝑣1, 𝑣23) ∗ ∃𝜇23. (𝑥2, 𝑥3) $∼ 𝜇23 )

This gets us the desired postcondition and concludes the proof.

5.5.4 Von Neumann Extractor

A randomness extractor is a mechanism that transforms a stream of “low-quality” randomness sources into a stream of “high-quality” randomness sources. The von Neumann extractor [von Neumann, 1951] is perhaps the earliest instance of such a mechanism; it converts a stream of independent coins with the same bias 𝑝 into a stream of independent fair coins. Verifying the correctness of the extractor requires careful reasoning under conditioning, and showcases the use of C-WP-SWAP in a unary setting, which gives a positive answer to RQ2 and RQ4.

We can model the extractor, up to 𝑁 ∈ N iterations, in our language3 as shown in fig. 5.8. The program repeatedly flips two biased coins, and outputs the outcome of the first coin if the outcomes were different; otherwise it retries.

def vn(𝑁):
    len := 0
    repeat 𝑁:
        coin1 ← Ber(𝑝)
        coin2 ← Ber(𝑝)
        if coin1 ≠ coin2 then:
            out[len] := coin1
            len := len + 1

Figure 5.8: Von Neumann extractor.
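As a sanity check on the extractor, its output distribution can be computed exactly by enumeration (a Python sketch; the function name and the concrete bias are our choices):

```python
from fractions import Fraction
from itertools import product

def vn_output_dist(p, n):
    """Exact joint distribution of the output bits of the von Neumann
    extractor after n rounds, keyed by the tuple of produced bits."""
    dist = {(): Fraction(1)}
    bern = {1: p, 0: 1 - p}
    for _ in range(n):
        new = {}
        for out, pr in dist.items():
            for c1, c2 in product([0, 1], repeat=2):
                ext = out + (c1,) if c1 != c2 else out  # emit coin1 only on a mismatch
                new[ext] = new.get(ext, Fraction(0)) + pr * bern[c1] * bern[c2]
        dist = new
    return dist

d = vn_output_dist(Fraction(1, 3), 2)
# Conditioned on producing two bits, the four bit patterns are equally likely,
# i.e. the produced bits are independent fair coins.
two_bit = {k: v for k, v in d.items() if len(k) == 2}
total = sum(two_bit.values())
assert all(v / total == Fraction(1, 4) for v in two_bit.values())
```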
As an example, we prove in BLUEBELL that the bits produced in out are independent fair coin flips. Formally, for ℓ produced bits, we want the following to hold:

Outℓ ≜ 𝑜𝑢𝑡[0] $∼ Bern 1/2 ∗ · · · ∗ 𝑜𝑢𝑡[ℓ−1] $∼ Bern 1/2.

To know how many bits were produced, however, we need to condition on len, obtaining the specification (recall 𝑃 ⊩ 𝑄 ≜ 𝑃 ∧ ownVar ⊢ 𝑄 ∧ ownVar):

⊩ wp [1: 𝑣𝑛(𝑁)] { ∃𝜇. C𝜇 ℓ. ( ⌈𝑙𝑒𝑛 = ℓ ≤ 𝑁⌉ ∗ Outℓ ) }

The BLUEBELL proof of this specification is shown in the outline in fig. 5.9. The postcondition straightforwardly generalizes to a loop invariant

𝑃(𝑖) = ∃𝜇. C𝜇 ℓ. ( ⌈𝑙𝑒𝑛 = ℓ ≤ 𝑖⌉ ∗ Outℓ )

At step (5.2) we show, by using C-UNIT-L and the definition of ⌈ · ⌉, that we can obtain the loop invariant with 𝑖 = 0:

𝑃(0) = ∃𝜇. C𝜇 ℓ. ( ⌈𝑙𝑒𝑛 = ℓ ≤ 0⌉ ∗ Out0 ) = ∃𝜇. C𝜇 ℓ. ( ⌈𝑙𝑒𝑛 = ℓ ≤ 0⌉ ).

3While technically our language does not support arrays, they can be easily encoded as a collection of 𝑁 variables.

[Figure 5.9 presents the proof outline: len := 0 establishes the invariant for 𝑖 = 0 (step 5.2); WP-LOOP is applied with invariant 𝑃(𝑖); within the loop body, C-WP-ELIM processes the conditional case by case (steps 5.3–5.6); and the resulting postcondition is folded back into the invariant (step 5.7).]
Figure 5.9: Proof outline of the Von Neumann extractor example.
For the proof of the body of the loop we can assume 𝑃(𝑖) and we need to prove the postcondition 𝑃(𝑖 + 1). After sampling the two coins, at step (5.3) we apply the crucial insight behind the extractor. The key idea is that with some probability 𝑞 the two coins will be different, in which case the outcomes of the two coins can be either (0, 1) or (1, 0), which both have the same probability 𝑝(1 − 𝑝). Therefore, if the coins are different, 𝑐𝑜𝑖𝑛1 = 0 and 𝑐𝑜𝑖𝑛1 = 1 have the same probability, i.e. 𝑐𝑜𝑖𝑛1 looks like a fair coin.

BLUEBELL is capable of representing this reasoning as follows. We start with two independent biased coins, which we can combine into a random variable (𝑐𝑜𝑖𝑛1 ≠ 𝑐𝑜𝑖𝑛2, 𝑐𝑜𝑖𝑛1) recording whether the two outcomes were different and the outcome of the first coin. We use PROD-UNSPLIT and DIST-FUN to derive:

𝑐𝑜𝑖𝑛1 $∼ Bern𝑝 ∗ 𝑐𝑜𝑖𝑛2 $∼ Bern𝑝 ⊢ (𝑐𝑜𝑖𝑛1 ≠ 𝑐𝑜𝑖𝑛2, 𝑐𝑜𝑖𝑛1) $∼ 𝜇0

where 𝜇0 ≜ bind(Bern𝑝 ⊗ Bern𝑝, (𝑐𝑜𝑖𝑛1, 𝑐𝑜𝑖𝑛2) ↦→ (𝑐𝑜𝑖𝑛1 ≠ 𝑐𝑜𝑖𝑛2, 𝑐𝑜𝑖𝑛1)). Now we can reformulate 𝜇0 to reflect our above-mentioned insight into why this extractor works: there exist probabilities 𝑞 and 𝑞′ such that

𝜇0 = 𝛽 ⋉ 𝜅   where   𝛽 ≜ Bern𝑞   𝜅(1) ≜ Bern 1/2   𝜅(0) ≜ Bern𝑞′

Here one first determines whether the two coins will be different or equal using a Bernoulli distribution that assigns probability 𝑞 for them to be different; here 𝑞 can be obtained as a function of 𝑝 (concretely, 𝑞 = 2𝑝(1 − 𝑝)). The process then generates 𝑐1 accordingly using 𝜅: in the “different” branch (𝑏 = 1) the first coin is distributed as Bern 1/2, while in the “equal” branch (𝑏 = 0) the first coin is distributed with some bias 𝑞′ (also a function of 𝑝). So using 𝜇0 = 𝛽 ⋉ 𝜅 we derive:

(𝑐𝑜𝑖𝑛1 ≠ 𝑐𝑜𝑖𝑛2, 𝑐𝑜𝑖𝑛1) $∼ (𝛽 ⋉ 𝜅)
⊢ C𝛽⋉𝜅 (𝑏, 𝑐1). ⌈(𝑐𝑜𝑖𝑛1 ≠ 𝑐𝑜𝑖𝑛2) = 𝑏 ∧ 𝑐𝑜𝑖𝑛1 = 𝑐1⌉ (C-UNIT-R)
⊢ C𝛽 𝑏. C𝜅(𝑏) 𝑐1. ⌈(𝑐𝑜𝑖𝑛1 ≠ 𝑐𝑜𝑖𝑛2) = 𝑏⌉ ∗ ⌈𝑐𝑜𝑖𝑛1 = 𝑐1⌉ (C-FUSE)
⊢ C𝛽 𝑏. ( ⌈(𝑐𝑜𝑖𝑛1 ≠ 𝑐𝑜𝑖𝑛2) = 𝑏⌉ ∗ C𝜅(𝑏) 𝑐1. ⌈𝑐𝑜𝑖𝑛1 = 𝑐1⌉ ) (SURE-STR-CONVEX)
⊢ C𝛽 𝑏. ( ⌈(𝑐𝑜𝑖𝑛1 ≠ 𝑐𝑜𝑖𝑛2) = 𝑏⌉ ∗ ⌜𝑏 = 1⌝ ⇒ CBern 1/2 𝑐1. ⌈𝑐𝑜𝑖𝑛1 = 𝑐1⌉ ) (C-CONS)
⊢ C𝛽 𝑏.
( ⌈(𝑐𝑜𝑖𝑛1 ≠ 𝑐𝑜𝑖𝑛2) = 𝑏⌉ ∗ ⌜𝑏 = 1⌝ ⇒ 𝑐𝑜𝑖𝑛1 $∼ Bern 1 2 ) (C-UNIT-R) 203 The application of C-FUSE allows us to first condition on 𝑐𝑜𝑖𝑛1 ≠ 𝑐𝑜𝑖𝑛2, and then the first coin. We can then weaken the case where 𝑏 = 0 and only record that if 𝑏 = 1 then 𝑐𝑜𝑖𝑛1 is a fair coin. This takes us through step (5.3) of fig. 5.9. Now the precondition of the if statement is conditional on len and 𝑐𝑜𝑖𝑛1 ≠ 𝑐𝑜𝑖𝑛2. Intuitively, we want to evalu- ate the effects of the if statement in the two possible outcomes and put together the results. This is precisely the purpose of the C-WP-SWAP rule, which together with C-CONS gives us the derived rule: C-WP-ELIM ∀𝑣 ∈ supp(𝜇). 𝑃(𝑣) ⊩ wp 𝑡 {𝑄(𝑣)} C𝜇 𝑣. 𝑃(𝑣) ⊩ wp 𝑡 {C𝜇 𝑣. 𝑄(𝑣)} By applying the rule twice (first on the conditioning on 𝑙𝑒𝑛, and then on the conditioning on 𝑐𝑜𝑖𝑛1 ≠ 𝑐𝑜𝑖𝑛2), we can process the if statement case by case, and then combine the postconditions obtained in each case. For the conditional with the guard 𝑐𝑜𝑖𝑛1 ≠ 𝑐𝑜𝑖𝑛2, the false branch is a skip (omitted), so it preserves the precondition with 𝑏 = 0. In the true branch, starting with the precondition at (5.4), we apply WP-ASSIGN to the assignments to obtain (5.5). Last, we com- bine the results from both branches by making the overall postcondition at (5.6) to be parametric on the value of 𝑏 (and ℓ). The last non-obvious step is (5.7) in fig. 5.9, where we show that the condi- tional postcondition of the if statement implies the loop invariant 𝑃(𝑖 + 1). Let 𝐾 (ℓ, 𝑏) =  ⌈𝑙𝑒𝑛 = ℓ + 1 ≤ 𝑖 + 1⌉ ∗ 𝑜𝑢𝑡 [𝑙𝑒𝑛] $∼ Bern 1 2 if 𝑏 = 1 ⌈𝑙𝑒𝑛 = ℓ ≤ 𝑖 + 1⌉ if 𝑏 = 0 204 then the step is proven as follows: C𝜇 ℓ. C𝛽 𝑏. ( ⌈(𝑐𝑜𝑖𝑛1 ≠ 𝑐𝑜𝑖𝑛2) = 𝑏⌉ ∗ Outℓ ∗ 𝐾 (ℓ, 𝑏) ) ⊢ C𝜇⊗𝛽 (ℓ, 𝑏). ( ⌈(𝑐𝑜𝑖𝑛1 ≠ 𝑐𝑜𝑖𝑛2) = 𝑏⌉ ∗ Outℓ ∗ 𝐾 (ℓ, 𝑏) ) (C-ASSOC) ⊢ C𝜇⊗𝛽 (ℓ, 𝑏). ( Outℓ ∗ 𝐾 (ℓ, 𝑏) ) (C-CONS) ⊢ C𝜇′′ (ℓ′, ℓ).  ⌈𝑙𝑒𝑛 = ℓ′ ≤ 𝑖 + 1⌉ ∗Outℓ′−1 ∗ 𝑜𝑢𝑡 [𝑙𝑒𝑛] $∼ Bern 1 2 if ℓ′ = ℓ + 1 ⌈𝑙𝑒𝑛 = ℓ′ ≤ 𝑖 + 1⌉ ∗ Outℓ′ if ℓ′ = ℓ (C-TRANSF) ⊢ C𝜇′′◦𝜋−1 1 ℓ′. ( ⌈𝑙𝑒𝑛 = ℓ′ ≤ 𝑖 + 1⌉ ∗ Outℓ′ ) (C-DIST-PROJ) ⊢ ∃𝜇′. C𝜇′ ℓ′. 
( ⌈𝑙𝑒𝑛 = ℓ′ ≤ 𝑖 + 1⌉ ∗ Outℓ′ )

The application of C-TRANSF uses the function 𝑓 (ℓ, 𝑏) = (ℓ + 𝑏, ℓ) to introduce the new ℓ′, and then we project away the unused ℓ using the derived C-DIST-PROJ (note that the rule applies to ⌈ · ⌉ assertions and multiple ownership assertions in a separating conjunction thanks to PROD-SPLIT and PROD-UNSPLIT).

5.6 Related Work

Research on deductive verification of probabilistic programs has developed a wide range of techniques that employ unary and relational styles of reasoning. BLUEBELL advances the state of the art in both styles by coherently unifying the strengths of both. Since this chapter focuses on the unary fragment of BLUEBELL, here we compare BLUEBELL with unary-style deductive techniques in more detail, and overview relational-style deductive techniques only at a high level. At the end, we briefly survey other relevant techniques for reasoning about probabilistic programs.

Other Unary Reasoning Techniques for Probabilistic Programs Outside of probabilistic separation logic, another line of unary deductive verification techniques is the expectation-based approach. The high-level idea is to reason about expected quantities of probabilistic programs via a weakest-pre-expectation operator that propagates information about expected values backwards through the program. The approach has been classically used to verify randomized algorithms [Kozen, 1983, Morgan et al., 1996, Kaminski et al., 2016, Kaminski, 2019, Aguirre et al., 2021, Bartocci et al., 2020]. These logics offer ergonomic principles for expectations, but do not aim at unifying principles for analyzing more general classes of properties or proof techniques, as we attempt here. Ellora [Barthe et al., 2018] proposes an assertion-based logic to overcome the limitation of working only with expectations, but it does not support reasoning about separation or conditioning.
Detailed Comparison with Lilac [Li et al., 2023a] Now that we have intro- duced the unary fragment of BLUEBELL thoroughly, we revisit its comparison with Lilac [Li et al., 2023a]. Lilac supports reasoning about independence and conditional independence in continuous distributions, thanks to their measure- theoretic model based on Borel spaces. BLUEBELL also uses a measure-theory based model, similar to Lilac, but is limited to discrete distributions. The lim- itation is imposed because we are only able to prove some key lemmas [Bao et al., 2025, Lemma C.1 - C.7] for discrete measures, though we speculate that they also hold for continuous measures. These lemmas are used in the proof of key rules such as C-WP-SWAP and C-FRAME, and Lilac’s proof system does not include similar rules. Also, while BLUEBELL uses Lilac’s independent product as a model of sep- 206 arating conjunction, it differs from Lilac in three aspects: (1) the treatment of ownership, (2) support for mutable state, and (3) the model of conditioning. In Lilac, ownership over program variables and expressions is de- fined as measurability. In BLUEBELL, however, we define ownership as almost-measurability, which is required to support inferences like Own(𝑥) ∗ ⌈𝑥 = 𝑦⌉ ⊢ Own(𝑦) These inferences were implicitly used in the first ver- sion of Lilac, but were not valid in its model. Their arxiv version Li et al. [2023b] fixes the issue by changing the meaning of ⌈𝑥 = 𝑦⌉, while our fix changes the meaning of ownership (and we define ⌈𝐸⌉ assertions based on regular owner- ship). Lilac works with immutable state [Staton, 2020], which simplifies reason- ing in certain contexts (e.g., the frame rule and the if rule). BLUEBELL’s model supports mutable state through a creative use of permissions, obtaining a clean frame rule, at the cost of some predictable bookkeeping. The more significant difference with Lilac is however in the definition of the conditioning modality. 
Lilac’s modality C𝑣←𝐸 𝑃(𝑣) roughly corresponds to the BLUEBELL assertion ∃𝜇. C𝜇 𝑣. (⌈𝐸 = 𝑣⌉ ∗ 𝑃(𝑣)). The difference is not merely syntactic, and requires changing the model of the modality. For example, Lilac’s modality satisfies C𝑣←𝐸 𝑃1(𝑣) ∧ C𝑣←𝐸 𝑃2(𝑣) ⊢ C𝑣←𝐸 (𝑃1(𝑣) ∧ 𝑃2(𝑣)), but the analogous rule C𝜇 𝑣. 𝐾1(𝑣) ∧ C𝜇 𝑣. 𝐾2(𝑣) ⊢ C𝜇 𝑣. (𝐾1(𝑣) ∧ 𝐾2(𝑣)) is unsound in BLUEBELL: the meaning of the modalities in the premise ensures the existence of two kernels 𝜅1 and 𝜅2 supporting 𝐾1 and 𝐾2 respectively, but the conclusion requires the existence of a single kernel supporting both 𝐾1 and 𝐾2. Lilac’s rule holds because when one conditions on a random variable, the corresponding kernels are unique. We did not find losing this rule limiting.

On the other hand, Lilac’s conditioning has two disadvantages: (i) it does not record the distribution of 𝐸, losing this information when conditioning; (ii) it does not generalize to the relational setting. Even considering only the unary setting, having access to the distribution 𝜇 unlocks a number of new rules (e.g. C-UNIT-R and C-FUSE). In particular, the rules of BLUEBELL provide more ways to convert a conditional assertion back into an unconditional one, which is crucial when the end goal is not under conditioning but conditioning is helpful in intermediate steps.

Relational Reasoning We summarize several extensions of pRHL-style relational reasoning, the vanilla version of which is overviewed in section 5.1.

Polaris [Tassarotti and Harper, 2019] is an early instance of a probabilistic relational (concurrent) separation logic. Its motivation is to reconcile relational-lifting-based reasoning with the semantics of concurrent programs. However, separation in Polaris is again classic disjointness of state and is not related to (conditional) independence in the style of PSL, Lilac, or BLUEBELL. Gregersen et al.
[2024] recently proposed Clutch, a logic to prove contextual refinement in a probabilistic higher-order language, where “out of order” couplings between samplings are achieved by using ghost code that pre-samples some assignments, a technique inspired by prophecy variables [Jung et al., 2019]. In the conference version Bao et al. [2025], we showed how BLUEBELL can resolve this issue without ghost code (in the context of first-order imperative programs) by using framing and probabilistic independence creatively. In contrast to BLUEBELL, Clutch can only express relational properties; it also uses separation, but with its classical interpretation as disjointness of deterministic state.

CHAPTER 6
DISCUSSION

In this thesis, we presented three extensions of probabilistic separation logic for reasoning about probabilistic programs involving not only independence, but also negative dependence or conditional independence. We demonstrated that these program logics can formalize the correctness of a range of randomized algorithms. With rules that utilize probabilistic (in)dependencies for modular reasoning, these program logics allow relatively compact and clean formal proofs. In this final chapter, we discuss more related work and directions for future work.

6.1 Related Work

Concurrent Developments in Probabilistic Separation Logic During the course of my doctoral study, various papers have explored other variations of probabilistic separation logic. Dal Lago et al. [2024] extend PSL to prove computational independence, which relaxes probabilistic independence by only requiring variables’ distributions to be computationally indistinguishable from independence. They demonstrate their logic by formalizing several simple cryptographic protocols developed to guard against adversaries with bounded computational power. Ho et al.
[2025] extend Lilac [Li et al., 2023a] to support a probabilistic programming language that can not only sample from distributions but also condition based on observations and soft constraints. They then apply the logic to reason about a range of classic examples in Bayesian reasoning, including the Bayesian coin flip, the collider Bayesian network, and the burglar alarm model. Yan et al. [2024] mechanize a variation of PSL in Isabelle/HOL and use it to verify the security of several probabilistic oblivious algorithms. Similar to the unary fragment of BLUEBELL, Jereb and Simpson [2025] also tackle the problem of formulating a clean frame rule for imperative probabilistic programs, using a model more similar to PSL’s than Lilac’s. However, they do not provide a full program logic with a soundness proof.

On the foundation side, Li et al. [2024] bridge the gap between Lilac’s independent product and the standard independent product in probability theory by showing that they are equivalent up to a suitable equivalence of categories.

Quantitative Separation Logic An alternative separation logic for probabilistic programs is Quantitative Separation Logic (QSL) by Batz et al. [2019]. QSL was developed to unify separation logic for heap-manipulating programs with weakest pre-expectations, capturing program states using expectations, i.e., numerical quantities, instead of assertions. QSL uses connectives from bunched logic to construct new expectations out of existing ones; for example, 𝑄1 ∗ 𝑄2 is the maximum, over splittings of the current heap into disjoint subheaps ℎ1 and ℎ2, of the numerical product of the quantities 𝑄1(ℎ1) and 𝑄2(ℎ2). As a result of these design choices, applications of QSL often involve reasoning about quantities relating to the heap (for example, the expected length of a list), which are quite different from the applications of PSL. QSL is automated in Batz et al.
[2022] by using a weakest pre-expectation calculus to reason about programs and reducing entailment checking to standard separation-logic entailment checking, following Echenim et al. [2020].

Automated Verification of Probabilistic Programs Automated verification of probabilistic programs is an active field of research. For example, there is a long line of work on automating weakest pre-expectation style calculi for probabilistic programs [Gretz et al., 2013, Chen et al., 2015, Feng et al., 2017, Batz et al., 2023, Bao et al., 2024], where the bottleneck is computing the weakest pre-expectation of probabilistic loops. Bartocci et al. [2019, 2020] also utilize moment-based analysis to develop automated tools for analyzing probabilistic programs. Probabilistic model checking provides verification tools [Dehnert et al., 2017, Kwiatkowska et al., 2002] of a different flavor. They typically target probabilistic models specified as discrete-time Markov chains (DTMCs), continuous-time Markov chains (CTMCs), or Markov decision processes (MDPs), and ask users to specify desired properties using temporal formulas involving probabilities. While probabilistic programs can be translated into such probabilistic models, it is unclear how to encode probabilistic (in)dependencies as a property in these tools, whose specification languages do not naturally support relating the probabilities of more than one event.

6.2 Directions for Future Work

While recent work shows the potential of probabilistic separation logic, more work has to be done to make this technique appealing to practitioners of probabilistic programming. Here we envision some directions for further development of probabilistic separation logic. First of all, constructing formal proofs currently requires expertise, so automating proof construction is important to make the approach easier to adopt.
Second, users may want to use a richer set of language features, e.g., concurrency, the conditioning operator, more general loops, or higher-order programs, and it would be helpful to enrich probabilistic separation logic to support these language features as well. Third, it would be great to have a unifying framework that is also extensible, so one does not need to apply a different program logic to prove each different property.

In my most recent project, joint with former Cornell undergraduate student Jessica Cho and my advisor Justin Hsu, I work on automating PSL for loop-free probabilistic programs. We draw inspiration from the automation of standard separation logic [Berdine et al., 2004, 2005, 2006, Appel, 2011, Piskac et al., 2014] to develop a syntax-directed version of PSL. However, we observe that probabilistic programs introduce several additional challenges. For example, automating PSL must account for conditionals and loops that branch on randomized guards, and apply custom axioms about independence at appropriate places. Moreover, various verification tasks demand the ability to infer the distribution of an expression given an assertion. We view our work as a pilot study that demonstrates one approach to automating probabilistic separation logic, with significant opportunities for further exploration. For instance, how can we effectively leverage the post-condition to prune the proof during construction? Is there a loop rule particularly well-suited to automation? Furthermore, can existing techniques for synthesizing probabilistic loop invariants be adapted to support the automation of probabilistic separation logic in programs with loops?

Advances in automation are also closely tied to the foundational development of probabilistic separation logic itself, which may lead to a cleaner and more automatable set of rules.
In addition, a unified and extensible framework would enable automation of new extensions to build upon the automation infrastructure developed for existing ones.

BIBLIOGRAPHY

Stephen Abbott. Understanding Analysis. Springer, 2015.

Serge Abiteboul, Richard Hull, and Victor Vianu. Foundations of Databases, volume 8. Addison-Wesley Reading, 1995.

Samson Abramsky and Jouko A. Väänänen. From IF to BI. Synthese, 167(2):207–230, 2009. URL https://doi.org/10.1007/s11229-008-9415-6.

Alejandro Aguirre, Gilles Barthe, Justin Hsu, Benjamin Lucien Kaminski, Joost-Pieter Katoen, and Christoph Matheja. A pre-expectation calculus for probabilistic sensitivity. Proceedings of the ACM on Programming Languages, 5(POPL), January 2021. URL https://doi.org/10.1145/3434333.

Nima Anari, Shayan Oveis Gharan, and Alireza Rezaei. Monte Carlo Markov chain algorithms for sampling strongly Rayleigh distributions and determinantal point processes. In Vitaly Feldman, Alexander Rakhlin, and Ohad Shamir, editors, 29th Annual Conference on Learning Theory, volume 49 of Proceedings of Machine Learning Research, pages 103–115, Columbia University, New York, New York, USA, 23–26 Jun 2016. PMLR. URL https://proceedings.mlr.press/v49/anari16.html.

Andrew W. Appel. VeriSmall: Verified Smallfoot shape analysis. In International Conference on Certified Programs and Proofs, pages 231–246. Springer, 2011.

Jialu Bao, Simon Docherty, Justin Hsu, and Alexandra Silva. A bunched logic for conditional independence. In Proceedings of the 36th Annual ACM/IEEE Symposium on Logic in Computer Science, LICS '21, New York, NY, USA, 2021. Association for Computing Machinery. ISBN 9781665448956. URL https://doi.org/10.1109/LICS52264.2021.9470712.
Jialu Bao, Marco Gaboardi, Justin Hsu, and Joseph Tassarotti. A separation logic for negative dependence. Proceedings of the ACM on Programming Languages, 6(POPL), January 2022. URL https://doi.org/10.1145/3498719.

Jialu Bao, Nitesh Trivedi, Drashti Pathak, Justin Hsu, and Subhajit Roy. Data-driven invariant learning for probabilistic programs. Formal Methods in System Design, pages 1–29, 2024.

Jialu Bao, Emanuele D’Osualdo, and Azadeh Farzan. Bluebell: An alliance of relational lifting and independence for probabilistic reasoning. Proceedings of the ACM on Programming Languages, 9(POPL), January 2025. URL https://doi.org/10.1145/3704894.

Solon Barocas, Moritz Hardt, and Arvind Narayanan. Fairness and machine learning: Limitations and opportunities. MIT Press, 2023.

Gilles Barthe, Benjamin Grégoire, and Santiago Zanella Béguelin. Formal certification of code-based cryptographic proofs. In Proceedings of the 36th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 90–101, 2009. URL https://doi.org/10.1145/1594834.1480894.

Gilles Barthe, Boris Köpf, Federico Olmedo, and Santiago Zanella-Béguelin. Probabilistic relational reasoning for differential privacy. ACM Transactions on Programming Languages and Systems (TOPLAS), 35(3):1–49, 2013.

Gilles Barthe, Thomas Espitau, Benjamin Grégoire, Justin Hsu, Léo Stefanesco, and Pierre-Yves Strub. Relational reasoning via probabilistic coupling. In International Conference on Logic for Programming, Artificial Intelligence and Reasoning (LPAR), pages 387–401. Springer, 2015.
Gilles Barthe, Noémie Fong, Marco Gaboardi, Benjamin Grégoire, Justin Hsu, and Pierre-Yves Strub. Advanced probabilistic couplings for differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pages 55–67, 2016a.

Gilles Barthe, Marco Gaboardi, Benjamin Grégoire, Justin Hsu, and Pierre-Yves Strub. Proving differential privacy via probabilistic couplings. In 2016 31st Annual ACM/IEEE Symposium on Logic in Computer Science (LICS), pages 1–10, 2016b. URL https://doi.org/10.1145/2933575.2934554.

Gilles Barthe, Benjamin Grégoire, Justin Hsu, and Pierre-Yves Strub. Coupling proofs are probabilistic product programs. SIGPLAN Not., 52(1):161–174, January 2017. ISSN 0362-1340. URL https://doi.org/10.1145/3093333.3009896.

Gilles Barthe, Thomas Espitau, Marco Gaboardi, Benjamin Grégoire, Justin Hsu, and Pierre-Yves Strub. An assertion-based program logic for probabilistic programs. In Programming Languages and Systems, pages 117–144, Cham, 2018. Springer International Publishing.

Gilles Barthe, Justin Hsu, and Kevin Liao. A probabilistic separation logic. Proc. ACM Program. Lang., 4(POPL), December 2019. URL https://doi.org/10.1145/3371123.

Ezio Bartocci, Laura Kovács, and Miroslav Stankovič. Automatic generation of moment-based invariants for prob-solvable loops. In Automated Technology for Verification and Analysis, Taipei, Taiwan, 2019. URL https://doi.org/10.1007/978-3-030-31784-3_15.

Ezio Bartocci, Laura Kovács, and Miroslav Stankovič. Mora: automatic generation of moment-based invariants.
In International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS), Dublin, Ireland, 2020. URL https://doi.org/10.1007/978-3-030-45190-5_28.

Kevin Batz, Benjamin Lucien Kaminski, Joost-Pieter Katoen, Christoph Matheja, and Thomas Noll. Quantitative separation logic: a logic for reasoning about probabilistic pointer programs. Proc. ACM Program. Lang., 3(POPL), January 2019. URL https://doi.org/10.1145/3290347.

Kevin Batz, Ira Fesefeldt, Marvin Jansen, Joost-Pieter Katoen, Florian Keßler, Christoph Matheja, and Thomas Noll. Foundations for entailment checking in quantitative separation logic (extended version). arXiv preprint arXiv:2201.11464, 2022.

Kevin Batz, Mingshuai Chen, Sebastian Junges, Benjamin Lucien Kaminski, Joost-Pieter Katoen, and Christoph Matheja. Probabilistic program verification via inductive synthesis of inductive invariants. In International Conference on Tools and Algorithms for the Construction and Analysis of Systems, pages 410–429. Springer, 2023.

Luca Becchetti, Andrea Clementi, Emanuele Natale, Francesco Pasquale, and Gustavo Posta. Self-stabilizing repeated balls-into-bins. International Symposium on Distributed Computing (DISC), 2019.

Ioana O. Bercea and Guy Even. Dynamic dictionaries for multisets and counting filters with constant time operations. Algorithmica, 85(6):1786–1804, December 2022. ISSN 0178-4617. URL https://doi.org/10.1007/s00453-022-01057-0.

Josh Berdine, Cristiano Calcagno, and Peter W. O’Hearn. A decidable fragment of separation logic.
In Proceedings of the 24th International Conference on Foundations of Software Technology and Theoretical Computer Science, FSTTCS ’04, pages 97–109, Berlin, Heidelberg, 2004. Springer-Verlag. ISBN 3540240586. URL https://doi.org/10.1007/978-3-540-30538-5_9.

Josh Berdine, Cristiano Calcagno, and Peter W. O’Hearn. Symbolic execution with separation logic. In Programming Languages and Systems: Third Asian Symposium, APLAS 2005, Tsukuba, Japan, November 2–5, 2005. Proceedings 3, pages 52–68. Springer, 2005.

Josh Berdine, Cristiano Calcagno, and Peter W. O’Hearn. Smallfoot: Modular automatic assertion checking with separation logic. In Formal Methods for Components and Objects: 4th International Symposium, FMCO 2005, Amsterdam, The Netherlands, November 1–4, 2005, Revised Lectures 4, pages 115–137. Springer, 2006.

Burton H. Bloom. Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 13(7):422–426, 1970. URL https://doi.org/10.1145/362686.362692.

James Blustein and Amal El-Maazawi. Bloom filters. A tutorial, analysis, and survey. Halifax, NS: Dalhousie University, pages 1–31, 2002.

Julius Borcea, Petter Brändén, and Thomas M. Liggett. Negative dependence and the geometry of polynomials. Journal of the American Mathematical Society, 22(2):521–567, 2009.

Prosenjit Bose, Hua Guo, Evangelos Kranakis, Anil Maheshwari, Pat Morin, Jason Morrison, Michiel Smid, and Yihui Tang. On the false-positive rate of Bloom filters. Information Processing Letters, 108(4):210–213, October 2008. ISSN 0020-0190. URL https://doi.org/10.1016/j.ipl.2008.05.018.

Stephen Brookes. A semantics for concurrent separation logic. Theoretical Computer Science, 375(1–3):227–270, 2007a.
URL https://doi.org/10.1016/j.tcs.2006.12.034.

Stephen Brookes. A semantics for concurrent separation logic. Theoretical Computer Science, 375(1):227–270, 2007b. ISSN 0304-3975. URL https://doi.org/10.1016/j.tcs.2006.12.034. Festschrift for John C. Reynolds’s 70th birthday.

James Brotherston and Cristiano Calcagno. Classical BI: A logic for reasoning about dualising resources. In Proceedings of the 36th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL ’09, pages 328–339, New York, NY, USA, 2009. Association for Computing Machinery. ISBN 9781605583792. URL https://doi.org/10.1145/1480881.1480923.

Cristiano Calcagno, Peter W. O’Hearn, and Hongseok Yang. Local action and abstract separation logic. In 22nd Annual IEEE Symposium on Logic in Computer Science (LICS 2007), pages 366–378, Wroclaw, Poland, 2007. IEEE. ISBN 978-0-7695-2908-0. URL https://doi.org/10.1109/LICS.2007.30.

Qinxiang Cao, Santiago Cuellar, and Andrew W. Appel. Bringing order to the separation logic jungle. In Asian Symposium on Programming Languages and Systems (APLAS), pages 190–211, Suzhou, China, 2017. Springer.

Aleksandar Chakarov and Sriram Sankaranarayanan. Probabilistic program analysis with martingales. In International Conference on Computer Aided Verification (CAV), pages 511–526, Saint Petersburg, Russia, 2013. Springer. URL https://doi.org/10.1007/978-3-642-39799-8_34.

Yu-Fang Chen, Chih-Duo Hong, Bow-Yaw Wang, and Lijun Zhang. Counterexample-guided polynomial loop invariant generation by Lagrange interpolation. In CAV, 2015. doi: 10.1007/978-3-319-21690-4_44.

David Maxwell Chickering. Learning Bayesian networks is NP-complete.
In Learning from data: Artificial intelligence and statistics V, pages 121–130. Springer, 1996.

David Maxwell Chickering. Optimal structure identification with greedy search. Journal of Machine Learning Research, 3(Nov):507–554, 2002.

Kenta Cho and Bart Jacobs. Disintegration and Bayesian inversion via string diagrams. Mathematical Structures in Computer Science, 29(7):938–971, 2019. URL https://doi.org/10.1017/S0960129518000488.

Ken Christensen, Allen Roginsky, and Miguel Jimeno. A new analysis of the false positive rate of a Bloom filter. Information Processing Letters, 110(21):944–949, 2010. ISSN 0020-0190. URL https://doi.org/10.1016/j.ipl.2010.07.024.

Ugo Dal Lago, Davide Davoli, and Bruce M. Kapron. On separation logic, computational independence, and pseudorandomness. In 2024 IEEE 37th Computer Security Foundations Symposium (CSF), pages 651–666. IEEE Computer Society, 2024.

A. Philip Dawid. Conditional independence in statistical theory. Journal of the Royal Statistical Society: Series B (Methodological), 41(1):1–15, 1979.

A. Philip Dawid. Separoids: A mathematical framework for conditional independence and irrelevance. Annals of Mathematics and Artificial Intelligence, 32(1–4):335–372, 2001.

Christian Dehnert, Sebastian Junges, Joost-Pieter Katoen, and Matthias Volk. A Storm is coming: A modern probabilistic model checker. In International Conference on Computer Aided Verification, pages 592–600. Springer, 2017.

Edsger W. Dijkstra. Guarded commands, nondeterminacy and formal derivation of programs. Communications of the ACM, 18(8):453–457, 1975.

Bolin Ding and Arnd Christian König. Fast set intersection in memory. Proceedings of the VLDB Endowment, 4(4):255–266, 2011. URL https://doi.org/10.14778/1938545.1938550.

Simon Docherty. Bunched Logics: A Uniform Approach. PhD thesis, UCL (University College London), 2019.

Devdatt P. Dubhashi and Desh Ranjan.
Balls and bins: A study in negative dependence. Random Structures and Algorithms, 13(2):99–124, 1998.

Arnaud Durand, Miika Hannula, Juha Kontinen, Arne Meier, and Jonni Virtema. Probabilistic team semantics. In International Symposium on Foundations of Information and Knowledge Systems (FoIKS), Budapest, Hungary, volume 10833 of Lecture Notes in Computer Science, pages 186–206. Springer, 2018. URL https://doi.org/10.1007/978-3-319-90050-6_11.

Evgenii Borisovich Dynkin. Theory of Markov processes. Courier Corporation, 2012.

Mnacho Echenim, Radu Iosif, and Nicolas Peltier. The Bernays–Schönfinkel–Ramsey class of separation logic with uninterpreted predicates. ACM Trans. Comput. Logic, 21(3), March 2020. ISSN 1529-3785. URL https://doi.org/10.1145/3380809.

Facebook. Infer static analyzer. URL https://fbinfer.com/.

Ronald Fagin and Moshe Y. Vardi. The theory of data dependencies: An overview. In International Colloquium on Automata, Languages and Programming (ICALP), pages 1–22, 1984. URL https://doi.org/10.1007/3-540-13345-3_1.

Yijun Feng, Lijun Zhang, David N. Jansen, Naijun Zhan, and Bican Xia. Finding polynomial loop invariants for probabilistic programs. In ATVA, 2017.

Michael J. Fischer, Nancy A. Lynch, and Michael S. Paterson. Impossibility of distributed consensus with one faulty process. Journal of the ACM (JACM), 32(2):374–382, 1985.

Nir Friedman, Dan Geiger, and Moises Goldszmidt. Bayesian network classifiers. Machine Learning, 29(2):131–163, 1997.

Bert E. Fristedt and Lawrence F. Gray. A modern approach to probability theory. Springer Science & Business Media, 2013.

Tobias Fritz. A synthetic approach to Markov kernels, conditional independence and theorems on sufficient statistics. Advances in Mathematics, 370:107239, 2020.
URL https://doi.org/10.1016/j.aim.2020.107239.

Dan Geiger and Judea Pearl. Logical and algorithmic properties of conditional independence and graphical models. The Annals of Statistics, 21(4):2001–2021, 1993.

Michele Giry. A categorical approach to probability theory. Categorical aspects of topology and analysis, pages 68–85, 1982.

Robert Goldblatt. Varieties of complex algebras. Annals of Pure and Applied Logic, 44(3):173–242, 1989. URL https://doi.org/10.1016/0168-0072(89)90032-8.

Oded Goldreich. Secure multi-party computation. Manuscript. Preliminary version, 78(110):1–108, 1998.

Kiran Gopinathan and Ilya Sergey. Certifying certainty and uncertainty in approximate membership query structures. In International Conference on Computer Aided Verification (CAV), volume 12225 of Lecture Notes in Computer Science, pages 279–303, Los Angeles, California, 2020. Springer. URL https://doi.org/10.1007/978-3-030-53291-8_16.

Andrew D. Gordon, Thomas A. Henzinger, Aditya V. Nori, and Sriram K. Rajamani. Probabilistic programming. In Future of Software Engineering Proceedings, pages 167–181. 2014.

Simon Oddershede Gregersen, Alejandro Aguirre, Philipp G. Haselwarter, Joseph Tassarotti, and Lars Birkedal. Asynchronous probabilistic couplings in higher-order separation logic. Proceedings of the ACM on Programming Languages, 8(POPL):753–784, 2024.

Friedrich Gretz, Joost-Pieter Katoen, and Annabelle McIver. Prinsys: On a quest for probabilistic loop invariants. In QEST, 2013. doi: 10.1007/978-3-642-40196-1_17.

Tao Gu, Jialu Bao, Justin Hsu, Alexandra Silva, and Fabio Zanasi. A categorical approach to DIBI models. In 9th International Conference on Formal Structures for Computation and Deduction, FSCD 2024, July 10–13, 2024, Tallinn, Estonia, volume 299 of LIPIcs, pages 17:1–17:20.
Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2024. URL https://doi.org/10.4230/LIPIcs.FSCD.2024.17.

Miika Hannula, Juha Kontinen, Jan Van den Bussche, and Jonni Virtema. Descriptive complexity of real computation and probabilistic independence logic. In IEEE Symposium on Logic in Computer Science (LICS), pages 550–563, 2020.

Jaakko Hintikka and Gabriel Sandu. Informational independence as a semantical phenomenon. In Logic, Methodology and Philosophy of Science VIII, volume 126 of Studies in Logic and the Foundations of Mathematics, pages 571–589. Elsevier, 1989. URL https://doi.org/10.1016/S0049-237X(08)70066-1.

Shing Hin Ho, Nicolas Wu, and Azalea Raad. Bayesian separation logic. arXiv preprint arXiv:2507.15530, 2025.

Charles Antony Richard Hoare. Algorithm 64: Quicksort. Communications of the ACM, 4(7):321, 1961.

Tony Hoare, Bernhard Möller, Georg Struth, and Ian Wehrman. Concurrent Kleene algebra and its foundations. The Journal of Logic and Algebraic Programming, 80(6):266–296, 2011.

Steven J. Holtzen. Exploiting Program Structure for Scaling Probabilistic Programming. University of California, Los Angeles, 2021.

Justin Hsu. Probabilistic Couplings for Probabilistic Reasoning. PhD thesis, University of Pennsylvania, 2017.

Janez Ignacij Jereb and Alex Simpson. Safety, relative tightness and the probabilistic frame rule. arXiv e-prints, pages arXiv–2506, 2025.

Bart Jacobs and Fabio Zanasi. A formal semantics of influence in Bayesian reasoning. In International Symposium on Mathematical Foundations of Computer Science (MFCS), Aalborg, Denmark, volume 83 of Leibniz International Proceedings in Informatics, pages 21:1–21:14. Schloss Dagstuhl, 2017. URL https://doi.org/10.4230/LIPIcs.MFCS.2017.21.

Kumar Joag-Dev and Frank Proschan.
Negative association of random variables with applications. The Annals of Statistics, 11(1):286–295, 1983. URL https://doi.org/10.1214/aos/1176346079.

Ralf Jung, David Swasey, Filip Sieczkowski, Kasper Svendsen, Aaron Turon, Lars Birkedal, and Derek Dreyer. Iris: Monoids and invariants as an orthogonal basis for concurrent reasoning. ACM SIGPLAN Notices, 50(1):637–650, 2015.

Ralf Jung, Robbert Krebbers, Jacques-Henri Jourdan, Ales Bizjak, Lars Birkedal, and Derek Dreyer. Iris from the ground up: A modular foundation for higher-order concurrent separation logic. Journal of Functional Programming, 28:e20, 2018. URL https://doi.org/10.1017/S0956796818000151.

Ralf Jung, Rodolphe Lepigre, Gaurav Parthasarathy, Marianna Rapoport, Amin Timany, Derek Dreyer, and Bart Jacobs. The future is ours: Prophecy variables in separation logic. Proc. ACM Program. Lang., 4(POPL), December 2019. URL https://doi.org/10.1145/3371113.

Benjamin Lucien Kaminski. Advanced Weakest Precondition Calculi for Probabilistic Programs. PhD thesis, RWTH Aachen University, 2019.

Benjamin Lucien Kaminski, Joost-Pieter Katoen, Christoph Matheja, and Federico Olmedo. Weakest precondition reasoning for expected run-times of probabilistic programs. In European Symposium on Programming (ESOP), pages 364–389. Springer, 2016.

Neville Kenneth Kitson, Anthony C. Constantinou, Zhigao Guo, Yang Liu, and Kiattikun Chobtham. A survey of Bayesian network structure learning. Artificial Intelligence Review, 56(8):8721–8814, 2023.

Daphne Koller and Nir Friedman. Probabilistic graphical models: Principles and techniques. MIT Press, 2009.

Dexter Kozen. Semantics of probabilistic programs.
Journal of Computer and System Sciences, 22(3):328–350, 1981. URL https://doi.org/10.1016/0022-0000(81)90036-2.

Dexter Kozen. A probabilistic PDL. In Proceedings of the Fifteenth Annual ACM Symposium on Theory of Computing, pages 291–297, 1983.

Robbert Krebbers, Jacques-Henri Jourdan, Ralf Jung, Joseph Tassarotti, Jan-Oliver Kaiser, Amin Timany, Arthur Charguéraud, and Derek Dreyer. MoSeL: A general, extensible modal framework for interactive proofs in separation logic. Proceedings of the ACM on Programming Languages, 2(ICFP):77:1–77:30, 2018. URL https://doi.org/10.1145/3236772.

Alex Kulesza and Ben Taskar. Determinantal point processes for machine learning. Found. Trends Mach. Learn., 5(2–3):123–286, 2012. URL https://doi.org/10.1561/2200000044.

Marta Kwiatkowska, Gethin Norman, and David Parker. PRISM: Probabilistic symbolic model checker. In International Conference on Modelling Techniques and Tools for Computer Performance Evaluation, pages 200–204. Springer, 2002.

John M. Li, Amal Ahmed, and Steven Holtzen. Lilac: A modal separation logic for conditional probability. Proc. ACM Program. Lang., 7(PLDI), June 2023a. URL https://doi.org/10.1145/3591226.

John M. Li, Amal Ahmed, and Steven Holtzen. Lilac: A modal separation logic for conditional probability, 2023b. URL https://arxiv.org/abs/2304.01339.

John M. Li, Jon Aytac, Philip Johnson-Freyd, Amal Ahmed, and Steven Holtzen. A nominal approach to probabilistic separation logic. In IEEE Symposium on Logic in Computer Science (LICS), pages 55:1–55:14. ACM, 2024.

Nancy A. Lynch. Distributed algorithms. Elsevier, 1996.

Michael Mitzenmacher and Eli Upfal. Probability and Computing: Randomized Algorithms and Probabilistic Analysis.
Cambridge University Press, 2005.

Eugenio Moggi. Notions of computation and monads. Information and Computation, 93(1):55–92, 1991. URL https://doi.org/10.1016/0890-5401(91)90052-4.

Carroll Morgan, Annabelle McIver, and Karen Seidel. Probabilistic predicate transformers. ACM Transactions on Programming Languages and Systems, 1996. URL https://doi.org/10.1145/229542.229547.

James K. Mullin. A second look at Bloom filters. Communications of the ACM, 26(8):570–571, 1983.

Wolfgang Mulzer. Five proofs of Chernoff’s bound with applications, May 2019. URL https://doi.org/10.48550/arXiv.1801.03365.

Kevin P. Murphy. Machine learning: A probabilistic perspective. MIT Press, 2012.

Fritz Obermeyer, Eli Bingham, Martin Jankowiak, Neeraj Pradhan, Justin Chiu, Alexander Rush, and Noah Goodman. Tensor variable elimination for plated factor graphs. In International Conference on Machine Learning, pages 4871–4880. PMLR, 2019.

Peter W. O’Hearn and David J. Pym. The logic of bunched implications. Bulletin of Symbolic Logic, 5(2):215–244, 1999.

Prakash Panangaden. Labelled Markov Processes. Imperial College Press, 2009. URL https://doi.org/10.1142/p595.

Judea Pearl. Probabilistic reasoning in intelligent systems: Networks of plausible inference. Elsevier, 2014.

Judea Pearl and Azaria Paz. Graphoids: A graph-based logic for reasoning about relevance relations. University of California (Los Angeles), Computer Science Department, 1985.

Judea Pearl, Dan Geiger, and Thomas Verma. Conditional independence and its representations. Kybernetika, 25(7):33–44, 1989.

Ruzica Piskac, Thomas Wies, and Damien Zufferey. Automating separation logic with trees and data. In Computer Aided Verification: 26th International Conference, CAV 2014, Held as Part of the Vienna Summer of Logic, VSL 2014, Vienna, Austria, July 18–22, 2014. Proceedings 26, pages 711–728. Springer, 2014.

Andrew M. Pitts.
Nominal Sets: Names and Symmetry in Computer Science. Cambridge Tracts in Theoretical Computer Science. Cambridge University Press, 2013. URL https://doi.org/10.1017/CBO9781139084673.

Konstantinos Psounis and Balaji Prabhakar. A randomized web-cache replacement scheme. In Proceedings IEEE INFOCOM 2001. Conference on Computer Communications. Twentieth Annual Joint Conference of the IEEE Computer and Communications Society (Cat. No.01CH37213), volume 3, pages 1407–1415, 2001. URL https://doi.org/10.1109/INFCOM.2001.916636.

Michael O. Rabin. Probabilistic algorithm for testing primality. Journal of Number Theory, 12(1):128–138, 1980.

Prabhakar Raghavan and Clark D. Thompson. Randomized rounding: A technique for provably good algorithms and algorithmic proofs. Combinatorica, 7(4):365–374, 1987.

John C. Reynolds. Separation logic: A logic for shared mutable data structures. In Proceedings 17th Annual IEEE Symposium on Logic in Computer Science, pages 55–74. IEEE, 2002.

John A. Rice. Mathematical statistics and data analysis, volume 371. Thomson/Brooks/Cole, Belmont, CA, 2007.

Marc Romaní. A short proof of Hoeffding’s lemma, May 2021.

Jeffrey S. Rosenthal. A First Look at Rigorous Probability Theory. World Scientific Publishing Company, 2006.

Alex Simpson. Category-theoretic structure for independence and conditional independence. In Conference on the Mathematical Foundations of Programming Semantics (MFPS), pages 281–297, 2018. URL https://doi.org/10.1016/j.entcs.2018.03.028.

Alex Simpson. Equivalence and conditional independence in atomic sheaf logic. In Proceedings of the 39th Annual ACM/IEEE Symposium on Logic in Computer Science, pages 1–14, 2024.

Aravind Srinivasan. Distributions on level-sets with applications to approximation algorithms.
In IEEE Symposium on Foundations of Computer Science (FOCS), pages 588–597, Las Vegas, Nevada, 2001. IEEE. URL https://doi.org/10.1109/SFCS.2001.959935.

Sam Staton. Probabilistic programs as measures. Foundations of Probabilistic Programming, page 43, 2020.

Robert S. Strichartz. The way of analysis. Jones & Bartlett Learning, 2000.

Subhash Suri. Caching, January 2020. URL https://sites.cs.ucsb.edu/~suri/ccs130a/Caching.pdf.

Joseph Tassarotti and Robert Harper. A separation logic for concurrent randomized programs. Proceedings of the ACM on Programming Languages, 3(POPL):64:1–64:30, 2019. URL https://doi.org/10.1145/3290377.

Jouko Väänänen. Dependence Logic: A New Approach to Independence Friendly Logic. London Mathematical Society Student Texts. Cambridge University Press, 2007. URL https://doi.org/10.1017/CBO9780511611193.

Viktor Vafeiadis and Matthew Parkinson. A marriage of rely/guarantee and separation logic. In CONCUR 2007, Lisbon, Portugal, September 3–8, 2007. Proceedings 18, pages 256–271. Springer, 2007.

John von Neumann. Various techniques used in connection with random digits. Journal of Research of the National Bureau of Standards, Applied Math Series, pages 36–38, 1951.

Jinyi Wang, Yican Sun, Hongfei Fu, Krishnendu Chatterjee, and Amir Kafshdar Goharshady. Quantitative analysis of assertion violations in probabilistic programs. In ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 1171–1186, Virtual, 2021. ACM. URL https://doi.org/10.1145/3453483.3454102.

Yuxin Wang, Zeyu Ding, Guanhong Wang, Daniel Kifer, and Danfeng Zhang. Proving differential privacy with shadow execution. Proceedings of the ACM on Programming Languages, (POPL, 19):655–669, 2019.
Pengbo Yan, Toby Murray, Olga Ohrimenko, Van-Thuan Pham, and Robert Sison. Combining classical and probabilistic independence reasoning to verify the security of oblivious algorithms. In International Symposium on Formal Methods, pages 188–205. Springer, 2024.

Danfeng Zhang and Daniel Kifer. LightDP: Towards automating differential privacy proofs. In ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL), pages 888–901, 2017.

APPENDIX A

BUNCHED LOGIC AND PROBABILISTIC SEPARATION LOGIC

A.1 Proofs related to Bunched Logic

Theorem 2.2.13. XD = (𝑋D, ⊑D, ⊗D, 𝐸D) is a BI frame.

Proof. We show that the defined structure satisfies all the frame conditions.

Commutativity For any 𝜇1 ∈ D(Mem[𝑆]), 𝜇2 ∈ D(Mem[𝑇]),

𝜇 ∈ 𝜇1 ⊗D 𝜇2
⇔ 𝜇(𝑚) = 𝜇1(𝜋𝑆𝑚) · 𝜇2(𝜋𝑇𝑚) for any 𝑚 ∈ Mem[𝑆 ∪ 𝑇]
⇔ 𝜇 ∈ 𝜇2 ⊗D 𝜇1.

Associativity Given 𝜇1 ∈ D(Mem[𝑆]), 𝜇2 ∈ D(Mem[𝑇]), 𝜇3 ∈ D(Mem[𝑅]), for any 𝜇0 ∈ 𝜇1 ⊗D 𝜇2 and 𝜇 ∈ 𝜇0 ⊗D 𝜇3,

𝜇 ∈ 𝜇0 ⊗D 𝜇3
⇔ 𝜇(𝑚) = 𝜇0(𝜋𝑆∪𝑇𝑚) · 𝜇3(𝜋𝑅𝑚) for any 𝑚 ∈ Mem[𝑆 ∪ 𝑇 ∪ 𝑅]
⇔ 𝜇(𝑚) = (𝜇1(𝜋𝑆𝑚) · 𝜇2(𝜋𝑇𝑚)) · 𝜇3(𝜋𝑅𝑚) for any 𝑚 ∈ Mem[𝑆 ∪ 𝑇 ∪ 𝑅]
⇔ 𝜇(𝑚) = 𝜇1(𝜋𝑆𝑚) · (𝜇2(𝜋𝑇𝑚) · 𝜇3(𝜋𝑅𝑚)) for any 𝑚 ∈ Mem[𝑆 ∪ 𝑇 ∪ 𝑅]

Define 𝜇′ = 𝜋𝑇∪𝑅𝜇. Then for any 𝑚 ∈ Mem[𝑇 ∪ 𝑅],

𝜇′(𝑚) = ∑_{𝑚′ ∈ Mem[𝑆]} 𝜇(𝑚′ ⊲⊳ 𝑚)
= ∑_{𝑚′ ∈ Mem[𝑆]} 𝜇1(𝜋𝑆(𝑚′ ⊲⊳ 𝑚)) · (𝜇2(𝜋𝑇(𝑚′ ⊲⊳ 𝑚)) · 𝜇3(𝜋𝑅(𝑚′ ⊲⊳ 𝑚)))
= (∑_{𝑚′ ∈ Mem[𝑆]} 𝜇1(𝑚′)) · (𝜇2(𝜋𝑇𝑚) · 𝜇3(𝜋𝑅𝑚))
= 𝜇2(𝜋𝑇𝑚) · 𝜇3(𝜋𝑅𝑚)

Thus, 𝜇′ ∈ 𝜇2 ⊗D 𝜇3, and furthermore,

𝜇(𝑚) = 𝜇1(𝜋𝑆𝑚) · (𝜇2(𝜋𝑇𝑚) · 𝜇3(𝜋𝑅𝑚)) for any 𝑚 ∈ Mem[𝑆 ∪ 𝑇 ∪ 𝑅]
⇒ 𝜇(𝑚) = 𝜇1(𝜋𝑆𝑚) · 𝜇′(𝜋𝑇∪𝑅𝑚) for any 𝑚 ∈ Mem[𝑆 ∪ 𝑇 ∪ 𝑅]
⇒ 𝜇 ∈ 𝜇1 ⊗D 𝜇′

Unit Existence Given any 𝜇 ∈ D(Mem[𝑆]), for any 𝑚 ∈ Mem[𝑆], (𝜇 ⊗D ⟨⟩)(𝑚) = 𝜇(𝜋𝑆𝑚) · ⟨⟩(𝜋∅𝑚) = 𝜇(𝑚). Thus, 𝜇 ∈ 𝜇 ⊗D ⟨⟩.

Unit Closure 𝐸D is closed under the pre-order as 𝐸D = 𝑋D.
Unit Coherence For any 𝜇𝑥 ∈ D(Mem[𝑆]), 𝜇𝑒 ∈ D(Mem[𝑇]), if 𝜇𝑦 ∈ 𝜇𝑥 ⊗D 𝜇𝑒, then for any 𝑚 ∈ Mem[𝑆 ∪ 𝑇], 𝜇𝑦(𝑚) = 𝜇𝑥(𝜋𝑆𝑚) · 𝜇𝑒(𝜋𝑇𝑚). Thus, for any 𝑚𝑆 ∈ Mem[𝑆], 𝑚𝑇 ∈ Mem[𝑇], 𝜇𝑦(𝑚𝑆 ⊲⊳ 𝑚𝑇) = 𝜇𝑥(𝑚𝑆) · 𝜇𝑒(𝑚𝑇), which implies that

∑_{𝑚′𝑇 ∈ Mem[𝑇]} 𝜇𝑦(𝑚𝑆 ⊲⊳ 𝑚′𝑇) = ∑_{𝑚′𝑇 ∈ Mem[𝑇]} 𝜇𝑥(𝑚𝑆) · 𝜇𝑒(𝑚′𝑇) = 𝜇𝑥(𝑚𝑆).

Therefore, 𝜇𝑥 ⊑D 𝜇𝑦.

Down-Closed If 𝜇𝑧 ∈ 𝜇𝑥 ⊗D 𝜇𝑦, and 𝜇′𝑥 ⊑D 𝜇𝑥, 𝜇′𝑦 ⊑D 𝜇𝑦, then define 𝑋 = dom(𝜇𝑥), 𝑌 = dom(𝜇𝑦), 𝑋′ = dom(𝜇′𝑥), 𝑌′ = dom(𝜇′𝑦), and define 𝜇 = 𝜋𝑋′∪𝑌′𝜇𝑧. The fact that 𝜇𝑧 ∈ 𝜇𝑥 ⊗D 𝜇𝑦 implies that for any 𝑚 ∈ Mem[𝑋 ∪ 𝑌], 𝜇𝑧(𝑚) = 𝜇𝑥(𝜋𝑋𝑚) · 𝜇𝑦(𝜋𝑌𝑚). Thus,

𝜇(𝑚) = (𝜋𝑋′∪𝑌′𝜇𝑧)(𝑚)
= ∑_{𝑚′ ∈ Mem[𝑋∪𝑌 \ (𝑋′∪𝑌′)]} 𝜇𝑧(𝑚′ ⊲⊳ 𝑚)
= ∑_{𝑚′ ∈ Mem[𝑋∪𝑌 \ (𝑋′∪𝑌′)]} 𝜇𝑥(𝜋𝑋(𝑚′ ⊲⊳ 𝑚)) · 𝜇𝑦(𝜋𝑌(𝑚′ ⊲⊳ 𝑚))
= ∑_{𝑚1 ∈ Mem[𝑋\𝑋′]} ∑_{𝑚2 ∈ Mem[𝑌\𝑌′]} 𝜇𝑥(𝜋𝑋(𝑚1 ⊲⊳ 𝑚2 ⊲⊳ 𝑚)) · 𝜇𝑦(𝜋𝑌(𝑚1 ⊲⊳ 𝑚2 ⊲⊳ 𝑚))
= (∑_{𝑚1 ∈ Mem[𝑋\𝑋′]} 𝜇𝑥(𝜋𝑋(𝑚1 ⊲⊳ 𝑚))) · (∑_{𝑚2 ∈ Mem[𝑌\𝑌′]} 𝜇𝑦(𝜋𝑌(𝑚2 ⊲⊳ 𝑚)))
= 𝜋𝑋′𝜇𝑥(𝑚) · 𝜋𝑌′𝜇𝑦(𝑚)
= 𝜇′𝑥(𝑚) · 𝜇′𝑦(𝑚)

Hence, 𝜇 ∈ 𝜇′𝑥 ⊗D 𝜇′𝑦, and by definition, 𝜇 ⊑D 𝜇𝑧. □

Lemma 2.3.2. For any distribution 𝜇 ∈ XD and any set of variables {𝑋𝑖}𝑖∈𝑆, 𝜇 |= ∗𝑖∈𝑆 Own(𝑋𝑖) iff the variables {𝑋𝑖}𝑖∈𝑆 are distinct and mutually independent.

Proof. For the forward direction: 𝜇 |= ∗𝑖∈𝑆 Own(𝑋𝑖) implies 𝜇 |= ∗𝑖∈𝑇 Own(𝑋𝑖) for any subset 𝑇 ⊆ 𝑆. By inductively unfolding the satisfaction definition and applying eq. (Down-Closed), there exists a set of distributions {𝜇𝑖}𝑖∈𝑇 such that

• 𝜇𝑖 |= Own(𝑋𝑖) for any 𝑖 ∈ 𝑇.
• dom(𝜇𝑖) are all disjoint.
• Let 𝑇′ = ∪𝑖∈𝑇 {𝑋𝑖}. For any 𝑚 ∈ Mem[𝑇′], 𝜋𝑇′𝜇(𝑚) = ∏_{𝑖∈𝑇} 𝜇𝑖(𝜋dom(𝜇𝑖)𝑚).

Thus, the first condition implies 𝑋𝑖 ∈ dom(𝜇𝑖) for each 𝑖, and so 𝜇𝑖(𝑋𝑖 = 𝑣) can be evaluated for any 𝑖 ∈ 𝑇 and 𝑣 ∈ Val. Combining this with the third condition, we have that for any set of values {𝑣𝑖 ∈ Val}𝑖∈𝑇,

𝜋𝑇′𝜇(𝑚) = ∏_{𝑖∈𝑇} 𝜇𝑖(𝑋𝑖 = 𝑣𝑖) = ∏_{𝑖∈𝑇} 𝜇(𝑋𝑖 = 𝑣𝑖).

Also, the second condition combined with the third one implies that all 𝑋𝑖 are distinct.

For the backwards direction, we define 𝜇𝑖 = 𝜋𝑋𝑖𝜇 for each 𝑖 ∈ 𝑇. Then clearly, each 𝜇𝑖 satisfies Own(𝑋𝑖).
For convenience, relabel the variables in 𝑇 as 𝑇1, . . . , 𝑇𝑚, and denote ∪_{𝑖=1}^{𝑘} {𝑇𝑖} as 𝑇[: 𝑘]. We prove by induction that 𝜋𝑇[:𝑘]𝜇 |= ∗_{𝑖=1}^{𝑘} Own(𝑇𝑖) for 1 ≤ 𝑘 ≤ 𝑚.

Base: 𝜋𝑇1𝜇 |= Own(𝑇1).

Inductive: The 𝑇𝑖 being mutually independent implies that for any set of values {𝑣𝑖 ∈ Val}1≤𝑖≤𝑚,

𝜇(∧_{𝑖≤𝑘} 𝑇𝑖 = 𝑣𝑖) = ∏_{𝑖≤𝑘} 𝜇(𝑇𝑖 = 𝑣𝑖) = 𝜇(∧_{1≤𝑖<𝑘} 𝑇𝑖 = 𝑣𝑖) · 𝜇(𝑇𝑘 = 𝑣𝑘)

Thus, for any 𝑚 ∈ Mem[∪_{1≤𝑖≤𝑘} {𝑇𝑖}],

𝜋𝑇[:𝑘]𝜇(𝑚) = 𝜋𝑇[:𝑘−1]𝜇(𝜋𝑇[:𝑘−1]𝑚) · 𝜋𝑇𝑘𝜇(𝜋𝑇𝑘𝑚)

and therefore 𝜋𝑇[:𝑘]𝜇 ∈ (𝜋𝑇[:𝑘−1]𝜇) ◦ 𝜋𝑇𝑘𝜇. By the inductive hypothesis, 𝜋𝑇[:𝑘−1]𝜇 |= ∗_{𝑖=1}^{𝑘−1} Own(𝑇𝑖), and by satisfaction, we have 𝜋𝑇[:𝑘]𝜇 |= ∗_{𝑖=1}^{𝑘} Own(𝑇𝑖). □

A.2 Proofs related to Probabilistic Separation Logic

Lemma 2.3.7 (Restriction). Let 𝜇 ∈ D(Mem[𝑆]) and let 𝜑 be a BI formula. Then: 𝜇 |= 𝜑 ⇔ 𝜋FV(𝜑)(𝜇) |= 𝜑.

Proof. The reverse direction follows by persistence. The forward direction follows by induction on 𝜑.

• 𝜑 ≡ ⊤, ⊥, and atomic propositions 𝑝. Trivial.

• 𝜑 ≡ 𝜑1 ∧ 𝜑2. By induction, we have 𝜋FV(𝜑1)𝜇 |= 𝜑1 and 𝜋FV(𝜑2)𝜇 |= 𝜑2. By persistence, we have 𝜋FV(𝜑1∧𝜑2)𝜇 |= 𝜑1 and 𝜋FV(𝜑1∧𝜑2)𝜇 |= 𝜑2, so 𝜋FV(𝜑1∧𝜑2)𝜇 |= 𝜑1 ∧ 𝜑2.

• 𝜑 ≡ 𝜑1 ∨ 𝜑2. By induction, we have 𝜋FV(𝜑𝑖)𝜇 |= 𝜑𝑖 for 𝑖 = 1 or 𝑖 = 2. By Kripke monotonicity, we have 𝜋FV(𝜑1∨𝜑2)𝜇 |= 𝜑𝑖, so 𝜋FV(𝜑1∨𝜑2)𝜇 |= 𝜑1 ∨ 𝜑2.

• 𝜑 ≡ 𝜑1 → 𝜑2. Take any 𝜇′ ⊒ 𝜋FV(𝜑1→𝜑2)𝜇 such that 𝜇′ |= 𝜑1. By the inductive hypothesis, 𝜋FV(𝜑1)𝜇′ |= 𝜑1. Because 𝜇′ ⊒ 𝜋FV(𝜑1→𝜑2)𝜇, we have 𝜋FV(𝜑1→𝜑2)𝜇′ = 𝜋FV(𝜑1→𝜑2)𝜇, and thus 𝜋FV(𝜑1)𝜇′ = 𝜋FV(𝜑1)𝜇. There exists a distribution 𝜇′′ such that dom(𝜇′′) = dom(𝜇) ∪ dom(𝜋FV(𝜑1)𝜇′), 𝜋dom(𝜇)(𝜇′′) = 𝜇, and 𝜋dom(𝜋FV(𝜑1)𝜇′)(𝜇′′) = 𝜋FV(𝜑1)𝜇′. In particular, 𝜇′′ ⊒ 𝜇. By persistence, we have 𝜇′′ |= 𝜑1, and by validity, we have 𝜇′′ |= 𝜑2. By induction, 𝜋FV(𝜑2)(𝜇′′) |= 𝜑2. Since 𝜋FV(𝜑2)(𝜇′′) ⊑ 𝜇′, persistence gives 𝜇′ |= 𝜑2. So, 𝜋FV(𝜑1→𝜑2)𝜇 |= 𝜑1 → 𝜑2 as desired.

• 𝜑 ≡ 𝜑1 ∗ 𝜑2. There exist 𝜇1 and 𝜇2 with 𝜇1 ◦ 𝜇2 ⊑ 𝜇 and 𝜇1 |= 𝜑1 and 𝜇2 |= 𝜑2. By induction, we have 𝜋FV(𝜑1)𝜇1 |= 𝜑1 and 𝜋FV(𝜑2)𝜇2 |= 𝜑2.
By persistence, we have 𝜋FV(𝜑1∗𝜑2)𝜇1 |= 𝜑1 and 𝜋FV(𝜑1∗𝜑2)𝜇2 |= 𝜑2. Now, since 𝜇1 ◦ 𝜇2 is defined, 𝜋FV(𝜑1∗𝜑2)𝜇1 ◦ 𝜋FV(𝜑1∗𝜑2)𝜇2 is defined as well and 𝜋FV(𝜑1∗𝜑2)𝜇1 ◦ 𝜋FV(𝜑1∗𝜑2)𝜇2 ⊑ 𝜋FV(𝜑1∗𝜑2)𝜇. So, 𝜋FV(𝜑1∗𝜑2)𝜇 |= 𝜑1 ∗ 𝜑2 as desired.
• 𝜑 ≡ 𝜑1 −∗ 𝜑2. Take any 𝜇′ such that 𝜇′ ◦ 𝜋FV(𝜑1−∗𝜑2)𝜇 ↓ and 𝜇′ |= 𝜑1. If 𝜇′ ◦ 𝜇 ↓, then 𝜇′ ◦ 𝜇 |= 𝜑2 and by induction, (𝜋FV(𝜑1−∗𝜑2)𝜇′) ◦ (𝜋FV(𝜑1−∗𝜑2)𝜇) |= 𝜑2. Persistence gives 𝜇′ ◦ 𝜋FV(𝜑1−∗𝜑2)𝜇 |= 𝜑2. Otherwise, suppose that 𝜇′ ◦ 𝜇 is not defined. Since 𝜇′ ◦ 𝜋FV(𝜑1−∗𝜑2)𝜇 ↓, it must be the case that ∅ ≠ dom(𝜇′) ∩ dom(𝜇) ⊆ Var \ FV(𝜑1 −∗ 𝜑2); thus, (𝜋FV(𝜑1) (𝜇′) ◦ 𝜇) ↓. By induction, 𝜋FV(𝜑1) (𝜇′) |= 𝜑1 and so 𝜋FV(𝜑1) (𝜇′) ◦ 𝜇 |= 𝜑2. By induction again, (𝜋FV(𝜑1)∩FV(𝜑2)𝜇′) ◦ (𝜋FV(𝜑2)𝜇) |= 𝜑2. By persistence and the fact that the extension is defined, we have 𝜇′ ◦ (𝜋FV(𝜑1−∗𝜑2)𝜇) |= 𝜑2. So, (𝜋FV(𝜑1−∗𝜑2)𝜇) |= 𝜑1 −∗ 𝜑2 as desired. □
Lemma A.2.1 (Soundness for RV, WV, MV [Barthe et al., 2019]). Let 𝜇′ = ⟦𝑐⟧𝜇, and let 𝑅 = RV(𝑐), 𝑊 = WV(𝑐), 𝑆 = Var \ MV(𝑐). Then:
1. Variables outside of MV(𝑐) are not modified: 𝜋𝑆 (𝜇′) = 𝜋𝑆 (𝜇).
2. The sets 𝑅 and 𝑊 are disjoint.
3. There exists 𝑓 : Mem[𝑅] → D(Mem[MV(𝑐)]) with 𝜇′ = bind(𝜇, 𝑚 ↦→ 𝑓 (𝜋𝑅 (𝑚)) ⊗ unit(𝜋𝑆 (𝑚))).
Theorem 2.3.6 (Soundness). If ⊢ {𝜑} 𝑐 {𝜓} is derivable, then |= {𝜑} 𝑐 {𝜓}.
Proof. By induction on the derivation. Let 𝜇 satisfy the pre-condition of the conclusion.
SKIP Trivial.
SEQ By the induction hypothesis.
DASSN By induction on the syntax of 𝜑.
RASSN The output distribution is ⟦𝑥 ← 𝑒⟧𝜇 = bind(𝜇, 𝑚 ↦→ unit(𝑚 [𝑥 ↦→ ⟦𝑒⟧(𝑚)])). Because 𝑥 ∉ FV(𝑒), for any 𝑚, ⟦𝑥⟧(𝑚 [𝑥 ↦→ ⟦𝑒⟧(𝑚)]) = ⟦𝑒⟧(𝑚 [𝑥 ↦→ ⟦𝑒⟧(𝑚)]). Thus, ⟦𝑥 ← 𝑒⟧𝜇 |= [𝑥 = 𝑒].
SAMP The output distribution is ⟦𝑥 $← 𝑑⟧(𝜇) = bind(𝜇, 𝑚 ↦→ bind(𝑑, 𝑣 ↦→ unit(𝑚 [𝑥 ↦→ 𝑣]))). Thus,
⟦𝑥⟧(⟦𝑥 $← 𝑑⟧(𝜇)) = bind(⟦𝑥 $← 𝑑⟧(𝜇), 𝑚 ↦→ ⟦𝑥⟧(𝑚))
= bind(bind(𝜇, 𝑚 ↦→ bind(𝑑, 𝑣 ↦→ unit(𝑚 [𝑥 ↦→ 𝑣]))), 𝑚 ↦→ ⟦𝑥⟧(𝑚))
= bind(𝜇, 𝑚 ↦→ bind(𝑑, 𝑣 ↦→ unit(𝑣)))
= bind(𝜇, 𝑚 ↦→ 𝑑) = 𝑑.
Therefore, ⟦𝑥 $← 𝑑⟧(𝜇) |= 𝑥 $∼ 𝑑.
COND Since 𝜇 |= 𝜑 and |= 𝜑 → Detm⟨𝑏⟩, we have 𝜇 |= Detm⟨𝑏⟩.
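The SAMP case above is a purely monadic computation: pushing ⟦𝑥 $← 𝑑⟧ through bind shows the marginal of 𝑥 in the output is exactly 𝑑, regardless of the input distribution. A small sketch of the distribution monad reproduces the calculation numerically (illustrative only; finite distributions as dicts, memories as sorted tuples of pairs, helper names hypothetical).

```python
def upd(m, x, v):
    """Functional update of a memory, kept as a sorted tuple so it is hashable."""
    d = dict(m)
    d[x] = v
    return tuple(sorted(d.items()))

def unit(m):
    return {m: 1.0}

def bind(mu, f):
    out = {}
    for m, p in mu.items():
        for m2, q in f(m).items():
            out[m2] = out.get(m2, 0.0) + p * q
    return out

def sample(mu, x, d):
    # [[ x <-$ d ]](mu) = bind(mu, m -> bind(d, v -> unit(m[x -> v])))
    return bind(mu, lambda m: bind(d, lambda v: unit(upd(m, x, v))))

def marg(mu, x):
    out = {}
    for m, p in mu.items():
        v = dict(m)[x]
        out[v] = out.get(v, 0.0) + p
    return out

mu0 = {(('x', 0), ('y', 0)): 0.7, (('x', 1), ('y', 1)): 0.3}  # x, y start correlated
coin = {0: 0.5, 1: 0.5}
mu1 = sample(mu0, 'x', coin)
x_marg = marg(mu1, 'x')  # the fair coin, regardless of mu0
```

After the resampling, 𝑥 is also independent of 𝑦: each joint mass in `mu1` is the product of the coin mass on 𝑥 and the original mass on 𝑦.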
Thus, either 𝜇 |= [𝑏 = tt] or 𝜇 |= [𝑏 = ff ]. Note that exactly one case holds. If 𝜇 |= [𝑏 = tt] holds, then 𝜇 |= 𝜑 ∧ [𝑏 = tt] and thus ⟦if 𝑏 then 𝑐 else 𝑐′⟧(𝜇) = ⟦𝑐⟧(𝜇), we can conclude by induction. The case 𝜇 |= [𝑏 = ff ] is similar. RCOND Because 𝜇 |= 𝜑 ∗ Own(𝑏), there exist 𝜇1, 𝜇2 such that 𝜇1 ◦ 𝜇2 ⊑ 𝜇, and 𝜇1 |= 𝜑 and 𝜇2 |= Own(𝑏). Let 𝜌 be the probability ⟦𝑏⟧(𝜇2) (tt). We may assume that 𝜌 ∈ (0, 1) — if 𝜌 is equal to zero or one then we can conclude by induction. By the semantics of commands, we have ⟦if 𝑏 then 𝑐 else 𝑐′⟧(𝜇) = 𝜌 · ⟦𝑐⟧(𝜇𝑡) + (1 − 𝜌) · ⟦𝑐′⟧(𝜇 𝑓 ) where 𝜇𝑡 is the distribution 𝜇 conditioned on 𝑏 = tt, and 𝜇 𝑓 is the distribu- tion 𝜇 conditioned on 𝑏 = ff . Recall that by the induction hypothesis, we have: ⟦𝑐⟧(𝜇𝑡) |= 𝜓 ∗ [𝑏 = tt] and ⟦𝑐′⟧(𝜇 𝑓 ) |= 𝜓 ∗ [𝑏 = ff ] . Thus, we can decompose the output states into 𝜈 ◦ 𝜈𝑡 ⊑ ⟦𝑐⟧(𝜇𝑡) and 𝜈 ◦ 𝜈 𝑓 ⊑ ⟦𝑐⟧(𝜇 𝑓 ) such that 𝜈 |= 𝜓 and 𝜈𝑡 |= [𝑏 = tt] and 𝜈 𝑓 |= [𝑏 = ff ] noting that 𝜈 can be taken to be the same in both branches since 𝜓 ∈ SP; by lemma 2.3.7, we may also assume that dom(𝜈𝑡) = dom(𝜈 𝑓 ). Thus, we 238 have: 𝜌 · 𝜈 ◦ 𝜈𝑡 + (1 − 𝜌) · 𝜈 ◦ 𝜈 𝑓 = 𝜌 · (𝜈 ⊗ 𝜈𝑡) + (1 − 𝜌) · (𝜈 ⊗ 𝜈 𝑓 ) = 𝜈 ⊗ (𝜈𝑡 ⊕𝜌 𝜈 𝑓 ) = 𝜈 ◦ (𝜈𝑡 ⊕𝜌 𝜈 𝑓 ) ⊑ ⟦if 𝑏 then 𝑐 else 𝑐′⟧(𝜇), and we can conclude since 𝜈 |= 𝜓 and 𝜈𝑡 ⊕𝜌 𝜈 𝑓 |= Own(𝑏). LOOP For any 𝜇 |= 𝜑, the side condition implies 𝜇 |= Detm⟨𝑏⟩. We show by induction that 𝜇 |= 𝜑 implies that for any 𝑛 > 0, ⟦(if 𝑏 then 𝑐)𝑛⟧(𝜇) |= 𝜑 ∧ Detm⟨𝑏⟩. Base case: ⟦(if 𝑏 then 𝑐)0⟧(𝜇) = 𝜇, so it satisfies 𝜑. By side condition that |= 𝜑→ Detm⟨𝑏⟩, we have ⟦(if 𝑏 then 𝑐)0⟧(𝜇) |= 𝜑 ∧ Detm⟨𝑏⟩. Inductive case: Say 𝜇′ = ⟦(if 𝑏 then 𝑐)𝑛⟧(𝜇). By inductive hypothesis, 𝜇′ |= 𝜑 ∧ Detm⟨𝑏⟩, there are two possibilities: • 𝜇′ |= 𝜑 ∧ [𝑏 = ff ], then ⟦(if 𝑏 then 𝑐)𝑛+1⟧(𝜇) = ⟦if 𝑏 then 𝑐⟧(𝜇′) = 𝜇′, which implies that ⟦(if 𝑏 then 𝑐)𝑛+1⟧(𝜇) |= 𝜑 ∧ [𝑏 = ff ]. • 𝜇′ |= 𝜑 ∧ [𝑏 = tt], then ⟦(if 𝑏 then 𝑐)𝑛+1⟧(𝜇) = ⟦if 𝑏 then 𝑐⟧(𝜇′) = ⟦𝑐⟧(𝜇′), which implies that ⟦(if 𝑏 then 𝑐)𝑛+1⟧(𝜇) |= 𝜑 because ⊢ {𝜑 ∧ [𝑏 = tt]} 𝑐 {𝜑}. 
Since |= 𝜑 → Detm⟨𝑏⟩, we also have ⟦(if 𝑏 then 𝑐)𝑛+1⟧(𝜇) |= 𝜑 ∧ Detm⟨𝑏⟩.
In both cases, ⟦(if 𝑏 then 𝑐)𝑛+1⟧(𝜇) |= 𝜑 ∧ Detm⟨𝑏⟩.
Since we assumed that the loop terminates in finitely many steps, there exists a finite number 𝑁 such that ⟦(if 𝑏 then 𝑐)𝑁⟧(𝜇) |= 𝜑 ∧ [𝑏 = ff ] (and also ⟦(if 𝑏 then 𝑐)𝑁−1⟧(𝜇) |= 𝜑 ∧ [𝑏 = tt] if 𝑁 > 1, but this fact is not used in this proof). Then ⟦while 𝑏 do 𝑐⟧(𝜇) = ⟦(if 𝑏 then 𝑐)𝑁⟧(𝜇) |= 𝜑 ∧ [𝑏 = ff ].
WEAK By the induction hypothesis and the semantics of implication.
TRUE Trivial.
CONJ By the induction hypothesis and the semantics of conjunction.
CASE By case analysis.
CONST The fact that ⟦𝑐⟧(𝜇) |= 𝜓 follows by induction. To show ⟦𝑐⟧(𝜇) |= 𝜂, by the restriction property we have 𝜋FV(𝜂) (𝜇) |= 𝜂 initially, and since the free variables of 𝜂 are disjoint from the modified variables of 𝑐, we have 𝜋FV(𝜂) (⟦𝑐⟧𝜇) |= 𝜂 as well. Thus, by monotonicity, ⟦𝑐⟧(𝜇) |= 𝜂 as desired.
FRAME There exist 𝜇1, 𝜇2 such that 𝜇1 ◦ 𝜇2 ⊑ 𝜇, and 𝜇1 |= 𝜑 and 𝜇2 |= 𝜂; let 𝑆1 ≜ dom(𝜇1), and note that 𝑇 ∪ RV(𝑐) ⊆ 𝑆1 by the last side-condition. By the restriction property we have 𝜋FV(𝜂) (𝜇2) |= 𝜂; let 𝑆2 ≜ dom(𝜇2) ∩ FV(𝜂) and note that 𝑆1 and 𝑆2 are disjoint. Let 𝑆3 be the set of all variables not contained in 𝑆1 or 𝑆2. Since WV(𝑐) is disjoint from 𝑆2 by the first side-condition, we must have WV(𝑐) ⊆ 𝑆1 ∪ 𝑆3.
By induction, we have ⟦𝑐⟧(𝜇) |= 𝜓. The restriction property gives 𝜋FV(𝜓) (⟦𝑐⟧(𝜇)) |= 𝜓. By the third side-condition, RV(𝑐) ⊆ 𝑆1. By soundness of RV and WV, all variables in WV(𝑐) must be written to before they are read, and there is a function 𝐹 : Mem[𝑆1] → D(Mem[WV(𝑐) ∪ 𝑆1]) such that:
𝜋WV(𝑐)∪𝑆1 (⟦𝑐⟧𝜇) = bind(𝜇, 𝑚 ↦→ 𝐹 (𝜋𝑆1 (𝑚))).
Since 𝑆2 ⊆ FV(𝜂), variables in 𝑆2 are not in MV(𝑐) by the first side-condition, and 𝑆2 is disjoint from WV(𝑐) ∪ 𝑆1. By soundness of MV, we have:
𝜋(WV(𝑐)∪𝑆1)∪𝑆2 (⟦𝑐⟧𝜇) = bind(𝜋(WV(𝑐)∪𝑆1)∪𝑆2 (𝜇), (𝑚1, 𝑚2) ↦→ 𝐹 (𝑚1) ⊗ unit(𝑚2)).
Since 𝑆1 and 𝑆2 are independent in 𝜇, we know that 𝑆1 ∪ WV(𝑐) and 𝑆2 are independent in ⟦𝑐⟧(𝜇) as well.
Hence: ⟦𝑐⟧𝜇 ⊒ (𝜋𝑆1∪WV(𝑐) (⟦𝑐⟧𝜇)) ◦ (𝜋𝑆2 (⟦𝑐⟧𝜇)). We know that 𝐹𝑉 (𝜓) ⊆ 𝑇 ∪ WV(𝑐) ⊆ 𝑆1 ∪ WV(𝑐) so since 𝜓 is valid in ⟦𝑐⟧(𝜇), it is valid in the first conjunct by the restriction property and the second side-condition. Since 𝜋𝑆2 (⟦𝑐⟧𝜇) = 𝜋𝑆2 (𝜇), and 𝜂 does not de- pend on modified deterministic variables, 𝜂 is valid in the second conjunct. Thus, we can conclude: ⟦𝑐⟧(𝜇) |= 𝜓 ∗ 𝜂. □ 241 APPENDIX B LINA: A SEPARATION LOGIC FOR NEGATIVE DEPENDENCE B.1 Preliminaries Lemma B.1.1. Say S = {𝑆𝑖 | 1 ≤ 𝑖 ≤ 𝑁} where 𝑆𝑖 are disjoint, 𝑆 = ∪S and 𝜇 ∈ Mem[𝑆], Then, 𝑆𝑖 are independent in 𝜇 if and only if for any family of all monotone or all antitone functions 𝑓𝑖 : Mem[𝑆𝑖] → R+, E𝑥∼𝜇 [∏ 𝑆𝑖∈S 𝑓𝑖 (𝜋𝑆𝑖𝑥) ] = ∏ 𝑆𝑖∈S E𝑥∼𝜇 [ 𝑓𝑖 (𝜋𝑆𝑖𝑥)] . (B.1) Proof. The forward direction is straightforward. The backward direction needs more careful analysis. In general, zero correlation does not imply independence, but here, we have the equality for all family of monotone or antitone functions, so that suffices for independence. We prove by induction on T = {𝑆𝑖 | 1 ≤ 𝑖 ≤ 𝐾} that for any family of 𝑣𝑖 ∈ Mem[𝑆𝑖], E𝑥∼𝜇  ( ∧ 𝑆𝑖∈T 𝜋𝑆𝑖𝑥 = 𝑣𝑖 ) ∧ ©­« ∧ 𝑆𝑖∈S\T 𝜋𝑆𝑖𝑥 < 𝑣𝑖 ª®¬  = ∏ 𝑆𝑖∈T E𝑥∼𝜇 [ 𝜋𝑆𝑖𝑥 = 𝑣𝑖 ] · ∏ 𝑆𝑖∈S\T E𝑥∼𝜇 [ 𝜋𝑆𝑖𝑥 < 𝑣𝑖 ] . (B.2) Case |T | = 1: Say T = {𝑆 𝑗 }. Since indicator functions 𝑆𝑖 < 𝑣𝑖 and 𝑆𝑖 ≤ 𝑣𝑖 are 242 both monotonically decreasing, E𝑥∼𝜇 𝜋𝑆 𝑗𝑥 = 𝑣 𝑗 ∧ ( ∧ 𝑆𝑖∈S\T 𝜋𝑆𝑖𝑥 < 𝑣𝑖)  = E𝑥∼𝜇 𝜋𝑆 𝑗𝑥 ≤ 𝑣 𝑗 ∧ ( ∧ 𝑆𝑖∈S\T 𝜋𝑆𝑖𝑥 < 𝑣𝑖)  − E𝑥∼𝜇 𝜋𝑆 𝑗𝑥 < 𝑣 𝑗 ∧ ( ∧ 𝑆𝑖∈S\T 𝜋𝑆𝑖𝑥 < 𝑣𝑖)  = E𝑥∼𝜇 [ 𝜋𝑆 𝑗𝑥 ≤ 𝑣 𝑗 ] · ∏ 𝑆𝑖∈S\T E𝑥∼𝜇 [ 𝜋𝑆𝑖𝑥 < 𝑣𝑖 ] − E𝑥∼𝜇 [ 𝜋𝑆 𝑗𝑥 < 𝑣 𝑗 ] · ∏ 𝑆𝑖∈S\T E𝑥∼𝜇 [ 𝜋𝑆𝑖𝑥 < 𝑣𝑖 ] (By Equation (B.1)) = (E𝑥∼𝜇 [ 𝜋𝑆 𝑗𝑥 ≤ 𝑣 𝑗 ] − E𝑥∼𝜇 [ 𝜋𝑆 𝑗𝑥 < 𝑣 𝑗 ] ) · ∏ 𝑆𝑖∈S\T E𝑥∼𝜇 [ 𝜋𝑆𝑖𝑥 < 𝑣𝑖 ] = E𝑥∼𝜇 [ 𝜋𝑆 𝑗𝑥 = 𝑣 𝑗 ] · ∏ 𝑆𝑖∈S\T E𝑥∼𝜇 [ 𝜋𝑆𝑖𝑥 < 𝑣𝑖 ] Case |T | > 1 Let 𝑆 𝑗 be an element in T . 
E𝑥∼𝜇 ( ∧ 𝑆𝑖∈T 𝜋𝑆𝑖𝑥 = 𝑣𝑖) ∧ ( ∧ 𝑆𝑖∈S\T 𝜋𝑆𝑖𝑥 < 𝑣𝑖)  = E𝑥∼𝜇 𝜋𝑆 𝑗𝑥 ≤ 𝑣 𝑗 ∧ ( ∧ 𝑆𝑖∈T\{𝑆 𝑗 } 𝜋𝑆𝑖𝑥 = 𝑣𝑖) ∧ ( ∧ 𝑆𝑖∈S\T 𝜋𝑆𝑖𝑥 < 𝑣𝑖)  − E𝑥∼𝜇 𝜋𝑆 𝑗𝑥 < 𝑣 𝑗 ∧ ( ∧ 𝑆𝑖∈T\{𝑆 𝑗 } 𝜋𝑆𝑖𝑥 = 𝑣𝑖) ∧ ( ∧ 𝑆𝑖∈S\T 𝜋𝑆𝑖𝑥 < 𝑣𝑖)  = E𝑥∼𝜇 [ 𝜋𝑆 𝑗𝑥 ≤ 𝑣 𝑗 ] · ∏ 𝑆𝑖∈T\{𝑆 𝑗 } E𝑥∼𝜇 [ 𝜋𝑆𝑖𝑥 = 𝑣𝑖 ] · ∏ 𝑆𝑖∈S\T E𝑥∼𝜇 [ 𝜋𝑆𝑖𝑥 < 𝑣𝑖 ] − E𝑥∼𝜇 [ 𝜋𝑆 𝑗𝑥 < 𝑣 𝑗 ] · ∏ 𝑆𝑖∈T\{𝑆 𝑗 } E𝑥∼𝜇 [ 𝜋𝑆𝑖𝑥 = 𝑣𝑖 ] · ∏ 𝑆𝑖∈S\T E𝑥∼𝜇 [ 𝜋𝑆𝑖𝑥 < 𝑣𝑖 ] = E𝑥∼𝜇 [ 𝜋𝑆 𝑗𝑥 = 𝑣 𝑗 ] · ∏ 𝑆𝑖∈T\{𝑆 𝑗 } E𝑥∼𝜇 [ 𝜋𝑆𝑖𝑥 = 𝑣𝑖 ] · ∏ 𝑆𝑖∈S\T E𝑥∼𝜇 [ 𝜋𝑆𝑖𝑥 < 𝑣𝑖 ] When T = S, Equation (B.2) implies E𝑥∼𝜇 [∧ 𝑆𝑖∈S 𝜋𝑆𝑖𝑥 = 𝑣𝑖 ] = ∏ 𝑆𝑖∈S E𝑥∼𝜇 [ 𝜋𝑆𝑖𝑥 = 𝑣𝑖 ] for any 𝑣𝑖’s. Thus, components in S are independent. □ 243 We prove some properties of coarsening. In the following we will use an alternative definition of coarsening, which will be shown to be equivalent to what we define in the main text. Definition B.1.1 (Alternative definition of coarsening). We first index any par- tition S as S1, . . . ,S|S|. Say |S′| = 𝑚, |S| = 𝑛. We say S′ coarsens a partition S there exists a function a 𝑓 : [𝑚] → P([𝑛]) such that 1) ∪𝑖∈[𝑚] 𝑓 (𝑖) = [𝑛]; 2) for any 𝑖, 𝑗 ∈ [𝑚], either 𝑖 = 𝑗 or 𝑓 (𝑖), 𝑓 ( 𝑗) are disjoint; 3) S′ = {∪{S 𝑗 | 𝑗 ∈ 𝑓 (𝑖)} | 𝑖 ∈ [𝑚]}. Lemma B.1.2. Let S, S′ be two partitions. Then S′ coarsens S according to Defini- tion B.1.1 if and only if S′ coarsens S according to Definition 3.3.4 . Proof. We index S as S1, . . . ,S𝑛 and S′ as S′1, . . . ,S ′ |𝑚 | . Backward direction: By that definition, we know a) for any S′ 𝑖 ∈ S′, S′ 𝑖 = ∪R for some R ⊆ S; b) ∪S = ∪S′. We define the function 𝑔 : [𝑚] → P([𝑛]) as 𝑔(𝑖) = { 𝑗 | S 𝑗 ⊆ S′𝑖 }. This 𝑔 would satisfies all the conditions required: 1. By substitution, ∪𝑖∈[𝑚]𝑔(𝑖) = ∪𝑖∈[𝑚]{ 𝑗 | S 𝑗 ⊆ S′𝑖 } = ∪𝑠′∈S′{ 𝑗 | S 𝑗 ⊆ 𝑠′}. By b), for any 𝑗 ∈ [𝑛], S 𝑗 ⊆ ∪S′. Then by a) and that S is a partition, if 𝑠′ covers any of S 𝑗 , it must covers all of S 𝑗 , then S 𝑗 ⊆ ∪S′ implies there exists 𝑠′ ∈ S′ such that S 𝑗 ⊆ 𝑠′. Thus, 𝑗 ∈ { 𝑗 | S 𝑗 ⊆ 𝑠′} ⊆ ∪𝑠′∈S′{ 𝑗 | S 𝑗 ⊆ 𝑠′}. 
For any 𝑗 ∉ [𝑛], S 𝑗 is undefined, so it is impossible that S 𝑗 ⊆ 𝑠′ for some 𝑠′ ⊆ S′. Therefore, ∪𝑠′∈S′{ 𝑗 | S 𝑗 ⊆ 𝑠′} = [𝑛]. 2. For any 𝑘 ∈ 𝑔(𝑖), S𝑘 ⊆ S′𝑖 . If 𝑖 ≠ 𝑗 , then S′ 𝑖 and S′ 𝑗 are disjoint since S′ is a partition. Thus, S𝑘 ⊈ S′ 𝑗 , and 𝑘 ∉ 𝑔( 𝑗). So for any 𝑖 ≠ 𝑗 , 𝑔(𝑖), 𝑔( 𝑗) are disjoint. 244 3. By substitution, {∪{S 𝑗 | 𝑗 ∈ 𝑔(𝑖)} | 𝑖 ∈ [𝑚]} = {∪{S 𝑗 | S 𝑗 ⊆ S′𝑖 } | 𝑖 ∈ [𝑚]} = {∪{S 𝑗 | S 𝑗 ⊆ 𝑠′} | 𝑠′ ∈ S′}. Again, by a) and that S is a partition, if 𝑠′ ∈ S covers any part of of S 𝑗 , it must covers all of S 𝑗 , so ∪{S 𝑗 | S 𝑗 ⊆ 𝑠′} = 𝑠′. Thus, {∪{S 𝑗 | S 𝑗 ⊆ 𝑠′} | 𝑠′ ∈ S′} = S′. Forward direction: By 3), we know that S′ = {∪{S 𝑗 | 𝑗 ∈ 𝑓 (𝑖)} | 𝑖 ∈ [𝑚]}. So for any S′ 𝑖 ∈ S′, we have 𝑠′ = ∪{S 𝑗 | 𝑗 ∈ 𝑓 (𝑖)}, which is a subset of S′ by construction. So we proved a). Also, ∪S′ = ∪{∪{S 𝑗 | 𝑗 ∈ 𝑓 (𝑖)} | 𝑖 ∈ [𝑚]} = ∪{S 𝑗 | 𝑗 ∈ 𝑓 (𝑖) | 𝑖 ∈ [𝑚]}, and by 1), that is equivalent to ∪{S 𝑗 | 𝑗 ∈ [𝑛]}, which is equivalent to ∪S. □ We can prove that coarsening commute with projections. Lemma B.1.3. Given a partition S = {S𝑖}𝑖 and a set 𝑋 , let S𝑋 = {S𝑖 ∩ 𝑋 | S𝑖 ∈ S}. For any T coarsening S𝑋 , there exists a coarsening S′ of S such that T = {S𝑖 ∩ 𝑋 | S𝑖 ∈ S′}; conversely, for any S′ coarsening S, and S′ 𝑋 = {S𝑖 ∩ 𝑋 | S𝑖 ∈ S′}, we have S′ 𝑋 coarsens S𝑋 . Proof. Forward direction: By Definition B.1.1, there exists a coarsening function 𝑓 such that T = {∪{(S𝑋) 𝑗 | 𝑗 ∈ 𝑓 (𝑖)} | 𝑖 ∈ [|T |]} = {∪{S 𝑗 ∩ 𝑋 | 𝑗 ∈ 𝑓 (𝑖)} | 𝑖 ∈ [|T |]} = {(∪{S 𝑗 | 𝑗 ∈ 𝑓 (𝑖)}) ∩ 𝑋 | 𝑖 ∈ [|T |]} = {𝑆′ ∩ 𝑋 | 𝑆′ ∈ S′} (where S′ = {∪{S 𝑗 | 𝑗 ∈ 𝑓 (𝑖)} | 𝑖 ∈ [|T |]}) 245 S′ has the same size as T , so S′ = {∪{S 𝑗 | 𝑗 ∈ 𝑓 (𝑖)} | 𝑖 ∈ [|S′|]}, and thus S′ coarsens S. Backward direction: S′ coarsens S, so there exists a coarsening function 𝑓 such that S′ = {∪{𝑆 𝑗 | 𝑗 ∈ 𝑓 (𝑖)} | 𝑖 ∈ [|S′|]}. Thus, S′𝑋 = {(∪{𝑆 𝑗 | 𝑗 ∈ 𝑓 (𝑖)}) ∩ 𝑋 | 𝑖 ∈ [|S′|]} = {∪{𝑆 𝑗 ∩ 𝑋 | 𝑗 ∈ 𝑓 (𝑖)} | 𝑖 ∈ [|S′|]} = {∪{𝑆𝑋 𝑗 | 𝑗 ∈ 𝑓 (𝑖)} | 𝑖 ∈ [|S′|]}. Therefore, S′ 𝑋 coarsens S𝑋 . 
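The index-level coarsening function of Definition B.1.1, used throughout the two lemmas above, is finitely checkable. A minimal sketch (illustrative only; partitions as lists of frozensets, indices 0-based, all helper names hypothetical):

```python
def is_coarsening(S, Sp, f):
    """Check Definition B.1.1: Sp coarsens S via f : index of Sp -> set of indices of S.
    Conditions: (1) the sets f(i) jointly cover all of [n];
    (2) distinct indices get disjoint index sets;
    (3) each block of Sp is the union of the blocks of S selected by f."""
    m, n = len(Sp), len(S)
    covered = set().union(*(f(i) for i in range(m)))
    cond1 = covered == set(range(n))
    cond2 = all(f(i).isdisjoint(f(j)) for i in range(m) for j in range(m) if i != j)
    cond3 = {frozenset().union(*(S[j] for j in f(i))) for i in range(m)} == set(Sp)
    return cond1 and cond2 and cond3

S = [frozenset({'a'}), frozenset({'b'}), frozenset({'c'})]
Sp = [frozenset({'a', 'b'}), frozenset({'c'})]
ok = is_coarsening(S, Sp, lambda i: {0, 1} if i == 0 else {2})
bad = is_coarsening(S, Sp, lambda i: {0} if i == 0 else {2})  # misses block 1
```

The second candidate function fails both the coverage condition (1) and the union condition (3), matching the role these conditions play in the proof of Lemma B.1.2.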
□ B.2 A BI Frame for Negative Association B.2.1 Capturing Negative Association Theorem 3.3.2. For any two states 𝜇1, 𝜇2 ∈ 𝑋 , 𝜇1 ⊕𝑠 𝜇2 ⊆ 𝜇1 ⊕ 𝜇2 ⊆ 𝜇1 ⊕𝑤 𝜇2. Proof. Let 𝑆 denote dom(𝜇1) and 𝑇 denote dom(𝜇2). For any 𝜇 ∈ 𝜇1 ⊕𝑠 𝜇2, we have 𝜋𝑆𝜇 = 𝜇1, 𝜋𝑇𝜇 = 𝜇2, and 𝜇 satisfies NA. 𝜇 being NA implies 𝜇 is R-PNA for any partition R on dom(𝜇) So for any partition S on 𝑆, partition T on 𝑇 , 𝜇 is S ∪ T -PNA. Therefore, 𝜇 ∈ 𝜇1 ⊕ 𝜇2. 246 For any 𝜇 ∈ 𝜇1 ⊕ 𝜇2, 𝜋𝑆𝜇 = 𝜇1, 𝜋𝑇𝜇 = 𝜇2, and 𝜇 is {𝑆, 𝑇}-PNA since 𝜇1 is {𝑆}-PNA, 𝜇2 is {𝑇}-PNA. Thus, 𝜇 ∈ 𝜇1 ⊕𝑤 𝜇2. □ Theorem 3.3.1. Given a set of variables 𝑆, 𝑆 satisfies NA in 𝜇 iff 𝜇 satisfies S-PNA for any S partitioning 𝑆 iff 𝜇 satisfies {{𝑥} | 𝑥 ∈ 𝑆}-PNA. Proof. The second equivalence is straightforward: • {{𝑠} | 𝑠 ∈ S} is a partition of 𝑆, so we have the backward direction. • Any S partitioning 𝑆 coarsens {{𝑠} | 𝑠 ∈ 𝑆}, so we have the first direction. For the forward direction of the first equivalence, it suffices to prove that for any partition S of 𝑆, any family of all monotone or all antitone functions 𝑓𝑖 : Mem[𝑆𝑖] → R+, E𝑚∼𝜇 [∏ 𝑆𝑖∈S 𝑓𝑖 (𝜋𝑆𝑖𝑚) ] ≤ ∏ 𝑆𝑖∈S E𝑚∼𝜇 [ 𝑓𝑖 (𝜋𝑆𝑖𝑚) ] . (B.3) We prove that by induction on the size of S. Base case |S| = 1: S-PNA is trivial. Base case |S| = 2: S-PNA is straightforward from NA. Inductive case: Assuming 𝜇 satisfies S-PNA for any partition with size less than 𝐾 , we want to show that 𝜇 satisfies S-PNA for any partition with size equals to 𝐾 . Say S = {𝑆1, . . . , 𝑆𝐾}. For any family of all monotone or all antitone func- tions 𝑓𝑖 : Mem[𝑆𝑖] → R+, either both 𝑚 ↦→∏𝐾−1 𝑖=1 𝑓𝑖 (𝜋𝑆𝑖𝑚) and 𝑓𝐾 are mono- tone, or both 𝑚 ↦→ ∏𝐾−1 𝑖=1 𝑓𝑖 (𝜋𝑆𝑖𝑚) and 𝑓𝐾 are antitone. Thus, by the induc- 247 tive hypothesis E𝑥∼𝜇 [ 𝐾∏ 𝑖=1 𝑓𝑖 (𝜋𝑆𝑖𝑚) ] ≤ E𝑥∼𝜇 [ 𝐾−1∏ 𝑖=1 𝑓𝑖 (𝜋𝑆𝑖𝑚) ] · E𝑥∼𝜇 [ 𝑓𝐾 (𝜋𝑆𝐾𝑚) ] (We can partition ∪𝐾 𝑖=1𝑆𝑖 into {∪𝐾−1 𝑖=1 𝑆𝑖, 𝑆𝐾}) ≤ 𝐾−1∏ 𝑖=1 E𝑥∼𝜇 [ 𝑓𝑖 (𝜋𝑆𝑖𝑚) ] · E𝑥∼𝜇 [ 𝑓𝐾 (𝜋𝑆𝐾𝑚) ] (B.4) (We can partition ∪𝐾 𝑖=1𝑆𝑖 into {𝑆1, . . . , 𝑆𝐾}) = 𝐾∏ 𝑖=1 E𝑥∼𝜇 [ 𝑓𝑖 (𝜋𝑆𝑖𝑚) ] . 
(B.5) The backward direction of the first equivalence is more involved. For any two disjoint 𝐴, 𝐵 ⊆ 𝑆, we know 𝜇 satisfies {𝐴, 𝐵}-PNA, so for every pair of both monotone or both antitone functions 𝑓 : Mem[𝐴] → R+, 𝑔 : Mem[𝐵] → R+, we have E𝑚∼𝜇 [ 𝑓 (𝜋𝐴𝑚) · 𝑔(𝜋𝐵𝑚)] ≤ E𝑚∼𝜇 [ 𝑓 (𝜋𝐴𝑚)] · E𝑚∼𝜇 [𝑔(𝜋𝐵𝑚)] . But the problem is to show this inequality when 𝑓 , 𝑔 are not both non-negative. We prove that in three steps: 1. If 𝑓 , 𝑔 are lower-bounded by −𝐿, i.e., 𝑓 (𝑥) ≥ −𝐿 and 𝑔(𝑥) ≥ −𝐿 for any 𝑥. Then 𝑥 → 𝑓 (𝑥) +𝐿 and 𝑥 → 𝑔(𝑥) +𝐿 are both non-negative functions. Thus, E𝑚∼𝜇 [( 𝑓 (𝜋𝐴𝑚) + 𝐿) · (𝑔(𝜋𝐵𝑚) + 𝐿)] ≤ E𝑚∼𝜇 [ 𝑓 (𝜋𝐴𝑚) + 𝐿] · E𝑚∼𝜇 [𝑔(𝜋𝐵𝑚) + 𝐿] . (B.6) 248 Meanwhile, E[( 𝑓 (𝜋𝐴𝑚) + 𝐿) · (𝑔(𝜋𝐵𝑚) + 𝐿)] = E[ 𝑓 (𝜋𝐴𝑚) · 𝑔(𝜋𝐵𝑚)] + 𝐿 · E[ 𝑓 (𝜋𝐴𝑚)] + 𝐿 · E[𝑔(𝜋𝐵𝑚)] + 𝐿2 E[ 𝑓 (𝜋𝐴𝑚) + 𝐿] · E[𝑔(𝜋𝐵𝑚) + 𝐿] = (E[ 𝑓 (𝜋𝐴𝑚)] + 𝐿) · (E[𝑔(𝜋𝐵𝑚)] + 𝐿) = E[ 𝑓 (𝜋𝐴𝑚)] · E[𝑔(𝜋𝐵𝑚)] + 𝐿 · E[ 𝑓 (𝜋𝐴𝑚)] + 𝐿 · E[𝑔(𝜋𝐵𝑚)] + 𝐿2. So Equation (B.6) implies that E𝑚∼𝜇 [ 𝑓 (𝜋𝐴𝑚) · 𝑔(𝜋𝐵𝑚)] ≤ E𝑚∼𝜇 [ 𝑓 (𝜋𝐴𝑚)] · E𝑚∼𝜇 [𝑔(𝜋𝐵𝑚)] . 2. If the codomain of 𝑓 or 𝑔 does not range across both negative and posi- tive numbers, then we can also prove the desired inequality by applying the monotone convergence theorem on the result for lower-bounded func- tions. • Say 𝑓 is non-negative and 𝑔 is non-positive. For any natural number 𝑛, 𝑚 ∈ Mem[𝐴 ∪ 𝐵], we define 𝑔𝑛 (𝜋𝐵𝑚) = max(𝑔(𝜋𝐵𝑚),−𝑛), ℎ𝑛 (𝑚) = 𝑓 (𝜋𝐴𝑚) · 𝑔𝑛 (𝜋𝐵𝑚). Then for any 𝑛, 𝑔𝑛 and ℎ𝑛 are lower-bounded non- positive functions; and for any 𝑚, {𝑔𝑛 (𝑚)}𝑛∈N is a monotonically de- creasing sequence converging to 𝑔(𝑚), {ℎ𝑛 (𝑚)}𝑛∈N is a monotonically decreasing sequence converging to 𝑓 (𝜋𝐴𝑚) ·𝑔(𝜋𝐵𝑚). By the monotone convergence theorem, E𝑚∼𝜇 𝑓 (𝜋𝐴𝑚) · 𝑔(𝜋𝐵𝑚) = lim 𝑛→∞ E𝑚∼𝜇ℎ𝑛 (𝑚) E𝑚∼𝜇𝑔(𝜋𝐵𝑚) = lim 𝑛→∞ E𝑚∼𝜇𝑔𝑛 ( 𝑝𝑖𝐵𝑚). 
249 By what we proved above, for any 𝑛, we have E𝑚∼𝜇 [ℎ𝑛 (𝑚)] ≤ E𝑚∼𝜇 [ 𝑓 (𝜋𝐴𝑚)] · E𝑚∼𝜇 [𝑔𝑛 (𝜋𝐵𝑚)] Taking that to the limit 𝑛→∞, lim 𝑛→∞ E𝑚∼𝜇 [ℎ𝑛 (𝑚)] ≤ lim 𝑛→∞ ( E𝑚∼𝜇 [ 𝑓 (𝜋𝐴𝑚)] · E𝑚∼𝜇 [𝑔(𝜋𝐵𝑚)] ) = lim 𝑛→∞ E𝑚∼𝜇 [ 𝑓 (𝜋𝐴𝑚)] · lim 𝑛→∞ E𝑚∼𝜇 [𝑔(𝜋𝐵𝑚)] Therefore, for any distribution 𝜇 ∈ D(Mem[𝐴 ∪ 𝐵]), E𝑚∼𝜇 [ 𝑓 (𝜋𝐴𝑚) · 𝑔(𝜋𝐵𝑚)] ≤ E𝑚∼𝜇 [ 𝑓 (𝜋𝐴𝑚)] · E𝑚∼𝜇 [𝑔(𝜋𝐵𝑚)] . • The case where 𝑓 is non-positive and 𝑔 is non-negative is symmetric. • The case where 𝑓 and 𝑔 are both non-positive is also similar. We will define 𝑓𝑛 (𝜋𝐴𝑚) = max( 𝑓 (𝜋𝐴𝑚),−𝑛), 𝑔𝑛 (𝜋𝐵𝑚) = max(𝑔(𝜋𝐵𝑚),−𝑛), ℎ𝑛 (𝑚) = 𝑓𝑛 (𝜋𝐴𝑚) · 𝑔𝑛 (𝜋𝐵𝑚). Then we have E𝑚∼𝜇 𝑓 (𝜋𝐴𝑚) · 𝑔(𝜋𝐵𝑚) = lim 𝑛→∞ E𝑚∼𝜇ℎ𝑛 (𝑚) E𝑚∼𝜇𝑔(𝜋𝐵𝑚) = lim 𝑛→∞ E𝑚∼𝜇𝑔𝑛 ( 𝑝𝑖𝐵𝑚) E𝑚∼𝜇 𝑓 (𝜋𝐵𝑚) = lim 𝑛→∞ E𝑚∼𝜇 𝑓𝑛 ( 𝑝𝑖𝐴𝑚). And the rest follows. 3. Now we consider the general case where we only know both 𝑓 and 𝑔 are either lower-bounded or upper bounded. • If both 𝑓 and 𝑔 are lower-bounded, reduce to the first case. 250 • If 𝑓 is lower-bounded by 𝐿, 𝑔 is upper-bounded by 𝑈, then we can consider function 𝑓 ′ = 𝑓 + 𝐿 and 𝑔′ = 𝑔 −𝑈. Then 𝑓 ′ is non-negative and 𝑔′ is non-positive, so by step 2, we have E𝑚∼𝜇 [ 𝑓 ′(𝜋𝐴𝑚) · 𝑔′(𝜋𝐵𝑚)] ≤ E𝑚∼𝜇 [ 𝑓 ′(𝜋𝐴𝑚)] · E𝑚∼𝜇 [𝑔′(𝜋𝐵𝑚)] . By calculations analogous to what we did in step 1, that implies E𝑚∼𝜇 [ 𝑓 (𝜋𝐴𝑚) · 𝑔(𝜋𝐵𝑚)] ≤ E𝑚∼𝜇 [ 𝑓 (𝜋𝐴𝑚)] · E𝑚∼𝜇 [𝑔(𝜋𝐵𝑚)] . • If 𝑓 is upper-bounded and 𝑔 is lower-bounded: analogous to above. • If both 𝑓 and 𝑔 are upper-bounded: also, analogous to above. Thus, 𝜇 satisfies {𝐴, 𝐵}-PNA implies 𝜇 satisfies (𝐴, 𝐵)-NA. And therefore, 𝜇 satisfies {𝐴, 𝐵}-PNA for any 𝐴, 𝐵 ⊆ 𝑆 implies 𝑆 satisfies strong NA in 𝜇. □ B.2.2 Omitted Proofs of Frame Conditions Theorem 3.3.4. The structure XPNA = (𝑋D, ⊑D, ⊕, 𝐸D) is a Down-Closed BI frame. Proof. We sketch the conditions, using the notation from the definition: Down-Closed. Let dom(𝑥) = 𝑆,dom(𝑥′) = 𝑆′,dom(𝑦) = 𝑇,dom(𝑦′) = 𝑇 ′. We claim that we can take 𝑧′ = 𝜋𝑆′∪𝑇 ′𝑧. We evidently have 𝑧 ⊒ 𝑧′, and 𝜋𝑆′𝑧 ′ = 𝜋𝑆′𝜋𝑆𝑧 = 𝑥 ′ and 𝜋𝑇 ′𝑧′ = 𝜋𝑇 ′𝜋𝑇 𝑧 = 𝑦′. 
What remains to show is that 𝑧′ is S ∪ T -PNA for any S, T such that 𝑥′ is S-PNA, 𝑦′ is T -PNA, and (∪S) ∩ (∪T ) = ∅. 251 If 𝑥′ is S-PNA, then 𝑥 is S-PNA; if 𝑦′ is T -PNA, then 𝑦 is T -PNA; then 𝑧 ∈ 𝑥 ⊕ 𝑦 must be S∪T -PNA. Since 𝑧′ := 𝜋𝑆′∪𝑇 ′𝑧, and (∪S) ∪ (∪T ) ⊆ 𝑆′∪𝑇 ′, we have 𝑧′ is S∪T -PNA too. And evidently, dom(𝑧′) = 𝑆′∪𝑇 ′ = dom(𝑥′) ∪ dom(𝑦′). So 𝑧′ ∈ 𝑥′ ⊕ 𝑦′. Commutativity Immediate. Associativity Let dom(𝑥) = 𝑅,dom(𝑦) = 𝑆,dom(𝑧) = 𝑇 . We can assume that these sets are all disjoint, otherwise there is nothing to prove. We claim that we can take 𝑠 = 𝜋𝑆∪𝑇𝑤. For any 𝑤 in 𝑡 ⊕ 𝑧, 𝑡 ∈ 𝑥 ⊕ 𝑦, we want to show that 𝑤 ∈ 𝑥 ⊕ 𝑠 and 𝑠 ∈ 𝑦 ⊕ 𝑧. • For any partition R,S such that (∪R) ∩ (∪S) = ∅ and 𝑥 is R-PNA, 𝑠 is S-PNA. For set 𝑋 ⊆ Var, write {𝑌 ∩ 𝑋 | 𝑌 ∈ S} as S𝑋 . Then, by Lemma B.1.3, 𝑠 is S-PNA implies 𝑦 must be S𝑆-PNA. Similarly, 𝑠 is S-PNA implies 𝑧 must be S𝑇 -PNA. Then, 𝑡 ∈ 𝑥⊕𝑦 must be R∪(S𝑆)-PNA, and 𝑤 ∈ 𝑡⊕𝑧 must be R∪S𝑆∪S𝑇 - PNA. Note that S coarsens S𝑆 ∪ S𝑇 so 𝑤 is R ∪ S𝑆 ∪ S𝑇 -PNA implies that 𝑤 is R ∪ S-PNA. Also, 𝜋𝑅𝑤 = 𝜋𝑅𝜋𝑅∪𝑆𝑤 = 𝜋𝑅𝑡 = 𝑥, and dom(𝑤) = 𝑅 ∪ 𝑆 ∪ 𝑇 = dom(𝑥) ∪ dom(𝑠). Hence, 𝑤 ∈ 𝑥 ⊕ 𝑠. • Note that 𝑥 is trivially {𝑅}-PNA. Then, for any partition S,T such that 𝑅 ∩ (∪S) ∩ (∪T ) = ∅ and 𝑦 is S-PNA and 𝑧 is T -PNA, first 𝑡 must be ({𝑅} ∪ S)-PNA, and then 𝑤 must be ({𝑅} ∪ S ∪ T)-PNA. By projection, 𝑠 = 𝜋𝑆∪𝑇 must be S ∪ T 𝑧-PNA. Also, 𝜋𝑆𝑠 = 𝜋𝑆𝜋𝑆∪𝑇𝑤 = 𝜋𝑆𝑤 = 𝜋𝑆𝜋𝑅∪𝑆𝑤 = 𝜋𝑆𝑡 = 𝑦, and similarly, 𝜋𝑇 𝑠 = 𝑧. Also, dom(𝑠) = 𝑆 ∪ 𝑇 = dom(𝑦) ∪ dom(𝑧). 252 Hence, 𝑠 ∈ 𝑦 ⊕ 𝑧. Unit Existence Take 𝑒 to be 𝜇 where 𝜇 is the (unique) distribution in D(Mem[∅]). Unit Closure Immediate as we take 𝐸 = 𝑀 . Unit Coherence 𝑥 ∈ 𝑦 ⊕ 𝑒 entails 𝑦 = 𝜋dom(𝑦)𝑥, which implies 𝑦 ⊑ 𝑥. 
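Theorem 3.3.1 and the frame conditions above rest on the PNA inequality E[∏ᵢ fᵢ] ≤ ∏ᵢ E[fᵢ] for all-monotone (or all-antitone) families, with equality characterizing independence (Lemma B.1.1). The canonical NA example, the uniform distribution over one-hot vectors, makes the inequality strict for some family, while a product distribution attains equality everywhere. The sketch below checks this over all monotone threshold indicators on two binary coordinates (illustrative only; helper names hypothetical).

```python
import math
from itertools import product

def expect(mu, f):
    return sum(p * f(m) for m, p in mu.items())

def pna_gaps(mu, n):
    """Slack prod_i E[f_i] - E[prod_i f_i] for every family of monotone
    threshold indicators f_i(m) = [m_i >= t_i], t_i in {0, 1}."""
    gaps = []
    for ts in product((0, 1), repeat=n):
        fs = [lambda m, i=i, t=t: float(m[i] >= t) for i, t in enumerate(ts)]
        joint = expect(mu, lambda m: math.prod(f(m) for f in fs))
        sep = math.prod(expect(mu, f) for f in fs)
        gaps.append(sep - joint)
    return gaps

onehot = {(1, 0): 0.5, (0, 1): 0.5}                      # uniform one-hot: NA
indep = {(a, b): 0.25 for a in (0, 1) for b in (0, 1)}   # independent fair bits

na_gaps = pna_gaps(onehot, 2)   # all >= 0 (PNA); strictly > 0 at t = (1, 1)
eq_gaps = pna_gaps(indep, 2)    # all zero: equality for every family
```

For the one-hot distribution the pair of indicators [𝑚₀ ≥ 1], [𝑚₁ ≥ 1] witnesses strict negative correlation, which is exactly the behavior exploited in the counterexample of Appendix B.4.3.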
□ B.3 Soundness and Completeness of 𝑀-BI algebras B.3.1 Algebraic Soundness and Completeness The proof is very similar to the proof of BI soundness and completeness in sec- tion 2.2.3: we first construct a new algebra – “𝑀-BI algebra,” prove the algebraic soundness and completeness and then establish the overall theorem. Definition B.3.1 (𝑀-BI algebra). An 𝑀-BI algebra is an algebra A𝑀 = (𝐴,∧,∨,→ ,⊤,⊥, ∗𝑚∈𝑀 ,−∗𝑚∈𝑀 ,⊤∗𝑚∈𝑀) such that • For each 𝑚 ∈ 𝑀 , the structure (𝑎,∧,∨,→,⊤,⊥, ∗𝑚,−∗𝑚,⊤∗𝑚) is a BI algebra; • If 𝑚1 ≤ 𝑚2 then 𝑎 ∗𝑚1 𝑏 ≤ 𝑎 ∗𝑚2 𝑏. We can interpret 𝑀-BI in an 𝑀-BI algebra A𝑀 . Let V : AP → A𝑀 be a map assigning atomic propositions to elements of A𝑀 . We extend V to an interpre- tation ⟦−⟧A mapping 𝑀-BI propositions to elements of A𝑀 , defined by: 253 ⟦𝑃⟧A = V(𝑃) ⟦⊤⟧A = ⊤ ⟦𝐼𝑚⟧A = ⊤∗𝑚 ⟦⊥⟧A = ⊥ ⟦𝑃 ∧𝑄⟧A = ⟦𝑃⟧A ∧ ⟦𝑄⟧A ⟦𝑃 ∨𝑄⟧A = ⟦𝑃⟧A ∨ ⟦𝑄⟧A ⟦𝑃→ 𝑄⟧A = ⟦𝑃⟧A → ⟦𝑄⟧A ⟦𝑃 ∗𝑚 𝑄⟧A = ⟦𝑃⟧A ∗𝑚 ⟦𝑄⟧A ⟦𝑃 −∗𝑚 𝑄⟧A = ⟦𝑃⟧A −∗𝑚 ⟦𝑄⟧A Theorem B.3.1 (Algebraic Soundness). If 𝑃 ⊢ 𝑄 is provable, then ⟦𝑃⟧A ≤ ⟦𝑄⟧A for any algebraic interpretation ⟦−⟧A. Proof. By induction on the derivation of 𝑃 ⊢ 𝑄. The cases for everything ex- cept ∗-WEAKENING follow from the exact same argument as for standard BI and BI-algebra, as in theorem 2.2.3. For the remaining case of ∗-WEAKENING, which derives 𝑃 ∗𝑚2 𝑄 from 𝑃 ∗𝑚1 𝑄 if 𝑚1 ≤ 𝑚2. We have ⟦𝑃 ∗𝑚1 𝑄⟧A = ⟦𝑃⟧A ∗𝑚1 ⟦𝑄⟧A (By definition of ⟦−⟧A) ≤ ⟦𝑃⟧A ∗𝑚2 ⟦𝑄⟧A (By definition of 𝑀-BI algebra) = ⟦𝑃 ∗𝑚2 𝑄⟧A. (By definition of ⟦−⟧A) □ Next, we want to show the algebraic completeness. Analogous to before the- orem 2.2.6, we construct the Lindenbaum-Tarski algebra corresponding to 𝑀-BI. 254 Definition B.3.2 (Lindenbaum-Tarski Algebra). Define the equivalence relation 𝑃 ∼ 𝑄 as 𝑃 ⊢ 𝑄 and 𝑄 ⊢ 𝑃. Let [𝑃]∼ be the equivalence class of 𝑃 under ∼. Take 𝐼𝑚, ⊤, and ⊥ to be [𝐼𝑚]∼, [⊤]∼, and [⊥]∼, respectively. Then we define: ... 
[𝑃]∼ ∗𝑚 [𝑄]∼ = [𝑃 ∗𝑚 𝑄]∼ [𝑃]∼ −∗𝑚 [𝑄]∼ = [𝑃 −∗𝑚 𝑄]∼ The fact that these operations are well-defined and form a 𝑀-BI algebra follows almost entirely from lemma 2.2.4. The only remaining case is to check that if 𝑚1 ≤ 𝑚2 then [𝑃]∼ ∗𝑚1 [𝑄]∼ ≤ [𝑃]∼ ∗𝑚2 [𝑄]∼. We have [𝑃]∼ ∗𝑚1 [𝑄]∼ = [𝑃 ∗𝑚1 𝑄]∼ ≤ [𝑃 ∗𝑚2 𝑄]∼ (Since 𝑃 ∗𝑚1 𝑄 ⊢ 𝑃 ∗𝑚2 𝑄) = [𝑃]∼ ∗𝑚2 [𝑄]∼ Then, we can construct an algebraic interpretation into Lindenbaum-Tarski algebra, ⟦−⟧L, and use it to prove algebraic completeness. Theorem B.3.2 (Algebraic Completeness). If ⟦𝑃⟧A ≤ ⟦𝑄⟧A for all algebraic inter- pretations ⟦−⟧A, then 𝑃 ⊢ 𝑄. The proof is identical to the proof for theorem 2.2.6. B.3.2 Soundness of 𝑀-BI formulas 𝑀-BI formulas are interpreted on 𝑀-BI frames. We define a structure called complex algebra on 𝑀-BI frames and show that the complex algebra of every 𝑀- 255 BI frame is an 𝑀-BI algebra. Definition B.3.3 (Complex Algebra). If X is an 𝑀-BI frame, then the complex algebra of X, written Com(X) is the structure (P⊑ (𝑋),∩,∪,→X , 𝑋, ∅, ∗𝑚∈𝑀 ,−∗𝑚∈𝑀 , 𝐸𝑚∈𝑀) where P⊑ (𝑋) = {𝐴 ⊆ 𝑋 | 𝑎 ∈ 𝐴 ∧ 𝑎 ⊑ 𝑏 → 𝑏 ∈ 𝐴} 𝐴→X 𝐵 = {𝑎 | ∀𝑏. 𝑎 ⊑ 𝑏 ∧ 𝑏 ∈ 𝐴→ 𝑏 ∈ 𝐵} 𝐴 ∗𝑚 𝐵 = {𝑥 | ∃𝑤, 𝑦, 𝑧. 𝑤 ⊑ 𝑥 ∧ 𝑤 ∈ 𝑦 ⊕𝑚 𝑧 ∧ 𝑦 ∈ 𝐴 ∧ 𝑧 ∈ 𝐵} 𝐴 −∗𝑚 𝐵 = {𝑥 | ∀𝑤, 𝑦, 𝑧. (𝑥 ⊑ 𝑤 ∧ 𝑧 ∈ 𝑤 ⊕𝑚 𝑦 ∧ 𝑦 ∈ 𝐴) → 𝑧 ∈ 𝐵} Lemma B.3.3. If X is an 𝑀-BI frame, then Com(X) is an 𝑀-BI algebra. Proof. Each (𝑋, ⊑, ⊕𝑚, 𝐸𝑚) is a BI frame. Lemma 2.2.7 shows that the complex of a BI frame is a BI algebra. Thus the only thing to check is that the ordering on ∗ respects the ordering on 𝑀 . Let 𝑚1 ≤ 𝑚2. We must show that 𝐴 ∗𝑚1 𝐵 ⊆ 𝐴 ∗𝑚2 𝐵. Let 𝑥 ∈ 𝐴 ∗𝑚1 𝐵. Then there exists 𝑤, 𝑦, 𝑧 such that 𝑤 ⊑ 𝑥 and 𝑤 ∈ 𝑦 ⊕𝑚1 𝑧, with 𝑦 ∈ 𝐴 and 𝑧 ∈ 𝐵. by Operation Inclusion property, we have that 𝑤 ∈ 𝑦 ⊕𝑚2 𝑧, hence 𝑥 ∈ 𝐴 ∗𝑚2 𝐵. □ Theorem B.3.4. Let X = (𝑋, ⊑, ◦𝑚, 𝐸𝑚) be a 𝑀-BI frame and letVf : AP → P(𝑋) be a persistent valuation on X. Define the algebraic assignmentVa : AP → Com(X) by lettingVa(𝑝) = Vf(𝑝) for all atomic proposition 𝑝. 
Define the algebraic interpretation ⟦−⟧𝑎 by taking the homomorphic extension ofV𝑎 Then we have: 𝑥 |=Vf 𝑃 if and only if 𝑥 ∈ ⟦𝑃⟧a. Proof. The proof is almost identical to the proof for theorem 2.2.8. We show that by induction on the syntax of the formula 𝑃. 𝑀-BI formula only differs from BI formula by having indexed version of the ∗, 𝐼,−∗, so the only difference in the 256 proof is that: in the induction proof for formula ∗𝑚, 𝐼𝑚,−∗𝑚, we use the indexed version of the operations in the complex algebra. □ Theorem B.3.5 (Soundness of 𝑀-BI). In 𝑀-BI logic, if 𝑃 ⊢ 𝑄 is derivable, then 𝑃 |= 𝑄. The proof is identical to the proof of theorem 2.2.9. B.3.3 Completeness of 𝑀-BI formulas We reverse the direction now; we define a prime filter frame for every 𝑀-BI alge- bra and show that a prime filter frame of any 𝑀-BI algebra is an 𝑀-BI frame. Definition B.3.4 (Prime Filter Frame). If A is an 𝑀-BI algebra, then the prime filter 𝑀-frame of A is defined as Prf(A) = (Prf(𝐴), ⊆, ⊕𝑚∈𝑀 , 𝐸𝑚∈𝑀) where 𝐹1 ⊕𝑚 𝐹2 = {𝐹 ∈ Prf(𝐴) | ∀𝑎1 ∈ 𝐹1.∀𝑎2 ∈ 𝐹2. 𝑎1 ∗𝑚 𝑎2 ∈ 𝐹} 𝐸𝑚 = {𝐹 ∈ Prf(𝐴) | ⊤∗𝑚 ∈ 𝐹} Lemma B.3.6. If A is an 𝑀-BI algebra, then Prf(A) is an 𝑀-BI frame. Proof. Lemma 2.2.10 shows that for each𝑚 ∈ 𝑀 , (Prf(𝐴), ⊆, ⊕𝑚, 𝐸𝑚) is a BI frame. Therefore, we only need to check the Operation Inclusion property. Let 𝑚1 ≤ 𝑚2 and let 𝐹, 𝐺, 𝐻 ∈ Prf(𝐴) with 𝐹 ∈ 𝐺 ⊕𝑚1 𝐻. Let 𝑎 ∈ 𝐺 and 𝑏 ∈ 𝐻. Then 𝑎 ∗𝑚1 𝑏 ∈ 𝐹. Since 𝑎 ∗𝑚1 𝑏 ≤ 𝑎 ∗𝑚2 𝑏, and filters are upward-closed, 𝑎 ∗𝑚2 𝑏 ∈ 𝐹, hence 𝐹 ∈ 𝐺 ⊕𝑚2 𝐻. □ Theorem B.3.7. Let A = (𝐴, . . . ) be a 𝑀-BI algebra and let ⟦−⟧ : FormBI → 𝐴 be an algebraic interpretation that homomorphically extends the assignmentVa : AP → 𝐴. 257 Define the persistent valuationVf : AP → P(Prf(𝐴)) on the prime filter frame Prf(A) by: Vf(𝑝) = {𝐹 ∈ Prf(𝐴) | Va(𝑝) ∈ 𝐹} Then for 𝐹 ∈ Prf(𝐴), we have 𝐹 |=Vf 𝑃 if and only if ⟦𝑃⟧ ∈ 𝐹 . The proof is almost identical to the proof of theorem 2.2.11. Then, we can prove the completeness of 𝑀-BI using the same argument as for theorem 2.2.12. 
Theorem B.3.8 (Completeness of 𝑀-BI). In 𝑀-BI logic, if 𝑃 |= 𝑄, then 𝑃 ⊢ 𝑄 is derivable. B.4 A 𝑀-BI Model for Independence and Negative Association B.4.1 Independence Implies PNA The proof that independence implies PNA will use the following lemma. Lemma B.4.1. In a distribution 𝜇, if 𝜇 satisfies {𝑆1, 𝑆2}-PNA, 𝜇 satisfies {𝑇1, 𝑇2}- PNA, and 𝑆1 ∪ 𝑆2 is independent from 𝑇1 ∪ 𝑇2 in 𝜇 then 𝜇 is {𝑆1 ∪ 𝑇1, 𝑆2 ∪ 𝑇2}-PNA. Proof. By the definition of PNA and independence, 𝑆1, 𝑆2 are disjoint, 𝑇1, 𝑇2 are disjoint, and 𝑆1 ∪ 𝑇1, 𝑆2 ∪ 𝑇2 are disjoint. For any monotonically decreasing/in- 258 creasing functions 𝑓 : Mem[𝑆1 ∪ 𝑇1] → R+, 𝑔 : Mem[𝑆2 ∪ 𝑇2] → R+, E𝑚∼𝜇 [ 𝑓 (𝜋𝑆1∪𝑇1𝑚) · 𝑔(𝜋𝑆2∪𝑇2𝑚)] = E𝑠∼𝜋𝑆1∪𝑆2 𝜇 E𝑡∼𝜋𝑇1∪𝑇2 𝜇 [ 𝑓 (𝜋𝑆1𝑠 ⊲⊳ 𝜋𝑇1𝑡) · 𝑔(𝜋𝑆2𝑠 ⊲⊳ 𝜋𝑇2𝑡)] (By independence of 𝑆1 ∪ 𝑆2 and 𝑇1 ∪ 𝑇2) ≤ E𝑠∼𝜋𝑆1∪𝑆2 𝜇 ( E𝑡1∼𝜋𝑇1 𝜇 [ 𝑓 (𝜋𝑆1𝑠, 𝑡1)] · E𝑡2∼𝜋𝑇2 𝜇 [𝑔(𝜋𝑆2𝑠, 𝑡2)] ) (♦) ≤ E𝑠1∼𝜋𝑆1 𝜇 E𝑡1∼𝜋𝑇1 𝜇 [ 𝑓 (𝑠1, 𝑡1)] · E𝑠2∼𝜋𝑆2 𝜇 E𝑡2∼𝜋𝑇2 𝜇 [𝑔(𝑠2, 𝑡2)] (♥) ≤ E𝑚∼𝜇 [ 𝑓 (𝜋𝑆1∪𝑇1𝑚)] · E𝑚∼𝜇 [𝑔(𝜋𝑆2∪𝑇2𝑚)] (♣) where the step ♦ is because 𝜋𝑇1∪𝑇2𝜇 is 𝑇1, 𝑇2-PNA and 𝑓 (𝜋𝑆1𝑠, 𝑡1), 𝑔(𝜋𝑆2𝑠, 𝑡2) are both monotonically decreasing/increasing in 𝑇1, 𝑇2; the step ♥ is because 𝜋𝑆1∪𝑆2𝜇 is 𝑆1, 𝑆2-PNA and that E𝑡1∼𝜋𝑇1 𝜇 [ 𝑓 (𝜋𝑆1𝑠, 𝑡1)], and E𝑡2∼𝜋𝑇2 𝜇 [𝑔(𝜋𝑆2𝑠, 𝑡2)] are both monotonically decreasing/increasing in 𝑆1, 𝑆2; and the step ♣ is by indepen- dence of 𝑆1 and 𝑇1 and the independence of 𝑆2 and 𝑇2 in 𝜇. □ Theorem 3.4.5 (Independence implies PNA). Let 𝑆, 𝑇 ⊆ Var be two disjoint sets of variables. Suppose 𝜇1 ∈ D(Mem[𝑆]), 𝜇2 ∈ D(Mem[𝑇]). If 𝜇1 satisfies S-PNA and 𝜇2 satisfies T -PNA, then any 𝜇 ∈ 𝜇𝑆 ⊗D 𝜇𝑇 satisfies (S ∪ T )-PNA. Proof. Fix S and T . Say S = {𝑆1, . . . , 𝑆𝑝} and T = {𝑇1, . . . , 𝑇𝑞}. For any R coarsening S ∪ T , indexing S ∪ T as {𝑈1, . . . ,𝑈𝑝+𝑞}, indexing R as {𝑅1, . . . , 𝑅𝑛}, we have: R = {∪{𝑈 𝑗 | 𝑗 ∈ 𝑓 (𝑖)} | 𝑖 ∈ [𝑛]}. Then, given a family of monotonically increasing/decreasing functions 𝑔𝑖 : 𝑅𝑖 → R+ E𝑚∼𝜇 [ ∏ 𝑅𝑖∈R 𝑔𝑖 (𝜋𝑅𝑖𝑚) ] = E𝑚∼𝜇  ∏ 𝑖∈[𝑛] 𝑔𝑖 (𝜋∪{𝑈 𝑗 | 𝑗∈ 𝑓 (𝑖)}𝑚)  . 
259 For each 𝑖, ∪{𝑈 𝑗 | 𝑗 ∈ 𝑓 (𝑖)} can be divided into the part in 𝑆 and the part in 𝑇 . We refer to them as 𝑆′ 𝑖 and 𝑇 ′ 𝑖 . (Some of 𝑆′ 𝑖 and 𝑇 ′ 𝑖 may be empty). Thus, for each 𝑖, 𝑔𝑖 (𝜋∪{𝑈 𝑗 | 𝑗∈ 𝑓 (𝑖)}𝑚) = 𝑔𝑖 (𝜋𝑆′𝑖∪𝑇 ′𝑖𝑚). By Lemma B.1.3, S′ = {𝑆′1, . . . , 𝑆 ′ 𝑛} coarsens S, and T ′ = {𝑇 ′1, . . . , 𝑇 ′ 𝑛} coarsens T . So 𝜇 is S′-PNA and T ′-PNA. We prove by induction on 𝑘 ∈ [𝑛] that E𝑚∼𝜇  ∏ 𝑖∈[𝑘] 𝑔𝑖 (𝜋𝑆′ 𝑖 ∪𝑇 ′ 𝑖 𝑚)  ≤ ∏ 𝑖∈[𝑘] E𝑚∼𝜇 [ 𝑔𝑖 (𝜋𝑆′ 𝑖 ∪𝑇 ′ 𝑖 𝑚) ] . Base case When 𝑘 = 1, trivial. Inductive case For 𝑘 < 𝑛, assume E𝑚∼𝜇  ∏ 𝑖∈[𝑘] 𝑔𝑖 (𝜋𝑆′ 𝑖 ∪𝑇 ′ 𝑖 𝑚)  ≤ ∏ 𝑖∈[𝑘] E𝑚∼𝜇 [ 𝑔𝑖 (𝜋𝑆′ 𝑖 ∪𝑇 ′ 𝑖 𝑚) ] . Note that 𝜇 is S′-PNA implies that 𝜇 is {∪𝑖∈[𝑘] (𝑆′𝑖), 𝑆′𝑘+1}-PNA, and 𝜇 is T ′- PNA implies that {∪𝑖∈[𝑘] (𝑇 ′𝑖 ), 𝑇 ′𝑘+1}-NA. Thus, by Lemma B.4.1, 𝜇 is also {{∪𝑖∈[𝑘] (𝑆′𝑖) ∪ {∪𝑖∈[𝑘] (𝑇 ′𝑖 ), 𝑆′𝑘+1 ∪ 𝑇 ′ 𝑘+1}-NA. Also, since all 𝑔𝑖 is monotoni- cally increasing (decreasing) and non-negative, 𝑚 ↦→ ∏ 𝑖∈[𝑘] 𝑔𝑖 (𝑚) is also a monotonically increasing (decreasing) function from ∪𝑖∈[𝑘]𝑆′𝑖 ∪ ∪𝑖∈[𝑘]𝑇 ′𝑖 to R+. Therefore, E𝑚∼𝜇  ∏ 𝑖∈[𝑘+1] 𝑔𝑖 (𝜋𝑆′ 𝑖 ∪𝑇 ′ 𝑖 𝑚)  ≤ E𝑚∼𝜇  ∏ 𝑖∈[𝑘] 𝑔𝑖 (𝜋𝑆′ 𝑖 ∪𝑇 ′ 𝑖 𝑚)  · E𝑚∼𝜇 [ 𝑔𝑘+1(𝜋𝑆′ 𝑘+1∪𝑇 ′ 𝑘+1 𝑚) ] ≤ ∏ 𝑖∈[𝑘+1] E𝑚∼𝜇 [ 𝑔𝑖 (𝜋𝑆′ 𝑖 ∪𝑇 ′ 𝑖 𝑚) ] , where the second inequality follows from the inductive hypothesis. 260 Thus, the desired inequality holds for any R coarseningS∪T and any family of monotonically increasing (decreasing) functions on R. Thus, 𝜇 is S∪T -PNA. □ B.4.2 Axioms of Negative Association Lemma 3.5.5 (N-NARY MONOTONE MAP). Let 𝑥, 𝑥𝛾,𝛼 and 𝑦𝛾 be program variables. Let 𝐾𝛾 be natural numbers. The following is valid in (X𝑁𝐴,V∗). |= 𝑁 ⊛ 𝛾=0 ©­« 𝐾𝛾∧ 𝛼=0 Own(𝑥𝛾,𝛼)ª®¬ ∧ 𝑁∧ 𝛾=0 [ 𝑦𝛾 = 𝑓𝛾 ( 𝑥𝛾,0, . . . , 𝑥𝛾,𝐾𝛾 )] → 𝑁 ⊛ 𝛾=0 Own(𝑦𝛾) when 𝑓1, . . . , 𝑓𝑁 all monotone or all antitone (Mono-Map) Proof. Abbreviate the partition of variables {⋃𝐾𝛾 𝛼=0{𝑥𝛾,𝛼} | 𝐿 ≤ 𝛾 ≤ 𝑀 } as 𝑋 [𝐿 : 𝑀]. 
Intuitively, we group all the 𝑥𝛾,𝛼 with the same 𝛾 as a block in the partition, and different blocks in the partition are separated by the separating conjunction ⊛. For any 𝜇 |=⊛𝑁 𝛾=0 (∧𝐾𝛾 𝛼=0 Own(𝑥𝛾,𝛼) ) , we use induction and definition unfold- ing to show that 𝜇 satisfies 𝑋𝑁 -PNA. We choose the inductive hypothesis 𝑃(𝑀) to be: 𝜇 |=⊛𝑀 𝛾=0 (∧𝐾𝛾 𝛼=0 Own(𝑥𝛾,𝛼) ) implies that 𝜇 is 𝑋 [0 : 𝑀]-PNA. Base case: 𝑋 [: 0] = {⋃𝐾𝛾 𝛼=0{𝑥𝛾,𝛼} } is partition that contains a single block. Thus, 𝜇 is trivially 𝑋 [0 : 0]-PNA. Inductive case: For any 0 < 𝑀 ≤ 𝑀 , by satisfaction rules, 𝜇 |= ⊛𝑀 𝛾=0 (∧𝐾𝛾 𝛼=0 Own(𝑥𝛾,𝛼) ) implies there exists 𝜇′, 𝜇1, 𝜇2 such that 𝜇 ⊒ 𝜇′ ∈ 261 𝜇1 ⊕ 𝜇2, 𝜇1 |= 𝑀−1 ⊛ 𝛾=0 ( 𝐾𝛾∧ 𝛼=0 Own(𝑥𝛾,𝛼)) and 𝜇2 |= 𝐾𝑀∧ 𝛼=0 Own(𝑥𝑀,𝛼) By inductive hypothesis, 𝜇1 satisfies 𝑋 [0 : 𝑀 − 1]-PNA. And 𝑋 [𝑀 : 𝑀] is a partition that contains a single block, so trivially, 𝜇2 satisfies 𝑋 [𝑀 : 𝑀]- PNA. Therefore, 𝜇′ ∈ 𝜇1 ⊕ 𝜇2 implies that 𝜇′ is 𝑋 [0 : 𝑀]-PNA Therefore, we can conclude 𝜇 is 𝑋 [0 : 𝑁]-PNA from 𝜇 |=⊛𝑁 𝛾=0 (∧𝐾𝛾 𝛼=0 Own(𝑥𝛾,𝛼) ) . If additionally 𝜇 |= ∧𝑁 𝛾=0 𝑦𝛾 = 𝑓𝛾 (𝑥𝛾,1, . . . , 𝑥𝛾,𝐾𝛾 ) and 𝑓𝛾 are all monotone or antitone, then we can show that 𝜇 is { {𝑦𝛾} | 1 ≤ 𝛾 ≤ 𝑁 } -PNA. For any family of non-negative monotone functions 𝑔𝛾, note that the composed function 𝑔𝛾 ◦ 𝑓𝛾 are either all monotone or all antitone. Thus, E𝑚∼𝜇  ∏ 1≤𝛾≤𝑁 𝑔𝛾 (𝑦𝛾)  = E𝑚∼𝜇  ∏ 1≤𝛾≤𝑁 𝑔𝛾 ( 𝑓𝛾 (𝑥𝛾,1, . . . , 𝑥𝛾,𝐾𝛾 ))  = E𝑚∼𝜇  ∏ 1≤𝛾≤𝑁 (𝑔𝛾 ◦ 𝑓𝛾) (𝑥𝛾,1, . . . , 𝑥𝛾,𝐾𝛾 )  ≤ ∏ 1≤𝛾≤𝑁 E𝑚∼𝜇 [ (𝑔𝛾 ◦ 𝑓𝛾) (𝑥𝛾,1, . . . , 𝑥𝛾,𝐾𝛾 ) ] (Because 𝜇 is 𝑋 [0, 𝑁]-PNA) = ∏ 1≤𝛾≤𝑁 E𝑚∼𝜇 [ 𝑔𝛾 (𝑦𝛾) ] . (B.7) That is, 𝜇 is { {𝑦𝛾} | 1 ≤ 𝛾 ≤ 𝑁 } -PNA. And by Theorem 3.3.1 this implies that {{𝑦𝛾} | 1 ≤ 𝛾 ≤ 𝑁} satisfies NA in 𝜇. Then, by Theorem 3.3.5, (𝜎, 𝜇) |= ⊛𝑁 𝛾=1 Own(𝑦𝛾). □ 262 B.4.3 The Restriction Property of 𝑀-BI Formulas For the counterexample of the restriction property, we prove a lemma. Lemma B.4.2. Let 𝜇 be the uniform distribution over one hot vectors on 𝐴, 𝐵. 
Then, 𝜇 |= (Unif{0,1}⟨𝐶⟩) −⊛ (Own(𝐵) ∗ Own(𝐶)). Proof. Fix any 𝜇𝐶 such that 𝜇𝐶 |= Unif{0,1}⟨𝐶⟩, which implies that 𝜋𝐶𝜇𝐶 (0) = 0.5 and 𝜋𝐶𝜇𝐶 (1) = 0.5. Fix 𝜇𝑒 ∈ 𝜇 ⊕ 𝜇𝐶 . Since 𝐵 ∈ dom(𝜇), 𝜇 is trivially {{𝐵}}-PNA. Similarly, 𝜇𝐶 is trivially {{𝐶}}- PNA. Thus, 𝜇𝑒 ∈ 𝜇 ⊕ 𝜇𝐶 must be {{𝐵}, {𝐶}}-PNA. Then for any two both monotone or antitone functions 𝑓 : Mem[𝐵] → R+, 𝑔 : Mem[𝐶] → R+, E𝑚∼𝜇𝑒 [ 𝑓 (𝜋𝐵𝑚) · 𝑔(𝜋𝐶𝑚)] ≤ E𝑚∼𝜇𝑒 [ 𝑓 (𝜋𝐵𝑚)] · E𝑚∼𝜇𝑒 [𝑔(𝜋𝐶𝑚)] . Similarly, 𝜇𝑒 ∈ 𝜇 ⊕ 𝜇𝐶 must be {{𝐴}, {𝐶}}-PNA, and thus, for any two both monotone or antitone functions 𝑓 : Mem[𝐴] → R+, 𝑔 : Mem[𝐶] → R+, E𝑚∼𝜇𝑒 [ 𝑓 (𝜋𝐴𝑚) · 𝑔(𝜋𝐶𝑚)] ≤ E𝑚∼𝜇𝑒 [ 𝑓 (𝜋𝐴𝑚)] · E𝑚∼𝜇𝑒 [𝑔(𝑝𝑖𝐶𝑚)] . (B.8) Next, we want to prove that 𝜇𝑒 |= Own(𝐵) ∗ Own(𝐶). We prove by contradiction. Suppose variables 𝐵 and 𝐶 are not independent in 𝜇𝑒, then by Lemma B.1.1 that says NA definition with equality instead of inequality asserts independence, there must exists some both monotone or both antitone functions 𝑓 : Mem[𝐵] → R+, 𝑔 : Mem[𝐶] → R+ such that E𝑚∼𝜇𝑒 [ 𝑓 (𝜋𝐵𝑚) · 𝑔(𝜋𝐶𝑚)] < E𝑚∼𝜇𝑒 [ 𝑓 (𝜋𝐵𝑚)] · E𝑚∼𝜇𝑒 [𝑔(𝜋𝐶𝑚)] Since 𝜇𝑒 ∈ 𝜇 ⊕ 𝜇𝐶 , we have 𝜇𝑒 ⊒ 𝜇, and 𝜇 being a uniform distribution over one-hot vectors on 𝐴, 𝐵 indicates that for any 𝑚 in the support of 𝜇𝑒, 𝐴 = 1 iff 263 𝐵 = 0, and 𝐴 = 0 iff 𝐵 = 1. Therefore, E𝑚∼𝜇𝑒 [ 𝑓 (−𝜋𝐴𝑚) · (−𝑔(𝜋𝐶𝑚))] = E𝑚∼𝜇𝑒 [ 𝑓 (𝜋𝐵𝑚) · (−𝑔(𝜋𝐶𝑚))] = −E𝑚∼𝜇𝑒 [ 𝑓 (𝜋𝐵𝑚) · 𝑔(𝜋𝐶𝑚)] > −E𝑚∼𝜇𝑒 [ 𝑓 (−𝜋𝐴𝑚)] · E𝑚∼𝜇𝑒 [𝑔(𝜋𝐶𝑚)] = E𝑚∼𝜇𝑒 [ 𝑓 (−𝜋𝐴𝑚)] · E𝑚∼𝜇𝑒 [−𝑔(𝜋𝐶𝑚)] where 𝑥 ↦→ 𝑓 (−𝑥) and 𝑥 → −𝑔(𝑥) are two both monotone or both antitone func- tions because 𝑓 , 𝑔 are so. Thus, this inequality contradicts Equation (B.8). Therefore, 𝐵 and 𝐶 must be independent in 𝜇𝑒. Hence, 𝜇𝑒 |= Own(𝐵) ∗ Own(𝐶), and 𝜇 |= (Unif{0,1}⟨𝐶⟩) −⊛ (Own(𝐵) ∗ Own(𝐶)). □ Theorem 3.5.2. There exists 𝜇 ∈ D(Mem[𝑆]) and formula 𝜑 such that 𝜇 |= 𝜑 but 𝜋FV(𝜑) ̸ |= 𝜑. Proof. Let 𝐴, 𝐵, 𝐶 be three variables in Var. Let 𝜑 = (Unif{0,1}⟨𝐶⟩) −⊛ (Own(𝐵) ∗ Own(𝐶)). Let 𝜇 be the uniform distribution over one hot vectors on 𝐴, 𝐵. Then, we claim 𝜇 |= 𝜑 but 𝜋{𝐵,𝐶}𝜇 ̸ |= 𝜑. 
For 𝜇 |= 𝜑, it suffices to show that for any 𝜇𝐶 in which 𝐶's value is uniformly distributed on {0, 1}, any 𝜇′ ⊒ 𝜇, and any 𝜇𝑒 ∈ 𝜇′ ⊕ 𝜇𝐶, the variables 𝐵 and 𝐶 are independent in 𝜇𝑒, which holds by Lemma B.4.2.

To show 𝜋{𝐵,𝐶}𝜇 ̸|= 𝜑, we first note that 𝜋{𝐵,𝐶}𝜇 = 𝜋{𝐵}𝜇 is a uniform distribution over 0 and 1 on 𝐵. Let 𝜇′𝐶 ∈ D(Mem[{𝐶}]) be the uniform distribution on {0, 1}, and let 𝜇′ ∈ D(Mem[{𝐵,𝐶}]) be the uniform distribution over one-hot vectors on 𝐵,𝐶. Clearly, 𝐵,𝐶 are not independent in 𝜇′, so 𝜇′ ̸|= Own(𝐵) ∗ Own(𝐶). Also, 𝜇′ is in 𝜋{𝐵,𝐶}𝜇 ⊕ 𝜇′𝐶. So 𝜋{𝐵,𝐶}𝜇 ̸|= Unif{0,1}⟨𝐶⟩ −⊛ (Own(𝐵) ∗ Own(𝐶)). □

Theorem 3.5.1 (Restriction). For any distribution 𝜇 ∈ 𝑋D, any MBI+ formula 𝜑 interpreted on (X𝑁𝐴,V∗), and any valuation V, we have 𝜇 |=V 𝜑 ⇔ 𝜋FV(𝜑)𝜇 |=V 𝜑.

Proof. We prove it by induction on the syntax of the formula. Most cases are the same as in lemma 2.3.7, so we only show the additional case 𝑃 ⊛ 𝑄.

𝜑 = 𝑃 ⊛ 𝑄: Assuming 𝜇 |= 𝑃 ⊛ 𝑄, there exist 𝜇′, 𝜇1, 𝜇2 such that 𝜇 ⊒ 𝜇′ ∈ 𝜇1 ⊕ 𝜇2, 𝜇1 |= 𝑃, and 𝜇2 |= 𝑄. By the inductive hypothesis, 𝜋FV(𝜑)𝜇1 |= 𝑃 and 𝜋FV(𝜑)𝜇2 |= 𝑄. Also, by eq. (Down-Closed), there exists 𝜇′′ ⊑ 𝜇′ such that 𝜇′′ ∈ 𝜋FV(𝜑)𝜇1 ⊕ 𝜋FV(𝜑)𝜇2. So 𝜇′′ |= 𝑃 ⊛ 𝑄, and by persistence, 𝜇 |= 𝑃 ⊛ 𝑄. □

APPENDIX C

DIBI: A BUNCHED LOGIC FOR CONDITIONAL INDEPENDENCE

C.1 A Probabilistic Model of DIBI

Remark. In the following, we sometimes abbreviate dom( 𝑓𝑖) as 𝐷𝑖 and range( 𝑓𝑖) as 𝑅𝑖.

C.1.1 Well-definedness of the Structure

To facilitate proving that (X𝐶𝐼, ⊑, ⊕̂, ⊙̂, 𝐸) is a DIBI frame, we first show some properties of the binary operations and the order. First, we prove that X𝐶𝐼 is closed under ⊕ and ⊙.

Lemma C.1.1. X𝐶𝐼 is closed under ⊕ and ⊙.

Proof. For any 𝑓1, 𝑓2 ∈ X𝐶𝐼, we need to show that

• If 𝑓1 ⊕ 𝑓2 is defined, then 𝑓1 ⊕ 𝑓2 ∈ X𝐶𝐼. Recall that 𝑓1 ⊕ 𝑓2 is defined if and only if 𝑅1 ∩ 𝑅2 = 𝐷1 ∩ 𝐷2, which implies that (𝑅1 ∪ 𝑅2) \ (𝐷1 ∪ 𝐷2) = (𝑅1 \ 𝐷1) ⊎ (𝑅2 \ 𝐷2).
State 𝑓1 ⊕ 𝑓2 preserves the input because for any 𝑑 ∈ Mem[𝐷1 ∪ 𝐷2], we 266 can obtain the following (we will refer to this as ★): (𝜋𝐷1∪𝐷2 ( 𝑓1 ⊕ 𝑓2)) (𝑑) (𝑑) = ∑︁ 𝑥∈Mem[(𝑅1∪𝑅2)\(𝐷1∪𝐷2)] ( 𝑓1 ⊕ 𝑓2) (𝑑) (𝑑 ⊲⊳ 𝑥) = ∑︁ 𝑥1∈Mem[𝑅1\𝐷1], 𝑥2∈Mem[𝑅2\𝐷2] 𝑓1(𝑑𝐷1) (𝑑𝐷1 ⊲⊳ 𝑥1) · 𝑓2(𝑑𝐷2) (𝑑𝐷2 ⊲⊳ 𝑥2) = ©­« ∑︁ 𝑥1∈Mem[𝑅1\𝐷1] 𝑓1(𝑑𝐷1) (𝑑𝐷1 ⊲⊳ 𝑥1)ª®¬ · ©­« ∑︁ 𝑥2∈Mem[𝑅2\𝐷2] 𝑓2(𝑑𝐷2) (𝑑𝐷2 ⊲⊳ 𝑥2)ª®¬ = 1 · 1 = 1 (Using 𝑓1, 𝑓2 ∈ X𝐶𝐼) Then, for any input 𝑑 ∈ Mem[𝐷1 ∪ 𝐷2], ( 𝑓1 ⊕ 𝑓2) (𝑑) is a distribution since:∑︁ 𝑚∈Mem[𝑅1∪𝑅2] ( 𝑓1 ⊕ 𝑓2) (𝑑) (𝑚) = ∑︁ 𝑚∈Mem[𝑅1∪𝑅2] 𝑓1(𝑑𝐷1) (𝑚𝑅1) · 𝑓2(𝑑𝐷2) (𝑚𝑅2) ‡ = ∑︁ 𝑥1∈Mem[𝑅1\𝐷1], 𝑥2∈Mem[𝑅2\𝐷2] 𝑓1(𝑑𝐷1) (𝑑𝐷1 ⊲⊳ 𝑥1) · 𝑓2(𝑑𝐷2) (𝑑𝐷2 ⊲⊳ 𝑥2) = 1 (Using the last two steps of (★)) Step ‡ follows 𝑓1 and 𝑓2 being input-preserving, which means that the term 𝑓𝑖 (𝜋𝑑𝐷𝑖) (𝜋𝑚𝑅𝑖) is 0 when 𝑑𝐷𝑖 ≠ 𝑚𝐷𝑖 . Thus, 𝑓1 ⊕ 𝑓2 is a kernel in X𝐶𝐼 . • If 𝑓1 ⊙ 𝑓2 is defined, then 𝑓1 ⊙ 𝑓2 ∈ X𝐶𝐼 . Recall that 𝑓1 ⊙ 𝑓2 : Mem[𝐷1] → D(Mem[𝑅2]) is defined iff 𝑅1 = 𝐷2. The composition 𝑓1 ⊙ 𝑓2 preserves the input because for any 𝑑 ∈ Mem[𝐷1], we 267 can obtain (♠): (𝜋𝐷1 𝑓1 ⊙ 𝑓2) (𝑑) (𝑑) = ∑︁ 𝑥∈Mem[𝑅2\𝐷1] ( 𝑓1 ⊙ 𝑓2) (𝑑) (𝑑 ⊲⊳ 𝑥) = ∑︁ 𝑥∈Mem[𝑅2\𝐷1] 𝑓1(𝑑) (𝑑 ⊲⊳ 𝑥𝑅1\𝐷1) · 𝑓2(𝑑 ⊲⊳ 𝑥𝑅1\𝐷1) (𝑑 ⊲⊳ 𝑥) = ∑︁ 𝑥1∈Mem[𝑅1\𝐷1] 𝑓1(𝑑) (𝑑 ⊲⊳ 𝑥1) · ©­« ∑︁ 𝑥2∈Mem[𝑅2\𝑅1] 𝑓2(𝑑 ⊲⊳ 𝑥1) (𝑑 ⊲⊳ 𝑥1 ⊲⊳ 𝑥2)ª®¬ = ∑︁ 𝑥1∈Mem[𝑅1\𝐷1] ( 𝑓1(𝑑) (𝑑 ⊲⊳ 𝑥1) · 1) (Using 𝑓2 ∈ X𝐶𝐼) = 1 Then, for any 𝑑 ∈ 𝐷1, ( 𝑓1 ⊙ 𝑓2) (𝑑) is a distribution as∑︁ 𝑚∈Mem[𝑅2] ( 𝑓1 ⊙ 𝑓2) (𝑑) (𝑚) = ∑︁ 𝑚∈Mem[𝑅2] 𝑓1(𝑑) (𝑚𝑅1) · 𝑓2(𝑚𝑅1) (𝑚) (Equation (4.2)) ♥ = ∑︁ 𝑥∈Mem[𝑅2\𝐷1] 𝑓1(𝑑) (𝑑 ⊲⊳ 𝑥𝑅1\𝐷1) · 𝑓2(𝑑 ⊲⊳ 𝑥𝑅1\𝐷1) (𝑑 ⊲⊳ 𝑥) = 1 (Using the last three steps of (♠)) Step ♥ follows from 𝑓1, 𝑓2 being input-preserving, so the 𝑓𝑖 terms are 0 when 𝑑𝐷𝑖 ≠ 𝑚𝐷𝑖 . Thus 𝑓1 ⊙ 𝑓2 is a kernel in X𝐶𝐼 . □ Lemma C.1.2 (Reflexivity and transitivity of order). The order ⊑ defined in X𝐶𝐼 is transitive and reflexive. 268 Proof. Let 𝑥 : Mem[𝐴] → D(Mem[𝑋]) ∈ 𝑀 , 𝑆 = ∅, 𝑣 = unit𝑋 . 
Then (𝑥 ⊕ unit𝑆) ⊙ 𝑣 = (𝑥 ⊕ unit∅) ⊙ unit𝑋 = 𝑥 ⊙ unit𝑋 (By lemma C.1.7) = 𝑥 Thus, we have 𝑥 ⊑ 𝑥, and the order is reflexive. For any 𝑥, 𝑦, 𝑧 ∈ 𝑀 , if 𝑥 ⊑ 𝑦 and 𝑦 ⊑ 𝑧, then by definition of ⊑, there exist 𝑆1 and 𝑣1 such that 𝑦 = (𝑥 ⊕ unit𝑆1) ⊙ 𝑣1, and there exist 𝑆2 and 𝑣2 such that 𝑧 = (𝑦 ⊕ unit𝑆2) ⊙ 𝑣2. We can now calculate: 𝑧 = (𝑦 ⊕ unit𝑆2) ⊙ 𝑣2 = (((𝑥 ⊕ unit𝑆1) ⊙ 𝑣1) ⊕ unit𝑆2) ⊙ 𝑣2 = (((𝑥 ⊕ unit𝑆1) ⊙ 𝑣1) ⊕ (unit𝑆2 ⊙ unit𝑆2)) ⊙ 𝑣2 = (𝑥 ⊕ unit𝑆1 ⊕ unit𝑆2) ⊙ (𝑣1 ⊕ unit𝑆2) ⊙ 𝑣2 (By C.1.8 and lemma C.1.9) = (𝑥 ⊕ unit𝑆1∪𝑆2) ⊙ ((𝑣1 ⊕ unit𝑆2) ⊙ 𝑣2) X𝐶𝐼 is closed under ⊕, ⊙, so (𝑣1 ⊕ unit𝑆2) ⊙ 𝑣2 ∈ X𝐶𝐼 . Thus, 𝑧 = (𝑥 ⊕ unit𝑆1∪𝑆2) ⊙ (𝑣1 ⊕ unit𝑆2) ⊙ 𝑣2 showing that 𝑥 ⊑ 𝑧. So the order is transitive. □ Next we prove that the parallel composition ⊕ is associative, commutative and identify its identity. 269 C.1.2 Associativity of Parallel Composition Lemma C.1.3 (⊕ - Associativity). We show that when ( 𝑓 ⊕ 𝑔) ⊕ ℎ and 𝑓 ⊕ (𝑔 ⊕ ℎ) are defined, ( 𝑓 ⊕ 𝑔) ⊕ ℎ = 𝑓 ⊕ (𝑔 ⊕ ℎ). Proof. Consider 𝑓 : Mem[𝑆] → D(Mem[𝑆∪𝑇]), 𝑔 : Mem[𝑈] → D(Mem[𝑈∪𝑉]), and ℎ : Mem[𝑊] → D(Mem[𝑊 ∪ 𝑋]). For any 𝑑 ∈ Mem[𝑆 ∪ 𝑈 ∪𝑊], and 𝑚 ∈ Mem[𝑆 ∪ 𝑇 ∪𝑈 ∪𝑉 ∪𝑊 ∪ 𝑋], (( 𝑓 ⊕ 𝑔) ⊕ ℎ) (𝑑) (𝑚) = ( 𝑓 (𝑑𝑆) (𝑚𝑆∪𝑇 ) · 𝑔(𝑑𝑈) (𝑚𝑈∪𝑉 ) ) · ℎ(𝑑𝑊 ) (𝑚𝑊∪𝑋) (def. ⊕) = 𝑓 (𝑑𝑆) (𝑚𝑆∪𝑇 ) · ( 𝑔(𝑑𝑈) (𝑚𝑈∪𝑉 ) · ℎ(𝑑𝑊 ) (𝑚𝑊∪𝑋) ) = ( 𝑓 ⊕ (𝑔 ⊕ ℎ)) (𝑑) (𝑚) □ Lemma C.1.4 (Standard associativity of ⊕). For any 𝑓1, 𝑓2, 𝑓3 ∈ 𝑀 , ( 𝑓1 ⊕ 𝑓2) ⊕ 𝑓3 is defined if and only if 𝑓1 ⊕ ( 𝑓2 ⊕ 𝑓3) is defined and they are equal. Proof. To show that ( 𝑓1 ⊕ 𝑓2) ⊕ 𝑓3 is defined if and only if 𝑓1 ⊕ ( 𝑓2 ⊕ 𝑓3) is defined, it suffices to show that 𝑅1 ∩ 𝑅2 = 𝐷1 ∩ 𝐷2 (C.1) (𝑅1 ∪ 𝑅2) ∩ 𝑅3 = (𝐷1 ∪ 𝐷2) ∩ 𝐷3 (C.2) if and only if 𝑅2 ∩ 𝑅3 = 𝐷2 ∩ 𝐷3 (C.3) 𝑅1 ∩ (𝑅2 ∪ 𝑅3) = 𝐷1 ∩ (𝐷2 ∪ 𝐷3) (C.4) We show that eq. (C.3) and eq. (C.4) follows from eq. (C.1) and eq. (C.2): Recall that 𝐷1 ⊆ 𝑅1, 𝐷2 ⊆ 𝑅2, 𝐷3 ⊆ 𝑅3, so 270 • Equation (C.3) follows from 𝐷2 ∩ 𝐷3 ⊆ 𝑅2 ∩ 𝑅3 and 𝐷2 ∩ 𝐷3 ⊇ 𝑅2 ∩ 𝑅3, which holds because 𝑅2 ∩ 𝑅3 = 𝑅2 ∩ ((𝑅1 ∪ 𝑅2) ∩ 𝑅3) = 𝑅2 ∩ ((𝐷1 ∪ 𝐷2) ∩ 𝐷3) (By eq. 
(C.2)) = 𝑅2 ∩ ((𝐷1 ∩ 𝐷3) ∪ (𝐷2 ∩ 𝐷3)) ⊆ ((𝑅2 ∩ 𝑅1) ∩ 𝐷3) ∪ (𝐷2 ∩ 𝐷3) (By 𝐷1 ⊆ 𝑅1) = ((𝐷2 ∩ 𝐷1) ∩ 𝐷3) ∪ (𝐷2 ∩ 𝐷3) (By eq. (C.1)) ⊆ 𝐷2 ∩ 𝐷3 • Equation (C.4) follows from (𝐷1 ∪ 𝐷2) ∩ 𝐷3 ⊆ (𝑅1 ∪ 𝑅2) ∩ 𝑅3 and (𝐷1 ∪ 𝐷2) ∩ 𝐷3 ⊇ (𝑅1 ∪ 𝑅2) ∩ 𝑅3, which holds because 𝑅1 ∩ (𝑅2 ∪ 𝑅3) = (𝑅1 ∩ 𝑅2) ∪ (𝑅1 ∩ 𝑅3) ⊆ (𝑅1 ∩ 𝑅2) ∪ (𝑅1 ∩ (𝑅1 ∪ 𝑅2) ∩ 𝑅3) = (𝐷1 ∩ 𝐷2) ∪ (𝑅1 ∩ (𝐷1 ∪ 𝐷2) ∩ 𝐷3) (By eq. (C.1) and eq. (C.2)) = (𝐷1 ∩ 𝐷2) ∪ ((𝑅1 ∩ 𝐷1 ∩ 𝐷3) ∪ (𝑅1 ∩ 𝐷2 ∩ 𝐷3)) ⊆ (𝐷1 ∩ 𝐷2) ∪ ((𝐷1 ∩ 𝐷3) ∪ (𝑅1 ∩ 𝑅2 ∩ 𝐷3)) (By 𝐷2 ⊆ 𝑅2) ⊆ (𝐷1 ∩ 𝐷2) ∪ ((𝐷1 ∩ 𝐷3) ∪ (𝐷1 ∩ 𝐷2 ∩ 𝐷3)) (By eq. (C.1)) ⊆ (𝐷1 ∩ 𝐷2) ∪ (𝐷1 ∩ 𝐷3) = 𝐷1 ∩ (𝐷2 ∪ 𝐷3) We show that eq. (C.1) and eq. (C.2) follows from eq. (C.3) and eq. (C.4): • Equation (C.1) follows from 𝐷1 ∩ 𝐷2 ⊆ 𝑅1 ∩ 𝑅2 and 𝐷1 ∩ 𝐷2 ⊇ 𝑅1 ∩ 𝑅2, 271 which holds because 𝑅1 ∩ 𝑅2 = (𝑅1 ∩ (𝑅2 ∪ 𝑅3)) ∩ 𝑅2 = (𝐷1 ∩ (𝐷2 ∪ 𝐷3)) ∩ 𝑅2 (By eq. (C.4)) = 𝐷1 ∩ ((𝐷2 ∩ 𝑅2) ∪ (𝐷3 ∩ 𝑅2)) = 𝐷1 ∩ (𝐷2 ∪ (𝐷3 ∩ 𝑅2)) ⊆ 𝐷1 ∩ (𝐷2 ∪ (𝑅3 ∩ 𝑅2)) (By 𝐷3 ⊆ 𝑅3) = 𝐷1 ∩ (𝐷2 ∪ (𝐷3 ∩ 𝐷2)) (By eq. (C.3)) = 𝐷1 ∩ 𝐷2 • Equation (C.2) follows from (𝐷1 ∪ 𝐷2) ∩ 𝐷3 ⊆ (𝑅1 ∪ 𝑅2) ∩ 𝑅3 and (𝐷1 ∪ 𝐷2) ∩ 𝐷3 ⊇ (𝑅1 ∪ 𝑅2) ∩ 𝑅3, which holds because (𝑅1 ∪ 𝑅2) ∩ 𝑅3 = (𝑅1 ∩ 𝑅3) ∪ (𝑅2 ∩ 𝑅3) = (𝑅1 ∩ (𝑅2 ∪ 𝑅3) ∩ 𝑅3) ∪ (𝑅2 ∩ 𝑅3) = (𝐷1 ∩ (𝐷2 ∪ 𝐷3) ∩ 𝑅3) ∪ (𝐷2 ∩ 𝐷3) (By eq. (C.4)) = (𝐷1 ∩ ((𝐷2 ∩ 𝑅3) ∪ (𝐷3 ∩ 𝑅3))) ∪ (𝐷2 ∩ 𝐷3) ⊆ (𝐷1 ∩ ((𝑅2 ∩ 𝑅3) ∪ 𝐷3)) ∪ (𝐷2 ∩ 𝐷3) (By 𝐷2 ⊆ 𝑅2, 𝐷3 ⊆ 𝑅3) = (𝐷1 ∩ ((𝐷2 ∩ 𝐷3) ∪ 𝐷3)) ∪ (𝐷2 ∩ 𝐷3) (By eq. (C.3)) = (𝐷1 ∩ 𝐷3) ∪ (𝐷2 ∩ 𝐷3) = (𝐷1 ∪ 𝐷2) ∩ 𝐷3 Thus, eq. (C.1) and eq. (C.2) hold if and only if eq. (C.3) and eq. (C.4) hold. Therefore, ( 𝑓1 ⊕ 𝑓2) ⊕ 𝑓3 is defined if and only if 𝑓1 ⊕ ( 𝑓2 ⊕ 𝑓3) is defined. By lemma C.1.3, they are equal when both defined. □ 272 C.1.3 Commutativity of Parallel Composition Lemma C.1.5 (⊕ - Commutativity). When 𝑓1 ⊕ 𝑓2 and 𝑓2 ⊕ 𝑓1 are both defined, 𝑓1 ⊕ 𝑓2 = 𝑓2 ⊕ 𝑓1. Proof. For any 𝑑 ∈ Mem[𝐷1 ∪ 𝐷2], 𝑚 ∈ D(Mem[𝑅1 ∪ 𝑅2]) such that 𝑑 ⊲⊳ 𝑚 is defined, ( 𝑓1 ⊕ 𝑓2) (𝑑) (𝑚) = 𝑓1(𝑑𝐷1) (𝑚𝑅1) · 𝑓2(𝑑𝐷2) (𝑚𝑅2) = 𝑓2(𝑑𝐷2) (𝑚𝑅2) · 𝑓1(𝑑𝐷1) (𝑚𝑅1) = ( 𝑓2 ⊕ 𝑓1) (𝑑) (𝑚) Thus, 𝑓1 ⊕ 𝑓2 = 𝑓2 ⊕ 𝑓1. 
□ Lemma C.1.6 (⊕ - Identity). For any 𝑓 : Mem[𝐴] → D(Mem[𝐴 ∪ 𝑋]) ∈ 𝑀 , and any 𝑆 ⊆ 𝐴, we must show 𝑓 ⊕ unit𝑆 = 𝑓 Proof. Since 𝑆 ⊆ 𝐴, we have dom( 𝑓 ⊕unit𝑆) = 𝐴∪𝑆 = 𝐴 = dom( 𝑓 ) and range( 𝑓 ⊕ unit𝑆) = 𝐴 ∪ 𝑋 ∪ 𝑆 = 𝐴 ∪ 𝑋 = range( 𝑓 ). For any 𝑑 ∈ Mem[𝐴], and any 𝑟 ∈ Mem[𝐴 ∪ 𝑋] such that 𝑑 ⊲⊳ 𝑟 is defined, we have ( 𝑓 ⊕ unit𝑆) (𝑑) (𝑟) = 𝑓 (𝑑) (𝑟) · unit(𝑑𝑆) (𝑟𝑆) = 𝑓 (𝑑) (𝑟) · 1 = 𝑓 (𝑑) (𝑟) If 𝑑 ⊲⊳ 𝑟 is not defined, then ( 𝑓 ⊕ unit𝑆) (𝑑) (𝑟) = 𝑓 (𝑑) (𝑟). Hence, 𝑓 ⊕ unit𝑆 = 𝑓 . □ 273 C.1.4 Other Properties Used in Proving Frame Conditions Lemma C.1.7. For any 𝑓 : Mem[𝐴] → D(Mem[𝐴 ∪ 𝑋]) ∈ 𝑀 , and any 𝑆 ⊆ 𝐴, we have 𝑓 ⊕ unit𝑆 = 𝑓 Proof. Since 𝑆 ⊆ 𝐴, we have dom( 𝑓 ⊕unit𝑆) = 𝐴∪𝑆 = 𝐴 = dom( 𝑓 ) and range( 𝑓 ⊕ unit𝑆) = 𝐴 ∪ 𝑋 ∪ 𝑆 = 𝐴 ∪ 𝑋 = range( 𝑓 ). For any 𝑑 ∈ Mem[𝐴], and any 𝑟 ∈ Mem[𝐴 ∪ 𝑋] such that 𝑑 ⊗ 𝑟 is defined, we have ( 𝑓 ⊕ unit𝑆) (𝑑) (𝑟) = 𝑓 (𝑑) (𝑟) · unit(𝑑𝑆) (𝑟𝑆) = 𝑓 (𝑑) (𝑟) · 1 = 𝑓 (𝑑) (𝑟) Hence, 𝑓 ⊕ unit𝑆 = 𝑓 . □ Lemma C.1.8 (Reverse Exchange Equality). We show that when both ( 𝑓1 ⊕ 𝑓2) ⊙ ( 𝑓3 ⊕ 𝑓4) and ( 𝑓1 ⊙ 𝑓3) ⊕ ( 𝑓2 ⊙ 𝑓4) are defined, it holds that ( 𝑓1 ⊕ 𝑓2) ⊙ ( 𝑓3 ⊕ 𝑓4) = ( 𝑓1 ⊙ 𝑓3) ⊕ ( 𝑓2 ⊙ 𝑓4). (C.5) Proof. First, the well-definedness of 𝑓1 ◦ 𝑓3 implies that 𝐷1 ⊆ 𝑅1 = 𝐷3 ⊆ 𝑅3, and the well-definedness of 𝑓2 ◦ 𝑓4 implies that 𝐷2 ⊆ 𝑅2 = 𝐷4 ⊆ 𝑅4. Moreover, both terms are of type Mem[𝐷1 ∪ 𝐷2] → D(Mem[𝑅3 ∪ 𝑅4]), and, for any 𝑑 ∈ 274 Mem[𝐷1 ∪ 𝐷2] and 𝑚 ∈ Mem[𝑅3 ∪ 𝑅4], we have: ( ( 𝑓1 ⊕ 𝑓2) ⊙ ( 𝑓3 ⊕ 𝑓4) ) (𝑑) (𝑚) = ( 𝑓1 ⊕ 𝑓2) (𝑑) (𝑚𝑅1∪𝑅2) · ( 𝑓3 ⊕ 𝑓4) (𝑚𝐷3∪𝐷4) (𝑚) (Equation (4.2)) = ( 𝑓1(𝑑𝐷1) (𝑚𝑅1) · 𝑓2(𝑑𝐷2) (𝑚𝑅2) ) · ( 𝑓3(𝑚𝐷3) (𝑚𝑅3) · 𝑓4(𝑚𝐷4) (𝑚𝑅4) ) ( ( 𝑓1 ⊙ 𝑓3) ⊕ ( 𝑓2 ⊙ 𝑓4) ) (𝑑) (𝑚) = ( 𝑓1 ⊙ 𝑓3) (𝑑𝐷1) (𝑚𝑅3) · ( 𝑓2 ⊙ 𝑓4) (𝑑𝐷2) (𝑚𝑅3) = ( 𝑓1(𝑑𝐷1) (𝑚𝑅1) · 𝑓3(𝑑𝐷3) (𝑚𝑅3) ) · ( 𝑓2(𝑑𝐷2) (𝑚𝑅2) · 𝑓4(𝑑𝐷4) (𝑚𝑅4) ) = ( 𝑓1(𝑑𝐷1) (𝑚𝑅1) · 𝑓2(𝑑𝐷2) (𝑚𝑅2) ) · ( 𝑓3(𝑚𝐷3) (𝑚𝑅3) · 𝑓4(𝑚𝐷4) (𝑚𝑅4) ) Thus, ( 𝑓1 ⊙ 𝑓3) ⊕ ( 𝑓2 ⊙ 𝑓4) = ( 𝑓1 ⊕ 𝑓2) ⊙ ( 𝑓3 ⊕ 𝑓4). □ Lemma C.1.9. 
For any 𝑓1, 𝑓2, 𝑓3, 𝑓4 in X𝐶𝐼 , ( 𝑓1 ⊙ 𝑓3) ⊕ ( 𝑓2 ⊙ 𝑓4) is defined implies ( 𝑓1 ⊕ 𝑓2) ⊙ ( 𝑓3 ⊕ 𝑓4) is also defined. The converse does not always hold, but if in addition, 𝑓1 ⊙ 𝑓3 and 𝑓2 ⊙ 𝑓4 are defined, then ( 𝑓1 ⊕ 𝑓2) ⊙ ( 𝑓3 ⊕ 𝑓4) is defined implies ( 𝑓1 ⊙ 𝑓3) ⊕ ( 𝑓2 ⊙ 𝑓4) is defined too. Proof. We prove each direction individually: • Given ( 𝑓1 ⊙ 𝑓3) ⊕ ( 𝑓2 ⊙ 𝑓4) is defined, it must that 𝑅1 = 𝐷3, 𝑅2 = 𝐷4, and 𝑅3 ∩ 𝑅4 = 𝐷1 ∩ 𝐷2. Thus, 𝑅1 ∩ 𝑅2 = 𝐷3 ∩ 𝐷4 ⊆ 𝑅3 ∩ 𝑅4 = 𝐷1 ∩ 𝐷2, ensuring that 𝑓1 ⊕ 𝑓2 is defined; 𝑅3 ∩ 𝑅4 = 𝐷1 ∩ 𝐷2 ⊆ 𝑅1 ∩ 𝑅2 = 𝐷3 ∩ 𝐷4, ensuring that 𝑓3 ⊕ 𝑓4 is defined; range( 𝑓1⊕ 𝑓2) = 𝑅1∪𝑅2 = 𝐷3∪𝐷4 = dom( 𝑓3⊕ 𝑓4), ensuring ( 𝑓1⊕ 𝑓2)⊙( 𝑓3⊕ 𝑓4) is defined. • Given 𝑓1 ⊙ 𝑓3 and 𝑓2 ⊙ 𝑓4 are defined, ( 𝑓1 ⊙ 𝑓3) ⊕ ( 𝑓2 ⊙ 𝑓4) is defined if 275 𝑅3 ∩ 𝑅4 = 𝐷1 ∩ 𝐷2. When ( 𝑓1 ⊕ 𝑓2) ⊙ ( 𝑓3 ⊕ 𝑓4) is defined, 𝑅3 ∩ 𝑅4 = 𝐷3 ∩ 𝐷4 (Because 𝑓3 ⊕ 𝑓4 is defined) = 𝑅1 ∩ 𝑅2 (Because 𝑓1 ⊙ 𝑓3 and 𝑓2 ⊙ 𝑓4 are defined) = 𝐷1 ∩ 𝐷2 (Because 𝑓1 ⊕ 𝑓2 is defined) So ( 𝑓1 ⊙ 𝑓3) ⊕ ( 𝑓2 ⊙ 𝑓4) is also defined. □ C.1.5 Main Theorem: Proving Frame Conditions Theorem 4.2.1. (X𝐶𝐼 , ⊑, ⊕̂, ⊙̂,X𝐶𝐼) is a DIBI frame. Proof. We restate the frame conditions using concrete definitions of ⊕ and ⊙ and then check that they hold. ⊕ Down-Closed We want to show that for any 𝑥′, 𝑥, 𝑦′, 𝑦 ∈ 𝑀 , if 𝑥′ ⊑ 𝑥 and 𝑦′ ⊑ 𝑦 and 𝑥 ⊕ 𝑦 = 𝑧, then 𝑥′ ⊕ 𝑦′ is defined, and 𝑥′ ⊕ 𝑦′ = 𝑧′ ⊑ 𝑧. Since 𝑥′ ⊑ 𝑥 and 𝑦′ ⊑ 𝑦, there exist sets 𝑆1, 𝑆2, and 𝑣1, 𝑣2 ∈ 𝑀 such that 𝑥 = (𝑥′ ⊕ unit𝑆1) ⊙ 𝑣1, and 𝑦 = (𝑦′ ⊕ unit𝑆2) ⊙ 𝑣2. Thus, 𝑥 ⊕ 𝑦 = ((𝑥′ ⊕ unit𝑆1) ⊙ 𝑣1) ⊕ ((𝑦′ ⊕ unit𝑆2) ⊙ 𝑣2) = ( (𝑥′ ⊕ unit𝑆1) ⊕ (𝑦′ ⊕ unit𝑆2) ) ⊙ (𝑣1 ⊕ 𝑣2) (By lemma C.1.9 and C.1.8) = ( (𝑥′ ⊕ 𝑦′) ⊕ (unit𝑆1 ⊕ unit𝑆2) ) ⊙ (𝑣1 ⊕ 𝑣2) (By commutativity and associativity) = ( (𝑥′ ⊕ 𝑦′) ⊕ (unit𝑆1∪𝑆2) ) ⊙ (𝑣1 ⊕ 𝑣2) This derivation proved that 𝑥′ ⊕ 𝑦′ is defined, and 𝑥′ ⊕ 𝑦′ ⊑ 𝑥 ⊕ 𝑦 = 𝑧. 276 ⊙ Up-Closed We want to show that for any 𝑧′, 𝑧, 𝑥, 𝑦 ∈ 𝑀 , if 𝑧 = 𝑥 ⊙ 𝑦 and 𝑧′ ⊒ 𝑧, then there exists 𝑥′, 𝑦′ such that 𝑥′ ⊒ 𝑥, 𝑦′ ⊒ 𝑦, and 𝑧′ = 𝑥′ ⊙ 𝑦′. 
Since 𝑧′ ⊒ 𝑧, there exist a set 𝑆 and 𝑣 ∈ 𝑀 such that 𝑧′ = (𝑧 ⊕ unit𝑆) ⊙ 𝑣. Thus,

𝑧′ = (𝑧 ⊕ unit𝑆) ⊙ 𝑣
= ((𝑥 ⊙ 𝑦) ⊕ unit𝑆) ⊙ 𝑣
= ((𝑥 ⊙ 𝑦) ⊕ (unit𝑆 ⊙ unit𝑆)) ⊙ 𝑣
= ((𝑥 ⊕ unit𝑆) ⊙ (𝑦 ⊕ unit𝑆)) ⊙ 𝑣 (By lemma C.1.9 and C.1.8)
= (𝑥 ⊕ unit𝑆) ⊙ ((𝑦 ⊕ unit𝑆) ⊙ 𝑣) (By standard associativity of ⊙)

Thus, for 𝑥′ = 𝑥 ⊕ unit𝑆 and 𝑦′ = (𝑦 ⊕ unit𝑆) ⊙ 𝑣, we have 𝑧′ = 𝑥′ ⊙ 𝑦′.

⊕ Commutativity We want to show that 𝑧 = 𝑥 ⊕ 𝑦 implies that 𝑧 = 𝑦 ⊕ 𝑥. First, 𝑥 ⊕ 𝑦 is defined iff range(𝑥) ∩ range(𝑦) = dom(𝑥) ∩ dom(𝑦) iff 𝑦 ⊕ 𝑥 is defined; second, when 𝑥 ⊕ 𝑦 and 𝑦 ⊕ 𝑥 are both defined, they are equal by lemma C.1.5. Thus, the ⊕ commutativity frame condition is satisfied.

⊕ Associativity We want to show that 𝑤 = (𝑥 ⊕ 𝑦) ⊕ 𝑧 implies that 𝑤 = 𝑥 ⊕ (𝑦 ⊕ 𝑧). We show this in lemma C.1.4.

⊕ Unit Existence We want to show that for any 𝑥 ∈ 𝑀, there exists 𝑒 ∈ 𝐸 such that 𝑥 = 𝑒 ⊕ 𝑥. We show that 𝑒 = unit∅ serves as the unit under ⊕ for any 𝑥. For any 𝑥 : Mem[𝐴] → D(Mem[𝐵]), 𝑥 ⊕ unit∅ is defined because 𝐵 ∩ ∅ = ∅ = 𝐴 ∩ ∅, and by lemma C.1.7, 𝑥 ⊕ unit∅ = 𝑥.

⊕ Unit Coherence We want to show that for any 𝑦 ∈ 𝑀 and 𝑒 ∈ 𝐸 = 𝑀, if 𝑥 = 𝑦 ⊕ 𝑒, then 𝑥 ⊒ 𝑦. We calculate:

𝑥 = 𝑦 ⊕ 𝑒
= (𝑦 ⊙ unitrange(𝑦)) ⊕ (unitdom(𝑒) ⊙ 𝑒)
= (𝑦 ⊕ unitdom(𝑒)) ⊙ (unitrange(𝑦) ⊕ 𝑒) (By lemma C.1.8 and lemma C.1.9)
= (𝑦 ⊕ unitdom(𝑒)) ⊙ (𝑒 ⊕ unitrange(𝑦)) (⊕ Commutativity)

Thus, 𝑥 ⊒ 𝑦.

⊙ Associativity The frame axiom reduces to the standard associativity of ⊙. Kleisli composition is associative, so ⊙ satisfies standard associativity.

⊙ Unit ExistenceL and ⊙ Unit ExistenceR We need to show that, for any 𝑥 ∈ 𝑀, there exists 𝑒 ∈ 𝐸 such that 𝑒 ⊙ 𝑥 = 𝑥, and there exists 𝑒′ ∈ 𝐸 such that 𝑥 ⊙ 𝑒′ = 𝑥. Since ⊙ is the Kleisli composition, for any morphism 𝑥 : Mem[𝐴] → D(Mem[𝐵]), unit𝐴 is the left unit and unit𝐵 is the right unit. In addition, for all 𝑆, unit𝑆 ∈ 𝑀 = 𝐸.

⊙ CoherenceR For any 𝑦 ∈ 𝑀 and 𝑒 ∈ 𝐸 such that 𝑥 = 𝑦 ⊙ 𝑒, we want to show that 𝑥 ⊒ 𝑦.
We proved in lemma C.1.7 that 𝑦 ⊕ unit∅ = 𝑦 for any 𝑦, so 𝑥 = 𝑦 ⊙ 𝑒 = (𝑦 ⊕ unit∅) ⊙ 𝑒, and 𝑥 ⊒ 𝑦 as desired.

Unit Closure We want to show that for any 𝑒 ∈ 𝐸 and 𝑒′ ⊒ 𝑒, we have 𝑒′ ∈ 𝐸. This is evident because 𝐸 = 𝑀 and 𝑀 is closed under ⊕ and ⊙.

Reverse Exchange Given 𝑥 = 𝑦 ⊕ 𝑧 and 𝑦 = 𝑦1 ⊙ 𝑦2, 𝑧 = 𝑧1 ⊙ 𝑧2, we want to show that there exist 𝑢 = 𝑦1 ⊕ 𝑧1 and 𝑣 = 𝑦2 ⊕ 𝑧2 such that 𝑥 = 𝑢 ⊙ 𝑣. After substitution, we get (𝑦1 ⊙ 𝑦2) ⊕ (𝑧1 ⊙ 𝑧2) = 𝑦 ⊕ 𝑧 = 𝑥. By lemma C.1.8 and lemma C.1.9, when (𝑦1 ⊙ 𝑦2) ⊕ (𝑧1 ⊙ 𝑧2) is defined, (𝑦1 ⊕ 𝑧1) ⊙ (𝑦2 ⊕ 𝑧2) is also defined, and (𝑦1 ⊙ 𝑦2) ⊕ (𝑧1 ⊙ 𝑧2) = (𝑦1 ⊕ 𝑧1) ⊙ (𝑦2 ⊕ 𝑧2). Thus (𝑦1 ⊕ 𝑧1) ⊙ (𝑦2 ⊕ 𝑧2) = 𝑦 ⊕ 𝑧 = 𝑥, and so 𝑢 = 𝑦1 ⊕ 𝑧1 and 𝑣 = 𝑦2 ⊕ 𝑧2 complete the proof. □

C.2 Capturing Conditional Independence

C.2.1 Properties of the Probabilistic Frame

We prove some properties of the model that are useful for proving lemma C.2.8.

Lemma C.2.1 (Disintegration). If 𝑓 = 𝑓1 ⊙ 𝑓2, then 𝜋𝑅1 𝑓 = 𝑓1. Conversely, if 𝜋𝑅1 𝑓 = 𝑓1, then there exists 𝑔 such that 𝑓 = 𝑓1 ⊙ 𝑔.

Proof. In short, this follows from properties of the Kleisli category of the discrete probability monad: that category is a Markov category that has conditionals [Fritz, 2020, Example 11.2]; since our kernels are morphisms in this category and the operator ⊙ is morphism composition, the lemma follows. We spell out the detailed proof in the following.

For the forwards direction, suppose that 𝑓 = 𝑓1 ⊙ 𝑓2. Then,

𝜋𝑅1 𝑓 = 𝜋𝑅1 ( 𝑓1 ⊙ 𝑓2) = 𝑓1 ⊙ (𝜋𝑅1 𝑓2) = 𝑓1 ⊙ unit𝑅1 = 𝑓1.

Thus, 𝜋𝑅1 𝑓 = 𝑓1.

For the converse, assume 𝜋𝑅1 𝑓 = 𝑓1. Denote range( 𝑓 ) as 𝑅. Define 𝑔 : Mem[𝑅1] → D(Mem[𝑅]) such that for any 𝑟 ∈ Mem[𝑅1] and 𝑚 ∈ Mem[𝑅] with 𝑓1(𝑟𝐷1)(𝑟) ≠ 0,

𝑔(𝑟)(𝑚) := 𝑓 (𝑟𝐷1)(𝑚) / 𝑓1(𝑟𝐷1)(𝑟) if 𝑟 ⊲⊳ 𝑚 is defined, and 𝑔(𝑟)(𝑚) := 0 otherwise.

We need to check that 𝑔 ∈ X𝐶𝐼.
Fixing any 𝑟 ∈ Mem[𝑅1],∑︁ 𝑚∈Mem[𝑅] 𝑔(𝑟) (𝑚) = ∑︁ 𝑚∈Mem[𝑅] and 𝑚⊲⊳𝑟 is defined 𝑓 (𝑟𝐷1) (𝑚) 𝑓1(𝑟𝐷1) (𝑟) (By definition of 𝑔) = ∑︁ 𝑦∈Mem[𝑅\𝑅′] 𝑓 (𝑟𝐷1) (𝑚) 𝑓1(𝑟𝐷1) (𝑟) = ∑︁ 𝑦∈Mem[𝑅\𝑅′] 𝑓 (𝑟𝐷1) (𝑦)∑ 𝑥∈Mem[𝑅\𝑅1] 𝑓 (𝑟𝐷1) (𝑟 ⊲⊳ 𝑥) (Because 𝜋𝑅1 𝑓 = 𝑓1) = 1 so 𝑔 does map any input to a distribution, and 𝑔 preserves the input. By their types, 𝑓1 ⊙ 𝑔 is defined. For any 𝑑 ∈ Mem[𝐷1], 𝑚 ∈ Mem[𝑅], if 𝑓1(𝑑) (𝑚𝑅1) ≠ 0, then ( 𝑓1 ⊙ 𝑔) (𝑑) (𝑚) = 𝑓1(𝑑) (𝑚𝑅1) · 𝑔(𝑚𝑅1) (𝑚) = 𝑓1(𝑑) (𝑚𝑅1) · 𝑓 (𝑚𝐷1) (𝑚) 𝑓1(𝑚𝐷1) (𝑚𝑅1) = 𝑓 (𝑑) (𝑚) (𝑑 ⊲⊳ 𝑚 is defined iff 𝑑 = 𝑚𝐷1) If (𝜋𝑅1 𝑓 ) (𝑑) (𝑚𝑅1) = 0, then 𝑓 (𝑑) (𝑚) = 0, and ( 𝑓1 ⊙ 𝑔) (𝑑) (𝑚) = 𝑓1(𝑑) (𝑚𝑅1) · 𝑔(𝑚𝑅1) (𝑚) = 0 = 𝑓 (𝑑) (𝑚). Thus, 𝑓1 ⊙ 𝑔 = 𝑓 . □ Lemma C.2.2 (Uniqueness). For any 𝑓 , 𝑔 : Mem[𝑋] → D(Mem[𝑋 ∪𝑌 ]) in 𝑀 , and arbitrary ℎ ∈ 𝑀 , if 𝑓 ⊑ ℎ and 𝑔 ⊑ ℎ, then 𝑓 = 𝑔. Proof. 𝑓 ⊑ ℎ implies that there exists 𝑣1, 𝑆1 such that ( 𝑓 ⊕ unit𝑆1) ⊙ 𝑣1 = ℎ; 𝑔 ⊑ ℎ implies that there exists 𝑣2, 𝑆2 such that (𝑔 ⊕unit𝑆2) ⊙ 𝑣2 = ℎ. Take ℎ : Mem[𝑊] → 280 D(Mem[𝑍 ∪𝑊]), and then 𝑓 ⊕ unit𝑆1 = 𝜋range( 𝑓 ⊕unit𝑆1 )ℎ = 𝜋𝑋∪𝑌∪dom(ℎ)ℎ 𝑔 ⊕ unit𝑆2 = 𝜋range(𝑔⊕unit𝑆2 )ℎ = 𝜋𝑋∪𝑌∪dom(ℎ)ℎ Thus, 𝑓 ⊕ unit𝑆1 = 𝑔 ⊕ unit𝑆2 . Now, suppose 𝑓 ≠ 𝑔. This would imply 𝑓 ⊕ unit𝑆1 ≠ 𝑔 ⊕ unit𝑆2 which is a contradiction. Thus, 𝑓 = 𝑔. □ Lemma C.2.3 ( ⊙ elimination). For any 𝑓 , 𝑔 ∈ X𝐶𝐼 , if 𝑓 ⊙ (𝑔 ⊕ unit𝑋) is defined and dom(𝑔) ⊆ dom( 𝑓 ), then 𝑓 ⊙ (𝑔 ⊕ unit𝑋) = 𝑔 ⊕ 𝑓 . Proof. Let 𝑓 : Mem[𝑆] → D(Mem[𝑆 ∪ 𝑇]) and 𝑔 : Mem[𝑈] → D(Mem[𝑈 ∪ 𝑉]) be in 𝑀 . When𝑈 ⊆ 𝑆, 𝑓 ⊙ (𝑔 ⊕ unit𝑋) = ( 𝑓 ⊕ unit𝑈) ⊙ (𝑔 ⊕ unit𝑋 ⊕ unit𝑆∪𝑇 ) (By C.1.7) = (unit𝑈 ⊕ 𝑓 ) ⊙ (𝑔 ⊕ unit𝑋 ⊕ unit𝑆∪𝑇 ) (By commutativity) = (unit𝑈 ⊕ 𝑓 ) ⊙ (𝑔 ⊕ unit𝑆∪𝑇 ) (†) = (unit𝑈 ⊙ 𝑔) ⊕ ( 𝑓 ⊙ unit𝑆∪𝑇 ) (By lemma C.1.9 and C.1.8) = 𝑔 ⊕ 𝑓 □ where † follows from 𝑋 ⊆ 𝑆 ∪ 𝑇 , which holds as 𝑓 ⊙ (𝑔 ⊕ unit𝑋) defined implies 𝑆 ∪ 𝑇 = 𝑋 ∪𝑈. Lemma C.2.4 (Converting ⊕ to ⊙). For any kernel 𝑓 : Mem[𝑆] → D(Mem[𝑆 ∪𝑇]) and 𝑔 : Mem[𝑈] → D(Mem[𝑈 ∪ 𝑉]) in X𝐶𝐼 . If 𝑓 ⊕ 𝑔 is defined, then 𝑓 ⊕ 𝑔 = ( 𝑓 ⊕ unit𝑈) ⊙ (unit𝑆∪𝑇 ⊕ 𝑔). 281 Proof. 
𝑓 ⊕ 𝑔 = ( 𝑓 ⊙ unit𝑆∪𝑇 ) ⊕ (unit𝑈 ⊙ 𝑔) = ( 𝑓 ⊕ unit𝑈) ⊙ (unit𝑆∪𝑇 ⊕ 𝑔) (By lemma C.1.9 and C.1.8) □ Lemma C.2.5 (Quasi-Downwards-closure of ⊙). For any 𝑓 , 𝑔, ℎ, 𝑖 ∈ X𝐶𝐼 , if 𝑓 ⊑ ℎ, 𝑔 ⊑ 𝑖, and 𝑓 ⊙ 𝑔, ℎ ⊙ 𝑖 are all defined, then 𝑓 ⊙ 𝑔 ⊑ ℎ ⊙ 𝑖. Proof. Since 𝑓 ⊑ ℎ, 𝑔 ⊑ 𝑖, there must exist sets 𝑆1, 𝑆2 and 𝑣1, 𝑣2 ∈ 𝑀 such that ℎ = ( 𝑓 ⊕unit𝑆1) ⊙ 𝑣1, 𝑖 = (𝑔⊕unit𝑆2) ⊙ 𝑣2. 𝑓 ⊙𝑔 is defined, so dom(𝑔) = range( 𝑓 ) ⊆ range( 𝑓 ⊕ unit𝑆1) = dom(𝑣1). Thus, ℎ ⊙ 𝑖 = ( 𝑓 ⊕ unit𝑆1) ⊙ 𝑣1 ⊙ (𝑔 ⊕ unit𝑆2) ⊙ 𝑣2 = ( 𝑓 ⊕ unit𝑆1) ⊙ (𝑔 ⊕ 𝑣1) ⊙ 𝑣2 (By lemma C.2.3 and dom(𝑔) ⊆ dom(𝑣1)) = ( 𝑓 ⊕ unit𝑆1) ⊙ (𝑔 ⊕ unitdom(𝑣1)) ⊙ (unitrange(𝑔) ⊕ 𝑣1) ⊙ 𝑣2 (By lemma C.2.4) = ( 𝑓 ⊕ unit𝑆1) ⊙ (𝑔 ⊕ unit𝑆1) ⊙ (unitrange(𝑔) ⊕ 𝑣1) ⊙ 𝑣2 (†) = (( 𝑓 ⊙ 𝑔) ⊕ (unit𝑆1 ⊙ unit𝑆1)) ⊙ (unitrange(𝑔) ⊕ 𝑣1) ⊙ 𝑣2 (♥) = (( 𝑓 ⊙ 𝑔) ⊕ unit𝑆1) ⊙ (unitrange(𝑔) ⊕ 𝑣1) ⊙ 𝑣2 where † follows from dom(𝑔) = range( 𝑓 ) and lemma C.1.7, and ♥ follows from lemma C.1.9 and C.1.8. Therefore, 𝑓 ⊙ 𝑔 ⊑ ℎ ⊙ 𝑖. □ 282 C.2.2 Key Lemmas: Conditional Independence is Expressed Lemma C.2.6 (Classical flavor in intuitionistic model). For any 𝑓 ∈ 𝑀 , 𝑓 |= (∅ ⊲ 𝑍) # ((𝑍 ⊲ 𝑋) ∗ (𝑍 ⊲ 𝑌 )) if and only if there exist 𝑔, ℎ, 𝑖 ∈ 𝑀 , such that 𝑔 : Mem[∅] → D(Mem[𝑍]), ℎ : Mem[𝑍] → D(Mem[𝑍∪𝑋]), 𝑖 : Mem[𝑍] → D(Mem[𝑍∪𝑌 ]), and 𝑔⊙(ℎ⊕𝑖) ⊑ 𝑓 . Proof. The backwards direction trivially follows from persistence. We detail the proof for the forward direction here. Suppose 𝑓 |= (∅ ⊲ 𝑍) # ((𝑍 ⊲ 𝑋) ∗ (𝑍 ⊲ 𝑌 )). Then, there exist 𝑓1, 𝑓2, 𝑓3, 𝑓4 such that 𝑓1 ⊙ 𝑓2 = 𝑓 , 𝑓3 ⊕ 𝑓4 ⊑ 𝑓2, 𝑓1 |= (∅ ⊲ 𝑍), 𝑓3 |= (𝑍 ⊲ 𝑋) and 𝑓4 |= (𝑍 ⊲ 𝑌 ). • 𝑓1 |= (∅ ⊲ 𝑍) implies that there exists 𝑓 ′′1 ⊑ 𝑓1 such that dom( 𝑓 ′′1 ) = ∅, and range( 𝑓 ′′1 ) ⊇ 𝑍 . Let 𝑓 ′1 = 𝜋𝑍 𝑓 ′′ 1 . Note that 𝑓 ′1 : Mem[∅] → D(Mem[𝑍]) and 𝑓 ′1 ⊑ 𝑓 ′′1 ⊑ 𝑓1. Hence, there exists some set 𝑆1 and 𝑣1 ∈ 𝑀 such that 𝑓1 = ( 𝑓 ′1 ⊕ unit𝑆1) ⊙ 𝑣1. • 𝑓3 |= (𝑍 ⊲ 𝑋) implies that there exists 𝑓 ′′3 ⊑ 𝑓3 such that dom( 𝑓 ′′3 ) = 𝑍 , and range( 𝑓 ′′3 ) ⊇ 𝑋 . Define 𝑓 ′3 = 𝜋𝑍∪𝑋 𝑓 ′′3 . 
Then 𝑓 ′3 ⊑ 𝑓 ′′3 ⊑ 𝑓3, and 𝑓 ′3 : Mem[𝑍] → D(Mem[𝑋 ∪ 𝑍]). • 𝑓4 |= (𝑍 ⊲ 𝑌 ) implies that there exists 𝑓 ′′4 ⊑ 𝑓4 such that dom( 𝑓 ′′4 ) = 𝑍 , and range( 𝑓 ′′4 ) ⊇ 𝑌 . Define 𝑓 ′4 = 𝜋𝑍∪𝑌 𝑓 ′′4 and note that 𝑓 ′4 : Mem[𝑍] → D(Mem[𝑌 ∪ 𝑍]). • By ⊕ Down-Closed, having 𝑓3 ⊕ 𝑓4 defined implies that 𝑓 ′3 ⊕ 𝑓 ′ 4 is also de- fined and 𝑓 ′3 ⊕ 𝑓 ′ 4 ⊑ 𝑓3 ⊕ 𝑓4 ⊑ 𝑓2. Thus, there exists some 𝑣2 ∈ 𝑀 and finite set 𝑆2 such that 𝑓2 = ( 𝑓 ′3 ⊕ 𝑓 ′ 4 ⊕ unit𝑆2) ⊙ 𝑣2. 283 Using these observations, we can now calculate and show that 𝑓 ′1 ⊙ ( 𝑓 ′ 3 ⊕ 𝑓 ′4 ⊕ unit𝑍 ) ⊑ 𝑓1 ⊕ 𝑓2: 𝑓1 ⊙ 𝑓2 = ( 𝑓 ′1 ⊕ unit𝑆1) ⊙ 𝑣1 ⊙ ( 𝑓 ′3 ⊕ 𝑓 ′ 4 ⊕ unit𝑆2) ⊙ 𝑣2 = ( 𝑓 ′1 ⊕ unit𝑆1) ⊙ ( 𝑓 ′3 ⊕ 𝑓 ′ 4 ⊕ 𝑣1 ) ⊙ 𝑣2 (By lemma C.2.3 and dom( 𝑓 ′3 ⊕ 𝑓 ′ 4 ) = 𝑍 ⊆ range( 𝑓 ′1 ⊕ unit𝑆1)) = ( 𝑓 ′1 ⊕ unit𝑆1) ⊙ ( ( 𝑓 ′3 ⊕ 𝑓 ′ 4 ⊕ unitdom(𝑣1 ) ) ⊙ (unit𝑋∪𝑌∪𝑍 ⊕ 𝑣1) ) ⊙ 𝑣2 (By lemma C.2.4) = ( 𝑓 ′1 ⊕ unit𝑆1) ⊙ ( 𝑓 ′3 ⊕ 𝑓 ′ 4 ⊕ unit𝑍 ⊕ unit𝑆1) ⊙ (unit𝑋∪𝑌∪𝑍 ⊕ 𝑣1) ⊙ 𝑣2 (By dom(𝑣1) = 𝑍 ∪ 𝑆1) = ( ( 𝑓 ′1 ⊙ ( 𝑓 ′ 3 ⊕ 𝑓 ′ 4 ⊕ unit𝑍 )) ⊕ (unit𝑆1 ⊙ unit𝑆1) ) ⊙ (unit𝑋∪𝑌∪𝑍 ⊕ 𝑣1) ⊙ 𝑣2 (By lemma C.1.8 and lemma C.1.9) = ( ( 𝑓 ′1 ⊙ ( 𝑓 ′ 3 ⊕ 𝑓 ′ 4 ⊕ unit𝑍 )) ⊕ unit𝑆1 ) ⊙ (unit𝑋∪𝑌∪𝑍 ⊕ 𝑣1) ⊙ 𝑣2 = ( ( 𝑓 ′1 ⊙ ( 𝑓 ′ 3 ⊕ 𝑓 ′ 4 )) ⊕ unit𝑆1 ) ⊙ (unit𝑋∪𝑌∪𝑍 ⊕ 𝑣1) ⊙ 𝑣2 (By lemma C.1.7) To finish, take 𝑔= 𝑓 ′1 : Mem[∅] → D(Mem[𝑍]), ℎ= 𝑓 ′3 : Mem[𝑍] → D(Mem[𝑍 ∪ 𝑋]), 𝑖= 𝑓 ′4 : Mem[𝑍] → D(Mem[𝑍 ∪𝑌 ]), and note that 𝑔⊙ (ℎ⊕ 𝑖) = 𝑓 ′1 ⊙ ( 𝑓 ′ 3 ⊕ 𝑓 ′ 4) ⊑ 𝑓1 ⊕ 𝑓2 ⊑ 𝑓 . □ Lemma C.2.7. If 𝑋,𝑌 are conditionally independent given 𝑆, then values on 𝑋 ∩ 𝑌 is determined given values on 𝑆. Proof. In short, conditional independence is closed under projection (of the con- ditionally independent components). Thus, 𝑋 ∩ 𝑌 is conditionally independent to itself given 𝑆. Any random variable independent to itself must be determin- istic. Thus, 𝑋 ∩𝑌 is deterministic given 𝑆. We spell out the detailed proof below. Let 𝑋′ = 𝑋 \𝑌 , 𝑌 ′ = 𝑌 \ 𝑋 . 
By assumption, 𝑋,𝑌 are conditionally independent 284 given 𝑆 , so 𝑥 ∈ Mem[𝑋], 𝑦 ∈ Mem[𝑌 ], 𝑠 ∈ Mem[𝑆], 𝜇(𝑋 = 𝑥 | 𝑆 = 𝑠) · 𝜇(𝑌 = 𝑦 | 𝑆 = 𝑠) = 𝜇(𝑋 = 𝑥 ∩ 𝑌 = 𝑦 | 𝑆 = 𝑠). Thus, if we denote 𝑥′ = 𝜋𝑋 ′𝑥, 𝑦′ = 𝜋𝑌 ′𝑦 and let 𝑀 = 𝑋 ∩ 𝑌 , then for any 𝑚 ∈ Mem[𝑀], 𝜇(𝑋′ = 𝑥′ ∩ 𝑌 ′ = 𝑦′ ∩ 𝑀 = 𝑚 | 𝑆 = 𝑠) = 𝜇(𝑋′ = 𝑥′ ∩ 𝑀 = 𝑚 | 𝑆 = 𝑠) · 𝜇(𝑌 ′ = 𝑦′ ∩ 𝑀 = 𝑚 | 𝑆 = 𝑠) (C.6) For any probabilistic events 𝐸1, 𝐸2, 𝐸3, 𝜇(𝐸1 ∩ 𝐸2 | 𝐸3) = 𝜇(𝐸1 | 𝐸2, 𝐸3) · 𝜇(𝐸2 | 𝐸3). Thus, eq. (C.6) implies that 𝜇(𝑋 ′ = 𝑥′ | 𝑀 = 𝑚 ∩ 𝑆 = 𝑠) · 𝜇(𝑌 ′ = 𝑦′ | 𝑀 = 𝑚 ∩ 𝑆 = 𝑠) · 𝜇(𝑀 = 𝑚 | 𝑆 = 𝑠) = 𝜇(𝑋 ′ = 𝑥′ ∩ 𝑌 ′ = 𝑦′ | 𝑀 = 𝑚 ∩ 𝑆 = 𝑠) (C.7) Then, for any 𝑠 ∈ Mem[𝑆], 𝑚 ∈ Mem[𝑀] such that 𝜇(𝑀 = 𝑚 ∩ 𝑆 = 𝑠) ≠ 0,∑︁ 𝑥′∈Mem[𝑋 ′]∩𝑦′∈Mem[𝑌 ′] 𝜇(𝑋′ = 𝑥′ | 𝑀 = 𝑚 ∩ 𝑆 = 𝑠) · 𝜇(𝑌 ′ = 𝑦′ | 𝑀 = 𝑚, 𝑆 = 𝑠) · 𝜇(𝑀 = 𝑚 | 𝑆 = 𝑠) = ∑︁ 𝑥′∈Mem[𝑋 ′],𝑦′∈Mem[𝑌 ′] 𝜇(𝑋′ = 𝑥′, 𝑌 ′ = 𝑦′ | 𝑀 = 𝑚, 𝑆 = 𝑠) (Because of eq. (C.7)) = 1 (C.8) Meanwhile, for any 𝑠 ∈ Mem[𝑆], 𝑚 ∈ Mem[𝑀] such that 𝑚 ⊲⊳ 𝑠 is defined and 𝜇(𝑀 = 𝑚, 𝑆 = 𝑠) ≠ 0,∑︁ 𝑥′∈Mem[𝑋′ ],𝑦′∈Mem[𝑌 ′ ] 𝜇(𝑋 ′ = 𝑥′ | 𝑀 = 𝑚 ∩ 𝑆 = 𝑠) · 𝜇(𝑌 ′ = 𝑦′ | 𝑀 = 𝑚 ∩ 𝑆 = 𝑠) · 𝜇(𝑀 = 𝑚 | 𝑆 = 𝑠) = ©­« ∑︁ 𝑥′∈Mem[𝑋′ ],𝑦′∈Mem[𝑌 ′ ] 𝜇(𝑋 ′ = 𝑥′ | 𝑀 = 𝑚, 𝑆 = 𝑠) · 𝜇(𝑌 ′ = 𝑦′ | 𝑀 = 𝑚 ∩ 𝑆 = 𝑠)ª®¬ · 𝜇(𝑀 = 𝑚 | 𝑆 = 𝑠) = ©­« ∑︁ 𝑥′∈Mem[𝑋′ ] 𝜇(𝑋 ′ = 𝑥′ | 𝑀 = 𝑚 ∩ 𝑆 = 𝑠)ª®¬ · ©­« ∑︁ 𝑦′∈Mem[𝑌 ′ ] 𝜇(𝑌 ′ = 𝑦′ | 𝑀 = 𝑚 ∩ 𝑆 = 𝑠)ª®¬ · 𝜇(𝑀 = 𝑚 | 𝑆 = 𝑠) = 1 · 𝜇(𝑀 = 𝑚 | 𝑆 = 𝑠) (C.9) 285 Combining eq. (C.9) and eq. (C.8), we derive 𝜇(𝑀 = 𝑚 | 𝑆 = 𝑠) = 1. That is, when 𝑋 ⊥⊥ 𝑌 | 𝑆, whether 𝑀 ⊇ 𝑆 or not, 𝜇(𝑀 = 𝑚, 𝑆 = 𝑠) ≠ 0 implies 𝜇(𝑀 = 𝑚 | 𝑆 = 𝑠) = 1. Thus, 𝑋 ⊥⊥ 𝑌 | 𝑆 renders values on 𝑋 ∩ 𝑌 deterministic given values on 𝑆. □ Lemma C.2.8. For a distribution 𝜇 on Var, let 𝑓𝜇 denote the kernel ⟨⟩ ↦→ 𝜇. Then, there exist 𝑆, 𝑋,𝑌 ⊆ Var, 𝑓1 : Mem[∅] → D(Mem[𝑆]), 𝑓2 : Mem[𝑆] → D(Mem[𝑆 ∪ 𝑋]), 𝑓3 : Mem[𝑆] → D(Mem[𝑆∪𝑌 ]) such that 𝑓1⊙ ( 𝑓2⊕ 𝑓3) ⊑ 𝑓𝜇, if and only if 𝑋 ⊥⊥ 𝑌 | 𝑆 and also 𝑋 ∩ 𝑌 ⊆ 𝑆. Proof. Forward direction: Assume the existence of 𝑓1, 𝑓2, 𝑓3 satisfying 𝑓1 ⊙ ( 𝑓2 ⊕ 𝑓3) ⊑ 𝑓𝜇. We must prove 𝑋 ⊥⊥ 𝑌 | 𝑆 and 𝑋 ∩ 𝑌 ⊆ 𝑆. 1. 
𝑋 ∩ 𝑌 ⊆ 𝑆: 𝑓2 ⊕ 𝑓3 defined implies (𝑋 ∪ 𝑆) ∩ (𝑌 ∪ 𝑆) ⊆ 𝑆∩ 𝑆. Thus, 𝑋 ∩𝑌 ⊆ 𝑆. 2. 𝑋 ⊥⊥ 𝑌 | 𝑆: By assumption, 𝑓1 ⊙ ( 𝑓2 ⊕ 𝑓3) ⊑ 𝑓𝜇. lemma C.2.1 gives us 𝑓1 ⊙ ( 𝑓2 ⊕ 𝑓3) = 𝜋𝑆∪𝑋∪𝑌 ( 𝑓𝜇), and 𝑓1 = 𝜋𝑆 ( 𝑓𝜇). Thus, for any 𝑚 ∈ Mem[𝑋 ∪𝑌 ∪ 𝑆], 𝜇(𝑋 = 𝑚𝑋 , 𝑌 = 𝑚𝑌 , 𝑆 = 𝑚𝑆) = (𝜋𝑋∪𝑌∪𝑆𝜇) (𝑚𝑋 ⊲⊳ 𝑚𝑌 ⊲⊳ 𝑚𝑆) = 𝜋𝑋∪𝑌∪𝑆 ( 𝑓𝜇) (⟨⟩)(𝑚𝑋 ⊲⊳ 𝑚𝑌 ⊲⊳ 𝑚𝑆) = 𝑓1 ⊙ ( 𝑓2 ⊕ 𝑓3) (⟨⟩)(𝑚𝑋 ⊲⊳ 𝑚𝑌 ⊲⊳ 𝑚𝑆) Similarly, since 𝑓1 = 𝜋𝑆 ( 𝑓𝜇), we have 𝜇(𝑆 = 𝑚𝑆) = (𝜋𝑆𝜇) (𝑚𝑆) = ( 𝜋𝑆 ( 𝑓𝜇) ) (⟨⟩)(𝑚𝑆) = 𝑓1(⟨⟩)(𝑚𝑆) (C.10) 286 By definition of conditional probability, when 𝜇(𝑆 = 𝑚𝑆) ≠ 0, 𝜇(𝑋 = 𝑚𝑋 , 𝑌 = 𝑚𝑌 | 𝑆 = 𝑚𝑆) = 𝜇(𝑋 = 𝑚𝑋 , 𝑌 = 𝑚𝑌 , 𝑆 = 𝑚𝑆) 𝜇(𝑆 = 𝑚𝑆) = 𝑓1 ⊙ ( 𝑓2 ⊕ 𝑓3) (⟨⟩)(𝑚𝑆 ⊲⊳ 𝑚𝑋 ⊲⊳ 𝑚𝑌 ) 𝑓1(⟨⟩)(𝑚𝑆) = 𝑓1(⟨⟩)(𝑚𝑆) · ( 𝑓2 ⊕ 𝑓3) (𝑚𝑆) (𝑚𝑆 ⊲⊳ 𝑚𝑋 ⊲⊳ 𝑚𝑌 ) 𝑓1(⟨⟩)(𝑚𝑆) (By eq. (4.2)) = ( 𝑓2 ⊕ 𝑓3) (𝑚𝑆) (𝑚𝑆 ⊲⊳ 𝑚𝑋 ⊲⊳ 𝑚𝑌 ) = 𝑓2(𝑚𝑆) (𝑚𝑋∪𝑆) · 𝑓3(𝑚𝑆) (𝑚𝑌∪𝑆) (C.11) Let 𝑓 ′2 = 𝑓2 ⊕ unitMem[𝑌 ] , 𝑓 ′3 = 𝑓3 ⊕ unitMem[𝑋] . By lemma C.2.4, 𝑓1 ⊙ ( 𝑓2 ⊕ 𝑓3) = 𝑓1 ⊙ 𝑓2 ⊙ ( 𝑓3 ⊕ unitMem[𝑋]) = 𝑓1 ⊙ 𝑓2 ⊙ 𝑓 ′3 𝑓1 ⊙ ( 𝑓2 ⊕ 𝑓3) = 𝑓1 ⊙ ( 𝑓3 ⊕ 𝑓2) = 𝑓1 ⊙ 𝑓3 ⊙ ( 𝑓2 ⊕ unitMem[𝑌 ]) = 𝑓1 ⊙ 𝑓3 ⊙ 𝑓 ′2 Lemma C.2.1 gives us 𝜋𝑋∪𝑆 ( 𝑓𝜇) = 𝑓1 ⊙ 𝑓2, and 𝜋𝑌∪𝑆 ( 𝑓𝜇) = 𝑓1 ⊙ 𝑓3, Therefore, 𝜇(𝑋 = 𝑚𝑋 , 𝑆 = 𝑚𝑆) = (𝜋𝑋∪𝑆 ( 𝑓𝜇)) (⟨⟩)(𝑚𝑆 ⊲⊳ 𝑚𝑋) = ( 𝑓1 ⊙ 𝑓2) (⟨⟩)(𝑚𝑆 ⊲⊳ 𝑚𝑋) = 𝑓1(⟨⟩)(𝑚𝑆) · 𝑓2(𝑚𝑆) (𝑚𝑆 ⊲⊳ 𝑚𝑋) 𝜇(𝑌 = 𝑚𝑌 , 𝑆 = 𝑚𝑆) = (𝜋𝑌∪𝑆 ( 𝑓𝜇) (⟨⟩)(𝑚𝑆 ⊲⊳ 𝑚𝑌 ) = ( 𝑓1 ⊙ 𝑓3) (⟨⟩)(𝑚𝑆 ⊲⊳ 𝑚𝑌 ) = 𝑓1(⟨⟩)(𝑚𝑆) · 𝑓3(𝑚𝑆) (𝑚𝑆 ⊲⊳ 𝑚𝑌 ) 287 Thus, by definition of conditional probability. 𝜇(𝑋 = 𝑚𝑋 | 𝑆 = 𝑚𝑆) = 𝜇(𝑋 = 𝑚𝑋 , 𝑆 = 𝑚𝑆) 𝜇(𝑆 = 𝑚𝑆) = 𝑓1(⟨⟩)(𝑚𝑆) · 𝑓2(𝑚𝑆) (𝑚𝑆∪𝑋) 𝑓1(⟨⟩)(𝑚𝑆) = 𝑓2(𝑚𝑆) (𝑚𝑆∪𝑋) (C.12) 𝜇(𝑋 = 𝑚𝑌 | 𝑆 = 𝑚𝑆) = 𝜇(𝑋 = 𝑚𝑋 , 𝑆 = 𝑚𝑆) 𝜇(𝑆 = 𝑚𝑆) = 𝑓1(⟨⟩)(𝑚𝑆) · 𝑓3(𝑚𝑆) (𝑚𝑆∪𝑌 ) 𝑓1(⟨⟩)(𝑚𝑆) = 𝑓3(𝑚𝑆) (𝑚𝑆∪𝑌 ) (C.13) Substituting eq. (C.12) and eq. (C.13) into the equation eq. (C.11), we have 𝜇(𝑋 = 𝑚𝑋 , 𝑌 = 𝑚𝑌 | 𝑆 = 𝑚𝑆) = 𝜇(𝑋 = 𝑚𝑋 | 𝑆 = 𝑚𝑆) · 𝜇(𝑋 = 𝑚𝑌 | 𝑆 = 𝑚𝑆)) Thus, 𝑋,𝑌 are conditionally independent given 𝑆. This completes the proof for the first direction. Backward direction: We want to show that if 𝑋 ⊥⊥ 𝑌 | 𝑆 and 𝑋 ∩ 𝑌 ⊆ 𝑆 then there exists such 𝑓1, 𝑓2, 𝑓3 that 𝑓1 ⊙ ( 𝑓2 ⊕ 𝑓3) ⊑ 𝑓𝜇. 
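Before the formal construction, the factorization it produces can be checked numerically. The following is a minimal Python sketch on a small hypothetical joint distribution over (𝑆, 𝑋, 𝑌) in which 𝑋 and 𝑌 are noisy copies of 𝑆, so that 𝑋 ⊥⊥ 𝑌 | 𝑆 holds by construction; all numbers are illustrative and not from the text.

```python
from itertools import product

# Hypothetical joint distribution mu over (S, X, Y): X and Y are
# conditionally independent given S (each is a noisy copy of S).
mu = {}
for s, x, y in product((0, 1), repeat=3):
    p_x = 0.9 if x == s else 0.1
    p_y = 0.7 if y == s else 0.3
    mu[(s, x, y)] = 0.5 * p_x * p_y

# In the spirit of the construction: f1 = pi_S(f_mu), while f2 and f3
# condition X (resp. Y) on the value of S.
f1 = {s: sum(mu[(s, x, y)] for x, y in product((0, 1), repeat=2)) for s in (0, 1)}
f2 = {(s, x): sum(mu[(s, x, y)] for y in (0, 1)) / f1[s]
      for s, x in product((0, 1), repeat=2)}
f3 = {(s, y): sum(mu[(s, x, y)] for x in (0, 1)) / f1[s]
      for s, y in product((0, 1), repeat=2)}

# Sequencing f1 with the parallel composition of f2 and f3, unfolded here
# into pointwise multiplication, recovers mu on S, X, Y.
recomposed = {(s, x, y): f1[s] * f2[(s, x)] * f3[(s, y)]
              for s, x, y in product((0, 1), repeat=3)}
print(max(abs(recomposed[k] - mu[k]) for k in mu) < 1e-9)  # True
```

Here f1, f2, f3 play the roles of the kernels 𝑓1, 𝑓2, 𝑓3 in the proof, with ⊙ and ⊕ unfolded into pointwise multiplication over the finite memories.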
Given 𝜇, we define 𝑓1 = 𝜋𝑆 ( 𝑓𝜇) and construct 𝑓2, 𝑓3 as follows: Let 𝑓2 : Mem[𝑆] → D(Mem[𝑆 ∪ 𝑋]). For any 𝑠 ∈ Mem[𝑆], 𝑥 ∈ Mem[𝑋], when 𝑓1(⟨⟩)(𝑠) ≠ 0, let 𝑓2(𝑠) (𝑠 ⊲⊳ 𝑥) :=  (𝜋𝑆∪𝑋 𝑓𝜇) (⟨⟩)(𝑠⊲⊳𝑥) 𝑓1 (⟨⟩)(𝑠) if 𝑠 ⊲⊳ 𝑥 is defined 0 otherwise When 𝑓1(⟨⟩)(𝑠) = 0, we can define 𝑓2(𝑠) (𝑠 ⊲⊳ 𝑥) arbitrarily as long as 𝑓2(𝑠) is a distribution, because that distribution will be zeroed out in 𝑓1⊙ ( 𝑓2⊕ 𝑓3) anyway. 288 Similarly, let 𝑓3 : Mem[𝑆] → D(Mem[𝑆 ∪ 𝑌 ]). For any 𝑠 ∈ Mem[𝑆], 𝑥 ∈ Mem[𝑌 ] such that 𝑠 ⊲⊳ 𝑦 is defined, when 𝑓1(⟨⟩)(𝑠) ≠ 0, let 𝑓3(𝑠) (𝑠 ⊲⊳ 𝑦) :=  (𝜋𝑆∪𝑌 𝑓𝜇) (𝑠⊲⊳𝑦) 𝑓1 (⟨⟩)(𝑠) if 𝑠 ⊲⊳ 𝑦 is defined 0 otherwise By construction, 𝑓1, 𝑓2, 𝑓3 each has the type needed for the lemma. We are left to prove that given any 𝑠 ∈ Mem[𝑆], 𝑓2 and 𝑓3 are kernels in X𝐶𝐼 , 𝑓1 ⊙ ( 𝑓2 ⊕ 𝑓3) is defined, and 𝑓1 ⊙ ( 𝑓2 ⊕ 𝑓3) ⊑ 𝑓𝜇. • State 𝑓2 is in X𝐶𝐼 , which boils down to show that: for any 𝑠 ∈ Mem[𝑆], 𝑓2(𝑠) forms a distribution, and also 𝑓2 preserves the input. It can be shown through by mechanical calculation and we omit it here. • State 𝑓3 is in X𝐶𝐼 . Similar as above. • State 𝑓1 ⊙ ( 𝑓2 ⊕ 𝑓3) is defined. 𝑓2 ⊕ 𝑓3 is defined because 𝑅2 ∩ 𝑅3 = (𝑆 ∪ 𝑋) ∩ (𝑆 ∪𝑌 ) = 𝑆 ∪ (𝑋 ∩𝑌 ), and by assumption, 𝑋 ∩𝑌 ⊆ 𝑆, so 𝑆 ∪ (𝑋 ∩𝑌 ) = 𝑆 = 𝐷2 ∩ 𝐷3. Then 𝑓1 ⊙ ( 𝑓2 ⊕ 𝑓3) is defined because dom( 𝑓2 ⊕ 𝑓3) = 𝐷2 ∪ 𝐷3 = 𝑆 ∪ 𝑆 = 𝑆 = range( 𝑓1). • State 𝑓1 ⊙ ( 𝑓2 ⊕ 𝑓3) ⊑ 𝑓𝜇. It suffices to show that there exists 𝑔 such that ( 𝑓1 ⊙ ( 𝑓2 ⊕ 𝑓3)) ⊙ 𝑔 = 𝑓𝜇. For any 𝑠 ∈ Mem[𝑆], 𝑥 ∈ Mem[𝑋], 𝑦 ∈ Mem[𝑌 ] such that 𝑠 ⊲⊳ 𝑥 ⊲⊳ 𝑦 is defined, 𝑓1 ⊙ ( 𝑓2 ⊕ 𝑓3) (⟨⟩)(𝑠 ⊲⊳ 𝑥 ⊲⊳ 𝑦) (C.14) = 𝑓1(⟨⟩)(𝑠) · 𝑓2 ⊕ 𝑓3(𝑠) (𝑠 ⊲⊳ 𝑥 ⊲⊳ 𝑦) = 𝑓1(⟨⟩)(𝑠) · ( 𝑓2(𝑠) (𝑠 ⊲⊳ 𝑥) · 𝑓3(𝑠) (𝑠 ⊲⊳ 𝑦)) = 𝜇(𝑆 = 𝑠) · (𝜇(𝑋 = 𝑥 | 𝑆 = 𝑠) · 𝜇(𝑌 = 𝑦 | 𝑆 = 𝑠)) (C.15) 289 Because 𝑋,𝑌 are conditionally independent given 𝑆 in the distribution 𝑞, 𝜇(𝑋 = 𝑥 | 𝑆 = 𝑠) · 𝜇(𝑌 = 𝑦 | 𝑆 = 𝑠) = 𝜇(𝑋 = 𝑥,𝑌 = 𝑦 | 𝑆 = 𝑠) (C.16) Substituting eq. (C.16) into eq. 
(C.15), we have 𝑓1 ⊙ ( 𝑓2 ⊕ 𝑓3) (⟨⟩)(𝑠 ⊲⊳ 𝑥 ⊲⊳ 𝑦) = 𝜇(𝑆 = 𝑠) · 𝜇(𝑋 = 𝑥,𝑌 = 𝑦 | 𝑆 = 𝑠) = 𝜇(𝑋 = 𝑥,𝑌 = 𝑦, 𝑆 = 𝑠) Let 𝑔 : Mem[𝑋 ∪ 𝑌 ∪ 𝑆] → D(Mem[Val]) such that for any 𝑑 ∈ Mem[𝑋 ∪ 𝑌 ∪ 𝑆], 𝑚 ∈ Mem[Val] such that 𝑑 ⊲⊳ 𝑚 is defined, let 𝑔(𝑑) (𝑚) = 𝜇(Val = 𝑚 | 𝑋 ∪ 𝑌 ∪ 𝑆 = 𝑑) Then, ( 𝑓1 ⊙ ( 𝑓2 ⊕ 𝑓3)) ⊙ 𝑔 is defined, and ( 𝑓1 ⊙ ( 𝑓2 ⊕ 𝑓3) ⊙ 𝑔) (⟨⟩)(𝑚) = ( 𝑓1 ⊙ ( 𝑓2 ⊕ 𝑓3)) (⟨⟩)(𝑚𝑋∪𝑌∪𝑆) · 𝑔(𝑚𝑋∪𝑌∪𝑆) (𝑚) = 𝜇(Val = 𝑚) Thus, ( 𝑓1 ⊙ ( 𝑓2 ⊕ 𝑓3)) ⊙ 𝑔 = 𝑓𝜇, and therefore 𝑓1 ⊙ ( 𝑓2 ⊕ 𝑓3) ⊑ 𝑓𝜇. This completes the proof for the backwards direction. □ C.2.3 Validating Graphoid Axioms, Section 4.2.3 Lemma C.2.9 (Weak Union). The following judgment is valid in X𝐶𝐼 : |= [𝑍] # ( [𝑋] ∗ [𝑌 ∪𝑊]) → [𝑍 ∪𝑊] # ( [𝑋] ∗ [𝑌 ]) Proof. For any 𝑓 ∈ X𝐶𝐼 , if 𝑓 |= [𝑍] # ( [𝑋] ∗ [𝑌 ∪ 𝑊]), by lemma C.2.6, there exist 𝑓1, 𝑓2, 𝑓3 ∈ 𝑀 such that 𝑓1 ⊙ ( 𝑓2 ⊕ 𝑓3) ⊑ 𝑓 , 𝑓1 : Mem[∅] → D(Mem[𝑍]), 𝑓2 : Mem[𝑍] → D(Mem[𝑍 ∪ 𝑋]), 𝑓3 : Mem[𝑍] → D(Mem[𝑍 ∪ 𝑌 ∪𝑊]). 290 Let 𝑓 1 3 = 𝜋𝑍∪𝑊 𝑓3, then by Disintegration there exists 𝑓 2 3 ∈ 𝑀 such that 𝑓3 = 𝑓 1 3 ⊙ 𝑓 2 3 . We note that 𝑓1 ⊙ ( 𝑓2 ⊕ 𝑓3) = 𝑓1 ⊙ 𝑓3 ⊙ (unit𝑍∪𝑌∪𝑊 ⊕ 𝑓2) (By lemma C.2.4) = 𝑓1 ⊙ 𝑓3 ⊙ (unit𝑌∪𝑊 ⊕ 𝑓2) (By dom( 𝑓2) = 𝑍) = 𝑓1 ⊙ ( 𝑓 1 3 ⊙ 𝑓 2 3 ) ⊙ (unit𝑌∪𝑊 ⊕ 𝑓2) = 𝑓1 ⊙ 𝑓 1 3 ⊙ ( 𝑓 2 3 ⊙ (unit𝑌∪𝑊 ⊕ 𝑓2)) = 𝑓1 ⊙ 𝑓 1 3 ⊙ (( 𝑓2 ⊕ unit𝑊 ) ⊕ 𝑓 2 3 ) (†) where † follows from lemma C.2.3 and dom( 𝑓2 ⊕ unit𝑊 ) = 𝑍 ∪𝑊 ⊆ range( 𝑓 1 3 ). Thus, 𝑓1 ⊙ 𝑓 1 3 ⊙ (( 𝑓2 ⊕ unit𝑊 ) ⊕ 𝑓 2 3 ) ⊑ 𝑓 . Note that 𝑓1 ⊙ 𝑓 1 3 has type Mem[∅] → D(Mem[𝑍 ∪𝑊]), so 𝑓1 ⊙ 𝑓 1 3 |= (∅ ⊲ 𝑍 ∪𝑊). State 𝑓2 ⊕ unit𝑊 has type Mem[𝑍 ∪ 𝑊] → D(Mem[𝑍 ∪ 𝑊 ∪ 𝑋]), so 𝑓2 ⊕unit𝑊 |= (𝑍 ∪𝑊 ⊲ 𝑋). State 𝑓 2 3 has type Mem[𝑍 ∪𝑊] → D(Mem[𝑍 ∪𝑊 ∪𝑌 ]), so 𝑓 2 3 |= (𝑍 ∪𝑊 ⊲ 𝑌 ). Therefore, 𝑓1 ⊙ 𝑓 1 3 ⊙ (( 𝑓2 ⊕ unit𝑊 ) ⊕ 𝑓 2 3 ) |= (∅ ⊲ 𝑍 ∪𝑊) # (𝑍 ∪𝑊 ⊲ 𝑋) ∗ (𝑍 ∪𝑊 ⊲ 𝑌 ). By persistence, 𝑓 |= [𝑍 ∪𝑊] # ( [𝑋] ∗ [𝑌 ]), and Weak Union is valid. □ Lemma C.2.10 (Contraction). The following judgment is valid in X𝐶𝐼 : |= ( [𝑍] # ( [𝑋] ∗ [𝑌 ])) ∧ ([𝑍 ∪ 𝑌 ] # ( [𝑋] ∗ [𝑊])) → [𝑍] # ( [𝑋] ∗ [𝑌 ∪𝑊]) Proof. 
If ℎ |= ( [𝑍] # ( [𝑋] ∗ [𝑌 ])) ∧ ([𝑍 ∪ 𝑌 ] # ( [𝑋] ∗ [𝑊])), then • ℎ |= [𝑍] # ( [𝑋] ∗ [𝑌 ]). By the Classical flavor in intuitionistic model lemma, there exists 𝑓1, 𝑓2, 𝑓3 such that 𝑓1 : Mem[∅] → D(Mem[𝑍]), 𝑓2 : Mem[𝑍] → D(Mem[𝑍 ∪ 𝑋]), 𝑓3 : Mem[𝑍] → D(Mem[𝑍 ∪ 𝑌 ]), and 𝑓1 ⊙ ( 𝑓2 ⊕ 𝑓3) ⊑ ℎ. Note 𝑓1 ⊙ ( 𝑓2 ⊕ 𝑓3) has type Mem[∅] → D(Mem[𝑍 ∪ 𝑌 ∪ 𝑍]). 291 • ℎ |= [𝑍 ∪ 𝑌 ] # ( [𝑋] ∗ [𝑊]). By lemma C.2.6, there exists 𝑔1, 𝑔2, 𝑔3 such that 𝑔1 : Mem[∅] → D(Mem[𝑍 ∪𝑌 ]), 𝑔2 : Mem[𝑍 ∪𝑌 ] → D(Mem[𝑍 ∪𝑌 ∪ 𝑋]), 𝑔3 : Mem[𝑍 ∪ 𝑌 ] → D(Mem[𝑍 ∪ 𝑌 ∪𝑊]), and 𝑔1 ⊙ (𝑔2 ⊕ 𝑔3) ⊑ ℎ. Note 𝑔1 ⊙ 𝑔2 has type Mem[∅] → D(Mem[𝑍 ∪ 𝑌 ∪ 𝑋]). By lemma C.2.2, 𝑓1 ⊙ ( 𝑓2 ⊕ 𝑓3) = 𝑔1 ⊙ 𝑔2. 𝑔1 ⊙ (𝑔2 ⊕ 𝑔3) = 𝑔1 ⊙ (𝑔2 ⊕ unit𝑍∪𝑌 ) ⊙ (unit𝑍∪𝑌∪𝑋 ⊕ 𝑔3) (By lemma C.2.4) = 𝑔1 ⊙ 𝑔2 ⊙ (unit𝑍∪𝑋 ⊕ 𝑔3) (Because 𝑍 ∪ 𝑌 ⊆ dom(𝑔2), 𝑌 ⊆ dom(𝑔3)) = 𝑓1 ⊙ ( 𝑓2 ⊕ 𝑓3) ⊙ (unit𝑍∪𝑋 ⊕ 𝑔3) ( 𝑓1 ⊙ ( 𝑓2 ⊕ 𝑓3) = 𝑔1 ⊙ 𝑔2) = 𝑓1 ⊙ ( ( 𝑓2 ⊙ unit𝑍∪𝑋) ⊕ ( 𝑓3 ⊙ 𝑔3) ) (By C.1.8) = 𝑓1 ⊙ ( 𝑓2 ⊕ ( 𝑓3 ⊙ 𝑔3) ) By their types, it is easy to see that 𝑓1 |= (∅ ⊲ 𝑍), 𝑓2 |= (𝑍 ⊲ 𝑋), 𝑓3 ⊙ 𝑔3 |= (𝑍 ⊲ 𝑌 ∪𝑊). So, 𝑓1 ⊙ ( 𝑓2 ⊕ ( 𝑓3 ⊙ 𝑔3)) |= [𝑍] # ( [𝑋] ∗ [𝑌 ∪𝑊]). Also, note that ℎ ⊒ 𝑔1 ⊙ (𝑔2 ⊕ 𝑔3) = 𝑓1 ⊙ ( 𝑓2 ⊕ ( 𝑓3 ⊙ 𝑔3)), so by persistence, ℎ |= (∅ ⊲ 𝑍) # ((𝑍 ⊲ 𝑋) ∗ (𝑍 ⊲ 𝑌 ∪𝑊)). □ C.3 CPSL Assertion Logic For the proof of Theorem 4.3.1, we need the following characterization of 𝑔 ⊑ 𝑓 . Proposition C.3.1. Let 𝑓 be a Markov kernel, and let 𝐷 ⊆ dom( 𝑓 ) ⊆ 𝑅 ⊆ range( 𝑓 ). Then we have 𝜋𝑅 ( 𝑓 (𝑚)) = 𝑔(𝑚′) for all 𝑚′ ∈ Mem[𝐷], 𝑚 ∈ Mem[dom( 𝑓 )] such that 𝑚𝐷 = 𝑚′ if and only if 𝑔 ⊑ 𝑓 and dom(𝑔) = 𝐷, range(𝑔) = 𝑅. 292 Proof. For the reverse direction, suppose that 𝑓 = (𝑔 ⊕ unit𝑆) ⊙ 𝑣, with 𝑆 disjoint from dom(𝑔). Since range(𝑔) ⊆ dom(𝑣), we have: 𝜋𝑅 ( 𝑓 (𝑚)) = 𝜋𝑅 ((𝑔 ⊕ unit𝑆) (𝑚)) = 𝜋𝑅 (𝑔(𝑚𝐷) ⊕ unit𝑆 (𝑚𝑆)) = 𝜋𝑅 (𝑔(𝑚𝐷)) ⊗ 𝜋𝑅 (unit𝑆 (𝑚𝑆)) = 𝑔(𝑚𝐷) = 𝑔(𝑚′). For the forward direction, evidently dom(𝑔) = 𝐷 and range(𝑔) = 𝑅. Since 𝑓 preserves input to output, we have 𝜋dom( 𝑓 ) (𝑔(𝑚′)) = 𝜋dom( 𝑓 ) ( 𝑓 (𝑚)) = unit(𝑚′) so 𝑔 preserves input to output and 𝑔 is a Markov kernel. 
We claim that 𝑔 ⊑ 𝑓 . First, consider 𝑔 ⊕ unitdom( 𝑓 )\𝐷 ; write 𝐷′ = dom( 𝑓 ) \ 𝐷. For any 𝑚 ∈ Mem[dom( 𝑓 )], we have: 𝜋𝐷′∪𝑅 ( 𝑓 (𝑚)) = 𝜋𝑅 ( 𝑓 (𝑚)) ⊗ 𝜋𝐷′ ( 𝑓 (𝑚)) = 𝑔(𝑚𝐷) ⊗ unit𝐷′ (𝑚𝐷′) = (𝑔 ⊕ unit𝐷′) (𝑚). So by lemma C.2.1, for every 𝑚 ∈ Mem[dom( 𝑓 )] there exists a family of kernels 𝑔′𝑚 : Mem[𝐷′ ∪ 𝑅] → D(Mem[range( 𝑓 )]) such that 𝑓 (𝑚) = bind((𝑔 ⊕ unit𝐷′) (𝑚), 𝑔′𝑚) Defining 𝑔′(𝑚) ≜ 𝑔′ 𝑚dom( 𝑓 ) (𝑚), we have: 𝑓 (𝑚) = ((𝑔 ⊕ unit𝐷′) ⊙ 𝑔′) (𝑚) and so 𝑔 ⊑ 𝑓 . □ 293 C.3.1 Restriction Theorem 4.3.1 (Restriction in DIBI+). Let 𝑃 ∈ DIBI+ with atomic propositions (𝜙 ⊲ 𝜓), as described above. Then 𝑓 |= 𝑃 if and only if there exists 𝑓 ′ ⊑ 𝑓 such that range( 𝑓 ′) ⊆ FV(𝑃) and 𝑓 ′ |= 𝑃. Proof. The reverse direction is immediate from persistence. For the forward direction, we argue by induction with a stronger hypothesis. If 𝑓 |= 𝑃, we call a state 𝑓 ′ a witness of 𝑓 |= 𝑃 if 𝑓 ′ ⊑ 𝑓 , FVR(𝑃) ⊆ range( 𝑓 ′) ⊆ FV(𝑃), dom( 𝑓 ′) ⊆ FVD(𝑃), and 𝑓 ′ |= 𝑃. We show that 𝑓 |= 𝑃 implies that there is a witness 𝑓 ′ |= 𝑃, by induction on 𝑃. Case (𝐷 ⊲ 𝑅): We will use two basic facts, both following from the form of the domain and range assertions: 1. If 𝑚 |=𝑑 𝐷, then 𝑑𝑜𝑚(𝑚) = FV(𝐷). 2. If 𝜇 |=𝑟 𝑅, then 𝑑𝑜𝑚(𝜇) ⊇ FV(𝐷). 𝑓 |= (𝐷 ⊲ 𝑅) implies that there exists 𝑓 ′ ⊑ 𝑓 such that for any 𝑚 ∈ 𝑀𝑑 such that 𝑚 |=𝑑 𝐷, 𝑓 ′(𝑚) is defined and 𝑓 ′(𝑚) |=𝑟 𝑅. Let 𝑇 = range( 𝑓 ′) ∩ (FV(𝐷) ∪ FV(𝑅)). We claim that 𝜋𝑇 𝑓 ′ is the desired witness for 𝑓 |= 𝑃. • 𝜋𝑇 𝑓 ′ is defined and 𝜋𝑇 𝑓 ′ ⊑ 𝑓 because: dom( 𝑓 ′) = 𝑑𝑜𝑚(𝑚) (for any 𝑚 ∈ 𝑀𝑑 such that 𝑚 |=𝑑 𝐷) = FV(𝐷) ⊆ 𝑇. Thus 𝜋𝑇 𝑓 ′ is defined, and 𝜋𝑇 𝑓 ′ ⊑ 𝑓 ′ ⊑ 𝑓 . 294 • range(𝜋𝑇 𝑓 ′) = 𝑇 ⊆ FV(𝐷) ∪ FV(𝑅) = FV(𝑃). • 𝜋𝑇 𝑓 ′ |= (𝐷 ⊲ 𝑅): For any 𝑚 ∈ 𝑀𝑑 such that 𝑚 |=𝑑 𝐷, 𝑓 ′(𝑚) is a distribution. Based on the restriction theorem for probabilistic BI, 𝜋FV(𝑅)∩range( 𝑓 ′) ( 𝑓 ′(𝑚)) |= 𝑅 too. Since 𝑇 ⊇ FV(𝑅) ∩ range( 𝑓 ′), persis- tence in 𝑀𝑟 , implies 𝜋𝑇 ( 𝑓 ′(𝑚)) |= 𝑅. By definition of marginalization on kernels, (𝜋𝑇 𝑓 ′) (𝑚) = 𝜋𝑇 ( 𝑓 ′(𝑚)). 
Since (𝜋𝑇 𝑓 ′) (𝑚) |= 𝑅, we have 𝜋𝑇 𝑓 ′ |= (𝐷 ⊲ 𝑅) as well. • FVD(𝑃) = FV(𝐷), so dom(𝜋𝑇 𝑓 ′) = dom(𝑚) = FV(𝐷) = FVD(𝑃). • FVR(𝑃) = FV(𝐷 ⊲ 𝑅) = FV(𝐷) ∪ FV(𝑅), so range(𝜋𝑇 𝑓 ′) ⊇ dom((𝜋𝑇 𝑓 ′) (𝑚)) (for any 𝑚 ∈ 𝑀𝑑 such that 𝑚 |=𝑑 𝐷) ⊇ FV(𝐷) ∪ FV(𝑅) (By (𝜋𝑇 𝑓 ′) (𝑚) |= 𝑅) = FVR(𝑃). so 𝜋𝑇 𝑓 ′ is a desired witness for 𝑓 |= 𝑃. Case 𝑄 ∧ 𝑅: Assuming FVR(𝑄) = FV(𝑄) = FVR(𝑅) = FV(𝑅). By definition, 𝑓 |= 𝑄 ∧ 𝑅 implies that 𝑓 |= 𝑄 and 𝑓 |= 𝑅. By induction, there exists 𝑓 ′ ⊑ 𝑓 such that FVR(𝑄) = range( 𝑓 ′) = FV(𝑄), dom( 𝑓 ′) ⊆ FVD(𝑄), and 𝑓 ′ |= 𝑄, and there exists 𝑓 ′′ ⊑ 𝑓 such that FVR(𝑅) = range( 𝑓 ′′) = FV(𝑅), dom( 𝑓 ′′) ⊆ FVD(𝑅) and 𝑓 ′′ |= 𝑅. Thus, range( 𝑓 ′) = range( 𝑓 ′′). Note that dom( 𝑓 ′) = dom( 𝑓 ) ∩ range( 𝑓 ′) because in our models, 𝑓 ′ ⊑ 𝑓 implies that there exists 𝑆 and some 𝑣 such that 𝑓 = ( 𝑓 ′ ⊕ 𝜂𝑆) ⊙ 𝑣, and we can make 𝑆 disjoint of dom( 𝑓 ′) and range( 𝑓 ′) wolog. Then, dom( 𝑓 ) = dom( 𝑓 ′ ⊕ 𝑆) = dom( 𝑓 ′) ∪ 𝑆, and range( 𝑓 ′) = range( 𝑓 ′ ⊕ 𝑆) \ 𝑆, so dom( 𝑓 ) ∪ range( 𝑓 ′) ⊆ dom( 𝑓 ′). Meanwhile, since dom( 𝑓 ′) ⊆ dom( 𝑓 ) and dom( 𝑓 ′) ⊆ range( 𝑓 ′), dom( 𝑓 ′) ⊆ dom( 𝑓 ) ∩ range( 𝑓 ′). So dom( 𝑓 ′) = 295 dom( 𝑓 ) ∩ range( 𝑓 ′). Similarly, dom( 𝑓 ′′) ⊆ dom( 𝑓 ) ∩ range( 𝑓 ′′), so range( 𝑓 ′) = range( 𝑓 ′′) implies that dom( 𝑓 ′) = dom( 𝑓 ′). Since dom( 𝑓 ′) = dom( 𝑓 ′′) and range( 𝑓 ′) = range( 𝑓 ′′), proposition C.3.1 implies that 𝑓 ′ = 𝑓 ′′. This is the desired witness: 𝑓 ′ = 𝑓 ′′ |= 𝑄 and 𝑓 ′ = 𝑓 ′′ |= 𝑅. Case 𝑄 ∨ 𝑅: 𝑓 |= 𝑄 ∨ 𝑅 implies that 𝑓 |= 𝑄 or 𝑓 |= 𝑅. Without loss of generality, suppose 𝑓 |= 𝑄. By induction, there exists 𝑓 ′ ⊑ 𝑓 such that FVR(𝑄) ⊆ range( 𝑓 ′) ⊆ FV(𝑄), dom( 𝑓 ′) ⊆ FVD(𝑄). Then: range( 𝑓 ′) ⊆ FV(𝑄) ∪ FV(𝑅) = FV(𝑃) range( 𝑓 ′) ⊇ FVR(𝑄) ∩ FVR(𝑅) = FVR(𝑃) dom( 𝑓 ′) ⊆ FV(𝑄) ∪ FV(𝑅) = FVD(𝑃). Thus, 𝑓 ′ is a desired witness. Case 𝑄 # 𝑅: Assuming FVD(𝑅) ⊆ FVR(𝑄). 𝑓 |= 𝑄 # 𝑅 implies that there exists 𝑓1, 𝑓2 such that 𝑓1 ⊙ 𝑓2 = 𝑓 , 𝑓1 |= 𝑄, and 𝑓2 |= 𝑅. 𝑓1 ⊙ 𝑓2 is defined so range( 𝑓1) = dom( 𝑓2). 
By induction, there exists 𝑓 ′1 ⊑ 𝑓1 such that 𝑓 ′1 |= 𝑄, FVR(𝑄) ⊆ range( 𝑓 ′1) ⊆ FV(𝑄) and dom( 𝑓 ′1) ⊆ FVD(𝑄), and there exists 𝑓 ′2 ⊑ 𝑓2 such that 𝑓 ′2 |= 𝑄, FVR(𝑅) ⊆ range( 𝑓 ′2) ⊆ FV(𝑅), and dom( 𝑓 ′2) ⊆ FVD(𝑅). Now, 𝑓̂ = 𝑓 ′1 ⊙ ( 𝑓 ′ 2 ⊕ unitrange( 𝑓 ′1 )\dom( 𝑓 ′2 )) is defined because dom( 𝑓 ′2) ⊆ FVD(𝑅) ⊆ FVR(𝑄) ⊆ range( 𝑓 ′1). Then, we have 𝑓̂ |= 𝑄 # 𝑅 range( 𝑓̂ ) = range( 𝑓 ′1) ∪ range( 𝑓 ′2) ⊆ FV(𝑄) ∪ FV(𝑅) = FV(𝑃) range( 𝑓̂ ) = range( 𝑓 ′1) ∪ range( 𝑓 ′2) ⊇ FVR(𝑄) ∪ FVR(𝑅) = FVR(𝑃) dom( 𝑓̂ ) = dom( 𝑓 ′1) ⊆ FVD(𝑄) = FVD(𝑃). 296 𝑓 ′1 ⊑ 𝑓 , 𝑓 ′2 ⊕ unitrange( 𝑓 ′1 )\dom( 𝑓 ′2 )⊕ ⊑ 𝑓2, so by lemma C.2.5, 𝑓̂ = 𝑓 ′1 ⊙ ( 𝑓 ′ 2 ⊕ unitrange( 𝑓 ′1 )\dom( 𝑓 ′2 )) ⊑ 𝑓1 ⊙ 𝑓2 = 𝑓 . Thus, 𝑓̂ is a desired witness. Case 𝑄 ∗ 𝑅: 𝑓 |= 𝑄 ∗ 𝑅 implies that there exists 𝑓1, 𝑓2 such that 𝑓1 ⊕ 𝑓2 ⊑ 𝑓 , 𝑓1 |= 𝑄, and 𝑓2 |= 𝑅. By induction, there exists 𝑓 ′1 ⊑ 𝑓1 such that 𝑓 ′1 |= 𝑄, FVR(𝑄) ⊆ range( 𝑓 ′1) ⊆ FV(𝑄) and dom( 𝑓 ′1) ⊆ FVD(𝑄), and there exists 𝑓 ′2 ⊑ 𝑓2 such that 𝑓 ′2 |= 𝑄, FVR(𝑅) ⊆ range( 𝑓 ′2) ⊆ FV(𝑅), and dom( 𝑓 ′2) ⊆ FVD(𝑅). By downwards closure of ⊕, 𝑓 ′1 ⊕ 𝑓 ′ 2 is defined and 𝑓 ′1 ⊕ 𝑓 ′ 2 ⊑ 𝑓1 ⊕ 𝑓2 ⊑ 𝑓 . We have 𝑓 ′1 ⊕ 𝑓 ′ 2 |= 𝑄 ∗ 𝑅, and range( 𝑓 ′1 ⊕ 𝑓 ′ 2) = range( 𝑓 ′1) ∪ range( 𝑓 ′2) ⊆ FV(𝑄) ∪ FV(𝑅) = FV(𝑃) range( 𝑓 ′1 ⊕ 𝑓 ′ 2) = range( 𝑓 ′1) ∪ range( 𝑓 ′2) ⊇ FVR(𝑄) ∪ FVR(𝑅) = FVR(𝑃) dom( 𝑓 ′1 ⊕ 𝑓 ′ 2) = dom( 𝑓 ′1) ∪ dom( 𝑓 ′2) ⊆ FVD(𝑄) ∪ FVD(𝑅) = FVD(𝑃). Thus, 𝑓 ′1 ⊕ 𝑓 ′ 2 is a desired witness. □ 297 C.3.2 Extra Axioms Proposition 4.3.2. The following axiom schemas for atomic propositions are valid. (𝑆 : 𝑝𝑑 ⊲ 𝑝𝑟) ∧ (𝑆 : 𝑝′𝑑 ⊲ 𝑝 ′ 𝑟) → (𝑆 : 𝑝𝑑 ∧ 𝑝′𝑑 ⊲ 𝑝𝑟 ∧ 𝑝 ′ 𝑟) if FV(𝑝𝑟) = FV(𝑝′𝑟) (AP-AND) (𝑆 : 𝑝𝑑 ⊲ 𝑝𝑟) ∧ (𝑆 : 𝑝′𝑑 ⊲ 𝑝 ′ 𝑟) → (𝑆 : 𝑝𝑑 ∨ 𝑝′𝑑 ⊲ 𝑝𝑟 ∨ 𝑝 ′ 𝑟) (AP-OR) (𝑆 : 𝑝𝑑 ⊲ 𝑝𝑟) ∗ (𝑆′ : 𝑝′𝑑 ⊲ 𝑝 ′ 𝑟) → (𝑆 ∪ 𝑆′ : 𝑝𝑑 ∧ 𝑝′𝑑 ⊲ 𝑝𝑟 ∗ 𝑝 ′ 𝑟) (AP-PAR) 𝑝′𝑑 → 𝑝𝑑 and |=𝑟 𝑝𝑟 → 𝑝′𝑟 implies |= (𝑆 : 𝑝𝑑 ⊲ 𝑝𝑟) → (𝑆 : 𝑝′𝑑 ⊲ 𝑝 ′ 𝑟) (AP-IMP) Proof. We check each of the axioms. Case: AP-AND. Suppose that 𝑤 |= (𝑆 : 𝑝𝑑 ⊲ 𝑝𝑟) ∧ (𝑆 : 𝑝′ 𝑑 ⊲ 𝑝′𝑟). 
By semantics of atomic propositions, there exists 𝑤1 ⊑𝑘 𝑤 and 𝑤2 ⊑𝑘 𝑤 such that for all 𝑚 ∈ Mem[𝑆] such that 𝑚 |=𝑑 𝑝𝑑 ∧ 𝑝′𝑑 , we have 𝑤1(𝑚) |=𝑟 𝑝𝑟 and 𝑤2(𝑚) |=𝑟 𝑝′𝑟 . By restriction (theorem 4.3.1), we may assume that range(𝑤1) = FV(𝑝𝑟) = FV(𝑝′𝑟) = range(𝑤2). Thus, proposition C.3.1 implies that 𝑤1 = 𝑤2, and so 𝑤 |= (𝑆 : 𝑝𝑑 ∧ 𝑝′𝑑 ⊲ 𝑝𝑟 ∧ 𝑝 ′ 𝑟). Case: AP-OR. Immediate, by semantics of ∨. Case: AP-PAR. Suppose that 𝑤 |= (𝑆 : 𝑝𝑑 ⊲ 𝑝𝑟) ∗ (𝑆′ : 𝑝′ 𝑑 ⊲ 𝑝′𝑟). We will show that 𝑤 |= (𝑆 ∪ 𝑆′ : 𝑝𝑑 ∗ 𝑝′𝑑 ⊲ 𝑝𝑟 ∗ 𝑝 ′ 𝑟). By semantics of atomic propositions, there exists 𝑤1 ⊑𝑘 𝑤 and 𝑤2 ⊑𝑘 𝑤 such that 𝑤1 ⊕ 𝑤2 ⊑ 𝑤, and for all 𝑚1 ∈ Mem[𝑆] such that 𝑚1 |=𝑑 𝑝𝑑 , we have 𝑤1(𝑚1) |=𝑟 𝑝𝑟 , and for all 𝑚2 ∈ Mem[𝑆′] such that 𝑚2 |=𝑑 𝑝′𝑑 , we have 𝑤2(𝑚2) |=𝑟 𝑝′𝑟 . Now for any 𝑚 ∈ Mem[𝑆 ∪ 𝑆′] such that 𝑚 |=𝑑 𝑝𝑑 ∧ 𝑝′𝑑 , we have 𝑚𝑆 |=𝑑 𝑝𝑑 and 𝑚𝑆′ |=𝑑 𝑝′𝑑 . Thus 𝑤1(𝑚𝑆) |=𝑟 𝑝𝑟 and 𝑤2(𝑚𝑆′) |=𝑟 𝑝′𝑟 . Letting 𝑇 = 𝑆 ∩ 𝑆′ 298 and 𝑇1 = 𝑆 \ 𝑇 ; 𝑇2 = 𝑆′ \ 𝑇 be disjoint sets, and noting that 𝑤1, 𝑤2 both preserve inputs on 𝑇 , we have: 𝑤1 ⊕ 𝑤2(𝑚) = 𝜋𝑇1𝑤1(𝑚𝑆) ⊗ unit(𝑚𝑇 ) ⊗ 𝜋𝑇2𝑤2(𝑚𝑆′) = (𝜋𝑇1𝑤1(𝑚𝑆) ⊗ unit(𝑚𝑇 )) ⊕𝑟 (unit(𝑚𝑇 ) ⊗ 𝜋𝑇2𝑤2(𝑚𝑆′)) = 𝑤1(𝑚𝑆) ⊕𝑟 𝑤2(𝑚𝑆′) |=𝑟 𝑝𝑟 ∗ 𝑝′𝑟 Thus, 𝑤 |= (𝑆 ∪ 𝑆′ : 𝑝𝑑 ∗ 𝑝′𝑑 ⊲ 𝑝𝑟 ∗ 𝑝 ′ 𝑟). Case: AP-IMP. Immediate, by semantics of→. □ Proposition 4.3.4. (AXIOMS FOR DIBI+) The following axioms are sound, assuming both precedent and antecedent are in DIBI+. (𝑃 #𝑄) # 𝑅 → 𝑃 # (𝑄 ∗ 𝑅) (INDEP-1) 𝑃 #𝑄 → 𝑃 ∗ 𝑄 if FVD(𝑄) = ∅ (INDEP-2) 𝑃 #𝑄 → 𝑃 # (𝑄 ∗ (𝑆 ⊲ [𝑆])) (PAD) (𝑃 ∗ 𝑄) # (𝑅 ∗ 𝑆) → (𝑃 # 𝑅) ∗ (𝑄 # 𝑆) (RESTEXCH) Proof. We prove them one by one. INDEP-1 We want to show that when (𝑃 # 𝑄) # 𝑅, 𝑃 # (𝑄 ∗ 𝑅) are both formula in DIBI+, 𝑓 |= (𝑃 #𝑄) # 𝑅 implies 𝑓 |= 𝑃 # (𝑄 ∗ 𝑅). By proof system of DIBI, 𝑓 |= (𝑃 #𝑄) # 𝑅 implies that 𝑓 |= 𝑃 # ( 𝑄 # 𝑅 ) . While 𝑃 # ( 𝑄 # 𝑅 ) may not satisfy the restriction property, that is okay because we will only used conditions guaranteed by the fact that (𝑃 # 𝑄) # 𝑅, 𝑃 # (𝑄 ∗ 299 𝑅) ∈ DIBI+. 
In particular, we rely on 𝑃,𝑄, 𝑅 each satisfies restriction, and FVD(𝑄 ∗ 𝑅) ⊆ FVR(𝑃), which implies that FVD(𝑅) ⊆ FVD(𝑄 ∗ 𝑅) ⊆ FVR(𝑃) (C.17) 𝑓 |= 𝑃 # ( 𝑄 # 𝑅 ) implies there exists 𝑓𝑝, 𝑓𝑞, 𝑓𝑟 such that 𝑓𝑝 |= 𝑃, 𝑓𝑞 |= 𝑄, and 𝑓𝑟 |= 𝑅, and 𝑓𝑝 ⊙ ( 𝑓𝑞 ⊙ 𝑓𝑟) = 𝑓 . By restriction property theorem 4.3.1, 𝑓𝑞 |= 𝑄 implies that there exists 𝑓 ′𝑞 ⊑ 𝑓𝑞 such that FVR(𝑄) ⊆ range( 𝑓 ′𝑞) ⊆ FV(𝑄) and dom( 𝑓 ′𝑞) ⊆ FVD(𝑄). 𝑓 ′𝑞 ⊑ 𝑓𝑞 so there exists 𝑣, 𝑇 such that 𝑓𝑞 = ( 𝑓 ′𝑞 ⊕𝑘 unit𝑇 ) ⊙ 𝑣. Similarly, 𝑓𝑟 |= 𝑅, by theorem 4.3.1, there exists 𝑓 ′𝑟 ⊑ 𝑓𝑟 such that FVR(𝑅) ⊆ range( 𝑓 ′𝑟 ) ⊆ FV(𝑅) and dom( 𝑓 ′𝑟 ) ⊆ FVD(𝑅). 𝑓 ′𝑟 ⊑ 𝑓𝑟 so there exists 𝑢, 𝑆 such that 𝑓𝑟 = ( 𝑓 ′𝑟 ⊕𝑘 unit𝑆) ⊙ 𝑢. Now, we claim that FVD(𝑅) ⊆ dom( 𝑓 ′𝑞 ⊕ unit𝑇 ): By theorem 4.3.1 𝑓𝑝 |= 𝑃 implies that there exists 𝑓 ′𝑝 ⊑ 𝑓𝑝 such that FVR(𝑃) ⊆ range( 𝑓 ′𝑝) ⊆ FV(𝑃), dom( 𝑓 ′𝑝) ⊆ 𝐹FV(𝑃), and 𝑓 ′𝑝 |= 𝑃. Thus, FVR(𝑃) ⊆ range( 𝑓𝑝) = dom( 𝑓𝑞). Recall that FVD(𝑅) ⊆ FVR(𝑃), so FVD(𝑅) ⊆ dom 𝑓𝑞 = dom 𝑓 ′𝑞 ⊕ unit𝑇 . As a corollary, we have dom( 𝑓 ′𝑟 ) ⊆ FVD(𝑅) ⊆ dom( 𝑓 ′𝑞 ⊕ unit𝑇 ) ⊆ dom(𝑣), 300 and dom( 𝑓 ′𝑟 ) ⊆ FVD(𝑅) ⊆ dom( 𝑓 ′𝑞 ⊕ unit𝑇 ). Then, 𝑓𝑞 ⊙ 𝑓𝑟 = ( ( 𝑓 ′𝑞 ⊕ unit𝑇 ) ⊙ 𝑣 ) ⊙ ( ( 𝑓 ′𝑟 ⊕ unit𝑆) ⊙ 𝑢 ) = ( 𝑓 ′𝑞 ⊕ unit𝑇 ) ⊙ ( 𝑣 ⊙ ( 𝑓 ′𝑟 ⊕ unit𝑆) ) ⊙ 𝑢 (By standard associativity of ⊙) = ( 𝑓 ′𝑞 ⊕ unit𝑇 ) ⊙ ( 𝑓 ′𝑟 ⊕ 𝑣) ⊙ 𝑢 (By lemma C.2.3 and dom( 𝑓 ′𝑟 ) ⊆ dom(𝑣)) = ( 𝑓 ′𝑞 ⊕ unit𝑇 ) ⊙ (( 𝑓 ′𝑟 ⊙ unitrange( 𝑓 ′𝑟 )) ⊕ (unitdom(𝑣) ⊙ 𝑣) ⊙ 𝑢 = ( 𝑓 ′𝑞 ⊕ unit𝑇 ) ⊙ ( 𝑓 ′𝑟 ⊕ unitdom(𝑣)) ⊙ (unitrange( 𝑓 ′𝑟 ) ⊕ 𝑣) ⊙ 𝑢 (♥) = (( 𝑓 ′𝑞 ⊕ unit𝑇 ) ⊕ 𝑓 ′𝑟 ) ⊙ (𝑣 ⊕ unitrange( 𝑓 ′𝑟 )) ⊙ 𝑢 (†) = (( 𝑓 ′𝑞 ⊕ unit𝑇 ) ⊙ 𝑣) ⊕ ( 𝑓 ′𝑟 ⊙ unitrange( 𝑓 ′𝑟 )) ⊙ 𝑢 (♥) = 𝑓𝑞 ⊕ 𝑓𝑟 where † follows from lemma C.2.3, dom( 𝑓 ′𝑟 ) ⊆ dom( 𝑓 ′𝑞 ⊕ unit𝑇 ) and exact commutativity, ♥ follows from lemma C.1.8 and lemma C.1.9. Thus, 𝑓𝑞 ⊙ 𝑓𝑟 |= 𝑄 ∗ 𝑅. And by satisfaction rules, 𝑓 |= 𝑃 # (𝑄 ∗ 𝑅) INDEP-2 We want to show that under the special condition FVD(𝑄) = ∅, 𝑓 |= 𝑃 #𝑄 implies that 𝑓 |= 𝑃 ∗ 𝑄. If 𝑓 |= 𝑃 #𝑄, then there exists 𝑓𝑝, 𝑓𝑞 such that 𝑓𝑝 ⊙ 𝑓𝑞 = 𝑓 and 𝑓𝑝 |= 𝑃, 𝑓𝑞 |= 𝑄. 
By restriction property theorem 4.3.1, 𝑓𝑞 |= 𝑄 implies that there exists 𝑓 ′𝑞 ⊑ 𝑓𝑞 such that FVR(𝑄) ⊆ range( 𝑓 ′𝑞) ⊆ FV(𝑄) and dom( 𝑓 ′𝑞) ⊆ FVD(𝑄). 𝑓 ′𝑞 ⊑ 𝑓𝑞 so there exists 𝑣, 𝑇 such that 𝑓𝑞 = ( 𝑓 ′𝑞 ⊕𝑘 unit𝑇 ) ⊙ 𝑣. Since dom( 𝑓 ′𝑞) ⊆ FVD(𝑄) and FVD(𝑄) = ∅, it must dom( 𝑓 ′𝑞) = ∅, and thus 301 no matter what the domain of 𝑓𝑝 is, dom( 𝑓 ′𝑞) ⊆ dom( 𝑓𝑝). Thus, 𝑓𝑝 ⊙ 𝑓𝑞 = 𝑓𝑝 ⊙ ( 𝑓 ′𝑞 ⊕ unit𝑇 ) ⊙ 𝑣 = ( 𝑓𝑝 ⊕ 𝑓 ′𝑞) ⊕ 𝑣 (By lemma C.2.3 and dom( 𝑓 ′𝑞) ⊆ dom( 𝑓𝑝)) Thus, 𝑓𝑝 ⊕ 𝑓 ′𝑞 ⊑ 𝑓𝑝 ⊙ 𝑓𝑞 = 𝑓 . By satisfaction rules, 𝑓𝑝 |= 𝑃 and 𝑓 ′𝑞 |= 𝑄 implies that 𝑓𝑝 ⊕ 𝑓 ′𝑞 |= 𝑃 ∗ 𝑄. Thus, by persistence, 𝑓 |= 𝑃 ∗ 𝑄 PAD We want to show that when 𝑃 # 𝑄, 𝑃 # (𝑄 ∗ (𝑆 ⊲ 𝑆)) are both in DIBI+, 𝑓 |= 𝑃 #𝑄 implies 𝑓 |= 𝑃 # (𝑄 ∗ (𝑆 ⊲ 𝑆)). One key guarantee we rely on from the grammar of DIBI+ is that FVD(𝑄) ∪ 𝑆 = FVD(𝑄 ∗ (𝑆 ⊲ 𝑆)) ⊆ FVR(𝑃). When 𝑓 |= 𝑃 #𝑄, there exists 𝑓𝑝, 𝑓𝑞 such that 𝑓𝑝 ⊙ 𝑓𝑞 = 𝑓 and 𝑓𝑝 |= 𝑃, 𝑓𝑞 |= 𝑄, By theorem 4.3.1, 𝑓𝑝 |= 𝑃 implies that there exists 𝑓 ′𝑝 ⊑ 𝑓𝑝 such that FVR(𝑃) ⊆ range( 𝑓 ′𝑝) ⊆ FV(𝑃), dom( 𝑓 ′𝑝) ⊆ 𝐹FV(𝑃), and 𝑓 ′𝑝 |= 𝑃. By the fact that 𝑓𝑝 ⊙ 𝑓𝑞 is defined, and that the definition of preorder in our con- crete models, 𝑓 ′𝑝 ⊑ 𝑓𝑝 implies dom( 𝑓𝑞) = range( 𝑓𝑝) ⊇ range( 𝑓 ′𝑝) ⊇ FVR(𝑃) ⊇ 𝑆 Since 𝑓𝑞 preserves input, 𝑆 ⊆ dom( 𝑓𝑞) implies that 𝑓𝑞 = 𝑓𝑞 ⊕unit𝑆, and thus 𝑓𝑝 ⊙ 𝑓𝑞 = 𝑓𝑝 ⊙ ( 𝑓𝑞 ⊕ unit𝑆). Note that unit𝑆 |= (𝑆 ⊲ [𝑆]), and 𝑓𝑞 |= 𝑄. Thus, 𝑓𝑞 ⊕ unit𝑆 |= 𝑄 ∗ (𝑆 ⊲ [𝑆]). Since 𝑓𝑝 |= 𝑃, it follows that 𝑓𝑝 ⊙ ( 𝑓𝑞 ⊕ unit𝑆) |= 𝑃 # (𝑄 ∗ (𝑆 ⊲ [𝑆])) Since 𝑓 = 𝑓𝑝 ⊙ 𝑓𝑞 = 𝑓𝑝 ⊙ ( 𝑓𝑞 ⊕ unit𝑆), 𝑓 |= 𝑃 # (𝑄 ∗ (𝑆 ⊲ [𝑆])) 302 RESTEXCH We want to show that when (𝑃 ∗ 𝑄) # (𝑅 ∗ 𝑆) and (𝑃 #𝑅) ∗ (𝑄 #𝑆) are both formula in DIBI+, 𝑓 |= (𝑃 ∗ 𝑄) # (𝑅 ∗ 𝑆) implies 𝑓 |= (𝑃 ∗ 𝑅) ∗ (𝑄 ∗ 𝑆). The key properties that being in DIBI+ guarantees us are that FVD(𝑅) ⊆ FVR(𝑃) FVD(𝑆) ⊆ FVR(𝑄) FVD(𝑅 ∗ 𝑆) = FVD(𝑅) ∪ FVD(𝑆) ⊆ FVR(𝑃 ∗ 𝑄) = FVR(𝑃) ∪ FVR(𝑄) If 𝑓 |= (𝑃 ∗ 𝑄) # (𝑅 ∗ 𝑆), then there exists 𝑓1, 𝑓2 such that 𝑓1 ⊙ 𝑓2 = 𝑓 , 𝑓1 |= 𝑃 ∗ 𝑄, 𝑓2 |= 𝑅 ∗ 𝑆. 
That is, there exist 𝑢1, 𝑣1 such that 𝑢1 ⊕ 𝑣1 ⊑ 𝑓1, 𝑢1 |= 𝑃, and 𝑣1 |= 𝑄; there exist 𝑢2, 𝑣2 such that 𝑢2 ⊕ 𝑣2 ⊑ 𝑓2, 𝑢2 |= 𝑅, 𝑣2 |= 𝑆. By theorem 4.3.1, • 𝑢1 |= 𝑃 implies there exists 𝑢′1 ⊑ 𝑢1 such that FVR(𝑃) ⊆ range(𝑢′1) ⊆ FV(𝑃), dom(𝑢′1) ⊆ FVD(𝑃), and 𝑢′1 |= 𝑃. • 𝑣1 |= 𝑄 implies there exists 𝑣′1 ⊑ 𝑣1 such that FVR(𝑄) ⊆ range(𝑣′1) ⊆ FV(𝑄), dom(𝑣′1) ⊆ FVD(𝑄), and 𝑣′1 |= 𝑄. • 𝑢2 |= 𝑅 implies there exists 𝑢′2 ⊑ 𝑢2 such that FVR(𝑅) ⊆ range(𝑢′2) ⊆ FV(𝑅), dom(𝑢′2) ⊆ FVD(𝑅), and 𝑢′2 |= 𝑅. • 𝑣2 |= 𝑆 implies there exists 𝑣′2 ⊑ 𝑣2 such that FVR(𝑆) ⊆ range(𝑣′2) ⊆ FV(𝑆), dom(𝑣′2) ⊆ FVD(𝑆), and 𝑣′2 |= 𝑆. By Downwards closure property of ⊕, 𝑢′2 ⊕ 𝑣 ′ 2 is defined and 𝑢′2 ⊕ 𝑣 ′ 2 ⊑ 𝑢2 ⊕ 𝑣2 ⊑ 𝑓2. Say that 𝑓1 = (𝑢1 ⊕ 𝑣1 ⊕ unit𝑆1) ⊙ ℎ1, 𝑓2 = (𝑢′2 ⊕ 𝑣 ′ 2 ⊕ unit𝑆2) ⊙ ℎ2. Also, dom(𝑢′2 ⊕ 𝑣 ′ 2) = dom(𝑢′2) ∪ dom(𝑣′2) ⊆ FVD(𝑅) ∪ FVD(𝑆) ⊆ FVR(𝑃) ∪ FVD(𝑄) ⊆ range(𝑢′1) ∪ range(𝑣′1) ⊆ range(𝑢1) ∪ range(𝑣1) = range(𝑢1 ⊕ 𝑣1) 303 Then 𝑓1 ⊙ 𝑓2 = (𝑢1 ⊕ 𝑣1 ⊕ unit𝑆1) ⊙ ℎ1 ⊙ (𝑢′2 ⊕ 𝑣 ′ 2 ⊕ unit𝑆2) ⊙ ℎ2 = (𝑢1 ⊕ 𝑣1 ⊕ unit𝑆1) ⊙ ((𝑢′2 ⊕ 𝑣 ′ 2) ⊕ ℎ1) ⊙ ℎ2 (♥) = (𝑢1 ⊕ 𝑣1 ⊕ unit𝑆1) ⊙ ((𝑢′2 ⊕ 𝑣 ′ 2) ⊙ unitrange(𝑢′2⊕𝑣 ′ 2)) ⊕ (unitdom(ℎ1) ⊙ ℎ1) ⊙ ℎ2 = (𝑢1 ⊕ 𝑣1 ⊕ unit𝑆1) ⊙ (𝑢′2 ⊕ 𝑣 ′ 2 ⊕ unitdom(ℎ1)) ⊙ (unitrange(𝑢′2⊕𝑣 ′ 2) ⊕ ℎ1) ⊙ ℎ2 (†) = (𝑢1 ⊕ 𝑣1 ⊕ unit𝑆1) ⊙ (𝑢′2 ⊕ 𝑣 ′ 2 ⊕ unitrange(𝑢1⊕𝑣1) ⊕ unit𝑆1) ⊙ (unitrange(𝑢′2⊕𝑣 ′ 2) ⊕ ℎ1) ⊙ ℎ2 = ( ((𝑢1 ⊕ 𝑣1) ⊙ (𝑢′2 ⊕ 𝑣 ′ 2 ⊕ unitrange(𝑢1⊕𝑣1))) ⊕ unit𝑆1 ) ⊙ (unitrange(𝑢′2⊕𝑣 ′ 2) ⊕ ℎ1) ⊙ ℎ2 (†) = ( (𝑢1 ⊙ (𝑢′2 ⊕ unitrange(𝑢1))) ⊕ (𝑣1 ⊙ (𝑣′2 ⊕ unitrange(𝑣1))) ⊕ unit𝑆1 ) ⊙ (unitrange(𝑢′2⊕𝑣 ′ 2) ⊕ ℎ1) ⊙ ℎ2 († and exact commutativity, associativity) where ♥ follows from lemma C.2.3, dom(𝑢′2 ⊕ 𝑣 ′ 2) ⊆ range(𝑢1 ⊕ 𝑣1) ⊆ dom(ℎ1), and † follows from lemma C.1.8 and lemma C.1.9. Thus, (𝑢1 ⊙ (𝑢′2 ⊕ unitrange(𝑢1))) ⊕ (𝑣1 ⊙ (𝑣′2 ⊕ unitrange(𝑣1))) ⊑ 𝑓1 ⊙ 𝑓2. Recall that 𝑢′2 |= 𝑅. By persistence, 𝑢′2 ⊕ unitrange(𝑢1) |= 𝑅. Similarly, 𝑣′2 |= 𝑆, so by persistence, 𝑣′2 ⊕ unitrange(𝑣1) |= 𝑆. 
Therefore, (𝑢1 ⊙ (𝑢′2 ⊕ unitrange(𝑢1))) ⊕ (𝑣1 ⊙ (𝑣′2 ⊕ unitrange(𝑣1))) |= (𝑃 # 𝑅) ∗ (𝑄 # 𝑆) Then, by persistence, 𝑓 |= (𝑃 # 𝑅) ∗ (𝑄 # 𝑆). □ Proposition 4.3.5. (AXIOMS FOR ATOMIC PROPOSITIONS) The following axioms 304 are sound. For any 𝑆, 𝐴, 𝐵, 𝐶 ⊆ Var, (𝑆 ⊲ [𝐴] ∗ [𝐵]) → (𝑆 ⊲ [𝐴]) ∗ (𝑆 ⊲ [𝐵]) if 𝐴 ∩ 𝐵 ⊆ 𝑆 (REVPAR) (𝑆 ⊲ [𝐴] ∗ [𝐵]) → (𝑆 ⊲ [𝐴 ∪ 𝐵]) (UNIONRAN) (𝐴 ⊲ 𝐵) # (𝐵 ⊲ 𝐶) → (𝐴 ⊲ 𝐶) (ATOMSEQ) (𝐴 ⊲ 𝐵) → (𝐴 ⊲ 𝐴) # (𝐴 ⊲ 𝐵) (UNITL) (𝐴 ⊲ 𝐵) → (𝐴 ⊲ 𝐵) # (𝐵 ⊲ 𝐵) (UNITR) Proof. We prove it one by one. REVPAR Given any 𝑓 |= (𝑆 ⊲ [𝐴] ∗ [𝐵]), by satisfaction rules and semantic of atomic propositions, there exists 𝑓 ′ ⊑ 𝑓 such that for all 𝑚 ∈ 𝑀𝑑 such that 𝑚 |=𝑑 𝑆, 𝑓 ′(𝑚) |=𝑟 [𝐴] ∗ [𝐵]. Since 𝑓 ′(𝑚) is defined and 𝑓 ′(𝑚) |=𝑟 [𝐴] ∗ [𝐵], it follows that dom( 𝑓 ′) = 𝑆 and range( 𝑓 ′) ⊇ 𝑆 ∪ 𝐴 ∪ 𝐵. Thus, we can define 𝑓1 = 𝜋𝑆∪𝐴 𝑓 ′, 𝑓2 = 𝜋𝑆∪𝐵 𝑓 ′. Note that 𝑓1 |= (𝑆 ⊲ 𝐴), 𝑓2 |= (𝑆 ⊲ 𝐵). Also, because 𝐴 ∩ 𝐵 ⊆ 𝑆, range( 𝑓1) ∩ range( 𝑓2) = (𝑆 ∪ 𝐴) ∩ (𝑆 ∪ 𝐵) = 𝑆, and thus 𝑓1 ⊕ 𝑓2 is defined. We now want to show that 𝑓1 ⊕ 𝑓2 ⊑ 𝑓 . Note 𝑓 ′(𝑚) |=𝑟 [𝐴] ∗ [𝐵] implies that there exists 𝜇1, 𝜇2 such that 𝜇1 ⊕𝑟 𝜇2 ⊑ 𝑓 ′(𝑚), and 𝑑𝑜𝑚(𝜇1) ⊇ 𝐴, 𝑑𝑜𝑚(𝜇2) ⊇ 𝐵. Since 𝑓 ′ preserves input on its domain 𝑆, 𝜋𝑆 𝑓 ′(𝑚) = unit(𝑚), so (𝜇1 ⊕𝑟 unit(𝑚)) ⊕𝑟 (𝜇2 ⊕𝑟 unit(𝑚)) ⊑ 𝑓 ′(𝑚) ⊕𝑟 unit(𝑚) ⊕𝑟 unit(𝑚) = 𝑓 ′(𝑚) too. Let 𝜇′1 = 𝜋𝐴∪𝑆 (𝜇1 ⊕𝑟 unit(𝑚)) and 𝜇′2 = 𝜋𝐵∪𝑆 (𝜇2 ⊕𝑟 unit(𝑚)). Then due to Downwards closure in 𝑀𝑑 , 𝜇′1 ⊕𝑟 𝜇 ′ 2 will also be defined, and 𝜇′1 ⊕𝑟 𝜇 ′ 2 ⊑ (𝜇1 ⊕𝑟 unit(𝑚)) ⊕𝑟 (𝜇2 ⊕𝑟 unit(𝑚)) ⊑ 𝑓 ′(𝑚), 305 which implies that 𝜇′1 ⊕𝑟 𝜇 ′ 2 = 𝜋𝑆∪𝐴∪𝐵 𝑓 ′(𝑚). In the range model, this means that 𝜇′1 = 𝜋𝑆∪𝐴 𝑓 ′(𝑚), 𝜇′2 = 𝜋𝑆∪𝐵 𝑓 ′(𝑚). Then for any 𝑚′ ∈ Mem[𝑆], any 𝑟 ∈ Mem[𝐴 ∪ 𝐵 ∪ 𝑆], (𝜋𝑆∪𝐴∪𝐵 𝑓 ′) (𝑚′) (𝑟) = (𝜋𝑆∪𝐴∪𝐵 𝑓 ′(𝑚′)) (𝑟) = 𝜇′1 ⊕𝑟 𝜇 ′ 2(𝑟) = 𝜇 ′ 1(𝑟 𝑆∪𝐴) · 𝜇′2(𝑟 𝑆∪𝐵) ( 𝑓1 ⊕ 𝑓2) (𝑚′) (𝑟) = 𝑓1(𝑚′) (𝑟𝑆∪𝐴) · 𝑓2(𝑚′) (𝑟𝑆∪𝐵) = (𝜋𝑆∪𝐴 𝑓 ′) (𝑚′) (𝑟𝑆∪𝐴) · (𝜋𝑆∪𝐵 𝑓 ′(𝑚′) (𝑟𝑆∪𝐵) = 𝜇′1(𝑟 𝑆∪𝐴) · 𝜇′2(𝑟 𝑆∪𝐵) Thus, 𝑓1 ⊕ 𝑓2 = 𝜋𝑆∪𝐴∪𝐵 𝑓 ′, which implies that 𝑓1 ⊕ 𝑓2 ⊑ 𝑓 . By their types, 𝑓1 ⊕ 𝑓2 |= (𝑆 ⊲ 𝐴) ∗ (𝑆 ⊲ 𝐵). 
By persistence, 𝑓 |= (𝑆 ⊲ 𝐴) ∗ (𝑆 ⊲ 𝐵). UNIONRAN Obvious from the semantics of atomic proposition and the range logic. ATOMSEQ Given any 𝑓 |= (𝐴 ⊲ 𝐵) # (𝐵 ⊲ 𝐶), by satisfaction rules and semantic of atomic propositions, there exists • 𝑓1, 𝑓2 such that 𝑓1 ⊙ 𝑓2 = 𝑓 ; • 𝑓 ′1 ⊑ 𝑓1 such that for any 𝑚 ∈ 𝑀𝑑 such that 𝑚 |=𝑑 𝐴, 𝑓 ′1 (𝑚) |=𝑟 [𝐵]. • 𝑓 ′2 ⊑ 𝑓2 such that for any 𝑚 ∈ 𝑀𝑑 such that 𝑚 |=𝑑 𝐵, 𝑓 ′2 (𝑚) |=𝑟 D[𝐶]. Note that 𝑓 ′1 (𝑚) |=𝑟 [𝐵] implies that 𝐵 ⊆ range( 𝑓 ′1), so 𝜋𝐵 𝑓 ′1 is defined. Let 𝑓 ′′1 = 𝜋𝐵 𝑓 ′ 1. Note that for any 𝑚 ∈ 𝑀𝑑 such that 𝑚 |=𝑑 𝐴, 𝑓 ′′1 (𝑚) |=𝑟 [𝐵] too, so 𝑓 ′′ |= (𝐴 ⊲ 𝐵) too. Also, by transitivity, 𝑓 ′′1 ⊑ 𝑓 ′1 ⊑ 𝑓1. 306 Say 𝑓1 = ( 𝑓 ′′1 ⊕ 𝜂𝑆1) ⊙ 𝑣1, 𝑓2 = ( 𝑓 ′2 ⊕ 𝜂𝑆2) ⊙ 𝑣2, then since range( 𝑓 ′′1 ) = 𝐵 = dom( 𝑓 ′2), 𝑓1 ⊙ 𝑓2 = ( 𝑓 ′′1 ⊕ 𝜂𝑆1) ⊙ 𝑣1 ⊙ ( 𝑓 ′2 ⊕ 𝜂𝑆2) ⊙ 𝑣2 = ( 𝑓 ′′1 ⊕ 𝜂𝑆1) ⊙ ( 𝑓 ′2 ⊕ 𝑣1) ⊙ 𝑣2 (By lemma C.2.3 and dom( 𝑓 ′2) = 𝐵 = range( 𝑓 ′′1 ) ⊆ dom(𝑣1)) = ( 𝑓 ′′1 ⊕ 𝜂𝑆1) ⊙ ( 𝑓 ′2 ⊕ 𝜂dom(𝑣1)) ⊙ (𝑣1 ⊕ 𝜂range( 𝑓1)) ⊙ 𝑣2 (By lemma C.2.4) = ( 𝑓 ′′1 ⊕ 𝜂𝑆1) ⊙ ( 𝑓 ′2 ⊕ 𝜂𝑆) ⊙ (𝑣1 ⊕ 𝜂range( 𝑓1)) ⊙ 𝑣2 = (( 𝑓 ′′1 ⊙ 𝑓 ′ 2) ⊕ 𝜂𝑆1) ⊙ (𝑣1 ⊕ 𝜂range( 𝑓1)) ⊙ 𝑣2 So 𝑓 ′′1 ⊙ 𝑓 ′ 2 ⊑ 𝑓1 ⊙ 𝑓2 = 𝑓 . 𝑓 ′′1 : Mem[𝐴] → D(Mem[𝐵]), 𝑓 ′2 : Mem[𝐵] → D(Mem[range( 𝑓 ′2)])𝐴, so 𝑓 ′′1 ⊙ 𝑓 ′ 2 : Mem[𝐴] → D(Mem[range( 𝑓 ′2)]). Since range( 𝑓 ′2) ⊇ 𝐶, it follows that 𝑓 ′′1 ⊙ 𝑓 ′ 2 |= (𝐴 ⊲ 𝐶), and thus 𝑓 |= (𝐴 ⊲ 𝐶) by persistence. UNITL If 𝑓 |= (𝐴 ⊲ 𝐵), then there must exists 𝑓 ′ ⊑ 𝑓 such that for all 𝑚 ∈ 𝑀𝑑 such that 𝑚 |= 𝐴, 𝑓 ′(𝑚) |=𝑟 [𝐵]. Given any witness 𝑓 ′, 𝑓 ′ = unitMem[𝐴] ⊙ 𝑓 ′, and also 𝑓 ′ |=𝑟 (𝐴 ⊲ 𝐵). Note that unitMem[𝐴] |=𝑟 (𝐴 ⊲ 𝐴), so 𝑓 ′ = unitMem[𝐴] ⊙ 𝑓 ′ |= (𝐴 ⊲ 𝐴) # (𝐴 ⊲ 𝐵). UNITR Analogous as the UNITL case, except that now using the fact 𝑓 ′ = 𝑓 ′ ⊙ unitMem[𝐵] for any 𝑓 ′ : Mem[𝐴] → D(Mem[𝐵]). □ 307 C.4 CPSL Soundness Definition 4.3.6 (CPSL Validity). 
A CPSL judgment {𝑃} 𝑐 {𝑄} is valid, written |= {𝑃} 𝑐 {𝑄}, if for every input distribution 𝜇 ∈ D(Mem[Var]) such that the lifted input 𝑓𝜇 ≜ ⟨⟩ ↦→ 𝜇 satisfies 𝑓𝜇 |= 𝑃, the lifted output satisfies 𝑓⟦𝑐⟧𝜇 |= 𝑄. Now, we are ready to prove soundness of CPSL. Theorem 4.3.3 (CPSL Soundness). CPSL is sound: derivable judgments are valid. Proof. By induction on the derivation. Throughout, we write 𝜇 : Mem[Var] for the input distribution and 𝑓 : Mem[∅] → D(Mem[Var]) for the kernel obtained by lifting the input distribution, and we assume that 𝑓 satisfies the pre-condition of the conclusion. ASSN By restriction (theorem 4.3.1), there exists 𝑘1 ⊑ 𝑓 such that FV(𝑒) ⊆ FVR(𝑃) ⊆ range(𝑘1) ⊆ FV(𝑃). Since 𝑓 has empty domain, we have 𝑓 = 𝑘1 ⊙ 𝑘2 for some 𝑘2 : Mem[range(𝑘1)] → D(Mem[Var]). Let 𝑓 ′ = ⟨⟩ ↦→ ⟦𝑥 ← 𝑒⟧(𝜇) be the lifted output. By the semantics of the program and associativity, we have: 𝑓 ′ = ⟨⟩ ↦→ bind(𝜇, 𝑚 ↦→ unit(𝑚 [𝑥 ↦→ ⟦𝑒⟧(𝑚)])) = 𝜋Var\{𝑥} 𝑓 ⊙ (𝑚1 ↦→ unit(𝑚1 ∪ (𝑥 ↦→ ⟦𝑒⟧(𝑚1)))︸ ︷︷ ︸ 𝑔1 ⊕𝑚2 ↦→ unit(𝑚2)︸ ︷︷ ︸ 𝑔2 ) where 𝑚 : Mem[Var], 𝑚1 : Mem[FV(𝑒)], and 𝑚2 : Mem[(Var \ {𝑥}) \ FV(𝑒)]. The maps 𝑔1 and 𝑔2 evidently preserves input to output and are thus kernels. Also, because range(𝑘1) ⊆ FV(𝑃) ⊆ (Var \ {𝑥}) and 𝑘1 ⊑ 𝑓 , 308 we have 𝑘1 ⊑ 𝜋Var\{𝑥} 𝑓 ; in addition, since 𝑘1 |= 𝑃, we have 𝑔 |= 𝑃 by persistence. Since 𝑔1 ⊑ 𝑔1 ⊕ 𝑔2 and 𝑔1 |= (FV(𝑒) ⊲ 𝑥 = 𝑒), we have 𝑔1 ⊕ 𝑔2 |= (FV(𝑒) ⊲ 𝑥 = 𝑒) as well. Thus, we conclude 𝑓 ′ |= 𝑃 # (FV(𝑒) ⊲ 𝑥 = 𝑒). SAMP By restriction (theorem 4.3.1), there exists 𝑘1 ⊑ 𝑓 such that FVR(𝑃) ⊆ range(𝑘1) ⊆ FV(𝑃); let 𝐾 = range(𝑘1). Since 𝑓 has empty domain, we have 𝑓 = 𝑘1 ⊙ 𝑘2 for some 𝑘2 : Mem[𝐾] → D(Mem[Var]). Let 𝑓 ′ = ∅ ↦→ ⟦𝑥 $← 𝑑⟧(𝜇) be the lifted output. We have: 𝑓 ′ = ⟨⟩ ↦→ bind(𝜇, 𝑚 ↦→ bind(⟦𝑑⟧, 𝑣 ↦→ unit(𝑚 [𝑥 ↦→ 𝑣]))) = 𝑓 ⊙ (𝑚 ↦→ bind(⟦𝑑⟧, 𝑣 ↦→ unit(𝑚 [𝑥 ↦→ 𝑣]))) = 𝜋Var\{𝑥} 𝑓 ⊙ ((𝑚1 ↦→ unit(𝑚1)) ⊕ bind(⟦𝑑⟧, 𝑣 ↦→ unit(𝑥 ↦→ 𝑣))) where 𝑚 : Mem[Var], 𝑚1 : Mem[FV(𝑑)], and 𝑚2 : Mem[(Var \ FV(𝑑)) \ {𝑥}]. 
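As an aside, the lifted outputs computed in the ASSN and SAMP cases are plain unit/bind calculations on finite distributions, and can be replayed concretely. The sketch below uses our own ad-hoc encoding (distributions as dicts from outcomes to probabilities, memories as sorted tuples of variable–value pairs); the fair coin standing in for 𝑑 is an assumed example distribution:

```python
# Distributions are dicts from outcomes to probabilities; memories are
# sorted tuples of (variable, value) pairs. The encoding is ours, and the
# fair coin standing in for d is an assumed example distribution.

def unit(x):
    """Dirac distribution on x."""
    return {x: 1.0}

def bind(dist, k):
    """Monadic bind: sample a from dist, then sample from k(a)."""
    out = {}
    for a, p in dist.items():
        for b, q in k(a).items():
            out[b] = out.get(b, 0.0) + p * q
    return out

def upd(mem, x, v):
    """m[x -> v]: functional update of a memory."""
    d = dict(mem)
    d[x] = v
    return tuple(sorted(d.items()))

mu = unit((('y', 0),))  # input distribution: y = 0 with probability 1

# ASSN: lifted semantics of the assignment x <- y + 1.
assign = bind(mu, lambda m: unit(upd(m, 'x', dict(m)['y'] + 1)))

# SAMP: lifted semantics of the sampling x $<- coin, for a fair coin.
coin = {0: 0.5, 1: 0.5}
samp = bind(mu, lambda m: bind(coin, lambda v: unit(upd(m, 'x', v))))
```

Both results are distributions over extended memories, matching the shape of the lifted outputs 𝑓′ used in the proof.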
Again, 𝑔1 and 𝑔2 evidently preserves input to the output and thus are kernels . Because range(𝑘1) ⊆ Var\ {𝑥} and 𝑘1 ⊑ 𝑓 , we have 𝑘1 ⊑ 𝜋Var\{𝑥} 𝑓 . Because 𝑘1 ⊑ 𝜋Var\{𝑥} 𝑓 and 𝑘1 |= 𝑃, we have 𝜋Var\{𝑥} 𝑓 |= 𝑃 by persistence. Since 𝑔1 ⊑ 𝑔1 ⊕ 𝑔2 and 𝑔1 |= (∅ ⊲ 𝑥 $∼ 𝑑), we have 𝑔1 ⊕ 𝑔2 |= (∅ ⊲ 𝑥 $∼ 𝑑) as well. Thus, we conclude 𝑓 ′ |= 𝑃 # (∅ ⊲ 𝑥 $∼ 𝑑). SKIP Trivial. SEQ Trivial. COND At the high level, we proceed the proof in three steps: first, we show that for any 𝑓 satisfying (∅ ⊲ [𝑏]); 𝑃, there exists 𝑗1, 𝑗2 such that 𝑓 = 𝑗1 ⊙ 𝑗2 and range( 𝑗1) = {𝑏}; second, we describe exactly two kernels 𝑙tt and 𝑙ff such that 𝑓𝜇 |⟦𝑏=tt⟧ = 𝑙tt ⊙ 𝑗2 and 𝑓𝜇 |⟦𝑏=ff⟧ = 𝑙ff ⊙ 𝑗2; last, we compute 𝑓⟦if 𝑏 then 𝑐 else 𝑐′⟧𝜇 and show that it satisfies the post-condition. Since all assertions are in DIBI+, we have FVD(𝑃) ⊆ FVR(∅ ⊲ [𝑏]) = {𝑏}. Since 𝑓 |= (∅ ⊲ [𝑏]) # 𝑃, there exists 𝑘1, 𝑘2 such that 𝑘1 ⊙ 𝑘2 = 𝑓 , with 309 𝑘1 |= (∅ ⊲ [𝑏]) and 𝑘2 |= 𝑃. By restriction (theorem 4.3.1), there exists 𝑗1 such that 𝑗1 ⊑ 𝑘1 and dom( 𝑗1) ⊆ FVD(∅ ⊲ [𝑏]) = ∅ {𝑏} = FVR(∅ ⊲ [𝑏]) ⊆ range( 𝑗1) ⊆ FV(∅ ⊲ [𝑏]) = {𝑏}. By restriction (theorem 4.3.1), there exists 𝑗2 such that 𝑗2 ⊑ 𝑘2 and 𝑗2 |= 𝑃, and dom( 𝑗2) ⊆ FVD(𝑃) ⊆ FVR(∅ ⊲ [𝑏]) = {𝑏}. Since dom(𝑘2) = range(𝑘1) ⊇ {𝑏}, we may assume without loss of generality that 𝑗2 |= 𝑃, 𝑗2 ⊑ 𝑘2, and dom( 𝑗2) = {𝑏}. Thus 𝑗1 ⊙ 𝑗2 is defined, and so 𝑗1 ⊙ 𝑗2 ⊑ 𝑘1 ⊙ 𝑘2 ⊑ 𝑓 by lemma C.2.5. By lemma C.2.1, there exists 𝑗 : Mem[range( 𝑗2)] → D(Mem[Var]) such that 𝑗1 ⊙ ( 𝑗2 ⊙ 𝑗) = ( 𝑗1 ⊙ 𝑗2) ⊙ 𝑗 = 𝑓 . Since 𝑗2 ⊑ 𝑗2 ⊙ 𝑗 , we have 𝑗2 ⊙ 𝑗 |= 𝑃. Thus, we may assume without loss of generality that range( 𝑗2) = Var and 𝑗1 ⊙ 𝑗2 = 𝑓 = ⟨⟩ ↦→ 𝜇. Let 𝑙tt, 𝑙ff : Mem[∅] → D(Mem[𝑏]) be defined by 𝑙tt (⟨⟩) = unit(𝑏 ↦→ tt) and 𝑙ff (⟨⟩) = unit(𝑏 = ff ); evidently, 𝑙tt |= (∅ ⊲ 𝑏 = tt) and 𝑙ff |= (∅ ⊲ 𝑏 = ff ). Now, we have: 𝑓𝜇 |⟦𝑏=tt⟧ = 𝑙tt ⊙ 𝑗2 𝑓𝜇 |⟦𝑏=ff⟧ = 𝑙ff ⊙ 𝑗2 where each equality holds if the left side is defined. 
Regardless of whether the conditional distributions are defined, we always have: 𝑙tt ⊙ 𝑗2 |= (∅ ⊲ 𝑏 = tt) # 𝑃 𝑙ff ⊙ 𝑗2 |= (∅ ⊲ 𝑏 = ff ) # 𝑃. Since both of these kernels have empty domain, we have 𝑙tt ⊙ 𝑗2 = 𝜈tt and 𝑙ff ⊙ 𝑗2 = 𝜈ff for two distributions 𝜈tt, 𝜈ff ∈ D(Mem[Var]). By induction, we 310 have: 𝑓⟦𝑐⟧𝜈tt |= (∅ ⊲ 𝑏 = tt) # (𝑏 : 𝑏 = tt ⊲ 𝑄1) 𝑓⟦𝑐⟧𝜈ff |= (∅ ⊲ 𝑏 = ff ) # (𝑏 : 𝑏 = ff ⊲ 𝑄2). By similar reasoning as for the pre-conditions, there exists 𝑘′1, 𝑘 ′ 2 : Mem[𝑏] → D(Mem[Var]) such that 𝑘′1 |= (𝑏 : 𝑏 = tt ⊲ 𝑄1) and 𝑘′2 |= (𝑏 : 𝑏 = ff ⊲ 𝑄2), and: 𝑓⟦𝑐⟧𝜈tt = 𝑙tt ⊙ 𝑘 ′ 1 𝑓⟦𝑐⟧𝜈ff = 𝑙ff ⊙ 𝑘 ′ 2. Let 𝑘′ : Mem[𝑏] → D(Mem[Var]) be the composite kernel defined as follows: 𝑘′( [𝑏 ↦→ 𝑣]) ≜  𝑘′1( [𝑏 ↦→ tt]) : 𝑣 = tt 𝑘′2( [𝑏 ↦→ ff ]) : 𝑣 = ff . By assumption, 𝑘′ |= ((𝑏 : 𝑏 = tt ⊲ 𝑄1) ∧ (𝑏 : 𝑏 = ff ⊲ 𝑄2)). Now, let 𝑝 ≜ 𝜇(⟦𝑏 = tt⟧) be the probability of taking the first branch. Then we can conclude: 𝑓⟦if 𝑏 then 𝑐 else 𝑐′⟧𝜇 = 𝑓⟦𝑐⟧(𝜇 |⟦𝑏=tt⟧)⊕𝑝⟦𝑐′⟧(𝜇 |⟦𝑏=tt⟧) = 𝑓⟦𝑐⟧𝜈tt⊕𝑝⟦𝑐⟧𝜈ff = 𝑓⟦𝑐⟧𝜈tt ⊕𝑝 𝑓⟦𝑐⟧𝜈ff = (𝑙tt ⊙ 𝑘′1) ⊕𝑝 (𝑙ff ⊙ 𝑘 ′ 2) = (𝑙tt ⊙ 𝑘′) ⊕𝑝 (𝑙ff ⊙ 𝑘′) = (𝑙tt ⊕𝑝 𝑙ff ) ⊙ 𝑘′ |= (∅ ⊲ [𝑏]) # ((𝑏 : 𝑏 = tt ⊲ 𝑄1) ∧ (𝑏 : 𝑏 = ff ⊲ 𝑄2)). Above, 𝑘1⊕𝑝𝑘2 lifts the convex combination operator ⊕𝑝 from distributions to kernels from Mem[∅]. We show the last equality in more detail. For any 311 𝑟 ∈ Mem[Var]: (𝑙tt ⊙ 𝑘′) ⊕𝑝 (𝑙ff ⊙ 𝑘′) (⟨⟩)(𝑟) = 𝑝 · (𝑙tt ⊙ 𝑘′) (⟨⟩)(𝑟) + (1 − 𝑝) · (𝑙ff ⊙ 𝑘′) (⟨⟩)(𝑟) = 𝑝 · (𝑙tt ⊙ 𝑘′) (⟨⟩)(𝑟) + (1 − 𝑝) · (𝑙ff ⊙ 𝑘′) (⟨⟩)(𝑟) = 𝑝 · 𝑙tt (⟨⟩)(𝑏 ↦→ tt) · 𝑘′(𝑏 ↦→ tt) (𝑟) + (1 − 𝑝) · 𝑙ff (⟨⟩)(𝑏 ↦→ ff ) · 𝑘′(𝑏 ↦→ ff ) (𝑟) = ((𝑙tt ⊕𝑝 𝑙ff ) ⊙ 𝑘′) (⟨⟩)(𝑟). where the penultimate equality holds because 𝑙tt and 𝑙ff are deterministic. WEAK Trivial. FRAME The proof for this case follows the argument for the frame rule in PSL, with a few minor changes. There exists 𝑘1, 𝑘2 such that 𝑘1 ⊕ 𝑘2 ⊑ 𝑓 , and 𝑘1 |= 𝑃 and 𝑘2 |= 𝑅; let 𝑆1 ≜ range(𝑘1). Also, by restriction, there exists 𝑘′2 ⊑ 𝑘2 such that 𝑘′2 |= 𝑅 and range(𝑘′2) ⊆ FV(𝑅); let 𝑆2 ≜ range(𝑘′2). 
Since 𝑘1 and 𝑘2 have empty domains, 𝑆1 and 𝑆2 must be disjoint. Let 𝑆3 = Var \ (𝑆2 ∪ 𝑆1). Since MV(𝑐) is disjoint from 𝑆2 by the first side-condition, we have WV(𝑐) ⊆ MV(𝑐) ⊆ 𝑆1 ∪ 𝑆3. Let 𝑓 ′ = 𝑓⟦𝑐⟧𝜇 be the lifted output. By induction, we have 𝑓 ′ |= 𝑄; by re- striction (theorem 4.3.1), there exists 𝑘′1 ⊑ 𝑓 ′ such that range(𝑘′1) ⊆ FV(𝑄) and 𝑘′1 |= 𝑄. By the last side condition, RV(𝑐) ⊆ FVR(𝑃) ⊆ 𝑆1. By soundness of RV and WV (lemma A.2.1), all variables in WV(𝑐) must be written before they are read and there is a function 𝐹 : Mem[𝑆1] → D(Mem[WV(𝑐) ∪ 𝑆1]) such that: 𝜋WV(𝑐)∪𝑆1⟦𝑐⟧𝜇 = bind(𝜇, 𝑚 ↦→ 𝐹 (𝑚𝑆1)). 312 Since 𝑆2 ⊆ FV(𝑅), variables in 𝑆2 are not in MV(𝑐) by the first side- condition, and 𝑆2 is disjoint from WV(𝑐) ∪ 𝑆1. By soundness of MV, we have: 𝜋WV(𝑐)∪𝑆1∪𝑆2⟦𝑐⟧𝜇 = bind(𝜋WV(𝑐)∪𝑆1∪𝑆2𝜇, 𝐹 ⊕ unitMem[WV(𝑐)∪𝑆2]). Since 𝑆1 and 𝑆2 are independent in 𝜇, we know that 𝑆1 ∪WV(𝑐) and 𝑆2 are independent in ⟦𝑐⟧𝜇. Hence: 𝑓𝜋𝑆1∪WV(𝑐)⟦𝑐⟧𝜇 ⊕ 𝑓𝜋𝑆2⟦𝑐⟧𝜇 ⊑ 𝑓 ′. By induction, 𝑓 ′ |= 𝑄. Furthermore, FV(𝑄) ⊆ FVR(𝑃) ∪ WV(𝑐) ⊆ 𝑆1 ∪WV(𝑐) by the second side-condition. By restriction (theorem 4.3.1), 𝑓𝜋𝑆1∪WV(𝑐)⟦𝑐⟧𝜇 |= 𝑄. Furthermore, 𝜋𝑆2⟦𝑐⟧𝜇 = 𝜋𝑆2𝜇, so 𝜋𝑆2⟦𝑐⟧𝜇 |= 𝑅 as well. Thus, 𝑓 ′ |= 𝑄 ∗ 𝑅 as desired. □ 313 APPENDIX D THE UNARY FRAGMENT BLUEBELL FOR REASONING ABOUT INDEPENDENCE AND CONDITIONAL INDEPENDENCE D.1 The Rules of BLUEBELL In fig. D.1 we summarize the notation we use for assertions over BLUEBELL’s model. Recall that BLUEBELL’s assertions 𝑃 ∈ PA𝐼 ≜M𝐼 u−→ Prop are the upward- closed predicates over elements of the RAM𝐼 . ⌜𝜙⌝ ≜ λ . 𝜙 Own(𝑏) ≜ λ𝑎. 𝑏 ⪯ 𝑎 𝑃 ∧𝑄 ≜ λ𝑎. 𝑃(𝑎) ∧𝑄(𝑎) 𝑃 ∗𝑄 ≜ λ𝑎. ∃𝑏1, 𝑏2. (𝑏1 · 𝑏2) ⪯ 𝑎 ∧ 𝑃(𝑏1) ∧𝑄(𝑏2) ∃𝑥 : 𝑋. 𝐾 ≜ λ𝑎. ∃𝑥 : 𝑋. 𝐾 (𝑥) (𝑎) (𝐾 : 𝑋 → PA𝐼 ) ∀𝑥 : 𝑋. 𝐾 ≜ λ𝑎.∀𝑥 : 𝑋. 𝐾 (𝑥) (𝑎) (𝐾 : 𝑋 → PA𝐼 ) Own(F , 𝜇) ≜ ∃𝑝.Own(F , 𝜇, 𝑝) 𝐸 $∼ 𝜇 ≜ ∃F , 𝜇.Own(F , 𝜇) ∗ ⌜𝐸 � (F (𝑖), 𝜇(𝑖)) ∧ 𝜇 = 𝜇(𝑖) ◦ 𝐸−1⌝ C𝜇 𝐾 ≜ λ𝑎. ∃F , 𝜇, 𝑝, 𝜅. (F , 𝜇, 𝑝) ⪯ 𝑎 ∧ ∀𝑖 ∈ 𝐼 . 𝜇(𝑖) = bind(𝜇, 𝜅(𝑖)) ∧ ∀𝑣 ∈ supp(𝜇).𝐾 (𝑣) (F , 𝜅(𝐼) (𝑣), 𝑝) (𝜇 :D(𝐴), 𝐾 : 𝐴→ PA𝐼 ) wp 𝑡 {𝑄} ≜ λ𝑎.∀𝜇0.∀𝑐. (𝑎 · 𝑐) ⪯ 𝜇0 ⇒ ∃𝑏. 
( (𝑏 · 𝑐) ⪯ ⟦𝑡⟧(𝜇0) ∧𝑄(𝑏) ) ⌈𝐸⌉ ≜ (𝐸 ∈ true) $∼ 𝛿True Own(𝐸) ≜ ∃𝜇. 𝐸 $∼ 𝜇 (𝑥:𝑞) ≜ ∃P, 𝑝.Own(P, 𝑝) ∗ ⌜𝑝(𝑖) (𝑥) = 𝑞⌝ 𝑃@𝑝 ≜ 𝑃 ∧ ∃P .Own(P, 𝑝) ⌊𝑅⌋ ≜ ∃𝜇 :D(Val𝑋). ⌜𝜇(𝑅) = 1⌝ ∗ C𝜇 𝑣. ⌈𝑥 = 𝑣(𝑥)⌉𝑥∈𝑋 (𝑅 ⊆ Val𝑋, 𝑋 ⊆ 𝐼 × Var) Figure D.1: The assertions used in BLUEBELL. Proposition D.1.1 (Upward-closure). All the assertions in fig. D.1 are upward- closed. Proof. Easy by inspection of the definitions. The definitions where upward- closedness is less obvious (e.g. joint conditioning) are made upward-closed by 314 construction by explicit use of the order ⪯ in the definition. □ Lemma D.1.2. For all 𝜇 :D(𝐴 × 𝐵), there exists a 𝜅 : 𝐴 → D(𝐵) such that 𝜇 = (𝜇 ◦ 𝜋−1 1 ) � 𝜅. Proof. Let 𝜇1 = 𝜇 ◦ 𝜋−1 1 . Then the result is immediate by letting 𝜅(𝑎) (𝑏) =  𝜇0 (𝑎,𝑏) 𝜇1 (𝑎) if𝜇1(𝑎) > 0 0 otherwise □ D.1.1 Program Semantics We assume each primitive operator 𝜑 ∈ {+,−, <, . . .} has an associated arity ar(𝜑) ∈ N, and is given semantics as some function ⟦𝜑⟧ : Valar(𝜑) → Val. Definition D.1.1. Expressions 𝑒 ∈ E are given semantics as a function ⟦𝑒⟧ : Mem[Var] → Val as standard: ⟦𝑣⟧(𝑠) ≜ 𝑣 ⟦𝑥⟧(𝑠) ≜ 𝑠(𝑥) ⟦𝜑(𝑒1, . . . , 𝑒ar(𝜑))⟧(𝑠) ≜ ⟦𝜑⟧(⟦𝑒1⟧, . . . , ⟦𝑒ar(𝜑)⟧) Definition D.1.2 (Term semantics). Given 𝑡 ∈ T we define its kernel semantics K⟦𝑡⟧ : Mem[Var] → D(ΣMem[Var]) as follows: K⟦skip⟧(𝑠) ≜ unit(𝑠) K⟦𝑥 := 𝑒⟧(𝑠) ≜ unit(𝑠[𝑥 ↦→ ⟦𝑒⟧(𝑠) ]) K⟦𝑥 𝑑⟧(𝑠) ≜ bind(⟦𝑑⟧(⟦𝑒1⟧(𝑠), . . . , ⟦𝑒𝑛⟧(𝑠)), 𝑣 ↦→ return(𝑠[𝑥 ↦→ 𝑣 ])) K⟦𝑡1; 𝑡2⟧(𝑠) ≜ bind(K⟦𝑡1⟧(𝑠), 𝑠′ ↦→ K⟦𝑡2⟧(𝑠′)) K⟦if 𝑒 then 𝑡1 else 𝑡2⟧(𝑠) ≜ if ⟦𝑒⟧(𝑠) ≠ 0 then K⟦𝑡1⟧(𝑠) else K⟦𝑡2⟧(𝑠) K⟦repeat 𝑒 𝑡⟧(𝑠) ≜ loop𝑡 (⟦𝑒⟧(𝑠), 𝑠) 315 where loop𝑡 simply iterates 𝑡: loop𝑡 (𝑛, 𝑠) ≜  unit(𝑠) 𝑛 ≤ 0 bind(loop𝑡 (𝑛 − 1, 𝑠), 𝑠′ ↦→ K⟦𝑡⟧(𝑠′)) Otherwise The semantics of a term is then defined as: ⟦𝑡⟧ : D(ΣMem[Var]) → D(ΣMem[Var]) ⟦𝑡⟧(𝜇) ≜ bind(𝜇, 𝑠 ↦→ K⟦𝑡⟧(𝑠)) Evaluation contexts E are defined by the following grammar: E F 𝑥 := 𝐸 | 𝑥 𝑑 | if 𝐸 then 𝑡1 else 𝑡2 | repeat 𝐸 𝑡 𝐸 F [ · ] | 𝜑( ®𝑒1, 𝐸, ®𝑒2) A simple property holds for evaluation contexts. 
Lemma D.1.3. K⟦E[𝑒]⟧(𝑠) = K⟦E[⟦𝑒⟧(𝑠)]⟧(𝑠).
Proof. Easy by induction on the structure of evaluation contexts. □
D.2 Measure Theory Lemmas
Notation In what follows, given 𝑛 ∈ N with 𝑛 > 1, we write [𝑛] to denote the set {1, . . . , 𝑛}. Moreover, for iterated summation we use the notation ∑𝑖∈𝐼 |Φ(𝑖) 𝑓(𝑖), where 𝐼 = {𝑖0, 𝑖1, . . .} is countable and Φ is a predicate on elements of 𝐼, to denote the sum 𝑓(𝑗0) + 𝑓(𝑗1) + . . . where 𝑗0, 𝑗1, . . . is the sublist of 𝑖0, 𝑖1, . . . consisting of the elements that satisfy Φ. A similar convention is used for other commutative and associative operators, e.g. ∪. A countable partition of Ω is a partition of Ω, 𝑆 ⊆ 𝒫(Ω), with countably many sets. For uniformity, we represent countable partitions as 𝑆 = {𝐴𝑖}𝑖∈N, with the convention that when the partition has finitely many sets, say 𝑛, all the 𝐴𝑖 with 𝑖 ≥ 𝑛 are empty.
As mentioned, BLUEBELL is only concerned with discrete distributions, i.e., distributions over a countable set of outcomes. The following lemma expresses the key property of 𝜎-algebras over countable outcomes that we exploit for proving the other results.
Lemma D.2.1. Let Ω be a countable set, and let F be an arbitrary 𝜎-algebra on Ω. Then there exists a countable partition 𝑆 of Ω such that F = 𝜎(𝑆).
Proof. For every element 𝑥 ∈ Ω, we identify the smallest event 𝐸𝑥 ∈ F such that 𝑥 ∈ 𝐸𝑥, and show that for 𝑥, 𝑧 ∈ Ω, either 𝐸𝑥 = 𝐸𝑧 or 𝐸𝑥 ∩ 𝐸𝑧 = ∅. Then the set 𝑆 = {𝐸𝑥 | 𝑥 ∈ Ω} is a partition of Ω, and any event 𝐸 ∈ F can be represented as ⋃𝑥∈𝐸 𝐸𝑥, which suffices to show that F = 𝜎(𝑆).
For every 𝑥, 𝑦, let 𝐴𝑥,𝑦 = Ω if every 𝐸 ∈ F contains either both of 𝑥, 𝑦 or neither of them; otherwise, let 𝐴𝑥,𝑦 be any fixed 𝐸 ∈ F such that 𝑥 ∈ 𝐸 and 𝑦 ∉ 𝐸.
Then we show that, for all 𝑥, 𝐸𝑥 = ∩𝑦∈Ω 𝐴𝑥,𝑦 is the smallest event in F such that 𝑥 ∈ 𝐸𝑥, as follows. If there exists 𝐸′𝑥 such that 𝑥 ∈ 𝐸′𝑥 and 𝐸′𝑥 ⊂ 𝐸𝑥, then 𝐸𝑥 \ 𝐸′𝑥 is not empty. Let 𝑦 be an element of 𝐸𝑥 \ 𝐸′𝑥; by the definition of 𝐴𝑥,𝑦 (instantiated with the separating event 𝐸′𝑥), we have 𝑦 ∉ 𝐴𝑥,𝑦.
Thus, 𝑦 ∉ ∩𝑦∈Ω 𝐴𝑥,𝑦 = 𝐸𝑥, which contradicts 𝑦 ∈ 𝐸𝑥 \ 𝐸′𝑥.
Next, for any 𝑥, 𝑧 ∈ Ω: since 𝐸𝑧 is the smallest event containing 𝑧 and 𝐸𝑧 \ 𝐸𝑥 ∈ F is a subset of 𝐸𝑧, the event 𝐸𝑧 \ 𝐸𝑥 is either equal to 𝐸𝑧 or does not contain 𝑧. If 𝐸𝑧 \ 𝐸𝑥 = 𝐸𝑧, then 𝐸𝑥 and 𝐸𝑧 are disjoint. If 𝑧 ∉ 𝐸𝑧 \ 𝐸𝑥, then it must be that 𝑧 ∈ 𝐸𝑥, which implies that there exists no 𝐸 ∈ F such that 𝑥 ∈ 𝐸 and 𝑧 ∉ 𝐸. Because F is closed under complement, there also exists no 𝐸 ∈ F such that 𝑥 ∉ 𝐸 and 𝑧 ∈ 𝐸. Therefore, we have 𝑥 ∈ ⋂𝑦∈Ω 𝐴𝑧,𝑦 = 𝐸𝑧 as well. Furthermore, because 𝐸𝑧 is the smallest event in F that contains 𝑧 and 𝐸𝑥 also contains 𝑧, we have 𝐸𝑧 ⊆ 𝐸𝑥; symmetrically, we have 𝐸𝑥 ⊆ 𝐸𝑧. Thus, 𝐸𝑥 = 𝐸𝑧.
Hence, the set 𝑆 = {𝐸𝑥 | 𝑥 ∈ Ω} is a countable partition of Ω. □
Lemma D.2.2. If 𝑆 = {𝐴𝑖}𝑖∈N is a partition of Ω, and F = 𝜎(𝑆), then every event 𝐸 ∈ F can be written as 𝐸 = ⊎𝑖∈𝐼 𝐴𝑖 for some 𝐼 ⊆ N. In other words, 𝜎(𝑆) = {⊎𝑖∈𝐼 𝐴𝑖 | 𝐼 ⊆ N}.
Proof. Because 𝜎-algebras are closed under countable union, for any 𝐼 ⊆ N, ⊎𝑖∈𝐼 𝐴𝑖 ∈ 𝜎(𝑆). Thus, 𝜎(𝑆) ⊇ {⊎𝑖∈𝐼 𝐴𝑖 | 𝐼 ⊆ N}. Also, {⊎𝑖∈𝐼 𝐴𝑖 | 𝐼 ⊆ N} is a 𝜎-algebra:
• Ω = ⊎𝑖∈N 𝐴𝑖.
• Given a countable sequence of events 𝐸1 = ⊎𝑖∈𝐼1 𝐴𝑖, 𝐸2 = ⊎𝑖∈𝐼2 𝐴𝑖, . . ., let 𝐼 = ⋃𝑗∈N 𝐼𝑗; then we have ⋃𝑗∈N 𝐸𝑗 = ⊎𝑖∈𝐼 𝐴𝑖.
• If 𝐸 = ⊎𝑖∈𝐼 𝐴𝑖, then the complement of 𝐸 is (Ω \ 𝐸) = ⊎𝑖∈(N\𝐼) 𝐴𝑖.
Then, {⊎𝑖∈𝐼 𝐴𝑖 | 𝐼 ⊆ N} is a 𝜎-algebra that contains 𝑆. Therefore, 𝜎(𝑆) = {⊎𝑖∈𝐼 𝐴𝑖 | 𝐼 ⊆ N}. □
Lemma D.2.3. Let Ω be a countable set. If 𝑆1 = {𝐴𝑖}𝑖∈N and 𝑆2 = {𝐵𝑗}𝑗∈N are both countable partitions of Ω, then 𝜎(𝑆1) ⊆ 𝜎(𝑆2) implies that for any 𝐵𝑗 ∈ 𝑆2 with 𝐵𝑗 ≠ ∅, we can find a unique 𝐴𝑖 ∈ 𝑆1 such that 𝐵𝑗 ⊆ 𝐴𝑖.
Proof. For any 𝐵𝑗 ∈ 𝑆2 with 𝐵𝑗 ≠ ∅, pick an arbitrary element 𝑠 ∈ 𝐵𝑗 and denote the unique element of 𝑆1 that contains 𝑠 as 𝐴𝑖. Because 𝐴𝑖 ∈ 𝑆1 and 𝑆1 ⊆ 𝜎(𝑆1) ⊆ 𝜎(𝑆2), we have 𝐴𝑖 ∈ 𝜎(𝑆2).
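As an aside, the atom construction 𝐸𝑥 of lemma D.2.1 and the characterization of lemma D.2.2 can be checked executably on a small finite example. The encoding below (events as frozensets, the 𝜎-algebra listed explicitly) is ours and purely illustrative:

```python
from itertools import combinations

# Omega = {1, 2, 3, 4}; F is the sigma-algebra generated by the partition
# {{1, 2}, {3}, {4}}: all unions of blocks, including the empty union.
omega = frozenset({1, 2, 3, 4})
blocks = [frozenset({1, 2}), frozenset({3}), frozenset({4})]
F = {frozenset().union(*comb)
     for r in range(len(blocks) + 1)
     for comb in combinations(blocks, r)}

def atom(x):
    """E_x: the smallest event of F containing x (intersection of all such)."""
    out = omega
    for E in F:
        if x in E:
            out = out & E
    return out

atoms = {atom(x) for x in omega}
print(sorted(sorted(a) for a in atoms))  # recovers the generating partition

# Lemma D.2.2: every event of F is the disjoint union of the atoms it contains.
assert all(E == frozenset().union(*(a for a in atoms if a <= E)) for E in F)
```

The recovered atoms are exactly the generating blocks {1, 2}, {3}, {4}, as the lemmas predict.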
Note that 𝑠 ∈ 𝐵 𝑗 and 𝐵 𝑗 is an element of the partition 𝑆2 that generates 𝜎(𝑆2), 𝐵 𝑗 must be the smallest event in 𝜎(𝑆2) that contains 𝑠. Because 𝑠 ∈ 𝐴𝑖 as well, 𝐵 𝑗 being the smallest event containing 𝑠 implies that 𝐵 𝑗 ⊆ 𝐴𝑖. □ Lemma D.2.4. Assume we are given a 𝜎-algebra F1 over a countable set Ω, measure 𝜇1 ∈ D(F1), a countable set 𝐴, a distribution 𝜇 ∈ Σ𝐴, and a function 𝜅1 : 𝐴→ D(F1) such that 𝜇1 = bind(𝜇, 𝜅1). Then, for any probability space (F2, 𝜇2) such that (F1, 𝜇1) ⊑ (F2, 𝜇2), there exists 𝜅2 such that 𝜇2 = bind(𝜇, 𝜅2), and that for any 𝑎 ∈ supp(𝜇), (F1, 𝜅1(𝑎)) ⊑ (F2, 𝜅2(𝑎)). Proof. By lemma D.2.1, F𝑖 = 𝜎(𝑆𝑖) for some countable partition 𝑆𝑖. Also, (F1, 𝜇1) ⊑ (F2, 𝜇2) implies that F1 ⊆ F2. So we have 𝜎(𝑆1) ⊆ 𝜎(𝑆2), which by lemma D.2.3 implies that for any 𝐵 ∈ 𝑆2 with 𝐵 ≠ ∅, we can find a unique 𝐴 ∈ 𝑆1 such that 𝐵 ⊆ 𝐴. Let 𝑓 be the mapping associating to any 𝐵 ≠ ∅ the corresponding 𝐴 = 𝑓 (𝐵), and 𝑓 (𝐵) = ∅ when 𝐵 = ∅. Then, we define 𝜅2 as follows: for any 𝑎 ∈ 𝐴, 𝐸 ∈ F2, there exists 𝑆 ⊆ 𝑆2 such that 𝐸 = ⊎ 𝐵∈𝑆 𝐵, then define 𝜅2(𝑎) (𝐸) = ∑︁ 𝐵∈𝑆 𝜅1(𝑎) ( 𝑓 (𝐵)) · ℎ(𝐵), where ℎ(𝐵) = 𝜇2(𝐵)/𝜇2( 𝑓 (𝐵)) if 𝜇2( 𝑓 (𝐵)) ≠ 0 and ℎ(𝐵) = 0 otherwise. 319 Then we calculate: bind(𝜇, 𝜅2) (𝐸) = ∑︁ 𝑎∈𝐴 𝜇(𝑎) · 𝜅2(𝐸) = ∑︁ 𝑎∈𝐴 𝜇(𝑎) · ∑︁ 𝐵∈𝑆 𝜅1(𝑎) ( 𝑓 (𝐵)) · ℎ(𝐵) = ∑︁ 𝐵∈𝑆 ∑︁ 𝑎∈𝐴 𝜇(𝑎) · 𝜅1(𝑎) ( 𝑓 (𝐵)) · ℎ(𝐵) = ∑︁ 𝐵∈𝑆 bind(𝜇, 𝜅1) ( 𝑓 (𝐵)) · ℎ(𝐵) = ∑︁ 𝐵∈𝑆 𝜇1( 𝑓 (𝐵)) · ℎ(𝐵) = ∑︁ 𝐵∈𝑆 |𝜇2 ( 𝑓 (𝐵))≠0 𝜇1( 𝑓 (𝐵)) · 𝜇2(𝐵) 𝜇2( 𝑓 (𝐵)) = ∑︁ 𝐵∈𝑆 |𝜇2 ( 𝑓 (𝐵))≠0 𝜇2( 𝑓 (𝐵)) · 𝜇2(𝐵) 𝜇2( 𝑓 (𝐵)) (𝜇1(𝐸′) = 𝜇2(𝐸′) for any 𝐸′ ∈ F1) = ∑︁ 𝐵∈𝑆 |𝜇2 ( 𝑓 (𝐵))≠0 𝜇2(𝐵) = ∑︁ 𝐵∈𝑆 |𝜇2 ( 𝑓 (𝐵))≠0 𝜇2(𝐵) + ∑︁ 𝐵∈𝑆 |𝜇2 ( 𝑓 (𝐵))=0 𝜇2(𝐵) (Because 𝜇2( 𝑓 (𝐵)) = 0 implies 𝜇2(𝐵) = 0) = ∑︁ 𝐵∈𝑆 𝜇2(𝐵) = 𝜇2( ⊎ 𝐵∈𝑆 𝐵) = 𝜇2(𝐸) Thus, bind(𝜇, 𝜅2) = 𝜇2. Also, for any 𝑎 ∈ 𝐴𝜇, for any 𝐸 ∈ F1, there exists 𝑆′ ⊆ 𝑆1 such that 𝐸 = 320 ⊎ 𝐴∈𝑆′ 𝐴. 
𝜅2(𝑎) (𝐸) = 𝜅2(𝑎) (⊎ 𝐴∈𝑆′ 𝐴 ) = ∑︁ 𝐴∈𝑆′ 𝜅2(𝑎) (𝐴) = ∑︁ 𝐴∈𝑆′ ∑︁ 𝐵⊆𝐴|𝐵∈F2 𝜅2(𝑎) (𝐵) = ∑︁ 𝐴∈𝑆′ ∑︁ 𝐵⊆𝐴|𝐵∈F2,𝜇2 ( 𝑓 (𝐵))≠0 𝜅1(𝑎) ( 𝑓 (𝐵)) · 𝜇2(𝐵) 𝜇2( 𝑓 (𝐵)) = ∑︁ 𝐴∈𝑆′ |𝜇2 (𝐴)≠0 𝜅1(𝑎) (𝐴) · (∑ 𝐵⊆𝐴|𝐵∈F2 𝜇2(𝐵) ) 𝜇2(𝐴) = ∑︁ 𝐴∈𝑆′ |𝜇2 (𝐴)≠0 𝜅1(𝑎) (𝐴) · 𝜇2(𝐴) 𝜇2(𝐴) = ∑︁ 𝐴∈𝑆′ |𝜇2 (𝐴)≠0 𝜅1(𝑎) (𝐴) = ∑︁ 𝐴∈𝑆′ 𝜅1(𝑎) (𝐴) = 𝜅1(𝑎) (⊎ 𝐴∈𝑆′ 𝐴 ) = 𝜅1(𝑎) (𝐸) Thus, for any 𝑎, (𝜎1, 𝜅1(𝑎)) ⊑ (𝜎2, 𝜅2(𝑎)). □ Lemma D.2.5. Given two 𝜎-algebras F1 and F2 over two countable underlying sets Ω1,Ω2, then a general element in the product 𝜎-algebra F1 ⊗ F2 can be expressed as⊎ (𝑖, 𝑗)∈𝐼 (𝐴𝑖 × 𝐵 𝑗 ) for some 𝐼 ⊆ N2 and 𝐴𝑖 ∈ F1, 𝐵 𝑗 ∈ F2 for (𝑖, 𝑗) ∈ 𝐼. Proof. By lemma D.2.1, each 𝜎-algebra F𝑖 is generated by a countable partition over Ω𝑖. Let 𝑆1 = {𝐴𝑖}𝑖∈N be the countable partition that generates F1, 𝑆2 = {𝐵𝑖}𝑖∈N be the countable partition that generates F2. By lemma D.2.2, a general element in F1 can be written as ⊎ 𝑗∈𝐽 𝐴 𝑗 for some 𝐽 ⊆ N, and similarly, a general element in F2 can be written as ⊎ 𝑘∈𝐾 𝐵𝑘 for some 𝐾 ⊆ N. 321 Note that {𝐴 𝑗 × 𝐵𝑘 } 𝑗 ,𝑘∈N is a partition because: if (𝐴 𝑗 × 𝐵𝑘 ) ∩ (𝐴 𝑗 ′ × 𝐵𝑘 ′) ≠ ∅ for some 𝑗 ≠ 𝑗 ′ and 𝑘 ≠ 𝑘′, then it must 𝐴 𝑗 ∩ 𝐴 𝑗 ′ ≠ ∅ and 𝐵𝑘 ∩ 𝐵𝑘 ′ ≠ ∅, and that imply that 𝐴 𝑗 = 𝐴 𝑗 ′ and 𝐵 𝑗 = 𝐵 𝑗 ′ ; therefore, 𝐴 𝑗 × 𝐵𝑘 = 𝐴 𝑗 ′ × 𝐵𝑘 ′ . We next show that F1 ⊗ F2 is generated by the partition {𝐴 𝑗 × 𝐵𝑘 } 𝑗 ,𝑘∈N. F1 ⊗ F2 = 𝜎(F1 × F2) = 𝜎 ( {∗}⊎ 𝑗∈𝐽1 𝐴 𝑗 × ⊎ 𝑗∈𝐽2 𝐵 𝑗 |𝐽1, 𝐽2 ⊆ N ) = 𝜎 ( {∗}⊎ 𝑗∈𝐽1,𝑘∈𝐽2 (𝐴 𝑗 × 𝐵𝑘 ) |𝐽1, 𝐽2 ⊆ N ) = 𝜎 ( {∗}𝐴 𝑗 × 𝐵𝑘 | 𝑗 , 𝑘 ⊆ N ) Since each 𝐴 𝑗 ∈ 𝑆1 ⊆ F1 and 𝐵𝑘 ∈ 𝑆2 ⊆ F2 a general element in F1 ⊗ F2 can be expressed as {∗}⊎ 𝑗 ,𝑘⊆𝐼 (𝐴 𝑗 × 𝐵𝑘 ) | 𝐴 𝑗 ∈ F1, 𝐵𝑘 ∈ F2, 𝐼 ⊆ N2 according to lemma D.2.1. □ Lemma D.2.6. Given two probability spaces (F𝑎, 𝜇𝑎), (F𝑏, 𝜇𝑏) ∈ P(Ω), their indepen- dent product (F𝑎, 𝜇𝑎)⊛ (F𝑏, 𝜇𝑏) exists if 𝜇𝑎 (𝐸𝑎) · 𝜇𝑏 (𝐸𝑏) = 0 for any 𝐸𝑎 ∈ F𝑎, 𝐸𝑏 ∈ F𝑏 such that 𝐸𝑎 ∩ 𝐸𝑏 = ∅. Proof. 
We first define 𝜇 : {𝐸𝑎 ∩ 𝐸𝑏 | 𝐸𝑎 ∈ F𝑎, 𝐸𝑏 ∈ F𝑏} → [0, 1] by 𝜇(𝐸𝑎 ∩ 𝐸𝑏) = 𝜇𝑎 (𝐸𝑎) ·𝜇𝑏 (𝐸𝑏) for any 𝐸𝑎 ∈ F𝑎, 𝐸𝑏 ∈ F𝑏, and then show that 𝜇 could be extended to a probability measure on F𝑎 ⊕ F𝑏. • We first need to show that 𝜇 is well-defined. That is, 𝐸𝑎 ∩ 𝐸𝑏 = 𝐸′𝑎 ∩ 𝐸′𝑏 implies 𝜇𝑎 (𝐸𝑎) · 𝜇𝑏 (𝐸𝑏) = 𝜇𝑎 (𝐸′𝑎) · 𝜇𝑏 (𝐸′𝑏). When 𝐸𝑎 ∩ 𝐸𝑏 = 𝐸′𝑎 ∩ 𝐸′𝑏, it must 𝐸𝑎 ∩ 𝐸′𝑎 ⊇ 𝐸𝑎 ∩ 𝐸𝑏 = 𝐸′𝑎 ∩ 𝐸′𝑏, Thus, 𝐸𝑎 \ 𝐸′𝑎 ⊆ 𝐸𝑎 \ 𝐸𝑏, and then 𝐸𝑎 \ 𝐸′𝑎 is disjoint from 𝐸𝑏; symmetrically, 𝐸′𝑎 \ 𝐸𝑎 is disjoint from 𝐸′ 𝑏 . Since 𝐸𝑎, 𝐸′𝑎 are both in F𝑎, we have 𝐸𝑎 \ 𝐸′𝑎 322 and 𝐸′𝑎 \ 𝐸𝑎 both measurable in F𝑎. Their disjointness and the result above implies that 𝜇𝑎 (𝐸𝑎 \𝐸′𝑎) ·𝜇𝑏 (𝐸𝑏) = 0 and 𝜇𝑎 (𝐸′𝑎 \𝐸𝑎) ·𝜇𝑏 (𝐸′𝑏) = 0. Symmetric reasoning can also show that 𝐸′ 𝑏 \ 𝐸𝑏 is disjoint from 𝐸′𝑎 ∩ 𝐸𝑎, and 𝐸𝑏 \ 𝐸′𝑏 is disjoint from 𝐸′𝑎 ∩ 𝐸𝑎, which implies 𝜇𝑎 (𝐸𝑏 \ 𝐸′𝑏) · 𝜇𝑏 (𝐸 ′ 𝑎 ∩ 𝐸𝑎) = 0 and 𝜇𝑎 (𝐸′𝑏 \ 𝐸𝑏) · 𝜇𝑏 (𝐸 ′ 𝑎) = 0. Then there are four possibilities: – If 𝜇𝑏 (𝐸𝑏) = 0 and 𝜇𝑏 (𝐸′𝑏) = 0, then 𝜇𝑎 (𝐸𝑎)·𝜇𝑏 (𝐸𝑏) = 0 = 𝜇𝑎 (𝐸′𝑎)·𝜇𝑏 (𝐸′𝑏). – If 𝜇𝑎 (𝐸𝑎 \ 𝐸′𝑎) = 0 and 𝜇𝑏 (𝐸′𝑎 \ 𝐸𝑎) = 0. Then 𝜇𝑎 (𝐸𝑎) · 𝜇𝑏 (𝐸𝑏) = 𝜇𝑎 ((𝐸′𝑎 \ 𝐸𝑎) ⊎ (𝐸′𝑎 ∩ 𝐸𝑎)) · 𝜇𝑏 (𝐸𝑏) = (𝜇𝑎 (𝐸′𝑎 \ 𝐸𝑎) + 𝜇𝑎 (𝐸′𝑎 ∩ 𝐸𝑎)) · 𝜇𝑏 (𝐸𝑏) = 𝜇𝑎 (𝐸′𝑎 ∩ 𝐸𝑎) · 𝜇𝑏 (𝐸𝑏) = (𝜇𝑎 (𝐸𝑎 \ 𝐸′𝑎) + 𝜇𝑎 (𝐸′𝑎 ∩ 𝐸𝑎)) · 𝜇𝑏 (𝐸𝑏) = 𝜇𝑎 (𝐸′𝑎) · 𝜇𝑏 (𝐸𝑏) Thus, either 𝜇𝑎 (𝐸′𝑎 ∩ 𝐸𝑎) = 0, which implies that 𝜇𝑎 (𝐸𝑎) · 𝜇𝑏 (𝐸𝑏) = (0+0) · 𝜇𝑏 (𝐸𝑏) = 0 = (0+0) · 𝜇𝑏 (𝐸𝑏) = 𝜇𝑎 (𝐸′𝑎) · 𝜇𝑏 (𝐸′𝑏), or we have both 𝜇𝑏 (𝐸′𝑏 \𝐸𝑏) = 0 and 𝜇𝑏 (𝐸𝑏 \𝐸′𝑏) = 0, which imply that 𝜇𝑎 (𝐸𝑎) · 𝜇𝑏 (𝐸𝑏) = 𝜇𝑎 (𝐸′𝑎) · 𝜇𝑏 (𝐸𝑏) = 𝜇𝑎 (𝐸′𝑎) · 𝜇𝑏 ((𝐸𝑏 ∩ 𝐸′𝑏) ⊎ (𝐸𝑏 \ 𝐸 ′ 𝑏)) = 𝜇𝑎 (𝐸′𝑎) · (𝜇𝑏 (𝐸𝑏 ∩ 𝐸′𝑏) + 0) = 𝜇𝑎 (𝐸′𝑎) · (𝜇𝑏 (𝐸𝑏 ∩ 𝐸′𝑏) + 𝜇𝑏 (𝐸 ′ 𝑏 \ 𝐸𝑏)) = 𝜇𝑎 (𝐸′𝑎) · 𝜇𝑏 (𝐸′𝑏). 323 – If 𝜇𝑏 (𝐸′𝑏) = 0 and 𝜇𝑏 (𝐸𝑎 \ 𝐸′𝑎) = 0, then 𝜇𝑎 (𝐸𝑎) · 𝜇𝑏 (𝐸𝑏) = (𝜇𝑎 (𝐸𝑎 ∩ 𝐸′𝑎) + 𝜇𝑎 (𝐸𝑎 \ 𝐸′𝑎)) · (𝜇𝑏 (𝐸𝑏 ∩ 𝐸′𝑏) + 𝜇𝑏 (𝐸𝑏 \ 𝐸 ′ 𝑏)) = 𝜇𝑎 (𝐸𝑎 ∩ 𝐸′𝑎) · 𝜇𝑏 (𝐸𝑏 \ 𝐸′𝑏) Because 𝜇𝑎 (𝐸𝑏 \ 𝐸′𝑏) · 𝜇𝑏 (𝐸 ′ 𝑎 ∩ 𝐸𝑎) = 0 and 𝜇𝑎 (𝐸′𝑏 \ 𝐸𝑏) · 𝜇𝑏 (𝐸 ′ 𝑎) = 0. Thus, 𝜇𝑎 (𝐸𝑎) · 𝜇𝑏 (𝐸𝑏) = 0 = 𝜇𝑎 (𝐸′𝑎) · 𝜇𝑏 (𝐸′𝑏). 
– If 𝜇𝑏 (𝐸𝑏) = 0 and 𝜇𝑏 (𝐸′𝑎 \ 𝐸𝑎) = 0, then symmetric as above. In all these cases, 𝜇𝑎 (𝐸𝑎) · 𝜇𝑏 (𝐸𝑏) = 𝜇𝑎 (𝐸′𝑎) · 𝜇𝑏 (𝐸′𝑏) as desired. • Show that 𝜇 satisfy countable additivity in {𝐸𝑎 ∩ 𝐸𝑏 | 𝐸𝑎 ∈ F𝑎, 𝐸𝑏 ∈ F𝑏}. We start with showing that 𝜇 is finite-additive. Suppose 𝐸𝑛𝑎 ∩ 𝐸𝑛𝑏 =⊎ 𝑖∈[𝑛] (𝐴𝑖 ∩ 𝐵𝑖) where each 𝐴𝑖 ∈ F𝑎 and 𝐵𝑖 ∈ F𝑏. Fix any 𝐴𝑖 ∩ 𝐵𝑖, there is unique minimal 𝐴 ∈ F𝑎 containing 𝐴𝑖 ∩ 𝐵𝑖, because if 𝐴 ⊇ 𝐴𝑖 ∩ 𝐵𝑖 and 𝐴′ ⊇ 𝐴𝑖 ∩𝐵𝑖, then 𝐴∩ 𝐴′ ⊇ 𝐴𝑖 ∩𝐵𝑖 and 𝐴∩ 𝐴′ ∈ F𝐴 too, and 𝐴∩ 𝐴′ is smaller. Because we have shown that 𝜇 is well-defined, in the following proof, we can assume without loss of generality that 𝐴𝑖 is the smallest set in F𝑎 con- taining 𝐴𝑖 ∩ 𝐵𝑖. Similarly, we let 𝐵𝑖 to be the smallest set in F𝑏 containing 𝐴𝑖 ∩ 𝐵𝑖. Thus, 𝐸𝑛𝑎 ∩ 𝐸𝑛𝑏 = ⊎ 𝑖∈[𝑛] (𝐴𝑖 ∩ 𝐵𝑖) implies every 𝐴𝑖 is smaller than 𝐸𝑛𝑎 and every 𝐵𝑖 is smaller than 𝐸𝑛 𝑏 . Therefore, 𝐸𝑛𝑎 ⊇ ∪𝑖∈[𝑛]𝐴𝑖 and 𝐸𝑛 𝑏 ⊇ ∪𝑖∈[𝑛]𝐵𝑖, which implies that 𝐸𝑛𝑎 ∩ 𝐸𝑛𝑏 ⊇ (∪𝑖∈[𝑛]𝐴𝑖) ∩ (∪𝑖∈[𝑛]𝐵𝑖) ⊇ ∪𝑖∈[𝑛] (𝐴𝑖 ∩ 𝐵𝑖) = 𝐸 𝑛 𝑎 ∩ 𝐸𝑛𝑏 , which implies that the ⊇ in the inequalities all collapse to =. For any 𝐼 ⊆ [𝑛], define 𝛼𝐼 = ∩𝑖∈𝐼𝐴𝑖\(∪𝑖∈[𝑛]\𝐼𝐴𝑖), and 𝛽𝐼 = ∩𝑖∈𝐼𝐵𝑖\(∪𝑖∈[𝑛]\𝐼𝐵𝑖). For any 𝐼 ≠ 𝐼′, 𝛼𝐼∩𝛼𝐼′ = ∅. Thus, {𝛼𝐼}𝐼⊆[𝑛] is a set of disjoint sets in ∪𝑖∈[𝑛]𝐴𝑖, and similarly, {𝛽𝐼}𝐼⊆[𝑛] is a set of disjoint sets in ∪𝑖∈[𝑛]𝐵𝑖. Also, for any 324 𝑖 ∈ [𝑛], we have 𝐴𝑖 = ∪𝐼⊆[𝑛] |𝑖∈𝐼𝛼𝐼 and 𝐵𝑖 = ∪𝐼⊆[𝑛] |𝑖∈𝐼𝛽𝐼 . 
Furthermore, for any 𝐼, 𝛼𝐼 ∩ ∪𝑖∈[𝑛]𝐵𝑖 ⊆ (∪𝑖∈[𝑛]𝐴𝑖) ∩ (∪𝑖∈[𝑛]𝐵𝑖) = ⊎ 𝑖∈[𝑛] 𝐴𝑖 ∩ 𝐵𝑖, and thus, 𝛼𝐼 ∩ ∪𝑖∈[𝑛]𝐵𝑖 = ( ⊎ 𝑖∈[𝑛] 𝐴𝑖 ∩ 𝐵𝑖) ∩ (𝛼𝐼 ∩ ∪𝑖∈[𝑛]𝐵𝑖) = ⊎ 𝑖∈[𝑛] ( 𝐴𝑖 ∩ 𝐵𝑖 ∩ 𝛼𝐼 ∩ ∪ 𝑗∈[𝑛]𝐵 𝑗 ) = ⊎ 𝑖∈𝐼 ( 𝐴𝑖 ∩ 𝐵𝑖 ∩ 𝛼𝐼 ∩ ∪ 𝑗∈[𝑛]𝐵 𝑗 ) (𝐴𝑖 ∩ 𝛼𝐼 = ∅ if 𝑖 ∉ 𝐼) = ⊎ 𝑖∈𝐼 (𝐴𝑖 ∩ 𝐵𝑖 ∩ 𝛼𝐼) (𝐵𝑖 ∩ ∪ 𝑗∈[𝑛]𝐵 𝑗 = 𝐵𝑖 for any 𝑖) = ⊎ 𝑖∈𝐼 (𝐵𝑖 ∩ 𝛼𝐼) (𝐴𝑖 ∩ 𝛼𝐼 = 𝛼𝐼 for any 𝑖 ∈ 𝐼) = 𝛼𝐼 ∩ ∪𝑖∈𝐼𝐵𝑖 (D.1) 325 Now, 𝜇(𝐸𝑛𝑎 ∩ 𝐸𝑛𝑏) = 𝜇((∪𝑖∈[𝑛]𝐴𝑖) ∩ (∪𝑖∈[𝑛]𝐵𝑖)) = 𝜇((⊎𝐼⊆[𝑛] 𝛼𝐼) ∩ (∪𝑖∈[𝑛]𝐵𝑖)) (By definition of 𝛼𝐼) = 𝜇𝑎 ( ⊎ 𝐼⊆[𝑛] 𝛼𝐼) · 𝜇𝑏 (∪𝑖∈[𝑛]𝐵𝑖) (By definition of 𝜇) =  ∑︁ 𝐼⊆[𝑛] 𝜇𝑎 (𝛼𝐼)  · 𝜇𝑏 (∪𝑖∈[𝑛]𝐵𝑖) (By finite-additivity of 𝜇𝑎) = ∑︁ 𝐼⊆[𝑛] 𝜇𝑎 (𝛼𝐼) · 𝜇𝑏 (∪𝑖∈[𝑛]𝐵𝑖) = ∑︁ 𝐼⊆[𝑛] 𝜇(𝛼𝐼 ∩ (∪𝑖∈[𝑛]𝐵𝑖)) (By definition of 𝜇) = ∑︁ 𝐼⊆[𝑛] 𝜇(𝛼𝐼 ∩ (∪𝑖∈𝐼𝐵𝑖)) (By eq. (D.1)) = ∑︁ 𝐼⊆[𝑛] 𝜇𝑎 (𝛼𝐼) · 𝜇𝑏 (∪𝑖∈𝐼𝐵𝑖) (By definition of 𝜇) = ∑︁ 𝐼⊆[𝑛] 𝜇𝑎 (𝛼𝐼) · 𝜇𝑏 (∪𝑖∈𝐼 (⊎𝐼′⊆[𝑛] |𝑖∈𝐼′𝛽𝐼′)) (By definition of 𝛽𝐼) = ∑︁ 𝐼⊆[𝑛] 𝜇𝑎 (𝛼𝐼) · 𝜇𝑏 (⊎𝐼′⊆[𝑛] |𝐼∩𝐼′≠∅𝛽𝐼′) = ∑︁ 𝐼⊆[𝑛] 𝜇𝑎 (𝛼𝐼) · ∑︁ 𝐼′⊆[𝑛] |𝐼∩𝐼′≠∅ 𝜇𝑏 (𝛽𝐼′) = ∑︁ 𝐼⊆[𝑛] ∑︁ 𝐼′⊆[𝑛] |𝐼∩𝐼′≠∅ 𝜇𝑎 (𝛼𝐼) · 𝜇𝑏 (𝛽𝐼′) Meanwhile, for any 𝐼, 𝐼′, if |𝐼 ∩ 𝐼′| ≥ 2, then there exists some 𝑗 , 𝑘 such that 326 𝑗 ∈ 𝐼 ∩ 𝐼′ and 𝑘 ∈ 𝐼 ∩ 𝐼′, so 𝜇𝑎 (𝛼𝐼) · 𝜇𝑏 (𝛽𝐼′) = 𝜇𝑎 (∩𝑖∈𝐼𝐴𝑖 \ (∪𝑖∈[𝑛]\𝐼𝐴𝑖)) · 𝜇𝑏 (∩𝑖∈𝐼𝐵𝑖 \ (∪𝑖∈[𝑛]\𝐼𝐵𝑖)) ≤ 𝜇𝑎 (𝐴 𝑗 ∩ 𝐴𝑘 ) · 𝜇𝑏 (𝐵 𝑗 ∩ 𝐵𝑘 ) = 𝜇(𝐴 𝑗 ∩ 𝐴𝑘 ∩ 𝐵 𝑗 ∩ 𝐵𝑘 ) = 𝜇((𝐴 𝑗 ∩ 𝐵 𝑗 ) ∩ (𝐴𝑘 ∩ 𝐵𝑘 )) = 𝜇(∅) = 0. Thus, continuing our previous derivation, 𝜇(𝐸𝑛𝑎 ∩ 𝐸𝑛𝑏) = ∑︁ 𝐼⊆[𝑛] ∑︁ 𝐼′⊆[𝑛] |𝐼∩𝐼′≠∅ 𝜇𝑎 (𝛼𝐼) · 𝜇𝑏 (𝛽𝐼′) = ∑︁ 𝐼⊆[𝑛] ∑︁ 𝐼′⊆[𝑛] |1=|𝐼∩𝐼′ | 𝜇𝑎 (𝛼𝐼) · 𝜇𝑏 (𝛽𝐼′) (Because 𝜇𝑎 (𝛼𝐼) · 𝜇𝑏 (𝛽𝐼′) = 0 if |𝐼 ∩ 𝐼′| ≥ 2) = ∑︁ 𝑖∈[𝑛] ∑︁ 𝐼⊆[𝑛] |𝑖∈𝐼 ∑︁ 𝐼′⊆[𝑛] |𝐼∩𝐼′={𝑖} 𝜇𝑎 (𝛼𝐼) · 𝜇𝑏 (𝛽𝐼′) = ∑︁ 𝑖∈[𝑛] ∑︁ 𝐼⊆[𝑛] |𝑖∈𝐼 ∑︁ 𝐼′⊆[𝑛] |𝑖∈𝐼′ 𝜇𝑎 (𝛼𝐼) · 𝜇𝑏 (𝛽𝐼′) (Because 𝜇𝑎 (𝛼𝐼) · 𝜇𝑏 (𝛽𝐼′) = 0 if |𝐼 ∩ 𝐼′| ≥ 2) = ∑︁ 𝑖∈[𝑛]  ∑︁ 𝐼⊆[𝑛] |𝑖∈𝐼 𝜇𝑎 (𝛼𝐼) · ∑︁ 𝐼′⊆[𝑛] |𝑖∈𝐼′ 𝜇𝑏 (𝛽𝐼′)  = ∑︁ 𝑖∈[𝑛] 𝜇𝑎 (𝐴𝑖) · 𝜇𝑏 (𝐵𝑖) = ∑︁ 𝑖∈[𝑛] 𝜇(𝐴𝑖 ∩ 𝐵𝑖) Thus, we established the finite additivity. For countable additivity, sup- pose 𝐸𝑎 ∩ 𝐸𝑏 = ⊎ 𝑖∈N(𝐴𝑖 ∩ 𝐵𝑖). 
By the same reason as above, we also have 𝐸𝑎 ∩ 𝐸𝑏 = (∪𝑖∈N𝐴𝑖) ∩ (∪𝑖∈N𝐵𝑖) = ∪𝑖∈N(𝐴𝑖 ∩ 𝐵𝑖) = 𝐸𝑎 ∩ 𝐸𝑏 . 327 Then, 𝜇(𝐸𝑎 ∩ 𝐸𝑏) = 𝜇((∪𝑖∈N𝐴𝑖) ∩ (∪𝑖∈N𝐵𝑖)) = 𝜇𝑎 (∪𝑖∈N𝐴𝑖) · 𝜇𝑏 (∪𝑖∈N𝐵𝑖) = 𝜇𝑎 ( lim 𝑛→∞ ∪𝑖∈[𝑛]𝐴𝑖) · 𝜇𝑏 ( lim 𝑛→∞ ∪𝑖∈[𝑛]𝐵𝑖) = lim 𝑛→∞ 𝜇𝑎 (∪𝑖∈[𝑛]𝐴𝑖) · lim 𝑛→∞ 𝜇𝑏 (∪𝑖∈[𝑛]𝐵𝑖) (By continuity of 𝜇𝑎 and 𝜇𝑏) = lim 𝑛→∞ 𝜇𝑎 (∪𝑖∈[𝑛]𝐴𝑖) · 𝜇𝑏 (∪𝑖∈[𝑛]𝐵𝑖) (†) = lim 𝑛→∞ ∑︁ 𝑖∈[𝑛] 𝜇𝑏 (𝐵𝑖) · 𝜇𝑎 (𝐴𝑖) (By eq. (D.1)) = ∑︁ 𝑖∈N 𝜇𝑏 (𝐵𝑖) · 𝜇𝑎 (𝐴𝑖), (D.2) where (†) holds because that the product of limits equals to the limit of the product when both lim𝑛→∞ 𝜇𝑎 (∪𝑖∈[𝑛]𝐴𝑖) and lim𝑛→∞ 𝜇𝑏 (∪𝑖∈[𝑛]𝐵𝑖) are finite. Thus, we proved countable additivity as well. • Next we show that we can extend 𝜇 to a measure on F𝑎 ⊕ F𝑏. So far, we proved that 𝜇 is a sub-additive measure on the {𝐸𝑎 ∩ 𝐸𝑏 |𝐸𝑎 ∈ F𝑎, 𝐸𝑏 ∈ F𝑏}, which forms a 𝜋-system. By a known theorem in probability theory (e.g. [Rosenthal, 2006, Corollary 2.5.4]), we can extend a sub-additive mea- sure on a 𝜋-system to the 𝜎-algebra it generates if the 𝜋-system is a semi- algebra. Thus, we can extend 𝜇 to a measure on 𝜎({𝐸𝑎∩𝐸𝑏 | 𝐸𝑎 ∈ F𝑎, 𝐸𝑏 ∈ F𝑏}) if we can prove 𝐽 = {𝐸𝑎 ∩ 𝐸𝑏 | 𝐸𝑎 ∈ F𝑎, 𝐸𝑏 ∈ F𝑏} is a semi-algebra. – 𝐽 contains ∅ and Ω: trivial. – 𝐽 is closed under finite intersection: (𝐸𝑎∩𝐸𝑏) ∩ (𝐸′𝑎∩𝐸′𝑏) = (𝐸𝑎∩𝐸 ′ 𝑎) ∩ (𝐸𝑏 ∩ 𝐸′𝑏), where 𝐸𝑎 ∩ 𝐸′𝑎 ∈ F𝑎, and 𝐸𝑏 ∩ 𝐸′𝑏 ∈ F𝑏. 328 – The complement of any element of 𝐽 is equal to a finite disjoint union of elements of 𝐽: (𝐸𝑎 ∩ 𝐸𝑏)𝐶 = 𝐸𝐶𝑎 ∪ 𝐸𝐶𝑏 = (𝐸𝐶𝑎 ∩Ω) ⊎ (𝐸𝑎 ∩ 𝐸𝐶𝑏 ) where 𝐸𝐶𝑎 , 𝐸𝑎 ∈ F𝑎, and 𝐸𝐶 𝑏 ,Ω ∈ F𝑏. As shown in Li et al. [2023a], 𝜎({𝐸𝑎 ∩ 𝐸𝑏 | 𝐸𝑎 ∈ F𝑎, 𝐸𝑏 ∈ F𝑏}) = F𝑎 ⊕ F𝑏 (D.3) Thus, the extension of 𝜇 is a measure on F𝑎 ⊕ F𝑏. • Last, we show that 𝜇 is a probability measure on F𝑎 ⊕ F𝑏: 𝜇(Ω) = 𝜇𝑎 (Ω) · 𝜇𝑏 (Ω) = 1. □ Lemma D.2.7. Consider two probability spaces (F1, 𝜇1), (F2, 𝜇2) ∈ P(Ω), and some other probability space (Σ𝐴, 𝜇) and kernel 𝜅 such that 𝜇1 = bind(𝜇, 𝜅). 
Then, the independent product (F1, 𝜇1) ⊛ (F2, 𝜇2) exists if and only if for any 𝑎 ∈ supp(𝜇), the independent product (F1, 𝜅(𝑎)) ⊛ (F2, 𝜇2) exists. When they both exist,

(F1, 𝜇1) ⊛ (F2, 𝜇2) = (F1 ⊕ F2, bind(𝜇, λ𝑎. 𝜅(𝑎) ⊛ 𝜇2))

Proof. We first show the backwards direction. By lemma D.2.6, for any 𝑎 ∈ supp(𝜇), to show that the independent product (F1, 𝜅(𝑎)) ⊛ (F2, 𝜇2) exists, it suffices to show that for any 𝐸1 ∈ F1 and 𝐸2 ∈ F2 such that 𝐸1 ∩ 𝐸2 = ∅, we have 𝜅(𝑎)(𝐸1) · 𝜇2(𝐸2) = 0. Fix any such 𝐸1, 𝐸2. Because (F1, 𝜇1) ⊛ (F2, 𝜇2) is defined, we have 𝜇1(𝐸1) · 𝜇2(𝐸2) = 0, so either 𝜇1(𝐸1) = 0 or 𝜇2(𝐸2) = 0.

• If 𝜇1(𝐸1) = 0: Recall that

𝜇1(𝐸1) = bind(𝜇, 𝜅)(𝐸1) = ∑_{𝑎∈𝐴} 𝜇(𝑎) · 𝜅(𝑎)(𝐸1) = ∑_{𝑎∈supp(𝜇)} 𝜇(𝑎) · 𝜅(𝑎)(𝐸1)

Because 𝜇(𝑎) > 0 and 𝜅(𝑎)(𝐸1) ≥ 0 for all 𝑎 ∈ supp(𝜇), the equation ∑_{𝑎∈supp(𝜇)} 𝜇(𝑎) · 𝜅(𝑎)(𝐸1) = 0 implies that 𝜇(𝑎) · 𝜅(𝑎)(𝐸1) = 0 for all 𝑎 ∈ supp(𝜇). Thus, for all 𝑎 ∈ supp(𝜇), it must be that 𝜅(𝑎)(𝐸1) = 0. Therefore, 𝜅(𝑎)(𝐸1) · 𝜇2(𝐸2) = 0 for all 𝑎 ∈ supp(𝜇) with this 𝐸1, 𝐸2.

• If 𝜇2(𝐸2) = 0, then it is also clear that 𝜅(𝑎)(𝐸1) · 𝜇2(𝐸2) = 0 for all 𝑎 ∈ supp(𝜇).

Thus, we have 𝜅(𝑎)(𝐸1) · 𝜇2(𝐸2) = 0 for any 𝐸1 ∩ 𝐸2 = ∅ and 𝑎 ∈ supp(𝜇). By lemma D.2.6, the independent product (F1, 𝜅(𝑎)) ⊛ (F2, 𝜇2) exists.

For the forward direction: for any 𝐸1 ∈ F1 and 𝐸2 ∈ F2 such that 𝐸1 ∩ 𝐸2 = ∅, the existence of the independent product (F1, 𝜅(𝑎)) ⊛ (F2, 𝜇2) implies that 𝜅(𝑎)(𝐸1) · 𝜇2(𝐸2) = 0. Thus,

𝜇1(𝐸1) · 𝜇2(𝐸2) = bind(𝜇, 𝜅)(𝐸1) · 𝜇2(𝐸2)
= (∑_{𝑎∈𝐴} 𝜇(𝑎) · 𝜅(𝑎)(𝐸1)) · 𝜇2(𝐸2)
= ∑_{𝑎∈supp(𝜇)} 𝜇(𝑎) · (𝜅(𝑎)(𝐸1) · 𝜇2(𝐸2))
= ∑_{𝑎∈supp(𝜇)} 𝜇(𝑎) · 0 = 0

Thus, by lemma D.2.6, the independent product (F1, 𝜇1) ⊛ (F2, 𝜇2) exists. For any 𝐸1 ∈ F1 and 𝐸2 ∈ F2,

bind(𝜇, λ𝑎. 𝜅(𝑎) ⊛ 𝜇2)(𝐸1 ∩ 𝐸2) = ∑_{𝑎∈supp(𝜇)} 𝜇(𝑎) · (𝜅(𝑎) ⊛ 𝜇2)(𝐸1 ∩ 𝐸2)
= ∑_{𝑎∈supp(𝜇)} 𝜇(𝑎) · 𝜅(𝑎)(𝐸1) · 𝜇2(𝐸2)
= (∑_{𝑎∈supp(𝜇)} 𝜇(𝑎) · 𝜅(𝑎)(𝐸1)) · 𝜇2(𝐸2)
= bind(𝜇, 𝜅)(𝐸1) · 𝜇2(𝐸2)
= 𝜇1(𝐸1) · 𝜇2(𝐸2)
= (𝜇1 ⊛ 𝜇2)(𝐸1 ∩ 𝐸2)

Thus, (F1, 𝜇1) ⊛ (F2, 𝜇2) = (F1 ⊕ F2, bind(𝜇, λ𝑎. 𝜅(𝑎) ⊛ 𝜇2)).
□ D.3 Construction of the BLUEBELL Model Lemma D.3.1. The structure PSp is an ordered unital resource algebra (RA) as defined in definition 5.3.1. Proof. We defined · and ⪯ the same way as in Li et al. [2023a], and they have proved that · is associative and commutative, and ⪯ is transitive and reflexive. We check the rest of conditions one by one. Condition 𝑎 · 𝑏 = 𝑏 · 𝑎 The independent product is proved to be commutative in Li et al. [2023a]. Condition (𝑎 · 𝑏) · 𝑐 = 𝑎 · (𝑏 · 𝑐) The independent product is proved to be asso- ciative in Li et al. [2023a]. 331 Condition 𝑎 ⪯ 𝑏 ⇒ 𝑏 ⪯ 𝑐 ⇒ 𝑎 ⪯ 𝑐 The order ⪯ is proved to be transitive in Li et al. [2023a]. Condition 𝑎 ⪯ 𝑎 The order ⪯ is proved to be reflexive in Li et al. [2023a]. Condition V(𝑎 · 𝑏) ⇒ V(𝑎) Pattern matching on 𝑎 · 𝑏, either there exists prob- ability spaces P1,P2 such that 𝑎 = P1, 𝑏 = P2 and P1 ⊛ P2 is defined, or 𝑎 · 𝑏 = . Case: 𝑎 · 𝑏 = Note that V(𝑎 · 𝑏) does not hold when 𝑎 · 𝑏 = , so we can eliminate this case by ex falso quodlibet. Case: 𝑎 · 𝑏 = P1 ⊛ P2 Then 𝑎 = P1, and thus V(𝑎). Condition V(𝜀) Clear because 𝜀 ≠ . Condition 𝑎 ⪯ 𝑏 ⇒ V(𝑏) ⇒ V(𝑎) Pattern matching on 𝑎 and 𝑏, either there ex- ists probability spaces P1,P2 such that 𝑎 = P1, 𝑏 = P2 and P1 ⊑ P2 is defined, or 𝑏 = . Case: 𝑏 = Then V(𝑏) does not hold, and we can eliminate this case by ex falso quodlibet. Case: 𝑎 = P1, 𝑏 = P2 and P1 ⊑ P2 We clearly have V(𝑎). Condition 𝜀 · 𝑎 = 𝑎 Pattern matching on 𝑎, either 𝑎 = or there exists some probability space P such that 𝑎 = P. Case: 𝑎 = Then 𝜀 · 𝑎 = = 𝑎. Case: 𝑎 = P Then 𝜀 · 𝑎 = 𝑎. Condition 𝑎 ⪯ 𝑏 ⇒ 𝑎 · 𝑐 ⪯ 𝑏 · 𝑐 Pattern matching on 𝑎 and 𝑏. If 𝑎 ⪯ 𝑏, then either 𝑏 = or there exists P,P′ such that 𝑎 = P and 𝑏 = P′. Case: 𝑏 = Then 𝑏 · 𝑐 = is the top element, and then 𝑎 · 𝑐 ⪯ 𝑏 · 𝑐. 332 Otherwise 𝑎 ⪯ 𝑏 iff P ⪯ P′, then either 𝑏 · 𝑐 = and 𝑎 · 𝑐 ⪯ 𝑏 · 𝑐 follows, or 𝑏 · 𝑐 = P′ ⊛ P′′ for some probability space 𝑐 = P′′. Then P ⪯ P′ implies that P ·P′′ is also defined and P ·P′ ⪯ P ·P′′. Thus, 𝑎 ·𝑐 ⪯ 𝑏 ·𝑐 too. 
□ Lemma D.3.2 (RA composition preserves compatibility). F1 # 𝑝1 ⇒ F2 # 𝑝2 ⇒ (F1 ⊕ F2) # (𝑝1 · 𝑝2) Proof. Let 𝑆1 = {𝑥 ∈ Var | 𝑝1(𝑥) = 0}, 𝑆2 = {𝑥 ∈ Var | 𝑝2(𝑥) = 0}. If F1 # 𝑝1, then there exists P′1 ∈ P((Var \ 𝑆1) → Val) such that P1 = P′1 ⊗ 𝟙𝑆1→Val In addition, if F2 # 𝑝2, then there exists P′2 ∈ P((Var \ 𝑆2) → Val) such that P2 = P′2 ⊗ 𝟙𝑆2→Val. Then, P1 · P2 = P1 ⊛ P2 = (P′1 ⊗ 𝟙𝑆1→Val) ⊛ (P′2 ⊗ 𝟙𝑆2→Val) Say (F ′1 , 𝜇 ′ 1) = P ′ 1, and (F ′2 , 𝜇 ′ 2) = P ′ 2. Then the sigma algebra of P1 · P2 is 𝜎( { (𝐸1 × 𝑆1 → Val) ∩ (𝐸2 × 𝑆2 → Val) | 𝐸1 ∈ F ′1 , 𝐸2 ∈ F ′2 } ) =𝜎( { ((𝐸1 × (𝑆1 \ 𝑆2) → Val) ∩ (𝐸2 × (𝑆2 \ 𝐸1) → Val)) × (𝑆1 ∩ 𝑆2) | 𝐸1 ∈ F ′1 , 𝐸2 ∈ F ′2 } ) Then, there exists P′′ ∈ P((Var \ (𝑆1 ∩ 𝑆2)) → Val) such that P1 · P2 = P′′ ⊗ 𝟙(𝑆1∩𝑆2)→Val). Also, {𝑥 ∈ Var | (𝑝1 · 𝑝2) (𝑥) = 0} ={𝑥 ∈ Var | 𝑝1(𝑥) + 𝑝2(𝑥) = 0} ={𝑥 ∈ Var | 𝑝1(𝑥) = 0 and 𝑝2(𝑥) = 0} =𝑆1 ∩ 𝑆2 Therefore, F1 ⊕ F2 is compatible with 𝑝1 · 𝑝2 □ 333 Lemma D.3.3. The structure (Perm, ⪯,V, ·, 𝜀) is an ordered unital resource algebra (RA) as defined in definition 5.3.1. Proof. We check the conditions one by one. Condition 𝑎 · 𝑏 = 𝑏 · 𝑎 Follows from the commutativity of addition. Condition (𝑎 · 𝑏) · 𝑐 = 𝑎 · (𝑏 · 𝑐) Follows from the associativity of addition. Condition 𝑎 ⪯ 𝑏 ⇒ 𝑏 ⪯ 𝑐 ⇒ 𝑎 ⪯ 𝑐 ⪯ is a point-wise lifting of the order ≤ on arithmetics, so it follows from the transitivity of ≤. Condition 𝑎 ⪯ 𝑎 ⪯ is a point-wise lifting of the order ≤ on arithmetics, so it follows from the reflexivity of ≤. Condition V(𝑎 · 𝑏) ⇒ V(𝑎) By definition, V(𝑎 · 𝑏) ⇒ ∀𝑥 ∈ Var, (𝑎 · 𝑏) (𝑥) ≤ 1 ⇒ ∀𝑥 ∈ Var, 𝑎(𝑥) + 𝑏(𝑥) ≤ 1 ⇒ ∀𝑥 ∈ Var, 𝑎(𝑥) ≤ 1 ⇒ V(𝑎) Condition V(𝜀) Note that 𝜀 = λ . 0 satisfies that ∀𝑥 ∈ Var, 𝜀(𝑥) ≤ 1, so V(𝜀). Condition 𝑎 ⪯ 𝑏 ⇒ V(𝑏) ⇒ V(𝑎) By definition, 𝑎 ⪯ 𝑏 means ∀𝑥 ∈ Var.𝑎(𝑥) ≤ 𝑏(𝑥), and V(𝑏) means that ∀𝑥 ∈ Var.𝑏(𝑥) ≤ 1. Thus, 𝑎 ⪯ 𝑏 and V(𝑏) implies that ∀𝑥 ∈ Var.𝑎(𝑥) ≤ 𝑏(𝑥) ≤ 1, which implies V(𝑎). Condition 𝜀 · 𝑎 = 𝑎 By definition, 𝜀 · 𝑎 = λ𝑥. (λ . 0) (𝑥) + 𝑎(𝑥) = λ𝑥. 0 + 𝑎(𝑥) = 𝑎. 
Condition 𝑎 ⪯ 𝑏 ⇒ 𝑎 · 𝑐 ⪯ 𝑏 · 𝑐 By definition,

𝑎 ⪯ 𝑏 ⇔ ∀𝑥 ∈ Var. 𝑎(𝑥) ≤ 𝑏(𝑥)
⇒ ∀𝑥 ∈ Var. 𝑎(𝑥) + 𝑐(𝑥) ≤ 𝑏(𝑥) + 𝑐(𝑥)
⇒ 𝑎 · 𝑐 ⪯ 𝑏 · 𝑐 □

Lemma D.3.4. The structure PSpPm is an ordered unital resource algebra (RA) as defined in definition 5.3.1.

Proof. We want to check that PSpPm satisfies all the requirements of an ordered unital resource algebra (RA). Because PSpPm is essentially a product of PSp and Perm, the proof below closely follows the standard proof that a product of RAs is an RA. First, lemma D.3.2 implies that · is well-defined. Then we need to check that all the RA axioms are satisfied. Fix any 𝑎, 𝑏 ∈ PSpPm and any P1, 𝑝1, P2, 𝑝2 such that 𝑎 = (P1, 𝑝1) and 𝑏 = (P2, 𝑝2). We check the conditions one by one.

Condition V(𝑎 · 𝑏) ⇒ V(𝑎) By definition, 𝑎 · 𝑏 = (P1, 𝑝1) · (P2, 𝑝2) = (P1 · P2, 𝑝1 · 𝑝2). And V(P1 · P2, 𝑝1 · 𝑝2) implies V(P1 · P2) and V(𝑝1 · 𝑝2). Because PSp and Perm are both RAs, we have V(P1) and V(𝑝1). Thus, V(P1, 𝑝1).

Condition V(𝜀) Clear because 𝜀 = (𝟙Mem[Var], λ𝑥. 0), where 𝟙Mem[Var] is not the invalid element and ∀𝑥. (λ𝑥. 0)(𝑥) ≤ 1.

Condition 𝑎 ⪯ 𝑏 ⇒ V(𝑏) ⇒ V(𝑎) 𝑎 ⪯ 𝑏 implies that P1 ⪯ P2 and 𝑝1 ⪯ 𝑝2. V(𝑏) implies that P2 is not the invalid element and ∀𝑥. 𝑝2(𝑥) ≤ 1. Thus, P1 is not the invalid element and ∀𝑥. 𝑝1(𝑥) ≤ 1, and therefore V(𝑎).

Condition 𝜀 · 𝑎 = 𝑎 𝜀 · 𝑎 = (𝟙Mem[Var], λ𝑥. 0) · (P1, 𝑝1) = (𝟙Mem[Var] · P1, (λ𝑥. 0) · 𝑝1) = (P1, 𝑝1) = 𝑎.

Condition 𝑎 ⪯ 𝑏 ⇒ 𝑎 · 𝑐 ⪯ 𝑏 · 𝑐 𝑎 ⪯ 𝑏 implies that P1 ⪯ P2 and 𝑝1 ⪯ 𝑝2. Say 𝑐 = (P3, 𝑝3). Then 𝑎 · 𝑐 = (P1 · P3, 𝑝1 · 𝑝3) and 𝑏 · 𝑐 = (P2 · P3, 𝑝2 · 𝑝3). Because P1 ⪯ P2, we have P1 · P3 ⪯ P2 · P3; similarly, 𝑝1 · 𝑝3 ⪯ 𝑝2 · 𝑝3. Thus, 𝑎 · 𝑐 ⪯ 𝑏 · 𝑐. □

Lemma D.3.5. If 𝑀 is an RA, then 𝑀𝐼 is also an RA.

Proof. RAs are known to be closed under products, and 𝑀𝐼 can be obtained as a product of copies of 𝑀, so we omit the proof. □

Lemma D.3.6. M𝐼 is an RA.

Proof. By lemma D.3.4, PSpPm is an RA. By lemma D.3.5, M𝐼 = PSpPm𝐼 is also an RA. □

D.4 Characterizations of Joint Conditioning

Interestingly, it is possible to characterize the conditioning modality using the other connectives of the logic.
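Throughout this section, conditioning is phrased through decompositions of the form 𝜇(𝑖) = bind(𝜇, 𝜅(𝑖)), where the kernel 𝜅 conditions the ambient measure on the value of an observed expression (this is exactly the construction used in the proof of C-UNIT-R). As a purely illustrative sanity check, the discrete case of this decomposition can be sketched in a few lines of Python; the helper names bind, pushforward, and condition below are hypothetical and not part of any BLUEBELL artifact.

```python
from fractions import Fraction
from collections import defaultdict

def bind(mu, kappa):
    # Monadic bind for finite distributions: (bind(mu, kappa))(w) = sum_v mu(v) * kappa(v)(w).
    out = defaultdict(Fraction)
    for v, p in mu.items():
        for w, q in kappa(v).items():
            out[w] += p * q
    return dict(out)

def pushforward(mu, E):
    # Law of the random variable E under mu, i.e. mu composed with E^{-1}.
    out = defaultdict(Fraction)
    for w, p in mu.items():
        out[E(w)] += p
    return dict(out)

def condition(mu, E, v):
    # mu conditioned on the event E = v (assumes the event has positive mass).
    mass = sum(p for w, p in mu.items() if E(w) == v)
    return {w: p / mass for w, p in mu.items() if E(w) == v}

# A toy distribution over pairs (x, y), observing E = the first component.
mu = {(0, 0): Fraction(1, 4), (0, 1): Fraction(1, 4),
      (1, 0): Fraction(1, 6), (1, 1): Fraction(1, 3)}
E = lambda w: w[0]

law_E = pushforward(mu, E)            # the outer distribution, mu o E^{-1}
kappa = lambda v: condition(mu, E, v) # kernel of conditional distributions

# Binding the law of E with the conditional kernel recovers mu.
assert bind(law_E, kappa) == mu
```

The assertion checks the defining equation of the decomposition: binding the law of 𝐸 with the kernel of conditional distributions recovers the original measure, i.e. 𝜇 = bind(𝜇 ∘ 𝐸⁻¹, 𝜅).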
Proposition D.4.1 (Alternative Characterization of Joint conditioning). The fol- lowing is a logically equivalent characterization of the joint conditioning modality: C𝜇 𝐾 ⊣⊢ ∃F , 𝜇, 𝑝, 𝜅.Own(F , 𝜇, 𝑝) ∗ ⌜∀𝑖 ∈ 𝐼 . 𝜇(𝑖) = bind(𝜇, 𝜅(𝑖))⌝ ∗ ∀𝑣 ∈ supp(𝜇).Own(F , 𝜅(𝐼) (𝑣), 𝑝) −∗ 𝐾 (𝑣) 336 Proof. In the following, we sometimes abbreviate ∀𝑖 ∈ 𝐼 . 𝜇(𝑖) = bind(𝜇, 𝜅(𝑖)) by writing just 𝜇 = bind(𝜇, 𝜅). We start with the embedding: ∃F , 𝜇, 𝑝, 𝜅.Own(F , 𝜇, 𝑝) ∗ ⌜∀𝑖 ∈ 𝐼 . 𝜇(𝑖) = bind(𝜇, 𝜅(𝑖))⌝ ∗ ∀𝑎 ∈ supp(𝜇).Own(F , 𝜅(𝐼) (𝑎), 𝑝) −∗ 𝐾 (𝑎) ⊣⊢ λ𝑟. ∃F , 𝜇, 𝑝, 𝜅. ( Own(F , 𝜇′, 𝑝) ∗ ⌜𝜇 = bind(𝜇, 𝜅)⌝ ∗ (∀𝑎 ∈ supp(𝜇).Own(F , 𝜅𝑎, 𝑝) −∗ 𝐾 (𝑎)) ) (𝑟) ⊣⊢ λ𝑟. ∃F , 𝜇, 𝑝, 𝜅, F1, 𝜇1, 𝑝1, F2, 𝜇2, 𝑝2, F3, 𝜇3, 𝑝3, 𝑟 ⊒ (F1, 𝜇1, 𝑝1) · (F2, 𝜇2, 𝑝2) · (F3, 𝜇3, 𝑝3)∧ (F1, 𝜇1, 𝑝1) ⊒ (F , 𝜇, 𝑝) ∧ ⌜𝜇 = bind(𝜇, 𝜅)⌝∧ (∀𝑎 ∈ supp(𝜇).∀𝑟1, 𝑟2. 𝑟1 · (F3, 𝜇3, 𝑝3) = 𝑟2 ∧ 𝑟1 ⊒ (F , 𝜅𝑎, 𝑝) ⇒ 𝐾 (𝑎) (𝑟2)) ⊣⊢ λ𝑟. ∃F , 𝜇, 𝑝, F3, 𝜇3, 𝑝3, 𝜅. 𝑟 ⊒ (F , 𝜇, 𝑝) · (F3, 𝜇3) ∧ ⌜𝜇 = bind(𝜇, 𝜅)⌝∧ (∀𝑎 ∈ supp(𝜇).∀𝑟1, 𝑟2. 𝑟1 · (F3, 𝜇3, 𝑝3) = 𝑟2 ∧ 𝑟1 ⊒ (F , 𝜅𝑎, 𝑝) ⇒ 𝐾 (𝑎) (𝑟2)) For the last equivalence, the forward direction holds because 𝑟 ⊒ (F1, 𝜇1, 𝑝1) · (F2, 𝜇2, 𝑝2) · (F3, 𝜇3, 𝑝3) ⊒ (F1, 𝜇1, 𝑝1) · (F3, 𝜇3, 𝑝3) ⊒ (F , 𝜇, 𝑝) · (F3, 𝜇3, 𝑝3). The backward direction holds because we can pick (F1, 𝜇1, 𝑝1) = (F , 𝜇, 𝑝), (F2, 𝜇2) be the trivial probability space on 𝑠 and 𝑝2 = λ . 0. • To show that the embedding implies the original assertion C𝜇 𝐾 , we start 337 with 𝜇(𝑖) ⊛ 𝜇3(𝑖). For any 𝑖, we have 𝜇(𝑖) = bind(𝜇, 𝜅(𝑖)), and thus 𝜇(𝑖) ⊛ 𝜇3(𝑖) = bind(𝜇, 𝜅(𝑖)) ⊛ 𝜇3(𝑖). According to lemma D.2.7, 𝜇(𝑖)⊛𝜇3(𝑖) is defined implies that 𝜅(𝑖) (𝑎)⊛𝜇3(𝑖) is defined for any 𝑎 ∈. Furthermore, 𝜇(𝑖) ⊛ 𝜇3(𝑖) = bind(𝜇, λ𝑎. 𝜅(𝑖) (𝑎) ⊛ 𝜇3(𝑖)) We abbreviate the hyperkernel [𝑖: λ𝑎. 𝜅(𝑖) (𝑎) ⊛ 𝜇3(𝑖) | 𝑖 ∈ 𝐼] as 𝜅′. 
For any 𝑎 ∈ supp(𝜇), the assertion ∀𝑎 ∈ supp(𝜇).∀𝑟1, 𝑟2.𝑟1 ⊛ (F3, 𝜇3, 𝑝3) = 𝑟2 ∧ 𝑟1 ⊒ (F , 𝜅(𝐼)𝑎, 𝑝) ⇒ 𝐾 (𝑎) (𝑟2) applies with the specific case 𝑟1 = (F , 𝜅(𝐼) (𝑎), 𝑝), gives us 𝐾 (𝑎) ((F , 𝜅(𝐼) (𝑎), 𝑝) · (F3, 𝜇3, 𝑝3)]) By the definition of composition in our resource algebra, we have that 𝐾 (𝑎) holds on (F ⊕ F3, 𝜅 ′(𝐼) (𝑎), 𝑝 + 𝑝3). For any 𝑟, – If V(𝑟), then there exists F ′, 𝜇′, 𝑝′ such that 𝑟 = (F ′, 𝜇′, 𝑝′). Note that 𝑟 = (F ′, 𝜇′, 𝑝′) ⊒ (F , 𝜇, 𝑝) · (F3, 𝜇3, 𝑝3) = (F ⊕ F3, 𝜇 ⊛ 𝜇3, 𝑝 + 𝑝3) By lemma D.2.4, 𝜇 ⊛ 𝜇3 = bind(𝜇, 𝜅′) implies that there exists 𝜅′′ such that 𝜇(𝑖) = bind(𝜇, 𝜅′′(𝑖)), and that for any 𝑎 ∈ supp 𝜇, (F ⊕ F3, 𝜅 ′(𝐼) (𝑎)) ⊑ (F ′, 𝜅′′(𝐼) (𝑎)). Thus, by monotonicity with respect to the extension order, that would imply 𝐾 (𝑎) holds on (F ′, 𝜅′′(𝐼) (𝑎), 𝑝′). And 𝐾 (𝑎) holds on (F ′, 𝜅′′(𝐼) (𝑎), 𝑝′) for any 𝑎 ∈ supp 𝜇 together with 𝜇(𝑖) = bind(𝜇, 𝜅′′(𝑖)) implies that 𝑟 satisfy the original assertion of conditioning modality. 338 – If not V(𝑟), then 𝑟 satisfies any assertions, so 𝑟 satisfy the original assertion of conditioning modality. • To show the other direction that having the original assertion implies the embedded assertion. Assume C𝜇 𝐾 (𝑟), that is, ∃F , 𝜇, 𝑝, 𝜅. (F , 𝜇, 𝑝) ⪯ 𝑟 ∧ ∀𝑖 ∈ 𝐼 . 𝜇(𝑖) = bind(𝜇, 𝜅(𝑖)) ∧ ∀𝑣 ∈ supp(𝜇).𝐾 (𝑣) (F , 𝜅(𝐼) (𝑣), 𝑝) (𝑟) To show that 𝑟 also satisfy the embedding, we pick the witness for the existential quantifier as follows: let (F3, 𝜇3) be the trivial probabil- ity space on Mem[Var]; let 𝑝3 = λ . 0; pick (Fembd, 𝜇embd, 𝑝embd) be the (Forig, 𝜇orig, 𝑝orig) that witness C𝜇 𝐾 (𝑟), and 𝜅embd = 𝜅orig. Then: – First we show 𝑟 ⪰ (Forig, 𝜇orig, 𝑝orig) = (Forig, 𝜇orig, 𝑝orig) · (F3, 𝜇3, 𝑝3) = (Fembd, 𝜇embd, 𝑝embd) · (F3, 𝜇3, 𝑝3) – 𝜇orig = bind(𝜇, 𝜅orig(𝐼) (𝑎)) implies 𝜇embd = bind(𝜇, 𝜅embd(𝐼) (𝑎)). – For any 𝑟1, 𝑟2, 𝑟1 · (F3, 𝜇3, 𝑝3) = 𝑟2 ∧ 𝑟1 ⊒ (Fembd, 𝜅embd(𝐼) (𝑎), 𝑝embd) implies that 𝑟2 = 𝑟1 ⊒ (Forig, 𝜅orig(𝐼) (𝑎), 𝑝orig). 
By the assumption that the orig assertion holds, we have 𝐾 (𝑎) (Forig, 𝜅orig(𝐼) (𝑎), 𝑝orig), which implies 𝐾 (𝑎) (𝑟2). Therefore, 𝑟 also satisfy the embedding. □ 339 D.5 Soundness D.5.1 Soundness of Primitive Rules Soundness of Distribution Ownership Rules Lemma D.5.1. DIST-INJ is sound. Proof. Assume a valid 𝑎 ∈ M𝐼 is such that both 𝐸 $∼ 𝜇(𝑎) and 𝐸 $∼ 𝜇′(𝑎) hold. Let 𝑎 = (F , 𝜇0, 𝑝), then we know 𝜇 = 𝜇0 ◦ 𝐸−1 = 𝜇′, which proves the claim. □ Lemma D.5.2. SURE-MERGE is sound. Proof. The proof for the forward direction is very similar to the one for sec- tion 5.3.5. For 𝑎 ∈ M𝐼 , if (⌈𝐸1⌉ ∗ ⌈𝐸2⌉)(𝑎). Then there exists 𝑎1, 𝑎2 such that 𝑎1 · 𝑎2 ⪯ 𝑎 and ⌈𝐸1⌉ (𝑎1), ⌈𝐸2⌉ (𝑎2). Say 𝑎 = (F , 𝜇, 𝑝), 𝑎1 = (F1, 𝜇1, 𝑝1) and 𝑎2 = (F2, 𝜇2, 𝑝2). Then ⌈𝐸1⌉ (𝑎1) implies that 𝜇1(𝐸−1 1 (True)) = 1 And similarly, 𝜇2(𝐸−1 2 (True)) = 1 Thus, 𝜇(𝐸−1 1 (True) ∩ 𝐸−1 2 (True)) = 𝜇1(𝐸−1 1 (True)) · 𝜇2(𝐸−1 2 (True)) = 1. Hence, 𝜇(𝐸1 ∧ 𝐸−1 2 (True)) = 𝜇(𝐸−1 1 (True) ∩ 𝐸−1 2 (True)) = 1 340 Thus, ⌈𝐸1 ∧ 𝐸2⌉ (𝑎). Now we prove the backwards direction: Say 𝑎 = (F , 𝜇, 𝑝). if ⌈𝐸1 ∧ 𝐸2⌉ (𝑎), then 𝜇(𝐸1 ∧ 𝐸−1 2 (True)) = 1, and then 𝜇(𝐸−1 1 (True)) ≥ 𝜇(𝐸1 ∧ 𝐸−1 2 (True)) = 1 𝜇(𝐸−1 2 (True)) ≥ 𝜇(𝐸1 ∧ 𝐸−1 2 (True)) = 1 Let F1 = 𝜎(𝐸−1 1 (True)) and F2 = 𝜎(𝐸−1 2 (True)). Then, ⌈𝐸1⌉ (F1, 𝜇 |F1 , λ . 0) ⌈𝐸2⌉ (F2, 𝜇 |F2 , λ . 0) (F1, 𝜇 |F1 , λ . 0) ∗ (F2, 𝜇 |F2 , λ . 0) ⪯ 𝑎 Thus, ⌈𝐸1⌉ ∗ ⌈𝐸2⌉ holds on 𝑎. □ Lemma D.5.3. PROD-SPLIT is sound. Proof. For any (F , 𝜇, 𝑝) such that ((𝐸1, 𝐸2) $∼ 𝜇1 ⊗ 𝜇2) (F , 𝜇, 𝑝), by definition, it must ∃F ′, 𝜇′. (Own(F ′, 𝜇′)) (F , 𝜇, 𝑝) ∗ (𝐸1, 𝐸2) � (F ′(𝑖), 𝜇′(𝑖)) ∧ 𝜇1 ⊗ 𝜇2 = 𝜇′(𝑖) ◦ (𝐸1, 𝐸2)−1. We can derive from it that ∃F ′, 𝜇′, 𝑝′.(F ′, 𝜇′) ⪯ (F , 𝜇, 𝑝)∗( ∀𝑎, 𝑏 ∈ 𝐴.∃𝐿𝑎,𝑏,𝑈𝑎,𝑏 ∈ F ′(𝑖). 𝐿𝑎,𝑏 ⊆ (𝐸1, 𝐸2)−1(𝑎, 𝑏) ⊆ 𝑈𝑎,𝑏 ∧ 𝜇′(𝐿𝑎,𝑏) = 𝜇′(𝑈𝑎,𝑏)∧ 𝜇1 ⊗ 𝜇2(𝑎, 𝑏) = 𝜇′(𝑖) (𝐿𝑎,𝑏) = 𝜇′(𝑖) (𝑈𝑎,𝑏) ) Also, for any 𝑎, 𝑏, 𝑎′, 𝑏′ ∈ 𝐴 such that 𝑎 ≠ 𝑎′ or 𝑏 ≠ 𝑏′, we have 𝐿𝑎,𝑏 disjoint from 𝐿𝑎′,𝑏′ because on 𝐿𝑎,𝑏 ∩ 𝐿𝑎′,𝑏′ , the random variable (𝐸1, 𝐸2) maps to both (𝑎, 𝑏) and (𝑎′, 𝑏′). 
341 Define F1(𝑖) = 𝜎( { (⋃𝑏∈𝐴 𝐿𝑎,𝑏) | 𝑎 ∈ 𝐴 } ∪ { (⋃𝑏∈𝐴𝑈𝑎,𝑏) | 𝑎 ∈ 𝐴 } ), and similarly define F2(𝑖) = 𝜎( { (⋃𝑎∈𝐴 𝐿𝑎,𝑏) | 𝑏 ∈ 𝐴 } ∪ { (⋃𝑎∈𝐴𝑈𝑎,𝑏) | 𝑏 ∈ 𝐴 } ). Denote 𝜇′ restricted to F1 as 𝜇′1 and 𝜇′ restricted to F2 as 𝜇′2. We want to show that (F1(𝑖), 𝜇′1(𝑖)) ⊛ (F2(𝑖), 𝜇′2(𝑖)) ⊑ (F ′(𝑖), 𝜇′(𝑖)), which boils down to show that for any 𝑋1 ∈ F1(𝑖), any 𝑋2 ∈ F2(𝑖), 𝜇′(𝑋1 ∩ 𝑋2) = 𝜇′1(𝑋1) · 𝜇′2(𝑋2) For convenience, we will denote ∪𝑏∈𝐴𝐿𝑎,𝑏 as 𝐿𝑎, denote ∪𝑎∈𝐴𝐿𝑎,𝑏 as 𝐿𝑏, de- note ∪𝑏∈𝐴𝑈𝑎,𝑏 as𝑈𝑎, and denote ∪𝑎∈𝐴𝑈𝑎,𝑏 as𝑈𝑏. First, using a standard construction in measure theory proofs, we rewrite F1 and F2 as sigma algebra generated by sets of partitions. Specifically, F1 is equivalent to 𝜎( {⋂ 𝑎∈𝑆1 𝐿𝑎 ∩ ⋂ 𝑎∈𝑆2 𝑈𝑎 \ ( ⋃ 𝑎∈𝐴\𝑆1 𝐿𝑎 ∪ ⋃ 𝑎∈𝐴\𝑆2 𝑈𝑎) | 𝑆1, 𝑆2 ⊆ 𝐴 } ) and similarly, F2 is equivalent to 𝜎( {⋂ 𝑏∈𝑇1 𝐿𝑏 ∩ ⋂ 𝑏∈𝑇2 𝑈𝑏 \ ( ⋃ 𝑏∈𝐴\𝑇1 𝐿𝑏 ∪ ⋃ 𝑏∈𝐴\𝑇2 𝑈𝑏) | 𝑇1, 𝑇2 ⊆ 𝐴 } ). Thus, by lemma D.2.2, any event 𝑋1 in F1 can be represented by⊎ 𝑆1∈𝐼1,𝑆2∈𝐼2 ⋂ 𝑎∈𝑆1 𝐿𝑎 ∩ ⋂ 𝑎∈𝑆2 𝑈𝑎 \ ( ⋃ 𝑎∈𝐴\𝑆1 𝐿𝑎 ∪ ⋃ 𝑎∈𝐴\𝑆2 𝑈𝑎) for some 𝐼1, 𝐼2 ⊆ P(𝐴), where P is the powerset over 𝐴. Similarly, any event 𝑋2 in F2 can be represented by⊎ 𝑆3∈𝐼3,𝑆4∈𝐼4 ⋂ 𝑏∈𝑆3 𝐿𝑏 ∩ ⋂ 𝑏∈𝑆4 𝑈𝑏 \ ( ⋃ 𝑏∈𝐴\𝑆3 𝐿𝑏 ∪ ⋃ 𝑏∈𝐴\𝑆2 𝑈𝑏) 342 for some 𝐼3, 𝐼4 ⊆ P(𝐴). Thus, 𝑋1 ∩ 𝑋2 can be represented as 𝑋1 ∩ 𝑋2 = (⊎𝑆1∈𝐼1,𝑆2∈𝐼2 ⋂ 𝑎∈𝑆1 𝐿𝑎 ∩ ⋂ 𝑎∈𝑆2 𝑈𝑎 \ ( ⋃ 𝑎∈𝐴\𝑆1 𝐿𝑎 ∪ ⋃ 𝑎∈𝐴\𝑆2 𝑈𝑎))⋂(⊎𝑆3∈𝐼3,𝑆4∈𝐼4 ⋂ 𝑏∈𝑆3 𝐿𝑏 ∩ ⋂ 𝑏∈𝑆4 𝑈𝑏 \ ( ⋃ 𝑏∈𝐴\𝑆3 𝐿𝑏 ∪ ⋃ 𝑏∈𝐴\𝑆2 𝑈𝑏)) = ⊎ 𝑆1∈𝐼1,𝑆2∈𝐼2,𝑆3∈𝐼3,𝑆4∈𝐼4 ( ⋂ 𝑎∈𝑆1 𝐿𝑎 ∩ ⋂ 𝑎∈𝑆2 𝑈𝑎 \ ( ⋃ 𝑎∈𝐴\𝑆1 𝐿𝑎 ∪ ⋃ 𝑎∈𝐴\𝑆2 𝑈𝑎)) ∩ (⋂𝑏∈𝑆3 𝐿𝑏 ∩ ⋂ 𝑏∈𝑆4 𝑈𝑏 \ ( ⋃ 𝑏∈𝐴\𝑆3 𝐿𝑏 ∪ ⋃ 𝑏∈𝐴\𝑆2 𝑈𝑏)) Because 𝐿𝑎,𝑏 and 𝐿𝑎′,𝑏′ are disjoint as long as not 𝑎 = 𝑎′ and 𝑏 = 𝑏′, we have 𝐿𝑎 disjoint from 𝐿𝑎′ if 𝑎 ≠ 𝑎′. Thus, ⋂ 𝑎∈𝑆1 𝐿𝑎∩ ⋂ 𝑎∈𝑆2 𝑈𝑎 \ ( ⋃ 𝑎∈𝐴\𝑆1 𝐿𝑎∪ ⋃ 𝑎∈𝐴\𝑆2 𝑈𝑎) is not empty only when 𝑆1 is singleton and empty. • If 𝑆1 is empty, then⋂ 𝑎∈𝑆1 𝐿𝑎∩ ⋂ 𝑎∈𝑆2 𝑈𝑎\( ⋃ 𝑎∈𝐴\𝑆1 𝐿𝑎∪ ⋃ 𝑎∈𝐴\𝑆2 𝑈𝑎) = ⋂ 𝑎∈𝑆2 𝑈𝑎\( ⋃ 𝑎∈𝐴 𝐿𝑎∪ ⋃ 𝑎∈𝐴\𝑆2 𝑈𝑎) has measure 0 because ⋃ 𝑎∈𝐴 𝐿𝑎 has measure 1. 
• Otherwise, if 𝑆1 is singleton, say 𝑆1 = {𝑎′}, then⋂ 𝑎∈𝑆1 𝐿𝑎 ∩ ⋂ 𝑎∈𝑆2 𝑈𝑎 \ ( ⋃ 𝑎∈𝐴\𝑆1 𝐿𝑎 ∪ ⋃ 𝑎∈𝐴\𝑆2 𝑈𝑎) = 𝐿𝑎′ ∩ ⋂ 𝑎∈𝑆2 𝑈𝑎 \ ⋃ 𝑎∈𝐴\𝑆2 𝑈𝑎). Furthermore, 𝜇′(⋂𝑎∈𝑆2 𝑈𝑎) = 𝜇′( ⋂ 𝑎∈𝑆2 𝐿𝑎 ⊎ (𝑈𝑎 \ 𝐿𝑎)) = 𝜇′(⋂𝑎∈𝑆2 𝐿𝑎) + 0 And ⋂ 𝑎∈𝑆2 𝐿𝑎 is non-empty only if 𝑆2 is a singleton set or empty set. Thus, 𝐿𝑎′ ∩ ⋂ 𝑎∈𝑆2 𝑈𝑎 \ ⋃ 𝑎∈𝐴\𝑆2 𝑈𝑎) ⊆ ⋂ 𝑎∈𝑆2 𝑈𝑎 has non-zero measure only if 𝑆2 is empty or a singleton set. – When 𝑆2 is empty, 𝐿𝑎′ ∩ ⋂ 𝑎∈𝑆2 𝑈𝑎 \ ⋃ 𝑎∈𝐴\𝑆2 𝑈𝑎 = 𝐿𝑎′ \ ⋃ 𝑎∈𝐴𝑈𝑎 ⊆ 𝐿𝑎′ \𝑈𝑎′ = ∅ 343 – When 𝑆2 = {𝑎′}, 𝐿𝑎′ ∩ ⋂ 𝑎∈𝑆2 𝑈𝑎 \ ⋃ 𝑎∈𝐴\𝑆2 𝑈𝑎 = 𝐿𝑎′ \ ⋃ 𝑎∈𝐴,𝑎≠𝑎′ 𝑈𝑎 . – When 𝑆2 = {𝑎′′} for some 𝑎′′ ≠ 𝑎′ 𝐿𝑎′ ∩ ⋂ 𝑎∈𝑆2 𝑈𝑎 \ ⋃ 𝑎∈𝐴\𝑆2 𝑈𝑎 = 𝐿𝑎′ ∩𝑈𝑎′′ \ ⋃ 𝑎∈𝐴,𝑎≠𝑎′′ 𝑈𝑎 = ∅ Thus, 𝜇′(𝑋1) =𝜇′ ( ⋃ 𝑆1∈𝐼1,𝑆2∈𝐼2 ⋂ 𝑎∈𝑆1 𝐿𝑎 ∩ ⋂ 𝑎∈𝑆2 𝑈𝑎 \ ( ⋃ 𝑎∈𝐴\𝑆1 𝐿𝑎 ∪ ⋃ 𝑎∈𝐴\𝑆2 𝑈𝑎)∩) =𝜇′ ( ⋃ {𝑎′}∈𝐼1,𝑆2∈𝐼2 (𝐿𝑎′ ∩ ⋂ 𝑎∈𝑆2 𝑈𝑎 \ ⋃ 𝑎∈𝐴\𝑆2 𝑈𝑎) ) =𝜇′ ( ⋃ {𝑎′}∈𝐼1∩𝐼2 𝐿𝑎′ ∩𝑈𝑎′ \ ⋃ 𝑎∈𝐴,𝑎≠𝑎′ 𝑈𝑎 ) =𝜇′ ( ⋃ {𝑎′}∈𝐼1∩𝐼2 (𝐿𝑎′ \ ⋃ 𝑎∈𝐴,𝑎≠𝑎′ 𝑈𝑎) ) =𝜇′ ( ⋃ {𝑎′}∈𝐼1∩𝐼2 (𝐿𝑎′ \ ⋃ 𝑎∈𝐴,𝑎≠𝑎′ (𝐿𝑎 ⋃(𝑈𝑎 \ 𝐿𝑎)))) =𝜇′ ( ⋃ {𝑎′}∈𝐼1∩𝐼2 (𝐿𝑎′ \ ⋃ 𝑎∈𝐴,𝑎≠𝑎′ (𝐿𝑎)) ) =𝜇′ ( ⋃ {𝑎′}∈𝐼1∩𝐼2 𝐿𝑎′ ) Denote ⋃ {𝑎′}∈𝐼1∩𝐼2 𝐿𝑎′ as 𝑋′1. And 𝑋1 \ 𝑋′1 and 𝑋′1 \ 𝑋1 both have measure 0. Similar results hold for 𝑋2 as well, and we can show that 𝜇′(𝑋2) =𝜇′ ( ⋃ {𝑏′}∈𝐼3∩𝐼4 𝐿𝑏′ ) Denote ⋃ {𝑏′}∈𝐼3∩𝐼4 𝐿𝑏′ as 𝑋′2. And 𝑋2 \ 𝑋′2 and 𝑋′2 \ 𝑋2 both have measure 0. 344 Thus, 𝜇′(𝑋1 ∩ 𝑋2) =𝜇′(𝑋1 ∩ 𝑋2 ∩ 𝑋′1) + 𝜇 ′((𝑋1 ∩ 𝑋2) \ 𝑋′1) =𝜇′(𝑋1 ∩ 𝑋2 ∩ 𝑋′1) + 0 =𝜇′(𝑋1 ∩ 𝑋2 ∩ 𝑋′1 ∩ 𝑋 ′ 2) + 𝜇 ′((𝑋1 ∩ 𝑋2 ∩ 𝑋′1) \ 𝑋 ′ 2) + 0 =𝜇′(𝑋1 ∩ 𝑋2 ∩ 𝑋′1 ∩ 𝑋 ′ 2) + 0 + 0 =𝜇′(𝑋1 ∩ 𝑋2 ∩ 𝑋′1 ∩ 𝑋 ′ 2) + 𝜇 ′((𝑋2 ∩ 𝑋′1 ∩ 𝑋 ′ 2) \ 𝑋1) =𝜇′(𝑋2 ∩ 𝑋′1 ∩ 𝑋 ′ 2) =𝜇′(𝑋2 ∩ 𝑋′1 ∩ 𝑋 ′ 2) + 𝜇 ′((𝑋′1 ∩ 𝑋 ′ 2) \ 𝑋2) =𝜇′(𝑋′1 ∩ 𝑋 ′ 2) =𝜇′ ( (⋃{𝑎′}∈𝐼1∩𝐼2 𝐿𝑎′) ∩ (⋃{𝑏′}∈𝐼3∩𝐼4 𝐿𝑏′)) =𝜇′ (⋃ {𝑎′}∈𝐼1∩𝐼2,{𝑏′}∈𝐼3∩𝐼4 𝐿𝑎′,𝑏′ ) = ∑︁ {𝑎′}∈𝐼1∩𝐼2 {𝑏′}∈𝐼3∩𝐼4 𝜇′(𝐿𝑎′,𝑏′) Next we show that 𝜇′(𝑖) (𝐿𝑎,𝑏) = 𝜇′(𝑖) (𝑋1) · 𝜇′(𝑖) (𝑋2). Note that 𝜇′(𝐿𝑎) =∑ 𝑏 𝜇 ′(𝐿𝑎,𝑏) = 𝜇′(𝐸−1 1 (𝑎)), and 𝜇′(𝐿𝑏) = ∑ 𝑎 𝜇 ′(𝐿𝑎,𝑏) = 𝜇′(𝐸−1 2 (𝑏)). 
And 𝜇1 ⊗ 𝜇2 = 𝜇′(𝑖) ◦ (𝐸1, 𝐸2)−1 implies that 𝜇′(𝑖) (𝐿𝑎,𝑏) = 𝜇1 ⊗ 𝜇2(𝑎, 𝑏) = 𝜇1(𝑎) · 𝜇2(𝑏) 345 Then 𝜇1(𝑎) = 𝜇1(𝑎) · ∑︁ 𝑏∈𝐴 𝜇2(𝑏) = ∑︁ 𝑏∈𝐴 𝜇1(𝑎) · 𝜇2(𝑏) = ∑︁ 𝑏∈𝐴 𝜇′(𝑖) (𝐿𝑎,𝑏) = 𝜇′(𝑖) (∑︁ 𝑏∈𝐴 𝐿𝑎,𝑏 ) = 𝜇′(𝑖) (𝐿𝑎), and similarly, 𝜇2(𝑏) = (∑︁ 𝑎∈𝐴 𝜇1(𝑎) ) · 𝜇2(𝑏) = ∑︁ 𝑎∈𝐴 (𝜇1(𝑎) · 𝜇2(𝑏)) = ∑︁ 𝑎∈𝐴 𝜇′(𝑖) (𝐿𝑎,𝑏) = 𝜇′(𝑖) (∑︁ 𝑎∈𝐴 𝐿𝑎,𝑏 ) = 𝜇′(𝑖) (𝐿𝑏). Thus, 𝜇′(𝑖) (𝐿𝑎,𝑏) = 𝜇1(𝑎) · 𝜇2(𝑏) = 𝜇′(𝑖) (𝐿𝑎) · 𝜇′(𝑖) (𝐿𝑏) 346 Therefore, 𝜇′(𝑋1 ∩ 𝑋2) = ∑︁ {𝑎′}∈𝐼1∩𝐼2 {𝑏′}∈𝐼3∩𝐼4 𝜇′(𝐿𝑎′,𝑏′) = ∑︁ {𝑎′}∈𝐼1∩𝐼2 {𝑏′}∈𝐼3∩𝐼4 𝜇′(𝐿𝑎′) · 𝜇′(𝐿𝑏′) = ∑︁ {𝑎′}∈𝐼1∩𝐼2 𝜇′(𝐿𝑎′) · ∑︁ {𝑏′}∈𝐼3∩𝐼4 𝜇′(𝐿𝑏′) =𝜇′(𝑋1) · 𝜇′(𝑋2) =𝜇′1(𝑋1) · 𝜇′2(𝑋2) Thus we have (F1, 𝜇 ′ 1) ⊛ (F2, 𝜇 ′ 2) ⊑ (F ′, 𝜇′). Let 𝑝1 = 𝑝2 = λ𝑥. 𝑝′(𝑥)/2. Next we show that 𝐸1 $∼ 𝜇1(F1, 𝜇 ′ 1, 𝑝1) and 𝐸2 $∼ 𝜇2(F2, 𝜇 ′ 2, 𝑝2). By definition, 𝐸1 $∼ 𝜇1(F1, 𝜇 ′ 1, 𝑝1) is equivalent to ∃F ′′, 𝜇′′. (Own(F ′′, 𝜇′′)) (F1, 𝜇 ′ 1, 𝑝1) ∗ 𝐸1 � (F ′′(𝑖), 𝜇′′(𝑖)) ∧ 𝜇1 = 𝜇′′(𝑖) ◦ 𝐸−1 1 , which is equivalent to ∃F ′′, 𝜇′′. (F ′′, 𝜇′′) ⪯ (F1, 𝜇 ′ 1) ∗ ( ∀𝑎 ∈ 𝐴.∃𝑆𝑎, 𝑇𝑎 ∈ F ′′(𝑖). 𝑆𝑎 ⊆ 𝐸−1 1 (𝑎) ⊆ 𝑇𝑎 ∧ 𝜇 ′′(𝑖) (𝑆𝑎) = 𝜇′′(𝑖) (𝑆𝑎) ∧ 𝜇1(𝑎) = 𝜇′′(𝑖) (𝑆𝑎) = 𝜇′′(𝑖) (𝑇𝑎) ) We can pick the existential witness to be F1, 𝜇 ′ 1. For any 𝑎 ∈ 𝐴, 𝐸−1 1 (𝑎) =⋃ 𝑏∈𝐴 (𝐸1, 𝐸2)−1(𝑎, 𝑏). Because we have 𝐿𝑎,𝑏 ⊆ (𝐸1, 𝐸2)−1(𝑎, 𝑏) ⊆ 𝑈𝑎,𝑏, then ⋃ 𝑏∈𝐴 𝐿𝑎,𝑏 ⊆ 𝐸−1 1 (𝑎) = ⋃ 𝑏∈𝐴 (𝐸1, 𝐸2)−1(𝑎, 𝑏) ⊆ ⋃ 𝑏∈𝐴𝑈𝑎,𝑏 . By definition, for each 𝑎, ⋃ 𝑏∈𝐴 𝐿𝑎,𝑏 ∈ F1(𝑖) and ⋃ 𝑏∈𝐴𝑈𝑎,𝑏 ∈ F1(𝑖), and we also 347 have 𝜇′1(𝑖) ( ⋃ 𝑏∈𝐴 𝐿𝑎,𝑏) = ∑︁ 𝑏∈𝐴 𝜇′1(𝑖) (𝐿𝑎,𝑏) = ∑︁ 𝑏∈𝐴 𝜇′1(𝑖) (𝑈𝑎,𝑏) = 𝜇′1(𝑖) (⋃ 𝑏∈𝐴𝑈𝑎,𝑏 ) = 𝜇1(𝑎) Thus, 𝑆𝑎 = ⋃ 𝑏∈𝐴 𝐿𝑎,𝑏 and 𝑇𝑎 = ⋃ 𝑏∈𝐴𝑈𝑎,𝑏 witnesses the conditions needed for 𝐸1 $∼ 𝜇1(F1, 𝜇 ′ 1, 𝑝1). And similarly, we have 𝐸2 $∼ 𝜇2(F2, 𝜇 ′ 2, 𝑝2). □ Soundness of Conditioning Rules Lemma D.5.4. C-TRUE is sound. Proof. Let 𝜀 = (F𝜀, 𝜇𝜀, 𝑝𝜀) ∈ M𝐼 be the unit ofM𝐼 and 𝜅 = λ𝑣. 𝜇𝜀. Then, True ⊢ Own(F𝜀, 𝜇𝜀) ⊢ Own(F𝜀, 𝜇𝜀) ∗ ⌜∀𝑖 ∈ 𝐼 . 𝜇𝜀 (𝑖) = bind(𝜇, 𝜅(𝑖))⌝ ⊢ Own(F𝜀, 𝜇𝜀) ∗ ⌜∀𝑖 ∈ 𝐼 . 𝜇𝜀 (𝑖) = bind(𝜇, 𝜅(𝑖))⌝ ∗ True ⊢ ∃F𝜀, 𝜇𝜀, 𝜅.Own(F𝜀, 𝜇𝜀) ∗ ⌜∀𝑖 ∈ 𝐼 . 
𝜇𝜀 (𝑖) = bind(𝜇, 𝜅(𝑖))⌝ ∗ (∀𝑣 ∈ supp(𝜇).Own(F𝜀, 𝜅(𝐼) (𝑣), 𝑝𝜀) −∗ True) ⊢ C𝜇 .True □ Lemma D.5.5. C-FALSE is sound. Proof. Assume 𝑎 ∈ M𝐼 is such that V(𝑎) and that it satisfies C𝜇 𝑣. False. By 348 definition, this means that, for some F0, 𝜇0, 𝑝0, and 𝜅0: (F0, 𝜇0, 𝑝0) ⪯ 𝑎 (D.4) ∀𝑖 ∈ 𝐼 . 𝜇0(𝑖) = bind(𝜇, 𝜅0(𝑖)) (D.5) ∀𝑣 ∈ supp(𝜇). False(F0, 𝜅0(𝐼) (𝑣), 𝑝0) (D.6) Let 𝑣0 ∈ supp(𝜇)—we know one exists because 𝜇 is a (discrete) probability distri- bution. Then by (D.6) on 𝑣0 we get False(F0, 𝜅0(𝐼) (𝑣0), 𝑝0) holds. Since False( ) is by definition false, we get False(𝑎) holds ex falso. □ Lemma D.5.6. C-CONS is sound. Proof. Assume 𝑎 ∈ M𝐼 is such that V(𝑎) and that it satisfies C𝜇 𝑣. 𝐾 (𝑣). By defi- nition, this means that, for some F0, 𝜇0, 𝑝0, and 𝜅0: (F0, 𝜇0, 𝑝0) ⪯ 𝑎 (D.7) ∀𝑖 ∈ 𝐼 . 𝜇0(𝑖) = bind(𝜇, 𝜅0(𝑖)) (D.8) ∀𝑣 ∈ supp(𝜇). 𝐾 (𝑣) (F0, 𝜅0(𝐼) (𝑣), 𝑝0) (D.9) Then by the premise ∀𝑣. 𝐾 (𝑣) ⊢ 𝐾′(𝑣) and (D.9) we obtain ∀𝑣 ∈ supp(𝜇). 𝐾′(𝑣) (F0, 𝜅0(𝐼) (𝑣), 𝑝0) (D.10) By (D.7), (D.8), and (D.10) we get C𝜇 𝑣. 𝐾′(𝑣) as desired. □ Lemma D.5.7. C-FRAME is sound. Proof. Assume 𝑎 ∈ M𝐼 is such that V(𝑎) and that it satisfies 𝑃 ∗ C𝜇 𝑣. 𝐾 (𝑣). By definition, this means that there exist some (F1, 𝜇1, 𝑝1), (F2, 𝜇2, 𝑝2), and 𝜅 such 349 that (F1, 𝜇1, 𝑝1) · (F2, 𝜇2, 𝑝2) ⪯ 𝑎 (D.11) 𝑃(F1, 𝜇1, 𝑝1) (D.12) ∀𝑖 ∈ 𝐼 .𝜇2(𝑖) = bind(𝜇, 𝜅(𝑖)) (D.13) ∀𝑣 ∈ supp(𝜇). 𝐾 (𝑣) (F2, 𝜅(𝐼) (𝑣), 𝑝2) (D.14) Now let: (F ′, 𝜇′, 𝑝′) = (F1(𝑖), 𝜇1(𝑖)) ⊛ (F2(𝑖), 𝜇2(𝑖)) 𝜅′(𝑖) = λ𝑣. 𝜇1(𝑖) ⊛ 𝜅(𝑖) (𝑣) By lemma D.2.7, for each 𝑖 ∈ 𝐼: (F ′, 𝜇′, 𝑝′) = (F1(𝑖), 𝜇1(𝑖)) ⊛ (F2(𝑖), 𝜇2(𝑖)) = (F1(𝑖) ⊕ F2(𝑖), bind(𝜇, λ𝑣. 𝜇1(𝑖) ⊛ 𝜅(𝑖) (𝑣))) (By lemma D.2.7) = (F1(𝑖) ⊕ F2(𝑖), bind(𝜇, 𝜅′(𝑖))) Notice that 𝜅′(𝐼) (𝑣) = 𝜇1 ⊛ 𝜅(𝐼) (𝑣). Thus we obtain: (F ′, 𝜇′, 𝑝′) ⪯ 𝑎 (D.15) ∀𝑖 ∈ 𝐼 .𝜇′(𝑖) = bind(𝜇, 𝜅′(𝑖)) (D.16) and for all 𝑣 ∈ supp(𝜇), (F1, 𝜇1, 𝑝1) ⊛ (F2, 𝜅(𝐼) (𝑣), 𝑝2) = (F ′, 𝜇1 ⊛ 𝜅(𝐼) (𝑣), 𝑝′) ⪯ (F ′, 𝜅′(𝐼) (𝑣), 𝑝′) (D.17) 𝑃(F1, 𝜇1, 𝑝1) (D.18) 𝐾 (𝑣) (F2, 𝜅(𝐼) (𝑣), 𝑝2) (D.19) which gives us that 𝑎 satisfies C𝜇 𝑣. (𝑃 ∗ 𝐾 (𝑣)) as desired. 
□ Lemma D.5.8. C-UNIT-L is sound. Proof. Straightforward. □ 350 Lemma D.5.9. C-UNIT-R is sound. Proof. We prove the two directions separately. Forward direction 𝐸 $∼ 𝜇 ⊢ C𝜇 𝑣. ⌈𝐸 = 𝑣⌉ By unfolding the assumption 𝐸 $∼ 𝜇 we get that there exist F , 𝜇 such that: Own(F , 𝜇) ∗ ⌜𝐸 � (F (𝑖), 𝜇(𝑖))⌝ ∗ ⌜𝜇 = 𝜇(𝑖) ◦ 𝐸−1⌝ holds. Let 𝜅 ≜ λ 𝑗 .  λ𝑣. 𝜇( 𝑗) if 𝑗 ≠ 𝑖 λ𝑣. 𝛾𝑣 if 𝑗 = 𝑖 𝛾𝑣 ≜ λ𝑋 :F (𝑖). 𝜇(𝑖) (𝑋 ∩ (𝐸 = 𝑣)−1) 𝜇(𝑖) ((𝐸 = 𝑣)−1) That is, 𝜅( 𝑗) maps every 𝑣 to 𝜇( 𝑗) when 𝑖 ≠ 𝑗 , while when 𝑖 = 𝑗 it maps 𝑣 to the distribution 𝜇(𝑖) conditioned on 𝐸 = 𝑣. Note that 𝜅 is well defined because 1. although the events 𝑋 ∩ (𝐸 = 𝑣)−1 and (𝐸 = 𝑣)−1 might not belong to F (𝑖), their probability is uniquely determined by almost measurabil- ity of 𝐸 ; 2. we are only interested in the cases where 𝑣 ∈ supp(𝜇), which implies that the denominator is not zero: 𝜇(𝑖) ((𝐸 = 𝑣)−1) = 𝜇(𝑣) > 0. By construction we obtain that ∀ 𝑗 ∈ 𝐼 . 𝜇( 𝑗) = bind(𝜇, 𝜅( 𝑗)) (D.20) ∀𝑣 ∈ supp(𝜇). 𝜅(𝑖) (𝑣) ((𝐸 = 𝑣)−1) = 1 (D.21) From (D.21) we get that ⌈𝐸 = 𝑣⌉ holds on (F (𝑖), 𝜅(𝑖) (𝑣), 𝑝(𝑖)), from which it follows that: Own(F , 𝜅(𝐼) (𝑣), 𝑝) −∗ ⌈𝐸 = 𝑣⌉ 351 Therefore we obtain ∃F , 𝜇, 𝜅, 𝑝.Own(F , 𝜇, 𝑝) ∗ ⌜∀ 𝑗 ∈ 𝐼 .𝜇( 𝑗) = bind(𝜇, 𝜅( 𝑗))⌝ ∗ (∀𝑣 ∈ 𝐴𝜇 .Own(F , 𝜅(𝐼) (𝑣), 𝑝) −∗ ⌈𝐸 = 𝑣⌉) which gives us C𝜇 𝑣. ⌈𝐸 = 𝑣⌉ by proposition D.4.1. Backward direction C𝜇 𝑣. ⌈𝐸 = 𝑣⌉ ⊢ 𝐸 $∼ 𝜇 First note that ⌈𝐸 = 𝑣⌉ (F , 𝜅(𝑣), 𝑝) ⇔ ( ((𝐸 = 𝑣) ∈ true) $∼ 𝛿True ) (F , 𝜅(𝐼) (𝑣), 𝑝) ⇔ ((𝐸 = 𝑣) ∈ true) � (F (𝑖), 𝜅(𝑖) (𝑣)) ∧ 𝛿True = 𝜅(𝑖) (𝑣) ◦ ((𝐸 = 𝑣) ∈ true)−1 ⇔ ((𝐸 = 𝑣) ∈ true) � (F (𝑖), 𝜅(𝑖) (𝑣)) ∧ 𝛿𝑣 = 𝜅(𝑖) (𝑣) ◦ 𝐸−1 for some 𝜅. This implies ⌜𝐸 � F (𝑖), 𝜅(𝑖) (𝑣)⌝. Then, for any value 𝑣 ∈ supp(𝜇), 𝜇(𝑖) ◦ 𝐸−1(𝑣) = (bind(𝜇, 𝜅(𝑖)) ◦ 𝐸−1) (𝑣) = bind(𝜇, 𝜅(𝑖)) (𝐸−1(𝑣)) = ∑︁ 𝑣′∈supp(𝜇) 𝜇(𝑣′) · 𝜅(𝑖) (𝑣′) (𝐸−1(𝑣)) = ∑︁ 𝑣′∈supp(𝜇) 𝜇(𝑣′) · (𝜅(𝑖) (𝑣′) ◦ 𝐸−1) (𝑣) = ∑︁ 𝑣′∈supp(𝜇) 𝜇(𝑣′) · 𝛿𝑣′ (𝑣) = 𝜇(𝑣) This implies the pure facts that 𝐸 � (F (𝑖), 𝜇(𝑖)) and 𝜇 = 𝜇(𝑖) ◦ 𝐸−1. There- 352 fore: C𝜇 𝑣. 
⌈𝐸 = 𝑣⌉ ⊢ ∃F , 𝜇, 𝜅, 𝑝.Own(F , 𝜇, 𝑝) ∗ ⌜∀ 𝑗 ∈ 𝐼 .𝜇( 𝑗) = bind(𝜇, 𝜅( 𝑗))⌝ ∗ (∀𝑣 ∈ 𝐴𝜇 .Own(F , 𝜅(𝐼) (𝑣), 𝑝) −∗ ⌈𝐸 = 𝑣⌉) ⊢ ∃F , 𝜇.Own(F , 𝜇) ∗ ⌜𝐸 � (F (𝑖), 𝜇(𝑖))⌝ ∗ ⌜𝜇 = 𝜇(𝑖) ◦ 𝐸−1⌝ ⊢ 𝐸 $∼ 𝜇 □ Lemma D.5.10. C-ASSOC is sound. Proof. Define 𝜅′ = λ𝑣. bind(𝜅(𝑣), λ𝑤. return(𝑣, 𝑤)). We start by rewriting the as- sumption C𝜇 𝑣. C𝜅(𝑣) 𝑤. 𝐾 (𝑣, 𝑤) so that 𝑘′ is used and 𝐾 depends only on the binding of the innermost modality: C𝜇 𝑣. C𝜅(𝑣) 𝑤. 𝐾 (𝑣, 𝑤) ⊢ C𝜇 𝑣. C𝜅′ (𝑣) (𝑣′, 𝑤). 𝐾 (𝑣, 𝑤) (C-TRANSF, C-CONS) ⊢ C𝜇 𝑣. C𝜅′ (𝑣) (𝑣′, 𝑤). 𝐾 (𝑣′, 𝑤) (C-PURE, C-CONS) C-TRANSF is applied to the innermost modality by using the bijection 𝑓𝑣 (𝑤) = (𝑣, 𝑤). Then, since (𝑣′, 𝑤) ∈ supp(𝑘′(𝑣)) ⇒ 𝑣 = 𝑣′, we can replace 𝑣′ for 𝑣 in 𝐾 . Our goal is now to prove: C𝜇 𝑣. C𝜅′ (𝑣) (𝑣′, 𝑤). 𝐾 (𝑣′, 𝑤) ⊢ Cbind(𝜇,𝜅′) (𝑣′, 𝑤). 𝐾 (𝑣′, 𝑤) Let 𝑎 ∈ M𝐼 be such that V(𝑎) and that it satisfies C𝜇 𝑣. C𝜅′ (𝑣) (𝑣′, 𝑤). 𝐾 (𝑣′, 𝑤). From this assumption we know that, for some F0, 𝜇0, 𝑝0, and 𝜅0: (F0, 𝜇0, 𝑝0) ⪯ 𝑎 (D.22) ∀𝑖 ∈ 𝐼 . 𝜇0(𝑖) = bind(𝜇, 𝜅0(𝑖)) (D.23) 353 such that ∀𝑣 ∈ supp(𝜇), there are some F 𝑣1 , 𝜇 𝑣 1, 𝑝 𝑣 1, and 𝜅𝑣1 satisfying: (F 𝑣1 , 𝜇 𝑣 1, 𝑝 𝑣 1) ⪯ (F0, 𝜅0(𝐼) (𝑣), 𝑝0) (D.24) ∀𝑖 ∈ 𝐼 . 𝜇𝑣1(𝑖) = bind(𝜅′(𝑣), 𝜅𝑣1 (𝑖)) (D.25) ∀(𝑣′, 𝑤) ∈ supp(𝜅′(𝑣)). 𝐾 (𝑣′, 𝑤) (F 𝑣1 , 𝜅 𝑣 1 (𝐼) (𝑣 ′, 𝑤), 𝑝𝑣1) (D.26) Our goal is to prove Cbind(𝜇,𝜅′) (𝑣′, 𝑤). 𝐾 (𝑣′, 𝑤) holds on 𝑎. To this end, we want to show that there exists 𝜅′2 such that: ∀𝑖 ∈ 𝐼 . 𝜇0(𝑖) = bind(bind(𝜇, 𝜅′), 𝜅′2(𝑖)) (D.27) ∀(𝑣′, 𝑤) ∈ supp(bind(𝜇, 𝜅′)). 𝐾 (𝑣′, 𝑤) (F0, 𝜅 ′ 2(𝐼) (𝑣 ′), 𝑝0) (D.28) Now let 𝜅2(𝑖) = λ(𝑣′, 𝑤). 𝜅𝑣′1 (𝑖) (𝑣 ′, 𝑤). which by construction and eq. (D.25) gives us 𝜇𝑣1(𝑖) = bind(𝜅′(𝑣), 𝜅𝑣1 (𝑖)) = bind(𝜅′(𝑣), 𝜅2(𝑖)) Therefore, by eq. (D.24), we can apply lemma D.2.4 and obtain that there exists a 𝜅′2 such that 𝜅0(𝑖) (𝑣) = bind(𝜅′(𝑣), 𝜅′2(𝑖)) (D.29)( F0, 𝜅 ′ 2(𝑖) (𝑣 ′, 𝑤) ) ⊒ ( F 𝑣′1 , 𝜅2(𝑖) (𝑣′, 𝑤) ) = ( F 𝑣′1 , 𝜅𝑣 ′ 1 (𝑖) (𝑣 ′, 𝑤) ) (D.30) By eqs. (D.23) and (D.29) we have: 𝜇0(𝑖) = bind(𝜇, 𝜅0(𝑖)) = bind(𝜇, λ𝑣. 
bind(𝜅′(𝑣), 𝜅′2(𝑖))) By associativity of bind = bind(bind(𝜇, 𝜅′), 𝜅′2(𝑖)) 354 which proves eq. (D.27). Finally, to prove eq. (D.28), we can observe that (𝑣′, 𝑤) ∈ supp(bind(𝜇, 𝜅′)) implies 𝑣′ ∈ supp(𝜇); therefore, by (D.26), upward closure of 𝐾 (𝑣′, 𝑤), and (D.30) and (D.24), we can conclude 𝐾 (𝑣′, 𝑤) holds on (F0, 𝜅 ′ 2(𝐼) (𝑣 ′), 𝑝0), as desired. □ Lemma D.5.11. C-UNASSOC is sound. Proof. Assume 𝑎 ∈ M𝐼 is such that V(𝑎) and that it satisfies Cbind(𝜇,𝜅) 𝑤. 𝐾 (𝑤). By definition, this means that, for some F0, 𝜇0, 𝑝0, and 𝜅0: (F0, 𝜇0, 𝑝0) ⪯ 𝑎 (D.31) ∀𝑖 ∈ 𝐼 . 𝜇0(𝑖) = bind(bind(𝜇, 𝜅), 𝜅0(𝑖)) (D.32) ∀𝑤 ∈ supp(bind(𝜇, 𝜅)). 𝐾 (𝑤) (F0, 𝜅0(𝐼) (𝑤), 𝑝0) (D.33) Our goal is to show that 𝑎 satisfies C𝜇 𝑣. C𝜅(𝑣) 𝑤. 𝐾 (𝑤), for which it would suffice to show that there is a 𝜅1 such that: ∀𝑖 ∈ 𝐼 . 𝜇0(𝑖) = bind(𝜇, 𝜅1(𝑖)) (D.34) and for all 𝑣 ∈ supp(𝜇) there is a 𝜅𝑣2 with ∀𝑖 ∈ 𝐼 . 𝜅1(𝑖) (𝑣) = bind(𝜅(𝑣), 𝜅𝑣2 (𝑖)) (D.35) ∀𝑤 ∈ supp(𝜅(𝑣)). 𝐾 (𝑤) (F0, 𝜅 𝑣 2 (𝐼) (𝑤), 𝑝0) (D.36) To prove this we let 𝜅1(𝑖) = λ𝑣. bind(𝜅(𝑣), 𝜅0(𝑖)) 𝜅𝑣2 (𝑖) = 𝜅0(𝑖) By the associativity of bind we have 𝜇0(𝑖) = bind(bind(𝜇, 𝜅), 𝜅0(𝑖)) = bind(𝜇, λ𝑣. bind(𝜅(𝑣), 𝜅0(𝑖))) = bind(𝜇, 𝜅1(𝑖)) 355 which proves (D.34). By construction, 𝜅1(𝑖) (𝑣) = bind(𝜅(𝑣), 𝜅0(𝑖)) = bind(𝜅(𝑣), 𝜅𝑣2 (𝑖)) proving (D.35). Finally, 𝑣 ∈ supp(𝜇) and 𝑤 ∈ supp(𝜅(𝑣)) imply 𝑤 ∈ supp(bind(𝜇, 𝜅)), so by (D.33) we proved (D.36), concluding the proof. □ Lemma D.5.12. C-SKOLEM is sound. Proof. For any resource 𝑟 = (F , 𝜇, 𝑝), ( C𝜇 𝑣. ∃𝑥 : Var. 𝑄(𝑣, 𝑥) ) (F , 𝜇, 𝑝) ⇔ ∃𝜅.∀𝑖 ∈ 𝐼 .𝜇(𝑖) = bind(𝜇, 𝜅(𝑖)) ∧ ∀𝑣 ∈ supp(𝜇). (∃𝑥 : 𝑋. 𝑄(𝑣, 𝑥)) (F , 𝜅(𝐼) (𝑣), 𝑝) For all 𝑣 ∈ supp(𝜇), ∃𝑥 : 𝑋. 𝑄(𝑣, 𝑥) holds on (F , 𝜅(𝐼) (𝑣), 𝑝). Thus, 𝑄(𝑣, 𝑥𝑣) (F , 𝜅(𝐼) (𝑣), 𝑝) holds for some 𝑥𝑣. Then define 𝑓 : 𝐴 → Var by letting 𝑓 (𝑣) = 𝑥𝑣 for 𝑣 ∈ supp(𝜇). Then, ∃𝜅.∀𝑖 ∈ 𝐼 .𝜇(𝑖) = bind(𝜇, 𝜅(𝑖)) ∧ ∀𝑣 ∈ supp(𝜇). 𝑄(𝑣, 𝑓 (𝑣)) (F , 𝜅(𝐼) (𝑣), 𝑝) And therefore F , 𝜇, 𝑝 satisfies ∃ 𝑓 : 𝐴→ Var. C𝜇 𝑣. 𝑄(𝑣, 𝑥). □ Lemma D.5.13. C-TRANSF is sound. Proof. For any resource 𝑎 = (F , 𝜇, 𝑝), if ( C𝜇 𝑣. 
𝐾 (𝑣) ) ((F , 𝜇, 𝑝)), then ∃𝜅. (F , 𝜇, 𝑝) ⪯ 𝑎 ∧ ∀𝑖 ∈ 𝐼 . 𝜇(𝑖) = bind(𝜇, 𝜅(𝑖)) ∧ ∀𝑣 ∈ supp(𝜇).(𝐾 (𝑣)) ((F , 𝜅(𝐼) (𝑣), 𝑝)) 356 𝜇 = bind(𝜇, 𝜅) says that for any 𝐸 ∈ F , 𝜇(𝐸) = ∑︁ 𝑣∈supp(𝜇) 𝜇(𝑣) · 𝜅(𝐼) (𝑣) (𝐸) = ∑︁ 𝑣 | 𝑓 (𝑣)∈supp(𝜇) 𝜇( 𝑓 (𝑣)) · 𝜅(𝐼) ( 𝑓 (𝑣)) (𝐸) (Because 𝑓 is bijective) = ∑︁ 𝑣∈supp(𝜇′) 𝜇′(𝑣) · 𝜅(𝐼) ( 𝑓 (𝑣)) (𝐸) (Because 𝜇′(𝑣) = 𝜇( 𝑓 (𝑣))) = bind(𝜇′, λ𝑣. 𝜅(𝐼) ( 𝑓 (𝑣))) (𝐸) Thus, 𝜇 = bind(𝜇′, λ𝑣. 𝜅(𝐼) ( 𝑓 (𝑣))). Furthermore, (𝐾 ( 𝑓 (𝑣))) ((F , 𝜅(𝐼) ( 𝑓 (𝑣)), 𝑝)). Thus, if we denote λ𝑣. 𝜅(𝐼) ( 𝑓 (𝑣)) as 𝜅′, it satisfies (F , 𝜇, 𝑝) ⪯ 𝑎 ∧ ∀𝑖 ∈ 𝐼 . 𝜇(𝑖) = bind(𝜇′, 𝜅′(𝑖)) ∧ ∀𝑣 ∈ supp(𝜇).(𝐾 (𝑣)) ((F , 𝜅′(𝐼) (𝑣), 𝑝)) Thus, ( C′𝜇 𝑣. 𝐾 ( 𝑓 (𝑣)) ) ((F , 𝜇, 𝑝)). □ Lemma D.5.14. SURE-STR-CONVEX is sound. Proof. Assume 𝑎 ∈ M𝐼 is a valid resource that satisfies C𝜇 𝑣.(𝐾 (𝑣) ∗ ⌈𝐸⌉). Then, by definition, we know that, for some (F0, 𝜇0, 𝑝0) and 𝜅0: (F0, 𝜇0, 𝑝0) ⪯ 𝑎 (D.37) ∀𝑖 ∈ 𝐼 . 𝜇0(𝑖) = bind(𝜇, 𝜅0(𝑖)) (D.38) and, for all 𝑣 ∈ supp(𝜇), there are (F 𝑣1 , 𝜇 𝑣 1, 𝑝 𝑣 1), (F 𝑣 2 , 𝜇 𝑣 2, 𝑝 𝑣 2) such that (F 𝑣1 , 𝜇 𝑣 1, 𝑝 𝑣 1) · (F 𝑣 2 , 𝜇 𝑣 2, 𝑝 𝑣 2) ⪯ (F0, 𝜅0(𝐼) (𝑣), 𝑝0) (D.39) 𝐾 (𝑣) (F 𝑣1 , 𝜇 𝑣 1, 𝑝 𝑣 1) (D.40) ⌈𝐸⌉ (F 𝑣2 , 𝜇 𝑣 2, 𝑝 𝑣 2) (D.41) 357 From (D.41) we know that for all 𝑣 ∈ supp(𝜇) there are 𝐿𝑣1, 𝐿 𝑣 0,𝑈 𝑣 1 ,𝑈 𝑣 0 ∈ F 𝑣 2 (𝑖) such that: 𝐿𝑣0 ⊆ 𝐸 −1(False) ⊆ 𝑈𝑣 0 𝜇𝑣2(𝐿 𝑣 0) = 𝜇 𝑣 2(𝑈 𝑣 0) = 0 𝐿𝑣1 ⊆ 𝐸 −1(True) ⊆ 𝑈𝑣 1 𝜇𝑣2(𝐿 𝑣 1) = 𝜇 𝑣 2(𝑈 𝑣 1) = 1 Without loss of generality, all 𝐿𝑣0, 𝐿 𝑣 1,𝑈 𝑣 0 ,𝑈 𝑣 1 can be assumed to be only non-trivial on FV(𝐸). Consequently, we can also assume that 𝑝𝑣2(𝑥) < 1 for every 𝑥, and in addition 𝑝𝑣2(𝑥) > 0 if and only if 𝑥 ∈ FV 𝐸 and 𝑗 = 𝑖. From these components we can construct a new resource: F3( 𝑗) ≜  𝜎 ( {⋂𝑣∈supp(𝜇) 𝐿 𝑣 1, ⋃ 𝑣∈supp(𝜇)𝑈 𝑣 1} ) if 𝑗 = 𝑖 {Mem[Var], ∅} if 𝑗 ≠ 𝑖 𝜇3 ≜ 𝜇0 |F3 𝑝3 ≜ λ𝑥.  min { 𝑝𝑣2(𝑥) |𝑣 ∈ supp(𝜇) } if 𝑗 = 𝑖 ∧ 𝑥 ∈ FV(𝐸) 0 otherwise By construction we obtain that ∀ 𝑗 ∈ 𝐼 . F3( 𝑗) ⊆ F0( 𝑗), and that V(F3, 𝜇3, 𝑝3). Now letting 𝑝′1 = 𝑝0 − 𝑝3, we obtain a valid resource (F0, 𝜇0, 𝑝 ′ 1). 
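The construction of (F3, 𝜇3, 𝑝3) works because 𝜇3 assigns every generating event probability 0 or 1, and such events are independent of every other event. A minimal numeric check of that fact (the four-point sample space and names here are hypothetical, purely for illustration):

```python
from itertools import chain, combinations

# Hypothetical four-point probability space; A has probability 1.
Omega = ["a", "b", "c", "d"]
mu = {"a": 0.5, "b": 0.5, "c": 0.0, "d": 0.0}
A = {"a", "b"}  # mu(A) = 1

def prob(X):
    return sum(mu[w] for w in X)

def powerset(xs):
    return chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))

# A sure event factors against every event: mu(A ∩ B) = mu(A) * mu(B),
# and its (null) complement contributes probability 0 to every intersection.
for B in map(set, powerset(Omega)):
    assert abs(prob(A & B) - prob(A) * prob(B)) < 1e-9
    assert abs(prob((set(Omega) - A) & B)) < 1e-9
```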
Moreover, we have F0 = F0⊕F3 and ∀ 𝑗 ∈ 𝐼 .∀𝑋 ∈ F3( 𝑗). 𝜇3(𝑋) ∈ {0, 1}, which means that for any 𝑋 ∈ F3 and 𝑌 ∈ F0, 𝜇3(𝑋) · 𝜇0(𝑌 ) = 𝜇0(𝑋∩𝑌 ). Then, by (D.38): (F0, bind(𝜇, 𝜅0), 𝑝′1) ⊛ (F3, 𝜇3, 𝑝3) ⪯ (F0, 𝜇0, 𝑝0) = 𝑎 To close the proof it would then suffice to show that C𝜇 𝑣.𝐾 (𝑣) holds on (F0, bind(𝜇, 𝜅0), 𝑝′1) and that ⌈𝐸⌉ holds on (F3( 𝑗), 𝜇3, 𝑝3). The latter is obvious. The former follows from the fact that 𝜅0( 𝑗) (𝑣) |F 𝑣1 = 𝜇𝑣1( 𝑗); by upward-closure and (D.40) this means that, for all 𝑣 ∈ supp(𝜇): 𝐾 (𝑣) (F 𝑣1 , 𝜇 𝑣 1, 𝑝 𝑣 1) ⇒ 𝐾 (𝑣) (F0, 𝜅0(𝐼) (𝑣), 𝑝′1) 358 which proves our claim. □ Lemma D.5.15. C-FOR-ALL is sound. Proof. By unfolding the definitions, C𝜇 𝑣.∀𝑥 : 𝑋.𝑄(𝑣) ⇔ ∃F , 𝜇0, 𝜅.Own((F , 𝜇0)) ∗ ⌜∀𝑖 ∈ 𝐼 . 𝜇0(𝑖) = bind(𝜇, 𝜅(𝑖))⌝ ∗ (∀𝑎 ∈ 𝐴𝜇 .Own((F , [𝑖: 𝜅(𝑖) (𝑎) | 𝑖 ∈ 𝐼])) −∗ ∀𝑥 : 𝑋.𝑄(𝑣)) ⇒ ∀𝑥 : 𝑋.∃F , 𝜇0, 𝜅.Own((F , 𝜇0)) ∗ ⌜∀𝑖 ∈ 𝐼 . 𝜇0(𝑖) = bind(𝜇, 𝜅(𝑖))⌝ ∗ (∀𝑎 ∈ 𝐴𝜇 .Own((F , [𝑖: 𝜅(𝑖) (𝑎) | 𝑖 ∈ 𝐼])) −∗ 𝑄(𝑣)) ⇔∀𝑥 : 𝑋. C𝜇 𝑣.𝑄(𝑣) □ Lemma D.5.16. C-PURE is sound. Proof. We first prove the forward direction: For any 𝑎 ∈ M𝐼 , if( ⌜𝜇(𝑋) = 1⌝ ∗ C𝜇 .𝐾 (𝑣) ) ((𝑎)), then there exists some F0, 𝜇0, 𝑝0, and 𝜅0: (F0, 𝜇0, 𝑝0) ⪯ 𝑎 ∀𝑖 ∈ 𝐼 . 𝜇0(𝑖) = bind(𝜇, 𝜅0(𝑖)) ∀𝑣 ∈ supp(𝜇). (𝐾 (𝑣)) ((F0, 𝜅0(𝐼) (𝑣), 𝑝0)) The pure fact ⌜𝜇(𝑋) = 1⌝ implies that 𝑋 ⊇ supp(𝜇) , and thus for every 𝑣 ∈ supp(𝜇), ⌜𝑣 ∈ 𝑋⌝. Therefore, (𝐾 (𝑣)) ((F0, 𝜅0(𝐼) (𝑣), 𝑝0)), which witnesses that( C𝜇 .⌜𝑣 ∈ 𝑋⌝ ∗ 𝐾 (𝑣) ) ((𝑎)). We then prove the backward direction: if C𝜇 .⌜𝑣 ∈ 𝑋⌝∗𝐾 (𝑣), then there exists 359 F0, 𝜇0, 𝑝0, and 𝜅0: (F0, 𝜇0, 𝑝0) ⪯ 𝑎 ∀𝑖 ∈ 𝐼 . 𝜇0(𝑖) = bind(𝜇, 𝜅0(𝑖)) ∀𝑣 ∈ supp(𝜇). (⌜𝑣 ∈ 𝑋⌝ ∗ 𝐾 (𝑣)) ((F0, 𝜅0(𝐼) (𝑣), 𝑝0)) Then it must 𝑋 ⊇ supp(𝜇), which implies that ⌜𝜇(𝑋) = 1⌝. Meanwhile, ⌜𝑣 ∈ 𝑋⌝ ∗ 𝐾 (𝑣) holding on (F0, 𝜅0(𝐼) (𝑣), 𝑝0) implies that 𝐾 (𝑣) holds on (F0, 𝜅0(𝐼) (𝑣), 𝑝0) Therefore, ⌜𝜇(𝑋) = 1⌝ ∗ C𝜇 .𝐾 (𝑣) holds on 𝑎. □ D.5.2 Soundness of Primitive WP Rules Structural Rules Lemma D.5.17. WP-CONS is sound. Proof. For any resource 𝑎, if (wp 𝑡 {𝑄})(𝑎), then ∀𝜇0.∀𝑐. (𝑎 · 𝑐) ⪯ 𝜇0 ⇒ ∃𝑏. 
( (𝑏 · 𝑐) ⪯ ⟦𝑡⟧(𝜇0) ∧ (𝑄) ((𝑏)) ) From the premise 𝑄 ⊢ 𝑄′, and the fact that 𝑏 must be valid for (𝑏 · 𝑐) ⪯ ⟦𝑡⟧(𝜇0) to hold, we have that 𝑄(𝑏) implies 𝑄′(𝑏). Thus, it must ∀𝜇0.∀𝑐. (𝑎 · 𝑐) ⪯ 𝜇0 ⇒ ∃𝑏. ( (𝑏 · 𝑐) ⪯ ⟦𝑡⟧(𝜇0) ∧𝑄′(𝑏) ) , which says (wp 𝑡 {𝑄′})(𝑎). □ Lemma D.5.18. WP-FRAME is sound. 360 Proof. Let 𝑎 ∈ M𝐼 be a valid resource such that it satisfies 𝑃 ∗ wp 𝑡 {𝑄}. By definition, this means that, for some 𝑎1, 𝑎2: 𝑎1 · 𝑎2 ⪯ 𝑎 (D.42) 𝑃(𝑎1) (D.43) ∀𝜇0, 𝑐. (𝑎2 · 𝑐) ⪯ 𝜇0 ⇒ ∃𝑏. ( (𝑏 · 𝑐) ⪯ ⟦𝑡⟧(𝜇0) ∧𝑄(𝑏) ) (D.44) Our goal is to prove 𝑎 satisfies wp 𝑡 {𝑃 ∗𝑄}, which, by unfolding the definitions, amounts to: ∃𝑎′ ⪯ 𝑎.∀𝜇0, 𝑐 ′. (𝑎′·𝑐′) ⪯ 𝜇0 ⇒ ∃𝑏1, 𝑏. ((𝑏1·𝑏)·𝑐′) ⪯ ⟦𝑡⟧(𝜇0)∧𝑃(𝑏1)∧𝑄(𝑏) (D.45) Our goal can be proven by instantiating 𝑎′ = (𝑎1 ·𝑎2) and 𝑏1 = 𝑎1, from which we reduce the goal to proving, for all 𝜇0, 𝑐 ′: ((𝑎1 · 𝑎2) · 𝑐′) ⪯ 𝜇0 ⇒ ∃𝑏. ((𝑎1 · 𝑏) · 𝑐′) ⪯ ⟦𝑡⟧(𝜇0) ∧ 𝑃(𝑎1) ∧𝑄(𝑏) (D.46) We have that 𝑃(𝑎1) holds by (D.43). By associativity and commutativity of the RA operation, we reduce the goal to: (𝑎2 · (𝑎1 · 𝑐′)) ⪯ 𝜇0 ⇒ ∃𝑏. (𝑏 · (𝑎1 · 𝑐′)) ⪯ ⟦𝑡⟧(𝜇0) ∧𝑄(𝑏) (D.47) This follows by applying assumption (D.44) with 𝑐 = (𝑎1 · 𝑐′). □ Lemma D.5.19. C-WP-SWAP is sound. Proof. By the meaning of conditioning modality and weakest precondition transformer, (ownVar ∧ C𝜇 𝑣.wp 𝑡 {𝑄(𝑣)})(𝑎) ⇔ ownVar(𝑎) ∧ ∃F , 𝜇, 𝑝, 𝜅. (F , 𝜇, 𝑝) ⪯ 𝑎 ∧ ∀𝑖 ∈ 𝐼 . 𝜇(𝑖) = bind(𝜇, 𝜅(𝑖)) ∧ ∀𝑣 ∈ supp(𝜇).(wp 𝑡 {𝑄(𝑣)})(F , 𝜅(𝐼) (𝑣), 𝑝) 361 Intuitively, for each 𝑣, running 𝑡 on each fibre (F , 𝜅(𝐼) (𝑣), 𝑝) gives a output re- source that satisfies 𝑄(𝑣). Assume V(𝑎) holds and let 𝑎 = (F𝑎, 𝜇𝑎, 𝑝𝑎). By lemma D.2.4, when (F , 𝜇, 𝑝) ⪯ 𝑎, 𝜇 = bind(𝜇, 𝜅) iff that there exists 𝜅′′ such that 𝜇𝑎 = bind(𝜇, 𝜅′′) and 𝜅(𝐼) (𝑣) ⊑ 𝜅′′(𝐼) (𝑣) for every 𝑣. Thus, (C𝜇 𝑣.wp 𝑡 {𝑄(𝑣)})(F𝑎, 𝜇𝑎, 𝑝𝑎) ⇔ ∃𝜅.∀𝑖 ∈ 𝐼 . 𝜇𝑎 (𝑖) = bind(𝜇, 𝜅′′(𝑖)) ∧ ∀𝑣 ∈ supp(𝜇).(wp 𝑡 {𝑄(𝑣)})(F , 𝜅(𝐼) (𝑣), 𝑝) We want to show that wp 𝑡 {C𝜇 𝑣.𝑄(𝑣)}(𝑎) which is equivalent to ∀𝜇′.∀𝑐. 𝑎 · 𝑐 ⪯ 𝜇′⇒ ∃𝑎′. 𝑎′ · 𝑐 ⪯ ⟦𝑡⟧(𝜇′) ∧ (C𝜇 𝑄(𝑣)) (𝑎). 
Let’s fix an arbitrary 𝜇′, 𝑐 that satisfy V(𝑎 · 𝑐) ∧ 𝑎 · 𝑐 ⪯ 𝑎𝜇′ , we try to construct a corresponding 𝑎′. The high-level approach that we will take is to show that running 𝑡 on 𝑎 takes us to a resource that is equivalent to bind the set of output resource satisfying 𝑄(𝑣) to 𝜇. Recall that 𝑎 = (F𝑎, 𝜇𝑎, 𝑝𝑎) also satisfies ownVar, which says F𝑎 = ΣVar. We claim that 𝑎 ·𝑐 ⪯ (ΣVar, 𝜇 ′, 𝑝1) holds implies that the probability space 𝑐 is trivial. Say 𝑐 = (F𝑐, 𝜇𝑐, 𝑝𝑐), then for any 𝐸 ∈ F𝑐, the event 𝐸 must also in F𝑎 and ΣVar because they are the full sigma algebra. By definition of 𝑎 · 𝑐 ⪯ (ΣVar, 𝜇 ′, 𝑝1), we have 𝜇𝑐 (𝐸) · 𝜇𝑎 (𝐸) = 𝜇′(𝐸 ∩ 𝐸) = 𝜇′(𝐸). (D.48) Another implication of 𝑎 · 𝑐 ⪯ (ΣVar, 𝜇 ′, 𝑝1) is that we have 𝜇𝑐 (𝐸) = 𝜇′(𝐸) and 362 𝜇𝑎 (𝐸) = 𝜇′(𝐸). Combining with eq. (D.48), we can conclude 𝜇′(𝐸) · 𝜇′(𝐸) = 𝜇′(𝐸), which implies that 𝜇𝑐 (𝐸) = 𝜇′(𝐸) ∈ {0, 1}. Therefore, 𝑐 is a trivial probability space and (F𝑎, 𝜅(𝐼) (𝑣), 𝑝𝑎) · 𝑐 ⪯ (F𝑎, 𝜅(𝐼) (𝑣), 𝑝𝑎) Furthermore, for every 𝑣 ∈ supp(𝜇), we have (wp 𝑡 {𝑄(𝑣)})(F , 𝜅(𝐼) (𝑣), 𝑝) which implies ∀𝜅′.(F𝑎, 𝜅(𝐼) (𝑣), 𝑝𝑎) · 𝑐 ⪯ 𝜅′(𝐼) (𝑣) (D.49) ⇒ ∃𝑎𝑣 . (𝑎𝑣 · 𝑐 ⪯ ⟦𝑡⟧(𝜅′(𝐼) (𝑣))) ∧𝑄(𝑣) (𝑎𝑣). (D.50) Therefore, 𝑎 · 𝑐 ⪯ 𝑎𝜇′ ⇒ ∀𝑣 ∈ supp(𝜇).(V((F𝑎, 𝜅(𝐼) (𝑣), 𝑝𝑎) · 𝑐) ∧ (F𝑎, 𝜅(𝐼) (𝑣), 𝑝𝑎) · 𝑐 ⪯ (ΣVar, 𝜅(𝐼) (𝑣), 1) (By D.2.7 and D.2.4) ⇒ ∀𝑣 ∈ supp(𝜇).∃𝑎𝑣 .V(𝑎𝑣 · 𝑐) ∧ (𝑎𝑣 · 𝑐 ⪯ (ΣVar, ⟦𝑡⟧(𝜅(𝐼) (𝑣)), 1)) ∧𝑄(𝑣) (𝑎𝑣) (By eq. (D.49)) ⇒ ∀𝑣 ∈ supp(𝜇).𝑝𝑎𝑣 + 𝑝𝑐 ⪯ 1 ∧𝑄(𝑣) (ΣVar, ⟦𝑡⟧(𝜅′(𝐼) (𝑣)), 1). (By upwards closure) Let 𝑎′𝑣 = (ΣVar, ⟦𝑡⟧(𝜅′(𝐼) (𝑣)), 𝑝𝑎). Because 𝜇𝑐 (𝐸) ∈ {0, 1} for any 𝐸 ∈ F𝑐, for every 𝑣, we have (ΣVar, ⟦𝑡⟧(𝜅′(𝐼) (𝑣))) · (F𝑐, 𝜇𝑐) defined and thus 𝑎′𝑣 · 𝑐 valid. Define 𝑎′ = (ΣVar, bind(𝜇, λ𝑣. ⟦𝑡⟧(𝜅′(𝐼) (𝑣)), 𝑝𝑎) By lemma D.2.7, V(𝑎′𝑣 ·𝑐) for all 𝑣 ∈ supp𝜇 implies V(𝑎′·𝑐). Also, because𝑄(𝑣) (𝑎𝑣) for all 𝑣 ∈ 𝐴𝜇, (C𝜇 𝑣.𝑄(𝑣)) (𝑎′). Thus, (wp 𝑡 {C𝜇 𝑣.𝑄(𝑣)})(𝑎). □ 363 Program Rules Lemma D.5.20. WP-SKIP is sound. Proof. Assume 𝑎 ∈ M𝐼 is valid and such that 𝑃(𝑎) holds. By unfolding the definition of WP, we need to prove ∀𝜇0.∀𝑐. 
(𝑎 · 𝑐) ⪯ 𝜇0 ⇒ ∃𝑏. ((𝑏 · 𝑐) ⪯ ⟦𝑡⟧(𝜇0) ∧ 𝑃(𝑏))

which follows trivially by ⟦[𝑖:skip]⟧(𝜇0) = 𝜇0 and picking 𝑏 = 𝑎. □

Lemma D.5.21. WP-SEQ is sound.

Proof. Assume 𝑎0 ∈ M𝐼 is a valid resource such that (wp [𝑖: 𝑡] {wp [𝑖: 𝑡′] {𝑄}})(𝑎0) holds. Our goal is to prove (wp ([𝑖: 𝑡; 𝑡′]) {𝑄})(𝑎0) holds, which unfolds by definition of WP into:

∀𝜇0. ∀𝑐0. (𝑎0 · 𝑐0) ⪯ 𝜇0 ⇒ ∃𝑎2. ((𝑎2 · 𝑐0) ⪯ ⟦[𝑖: 𝑡; 𝑡′]⟧(𝜇0) ∧ 𝑄(𝑎2))  (D.51)

Take an arbitrary 𝜇0 and 𝑐0 such that (𝑎0 · 𝑐0) ⪯ 𝜇0. By unfolding the WPs in the assumption, we have that there exists an 𝑎1 ∈ M𝐼 such that:

(𝑎1 · 𝑐0) ⪯ ⟦[𝑖: 𝑡]⟧(𝜇0)  (D.52)
∀𝜇1. ∀𝑐1. (𝑎1 · 𝑐1) ⪯ 𝜇1 ⇒ ∃𝑎2. ((𝑎2 · 𝑐1) ⪯ ⟦[𝑖: 𝑡′]⟧(𝜇1) ∧ 𝑄(𝑎2))  (D.53)

We can apply (D.53) to (D.52) by instantiating 𝜇1 with ⟦[𝑖: 𝑡]⟧(𝜇0), and 𝑐1 with 𝑐0, obtaining:

∃𝑎2. ((𝑎2 · 𝑐0) ⪯ ⟦[𝑖: 𝑡′]⟧(⟦[𝑖: 𝑡]⟧(𝜇0)) ∧ 𝑄(𝑎2))

Since by definition ⟦𝑡; 𝑡′⟧(𝜇0) = ⟦𝑡′⟧(⟦𝑡⟧(𝜇0)), we obtain the goal (D.51) as desired. □

Lemma D.5.22. WP-ASSIGN is sound.

Proof. Let 𝑎 ∈ M𝐼 be a valid resource, and let 𝑎(𝑖) = (F, 𝜇, 𝑝). By assumption we have 𝑝(𝑥) = 1 and 𝑝(𝑦) > 0 for all 𝑦 ∈ FV(𝑒). We want to show that 𝑎 satisfies wp [𝑖: x := 𝑒] {⌈𝑥 = 𝑒⌉}. This is equivalent to

∀𝜇0. ∀𝑐. (𝑎 · 𝑐 ⪯ 𝜇0) ⇒ ∃𝑏. (𝑏 · 𝑐 ⪯ ⟦[𝑖: x := 𝑒]⟧(𝜇0) ∧ ⌈𝑥 = 𝑒⌉(𝑏))

We show this holds by picking 𝑏 as follows:

𝑏 ≜ 𝑎[𝑖: (F𝑏, 𝜇𝑏, 𝑝)]
F𝑏 ≜ {Mem[Var], ∅, 𝐴, Mem[Var] \ 𝐴}
𝐴 ≜ {𝑠[𝑥 ↦→ ⟦𝑒⟧(𝑠)] | 𝑠 ∈ Mem[Var]}

where 𝜇𝑏 is determined by setting 𝜇𝑏(𝐴) = 1. By construction we have that ⌈𝑥 = 𝑒⌉(𝑏) holds. To close the proof we then need to show that (𝑏 · 𝑐) ⪯ ⟦[𝑖: x := 𝑒]⟧(𝜇0). Let 𝑐(𝑖) = (F𝑐, 𝜇𝑐, 𝑝𝑐). Observe that by the assumptions on 𝑝, we have V(𝑏) since F𝑏 is only non-trivial on FV(𝑒) ∪ {𝑥}; moreover, by the assumption V(𝑎 · 𝑐) we have that V(𝑝 + 𝑝𝑐) holds, which means that 𝑝𝑐(𝑥) = 0, and thus F𝑐 is trivial on 𝑥. Let us define the function pre : 𝒫(Mem[Var]) → 𝒫(Mem[Var]) as:

pre(𝑋) ≜ {𝑠 | 𝑠[𝑥 ↦→ ⟦𝑒⟧(𝑠)] ∈ 𝑋}

That is, pre(𝑋) is the weakest precondition (in the standard sense) of the assignment.
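As a sanity check on this definition, here is a minimal discrete sketch verifying that the pushforward semantics of x := 𝑒 agrees with 𝜇0 ∘ pre. The two-variable state space, the expression e(s) = y, and all names are illustrative assumptions, not part of the development:

```python
from itertools import product

# Hypothetical two-variable state space: memories map {"x","y"} to {0,1}.
States = [dict(x=vx, y=vy) for vx, vy in product([0, 1], repeat=2)]

def key(s):  # hashable view of a memory
    return (s["x"], s["y"])

def assign_x(s, e):
    """Memory after x := e(s)."""
    t = dict(s)
    t["x"] = e(s)
    return t

def pushforward(mu, e):
    """Distribution semantics of x := e: push each memory forward."""
    out = {}
    for s in States:
        t = key(assign_x(s, e))
        out[t] = out.get(t, 0.0) + mu[key(s)]
    return out

def pre(X, e):
    """pre(X) = { s | s[x := e(s)] in X }, the classical weakest precondition."""
    return {key(s) for s in States if key(assign_x(s, e)) in X}

# Uniform input distribution and the expression e(s) = y.
mu = {key(s): 0.25 for s in States}
e = lambda s: s["y"]
nu = pushforward(mu, e)

# Check mu0(pre(X)) = [[x := e]](mu0)(X) on a few events X.
for X in [set(), {(0, 0)}, {(1, 1), (0, 1)}, {key(s) for s in States}]:
    lhs = sum(mu[s] for s in pre(X, e))
    rhs = sum(nu.get(s, 0.0) for s in X)
    assert abs(lhs - rhs) < 1e-9
```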
By construction, we have:

pre(𝐴) = Mem[Var]
pre(𝑋1 ∩ 𝑋2) = pre(𝑋1) ∩ pre(𝑋2)
pre(Mem[Var] \ 𝐴) = ∅
pre(𝑋𝑐) = 𝑋𝑐 for all 𝑋𝑐 ∈ F𝑐

In particular, the latter holds because F𝑐 is trivial in 𝑥. By unfolding the definition of ⟦ · ⟧, it is easy to check that for every 𝑋 ∈ ΣMem[Var]:

⟦x := 𝑒⟧(𝜇0)(𝑋) = 𝜇0(pre(𝑋))

We are now ready to show (𝑏 · 𝑐) ⪯ ⟦[𝑖: x := 𝑒]⟧(𝜇0) by showing that (F𝑏, 𝜇𝑏) ⊛ (F𝑐, 𝜇𝑐) = (F𝑏 ⊕ F𝑐, ⟦x := 𝑒⟧(𝜇0)|(F𝑏⊕F𝑐)), where 𝜇0 = 𝜇0(𝑖). To show this it suffices to prove that for every 𝑋𝑏 ∈ F𝑏 and every 𝑋𝑐 ∈ F𝑐, ⟦x := 𝑒⟧(𝜇0)(𝑋𝑏 ∩ 𝑋𝑐) = 𝜇𝑏(𝑋𝑏) · 𝜇𝑐(𝑋𝑐). We proceed by case analysis on 𝑋𝑏:

Case 𝑋𝑏 = 𝐴. Then:

⟦x := 𝑒⟧(𝜇0)(𝐴 ∩ 𝑋𝑐) = 𝜇0(pre(𝐴 ∩ 𝑋𝑐)) = 𝜇0(pre(𝐴) ∩ pre(𝑋𝑐)) = 𝜇0(Mem[Var] ∩ pre(𝑋𝑐)) = 𝜇0(pre(𝑋𝑐)) = 𝜇𝑏(𝐴) · 𝜇0(𝑋𝑐) = 𝜇𝑏(𝐴) · 𝜇𝑐(𝑋𝑐)

Case 𝑋𝑏 = Mem[Var] \ 𝐴. Then:

⟦x := 𝑒⟧(𝜇0)((Mem[Var] \ 𝐴) ∩ 𝑋𝑐) = 𝜇0(pre((Mem[Var] \ 𝐴) ∩ 𝑋𝑐)) = 𝜇0(pre(Mem[Var] \ 𝐴) ∩ pre(𝑋𝑐)) = 𝜇0(∅ ∩ pre(𝑋𝑐)) = 0 = 𝜇𝑏(Mem[Var] \ 𝐴) · 𝜇𝑐(𝑋𝑐)

Case 𝑋𝑏 = Mem[Var] or 𝑋𝑏 = ∅. Analogous to the previous cases. □

Lemma D.5.23. WP-SAMP is sound.

Proof. Assume 𝑎 ∈ M𝐼 is valid and such that 𝑎(𝑖) = (F, 𝜇, 𝑝), with 𝑝(𝑥) = 1. Our goal is to show that 𝑎 satisfies wp [𝑖: x 𝑑(®𝑣)] {𝑥 $∼ 𝑑(®𝑣)}, which is equivalent to proving, for all 𝜇0 and for all 𝑐:

(𝑎 · 𝑐 ⪯ 𝜇0) ⇒ ∃𝑏. (𝑏 · 𝑐 ⪯ ⟦[𝑖: x 𝑑(®𝑣)]⟧(𝜇0) ∧ (𝑥 $∼ 𝑑(®𝑣))(𝑏))  (D.54)

Let 𝜇0 = 𝜇0(𝑖) and 𝜇1 = ⟦x 𝑑(®𝑣)⟧(𝜇0). Moreover, let 𝑐(𝑖) = (F𝑐, 𝜇𝑐, 𝑝𝑐). Observe that by the assumptions on 𝑝 and validity of 𝑎 · 𝑐, we have 𝑝𝑐(𝑥) = 0, which means F𝑐 is trivial on 𝑥. We aim to prove (D.54) by letting

𝑏 ≜ 𝑎[𝑖: (F𝑏, 𝜇𝑏, 𝑝𝑏)]
𝜇𝑏 ≜ 𝜇1|F𝑏
F𝑏 ≜ 𝜎({ {𝑠 ∈ Mem[Var] | 𝑠(𝑥) = 𝑣} | 𝑣 ∈ Val })
𝑝𝑏 ≜ (𝑥: 1)

Note that by construction V(𝑝𝑏 + 𝑝𝑐), and V(𝑏) since F𝑏 is only non-trivial in 𝑥. Similarly to the proof of lemma D.5.22, we define the function pre : 𝒫(Mem[Var]) → 𝒫(Mem[Var]) as:

pre(𝑋) ≜ {𝑠 | ∃𝑣 ∈ Val. 𝑠[𝑥 ↦→ 𝑣] ∈ 𝑋}

Since F𝑐 is trivial on 𝑥, for all 𝑋𝑐 ∈ F𝑐, pre(𝑋𝑐) = 𝑋𝑐.
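The semantics of sampling can likewise be checked on a small discrete instance: overwriting 𝑥 with a fresh sample from 𝑑 marginalizes out the old value of 𝑥, and the resulting value of 𝑥 is independent of events that do not mention 𝑥. The two-variable state space, the prior mu0, and the biased distribution d below are hypothetical, purely for illustration:

```python
from itertools import product

# Hypothetical state space: memories over {"x","y"} with values in {0,1}.
States = [(vx, vy) for vx, vy in product([0, 1], repeat=2)]  # (x, y)

mu0 = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}   # arbitrary prior
d = {0: 0.25, 1: 0.75}                                        # sampled distribution

# Semantics of sampling into x: overwrite x with a fresh sample from d.
mu1 = {s: 0.0 for s in States}
for (_, vy), p in mu0.items():
    for v, q in d.items():
        mu1[(v, vy)] += p * q

# pre({s}) = all states agreeing with s off x (any value at x).
def pre_prob(s):
    _, vy = s
    return sum(p for (ux, uy), p in mu0.items() if uy == vy)

# Pointwise closed form: mu1({s}) = mu0(pre({s})) * d(s(x)).
for s in States:
    assert abs(mu1[s] - pre_prob(s) * d[s[0]]) < 1e-9

# After sampling, x is independent of events not mentioning x:
px1 = sum(p for (vx, _), p in mu1.items() if vx == 1)   # mu1(x = 1)
py1 = sum(p for (_, vy), p in mu1.items() if vy == 1)   # mu1(y = 1)
assert abs(mu1[(1, 1)] - px1 * py1) < 1e-9
```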
Moreover, for all 𝑋𝑏 ∈ F𝑏 \ {∅}, pre(𝑋𝑏) = Mem[Var], since 𝑋𝑏 is trivial on every variable except 𝑥. By unfolding the definitions, we have: 𝜇1(𝑋) = ⟦x 𝑑(®𝑣)⟧(𝜇0) (𝑋) = ∑︁ 𝑠∈𝑋 𝜇0(pre(𝑠)) · ⟦𝑑⟧(®𝑣) (𝑠(𝑥)) We now show that (F𝑏, 𝜇𝑏)⊛(F𝑐, 𝜇𝑐) = (F𝑏⊕F𝑐, 𝜇1 | (F𝑏⊕F𝑐)) by showing that for all 𝑋𝑏 ∈ F𝑏 and 𝑋𝑐 ∈ F𝑐: 𝜇1(𝑋𝑏 ∩ 𝑋𝑐) = 𝜇𝑏 (𝑋𝑏) · 𝜇𝑐 (𝑋𝑐). To prove this we first define V: 𝒫(Mem[Var]) → 𝒫(Val) as V(𝑋) ≜ {𝑠(𝑥) |𝑠 ∈ 𝑋}, and 𝑆𝑤 ≜ {𝑠 |𝑠(𝑥) = 𝑤}. We 367 observe that 𝑋𝑏 = ⊎ 𝑤∈V(𝑋𝑏) 𝑆𝑤, and thus 𝑋𝑏 ∩ 𝑋𝑐 = ⊎ 𝑤∈V(𝑋𝑏) (𝑋𝑐 ∩ 𝑆𝑤); moreover, pre(𝑋𝑐 ∩ 𝑆𝑤) = {𝑠 |𝑠[𝑥 ↦→ 𝑤 ] ∈ 𝑋𝑐} = 𝑋𝑐. Thus, we can calculate: 𝜇1(𝑋𝑏 ∩ 𝑋𝑐) = ∑︁ 𝑠∈𝑋𝑏∩𝑋𝑐 𝜇0(pre(𝑠)) · ⟦𝑑⟧(®𝑣) (𝑠(𝑥)) = ∑︁ 𝑤∈V(𝑋𝑏) ∑︁ 𝑠∈𝑋𝑐∩𝑆𝑤 𝜇0(pre(𝑠)) · ⟦𝑑⟧(®𝑣) (𝑤) = ∑︁ 𝑤∈V(𝑋𝑏) ( ⟦𝑑⟧(®𝑣) (𝑤) · ∑︁ 𝑠∈𝑋𝑐∩𝑆𝑤 𝜇0(pre(𝑠)) ) = ©­« ∑︁ 𝑤∈V(𝑋𝑏) ⟦𝑑⟧(®𝑣) (𝑤) · 𝜇0(pre(𝑋𝑐 ∩ 𝑆𝑤))ª®¬ = ©­« ∑︁ 𝑤∈V(𝑋𝑏) ⟦𝑑⟧(®𝑣) (𝑤)ª®¬ · 𝜇0(𝑋𝑐) = 𝜇𝑏 (𝑋𝑏) · 𝜇𝑐 (𝑋𝑐) The last equation is given by 𝑎 · 𝑐 ⪯ 𝜇0 which implies that 𝜇𝑐 = 𝜇0 |F𝑐 , and by: 𝜇𝑏 (𝑋𝑏) = 𝜇1(𝑋𝑏) = ∑︁ 𝑠∈𝑋𝑏 𝜇0(pre(𝑠)) · ⟦𝑑⟧(®𝑣) (𝑠(𝑥)) = ∑︁ 𝑤∈V(𝑋𝑏) ∑︁ 𝑠∈𝑆𝑤 𝜇0(pre(𝑠)) · ⟦𝑑⟧(®𝑣) (𝑤) = ∑︁ 𝑤∈V(𝑋𝑏) ⟦𝑑⟧(®𝑣) (𝑤) Finally, we need to show (𝑥 $∼ 𝑑 (®𝑣)) (𝑏) which amounts to proving 𝑥� (F𝑏, 𝜇𝑏) and ⟦𝑑⟧(®𝑣) = 𝜇𝑏 ◦𝑥−1. The former holds because by construction 𝑥 is measurable in F𝑏. For the latter, for all𝑊 ⊆ Val: (𝜇𝑏 ◦ 𝑥−1) (𝑊) = 𝜇𝑏 (𝑥−1(𝑊)) = ∑︁ 𝑤∈V(𝑥−1 (𝑊)) ⟦𝑑⟧(®𝑣) (𝑤) = ∑︁ 𝑤∈𝑊 ⟦𝑑⟧(®𝑣) (𝑤) = ⟦𝑑⟧(®𝑣) (𝑊). □ Lemma D.5.24. WP-IF-PRIM is sound. 368 Proof. For any valid resource 𝑎, (if 𝑣 then wp [𝑖: 𝑡1] {𝑄(1)} else wp [𝑖: 𝑡2] {𝑄(0)})(𝑎) ⇔  (wp [𝑖: 𝑡1] {𝑄(1)})(𝑎) if 𝑣 � 1 (wp [𝑖: 𝑡2] {𝑄(0)})(𝑎) otherwise ⇔ ∀𝜇0.∀𝑐. (𝑎 · 𝑐) ⪯ 𝜇0 ⇒  ∃𝑏. (𝑏 · 𝑐) ⪯ ⟦𝑖 : 𝑡1⟧(𝜇0) ∧𝑄(1) (𝑏) if 𝑣 � 1 ∃𝑏. (𝑏 · 𝑐) ⪯ ⟦𝑖 : 𝑡2⟧(𝜇0) ∧𝑄(0) (𝑏) otherwise ⇔ ∀𝜇0.∀𝑐. (𝑎 · 𝑐) ⪯ 𝜇0 ⇒ ∃𝑏. (𝑏 · 𝑐) ⪯ ⟦𝑖 : if 𝑣 then 𝑡1 else 𝑡2⟧(𝜇0) ∧𝑄(𝑣 � 1) (𝑏) ⇒(wp [𝑖: if 𝑣 then 𝑡1 else 𝑡2] {𝑄(𝑣 � 1)})(𝑎) □ Lemma D.5.25. WP-BIND is sound. Proof. 
For any resource 𝑎 = (F , 𝜇, 𝑝), (⌈𝑒 = 𝑣⌉∗wp [ 𝑖: E[𝑣] ] {𝑄})(F , 𝜇, 𝑝) iff there exists (F1, 𝜇1, 𝑝1), (F2, 𝜇2, 𝑝2) such that (⌈𝑒 = 𝑣⌉)(F1, 𝜇1, 𝑝1) (wp [ 𝑖: E[𝑣] ] {𝑄})(F2, 𝜇2, 𝑝2) (F1, 𝜇1, 𝑝1) · (F2, 𝜇2, 𝑝2) ⪯ (F , 𝜇, 𝑝) By the upwards closure, we also have (⌈𝑒 = 𝑣⌉)(F , 𝜇, 𝑝) (wp [ 𝑖: E[𝑣] ] {𝑄})(F , 𝜇, 𝑝) The fact that (⌈𝑒 = 𝑣⌉)(F1, 𝜇1, 𝑝1) implies that 𝜇1((𝑒 = 𝑣)−1(True)) = 1, which implies that ⟦𝑒⟧(𝑠) = 𝑣 for all 𝑠 ∈ supp(𝜇1(𝑖)). 369 By lemma D.1.3, we have for any 𝑠 ∈ Mem[Var], K⟦E[𝑒]⟧(𝑠) = K⟦E[⟦𝑒⟧(𝑠)]⟧(𝑠), which implies that for any 𝜇0 over ΣMem[Var] ⟦E[𝑒]⟧(𝜇0) = 𝑠← 𝜇0; K⟦E[𝑒]⟧(𝑠) = 𝑠← 𝜇0; K⟦E[⟦𝑒⟧(𝑠)]⟧(𝑠) = 𝑠← 𝜇0; K⟦E[𝑣]⟧(𝑠) = ⟦E[𝑣]⟧(𝜇0). Define 𝜇′0 = ⟦[𝑖: E[𝑣]]⟧𝜇0. Thus, (wp [ 𝑖: E[𝑣] ] {𝑄})(𝑎) iff ∀𝜇0.∀𝑐. (V(𝑎 · 𝑐) ∧ 𝑎 · 𝑐 ⪯ 𝑎𝜇0) ⇒ ∃𝑎′. (V(𝑎′ · 𝑐) ∧ 𝑎′ · 𝑐 ⪯ 𝑎𝜇′0 ∧𝑄(𝑎 ′)) iff ∀𝜇0.∀𝑐. (V(𝑎 · 𝑐) ∧ 𝑎 · 𝑐 ⪯ 𝑎𝜇0) ⇒ ∃𝑎′. (V(𝑎′ · 𝑐) ∧ 𝑎′ · 𝑐 ⪯ 𝑎𝜇′0 ∧𝑄(𝑎 ′)) iff ( wp [ 𝑖: E[𝑒] ] {𝑄} ) ((𝑎)). □ Lemma D.5.26. WP-LOOP-UNF is sound. Proof. By definition, ⟦repeat (𝑛 + 1) 𝑡⟧(𝜇) = ( 𝑠← 𝜇; 𝑠′← loop𝑡 (𝑛, 𝑠); K⟦𝑡⟧(𝑠′) ) = ⟦(repeat 𝑛 𝑡); 𝑡⟧(𝜇) thus the rule follows from the argument of lemma D.5.21. □ Lemma D.5.27. WP-LOOP is sound. Proof. By induction on 𝑛. 370 Base case 𝑛 = 0 Analogously to lemma D.5.20 since, by definition, ⟦repeat 0 𝑡⟧(𝜇0) = 𝜇0. Induction step 𝑛 > 0 By induction hypothesis 𝑃(0) ⊢ wp [ 𝑗 :repeat (𝑛 − 1) 𝑡] {𝑃(𝑛− 1)} holds, and we want to show that 𝑃(0) ⊢ wp [ 𝑗 :repeat 𝑛 𝑡] {𝑃(𝑛)}. By lemma D.5.26, it suffices to show 𝑃(0) ⊢ wp [ 𝑗 :repeat (𝑛 − 1) 𝑡] {wp [ 𝑗 : 𝑡] {𝑃(𝑛)}}. By applying the induction hypothesis and lemma D.5.17 we are left with proving 𝑃(𝑛 − 1) ⊢ wp [ 𝑗 : 𝑡] {𝑃(𝑛)} which is implied by the premise of the rule with 𝑖 = 𝑛 − 1 < 𝑛. □ D.5.3 Soundness of Derived Rules In this section we provide derivations for the rules we claim are derivable in BLUEBELL. Ownership and Distributions Lemma D.5.28. SURE-DIRAC is sound. Proof. 𝐸 $∼ 𝛿𝑣 ⊣⊢ ∃F , 𝜇.Own((F , 𝜇)) ∗ ⌜𝜇 ◦ 𝐸−1 = 𝛿𝑣⌝ ⊣⊢ ∃F , 𝜇.Own((F , 𝜇)) ∗ ⌜𝜇 ◦ (𝐸 = 𝑣)−1 = 𝛿True⌝ ⊣⊢ ⌈𝐸 = 𝑣⌉ □ Lemma D.5.29. 
SURE-EQ-INJ is sound. 371 Proof. ⌈𝐸 = 𝑣⌉ ∗ ⌈𝐸 = 𝑣′⌉ ⊢ 𝐸 $∼ 𝛿𝑣 ∗ 𝐸 $∼ 𝛿𝑣′ (SURE-DIRAC) ⊢ 𝐸 $∼ 𝛿𝑣 ∧ 𝐸 $∼ 𝛿𝑣′ ⊢ ⌜𝛿𝑣 = 𝛿𝑣′⌝ (DIST-INJ) ⊢ ⌜𝑣 = 𝑣′⌝ □ Lemma D.5.30. SURE-SUB is sound. Proof. 𝐸1 $∼ 𝜇 ∗ ⌈(𝐸2 = 𝑓 (𝐸1))⌉ ⊢ C𝜇 𝑣. ⌈𝐸1 = 𝑣⌉ ∗ ⌈(𝐸2 = 𝑓 (𝐸1))⌉ (C-UNIT-R, C-FRAME) ⊢ C𝜇 𝑣. ⌈𝐸1 = 𝑣 ∧ 𝐸2 = 𝑓 (𝐸1)⌉ (SURE-MERGE) ⊢ C𝜇 𝑣. ⌈𝐸2 = 𝑓 (𝑣)⌉ (C-CONS) ⊢ C𝜇 𝑣. C𝛿 𝑓 (𝑣) 𝑣 ′. ⌈𝐸2 = 𝑣′⌉ (C-UNIT-L) ⊢ C𝜇′ 𝑣 ′. ⌈𝐸2 = 𝑣′⌉ (C-ASSOC, C-SURE-PROJ) where 𝜇′= bind(𝜇, λ𝑥. 𝛿 𝑓 (𝑥)) = 𝜇 ◦ 𝑓 −1. By C-UNIT-R we thus get 𝐸2 $∼ 𝜇 ◦ 𝑓 −1. □ Lemma D.5.31. DIST-FUN is sound. Proof. Assume 𝐸 : Mem[Var] → 𝐴 and 𝑓 : 𝐴→ 𝐵, then: 𝐸 $∼ 𝜇 ⊢ C𝜇 𝑣. ⌈(𝐸 = 𝑣)⌉ (C-UNIT-R) ⊢ C𝜇 𝑣. ⌈( 𝑓 ◦ 𝐸) = 𝑓 (𝑣)⌉ (C-CONS) ⊢ C𝜇 𝑣. C𝛿 𝑓 (𝑣) 𝑣 ′. ⌈( 𝑓 ◦ 𝐸) = 𝑣′⌉ (C-UNIT-L) ⊢ C𝜇′ 𝑣 ′. ⌈( 𝑓 ◦ 𝐸) = 𝑣′⌉ (C-ASSOC, C-SURE-PROJ) where 𝜇′ = bind(𝜇, λ𝑥. 𝛿 𝑓 (𝑥)) = 𝜇 ◦ 𝑓 −1. By C-UNIT-R we thus get ( 𝑓 ◦ 𝐸) $∼ 𝜇 ◦ 𝑓 −1. □ 372 Lemma D.5.32. DIRAC-DUP is sound. Proof. 𝐸 $∼ 𝛿𝑣 ⊢ ⌈𝐸 = 𝑣⌉ (SURE-DIRAC) ⊢ ⌈𝐸 = 𝑣⌉ ∗ ⌈𝐸 = 𝑣⌉ (SURE-MERGE) ⊢ 𝐸 $∼ 𝛿𝑣 ∗ 𝐸 $∼ 𝛿𝑣 (SURE-DIRAC) □ Lemma D.5.33. DIST-SUPP is sound. Proof. 𝐸 $∼ 𝜇 ⊢ C𝜇 𝑣.⌈𝐸 = 𝑣⌉ (C-UNIT-R) ⊢ ⌜𝜇(supp(𝜇)) = 1⌝ ∗ C𝜇 𝑣.⌈𝐸 = 𝑣⌉ ⊢ C𝜇 𝑣. ( ⌜𝑣 ∈ supp(𝜇)⌝ ∗ ⌈𝐸 = 𝑣⌉ ) (C-PURE) ⊢ C𝜇 𝑣. ( ⌈𝐸 = 𝑣⌉ ∗ ⌈𝐸 ∈ supp(𝜇)⌉ ) ⊢ ( C𝜇 𝑣.⌈𝐸 = 𝑣⌉ ) ∗ ⌈𝐸 ∈ supp(𝜇)⌉ (SURE-STR-CONVEX) ⊢ 𝐸 $∼ 𝜇 ∗ ⌈𝐸 ∈ supp(𝜇)⌉ (C-UNIT-R) □ Lemma D.5.34. PROD-UNSPLIT is sound. Proof. 𝐸1 $∼ 𝜇1 ∗ 𝐸2 $∼ 𝜇2 ⊢ C𝜇1 𝑣1. C𝜇2 𝑣2. ( ⌈𝐸1 = 𝑣1⌉ ∗ ⌈𝐸2 = 𝑣2⌉ ) (C-UNIT-R, C-FRAME) ⊢ C𝜇1 𝑣1. C𝜇2 𝑣2. ⌈(𝐸1, 𝐸2) = (𝑣1, 𝑣2)⌉ (SURE-MERGE) ⊢ C𝜇1⊗𝜇2 (𝑣1, 𝑣2). ⌈(𝐸1, 𝐸2) = (𝑣1, 𝑣2)⌉ (C-ASSOC) ⊢ (𝐸1, 𝐸2) $∼ 𝜇1 ⊗ 𝜇2 (C-UNIT-R) 373 □ Joint conditioning Lemma D.5.35. C-FUSE is sound. Proof. Recall that 𝜇 � 𝜅 ≜ λ(𝑣, 𝑤). 𝜇(𝑣)𝜅(𝑣) (𝑤). which can be reformulated as 𝜇 � 𝜅 = bind(𝜇, λ𝑣. (bind(𝜅(𝑣), λ𝑤. return(𝑣, 𝑤)))). The (⊢) direction is an instance of C-ASSOC. The (⊣) direction follows from C-UNASSOC: C𝜇�𝜅 (𝑣′, 𝑤′). 𝐾 (𝑣′, 𝑤′) ⊢ C𝜇 𝑣. Cbind(𝜅(𝑣),λ𝑤.𝛿 (𝑣,𝑤) ) (𝑣 ′, 𝑤′). 𝐾 (𝑣′, 𝑤′) (C-UNASSOC) ⊢ C𝜇 𝑣. C𝜅(𝑣) 𝑤. C𝛿 (𝑣,𝑤) (𝑣′, 𝑤′). 𝐾 (𝑣′, 𝑤′) (C-UNASSOC) ⊢ C𝜇 𝑣. C𝜅(𝑣) 𝑤. 
𝐾 (𝑣, 𝑤) (C-UNIT-L) □ Lemma D.5.36. C-SWAP is sound. Proof. C𝜇1 𝑣1. C𝜇2 𝑣2. 𝐾 (𝑣1, 𝑣2) ⊢ C𝜇1⊗𝜇2 (𝑣1, 𝑣2). 𝐾 (𝑣1, 𝑣2) (C-FUSE) ⊢ C𝜇2 𝑣2. C𝜇1 𝑣1. 𝐾 (𝑣1, 𝑣2) (C-FUSE) Where 𝜇1 ⊗ 𝜇2 = 𝜇1 � (λ . 𝜇2) = 𝜇2 � (λ . 𝜇1) justifies the applications of C-FUSE. □ Lemma D.5.37. SURE-CONVEX is sound. 374 Proof. By SURE-STR-CONVEX with 𝐾 = True. □ Lemma D.5.38. Section 5.3.5 is sound. Proof. C𝜇 𝑣.𝐸 $∼ 𝜇′ ⊢ C𝜇 𝑣. C𝜇′ 𝑤.⌈𝐸 = 𝑤⌉ (C-UNIT-R) ⊢ C𝜇′ 𝑤. C𝜇 𝑣.⌈𝐸 = 𝑤⌉ (C-SWAP) ⊢ C𝜇′ 𝑤.⌈𝐸 = 𝑤⌉ (SURE-CONVEX) ⊢ 𝐸 $∼ 𝜇′ (C-UNIT-R) □ Lemma D.5.39. The following rule is sound: ∀(𝑣, ) ∈ supp(𝜇).∀𝜇′. C𝜇′ 𝑤. 𝑃(𝑣) ⊢ 𝑃(𝑣) C𝜇 (𝑣, 𝑤). 𝑃(𝑣) ⊣⊢ C𝜇◦𝜋−1 𝑣. 𝑃(𝑣) Proof. Assume that for all (𝑣, ) ∈ supp(𝜇), ∀𝜇′. C𝜇′ 𝑤. 𝑃(𝑣) ⊢ 𝑃(𝑣) (i.e. 𝑃(𝑣) is convex). By lemma D.1.2 there is some 𝜅 such that 𝜇 = (𝜇 ◦ 𝜋−1) � 𝜅. Then: C𝜇 (𝑣, 𝑤). 𝑃(𝑣) ⊣⊢ C𝜇◦𝜋−1 𝑣. C𝜅(𝑣) 𝑤. 𝑃(𝑣) (C-FUSE) ⊣⊢ C𝜇◦𝜋−1 𝑣. 𝑃(𝑣) The last step is justified by the convexity assumption in the (⊢) direction, and by C-TRUE and C-FRAME in the (⊣) direction. □ Lemma D.5.40. C-SURE-PROJ is sound. Proof. By lemma D.5.39 and lemma D.5.37. □ Lemma D.5.41. Section 5.3.5 is sound. 375 Proof. C𝜇1 𝑣1. ( ⌈𝐸1 = 𝑣1⌉ ∗ 𝐸2 $∼ 𝜇2 ) ⊢ C𝜇1 𝑣1. ( ⌈𝐸1 = 𝑣1⌉ ∗ C𝜇2 𝑣2. ⌈𝐸2 = 𝑣2⌉ ) (C-UNIT-R) ⊢ C𝜇1 𝑣1. C𝜇2 𝑣2. ( ⌈𝐸1 = 𝑣1⌉ ∗ ⌈𝐸2 = 𝑣2⌉ ) (C-FRAME) ⊢ C𝜇1 𝑣1. C𝜇2 𝑣2. ⌈𝐸1 = 𝑣1 ∧ 𝐸2 = 𝑣2⌉ (SURE-MERGE) ⊢ C𝜇1⊗𝜇2 (𝑣1, 𝑣2). ⌈(𝐸1, 𝐸2) = (𝑣1, 𝑣2)⌉ (C-ASSOC) ⊢ (𝐸1, 𝐸2) $∼ (𝜇1 ⊗ 𝜇2) (C-UNIT-R) ⊢ 𝐸1 $∼ 𝜇1 ∗ 𝐸2 $∼ 𝜇2 (PROD-SPLIT) □ Lemma D.5.42. Section 5.3.5 is sound. Proof. By lemma D.5.39 and lemma D.5.38. □ Weakest Precondition Lemma D.5.43. Section 5.4 is sound. Proof. Special case of WP-LOOP with 𝑛 = 0, which makes the premises trivial. □ Lemma D.5.44. Section 5.4 is sound. 376 Proof. From the premises, we derive: 𝑃 ∗ ⌈𝑒 = 1⌉ ⊩ wp [1: 𝑡1] {𝑄(1)} 𝑃 ∗ ⌈𝑒 = 0⌉ ⊩ wp [1: 𝑡2] {𝑄(0)} ∀𝑏 ∈ {0, 1}. 𝑃 ∗ ⌈𝑒 = 1⌉ ⊩ if 𝑏 then wp [1: 𝑡1] {𝑄(1)} else wp [1: 𝑡2] {𝑄(0)} ∀𝑏 ∈ {0, 1}. 𝑃 ∗ ⌈𝑒 = 𝑏⌉ ⊩ wp [1: (if 𝑏 then 𝑡1 else 𝑡2)] {𝑄(𝑏 � 1)} WP-IF-PRIM ∀𝑏 ∈ {0, 1}. 
𝑃 ∗ ⌈𝑒 = 𝑏⌉ ⊩ wp [1: (if e then 𝑡1 else 𝑡2)] {𝑄(𝑏 � 1)} (WP-BIND)
C𝛽 𝑏. (𝑃 ∗ ⌈𝑒 = 𝑏⌉) ⊩ C𝛽 𝑏. wp [1: (if e then 𝑡1 else 𝑡2)] {𝑄(𝑏 � 1)} (C-CONS)
𝑃 ∗ 𝑒 $∼ 𝛽 ⊩ C𝛽 𝑏. wp [1: (if e then 𝑡1 else 𝑡2)] {𝑄(𝑏 � 1)} (C-UNIT-R, C-FRAME)
𝑃 ∗ 𝑒 $∼ 𝛽 ⊩ wp [1: (if e then 𝑡1 else 𝑡2)] {C𝛽 𝑏. 𝑄(𝑏 � 1)} (C-WP-SWAP)

□