NONSTANDARD MODELS OF THE WEAK SECOND ORDER THEORY OF ONE SUCCESSOR A Dissertation Presented to the Faculty of the Graduate School of Cornell University in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy by Thomas Robert Kern May 2016 c 2016 Thomas Robert Kern ALL RIGHTS RESERVED NONSTANDARD MODELS OF THE WEAK SECOND ORDER THEORY OF ONE SUCCESSOR Thomas Robert Kern, Ph.D. Cornell University 2016 In this paper, we seek to classify the nonstandard models of the weak (monadic) second order theory of one successor (WS1S), the theory of the two-typed structure consisting of natural numbers and finite sets of natural number equipped with a successor (+1) operation, and membership relation between the two types. This will require an automata-theoretic approach. Chapter 1 will provide an introductory background to the intersection of automata theory and model theory. In chapter 2, we use the Krohn-Rhodes Theorem and an observation about the powerset determinization construction to prove what will be essentially be a quantifier-elimination type result for our theory, although since the result is of independent interest, it will be presented in a more general automata-theoretic context as a generating set for the regular functions under composition. In chapter 3, we provide an axiomatization for WS1S. In chapter 4, we apply this axiomatization to prove a number of key theorems regarding nonstandard models of WS1S. This includes a classification of the possible first order types, tools for completely classifying nonstandard models, an exploration of countable nonstandard models with the simplest nontrivial first order type, and a tool for producing new nonstandard models given old nonstandard models. BIOGRAPHICAL SKETCH Thomas Kern received a Bachelor of Arts from Dartmouth College in 2009. There he served in various leadership positions for the Dartmouth Math Society. He participated in the Budapest Semesters in Mathematics program in Fall of 2007. This thesis represents his efforts towards obtaining a Ph.D. in Mathematics from Cornell University in May 2016. He will additionally be receiving a Masters in Computer Science from Cornell University in May 2016. iii To those who helped me along my journey, and to those who kept me company along the way. iv ACKNOWLEDGEMENTS This material is based upon work supported by the National Science Foundation Graduate Research Fellowship under Grants No. DMS-0852811 and DMS1161175. This work would not have been possible without encouragement and guidance from my advisor, Anil Nerode. I would also like to thank Scott Messick for valuable discussions on the theory of finite automata. v TABLE OF CONTENTS Biographical Sketch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii Dedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v Table of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix 1 Introduction 1 2 Generating the Regular Functions and Krohn-Rhodes 3 2.1 Moore Machines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 2.2 Regular Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 2.3 Determinization and Harvesting . . . . . . . . . . . . . . . . . . . 13 2.4 Length Modification . . . . . . . . . . . . . . . . . . . . . . . . . . 21 2.5 The Krohn-Rhodes Theorem . . . . . . . . . . . . . . . . . . . . . . 24 2.6 A Further Breaking Down . . . . . . . . . . . . . . . . . . . . . . . 36 2.7 Reverse Moore Machines . . . . . . . . . . . . . . . . . . . . . . . . 40 2.8 Removing RASn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 2.9 Further Research . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 3 Axiomatizing the Weak Second Order Theory of One Successor 48 3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48 3.2 Axiomatizing (ω, <) . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 3.3 Basic Set Axioms . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52 3.4 Logical Automata . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57 3.5 Basic Consequences of Our Axioms . . . . . . . . . . . . . . . . . . 61 3.6 Determinization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65 3.7 Complementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72 3.8 Conjunction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72 3.9 Bounding and Unbounding . . . . . . . . . . . . . . . . . . . . . . 73 3.10 Permutation and Introduction of Variables . . . . . . . . . . . . . 78 3.11 Projection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81 3.12 Base Case Automata . . . . . . . . . . . . . . . . . . . . . . . . . . 83 3.13 The Weak Exclusively Second Order Theory of One Successor . . 84 3.14 0-ary Automata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87 3.15 Finitely Many Axioms Do Not Suffice . . . . . . . . . . . . . . . . 88 3.16 Conclusion and Future Research . . . . . . . . . . . . . . . . . . . 92 4 Regarding Nonstandard Models of the Weak Second Order Theory of One Successor 94 4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94 4.2 Recalling our Axiomatization . . . . . . . . . . . . . . . . . . . . . 96 vi 4.3 L-Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99 4.4 Minimal L-Models . . . . . . . . . . . . . . . . . . . . . . . . . . . 103 4.5 The Tail-Head Lemma . . . . . . . . . . . . . . . . . . . . . . . . . 110 4.6 Countable 1-Models . . . . . . . . . . . . . . . . . . . . . . . . . . 116 4.7 Cut-And-Paste Models . . . . . . . . . . . . . . . . . . . . . . . . . 129 4.8 Conclusion and Further Research . . . . . . . . . . . . . . . . . . . 134 A Appendix A: List of Axioms 137 Bibliography 138 vii LIST OF TABLES 2.1 Sample Application of an Ordinal Removal Sequence . . . . . . . 32 2.2 Comparing ASn and RASn on Sample Input . . . . . . . . . . . . 43 2.3 The Action of RTrunc on Cwpair(Mask(w), ASn(w)) . . . . . . . . . 44 2.4 Producing RASnTrunc on Sample Input . . . . . . . . . . . . . . . . 45 3.1 Transition Diagram for the Determinization and Harvester Near the Dummy State . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70 3.2 Transition Table for ASub . . . . . . . . . . . . . . . . . . . . . . . . 83 3.3 Transition Table for ASuc . . . . . . . . . . . . . . . . . . . . . . . . 84 4.1 Transition Table for Bit . . . . . . . . . . . . . . . . . . . . . . . . 119 4.2 Transition Table for RBit . . . . . . . . . . . . . . . . . . . . . . . . 119 viii LIST OF FIGURES 2.1 Alignment Convention for Moore Machines . . . . . . . . . . . . 5 2.2 Behavior of Tuplefy on Sample Input . . . . . . . . . . . . . . . . 10 2.3 Alignment Convention for Reverse Moore Machines . . . . . . . 15 2.4 Transition Diagram for the Automaton Mj in our Cascade . . . . 34 ix CHAPTER 1 INTRODUCTION In this series of papers, we utilize the Bu¨ chi Theorem, which formalizes the expressive equivalence between finite state automata and weak monadic second order logic, to explore model theoretic properties of the weak (monadic1) second order theory of one successor. Specifically, we will try to answer a number of questions about the nonstandard models of this theory. The study of finite automata began in the 1950s with Stephen Kleene’s work on Nerve Nets[6], an early form of neural network. Their connection to monadic second order logics was pointed out by Bu¨ chi[1] in the 1960s. Since that time, four key equivalence theorems have been proved. Respectively, finite automata, Bu¨ chi automata, finite tree automata, and Rabin automata are expressively equivalent to the weak second order theory of one successor, the second order theory of one successor, the weak second order theory of two successors, and the second order theory of two successors. The theory of one successor has first order part ω, the natural numbers, with the one successor operation +1. The theory of two successors has first order part {0, 1}∗, the set of finite strings of 0s and 1s, with operations S0 and S1 which respectively tack a 0 and a 1 onto the end of a word. A weak second order theory has, in addition to quantification over first order objects, quantification over finite sets of first order objects. The (full) second order theory has quantification over arbitrary sets of first order objects. Automata theory is particularly fruitful in terms of equivalence theorems: 1Throughout this Dissertation, we will only consider monadic second order theories, so the term monadic will be omitted. 1 notions of regular expression exist for these classes, various modifications of these automata are known to be expressively equivalent, and we have the Myhill-Nerode Theorem, which provides an abstract characterization of these languages. 2 CHAPTER 2 GENERATING THE REGULAR FUNCTIONS AND KROHN-RHODES In this chapter we seek to produce a small generating set for the regular functions, those functions of words which can be expressed in the regular word logic. The main result is that essentially any regular function can be computed by a two-pass process: first scanning the input from left to right, leaving behind some data, and then scanning the input and data left behind from right to left, outputting the value of the function as we go. We then provide an interpretation of the Krohn-Rhodes theorem which allows us to break up our generating set even further. For the convenience of the reader, a simplified version of the proof of the Krohn-Rhodes Theorem from Ginzburg [5] is also presented. 2.1 Moore Machines The Krohn-Rhodes Theorem concerns itself with finite state transducers, an abstraction of systems that: • Accept inputs from a discrete set at discrete times, • Retain some memory about previous inputs, which updates whenever an input is read, • For each input read, produce some output from a discrete set based on the input and memory. This provides us with an abstraction of synchronous (as opposed to those that update continuously), digital (as opposed to those that deal with analog values) 3 systems. Originally proved in [7], the Krohn-Rhodes theorem itself allows us to decompose arbitrary finite state transducers into a cascade of transducers from a small generating set. Computational implementations of this decomposition are available [3]. The Krohn-Rhodes Theorem can be used to analyze the rough behavior of automata, providing applications to Artificial Intelligence [4]. We formalize finite state transducers as follows: Definition. A Moore Machine is a tuple: (Σ, Q, q0, Γ, δ, ). • Σ is a finite set of input characters (alphabet). • Q is a finite set of states. • q0 ∈ Q is the initial state. • Γ is a finite set of output characters. • δ : Q × Σ → Q is the transition function. • : Σ → Γ is the output function. For convenience, we denote the map x → δ(a, x) by δa. Given an input word w ∈ Σ∗, we construct a run r and output o such that: • r[0] = q0. • r[i + 1] = δ(r[i], w[i]) for 0 ≤ i < |w|. • o[i] = (r[i]), for 0 ≤ i ≤ |w|. Where our indexing notation is such that: w = w[0], . . . , w[|w| − 1]. 4 The correct way of thinking about Moore Machines is that each input acts as a transition between one state and the next, or that each state is the state between inputs. A proper representation would have input characters half a step offset from states. Outputs are simply a product of the state the automaton is in and so should be in step with the states. However, representing the sequences of input characters and states as words requires a choice of direction to shift half a step. A considerable effort has been made to pick the option to result in the cleanest presentation. In this paper, the input character w[i] tells the device how to transition from state r[i] to state r[i + 1] 1. Example 1. Here we show below how inputs, states, and outputs, respectively, line up according to our notation. abbba q0 q1 q2 q1 q2 q3 o0 o1 o0 o1 o0 o2 Figure 2.1: Alignment Convention for Moore Machines We will typically consider Moore Machines that simply output their states: Definition. A Moore Machine is said to be transparent if Γ = Q and is the identity. In this case, we present our Moore Machine as a tuple: (Σ, Q, q0, δ). We can interpret Moore Machines as functions from input words to output words: Definition. Given a Moore Machine M = (Σ, Q, q0, Γ, δ, ), and word w ∈ Σ∗, we denote by M (w) the output of M on input w, in Γ∗. 1This is as opposed to the input character w[i] telling the device how to transition from state r[i − 1] to state r[i]. This alternative is not uncommon. 5 Moore Machines, however, represent only a small subset of those functions on words which we can reason about using finite automata. Definition. Given two alphabets, Σ, Γ, a function on words f : Σ∗ → Γ∗ is said to be: • length-preserving if for any word w, |f (w)| = |w|. • causal if f (w)[i] depends only on w[0], . . . , w[i]. • strictly causal if f (w)[i] depends only on w[0], . . . , w[i − 1]. • character-wise if f (w)[i] depends only on w[i]. Definition. Define the following useful functions for dealing with words: • Let Sa denote the successor-a function which appends the character a to the end of a word. • Let Trunc denote the function which removes the last character of a word. • Let Rest denote the function which removes the first character of a word. Proposition 1. Given a Moore Machine M , the function M computes is strictly causal, and increases length by 1. We may find it more convenient to deal with length preserving versions of this function: Definition. Given a Moore Machine M = (Σ, Q, q0, Γ, δ, ), and word w ∈ Σ∗, we denote by: • M Trunc(w) the output of M on input w with the last state removed. • M Rest(w) the output of M on input w with the first state (the start state) removed. 6 Proposition 2. Given a Moore Machine M , the function M Trunc is strictly causal and length preserving, and the function M Rest is causal and length preserving. In the next section, we will explore other kinds of functions of which can be reasoned about using finite automata. 2.2 Regular Functions The notion of finite automaton, or finite state recognizer, is more commonly studied than the finite state transducer. Finite state automata are an abstraction of systems that: • Accept inputs from a discrete set at discrete times, • Retain some memory about previous inputs, which updates whenever an input is read, • Having finished reading a sequence of inputs, either accepts or rejects. Just as finite state transducers can be interpreted as functions on words, finite state automata can be interpreted as predicates on words, returning a boolean value after having read in a word. Their predicative nature means that finite state automata are more convenient to use in applications to formal logic. On the other hand, real world systems are more often interested in transforming inputs, and so are better represented by finite state transducers. Finite automata give us a notion of a regular set or regular event, a collection of words or sequences of inputs which are exactly those which some finite 7 automaton accepts. Once we define what it means to represent a function of words with an automaton, this will give us a notion of regular function. We formalize finite state recognizers as follows: Definition. A finite state automaton is a tuple: (Σ, Q, I, δ, F ). • Σ is a finite set of input characters (alphabet). • Q is a finite set of states. • I ⊆ Q is the set of initial states. • δ : Q × Σ → Q is the transition relation. • F ⊆ Q is the set of final states. Given an input word w ∈ Σ∗, we say that r ∈ Q∗ is a run on input w if: • r[0] ∈ I. • δ(r[i], w[i], r[i + 1]) for 0 ≤ i < |w|. We say that w is accepted if there is a run r on input w such that r[|w|] ∈ F . Definition. A finite state automaton A = (Σ, Q, I, δ, F ) is deterministic if: • I is a singleton. • δ is a function from Q × Σ → Q, that is, given a q ∈ Q, and a ∈ Σ, there is a unique q ∈ Q such that δ(q, a, q ). By default, we say that A is nondeterministic. 8 A well known theorem of finite automata is that: Proposition 3. If R is the set of accepted inputs of some automaton, then it is also the set of accepted inputs of some deterministic automaton. In either case, we say that R is regular. A common convention in logic is to identify a function f with the relation Rf which consists of all pairs of the form (x, f (x)), or, for n-ary functions, (x0, . . . , xn−1, f (x0, . . . , xn−1)), for x in the domain of f . As such, we will be able to define a regular function as a relation which is regular and also a function. The question now is how to input multiple words, especially multiple words of different lengths, to a finite automaton. We introduce the Tuplefy map to merge words together in parallel so they can be read by an automaton. For words of different lengths, we add a dummy character #. Definition. Define the map Tuplefy : Σ∗0 × · · · × Σ∗n−1 → ((Σ0 ∪ {#}) × · · · × (Σn−1 ∪ {#}))∗, Which satisfies:  wj [i] Tuplefy(w0, . . . , wn−1)[i]j = # it exists otherwise and |Tuplefy(w0, . . . , wn−1)| = maxi |wi|. Example 2. Here we show below how Tuplefy combines words together to one word: 9 w0 w1 w2 w3 w4 Tuplefy(w0, w1, w2, w3, w4) a a b a a a #   b   a b b b a b b #   b   a b b a b # #   b   a a a a # # # a a # # # # a Figure 2.2: Behavior of Tuplefy on Sample Input Now we can define the notion of regular relation and regular function: Definition. An n-ary relation R ⊆ Σ∗0 × · · · × Σ∗n−1 is regular if there is a finite automaton A with input alphabet (Σ0 ∪ {#}) × · · · × (Σn−1 ∪ {#}) such that: R(w0, . . . , wn−1) ⇐⇒ A accepts Tuplefy(w0, . . . , wn−1). An n-ary function f : Σ∗0 ×· · ·×Σ∗n−1 → Σ∗n is regular if there is a finite automaton A with input alphabet (Σ0 ∪ {#}) × · · · × (Σn ∪ {#}) such that: f (w0, . . . , wn−1) = wn ⇐⇒ A accepts Tuplefy(w0, . . . , wn). Regular relations and functions are a key part of the analysis of various automaton logics. For instance: Definition. Given a finite alphabet Σ, let WΣ = (Σ∗, ≤, =el, Sa|a∈Σ) where: • ≤ is the prefix relation on words, • =el is the equal length relation on words, 10 • Sa is the Successor-a unary operation, which appends an a onto the end of a word. We call WΣ the regular word logic over Σ. Proposition 4. If R is a relation on Σ∗, the following are equivalent: • R is regular, • R is given by a formula φ in the language of WΣ. Hence our interest in regular functions. If φ(x0, . . . , xn) is a formula in the language of WΣ such that: ∀x0, . . . , xn−1∃xn : φ(x0, . . . , xn), Then, since lexicographic ordering 1, we can also make sense of Cwf . Let f : Σ0 × · · · × Σn−1 → Γ. Then we define the partial function Cwf : Σ∗0 × · · · × Σ∗n−1 → Γ∗, which takes in inputs of all the same lengths and produces an output of the same length, where: Cwf (w0, . . . , wn−1) = Cwf (Tuplefy(w0, . . . , wn−1)). Noting that Tuplefy does not produce characters with # in them for equal-length inputs. Of course, Cwf is causal and length preserving (and reverse-causal, when we define the notion). Every Moore Machine function can be written as the corresponding function for the corresponding transparent Moore Machine composed with a bitwise application of its function. We now define some notation for dealing with Reverse Moore Machines, analogous to our notation established previously. Definition. A Reverse Moore Machine is a tuple: (Σ, Q, qf , Γ, δ, ). 14 • Σ is a finite set of input characters (alphabet). • Q is a finite set of states. • qf ∈ Q is the final state. • Γ is a finite set of output characters. • δ : Q × Σ → Q is the reverse transition function. • : Σ → Γ is the output function. Given an input word w ∈ Σ∗, we construct a run r and output o such that: • r[|w|] = qf . • r[i] = δ(r[i + 1], w[i]) for 0 ≤ i < |w|. • o[i] = (r[i]), for 0 ≤ i ≤ |w|. Example 3. Here we show below how inputs, states, and outputs, respectively, line up according to our notation. abbba q0 q1 q2 q1 q2 qf o0 o1 o0 o1 o0 o2 Figure 2.3: Alignment Convention for Reverse Moore Machines To prevent type mismatches, we will denote a Reverse Moore Machine with letters R, P as opposed to letters M, N for Moore Machines. Definition. A Reverse Moore Machine is said to be transparent if Γ = Q and is the identity. In this case, we present our Reverse Moore Machine as a tuple: (Σ, Q, qf , δ). 15 As before, we can interpret Reverse Moore Machines as functions from input words to output words: Definition. Given a Reverse Moore Machine R = (Σ, Q, qf , Γ, δ, ), and word w ∈ Σ∗, we denote by: • R(w) the output of R on input w. • RTrunc(w) the output of R on input w with the last state removed. • RRest(w) the output of R on input w with the first state (the start state) removed. Analogous to our notions of causal and strictly causal, we have notions of reverse causal and strictly reverse causal: Definition. Given two alphabets, Σ, Γ, a function on words f : Σ∗ → Γ∗ is said to be: • reverse causal if f (w)[i] depends only on w[i], w[i + 1], . . .. • strictly reverse causal if f (w)[i] depends only on w[i + 1], w[i + 2], . . .. Of course, a function is character-wise iff it is causal and reverse causal. We also have a notion of reverse deterministic automaton: Definition. A finite automaton A = (Σ, Q, I, δ, F ) is reverse deterministic if: • F is a singleton. • For each q, a, there is a unique q such that (q , a, q) ∈ δ. A reverse deterministic automaton can also be viewed as a transparent Reverse Moore Machine. We now have the notation to state our Theorem: 16 Theorem 1. Any length-preserving regular function can be written as a multivariable composition of the form: f (w) = RTrunc(CwPair(w, M Trunc(w))). It’s worth noting that CwPair has the same action here as Tuplefy. We’ve chosen to use CwPair here to indicate that we’re not using the length-padding features of Tuplefy. We need to prove a few lemmas first. Lemma 1. Given a length-preserving regular function f : Σ∗ → Γ, there is an automaton B with state set Q and map : Q → Γ such that f (w) is Trunc ◦ Cw applied to any run of B on input w. Proof. Given our length-preserving function f : Σ∗ → Γ∗, let: C = ((Σ ∪ {#}) × (Γ ∪ {#}), Q, I, δ, F ) Witness the regularity of f . Since C automatically rejects any inputs with a # in them, we can restrict it to: A = (Σ × Γ, Q, I, δ, F ) Recognizing the same set of inputs. Construct an automaton: B = (Σ, Q × Γ, I × Γ, δ , F × Γ), Where: ((q, o), a, (q , o )) ∈ δ ⇐⇒ (q, (a, o), q ) ∈ δ 17 B is nondeterministic. I assert that runs of B on input w will necessarily have Γ component So(f (w)), for some o ∈ Γ. Firstly, it should be clear that if r is an accepting run of A on input Tuplefy(w, f (w)), then Tuplefy(r, So(f (w))) is an accepting run of B on input w for any o ∈ Γ. Now, suppose Tuplefy(r , g) is an accepting run of B on input w. Then it is easy to check that r is an accepting run of A on input Tuplefy(w, Trunc(g)). Since A witnessed f being a regular function, Trunc(g) must be f (w). This completes the proof for taking the Γ component of the states of B. Definition. Given a nondeterministic automaton A = (Σ, Q, I, δ, F ), define its determinization: det(A) = (Σ, P(Q), {I}, δ , {E ⊂ Q : E ∩ F = ∅}), Where: δ (K, a) = {q ∈ Q|∃q ∈ K : δ(q, a, q )}. Typically throughout this chapter we will be concerned with the determinization as a transparent Moore Machine, so the set of final states doesn’t matter much. Definition. Given a nondeterministic automaton A = (Σ, Q, Iδ, F ), define its harvester: harv(A) = (Σ × P(Q), Q ∪ {qF }, Q ∪ {qF }, δ , {qF }), Where: • The q such that δ (q, (a, K), q ) (for q ∈ Q) is given by the least q ∈ K such that δ(q , a, q ) or qF if none exist. 18 • The q such that δ (q, (a, K), qF ) is given by finding the least q ∈ F such that there is a q ∈ K such that δ(q , a, q ) and then having q be the least q ∈ K such that δ(q , a, q ). If no such q ∈ F exists, then q is given by qF . The powerset determinization of an automaton A produces an automaton that keeps track of, at every position, the set of states of A which are reachable through some sequence of transitions, having read the input up to that point. However, not every one of these reachable states necessarily shows up in some accepting run: it may be that being in one state now means later on having to be in another state which we cannot transition out of, or that being in a state now dooms us to being in a reject state once we have finished reading the input. In order to use our determinization to find an accepting run of the original automaton, provided there is an accepting run, we need to start at the end and work our way backwards, all the while staying within states that we know can be traced through a sequence of transitions back to a start state at the beginning. Specifically, if we know we’re in a state q which is reachable through some sequence of transitions after having read Sa(w), then there must be at least one state q which is reachable through some sequence of transitions after having read w such that reading a takes us from state q to state q. It is necessary to introduce an additional dummy state qF to start out in to make sure our harvester automaton is reverse-deterministic. Although our application of the harvester automaton will see only pairs of the form (a, K) for a some symbol being read by our original automaton and K the set of states reachable immediately prior to reading that specific a, our automaton should be prepared to read in arbitrary input pairs. An invalid input will cause the reverse deterministic automaton to go into the dummy qF state. Additionally, 19 a cheap fix is necessary to account for the fact that we don’t know what state the run we’re trying to produce ends on and we need a single “final” state for our reverse deterministic automaton to “start out” in. The dummy qF state represents all final states which are reachable. Truncating the run will remove this dummy state. Lemma 2. Suppose we have a nondeterministic automaton A, and valid input w. Let p be the run of det(A) on input w. I claim that harv(A) on input Tuplefy(w, Trunc(p)) will have run r, an accepting run of A on input w. Proof. It suffices to show that the only occurrence of qF in r is as the final character. By construction, harv(A) will satisfy the transition relations. As mentioned before the construction of harv(A) also prevents backwards transitioning into the qF state for this particular input, since a reachable state can always be traced back to a reachable state. Combining our two lemmas, given a regular, length-preserving function f , we have a nondeterministic automaton B which takes in a word w and has a run r which projects to f (w). By our second lemma: r = harv(B)Trunc(CwPair(w, det(B)Trunc)) By modifying the outer Reverse Moore Machine, we can throw in the appropriate projection to its output map to produce f (w). This completes the proof. This is a remarkable result. Every length-preserving regular function can be computed in a two-step process: one pass forwards through the input leaving behind some information as we do, then a pass backwards through the input with this additional information to directly produce the output. Two passes suffice; having more passes doesn’t increase our expressive power. 20 2.4 Length Modification We’re very close to generating all the regular functions, but we’re not quite there. The only functions we have dealt with so far were length-preserving. In this section, we see that most of the interesting behavior was already captured in the length-preserving case. Proposition 7. Suppose f : Σ → Γ is a regular function. Then there is a fixed constant c associated to f such that f (w) is no longer than c + |w| for every w. The proof is based on the pumping lemma. Proof. Let A be a deterministic automaton with c states accepting exactly words of the form Tuplefy(w, f (w)). Choose a specific w and suppose f (w) is longer than c + |w|. Imagine what happens as A reads in Tuplefy(w, f (w)), specifically,  # after w is finished, and A is reading in characters of the form   for some  o o ∈ Γ. Because there are more positions like this than there are states of A, by the pigeonhole principle some two positions will have the same state, say at positions i and j. Note however that if we remove all positions in Tuplefy(w, f (w)) between i and j (including i, excluding j) we still have an accepting run, of the form Tuplefy(w, g) for some g strictly shorter than f (w). This contradicts our assumption that A accepted exactly words of the form Tuplefy(w, f (w)). From this, we can also show that for any n-ary regular function f , there is a fixed constant c associated to f such that f (w0, . . . , wn−1) is no longer than c plus the length of the maximum input. 21 Note that the automaton A in our proof never encounters the input character  #  . As such, we may assume that this character acts as the identity on the  # states of A and still have an automaton witnessing the regularity of f . This automaton recognizes all pairs of the form Tuplefy(S#c (w), f (w)). (S#c simply represents a c-fold composition of the S# function). As such, the function which takes S#c (w) to S#d (f (w)) for d = |f (w)| − |w| is a length-preserving regular function g, and thus can be written as a composition of character-wise maps, truncated Moore Machines, and truncated Reverse Moore Machines as in Theorem 1. Since f can be written as: f (w) = Unpad(g(S#c (w))), Where Unpad removes final # characters. Note that we do not know how many final # characters there will be. For functions which reduce length, the number will be more than c and could be as much as c + |w|. One might be tempted to try to replace Unpad with some function like S#−1 (or, even less suited to the task, Trunc), which either removes a single final # or leaves the word alone if it cannot. However, since there are regular functions which take words of arbitrary length and reduce them to length 1, we need a generator that can produce unbounded shortening as well. Note that if f is n-ary for n > 1, we can use what we’ve already shown to see that: f (w0, . . . , wn−1) = Unpad(g(S#c n(w))), For some regular, length-preserving function g. As such, we have the following theorem: 22 Theorem 2. Any regular function can be written as a multivariable composition of: • Truncated Moore Machines, • Truncated Reverse Moore Machines, • Character-wise maps, • Tuplefy (allowing us to generate multiary character-wise maps), • Sa for various a, • Unpad. It’s worth noting here that our generating set is infinite. Specifically, there are an infinite number of Moore Machines and Reverse Moore Machines. There are also an infinite number of Character-wise maps, but this isn’t essential – one could use encoding methods to work purely with a single two-character alphabet (plus, optionally, the dummy character #). The infinitude of our generating set, however, is essential. The easiest way to see this is to talk about period introduction. Provided the input to a regular function has a sufficiently long periodic portion in the middle, the output of the regular function will also have a long periodic portion in the middle (it suffices to verify this of the generators above). What’s more, the period of the periodic portion of the output can only have prime factors which show up either in the periods of the periodic portions of the inputs or which are smaller than the number of states of the associated automaton to the regular function. However, one can easily build regular functions which introduce any prime factor into the periodicity of their inputs, so we must not allow any bound on the sizes of associated automata to regular functions in our generating set. We summarize this result as follows: 23 Proposition 8. Any set of regular functions which generates all regular functions under multivariable composition must be infinite. We conclude this chapter with a discussion of the Krohn-Rhodes Theorem. First, we will provide a proof of the Krohn-Rhodes Theorem adapted from Ginzburg [5]. Then we will use the Krohn-Rhodes Theorem to decompose our Moore Machine and Reverse Moore Machine generators into smaller, simpler generators. A final short discussion will clean up our set of generators, followed by proposed future research. 2.5 The Krohn-Rhodes Theorem In this section, we present the Krohn-Rhodes Theorem as adapted to the context of multivariate composition of regular functions. The original proof of the Krohn-Rhodes Theorem, in [7], was presented in terms of wreath products of semigroups. More modern presentations of the Krohn-Rhodes Theorem typically present it in terms of the cascade product of finite state transducers. The cascade product of two transducers M1, M2 is a system consisting of both machines. First, machine M2 reads in both the input to the system and the current state of M1 to update its state. When it has finished, machine M1 updates its state based only on the input to the system. Finally, an output is produced based on the states of M1 and M2. This reflects the reality of systems where updating the states of our machines takes a small but appreciable amount of time. In a well designed system, M2 should not have to wait for M1 to finish its update before it can update its state. As such, M2 uses the state of M1 prior to reading the input to update. 24 The Krohn-Rhodes Theorem separates out two extremes of behavior for finite automata. In general, reading in an input character induces a function on the states of the automaton. This function may map two states to the same state or to separate states. At one extreme, it may act as a permutation in which case it maps all states to separate states. In this case, it is possible to undo this action. We can recover the state before reading a character which acts as a permutation, provided we know which character the automaton read. At the other extreme, an input character may act as a reset in which case it maps all states to the same state. In this case all information about the previous state is lost. Definition. A Moore Machine or deterministic automaton is said to be: • A permutation automaton if each of its inputs acts as a permutation on its states, • A reset automaton if each of its inputs acts as a reset or the identity on its states, • A permutation-reset automaton if each of its inputs acts as a permutation or a reset on its states. We now state the Krohn-Rhodes theorem, in a bit of an unusual fashion: Theorem 3 (Krohn-Rhodes). Given a transparent Moore Machine M , we can write its truncated action M Trunc as a multivariable composition of truncated actions of permutation-reset Moore Machines M0, . . . , Mn−1 for n the number of states of M , and a final character-wise map f . What’s more, this composition takes on a fairly simple 25 form. Let: Then: w0 =M0Trunc(w), w1 =M1Trunc(Tuplefy(w, w0)), w2 =M2Trunc(Tuplefy(w, w0, w1)), ... wn−1 =MnT−ru1nc(Tuplefy(w, w0, . . . , wn−2)), M Trunc(w) = Cwf (w0, . . . , wn−1). Every map in this composition is length-preserving. Of course we can write this as simply one large multivariable composition of a character-wise map and truncated actions of permutation-reset transparent Moore Machines, but this is unwieldy to write down. The above also presents an efficient way of computing the composition, although readers concerned with efficiency are encouraged to look into the Holonomy decomposition [3]. Our proof is inductive: we show that for every transparent Moore Machine M , there’s another transparent Moore Machine M that keeps track of a state M is not in in a permutation-reset way. This reduces the amount of information we need to keep track of by one state, and we can keep doing this until we’ve kept track of all the information to know what state M is in. The proof in [5] allows for the possibility that we can keep track of several states our automaton M is not in in a permutation-reset way at the same time, as opposed to one at a time in the proof below. This is more efficient, but adds needless complexity to the proof. 26 The key piece of our construction is the Permutation-Reset Lemma, below. Lemma 3 (Permutation-Reset Lemma). Given two finite ordered sets of the same size I, J, and map between them f , there is a map g : I → J such that: • g either acts as: – A bijection from I to J (a permutation on the position indices), – Or has singleton image (a reset on position indices), • And for x, y ∈ I, with x = y, we have f (x) = g(y). • For any x ∈ I, we have f maps elements of I \ {x} to J \ {g(x)}. Proof. Suppose f does act as a bijection. Then g = f is a permutation and satisfies the inequality condition. Suppose f does not act as a bijection. Then g which maps everything to the smallest element of J which is not in the image of f has singleton image and satisfies the inequality condition. The third condition is just a rephrasing of the second, but will come in handy later on. A specific application of the Permutation-Reset Lemma is that we can have a transparent Moore Machine that keeps track of a state our original transparent Moore Machine is not in: Lemma 4. Given a transparent Moore Machine M = (Σ, Q, q0, δ), there is a permutation-reset transparent Moore Machine M with the same state set such that on input w, the state of M at any one time is not the state of M . 27 Proof. Assign a natural ordering to Q. Define M = (Σ, Q, q0, δ), Where q0 is the smallest element in Q which is not q0, and δ(q, a) is given by: • δ(q, a) if a acts as a permutation. • Otherwise, the smallest q ∈ Q which is not in the image of any state under the action of a. If the action of a was a permutation originally, it is still a permutation in our new automaton. This permutation not only takes us from the state our automaton is in before reading a to the state afterwards, but also from a state our automaton is not in before reading a to a state our automaton is not in afterwards. In the second case, note that the choice of q does not depend on q, so this action is a reset. Obviously, it takes us from a state our original automaton is not in before reading a to a state our automaton is not in after reading a. Lemma 5. Given transparent Moore Machines M, M as above with state sets Q, there is a third transparent Moore Machine: M = (Σ × Q, {0, . . . , |Q| − 2}, ˆı0, δ), Such that for any input w, if M on reading w winds up in state q, and M on reading w winds up in state q then M on reading Tuplefy(w, M Trunc(w)) will wind up in state ˆı, where q is the element in position2 ˆı of Q \ {q}. 2To maintain notational consistency within this paper, where ordered collections are indexed starting with 0, we refer to the first element of a set as being in position 0, and generally the i+1st element of a set as being in position i. To avoid confusion, we will endeavor to avoid using the notation “ith element”, instead using notation “the element in position i”. 28 Proof. Let M = (Σ×Q, {0, . . . , |Q|−2}, ˆı0, δ) where ˆı0 is the index of q0 in Q\{q0} and δ(i, (a, q)) is computed by: • Computing q = δ(q, a). This is the state M says that M is not in after reading a. • Computing q, the ith element of Q \ {q}. This is the state of M prior to reading a. • Return j, the index of δ(q, a) in Q \ q . The above construction is designed specifically to satisfy the conclusion. As such, we have that M Trunc(w) = Cwp(M Trunc(w), M Trunc(Tuplefy(w, M Trunc(w)))) Where p(q, i) is the ith element of Q \ {q}. Finally, we prove the Krohn-Rhodes Theorem: Proof. By induction on the number of states of M . Base Case: If M has one state, then f in the composition is 0-ary, and we can have it just output that constant state. Inductive Case: Given our M , we can write: M Trunc(w) = Cwp(M Trunc(w), M Trunc(Tuplefy(w, M Trunc(w)))), With M permutation-reset. By our inductive hypothesis, we can write M Trunc(w), which has one fewer state than M , as: M Trunc(w) = Cwf (w1, . . . , wn−1), 29 Where: w1 =M1Trunc(w) w2 =M2Trunc(Tuplefy(w, w1))) w3 =M3Trunc(Tuplefy(w, w1, w2) ... wn−1 =MnT−ru1nc(Tuplefy(w, w1, . . . , wn−2)) As such, M Trunc(Tuplefy(w, M Trunc(w))) is given by just plugging in: Where: M Trunc(Tuplefy(w, M Trunc(w))) = Cwf (w1, . . . , wn−1), w1 =M1Trunc(Tuplefy(w, M Trunc(w))), w2 =M2Trunc(Tuplefy(Tuplefy(w, M Trunc(w)), w1)), w3 =M3Trunc(Tuplefy(Tuplefy(w, M Trunc(w)), w1, w2)), ... wn−1 =MnT−ru1nc(Tuplefy(Tuplefy(w, M Trunc(w)), w1, . . . , wn−2)). Alternately, fiddling with some parentheses in the definitions of our au- tomata: M Trunc(Tuplefy(w, M Trunc(w))) = Cwf (w1, . . . , wn−1, ) 30 Where: w0 =M Trunc(w), w1 =M1Trunc(Tuplefy(w, w0)), w2 =M2Trunc(Tuplefy(w, w0, w1)), w3 =M3Trunc(Tuplefy(w, w0, w1, w2)), ... wn−1 =MnT−ru1nc(Tuplefy(w, w0, . . . , wn−2)). In which case: M Trunc(w) = Cwp(M Trunc(w), Cwf (w1, . . . , wn−1)). w0 We can combine p and f to get a single character-wise function on w0, . . . , wn−1, thus completing the induction. The proof is still straightforward if we unwind the induction. In our construction, w0 is keeping track of a state M is not in, but it may as well be keeping track of an index for a state M is not in. w1 is keeping track of an index of a state M is not in once we’ve removed the state in position w0 from Q. w2 is keeping track of an index of a state M is not in once we’ve removed the states w1 and w2 are keeping track of from Q. And so forth. We formalize this indexed removal process as follows: Definition. Given a positive integer n, an ordinal removal sequence for n is a (possibly empty) sequence of positive integers (k0, . . . , ki) satisfying: i x (S is strictly increasing) (1.5) ∀x : y : x < y < S(x) (S is a successor operation) (1.6) ∃o : ∀x : o ≤ x (< has minimal element) (1.7) ∃o : ∀x : x = o =⇒ ∃y : S(y) = x (S has image ω \ {o}) Note that it follows from (1.1) and (1.7) that the o in (1.6) and (1.7) are unique, and from (1.4) that the o in (1.6) and (1.7) are the same. We denote this element with the shorthand 0. It is worth noting that this axiomatization does not define (ω, <) uniquely. It is simply a base from which all other first order facts about (ω, <) can be proved. As mentioned in [9], this set of axioms characterizes the linear orderings of the form ω followed by Z × L ordered lexicographically (ω + ζ · L), for some linear ordering L. These linear orderings all have the same first order theory. Together, all of these axioms set up the foundation upon which we will develop the second order part of our axiomatization. 51 3.3 Basic Set Axioms In this section, we introduce some basic axioms for the second order elements of WS1S. These first few have been intentionally chosen to resemble axioms of ZFC. We will also define a notion of bitwise operation and show that our axioms suffice to show that the second order elements of WS1S are closed under all the bitwise operations we could expect them to be closed under. Axiom (Extensionality). Sets are equal if and only if they contain the same elements. Axiom (Singleton). For any x, there is a set, denoted {x}, containing only x. Definition. Given two sets A, B, we write: C = A ∪ B :≡ ∀x : x ∈ C ⇐⇒ x ∈ A ∨ x ∈ B C = A ∩ B :≡ ∀x : x ∈ C ⇐⇒ x ∈ A ∧ x ∈ B C = A \ B :≡ ∀x : x ∈ C ⇐⇒ x ∈ A ∧ x ∈/ B A ⊆ B :≡ ∀x : x ∈ A =⇒ x ∈ B Axiom (Closure Under Boolean Operations). The sets are closed under the boolean operations ∪ and \. We can interpret second order elements of WS1S with strings of 0s and 1s via their characteristic functions, that is, given a set X, we can produce the sequence s where sn = 1 iff n ∈ X. Via this interpretation, we can define a notion of bitwise map, where the membership of a number in the output depends only on which of the inputs it is a member of. Obviously the set of finite sets of natural numbers is not closed under all bitwise maps. Bitwise negation, that is, complementation, takes a finite set to a cofinite set. However, this is essentially the only obstruction. 52 Definition. We say that an n-ary boolean function f : {0, 1}n → {0, 1} is bounded if it takes (0, . . . , 0) to 0. Definition. Given an n-ary boolean function f : {0, 1}n → {0, 1}, we define the bitwise-f map Bwf : Bwf : (P(X))n → P(X), Such that: k ∈ Bwf (X0, . . . , Xn−1) ⇐⇒ f (δk∈X0, . . . , δk∈Xn−1) = 1, Where of course:  0 δk∈X = 1 k ∈/ X, k ∈ X. Definition. Given an indexed collection X of sets X0, . . . , Xn−1 we denote by X[k] the set of positions i such that k ∈ Xi: (X[k])i := {i|k ∈ Xi}. Given a tuple s ∈ {0, 1}n and indexed collection of sets X0, . . . , Xn, we denote by Xs the set of positions k where the set of indices i such that k ∈ Xi is exactly the positions of 1s in s: Xs := {k|k ∈ Xi ⇐⇒ si = 1}. Lemma 8. Given a tuple s ∈ {0, 1}n, if s = (0, . . . , 0), then Xs can be written as a combination of X0, . . . , Xn−1 using only ∪ and \. Proof. First, we note that intersection, ∩, can be written using only \: A ∩ B = A \ (A \ B). 53 From this, we can use associativity to write arbitrary intersections in terms of ∪ and \. Next, note that for any s ∈ {0, 1}n, Xj j|sj =1 Is the set of positions k where X[k] contains as a subset all the positions of 1s in s. Next, note that Xi i Is the set of positions k where X[k] is nonempty. Next, note that for any j, ( Xi) \ Xj i Is the set of positions k where X[k] is nonempty and does not contain j. As such, for any s ∈ {0, 1}n (with s = (0, . . . , 0)), (( Xi) \ Xj) j|sj =0 i Is the set of positions k where X[k] is nonempty and does not contain any positions of 0s in s. Thus it is that    Xs =  Xj ∩  (( Xi) \ Xj) . j|sj =1 j|sj =0 i Proposition 17. Any bitwise bounded boolean function can be written as a combination of ∪ and \. 54 Proof. We can simply write: Bwf (X0, . . . , Xn−1) = Xs. s|f (s)=1 As such, closure under union and difference is enough make sure that the second order objects are closed under all bitwise bounded boolean functions. Finally, we provide an axiom to ensure that all of the second order objects are bounded. There’s a subtlety here: not even our monadic second order logic can say directly that every set has (arbitrarily) finitely many elements, but it is possible to say that every set has an upper bound, which in (ω, <), is equivalent. Axiom. Every nonempty second order element has a <-maximal element. We denote the maximal element of a set X by max(X) We note that it follows from this axiom and the S is a successor axiom that every set has a downwards closure. Definition. Given a set X, we denote by its downward closure, dcl(X), the ⊆-least downwards closed set containing X as a subset. Proposition 18. It follows from our axioms that every set has a downwards closure. Proof. Given a set X, X has a <-maximal element, denote it by x. Since S(x) > x, there must be a downwards closed set Y containing x which does not contain x (otherwise we would have x ≥ S(x). I claim that Y contains all of X, since any other element z ∈ X satisfies z < x, and thus since Y contains x and is downwards closed, this means Y contains z as well by the definition of <. 55 I also claim that Y is ⊆-minimal, in that any other downwards closed set containing x must also contain something strictly larger than x. Suppose we have two downwards closed sets containing x, W = W . Without loss of generality, we may assume there is some w ∈ W with w ∈/ W . Since w is not in W , a downwards closed set containing x. Thus we have that w x, and so w > x (by totality). So there cannot be two downwards closed sets containing x as their maximal element. Finally, we note that atoms under ⊆ are actually singletons. Proposition 19 (No spurious singletons). A nonempty set X contains only itself and the emptyset as subsets iff it is a singleton {x} for some x. Proof. (⇒): Suppose we have a nonempty set X containing only itself and the emptyset as subsets. Since X is nonempty, it contains an element, say x. Suppose X also contains an element y = x. Then X \ {x} is neither X nor empty, contradicting our hypothesis. So X = {x}. (⇐): Suppose Y ⊆ {x} and Y is nonempty. Then Y must contain an element y. However, in order to be a subset, y ∈ {x}, hence y = x. Y cannot contain any other elements, as {x} doesn’t contain any other elements. A simple result, but a critical one later, since we will need to represent first order elements as singletons, and to do that we will need to be certain that singletons are describable in terms of their ⊆ behaviour alone. We summarize and number these axioms as follows: Axiom 2. The second order objects are defined by their elements, closed under bounded bitwise operations, closed under finite modification, and bounded. 56 (2.1) ∀X, Y : (∀x : x ∈ X ⇔ x ∈ Y ) ⇒ X = Y (Extensionality) (2.2) ∀x : ∃X : ∀y : y ∈ X ⇔ y = x (Singleton) (2.3) ∀X, Y : ∃Z : ∀x : x ∈ Z ⇔ (x ∈ X ∨ x ∈ Y ) (Union Closure) (2.4) ∀X, Y : ∃Z : ∀x : x ∈ Z ⇔ (x ∈ X ∧ x ∈/ Y ) (Difference Closure) (2.5) ∀X : ∃x : x ∈ X ∧ ∀y : y ∈ X ⇒ y ≤ x (Bounding) 3.4 Logical Automata In this section we will introduce a notion of logical automata, automata which have been modified to operate within WS1S. Alternately, we can think of this in terms of canonical forms for formula. WS1S has a very nice canonical form for formulas, and logical automata will provide a semantically transparent parameterization of these canonical forms. Recall that if X is an indexed collection with finite set of indices K, and s ∈ {0, 1}K, then we use the shorthand Xs to stand for {x|∀k ∈ K : sk = 1 ⇐⇒ x ∈ Xk}. Thus, we write x ∈ Xs as shorthand for:    x ∈ Xs :≡  x ∈ Xk ∧  x ∈/ Xk . k|sk =1 k|sk =0 We also use the shorthand n for the set {0, . . . , n − 1}, but never in reference to the second order element of a model of WS1S. Definition. A logical automaton is a 5-tuple A = (n, Q, I, δ, F ) where: • n is the number of inputs, potentially 0. • Q is a finite set of states. We will refer to the elements of Q via some canonical ordering as 0, . . . , |Q| − 1, but often the states will retain additional information, 57 for instance they may be pairs of states of other automata. We will also refer to the number of states of an automaton A as |A|. Often we will prove results for automata with state set m where this suffices to prove the result in general. • I ⊆ Q is the set of initial states. • δ ⊆ Q × 2n × Q is the transition relation. • F ⊆ Q is the set of final states. These automata are specifically designed to accept n-tuples of second order elements of a model of WS1S. Given an n-tuple of second order objects X = (X0, . . . , Xn−1), we say that a first order-object and a |Q|-tuple of second order objects Y = (Y0, . . . , Y|Q|−1) is a run of A on input X if: • is strictly larger than anything in any Xi (particularly, the least strict upper bound to the Xi). • Yi form a partition of dcl( ), that is, they have pairwise empty intersection and together their union is all of dcl( ). • 0 ∈ Yq for some q ∈ I. • For any 0 < x < , if x ∈ Yq and x ∈ Xs and S(x) ∈ Yq then (q, s, q ) ∈ δ. We say that such a run is accepting if additionally: • ∈ Yq for some q ∈ F . We interpret the definition of a run as follows: first order elements represent instances of time. If x ∈ Yq for some state q (and since the Yq form a partition of dcl({ }), this happens for exactly one state provided x ≤ ) we interpret this as 58 the automaton being in state q at time x. Of course, we require the automaton to be in exactly one state at any time, and we require the automaton to be in a start state at time 0. Then we require that if the automaton is in a state q at time x and it reads input s (the input being read at any one time is just a binary sequence representing whether x is in each of the inputs.) at time x, then at time S(x), the automaton should be in a state q such that q, s, and q are related by the transition relation. We also require that the automaton stop at some time , after having read all the input. Note that at time max( i∈n Xi), the state of the automaton has not yet been updated to take into account the input at time max( i∈n Xi) which by construction is nontrivial, so we require that be strictly larger than this. A design choice has been made as to whether the state at time x should be the state of the automaton before reading the input at time x or after. There are advantages and disadvantages to both, and both give equivalent notions of automata, but it seems like the former option is preferable, due to saving steps on many inductive proofs throughout this chapter. Next, given an automaton A = (n, Q, I, δ, F ), we construct a formula ΦA(X0, . . . , Xn−1) which formalizes the above. Definition. Define the formula R(n,Q,I,δ,F )(X0, . . . , Xn−1, , Y0, . . . , Y|Q|−1) in WS1S as the conjunction of: • ∀x : i∈Q j∈Q[x ∈ Yj ⇐⇒ (i = j ∧ 0 ≤ x ≤ )] • i∈I 0 ∈ Yi • ∀x : 0 ≤ x < =⇒ (i,s,k)∈δ(x ∈ Yi ∧ x ∈ Xs ∧ S(x) ∈ Yk) 59 This formalizes the notion of ( , Y0, . . . , Y|Q|−1) being a run of (n, Q, I, δ, F ) on input X0, . . . , Xn−1. Definition. Define the formula Φ(n,Q,I,δ,F )(X0, . . . , Xn−1) in WS1S as: ∃ : ∃Y0, . . . , Y|Q|−1 : = LSUB( X) ∧R(n,Q,I,δ,F )(X0, . . . , Xn−1, , Y0, . . . , Y|Q|−1) ∧ ∈ Yi. i∈F Where LSUB(X) is shorthand for the least strict upper bound for a set, either 0 for the emptyset or S(max(X)) for a nonempty set. This formalizes the notion of there being an accepting run of (n, Q, I, δ, F ) on input X0, . . . , Xn−1. We are now prepared to introduce our axiom schema for finite automata. It suffices in this case to simply specify that automata have runs on any input. We will be able to prove later theorems involving whether automata have accepting runs or not. Of course, not every automaton has a run on any input, but there are simple things we can check to ensure that every automaton has a run on any input. Definition. An automaton (n, Q, I, δ, F ) is said to be runnable if: • |I| > 0, • ∀q, s ∈ 2n : ∃q : δ(q, s, q ). Definition. An automaton (n, Q, I, δ, F ) is said to be reverse-runnable if: • |F | > 0, 60 • ∀q, s ∈ 2n : ∃q : δ(q , s, q). Note that these are assertions about the internal structure of the automaton: nothing about the runs or acceptance of the automaton. As such, the following axiom schema is nontrivial: Axiom. (3.A) If A = (n, Q, I, δ, F ) is a runnable automaton, for any sequence of inputs X0, . . . , Xn−1, Φ(n,Q,I,δ,Q)(X0, . . . , Xn−1). Axiom. (4.A) If A = (n, Q, I, δ, F ) is a reverse-runnable automaton, for any sequence of inputs X0, . . . , Xn−1, Φ(n,Q,Q,δ,F )(X0, . . . , Xn−1). This is a very specific sort of inductive axiom: if we can start constructing a run of our automaton and at any point if we have a partial run, we can find a state to transition to next, then we can construct the whole run of our automaton. In the runnable case we construct our run from start to end, and in the reverse-runnable case, we construct a run from end to start. Note that in the runnable case, we don’t have to worry about our run being accepting, and in the reverse-runnable case, we don’t have to worry about our run having the right start state, since we’ve replaced F and I respectively with all of Q. This completes our axiomatization. 3.5 Basic Consequences of Our Axioms In this section we collect some of the basic consequences of our axioms, in preparation for future sections. 61 First, we define a predecessor operation P which is meant as an inverse to S, and define the operations S , P on second order elements which are meant to apply their respective operations to each element in the set. Definition. Define  0 P (x) := y x = 0, x = S(y). Given a second order element X, define S (X) and P (X) such that: S(x) ∈ S (X) ⇐⇒ x ∈ X, and 0 ∈/ S (X), and: x ∈ P (X) ⇐⇒ S(x) ∈ X. Proposition 20. It follows from our axioms that the second order objects are closed under S . Proof. Construct the automaton B := (1, {0, 1}, {0}, δ, {0, 1}), Where: δ = {(i, j, j)|i, j ∈ {0, 1}}. Let Y be a run of B on input X. Then it is easy to check that Y1 is S (X) by looking at the transition relation. Proposition 21. It follows from our axioms that the second order objects are closed under P . 62 Proof. Construct the automaton B = (1, {0, 1} × {0, 1}, {0, 1} × {0, 1}, δ, {(0, 0)}), Where: δ = {((i, j), k, (j, k))|i, j, k ∈ {0, 1}}. Let Y be a run of B on input X (this uses (4.A)). Then it is easy to check by looking at the transition relation that Y(1,0) ∪ Y(1,1) is just X, and from this that Y(0,1) ∪ Y(1,1) is P (X). Secondly, I assert that every set has a <-least element. This is a key result that allows us to perform inductive proofs: if we want to show that something always happens for any first order element, ∀x : P (x), then it suffices to prove three things: 1. P (0), 2. P (x) =⇒ P (S(x)), 3. The set of x such that ¬P (x) is a second order element of our model4. I claim it will then follow that ∀x : P (x). Let Y be the set of x such that ¬P (x). Then Y will have a least element. This least element cannot be 0 or the successor of anything, contradicting (1.7). Typically, we will prove (3) by application of the closure axioms (2.2), (2.3), (2.4), (3.A), and (4.A). We can even show that a particular run satisfies the transition relation of an automaton using induction. In order to use induction, we need to be able to talk about the set of positions where, given an automaton A and input X, a proposed 4Or, at least, for any y, the set {x|x < y ∧ ¬P (x)} is a second order element of our model. 63 run Y fails to satisfy the transition relation δ for A. These positions are given by: [Ya ∩ Xs ∩ P (Yb)]. (a,s,b)∈/δ Typically, though, there will be a more direct proof: if X, Y are defined in such a way as to satisfy a particular transition relation, then it will usually follow directly that they satisfy the transition relation for our automaton A. Proposition 22. It follows from our axioms that every nonempty set has a <-least element. Proof. Begin with a set X. Consider the following automaton: Bit := (1, {0, 1}, {0}, δ, {0, 1}), Where: δ := {(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1)}. Now, let , Y0, Y1 be the run of Bit on input X. will be the successor of the largest element of X. Y0 will be the set of positions up to and including the first appearance of an element of the input, and Y1 will be all positions afterwards up to and one past the last appearance of an element of the input. First, I assert that if z ∈ Y0 and w < z then w ∈ Y0. Suppose not. Then there is a largest w with w < z and w ∈ Y1, the largest element of Y1 ∩ dcl({z}). This w cannot be z, since z ∈ Y0, so it must be that S(w) ≤ z. But since w was the largest element of Y1 ∩ dcl({z}), it must be that S(w) ∈ Y0, contradicting the construction of our transition relation. Y0 is nonempty since 0 is the starting state. Thus, By (2.5), Y0 has a largest element y. Since {Y0, Y1} is a partition of dcl({ }), S(y) is either in Y1 or not in dcl({ }) at all, that is, S(y) > (by totality). 64 Case 1: suppose that S(y) is in Y1. Since y ∈ Y0 and S(y) ∈ Y1, it must be by the transition relation that y ∈ X. Furthermore, I assert that there is no z < y such that z ∈ X, since otherwise by the transition relation, S(z) would be in Y1, contradicting our earlier proved downward closure of Y0. So this y is the smallest element of X. Case 2: suppose that S(y) > . Since y ∈ dcl({ }), it must be that y ≤ , thus since S is a successor operation, y = . By the downward closure of Y0, it must be that Y0 = dcl({ }). I claim in this case that X is empty, since if it contained an element w, then S(w) would be in Y1 and Y1 must be empty, since {Y0, Y1} is a partition of dcl({ }). 3.6 Determinization One goal in this chapter is to prove that the standard Bu¨ chi-Elgot-Trakhtenbrot result, equivalence of finite automata and WS1S formulas, holds for our logical automata, and that this equivalence is a consequence of our axioms. Specifically, given a formula φ, we wish to prove that there is an automaton A such that: φ(X0, . . . , Xn−1) ⇐⇒ ΦA(X0, . . . , Xn−1). We will denote this automaton Aφ. As such, much of the next few sections is dedicated to reproving classical results of automata theory, from the given axioms. Critical to the importance of finite automata is the expressive equivalence of deterministic and non-deterministic automata. While non-deterministic automata are much more natural from an atemporal, relational, formal logic stand- 65 point, being as they are simply a specific case of a much more general class of local constraint systems5, deterministic automata are much more natural from a temporal automaton perspective. Additionally, it is clear how to complement them, a critical problem in the theory of general local constraint systems. Definition. An automaton A = (n, Q, I, δ, F ) is deterministic if: • |I| = 1, • ∀q, s ∈ 2n : ∃!q : δ(q, s, q ). An automaton A = (n, Q, I, δ, F ) is reverse-deterministic if: • |F | = 1, • ∀q, s ∈ 2n : ∃!q : δ(q , s, q). Proposition 23. It follows from our axioms that every deterministic automaton A = (n, Q, I, δ, F ) has exactly one run on input X0, . . . , Xn−1 of length = LSUB( X). Proof. Since A is runnable, it follows from (3.A) that A has at least one such run. Suppose there are two runs Y and Y . Consider the set K= [Yq ∩ Yq ], q=q ∈Q The set of positions where Y and Y differ. I assert that K is empty. Suppose not, then it must have a least element. This least element cannot be 0, since |I| = 1. 5Here we refer to systems with a notion of labelling and locality where we distinguish those labellings which can be augmented to ones that in every local region satisfy some constraint versus those that cannot. A wide variety of automata are of this form, including cellular automata, but a particularly enlightening example may be unranked tree automata, whose local regions are not bounded in size. Additionally, differential equations and operations are of this form: there is a sense in which they are a special case of continuous automata. 66 Then it is S(x) for some x. Then we know the state of our runs at time x agree, but at time S(x) disagree, contradicting δ being a function from Q × 2n → Q. Definition. Given an automaton A = (n, Q, I, δ, F ), define the determinization of A, denoted det(A), as the automaton: det(A) = (n, P(Q), {I}, δ , {K ⊂ n|K ∩ F = ∅}), Where: δ (R, s, T ) : ⇐⇒ T = {q |∃q ∈ R : δ(q, s, q )}. Of course det(A) is deterministic. Proposition 24. From our axioms, we can conclude that given an automaton A = (n, m, I, δ, F ) and an input X0, . . . , Xn−1, ΦA(X) ⇐⇒ Φdet(A)(X) To summarize the proof: I will assert that at any point x, the state of det(A) will be the set of states for which there is a run of A on the same input with that state at point x. Showing that this follows from our axioms is a bit tricky: we can show that there is no specific point x where this stops happening, but our level of induction is not strong enough for this to suffice. In order to show that states don’t disappear in the run of det(A), that is, it isn’t the case that there’s a state which is reachable, but doesn’t appear in the run of det(A), one simply has to compare a run containing the state in question at the position in question to the run of det(A). Bitwise operations will give us the set of positions where states of our run do not appear in the run of det(A) and this set cannot have a smallest element, so it’s empty. 67 To show that states don’t spuriously appear in the run of det(A), that is, it isn’t the case that there’s a state which isn’t reachable, but does appear in the run of det(A) is trickier: given a state at a position in the run of det(A), we construct a run of A backwards, starting at this state, and using the run of det(A) as a guide to make sure that we stay within states that can be traced back to the starting state and not wind up in some dead-end branch. This construction is performed by an automaton H. We expect H to be reverse-runnable, but only because of our choice of input, so we need to add in a dummy state to ensure that H has a transition relation that makes it reverse-runnable, and then argue that for this particular input, the dummy state is never used. So if there is an accepting run of A, it must be the case that the run of det(A) ends in a state containing the final state of our accepting run of A, hence this run of det(A) is accepting. If there is an accepting run of det(A) then it must be the case that some accept state appears in the final state of this run, and so there’s a run of A such that this state appears as the final state. It is also worth noting that this is the essential use of axiom schema (3.A) and (4.A): asserting that the determinization has a run at all and then asserting that the auxiliary automaton H has a reverse run. Other uses of these axioms could be made into their own axioms with significantly less semantic content, but this case seems to use the full power of our schema. Proof. det(A) is deterministic, so we may consider the unique run Z on input X. I assert the following. Suppose x ∈ ZJ for J ∈ P(m). Then J is exactly the set of all states q such that there is a run Y0, . . . , Ym−1 satisfying RA(X0, . . . , Xn−1, x, Y0, . . . , Ym−1) with x ∈ Yq. Informally, the state of det(A) at any one time x is the set of states for which there is some run of A with that 68 state at time x. It should be clear that if this statement is true for x then it’s true for S(x). Suppose it’s the case that x ∈ ZJ and J is exactly the set of states q that show up at position x in some run Y of A on input X. Then by construction of δ , S(x) is in some ZK for K the set of states which potentially show up as the next states after reading the input at position x. One may construct an appropriate run by augmenting the runs Y by combining the singleton and boolean operation axioms. Now suppose it’s the case that there’s a run Y0, . . . , Ym−1 with state q appearing at position x but x ∈ ZJ for some J with q ∈/ J. Then consider the set: U := {x|∃i, J : x ∈ Yi ∧ x ∈ ZJ ∧ i ∈/ J} = Yi \ ZJ . i∈m Ji This set exists by closure properties. By our assumption, this set is nonempty, and hence has a smallest element y. By construction of det(A), this y = 0. What this means is that there is an S−1(y) which is not in U but y is in U . But again, one can easily check that if S−1(y) ∈/ U then by construction of det(A), neither is y. So the only thing that could go wrong was if there were a state q and a position x such that x ∈ ZJ for some J q, but there was no run Y of A with x ∈ Yq. That is, some state mysteriously appeared in the run of det(A) that didn’t belong there. Suppose this is the case. Consider the following automaton6: H := (n + m, m ∪ {qd}, m ∪ {qd}, δ , {q}). Where (p, s t, p ) ∈ δ if (p, s, p ) ∈ δ ∧ tp = 1 or if p = qd and there is no a such that (a, s, p ) ∈ δ and ta = 1. 6This automaton is based on the Harvester Automaton for A used in Generating the Regular Functions and Krohn-Rhodes. 69 By construction, H is reverse-runnable, and we refer to qd as the dummy state. Since H is reverse-runnable, there is a run W such that: RB(X0, . . . , Xn−1, U0, . . . , Um−1, x, W0, . . . , Wm, Wqd), Where: Ui := {x|∃J i : x ∈ ZJ } = ZJ . Ji I assert that Wqd = ∅. Suppose not, then Wqd has a maximal element w. This element is not x, since we’re in state q at that point. Label our situation as follows: let J be the state of det(A) at position w and J be the state of det(A) at position S(w) (note that this even makes sense if S(w) = x). Let s be the input at time w. So, specifically, w ∈ Xs, w ∈ ZJ , w ∈ Wqd, S(w) ∈ ZJ , and S(w) ∈ Wr. Here’s a diagram to keep track of the situation. Recall that X was the original input to our automaton A, Z was the run of det(A) on input X, and W was the run of H on input (X, U) where U was a representation of Z. w S(w) Xs ZJ J W qd r (r ) Table 3.1: Transition Diagram for the Determinization and Harvester Near the Dummy State On the one hand, because w ∈ Wqd, it must be the case that there is no state a such that (a, s, r) ∈ δ with a ∈ J. • In case S(w) = x, r must be the final state of H which we chose to be q, which we know is in J . 70 • Otherwise, r must have been obtained going backwards from another state, r , which by the transition relation for H tells us that r ∈ J . Either way, we get that r ∈ J . By construction of Z, we know that any state showing up in J can be traced back to a state in J. Specifically, there is some state a such that (a, s, r) ∈ δ with a ∈ J, a contradiction. This proves our assertion that Wqd is empty, which means that W is a valid run of A on input X up to point x (part of the definition of H was that its output would have to satisfy the transition relation for A or have a qd show up somewhere). This completes the proof of our earlier assertion, that the state of det(A) at any one time x is the set of states for which there is some run of A up to that point with that state. So, as stated earlier, if there is an accepting run of A, it must be the case that the run of det(A) ends in a state containing the final state of our accepting run of A, hence this run of det(A) is accepting. If there is an accepting run of det(A) then it must be the case that some accept state appears in the final state of this run, and so there’s a run of A such that this state appears as the final state. We will see in the next section that this means that it follows from our axioms that the standard complementation procedure of determinizing and then complementing the set of accept states works correctly. 71 3.7 Complementation Definition. Given a deterministic automaton A = (n, Q, I, δ, F ), define Swap(A) = (n, Q, I, δ, Q \ F ). Proposition 25. It follows from our axioms that if A is deterministic, and X a suitable input, then: ¬ΦA(X) ⇐⇒ ΦSwap(A)(X). Proof. It follows from determinization that there is a unique run Y of A on input X. One can determine the last state of this run by taking the maximum of Y, and determining where it came from. Either it came from F in which case ΦA(X) or it came from Q \ F , in which case ΦA(X), but not both. Definition. Given a general automaton A = (n, Q, I, δ, F ), define Comp(A) = Swap(det(A)). Proposition 26. It follows from our axioms that for any A and suitable input X: ¬ΦA(X) ⇐⇒ ΦComp(A)(X). 3.8 Conjunction In this section we show that if we have automata A and A which correspond to formulas φ, φ , then we can construct an automaton Conj(A, A ) which corresponds to the formula φ ∧ φ . Definition. Given two n-ary automata A = (n, Q, I, δ, F ) and A = (n, Q , I , δ , F ) define their conjunction: Conj(A, A ) := (n, Q × Q , I × I , δ , F × F ), 72 Where: δ = {((a, b), s, (c, d))|(a, s, c) ∈ δ ∧ (b, s, c) ∈ δ }, Proposition 27. It follows from our axioms that, given n-ary A, A and suitable X, then: [ΦA(X) ∧ ΦA (X)] ⇐⇒ ΦConj(A,A )(X). Proof. (⇒): Suppose we have runs Y witnessing ΦA(X) and Y witnessing ΦA (X). Construct run W where: W(q,q ) = Yq ∩ Yq . It’s easy to check that W witnesses ΦConj(A,A )(X). (⇐): Suppose we have run Y witnessing ΦConj(A,A )(X). Then let: Wq = Y(q,q ), q ∈Q And: Wq = Y(q,q ). q∈Q Then it’s easy to check that W witnesses ΦA(X) and W witnesses ΦA (X). 3.9 Bounding and Unbounding Just as important as the expressive equivalence between deterministic and nondeterministic automata is the expressive equivalence between bounded automata, those automata, like our implementation of logical automata, whose runs terminate immediately after reading the last nontrivial input, and unbounded automata, those automata which are allowed to continue running after 73 reading the last nontrivial input. In this section, we will define an analog of Φ for running logical automata unboundedly, and provide constructions for an equivalence between bounded and unbounded automata. Definition. Recall that we constructed a formula RA(X0, . . . , Xn−1, , Y0, . . . , Y|Q|−1) to indicate that Y0, . . . , Y|Q|−1 was a run of length of A on input X0, . . . , Xn−1. Define the formula Ψ(n,Q,I,δ,F )(X0, . . . , Xn−1) in WS1S as: ∃ : ∃Y0, . . . , Y|Q|−1 : ≥ LSUB( X) ∧R(n,Q,I,δ,F )(X0, . . . , Xn−1, , Y0, . . . , Y|Q|−1) ∧ ∈ Yi. i∈F We now define a notion of unbounding a logical automaton: Definition. Given an automaton A = (n, Q, I, δ, F ), define its unbounding: UBd(A) := (n, Q × Q, {(i, i)|i ∈ I}, δ , Q × F ), Where: δ ((a, b), s, (c, d)) : ⇐⇒ (s = 0 ∧ d = b ∧ δ(a, s, c)) ∧(s = 0 ∧ d = c ∧ δ(a, s, c)) This automaton keeps track of two things: the state of the original automaton A and the last state the automaton went into when it saw a nonempty input. Proposition 28. Given an automaton A = (n, m, I, δ, F ), and suitable input X, it is provable from our axioms that the following are equivalent: 74 1. ΦA(X), 2. ΦUBd(A)(X), 3. ΨUBd(A)(X). Proof. (1) ⇒ (2): Let Y be the accepting run as given in (1). Given our accepting run Y of A, our goal is to construct two runs, which we will then combine together to produce an accepting run of UBd(A). This first is our run Y. The second is the run above, but delayed in the case of inputs which are 0. Define the automaton D: D := (|Q| + 1, Q, I, δ , Q), Where: δ (a, r t, b) : ⇐⇒ [(t = 1 ∧ rb = 1) ∨ (t = 0 ∧ b = a)] . Let V be the run of D on input P (Y0), . . . , P (Y|Q|−1), i Xi. One then checks that, by construction, W where: W(q,q ) = Yq ∩ Vq , Is an accepting run of UBd(A) on input X. (2) ⇒ (3): This is trivial. A bounded run of UBd(A) is also an unbounded run of UBd(A). (3) ⇒ (1): Given an accepting unbounded run Y of UBd(A), let LSUB( i Xi). Then our desired run of A is W, where: Wq = Y(q,q ) ∩ dcl({ }). q = 75 One can show inductively that everywhere beyond finishing reading the input X, the second coordinate of the state UBd(A) is in is a particular accept state of A. One then argues that at position max( i Xi), UBd(A) read a nontrivial input, and so updated the second coordinate of its state to agree with the first coordinate of its state (ie, that particular accept state). If i Xi is empty, then the second coordinate of the state cannot change, so we must have started in an accept state. Hence, the run W is accepting. We now define a notion of bounding a logical automaton: Definition. Given an automaton A = (n, Q, I, δ, F ), define its bounding: Bd(A) := (n, Q, I, δ, F ), Where: F = {q ∈ Q|∃i, q0, . . . , qi−1 :q = q0 ∧qi−1 ∈ F ∧∀j : δ(qj, 0, qj+1) Proposition 29. It follows from our axioms that, given an automaton A = (n, Q, I, δ, F ), and input X, the following are equivalent: 1. ΨA(X), 2. ΨBd(A)(X), 3. ΦBd(A)(X), Proof. (1) ⇒ (2): This is clear. Any witnessing run in (1) is also a witnessing run in (2). 76 (3) ⇒ (1): One simply takes the witnessing run in (3) and augments it with the finite series of states which lead the final state of such a run to a final state of A. This augmented run is in our model by closure under finite modifications, and of course is an accepting unbounded run of A. (2) ⇒ (3): Consider an accepting run in (2). I asset that this run is all ac- cept states from the end of reading the input. Specifically, suppose we have , Y0, . . . , Y|Q|−1 with ≥ LSUB( X), RBd(A)(X, , Y), and ∈ Yf for some f ∈ F . Define the set: U := Yi \ dcl( i∈/F Where F is the set of accept states of Bd(A). X ), I assert that U , the set of all positions beyond reading the input where the state is not an accept state of Bd(A), is empty, and thus that LSUB( X) ∈ Yi for some i ∈ F , which would mean that the appropriate prefix of Y is the desired run in (3). Suppose U is nonempty. Then let u be the maximal element of U . u cannot be max( Y ), since we know that max( Y ) ∈ Yf for some f ∈ F by hypothesis. So it must be the case that S(u) is in Yf for some f ∈ F . In this case one may augment the sequence witnessing that f ∈ F to verify that whatever i gives us u ∈ Yi must also be in F , contradicting our choice of u. In the next sections, we will take advantage of our new equivalence to handle a small detail regarding free variables and existential quantification (projection). 77 3.10 Permutation and Introduction of Variables Our project so far has been to show that if we have a formula φ with free second order variables X0, . . . , Xn−1, then there is an automaton A with n inputs such that: ∀X0, . . . , Xn−1 : φ(X0, . . . , Xn−1) ⇐⇒ ΦA(X0, . . . , Xn−1). However, this isn’t quite all that we need to show. Consider the example of taking the conjunction of two formulas φ and φ , where φ has free variables X0, X1 and φ has free variables X1, X2. We construct inductively, automata A, A such that φ(X0, X1) ⇐⇒ ΦA(X0, X1) and φ (X1, X2) ⇐⇒ ΦA(X1, X2). Before we can take our conjunctions, we must first construct automata B, B that take in 3 inputs, such that φ(X0, X1) ⇐⇒ ΦB(X0, X1, X2) and φ (X0, X1, X2) ⇐⇒ ΦB(X0, X1, X2). Then we may use our main proposition from our section on conjunction to produce an automaton C such that φ(X0, X1) ∧ φ (X1, X2) ⇐⇒ ΦC(X0, X1, X2). A small detail, certainly, but one we cannot brush over if we wish to prove that our equivalence holds from our axioms. There are many ways to handle this issue indirectly, but in this section we will address the issue directly with a pair of results. 78 Definition. Given an automaton A = (n, Q, I, δ, F ), define the automaton Aug(A): Aug(A) := (n + 1, Q, I, δ , F ), Where: δ := {(a, s t, b)|(a, s, b) ∈ δ}. Proposition 30. Given an n-ary deterministic automaton A, for which an empty input takes accepting states to accepting states, the following are equivalent: 1. ΨA(X0, . . . , Xn−1), 2. ΨAug(A)(X0, . . . , Xn−1, Xn). Proof. (2) ⇒ (1) is immediate. The accepting run for (2) is the accepting run for (1). (1) ⇒ (2): The only difficult case here is if the accepting run for (1) is shorter than the new input Xn. Otherwise the accepting run for (1) is the accepting run for (2). Suppose now that we need to extend the accepting run for (1) to be longer, while still being accepting. We will prove that ΦAug(A)(X0, . . . , Xn), from which (2) will follow. Since A is deterministic, so is Aug(A). By axiom (3.A), there is a bounded run Y of Aug(A) on input X0, . . . , Xn. Since runs of deterministic automata are unique, Y is just an extension of the accepting run for (1). One then can prove by induction that since the accepting run for (1) ended in an accept state, and since the transition relation for Aug(A) on empty input takes accepting states to accepting states, Y must end in an accepting state. Corollary 1. Given an n-ary automaton A, the following are equivalent, for all X0, . . . , Xn: 79 1. ΦA(X0, . . . , Xn−1), 2. Φdet(A)(X0, . . . , Xn−1), 3. ΨUBd(det(A))(X0, . . . , Xn−1), 4. ΨAug(UBd(det(A)))(X0, . . . , Xn−1, Xn), 5. ΦBd(Aug(UBd(det(A))))(X0, . . . , Xn). Proof. All of the steps above are equivalences we’ve previously shown. It simply suffices for (3) ⇐⇒ (4) to note that any automaton in the image of UBd will take accept states to accept states, and that the unbounding of a deterministic automaton will be itself deterministic. Finally, note that in our original motivating example, we had a formula φ (X1, X2), and we wanted to, given an automaton A such that: φ (X1, X2) ⇐⇒ ΦA(X1, X2), construct an automata B such that: φ (X1, X2) ⇐⇒ ΦB(X0, X1, X2). In order to obtain such an automaton in addition to our introduction of variables construction, we also need a permutation of variables construction. Simply note that the obvious construction of appropriately permuting the transition relations works correctly: one can show that the run of one automaton, by virtue of satisfying its transition relation, satisfies the transition relation of the other on suitably permuted inputs. Having shown we can handle them, henceforth in this paper we will ignore issues of introduction and permutation of variables. In the next section, we 80 apply our unbounding construction to prove that projected automata behave correctly, provably from our axioms. 3.11 Projection In this section we seek to prove that the traditional construction for projecting the language accepted by an automaton works correctly in our axiom system. That is, we wish to show that given an automaton A, there is an automaton Proj(A) such that: ∃Xn : ΦA(X0, . . . , Xn−1, Xn) ⇐⇒ ΦProj(A)(X0, . . . , Xn−1). Definition. Given an automaton A = (n + 1, Q, I, δ, F ), define the unbounded projection: UProj(A): UProj(A) = (n, Q × {0, 1}, I × {0, 1}, δ , F × {0, 1}), Where: δ := {((a, i), s, (b, j))|i, j ∈ {0, 1}, (a, s i, b) ∈ δ}. Proposition 31. Given suitable input X0, . . . , Xn−1, the following are equivalent: 1. ∃Xn : ΨA(X0, . . . , Xn), 2. ΨUProj(A)(X0, . . . , Xn−1). Proof. (1) ⇒ (2): Suppose Xn is such that ΦA has accepting run Y on input X0, . . . , Xn. Then construct run: W(i,0) =Yi ∩ (dcl(Xn ∪ Yi) \ Xn), i W(i,1) =Yi ∩ Xn. 81 I assert that this is an accepting run for ΨUProj(A). One can check that the transition relation holds. (2) ⇒ (1): Suppose ΨUProj(A) has accepting run Y on input X0, . . . , Xn−1. Then construct: Wi =Y(i,0) ∩ Y(i,1) Xn = Y(i,1) i Then I assert that Xn is such that ΨA has accepting run W on input X0, . . . , Xn. Finally, we construct the projection operation for bounded automata: Proposition 32. Given automaton A, the following are equivalent: 1. ∃Xn : ΦA(X0, . . . , Xn), 2. ∃Xn : ΨUBd(A)(X0, . . . , Xn), 3. ΨUProj(UBd(A))(X0, . . . , Xn−1), 4. ΦBd(UProj(UBd(A)))(X0, . . . , Xn−1). As such, we refer to Bd(UProj(UBd(A))) as Proj(A). This completes our collection of inductive constructions. We now need to provide, for various possible atomic formulae, base case automata that are provably equivalent to them. 82 3.12 Base Case Automata While we’ve developed a lot of tools for constructing new automata from old, we haven’t yet constructed many automata to start with. Definition. Define the deterministic automaton ASub: ASub := (2, {s, f }, {s}, δ, {s}), Where the b such that δ(a, s, b) is given by the table below: s\a s f 00 s f 01 s f 10 f f 11 s f Table 3.2: Transition Table for ASub Proposition 33. It is provable from our axioms that X ⊆ Y iff ASub accepts the pair X, Y . Proof. (⇒): Suppose that X ⊆ Y . Then one may prove that ASub, on reading X, Y , is always in the state s, since by the transition relation, there can be no every element of the form n ∈ ω (elements in the ω-part). We also introduce the natural S operation which takes elements of the form n to n + 1 and elements of the form ∞ + z to ∞ + z + 1. Definition. A subset X of ω + ζ is said to be internally periodic of period m if there is some a ∈ ω and b ∈ Z and some set P ⊆ {0, . . . , m − 1}, such that an element of 88 ω + ζ between a and ∞ + b is in X iff it is of the form c or ∞ + c for some c congruent to an element of P modulo m. Definition. Let p be an odd prime. Let Wp denote the collection of bounded subsets of ω + ζ, which are internally periodic of some period not divisible by p. Then let Hp denote the two-typed structure: Hp := (ω + ζ, Wp, ∈, S). Lemma 9. In Hp, the downward closed sets are exactly those sets of the form {x|x < } for some ∈ ω + ζ. Proof. It is clear that sets of this form are downward closed. It suffices to show that if a second order object of Hp is downward closed, then it is of this form. Suppose the set X ∈ Wp is downward closed. Then it must be internally periodic of period 1. Since it’s in Wp it must also be bounded above by some x ∈ ω + ζ. Suppose X contains any element of the ζ part. Then by properties of Z, we can find a largest element in its ζ part, and thus a largest element w ∈ X. By downward closure this set must contain all elements of the ζ part up to w. By internal periodicity, X must contain a final segment of elements of the ω part. And by downward closure again, X must contain all elements of the ω part. Thus X is of the desired form. Suppose instead that X does not contain any element of the ζ part. By internal periodicity, it must thus be bounded above in the ω part. One can then reason that any downwards closed subset of ω which is bounded above must be of the desired form. Proposition 37. Hp is a model of axioms (1.1) through (2.5) 89 Proof. Recall that our axioms (1.1) through (1.7) classified those linear orderings which were elementarily equivalent to (ω, <). As mentioned in [9], these are the linear orderings of the form (ω + ζ · L), in particular, this includes ω + ζ, the structure we’re working in now. It suffices to prove that the construction of < in terms of downward closed sets gives us the same < as the natural ordering on ω + ζ, which follows immediately from the above lemma. One then routinely checks axioms (2.1) through (2.5): (2.1): Second order elements in Hp are sets and thus satisfy extensionality. (2.2): Singletons are bounded and internally periodic of period 1. (2.3): The union of two bounded internally periodic sets of periods a, b is bounded and internally periodic of period ab. If p a, b then p ab. (2.4): The difference of two bounded internally periodic sets of periods a, b is bounded and internally periodic of period ab. (2.5): By definition, all second order elements in Hp are bounded above. Proposition 38. Hp is a model of (3.A) for any A with fewer than p states. Proof. Given a runnable automaton A, first, restrict to an automaton A which is runnable and deterministic by removing terms from its transition relation. Any accepting run of A on suitable input X0, . . . , Xn−1 will also be a run of A. Let X0, . . . , Xn−1 be from Wp, and let us construct run Y0, . . . , Y|Q|−1. If none of the X0, . . . , Xn−1 actually intersect the ζ part, just use the theorem for the standard model of WS1S, so we may assume at least one of the Xi has an element from the ζ part. First let us construct the run restricted to the ω part. By runnability and induction on ω, we can construct the unique run restricted to the ω part. I claim 90 that this run is eventually periodic of a period not divisible by p. Since each of the inputs X0, . . . , Xn−1 restricted to their ω parts are eventually periodic of a period not divisible by p, then beyond some point x ∈ ω, all of the X0, . . . , Xn−1 will together be periodic of a period m which is a factor of the product of the periods of each of the Xi, hence also not divisible by p. Since A is deterministic, its run on this input will eventually be periodic of period k times m for some k less than or equal to the number of states of A, which is less than p. Hence, this run will have period not divisible by p. Now we need to continue the run of A into the ζ part. The Xi will together be initially periodic of period m up to some point ∞ + z where one of them first diverges from its periodic behavior. Continue the Yi up to this point by throwing ∞ + w into Yi iff elements of the periodic portion of the ω part of the Yi are congruent to w modulo the collective period of the Yi. I claim that this still satisfies the transition relation, because the identical periods in the ω part satisfied the transition relation. After the point ∞ + z, continue the run inductively just following the transition function up until the end of the input. Proposition 39. Hp is a model of (4.A) for any A with fewer than p states. The proof is similar to the above, but in reverse. Proposition 40. Hp is not a model of WS1S. Proof. One can write a formula τp(X) which says: p−1 τp(X) := 0 ∈ X∧ Si(0) ∈/ X i=1 ∧∀x : Sp(x) ∈ X =⇒ x ∈ X. 91 That is, X contains 0 and exactly every pth element, up to some point. It is a fact of WS1S that ∀x : ∃y > x : ∃X : τp(X) ∧ y ∈ X. That is, for any x, we can always find such a periodic set containing an element larger than x. However, this is clearly not true of Hp for x in the ζ part. As such, we can find models of arbitrary finite subsets of our axiom system which are not models of the whole thing. Theorem 8. WS1S is not finitely axiomatizable. Proof. Suppose we have a finite axiomatization of WS1S. Take its conjunction to produce a single axiom φ for WS1S. Since our axiom system is complete, φ must follow from some finite subset of it. For some large enough p, Hp will satisfy this finite set of axioms. So φ is true of Hp. This contradicts our assumption that φ was an axiomatization for WS1S, since it is true of a structure which is not a model of WS1S. 3.16 Conclusion and Future Research As we can see from this example, this axiom system is well suited for proving that Hp is nearly a model of WS1S. Compare with an axiom system similar to that in [10], which would require proving induction for every WS1S formula. In chapter 4, we will use this axiom system to investigate what nonstandard models WS1S has. 92 It should be clear from our discussion in section 2 that we could just as easily be studying the weak second order theory of (ω, <). One may then ask about the weak second order theories, or just the second order theories in general, of other linear orderings. Is there, for instance, a notion of automaton that takes inputs which instead of being labeled finite segments of ω, were labeled intervals of Q as a linear ordering? Do such automata provide a canonical form for weak second order formulas of (Q, <)? Perhaps such automata, if they exist, would be related to continuous automata. One may also ask whether this axiomatization is minimal. An effort has been made to avoid redundancy among axioms (1.1) through (2.5), although whether there is some residual redundancy and how many of these axioms follow from axioms (3.A) or (4.A) are still matters for study. A proof of minimality would require producing a large number of structures which satisfy various subsets of our axiomatization. Finally, we saw some examples of relations on first order elements of WS1S that essentially required the use of second order quantification in their definitions. Specifically, < which played a critical role in our axiomatization, and modular congruence modulo a fixed natural number, which played a critical role in the construction of our nearly-models Hp. One may then ask if there are any more which cannot be expressed in terms of these two. Although WS1S is not nearly as expressive as larger axiomatized theories such as Peano Arithmetic or ZFC, it is easily decidable and its nonstandard models are tractable (yet still, as we will see in the next chapter, interesting). 93 CHAPTER 4 REGARDING NONSTANDARD MODELS OF THE WEAK SECOND ORDER THEORY OF ONE SUCCESSOR In this chapter, we answer two key questions about nonstandard models of the weak second order theory of one successor (WS1S): What can their firstorder parts look like, and what can their second order parts look like for a particularly interesting case (when the first order part is ω +ζ). We will also provide some tools for constructing nonstandard models of WS1S from already known nonstandard models. This will go a long way towards establishing a complete classification of the nonstandard models of WS1S. 4.1 Introduction Recall that the weak second order theory of one successor, WS1S, is the theory of the two-typed structure G := (ω, F(ω), ∈, S), where F(ω) is the set of finite subsets of ω, ∈ is containment, and S is the successor operation on ω. As usual, lowercase variables will denote elements of the first type, uppercase variables will represent elements of the second type, and boldface variables will indicate shorthand for finite or infinite sequences of elements of the corresponding type indexed by natural numbers. Crucially, these will not be indexed by first order elements from the current model of WS1S, but standard natural numbers. This theory is of interest because it is a decidable second order theory of ω, and equally expressive to finite automata. Recall that in the previous chapter, we proved that WS1S was not finitely axiomatizable, and provided an axiomatization with schema indexed by finite 94 automata. This axiomatization is well suited to verifying that various structures satisfy it and thus are nonstandard models of WS1S. A nonstandard model of WS1S is simply a structure which satisfies all of the same sentences as the standard (or ground) model G as defined above, but which is not the standard model. Nonstandard models are a useful tool for understanding the limits of the expressiveness of a language. Our nonstandard models will clearly differ from our ground model G, but not in the WS1S formulas they satisfy. That is, the ways that our nonstandard models differ from G or from each other will not be expressible in the language of WS1S. We expect to find a large number of nonstandard models of WS1S. The Lo¨ wenheim-Skolem Theorem says that if a countable first-order theory has an infinite model, then it has one of every infinite cardinality. Since we can view WS1S as the first-order theory of the two-typed structure G, this gives us nonstandard models of WS1S of every infinite cardinality. A priori, studying nonstandard models of second order theories seems like a hefty task: in a nonstandard model H, even if one has a grasp on the collection of first order objects, the second order objects are some nebulous collection related to the first order objects via H’s interpretation of ∈, ∈H. Fortunately, nonstandard models of second order theories satisfy extensionality, so the second order elements are determined by which first order elements they ∈H-contain. This allows us to identify second order elements with the collection of their ∈Helements. So up to isomorphism, a nonstandard model of WS1S will be of the form (M1, M2, ∈, S) where M2 is a collection of subsets of M1, ∈ is genuine containment, and S is some interpretation of S on M1. Combined with our understanding of nonstandard models of (ω, <), this will give us a very constructive 95 representative for every isomorphism class of nonstandard model of WS1S. Once we have this constructive form for representatives of isomorphism classes of nonstandard models of WS1S, we need to decide which structures of this constructive form actually are nonstandard models of WS1S. We will prove that for every nonstandard model L of (ω, <), there is a nonstandard model of WS1S whose first order elements, ordered by that model’s interpretation of < form the linear ordering L. Then we will examine what second order parts are available in the simplest nonstandard case: where the first order part is the linear ordering ω + ζ. Finally, we will provide tools for cutting apart nonstandard models of WS1S and gluing them back together to produce other nonstandard models. 4.2 Recalling our Axiomatization Recall that our axiomatization from Axiomatizing the Weak Second Order Theory of One Successor made heavy use of shorthand. In this section, we reintroduce this shorthand, and present a compact list of our axioms. Definition. A set X is said to be downward closed if: dc(A) :≡ ∀X : S(x) ∈ A =⇒ x ∈ A We define a linear ordering on the first order elements by: x ≤ y :≡ ∀X : dc(X) ∧ y ∈ X =⇒ x ∈ X 96 Definition. Given two sets A, B, we write: C = A ∪ B :≡ ∀x : x ∈ C ⇐⇒ x ∈ A ∨ x ∈ B C = A ∩ B :≡ ∀x : x ∈ C ⇐⇒ x ∈ A ∧ x ∈ B C = A \ B :≡ ∀x : x ∈ C ⇐⇒ x ∈ A ∧ x ∈/ B A ⊆ B :≡ ∀x : x ∈ A =⇒ x ∈ B Recall that we had a notion of automaton: Definition. A logical automaton is a 5-tuple A = (n, Q, I, δ, F ) where: • n is the number of inputs, potentially 0. • Q is a finite set of states. We may occasionally refer to the elements of Q via some canonical ordering as 0, . . . , |Q| − 1, but more often we will let the elements of Q retain some additional structure. We will also refer to the number of states of an automaton A as |A|. Often we will prove results for automata with state set m where this suffices to prove the result in general. • I ⊆ Q is the set of initial states. • δ ⊆ Q × 2n × Q is the transition relation. • F ⊆ Q is the set of final states. Definition. Define the formula R(n,Q,I,δ,F )(X0, . . . , Xn−1, , Y0, . . . , Y|Q|−1) in WS1S as the conjunction of: • ∀x : i∈Q j∈Q[x ∈ Yj ⇐⇒ (i = j ∧ 0 ≤ x ≤ )] • i∈I 0 ∈ Yi • ∀x : 0 ≤ x < =⇒ (i,s,k)∈δ(x ∈ Yi ∧ x ∈ Xs ∧ S(x) ∈ Yk) 97 This formalizes the notion of ( , Y0, . . . , Y|Q|−1) being a run of (n, Q, I, δ, F ) on input X0, . . . , Xn−1. Definition. Define the formula Φ(n,Q,I,δ,F )(X0, . . . , Xn−1) in WS1S as: ∃ : ∃Y0, . . . , Y|Q|−1 : = LSUB( X) ∧R(n,Q,I,δ,F )(X0, . . . , Xn−1, , Y0, . . . , Y|Q|−1) ∧ ∈ Yi. i∈F Where LSUB(X) is shorthand for the least strict upper bound for a set, either 0 for the emptyset or S(max(X)) for a nonempty set. This formalizes the notion of the automaton (n, Q, I, δ, F ) accepting inputs X0, . . . , Xn−1. Of course, not every automaton will have an accepting run on every input, but we can expect certain classes of automata to at least have runs on every input: Definition. An automaton (n, Q, I, δ, F ) is said to be runnable if: • |I| > 0, • ∀q, s ∈ 2n : ∃q : δ(q, s, q ). Definition. An automaton (n, Q, I, δ, F ) is said to be reverse-runnable if: • |F | > 0, • ∀q, s ∈ 2n : ∃q : δ(q , s, q). Finally, we can present our axiom system: 98 (1.1) ∀x, y, z : x < y ∧ x < z =⇒ x < z (< is transitive) (1.2) ∀x, y : x < y =⇒ ¬(y < x) (< is antisymmetric) (1.3) ∀x, y : x < y ∨ x = y ∨ y < x (< is total) (1.4) ∀x : S(x) > x (S is strictly increasing) (1.5) ∀x : y : x < y < S(x) (S is a successor) (1.6) ∃o : ∀x : o ≤ x (< has minimal element) (1.7) ∃o : ∀x : x = o =⇒ ∃y : S(y) = x (S has image ω \ {0}) (2.1) ∀X, Y : (∀x : x ∈ X ⇔ x ∈ Y ) ⇒ X = Y (Extensionality) (2.2) ∀x : ∃X : ∀y : y ∈ X ⇔ y = x (Singleton) (2.3) ∀X, Y : ∃Z : ∀x : x ∈ Z ⇔ (x ∈ X ∨ x ∈ Y ) (Union Closure) (2.4) ∀X, Y : ∃Z : ∀x : x ∈ Z ⇔ (x ∈ X ∧ x ∈/ Y ) (Difference Closure) (2.5) ∀X = ∅ : ∃x : x ∈ X ∧ ∀y : y ∈ X ⇒ y ≤ x (Maximal Element) (3.A) ∀X0, . . . , Xn−1 : Φ(n,Q,I,δ,Q)(X0, . . . , Xn−1) (A Runnable) (4.A) ∀X0, . . . , Xn−1 : Φ(n,Q,Q,δ,F )(X0, . . . , Xn−1) (A Reverse-Runnable) 4.3 L-Models In this section, we introduce the notion of an L-model, a very constructive form of nonstandard model of WS1S that we will use as representatives for isomorphism classes of our nonstandard models. Recall that nonstandard models of (ω, <) are of the form ω + ζ · L [9]. An L-model will be one whose first order part is ω + ζ · L. Definition. A two-typed structure (M1, M2, ∈, S) is said to be an L-model if: M1 = ω + ζ · L Where ζ = (Z, <), the linear ordering of the integers, L is a linear ordering, and · is the 99 lexicographic product, so: ω + ζ · L = ω + ζ + ··· + ζ, L Specifically, M1 contains elements of the form n ∈ ω, and elements of the form (a, b) with a ∈ Z and b ∈ L, with elements from ω ordered naturally, being less than elements of the form (a, b). Elements of the form (a, b) are ordered first by second coordinate and then by first coordinate. And: 1. M2 ⊆ P(M1), and ∈ is ordinary containment between M1 and M2, 2. S is ordinary successor on ω and takes (a, b) ∈ ζ × L to (a + 1, b), 3. Every set of the form {x|x < } is in M2. 4. The only sets X in M2 which, when intersected with each ω or ζ part form an initial (possibly empty) interval of that part are of the form {x|x < }. Proposition 41. In every L-model, < as defined on M1 agrees with < as defined in terms of downwards (predecessor) closed elements. Proof. Suppose that x ≤ y ∈ ω + ζ · L, and we have a predecessor closed set X containing y. Since X is predecessor closed, when intersected with any ω or ζ part, it will form a (possibly empty) initial interval. Hence X is of the form {w|w < }. By transitivity of <, x must also be in this set. Suppose now that every predecessor closed set X containing y contains x. Since our model contains {w|w ≤ x} and this is a predecessor closed set containing x it must also contain y. Hence y ≤ x. If a structure is an L-model, that represents a significant start towards proving that it satisfies all of our axioms. 100 Proposition 42. Every L-model satisfies axioms (1.1) through (1.7), and (2.1). Proof. Our condition that < as defined in terms of downward (predecessor) closed sets agrees with ≤ as the linear ordering on ω + ζ · L will go a long way. By virtue of being a linear ordering < will satisfy axioms (1.1), (1.2), and (1.3). One checks that when we interpret < as the linear ordering on ω + ζ · L and S as the natural successor operation on ω + ζ · L, axioms (1.4) through (1.7) are satisfied. Thus, when we interpret < as the equivalent linear ordering defined in terms of predecessor closed collections, and S as the natural successor operation on ω + ζ · L, axioms (1.4) through (1.7) are satisfied. Finally, (2.1) is satisfied by virtue of condition (1) and the fact that sets satisfy extensionality. Proposition 43. G is the unique 0-model of WS1S. (We use natural numbers n for the linear order of size n whose elements are 0, . . . , n − 1.) Proof. G clearly is a 0-model (one whose first order type is ω) which models WS1S. Suppose there is another 0-model G which is a model of WS1S. Any second order element of G will be a finite set of natural numbers, so one may construct a formula of WS1S explicitly stating that that second order element is an element of our model. As such the second order elements of G must contain as a subset the second order elements of G. Furthermore, G cannot contain any second order elements which are infinite subsets of ω, since these aren’t bounded, violating our maximal element axiom. As such, G = G. Proposition 44. Every model of WS1S is isomorphic to an L-model for some L. 101 Proof. Given a model (M1, M2, ∈, S), define an equivalence relation ∼ on M1 where: x ∼ y :≡ ∃n, m : Sn(x) = Sm(y). Note that ≤, as defined in terms of downward closed (ie, predecessor closed) elements of M2, respects these equivalence classes. That is, if x ≤ y and s ∼ x and y ∼ y, then x ≤ y or x ∼ y . Pick a representative of each equivalence class, including 0 (the unique element not in the image of S) for its equivalence class, and denote the set of representatives of other equivalence classes by L. L admits a natural linear ordering inherited by ≤ on M1, as per above. Every element of M1 is thus Sn(0) or Sn( ) or S−n( ) for some ∈ L, which we will map into in our isomorphism n ∈ ω or (n, ) or (−n, ) respectively. This clearly respects S. Condition (3) holds because these sets are definable. Condition (4) is equivalent to saying that all predecessor closed sets are of the form {x|x < }, which we can write in the language of WS1S and which is true of our ground model G, so is in the theory of WS1S. We will identify any element of M2 with the set of its ∈-elements. This association is injective by extensionality. It is not surjective. This clearly respects ∈. Of course, this isn’t an assurance that there are L-models for every L. In the case of nonstandard models of the first order theories of (ω, +) and (ω, +, ·), we can still define <, and there are some very strict restrictions on what sorts of linear order types < can have. Specifically, nonstandard models of (ω, +) must have linear order type ω + ζ · L for some dense L. We will address this restriction in the next section. 102 4.4 Minimal L-Models Recall that we have a notion of modular congruence in WS1S. Given a natural number n (not an element of our model), we can construct the formula: Definition. Given an L-model (M1, M2, S, ∈) of WS1S, we write µm(X, x) :≡ ∀y : Sm(y) ≤ x =⇒ [Sm(y) ∈ X ⇐⇒ y ∈ X] . And, for n < m, Mn,m(x) :≡ ∀Y : x ∈ Y ∧ µm(Y, x) =⇒ Sn(0) ∈ Y. Proposition 45. The following are part of the theory of WS1S: 1. For all Mn,m(x) ⇐⇒ M(n+1 mod m),m(S(x)). 2. For all m ∈ ω, x ∈ ω + ζ · L, exactly one of Mi,m(x) is true for i = 0, . . . , m − 1. 3. For all m, n, exactly one of Mn,m(Si(x)) for i = 0, . . . , m − 1. 4. For all m, n, Mn,m(x) ⇐⇒ Mn,m(Sm(x)). In agreement with G, we say that a first order element x in a specific model is congruent to n modulo m if Mn,m(x). Also, note that in an L-model, specifying the congruences of the elements of the form ( , 0) is enough to determine the congruences of all elements. What’s more, once we’ve specified L and these congruences, we have enough information to throw in just those second order elements which are absolutely necessary to construct an L-model. But first, we need a language for specifying our congruences: 103 Definition. A sequence {sn}n>2∈ω is said to be a congruence type if: • sn < n • if a|b then sb ≡ sa mod a. We will denote the set of all congruence types K. Note that the sequence 0 := {0}n>2 is a congruence type, although there are continuum many other such sequences1. Once we have a particular association of congruence types to all of our elements in mind, we can define modular equivalence classes for every element of ω + ζ · L: Definition. Given L, k : L → K, define Mnk,m:  (r ≡ n mod m) Mnk,m(x) :≡ (k( )m + r ≡ n mod m) x=r∈ω x = (r, ) If Mnk,m(x), we say that x is congruent to n modulo m. Alternately, for clarity and when k is given, we write x mod m = n, with the note that m and n are natural numbers, and not general elements of our models. Note that: Mkk( )m,m((0, )), That is, (0, ) is congruent to k( )m modulo m. 1One may choose remainders modulo primes arbitrarily, and the congruence type is determined completely by remainders modulo powers of primes (via the Chinese Remainder Theorem), which must themselves be interconsistent. Remainders modulo pa determine remainders modulo pb for b < a. 104 Definition. Given a linear ordering L, a map k : L → K, a tuple x0, . . . , xn−1 of elements of ω + ζ · L, a tuple m0, . . . , mn−2 of natural numbers > 1, and a tuple p0, . . . , pn−2 of subsets where pi ⊆ {0, . . . , mi − 1}, define: Pw(k, x, m, p) := : {y ∈ [xi, xi+1)|y mod mi ∈ pi}. i This is a second order element whose behavior is piecewise periodic. Each piece is an interval of the form [xi, xi+1), the mi and pi specify the periodic behavior on each of those intervals, and the map k describes how infinite periodic behavior which is terminal in one ω or ζ component gets picked up in the next. We will construct our model consisting entirely of second order elements of this form for arbitrary x, m, and p, but for fixed k. It’s necessary that we have a clear sense of the congruence class of each of our infinite elements: in order to be a model of WS1S, it is necessary that we are not lead to believe that some infinite element is congruent to for 0 modulo 2 but 1 modulo 4, since one can write out the statement: ∀x : M0,2(x) =⇒ ¬M1,4(x). In the language of WS1S, which is true of G and hence in the theory of WS1S. Definition. Given a linear ordering L and a map k : L → K, define PPL,k as: PPL,k := {Pw(k, x, m, p)}. And HL,k as the L-model with second order type PPL,k. There’s a little bit of work showing that HL,k is an L-model, which we will do in the proposition below. 105 As we shall see, since all of these elements are describable in terms of the endpoints of the intervals x0, . . . , xn−1, they’ll wind up existing in some form or other in every L-model of WS1S. Once we specify k, we have an exact set that must appear in our L-model of WS1S. What’s more, the presence of just these elements is enough to satisfy our axioms, giving us a model of WS1S. This intuitively makes sense for our runnability schemes: given any deterministic automaton, on piecewise periodic input, we should expect a piecewise periodic run. If we have a long segment of periodic behavior of the input, after some amount of time settling in, the states of the automaton will also be periodic, of some period a factor of the original period times a number less than the number of states of our automaton. The time spent settling in is finite, so we just encode these all as pieces of period one. Theorem 9. HL,k is an L-model of WS1S. Proof. First, we need to show that HL,k is an L-model. Specifically, we need to show that it satisfies conditions (3) and (4). Condition (3) holds by construction. Suppose now that a set X is piecewise periodic and initial in every ω or ζ part. Assume there is some piece in the specification of X, [xi, xi+1) such that [xi, xi+1) ∩ X = [xi, xi+1) and [xi, xi+1) ∩ X = ∅. Consider the largest such piece. Looking in the last two periods of this piece, we can find an element x such that x ∈/ X but S(x) ∈ X, contradicting that X is initial in the ω or ζ part in which x appears, contradicting our assumptions. As such, every periodic piece of X is either the complete interval or empty. If an empty interval were followed by a complete interval, then we would again have an x ∈/ X with S(x) ∈ X, so it must be that X consists of a series of complete intervals followed by a series of empty intervals. That is, X is an interval of the form {x|x ≤ } for the weak right endpoint of the largest nonempty interval. 106 As such, we’ve already checked axioms (1.1) through (1.7) and (2.1). (2.2): It is clear that singletons are piecewise periodic. (2.3), (2.4): Let x0, . . . , xn−1 be a common refinement of the endpoints of the intervals describing piecewise periodic sets X, Y . Then X and Y can also be described as piecewise periodic with pieces [xi, xi+1). On any one particular piece [xi, xi+1), X and Y will have certain periodic behaviors of periods m, m . One can then verify that X ∩ Y and X \ Y will have periodic behavior of period some factor of mm on that particular piece. Hence, X ∩ Y and X \ Y are piecewise periodic, and thus in our model. (2.5) Given a piecewise periodic set X, we can find its largest element by looking in the last period of the last nonempty piece. (3.A) Given a runnable automaton A, let A be a deterministic automaton obtained by removing tuples from the transition relation of A. We will show that A has a run on any suitable input X0, . . . , Xn−1, and thus so does A. Given our inputs X0, . . . , Xn−1 piecewise periodic, we can find a common refinement of the endpoints of their pieces y0, . . . , yr−1, and common periodicities m0, . . . , mr−2. We can also make sure y0 = 0 by possibly adding in new empty pieces to the beginning of the specifications of our inputs. Iterating through the intervals [yi, yi+1), do the following: Step 1) Let ri : {0, . . . , mi − 1} → 2n describe the periodic behavior of the inputs such that for w ∈ [yi, yi+1), w ∈ Xj ⇐⇒ (ri(w mod mi))j = 1. Step 2) Consider the state of the automaton A just after entering the region 107 [yi, yi+1). If this is the first region, this is the start state. If this is not the first region, this was computed when we worked through the previous region. Call this state q. Let m = mi. Step 3) Consider the operation of one period (m characters) of input on states, giving us a map f : Q → Q. Step 4) Apply f to q repeatedly until a state is encountered that appeared before. This will give us a sequence of states: q = q0, q1, . . . , qs, . . . , qt−1, qt = qs. Note that t cannot be larger than |Q|, since there are only |Q| distinct states. Step 5a) In case Smt(yi) ≥ yi+1, append to our specification of each element of the run the interval [yi, yi+1) with period mt and periodic behavior just the appropriate behavior for a run of an automaton on the input in the interval [yi, yi+1). If this happens, skip step 6) and go on to the next interval, remembering the state of the automaton after reading the final character. Step 5b) Otherwise, append to our specification of each element of the run the interval [yi, Sms(yi)) with period ms and periodic behavior just the appropriate behavior for the run of an automaton on the input in the interval [yi, Sms(yi)) starting from state q. Step 6) Now, append to our specification of each element of the run the interval [Sms(yi), yi+1) with period m(t−s) and periodic behavior the appropriate behavior for the run of an automaton on the input in the interval [Sms(yi), Smt(yi)), starting from state qs. Since the state this interval starts in and the state immediately after the end are the same, this is a valid part of the run of A. Based on the 108 remainder of yi+1 modulo m(t − s), determine the state of the automaton after after reading the final character just before position yi+1. Repeat from step 2) for each piece [yi, yi+1). This will construct the desired run of A on input X0, . . . , Xn−1. The proof for (4.A) for reverse-runnable A is identical, but in reverse. Proposition 46. Every L-model of WS1S contains HL,k as a subset of its second order objects for the appropriate k. Proof. Given an L-model of WS1S, we can determine k by looking at the congruence classes of elements of the form (0, ). It then suffices to note that the elements Pw(k, x, m, p) are describable in terms of the elements x0, . . . , xn−1. Since such elements exist for every increasing sequence x0, . . . , xn−1 in the ground model G, they must exist in every nonstandard model as well. Finally, we must ask the question: how many models have we actually found? Specifically, when is HL,k isomorphic to HL ,k ? Definition. Two congruence types s, s are said to be shifts if there is some k such that sn = sn + k mod n for each n. Proposition 47. HL,k and HL ,k are isomorphic iff there is an isomorphism of linear orderings f : L → L such that k( ) and k (f ( )) are shifts for each . Proof. (⇒): Suppose HL,k and HL ,k are isomorphic. Then there is a bijection g : ω + ζ · L → ω + ζ · L and a bijection h : PPL,k → PPL ,k . Recall that we defined an equivalence relation on ω + ζ · L by: x ∼ y :≡ ∃n, m : Sn(x) = Sm(y). 109 Since g respects S, g respects these equivalence classes, giving us a bijection g : L → L . g must also respect <, since g respects <. This is the isomorphism f we’re looking for. Suppose f ((0, )) = (z, ). will be g ( ). By isomorphism, (0, ) and (z, ) will have the same modular equivalence classes, so (0, ), and (0, ) will have shifted modular equivalence classes, as desired. (⇐) is a straightforward construction of the isomorphism. Proposition 48. There are continuum many 1-models. Proof. Note that there are continuum many congruence types, and that the shift relation has countable equivalence classes. Hence there are continuum many congruences types up to shifting. The result then follows by the above proposition. Now that we’ve shown that there are nonstandard models of WS1S, we would like a tool for classifying these nonstandard models. 4.5 The Tail-Head Lemma Suppose we have a particular 1-model M of WS1S, and two second order objects X, X from M which agree on the ω part. Since the symmetric difference of X, X is an element of M , it has a least element. This means that there is some point in the ζ part such that before that point, X and X agree. Put another way, the behavior of a second order element of M on the ω part determines an initial part of its behavior on the ζ part. In this section, we formalize and expand on this result. 110 Definition. Given a linear ordering L, a cut is just a partition of L into two subsets (A, B) where A is an initial segment of L and B is a final segment of L. A cut (A, B) is said to be infinite if A has no greatest element and B has no least element. Proposition 49. Any infinite cut of ω + ζ · L is of the form (ω + ζ · A, ζ · B) for some cut (A, B) of L. Proposition 50. Given an L-model (M1, M2, ∈, S) of WS1S and an infinite cut (A, B) of M1, A, B ∈/ M2. Proof. Since (M1, M2, ∈, S) is an model of WS1S, every second order element must have a <-least and <-greatest element, which by assumption neither A nor B has. Proposition 51. Let (M1, M2, ∈, S) be an L-model of WS1S. Let (A, B) be an infinite cut of M1. Suppose X, Y ∈ M2 are second order objects, and x ∈ A and X and Y agree for all y ∈ A with y ≥ x. Then X and Y agree on some initial segment of B. Proof. Let X∆Y denote the symmetric difference of X and Y . By closure under boolean operations, X∆Y ∈ M2. Additionally, X∆Y \ {y|y ≤ x} ∈ M2, as this later set is the downward closure of the singleton {x}. This set, X∆Y \ {y|y ≤ x} ∈ M2, is either empty or has a <-smallest element. In either case, it doesn’t contain elements up to some point in B. Put another way, X and Y agree up to that point in B. Proposition 52. Let (M1, M2, ∈, S) be an L-model of WS1S. Let (A, B) be an infinite cut of M1. Suppose X, Y ∈ M2 are second order objects and x ∈ B and X and Y agree for all y ∈ B with y ≤ x. Then X and Y agree on some final segment of B. Proof. The proof is a reverse version of the above. 111 Definition. Given an L-model with an infinite cut (A, B) on its first order part, we define equivalence relations =AT (tail equivalence) and =BH (head equivalence) on M2, and more generally, on P(M1) by: X =AT Y :≡ ∃x ∈ A : ∀y ∈ A : y ≥ x =⇒ y ∈/ X∆Y, X =BH Y :≡ ∃x ∈ B : ∀y ∈ B : y ≤ x =⇒ y ∈/ X∆Y. Where ∆ denotes symmetric difference. Specifically, X and Y are tail equivalent if they agree on a final interval of A and are head equivalent if they agree on an initial interval of B. We can now reinterpret our propositions from before as saying that given an infinite cut (A, B) the tail equivalence class of a second order object determines its head equivalence class, and the head equivalence class determines its tail equivalence class. Definition. Given an L-model with an infinite cut (A, B) on its first order part, we define its tail-head function TH(A,B): TH(A,B) : M2/(=AT ) → M2/(=BH ), Such that, for any X ∈ M2, TH(A,B)([X]=AT ) = [X]=BH . Additionally, we may wish to view TH(A,B) as a partial function: TH(A,B) : P(A)/(=AT ) → P(B)/(=BH ). Following from our propositions above, we have: Proposition 53. TH(A,B) is well-defined and 1-1. 112 Definition. We say that two sets X, Y are equal or congruent mod finite (written X =∗ Y ) if their symmetric difference is finite. Theorem 10. Two second order elements X, Y of an L-model have the same head and tail equivalence classes for every infinite cut iff they are equal mod finite. Proof. (⇒): suppose we have two sets X, Y which have the same head and tail equivalence classes, but which are not equal mod finite. Consider those points in Z = X∆Y , which must be infinite. As mentioned in [9], every infinite linear ordering has an infinite ascending sequence or an infinite descending sequence. Assume Z with its inherited ordering has an infinite ascending sequence x0, . . . , xn, . . . (the argument in the descending case is similar, although it is important to note that no infinite descending sequence can intersect the ω part). Split L into two parts, A = { ∈ L : ∃n : xn > }, and B = L \ A (which may be empty, but that isn’t disallowed). It can easily be seen that (A, B) is an infinite cut. I assert that X and Y do not have the same =AT equivalence class, since given any x it’s not the case that X and Y agree on A after x by construction of A in terms of points where X and Y differ. (⇐) is immediate for each cut (A, B). There are only finitely many points to avoid, so one may easily avoid them. Finally, in this section we present the Tail-Head Lemma, which states that an L-model is determined by the set of its tail-head functions. But first, in order to prove this, we need a quick lemma about linear orderings in general: Definition. Given a linear ordering L, an infinite collection of intervals I is said to be a cut covering family if it satisfies: 113 • For each interior cut (A, B) (ie, A, B = ∅) of L, there is an interval I in I such that I contains some elements of A and some elements of B. • For the exterior cuts (∅, L) and (L, ∅), there are intervals I in I which are initial and nonempty and final and nonempty respectively. In either case, the interval is said to cover the cut. Proposition 54. For any linear ordering L and cut covering family I for L, there is a finite subcollection I ⊂ I having union all of L. Proof. Suppose we have an infinite cut covering family I for L. Consider S as the set of all x ∈ L such that the interval {y ∈ L|y ≤ x} is a subset of a finite union of elements of I. This set S is nonempty, as the interval covering the cut (∅, L) is initial, and thus alone covers a number of intervals of the form {y ∈ L|y ≤ x}. S is also an initial interval of L, since if we can cover {y ∈ L|y ≤ x} and w < x, then the same set of intervals covers {y ∈ L|y ≤ w}. Suppose S is not all of L. Consider the cut (S, L \ S). This cut is covered by some interval I, which contains an element s ∈ S and t ∈/ S. Add this interval to the finite cover for s and we get get a finite set of intervals covering up to t contradicting that t ∈/ S. Thus, S is all of L. Finally, consider the final cut (L, ∅). This is covered by a final interval I containing a point x. Thus all of L is covered by the finite set of intervals covering up to x together with I. Now we are prepared to prove the Tail-Head Lemma. Theorem 11 (Tail-Head Lemma). For given L, the set of second order elements of any L-model is determined by the set of its tail-head functions TH(A,B) for every infinite cut 114 (A, B) of ω + ζ · L. Proof. Suppose we have two L-models H = (ω + ζ · L, M2, ∈, S) and H = (ω + ζ · L, M2, ∈, S), with the same tail-head functions for every infinite cut. Let X ∈ M2. I claim that X is also in M2. This would suffice to prove the lemma. For every infinite cut (A, B), consider an element Y in M2 which agrees with X on its head and tail equivalence classes for that cut. In particular, there’s some elements a ∈ A and b ∈ B such that X ∩ [a, b] = Y ∩ [a, b]. Note that [a, b] is an element of M2 and M2 so X ∩[a, b] is an element shared in common by H and H . We collect these intervals as part of a cut covering family I with [a, b] covering the cut (A, B). For every finite cut (A, B), let a be the final element of A and b the initial element of B. Throw in the interval [a, b] to I to cover (A, B). Note that X ∩ [a, b] is in both M2 and M2. For the cut (∅, ω + ζ · L), we cover it by throwing the interval {0} into I. Note that X ∩ {0} is in both M2 and M2. For the cut (ω + ζ · L, ∅), let x be an upper bound for X. Cover our cut by throwing the interval {y|y > x} into I. Note that X ∩ {y|y > x} = ∅ is in both M2 and M2. All together, this gives us a complete cut covering family I. By the proposition above, there is a finite subcollection I ⊆ I having union all of ω + ζ · L. Take the union over all intervals I in I of X ∩ I. This is a finite union of elements of M2, hence an element of M2. However, since all of these intervals together cover all of ω + ζ · L, this union is X. Thus X is in M2, as desired. 115 One necessarily asks the question: what sorts of tail-head functions are available? Taking the simplest example, one may ask what possible tail-head functions are available for the (ω, ζ) cut for 1-models of WS1S. In the next section, we propose a partial solution. 4.6 Countable 1-Models Looking at the L-models HL,k that we constructed previously, we notice that the possible tail equivalence classes for the cut (ω, ζ · L) are just the periodic ones. Put another way, the set of all intersections of second order elements with the ω part is just the set of all eventually periodic subsets of ω. One may then wonder if, given a particular subset of ω, there is a model of WS1S that has second order elements, which, when intersected with the ω part, give us that particular subset. Or, more generally, if one has a suitably closed collection of subsets of ω, is there a model of WS1S for which that collection is exactly the set of intersections of second order elements with the ω part? One can of course get whatever subsets one wants with a compactness argument: Proposition 55. Given a collection A of subsets of ω, there is a nonstandard model of WS1S such that for every A ∈ A, there is a second order element X in the model whose intersection with ω is A. Proof. For each element A ∈ A, add the second order constant symbol CA to the language of WS1S. For each A ∈ A and n ∈ ω, add either the formula Sn(0) ∈ CA or Sn(0) ∈/ CA as appropriate to the theory of WS1S. Together this 116 produces a collection of formulas, any finite subset of which is satisfiable (if one has only finitely many restrictions on CA, one may construct a finite set of natural numbers satisfying those restrictions). By the compactness theorem, it follows that there is a model of WS1S with constants CA|A ∈ A for which the intersection CA ∩ ω = A. However, the proof of the compactness theorem makes no assurances on the sort of models being constructed. Following the proof in [8], we first complete the collection of formulas in question, and then perform a Henkin construction to get our model. If one of our sets A in A is unbounded, we will need to introduce a new first order object, call it ∞, to serve as an upper bound for CA. Due to the arbitrariness of our choices in how we complete our collection of formulas, it may be the case that we wind up requiring that CA and CA agree on elements of the form S−n(∞) for all n for A and A with different tail equivalence classes. Every time this happens, we are required to add in a genuinely new first order element ∞ to represent an upper bound for (CA∆CA ) ∩ {x : x ≤ z}, as well as elements of the form Sz(∞) for integer z. As we can see, the L for these models can, unchecked, grow very large. What if we wish to restrict our attention just to 1-Models? In case of countable A, we can still find our nonstandard model. For clarity, we will refer to elements in the ζ portion of a 1-model as ∞ + z for appropriate z. Proposition 56. Any 1-model is determined by its tail-head function TH(ω,ζ). Proof. ω + ζ only has two infinite cuts, (ω, ζ) and (ω + ζ, ∅). In the later case, we already know the tail head function. There is only one tail equivalence class, that of the empty set, since all second order elements are bounded, and it gets 117 mapped to the only head equivalence class, which is trivial. Definition. Given three sets X−, X0, X1 partitioning ω, define: Bit(X−, X0, X1) = (Y0, Y1), Such that x ∈ Yi if the largest y ≤ x which is in X0 ∩ X1 is in Xi, or 0 if none exist. Define: RBit(X−, X0, X1) = (Y0, Y1), Such that: x ∈ Yi if the smallest y ≥ x which is in X0 ∩ X1 is in Xi, or 0 if none exist. Definition. Given sets Xσ for σ ∈ Sn partitioning ω, define: Accn(X) = Y, Where Y is also a collection indexed by elements of Sn, and x ∈ Y i x (S is strictly increasing) (1.5) ∀x : y : x < y < S(x) (S is a successor) (1.6) ∃o : ∀x : o ≤ x (< has minimal element) (1.7) ∃o : ∀x : x = o =⇒ ∃y : S(y) = x (S has image ω \ {0}) (2.1) ∀X, Y : (∀x : x ∈ X ⇔ x ∈ Y ) ⇒ X = Y (Extensionality) (2.2) ∀x : ∃X : ∀y : y ∈ X ⇔ y = x (Singleton) (2.3) ∀X, Y : ∃Z : ∀x : x ∈ Z ⇔ (x ∈ X ∨ x ∈ Y ) (Union Closure) (2.4) ∀X, Y : ∃Z : ∀x : x ∈ Z ⇔ (x ∈ X ∧ x ∈/ Y ) (Difference Closure) (2.5) ∀X = ∅ : ∃x : x ∈ X ∧ ∀y : y ∈ X ⇒ y ≤ x (Maximal Element) (3.A) ∀X0, . . . , Xn−1 : Φ(n,Q,I,δ,Q)(X0, . . . , Xn−1) (A Runnable) (4.A) ∀X0, . . . , Xn−1 : Φ(n,Q,Q,δ,F )(X0, . . . , Xn−1) (A Reverse-Runnable) 137 BIBLIOGRAPHY [1] Bu¨ chi, J. R. (1990). On a Decision Method in Restricted Second Order Arithmetic. The Collected Works of J. Richard Bu¨ chi, 425-435. [2] Downey, R. G., & Fellows, M. R. (1999). Parameterized complexity. New York: Springer. [3] Egri-Nagy, A., & Nehaniv, C. (2005). Algebraic Hierarchical Decomposition of Finite State Automata: Comparison of Implementations for KrohnRhodes Theory. Implementation and Application of Automata Lecture Notes in Computer Science, 315-316. [4] Egri-Nagy, A., & Nehaniv, C. (2006). Making Sense of the Sensory Data Coordinate Systems by Hierarchical Decomposition. Lecture Notes in Computer Science Knowledge-Based Intelligent Information and Engineering Systems, 333340. [5] Ginzburg, A. (1968). Algebraic theory of automata. New York: Academic Press. [6] Kleene, S. C.: 1956, ‘Representation of Events in Nerve Nets and Finite Automata’, in C. E. Shannon and J. McCarthy (eds.), Automata Studies, Princeton University Press, Princeton, NJ, pp. 3-42. [7] Krohn, K., & Rhodes, J. (1965). Algebraic Theory of Machines. I. Prime Decomposition Theorem for Finite Semigroups and Machines Transactions of the American Mathematical Society, 450-450. [8] Marker, D. (2002). Model theory: An introduction. New York: Springer. [9] Rosenstein, J. G. (1982). Linear orderings. New York: Academic Press. [10] Siefkes, D. (1978). An axiom system for the weak monadic second order theory of two successors. Israel J. Math. Israel Journal of Mathematics, 30(3), 264-284. 138