NONSTANDARD MODELS OF THE WEAK SECOND ORDER THEORY OF ONE SUCCESSOR
A Dissertation Presented to the Faculty of the Graduate School
of Cornell University in Partial Fulﬁllment of the Requirements for the Degree of
Doctor of Philosophy
by Thomas Robert Kern
May 2016

c 2016 Thomas Robert Kern ALL RIGHTS RESERVED

NONSTANDARD MODELS OF THE WEAK SECOND ORDER THEORY OF ONE SUCCESSOR
Thomas Robert Kern, Ph.D. Cornell University 2016
In this paper, we seek to classify the nonstandard models of the weak (monadic) second order theory of one successor (WS1S), the theory of the two-typed structure consisting of natural numbers and ﬁnite sets of natural number equipped with a successor (+1) operation, and membership relation between the two types. This will require an automata-theoretic approach. Chapter 1 will provide an introductory background to the intersection of automata theory and model theory. In chapter 2, we use the Krohn-Rhodes Theorem and an observation about the powerset determinization construction to prove what will be essentially be a quantiﬁer-elimination type result for our theory, although since the result is of independent interest, it will be presented in a more general automata-theoretic context as a generating set for the regular functions under composition. In chapter 3, we provide an axiomatization for WS1S. In chapter 4, we apply this axiomatization to prove a number of key theorems regarding nonstandard models of WS1S. This includes a classiﬁcation of the possible ﬁrst order types, tools for completely classifying nonstandard models, an exploration of countable nonstandard models with the simplest nontrivial ﬁrst order type, and a tool for producing new nonstandard models given old nonstandard models.

BIOGRAPHICAL SKETCH Thomas Kern received a Bachelor of Arts from Dartmouth College in 2009. There he served in various leadership positions for the Dartmouth Math Society. He participated in the Budapest Semesters in Mathematics program in Fall of 2007. This thesis represents his efforts towards obtaining a Ph.D. in Mathematics from Cornell University in May 2016. He will additionally be receiving a Masters in Computer Science from Cornell University in May 2016.
iii

To those who helped me along my journey, and to those who kept me company along the way.
iv

ACKNOWLEDGEMENTS This material is based upon work supported by the National Science Foundation Graduate Research Fellowship under Grants No. DMS-0852811 and DMS1161175. This work would not have been possible without encouragement and guidance from my advisor, Anil Nerode. I would also like to thank Scott Messick for valuable discussions on the theory of ﬁnite automata.
v

TABLE OF CONTENTS

Biographical Sketch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii Dedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v Table of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix

1 Introduction

1

2 Generating the Regular Functions and Krohn-Rhodes

3

2.1 Moore Machines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

2.2 Regular Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

2.3 Determinization and Harvesting . . . . . . . . . . . . . . . . . . . 13

2.4 Length Modiﬁcation . . . . . . . . . . . . . . . . . . . . . . . . . . 21

2.5 The Krohn-Rhodes Theorem . . . . . . . . . . . . . . . . . . . . . . 24

2.6 A Further Breaking Down . . . . . . . . . . . . . . . . . . . . . . . 36

2.7 Reverse Moore Machines . . . . . . . . . . . . . . . . . . . . . . . . 40

2.8 Removing RASn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

2.9 Further Research . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

3 Axiomatizing the Weak Second Order Theory of One Successor

48

3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

3.2 Axiomatizing (ω, <) . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

3.3 Basic Set Axioms . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52

3.4 Logical Automata . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

3.5 Basic Consequences of Our Axioms . . . . . . . . . . . . . . . . . . 61

3.6 Determinization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

3.7 Complementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

3.8 Conjunction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

3.9 Bounding and Unbounding . . . . . . . . . . . . . . . . . . . . . . 73

3.10 Permutation and Introduction of Variables . . . . . . . . . . . . . 78

3.11 Projection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81

3.12 Base Case Automata . . . . . . . . . . . . . . . . . . . . . . . . . . 83

3.13 The Weak Exclusively Second Order Theory of One Successor . . 84

3.14 0-ary Automata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87

3.15 Finitely Many Axioms Do Not Sufﬁce . . . . . . . . . . . . . . . . 88

3.16 Conclusion and Future Research . . . . . . . . . . . . . . . . . . . 92

4 Regarding Nonstandard Models of the Weak Second Order Theory of

One Successor

94

4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94

4.2 Recalling our Axiomatization . . . . . . . . . . . . . . . . . . . . . 96

vi

4.3 L-Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99 4.4 Minimal L-Models . . . . . . . . . . . . . . . . . . . . . . . . . . . 103 4.5 The Tail-Head Lemma . . . . . . . . . . . . . . . . . . . . . . . . . 110 4.6 Countable 1-Models . . . . . . . . . . . . . . . . . . . . . . . . . . 116 4.7 Cut-And-Paste Models . . . . . . . . . . . . . . . . . . . . . . . . . 129 4.8 Conclusion and Further Research . . . . . . . . . . . . . . . . . . . 134

A Appendix A: List of Axioms

137

Bibliography

138

vii

LIST OF TABLES 2.1 Sample Application of an Ordinal Removal Sequence . . . . . . . 32 2.2 Comparing ASn and RASn on Sample Input . . . . . . . . . . . . 43 2.3 The Action of RTrunc on Cwpair(Mask(w), ASn(w)) . . . . . . . . . 44 2.4 Producing RASnTrunc on Sample Input . . . . . . . . . . . . . . . . 45 3.1 Transition Diagram for the Determinization and Harvester Near
the Dummy State . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70 3.2 Transition Table for ASub . . . . . . . . . . . . . . . . . . . . . . . . 83 3.3 Transition Table for ASuc . . . . . . . . . . . . . . . . . . . . . . . . 84 4.1 Transition Table for Bit . . . . . . . . . . . . . . . . . . . . . . . . 119 4.2 Transition Table for RBit . . . . . . . . . . . . . . . . . . . . . . . . 119
viii

LIST OF FIGURES 2.1 Alignment Convention for Moore Machines . . . . . . . . . . . . 5 2.2 Behavior of Tuplefy on Sample Input . . . . . . . . . . . . . . . . 10 2.3 Alignment Convention for Reverse Moore Machines . . . . . . . 15 2.4 Transition Diagram for the Automaton Mj in our Cascade . . . . 34
ix

CHAPTER 1 INTRODUCTION
In this series of papers, we utilize the Bu¨ chi Theorem, which formalizes the expressive equivalence between ﬁnite state automata and weak monadic second order logic, to explore model theoretic properties of the weak (monadic1) second order theory of one successor. Speciﬁcally, we will try to answer a number of questions about the nonstandard models of this theory.
The study of ﬁnite automata began in the 1950s with Stephen Kleene’s work on Nerve Nets[6], an early form of neural network. Their connection to monadic second order logics was pointed out by Bu¨ chi[1] in the 1960s.
Since that time, four key equivalence theorems have been proved. Respectively, ﬁnite automata, Bu¨ chi automata, ﬁnite tree automata, and Rabin automata are expressively equivalent to the weak second order theory of one successor, the second order theory of one successor, the weak second order theory of two successors, and the second order theory of two successors. The theory of one successor has ﬁrst order part ω, the natural numbers, with the one successor operation +1. The theory of two successors has ﬁrst order part {0, 1}∗, the set of ﬁnite strings of 0s and 1s, with operations S0 and S1 which respectively tack a 0 and a 1 onto the end of a word. A weak second order theory has, in addition to quantiﬁcation over ﬁrst order objects, quantiﬁcation over ﬁnite sets of ﬁrst order objects. The (full) second order theory has quantiﬁcation over arbitrary sets of ﬁrst order objects.
Automata theory is particularly fruitful in terms of equivalence theorems:
1Throughout this Dissertation, we will only consider monadic second order theories, so the term monadic will be omitted.
1

notions of regular expression exist for these classes, various modiﬁcations of these automata are known to be expressively equivalent, and we have the Myhill-Nerode Theorem, which provides an abstract characterization of these languages.
2

CHAPTER 2 GENERATING THE REGULAR FUNCTIONS AND KROHN-RHODES
In this chapter we seek to produce a small generating set for the regular functions, those functions of words which can be expressed in the regular word logic. The main result is that essentially any regular function can be computed by a two-pass process: ﬁrst scanning the input from left to right, leaving behind some data, and then scanning the input and data left behind from right to left, outputting the value of the function as we go. We then provide an interpretation of the Krohn-Rhodes theorem which allows us to break up our generating set even further. For the convenience of the reader, a simpliﬁed version of the proof of the Krohn-Rhodes Theorem from Ginzburg [5] is also presented.
2.1 Moore Machines
The Krohn-Rhodes Theorem concerns itself with ﬁnite state transducers, an abstraction of systems that:
• Accept inputs from a discrete set at discrete times, • Retain some memory about previous inputs, which updates whenever an
input is read, • For each input read, produce some output from a discrete set based on the
input and memory.
This provides us with an abstraction of synchronous (as opposed to those that update continuously), digital (as opposed to those that deal with analog values)
3

systems.
Originally proved in [7], the Krohn-Rhodes theorem itself allows us to decompose arbitrary ﬁnite state transducers into a cascade of transducers from a small generating set. Computational implementations of this decomposition are available [3]. The Krohn-Rhodes Theorem can be used to analyze the rough behavior of automata, providing applications to Artiﬁcial Intelligence [4].
We formalize ﬁnite state transducers as follows: Deﬁnition. A Moore Machine is a tuple: (Σ, Q, q0, Γ, δ, ).
• Σ is a ﬁnite set of input characters (alphabet). • Q is a ﬁnite set of states. • q0 ∈ Q is the initial state. • Γ is a ﬁnite set of output characters. • δ : Q × Σ → Q is the transition function. • : Σ → Γ is the output function.
For convenience, we denote the map x → δ(a, x) by δa. Given an input word w ∈ Σ∗, we construct a run r and output o such that:
• r[0] = q0. • r[i + 1] = δ(r[i], w[i]) for 0 ≤ i < |w|. • o[i] = (r[i]), for 0 ≤ i ≤ |w|.
Where our indexing notation is such that: w = w[0], . . . , w[|w| − 1].
4

The correct way of thinking about Moore Machines is that each input acts as a transition between one state and the next, or that each state is the state between inputs. A proper representation would have input characters half a step offset from states. Outputs are simply a product of the state the automaton is in and so should be in step with the states. However, representing the sequences of input characters and states as words requires a choice of direction to shift half a step. A considerable effort has been made to pick the option to result in the cleanest presentation. In this paper, the input character w[i] tells the device how to transition from state r[i] to state r[i + 1] 1.
Example 1. Here we show below how inputs, states, and outputs, respectively, line up according to our notation.
abbba q0 q1 q2 q1 q2 q3 o0 o1 o0 o1 o0 o2
Figure 2.1: Alignment Convention for Moore Machines
We will typically consider Moore Machines that simply output their states:
Deﬁnition. A Moore Machine is said to be transparent if Γ = Q and is the identity. In this case, we present our Moore Machine as a tuple: (Σ, Q, q0, δ).
We can interpret Moore Machines as functions from input words to output words:
Deﬁnition. Given a Moore Machine M = (Σ, Q, q0, Γ, δ, ), and word w ∈ Σ∗, we denote by M (w) the output of M on input w, in Γ∗.
1This is as opposed to the input character w[i] telling the device how to transition from state r[i − 1] to state r[i]. This alternative is not uncommon.
5

Moore Machines, however, represent only a small subset of those functions on words which we can reason about using ﬁnite automata. Deﬁnition. Given two alphabets, Σ, Γ, a function on words f : Σ∗ → Γ∗ is said to be:
• length-preserving if for any word w, |f (w)| = |w|. • causal if f (w)[i] depends only on w[0], . . . , w[i]. • strictly causal if f (w)[i] depends only on w[0], . . . , w[i − 1]. • character-wise if f (w)[i] depends only on w[i]. Deﬁnition. Deﬁne the following useful functions for dealing with words:
• Let Sa denote the successor-a function which appends the character a to the end of a word.
• Let Trunc denote the function which removes the last character of a word. • Let Rest denote the function which removes the ﬁrst character of a word. Proposition 1. Given a Moore Machine M , the function M computes is strictly causal, and increases length by 1.
We may ﬁnd it more convenient to deal with length preserving versions of this function: Deﬁnition. Given a Moore Machine M = (Σ, Q, q0, Γ, δ, ), and word w ∈ Σ∗, we denote by:
• M Trunc(w) the output of M on input w with the last state removed. • M Rest(w) the output of M on input w with the ﬁrst state (the start state) removed.
6

Proposition 2. Given a Moore Machine M , the function M Trunc is strictly causal and length preserving, and the function M Rest is causal and length preserving.
In the next section, we will explore other kinds of functions of which can be reasoned about using ﬁnite automata.
2.2 Regular Functions
The notion of ﬁnite automaton, or ﬁnite state recognizer, is more commonly studied than the ﬁnite state transducer. Finite state automata are an abstraction of systems that:
• Accept inputs from a discrete set at discrete times, • Retain some memory about previous inputs, which updates whenever an
input is read, • Having ﬁnished reading a sequence of inputs, either accepts or rejects.
Just as ﬁnite state transducers can be interpreted as functions on words, ﬁnite state automata can be interpreted as predicates on words, returning a boolean value after having read in a word. Their predicative nature means that ﬁnite state automata are more convenient to use in applications to formal logic. On the other hand, real world systems are more often interested in transforming inputs, and so are better represented by ﬁnite state transducers.
Finite automata give us a notion of a regular set or regular event, a collection of words or sequences of inputs which are exactly those which some ﬁnite
7

automaton accepts. Once we deﬁne what it means to represent a function of words with an automaton, this will give us a notion of regular function.
We formalize ﬁnite state recognizers as follows: Deﬁnition. A ﬁnite state automaton is a tuple: (Σ, Q, I, δ, F ).
• Σ is a ﬁnite set of input characters (alphabet). • Q is a ﬁnite set of states. • I ⊆ Q is the set of initial states. • δ : Q × Σ → Q is the transition relation. • F ⊆ Q is the set of ﬁnal states.
Given an input word w ∈ Σ∗, we say that r ∈ Q∗ is a run on input w if:
• r[0] ∈ I. • δ(r[i], w[i], r[i + 1]) for 0 ≤ i < |w|.
We say that w is accepted if there is a run r on input w such that r[|w|] ∈ F . Deﬁnition. A ﬁnite state automaton A = (Σ, Q, I, δ, F ) is deterministic if:
• I is a singleton. • δ is a function from Q × Σ → Q, that is, given a q ∈ Q, and a ∈ Σ, there is a
unique q ∈ Q such that δ(q, a, q ).
By default, we say that A is nondeterministic.
8

A well known theorem of ﬁnite automata is that:
Proposition 3. If R is the set of accepted inputs of some automaton, then it is also the set of accepted inputs of some deterministic automaton.

In either case, we say that R is regular.

A common convention in logic is to identify a function f with the relation Rf which consists of all pairs of the form (x, f (x)), or, for n-ary functions,

(x0, . . . , xn−1, f (x0, . . . , xn−1)),
for x in the domain of f . As such, we will be able to deﬁne a regular function as a relation which is regular and also a function. The question now is how to input multiple words, especially multiple words of different lengths, to a ﬁnite automaton.

We introduce the Tuplefy map to merge words together in parallel so they can be read by an automaton. For words of different lengths, we add a dummy character #.

Deﬁnition. Deﬁne the map

Tuplefy : Σ∗0 × · · · × Σ∗n−1 → ((Σ0 ∪ {#}) × · · · × (Σn−1 ∪ {#}))∗,

Which satisﬁes:


wj [i] Tuplefy(w0, . . . , wn−1)[i]j =
#

it exists otherwise

and |Tuplefy(w0, . . . , wn−1)| = maxi |wi|.

Example 2. Here we show below how Tuplefy combines words together to one word:

9

w0 w1 w2 w3 w4
Tuplefy(w0, w1, w2, w3, w4)

a a

b

a

a

a

#

 

b

 

a

b b

b

a

b

b

#

 

b

 

a

b

b

a

b

#

#

 

b

 

a

a
a a # # #
a

a #
# # #
a

Figure 2.2: Behavior of Tuplefy on Sample Input

Now we can deﬁne the notion of regular relation and regular function: Deﬁnition. An n-ary relation R ⊆ Σ∗0 × · · · × Σ∗n−1 is regular if there is a ﬁnite automaton A with input alphabet (Σ0 ∪ {#}) × · · · × (Σn−1 ∪ {#}) such that:
R(w0, . . . , wn−1) ⇐⇒ A accepts Tuplefy(w0, . . . , wn−1).
An n-ary function f : Σ∗0 ×· · ·×Σ∗n−1 → Σ∗n is regular if there is a ﬁnite automaton A with input alphabet (Σ0 ∪ {#}) × · · · × (Σn ∪ {#}) such that:
f (w0, . . . , wn−1) = wn ⇐⇒ A accepts Tuplefy(w0, . . . , wn).
Regular relations and functions are a key part of the analysis of various automaton logics. For instance: Deﬁnition. Given a ﬁnite alphabet Σ, let WΣ = (Σ∗, ≤, =el, Sa|a∈Σ) where:
• ≤ is the preﬁx relation on words, • =el is the equal length relation on words,
10

• Sa is the Successor-a unary operation, which appends an a onto the end of a word.
We call WΣ the regular word logic over Σ. Proposition 4. If R is a relation on Σ∗, the following are equivalent:
• R is regular, • R is given by a formula φ in the language of WΣ.
Hence our interest in regular functions. If φ(x0, . . . , xn) is a formula in the language of WΣ such that:
∀x0, . . . , xn−1∃xn : φ(x0, . . . , xn), Then, since lexicographic ordering <L is regular and well-founded, we can restrict to the regular relation:
ψ(x0, . . . , xn) ⇐⇒ φ(x0, . . . , xn) ∧ y : [y <L xn ∧ φ(x0, . . . , y)] , Which is also a function. Such a restriction is called a Skolem function.
Finally, in this section, we connect Moore Machines and Finite Automata: Proposition 5. Given a Moore Machine M = (Σ, Q, q0, Γ, δ, ), the functions M, M Trunc, and M Rest are regular.
Proof. We can construct a deterministic ﬁnite automaton that keeps track of two pieces of information: the state we expect the Moore Machine to be in at any particular point, and a boolean value to keep track of whether the proposed output has so far been correct, additionally, in the case of M , we should expect
11


# a ﬁnal element of the form   and a small amount of information must be
 o
kept track of to handle this correctly.

For example, for M Trunc, let:

A = (Σ × Γ, Q × {0, 1}, (q0, 0), δ , Q × {0}),

Where:


0 δ ((q, i), (a, o)) = δ(q, a),
 1



i = 0 ∧ (q) = o

 

.

otherwise



Proposition 6. Every strictly causal, length-preserving, regular function is given by M Trunc for some Moore Machine M . Every causal, length-preserving function is given by M Rest for some Moore Machine M .
Proof. We prove this for a binary, strictly causal, length-preserving, regular function f . The proof is nearly identical in the general case. Let f : Σ → Γ be given. By our equivalence theorem above, there is a formula φ(w0, w1) in the language WΣ∪Γ such that
φ(w0, w1) ⇐⇒ f (w0) = w1.
Since f is a strictly causal function, the ﬁrst n − 1 characters of the input determine the nth character of the output. By simple tricks in WΣ∪Γ, we can construct a formula φu for each u ∈ Γ such that φu is true of exactly those sequences of characters that produce an output of u in the next place. These φu describe a collection of regular sets which partition all of Σ∗. Let Au be a ﬁnite automaton that recognizes the corresponding collection.
12

Now we construct our Moore Machine M . It should run each of the Au in parallel to determine its state. Since the Au recognize disjoint collections, exactly one of the Au will be in an accept state at any time. The function for our Moore Machine will take in the tuple of states for the Au and output the one which was in an accept state. It sufﬁces now to check that M Trunc is identically f .
The proof for M Rest is similar.
Of course, there are plenty of other regular functions. Take for instance, the Rest function, which removes the ﬁrst character of a string, shifting to the left. Alternately, a function which outputs 00 or 01 depending on whether there are an even or odd number of 0s in the input. Moore Machines allow us to construct functions which transmit information only to the right (towards the end of the inputs), but we will need a new method to transmit information to the left as in these examples.
2.3 Determinization and Harvesting
In this section, we introduce a notion of a Reverse Moore Machine, and provide a technique for decomposing a length-preserving regular function as a multivariable composition of a Moore Machine function, a Reverse Moore Machine function, and character-wise maps to connect them. This will require several stages. First, we will brieﬂy discuss some notation for discussing character-wise functions. Second, we will introduce the notion of a Reverse Moore Machine similar to the Moore Machines introduced in section 1. Third, we will present the decomposition. This decomposition is based on the classical powerset deter-
13

minization construction, viewed from a novel perspective. In the next section, we will discuss handling general regular functions.
First, some notation for functions which operate character-wise: Deﬁnition. Given two ﬁnite alphabets Σ, Γ, and a function f : Σ → Γ, we call the function Cwf : Σ∗ → Γ∗ which applies f to each character of the input, the characterwise f map.
If f is an n-ary function for n > 1, we can also make sense of Cwf . Let
f : Σ0 × · · · × Σn−1 → Γ.
Then we deﬁne the partial function Cwf : Σ∗0 × · · · × Σ∗n−1 → Γ∗,
which takes in inputs of all the same lengths and produces an output of the same length, where:
Cwf (w0, . . . , wn−1) = Cwf (Tuplefy(w0, . . . , wn−1)). Noting that Tuplefy does not produce characters with # in them for equal-length inputs.
Of course, Cwf is causal and length preserving (and reverse-causal, when we deﬁne the notion). Every Moore Machine function can be written as the corresponding function for the corresponding transparent Moore Machine composed with a bitwise application of its function.
We now deﬁne some notation for dealing with Reverse Moore Machines, analogous to our notation established previously. Deﬁnition. A Reverse Moore Machine is a tuple: (Σ, Q, qf , Γ, δ, ).
14

• Σ is a ﬁnite set of input characters (alphabet). • Q is a ﬁnite set of states. • qf ∈ Q is the ﬁnal state. • Γ is a ﬁnite set of output characters. • δ : Q × Σ → Q is the reverse transition function. • : Σ → Γ is the output function.
Given an input word w ∈ Σ∗, we construct a run r and output o such that:
• r[|w|] = qf . • r[i] = δ(r[i + 1], w[i]) for 0 ≤ i < |w|. • o[i] = (r[i]), for 0 ≤ i ≤ |w|. Example 3. Here we show below how inputs, states, and outputs, respectively, line up according to our notation.
abbba q0 q1 q2 q1 q2 qf o0 o1 o0 o1 o0 o2 Figure 2.3: Alignment Convention for Reverse Moore Machines
To prevent type mismatches, we will denote a Reverse Moore Machine with letters R, P as opposed to letters M, N for Moore Machines. Deﬁnition. A Reverse Moore Machine is said to be transparent if Γ = Q and is the identity. In this case, we present our Reverse Moore Machine as a tuple: (Σ, Q, qf , δ).
15

As before, we can interpret Reverse Moore Machines as functions from input words to output words: Deﬁnition. Given a Reverse Moore Machine R = (Σ, Q, qf , Γ, δ, ), and word w ∈ Σ∗, we denote by:
• R(w) the output of R on input w. • RTrunc(w) the output of R on input w with the last state removed. • RRest(w) the output of R on input w with the ﬁrst state (the start state) removed.
Analogous to our notions of causal and strictly causal, we have notions of reverse causal and strictly reverse causal: Deﬁnition. Given two alphabets, Σ, Γ, a function on words f : Σ∗ → Γ∗ is said to be:
• reverse causal if f (w)[i] depends only on w[i], w[i + 1], . . .. • strictly reverse causal if f (w)[i] depends only on w[i + 1], w[i + 2], . . ..
Of course, a function is character-wise iff it is causal and reverse causal. We also have a notion of reverse deterministic automaton: Deﬁnition. A ﬁnite automaton A = (Σ, Q, I, δ, F ) is reverse deterministic if:
• F is a singleton. • For each q, a, there is a unique q such that (q , a, q) ∈ δ.
A reverse deterministic automaton can also be viewed as a transparent Reverse Moore Machine.
We now have the notation to state our Theorem: 16

Theorem 1. Any length-preserving regular function can be written as a multivariable composition of the form:
f (w) = RTrunc(CwPair(w, M Trunc(w))).

It’s worth noting that CwPair has the same action here as Tuplefy. We’ve chosen to use CwPair here to indicate that we’re not using the length-padding features of Tuplefy.
We need to prove a few lemmas ﬁrst.
Lemma 1. Given a length-preserving regular function f : Σ∗ → Γ, there is an automaton B with state set Q and map : Q → Γ such that f (w) is Trunc ◦ Cw applied to any run of B on input w.

Proof. Given our length-preserving function f : Σ∗ → Γ∗, let:

C = ((Σ ∪ {#}) × (Γ ∪ {#}), Q, I, δ, F )

Witness the regularity of f . Since C automatically rejects any inputs with a # in them, we can restrict it to:

A = (Σ × Γ, Q, I, δ, F )

Recognizing the same set of inputs.

Construct an automaton:

B = (Σ, Q × Γ, I × Γ, δ , F × Γ),

Where:

((q, o), a, (q , o )) ∈ δ ⇐⇒ (q, (a, o), q ) ∈ δ

17

B is nondeterministic. I assert that runs of B on input w will necessarily have Γ component So(f (w)), for some o ∈ Γ. Firstly, it should be clear that if r is an accepting run of A on input Tuplefy(w, f (w)), then Tuplefy(r, So(f (w))) is an accepting run of B on input w for any o ∈ Γ.

Now, suppose Tuplefy(r , g) is an accepting run of B on input w. Then it is easy to check that r is an accepting run of A on input Tuplefy(w, Trunc(g)). Since A witnessed f being a regular function, Trunc(g) must be f (w). This completes the proof for taking the Γ component of the states of B.
Deﬁnition. Given a nondeterministic automaton A = (Σ, Q, I, δ, F ), deﬁne its determinization:

det(A) = (Σ, P(Q), {I}, δ , {E ⊂ Q : E ∩ F = ∅}),

Where:

δ (K, a) = {q ∈ Q|∃q ∈ K : δ(q, a, q )}.

Typically throughout this chapter we will be concerned with the determinization as a transparent Moore Machine, so the set of ﬁnal states doesn’t matter much.
Deﬁnition. Given a nondeterministic automaton A = (Σ, Q, Iδ, F ), deﬁne its harvester:
harv(A) = (Σ × P(Q), Q ∪ {qF }, Q ∪ {qF }, δ , {qF }),
Where:

• The q such that δ (q, (a, K), q ) (for q ∈ Q) is given by the least q ∈ K such that δ(q , a, q ) or qF if none exist.

18

• The q such that δ (q, (a, K), qF ) is given by ﬁnding the least q ∈ F such that there is a q ∈ K such that δ(q , a, q ) and then having q be the least q ∈ K such that δ(q , a, q ). If no such q ∈ F exists, then q is given by qF .
The powerset determinization of an automaton A produces an automaton that keeps track of, at every position, the set of states of A which are reachable through some sequence of transitions, having read the input up to that point. However, not every one of these reachable states necessarily shows up in some accepting run: it may be that being in one state now means later on having to be in another state which we cannot transition out of, or that being in a state now dooms us to being in a reject state once we have ﬁnished reading the input. In order to use our determinization to ﬁnd an accepting run of the original automaton, provided there is an accepting run, we need to start at the end and work our way backwards, all the while staying within states that we know can be traced through a sequence of transitions back to a start state at the beginning.
Speciﬁcally, if we know we’re in a state q which is reachable through some sequence of transitions after having read Sa(w), then there must be at least one state q which is reachable through some sequence of transitions after having read w such that reading a takes us from state q to state q.
It is necessary to introduce an additional dummy state qF to start out in to make sure our harvester automaton is reverse-deterministic. Although our application of the harvester automaton will see only pairs of the form (a, K) for a some symbol being read by our original automaton and K the set of states reachable immediately prior to reading that speciﬁc a, our automaton should be prepared to read in arbitrary input pairs. An invalid input will cause the reverse deterministic automaton to go into the dummy qF state. Additionally,
19

a cheap ﬁx is necessary to account for the fact that we don’t know what state the run we’re trying to produce ends on and we need a single “ﬁnal” state for our reverse deterministic automaton to “start out” in. The dummy qF state represents all ﬁnal states which are reachable. Truncating the run will remove this dummy state. Lemma 2. Suppose we have a nondeterministic automaton A, and valid input w. Let p be the run of det(A) on input w. I claim that harv(A) on input Tuplefy(w, Trunc(p)) will have run r, an accepting run of A on input w.
Proof. It sufﬁces to show that the only occurrence of qF in r is as the ﬁnal character. By construction, harv(A) will satisfy the transition relations. As mentioned before the construction of harv(A) also prevents backwards transitioning into the qF state for this particular input, since a reachable state can always be traced back to a reachable state.
Combining our two lemmas, given a regular, length-preserving function f , we have a nondeterministic automaton B which takes in a word w and has a run r which projects to f (w). By our second lemma:
r = harv(B)Trunc(CwPair(w, det(B)Trunc))
By modifying the outer Reverse Moore Machine, we can throw in the appropriate projection to its output map to produce f (w). This completes the proof.
This is a remarkable result. Every length-preserving regular function can be computed in a two-step process: one pass forwards through the input leaving behind some information as we do, then a pass backwards through the input with this additional information to directly produce the output. Two passes sufﬁce; having more passes doesn’t increase our expressive power.
20

2.4 Length Modiﬁcation
We’re very close to generating all the regular functions, but we’re not quite there. The only functions we have dealt with so far were length-preserving. In this section, we see that most of the interesting behavior was already captured in the length-preserving case.
Proposition 7. Suppose f : Σ → Γ is a regular function. Then there is a ﬁxed constant c associated to f such that f (w) is no longer than c + |w| for every w.
The proof is based on the pumping lemma.
Proof. Let A be a deterministic automaton with c states accepting exactly words of the form Tuplefy(w, f (w)). Choose a speciﬁc w and suppose f (w) is longer than c + |w|. Imagine what happens as A reads in Tuplefy(w, f (w)), speciﬁcally,
 #
after w is ﬁnished, and A is reading in characters of the form   for some  o
o ∈ Γ. Because there are more positions like this than there are states of A, by the pigeonhole principle some two positions will have the same state, say at positions i and j. Note however that if we remove all positions in Tuplefy(w, f (w)) between i and j (including i, excluding j) we still have an accepting run, of the form Tuplefy(w, g) for some g strictly shorter than f (w). This contradicts our assumption that A accepted exactly words of the form Tuplefy(w, f (w)).
From this, we can also show that for any n-ary regular function f , there is a ﬁxed constant c associated to f such that f (w0, . . . , wn−1) is no longer than c plus the length of the maximum input.
21

Note that the automaton A in our proof never encounters the input character 
#  . As such, we may assume that this character acts as the identity on the 
# states of A and still have an automaton witnessing the regularity of f . This automaton recognizes all pairs of the form Tuplefy(S#c (w), f (w)). (S#c simply represents a c-fold composition of the S# function).
As such, the function which takes S#c (w) to S#d (f (w)) for d = |f (w)| − |w| is a length-preserving regular function g, and thus can be written as a composition of character-wise maps, truncated Moore Machines, and truncated Reverse Moore Machines as in Theorem 1. Since f can be written as:
f (w) = Unpad(g(S#c (w))),
Where Unpad removes ﬁnal # characters. Note that we do not know how many ﬁnal # characters there will be. For functions which reduce length, the number will be more than c and could be as much as c + |w|. One might be tempted to try to replace Unpad with some function like S#−1 (or, even less suited to the task, Trunc), which either removes a single ﬁnal # or leaves the word alone if it cannot. However, since there are regular functions which take words of arbitrary length and reduce them to length 1, we need a generator that can produce unbounded shortening as well.
Note that if f is n-ary for n > 1, we can use what we’ve already shown to see that:
f (w0, . . . , wn−1) = Unpad(g(S#c n(w))),
For some regular, length-preserving function g.
As such, we have the following theorem:
22

Theorem 2. Any regular function can be written as a multivariable composition of:
• Truncated Moore Machines, • Truncated Reverse Moore Machines, • Character-wise maps, • Tuplefy (allowing us to generate multiary character-wise maps), • Sa for various a, • Unpad.
It’s worth noting here that our generating set is inﬁnite. Speciﬁcally, there are an inﬁnite number of Moore Machines and Reverse Moore Machines. There are also an inﬁnite number of Character-wise maps, but this isn’t essential – one could use encoding methods to work purely with a single two-character alphabet (plus, optionally, the dummy character #).
The inﬁnitude of our generating set, however, is essential. The easiest way to see this is to talk about period introduction. Provided the input to a regular function has a sufﬁciently long periodic portion in the middle, the output of the regular function will also have a long periodic portion in the middle (it sufﬁces to verify this of the generators above). What’s more, the period of the periodic portion of the output can only have prime factors which show up either in the periods of the periodic portions of the inputs or which are smaller than the number of states of the associated automaton to the regular function. However, one can easily build regular functions which introduce any prime factor into the periodicity of their inputs, so we must not allow any bound on the sizes of associated automata to regular functions in our generating set. We summarize this result as follows:
23

Proposition 8. Any set of regular functions which generates all regular functions under multivariable composition must be inﬁnite.
We conclude this chapter with a discussion of the Krohn-Rhodes Theorem. First, we will provide a proof of the Krohn-Rhodes Theorem adapted from Ginzburg [5]. Then we will use the Krohn-Rhodes Theorem to decompose our Moore Machine and Reverse Moore Machine generators into smaller, simpler generators. A ﬁnal short discussion will clean up our set of generators, followed by proposed future research.
2.5 The Krohn-Rhodes Theorem
In this section, we present the Krohn-Rhodes Theorem as adapted to the context of multivariate composition of regular functions. The original proof of the Krohn-Rhodes Theorem, in [7], was presented in terms of wreath products of semigroups. More modern presentations of the Krohn-Rhodes Theorem typically present it in terms of the cascade product of ﬁnite state transducers.
The cascade product of two transducers M1, M2 is a system consisting of both machines. First, machine M2 reads in both the input to the system and the current state of M1 to update its state. When it has ﬁnished, machine M1 updates its state based only on the input to the system. Finally, an output is produced based on the states of M1 and M2. This reﬂects the reality of systems where updating the states of our machines takes a small but appreciable amount of time. In a well designed system, M2 should not have to wait for M1 to ﬁnish its update before it can update its state. As such, M2 uses the state of M1 prior to reading the input to update.
24

The Krohn-Rhodes Theorem separates out two extremes of behavior for ﬁnite automata. In general, reading in an input character induces a function on the states of the automaton. This function may map two states to the same state or to separate states. At one extreme, it may act as a permutation in which case it maps all states to separate states. In this case, it is possible to undo this action. We can recover the state before reading a character which acts as a permutation, provided we know which character the automaton read. At the other extreme, an input character may act as a reset in which case it maps all states to the same state. In this case all information about the previous state is lost. Deﬁnition. A Moore Machine or deterministic automaton is said to be:
• A permutation automaton if each of its inputs acts as a permutation on its states,
• A reset automaton if each of its inputs acts as a reset or the identity on its states, • A permutation-reset automaton if each of its inputs acts as a permutation or a
reset on its states.
We now state the Krohn-Rhodes theorem, in a bit of an unusual fashion: Theorem 3 (Krohn-Rhodes). Given a transparent Moore Machine M , we can write its truncated action M Trunc as a multivariable composition of truncated actions of permutation-reset Moore Machines M0, . . . , Mn−1 for n the number of states of M , and a ﬁnal character-wise map f . What’s more, this composition takes on a fairly simple
25

form. Let: Then:

w0 =M0Trunc(w), w1 =M1Trunc(Tuplefy(w, w0)), w2 =M2Trunc(Tuplefy(w, w0, w1)),
... wn−1 =MnT−ru1nc(Tuplefy(w, w0, . . . , wn−2)),
M Trunc(w) = Cwf (w0, . . . , wn−1).

Every map in this composition is length-preserving. Of course we can write this as simply one large multivariable composition of a character-wise map and truncated actions of permutation-reset transparent Moore Machines, but this is unwieldy to write down. The above also presents an efﬁcient way of computing the composition, although readers concerned with efﬁciency are encouraged to look into the Holonomy decomposition [3].
Our proof is inductive: we show that for every transparent Moore Machine M , there’s another transparent Moore Machine M that keeps track of a state M is not in in a permutation-reset way. This reduces the amount of information we need to keep track of by one state, and we can keep doing this until we’ve kept track of all the information to know what state M is in.
The proof in [5] allows for the possibility that we can keep track of several states our automaton M is not in in a permutation-reset way at the same time, as opposed to one at a time in the proof below. This is more efﬁcient, but adds needless complexity to the proof.

26

The key piece of our construction is the Permutation-Reset Lemma, below. Lemma 3 (Permutation-Reset Lemma). Given two ﬁnite ordered sets of the same size I, J, and map between them f , there is a map g : I → J such that:
• g either acts as: – A bijection from I to J (a permutation on the position indices), – Or has singleton image (a reset on position indices),
• And for x, y ∈ I, with x = y, we have f (x) = g(y). • For any x ∈ I, we have f maps elements of I \ {x} to J \ {g(x)}.
Proof. Suppose f does act as a bijection. Then g = f is a permutation and satisﬁes the inequality condition.
Suppose f does not act as a bijection. Then g which maps everything to the smallest element of J which is not in the image of f has singleton image and satisﬁes the inequality condition.
The third condition is just a rephrasing of the second, but will come in handy later on.
A speciﬁc application of the Permutation-Reset Lemma is that we can have a transparent Moore Machine that keeps track of a state our original transparent Moore Machine is not in: Lemma 4. Given a transparent Moore Machine M = (Σ, Q, q0, δ), there is a permutation-reset transparent Moore Machine M with the same state set such that on input w, the state of M at any one time is not the state of M .
27

Proof. Assign a natural ordering to Q. Deﬁne
M = (Σ, Q, q0, δ),
Where q0 is the smallest element in Q which is not q0, and δ(q, a) is given by:
• δ(q, a) if a acts as a permutation. • Otherwise, the smallest q ∈ Q which is not in the image of any state under
the action of a.
If the action of a was a permutation originally, it is still a permutation in our new automaton. This permutation not only takes us from the state our automaton is in before reading a to the state afterwards, but also from a state our automaton is not in before reading a to a state our automaton is not in afterwards. In the second case, note that the choice of q does not depend on q, so this action is a reset. Obviously, it takes us from a state our original automaton is not in before reading a to a state our automaton is not in after reading a.
Lemma 5. Given transparent Moore Machines M, M as above with state sets Q, there is a third transparent Moore Machine:
M = (Σ × Q, {0, . . . , |Q| − 2}, ˆı0, δ),
Such that for any input w, if M on reading w winds up in state q, and M on reading w winds up in state q then M on reading Tuplefy(w, M Trunc(w)) will wind up in state ˆı, where q is the element in position2 ˆı of Q \ {q}.
2To maintain notational consistency within this paper, where ordered collections are indexed starting with 0, we refer to the ﬁrst element of a set as being in position 0, and generally the i+1st element of a set as being in position i. To avoid confusion, we will endeavor to avoid using the notation “ith element”, instead using notation “the element in position i”.
28

Proof. Let M = (Σ×Q, {0, . . . , |Q|−2}, ˆı0, δ) where ˆı0 is the index of q0 in Q\{q0} and δ(i, (a, q)) is computed by:
• Computing q = δ(q, a). This is the state M says that M is not in after reading a.
• Computing q, the ith element of Q \ {q}. This is the state of M prior to reading a.
• Return j, the index of δ(q, a) in Q \ q .
The above construction is designed speciﬁcally to satisfy the conclusion.
As such, we have that M Trunc(w) = Cwp(M Trunc(w), M Trunc(Tuplefy(w, M Trunc(w))))
Where p(q, i) is the ith element of Q \ {q}. Finally, we prove the Krohn-Rhodes Theorem:
Proof. By induction on the number of states of M . Base Case: If M has one state, then f in the composition is 0-ary, and we can
have it just output that constant state. Inductive Case: Given our M , we can write: M Trunc(w) = Cwp(M Trunc(w), M Trunc(Tuplefy(w, M Trunc(w)))),
With M permutation-reset. By our inductive hypothesis, we can write M Trunc(w), which has one fewer state than M , as:
M Trunc(w) = Cwf (w1, . . . , wn−1), 29

Where:

w1 =M1Trunc(w) w2 =M2Trunc(Tuplefy(w, w1))) w3 =M3Trunc(Tuplefy(w, w1, w2)
...
wn−1 =MnT−ru1nc(Tuplefy(w, w1, . . . , wn−2))

As such, M Trunc(Tuplefy(w, M Trunc(w))) is given by just plugging in:

Where:

M Trunc(Tuplefy(w, M Trunc(w))) = Cwf (w1, . . . , wn−1),

w1 =M1Trunc(Tuplefy(w, M Trunc(w))), w2 =M2Trunc(Tuplefy(Tuplefy(w, M Trunc(w)), w1)), w3 =M3Trunc(Tuplefy(Tuplefy(w, M Trunc(w)), w1, w2)),
... wn−1 =MnT−ru1nc(Tuplefy(Tuplefy(w, M Trunc(w)), w1, . . . , wn−2)).

Alternately, ﬁddling with some parentheses in the deﬁnitions of our au-

tomata:

M Trunc(Tuplefy(w, M Trunc(w))) = Cwf (w1, . . . , wn−1, )

30

Where:
w0 =M Trunc(w), w1 =M1Trunc(Tuplefy(w, w0)), w2 =M2Trunc(Tuplefy(w, w0, w1)), w3 =M3Trunc(Tuplefy(w, w0, w1, w2)),
... wn−1 =MnT−ru1nc(Tuplefy(w, w0, . . . , wn−2)).
In which case:
M Trunc(w) = Cwp(M Trunc(w), Cwf (w1, . . . , wn−1)).
w0
We can combine p and f to get a single character-wise function on w0, . . . , wn−1, thus completing the induction.
The proof is still straightforward if we unwind the induction. In our construction, w0 is keeping track of a state M is not in, but it may as well be keeping track of an index for a state M is not in. w1 is keeping track of an index of a state M is not in once we’ve removed the state in position w0 from Q. w2 is keeping track of an index of a state M is not in once we’ve removed the states w1 and w2 are keeping track of from Q. And so forth. We formalize this indexed removal process as follows:
Deﬁnition. Given a positive integer n, an ordinal removal sequence for n is a (possibly empty) sequence of positive integers (k0, . . . , ki) satisfying:
i <n, 0 ≤ kj <n − j.
31

We interpret an ordinal removal sequence for n as a series of commands operating on an ordered set of size n of the form “remove the i + 1st smallest element remaining.” Note that as elements are removed, there are fewer elements remaining, hence the decreasing upper limit on kj in the second constraint. Consistent with the rest of this paper, we begin our indexing with 0, so a 0 means remove the smallest element.
Deﬁnition. Given an ordered set L, and an ordinal removal sequence for |L|, k = (k0, . . . , ki−1), deﬁne Remove(L, k) recursively:

• Remove(L, ()) = L,
• Remove(L, (k0, . . . , ki−1)) is given removing the element in position ki (with order inherited from L, and 0 means remove the smallest element) of Remove(L, (k0, . . . , ki−2)).

Let Ok,n denote the set of ordinal removal sequences for n of length k.

Recall that is used for concatenation, so:

(k0, . . . , ki−1) j = (k1, . . . , ki−1, j).

Example 4. Consider the ordinal removal sequence for 5: (0, 1, 2, 1) acting on the ordered set (A, B, C, D, E):

Start: Remove at position 0: Remove at position 1: Remove at position 2: Remove at position 1:

(A, B, C, D, E) (B, C, D, E) (B, D, E) (B, D) (B)

Table 2.1: Sample Application of an Ordinal Removal Sequence

32

We now state the lemma formally that allows us to construct the Mj in the Krohn-Rhodes decomposition. Lemma 6. Given a transparent Moore Machine M = (Σ, Q, q0, δ), and 0 ≤ j < |Q| − 1, there is a permutation-reset transparent Moore Machine
Mj = (Σ × Oj,|Q|, {0, . . . , |Q| − j}, k0, δ ),
Such that if κ is a word of ordinal removal sequences for |Q| of length j, that is, κ ∈ (Oj,|Q|)∗, satisfying:
• At any point i, κ[i] does not remove M (w)[i] from Q. That is, M (w)[i] ∈ Remove(Q, κ[i]).
• κ[i + 1] is determined by κ[i] and w[i], speciﬁcally such that: • The map δa (from M ) maps states in Remove(Q, κ[i]) to states in
Remove(Q, κ[i + 1]).
Then o = Mj(Tuplefy(w, κ)) satisﬁes:
• At any point i, κ[i] o[i] does not remove M (w)[i] from Q. That is, M (w)[i] is not in position o[i] of Remove(Q, κ[i]), and in particular M (w)[i] ∈ Remove(Q, κ[i] o[i]).
Proof. The idea here is that each Mj should keep track of a single entry in an ordinal removal sequence that will remove all elements of Q except the state of our original automaton M at any one particular time. These will be the Mj in our multivariate composition, so they will be reading in both the original input (a single character a from w), and a single index from each of M0, . . . , Mj−1, together forming an ordinal removal sequence (κ[i]) of length j. Each Mj then
33

keeps track of an index, which, when added on to the end of the ordinal removal sequence does not remove the one state we must not remove, the state of M at that point.

For reference, a picture of the situation is drawn below:

a w[i]
w0[i]
w1[i]
κ[i] w2[i] ...
wj−1[i]

w[i + 1]
w0[i + 1]
w1[i + 1]
w2[i + 1] ...
wj−1[i + 1]

wj [i]

wj[i + 1]

Figure 2.4: Transition Diagram for the Automaton Mj in our Cascade

As we can see in the diagram, Mj will be reading in w[i], the character that takes our automaton M from M (w)[i] to M (w)[i + 1], and κ[i], the ordinal removal sequence within which we interpret its current state wj[i]. Speciﬁcally, wj[i] will be a position in Remove(Q, κ[i]) where we don’t ﬁnd the current state of M , M (w)[i]. This will transition Mj into the state wj[i + 1], which must be a position in Remove(Q, κ[i + 1]) where we don’t ﬁnd the next state of M , M (w)[i].
Note that Mj does not get direct access to κ[i+1], but of course we need access to κ[i+1] in order to determine the index for M (w)[i+1] in Remove(Q, κ[i+1]) so that we can avoid it. Fortunately, if the previous automata, M0, . . . , Mj−1 work in canonical fashions, knowing a and κ[i] is enough to determine κ[i + 1].
To start with, we need to pick the starting state for Mj, k0, such that k0 is not an index for the start state of M in Remove(Q, κ[0]). Let it be the smallest such
34

index.
By hypothesis, the transition map induced by the character a on the automaton M , δa, maps states in Remove(Q, κ[i]) to states in Remove(Q, κ[i + 1]). By the Permutation-Reset Lemma, we can deﬁne the transition map for Mj, δ , with δ(w[i],κ[i]) a permutation-reset map for any particular w[i] and κ[i] that does the avoiding we require of it.
Finally, it’s worth noting that κ[i] wj[i], κ[i + 1] wj[i + 1] satisfy the requirements on κ[i] and κ[i + 1] in the hypothesis of our lemma. Speciﬁcally:
• Our new w[0] and wj[i + 1] were chosen to avoid removing M (w)[0] and M (w)[i + 1] from Q.
• wj[i + 1] is determined by wj[i], κ[i], and w[i]. • The map δa maps states in Remove(Q, κ[i] wj[i]) to states in Remove(Q, κ[i+
1] wj[i + 1]). This is immediate, looking at the third condition on the function the Permutation-Reset Lemma constructs.
As such the Mj in our proof are the same as the Mj in our multivariate composition for the action of M , and Tuplefy(w1, . . . , wj−1) is a word of ordinal removal sequences κ as above for each suitable j. As such the ﬁnal character-wise map f is simply the map mapping an ordinal removal sequence k of length |Q| − 1 to the single element of Remove(Q, k).
Hopefully, this particular proof will shed some light on the multivariate composition we are using to determine M Trunc(w): why it is shaped the way it is shaped, and what each piece of the composition is keeping track of. Utilizing
35

the Krohn-Rhodes Theorem, we currently have the following set of generators for the regular functions: Theorem 4. Any regular function can be written as a multivariable composition of:
• Truncated Permutation-Reset Transparent Moore Machine maps, • Truncated Reverse Moore Machine maps, • Character-wise maps, • Tuplefy (allowing us to generate multiary character-wise maps), • Sa for various a, • Unpad.
It may seem like we haven’t gained much, but in the next section we will see that there isn’t actually that much to Permutation-Reset Transparent Moore Machines. Then we will handle the reverse Moore Machine case.
2.6 A Further Breaking Down
In this section we prove that truncated permutation-reset transparent Moore Machine maps can be written as the composition of a single truncated permutation transparent Moore Machine map and a single truncated reset transparent Moore Machine map. Then we see that permutation transparent Moore Machines and reset transparent Moore Machines are actually quite familiar objects. As before, these proofs are adapted from [5], which uses vastly different notation.
36

Lemma 7. Given a permutation-reset transparent Moore Machine M = (Σ, Q, q0, δ), there is a permutation transparent Moore Machine M˜ , reset transparent Moore Machine M , and function F such that:
M Trunc(w) = CwF (M Trunc(Tuplefy(w, M˜ Trunc(w))), M˜ Trunc(w))
Proof. Let M˜ = (Σ, SQ, id, δ˜) and M = (Σ × SQ, Q, q0, δ), where SQ is the set of all permutations on Q, and:
If δa is a permutation of the states of M : δ˜(f, a) = δa ◦ f, δ(q, (a, f )) = q.
If δa is a reset on the states of M with image {qa}: δ˜(f, a) = f,
δ(q, (a, f )) = f −1(qa).
As desired M˜ is a permutation automaton and M is a reset automaton (notice that the identity action is necessary in case δa is a permutation).
Let F : SQ × Q → Q with F (f, q) = f (q). I now claim that M Trunc(w) = CwF (M Trunc(Tuplefy(w, M˜ Trunc(w))), M˜ Trunc(w)),
As desired. This is easy to verify in terms of their transition relations. The intuition behind this construction is that M˜ keeps track of the action of each of the permutations and M handles resets by storing them in terms of what state one would have to start in such that after being acted on by just the permutations one winds up in the current state of M .
37

We now deﬁne a couple of transparent Moore Machines in order to reﬁne our decomposition further.

Deﬁnition. For each n, deﬁne the Accumulator on Sn transparent Moore Machine ASn:
ASn = (Sn, Sn, id, δ),

Where:

δg(h) = h · g,

Where Sn is the symmetric group on n elements with composition operation (h · g)(i) = h(g(i)).

Deﬁnition. Deﬁne the bit-storage automaton:

Bit = ({−, 0, 1}, {0, 1}, 0, δ),

Where δ− acts as the identity, δ0 is a reset to state 0, and δ1 is a reset to state 1.

Since every collection of permutations can be viewed as a subset of the symmetric group Sn for some n:
Proposition 9. Every truncated permutation transparent Moore Machine map M Trunc can be written as:
M Trunc(w) = Cwf (ASnTrunc(Cwg(w))), For some functions f and g.

What’s more: Proposition 10. Every truncated reset transparent Moore Machine map M Trunc can be written as:
Cwf (BitTrunc(Cwg0(w)), . . . , BitTrunc(Cwgn−1(w))), For suitable n, f , and g0, . . . , gn−1.
38

Proof. Suppose M = (Σ, Q, q0, δ). Choose an n such that 2n ≥ |Q|. For every state q ∈ Q, associate a unique bitstring b(q) ∈ 2n, where bk(q) is the bit in position k of b(q), such that the start state q0 ∈ Q is given by the all 0s bitstring. Let f = b−1. Suppose δa acts as a reset to the state qa. Then let gk(a) = bk(qa).
As such: Proposition 11. Every truncated Moore Machine map M Trunc can be written as a multivariable composition of:
• ASnTrunc for various n, • BitTrunc, • Character-wise maps.
Now we have a much smaller generating set for our regular functions. Theorem 5. Any regular function can be written as a multivariable composition of:
• ASnTrunc for various n, • BitTrunc, • Truncated Reverse Moore Machine maps, • Character-wise maps, • Tuplefy (allowing us to generate multiary character-wise maps), • Sa for various a, • Unpad.
39

2.7 Reverse Moore Machines

Just as in previous sections, we broke down the truncated Moore Machine maps into compositions involving the accumulator on Sn, the Bit automaton, and character-wise maps (note that uses of Tuplefy are length-preserving, and thus actually character-wise applications of a tuple-construction map), in this section, we break down truncated reverse Moore Machine maps similarly. To save ourselves work, we will simply introduce a reversal map Rev (which is not regular) to connect truncated reverse Moore Machine maps and truncated Moore Machine Maps.
Deﬁnition. Given a word w ∈ Σ∗ deﬁne Rev(w) to be the reversal of w.

Given a Moore Machine M = (Σ, Q, q0, Γ, δ, ), deﬁne its reversal: Rev(M ) = (Σ, Q, q0, Γ, δ, ).
And similarly deﬁne the reversal of a reverse Moore Machine. Proposition 12. Given a Moore Machine M , and word w:

Rev(M )(w) = Rev(M (Rev(w))).

What’s more:

Rev(M )Trunc(w) = Rev(M Rest(Rev(w))).

Functions that are related in this way we say are related by conjugation by Rev. This relation is reﬂexive and symmetric. What’s more since Rev is its own inverse, if f and f are related by conjugation and g and g are related by conjugation, then f ◦ g and f ◦ g will be related by conjugation. Indeed this works for multiary functions as well:
40

Deﬁnition. Given an n-ary function f , we say that (x0, . . . , xn−1) → f (x0, . . . , xn−1)
And (x0, . . . , xn−1) → Rev(f (Rev(x0), . . . , Rev(xn−1)))
Are related by conjugation by Rev. Proposition 13. Given a multi-ary composition of functions, if you replace each function by its conjugation by Rev, the overall composition is related to the original composition by conjugation by Rev.
Proof. It sufﬁces to note that in the resulting composition, whenever the output of a function is fed into the input of another function, it is reversed twice, effectively doing nothing to it.
Additionally, conjugation by Rev does not alter character-wise functions. Finally, we prove a ﬁnal breakdown: Proposition 14. Given a transparent Moore Machine M = (Σ, Q, q0, δ), we can write M Rest as a composition of character-wise maps and M Trunc.
Proof. It is easy to verify that: M Rest(w) = Cwδ(M Trunc(w), w).
It follows from this that for any Moore Machine M , M Rest can be written as a composition of character-wise maps and M Trunc.
41

Proposition 15. Any truncated reverse Moore Machine map can be written as the multivariate composition of:
• RASnTrunc, where RASn is the reversal of ASn, for various n, • RBitTrunc, where RBit is the reversal of Bit, • Character-wise maps.
Proof. Every reverse Moore Machine is Rev(M ) for some M . As such, we can write Rev(M )Trunc as:
Rev ◦ M Rest ◦ Rev. As we’ve seen, we can write M Rest as a multiary composition of ASnTrunc for various n, BitTrunc, and character-wise maps, so by our conjugation of compositions lemma, we can write Rev(M )Trunc as a multiary composition of the conjugations of those components, which is what we were trying to prove.
While we’re at it, note that: Proposition 16. Given a Moore Machine M , we can write M (w) as the multivariable composition of a truncated Moore Machine map and S#.
Proof. Augment M to M by allowing it to interpret the input # (it may do so in any way it likes). Then:
M (w) = M Trunc(S#(w)).
As one ﬁnal reﬁnement of our generating set, we show that RASn is unnecessary as a generator.
42

2.8 Removing RASn

Note that ASn and RASn are very similar automata. For ASn, one interprets the input character w[i] as a permutation relating ASn(w)[i] and ASn(w)[i + 1]. For RASn, one interprets the input character w[i] as a permutation relating RASn(w)[i + 1] and RASn(w)[i]. Since every permutation has an inverse, shouldn’t these two automata be the same up to a suitable character-wise map on the inputs? Certainly not! For there is another constraint on the runs of ASn and RASn we must account for. For ASn, the ﬁrst character of its run is speciﬁed to be id. For RASn, the last character of its run is speciﬁed to be id.
Let’s compare ASn(w) and RASn(Cwinverse(w)) on some generic ﬁve character input w = abcde:

w a b c de

ASn(w)

id a ab abc abcd abcde

RASn(Cwinverse(w)) (abcde)−1 (bcde)−1 (cde)−1 (de)−1 e−1 id

Table 2.2: Comparing ASn and RASn on Sample Input

In addition to applying a suitable transformation to the inputs of ASn, we must also apply a suitable transformation to the outputs of ASn if we want to produce the output of RASn. Speciﬁcally, if we multiply every character in the output of ASn on the left by the inverse of the last character of the output, it will ensure that the new last character of the output is id, but still maintain the transition relationships. This requires passing the information of the last character of ASn to every other position.

43

First, we must introduce the mask of our word w, a word Mask(w) which is all 0s up to the length of w, followed by a 1. This is computed simply by taking S1(Cw0(w)) where 0 is the constant 0 map. Despite its simplicity, this word will be key to performing our computation.

Consider the reverse reset Transparent Moore Machine R = ({0, 1} ×

Sn, Sn, id, δ) with:


b δ(q, a) =
q

a = (1, b) a = (0, b)

This is a reverse reset transparent Moore Machine, and so RTrunc can be written

in terms of character-wise maps and RBitTrunc.

Consider the action of RTrunc on Cwpair(Mask(w), ASn(w)):

w
Mask(w)
ASn(w) RTrunc (Cwpair (“))

a 0 id abcde

b 0 a abcde

c 0 ab abcde

d 0 abc abcde

e 0 abcd abcde

1 abcde abcde

Table 2.3: The Action of RTrunc on Cwpair(Mask(w), ASn(w))

We now have a word which we can combine with ASn(w) via the appropriate bitwise map (multiplying by the inverse on the left) to produce RASn(Cwinverse(w)).
However, what we wanted was RASnTrunc(Cwinverse(w)). We can attain that by combining RASn(Cwinverse(w)) with Mask(w) bitwise to replace the last character of RASn(Cwinverse(w)) with a # and then using Unpad to remove it.
Letting f map pairs of the form (0, a) to a and pairs of the form (1, a) to #:
44

w a b c de

Mask(w)

0 0 0 0 01

RASn(Cwinverse(w)) (abcde)−1 (bcde)−1 (cde)−1 (de)−1 e−1 id

Cwf (“)

(abcde)−1 (bcde)−1 (cde)−1 (de)−1 e−1 #

Unpad(“)

(abcde)−1 (bcde)−1 (cde)−1 (de)−1 e−1

Table 2.4: Producing RASnTrunc on Sample Input

This composition produces the desired output and works in general. As such, we have the following, ﬁnal form, of our theorem: Theorem 6. Any regular function can be written as a multivariable composition of:
• ASnTrunc for various n, • BitTrunc, • RBitTrunc, • Character-wise maps, • Tuplefy, • Sa for various a, • Unpad.

2.9 Further Research
We may think of a regular function as a function whose output can be veriﬁed to be correct by a deterministic ﬁnite automaton, and in this chapter we have spelled out exactly how one might go about ﬁnding that appropriate output.
45

This should feel very similar to the class of NP problems or recursive enumerability, in that once one has an answer one can verify it quickly. In these more advanced cases, the issue of ﬁnding that answer is signiﬁcantly more difﬁcult. In the case of ﬁnite automata, one can get by mostly with just a two-pass system.
While thinking about automata as transducers instead of acceptors takes us away from the underlying logic, it takes us closer to real-world applications of automata. One is then lead to ask similar questions about functions whose graphs are recognized by Bu¨ chi automata, Tree automata, and Rabin automata. The determinization-harvester decomposition can be adapted to trees, but is there an analog to the Krohn-Rhodes theorem for trees in this context? What about in the case of Bu¨ chi automata or Rabin automata, for which there is no end of the input to start the harvester running backwards from? Can nice generators still be found?
There is still much work to be done in establishing a Bu¨ chi-ElgotTrakhtenbrot theorem for graphs. There are several nice candidates for a monadic second-order logic of graphs, and some nice notions of automata operating on graphs, but no full correspondence between them. The current state of the art is Courcelle’s Theorem (see, e.g. [2]), which allows us to translate questions in a graph logic to tree automata operating on a tree decomposition of the original graph, but not back. Fortunately, this is the direction most of interest to applications. But perhaps an approach which instead of trying to connect formulas and acceptors, connected describable functions and transducers would shed light on the problem?
Finally, we also note that it is not possible to break down our ASn generators much further. Of course for large enough n one may decompose the symmetric
46

group Sn as a semidirect product of the alternating group An and S2, and use this decomposition to guide a slight decomposition of ASn, but this doesn’t gain anything. It is likely no further decomposition is possible. One would then like a proof that, for instance, accumulators on the cyclic groups do not sufﬁce, in a way that hopefully sheds some light on what behavior symmetric groups capture that cyclic groups cannot. Alternately, a decomposition of the accumulators on the symmetric groups in terms of accumulators on the cyclic groups would be a remarkable result.
47

CHAPTER 3 AXIOMATIZING THE WEAK SECOND ORDER THEORY OF ONE
SUCCESSOR
In this chapter we present an axiomatization of the theory of the Weak Second Order Theory of One Successor (WS1S), and show that it is complete. Furthermore, we will prove that no such axiomatization can be ﬁnite. This is in preparation for chapter 4 which will explore the nonstandard models of WS1S.
3.1 Introduction
The weak second order theory of one successor, WS1S, is the theory of the two-typed structure G := (ω, F(ω), ∈, S), where F(ω) is the set of ﬁnite subsets of ω, ∈ is the binary relation between types (membership), and S is the successor operation on ω. We will use lower case variables a, b, , x, y, z to denote elements of the ﬁrst type, and upper case variables A, B, X, Y, Z, . . . to represent elements of the second type. Boldface variables x, y, X, Y will indicate shorthand for ﬁnite or inﬁnite sequences of elements of the corresponding type indexed by natural numbers. This structure is particularly interesting not only for being an actually decidable second order theory of ω (the addition of + or weak dyadic second order types makes it undecidable), but also for the correspondence between its theory and the theory of ﬁnite state automata which allows it to be decidable in the ﬁrst place.
Previously an axiomatization of the Weak Second Order Theory of Two Successors (WS2S), whose techniques and results can be reduced to the WS1S case, was completed by Siefkes [10]. However this axiomatization focused on in-
48

ductive axiom schema, suggesting an analogy between WS1S and, e.g. Peano Arithmetic. This paper provides an axiomatization that stresses the weakness of WS1S and the interconnection between WS1S and ﬁnite automata.
As it happens, a ﬁnite axiomatization is too much to hope for, but we will be able to provide an axiomatization with simple statements that are self-evident of G, along with a schema of axioms which can be interpreted as asserting that ﬁnite automata of a certain class have a run on any input. These schema will be fairly complex and only self-evident in the presence of this interpretation. We will not aim in this paper to have our axiomatization be minimal. Instead, the axiomatization presented in this chapter has been chosen speciﬁcally to motivate and set the foundation for chapter 4, investigating the nonstandard models of WS1S.
Closely resembling the approach in [10], our approach to proving the completeness of our axiomatization is as follows: given a formula φ in the language of WS1S, we will inductively deﬁne a ﬁnite automaton Aφ which is expected to be equivalent to the formula φ1. And given a ﬁnite automaton A, we will deﬁne a formula ΦA which is expected to be equivalent to the automaton A2. This will serve as a canonical form for formulas in WS1S. We will then prove that our axioms sufﬁce to show that every formula is equivalent to its canonical form. These steps will resemble the proof of the Bu¨ chi-Elgot-Trakhtenbrot Theorem, the equivalence of formulas in WS1S and ﬁnite automata, with every step being shown to follow from our presented axioms. Finally, we will prove that every true sentence in canonical form can be proved from our axioms.
1That is, Aφ will accept a suitable encoding of X0, . . . , Xn−1 if and only if φ(X0, . . . , Xn−1). 2That is, ΦA(X0, . . . , Xn−1) will hold if and only if A accepts a suitable encoding of X0, . . . , Xn−1.
49

Proving that WS1S cannot be ﬁnitely axiomatized will then be a fairly straightforward matter. Suppose we have a ﬁnite complete axiomatization, which we can by taking a conjunction reduce to a single axiom φ. This formula will follow from our axiomatization, and thus follow from some ﬁnite subset of our axiomatization. We will then produce a model H of this subset of our axiomatization which does not satisfy all of our axioms. φ will be true of H, contradicting our assumption that φ was a complete axiomatization (and thus only true of models that satisﬁed the entire theory of WS1S).
3.2 Axiomatizing (ω, <)
One might wonder how the ﬁrst and second order parts of WS1S interact with each other. More formally, are there relations on ﬁrst order elements which require the use of second order quantiﬁcation in their deﬁnition? One such relation is the ordering < on ω.3 It will be convenient to refer to it in our axiomatization, and each of its appearances will stand in as shorthand for the deﬁnition below: Deﬁnition. A set X is said to be downward closed if:
dc(A) :≡ ∀X : S(x) ∈ A =⇒ x ∈ A
We deﬁne a linear ordering on the ﬁrst order elements by: x ≤ y :≡ ∀X : dc(X) ∧ y ∈ X =⇒ x ∈ X
As such it will be useful to throw in a complete axiomatization of (ω, <) to our axiom system. One such axiomatization is provided in Chapter 13 of [9],
3Another is modular equivalence for any ﬁxed modulus.
50

which is adapted below to use the already-present notation for the successor operation.

Axiom 1. < forms a Linear Ordering with Smallest Element and with Successor Operation S which is All But Onto. That is:

(1.1) ∀x, y, z : x < y ∧ x < z =⇒ x < z

(< is transitive)

(1.2) ∀x, y : x < y =⇒ ¬(y < x)

(< is antisymmetric)

(1.3) ∀x, y : x < y ∨ x = y ∨ y < x

(< is total)

(1.4) ∀x : S(x) > x

(S is strictly increasing)

(1.5) ∀x : y : x < y < S(x)

(S is a successor operation)

(1.6) ∃o : ∀x : o ≤ x

(< has minimal element)

(1.7) ∃o : ∀x : x = o =⇒ ∃y : S(y) = x

(S has image ω \ {o})

Note that it follows from (1.1) and (1.7) that the o in (1.6) and (1.7) are unique, and from (1.4) that the o in (1.6) and (1.7) are the same. We denote this element with the shorthand 0.
It is worth noting that this axiomatization does not deﬁne (ω, <) uniquely. It is simply a base from which all other ﬁrst order facts about (ω, <) can be proved. As mentioned in [9], this set of axioms characterizes the linear orderings of the form ω followed by Z × L ordered lexicographically (ω + ζ · L), for some linear ordering L. These linear orderings all have the same ﬁrst order theory.
Together, all of these axioms set up the foundation upon which we will develop the second order part of our axiomatization.

51

3.3 Basic Set Axioms
In this section, we introduce some basic axioms for the second order elements of WS1S. These ﬁrst few have been intentionally chosen to resemble axioms of ZFC. We will also deﬁne a notion of bitwise operation and show that our axioms sufﬁce to show that the second order elements of WS1S are closed under all the bitwise operations we could expect them to be closed under. Axiom (Extensionality). Sets are equal if and only if they contain the same elements. Axiom (Singleton). For any x, there is a set, denoted {x}, containing only x. Deﬁnition. Given two sets A, B, we write:
C = A ∪ B :≡ ∀x : x ∈ C ⇐⇒ x ∈ A ∨ x ∈ B C = A ∩ B :≡ ∀x : x ∈ C ⇐⇒ x ∈ A ∧ x ∈ B C = A \ B :≡ ∀x : x ∈ C ⇐⇒ x ∈ A ∧ x ∈/ B
A ⊆ B :≡ ∀x : x ∈ A =⇒ x ∈ B
Axiom (Closure Under Boolean Operations). The sets are closed under the boolean operations ∪ and \.
We can interpret second order elements of WS1S with strings of 0s and 1s via their characteristic functions, that is, given a set X, we can produce the sequence s where sn = 1 iff n ∈ X. Via this interpretation, we can deﬁne a notion of bitwise map, where the membership of a number in the output depends only on which of the inputs it is a member of. Obviously the set of ﬁnite sets of natural numbers is not closed under all bitwise maps. Bitwise negation, that is, complementation, takes a ﬁnite set to a coﬁnite set. However, this is essentially the only obstruction.
52

Deﬁnition. We say that an n-ary boolean function f : {0, 1}n → {0, 1} is bounded if it takes (0, . . . , 0) to 0.

Deﬁnition. Given an n-ary boolean function f : {0, 1}n → {0, 1}, we deﬁne the

bitwise-f map Bwf :

Bwf : (P(X))n → P(X),

Such that:

k ∈ Bwf (X0, . . . , Xn−1) ⇐⇒ f (δk∈X0, . . . , δk∈Xn−1) = 1,

Where of course:


0 δk∈X =
1

k ∈/ X, k ∈ X.

Deﬁnition. Given an indexed collection X of sets X0, . . . , Xn−1 we denote by X[k] the set of positions i such that k ∈ Xi:

(X[k])i := {i|k ∈ Xi}.

Given a tuple s ∈ {0, 1}n and indexed collection of sets X0, . . . , Xn, we denote by Xs the set of positions k where the set of indices i such that k ∈ Xi is exactly the positions of 1s in s:
Xs := {k|k ∈ Xi ⇐⇒ si = 1}. Lemma 8. Given a tuple s ∈ {0, 1}n, if s = (0, . . . , 0), then Xs can be written as a combination of X0, . . . , Xn−1 using only ∪ and \.
Proof. First, we note that intersection, ∩, can be written using only \:
A ∩ B = A \ (A \ B).

53

From this, we can use associativity to write arbitrary intersections in terms of ∪ and \.

Next, note that for any s ∈ {0, 1}n,
Xj
j|sj =1
Is the set of positions k where X[k] contains as a subset all the positions of 1s in s.

Next, note that
Xi
i
Is the set of positions k where X[k] is nonempty.

Next, note that for any j,
( Xi) \ Xj
i
Is the set of positions k where X[k] is nonempty and does not contain j.

As such, for any s ∈ {0, 1}n (with s = (0, . . . , 0)),
(( Xi) \ Xj)
j|sj =0 i
Is the set of positions k where X[k] is nonempty and does not contain any positions of 0s in s.

Thus it is that







Xs =  Xj ∩  (( Xi) \ Xj) .

j|sj =1

j|sj =0 i

Proposition 17. Any bitwise bounded boolean function can be written as a combination of ∪ and \.
54

Proof. We can simply write:

Bwf (X0, . . . , Xn−1) =

Xs.

s|f (s)=1

As such, closure under union and difference is enough make sure that the second order objects are closed under all bitwise bounded boolean functions.
Finally, we provide an axiom to ensure that all of the second order objects are bounded. There’s a subtlety here: not even our monadic second order logic can say directly that every set has (arbitrarily) ﬁnitely many elements, but it is possible to say that every set has an upper bound, which in (ω, <), is equivalent.
Axiom. Every nonempty second order element has a <-maximal element. We denote the maximal element of a set X by max(X)
We note that it follows from this axiom and the S is a successor axiom that every set has a downwards closure. Deﬁnition. Given a set X, we denote by its downward closure, dcl(X), the ⊆-least downwards closed set containing X as a subset.
Proposition 18. It follows from our axioms that every set has a downwards closure.
Proof. Given a set X, X has a <-maximal element, denote it by x. Since S(x) > x, there must be a downwards closed set Y containing x which does not contain x (otherwise we would have x ≥ S(x). I claim that Y contains all of X, since any other element z ∈ X satisﬁes z < x, and thus since Y contains x and is downwards closed, this means Y contains z as well by the deﬁnition of <.
55

I also claim that Y is ⊆-minimal, in that any other downwards closed set containing x must also contain something strictly larger than x. Suppose we have two downwards closed sets containing x, W = W . Without loss of generality, we may assume there is some w ∈ W with w ∈/ W . Since w is not in W , a downwards closed set containing x. Thus we have that w x, and so w > x (by totality). So there cannot be two downwards closed sets containing x as their maximal element.
Finally, we note that atoms under ⊆ are actually singletons. Proposition 19 (No spurious singletons). A nonempty set X contains only itself and the emptyset as subsets iff it is a singleton {x} for some x.
Proof. (⇒): Suppose we have a nonempty set X containing only itself and the emptyset as subsets. Since X is nonempty, it contains an element, say x. Suppose X also contains an element y = x. Then X \ {x} is neither X nor empty, contradicting our hypothesis. So X = {x}.
(⇐): Suppose Y ⊆ {x} and Y is nonempty. Then Y must contain an element y. However, in order to be a subset, y ∈ {x}, hence y = x. Y cannot contain any other elements, as {x} doesn’t contain any other elements.
A simple result, but a critical one later, since we will need to represent ﬁrst order elements as singletons, and to do that we will need to be certain that singletons are describable in terms of their ⊆ behaviour alone.
We summarize and number these axioms as follows: Axiom 2. The second order objects are deﬁned by their elements, closed under bounded bitwise operations, closed under ﬁnite modiﬁcation, and bounded.
56

(2.1) ∀X, Y : (∀x : x ∈ X ⇔ x ∈ Y ) ⇒ X = Y

(Extensionality)

(2.2) ∀x : ∃X : ∀y : y ∈ X ⇔ y = x

(Singleton)

(2.3) ∀X, Y : ∃Z : ∀x : x ∈ Z ⇔ (x ∈ X ∨ x ∈ Y ) (Union Closure)

(2.4) ∀X, Y : ∃Z : ∀x : x ∈ Z ⇔ (x ∈ X ∧ x ∈/ Y ) (Difference Closure)

(2.5) ∀X : ∃x : x ∈ X ∧ ∀y : y ∈ X ⇒ y ≤ x

(Bounding)

3.4 Logical Automata

In this section we will introduce a notion of logical automata, automata which have been modiﬁed to operate within WS1S. Alternately, we can think of this in terms of canonical forms for formula. WS1S has a very nice canonical form for formulas, and logical automata will provide a semantically transparent parameterization of these canonical forms.

Recall that if X is an indexed collection with ﬁnite set of indices K, and

s ∈ {0, 1}K, then we use the shorthand Xs to stand for {x|∀k ∈ K : sk = 1 ⇐⇒ x ∈ Xk}. Thus, we write x ∈ Xs as shorthand for:

  

x ∈ Xs :≡  x ∈ Xk ∧  x ∈/ Xk .

k|sk =1

k|sk =0

We also use the shorthand n for the set {0, . . . , n − 1}, but never in reference to

the second order element of a model of WS1S.

Deﬁnition. A logical automaton is a 5-tuple A = (n, Q, I, δ, F ) where:

• n is the number of inputs, potentially 0. • Q is a ﬁnite set of states. We will refer to the elements of Q via some canonical
ordering as 0, . . . , |Q| − 1, but often the states will retain additional information,
57

for instance they may be pairs of states of other automata. We will also refer to the number of states of an automaton A as |A|. Often we will prove results for automata with state set m where this sufﬁces to prove the result in general. • I ⊆ Q is the set of initial states. • δ ⊆ Q × 2n × Q is the transition relation. • F ⊆ Q is the set of ﬁnal states.
These automata are speciﬁcally designed to accept n-tuples of second order elements of a model of WS1S. Given an n-tuple of second order objects X = (X0, . . . , Xn−1), we say that a ﬁrst order-object and a |Q|-tuple of second order objects Y = (Y0, . . . , Y|Q|−1) is a run of A on input X if:
• is strictly larger than anything in any Xi (particularly, the least strict upper bound to the Xi).
• Yi form a partition of dcl( ), that is, they have pairwise empty intersection and together their union is all of dcl( ).
• 0 ∈ Yq for some q ∈ I. • For any 0 < x < , if x ∈ Yq and x ∈ Xs and S(x) ∈ Yq then (q, s, q ) ∈ δ.
We say that such a run is accepting if additionally:
• ∈ Yq for some q ∈ F .
We interpret the deﬁnition of a run as follows: ﬁrst order elements represent instances of time. If x ∈ Yq for some state q (and since the Yq form a partition of dcl({ }), this happens for exactly one state provided x ≤ ) we interpret this as
58

the automaton being in state q at time x. Of course, we require the automaton to be in exactly one state at any time, and we require the automaton to be in a start state at time 0. Then we require that if the automaton is in a state q at time x and it reads input s (the input being read at any one time is just a binary sequence representing whether x is in each of the inputs.) at time x, then at time S(x), the automaton should be in a state q such that q, s, and q are related by the transition relation. We also require that the automaton stop at some time , after having read all the input. Note that at time max( i∈n Xi), the state of the automaton has not yet been updated to take into account the input at time max( i∈n Xi) which by construction is nontrivial, so we require that be strictly larger than this.
A design choice has been made as to whether the state at time x should be the state of the automaton before reading the input at time x or after. There are advantages and disadvantages to both, and both give equivalent notions of automata, but it seems like the former option is preferable, due to saving steps on many inductive proofs throughout this chapter.
Next, given an automaton A = (n, Q, I, δ, F ), we construct a formula ΦA(X0, . . . , Xn−1) which formalizes the above.
Deﬁnition. Deﬁne the formula R(n,Q,I,δ,F )(X0, . . . , Xn−1, , Y0, . . . , Y|Q|−1) in WS1S as the conjunction of:
• ∀x : i∈Q j∈Q[x ∈ Yj ⇐⇒ (i = j ∧ 0 ≤ x ≤ )] • i∈I 0 ∈ Yi • ∀x : 0 ≤ x < =⇒ (i,s,k)∈δ(x ∈ Yi ∧ x ∈ Xs ∧ S(x) ∈ Yk)
59

This formalizes the notion of ( , Y0, . . . , Y|Q|−1) being a run of (n, Q, I, δ, F ) on input X0, . . . , Xn−1.
Deﬁnition. Deﬁne the formula Φ(n,Q,I,δ,F )(X0, . . . , Xn−1) in WS1S as:
∃ : ∃Y0, . . . , Y|Q|−1 : = LSUB( X) ∧R(n,Q,I,δ,F )(X0, . . . , Xn−1, , Y0, . . . , Y|Q|−1) ∧ ∈ Yi.
i∈F
Where LSUB(X) is shorthand for the least strict upper bound for a set, either 0 for the emptyset or S(max(X)) for a nonempty set.

This formalizes the notion of there being an accepting run of (n, Q, I, δ, F ) on input X0, . . . , Xn−1.
We are now prepared to introduce our axiom schema for ﬁnite automata. It sufﬁces in this case to simply specify that automata have runs on any input. We will be able to prove later theorems involving whether automata have accepting runs or not. Of course, not every automaton has a run on any input, but there are simple things we can check to ensure that every automaton has a run on any input.
Deﬁnition. An automaton (n, Q, I, δ, F ) is said to be runnable if:

• |I| > 0, • ∀q, s ∈ 2n : ∃q : δ(q, s, q ). Deﬁnition. An automaton (n, Q, I, δ, F ) is said to be reverse-runnable if:

• |F | > 0,

60

• ∀q, s ∈ 2n : ∃q : δ(q , s, q).
Note that these are assertions about the internal structure of the automaton: nothing about the runs or acceptance of the automaton. As such, the following axiom schema is nontrivial: Axiom. (3.A) If A = (n, Q, I, δ, F ) is a runnable automaton, for any sequence of inputs X0, . . . , Xn−1,
Φ(n,Q,I,δ,Q)(X0, . . . , Xn−1). Axiom. (4.A) If A = (n, Q, I, δ, F ) is a reverse-runnable automaton, for any sequence of inputs X0, . . . , Xn−1,
Φ(n,Q,Q,δ,F )(X0, . . . , Xn−1).
This is a very speciﬁc sort of inductive axiom: if we can start constructing a run of our automaton and at any point if we have a partial run, we can ﬁnd a state to transition to next, then we can construct the whole run of our automaton. In the runnable case we construct our run from start to end, and in the reverse-runnable case, we construct a run from end to start. Note that in the runnable case, we don’t have to worry about our run being accepting, and in the reverse-runnable case, we don’t have to worry about our run having the right start state, since we’ve replaced F and I respectively with all of Q. This completes our axiomatization.
3.5 Basic Consequences of Our Axioms
In this section we collect some of the basic consequences of our axioms, in preparation for future sections.
61

First, we deﬁne a predecessor operation P which is meant as an inverse to S, and deﬁne the operations S , P on second order elements which are meant to apply their respective operations to each element in the set.

Deﬁnition. Deﬁne


0 P (x) :=
y

x = 0, x = S(y).

Given a second order element X, deﬁne S (X) and P (X) such that:

S(x) ∈ S (X) ⇐⇒ x ∈ X,

and 0 ∈/ S (X), and:

x ∈ P (X) ⇐⇒ S(x) ∈ X.

Proposition 20. It follows from our axioms that the second order objects are closed under S .

Proof. Construct the automaton

B := (1, {0, 1}, {0}, δ, {0, 1}),

Where:

δ = {(i, j, j)|i, j ∈ {0, 1}}.

Let Y be a run of B on input X. Then it is easy to check that Y1 is S (X) by looking at the transition relation.
Proposition 21. It follows from our axioms that the second order objects are closed under P .

62

Proof. Construct the automaton

B = (1, {0, 1} × {0, 1}, {0, 1} × {0, 1}, δ, {(0, 0)}),

Where:

δ = {((i, j), k, (j, k))|i, j, k ∈ {0, 1}}.

Let Y be a run of B on input X (this uses (4.A)). Then it is easy to check by looking at the transition relation that Y(1,0) ∪ Y(1,1) is just X, and from this that Y(0,1) ∪ Y(1,1) is P (X).

Secondly, I assert that every set has a <-least element. This is a key result that allows us to perform inductive proofs: if we want to show that something always happens for any ﬁrst order element, ∀x : P (x), then it sufﬁces to prove three things:

1. P (0), 2. P (x) =⇒ P (S(x)), 3. The set of x such that ¬P (x) is a second order element of our model4.

I claim it will then follow that ∀x : P (x). Let Y be the set of x such that ¬P (x). Then Y will have a least element. This least element cannot be 0 or the successor of anything, contradicting (1.7). Typically, we will prove (3) by application of the closure axioms (2.2), (2.3), (2.4), (3.A), and (4.A).
We can even show that a particular run satisﬁes the transition relation of an automaton using induction. In order to use induction, we need to be able to talk about the set of positions where, given an automaton A and input X, a proposed
4Or, at least, for any y, the set {x|x < y ∧ ¬P (x)} is a second order element of our model.
63

run Y fails to satisfy the transition relation δ for A. These positions are given by:
[Ya ∩ Xs ∩ P (Yb)].
(a,s,b)∈/δ
Typically, though, there will be a more direct proof: if X, Y are deﬁned in such a way as to satisfy a particular transition relation, then it will usually follow directly that they satisfy the transition relation for our automaton A.
Proposition 22. It follows from our axioms that every nonempty set has a <-least element.

Proof. Begin with a set X. Consider the following automaton:

Bit := (1, {0, 1}, {0}, δ, {0, 1}),

Where:

δ := {(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1)}.

Now, let , Y0, Y1 be the run of Bit on input X. will be the successor of the largest element of X. Y0 will be the set of positions up to and including the ﬁrst appearance of an element of the input, and Y1 will be all positions afterwards up to and one past the last appearance of an element of the input.

First, I assert that if z ∈ Y0 and w < z then w ∈ Y0. Suppose not. Then there is a largest w with w < z and w ∈ Y1, the largest element of Y1 ∩ dcl({z}). This w cannot be z, since z ∈ Y0, so it must be that S(w) ≤ z. But since w was the largest element of Y1 ∩ dcl({z}), it must be that S(w) ∈ Y0, contradicting the construction of our transition relation.

Y0 is nonempty since 0 is the starting state. Thus, By (2.5), Y0 has a largest element y. Since {Y0, Y1} is a partition of dcl({ }), S(y) is either in Y1 or not in dcl({ }) at all, that is, S(y) > (by totality).

64

Case 1: suppose that S(y) is in Y1. Since y ∈ Y0 and S(y) ∈ Y1, it must be by the transition relation that y ∈ X. Furthermore, I assert that there is no z < y such that z ∈ X, since otherwise by the transition relation, S(z) would be in Y1, contradicting our earlier proved downward closure of Y0. So this y is the smallest element of X.
Case 2: suppose that S(y) > . Since y ∈ dcl({ }), it must be that y ≤ , thus since S is a successor operation, y = . By the downward closure of Y0, it must be that Y0 = dcl({ }). I claim in this case that X is empty, since if it contained an element w, then S(w) would be in Y1 and Y1 must be empty, since {Y0, Y1} is a partition of dcl({ }).
3.6 Determinization
One goal in this chapter is to prove that the standard Bu¨ chi-Elgot-Trakhtenbrot result, equivalence of ﬁnite automata and WS1S formulas, holds for our logical automata, and that this equivalence is a consequence of our axioms. Speciﬁcally, given a formula φ, we wish to prove that there is an automaton A such that:
φ(X0, . . . , Xn−1) ⇐⇒ ΦA(X0, . . . , Xn−1).
We will denote this automaton Aφ. As such, much of the next few sections is dedicated to reproving classical results of automata theory, from the given axioms.
Critical to the importance of ﬁnite automata is the expressive equivalence of deterministic and non-deterministic automata. While non-deterministic automata are much more natural from an atemporal, relational, formal logic stand-
65

point, being as they are simply a speciﬁc case of a much more general class of local constraint systems5, deterministic automata are much more natural from a temporal automaton perspective. Additionally, it is clear how to complement them, a critical problem in the theory of general local constraint systems.
Deﬁnition. An automaton A = (n, Q, I, δ, F ) is deterministic if:

• |I| = 1, • ∀q, s ∈ 2n : ∃!q : δ(q, s, q ).

An automaton A = (n, Q, I, δ, F ) is reverse-deterministic if:

• |F | = 1, • ∀q, s ∈ 2n : ∃!q : δ(q , s, q).
Proposition 23. It follows from our axioms that every deterministic automaton A = (n, Q, I, δ, F ) has exactly one run on input X0, . . . , Xn−1 of length = LSUB( X).

Proof. Since A is runnable, it follows from (3.A) that A has at least one such run.

Suppose there are two runs Y and Y . Consider the set

K=

[Yq ∩ Yq ],

q=q ∈Q

The set of positions where Y and Y differ. I assert that K is empty. Suppose not,

then it must have a least element. This least element cannot be 0, since |I| = 1.
5Here we refer to systems with a notion of labelling and locality where we distinguish those labellings which can be augmented to ones that in every local region satisfy some constraint versus those that cannot. A wide variety of automata are of this form, including cellular automata, but a particularly enlightening example may be unranked tree automata, whose local regions are not bounded in size. Additionally, differential equations and operations are of this form: there is a sense in which they are a special case of continuous automata.

66

Then it is S(x) for some x. Then we know the state of our runs at time x agree, but at time S(x) disagree, contradicting δ being a function from Q × 2n → Q.
Deﬁnition. Given an automaton A = (n, Q, I, δ, F ), deﬁne the determinization of A, denoted det(A), as the automaton:

det(A) = (n, P(Q), {I}, δ , {K ⊂ n|K ∩ F = ∅}),

Where:

δ (R, s, T ) : ⇐⇒ T = {q |∃q ∈ R : δ(q, s, q )}.

Of course det(A) is deterministic. Proposition 24. From our axioms, we can conclude that given an automaton A = (n, m, I, δ, F ) and an input X0, . . . , Xn−1,
ΦA(X) ⇐⇒ Φdet(A)(X)

To summarize the proof: I will assert that at any point x, the state of det(A) will be the set of states for which there is a run of A on the same input with that state at point x. Showing that this follows from our axioms is a bit tricky: we can show that there is no speciﬁc point x where this stops happening, but our level of induction is not strong enough for this to sufﬁce.
In order to show that states don’t disappear in the run of det(A), that is, it isn’t the case that there’s a state which is reachable, but doesn’t appear in the run of det(A), one simply has to compare a run containing the state in question at the position in question to the run of det(A). Bitwise operations will give us the set of positions where states of our run do not appear in the run of det(A) and this set cannot have a smallest element, so it’s empty.
67

To show that states don’t spuriously appear in the run of det(A), that is, it isn’t the case that there’s a state which isn’t reachable, but does appear in the run of det(A) is trickier: given a state at a position in the run of det(A), we construct a run of A backwards, starting at this state, and using the run of det(A) as a guide to make sure that we stay within states that can be traced back to the starting state and not wind up in some dead-end branch. This construction is performed by an automaton H. We expect H to be reverse-runnable, but only because of our choice of input, so we need to add in a dummy state to ensure that H has a transition relation that makes it reverse-runnable, and then argue that for this particular input, the dummy state is never used.
So if there is an accepting run of A, it must be the case that the run of det(A) ends in a state containing the ﬁnal state of our accepting run of A, hence this run of det(A) is accepting. If there is an accepting run of det(A) then it must be the case that some accept state appears in the ﬁnal state of this run, and so there’s a run of A such that this state appears as the ﬁnal state.
It is also worth noting that this is the essential use of axiom schema (3.A) and (4.A): asserting that the determinization has a run at all and then asserting that the auxiliary automaton H has a reverse run. Other uses of these axioms could be made into their own axioms with signiﬁcantly less semantic content, but this case seems to use the full power of our schema.
Proof. det(A) is deterministic, so we may consider the unique run Z on input X. I assert the following. Suppose x ∈ ZJ for J ∈ P(m). Then J is exactly the set of all states q such that there is a run Y0, . . . , Ym−1 satisfying RA(X0, . . . , Xn−1, x, Y0, . . . , Ym−1) with x ∈ Yq. Informally, the state of det(A) at any one time x is the set of states for which there is some run of A with that
68

state at time x.

It should be clear that if this statement is true for x then it’s true for S(x). Suppose it’s the case that x ∈ ZJ and J is exactly the set of states q that show up at position x in some run Y of A on input X. Then by construction of δ , S(x) is in some ZK for K the set of states which potentially show up as the next states after reading the input at position x. One may construct an appropriate run by augmenting the runs Y by combining the singleton and boolean operation axioms.

Now suppose it’s the case that there’s a run Y0, . . . , Ym−1 with state q appearing at position x but x ∈ ZJ for some J with q ∈/ J. Then consider the set:

U := {x|∃i, J : x ∈ Yi ∧ x ∈ ZJ ∧ i ∈/ J} = Yi \ ZJ .

i∈m

Ji

This set exists by closure properties. By our assumption, this set is nonempty,

and hence has a smallest element y. By construction of det(A), this y = 0. What

this means is that there is an S−1(y) which is not in U but y is in U . But again,

one can easily check that if S−1(y) ∈/ U then by construction of det(A), neither is

y.

So the only thing that could go wrong was if there were a state q and a position x such that x ∈ ZJ for some J q, but there was no run Y of A with x ∈ Yq. That is, some state mysteriously appeared in the run of det(A) that didn’t belong there. Suppose this is the case. Consider the following automaton6:
H := (n + m, m ∪ {qd}, m ∪ {qd}, δ , {q}).
Where (p, s t, p ) ∈ δ if (p, s, p ) ∈ δ ∧ tp = 1 or if p = qd and there is no a such that (a, s, p ) ∈ δ and ta = 1.
6This automaton is based on the Harvester Automaton for A used in Generating the Regular Functions and Krohn-Rhodes.

69

By construction, H is reverse-runnable, and we refer to qd as the dummy state. Since H is reverse-runnable, there is a run W such that:

RB(X0, . . . , Xn−1, U0, . . . , Um−1, x, W0, . . . , Wm, Wqd),

Where:

Ui := {x|∃J i : x ∈ ZJ } = ZJ .
Ji

I assert that Wqd = ∅. Suppose not, then Wqd has a maximal element w. This element is not x, since we’re in state q at that point.

Label our situation as follows: let J be the state of det(A) at position w and
J be the state of det(A) at position S(w) (note that this even makes sense if S(w) = x). Let s be the input at time w. So, speciﬁcally, w ∈ Xs, w ∈ ZJ , w ∈ Wqd, S(w) ∈ ZJ , and S(w) ∈ Wr. Here’s a diagram to keep track of the situation. Recall that X was the original input to our automaton A, Z was the
run of det(A) on input X, and W was the run of H on input (X, U) where U
was a representation of Z.
w S(w) Xs ZJ J W qd r (r )
Table 3.1: Transition Diagram for the Determinization and Harvester Near the Dummy State

On the one hand, because w ∈ Wqd, it must be the case that there is no state a such that (a, s, r) ∈ δ with a ∈ J.
• In case S(w) = x, r must be the ﬁnal state of H which we chose to be q, which we know is in J .
70

• Otherwise, r must have been obtained going backwards from another state, r , which by the transition relation for H tells us that r ∈ J .
Either way, we get that r ∈ J .
By construction of Z, we know that any state showing up in J can be traced back to a state in J. Speciﬁcally, there is some state a such that (a, s, r) ∈ δ with a ∈ J, a contradiction.
This proves our assertion that Wqd is empty, which means that W is a valid run of A on input X up to point x (part of the deﬁnition of H was that its output would have to satisfy the transition relation for A or have a qd show up somewhere).
This completes the proof of our earlier assertion, that the state of det(A) at any one time x is the set of states for which there is some run of A up to that point with that state.
So, as stated earlier, if there is an accepting run of A, it must be the case that the run of det(A) ends in a state containing the ﬁnal state of our accepting run of A, hence this run of det(A) is accepting. If there is an accepting run of det(A) then it must be the case that some accept state appears in the ﬁnal state of this run, and so there’s a run of A such that this state appears as the ﬁnal state.
We will see in the next section that this means that it follows from our axioms that the standard complementation procedure of determinizing and then complementing the set of accept states works correctly.
71

3.7 Complementation
Deﬁnition. Given a deterministic automaton A = (n, Q, I, δ, F ), deﬁne Swap(A) = (n, Q, I, δ, Q \ F ). Proposition 25. It follows from our axioms that if A is deterministic, and X a suitable input, then:
¬ΦA(X) ⇐⇒ ΦSwap(A)(X).
Proof. It follows from determinization that there is a unique run Y of A on input X. One can determine the last state of this run by taking the maximum of Y, and determining where it came from. Either it came from F in which case ΦA(X) or it came from Q \ F , in which case ΦA(X), but not both. Deﬁnition. Given a general automaton A = (n, Q, I, δ, F ), deﬁne Comp(A) = Swap(det(A)). Proposition 26. It follows from our axioms that for any A and suitable input X:
¬ΦA(X) ⇐⇒ ΦComp(A)(X).
3.8 Conjunction
In this section we show that if we have automata A and A which correspond to formulas φ, φ , then we can construct an automaton Conj(A, A ) which corresponds to the formula φ ∧ φ . Deﬁnition. Given two n-ary automata A = (n, Q, I, δ, F ) and A = (n, Q , I , δ , F ) deﬁne their conjunction:
Conj(A, A ) := (n, Q × Q , I × I , δ , F × F ),
72

Where:

δ = {((a, b), s, (c, d))|(a, s, c) ∈ δ ∧ (b, s, c) ∈ δ },

Proposition 27. It follows from our axioms that, given n-ary A, A and suitable X, then:
[ΦA(X) ∧ ΦA (X)] ⇐⇒ ΦConj(A,A )(X).

Proof. (⇒): Suppose we have runs Y witnessing ΦA(X) and Y witnessing ΦA (X). Construct run W where:

W(q,q ) = Yq ∩ Yq .

It’s easy to check that W witnesses ΦConj(A,A )(X).

(⇐): Suppose we have run Y witnessing ΦConj(A,A )(X). Then let:

Wq =

Y(q,q ),

q ∈Q

And:

Wq = Y(q,q ).
q∈Q
Then it’s easy to check that W witnesses ΦA(X) and W witnesses ΦA (X).

3.9 Bounding and Unbounding
Just as important as the expressive equivalence between deterministic and nondeterministic automata is the expressive equivalence between bounded automata, those automata, like our implementation of logical automata, whose runs terminate immediately after reading the last nontrivial input, and unbounded automata, those automata which are allowed to continue running after
73

reading the last nontrivial input. In this section, we will deﬁne an analog of Φ for running logical automata unboundedly, and provide constructions for an equivalence between bounded and unbounded automata. Deﬁnition. Recall that we constructed a formula
RA(X0, . . . , Xn−1, , Y0, . . . , Y|Q|−1) to indicate that Y0, . . . , Y|Q|−1 was a run of length of A on input X0, . . . , Xn−1.
Deﬁne the formula Ψ(n,Q,I,δ,F )(X0, . . . , Xn−1) in WS1S as: ∃ : ∃Y0, . . . , Y|Q|−1 : ≥ LSUB( X) ∧R(n,Q,I,δ,F )(X0, . . . , Xn−1, , Y0, . . . , Y|Q|−1) ∧ ∈ Yi.
i∈F
We now deﬁne a notion of unbounding a logical automaton: Deﬁnition. Given an automaton A = (n, Q, I, δ, F ), deﬁne its unbounding:
UBd(A) := (n, Q × Q, {(i, i)|i ∈ I}, δ , Q × F ),
Where: δ ((a, b), s, (c, d)) : ⇐⇒ (s = 0 ∧ d = b ∧ δ(a, s, c)) ∧(s = 0 ∧ d = c ∧ δ(a, s, c))
This automaton keeps track of two things: the state of the original automaton A and the last state the automaton went into when it saw a nonempty input. Proposition 28. Given an automaton A = (n, m, I, δ, F ), and suitable input X, it is provable from our axioms that the following are equivalent:
74

1. ΦA(X), 2. ΦUBd(A)(X), 3. ΨUBd(A)(X).

Proof. (1) ⇒ (2): Let Y be the accepting run as given in (1).

Given our accepting run Y of A, our goal is to construct two runs, which we will then combine together to produce an accepting run of UBd(A). This ﬁrst is our run Y. The second is the run above, but delayed in the case of inputs which are 0.

Deﬁne the automaton D:

D := (|Q| + 1, Q, I, δ , Q),

Where:

δ (a, r t, b) : ⇐⇒ [(t = 1 ∧ rb = 1) ∨ (t = 0 ∧ b = a)] .

Let V be the run of D on input P (Y0), . . . , P (Y|Q|−1), i Xi. One then checks that, by construction, W where:
W(q,q ) = Yq ∩ Vq , Is an accepting run of UBd(A) on input X.

(2) ⇒ (3): This is trivial. A bounded run of UBd(A) is also an unbounded run of UBd(A).

(3) ⇒ (1): Given an accepting unbounded run Y of UBd(A), let LSUB( i Xi). Then our desired run of A is W, where:
Wq = Y(q,q ) ∩ dcl({ }).
q

=

75

One can show inductively that everywhere beyond ﬁnishing reading the input X, the second coordinate of the state UBd(A) is in is a particular accept state of A. One then argues that at position max( i Xi), UBd(A) read a nontrivial input, and so updated the second coordinate of its state to agree with the ﬁrst coordinate of its state (ie, that particular accept state). If i Xi is empty, then the second coordinate of the state cannot change, so we must have started in an accept state. Hence, the run W is accepting.
We now deﬁne a notion of bounding a logical automaton: Deﬁnition. Given an automaton A = (n, Q, I, δ, F ), deﬁne its bounding:
Bd(A) := (n, Q, I, δ, F ),
Where:
F = {q ∈ Q|∃i, q0, . . . , qi−1 :q = q0 ∧qi−1 ∈ F ∧∀j : δ(qj, 0, qj+1)
Proposition 29. It follows from our axioms that, given an automaton A = (n, Q, I, δ, F ), and input X, the following are equivalent:
1. ΨA(X), 2. ΨBd(A)(X), 3. ΦBd(A)(X),
Proof. (1) ⇒ (2): This is clear. Any witnessing run in (1) is also a witnessing run in (2).
76

(3) ⇒ (1): One simply takes the witnessing run in (3) and augments it with the ﬁnite series of states which lead the ﬁnal state of such a run to a ﬁnal state of A. This augmented run is in our model by closure under ﬁnite modiﬁcations, and of course is an accepting unbounded run of A.

(2) ⇒ (3): Consider an accepting run in (2). I asset that this run is all ac-

cept states from the end of reading the input. Speciﬁcally, suppose we have

, Y0, . . . , Y|Q|−1 with ≥ LSUB( X), RBd(A)(X, , Y), and ∈ Yf for some f ∈ F .

Deﬁne the set:

U := Yi \ dcl(
i∈/F
Where F is the set of accept states of Bd(A).

X ),

I assert that U , the set of all positions beyond reading the input where the state is not an accept state of Bd(A), is empty, and thus that LSUB( X) ∈ Yi for some i ∈ F , which would mean that the appropriate preﬁx of Y is the desired run in (3). Suppose U is nonempty. Then let u be the maximal element of U . u cannot be max( Y ), since we know that max( Y ) ∈ Yf for some f ∈ F by hypothesis. So it must be the case that S(u) is in Yf for some f ∈ F . In this case one may augment the sequence witnessing that f ∈ F to verify that whatever i gives us u ∈ Yi must also be in F , contradicting our choice of u.

In the next sections, we will take advantage of our new equivalence to handle a small detail regarding free variables and existential quantiﬁcation (projection).

77

3.10 Permutation and Introduction of Variables
Our project so far has been to show that if we have a formula φ with free second order variables X0, . . . , Xn−1, then there is an automaton A with n inputs such that:
∀X0, . . . , Xn−1 : φ(X0, . . . , Xn−1) ⇐⇒ ΦA(X0, . . . , Xn−1). However, this isn’t quite all that we need to show. Consider the example of taking the conjunction of two formulas φ and φ , where φ has free variables X0, X1 and φ has free variables X1, X2. We construct inductively, automata A, A such that
φ(X0, X1) ⇐⇒ ΦA(X0, X1) and
φ (X1, X2) ⇐⇒ ΦA(X1, X2). Before we can take our conjunctions, we must ﬁrst construct automata B, B that take in 3 inputs, such that
φ(X0, X1) ⇐⇒ ΦB(X0, X1, X2)
and φ (X0, X1, X2) ⇐⇒ ΦB(X0, X1, X2).
Then we may use our main proposition from our section on conjunction to produce an automaton C such that
φ(X0, X1) ∧ φ (X1, X2) ⇐⇒ ΦC(X0, X1, X2).
A small detail, certainly, but one we cannot brush over if we wish to prove that our equivalence holds from our axioms. There are many ways to handle this issue indirectly, but in this section we will address the issue directly with a pair of results.
78

Deﬁnition. Given an automaton A = (n, Q, I, δ, F ), deﬁne the automaton Aug(A):

Aug(A) := (n + 1, Q, I, δ , F ),

Where:

δ := {(a, s t, b)|(a, s, b) ∈ δ}.

Proposition 30. Given an n-ary deterministic automaton A, for which an empty input takes accepting states to accepting states, the following are equivalent:

1. ΨA(X0, . . . , Xn−1), 2. ΨAug(A)(X0, . . . , Xn−1, Xn).

Proof. (2) ⇒ (1) is immediate. The accepting run for (2) is the accepting run for (1).
(1) ⇒ (2): The only difﬁcult case here is if the accepting run for (1) is shorter than the new input Xn. Otherwise the accepting run for (1) is the accepting run for (2). Suppose now that we need to extend the accepting run for (1) to be longer, while still being accepting. We will prove that ΦAug(A)(X0, . . . , Xn), from which (2) will follow. Since A is deterministic, so is Aug(A). By axiom (3.A), there is a bounded run Y of Aug(A) on input X0, . . . , Xn. Since runs of deterministic automata are unique, Y is just an extension of the accepting run for (1). One then can prove by induction that since the accepting run for (1) ended in an accept state, and since the transition relation for Aug(A) on empty input takes accepting states to accepting states, Y must end in an accepting state.
Corollary 1. Given an n-ary automaton A, the following are equivalent, for all X0, . . . , Xn:
79

1. ΦA(X0, . . . , Xn−1), 2. Φdet(A)(X0, . . . , Xn−1), 3. ΨUBd(det(A))(X0, . . . , Xn−1), 4. ΨAug(UBd(det(A)))(X0, . . . , Xn−1, Xn), 5. ΦBd(Aug(UBd(det(A))))(X0, . . . , Xn).
Proof. All of the steps above are equivalences we’ve previously shown. It simply sufﬁces for (3) ⇐⇒ (4) to note that any automaton in the image of UBd will take accept states to accept states, and that the unbounding of a deterministic automaton will be itself deterministic.
Finally, note that in our original motivating example, we had a formula φ (X1, X2), and we wanted to, given an automaton A such that:
φ (X1, X2) ⇐⇒ ΦA(X1, X2),
construct an automata B such that:
φ (X1, X2) ⇐⇒ ΦB(X0, X1, X2).
In order to obtain such an automaton in addition to our introduction of variables construction, we also need a permutation of variables construction. Simply note that the obvious construction of appropriately permuting the transition relations works correctly: one can show that the run of one automaton, by virtue of satisfying its transition relation, satisﬁes the transition relation of the other on suitably permuted inputs.
Having shown we can handle them, henceforth in this paper we will ignore issues of introduction and permutation of variables. In the next section, we
80

apply our unbounding construction to prove that projected automata behave correctly, provably from our axioms.

3.11 Projection

In this section we seek to prove that the traditional construction for projecting the language accepted by an automaton works correctly in our axiom system. That is, we wish to show that given an automaton A, there is an automaton Proj(A) such that:

∃Xn : ΦA(X0, . . . , Xn−1, Xn) ⇐⇒ ΦProj(A)(X0, . . . , Xn−1).

Deﬁnition. Given an automaton A = (n + 1, Q, I, δ, F ), deﬁne the unbounded projection: UProj(A):

UProj(A) = (n, Q × {0, 1}, I × {0, 1}, δ , F × {0, 1}),

Where:

δ := {((a, i), s, (b, j))|i, j ∈ {0, 1}, (a, s i, b) ∈ δ}.

Proposition 31. Given suitable input X0, . . . , Xn−1, the following are equivalent:

1. ∃Xn : ΨA(X0, . . . , Xn), 2. ΨUProj(A)(X0, . . . , Xn−1).

Proof. (1) ⇒ (2): Suppose Xn is such that ΦA has accepting run Y on input X0, . . . , Xn. Then construct run:
W(i,0) =Yi ∩ (dcl(Xn ∪ Yi) \ Xn),
i
W(i,1) =Yi ∩ Xn.

81

I assert that this is an accepting run for ΨUProj(A). One can check that the transition relation holds.
(2) ⇒ (1): Suppose ΨUProj(A) has accepting run Y on input X0, . . . , Xn−1. Then construct:
Wi =Y(i,0) ∩ Y(i,1) Xn = Y(i,1)
i
Then I assert that Xn is such that ΨA has accepting run W on input X0, . . . , Xn.
Finally, we construct the projection operation for bounded automata: Proposition 32. Given automaton A, the following are equivalent:
1. ∃Xn : ΦA(X0, . . . , Xn), 2. ∃Xn : ΨUBd(A)(X0, . . . , Xn), 3. ΨUProj(UBd(A))(X0, . . . , Xn−1), 4. ΦBd(UProj(UBd(A)))(X0, . . . , Xn−1).
As such, we refer to Bd(UProj(UBd(A))) as Proj(A). This completes our collection of inductive constructions. We now need to provide, for various possible atomic formulae, base case automata that are provably equivalent to them.
82

3.12 Base Case Automata
While we’ve developed a lot of tools for constructing new automata from old, we haven’t yet constructed many automata to start with. Deﬁnition. Deﬁne the deterministic automaton ASub:
ASub := (2, {s, f }, {s}, δ, {s}), Where the b such that δ(a, s, b) is given by the table below:
s\a s f 00 s f 01 s f 10 f f 11 s f
Table 3.2: Transition Table for ASub
Proposition 33. It is provable from our axioms that X ⊆ Y iff ASub accepts the pair X, Y .
Proof. (⇒): Suppose that X ⊆ Y . Then one may prove that ASub, on reading X, Y , is always in the state s, since by the transition relation, there can be no <least position when ASub switches from state s to state f . Hence it is accepting.
(⇐): (by contrapositive) Suppose that X Y . Then there is some x ∈ X \ Y . After this position, ASub will be in the f state, and by the transition relation there cannot be a <-least position when it returns to the s state, hence it will ﬁnish in the f state. Deﬁnition. Deﬁne the deterministic automaton ASuc:
ASuc := (2, {s0, s1, f }, {s0}, δ, {s0}),
83

Where the b such that δ(a, s, b) is given by the table below: s\a s0 s1 f 00 s0 f f 01 s1 f f 10 f s0 f 11 f s1 f
Table 3.3: Transition Table for ASuc
Proposition 34. It is provable from our axioms that S (X) = Y iff ASuc accepts the pair X, Y .
Proof. (⇒): Suppose that S (X) = Y . Then I assert that an accepting run of ASuc is Zs0 = dcl(Y ) \ Y , Zs1 = Y and Zf = ∅. One can easily check that this satisﬁes the transition relation, and ends with an accept state.
(⇐): Now suppose that S (X) = Y , that is, there is some x ∈ X such that S(x) ∈/ Y or some x ∈/ X such that S(x) ∈ Y . In either case, ASuc will transition into state f after stage S(x), and remain there until the end of the run. It can also be the case that there is no stage S(x), that is, x is the largest element of X ∪ Y , in which case our run will end on a s1 state, which is not accepting.
3.13 The Weak Exclusively Second Order Theory of One Suc-
cessor
We are now prepared to present automata Aφ for each formula φ with just second order free variables, such that
φ(X0, . . . , Xn−1) ⇐⇒ ΦAφ(X0, . . . , Xn−1).
84

This construction will take place in two steps: ﬁrst we will reduce φ to include only second order variables, and those logical atoms and connectives that we know how to deal with (⊆, S , ∧, ¬, ∃) and then translate, inductively, to the automaton Aφ. Deﬁnition. Deﬁne the Weak Exclusively Second Order Theory of One Successor (W!S1S) as WS1S, augmented with the additional relations: S (X, Y ) which indicates that S (X) = Y , and ⊆, without:
• First order variables and quantiﬁcation, • ∈ and S, • Disjunction, • Universal quantiﬁcation. Proposition 35. Any formula φ in WS1S with only second order free variables has an equivalent in W!S1S.
Proof. Our ﬁrst step will be to replace ﬁrst order variables with second order ones, representing the singletons of their ﬁrst order parts. Let Sing(X) be shorthand for:
∀Y : Y ⊆ X =⇒ [(X ⊆ Y ) ∨ (∀Z : Y ⊆ Z)].
• Replace ﬁrst order variables x with X, • Replace atomic formulas of the form x ∈ Y with X ⊆ Y , • Replace terms of the form S(x) with S (X), • For nested uses of S , introduce new variables to represent intermediate
terms. e.g.: Y = S (S (X)) ∃Z : Y = S (Z) ∧ Z = S (X),
85

• Relationalize S , replacing instances of the form S (X) = Y with S (X, Y ), • Replace equality with ⊆:
X = Y X ⊆ Y ∧ Y ⊆ X,
• Replace universal quantiﬁers with conjugated existential quantiﬁers, • Replace ﬁrst order existential quantiﬁers:
∃x : φ ∃X : Sing(X) ∧ φ,
• Replace disjunctions with conjugated conjunctions: φ ∨ ψ ¬((¬φ) ∧ (¬ψ)).
The resulting formula will be equivalent to the original. Note that the lack of spurious singletons is critical to the success of this equivalence. Deﬁnition. Given a formula φ in W!S1S, deﬁne the automaton Aφ inductively:
• If φ := S (X, Y ), then Aφ = ASuc. • If φ := X ⊆ Y , then Aφ = ASub. • If φ := ψ ∧ τ , then augment and permute Aψ and Aτ appropriately so their free
variables match, and take the conjunction of the resulting automata. • If φ := ¬ψ, then let Aφ = Comp(Aψ). • If φ := ∃X : ψ then, modulo suitable variable permutations, let Aφ = Proj(Aψ).
Combining all of our theorems from before, we can prove inductively that: φ(X0, . . . , Xn−1) ⇐⇒ ΦAφ(X0, . . . , Xn−1).
86

Deﬁnition. For formulas φ of WS1S with only free second order variables, we produce Aφ by ﬁrst producing φ in W!S1S which is equivalent, then computing Aφ .
In the next section, we will ﬁnally prove that our axiomatization is complete:
3.14 0-ary Automata
Up to this point, we have provided a canonical form for formulas of WS1S with only free second order variables (namely, for a formula φ its canonical form is ΦAφ), and shown that it follows from our axiom system that every such formula is equivalent to its canonical form. Now, in order to prove that our axiom system is complete, we must show that if a sentence φ is true (in the theory WS1S), then its canonical form is provable.
First, we must examine what the canonical form of a sentence can look like. Speciﬁcally, it looks like a 0-ary automaton, an automaton that takes in no input. What’s more, because none of its inputs contain elements, the length of any run of this 0-ary automaton is LSUB(∅) = 0. So a run starts and ends in the same state. If we unpack the deﬁnitions, we ﬁnd that our 0-ary automaton is accepting iff it has a start state which is also an accept state. Proposition 36. Given 0-ary automaton A = (0, Q, I, δ, F ),
ΦA ⇐⇒ I ∩ F = ∅. What’s more, this follows from our axiomatization.
Proof. (⇒): Given an accepting run Y of A, we note that by deﬁnition, Y is a partition of {0}. Let q ∈ Q be such that 0 ∈ Yq. Then by construction, q ∈ I ∩ F .
87

(⇐): Given a q in I ∩ F , then I claim that Y where Yq = {0} and all other Yq are empty is an accepting run.
As such, any sentence in canonical form which is true is provable. And thus, any sentence which is true is provably equivalent to a sentence in canonical form, which is provable. Hence any sentence in WS1S which is true is provable from our axiom system. One can also verify the soundness of our axiom system. We summarize this in the following theorem: Theorem 7. Our proposed axiom system is sound and complete for WS1S.
3.15 Finitely Many Axioms Do Not Sufﬁce
Our axiom system is fairly small and simple. A signiﬁcant improvement, however, would be to be able to produce a ﬁnite axiomatization of WS1S. In this section, we prove that no such axiomatization is possible, by providing a structure Hp which satisﬁes all of our axioms up to a point, but does not satisfy all of our axioms. Deﬁnition. Let ω + ζ refer to the collection of elements of the form n ∈ ω or ∞ + z|z ∈ Z, with natural ordering such that every element of the form ∞ + z (elements in the ζpart) is > every element of the form n ∈ ω (elements in the ω-part). We also introduce the natural S operation which takes elements of the form n to n + 1 and elements of the form ∞ + z to ∞ + z + 1. Deﬁnition. A subset X of ω + ζ is said to be internally periodic of period m if there is some a ∈ ω and b ∈ Z and some set P ⊆ {0, . . . , m − 1}, such that an element of
88

ω + ζ between a and ∞ + b is in X iff it is of the form c or ∞ + c for some c congruent to an element of P modulo m.
Deﬁnition. Let p be an odd prime. Let Wp denote the collection of bounded subsets of ω + ζ, which are internally periodic of some period not divisible by p. Then let Hp denote the two-typed structure:
Hp := (ω + ζ, Wp, ∈, S).
Lemma 9. In Hp, the downward closed sets are exactly those sets of the form {x|x < } for some ∈ ω + ζ.
Proof. It is clear that sets of this form are downward closed. It sufﬁces to show that if a second order object of Hp is downward closed, then it is of this form. Suppose the set X ∈ Wp is downward closed. Then it must be internally periodic of period 1. Since it’s in Wp it must also be bounded above by some x ∈ ω + ζ. Suppose X contains any element of the ζ part. Then by properties of Z, we can ﬁnd a largest element in its ζ part, and thus a largest element w ∈ X. By downward closure this set must contain all elements of the ζ part up to w. By internal periodicity, X must contain a ﬁnal segment of elements of the ω part. And by downward closure again, X must contain all elements of the ω part. Thus X is of the desired form.
Suppose instead that X does not contain any element of the ζ part. By internal periodicity, it must thus be bounded above in the ω part. One can then reason that any downwards closed subset of ω which is bounded above must be of the desired form.
Proposition 37. Hp is a model of axioms (1.1) through (2.5)
89

Proof. Recall that our axioms (1.1) through (1.7) classiﬁed those linear orderings which were elementarily equivalent to (ω, <). As mentioned in [9], these are the linear orderings of the form (ω + ζ · L), in particular, this includes ω + ζ, the structure we’re working in now. It sufﬁces to prove that the construction of < in terms of downward closed sets gives us the same < as the natural ordering on ω + ζ, which follows immediately from the above lemma.
One then routinely checks axioms (2.1) through (2.5):
(2.1): Second order elements in Hp are sets and thus satisfy extensionality.
(2.2): Singletons are bounded and internally periodic of period 1.
(2.3): The union of two bounded internally periodic sets of periods a, b is bounded and internally periodic of period ab. If p a, b then p ab.
(2.4): The difference of two bounded internally periodic sets of periods a, b is bounded and internally periodic of period ab.
(2.5): By deﬁnition, all second order elements in Hp are bounded above. Proposition 38. Hp is a model of (3.A) for any A with fewer than p states.
Proof. Given a runnable automaton A, ﬁrst, restrict to an automaton A which is runnable and deterministic by removing terms from its transition relation. Any accepting run of A on suitable input X0, . . . , Xn−1 will also be a run of A. Let X0, . . . , Xn−1 be from Wp, and let us construct run Y0, . . . , Y|Q|−1. If none of the X0, . . . , Xn−1 actually intersect the ζ part, just use the theorem for the standard model of WS1S, so we may assume at least one of the Xi has an element from the ζ part. First let us construct the run restricted to the ω part. By runnability and induction on ω, we can construct the unique run restricted to the ω part. I claim
90

that this run is eventually periodic of a period not divisible by p. Since each of the inputs X0, . . . , Xn−1 restricted to their ω parts are eventually periodic of a period not divisible by p, then beyond some point x ∈ ω, all of the X0, . . . , Xn−1 will together be periodic of a period m which is a factor of the product of the periods of each of the Xi, hence also not divisible by p. Since A is deterministic, its run on this input will eventually be periodic of period k times m for some k less than or equal to the number of states of A, which is less than p. Hence, this run will have period not divisible by p.
Now we need to continue the run of A into the ζ part. The Xi will together be initially periodic of period m up to some point ∞ + z where one of them ﬁrst diverges from its periodic behavior. Continue the Yi up to this point by throwing ∞ + w into Yi iff elements of the periodic portion of the ω part of the Yi are congruent to w modulo the collective period of the Yi. I claim that this still satisﬁes the transition relation, because the identical periods in the ω part satisﬁed the transition relation. After the point ∞ + z, continue the run inductively just following the transition function up until the end of the input.
Proposition 39. Hp is a model of (4.A) for any A with fewer than p states.
The proof is similar to the above, but in reverse.
Proposition 40. Hp is not a model of WS1S.
Proof. One can write a formula τp(X) which says:
p−1
τp(X) := 0 ∈ X∧ Si(0) ∈/ X
i=1
∧∀x : Sp(x) ∈ X =⇒ x ∈ X.
91

That is, X contains 0 and exactly every pth element, up to some point. It is a fact of WS1S that
∀x : ∃y > x : ∃X : τp(X) ∧ y ∈ X. That is, for any x, we can always ﬁnd such a periodic set containing an element larger than x. However, this is clearly not true of Hp for x in the ζ part.
As such, we can ﬁnd models of arbitrary ﬁnite subsets of our axiom system which are not models of the whole thing. Theorem 8. WS1S is not ﬁnitely axiomatizable.
Proof. Suppose we have a ﬁnite axiomatization of WS1S. Take its conjunction to produce a single axiom φ for WS1S. Since our axiom system is complete, φ must follow from some ﬁnite subset of it. For some large enough p, Hp will satisfy this ﬁnite set of axioms. So φ is true of Hp. This contradicts our assumption that φ was an axiomatization for WS1S, since it is true of a structure which is not a model of WS1S.
3.16 Conclusion and Future Research
As we can see from this example, this axiom system is well suited for proving that Hp is nearly a model of WS1S. Compare with an axiom system similar to that in [10], which would require proving induction for every WS1S formula. In chapter 4, we will use this axiom system to investigate what nonstandard models WS1S has.
92

It should be clear from our discussion in section 2 that we could just as easily be studying the weak second order theory of (ω, <). One may then ask about the weak second order theories, or just the second order theories in general, of other linear orderings. Is there, for instance, a notion of automaton that takes inputs which instead of being labeled ﬁnite segments of ω, were labeled intervals of Q as a linear ordering? Do such automata provide a canonical form for weak second order formulas of (Q, <)? Perhaps such automata, if they exist, would be related to continuous automata.
One may also ask whether this axiomatization is minimal. An effort has been made to avoid redundancy among axioms (1.1) through (2.5), although whether there is some residual redundancy and how many of these axioms follow from axioms (3.A) or (4.A) are still matters for study. A proof of minimality would require producing a large number of structures which satisfy various subsets of our axiomatization.
Finally, we saw some examples of relations on ﬁrst order elements of WS1S that essentially required the use of second order quantiﬁcation in their deﬁnitions. Speciﬁcally, < which played a critical role in our axiomatization, and modular congruence modulo a ﬁxed natural number, which played a critical role in the construction of our nearly-models Hp. One may then ask if there are any more which cannot be expressed in terms of these two.
Although WS1S is not nearly as expressive as larger axiomatized theories such as Peano Arithmetic or ZFC, it is easily decidable and its nonstandard models are tractable (yet still, as we will see in the next chapter, interesting).
93

CHAPTER 4 REGARDING NONSTANDARD MODELS OF THE WEAK SECOND
ORDER THEORY OF ONE SUCCESSOR
In this chapter, we answer two key questions about nonstandard models of the weak second order theory of one successor (WS1S): What can their ﬁrstorder parts look like, and what can their second order parts look like for a particularly interesting case (when the ﬁrst order part is ω +ζ). We will also provide some tools for constructing nonstandard models of WS1S from already known nonstandard models. This will go a long way towards establishing a complete classiﬁcation of the nonstandard models of WS1S.
4.1 Introduction
Recall that the weak second order theory of one successor, WS1S, is the theory of the two-typed structure G := (ω, F(ω), ∈, S), where F(ω) is the set of ﬁnite subsets of ω, ∈ is containment, and S is the successor operation on ω. As usual, lowercase variables will denote elements of the ﬁrst type, uppercase variables will represent elements of the second type, and boldface variables will indicate shorthand for ﬁnite or inﬁnite sequences of elements of the corresponding type indexed by natural numbers. Crucially, these will not be indexed by ﬁrst order elements from the current model of WS1S, but standard natural numbers. This theory is of interest because it is a decidable second order theory of ω, and equally expressive to ﬁnite automata.
Recall that in the previous chapter, we proved that WS1S was not ﬁnitely axiomatizable, and provided an axiomatization with schema indexed by ﬁnite
94

automata. This axiomatization is well suited to verifying that various structures satisfy it and thus are nonstandard models of WS1S.
A nonstandard model of WS1S is simply a structure which satisﬁes all of the same sentences as the standard (or ground) model G as deﬁned above, but which is not the standard model. Nonstandard models are a useful tool for understanding the limits of the expressiveness of a language. Our nonstandard models will clearly differ from our ground model G, but not in the WS1S formulas they satisfy. That is, the ways that our nonstandard models differ from G or from each other will not be expressible in the language of WS1S.
We expect to ﬁnd a large number of nonstandard models of WS1S. The Lo¨ wenheim-Skolem Theorem says that if a countable ﬁrst-order theory has an inﬁnite model, then it has one of every inﬁnite cardinality. Since we can view WS1S as the ﬁrst-order theory of the two-typed structure G, this gives us nonstandard models of WS1S of every inﬁnite cardinality.
A priori, studying nonstandard models of second order theories seems like a hefty task: in a nonstandard model H, even if one has a grasp on the collection of ﬁrst order objects, the second order objects are some nebulous collection related to the ﬁrst order objects via H’s interpretation of ∈, ∈H. Fortunately, nonstandard models of second order theories satisfy extensionality, so the second order elements are determined by which ﬁrst order elements they ∈H-contain. This allows us to identify second order elements with the collection of their ∈Helements. So up to isomorphism, a nonstandard model of WS1S will be of the form (M1, M2, ∈, S) where M2 is a collection of subsets of M1, ∈ is genuine containment, and S is some interpretation of S on M1. Combined with our understanding of nonstandard models of (ω, <), this will give us a very constructive
95

representative for every isomorphism class of nonstandard model of WS1S. Once we have this constructive form for representatives of isomorphism
classes of nonstandard models of WS1S, we need to decide which structures of this constructive form actually are nonstandard models of WS1S. We will prove that for every nonstandard model L of (ω, <), there is a nonstandard model of WS1S whose ﬁrst order elements, ordered by that model’s interpretation of < form the linear ordering L. Then we will examine what second order parts are available in the simplest nonstandard case: where the ﬁrst order part is the linear ordering ω + ζ.
Finally, we will provide tools for cutting apart nonstandard models of WS1S and gluing them back together to produce other nonstandard models.
4.2 Recalling our Axiomatization
Recall that our axiomatization from Axiomatizing the Weak Second Order Theory of One Successor made heavy use of shorthand. In this section, we reintroduce this shorthand, and present a compact list of our axioms. Deﬁnition. A set X is said to be downward closed if:
dc(A) :≡ ∀X : S(x) ∈ A =⇒ x ∈ A We deﬁne a linear ordering on the ﬁrst order elements by:
x ≤ y :≡ ∀X : dc(X) ∧ y ∈ X =⇒ x ∈ X
96

Deﬁnition. Given two sets A, B, we write:
C = A ∪ B :≡ ∀x : x ∈ C ⇐⇒ x ∈ A ∨ x ∈ B C = A ∩ B :≡ ∀x : x ∈ C ⇐⇒ x ∈ A ∧ x ∈ B C = A \ B :≡ ∀x : x ∈ C ⇐⇒ x ∈ A ∧ x ∈/ B
A ⊆ B :≡ ∀x : x ∈ A =⇒ x ∈ B
Recall that we had a notion of automaton: Deﬁnition. A logical automaton is a 5-tuple A = (n, Q, I, δ, F ) where:
• n is the number of inputs, potentially 0. • Q is a ﬁnite set of states. We may occasionally refer to the elements of Q via some
canonical ordering as 0, . . . , |Q| − 1, but more often we will let the elements of Q retain some additional structure. We will also refer to the number of states of an automaton A as |A|. Often we will prove results for automata with state set m where this sufﬁces to prove the result in general. • I ⊆ Q is the set of initial states. • δ ⊆ Q × 2n × Q is the transition relation. • F ⊆ Q is the set of ﬁnal states. Deﬁnition. Deﬁne the formula R(n,Q,I,δ,F )(X0, . . . , Xn−1, , Y0, . . . , Y|Q|−1) in WS1S as the conjunction of:
• ∀x : i∈Q j∈Q[x ∈ Yj ⇐⇒ (i = j ∧ 0 ≤ x ≤ )] • i∈I 0 ∈ Yi • ∀x : 0 ≤ x < =⇒ (i,s,k)∈δ(x ∈ Yi ∧ x ∈ Xs ∧ S(x) ∈ Yk)
97

This formalizes the notion of ( , Y0, . . . , Y|Q|−1) being a run of (n, Q, I, δ, F ) on input X0, . . . , Xn−1. Deﬁnition. Deﬁne the formula Φ(n,Q,I,δ,F )(X0, . . . , Xn−1) in WS1S as:
∃ : ∃Y0, . . . , Y|Q|−1 : = LSUB( X) ∧R(n,Q,I,δ,F )(X0, . . . , Xn−1, , Y0, . . . , Y|Q|−1) ∧ ∈ Yi.
i∈F
Where LSUB(X) is shorthand for the least strict upper bound for a set, either 0 for the emptyset or S(max(X)) for a nonempty set.
This formalizes the notion of the automaton (n, Q, I, δ, F ) accepting inputs X0, . . . , Xn−1. Of course, not every automaton will have an accepting run on every input, but we can expect certain classes of automata to at least have runs on every input: Deﬁnition. An automaton (n, Q, I, δ, F ) is said to be runnable if:
• |I| > 0, • ∀q, s ∈ 2n : ∃q : δ(q, s, q ). Deﬁnition. An automaton (n, Q, I, δ, F ) is said to be reverse-runnable if:
• |F | > 0, • ∀q, s ∈ 2n : ∃q : δ(q , s, q).
Finally, we can present our axiom system:
98

(1.1) ∀x, y, z : x < y ∧ x < z =⇒ x < z

(< is transitive)

(1.2) ∀x, y : x < y =⇒ ¬(y < x)

(< is antisymmetric)

(1.3) ∀x, y : x < y ∨ x = y ∨ y < x

(< is total)

(1.4) ∀x : S(x) > x

(S is strictly increasing)

(1.5) ∀x : y : x < y < S(x)

(S is a successor)

(1.6) ∃o : ∀x : o ≤ x

(< has minimal element)

(1.7) ∃o : ∀x : x = o =⇒ ∃y : S(y) = x

(S has image ω \ {0})

(2.1) ∀X, Y : (∀x : x ∈ X ⇔ x ∈ Y ) ⇒ X = Y

(Extensionality)

(2.2) ∀x : ∃X : ∀y : y ∈ X ⇔ y = x

(Singleton)

(2.3) ∀X, Y : ∃Z : ∀x : x ∈ Z ⇔ (x ∈ X ∨ x ∈ Y )

(Union Closure)

(2.4) ∀X, Y : ∃Z : ∀x : x ∈ Z ⇔ (x ∈ X ∧ x ∈/ Y )

(Difference Closure)

(2.5) ∀X = ∅ : ∃x : x ∈ X ∧ ∀y : y ∈ X ⇒ y ≤ x

(Maximal Element)

(3.A) ∀X0, . . . , Xn−1 : Φ(n,Q,I,δ,Q)(X0, . . . , Xn−1)

(A Runnable)

(4.A) ∀X0, . . . , Xn−1 : Φ(n,Q,Q,δ,F )(X0, . . . , Xn−1)

(A Reverse-Runnable)

4.3 L-Models

In this section, we introduce the notion of an L-model, a very constructive form of nonstandard model of WS1S that we will use as representatives for isomorphism classes of our nonstandard models. Recall that nonstandard models of (ω, <) are of the form ω + ζ · L [9]. An L-model will be one whose ﬁrst order part is ω + ζ · L. Deﬁnition. A two-typed structure (M1, M2, ∈, S) is said to be an L-model if:
M1 = ω + ζ · L
Where ζ = (Z, <), the linear ordering of the integers, L is a linear ordering, and · is the
99

lexicographic product, so:
ω + ζ · L = ω + ζ + ··· + ζ,
L
Speciﬁcally, M1 contains elements of the form n ∈ ω, and elements of the form (a, b) with a ∈ Z and b ∈ L, with elements from ω ordered naturally, being less than elements of the form (a, b). Elements of the form (a, b) are ordered ﬁrst by second coordinate and then by ﬁrst coordinate. And:
1. M2 ⊆ P(M1), and ∈ is ordinary containment between M1 and M2, 2. S is ordinary successor on ω and takes (a, b) ∈ ζ × L to (a + 1, b), 3. Every set of the form {x|x < } is in M2. 4. The only sets X in M2 which, when intersected with each ω or ζ part form an
initial (possibly empty) interval of that part are of the form {x|x < }.
Proposition 41. In every L-model, < as deﬁned on M1 agrees with < as deﬁned in terms of downwards (predecessor) closed elements.
Proof. Suppose that x ≤ y ∈ ω + ζ · L, and we have a predecessor closed set X containing y. Since X is predecessor closed, when intersected with any ω or ζ part, it will form a (possibly empty) initial interval. Hence X is of the form {w|w < }. By transitivity of <, x must also be in this set.
Suppose now that every predecessor closed set X containing y contains x. Since our model contains {w|w ≤ x} and this is a predecessor closed set containing x it must also contain y. Hence y ≤ x.
If a structure is an L-model, that represents a signiﬁcant start towards proving that it satisﬁes all of our axioms.
100

Proposition 42. Every L-model satisﬁes axioms (1.1) through (1.7), and (2.1).
Proof. Our condition that < as deﬁned in terms of downward (predecessor) closed sets agrees with ≤ as the linear ordering on ω + ζ · L will go a long way. By virtue of being a linear ordering < will satisfy axioms (1.1), (1.2), and (1.3).
One checks that when we interpret < as the linear ordering on ω + ζ · L and S as the natural successor operation on ω + ζ · L, axioms (1.4) through (1.7) are satisﬁed. Thus, when we interpret < as the equivalent linear ordering deﬁned in terms of predecessor closed collections, and S as the natural successor operation on ω + ζ · L, axioms (1.4) through (1.7) are satisﬁed.
Finally, (2.1) is satisﬁed by virtue of condition (1) and the fact that sets satisfy extensionality.
Proposition 43. G is the unique 0-model of WS1S. (We use natural numbers n for the linear order of size n whose elements are 0, . . . , n − 1.)
Proof. G clearly is a 0-model (one whose ﬁrst order type is ω) which models WS1S. Suppose there is another 0-model G which is a model of WS1S. Any second order element of G will be a ﬁnite set of natural numbers, so one may construct a formula of WS1S explicitly stating that that second order element is an element of our model. As such the second order elements of G must contain as a subset the second order elements of G. Furthermore, G cannot contain any second order elements which are inﬁnite subsets of ω, since these aren’t bounded, violating our maximal element axiom. As such, G = G.
Proposition 44. Every model of WS1S is isomorphic to an L-model for some L.
101

Proof. Given a model (M1, M2, ∈, S), deﬁne an equivalence relation ∼ on M1 where:
x ∼ y :≡ ∃n, m : Sn(x) = Sm(y).
Note that ≤, as deﬁned in terms of downward closed (ie, predecessor closed) elements of M2, respects these equivalence classes. That is, if x ≤ y and s ∼ x and y ∼ y, then x ≤ y or x ∼ y . Pick a representative of each equivalence class, including 0 (the unique element not in the image of S) for its equivalence class, and denote the set of representatives of other equivalence classes by L. L admits a natural linear ordering inherited by ≤ on M1, as per above. Every element of M1 is thus Sn(0) or Sn( ) or S−n( ) for some ∈ L, which we will map into in our isomorphism n ∈ ω or (n, ) or (−n, ) respectively. This clearly respects S.
Condition (3) holds because these sets are deﬁnable. Condition (4) is equivalent to saying that all predecessor closed sets are of the form {x|x < }, which we can write in the language of WS1S and which is true of our ground model G, so is in the theory of WS1S.
We will identify any element of M2 with the set of its ∈-elements. This association is injective by extensionality. It is not surjective. This clearly respects ∈.
Of course, this isn’t an assurance that there are L-models for every L. In the case of nonstandard models of the ﬁrst order theories of (ω, +) and (ω, +, ·), we can still deﬁne <, and there are some very strict restrictions on what sorts of linear order types < can have. Speciﬁcally, nonstandard models of (ω, +) must have linear order type ω + ζ · L for some dense L. We will address this restriction in the next section.
102

4.4 Minimal L-Models
Recall that we have a notion of modular congruence in WS1S. Given a natural number n (not an element of our model), we can construct the formula: Deﬁnition. Given an L-model (M1, M2, S, ∈) of WS1S, we write
µm(X, x) :≡ ∀y : Sm(y) ≤ x =⇒ [Sm(y) ∈ X ⇐⇒ y ∈ X] . And, for n < m,
Mn,m(x) :≡ ∀Y : x ∈ Y ∧ µm(Y, x) =⇒ Sn(0) ∈ Y. Proposition 45. The following are part of the theory of WS1S:
1. For all Mn,m(x) ⇐⇒ M(n+1 mod m),m(S(x)). 2. For all m ∈ ω, x ∈ ω + ζ · L, exactly one of Mi,m(x) is true for i = 0, . . . , m − 1. 3. For all m, n, exactly one of Mn,m(Si(x)) for i = 0, . . . , m − 1. 4. For all m, n, Mn,m(x) ⇐⇒ Mn,m(Sm(x)).
In agreement with G, we say that a ﬁrst order element x in a speciﬁc model is congruent to n modulo m if Mn,m(x). Also, note that in an L-model, specifying the congruences of the elements of the form ( , 0) is enough to determine the congruences of all elements.
What’s more, once we’ve speciﬁed L and these congruences, we have enough information to throw in just those second order elements which are absolutely necessary to construct an L-model. But ﬁrst, we need a language for specifying our congruences:
103

Deﬁnition. A sequence {sn}n>2∈ω is said to be a congruence type if:

• sn < n • if a|b then sb ≡ sa mod a.

We will denote the set of all congruence types K.

Note that the sequence 0 := {0}n>2 is a congruence type, although there are continuum many other such sequences1.

Once we have a particular association of congruence types to all of our elements in mind, we can deﬁne modular equivalence classes for every element of ω + ζ · L:

Deﬁnition. Given L, k : L → K, deﬁne Mnk,m:


(r ≡ n mod m) Mnk,m(x) :≡
(k( )m + r ≡ n mod m)

x=r∈ω x = (r, )

If Mnk,m(x), we say that x is congruent to n modulo m.

Alternately, for clarity and when k is given, we write x mod m = n, with the note that m and n are natural numbers, and not general elements of our models.

Note that:

Mkk( )m,m((0, )),

That is, (0, ) is congruent to k( )m modulo m.
1One may choose remainders modulo primes arbitrarily, and the congruence type is determined completely by remainders modulo powers of primes (via the Chinese Remainder Theorem), which must themselves be interconsistent. Remainders modulo pa determine remainders modulo pb for b < a.

104

Deﬁnition. Given a linear ordering L, a map k : L → K, a tuple x0, . . . , xn−1 of elements of ω + ζ · L, a tuple m0, . . . , mn−2 of natural numbers > 1, and a tuple p0, . . . , pn−2 of subsets where pi ⊆ {0, . . . , mi − 1}, deﬁne:
Pw(k, x, m, p) := : {y ∈ [xi, xi+1)|y mod mi ∈ pi}.
i
This is a second order element whose behavior is piecewise periodic. Each piece is an interval of the form [xi, xi+1), the mi and pi specify the periodic behavior on each of those intervals, and the map k describes how inﬁnite periodic behavior which is terminal in one ω or ζ component gets picked up in the next. We will construct our model consisting entirely of second order elements of this form for arbitrary x, m, and p, but for ﬁxed k. It’s necessary that we have a clear sense of the congruence class of each of our inﬁnite elements: in order to be a model of WS1S, it is necessary that we are not lead to believe that some inﬁnite element is congruent to for 0 modulo 2 but 1 modulo 4, since one can write out the statement:
∀x : M0,2(x) =⇒ ¬M1,4(x). In the language of WS1S, which is true of G and hence in the theory of WS1S. Deﬁnition. Given a linear ordering L and a map k : L → K, deﬁne PPL,k as:
PPL,k := {Pw(k, x, m, p)}.
And HL,k as the L-model with second order type PPL,k.
There’s a little bit of work showing that HL,k is an L-model, which we will do in the proposition below.
105

As we shall see, since all of these elements are describable in terms of the endpoints of the intervals x0, . . . , xn−1, they’ll wind up existing in some form or other in every L-model of WS1S. Once we specify k, we have an exact set that must appear in our L-model of WS1S. What’s more, the presence of just these elements is enough to satisfy our axioms, giving us a model of WS1S. This intuitively makes sense for our runnability schemes: given any deterministic automaton, on piecewise periodic input, we should expect a piecewise periodic run. If we have a long segment of periodic behavior of the input, after some amount of time settling in, the states of the automaton will also be periodic, of some period a factor of the original period times a number less than the number of states of our automaton. The time spent settling in is ﬁnite, so we just encode these all as pieces of period one.
Theorem 9. HL,k is an L-model of WS1S.
Proof. First, we need to show that HL,k is an L-model. Speciﬁcally, we need to show that it satisﬁes conditions (3) and (4). Condition (3) holds by construction. Suppose now that a set X is piecewise periodic and initial in every ω or ζ part. Assume there is some piece in the speciﬁcation of X, [xi, xi+1) such that [xi, xi+1) ∩ X = [xi, xi+1) and [xi, xi+1) ∩ X = ∅. Consider the largest such piece. Looking in the last two periods of this piece, we can ﬁnd an element x such that x ∈/ X but S(x) ∈ X, contradicting that X is initial in the ω or ζ part in which x appears, contradicting our assumptions. As such, every periodic piece of X is either the complete interval or empty. If an empty interval were followed by a complete interval, then we would again have an x ∈/ X with S(x) ∈ X, so it must be that X consists of a series of complete intervals followed by a series of empty intervals. That is, X is an interval of the form {x|x ≤ } for the weak right endpoint of the largest nonempty interval.
106

As such, we’ve already checked axioms (1.1) through (1.7) and (2.1).
(2.2): It is clear that singletons are piecewise periodic.
(2.3), (2.4): Let x0, . . . , xn−1 be a common reﬁnement of the endpoints of the intervals describing piecewise periodic sets X, Y . Then X and Y can also be described as piecewise periodic with pieces [xi, xi+1). On any one particular piece [xi, xi+1), X and Y will have certain periodic behaviors of periods m, m . One can then verify that X ∩ Y and X \ Y will have periodic behavior of period some factor of mm on that particular piece. Hence, X ∩ Y and X \ Y are piecewise periodic, and thus in our model.
(2.5) Given a piecewise periodic set X, we can ﬁnd its largest element by looking in the last period of the last nonempty piece.
(3.A) Given a runnable automaton A, let A be a deterministic automaton obtained by removing tuples from the transition relation of A. We will show that A has a run on any suitable input X0, . . . , Xn−1, and thus so does A.
Given our inputs X0, . . . , Xn−1 piecewise periodic, we can ﬁnd a common reﬁnement of the endpoints of their pieces y0, . . . , yr−1, and common periodicities m0, . . . , mr−2. We can also make sure y0 = 0 by possibly adding in new empty pieces to the beginning of the speciﬁcations of our inputs. Iterating through the intervals [yi, yi+1), do the following:
Step 1) Let ri : {0, . . . , mi − 1} → 2n describe the periodic behavior of the inputs such that for w ∈ [yi, yi+1),
w ∈ Xj ⇐⇒ (ri(w mod mi))j = 1.
Step 2) Consider the state of the automaton A just after entering the region
107

[yi, yi+1). If this is the ﬁrst region, this is the start state. If this is not the ﬁrst region, this was computed when we worked through the previous region. Call this state q. Let m = mi.
Step 3) Consider the operation of one period (m characters) of input on states, giving us a map f : Q → Q.
Step 4) Apply f to q repeatedly until a state is encountered that appeared before. This will give us a sequence of states:
q = q0, q1, . . . , qs, . . . , qt−1, qt = qs.
Note that t cannot be larger than |Q|, since there are only |Q| distinct states.
Step 5a) In case Smt(yi) ≥ yi+1, append to our speciﬁcation of each element of the run the interval [yi, yi+1) with period mt and periodic behavior just the appropriate behavior for a run of an automaton on the input in the interval [yi, yi+1). If this happens, skip step 6) and go on to the next interval, remembering the state of the automaton after reading the ﬁnal character.
Step 5b) Otherwise, append to our speciﬁcation of each element of the run the interval [yi, Sms(yi)) with period ms and periodic behavior just the appropriate behavior for the run of an automaton on the input in the interval [yi, Sms(yi)) starting from state q.
Step 6) Now, append to our speciﬁcation of each element of the run the interval [Sms(yi), yi+1) with period m(t−s) and periodic behavior the appropriate behavior for the run of an automaton on the input in the interval [Sms(yi), Smt(yi)), starting from state qs. Since the state this interval starts in and the state immediately after the end are the same, this is a valid part of the run of A. Based on the
108

remainder of yi+1 modulo m(t − s), determine the state of the automaton after after reading the ﬁnal character just before position yi+1.
Repeat from step 2) for each piece [yi, yi+1). This will construct the desired run of A on input X0, . . . , Xn−1.
The proof for (4.A) for reverse-runnable A is identical, but in reverse. Proposition 46. Every L-model of WS1S contains HL,k as a subset of its second order objects for the appropriate k.
Proof. Given an L-model of WS1S, we can determine k by looking at the congruence classes of elements of the form (0, ). It then sufﬁces to note that the elements Pw(k, x, m, p) are describable in terms of the elements x0, . . . , xn−1. Since such elements exist for every increasing sequence x0, . . . , xn−1 in the ground model G, they must exist in every nonstandard model as well.
Finally, we must ask the question: how many models have we actually found? Speciﬁcally, when is HL,k isomorphic to HL ,k ? Deﬁnition. Two congruence types s, s are said to be shifts if there is some k such that sn = sn + k mod n for each n. Proposition 47. HL,k and HL ,k are isomorphic iff there is an isomorphism of linear orderings f : L → L such that k( ) and k (f ( )) are shifts for each .
Proof. (⇒): Suppose HL,k and HL ,k are isomorphic. Then there is a bijection g : ω + ζ · L → ω + ζ · L and a bijection h : PPL,k → PPL ,k . Recall that we deﬁned an equivalence relation on ω + ζ · L by:
x ∼ y :≡ ∃n, m : Sn(x) = Sm(y).
109

Since g respects S, g respects these equivalence classes, giving us a bijection g : L → L . g must also respect <, since g respects <. This is the isomorphism f we’re looking for. Suppose f ((0, )) = (z, ). will be g ( ). By isomorphism, (0, ) and (z, ) will have the same modular equivalence classes, so (0, ), and (0, ) will have shifted modular equivalence classes, as desired.
(⇐) is a straightforward construction of the isomorphism. Proposition 48. There are continuum many 1-models.
Proof. Note that there are continuum many congruence types, and that the shift relation has countable equivalence classes. Hence there are continuum many congruences types up to shifting. The result then follows by the above proposition.
Now that we’ve shown that there are nonstandard models of WS1S, we would like a tool for classifying these nonstandard models.
4.5 The Tail-Head Lemma
Suppose we have a particular 1-model M of WS1S, and two second order objects X, X from M which agree on the ω part. Since the symmetric difference of X, X is an element of M , it has a least element. This means that there is some point in the ζ part such that before that point, X and X agree. Put another way, the behavior of a second order element of M on the ω part determines an initial part of its behavior on the ζ part. In this section, we formalize and expand on this result.
110

Deﬁnition. Given a linear ordering L, a cut is just a partition of L into two subsets (A, B) where A is an initial segment of L and B is a ﬁnal segment of L. A cut (A, B) is said to be inﬁnite if A has no greatest element and B has no least element.
Proposition 49. Any inﬁnite cut of ω + ζ · L is of the form (ω + ζ · A, ζ · B) for some cut (A, B) of L.
Proposition 50. Given an L-model (M1, M2, ∈, S) of WS1S and an inﬁnite cut (A, B) of M1, A, B ∈/ M2.
Proof. Since (M1, M2, ∈, S) is an model of WS1S, every second order element must have a <-least and <-greatest element, which by assumption neither A nor B has.
Proposition 51. Let (M1, M2, ∈, S) be an L-model of WS1S. Let (A, B) be an inﬁnite cut of M1. Suppose X, Y ∈ M2 are second order objects, and x ∈ A and X and Y agree for all y ∈ A with y ≥ x. Then X and Y agree on some initial segment of B.
Proof. Let X∆Y denote the symmetric difference of X and Y . By closure under boolean operations, X∆Y ∈ M2. Additionally, X∆Y \ {y|y ≤ x} ∈ M2, as this later set is the downward closure of the singleton {x}. This set, X∆Y \ {y|y ≤ x} ∈ M2, is either empty or has a <-smallest element. In either case, it doesn’t contain elements up to some point in B. Put another way, X and Y agree up to that point in B.
Proposition 52. Let (M1, M2, ∈, S) be an L-model of WS1S. Let (A, B) be an inﬁnite cut of M1. Suppose X, Y ∈ M2 are second order objects and x ∈ B and X and Y agree for all y ∈ B with y ≤ x. Then X and Y agree on some ﬁnal segment of B.
Proof. The proof is a reverse version of the above.
111

Deﬁnition. Given an L-model with an inﬁnite cut (A, B) on its ﬁrst order part, we deﬁne equivalence relations =AT (tail equivalence) and =BH (head equivalence) on M2, and more generally, on P(M1) by:
X =AT Y :≡ ∃x ∈ A : ∀y ∈ A : y ≥ x =⇒ y ∈/ X∆Y, X =BH Y :≡ ∃x ∈ B : ∀y ∈ B : y ≤ x =⇒ y ∈/ X∆Y. Where ∆ denotes symmetric difference. Speciﬁcally, X and Y are tail equivalent if they agree on a ﬁnal interval of A and are head equivalent if they agree on an initial interval of B.
We can now reinterpret our propositions from before as saying that given an inﬁnite cut (A, B) the tail equivalence class of a second order object determines its head equivalence class, and the head equivalence class determines its tail equivalence class. Deﬁnition. Given an L-model with an inﬁnite cut (A, B) on its ﬁrst order part, we deﬁne its tail-head function TH(A,B):
TH(A,B) : M2/(=AT ) → M2/(=BH ), Such that, for any X ∈ M2,
TH(A,B)([X]=AT ) = [X]=BH . Additionally, we may wish to view TH(A,B) as a partial function:
TH(A,B) : P(A)/(=AT ) → P(B)/(=BH ).
Following from our propositions above, we have: Proposition 53. TH(A,B) is well-deﬁned and 1-1.
112

Deﬁnition. We say that two sets X, Y are equal or congruent mod ﬁnite (written X =∗ Y ) if their symmetric difference is ﬁnite.
Theorem 10. Two second order elements X, Y of an L-model have the same head and tail equivalence classes for every inﬁnite cut iff they are equal mod ﬁnite.
Proof. (⇒): suppose we have two sets X, Y which have the same head and tail equivalence classes, but which are not equal mod ﬁnite. Consider those points in Z = X∆Y , which must be inﬁnite. As mentioned in [9], every inﬁnite linear ordering has an inﬁnite ascending sequence or an inﬁnite descending sequence. Assume Z with its inherited ordering has an inﬁnite ascending sequence x0, . . . , xn, . . . (the argument in the descending case is similar, although it is important to note that no inﬁnite descending sequence can intersect the ω part). Split L into two parts, A = { ∈ L : ∃n : xn > }, and B = L \ A (which may be empty, but that isn’t disallowed). It can easily be seen that (A, B) is an inﬁnite cut. I assert that X and Y do not have the same =AT equivalence class, since given any x it’s not the case that X and Y agree on A after x by construction of A in terms of points where X and Y differ.
(⇐) is immediate for each cut (A, B). There are only ﬁnitely many points to avoid, so one may easily avoid them.
Finally, in this section we present the Tail-Head Lemma, which states that an L-model is determined by the set of its tail-head functions. But ﬁrst, in order to prove this, we need a quick lemma about linear orderings in general:
Deﬁnition. Given a linear ordering L, an inﬁnite collection of intervals I is said to be a cut covering family if it satisﬁes:
113

• For each interior cut (A, B) (ie, A, B = ∅) of L, there is an interval I in I such that I contains some elements of A and some elements of B.
• For the exterior cuts (∅, L) and (L, ∅), there are intervals I in I which are initial and nonempty and ﬁnal and nonempty respectively.
In either case, the interval is said to cover the cut.
Proposition 54. For any linear ordering L and cut covering family I for L, there is a ﬁnite subcollection I ⊂ I having union all of L.
Proof. Suppose we have an inﬁnite cut covering family I for L. Consider S as the set of all x ∈ L such that the interval {y ∈ L|y ≤ x} is a subset of a ﬁnite union of elements of I. This set S is nonempty, as the interval covering the cut (∅, L) is initial, and thus alone covers a number of intervals of the form {y ∈ L|y ≤ x}. S is also an initial interval of L, since if we can cover {y ∈ L|y ≤ x} and w < x, then the same set of intervals covers {y ∈ L|y ≤ w}. Suppose S is not all of L. Consider the cut (S, L \ S). This cut is covered by some interval I, which contains an element s ∈ S and t ∈/ S. Add this interval to the ﬁnite cover for s and we get get a ﬁnite set of intervals covering up to t contradicting that t ∈/ S. Thus, S is all of L.
Finally, consider the ﬁnal cut (L, ∅). This is covered by a ﬁnal interval I containing a point x. Thus all of L is covered by the ﬁnite set of intervals covering up to x together with I.
Now we are prepared to prove the Tail-Head Lemma.
Theorem 11 (Tail-Head Lemma). For given L, the set of second order elements of any L-model is determined by the set of its tail-head functions TH(A,B) for every inﬁnite cut
114

(A, B) of ω + ζ · L.
Proof. Suppose we have two L-models H = (ω + ζ · L, M2, ∈, S) and H = (ω + ζ · L, M2, ∈, S), with the same tail-head functions for every inﬁnite cut. Let X ∈ M2. I claim that X is also in M2. This would sufﬁce to prove the lemma.
For every inﬁnite cut (A, B), consider an element Y in M2 which agrees with X on its head and tail equivalence classes for that cut. In particular, there’s some elements a ∈ A and b ∈ B such that X ∩ [a, b] = Y ∩ [a, b]. Note that [a, b] is an element of M2 and M2 so X ∩[a, b] is an element shared in common by H and H . We collect these intervals as part of a cut covering family I with [a, b] covering the cut (A, B).
For every ﬁnite cut (A, B), let a be the ﬁnal element of A and b the initial element of B. Throw in the interval [a, b] to I to cover (A, B). Note that X ∩ [a, b] is in both M2 and M2.
For the cut (∅, ω + ζ · L), we cover it by throwing the interval {0} into I. Note that X ∩ {0} is in both M2 and M2.
For the cut (ω + ζ · L, ∅), let x be an upper bound for X. Cover our cut by throwing the interval {y|y > x} into I. Note that X ∩ {y|y > x} = ∅ is in both M2 and M2.
All together, this gives us a complete cut covering family I. By the proposition above, there is a ﬁnite subcollection I ⊆ I having union all of ω + ζ · L.
Take the union over all intervals I in I of X ∩ I. This is a ﬁnite union of elements of M2, hence an element of M2. However, since all of these intervals together cover all of ω + ζ · L, this union is X. Thus X is in M2, as desired.
115

One necessarily asks the question: what sorts of tail-head functions are available? Taking the simplest example, one may ask what possible tail-head functions are available for the (ω, ζ) cut for 1-models of WS1S. In the next section, we propose a partial solution.
4.6 Countable 1-Models
Looking at the L-models HL,k that we constructed previously, we notice that the possible tail equivalence classes for the cut (ω, ζ · L) are just the periodic ones. Put another way, the set of all intersections of second order elements with the ω part is just the set of all eventually periodic subsets of ω. One may then wonder if, given a particular subset of ω, there is a model of WS1S that has second order elements, which, when intersected with the ω part, give us that particular subset. Or, more generally, if one has a suitably closed collection of subsets of ω, is there a model of WS1S for which that collection is exactly the set of intersections of second order elements with the ω part?
One can of course get whatever subsets one wants with a compactness argument:
Proposition 55. Given a collection A of subsets of ω, there is a nonstandard model of WS1S such that for every A ∈ A, there is a second order element X in the model whose intersection with ω is A.
Proof. For each element A ∈ A, add the second order constant symbol CA to the language of WS1S. For each A ∈ A and n ∈ ω, add either the formula Sn(0) ∈ CA or Sn(0) ∈/ CA as appropriate to the theory of WS1S. Together this
116

produces a collection of formulas, any ﬁnite subset of which is satisﬁable (if one has only ﬁnitely many restrictions on CA, one may construct a ﬁnite set of natural numbers satisfying those restrictions). By the compactness theorem, it follows that there is a model of WS1S with constants CA|A ∈ A for which the intersection CA ∩ ω = A.
However, the proof of the compactness theorem makes no assurances on the sort of models being constructed. Following the proof in [8], we ﬁrst complete the collection of formulas in question, and then perform a Henkin construction to get our model. If one of our sets A in A is unbounded, we will need to introduce a new ﬁrst order object, call it ∞, to serve as an upper bound for CA. Due to the arbitrariness of our choices in how we complete our collection of formulas, it may be the case that we wind up requiring that CA and CA agree on elements of the form S−n(∞) for all n for A and A with different tail equivalence classes. Every time this happens, we are required to add in a genuinely new ﬁrst order element ∞ to represent an upper bound for (CA∆CA ) ∩ {x : x ≤ z}, as well as elements of the form Sz(∞) for integer z. As we can see, the L for these models can, unchecked, grow very large.
What if we wish to restrict our attention just to 1-Models? In case of countable A, we can still ﬁnd our nonstandard model. For clarity, we will refer to elements in the ζ portion of a 1-model as ∞ + z for appropriate z.
Proposition 56. Any 1-model is determined by its tail-head function TH(ω,ζ).
Proof. ω + ζ only has two inﬁnite cuts, (ω, ζ) and (ω + ζ, ∅). In the later case, we already know the tail head function. There is only one tail equivalence class, that of the empty set, since all second order elements are bounded, and it gets
117

mapped to the only head equivalence class, which is trivial.

Deﬁnition. Given three sets X−, X0, X1 partitioning ω, deﬁne:

Bit(X−, X0, X1) = (Y0, Y1),

Such that x ∈ Yi if the largest y ≤ x which is in X0 ∩ X1 is in Xi, or 0 if none exist.

Deﬁne:

RBit(X−, X0, X1) = (Y0, Y1),

Such that: x ∈ Yi if the smallest y ≥ x which is in X0 ∩ X1 is in Xi, or 0 if none exist.

Deﬁnition. Given sets Xσ for σ ∈ Sn partitioning ω, deﬁne:

Accn(X) = Y,

Where Y is also a collection indexed by elements of Sn, and x ∈ Y i<x σi where σi is such that i ∈ Yσi.
Deﬁnition. Given a collection of subsets X0, . . . , Xn−1 of the same set L, a conﬁguration is an n-tuple of 0s and 1s. We say that the conﬁguration s appears at ∈ L in X0, . . . , Xn−1 if:
∈ Xi ⇐⇒ si = 1.
We use the shorthand write c to X0, . . . , Xn−1 at position in L to mean:

• For each i with si = 1, throw into Xi, • For each i with si = 0, remove from Xi. Deﬁnition. We adapt the bit storage, reverse bit storage, and accumulator on Sn automata from Generating the Regular Functions and Krohn-Rhodes:
Bit := (3, {0, 1}, {0}, δ, {0, 1}),
118

Where δ is deterministic, with the b such that δ(a, s, b) is given by the following table: s\a 0 1 100 0 1 010 0 0 001 1 1 o.w. 0 0
Table 4.1: Transition Table for Bit
RBit := (3, {0, 1}, {0, 1}, δ, {0}), Where the unique a such that δ(a, s, b) is given by the following table:
s\a 0 1 100 0 1 010 0 0 001 1 1 o.w. 0 0
Table 4.2: Transition Table for RBit
Pick an enumeration of Sn.
Accn := (n!, Sn, id, δ, Sn),
Where δ is deterministic, with the b such that δ(a, s, b) is given by:
• a · σ where the only 1 in s is in position corresponding to σ, • id if there isn’t exactly one 1 in s.
Note that these are meant to correspond to Bit, RBit, and Accn, respectively. Proposition 57. If an L-model satisﬁes (1.1) through (2.5) and (3.Bit) and (3.Accn), then it satisﬁes (3.A) for arbitrary runnable A.
119

Proof. It sufﬁces to show that (3.A) holds for deterministic A. Recall from Generating the Regular Functions and Krohn-Rhodes that every map that takes a sequence of inputs and returns the run of A on those inputs with its last character removed can be written as a multivariable composition of character-wise maps and such truncated run maps for automata corresponding to Bit and Accn. One can check that this is just a statement about transition relations and holds even when the words being operated on are nonstandard.

As usual, one encodes words as partitions of the set of positions in the word. Via this encoding, character-wise maps become bitwise applications of bounded boolean functions. Speciﬁcally, if we have f : Σ → Σ , the translation of the character-wise application of f becomes:

f (Xa|a ∈ Σ) = (Yb|b ∈ Σ ),

Where:

Yb =

Xa.

a|f (a)=b

The truncated run maps become the maps from inputs to runs for Bit and Accn,

but with the ﬁnal state removed. Since our L-model is closed under these maps

(simply remove the ﬁnal character of the runs by closure under ﬁnite modiﬁca-

tion), our L-model is closed under any of their multivariable compositions. This

includes the map from inputs to runs for A with the ﬁnal state removed. We can

add in the ﬁnal state by closure under ﬁnite modiﬁcation, completing the proof

of (3.A).

Proposition 58. If an L-model satisﬁes (1.1) through (2.5) and (3.Accn) and (4.RBit) then it satisﬁes (4.A) for arbitrary reverse-runnable A.

Proof. Recall that, up to character-wise maps, the transition relations for run120

ning Accn forward and Accn backwards are the same. The essential difference between saying that Accn has a forward run and that Accn has a reverse run is that in the forward run, the ﬁrst state is speciﬁed, and in the backwards run, the last state is speciﬁed. However, by applying a character-wise premultiplication by the inverse of the last element of the forwards run of Accn, we get the reverse run of Accn.
The proof then continues similarly to the above proposition. Deﬁnition. Given a countable set W of subsets of ω which:
• Contains ∅, ω, • Is closed under union and difference (and hence all bitwise maps), • Is closed under ﬁnite modiﬁcation, • Is closed under Bit, RBit and Accn for every n.
Let: HW1 := (ω + ζ, MW , ∈, S),
Where MW is constructed via the following process:
Step 1) W is countable, so let W0, . . . , Wn, . . . be an enumeration of its elements. We will construct sets Zi ⊆ Z with the intention of throwing in sets of the form Wi ∪ Zi into our model. Begin our construction with the sets Zi empty, and deﬁne a variable h, which we think of as the location of a write head as the integer 0.
Step 2) Iterate through i ∈ ω, completing the following steps:
121

Step 3) Let C be the collection of conﬁgurations which appear inﬁnitely often among X0, . . . , Xi. We will iteratively make sure that when beginning this step, the write head will be located at a conﬁguration c in Zj|j < i such that the conﬁguration c appears inﬁnitely often among X0, . . . , Xi−1. Note that this is trivially true in the case of i = 0.
Step 4) Since the conﬁguration c appears inﬁnitely often among X0, . . . , Xi−1 then either c 0 or c 1 appears inﬁnitely often among X0, . . . , Xi. In the ﬁrst case, do nothing. In the second case, add h (the location of the write head) to Zi, so that either way the write head is now located at a conﬁguration in Zj|j ≤ i which appears inﬁnitely often among X0, . . . , Xi.
Step 5) Iterating through conﬁgurations c in C (note that C must be ﬁnite), complete the following steps:
Step 6) Letting c be the conﬁguration at the current write head location, locate a sequence of conﬁgurations
c = c0, . . . , ck−1 = c ,
Such that:
• Each cj is from C, • The sequence of conﬁgurations occurs consecutively within X0, . . . , Xi.
We can ﬁnd such a conﬁguration by locating an instance of the conﬁguration c beyond the point in ω where those conﬁgurations which appear only ﬁnitely often among X0, . . . , Xi ﬁnish appearing, and then locating an instance of the conﬁguration c which appears after that. Our sequence is the sequence of conﬁgurations which appear between those instances of c and c inclusive.
122

Step 7) Write the conﬁgurations c0, . . . , ck−1 to locations h − k + 1, . . . , h respectively. Note that conﬁguration ck−1 is already present at position h. Move the write head to position h − k + 1.
Step 8) If there are still more c ∈ C which we haven’t iterated through, move on to the next c ∈ C and continue from step 6). Otherwise go on to step 9).
Step 9) Move on to the next i ∈ ω, continuing from step 3). Complete this for all i ∈ ω, then go on to step 10).
Step 10) Throw into our set MW every set of the form Wi ∪ Zi for i ∈ ω and Zi a ﬁnite modiﬁcation of Zi.
This completes our construction.
Proposition 59. When this construction has completed, a conﬁguration appears inﬁnitely often within W0, . . . , Wj iff it appears initially often within Z0, . . . , Zj for any j
Proof. (⇒): Note that for each a < b, if the conﬁguration c appears inﬁnitely often within W0, . . . , Wa then there is an extension of c which appears inﬁnitely often within W0, . . . , Wb, by the inﬁnite pigeonhole principle. Thus, for each iteration referred to in step 2) for i ≥ j, we write c to Z0, . . . , Zj at least once. Since there are inﬁnitely many such iterations, this completes our proof.
(⇐): Suppose the conﬁguration c appears inﬁnitely often within Z0, . . . , Zj. Then it must be that for each iteration i referred to in step 2), we write conﬁguration c to Z0, . . . , Zj, which means that some extension of c appears inﬁnitely often among W0, . . . , Wi. From this we can conclude that c appears inﬁnitely often among W0, . . . , Wi.
123

Proposition 60. Let X0, . . . , Xn−1 be arbitrary elements of MW. Then a conﬁguration appears inﬁnitely often within the ω part of X0, . . . , Xn−1 iff it appears initially often within the ζ part of X0, . . . , Xn−1.
Proof. (⇒): Note that there is some r such that X0, . . . , Xn−1 all come from sets of the form Wi ∪ Zi for i < r. Suppose a conﬁguration c appears inﬁnitely often within the ω part of X0, . . . , Xn−1. Then some suitable extension of c appears inﬁnitely often within W0, . . . , Wr−1. This extension appears inﬁnitely often within Z0, . . . , Zr−1. Thus it appears inﬁnitely often among Z0, . . . , Zr−1. Thus c appears inﬁnitely often within the ζ part of X0, . . . , Xn−1.
(⇐) is the reverse of (⇒).
Proposition 61. Given two elements X, Y in MW, if X and Y eventually agree on their ω parts, then they initially agree on their ζ parts.
Proof. Suppose X is Wa ∪ Za and Y is Wb ∪ Zb. Wlog, assume a < b. Then conﬁgurations which appear inﬁnitely often among W0, . . . , Wb agree in their a and bth coordinates. Hence conﬁgurations which appear inﬁnitely often among Z0, . . . , Zb agree in their a and bth coordinates. Hence Za and Zb agree initially (only the empty conﬁguration appears after 0 in the ζ part). Hence Za and Zb agree initially.
Proposition 62. MW is closed under ﬁnite modiﬁcation.
Proof. Suppose we have an X in MW and wish to modify whether it contains x or not. If x is in the ζ part, this holds by construction. Otherwise, X is of the form Wi ∪Zi. Let Wj be Wi modiﬁed appropriately at x (recall that W was closed under ﬁnite modiﬁcations). Then Y = Wj ∪ Zj is in our model.
124

Since X and Y eventually agree on their ω parts, they initially agree on their ζ parts. Thus Zj is a ﬁnite modiﬁcation of Zi so Wj ∪ Zi is in our model, which is the desired modiﬁcation of X.
Proposition 63. Every set in MW has a largest element.
Proof. Suppose we have an X = Wi ∪ Zi. Recall that Zi had upper bound ∞ + 0. If Zi is empty, one can argue that Wi does not contain inﬁnitely many elements, and so has a maximal element which is also the maximal element of X. If Zi is nonempty, since it is a ﬁnite modiﬁcation of a set with upper bound ∞ + 0, it has a maximal element, which is also the maximal element of X.
Proposition 64. Given elements X0, . . . , Xn−1 of MW, if the conﬁgurations c, c appear consecutively coinitially in the ζ part among X0, . . . , Xn−1, then they appear consecutively in the ω part.
Proof. Using the same sorts of conﬁguration expansion and ﬁnite modiﬁcation arguments as in previous propositions, it sufﬁces to prove this for Xi = Wi ∪ Zi for i < n. By construction, anywhere in the ζ part before the position where the write head was located at the beginning of the iteration for n, any consecutive pair of conﬁgurations among Z0, . . . , Zn−1 will come from the same sequence of conﬁgurations as in step 6), which was copied directly from W0, . . . , Wn−1.
Theorem 12. Given a countable set W of subsets of ω which:
• Contains ∅, ω, • Is closed under union and difference (and hence all bitwise maps), • Is closed under ﬁnite modiﬁcation,
125

• Is closed under Bit, RBit and Accn for every n.
It is the case that HW1 is a model of WS1S.
Proof. First, we need to show that HW1 is a 1-model. Speciﬁcally, we need to show that it contains sets of the form {x|x < }, and we need to show that every predecessor closed set is of that form.
I claim that MW contains the empty set, since W contained the empty set, so some X in MW has empty ω part. But by a previous proposition, the conﬁguration 0 appears inﬁnitely often in the ζ part of X and the conﬁguration 1 does not appear inﬁnitely often in the ζ part of X. Thus X is of the form ∅ ∪ Zi for Zi a ﬁnite set. Since MW is closed under ﬁnite modiﬁcations, the empty set is in MW .
Since MW was closed under ﬁnite modiﬁcations, it contains all sets of the form {x|x < } for ∈ ω.
I claim the MW also contains the interval [0, ∞+0]. Recall that W contained ω, so some X in MW has full ω part. By a previous proposition, the conﬁguration 1 appears inﬁnitely often in the ζ part of X and the conﬁguration 0 does not appear initially often in the ζ part of X. Thus X contains and so is a ﬁnite modiﬁcation of some interval of the form [0, z + ∞]. By closure under ﬁnite modiﬁcation, all intervals of the form {x|x < } are in MW for in the ζ part.
Next, we show that the only sets in MW which are predecessor closed are of the form {x|x < }. Suppose we have a predecessor-closed set X. If it doesn’t contain any elements from the ζ part, it is a predecessor closed subset of ω, thus of the desired form. If it does contain any elements from the ζ part, it contains
126

an initial segment of them. Thus, it contains a ﬁnal segment of ω. By downward closure, it thus contains all of ω. One can then argue that it is of the desired form by looking at its maximal element.
As such, HW1 is a 1-model, and so satisﬁes axioms (1.1) through (1.7) and (2.1). Now we check the rest:
(2.2) was previously proved.
(2.3) Suppose we have two sets Wi ∪ Zi and Wj ∪ Zj from MW . Since W was closed under boolean operations, it contains some Wk = Wi ∪ Wj. The only conﬁgurations that show up in the ω part of the tuple Wi ∪ Zi, Wj ∪ Zj, Wk ∪ Zk are of the form 000, 011, 101, and 111. Thus these are the only conﬁgurations which show up initially in their ζ parts. By upper bounding, the only the conﬁguration 000 shows up ﬁnally in their ζ parts. By ﬁnite modiﬁcation of Zk to get Zk, we can ﬁx those ﬁnitely many positions where these conﬁgurations do not appear, thus making sure these are the only conﬁgurations which show up in the tuple Wi ∪ Zi, Wj ∪ Zj, Wk ∪ Zk. That is to say, Wk ∪ Zk is the desired union.
(2.4) is similar.
(2.5) was proved in a previous proposition.
In order to prove axiom (3.A) we will ﬁrst prove (3.Bit) and (3.Accn).
(3.Bit): Suppose we are given a triple of inputs Wi∪Zi, Wj ∪Zj, Wk ∪Zk. Since W was closed under the Bit map, there is some (Wg, Wh) which has the correct start state and satisﬁes the transition relation for Bit at all positions in ω for inputs Wi∪Zi, Wj ∪Zj, Wk ∪Zk. Now look at Zg and Zh. By an above proposition, any consecutive pair of conﬁgurations that appear initially in the ζ part among
127

Wi ∪ Zi, Wj ∪ Zj, Wk ∪ Zk, Wg ∪ Zg, Wh ∪ Zh will satisfy the transition relation for Bit. Thus, by ﬁnite modiﬁcation, we can ﬁnd Zg, Zh such that (Wg ∪ Zg, Wh ∪ Zh) is the desired run of Bit on input Wi ∪ Zi, Wj ∪ Zj, Wk ∪ Zk.
(3.Accn): Similar to the above but with Accn.
Thus, as proved in an earlier proposition, (3.A) holds for every runnable A.
In order to prove (4.A) we will ﬁrst prove (4.RBit) (we just proved (3.Accn)).
(4.RBit): Suppose we are given a triple of inputs Wi ∪ Zi, Wj ∪ Zj, Wk ∪ Zk. We divide the argument up into two cases: one in which Wi ∪ Zi has the full head and tail equivalence classes and one in which it doesn’t.
Case 1) If Wi ∪Zi has the full head and tail equivalence classes, one can easily verify that the desired run (Y0, Y1) will have one of Y0 or Y1 in the full head and tail equivalence classes and one in the empty head and tail equivalence classes. Thus one is a ﬁnite modiﬁcation of the empty set, and the other is a ﬁnite modiﬁcation of the interval [0, ∞ + 0].
Case 2) Otherwise, since W was closed under the RBit map, there is some (Wg, Wh) which satisﬁes the transition relation for RBit at all positions in ω for inputs Wi∪Zi, Wj ∪Zj, Wk ∪Zk. Now look at Zg and Zh. By an above proposition, any consecutive pair of conﬁgurations that appear initially in the ζ part among Wi ∪ Zi, Wj ∪ Zj, Wk ∪ Zk, Wg ∪ Zg, Wh ∪ Zh will satisfy the transition relation for RBit. Thus, by ﬁnite modiﬁcation (and we need that either Wj ∪ Zj or Wk ∪ Zk make appearances in the ζ part to do this) we can ﬁnd Zg, Zh such that (Wg ∪ Zg, Wh ∪ Zh) is the desired run with the correct end state of RBit on input Wi ∪ Zi, Wj ∪ Zj, Wk ∪ Zk.
128

Thus, as proved in an earlier proposition, (4.A) holds for every reverserunnable A.
Now that we have a technique for constructing 1-models with interesting tail-head functions, we would like a technique for moving these tail-head functions around. In the next section, we describe as simple cut-and-paste technique for producing new models of WS1S from old.
4.7 Cut-And-Paste Models
In this section, we describe a procedure for gluing together a number of separate models of WS1S to produce a new model of WS1S. Proposition 65. In any model of WS1S, if X is a nonempty second order elements of the model, I is an interval with least element, and X ⊆ I, and:
∀x : S(x) ∈ X =⇒ (x ∈ X ∨ x ∈/ I), Then X is of the form [a, b] for a the minimal element of I and b is some element of I.
When the hypothesis holds, we say that X is nonempty and predecessor closed within I.
Proof. Let a be the minimal element of I. I claim that X ∪ [0, a) will be predecessor closed, and thus of the form [0, b] for b the maximal element of X. Thus X will be of the form [a, b]. Proposition 66. In any model of WS1S, if (n, Q, I, δ, Q) is a runnable ﬁnite automaton, and X0, . . . , Xn−1 are second order elements of our model, for any there is a
129

delayed run of our ﬁnite automaton, starting at and satisfying the transition relation δ. Speciﬁcally, there are Y0, . . . , Y|Q|−1 such that:
• The Yq partition the interval [ , S(max X)], • ∈ Yq for some q ∈ I, • If ≤ x ≤ max X, and x ∈ Yq, x ∈ Xs, S(x) ∈ Yq , then δ(q, s, q ).
Proof. One can prove this is true of the standard model of WS1S by simple induction. Since we can express this fact in the language of WS1S, it must be true of any nonstandard model. Deﬁnition. A cut-and-paste scheme is a tuple:
(L, I0, . . . , In−1, H0, . . . , Hn−1, J0, . . . , Jn−1, f0, . . . , fn−1), Where:
• L is a linear ordering of the form ω + ζ · L , • The I0, . . . , In−1 form a partition of L into intervals, • Hi is an Li-model of WS1S, for some linear order Li, • Ji for i < n − 1 is an interval in the second order part of Hi, • Jn−1 is a ﬁnal interval of the ﬁrst order part of Hn−1, • fi : Ii → Ji is an order preserving bijection. Deﬁnition. Given a cut-and-paste scheme (L, I, H, J, f ), deﬁne the resolution of the cut-and-paste scheme as the L-model with second order part:
W := { f −1(Xi)|Xi ⊆ Ji}.
i
Where the Xi are required to be second order elements of their respective Hi.
130

Theorem 13. Given a cut-and-paste scheme (L, I, H, J, f ), its resolution (ω + ζ · L, W, ∈, S) is an L-model of WS1S.
Proof. For i < n − 1, we note that since the Ji are intervals in the second order part of Hi, they have least and greatest elements, and thus since fi is an order preserving bijection, the Ii have least and greatest elements as well. This leaves In−1 (since I form a partition) to have a least element but no greatest element.
First, we need to show that (ω + ζ · L, W, ∈, S) is actually an L-model. As usual, we just need to check that every interval of the form {x|x < } is in W and that the only sets X in W which when intersected with each ω or ζ part form an initial interval of that part are of the form {x|x < }.
Given an , there is some interval Ii in the cut-and-paste scheme containing . Since Hi is a model of WS1S, it contains the interval [min(Ji), fi( )). What’s more, Hj contains the interval Jj for j < i (since such j cannot be n − 1). One simply veriﬁes then that:
[0, ) = f −1(Jj) ∪ f −1([min(Ji), fi( )))
j<i
= Ij ∪ [min(Ii), )
j<i
Now, suppose that X ∈ W which when intersected with each ω or ζ part forms an initial interval of that part. Then I claim that for each i, X ∩ Ii forms an initial interval of Ii. Note that X must be predecessor closed. Recall that X ∩ Ii was the image under the order preserving bijection f of some second order element Xi in Hi, which has to be a subset of Ji. Thus, Xi is predecessor closed in Ji, and thus by our lemma above, some initial interval of Ji. It then follows by our bijection that X ∩ Ii forms an initial interval of Ii. Combine
131

this with the fact that if X contains the least element of Ii then by predecessor closure, it contains the greatest element of Ii−1, and we can see that X must be an interval of the form {x|x < }.

Thus, the resolution of the cut-and-paste scheme is an L-model, and thus satisﬁes (1.1) through (1.7) and (2.1).

(2.2): Suppose we have an x ∈ ω + ζ · L. Then it fell into some interval Ii. Let:


{f (x)} Xj =
∅

j=i j = i,

Then I claim that

{x} = f −1(Xi).
i

(2.3): Suppose we have two sets A, B, with:

A = f −1(Ai),
i
B = f −1(Bi),
i
With Ai, Bi second order elements of Hi which are subsets of Ji. Then I claim
that:
A ∪ B = f −1(Ai ∪ Bi).
i
With Ai ∪ Bi second order elements of Hi which are subsets of Ji.

(2.4): The proof is the same as the proof for (2.3).

(2.5): Suppose we have a set X with:
X = f −1(Xi),
i

132

With Xi second order elements of Hi which are subsets of Ji. Let i be the largest such that Xi is nonempty. Then Xi has a largest element. The image of this largest element is the largest element of X within Ii, and thus the largest element of X overall.
(3.A) Suppose we have runnable automaton A = (m, Q, I, δ, Q), and suitable input X0, . . . , Xm−1, with:
Xj = f −1(Xj,i).
i
With Xj,i second order elements of Hi which are subsets of Ji.
Iterating through possible values of i do the following:
Step 1) Determine the state of A going into the beginning of Ii. If i = 0 this is the start state. Otherwise, we computed this at the end of the previous iteration.
Step 2) Use the proposition above to ﬁnd a run of A starting at the minimal element of Ji on input X0,i, . . . , Xm−1,i, running up to either the largest element of Ji or the image of the largest element of X, whichever is smaller (at least one will exist).
Step 3) If we ran up to the largest element of X, glue together the images under f −1 of the found runs, I claim that this is the desired run. Otherwise, add one to i and go back to step 1), remembering the state we wound up in at the end.
(4.A) The argument is similar to the above. Note that if we’re looking in a model of WS1S for a run of a reverse-runnable automaton A on an interval [a, b] which ends in a particular state q, we simply use axiom (4.A) on inputs X0 ∩ [0, S−1(b)], . . . , Xm−1 ∩ [0, S−1(b)].
133

Since all of the axioms are satisﬁed, the resolution is a model of WS1S.
Note that our initial restrictions on W were necessary of any projection of the second order part of a model of WS1S to its ω part.
4.8 Conclusion and Further Research
This paper makes signiﬁcant strides towards the complete classiﬁcation of nonstandard models of WS1S. In L-models we found a constructive canonical form for a representative of each isomorphism class of nonstandard model, and showed that for every linear ordering we could possibly expect as the ﬁrst order part of a nonstandard model of WS1S (those of the form ω + ζ · L), there is some nonstandard model of WS1S with that ﬁrst order part when ordered by that model’s interpretation of <. Tail-head functions gave us a language for specifying our L-models precisely, allowing us to completely capture all of the relevant information about a nonstandard model of WS1S.
Additionally, we provided a technique for producing a large variety of 1models, and then a technique for gluing together pieces of nonstandard models of WS1S to produce new models. Combining these two, we could also produce a large variety of n-models, where n is the natural linear ordering on {0, . . . , n−1}.
This is a very promising start, but it is missing a few key pieces that will be necessary for any complete classiﬁcation of 1-models. First, as we can see in the example of H1,k for various k, the projection of the second order part of a 1-model of WS1S to its ω part is not enough to uniquely determine the model. There may be many ways of completing those projections to a full 1-
134

model of WS1S. As such, the question of what sets of bounded subsets of Z can be completed backwards to produce a 1-model of WS1S is technically open, although it is likely that by allowing reversal in our cut-and-paste schemes this question can be resolved quickly.
One would also like to be able to extend this result to study the uncountable sets of subsets of ω that can be extended to 1-models of WS1S. The construction in our proof has an iterative building nature that closely resembles what one gets if one unpacks certain forcing constructions. It is possible that our argument may be obfuscated by translating it into a forcing proof, which may yield insights into how to extend it for uncountable collections W.
Additionally, we haven’t yet answered the question of what sets of subsets of ω can be completed to produce an L-model of WS1S for L without initial element. If L does have an initial element, we can use cut-and-paste arguments to turn L-models into 1-models with the same second order projection to their ω parts and vice versa. We cannot do such a cut-and-paste argument for L without initial element, although it is likely that a modiﬁcation of the original construction would sufﬁce. One gets a sense from how the Henkin construction freely adds new ﬁrst-order elements as necessary that perhaps the 1-model case is the hardest.
If we look at those sets of subsets of ω which can be completed to produce a 1-model of WS1S, we see that there’s some notion of regular function that they need to be closed under. This leads us to deﬁne a partial ordering ≤A on tuples of subsets of ω (which we can encode as ω sequences of characters from some ﬁnite alphabet) where X ≤A Y if X is the output of some regular function on input Y. It is suspected that understanding the structure of this partial ordering
135

will yield insights into the class of all 1-models of WS1S. And of course, all of this work can be done over again in at least three other
cases: • S1S, the theory of ω and arbitrary subsets of ω with ∈ and S, • WS2S, the theory of ﬁnite strings of 0s and 1s and ﬁnite sets of such strings, with ∈ and S0, S1 which tack a 0 or 1 onto the end of a string. • S2S, the theory of ﬁnite strings of 0s and 1s and arbitrary sets of such strings, with ∈ and S0, S1.
Initial speculation suggests that the S1S case should be fairly easy. The most technical portion of each proof will likely be showing that the standard construction for complementing automata over these structures works correctly.
136

APPENDIX A APPENDIX A: LIST OF AXIOMS

(1.1) ∀x, y, z : x < y ∧ x < z =⇒ x < z

(< is transitive)

(1.2) ∀x, y : x < y =⇒ ¬(y < x)

(< is antisymmetric)

(1.3) ∀x, y : x < y ∨ x = y ∨ y < x

(< is total)

(1.4) ∀x : S(x) > x

(S is strictly increasing)

(1.5) ∀x : y : x < y < S(x)

(S is a successor)

(1.6) ∃o : ∀x : o ≤ x

(< has minimal element)

(1.7) ∃o : ∀x : x = o =⇒ ∃y : S(y) = x

(S has image ω \ {0})

(2.1) ∀X, Y : (∀x : x ∈ X ⇔ x ∈ Y ) ⇒ X = Y

(Extensionality)

(2.2) ∀x : ∃X : ∀y : y ∈ X ⇔ y = x

(Singleton)

(2.3) ∀X, Y : ∃Z : ∀x : x ∈ Z ⇔ (x ∈ X ∨ x ∈ Y )

(Union Closure)

(2.4) ∀X, Y : ∃Z : ∀x : x ∈ Z ⇔ (x ∈ X ∧ x ∈/ Y )

(Difference Closure)

(2.5) ∀X = ∅ : ∃x : x ∈ X ∧ ∀y : y ∈ X ⇒ y ≤ x

(Maximal Element)

(3.A) ∀X0, . . . , Xn−1 : Φ(n,Q,I,δ,Q)(X0, . . . , Xn−1)

(A Runnable)

(4.A) ∀X0, . . . , Xn−1 : Φ(n,Q,Q,δ,F )(X0, . . . , Xn−1)

(A Reverse-Runnable)

137

BIBLIOGRAPHY
[1] Bu¨ chi, J. R. (1990). On a Decision Method in Restricted Second Order Arithmetic. The Collected Works of J. Richard Bu¨ chi, 425-435.
[2] Downey, R. G., & Fellows, M. R. (1999). Parameterized complexity. New York: Springer.
[3] Egri-Nagy, A., & Nehaniv, C. (2005). Algebraic Hierarchical Decomposition of Finite State Automata: Comparison of Implementations for KrohnRhodes Theory. Implementation and Application of Automata Lecture Notes in Computer Science, 315-316.
[4] Egri-Nagy, A., & Nehaniv, C. (2006). Making Sense of the Sensory Data Coordinate Systems by Hierarchical Decomposition. Lecture Notes in Computer Science Knowledge-Based Intelligent Information and Engineering Systems, 333340.
[5] Ginzburg, A. (1968). Algebraic theory of automata. New York: Academic Press.
[6] Kleene, S. C.: 1956, ‘Representation of Events in Nerve Nets and Finite Automata’, in C. E. Shannon and J. McCarthy (eds.), Automata Studies, Princeton University Press, Princeton, NJ, pp. 3-42.
[7] Krohn, K., & Rhodes, J. (1965). Algebraic Theory of Machines. I. Prime Decomposition Theorem for Finite Semigroups and Machines Transactions of the American Mathematical Society, 450-450.
[8] Marker, D. (2002). Model theory: An introduction. New York: Springer.
[9] Rosenstein, J. G. (1982). Linear orderings. New York: Academic Press.
[10] Siefkes, D. (1978). An axiom system for the weak monadic second order theory of two successors. Israel J. Math. Israel Journal of Mathematics, 30(3), 264-284.
138