Machine Science: Automated Modeling of Deterministic and Stochastic Dynamical Systems
The work presented here advances the technology for analyzing experimental data and automatically hypothesizing explanatory models and physical laws that account for observations. Automated Modeling, sometimes referred to as Symbolic Regression or System Identification, is the process of searching a possibly infinite space of mathematical expressions to optimize various objectives: for example, identifying the simplest nonlinear equation that captures the observed dynamics of a system. Traditionally, formulating analytical models and theory has remained entirely within the purview, and therefore the limitations, of human expertise. However, the development of Evolutionary Algorithms, and more recently Genetic Programming, has made automated search for analytical models possible. The work presented here advances the algorithms and techniques for Automated Modeling to shrink the "reality gap" between these methods and real data, and applies these advances to several real and experimental systems for the first time.

The specific contributions of this work fall into four categories: search methods and algorithms; model representations and the types of systems that can be analyzed; techniques for interpreting solutions and results; and applications in science and engineering. The most important contribution in search methods is the Fitness and Rank Prediction algorithm, which makes it practical to use very large data sets at low computational cost. This algorithm rests on the observation that, at any given time, only a small number of carefully selected data points are needed to discriminate among candidate models, which greatly reduces computational effort. In model representations, the most important contribution is a principle for identifying meaningful invariant quantities among the infinite number of trivial invariant expressions.
This principle enables searching for physical laws and conservation laws directly from experimental measurements. In the interpretation of results, the most important contribution is the Parameter Mapping technique, which relates an automatically inferred model to a previous model through repeated regressions. Finally, the most important contribution in applications is the analysis of yeast glycolytic oscillations, which demonstrates and compares several techniques for identifying a complete nonlinear ordinary differential equation model directly from data.
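The idea behind Fitness and Rank Prediction, that a few well-chosen data points can be enough to tell candidate models apart, can be illustrated with a minimal sketch. This is not the dissertation's algorithm: the point-selection rule used here (pick the points where the candidates' predictions have the highest variance) and the function names `select_points` and `rank_models` are hypothetical choices made for illustration only.

```python
import numpy as np


def select_points(models, X, k):
    """Return indices of the k points where the candidate models disagree most.

    Hypothetical heuristic (an assumption, not the dissertation's method):
    disagreement is the variance of the candidates' predictions at each point.
    """
    preds = np.array([m(X) for m in models])  # shape: (n_models, n_points)
    disagreement = preds.var(axis=0)           # one score per data point
    return np.argsort(disagreement)[-k:]       # the k highest-variance points


def rank_models(models, X, y, idx):
    """Rank candidates by mean absolute error on the points in idx (best first)."""
    errors = [np.mean(np.abs(m(X[idx]) - y[idx])) for m in models]
    return np.argsort(errors)


# Usage: 10,001 noise-free samples of y = x**2 and three candidate models.
X = np.linspace(-3.0, 3.0, 10_001)
y = X ** 2
models = [lambda x: x ** 2, lambda x: 2.0 * np.abs(x), lambda x: x ** 2 + 0.1]

idx = select_points(models, X, k=20)               # evaluate on 20 points, not 10,001
subset_rank = rank_models(models, X, y, idx)
full_rank = rank_models(models, X, y, np.arange(X.size))
print(subset_rank[0], full_rank[0])  # both pick model 0 (the true x**2) here
```

In this toy case the 20 selected points produce the same ranking as the full data set, at a small fraction of the evaluation cost. A real search would have to re-select points as the population of candidate models changes, which this sketch does not attempt.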
Machine Science; Artificial Intelligence; Evolutionary Computation
Ellner, Stephen Paul; Strogatz, Steven H
Ph. D., Computational Biology
Doctor of Philosophy
dissertation or thesis