Machine Science: Automated Modeling Of Deterministic And Stochastic Dynamical Systems

The work presented here advances the technology for analyzing experimental data and automatically hypothesizing explanatory models and physical laws that account for observations. Automated Modeling, sometimes referred to as Symbolic Regression or System Identification, is the process of searching a possibly infinite space of mathematical expressions to optimize various objectives, for example identifying the simplest possible nonlinear equation that captures the observed dynamics of a system. Traditionally, the task of formulating analytical models and theory has remained entirely within the purview of human expertise, and therefore subject to human limitation. The development of Evolutionary Algorithms, and more recently Genetic Programming, has made it possible to search for analytical models automatically. The work presented here focuses on advancing the algorithms and techniques for Automated Modeling to shrink the "reality gap" between these methods and their use on real systems, and applies these advances to various real and experimental systems for the first time.

The specific contributions of this work fall into four categories: search methods and algorithms, model representations and the types of systems that can be analyzed, techniques for interpreting solutions and results, and applications in science and engineering fields. The most important contribution in search methods is the Fitness and Rank Prediction algorithm, which makes it possible to use exceedingly large data sets with low computational effort. This algorithm is based on the idea that, at any given time, only a small number of carefully selected data points are necessary to discriminate among candidate models, allowing large reductions in computational effort. In model representations, the most important contribution is a principle for identifying meaningful invariant quantities among the infinite number of trivial invariant expressions. This principle enables searching for physical laws and conservation laws directly from experimental measurements. In the interpretation of results, the most important contribution is the Parameter Mapping technique, which relates an automatically inferred model to a previous model through repeated regressions. Finally, the most important contribution in applications is the analysis of yeast glycolytic oscillations, which demonstrates and compares several techniques for identifying a complete nonlinear ordinary differential equation model directly from data.
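
The idea that a few well-chosen data points can discriminate among candidate models can be illustrated with a minimal sketch. It is not the Fitness and Rank Prediction algorithm itself: the three candidate models, the synthetic noisy sine data, and the "largest disagreement" selection heuristic are all assumptions made here for illustration. The sketch ranks the candidates on 2000 points and again on 8 selected points, then compares the two rankings.

```python
import math
import random

# Hypothetical candidate models for y = f(x); none of them come from the thesis.
CANDIDATES = {
    "linear": lambda x: 0.8 * x,
    "cubic": lambda x: x - x**3 / 6.0,
    "sine": lambda x: math.sin(x),
}

def mean_abs_error(model, points):
    """Mean absolute error of one model over a list of (x, y) pairs."""
    return sum(abs(model(x) - y) for x, y in points) / len(points)

def rank(models, points):
    """Model names ordered from lowest to highest error on the given points."""
    return sorted(models, key=lambda name: mean_abs_error(models[name], points))

def most_discriminating(models, points, k):
    """Assumed heuristic: keep the k points where predictions disagree most."""
    def spread(point):
        preds = [m(point[0]) for m in models.values()]
        mean = sum(preds) / len(preds)
        return sum((p - mean) ** 2 for p in preds)
    return sorted(points, key=spread, reverse=True)[:k]

# Synthetic stand-in for experimental data: noisy samples of sin(x) on [0, 3].
rng = random.Random(0)
xs = [3.0 * i / 1999 for i in range(2000)]
data = [(x, math.sin(x) + rng.gauss(0.0, 0.05)) for x in xs]

subset = most_discriminating(CANDIDATES, data, k=8)
full_ranking = rank(CANDIDATES, data)      # uses all 2000 points
subset_ranking = rank(CANDIDATES, subset)  # uses only 8 points

print("full:  ", full_ranking)
print("subset:", subset_ranking)
```

In this toy setting the 8-point ranking agrees with the full-data ranking while evaluating each model on a small fraction of the data. A static subset like this one only works while the candidate set is fixed; the abstract's phrase "at any given time" suggests the selected points should change as the candidates change.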

Keywords
Machine Science; Artificial Intelligence; Evolutionary Computation
Committee Chair
Lipson, Hod
Committee Member
Ellner, Stephen Paul
Strogatz, Steven H
Degree Discipline
Computational Biology
Degree Name
Ph. D., Computational Biology
Degree Level
Doctor of Philosophy
Type
dissertation or thesis