Machine Science: Automated Modeling Of Deterministic And Stochastic Dynamical Systems

The work presented here advances the technology for analyzing experimental data and automatically hypothesizing explanatory models and physical laws that account for observations. Automated Modeling, sometimes referred to as Symbolic Regression or System Identification, is the process of searching a possibly infinite space of mathematical expressions in order to optimize various objectives, for example identifying the simplest possible nonlinear equation that captures the observed dynamics of a system. Traditionally, the task of formulating analytical models and theory has remained entirely within the purview of human expertise, and has therefore been subject to human limitations. However, the development of Evolutionary Algorithms, and more recently Genetic Programming, has made it possible to search for analytical models automatically. The work presented here focuses on advancing the algorithms and techniques for Automated Modeling to shrink this "reality gap," and applies these advances to various real and experimental systems for the first time. The specific contributions of this work fall into four categories: search methods and algorithms, model representations and the types of systems that can be analyzed, techniques for interpreting solutions and results, and applications in science and engineering fields. The most important contribution in search methods is the Fitness and Rank Prediction algorithm, which makes it possible to use exceedingly large data sets with low computational effort. This algorithm is based on the idea that, at any given time, only a small number of carefully selected data points are necessary to discriminate among candidate models, allowing large reductions in computational effort. In model representations, the most important contribution is a principle for identifying meaningful invariant quantities among the infinite number of trivial invariant expressions. This principle enables searching for physical laws and conservation laws directly from experimental measurements. In the interpretation of results, the most important contribution is the Parameter Mapping technique, which relates an automatically inferred model to a previous model through repeated regressions. Finally, the most important contribution in applications is the analysis of yeast glycolytic oscillations, which demonstrates and compares several techniques for identifying a complete nonlinear ordinary differential equation model directly from data.

Machine Science; Artificial Intelligence; Evolutionary Computation


Committee Chair

Lipson, Hod

Committee Member

Ellner, Stephen Paul
Strogatz, Steven H

Degree Discipline

Computational Biology

Degree Name

Ph. D., Computational Biology

Degree Level

Doctor of Philosophy

dissertation or thesis
