DYNAMIC MODELS OF METABOLIC NETWORKS
AND ANALYSIS OF CELL-FREE PROTEIN
SYNTHESIS
A Dissertation
Presented to the Faculty of the Graduate School
of Cornell University
in Partial Fulfillment of the Requirements for the Degree of
Doctor of Philosophy
by
Michael Vilkhovoy
December 2019
© 2019 Michael Vilkhovoy
ALL RIGHTS RESERVED
DYNAMIC MODELS OF METABOLIC NETWORKS AND ANALYSIS OF
CELL-FREE PROTEIN SYNTHESIS
Michael Vilkhovoy, Ph.D.
Cornell University 2019
BIOGRAPHICAL SKETCH
Michael Vilkhovoy was born in Holyoke, Massachusetts and attended the
University of Massachusetts Amherst, graduating cum laude with a Bachelor of
Science in Chemical Engineering with Departmental Honors in 2014. He enrolled
in the Robert Frederick Smith School of Chemical and Biomolecular Engineering
at Cornell University in August 2014. During his time at Cornell, he was
interested in developing data-driven computational models of biological systems.
Under the guidance of Professor Jeffrey Varner, Michael studied complex metabolic
reaction networks and developed wet-lab and computational metabolic engineering
tools. He received his Doctor of Philosophy in Chemical and Biomolecular
Engineering in 2019.
This work is dedicated to my sister, Tanya.
ACKNOWLEDGEMENTS
I would first like to thank my wife, Inna, for all her support and for always being
there for me, and my parents, for raising me and guiding me into the person I am
today. I would like to thank Professor Jeffrey Varner for his guidance in sharpening
my skill set to be a better scientist and communicator. Thank you to my friends and
members of the Varner group for their continued support and for making the process
enjoyable throughout the years, including Nick Horvath, David Dai, Tyler Moeller,
Adi Sagar, Mason Minot, Sandra Vadhin, and Abhinav Adhikari. And a final thank
you to my family: Alex, Julie, Viktoria, Serge, David, Liz, Christina, and Mark for
your love and support.
TABLE OF CONTENTS
Biographical Sketch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
Dedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv
Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v
Table of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xii
1 Introduction 1
1.1 Metabolic modeling methods . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Constraint based modeling of metabolism . . . . . . . . . . . . . . . 4
1.3 Cell-free protein synthesis . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3.1 Mathematical models of cell-free protein synthesis . . . . . . 8
2 Effective dynamic models of metabolic networks 12
2.1 Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.2 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.5 Materials and Methods . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.5.1 Elementary mode and flux balance analysis . . . . . . . . . . 23
2.5.2 Global sensitivity analysis . . . . . . . . . . . . . . . . . . . . 24
2.5.3 Estimation of model parameters . . . . . . . . . . . . . . . . 24
3 Sequence Specific Modeling of E. coli Cell-Free Protein Synthesis 26
3.1 Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.2 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.3 Results and discussion . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.3.1 Model derivation and validation . . . . . . . . . . . . . . . . 31
3.3.2 Metabolic flux distributions . . . . . . . . . . . . . . . . . . . 35
3.3.3 Analysis of CFPS performance . . . . . . . . . . . . . . . . . 40
3.3.4 Global sensitivity analysis . . . . . . . . . . . . . . . . . . . . 47
3.3.5 Potential alternative metabolic optima . . . . . . . . . . . . . 51
3.3.6 Summary and conclusions . . . . . . . . . . . . . . . . . . . . 54
3.4 Materials and Methods . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.4.1 Glucose/NMP cell-free protein synthesis. . . . . . . . . . . . 56
3.4.2 Protein product and metabolite measurements. . . . . . . . 58
3.4.3 Formulation and solution of the model equations. . . . . . . 59
3.4.4 Calculation of energy efficiency. . . . . . . . . . . . . . . . . 64
3.4.5 Quantification of uncertainty. . . . . . . . . . . . . . . . . . . 65
3.4.6 Global sensitivity analysis. . . . . . . . . . . . . . . . . . . . 66
3.4.7 Potential alternative optimal metabolic flux solutions. . . . . 66
3.5 Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4 Absolute quantification of cell-free protein synthesis metabolism by
reversed-phase liquid chromatography-mass spectrometry 68
4.1 Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
4.2 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
4.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
4.3.1 Aniline tagged metabolites . . . . . . . . . . . . . . . . . . . 73
4.3.2 Amino Acid Analysis . . . . . . . . . . . . . . . . . . . . . . 75
4.3.3 Nucleotide charged sugars . . . . . . . . . . . . . . . . . . . 80
4.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
4.5 Materials and Methods . . . . . . . . . . . . . . . . . . . . . . . . . . 84
4.5.1 Aniline derivatization . . . . . . . . . . . . . . . . . . . . . . 84
4.5.2 Amino acid derivatization . . . . . . . . . . . . . . . . . . . . 87
4.5.3 Nucleotide charge sugar detection . . . . . . . . . . . . . . . 87
4.6 Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
5 An integrated kinetic constraint-based model of E. coli cell-free protein
synthesis 90
5.1 Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
5.2 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
5.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
5.3.1 Integration of kinetic parameters, enzyme levels, and metabo-
lite concentrations . . . . . . . . . . . . . . . . . . . . . . . . 94
5.3.2 Transcription/Translation is oxygen dependent . . . . . . . 97
5.3.3 Kinetic descriptions with metabolic constraints predict
metabolic behavior of oxidative phosphorylation inhibitors 101
5.3.4 Analysis of CFPS metabolism with oxidative phosphoryla-
tion inhibitors . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
5.3.5 Enzyme activity assays reveal allosteric regulation in CFPS . 110
5.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
5.5 Materials & Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
5.5.1 Cell-free protein synthesis and oxidative phosphorylation
inhibitors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
5.5.2 Absolute quantification of central carbon metabolites . . . . 117
5.5.3 Amino acid analysis . . . . . . . . . . . . . . . . . . . . . . . 118
5.5.4 Glutamate and maltose assays . . . . . . . . . . . . . . . . . 119
5.5.5 Protein quantification . . . . . . . . . . . . . . . . . . . . . . 119
5.5.6 Enzyme activity assays . . . . . . . . . . . . . . . . . . . . . . 120
5.5.7 Absolute quantification of mRNA . . . . . . . . . . . . . . . 120
5.5.8 Formulation of model equations . . . . . . . . . . . . . . . . 122
5.5.9 Quantification of uncertainty . . . . . . . . . . . . . . . . . . 126
5.5.10 Calculation of energy efficiency . . . . . . . . . . . . . . . . . 127
5.5.11 Calculation of carbon yield . . . . . . . . . . . . . . . . . . . 128
6 Toward a genome scale sequence specific dynamic model of cell-free pro-
tein synthesis in Escherichia coli 130
6.1 Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
6.2 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
6.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
6.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
6.5 Materials and Methods . . . . . . . . . . . . . . . . . . . . . . . . . . 153
6.5.1 Cell-free protein synthesis and measurement. . . . . . . . . 153
6.5.2 Formulation and solution of the model equations. . . . . . . 154
6.5.3 Estimation of kinetic model parameters. . . . . . . . . . . . . 158
6.5.4 Reaction group knockouts. . . . . . . . . . . . . . . . . . . . 161
6.5.5 Sensitivity of CAT productivity to transcription and translation. 163
6.5.6 Calculation of energy efficiency. . . . . . . . . . . . . . . . . 164
6.5.7 Availability of model code. . . . . . . . . . . . . . . . . . . . 165
6.6 Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
7 JuPOETs: A constrained multiobjective optimization approach to esti-
mate biochemical model ensembles in the Julia programming language 172
7.1 Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
7.2 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
7.3 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
7.3.1 JuPOETs optimization problem formulation. . . . . . . . . . 177
7.4 Availability of data and materials . . . . . . . . . . . . . . . . . . . . 183
7.5 Results and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . 185
7.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
7.7 Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
8 Summary & Conclusion 196
A Appendix 200
LIST OF TABLES
3.1 Transcription and translation template reactions for protein produc-
tion. The symbol GP denotes the gene encoding protein product P ,
RT denotes the concentration of RNA polymerase, G∗P denotes the
gene bound by the RNA polymerase (open complex), ηi and αj
denote the stoichiometric coefficients for nucleotide and amino acid,
respectively, Pi denotes inorganic phosphate, RX denotes the ribosome
concentration, R∗X denotes bound ribosome, and AAj denotes
the jth amino acid. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.2 Parameters for sequence specific flux balance analysis . . . . . . . 63
4.1 Each compound’s corresponding limit of detection, range of linear-
ity and correlation coefficient identified from standard curves. . . 76
4.2 Each compound’s corresponding peak number, retention time, m/z
value for 12C, 13C, and unlabeled, cone voltage, and MS species. . 77
4.3 Each amino acid’s retention time separated by reverse-phase liquid
chromatography and detected by TUV at 260 nm with the corresponding
limit of detection, linear range, and correlation coefficient. 79
4.4 Each compound’s retention time and mass over charge ratio with
the corresponding limit of detection, linear range, and correlation
coefficient. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
5.1 Parameters for sequence specific flux balance analysis . . . . . . . 129
6.1 Breakdown of ATP generation. Flux through ATP-generating path-
ways in the first and second phases as percentages of total ATP
generation in that phase. . . . . . . . . . . . . . . . . . . . . . . . . 166
6.2 Breakdown of ATP consumption. Flux through ATP-consuming
pathways in the first and second phases as percentages of total ATP
consumption in that phase. . . . . . . . . . . . . . . . . . . . . . . . 167
6.3 Mean and standard deviation of Akaike information criterion (AIC),
by measurement, for the ensemble and random ensemble. . . . . . 168
6.4 Reference values for reaction rate maxima (Vmax) from BioNum-
bers. Vmax values calculated from turnover numbers (kcat) from
BioNumbers, and a characteristic enzyme concentration of 170 nM.
Characteristic rate maximum for all other reactions calculated as
geometric mean of calculated rate maxima. . . . . . . . . . . . . . . 169
6.5 Enzyme levels for key reaction fluxes, calculated from enzyme
turnover numbers [3] and rate maxima from the best-fit set. . . . . 170
6.6 Reference values for transcription, translation, and mRNA degra-
dation from literature. Transcription rate calculated from elonga-
tion rate, mRNA length, and promoter activity level. Translation
rate calculated from elongation rate, protein length, and polysome
amplification constant. mRNA degradation rate calculated from
mRNA degradation time. . . . . . . . . . . . . . . . . . . . . . . . . 171
7.1 Multi-objective optimization test problems. We tested the JuPOETs
implementation on three two-dimensional test problems, with one-,
two- and three-dimensional parameter vectors. Each problem
had parameter bound constraints; however, only the Binh and Korn
function had additional non-linear problem constraints. For the
Fonseca and Fleming problem, N = 3. . . . . . . . . . . . . . . . . . 186
A.1 List of materials and equipment used to quantify cell-free protein
synthesis metabolites with aniline tagging and internal standards . 201
LIST OF FIGURES
1.1 A schematic of transcription and translation processes integrated
with metabolism. Transcription and translation processes demand
macromolecular precursors (e.g., NTPs, amino acids and cofactors)
from metabolism for gene expression. The target protein can in turn
affect enzymatic flux (orange arrow) or be synthesized as a product
(green arrow). The integrated framework is represented as a
stoichiometric matrix of metabolites participating in reactions, where
the flux is estimated subject to constraints, a pseudo-steady state
assumption and
an objective function. . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Cell-free protein synthesis. Cell extract is prepared by cell lysis,
and cellular debris and chromosomal DNA are removed. An energy
source along with necessary amino acids, nucleotides, and cofactors
are added to the cell-free reaction. Template DNA of the target
protein is added. The target protein is then easily purified from
the cell-free system. Alternatively, cell-free extract can be freeze
dried into pellets and paired with lyophilized DNA. Through the
simple addition of water, proteins can be manufactured on site and
on demand. Figure adapted from [26, 137]. . . . . . . . . . . . . . . 8
2.1 HCM proof of concept metabolic study. A: HCMs distribute uptake
and secretion fluxes amongst different pathways. For HCM, these
pathways are elementary modes; for HCM-FBA these are flux bal-
ance analysis solutions. HCM combines all possible modes within
a network; whereas HCM-FBA combines only steady-state paths
estimated by flux balance analysis. B: Prototypical network with six
metabolites and seven reactions. Intracellular cellmass precursors
A, B, and C are balanced (no accumulation) while the extracellular
metabolites (Ae, Be, and Ce) are not balanced (can accumulate). The
oval denotes the cell boundary, qj is the jth flux across the boundary,
and vk denotes the kth intracellular flux. C: Simulation of extracellu-
lar metabolite trajectories using HCM-FBA (solid line) versus HCM
(points) for the prototypical network. . . . . . . . . . . . . . . . . . 15
2.2 HCM-FBA versus HCM performance for small and large metabolic
networks. A: Batch anaerobic E. coli fermentation data versus HCM-
FBA (solid) and HCM (dashed). The experimental data was repro-
duced from Kim et al. [95]. Error bars represent the 90% confidence
interval. B: Batch aerobic E. coli fermentation data versus HCM-
FBA (solid). Model performance is also shown when minor modes
(dashed) and major modes (dotted) were removed from the HCM-
FBA model. The experimental data was reproduced from Varma &
Palsson [176]. Error bars denote a 10% coefficient of variation. . . . 17
2.3 Global sensitivity analysis of the aerobic E. coli model. Total or-
der variance based sensitivity coefficients were calculated for the
biomass yield on glucose and acetate. Sensitivity coefficients were
computed for kinetic parameters and enzyme initial conditions (N
= 183,000). Error bars represent the 95% confidence intervals of the
sensitivity coefficients. . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.1 Sequence specific flux balance analysis. A. Schematic of the core
metabolic network coupled to sequence-specific transcription and
translation processes of a protein of interest for cell-free protein
synthesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.2 Sequence specific flux balance analysis of deGFP under a P70a
promoter in TXTL 2.0 E. coli extract. A. deGFP production for 8
h using maltose and 3PG as a carbon and energy source (R2 =
0.84). Error bars denote a 10% deviation from the nominal value. B.
Predicted versus measured deGFP concentration as a function of
plasmid concentration in TXTL 2.0 (R2 = 0.97). Error bars denote the
standard deviation of experimental measurements. The blue region
denotes the 95% CI over an ensemble of N = 100 sets, the black line
denotes the mean of the ensemble, and dots denote experimental
measurements. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.3 Optimal metabolic flux distribution for CAT production. A. Opti-
mal flux distribution in the presence of amino acid supplementation
and de novo synthesis. B. Optimal flux distribution in the presence
of amino acid supplementation without de novo synthesis. C. Op-
timal flux distribution with de novo amino acid synthesis in the
absence of supplementation. Mean flux across the ensemble (N =
100), normalized to glucose uptake flux. Thick arrows indicate flux
to or from amino acid biosynthesis pathways. . . . . . . . . . . . . 36
3.4 Experimentally constrained simulation of CAT production. CAT
was produced under a T7 promoter in CFPS E. coli extract for 1 h
using glucose as a carbon and energy source. Error bars denote the
standard deviation of experimental measurements. The blue region
denotes the 95% CI over an ensemble of N = 100 sets, the black line
denotes the mean of the ensemble, and dots denote experimental
measurements. A. Metabolic flux distribution for CAT production
in the presence of experimental constraints for glucose, organic
acid and amino acid consumption and production rates. Mean
flux across the ensemble, normalized to glucose uptake flux. Thick
arrows indicate flux to or from amino acids. B. Central carbon
metabolite and CAT measurements versus simulations over a 1
hour time course. The blue region denotes the 95% CI over an
ensemble of N = 100 sets, the black line denotes the mean of the
ensemble, and dots denote experimental measurements. . . . . . . 39
3.5 The CFPS performance for eight model proteins with and without
amino acid supplementation. A. Mean CFPS productivity for a
panel of model proteins with and without amino acid supplementa-
tion. B. Mean CFPS productivity versus carbon number for a panel
of model proteins with and without amino acid supplementation.
Trendline (black dotted line) was calculated across all cases for a
P70a promoter (R2 = 0.99) and maximum productivity trendline
assumed u (κ) = 1 (grey dotted line; R2 = 0.99). C. Mean CFPS
energy efficiency for a panel of model proteins with and without
amino acid supplementation. D. Mean CFPS energy efficiency ver-
sus carbon number for a panel of model proteins with and without
amino acid supplementation. Trendline for cases with amino acids
(black dotted line) and trendline for without amino acids (grey dot-
ted line; R2 = 0.81). Error bars: 95% CI calculated by sampling;
asterisk: protein excluded from trendline; dagger: constrained by
experimental measurements and excluded from trendline; triangles:
first principle prediction and excluded from trendline. . . . . . . . 45
3.6 Sensitivity analysis of the cell-free production of CAT. A. Total or-
der sensitivity of the optimal CAT productivity with respect to
metabolic and transcription/translation parameters. B. Total or-
der sensitivity of the optimal CAT energy efficiency. Metabolic
and transcription/translation parameters were varied for amino
acid supplementation and synthesis (black), amino acid supple-
mentation without synthesis (dark grey) and amino acid synthesis
without supplementation (light grey). Error bars represent the 95%
CI of the total order sensitivity index. . . . . . . . . . . . . . . . . . 48
3.7 Optimal CAT energy efficiency versus oxidative phosphorylation
flux calculated across an ensemble (N = 1000) of flux balance solu-
tions (points). Energy efficiency versus oxidative phosphorylation
flux for amino acid supplementation and de novo synthesis (black),
amino acid supplementation without de novo synthesis (dark grey),
and de novo amino acid synthesis without supplementation (light
grey). The ensemble was generated by randomly varying the oxygen
consumption rate from 0.1 to 10 mM/h and randomly sampling
the transcription and translation parameters within 10% of their
literature values. Each point represents one solution of the model
equations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
3.8 Pairwise knockouts of reaction subgroups in the cell-free network.
A. Difference in the CAT productivity in the presence of reaction
knockouts compared with no knockouts for experimentally con-
strained CAT production. B. Difference in the flux distribution in
the presence of reaction knockouts compared with no knockouts
for experimentally constrained CAT production. The difference
between perturbed and wild-type productivity and flux distribu-
tions was quantified by the l2 norm, and then normalized so the
maximum change was 1.0. Red boxes indicate potential alternative
optimal flux distributions with the same CAT productivity as the
wild type, whereas no red box indicates no feasible solution and/or
the optimal CAT productivity was not met. . . . . . . . . . . . . . . 52
3.9 Robust analysis of maltose and 3PG consumption for TXTL 2.0
E. coli extract with and without oxidative phosphorylation activity
that meet the transcription and translation constraints. Each dot
represents the mean of an ensemble of N = 20 ssFBA solutions; black
dots are solutions without oxidative phosphorylation and grey dots
are solutions with oxidative phosphorylation. . . . . . . . . . . . . 55
4.1 Schematic of workflow for aniline tagging. The cell-free protein
synthesis reaction is de-proteinized and tagged with 12C-aniline,
while a standard stock mixture is tagged with 13C-aniline. Both
mixtures are then mixed at a 1:1 volumetric ratio and analyzed by
LC/MS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
4.2 Mass chromatogram from a single LC/MS run of a 40 µM standard
mixture of 40 metabolites. Peaks were identified by their retention
time and m/z values for each compound. Complete compound
names and their abbreviations are listed in Table 4.1. . . . . . . . . 74
4.3 Amino acid chromatogram tagged and separated by reverse-phase
liquid chromatography and detected with a TUV at 260 nm. Peaks
were identified by their retention time. . . . . . . . . . . . . . . . . 78
4.4 Nucleotide charged sugars chromatogram separated by reverse-phase
liquid chromatography and detected by mass spectrometry
according to each compound's mass over charge ratio. Peaks were
identified by their retention time and selective ion recording. . . . 81
5.1 Modeling framework of cell-free protein synthesis. The metabolic
network was adapted from Vilkhovoy and coworkers where tran-
scription/translation was integrated with metabolism. Maximum
flux bound rates were formulated to be a function of the turnover
rate and enzyme abundance found to be present in CFPS extract. En-
zyme levels were validated for a subset of 15 reactions with enzyme
activity assays. Four of the enzymes were not reported by Garenne
and coworkers (grey boxes), but were found to be active with their
corresponding enzyme activity assays. The flux for each
time step was estimated while being constrained to metabolic measurements
where data were present (62 species). Finally, the flux
calculation was sampled across an ensemble of 100 sets given ex-
perimental noise and literature parameters. hk: hexokinase, gdh:
glutamate dehydrogenase, ppc: phosphoenolpyruvate carboxylase,
sdh: succinate dehydrogenase. . . . . . . . . . . . . . . . . . . . . . 93
5.2 Mean flux distribution across an ensemble (N=100) for control.
Fluxes were determined by integrating kinetic parameters with
enzyme levels and constraining to measurements of metabolites
and enzyme activity levels where data was available. (a) Flux
distribution at 2 hours of CFPS reaction. (b) Flux distribution at 8
hours of CFPS reaction. Fluxes were normalized to maltodextrin
consumption at t=0 hours. . . . . . . . . . . . . . . . . . . . . . . . 96
5.3 Prediction of mRNA and protein levels in CFPS for control (blue),
DNP (red) and TTA (grey). (a) The mRNA levels of GFP were
predicted with the given modeling framework. (b) The protein
abundance of GFP was predicted for all three cases. The solid line
denotes the mean of the ensemble (N=100), the shaded region de-
notes the 95% confidence interval of the ensemble, the points denote
experimental measurements, and error bars denote the standard
deviation of experimental measurements. . . . . . . . . . . . . . . . 97
5.4 Time course of amino acid levels in CFPS for control (blue), DNP
(red) and TTA (grey). Experimental amino acid fluxes constrained
the mathematical model of CFPS. The solid line denotes the mean
of the ensemble (N=100), the shaded region denotes the 95% con-
fidence interval of the ensemble, the points denote experimental
measurements, and error bars denote the standard deviation of
experimental measurements. . . . . . . . . . . . . . . . . . . . . . . 98
5.5 Time course of upper central carbon metabolite levels in CFPS for
control (blue), DNP (red) and TTA (grey). DNP showed exhaustion
of maltose revealing maltodextrin depletion and thus high
carbon utilization. Experimental fluxes constrained the mathemati-
cal model of CFPS. The solid line denotes the mean of the ensemble
(N=100), the shaded region denotes the 95% confidence interval of
the ensemble, the points denote experimental measurements, and
error bars denote the standard deviation of experimental measure-
ments. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
5.6 Time course of lower central carbon metabolite levels in CFPS for
control (blue), DNP (red) and TTA (grey). DNP heavily relied on
substrate level phosphorylation with high accumulation of acetate,
whereas TTA had a high abundance of lactate. Experimental fluxes
constrained the mathematical model of CFPS. The solid line denotes
the mean of the ensemble (N=100), the shaded region denotes the
95% confidence interval of the ensemble, the points denote experi-
mental measurements, and error bars denote the standard deviation
of experimental measurements. . . . . . . . . . . . . . . . . . . . . . 100
5.7 Time course of energy species levels in CFPS for control (blue),
DNP (red) and TTA (grey). Both DNP and TTA exhausted GTP
within 4 hours of the reaction which is required for translation.
Experimental fluxes constrained the mathematical model of CFPS.
The solid line denotes the mean of the ensemble (N=100), the shaded
region denotes the 95% confidence interval of the ensemble, the
points denote experimental measurements, and error bars denote
the standard deviation of experimental measurements. . . . . . . . 101
5.8 Mean flux distribution across an ensemble (N=100) for DNP. Fluxes
were determined by integrating kinetic parameters with enzyme
levels and constraining to measurements of metabolites and enzyme
activity levels where data was available. (a) Flux distribution at
2 hours of CFPS reaction. Flux difference from control shown for
key reactions at 2 hours of CFPS reaction. (b) Flux distribution at 8
hours of CFPS reaction. Flux difference from control shown for key
reactions at 8 hours of CFPS reaction. Fluxes were normalized to
maltodextrin consumption at t=0 hours. . . . . . . . . . . . . . . . 104
5.9 Mean flux distribution across an ensemble (N=100) for TTA. Fluxes
were determined by integrating kinetic parameters with enzyme
levels and constraining to measurements of metabolites and enzyme
activity levels where data was available. (a) Flux distribution at
2 hours of CFPS reaction. Flux difference from control shown for
key reactions at 2 hours of CFPS reaction. (b) Flux distribution at 8
hours of CFPS reaction. Flux difference from control shown for key
reactions at 8 hours of CFPS reaction. Fluxes were normalized to
maltodextrin consumption at t=0 hours. . . . . . . . . . . . . . . . 106
5.10 Mean energy efficiency across an ensemble (N=100) for control (a),
DNP (b), and TTA (c) throughout the metabolic network. TXTL denotes
the energy efficiency for transcription and translation processes. 108
5.11 Mean carbon yield across an ensemble (N=100) for control (a), DNP
(b), and TTA (c) for CFPS. PPP denotes the Pentose Phosphate Pathway.
Other includes purine, pyrimidine and chorismate metabolism. 109
5.12 Enzyme activity measurements reveal allosteric regulation is
present in CFPS. Enzyme activity assays at 2 and 8 hours of the
CFPS reaction throughout the metabolic network for control (black),
DNP (dark grey), and TTA (light grey). . . . . . . . . . . . . . . . . 111
6.1 Schematic of the core portion of the cell-free E. coli metabolic
network. Metabolites of glycolysis, pentose phosphate pathway,
Entner-Doudoroff pathway, and TCA cycle are shown. Metabolites
of oxidative phosphorylation, amino acid biosynthesis and degrada-
tion, transcription/translation, chorismate metabolism, and energy
metabolism are not shown. . . . . . . . . . . . . . . . . . . . . . . . 136
6.2 Central carbon metabolism in the presence (top) and absence (bot-
tom) of allosteric control, including glucose (substrate), CAT (prod-
uct), and intermediates, as well as total concentration of energy
species. Best-fit parameter set (orange line) versus experimental
data (points). 95% confidence interval (blue or gray shaded region)
over the ensemble of 100 sets. . . . . . . . . . . . . . . . . . . . . . . 140
6.3 Amino acids in the presence of allosteric control. Best-fit parameter
set (orange line) versus experimental data (points). 95% confidence
interval (blue shaded region) over the ensemble of 100 sets. . . . . 141
6.4 Energy species and energy totals by base in the presence of allosteric
control. Best-fit parameter set (orange line) versus experimental
data (points). 95% confidence interval (blue shaded region) over
the ensemble of 100 sets. . . . . . . . . . . . . . . . . . . . . . . . . . 142
6.5 Histograms of model parameters, across the ensemble of 100 sets.
A. Histogram of rate maxima. B. Histogram of saturation constants. 143
6.6 Log of cost function (residual between training data and model
simulations) across 37 datasets for data-trained ensemble (blue) and
randomly generated ensemble (red, gray background). Median
(bars), interquartile range (boxes), range excluding outliers (thin
lines), and outliers (circles) for each dataset. Median across all
datasets (large bar overlaid). . . . . . . . . . . . . . . . . . . . . . . 144
6.7 Effect of group knockouts on system. A. Change in CAT productiv-
ity when one (diagonal) or two (off-diagonal) reaction groups are
turned off. B. Change in system state (only species for which data
exist) when one (diagonal) or two (off-diagonal) reaction groups
are turned off. Total-order effect for each group calculated as the
sum of first-order effect and all pairwise effects. Larger and darker
circles represent greater effects. . . . . . . . . . . . . . . . . . . . . . 145
6.8 Key reaction fluxes of the network, in the first (gray boxes, top row)
and second (gray boxes, bottom row) phases of metabolism. A.
Fluxes of ATP generation and consumption, and GTP consumption
toward protein synthesis. B. Fluxes of glycolysis and lactate and
acetate metabolism. Fluxes are normalized to the first-phase glucose
uptake rate. For PEP and pyruvate, accumulation (normalized to
glucose uptake) is also shown. . . . . . . . . . . . . . . . . . . . . . 148
7.1 Schematic of multiobjective parameter mapping. The performance
of any given parameter set is mapped into an objective space using
a ranking function which quantifies the quality of the parameters.
The distance away from the optimal tradeoff surface is quantified
using the Pareto ranking scheme of Fonseca and Fleming in JuPOETs.179
7.2 The performance of JuPOETs on the multi-objective test suite. The
execution time (wall-clock) for JuPOETs and POETs implemented
in Octave was measured for 10 independent trials for the suite of
test problems. The number of steps per temperature I = 10, and the
cooling parameter α = 0.9 for all cases. The problem domain was
partitioned into 10 equal segments, an initial guess was drawn from
each segment. For each of the test functions, JuPOETs estimated
solutions on (rank zero solutions, black) or near (gray) the optimal
tradeoff surface, subject to bounds and problem constraints. . . . 187
7.3 Representative JuPOETs solutions for problems in the multi-
objective test suite. The number of steps per temperature I = 10,
and the cooling parameter α = 0.9 for all cases. The problem domain
was partitioned into 10 equal segments, an initial guess was drawn
from each segment. For each of the test functions, JuPOETs esti-
mated solutions on (rank zero solutions, black) or near (gray) the
optimal tradeoff surface, subject to bounds and problem constraints. 188
7.4 Proof of concept biochemical network study. Inset right: Prototypi-
cal biochemical network with six metabolites and seven reactions
modeled using the hybrid cybernetic approach (HCM). Intracellu-
lar cellmass precursors A, B, and C are balanced (no accumulation)
while the extracellular metabolites Ae, Be, and Ce are dynamic. The
oval denotes the cell boundary, qj is the jth flux across the boundary,
and vk denotes the kth intracellular flux. Four data sets (each with
Ae, Be,Ce and cellmass measurements) were generated by varying
the kinetic constants for each biochemical mode. Each data set was
a single objective in the JuPOETs procedure. A: Ensemble simu-
lation of extracellular substrate Ae and cellmass versus time. B:
Ensemble simulation of extracellular substrate Be and Ce versus
time. The gray region denotes the 95% confidence estimate of the
mean ensemble simulation. The data points denote mean synthetic
measurements, while the error bars denote the 95% confidence es-
timate of the measurement computed over the four training data
sets. C: Trade-off plots between the four training objectives. The
quantity Oj denotes the jth training objective. Each point represents
a member of the parameter ensemble, where gray denotes rank 0
sets, while black denotes rank 1 sets. Ensembles were generated
using POETs without employing local refinement. . . . . . . . . . 189
7.5 Experiment to experiment variation captured by the ensemble. Cell-
mass measurements (points) versus time for experiment 2 and 3
were compared with ensemble simulations. The full ensemble was
sorted by simultaneously selecting the top 25% of solutions for each
objective with rank ≤ 1. The best fit solution for each objective
(line) ± 1-standard deviation (gray region) for experiment 2 and 3
brackets the training data despite significant differences in the training
values between the two data sets. . . . . . . . . . . . . . . . . . . . 193
CHAPTER 1
INTRODUCTION
1.1 Metabolic modeling methods
Metabolism is the central process through which cells manage their resources to
survive, adapt and meet energetic demands. To implement these diverse functions,
cells have very complex and highly interconnected networks of chemical reactions
between genes, RNA, proteins and metabolites. Systems modeling arose from
the desire to better understand metabolism and how metabolism can be altered
for our benefit [48, 12]. Metabolic control analysis (MCA), developed by Kacser
and Burns, was among the first tools to define a quantitative approach towards
metabolic control [84]. However, MCA required a priori experimental data from
infinitesimal enzyme perturbations, which are difficult to acquire, and the technique
lacked predictive scope. Shortly after, biochemical systems theory (BST) was
developed, which is based on ordinary differential equations (ODEs) in which each
biochemical process is represented by a power-law expansion [147]. However, like
MCA, it depended on local sensitivity arguments, limiting its predictive power.
Cybernetic models provided a systematic approach for describing metabolic reg-
ulation by directing the allocation of resources towards a nutritional objective in
an optimal manner [177]. The advantage of cybernetic models over BST and other
contemporary frameworks is that they can predict how network functionality
adjusts to different perturbations, yielding a dynamic model. Cybernetic models
were first used to predict microbial growth on multiple substrates [98]; however,
these models used only abstract representations of the network, which was
acceptable at the time because the underlying biological mechanisms were unknown.
Since then, cybernetic models have been integrated with metabolic pathway analysis
to successfully predict metabolic shifts of E. coli with genetic deletions of
pta-ackA [193]; however, this model consisted of only 12 biochemical reactions and
did not scale well to genome-scale or even core metabolism models of microbes.
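As a point of reference for the power-law representation that BST relies on, the
commonly used S-system form is sketched below. This is the standard textbook form
rather than an equation taken from this work; the symbols X_i (species), α_i and
β_i (rate constants), and g_ij and h_ij (kinetic orders) are notation assumed here.

```latex
\frac{dX_i}{dt} \;=\; \alpha_i \prod_{j=1}^{n} X_j^{\,g_{ij}} \;-\; \beta_i \prod_{j=1}^{n} X_j^{\,h_{ij}},
\qquad i = 1, \dots, n
```

Because the kinetic orders are obtained by expanding around an operating point, the
representation is local, which is consistent with the limitation noted above.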
As biological understanding grew, metabolic models became more sophisti-
cated, able to describe cellular processes such as RNA synthesis, chromosome
synthesis, regulated catabolic and macromolecular synthesis pathways using ordi-
nary differential equations [173]. One of the first whole cell models was developed
by Shuler and coworkers which described the growth of E. coli on limited glucose
[38]. Since then, models have been expanded with sufficient detail for a variety
of cells. Karr et al. (2012) have developed a whole cell model of Mycoplasma geni-
talium, accounting for all genes and their interactions in the cell [88]. The model
is constructed with independent sub-models describing different components of
the cell, which is able to describe the life cycle from the level of single molecules.
Each sub-model was parameterized and tested independently, thus it is possible
that this whole cell model will not hold true under all conditions for the speci-
fied parameters. Even though some of these models have been successful, their
formulation is complex, nonlinear and requires a large set of parameters that are
computationally expensive to estimate. To overcome such obstacles, constraint
based methods [176] have been developed to help describe biochemical networks
without the need for kinetic parameters of the cellular processes.
[Figure 1.1 panels. Metabolism: NTPs, amino acids, cofactors. Transcription &
Translation: DNA, RNAp, mRNA, ribosome, degradation, target protein (product).
Mathematical Representation: stoichiometric matrix with metabolites (x1, ..., xM)
as rows and metabolic reactions (v1, ..., vR) as columns, flux constraints on each
reaction (e.g. 0 ≤ v1 ≤ ∞, -∞ ≤ v2 ≤ ∞), and the matrix times the flux vector set
equal to 0.]
Figure 1.1: A schematic of the integration of transcription and translation
processes with metabolism. Transcription and translation processes demand
macromolecular precursors (e.g. NTPs, amino acids and cofactors) from
metabolism for gene expression. The target protein in turn can affect enzymatic
flux (orange arrow) or the target protein is synthesized as a product (green arrow).
The integrated framework is represented as a stoichiometric matrix of metabolites
participating in certain reactions, where the flux is estimated subject to constraints,
a pseudo-steady state assumption and an objective function.
1.2 Constraint based modeling of metabolism
Stoichiometric reconstructions of microbial metabolism, popularized by constraint
based approaches such as flux balance analysis (FBA), have become standard tools
to interrogate metabolism [108]. FBA and metabolic flux analysis (MFA) [187], as
well as convex network decomposition approaches such as elementary modes [150]
and extreme pathways [148], model intracellular metabolism using the biochemical
stoichiometry and other constraints such as thermodynamical feasibility [64, 62]
under pseudo steady state conditions. Constraint based approaches use linear
programming [34] to predict productivity [176, 146], yield [176], mutant behavior
[43], and growth phenotypes [129] for biochemical networks of varying complexity,
including genome scale networks. Constraint based models have also been used to
identify strategies for the overproduction of desired compounds. These strategies
include genetic knockouts or the addition of heterologous enzyme pathways to an
organism’s metabolic network and have been used in developing useful bacterial
strains for the production of biofuels [10], high-value chemicals [124, 154, 192]
and pharmaceuticals [141, 47]. Stoichiometric reconstructions have been expanded
to include the metabolic demands for protein synthesis based on the DNA and
protein sequences (Fig. 1.1), where the transcription and translation processes
have been integrated into metabolism [4]. Since then, these models have been
expanded to the genome scale with detailed descriptions of gene expression
(ME-Model) [171, 108, 129] and protein structures (GEM-PRO) [197, 30], and have
successfully captured the regulatory effects these have on metabolism. These
expansions have
greatly increased the scope of questions these models can explore. Constraint
based methods are powerful tools to estimate the performance of metabolic net-
works with very few to sometimes no parameters. In addition, they are able to
provide unintuitive strategies for metabolic engineering applications of increasing
productivity, yield or titer. Constraint based methods have typically been used to
model in vivo processes, and have not yet been applied to cell-free metabolism.
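The linear program that underlies FBA can be stated compactly. The form below is
the standard one and is given only as a sketch; the symbols S (stoichiometric
matrix, metabolites by reactions), v (flux vector), c (objective weights), and the
bounds L_j and U_j are notation assumed here rather than taken from the text.

```latex
\max_{v} \; c^{\top} v
\quad \text{subject to} \quad
S\,v = 0, \qquad
\mathcal{L}_j \le v_j \le \mathcal{U}_j, \quad j = 1, \dots, R
```

The equality enforces the pseudo steady state assumption on the balanced
metabolites, the bounds encode reversibility and capacity constraints, and the
choice of c (for example, selecting a growth or product formation flux) defines
the objective.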
1.3 Cell-free protein synthesis
Cell-free biology is a powerful and flexible enabling technology that can engineer
biological parts and be used for biomolecule production without using living cells.
However, cell-free biology is not new; it has been practiced for decades. The first
examples of cell-free protein synthesis were reported in 1950 by Borsook [19] and
Winnick [188]. Using animal tissue homogenates, they examined how amino acids
were incorporated into proteins. A few years later, bacterial extracts (Staphylococcus
aureus) were used to confirm amino acid incorporation [51]. In 1956, the role of
ATP in protein production was discovered using rat liver extracts [69]. Soon after,
Nirenberg and Matthaei [118, 128] used a cell-free system to begin deciphering the
genetic code, work for which Nirenberg shared the Nobel Prize in 1968. It is thus
evident that cell-free systems have had a significant impact on molecular biology.
Over the years, the cell extract preparation process has undergone significant
developments. In 1967, Lederman and Zubay developed a coupled
transcription-translation bacterial extract that allowed DNA to be used as a
template [101]. Shortly after, Spirin and coworkers improved cell-free protein
production with a continuous exchange of reactants and products, which allowed
reactions to run for tens of hours; however, these systems could only synthesize a
single product and were energy limited [159]. More recently, the energy efficiency
of E. coli CFPS was improved by generating ATP with substrate level phosphorylation
[93] and oxidative phosphorylation [82, 83, 81]. Since oxidative phosphorylation is
a membrane associated process, this reveals that membrane dependent energy
metabolism can be activated in cell-free systems and shows that complex metabolism
is still occurring. With the advent
of genome sequencing, CFPS has shown remarkable utility as a protein synthesis
technology, given the knowledge of reaction networks that can be understood, al-
tered and controlled. CFPS systems are derived from crude cell extracts, taking the
cell’s machinery to operate transcription and translation processes while discarding
cellular debris and chromosomal DNA (Fig. 1.2). Cell-free extracts are commonly
prepared from E. coli, S. cerevisiae, rabbit reticulocytes, wheat germ, and insect cells
[143]. CFPS reactions are activated by the addition of amino acids, nucleotides,
template DNA, cofactors and an energy source. It is important to note that some
cell-free platforms are better suited for a particular application than the others. For
example, insect and mammalian cell extracts are equipped with post-translational
modification capabilities that might not be available in S. cerevisiae or E. coli extracts
[139].
While earlier approaches focused on investigating biological phenomena, today,
CFPS is used to produce complex biological products. CFPS has been utilized in a
wide range of applications from the production of pharmaceutical proteins [110,
55, 126] to high-throughput production of protein libraries for protein evolution
and structural genomics [164]. Point-of-care protein synthesis is also a promising
technology with microfluidic reactors [172]. N-linked glycoproteins have also been
produced in an E. coli-based CFPS by Guarino and DeLisa [57]. And more recently,
single-pot glycoprotein synthesis with glycosylation machinery has been achieved
[78], allowing the development of important therapeutics.
Thus, CFPS is a promising platform for manufacturing proteins and chemicals,
a task that has traditionally relied on living cells. If cell-free protein
synthesis (CFPS) is to become a mainstream technology for advanced applica-
tions such as point of care manufacturing [137], we must first understand the
performance limits and costs of these systems [81]. Cell-free systems offer many
advantages for the study, manipulation and modeling of metabolism compared
to in vivo processes. Central amongst these is direct access to metabolites and the
biosynthetic machinery without the interference of a cell wall or the complications
associated with cell growth. This allows interrogation of the chemical environment
while the biosynthetic machinery is operating, potentially at a fine time resolution.
Whatever the relative advantages and disadvantages of in vivo and CFPS processes, a
fundamental challenge in metabolic engineering remains: the identification of genetic
manipulations that accomplish the desired function most effectively [12]. Due to
the complexity and immense interconnectivity of metabolic networks, even for
simple prokaryotic organisms like E. coli, making the appropriate genetic
manipulation for a desired function is not intuitive. Computational and mathematical
models of metabolic networks offer powerful tools to aid our understanding of
metabolism and the rational design of improved cell-free protein expression [184].
[Figure 1.2 panels. Extract preparation: cell lysis, removal of cellular debris and
chromosomal DNA, preparation and storage of extract. Cell-free reaction: energy
substrates and DNA plasmid are added to yield the target protein. Freeze-dried
route: extract is freeze dried into pellets and paired with freeze-dried DNA
constructs (molecular instructions), then rehydrated to yield the target protein.]
Figure 1.2: Cell-free protein synthesis. Cell extract is prepared by cell lysis, and
cellular debris and chromosomal DNA are removed. An energy source, along with the
necessary amino acids, nucleotides, and cofactors, is added to the cell-free reaction.
Template DNA of the target protein is added. The target protein is then easily
purified from the cell-free system. Alternatively, cell-free extract can be freeze
dried into pellets and paired with lyophilized DNA. Through the simple addition
of water, proteins can be manufactured on site and on demand. Figure adapted
from [26, 137].
1.3.1 Mathematical models of cell-free protein synthesis
There have been several mathematical models of cell-free protein synthesis; however,
the majority of published models focus on the transcription and translation
processes. These models are mostly systems of ODEs based on Michaelis-Menten
kinetics. For example, Karzbrun and coworkers developed a coarse-grained model
of transcription and translation for E. coli cell-free extract [89]. To simplify
calculations, this model was based on four enzymes and ten parameters. Transcription
and translation processes were assumed to follow Michaelis-Menten kinetics. The
authors noted that the protein synthesis rate began to decay exponentially after
1 hour, so their study focused on the first hour of the cell-free experiment. This
decay was attributed to resource depletion and waste accumulation.
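To make the structure of such Michaelis-Menten models concrete, a minimal sketch is
given below. It is not the Karzbrun model: the two-state layout (mRNA and protein),
the function name simulate_txtl, and every parameter value are hypothetical
placeholders chosen only for illustration, and resource depletion is deliberately
left out.

```python
def simulate_txtl(dna=5.0, t_end=60.0, dt=0.01):
    """Forward-Euler sketch of Michaelis-Menten transcription and translation.

    All parameter values are hypothetical placeholders, not fitted values.
    Returns the final (mRNA, protein) concentrations in nM.
    """
    k_tx, K_tx = 1.0, 5.0    # max transcription rate (nM/min), DNA saturation constant (nM)
    k_tl, K_tl = 2.0, 50.0   # max translation rate (nM/min), mRNA saturation constant (nM)
    k_deg = 0.05             # first-order mRNA degradation rate (1/min)

    mrna, protein = 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        tx_rate = k_tx * dna / (K_tx + dna)      # saturates in template DNA
        tl_rate = k_tl * mrna / (K_tl + mrna)    # saturates in mRNA
        mrna += dt * (tx_rate - k_deg * mrna)    # synthesis minus degradation
        protein += dt * tl_rate                  # protein assumed stable
    return mrna, protein


mrna, protein = simulate_txtl()
print(f"mRNA = {mrna:.2f} nM, protein = {protein:.2f} nM")
```

In this sketch the synthesis rates never decay, so capturing the slowdown reported
after the first hour would require additional states for consumable resources.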
Stögbauer and coworkers developed a model that accounts for resource con-
sumption and degradation and identified the bottleneck of protein synthesis [161].
Variables representing transcription and translation resources were added to the
model, but the exact identities and quantities of these resources were beyond the
scope of the study. The authors attempted to use Hill functions to better pre-
dict saturation effects of mRNA and their protein of interest but found that the
optimized Hill coefficients were close to one, resulting in Michaelis-Menten-like
approximations. Protein yield was determined to be a function of template DNA
concentration. Interestingly, this work found that nucleotide triphosphate deple-
tion was not the source of protein synthesis rate decay. For the specific extract used,
ribosome degradation was to blame for rate decay.
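The observation that Hill coefficients near one give Michaelis-Menten-like behavior
follows directly from the functional forms, as the short sketch below illustrates.
The function names and numerical values are assumptions made for illustration and
are not taken from the cited study.

```python
def hill_rate(x, v_max, k_half, n):
    """Hill-type saturation: v_max * x^n / (k_half^n + x^n)."""
    return v_max * x**n / (k_half**n + x**n)


def michaelis_menten_rate(x, v_max, k_half):
    """Michaelis-Menten saturation: v_max * x / (k_half + x)."""
    return v_max * x / (k_half + x)


# Compare the two forms at a hypothetical concentration for several Hill coefficients.
x, v_max, k_half = 2.0, 1.0, 3.0
for n in (1.0, 1.1, 4.0):
    gap = hill_rate(x, v_max, k_half, n) - michaelis_menten_rate(x, v_max, k_half)
    print(f"n = {n}: Hill minus Michaelis-Menten = {gap:+.4f}")
```

With n equal to one the two expressions coincide, and the gap grows as n moves away
from one, which is why a fitted coefficient close to one adds little beyond the
simpler form.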
More recently, Neiß and coworkers published a more comprehensive, experimentally
validated model that was used to identify limiting factors of cell-free
protein synthesis [127]. An unusual characteristic of this model is what the authors
described as a hybrid black box approach: transcription processes were simplified,
while the model for translation was detailed. The entire model is a differential
algebraic equation system of eight algebraic equations and over 400 ODEs. Using
sensitivity analysis, Neiß found that cell-free protein synthesis rates are limited by
concentrations of tRNA and elongation factor Tu.
A model that captured resource competition in gene networks was published
by Gyorgy and Murray [60]. For a two-protein expression system, simulations
that considered both products agreed with experimental data for the same sys-
tem. This model was also applied to predict possible product concentrations in
multiple-protein expression systems and compare different cell-free extracts. The
authors concluded that resource competition is a key consideration in the design
of synthetic gene circuits.
The cell-free protein synthesis models discussed thus far have been based on
experiments in which DNA serves as the template. RNA genetic circuitry is another
area where mathematical models can be developed. Transcription regulating RNAs
are of interest because they bypass the need for regulatory proteins [113]. In the
context of circuit design, these regulatory RNAs can be used to create various logic
gates and cascades [20, 31]. The first experimentally validated model of a synthetic
RNA circuit was published by Hu and coworkers [72]. The model contained 8
ODEs and 13 previously unknown parameters. These parameters were estimated
based on results from sensitivity analysis guided experiments. This model was
able to predict results for new networks it had not been trained on.
Taken together, these models of transcription, translation, resource competition,
and gene regulatory circuits provide useful information for designing new sys-
tems; however, they each provide an incomplete representation of cell-free protein
synthesis. CFPS does not just rely on transcription and translation processes to
fuel protein production, but relies on central carbon metabolism to meet energy
requirements. Thus, more sophisticated models are needed that integrate metabolic
pathways with transcription and translation processes. Ultimately, an integrated
framework can provide insights into the limitations of CFPS and provide strategies
for improving CFPS performance metrics such as carbon yield, energy efficiency
and productivity.
CHAPTER 2
EFFECTIVE DYNAMIC MODELS OF METABOLIC NETWORKS
2.1 Abstract
1 Mathematical models of biochemical networks are useful tools to understand
and ultimately predict how cells utilize nutrients to produce valuable products.
Hybrid cybernetic models (HCM) in combination with elementary modes are tools
to model cellular metabolism. However, HCM is limited to reduced metabolic
networks because of the computational burden of calculating elementary modes. In
this study, we developed the hybrid cybernetic modeling with flux balance analysis
or HCM-FBA technique which uses flux balance solutions instead of elementary
modes to dynamically model metabolism. We show HCM-FBA has comparable
performance to HCM for a proof of concept metabolic network and for a reduced
anaerobic E. coli network. Next, HCM-FBA was applied to a larger metabolic
network of aerobic E. coli metabolism which was infeasible for HCM (29 FBA
modes versus more than 153,000 elementary modes). Global sensitivity analysis
further reduced the number of FBA modes required to describe the aerobic E. coli
data, while maintaining model fit. Thus, HCM-FBA is a promising alternative to
HCM for large networks where the generation of elementary modes is infeasible.
1 Adapted with permission from Vilkhovoy M, Minot M, and Varner JD, "Effective dynamic
models of metabolic networks" (2016) IEEE Life Sciences Letters, 2(4):51-54.
2.2 Introduction
Biotechnology harnesses the power of metabolism to produce products that benefit
society. Constraints based models are important tools to understand and ultimately
to predict how cells utilize nutrients to produce products. Constraints based
methods such as flux balance analysis (FBA) [133] and network decomposition
approaches such as elementary modes (EMs) [150] or extreme pathways (EPs) [148]
model intracellular metabolism using the biochemical stoichiometry and other con-
straints such as thermodynamical feasibility under pseudo-steady state conditions.
FBA has been used to efficiently estimate the performance of metabolic networks of
arbitrary complexity, including genome scale networks, using linear programming
[34]. On the other hand, EMs (or EPs) catalog all possible metabolic behaviors such
that any flux distribution predicted by FBA is a convex combination of the EMs (or
EPs) [186]. However, the calculation of EMs (or EPs) is computationally expensive
and currently infeasible for genome scale networks [102].
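The relationship between steady-state flux distributions and elementary modes can
be illustrated on a deliberately small example. The network below (one balanced
metabolite A with one uptake and two secretion reactions), its two modes, and the
weights are hypothetical and chosen only so the arithmetic is easy to follow. The
weights are nonnegative; normalizing them to sum to one would give a convex
combination in the strict sense.

```python
def mat_vec(S, v):
    """Multiply stoichiometric matrix S (list of rows) by flux vector v."""
    return [sum(s_ij * v_j for s_ij, v_j in zip(row, v)) for row in S]


def combine_modes(modes, weights):
    """Return the weighted sum of mode vectors."""
    n_reactions = len(modes[0])
    return [sum(w * mode[j] for w, mode in zip(weights, modes))
            for j in range(n_reactions)]


# Hypothetical toy network with a single balanced metabolite A:
#   v1: -> A (uptake),  v2: A -> P1,  v3: A -> P2
S = [[1.0, -1.0, -1.0]]
modes = [[1.0, 1.0, 0.0],   # route through v2
         [1.0, 0.0, 1.0]]   # route through v3
weights = [7.0, 3.0]        # nonnegative mode weights (assumed)

flux = combine_modes(modes, weights)
residual = mat_vec(S, flux)
print("flux =", flux, "S*v =", residual)
```

Each mode satisfies the steady-state balance on its own, so any nonnegative
combination does as well; the difficulty noted above is that the number of modes
grows very quickly with network size.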
Cybernetic models are an alternative to the constraints based approach which
hypothesize that metabolic control is the output of an optimal decision. Cyber-
netic models have predicted mutant behavior [178, 157], steady-state multiplicity
[96], strain specific metabolism [156], and have been used in bioprocess control
applications [49]. Hybrid cybernetic models (HCM) have addressed earlier short-
comings of the approach by integrating cybernetic optimality concepts with EMs.
HCMs dynamically choose combinations of biochemical modes (each catalyzed by
a pseudo enzyme whose expression is controlled by an optimal decision) to achieve
a physiological objective (Fig. 2.1A). HCMs generate intracellular flux distributions
consistent with other approaches such as metabolic flux analysis (MFA), and also
describe dynamic extracellular measurements superior to dynamic FBA (DFBA)
[95]. However, HCMs are restricted to networks which can be decomposed into
EMs (or EPs).
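The optimal decision that sets pseudo enzyme expression and activity is not written
out at this point in the text. The sketch below assumes the matching and
proportional laws commonly used in the cybernetic literature, where u scales enzyme
synthesis and v scales enzyme activity; the function name and the example returns
are hypothetical.

```python
def cybernetic_controls(returns):
    """Return (u, v) control variables for nonnegative returns on investment.

    u_j = r_j / sum(r)   (matching law, assumed here; scales enzyme synthesis)
    v_j = r_j / max(r)   (proportional law, assumed here; scales enzyme activity)
    """
    total = sum(returns)
    largest = max(returns)
    if total <= 0.0:
        # No mode offers a return, so nothing is synthesized or activated.
        return [0.0] * len(returns), [0.0] * len(returns)
    u = [r / total for r in returns]
    v = [r / largest for r in returns]
    return u, v


# Three hypothetical modes with different returns.
u, v = cybernetic_controls([1.0, 3.0, 0.0])
print("u =", u)
print("v =", v)
```

Under these laws the mode with the largest return is fully active, while resources
for enzyme synthesis are shared in proportion to the returns.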
In this study, we developed the hybrid cybernetic modeling with flux balance
analysis (HCM-FBA) technique. HCM-FBA is a modification of the hybrid cy-
bernetic approach of Ramkrishna and coworkers [95] which uses FBA solutions
(instead of EMs) in conjunction with cybernetic control variables to dynamically
simulate metabolism. Since HCM showed superior performance to DFBA, we
compared the performance of HCM-FBA to HCM for a prototypical metabolic
network, along with two real-world E. coli applications. HCM-FBA performed
comparably to HCM for the prototypical network and a reduced anaerobic E. coli
network, despite having fewer parameters in each case. Next, HCM-FBA was
applied to an aerobic E. coli metabolic network that was infeasible for HCM. HCM-
FBA described cellmass growth and the shift from glucose to acetate consumption
with only a few modes. Global sensitivity analysis allowed us to further reduce the
aerobic E. coli HCM-FBA model to the minimal model required to describe the data.
Thus, HCM-FBA is a promising approach for the development of reduced order
dynamic metabolic models and a viable alternative to HCM or DFBA, especially
for large networks where the generation of EMs is infeasible.
[Figure 2.1 panels. A: schematic labeled Network, HCM-EM, and HCM-FBA, with the
annotations "All possible modes", "Each mode considered separately", and "Modes
are combined by FBA". B: network schematic showing the extracellular and
intracellular compartments, extracellular metabolites Ae, Be, and Ce, intracellular
cellmass precursors A, B, and C, boundary fluxes q1, q2, and q3, and intracellular
fluxes v1 to v4. C: abundance (A.U.) versus time (hr) for A, B, C, and biomass.]
Figure 2.1: HCM proof of concept metabolic study. A: HCMs distribute uptake
and secretion fluxes amongst different pathways. For HCM, these pathways are
elementary modes; for HCM-FBA these are flux balance analysis solutions. HCM
combines all possible modes within a network; whereas HCM-FBA combines only
steady-state paths estimated by flux balance analysis. B: Prototypical network with
six metabolites and seven reactions. Intracellular cellmass precursors A, B, and C
are balanced (no accumulation) while the extracellular metabolites (Ae, Be, and Ce)
are not balanced (can accumulate). The oval denotes the cell boundary, qj is the jth
flux across the boundary, and vk denotes the kth intracellular flux. C: Simulation
of extracellular metabolite trajectories using HCM-FBA (solid line) versus HCM
(points) for the prototypical network.
2.3 Results
HCM-FBA was equivalent to HCM for a prototypical metabolic network (Fig.
2.1). The proof of concept network, consisting of 6 metabolites and 7 reactions
(Fig. 2.1B), generated 3 FBA modes and 6 EMs. Using the EMs and synthetic
parameters, we generated test data from which we estimated the HCM-FBA model
parameters. The best fit HCM-FBA model replicated the synthetic data (Fig. 2.1C).
The HCM and HCM-FBA kinetic parameters were not quantitatively identical,
but had similar orders of magnitude; the FBA approach had 3 fewer modes, thus
identical parameter values were not expected. The HCM-FBA approach replicated
synthetic data generated by HCM, despite having 3 fewer modes. Thus, we expect
HCM-FBA will perform similarly to HCM, despite having fewer parameters. Next,
we tested the ability of HCM-FBA to replicate real-world experimental data.
The performance of HCM-FBA was equivalent to HCM for anaerobic E. coli
metabolism (Fig. 2.2A). We constructed an anaerobic E. coli network [95], con-
sisting of 12 reactions and 19 metabolites, which generated 7 FBA modes and 9
EMs. HCM reproduced cellmass, glucose, and byproduct trajectories using the
kinetic parameters reported by Kim et al. [95] (Fig. 2.2A, points versus dashed).
HCM-FBA model parameters were estimated in this study from the Kim et al. data
set using simulated annealing. Overall, HCM-FBA performed within 5% of HCM
(on a residual standard error basis) for the anaerobic E. coli data (Fig. 2.2A, solid),
despite having 2 fewer modes and 4 fewer parameters (17 versus 21 parameters).
Thus, while both HCM and HCM-FBA described the experimental data, HCM-FBA
did so with fewer modes and parameters.
[Figure 2.2 panels. A: legend HCM FBA, HCM EM, Data; biomass (gDW/L) and
abundances (mM) of lactate, glucose, formate, acetate, ethanol, and succinate versus
time (hr). B: legend HCM FBA, Minor modes removed, Major mode removed, Data;
biomass (gDW/L), glucose (mM), and acetate (mM) versus time (hr).]
Figure 2.2: HCM-FBA versus HCM performance for small and large metabolic
networks. A: Batch anaerobic E. coli fermentation data versus HCM-FBA (solid) and
HCM (dashed). The experimental data was reproduced from Kim et al. [95]. Error
bars represent the 90% confidence interval. B: Batch aerobic E. coli fermentation
data versus HCM-FBA (solid). Model performance is also shown when minor
modes (dashed) and major modes (dotted) were removed from the HCM-FBA
model. The experimental data was reproduced from Varma & Palsson [176]. Error
bars denote a 10% coefficient of variation.
HCM-FBA captured the shift from glucose to acetate consumption for a model of
aerobic E. coli metabolism that was infeasible for HCM (Fig. 2.2B). An E. coli
metabolic network (60 metabolites and 105 reactions) was constructed from
literature [149, 136]. Elementary mode decomposition of this network (and thus
HCM) was not feasible; 153,000 elementary modes were generated before the
calculation became infeasible. Conversely, flux balance analysis generated only 29
modes for the same network. HCM-FBA model parameters were estimated from
cellmass, glucose, and acetate measurements [176] using simulated annealing
(Fig. 2.2B, solid). HCM-FBA captured glucose consumption, cellmass formation, and
the switch to acetate consumption following glucose exhaustion. HCM-FBA described
the dynamics of a network that was infeasible for HCM, thereby demonstrating the
power of the approach for large networks. Next, we demonstrated a systematic
strategy to identify the critical subset of FBA modes required for model performance.
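The text states only that the model parameters were estimated by simulated
annealing. A generic version of that procedure is sketched below; the Gaussian
proposal, the geometric cooling schedule, the default settings, and the quadratic
stand-in cost are all assumptions made for illustration and do not reproduce the
implementation used in this study.

```python
import math
import random


def simulated_annealing(cost, x0, step=0.5, t0=1.0, alpha=0.9,
                        steps_per_temp=10, n_temps=60, seed=0):
    """Minimize cost(x) by simulated annealing; returns (best_x, best_cost)."""
    rng = random.Random(seed)
    x, fx = list(x0), cost(x0)
    best, fbest = list(x), fx
    temp = t0
    for _ in range(n_temps):
        for _ in range(steps_per_temp):
            # Propose a random perturbation of every parameter.
            cand = [xi + rng.gauss(0.0, step) for xi in x]
            fc = cost(cand)
            # Always accept improvements; accept worse moves with Boltzmann probability.
            if fc <= fx or rng.random() < math.exp(-(fc - fx) / temp):
                x, fx = cand, fc
                if fx < fbest:
                    best, fbest = list(x), fx
        temp *= alpha  # geometric cooling
    return best, fbest


def quadratic_cost(p):
    """Stand-in for a model-versus-data residual, minimized at (2, -1)."""
    return (p[0] - 2.0) ** 2 + (p[1] + 1.0) ** 2


best_params, best_cost = simulated_annealing(quadratic_cost, [0.0, 0.0])
print("best parameters:", best_params, "cost:", best_cost)
```

In a model-fitting setting the stand-in cost would be replaced by the residual
between simulated and measured trajectories, evaluated by solving the model for
each candidate parameter set.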
Global sensitivity analysis identified the FBA modes essential to model perfor-
mance (Fig. 2.3). Total order sensitivity coefficients were calculated for all kinetic
parameters and enzyme initial conditions in the aerobic E. coli model. Five of the
29 FBA modes were significant; removal of the most significant of these modes
(encoding aerobic growth on glucose) destroyed model performance (Fig. 2.2B,
dotted). Conversely, removing the remaining 24 modes simultaneously had a neg-
ligible effect upon model performance (Fig. 2.2B, dashed). The sensitivity analysis
identified the minimal model structure required to explain the experimental data.
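For readers unfamiliar with total order sensitivity coefficients, a minimal
variance based estimate is sketched below. The estimator shown is Jansen's form,
which is an assumption since the text does not name the estimator used; the uniform
sampling on [0, 1], the sample size, and the linear stand-in model are likewise
hypothetical.

```python
import random


def total_order_indices(model, n_params, n_samples=2000, seed=0):
    """Estimate total-order sensitivity indices with the Jansen estimator.

    Inputs are sampled independently and uniformly on [0, 1] (an assumption).
    """
    rng = random.Random(seed)
    A = [[rng.random() for _ in range(n_params)] for _ in range(n_samples)]
    B = [[rng.random() for _ in range(n_params)] for _ in range(n_samples)]
    y_a = [model(a) for a in A]
    y_b = [model(b) for b in B]

    # Output variance estimated from both sample sets.
    all_y = y_a + y_b
    mean = sum(all_y) / len(all_y)
    variance = sum((y - mean) ** 2 for y in all_y) / (len(all_y) - 1)

    indices = []
    for i in range(n_params):
        acc = 0.0
        for a, b, ya in zip(A, B, y_a):
            ab = list(a)
            ab[i] = b[i]  # resample only parameter i
            acc += (ya - model(ab)) ** 2
        indices.append(acc / (2.0 * n_samples * variance))
    return indices


def linear_model(x):
    """Stand-in model: strong effect of x[0], weak effect of x[1], none of x[2]."""
    return 4.0 * x[0] + 1.0 * x[1] + 0.0 * x[2]


indices = total_order_indices(linear_model, 3)
print("total-order indices:", indices)
```

A parameter whose index is near zero, like the third input here, can be fixed or
removed with little effect on the output, which mirrors how the insignificant
modes were identified.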
[Figure 2.3 plot. Bar chart of total order yield sensitivity indices (A.U., mean +
s.d.; axis from 0.0 to 0.7), grouped as rate constants (kmax), saturation constants
(Ksat), and enzyme parameters, including enzyme initial conditions.]
Figure 2.3: Global sensitivity analysis of the aerobic E. coli model. Total order
variance based sensitivity coefficients were calculated for the biomass yield on
glucose and acetate. Sensitivity coefficients were computed for kinetic parameters
and enzyme initial conditions (N = 183,000). Error bars represent the 95% confidence
intervals of the sensitivity coefficients.
2.4 Discussion
In this study, we developed HCM-FBA, an effective modeling technique to simulate
metabolic dynamics. HCM-FBA uses flux balance analysis solutions in conjunction
with cybernetic control variables to dynamically simulate metabolism. We studied
the performance of HCM-FBA on a prototypical metabolic network, along with two
E. coli networks. First, we showed that the performance of HCM-FBA and HCM
were comparable for the prototypical network and a small model of anaerobic E. coli
metabolism. For the anaerobic case, both approaches described the experimental
data. However, HCM-FBA (which was within 5% of HCM and slightly better than
HCM for lactate secretion) had fewer modes and parameters. Next, HCM-FBA
was applied to an aerobic E. coli metabolic network that was not feasible for HCM.
Elementary mode decomposition of the aerobic network generated over 153,000
elementary modes. Conversely, the HCM-FBA approach described cellmass growth
and the shift from glucose to acetate consumption with only 29 FBA modes. Global
sensitivity analysis further showed that only 5 of the 29 FBA modes were critical to
model performance. Removal of these modes crippled the model, but removal of
the remaining 24 modes had a negligible impact. These insignificant modes were
associated with maintenance, thus they would likely not impact model predictions
since the data represented a growing culture. HCM-FBA is an alternative approach
to HCM, especially for large networks where the generation of elementary modes
is infeasible. Elementary modes expose the complexity of cellular metabolism by
enumerating the many routes a cell can take, but for large networks FBA has a
clear computational advantage.
HCM-FBA is a promising approach to model large metabolic networks where
elementary mode calculations are infeasible, and where kinetic models of such
systems have intractable identification problems. However, there are additional
studies that should be performed. First, the intracellular flux distribution predicted
by HCM-FBA should be compared to HCM and to flux measurements calculated
using MFA or FBA/DFBA in combination with carbon labeling. HCM predicted
intracellular fluxes that were similar to MFA results [95]; however, the fluxes
predicted by HCM-FBA have not yet been validated. Next, the performance of
HCM-FBA should be compared to lumped hybrid cybernetic models (L-HCM). L-
HCMs, which combine elementary modes into mode families based upon metabolic
function [155, 156], have been applied to an E. coli network with 67 reactions and a
Saccharomyces cerevisiae network with 70 reactions; both cases had satisfactory fits to
extracellular experimental data. However, while L-HCM reduces the dimension of
possible alternative modes that must be considered, it still requires the calculation
of an initial set of modes. For metabolic networks of even moderate size, EM (or
EP) decomposition may not be possible. On the other hand, the generation of
flux balance solutions (convex combinations of the elementary modes or extreme
pathways) is trivial, even for genome scale metabolic networks. Thus, HCM-FBA
opens up the possibility for dynamic genome scale models of bacterial and perhaps
even of mammalian metabolism.
2.5 Materials and Methods
The HCM-FBA approach is a modification of HCM, where elementary modes
are replaced with flux balance analysis solutions. Thus, extracellular variables
are dynamic while intracellular metabolites are at a pseudo steady state. The
abundance of extracellular species i (xi), the pseudo enzyme el (catalyzes flux
through mode l), and cellmass are governed by:
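\[
\begin{aligned}
\frac{dx_i}{dt} &= \sum_{j=1}^{R} \sum_{l=1}^{L} \sigma_{ij}\, z_{jl}\, q_l(e, k, x)\, c, & i &= 1, \dots, M \\
\frac{de_l}{dt} &= \alpha_l + r_{E,l}(k, x)\, u_l - (\beta_l + r_G)\, e_l, & l &= 1, \dots, L \\
\frac{dc}{dt} &= r_G\, c
\end{aligned}
\]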
where R and M denote the number of reactions and extracellular species in the
model and L denotes the number of FBA modes. The quantity σij denotes the
stoichiometric coefficient for species i in reaction j and zjl denotes the normalized
flux for reaction j in mode l. If σij > 0, species i is produced by reaction j; if
σij < 0, species i is consumed by reaction j; if σij = 0, species i is not connected
with reaction j. Extracellular species balances were subject to the initial conditions
x(t_0) = x_0 determined from experimental data. The term ql (e, k, x) denotes the
specific uptake/secretion rate for mode l where e denotes the pseudo enzyme
vector, k denotes the unknown kinetic parameter vector, x denotes the extracellular
species vector, and c denotes the cell mass; ql (e, k, x) is the product of a kinetic
term (q̄l) and a control variable governing enzyme activity. Flux through each
mode was catalyzed by a pseudo enzyme el, synthesized at the regulated specific
rate rE,l (k, x), and constitutively at the rate αl. The term ul denotes the cybernetic
variable controlling the synthesis of enzyme l. The term βl denotes the rate constant
governing non-specific enzyme degradation, and rG denotes the specific growth
rate through all modes. The specific uptake/secretion rates and the specific rate
of enzyme synthesis were modeled using saturation kinetics. The specific growth
rate was given by:
\[
r_G = \sum_{l=1}^{L} z_{\mu l}\, q_l(e, k, x)
\]
where zµl denotes the growth flux µ through mode l. The control variables ul and
vl , which control the synthesis and activity of each enzyme respectively, were given
by:
\[
u_l = \frac{z_{sl}\, \bar{q}_l}{\sum_{l=1}^{L} z_{sl}\, \bar{q}_l},
\qquad
v_l = \frac{z_{sl}\, \bar{q}_l}{\max_{l=1,\dots,L} z_{sl}\, \bar{q}_l}
\]
where zsl denotes the uptake flux of substrate s through mode l. The model
equations were implemented in Julia (v.0.4.2) [16] and solved using SUNDIALS
[66]. The model code is available at http://www.varnerlab.org under an MIT license.
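The balances above can be illustrated with a minimal Python sketch (the study itself used Julia and SUNDIALS). Everything in it is an assumption made for illustration: the two-reaction, two-mode toy network, all parameter values, the explicit Euler integrator, and the form q_l = v_l e_l q̄_l, which is one reading of "the product of a kinetic term and a control variable governing enzyme activity" that keeps the stated dependence on e.

```python
# Toy HCM-FBA model: one extracellular substrate, two reactions, two FBA modes.
# All parameter values are hypothetical and chosen only for illustration.
PARAMS = {
    "sigma": [[-1.0, 0.0]],         # M x R: substrate is consumed by the uptake reaction
    "z": [[1.0, 1.0], [0.5, 0.2]],  # R x L: normalized flux of reaction j in mode l
    "uptake_rxn": 0, "growth_rxn": 1, "substrate": 0,
    "k_max": [1.0, 0.8], "K_sat": [0.5, 0.5],  # uptake saturation kinetics per mode
    "k_E": [1.0, 1.0], "K_E": [0.5, 0.5],      # enzyme synthesis kinetics per mode
    "alpha": [0.01, 0.01], "beta": [0.05, 0.05],
}


def cybernetic_variables(z_s, q_bar):
    """Return (u, v): u_l shares enzyme synthesis, v_l scales enzyme activity."""
    returns = [z * q for z, q in zip(z_s, q_bar)]
    total, largest = sum(returns), max(returns)
    if total <= 0.0:  # substrate exhausted: no mode is favored
        return [0.0] * len(returns), [0.0] * len(returns)
    return [r / total for r in returns], [r / largest for r in returns]


def rhs(x, e, c, p):
    """Right-hand side (dx/dt, de/dt, dc/dt) of the species, enzyme, and cell mass balances."""
    M, R, L = len(p["sigma"]), len(p["z"]), len(e)
    s = max(x[p["substrate"]], 0.0)
    q_bar = [p["k_max"][l] * s / (p["K_sat"][l] + s) for l in range(L)]
    r_E = [p["k_E"][l] * s / (p["K_E"][l] + s) for l in range(L)]
    u, v = cybernetic_variables(p["z"][p["uptake_rxn"]], q_bar)
    q = [v[l] * e[l] * q_bar[l] for l in range(L)]  # assumed form of q_l
    r_G = sum(p["z"][p["growth_rxn"]][l] * q[l] for l in range(L))
    dx = [c * sum(p["sigma"][i][j] * p["z"][j][l] * q[l]
                  for j in range(R) for l in range(L)) for i in range(M)]
    de = [p["alpha"][l] + r_E[l] * u[l] - (p["beta"][l] + r_G) * e[l] for l in range(L)]
    return dx, de, r_G * c


def simulate(x0, e0, c0, p, dt=0.01, t_end=5.0):
    """Explicit Euler integration (a stand-in for a stiff ODE solver)."""
    x, e, c = list(x0), list(e0), c0
    for _ in range(int(round(t_end / dt))):
        dx, de, dc = rhs(x, e, c, p)
        x = [max(xi + dt * dxi, 0.0) for xi, dxi in zip(x, dx)]
        e = [ei + dt * dei for ei, dei in zip(e, de)]
        c += dt * dc
    return x, e, c


if __name__ == "__main__":
    print(simulate([10.0], [0.5, 0.5], 0.1, PARAMS))
```

By construction the u_l sum to one and the largest v_l equals one, so the mode with the greatest substrate uptake return receives the largest share of enzyme synthesis and runs at full activity.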
2.5.1 Elementary mode and flux balance analysis
Elementary modes were calculated using METATOOL 5.1 [87]. FBA modes were
defined as the solution flux vector through the network connecting substrate uptake
to cellmass and extracellular product formation. The FBA problem was formulated
as:
\[
\begin{aligned}
\max_{w} \quad & w_{obj} = \theta^{T} w \\
\text{subject to} \quad & S w = 0 \\
& \alpha_i \le w_i \le \beta_i, \qquad i = 1, 2, \dots, R
\end{aligned}
\]
where S denotes the stoichiometric matrix, w denotes the unknown flux vector, θ
denotes the objective selection vector and αi and βi denote the lower and upper
bounds on flux wi, respectively. The flux balance analysis problem was solved
using the GNU Linear Programming Kit (v4.52) [1]. For each FBA mode, the
objective wobj was to maximize either the specific growth rate or the specific rate
of byproduct formation. Multiple FBA modes were calculated for each objective
by allowing the oxygen and nitrate uptake rates to vary. For aerobic metabolism,
the specific oxygen and nitrate uptake rates were constrained to allow a maximum
flux of 10 mM/gDW·hr and 0.05 mM/gDW·hr, respectively. Each FBA mode was
normalized by the specified objective flux.
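As a hedged illustration of the final step, the Python sketch below checks that a candidate flux vector satisfies the steady-state and bound constraints and then normalizes it by its objective flux. The linear program itself (solved with GLPK in this study) is not reproduced; the three-reaction network, the flux values, and the function names are hypothetical.

```python
def is_feasible(S, w, lower, upper, tol=1e-9):
    """Check S w = 0 and lower_i <= w_i <= upper_i for a candidate flux vector w."""
    balanced = all(abs(sum(s_ij * w_j for s_ij, w_j in zip(row, w))) <= tol for row in S)
    bounded = all(lo - tol <= w_j <= hi + tol for lo, w_j, hi in zip(lower, w, upper))
    return balanced and bounded


def normalize_mode(w, theta):
    """Divide a flux vector by its objective flux w_obj = theta^T w."""
    w_obj = sum(t * w_j for t, w_j in zip(theta, w))
    if w_obj == 0.0:
        raise ValueError("objective flux is zero; the mode cannot be normalized")
    return [w_j / w_obj for w_j in w]


if __name__ == "__main__":
    # Hypothetical network: uptake -> A, A -> biomass, A -> byproduct.
    S = [[1.0, -1.0, -1.0]]        # one balanced intracellular species (A)
    w = [10.0, 8.0, 2.0]           # hypothetical flux solution
    theta = [0.0, 1.0, 0.0]        # objective selects the biomass flux
    print(is_feasible(S, w, [0.0, 0.0, 0.0], [10.0, 100.0, 100.0]))
    print(normalize_mode(w, theta))
```

After normalization the objective flux of each mode is one, so the remaining entries read as flux per unit of objective.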
2.5.2 Global sensitivity analysis
Variance based sensitivity analysis was used to estimate which FBA modes were
critical to model performance. The performance function used in this study was the
biomass yield on substrate. Candidate parameter sets (N = 182,000) were generated
using Sobol sampling by perturbing the best fit parameter set ±50% [65]. Model
performance, calculated for each of these parameter sets, was then used to estimate
the total-order sensitivity coefficient for each model parameter.
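A minimal sketch of a total-order index calculation is given below, under several stated assumptions: it uses the Jansen estimator, which is one common choice and not necessarily the estimator used in this study; it draws pseudo-random samples with Python's random module in place of a Sobol sequence; and the performance function is a hypothetical two-parameter toy rather than the biomass yield.

```python
import random


def total_order_indices(model, nominal, n_samples=2000, spread=0.5, seed=0):
    """Estimate total-order sensitivity indices with the Jansen estimator.

    Parameters are sampled uniformly within +/- spread (fractional) of nominal.
    """
    rng = random.Random(seed)
    d = len(nominal)

    def draw():
        return [p * (1.0 + spread * (2.0 * rng.random() - 1.0)) for p in nominal]

    A = [draw() for _ in range(n_samples)]
    B = [draw() for _ in range(n_samples)]
    f_A = [model(a) for a in A]
    f_B = [model(b) for b in B]

    # Output variance estimated from the pooled evaluations.
    pooled = f_A + f_B
    mean = sum(pooled) / len(pooled)
    variance = sum((y - mean) ** 2 for y in pooled) / (len(pooled) - 1)

    indices = []
    for i in range(d):
        squared_diff = 0.0
        for a, b, y_a in zip(A, B, f_A):
            ab = list(a)
            ab[i] = b[i]  # resample only parameter i
            squared_diff += (y_a - model(ab)) ** 2
        indices.append(squared_diff / (2.0 * n_samples * variance))
    return indices


def toy_performance(p):
    """Hypothetical performance function dominated by its first parameter."""
    return 2.0 * p[0] + 0.1 * p[1]


if __name__ == "__main__":
    print(total_order_indices(toy_performance, [1.0, 1.0]))
```

A parameter with a total-order index near zero can be fixed, or the associated mode removed, with little effect on the performance function, which is the logic used to prune FBA modes.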
2.5.3 Estimation of model parameters
Model parameters were estimated by minimizing the difference between simula-
tions and experimental measurements (squared residual):
\[
\min_{k} \sum_{\tau=1}^{T} \sum_{j=1}^{S}
\left( \frac{\hat{x}_j(\tau) - x_j(\tau, k)}{\omega_j(\tau)} \right)^{2}
\]
where x̂j (τ) denotes the measured value of species j at time τ, xj (τ, k) denotes
the simulated value for species j at time τ, and ωj (τ) denotes the experimental
measurement variance for species j at time τ. The outer summation is with respect
to time, while the inner summation is with respect to state. The model residual
was minimized using simulated annealing implemented in the Julia programming
language.
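The estimation procedure can be sketched in Python as follows. This is an illustrative stand-in, not the Julia implementation used in the study: the log-normal proposal, geometric cooling schedule, iteration count, and the exponential-decay toy model with its synthetic data are all assumptions.

```python
import math
import random


def weighted_residual(measured, simulated, variance):
    """Sum over time points and species of ((x_hat - x) / omega)^2."""
    return sum(((m - s) / w) ** 2
               for m_row, s_row, w_row in zip(measured, simulated, variance)
               for m, s, w in zip(m_row, s_row, w_row))


def simulated_annealing(objective, k0, n_iter=5000, step=0.1, t0=1.0,
                        cooling=0.999, seed=0):
    """Minimize objective(k) over positive parameters with a basic annealing loop."""
    rng = random.Random(seed)
    k, cost = list(k0), objective(k0)
    best_k, best_cost = list(k), cost
    temperature = t0
    for _ in range(n_iter):
        # Multiplicative perturbation keeps every parameter positive.
        candidate = [ki * math.exp(step * rng.gauss(0.0, 1.0)) for ki in k]
        candidate_cost = objective(candidate)
        accept = candidate_cost < cost or \
            rng.random() < math.exp(-(candidate_cost - cost) / temperature)
        if accept:
            k, cost = candidate, candidate_cost
            if cost < best_cost:
                best_k, best_cost = list(k), cost
        temperature *= cooling
    return best_k, best_cost


# Hypothetical single-species data set generated from x(t) = 2 exp(-0.5 t).
TIMES = [0.0, 1.0, 2.0, 3.0]
MEASURED = [[2.0 * math.exp(-0.5 * t)] for t in TIMES]
VARIANCE = [[0.1] for _ in TIMES]


def toy_objective(k):
    simulated = [[k[0] * math.exp(-k[1] * t)] for t in TIMES]
    return weighted_residual(MEASURED, simulated, VARIANCE)


if __name__ == "__main__":
    print(simulated_annealing(toy_objective, [1.0, 1.0]))
```

Worse candidates are accepted with a probability that shrinks as the temperature falls, which lets the search escape local minima early and settle later.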
CHAPTER 3
SEQUENCE SPECIFIC MODELING OF E. COLI CELL-FREE PROTEIN
SYNTHESIS
3.1 Abstract
[Adapted with permission from Vilkhovoy M, Horvath N, Shih CH, Wayman JA, Calhoun K,
Swartz J, and Varner JD, "Sequence specific modeling of E. coli cell-free protein synthesis" (2018)
ACS Synthetic Biology, 7(8):1844-1857.]
Cell-free protein synthesis (CFPS) is a widely used research tool in systems and
synthetic biology. However, if CFPS is to become a mainstream technology for
applications such as point of care manufacturing, we must understand the perfor-
mance limits and costs of these systems. Toward this question, we used sequence
specific constraint based modeling to evaluate the performance of E. coli cell-free
protein synthesis. A core E. coli metabolic network, describing glycolysis, the
pentose phosphate pathway, energy metabolism, amino acid biosynthesis and
degradation was augmented with sequence specific descriptions of transcription
and translation and effective models of promoter function. Model parameters
were largely taken from literature, thus the constraint based approach coupled the
transcription and translation of the protein product, and the regulation of gene
expression, with the availability of metabolic resources using only six adjustable
model parameters. We tested this approach by simulating the expression of two
model proteins: chloramphenicol acetyltransferase and dual emission green fluo-
rescent protein, for which we have datasets; we then expanded the simulations to
a range of additional proteins. Protein expression simulations were consistent with
measurements for a variety of cases. The constraint based simulations confirmed
that oxidative phosphorylation was active in the CAT cell-free extract, as without
it there was no feasible solution within the experimental constraints of the system.
We then compared the metabolism of theoretically optimal and experimentally con-
strained CFPS reactions, and developed parameter free correlations which could
be used to estimate productivity as a function of carbon number and promoter
type. Lastly, global sensitivity analysis identified the key metabolic processes that
controlled CFPS productivity and energy efficiency. In summary, sequence specific
constraint based modeling of CFPS offered a novel means to a priori estimate the
performance of a cell-free system, using only a limited number of adjustable pa-
rameters. While we modeled the production of a single protein in this study, the
approach could easily be extended to multi-protein synthetic circuits, RNA circuits
or the cell free production of small molecule products.
3.2 Introduction
Cell-free protein expression has become a widely used research tool in systems and
synthetic biology, and a promising technology for personalized protein production.
Cell-free systems offer many advantages for the study, manipulation and modeling
of metabolism compared to in vivo processes. Central amongst these is direct access
to metabolites and the biosynthetic machinery without the interference of a cell wall
or the complications associated with cell growth. This allows interrogation of the
chemical environment while the biosynthetic machinery is operating, potentially at
a fine time resolution. Cell-free protein synthesis (CFPS) systems are arguably the
most prominent examples of cell-free systems used today [81]. However, CFPS is
not new; Matthaei and Nirenberg first used E. coli cell-free extracts in the 1960s to
decipher the sequencing of the genetic code [118, 128]. Spirin and coworkers later
improved the operational lifetime of cell-free protein production with a continuous
exchange of reactants and products; however, these systems could only synthesize
a single product and were energy limited [159]. More recently, CFPS was improved
by generating ATP using both substrate level [93] and oxidative phosphorylation
[82, 83]. Today, cell-free systems are used in a variety of applications ranging
from therapeutic protein production [110, 94] to synthetic biology [70]. There
are also several CFPS technology platforms, such as the PANOx-SP and Cytomin
platforms developed by Swartz and coworkers [93, 82, 81], and the TX/TL platform
of Noireaux [52]. However, if CFPS is to become a mainstream technology for
advanced applications such as point of care manufacturing [137], we must first
understand the performance limits and costs of these systems [81]. One tool to
address these questions is constraint based modeling.
Constraint based approaches such as flux balance analysis (FBA), which use stoi-
chiometric reconstructions of microbial metabolism, have become standard tools in
systems biology and metabolic engineering [108]. FBA and metabolic flux analysis
(MFA) [187], as well as convex network decomposition approaches such as ele-
mentary modes [150] and extreme pathways [148], model intracellular metabolism
using the biochemical stoichiometry and other constraints such as thermodynam-
ical feasibility [64, 62] under pseudo steady state conditions. Constraint based
approaches have used linear programming [34] to predict productivity [176, 146],
yield [176], mutant behavior [43], and growth phenotypes [129] for biochemical
networks of varying complexity, including genome scale networks, using a limited
number of adjustable parameters. Since the first genome scale stoichiometric model
of E. coli [42], stoichiometric reconstructions of hundreds of organisms, including
industrially important prokaryotes such as E. coli [44] and B. subtilis [131], are now
available [45]. Stoichiometric reconstructions have been expanded to include the in-
tegration of metabolism with detailed descriptions of gene expression (ME-Model)
[4, 107, 129] and protein structures (GEM-PRO) [197, 30]. These expansions have
greatly increased the scope of questions that constraint based models can explore.
Thus, constraint based methods are powerful tools to estimate the performance
of metabolic networks. However, constraint based methods are typically used to
model in vivo processes, and have not yet been applied to cell-free metabolism.
In this study, we used sequence specific constraint based modeling to eval-
uate the performance of E. coli cell-free protein synthesis. A core E. coli cell-
free metabolic model describing glycolysis, pentose phosphate pathway, energy
metabolism, amino acid biosynthesis and degradation was developed from litera-
ture [44]; this model was then augmented with sequence specific descriptions of
promoter function, transcription and translation processes. Thus, the sequence
specific constraint based approach explicitly coupled transcription and translation
processes with the availability of metabolic resources in the CFPS reaction. We
tested this approach by simulating the cell-free production of two model proteins,
and then investigated the productivity and energy efficiency for eight additional
proteins. Productivity was inversely proportional to carbon number, while energy
efficiency was independent of carbon number. Based on these simulations, effective
correlation models for optimal protein productivity and energy efficiency were
developed. These correlations were then independently validated with maltose
binding protein which was not in the original data set. Further, global sensitivity
analysis identified the key metabolic processes that controlled CFPS performance;
oxidative phosphorylation was vital to energy efficiency, while the translation
rate was the most important factor controlling productivity. Lastly, we compared
theoretically optimal metabolic flux distributions with experimentally constrained
flux distributions; the experimental CFPS system had an overconsumption of glu-
cose and overproduction of ATP which negatively influenced energy efficiency.
Taken together, sequence specific constraint based modeling of CFPS offered a
novel means to a priori estimate the performance of a cell-free system, using only
six adjustable parameters. While we considered only a single protein here, this
approach could be extended to synthetic circuits, RNA circuits [73] or even cell-free
small molecule production.
[Figure 3.1 schematic. Metabolism: amino acids, NTPs, cofactors. Transcription and translation: DNA, RNA polymerase, mRNA, degradation, ribosome, target protein. Both are linked to a mathematical representation.]
Figure 3.1: Sequence specific flux balance analysis. A. Schematic of the core
metabolic network coupled to sequence-specific transcription and translation
processes of a protein of interest for cell-free protein synthesis.
3.3 Results and discussion
3.3.1 Model derivation and validation
The cell-free stoichiometric network was constructed by removing growth associ-
ated reactions from the iAF1260 reconstruction of K-12 MG1655 E. coli [44], and
applying the deletions associated with the specific cell-free system (see Materials and
Methods). The iAF1260 reconstruction describes 1260 ORFs, and thermodynami-
cally derived metabolic flux directionality. We then added the transcription and
translation template reactions of Allen and Palsson for the specific proteins of inter-
est [4]. A schematic of the metabolic network, consisting of 264 reactions and 146
species, is shown in Fig. 3.1A. The network described the major carbon and energy
pathways and amino acid biosynthesis and degradation pathways. Using this net-
work in combination with effective promoter models taken from Moon et al. [123]
and literature values for cell-free culture parameters (Table 3.2), we simulated the
sequence specific production of two model proteins: chloramphenicol acetyltrans-
ferase (CAT) and dual emission green fluorescent protein (deGFP, Fig. 3.2A). Dual
emission GFP was produced under a P70a promoter in an E. coli extract for eight
hours using maltose and 3-phosphoglycerate (3PG) as a carbon and energy source
(R2 = 0.84, Fig. 3.2A). Uncertainty in experimental factors such as the concentration
of RNA polymerase, ribosomes, transcription and translation elongation rates, as
well as the upper bounds on oxygen, maltose, and 3PG consumption rates, did
not qualitatively alter the performance of the model (blue region, 95% confidence
estimate of 100 sets). However, these simulations were only conducted at a single
plasmid concentration of 5 nM. Thus, it was unclear if the model could capture
cell-free protein synthesis for a range of plasmid concentrations.
Simulations of the cell-free deGFP titer for a range of plasmid concentrations
were consistent with experimental measurements (Fig. 3.2B). The titer at each
plasmid concentration was calculated by multiplying the deGFP synthesis flux by
the active time of production, approximately 8 hours. The mean of the ensemble
(calculated by sampling the uncertainty in the model parameters) captured the
saturation of deGFP production as a function of plasmid concentration (R2 = 0.97).
However, while the mean and 95% confidence estimate of the ensemble were
consistent with measured deGFP levels, the model under predicted the deGFP titer
at the saturating plasmid concentration of 5 nM. These results showed that the
[Figure 3.2 plots, panels A and B. x-axes: Time (h) and Plasmid Concentration (nM); y-axes: deGFP Concentration (μM). Legend: 95% CI of ensemble, mean of ensemble, experimental data.]
Figure 3.2: Sequence specific flux balance analysis of deGFP under a P70a promoter
in TXTL 2.0 E. coli extract. A. deGFP production for 8 h using maltose and 3PG as
a carbon and energy source (R2 = 0.84). Error bars denote a 10% deviation from the
nominal value. B. Predicted versus measured deGFP concentration as a function
of plasmid concentration in TXTL 2.0 (R2 = 0.97). Error bars denote the standard
deviation of experimental measurements. The blue region denotes the 95% CI over
an ensemble of N = 100 sets, the black line denotes the mean of the ensemble, and
dots denote experimental measurements.
sequence specific template reactions, metabolic network, and literature parameters
were sufficient to predict protein production under different promoters.
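The titer calculation and the goodness-of-fit measure can be sketched as below. The 8 hour active time comes from the text; the flux value is hypothetical, and the standard definition of the coefficient of determination (1 − SS_res/SS_tot) is assumed because the text does not state which form was used.

```python
def titer(synthesis_flux_uM_per_h, active_time_h=8.0):
    """Titer (μM) as synthesis flux (μM/h) multiplied by the active production time (h)."""
    return synthesis_flux_uM_per_h * active_time_h


def r_squared(measured, predicted):
    """Coefficient of determination, assuming R^2 = 1 - SS_res / SS_tot."""
    mean = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical deGFP synthesis flux of 0.5 μM/h over the 8 h active period.
    print(titer(0.5))
    # Hypothetical measured and predicted values.
    print(r_squared([1.0, 2.0, 3.0], [1.1, 1.9, 3.2]))
```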
We calculated the transcription rate using effective promoter models, and then
maximized the rate of translation within biologically realistic bounds. Transcription
and translation rates were subject to resource constraints encoded by the metabolic
network, and transcription and translation model parameters were largely derived
from literature (Table 3.2). In this study, we did not explicitly consider protein
folding. However, the addition of chaperone or other protein maturation steps
could easily be accommodated within the approach by updating the template
reactions, see Palsson and coworkers [129]. The cell-free metabolic model code and
parameters can be downloaded under an MIT software license from the Varnerlab
website [179].
Cell-free simulations of the time evolution of CAT production were consistent
with experimental measurements (Fig. 3.4). CAT was produced under a T7 pro-
moter in a glucose/NMP cell-free system using glucose as a source of carbon and
energy [25]. Metabolic fluxes were constrained by experimental measurements
of glucose, nucleotides, amino and organic acid consumption and production
rates (estimated from a total of 37 metabolite time series measurements) for the
first hour of the reaction, whereas the rates of CAT transcription and translation
were predicted by the model. The model showed good agreement with the CAT
measurement with a coefficient of determination of R2 = 0.92. Next, we simulated
the production of deGFP under a P70a promoter in TXTL 2.0 using maltose and 3-
phosphoglycerate (3PG) as a carbon and energy source. The TXTL simulation was
performed with transcription and translation constraints estimated from literature,
but without metabolite constraints, since experimental metabolite measurements
were not reported. TXTL 2.0 simulations showed good agreement between es-
timated and dynamic (R2 = 0.84) or end-point (R2 = 0.97) deGFP measurements
(Supporting Information, Fig. 3.2), including the saturation of deGFP titer with
plasmid concentration. In the cases of CAT and deGFP production, uncertainty
in experimental factors such as the concentration of RNA polymerase, ribosomes,
transcription and translation elongation rates, as well as the upper bounds on
oxygen and carbon consumption rates (uniformly sampled around the parameter
values shown in Table 3.2), did not qualitatively alter the performance of the
model (blue region, 95% confidence estimated for N=100 parameter sets). Together,
these simulations suggested the description of transcription and translation, and its
integration with metabolism encoded in the cell-free model, were consistent with
experimental measurements. These simulations also showed that the sequence
specific template reactions, metabolic network, and literature parameters were
sufficient to predict protein production under different promoters.
3.3.2 Metabolic flux distributions
Theoretical optimal flux distribution
Recently, aerobic catabolism has been activated in CFPS which increases the usable
energy from a carbon source such as glucose [81]. The discovery that such complex
metabolism could be activated and controlled in CFPS led us to examine the flux
distribution of CFPS. While there is no cell growth, complex anabolic and catabolic
processes still occur during cell free protein synthesis [163]. The CAT translation
rate was optimized without experimental constraints on substrate consumption
or byproduct formation to estimate the theoretically optimal metabolic flux dis-
tribution. In all cases, the CFPS reaction was supplied with glucose; however,
we considered different scenarios for amino acid (AA) supplementation. Amino
acids are routinely supplied in CFPS reactions [121, 25, 52], but it has not yet been
determined whether de novo amino acid biosynthesis occurs in CFPS reactions
[121, 117]. Thus, we simulated three different scenarios: first, the CFPS reaction
[Figure 3.3 flux maps. Panels: A. AA uptake and synthesis; B. AA uptake w/o synthesis; C. AA synthesis w/o uptake. Each panel shows glycolysis, the pentose phosphate pathway, and the TCA cycle, colored by flux (A.U., 0 to 100).]
Figure 3.3: Optimal metabolic flux distribution for CAT production. A. Optimal flux
distribution in the presence of amino acid supplementation and de novo synthesis.
B. Optimal flux distribution in the presence of amino acid supplementation without
de novo synthesis. C. Optimal flux distribution with de novo amino acid synthesis
in the absence of supplementation. Mean flux across the ensemble (N = 100),
normalized to glucose uptake flux. Thick arrows indicate flux to or from amino
acid biosynthesis pathways.
was supplied with glucose and amino acids, and was able to synthesize amino
acids from glucose (AAs supplied and de novo synthesis). In this case, the flux
distribution showed an incomplete TCA cycle, where a combination of glucose
and amino acids powered protein expression (Fig. 3.3A). Glucose was consumed
to produce acetyl-coenzyme A, and associated byproducts, while glutamate was
converted to alpha-ketoglutarate which traveled to oxaloacetic acid and pyruvate
for additional amino acid biosynthesis. In order to validate this case experimentally,
a separate CFPS reaction would have to be prepared where amino acids are not
supplied during cell growth, before cell-free extract preparation. In the second
scenario, the CFPS reaction was supplied with glucose and amino acids, but de novo
amino acid biosynthesis was not allowed (AAs supplied w/o de novo synthesis).
This scenario was potentially consistent with common cell-free extract preparation
protocols which often involve amino acid supplementation during cell growth; in
the presence of supplementation we expected the enzymes responsible for amino
acid biosynthesis to be largely absent from the CFPS reaction. Our comprehensive
dataset for CAT synthesis is likely representative of this case; thus, we subsequently
compared the optimal and experimentally constrained flux distributions for this
case. With supplementation and without de novo synthesis, the flux distribution
showed no TCA cycle flux with all carbon flux traveling from glucose to acetate.
In this case, ATP was produced by a combination of substrate level and oxidative
phosphorylation, where ubiquinone was regenerated via either cyo or cyd activity,
without relying on succinate dehydrogenase in the TCA cycle (Fig. 3.3B). These
first two cases where amino acids were available had similar performance, and
their respective metabolic flux distributions had a 99% correlation. Lastly, when
the CFPS reaction was supplied with glucose but not amino acids, so that the system was
forced to synthesize amino acids de novo from glucose (de novo synthesis only), the
flux distribution showed a largely complete TCA cycle, with diversion of metabolic
flux into the Entner-Doudoroff pathway to produce NADPH (Fig. 3.3C). To val-
idate this case experimentally, a CFPS extract would have to be prepared where
amino acids are not supplied during cell growth and the CFPS reaction would
have to be run without amino acid supplementation in the media. However, these
simulations represent the theoretically optimal metabolic flux distribution, which
may not be consistent with what is observed experimentally. Toward this question,
we compared the optimal metabolic flux distribution of the second scenario (AA
supplementation without de novo synthesis) with the experimentally constrained
case (Fig. 3.4A).
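The similarity between two flux distributions, reported here as a percent correlation, can be computed as sketched below. The Pearson correlation coefficient is an assumption, since the text does not specify the metric, and the flux vectors are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length flux vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


if __name__ == "__main__":
    # Hypothetical fluxes, normalized to glucose uptake, for two scenarios.
    scenario_a = [100.0, 85.0, 60.0, 5.0, 0.0]
    scenario_b = [100.0, 80.0, 65.0, 0.0, 0.0]
    print(f"{100.0 * pearson(scenario_a, scenario_b):.0f}% correlation")
```

Restricting the vectors to the reactions of one pathway gives a per-pathway correlation, as reported for the individual pathways in the comparison with the experimentally constrained case.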
Experimentally constrained flux distribution
The experimentally constrained metabolic flux distribution had a 54% correlation
with the theoretically optimal flux distribution (AAs supplied w/o de novo syn-
thesis; Fig. 3.3B). The low similarity suggested several differences between the
experimentally constrained and optimal metabolic flux distributions. The largest
discrepancies were in the pentose phosphate pathway, oxidative phosphorylation, and
anaplerotic reactions, which had no correlation. The experimentally constrained
simulation suggested a high flux through zwf, yielding NADPH which was inter-
converted to NADH via the pnt1 reaction. This NADH was consumed to convert
pyruvate to lactate or to generate ATP via oxidative phosphorylation. In contrast,
the optimal solution had neither zwf nor pnt1 activity. Oxidative phosphorylation had a
negligible correlation, where the experimental system relied on cyd rather than cyo
to produce ATP through oxidative phosphorylation. However, the experimentally
[Figure 3.4 plots. Panel A (experimental measurements): flux map of glycolysis, the pentose phosphate pathway, and the TCA cycle, colored by flux (A.U., 0 to 100). Panel B: glucose, pyruvate, lactate, acetate, and succinate (mM) and CAT (μM) versus time (h). Legend: 95% CI of ensemble, mean of ensemble, experimental data.]
Figure 3.4: Experimentally constrained simulation of CAT production. CAT was
produced under a T7 promoter in CFPS E. coli extract for 1 h using glucose as a
carbon and energy source. Error bars denote the standard deviation of experimental
measurements. The blue region denotes the 95% CI over an ensemble of N = 100
sets, the black line denotes the mean of the ensemble, and dots denote experimental
measurements. A. Metabolic flux distribution for CAT production in the presence
of experimental constraints for glucose, organic acid and amino acid consumption
and production rates. Mean flux across the ensemble, normalized to glucose
uptake flux. Thick arrows indicate flux to or from amino acids. B. Central carbon
metabolite and CAT measurements versus simulations over a 1 hour time course.
The blue region denotes the 95% CI over an ensemble of N = 100 sets, the black line
denotes the mean of the ensemble, and dots denote experimental measurements.
constrained simulation suggested that oxidative phosphorylation must be active
in the CFPS extract, as without it there was no feasible solution within the experi-
mental constraints. On the other hand, overflow metabolism and glycolysis were
highly correlated between the optimal and experimentally constrained solutions,
with a 72% and 81% correlation, respectively. Surprisingly, folate, purine, and
pyrimidine metabolism were active in the experimental system, but inactive in
the optimal system. Lastly, alanine, glutamine, pyruvate, lactate, acetate, malate,
and succinate all accumulated in the experimental system, whereas the optimal
solution produced only acetate; this accumulation contributed to the difference in
the flux distributions. Next, we examined the productivity and energy efficiency
for the cell-free synthesis of model proteins.
3.3.3 Analysis of CFPS performance
We analyzed the productivity and energy efficiency for the cell-free production
of eight proteins with and without amino acid supplementation (Fig. 3.5). The
expression of each protein was under a P70a promoter, with the exception of CAT
which was expressed using a T7 promoter. In all cases, the CFPS reaction was
supplied with glucose; however, we considered different scenarios for amino acid
(AA) supplementation, similar to the cases considered in the flux distribution:
AAs supplied and de novo synthesis, AAs supplied w/o de novo synthesis, and AA
de novo synthesis only. Eight proteins, ranging in size, were selected to evaluate
CFPS performance: bone morphogenetic protein 10 (BMP10), chloramphenicol
acetyltransferase (CAT), caspase 9 (CASP9), dual emission green fluorescent pro-
tein (deGFP), prothrombin (FII), coagulation factor X (FX), fibroblast growth factor
21 (FGF21), and single chain variable fragment R4 (scFvR4). An additional case
was considered for CAT, where central metabolic fluxes were constrained by ex-
perimental measurements of glucose, organic and amino acids. Using these model
proteins, we developed effective correlation models that predicted the productivity
and energy efficiency given the carbon number of the protein. Finally, we inde-
pendently validated the correlations with a protein not in our original data set:
maltose binding protein (MBP).
Productivity
The theoretical maximum productivity for proteins expressed using a P70a promoter
(µM/h) was inversely related to the carbon number ($C_{POI}$) and varied
between 1 and 12 µM/h for the proteins sampled (Fig. 3.5A-B). The theoretical maximum
productivities, with and without amino acid supplementation, were within a
standard deviation of one another for each protein, but varied significantly between
proteins. Productivity varied non-linearly with carbon number of the protein; for
instance, BMP10 (424 aa) had an optimal productivity of approximately 2.5 µM/h,
whereas the optimal productivity of deGFP (229 aa) was approximately 8.4 µM/h.
To examine the influence of protein size, we plotted the mean optimal productivity
against the carbon number of each protein (Fig. 3.5B). The optimal productivity
and carbon number were related by the power-law relationship $\alpha \times (C_{POI})^{\beta}$, where
$\alpha = 6.02 \times 10^{6}$ µM/(h·carbon number) and $\beta = -1.93$ for a P70a promoter. Interestingly,
CAT did not obey the P70a power-law relationship; the relatively high
productivity of CAT was due to its T7 promoter. The higher transcription rate of
the T7 promoter increased the steady state level of mRNA by 34%, resulting in a
higher productivity. However, CAT expressed under a P70a promoter followed
the P70a power-law correlation with a productivity of approximately 8.5 ± 2.3
µM/h (predicted to be 7.2 µM/h by the optimal productivity correlation). These
simulations suggested a promoter specific relationship between the productivity
and carbon number of the protein. However, it was unclear if the productivity
correlation was predictive for proteins not considered in the original data set.
We independently validated the productivity correlation by calculating the
optimal productivity of MBP (which was not in the original dataset) using the full
model and the effective correlation model (Fig. 3.5B). The prediction error was less
than 8% for an a priori prediction of CFPS productivity using the effective correla-
tion. Thus, the effective productivity correlation could be used as a parameter-free
method to estimate optimal productivity for cell-free protein production using a
P70a promoter. For CFPS using other promoters, a similar correlation model could
be developed. For example, maximal transcription occurs when the promoter
model coefficient $u(\kappa) = 1$; the theoretical maximum productivity correlation for
maximum promoter activity also followed a power-law relationship ($\alpha = 1.39 \times 10^{7}$
µM/(h·carbon number) and $\beta = -1.99$) (Fig. 3.5B, gray). The CAT value under a
T7 promoter was similar to the maximal productivity as $u_{T7}(\kappa) \simeq 0.91$ given the
T7 promoter model parameters used in this study (Table 3.2). Taken together, the
optimal productivity of a cell-free reaction decreased with the carbon number of
the protein, following a power-law relationship for proteins expressed under a
P70a promoter.
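The effective productivity correlation can be evaluated directly. The following Python sketch is only an illustration and is not part of the original analysis: the function names and the example carbon number of 1200 are assumptions, while the coefficients are the fitted values quoted above.

```python
# Fitted coefficients quoted in the text (alpha in uM/(h*carbon number), beta unitless).
P70A = {"alpha": 6.02e6, "beta": -1.93}
MAX_PROMOTER = {"alpha": 1.39e7, "beta": -1.99}  # case u(kappa) = 1


def productivity(carbon_number, alpha, beta):
    """Optimal productivity (uM/h) from the power law alpha * C_POI ** beta."""
    if carbon_number <= 0:
        raise ValueError("carbon number must be positive")
    return alpha * carbon_number ** beta


def carbon_number_for(target_productivity, alpha, beta):
    """Invert the power law to get the carbon number for a target productivity."""
    return (target_productivity / alpha) ** (1.0 / beta)


def percent_error(predicted, reference):
    """Absolute prediction error as a percentage of the reference value."""
    return 100.0 * abs(predicted - reference) / abs(reference)


if __name__ == "__main__":
    example_carbon_number = 1200.0  # hypothetical protein, not taken from the text
    for label, fit in (("P70a", P70A), ("u(kappa) = 1", MAX_PROMOTER)):
        value = productivity(example_carbon_number, **fit)
        print(f"{label}: {value:.2f} uM/h")
```

For instance, `percent_error(7.2, 8.5)` compares the correlation estimate for CAT under a P70a promoter with the mean simulated value quoted above and gives a difference of about 15%, which is within the reported ± 2.3 µM/h.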
Energy efficiency
The optimal energy efficiency of protein synthesis was independent of carbon number,
with and without amino acid supplementation (Fig. 3.5C-D); it was approximately
84% for the model proteins sampled. The relationship was linear, but with
negligible slopes: $m_Y \times (C_{POI}) + b_Y$, where $m_Y = -1.43 \times 10^{-4}$ energy efficiency
(%)/carbon number for the case with supplementation, and $m_Y = 3.21 \times 10^{-3}$
energy efficiency (%)/carbon number for the case without supplementation. The
energy efficiency (y-intercept) was calculated at $b_Y = 84.15$ (%) with supplementation,
and $b_Y = 66.96$ (%) without supplementation. In the presence of amino acids,
energy was utilized to power CFPS instead of synthesizing amino acids; thus, a
constant energy efficiency was observed regardless of the carbon number of the
protein. In the absence of supplementation, the energy efficiency decreased to
between 68% and 76%. In this case, glucose consumption more than doubled (64%
increase for CAT) compared to cases supplemented with amino acids; meanwhile,
the productivity was similar for each protein (Fig. 3.5B). Therefore, the energy
burden required for synthesizing each amino acid and powering CFPS lowered the
energy efficiency. Surprisingly, without amino acid supplementation, proteins with
a higher carbon number had marginally higher energy efficiency ($R^2 = 0.82$). This
counterintuitive result was an artifact of the difference in productivity between
small and large carbon number proteins. Smaller carbon number proteins had a
higher productivity compared to larger carbon number proteins. Thus, smaller
carbon number proteins had a higher energy demand to meet the increased pro-
ductivity. When smaller carbon number proteins were constrained to have the
same productivity as larger carbon number proteins, they had comparable energy
efficiencies. There were also differences in the metabolic flux distribution between
smaller and larger carbon number proteins. Larger carbon number proteins had a
higher flux through glycolysis and oxidative phosphorylation. On the other hand,
the smaller carbon number proteins had higher flux through zwf, e.g., approxi-
mately 84% of flux traveled through zwf for FGF21 compared to 67% for FII. This
difference in pathway choice was also due to the higher productivity of the smaller
carbon number proteins. Higher productivity increased the demand for NADPH
(required for amino acid biosynthesis since amino acids were not available in the
media), which was generated via zwf. Lastly, the optimal energy efficiency of
MBP was well predicted by the linear efficiency model with and without amino
acid supplementation. The estimated MBP energy efficiency had a maximum error
of 6% compared to the correlation model prediction without supplementation, and
an error of 1% in the presence of amino acids.
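To make the size of the fitted slopes concrete, the linear efficiency model can be evaluated over a range of carbon numbers. The Python sketch below is illustrative only: the function names and the carbon number range of 1000 to 3000 are assumptions, and only the slopes and intercepts come from the text.

```python
# (slope, intercept) pairs quoted in the text: % per carbon number, and %.
WITH_AA = (-1.43e-4, 84.15)    # amino acids supplied
WITHOUT_AA = (3.21e-3, 66.96)  # de novo amino acid synthesis only


def energy_efficiency(carbon_number, slope, intercept):
    """Energy efficiency (%) predicted by the linear correlation model."""
    return slope * carbon_number + intercept


def efficiency_span(c_low, c_high, slope, intercept):
    """Change in predicted efficiency (percentage points) between two carbon numbers."""
    return (energy_efficiency(c_high, slope, intercept)
            - energy_efficiency(c_low, slope, intercept))


if __name__ == "__main__":
    # Hypothetical carbon number range, chosen only for illustration.
    c_low, c_high = 1000.0, 3000.0
    for label, fit in (("with AA", WITH_AA), ("without AA", WITHOUT_AA)):
        span = efficiency_span(c_low, c_high, *fit)
        print(f"{label}: change of {span:+.2f} percentage points")
```

Over this assumed range the supplemented case changes by less than half a percentage point, whereas the unsupplemented case rises by several points, consistent with the marginal increase with carbon number described above.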
Experimentally constrained CAT simulations showed suboptimal energy effi-
ciency (Fig. 3.5D, dagger). CAT production was simulated using the constraint
based model in combination with experimental measurements of glucose, organic
and amino acid consumption and production rates (Fig. 3.1B). The experimen-
tally constrained energy efficiency was 16.4 ± 5.6% compared to the theoretical
maximum of approximately 84 ± 0.1%. Thus, while the energy efficiency correlation
model was not effective in describing the experimental dataset as it assumes
optimality, it was useful in showing the potential optimal energy efficiency CFPS
systems could achieve.
[Figure 3.5 image not reproduced. Panels A and C: CFPS productivity and energy efficiency by protein (BMP10, CASP9, CAT, deGFP, FII, FX, FGF21, scFvR4). Panels B and D: productivity and energy efficiency versus carbon number in POI. Axis labels: Productivity (µM/h); Mean Productivity (µM/h); Energy Efficiency (%); Mean Energy Efficiency (%); Carbon Number in POI. Legend: AA Uptake & Synthesis; AA Uptake w/o Synthesis; AA Synthesis w/o Uptake; Experimental Measurements. Trendlines in B: all cases, $f(x) = 6.02 \times 10^{6} x^{-1.93}$ ($R^2 = 0.99$); max productivity, $f(x) = 1.39 \times 10^{7} x^{-1.99}$ ($R^2 = 0.99$). Trendlines in D: AA uptake w/ & w/o synthesis, $f(x) = -1.43 \times 10^{-4} x + 84.15$; AA synthesis w/o uptake, $f(x) = 3.21 \times 10^{-3} x + 66.96$.]
Figure 3.5: The CFPS performance for eight model proteins with and without
amino acid supplementation. A. Mean CFPS productivity for a panel of model
proteins with and without amino acid supplementation. B. Mean CFPS productivity
versus carbon number for a panel of model proteins with and without amino acid
supplementation. Trendline (black dotted line) was calculated across all cases for a
P70a promoter ($R^2 = 0.99$) and maximum productivity trendline assumed $u(\kappa) = 1$
(grey dotted line; $R^2 = 0.99$). C. Mean CFPS energy efficiency for a panel of model
proteins with and without amino acid supplementation. D. Mean CFPS energy
efficiency versus carbon number for a panel of model proteins with and without
amino acid supplementation. Trendline for cases with amino acids (black dotted
line) and trendline for without amino acids (grey dotted line; $R^2 = 0.81$). Error bars:
95% CI calculated by sampling; asterisk: protein excluded from trendline; dagger:
constrained by experimental measurements and excluded from trendline; triangles:
first principle prediction and excluded from trendline.
Given that the CAT productivity was similar between the
simulated and measured systems, differences in the glucose consumption rate and
the ATP yield per glucose were likely responsible for the difference between the
optimal and experimental systems. The glucose consumption rate was approxi-
mately 30 - 40 mM/h in the experimental system (even in the presence of amino
acids). On the other hand, the theoretical optimal simulation suggested the glucose
consumption rate was significantly less than the observed rate, approximately 1 - 7
mM/h (depending upon amino acid supplementation). In the theoretical optimal
simulation, the CFPS reaction produced only acetate as a byproduct, but in the
experimental system acetate, lactate, pyruvate, succinate and malate all accumu-
lated during the first hour of production. In the optimal system, the majority of
the carbon flux traveled toward CAT synthesis, while the remaining flux traveled
toward acetate. On the other hand, in the experimentally constrained system, there
was relatively low carbon flux toward CAT synthesis compared to the glucose
consumption rate, which led to the accumulation of various organic acids. This
suggested that the experimentally constrained system consumed more glucose than
was required for CAT synthesis. The energy produced per unit glucose was also
different between the optimal and experimentally constrained cases. In the optimal
simulation, 12 ATPs were produced per unit glucose (the theoretical maximum for
this network was 21), while the experimentally constrained simulation produced
only ~4 ATPs per glucose. Multiplying these yields by the corresponding glucose
consumption rates, approximately 120 - 160 mM ATP/h was produced
in the experimental case, in contrast to 12 - 84 mM ATP/h for the optimal case.
Thus, the experimental system overproduced ATP. We know from measurements
that ATP did not accumulate in the media, which suggested it was consumed
by pathways that were not active in the optimal simulation. For example, in the
experimentally constrained simulations, 36% of energy resources went toward
nucleoside triphosphate (NTP) degradation. In contrast, the theoretical optimum
had negligible NTP degradation. Taken together, comparison of the experimentally
constrained and theoretically optimal flux distributions suggested the CFPS system
over-consumed glucose, and counterintuitively overproduced ATP. A strategy
to increase the energy efficiency of CFPS would be to feed less glucose, reducing
the overproduction of ATP. Another potential strategy would be to increase the
translation rate, which is potentially the bottleneck for protein production. This
would allow for more energy to be consumed for protein production rather than
NTP degradation.
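The ATP production ranges quoted above follow from multiplying the glucose consumption rate by the ATP yield per glucose. A minimal Python sketch of this arithmetic is given below; the function name is an assumption, and the inputs are the rates and yields stated in the text.

```python
def atp_production_range(glucose_uptake_range, atp_per_glucose):
    """Return (low, high) ATP production in mM/h for a glucose uptake range in mM/h."""
    low, high = glucose_uptake_range
    return (low * atp_per_glucose, high * atp_per_glucose)


# Values quoted in the text: uptake ranges in mM/h, yields in ATP per glucose.
experimental = atp_production_range((30.0, 40.0), 4.0)
optimal = atp_production_range((1.0, 7.0), 12.0)

if __name__ == "__main__":
    print("experimental (mM ATP/h):", experimental)
    print("optimal (mM ATP/h):", optimal)
```

Even though the optimal simulation extracts three times more ATP per glucose, the much larger glucose uptake in the experimental system dominates the product.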
3.3.4 Global sensitivity analysis
We performed global sensitivity analysis to understand which parameters con-
trolled CFPS productivity and energy efficiency (Fig. 3.6). The translation elonga-
tion rate was the most important factor controlling productivity, while RNAP and
ribosome abundance had only a modest effect irrespective of amino acid supple-
mentation (Fig. 3.6A). This suggested that the translation elongation rate, and not
transcriptional parameters, controlled productivity. Underwood and coworkers
showed that increasing ribosome abundance did not significantly increase protein
yields or rates; however, adding elongation factors increased protein synthesis rates
by 27% [175].
[Figure 3.6 image not reproduced. Panels A and B: bar charts with y-axes Productivity Sensitivity Index and Energy Efficiency Sensitivity Index. Parameters on the x-axis: Glucose Uptake; Oxygen Uptake; Amino Acid Uptake; RNAP Level; Transcription Rate; Ribosome Level; Translation Rate (group label: Transcription/Translation Parameters). Legend: AA Uptake & Synthesis; AA Uptake w/o Synthesis; AA Synthesis w/o Uptake.]
Figure 3.6: Sensitivity analysis of the cell-free production of CAT. A. Total order
sensitivity of the optimal CAT productivity with respect to metabolic and
transcription/translation parameters. B. Total order sensitivity of the optimal CAT energy
efficiency. Metabolic and transcription/translation parameters were varied for
amino acid supplementation and synthesis (black), amino acid supplementation
without synthesis (dark grey) and amino acid synthesis without supplementation
(light gray). Error bars represent the 95% CI of the total order sensitivity index.
In addition, Li et al. increased the productivity of firefly luciferase
5-fold in PURE CFPS by first improving translation and then transcription,
adjusting elongation factors, ribosome recycling factor, release factors, chaperones,
BSA, and tRNAs [109]. In examining substrate utilization, glucose consumption
was not important for productivity in the presence of amino acid supplementa-
tion. However, its importance increased significantly when amino acids were
not available. On the other hand, productivity was only sensitive to amino acid
consumption when de novo amino acid biosynthetic reactions were blocked, as uptake
was then the only source of amino acids for protein synthesis. The oxygen consumption rate
was the most important factor controlling the energy efficiency of cell-free protein
synthesis (Fig. 3.6B). Oxidative phosphorylation is the most efficient process for
energy generation; however, it is unclear how active oxidative phosphorylation
is in CFPS. In the model, we assumed that ATP could be produced by both sub-
strate level and oxidative phosphorylation. Jewett and coworkers reported that
oxidative phosphorylation still operated in cell-free systems, and that the protein
titer decreased by 1.5-fold to 4-fold when oxidative phosphorylation reactions
were inhibited in pyruvate-powered CFPS [81]. Furthermore, we showed that
oxidative phosphorylation must be active to simultaneously meet the metabolic
and protein production constraints. However, it is unknown how active oxidative
phosphorylation is in a glucose-powered cell-free system, or what its quantitative
effect on energy efficiency is.
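To illustrate what a total order sensitivity index measures, the sketch below estimates it by sampling for a toy model. This is not the model or the code used for Fig. 3.6: the Jansen estimator, the toy response function, the parameter names, and the sample size are all assumptions chosen for illustration.

```python
import random
from statistics import pvariance


def total_order_indices(model, bounds, n_samples=2000, seed=0):
    """Estimate total order sensitivity indices with the Jansen estimator.

    model  : function mapping a list of parameter values to a scalar output
    bounds : list of (low, high) ranges, sampled uniformly (an assumption)
    """
    rng = random.Random(seed)

    def draw():
        return [rng.uniform(low, high) for low, high in bounds]

    # Two independent sample sets A and B.
    a_rows = [draw() for _ in range(n_samples)]
    b_rows = [draw() for _ in range(n_samples)]
    y_a = [model(row) for row in a_rows]
    y_b = [model(row) for row in b_rows]
    variance = pvariance(y_a + y_b)

    indices = []
    for i in range(len(bounds)):
        total = 0.0
        for a_row, b_row, y in zip(a_rows, b_rows, y_a):
            mixed = list(a_row)
            mixed[i] = b_row[i]  # resample only parameter i
            total += (y - model(mixed)) ** 2
        indices.append(total / (2.0 * n_samples * variance))
    return indices


def toy_productivity(x):
    """Hypothetical response: dominated by the first parameter, blind to the third."""
    translation_rate, ribosome_level, rnap_level = x
    return 4.0 * translation_rate + 1.0 * ribosome_level + 0.0 * rnap_level


if __name__ == "__main__":
    st = total_order_indices(toy_productivity, [(0.0, 1.0)] * 3)
    for name, value in zip(("translation rate", "ribosome level", "RNAP level"), st):
        print(f"{name}: {value:.3f}")
```

In this toy case the first parameter carries nearly all of the output variance and the third carries none, which mirrors the qualitative pattern reported above where one parameter dominates and the others have only a modest effect.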
We calculated the optimal CAT energy efficiency as a function of the oxida-
tive phosphorylation flux to investigate the connection between energy efficiency
and oxidative phosphorylation (Fig. 3.7). We calculated energy efficiency across
an ensemble of 1000 flux balance solutions by varying the oxygen uptake rate
together with the transcription and translation parameters. Oxidative phosphorylation had a
strong effect on the energy efficiency, both with and without amino acid supple-
mentation. In the presence of amino acid supplementation, the energy efficiency
ranged from 50% to approximately 84%, depending on the oxidative phosphorylation
flux. However, without amino acid supplementation, the energy efficiency
ranged from approximately 39% to a maximum of 70%. In the absence
of supplementation, a lower energy efficiency was expected for the same oxida-
tive phosphorylation flux, as glucose was utilized for both energy generation and
amino acid biosynthesis. In all cases, whenever the energy efficiency was below
its theoretical maximum, there was an accumulation of both acetate and lactate.
[Figure 3.7 image not reproduced. Scatter plot of CAT Energy Efficiency (%) versus Oxidative Phosphorylation Flux (mM/hr). Legend: AA Uptake & Synthesis; AA Uptake w/o Synthesis; AA Synthesis w/o Uptake.]
Figure 3.7: Optimal CAT energy efficiency versus oxidative phosphorylation flux
calculated across an ensemble (N = 1000) of flux balance solutions (points). Energy
efficiency versus oxidative phosphorylation flux for amino acid supplementation
and de novo synthesis (black), amino acid supplementation without de novo synthe-
sis (dark grey), and de novo amino acid synthesis without supplementation (light
gray). The ensemble was generated by randomly varying the oxygen consumption
rate from 0.1 to 10 mM/h and randomly sampling the transcription and translation
parameters within 10% of their literature values. Each point represents one solution
of the model equations.
The experimental dataset exhibited a mixture of acetate and lactate accumulation
during CAT synthesis, which suggested the CFPS reaction was not operating with
optimal oxidative phosphorylation activity. Oxidative phosphorylation is a mem-
brane associated process, whereas CFPS does not rely on living cells, and at least
in theory has no cell membrane in the extract. Jewett and coworkers, however,
hypothesized that inverted membrane vesicles present in the CFPS reaction could
carry out oxidative phosphorylation [81]. Toward this hypothesis, they enhanced
the CAT titer by 33% when the reaction was augmented with 10 mM phosphate;
they suggested the additional phosphate either enhanced oxidative phosphoryla-
tion activity or inhibited phosphatase reactions. They also showed that protein titer
was significantly reduced in the absence of oxygen. The model validated oxidative
phosphorylation activity in this CFPS system; without oxidative phosphorylation
there was no feasible solution satisfying the experimental constraints. However,
the number, size, protein loading, and lifetime of these inverted vesicles remain
an open area of study.
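The ensemble behind Fig. 3.7 was generated by sampling the oxygen consumption rate and the transcription and translation parameters. The Python sketch below shows one way such a sampling step could look; it is not the original code. Uniform sampling, the parameter names, and the nominal values are assumptions (the nominal values are placeholders, not literature values), and each sampled member would then be passed to the flux balance calculation, which is not shown.

```python
import random

# Placeholder nominal values; hypothetical, not the literature values used in the study.
NOMINAL = {
    "transcription_elongation_rate": 25.0,
    "translation_elongation_rate": 2.0,
}


def sample_ensemble(nominal, n=1000, o2_range=(0.1, 10.0), fraction=0.10, seed=0):
    """Draw n parameter sets for an ensemble of flux balance calculations.

    The oxygen uptake bound (mM/h) is sampled over o2_range, and every nominal
    parameter is perturbed by at most +/- fraction of its value.
    """
    rng = random.Random(seed)
    ensemble = []
    for _ in range(n):
        member = {"oxygen_uptake": rng.uniform(*o2_range)}
        for name, value in nominal.items():
            member[name] = value * rng.uniform(1.0 - fraction, 1.0 + fraction)
        ensemble.append(member)
    return ensemble


if __name__ == "__main__":
    ensemble = sample_ensemble(NOMINAL)
    print(len(ensemble), "members; first member:", ensemble[0])
```

Fixing the seed makes the ensemble reproducible, which is convenient when comparing the three amino acid scenarios on the same set of sampled parameters.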
3.3.5 Potential alternative metabolic optima
Constraint based approaches are useful tools to calculate the optimal performance
of a biological system; however, the metabolic flux distributions predicted by these
methods are often not unique. Alternative optimal solutions have the same objec-
tive value, e.g., productivity, but different metabolic flux distributions. Techniques
such as flux variability analysis (FVA) [115, 149], mixed-integer approaches [103]
or Monte Carlo sampling of the constraint space [185] can estimate alternative
optimal flux distributions. In this study, we were not interested in specific alternative
optima; rather, we wanted to generate a global view of which pathways
were absolutely necessary to meet optimal CAT production. Toward this ques-
tion, we removed entire reaction groups, and simulated CAT production subject
to experimental constraints, to determine their influence on system performance.
[Figure 3.8 image not reproduced. Panels A and B: pairwise matrices over reaction groups, with color scales Productivity difference (0.0 to 1.0) and Flux distribution difference (0.0 to 1.0). Reaction groups on both axes: Glycolysis/Gluconeogenesis; Pentose Phosphate Pathway; Entner-Doudoroff; TCA cycle; Oxidative phosphorylation; Cofactors; Anaplerotic/Glyoxylate reactions; Overflow metabolism; Folate metabolism; Purine/Pyrimidine; ALA, ASP, ASN biosynthesis; GLU, GLN biosynthesis; ARG, PRO biosynthesis; GLY, SER biosynthesis; CYS, MET biosynthesis; LYS, THR biosynthesis; HIS biosynthesis; PHE, TRP, TYR biosynthesis; ILE, LEU, VAL biosynthesis.]
Figure 3.8: Pairwise knockouts of reaction subgroups in the cell-free network. A.
Difference in the CAT productivity in the presence of reaction knockouts compared
with no knockouts for experimentally constrained CAT production. B. Difference
in the flux distribution in the presence of reaction knockouts compared with no
knockouts for experimentally constrained CAT production. The difference between
perturbed and wild-type productivity and flux distributions was quantified by the
$\ell_2$ norm, and then normalized so the maximum change was 1.0. Red boxes indicate
potential alternative optimal flux distributions with the same CAT productivity
as the wild type, whereas no red box indicates no feasible solution and/or the
optimal CAT productivity was not met.
We examined the productivity difference between the experimentally constrained
system with no knockouts (wild-type) and with group knockouts (Fig. 3.8A). Solutions
that met the experimental constraints and produced CAT are indicated
by red boxes; solutions outside the red boxes did not meet the constraints and
resulted in infeasible solutions. We then quantified the difference in metabolic flux
distribution between the wild-type and group knockouts (Fig. 3.8B). Globally, the
constraint based simulation reached the same CAT productivity for 40% of the
pairwise knockouts, while 92% of these solutions had different flux distributions
compared with the wild-type. Knockout analysis identified pathways required for
CAT production; for example, deletion of glycolysis/gluconeogenesis or oxidative
phosphorylation resulted in attenuated CAT production. Attenuation of CAT
production in the absence of oxidative phosphorylation suggested that oxidative
phosphorylation was present in the CFPS cell-extract. However, this was not true
for deGFP production in the TXTL 2.0 system. A robustness analysis with and
without oxidative phosphorylation showed a feasible solution was reached for
each simulation (Fig. 3.9). This suggested that oxidative phosphorylation may not
be required for a TXTL 2.0 system producing deGFP; however, this result is currently
under experimental investigation. There were also pathway knockouts that had
no effect on productivity, such as the pentose phosphate pathway (PPP) or the biosynthesis
reactions of amino acids that did not accumulate in the media (only alanine
and glutamine accumulated). For example, one of the features of the predicted
wild-type metabolic flux distribution was a high flux through the first step of
PPP (zwf) and the Entner-Doudoroff (ED) pathway. Removal of PPP and the
ED pathway had no effect on the CAT productivity compared to the wild-type
(Fig. 3.8). Pairwise knockouts of the ED pathway and other subgroups (e.g., pentose
phosphate pathway, cofactors, folate metabolism) also resulted in the same
optimal CAT productivity. However, there was a difference in the flux distribution
with these knockouts (Fig. 3.8B); thus, alternative optimal metabolic flux distribu-
tions exist for CAT production, despite experimental constraints. Lastly, amino
acid biosynthetic knockouts had no effect on the productivity with the exception
of alanine, aspartate, asparagine, glutamate and glutamine biosynthesis reactions,
since amino acids were available in the medium and 13 amino acid biosynthesis
reactions were already blocked (see Materials and Methods). We blocked certain
biosynthesis reactions since the cells were grown in the presence of these amino acids
during cell-free extract preparation. Alanine and glutamine accumulated in the
medium; thus, when their biosynthesis reactions were removed, the simulation
failed to meet the experimental constraints. This resulted in no feasible solution
and no CAT production. Ultimately, to determine the metabolic flux distribution
occurring in CFPS, we need to add additional constraints to the flux estimation
calculation. For example, thermodynamic feasibility constraints may result in a
better depiction of the flux distribution [64, 62], and 13C labeling in CFPS could
provide significant insight. However, while 13C labeling techniques are well estab-
lished for in vivo processes [194], application of these techniques to CFPS remains
an active area of research. Taken together, a more constrained solution would help
determine whether de novo amino acid biosynthesis is active in CFPS, and could also help to
identify strategies to optimize CFPS energy efficiency.
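The knockout comparison in Fig. 3.8 quantifies the difference between perturbed and wild-type flux distributions by the l2 norm, normalized so that the maximum change is 1.0. The Python sketch below illustrates that metric on toy data; the group names and flux vectors are hypothetical, and the sketch assumes the flux distribution for each knockout pair has already been computed.

```python
import math
from itertools import combinations


def l2_difference(wild_type, perturbed):
    """Euclidean (l2) distance between two flux vectors of equal length."""
    return math.sqrt(sum((w - p) ** 2 for w, p in zip(wild_type, perturbed)))


def normalized_differences(wild_type, perturbed_by_pair):
    """Scale the l2 differences so that the largest change equals 1.0."""
    raw = {pair: l2_difference(wild_type, flux)
           for pair, flux in perturbed_by_pair.items()}
    largest = max(raw.values())
    if largest == 0.0:
        return {pair: 0.0 for pair in raw}
    return {pair: value / largest for pair, value in raw.items()}


if __name__ == "__main__":
    groups = ["group_a", "group_b", "group_c"]  # hypothetical reaction groups
    pairs = list(combinations(groups, 2))
    wild_type_flux = [1.0, 2.0, 2.0]            # toy wild-type flux vector
    toy_fluxes = [[1.0, 2.0, 2.0], [1.0, 2.0, 0.0], [0.0, 2.0, 2.0]]
    result = normalized_differences(wild_type_flux, dict(zip(pairs, toy_fluxes)))
    for pair, value in result.items():
        print(pair, round(value, 3))
```

A normalized value of 0.0 alongside an unchanged productivity would correspond to the same optimum, whereas a nonzero value with unchanged productivity would indicate an alternative optimal flux distribution.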
3.3.6 Summary and conclusions
In this study, we developed a sequence specific constraint based modeling ap-
proach to predict the performance of cell-free protein synthesis reactions. First
principle predictions of the cell-free production of CAT and deGFP were in agree-
ment with experimental measurements for two different promoters. While we
considered only the P70a and T7 promoters here, we are expanding our library
of possible promoters. These promoter models, in combination with the cell-free
constraint based approach, could enable the de novo design of circuits for optimal
functionality and performance.
[Figure 3.9 image not reproduced. Axes: Mean 3PG Uptake (mM/h) and Mean Maltose Uptake (mM/h). Legend: Without OxPhos Activity; With OxPhos Activity.]
Figure 3.9: Robustness analysis of maltose and 3PG consumption for TXTL 2.0 E. coli
extract with and without oxidative phosphorylation activity that meet the transcription
and translation constraints. Each dot represents the mean of an ensemble of N
= 20 ssFBA solutions; black dots are solutions without oxidative phosphorylation
and grey dots are solutions with oxidative phosphorylation.
We also developed effective correlation models
for the productivity and energy efficiency as a function of carbon number that
could be used to quickly prototype CFPS reactions. The productivity correlation
model described the experimental measurements of CAT and deGFP, whereas
the energy efficiency correlation model represented the theoretical optimum that
CFPS could attain. Further, global sensitivity analysis identified that the translation
rate had the highest effect on productivity, while oxidative phosphorylation was
crucial for energy efficiency. While this first study was promising in predicting
protein production, there are several issues to consider in future work. First, a more
detailed description of transcription and translation reactions has been utilized
in genome scale ME models, e.g., O’Brien et al. [129]. These template reactions
could be adapted to a cell-free system. This would allow us to consider important
facets of protein production, such as the role of chaperones in protein folding. We
would also like to include post-translational modifications such as glycosylation that
are important for the production of therapeutic proteins in the next generation of
models. In conclusion, we modeled the cell-free production of a single protein in
this study, but sequence specific constraint based modeling could be extended to
multi-protein synthetic circuits, RNA circuits or small molecule production.
3.4 Materials and Methods
3.4.1 Glucose/NMP cell-free protein synthesis.
The protein synthesis reaction was conducted using the PANOxSP protocol with
slight modifications from that described previously [82]. The glucose/NMP cell-
free protein synthesis reaction was performed using the S30 extract in 1.5-mL
Eppendorf tubes (working volume of 15 µL) and incubated in a humidified incubator
at 37 °C. The S30 extract was prepared from E. coli strain KC6 (A19 ∆tonA
∆tnaA ∆speA ∆endA ∆sdaA ∆sdaB ∆gshA met+). This K12-derivative has several
gene deletions to stabilize amino acid concentrations during the cell-free reaction.
The KC6 strain was grown to approximately 3.0 OD595 in a 10-L fermenter (B.
Braun, Allentown PA) on defined media with glucose as the carbon source and
with the addition of 13 amino acids (alanine, arginine, cysteine, serine, aspartate,
glutamate, and glutamine were excluded) [195]. Crude S30 extract was prepared
as described previously [80]. Plasmid pK7CAT was used as the DNA template
for chloramphenicol acetyltransferase (CAT) expression by placing the cat gene
between the T7 promoter and the T7 terminator [92]. The plasmid was isolated
and purified using a Plasmid Maxi Kit (Qiagen, Valencia CA).
All reagents were purchased from Sigma (St. Louis, MO), unless otherwise
noted. The initial mixture included 1.2 mM ATP; 0.85 mM each of GTP, UTP, and
CTP; 30 mM phosphoenolpyruvate (Roche, Indianapolis IN); 130 mM potassium
glutamate; 10 mM ammonium glutamate; 16 mM magnesium glutamate; 50 mM
HEPES-KOH buffer (pH 7.5); 1.5 mM spermidine; 1.0 mM putrescine; 34 µg/mL
folinic acid; 170.6 µg/mL E. coli tRNA mixture (Roche, Indianapolis IN); 13.3
µg/mL pK7CAT plasmid; 100 µg/mL T7 RNA polymerase; 20 unlabeled amino
acids at 2-3 mM each; 5 µM L-[U-14C]-leucine (Amersham Pharmacia, Uppsala
Sweden); 0.33 mM nicotinamide adenine dinucleotide (NAD); 0.26 mM coenzyme
A (CoA); 2.7 mM sodium oxalate; and 0.24 volumes of E. coli S30 extract. This
reaction was modified for the energy source used such that glucose reactions have
30-40 mM glucose in place of PEP. Sodium oxalate was not added since it has a
detrimental effect on protein synthesis and ATP concentrations when using glucose
or other early glycolytic intermediate energy sources [93]. The HEPES buffer
(pKa ∼ 7.5) was replaced with Bis-Tris (pKa ∼ 6.5). In addition, the magnesium
glutamate concentration was reduced to 8 mM for the glucose reaction since a lower
magnesium optimum was found when using a nonphosphorylated energy source
[82]. Finally, 10 mM phosphate was added in the form of potassium phosphate
dibasic adjusted to pH 7.2 with acetic acid.
3.4.2 Protein product and metabolite measurements.
Cell-free reaction samples were quenched at specific timepoints with equal vol-
umes of ice-cold 150 mM sulfuric acid to precipitate proteins. Protein synthesis
of CAT was determined from the total amount of 14C-leucine-labeled product by
trichloroacetic acid precipitation followed by scintillation counting as described
previously [25]. Samples were centrifuged for 10 min at 12,000 × g and 4 °C. The
supernatant was collected for high performance liquid chromatography (HPLC)
analysis. HPLC analysis (Agilent 1100 HPLC, Palo Alto CA) was used to separate
nucleotides, organic acids, and glucose. Compounds were identified
and quantified by comparison to known standards for retention time and UV
absorbance (260 nm for nucleotides and 210 nm for organic acids) as described pre-
viously [25]. The standard compounds quantified with a refractive index detector
included inorganic phosphate, glucose, and acetate. Pyruvate, malate, succinate,
and lactate were quantified with the UV detector. The stability of the amino acids in
the cell extract was determined using a Dionex Amino Acid Analysis (AAA) HPLC
System (Sunnyvale, CA) that separates amino acids by gradient anion exchange
(AminoPac PA10 column). Compounds were identified with pulsed amperometric
electrochemical detection and by comparison to known standards.
3.4.3 Formulation and solution of the model equations.
The sequence specific flux balance analysis problem was formulated as a linear
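program:
\begin{equation}
\begin{aligned}
\max_{w} \quad & \left( w_{X} = \theta^{T} w \right) \\
\text{subject to:} \quad & S w = 0 \\
& L_{i} \le w_{i} \le U_{i}, \qquad i = 1, 2, \ldots, R
\end{aligned}
\tag{3.1}
\end{equation}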
where $S$ denotes the stoichiometric matrix ($M \times R$), $w$ denotes the unknown flux
vector ($R \times 1$), $\theta$ denotes the objective vector ($R \times 1$) and $L_i$ and $U_i$ denote the
lower and upper bounds on flux $w_i$, respectively (both $R \times 1$ column vectors).
Unless otherwise specified, $L_i = 0$ and $U_i = 100$ mM/hr. The transcription (T) and
translation (X) stoichiometry was modeled using the template reactions of Allen
and Palsson [4] (Table 3.1). The objective of the cell free flux balance calculation
was to maximize the rate of protein translation, wX. The total glucose uptake
rate was bounded by [0,40 mM/h] according to experimental data, while the
amino acid uptake rates were bounded by [0,30 mM/h], but did not reach the
maximum flux. Gene and protein sequences were taken from literature [181]. The
sequence specific flux balance linear program was solved using the GNU Linear
Programming Kit (GLPK) v4.55 [1]. For all cases, amino acid degradation reactions
were blocked as these enzymes were likely inactivated during the cell-free extract
preparation [25, 52]. In the absence of de novo amino acid synthesis, all amino
acid synthesis reactions were set to 0 mM/h. In the experimentally constrained
simulations, E. coli was grown in the presence of 13 amino acids (alanine, arginine,
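Table 3.1: Transcription and translation template reactions for protein production.
The symbol GP denotes the gene encoding protein product P, RT denotes the
concentration of RNA polymerase, G∗P denotes the gene bound by the RNA
polymerase (open complex), ηk and αj denote the stoichiometric coefficients for
nucleotide k and amino acid j, respectively, Pi denotes inorganic phosphate, RX denotes
the ribosome concentration, R∗X denotes bound ribosome, and AAj denotes the
jth amino acid.
Description: Template reaction
Transcription initiation: $G_P + R_T \longrightarrow G_P^{*}$
Transcription ($w_T$): $G_P^{*} + \sum_{k \in \{A,C,G,U\}} \eta_k \left( \{k\}\mathrm{TP} + \mathrm{H_2O} \right) \longrightarrow \mathrm{mRNA} + G_P + R_T + \sum_{k \in \{A,C,G,U\}} \eta_k\, \mathrm{PP_i}$
mRNA degradation: $\mathrm{mRNA} \longrightarrow \sum_{k \in \{A,C,G,U\}} \eta_k\, \{k\}\mathrm{MP}$
Translation initiation: $\mathrm{mRNA} + R_X \longrightarrow R_X^{*}$
tRNA charging: $\alpha_j \left( \mathrm{AA}_j + \mathrm{tRNA} + \mathrm{ATP} + \mathrm{H_2O} \right) \longrightarrow \alpha_j \left( \mathrm{AA}_j\text{-}\mathrm{tRNA}_j + \mathrm{AMP} + \mathrm{PP_i} \right), \quad j = 1, 2, \ldots, 20$
Translation ($w_X$): $R_X^{*} + \sum_{j \in \{AA\}} \alpha_j \left( \mathrm{AA}_j\text{-}\mathrm{tRNA}_j + 2\,\mathrm{GTP} + 2\,\mathrm{H_2O} \right) \longrightarrow P + R_X + \mathrm{mRNA} + \sum_{j \in \{AA\}} \alpha_j \left( \mathrm{tRNA} + 2\,\mathrm{GDP} + 2\,\mathrm{P_i} \right)$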
cysteine, serine, aspartate, glutamate, and glutamine were excluded) [195], thus
the synthesis reactions responsible for those 13 amino acids were set to 0 mM/h.
Lastly, reactions that were knocked out in the host strain used to prepare the extract
were removed from the network (∆speA, ∆tnaA, ∆sdaA, ∆sdaB, ∆gshA, ∆tonA,
∆endA).
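The constraint structure of Eq. 3.1 can be illustrated with a minimal sketch. The three-reaction network, its reaction order, and the flux values below are hypothetical and are not part of the cell-free model; the sketch only checks whether a candidate flux vector satisfies the steady-state condition Sw = 0 and the flux bounds, with a knockout represented by setting both bounds of a reaction to zero.

```python
# Toy network (hypothetical): uptake -> A, A -> B, B -> export.
# Rows of S are metabolites (A, B); columns are reactions.
S = [
    [1.0, -1.0, 0.0],   # A: produced by uptake, consumed by conversion
    [0.0, 1.0, -1.0],   # B: produced by conversion, consumed by export
]
DEFAULT_BOUNDS = (0.0, 100.0)  # mM/h, the default bounds stated in the text


def is_feasible(S, w, bounds, tol=1e-9):
    """Return True if S w = 0 and every flux lies within its bounds."""
    balanced = all(
        abs(sum(s_ij * w_j for s_ij, w_j in zip(row, w))) <= tol for row in S
    )
    bounded = all(lo - tol <= w_j <= hi + tol for w_j, (lo, hi) in zip(w, bounds))
    return balanced and bounded


bounds = [(0.0, 40.0), DEFAULT_BOUNDS, DEFAULT_BOUNDS]  # uptake capped at 40 mM/h
knockout = [bounds[0], (0.0, 0.0), bounds[2]]            # conversion reaction removed

print(is_feasible(S, [30.0, 30.0, 30.0], bounds))    # balanced and within bounds
print(is_feasible(S, [50.0, 50.0, 50.0], bounds))    # exceeds the uptake bound
print(is_feasible(S, [30.0, 30.0, 30.0], knockout))  # violates the knockout bound
```

A linear programming solver such as GLPK searches over all flux vectors that pass this kind of check and returns one that maximizes the objective.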
The bounds on the transcription rate (LT = wT = UT) were modeled as:
\[ w_T = V_T^{max} \left( \frac{G_P}{K_T + G_P} \right) \tag{3.2} \]
where GP denotes the concentration of the gene encoding the protein of interest,
and KT denotes a transcription saturation coefficient. The maximum transcription
rate VmaxT was formulated as:
\[ V_T^{max} \equiv \left[ R_T \left( \frac{\dot{v}_T}{l_G} \right) u(\kappa) \right] \tag{3.3} \]
where RT denotes the RNA polymerase concentration (nM), v̇T denotes the RNA
polymerase elongation rate (nt/h), lG denotes the gene length (nt). The term u (κ)
(dimensionless, 0 ≤ u (κ) ≤ 1) is an effective model of promoter activity, where κ
denotes promoter specific parameters. The general form for the promoter models
was taken from Moon et al. [123]; which was based on earlier studies from Bintu
and coworkers [17], and similar to the genetically structured modeling approach
of Lee and Bailey [104]. In this study, we considered two promoters: T7 and P70a.
The promoter function for T7, uT7, was given by:
\[ u_{T7} = \frac{K_{T7}}{1 + K_{T7}} \tag{3.4} \]
where KT7 denotes a T7 RNA polymerase binding constant. The P70a promoter
function uP70a (which was used for all other proteins) was formulated as:
\[ u_{P70a} = \frac{K_1 + K_2 f_{\sigma_{70}}}{1 + K_1 + K_2 f_{\sigma_{70}}} \tag{3.5} \]
where K1 denotes the weight of RNA polymerase binding alone, K2 denotes the
weight of RNAP-σ70 bound to the promoter, and fσ70 denotes the fraction of the
σ70 transcription factor bound to RNAP, modeled as a Hill function:
\[ f_{\sigma_{70}} = \frac{\sigma_{70}^{n}}{K_D^{n} + \sigma_{70}^{n}} \tag{3.6} \]
where σ70 denotes the sigma-factor 70 concentration, KD denotes the dissociation
constant, and n denotes a cooperativity coefficient. The values for all promoter
parameters are given in Table 3.2.
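As a worked illustration of Eqs. 3.2 to 3.6, the sketch below evaluates the promoter functions and the resulting transcription rate using values from Table 3.2. The function names are hypothetical. The gene length is an assumption: it is taken as the sum of the CAT nucleotide coefficients in Table 3.2 (176 + 144 + 151 + 189 nt), and the P70a call reuses that length purely for illustration. The elongation rate is converted from nt/s to nt/h.

```python
def u_t7(k_t7):
    """T7 promoter function (Eq. 3.4)."""
    return k_t7 / (1.0 + k_t7)


def u_p70a(k1, k2, sigma70, k_d, n):
    """P70a promoter function (Eqs. 3.5 and 3.6)."""
    f_sigma70 = sigma70**n / (k_d**n + sigma70**n)
    return (k1 + k2 * f_sigma70) / (1.0 + k1 + k2 * f_sigma70)


def transcription_rate(r_t, v_t, l_g, u, g_p, k_t):
    """Transcription rate w_T in nM/h (Eqs. 3.2 and 3.3)."""
    v_max = r_t * (v_t / l_g) * u
    return v_max * g_p / (k_t + g_p)


# Assumed gene length: sum of the CAT NTP coefficients in Table 3.2.
GENE_LENGTH = 176 + 144 + 151 + 189   # nt
V_T = 25.0 * 3600.0                   # nt/s converted to nt/h

w_t7 = transcription_rate(r_t=1000.0, v_t=V_T, l_g=GENE_LENGTH,
                          u=u_t7(10.0), g_p=5.0, k_t=116.0)
w_p70 = transcription_rate(r_t=75.0, v_t=V_T, l_g=GENE_LENGTH,
                           u=u_p70a(0.014, 10.0, 35.0, 130.0, 1.0),
                           g_p=5.0, k_t=3.5)
print(f"T7:   w_T = {w_t7:.1f} nM/h")
print(f"P70a: w_T = {w_p70:.1f} nM/h")
```

Because LT = wT = UT, the value returned by `transcription_rate` would be used as both the lower and upper bound on the transcription flux in the linear program.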
The translation rate (wX) was bounded by:
\[ 0 \leq w_X \leq V_X^{max} \left( \frac{\mathrm{mRNA}^{*}}{K_X + \mathrm{mRNA}^{*}} \right) \tag{3.7} \]
where mRNA∗ denotes the steady state mRNA abundance and KX denotes a translation
saturation constant. The maximum translation rate VmaxX was formulated as:
\[ V_X^{max} \equiv \left[ K_P R_X \left( \frac{\dot{v}_X}{l_P} \right) \right] \tag{3.8} \]
The term KP denotes the polysome amplification constant, v̇X denotes the ribosome
elongation rate (amino acids per hour), and lP denotes the number of amino acids
in the protein of interest. The mRNA abundance mRNA was estimated as:
\[ \mathrm{mRNA}_{t+\Delta t} = \mathrm{mRNA}_{t} + \left( w_T - \mathrm{mRNA}_{t} \lambda \right) \Delta t \tag{3.9} \]
where λ denotes the mRNA degradation rate (h−1). All translation parameters are
Table 3.2: Parameters for sequence specific flux balance analysis
Description Parameter Value Units Reference
T7 RNA polymerase concentration RT 1.0 µM specified
Native RNA polymerase concentration RT 75 nM [52]
Ribosome concentration RX 1.6 µM [52, 175]
Transcription elongation rate v̇T 25 nt/s [52]
Translation elongation rate v̇X 2 aa/s/ribosome [52, 175]
T7 transcription saturation coefficient KT7,T 116 nM estimated
P70 transcription saturation coefficient KP70,T 3.5 nM estimated
Translation saturation coefficient KX 45.0 µM estimated
Polysome number KP 10 ribosome number estimated
mRNA degradation rate constant λ 5.2 h−1 [52]
T7 promoter weight KT7 10 constant estimated
Weight RNA polymerase binding alone P70a K1 0.014 constant estimated
Weight bound RNAP-σ70 P70a K2 10 constant estimated
σ70 concentration σ70 35 nM [52]
σ70 dissociation constant KD 130 nM [119]
σ70 Hill coefficient n 1 constant [119]
Gene concentration GP 5 nM [52]
ATP transcription coefficient (CAT) ATPT 176 constant calculated
CTP transcription coefficient (CAT) CTPT 144 constant calculated
GTP transcription coefficient (CAT) GTPT 151 constant calculated
UTP transcription coefficient (CAT) UTPT 189 constant calculated
ATP tRNA charging coefficient (CAT) ATPX 219 constant calculated
GTP translation coefficient (CAT) GTPX 438 constant calculated
given in Table 3.2.
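The mRNA update (Eq. 3.9) and the translation bound (Eqs. 3.7 and 3.8) can be sketched as follows. This is a minimal illustration, not the dissertation's implementation: the function names, the step size, the simulated duration, and the transcription rate of 5 µM/h are assumptions. The protein length of 219 amino acids is also an assumption, taken from the ATP tRNA charging coefficient for CAT in Table 3.2 (one ATP per amino acid).

```python
def steady_state_mrna(w_t, lam, dt=0.01, t_end=10.0, m0=0.0):
    """Iterate Eq. 3.9 with a fixed time step (h) and return the final mRNA level."""
    m = m0
    for _ in range(int(round(t_end / dt))):
        m = m + (w_t - m * lam) * dt
    return m


def translation_upper_bound(mrna, k_p, r_x, v_x, l_p, k_x):
    """Upper bound on w_X (Eqs. 3.7 and 3.8); concentrations in µM, rates per hour."""
    v_max = k_p * r_x * (v_x / l_p)
    return v_max * mrna / (k_x + mrna)


W_T = 5.0      # µM/h, assumed transcription rate for illustration
LAMBDA = 5.2   # 1/h, mRNA degradation rate constant from Table 3.2

mrna = steady_state_mrna(W_T, LAMBDA)
bound = translation_upper_bound(mrna, k_p=10.0, r_x=1.6,
                                v_x=2.0 * 3600.0, l_p=219, k_x=45.0)
print(f"mRNA* = {mrna:.4f} µM (analytical w_T / lambda = {W_T / LAMBDA:.4f} µM)")
print(f"w_X upper bound = {bound:.2f} µM/h")
```

At steady state the update in Eq. 3.9 no longer changes the mRNA level, which gives mRNA∗ = wT/λ; the iteration above converges to that value.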
3.4.4 Calculation of energy efficiency.
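Energy efficiency (E) was calculated as the ratio of transcription and translation
(weighted by the appropriate energy species coefficients) to ATP generation:
\[ E = \frac{w_T \cdot \alpha_T + w_X \cdot \alpha_X}{\sum_{j \in R_{ATP}} \sigma_j^{ATP} \bar{w}_j} \tag{3.10} \]
\[ \alpha_T = 2 \cdot \left( \mathrm{ATP}_T + \mathrm{CTP}_T + \mathrm{GTP}_T + \mathrm{UTP}_T \right) \tag{3.11} \]
\[ \alpha_X = 2 \cdot \mathrm{ATP}_X + \mathrm{GTP}_X \tag{3.12} \]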
where αT denotes the energy cost of transcription, αX denotes the energy cost of
translation, RATP denotes the set of ATP-producing reactions, and σATPj denotes
the ATP coefficient for reaction j. ATPT, CTPT, GTPT, and UTPT denote the stoi-
chiometric coefficients of each energy species for the transcription of the protein of
interest, ATPX and GTPX denote the stoichiometric coefficients of ATP and GTP for
the translation of the protein of interest. During transcription and tRNA charging,
triphosphate molecules are consumed with monophosphates as byproducts, which is
equivalent to spending two high-energy phosphate bonds per molecule; this is
the reason for the factors of 2 on ATPT, CTPT, GTPT, UTPT, and ATPX.
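The energy cost coefficients in Eqs. 3.11 and 3.12 follow directly from the CAT coefficients in Table 3.2, as the short sketch below shows. The function names and the fluxes passed to `energy_efficiency` are hypothetical and serve only to illustrate Eq. 3.10.

```python
# Stoichiometric coefficients for CAT from Table 3.2.
NTP_T = {"ATP": 176, "CTP": 144, "GTP": 151, "UTP": 189}
ATP_X = 219
GTP_X = 438


def energy_costs(ntp_t, atp_x, gtp_x):
    """Return (alpha_T, alpha_X) following Eqs. 3.11 and 3.12."""
    alpha_t = 2 * sum(ntp_t.values())
    alpha_x = 2 * atp_x + gtp_x
    return alpha_t, alpha_x


def energy_efficiency(w_t, w_x, alpha_t, alpha_x, atp_fluxes):
    """Eq. 3.10; atp_fluxes is a list of (sigma_j, w_j) for ATP-producing reactions."""
    atp_generation = sum(sigma * w for sigma, w in atp_fluxes)
    return (w_t * alpha_t + w_x * alpha_x) / atp_generation


alpha_t, alpha_x = energy_costs(NTP_T, ATP_X, GTP_X)
print(alpha_t, alpha_x)  # 1320 876

# Hypothetical fluxes (mM/h) used only to exercise Eq. 3.10.
efficiency = energy_efficiency(w_t=0.005, w_x=0.01, alpha_t=alpha_t,
                               alpha_x=alpha_x,
                               atp_fluxes=[(1, 30.0), (2, 10.0)])
print(f"E = {efficiency:.4f}")
```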
3.4.5 Quantification of uncertainty.
Experimental factors taken from literature, for example macromolecular concentra-
tions or elongation rates, are uncertain. To quantify the influence of this uncertainty
on model performance, we randomly sampled the expected physiological ranges
for these parameters as determined from literature. An ensemble of flux distri-
butions was calculated for the three different cases we considered: control (with
amino acid synthesis and uptake), amino acid uptake without synthesis, and amino
acid synthesis without uptake. The flux ensemble was calculated by randomly
sampling the maximum glucose consumption rate within a range of 0 to 30 mM/h
(determined from experimental data) and randomly sampling RNA polymerase
levels, ribosome levels, and elongation rates in a physiological range determined
from literature. P70 RNA polymerase levels were sampled between 60 and 80 nM,
T7 RNA polymerase levels were sampled between 990 and 1010 nM, ribosome
levels between 1.2 and 1.8 µM, the RNA polymerase elongation rate between 20
and 30 nt/s, and the ribosome elongation rate between 1.5 and 3 aa/s [175, 52]. We
generated uniform random samples between an upper (u) and lower (l) parameter
bound of the form:
\[ p^{*} = l + (u - l) \times U(0, 1) \tag{3.13} \]
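The sampling rule in Eq. 3.13 can be sketched with the Python standard library. The ranges are those stated in the text; the dictionary keys, the function name, the seed, and the ensemble size of 100 are illustrative assumptions.

```python
import random

# (lower, upper) bounds as stated in the text; key names are hypothetical.
RANGES = {
    "glucose_uptake_mM_per_h": (0.0, 30.0),
    "p70_rnap_nM": (60.0, 80.0),
    "t7_rnap_nM": (990.0, 1010.0),
    "ribosome_uM": (1.2, 1.8),
    "rnap_elongation_nt_per_s": (20.0, 30.0),
    "ribosome_elongation_aa_per_s": (1.5, 3.0),
}


def sample_parameters(ranges, rng):
    """Draw one parameter set: p* = l + (u - l) * U(0, 1) for each parameter."""
    return {name: lo + (hi - lo) * rng.random() for name, (lo, hi) in ranges.items()}


rng = random.Random(0)  # fixed seed so the ensemble is reproducible
ensemble = [sample_parameters(RANGES, rng) for _ in range(100)]
print(ensemble[0])
```

Each sampled parameter set would then be used to recompute the flux bounds and solve the linear program once, giving one member of the flux ensemble.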
3.4.6 Global sensitivity analysis.
We conducted a global sensitivity analysis using the variance-based method of
Sobol to estimate which parameters controlled the performance of the cell-free
protein synthesis reaction [153]. We computed the total sensitivity index of each
parameter relative to two performance objectives: productivity of the protein
of interest and energy efficiency. We established the sampling bounds for each
parameter from literature. We used the sampling method of Saltelli et al. [145] to
compute a family of N (2d + 2) parameter sets which obeyed our parameter ranges,
where N was a parameter proportional to the desired number of model evaluations
and d was the number of parameters in the model. In our case, N = 1000 and d =
7, so the total sensitivity indices were computed from 16,000 model evaluations.
The variance-based sensitivity analysis was conducted using the SALib module
encoded in the Python programming language [65].
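For reference, the total sensitivity index can be written in its standard variance-based form. The notation here is an assumption and is not taken from the original text: Y denotes a performance objective, X_i the ith parameter, and X_{~i} all parameters except the ith. The last line restates the number of model evaluations given above.

```latex
\[
S_{T_i} = 1 - \frac{\mathrm{Var}_{X_{\sim i}}\left( \mathbb{E}_{X_i}\left[ Y \mid X_{\sim i} \right] \right)}{\mathrm{Var}(Y)}
        = \frac{\mathbb{E}_{X_{\sim i}}\left[ \mathrm{Var}_{X_i}\left( Y \mid X_{\sim i} \right) \right]}{\mathrm{Var}(Y)}
\]
\[
N(2d + 2) = 1000 \times (2 \cdot 7 + 2) = 16{,}000
\]
```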
3.4.7 Potential alternative optimal metabolic flux solutions.
We identified potential alternative optimal flux distributions by performing single
and pairwise reaction group knockout simulations. Reaction group knockouts
were simulated by setting the flux bounds for all the reactions involved in a group
to zero and then maximizing the translation rate. We grouped reactions in the
cell-free network into 19 subgroups [181]. We computed the difference (l2-norm)
for CAT productivity in the presence and absence of pairwise reaction knockouts.
Simultaneously, we computed the difference in the flux distribution (l2-norm)
for each pairwise reaction knockout compared to the flux distribution with no
knockouts. Those solutions with the same or similar productivity but large changes
in the metabolic flux distribution represent alternative optimal solutions.
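The screening logic described above can be sketched as follows. The productivity values, flux vectors, group labels, and the two tolerances are hypothetical; the text does not state the thresholds used to decide that productivity was "the same or similar" or that the flux change was "large".

```python
import math


def l2_distance(a, b):
    """Euclidean (l2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def alternative_optima(base_prod, base_flux, knockouts, prod_tol, flux_tol):
    """Return knockout labels with similar productivity but a shifted flux distribution."""
    hits = []
    for label, (prod, flux) in knockouts.items():
        same_productivity = abs(prod - base_prod) <= prod_tol
        different_flux = l2_distance(flux, base_flux) >= flux_tol
        if same_productivity and different_flux:
            hits.append(label)
    return hits


# Hypothetical results: (productivity, flux vector) per pairwise group knockout.
base_productivity = 10.0
base_flux = [1.0, 2.0, 3.0]
knockouts = {
    "group1+group2": (10.0, [1.0, 2.0, 3.0]),   # nothing changed
    "group1+group3": (9.99, [3.0, 0.0, 3.0]),   # same productivity, rerouted flux
    "group2+group3": (2.0, [0.0, 0.0, 0.0]),    # productivity lost
}
print(alternative_optima(base_productivity, base_flux, knockouts,
                         prod_tol=0.05, flux_tol=1.0))
```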
3.5 Acknowledgements
This study was supported by the National Science Foundation (MCB-1411715) and
the National Science Foundation Graduate Research Fellowship (DGE-1333468)
to N.H. This study was also supported by an award from the US Army and
Systems Biology of Trauma Induced Coagulopathy (W911NF-10-1-0376) to J.V. for
the support of M.V. Lastly, this work was also supported by the Center on the
Physics of Cancer Metabolism through Award Number 1U54CA210184-01 from the
National Cancer Institute. The content is solely the responsibility of the authors and
does not necessarily represent the official views of the National Cancer Institute or
the National Institutes of Health.
CHAPTER 4
ABSOLUTE QUANTIFICATION OF CELL-FREE PROTEIN SYNTHESIS
METABOLISM BY REVERSED-PHASE LIQUID
CHROMATOGRAPHY-MASS SPECTROMETRY
4.1 Abstract
1 Cell-free protein synthesis (CFPS) is a widely used research tool in systems and
synthetic biology; however, if CFPS is to become a mainstream technology for appli-
cations such as point-of-care manufacturing, we must understand the performance
limits of these systems. Toward this question, we developed a robust protocol to
quantify 40 compounds involved in glycolysis, the pentose phosphate pathway,
the tricarboxylic acid cycle, energy metabolism and cofactor regeneration in CFPS
reactions. The method uses internal standards tagged with 13C-aniline, while com-
pounds in the sample are derivatized with 12C-aniline. The internal standards and
sample were mixed and analyzed by reversed-phase liquid chromatography-mass
spectrometry (LC/MS). The co-elution of compounds eliminated ion suppression,
allowing the accurate quantification of metabolite concentrations over 2-3 orders
of magnitude where the average correlation coefficient was 0.988. Five of the forty
compounds were untagged with aniline, however they were still detected in the
CFPS sample and quantified with a standard curve method. The chromatographic run
takes approximately 10 minutes to complete. In summary, we developed a fast,
robust method to separate and accurately quantify 40 compounds involved in
CFPS in a single LC/MS run. Taken together, the method is a robust and accurate
approach to characterize cell free metabolism, so that ultimately, we can understand
and improve the yield, productivity and energy efficiency of cell free systems.
1 Adapted with permission from Vilkhovoy M, Dai D, Vadhin S, Abhinav A, and Varner
JD, "Absolute quantification of cell-free protein synthesis metabolism by reversed-phase liquid
chromatography-mass spectrometry" (2019) Journal of Visualized Experiments.
4.2 Introduction
Cell-free protein synthesis has become a widely used tool in systems and syn-
thetic biology, and a promising technology for point-of-use manufacturing of
biomolecules. Cell-free systems offer many advantages compared to in vivo pro-
cesses, such as direct access to metabolites and the biosynthetic machinery without
the interference of a cell wall or the complications associated with cell growth [70].
However, a fundamental understanding of the performance limits of cell free processes
has been lacking. High-throughput methods for metabolite quantification
are valuable because they help characterize metabolism, deepen our understanding
of these systems, and are critical to the construction of robust
metabolic computational models useful in process optimization [181, 180, 71].
Common methods used to determine metabolite concentrations include Nuclear
Magnetic Resonance (NMR), Fourier transform-infrared spectroscopy (FT-IR),
enzyme-based assays, and mass spectrometry (MS)[61, 36, 144, 170]. However,
these methods are often limited by their inability to efficiently measure multiple
compounds at once and sample size requirements. For example, enzyme-based
assays can often only be used to quantify a single compound in a run, and are lim-
ited when the sample size is small, such as in cell-free protein synthesis reactions
(typically run on a 10-15 µL scale). Meanwhile, NMR requires a high abundance
of metabolites for detection and quantification[36]. To address these shortcomings,
chromatography methods in tandem with mass spectrometry (LC/MS) provide
several advantages, including sensitivity and the capability of measuring multiple
species simultaneously[40]; however, the analytical complexity increases consid-
erably with the number and diversity of species being measured. It is important,
therefore, to develop methods that fully realize the high-throughput potential of
LC/MS systems. Compounds in a sample are separated by liquid chromatography
and identified through mass spectrometry. The signal of the compound depends on
its concentration and ionization efficiency, where the ionization can vary between
compounds and may also depend on the sample matrix.
Achieving the same ionization efficiency between the sample and standards is
a challenge to using LC/MS to quantify analytes. Further, quantification becomes
more challenging with metabolite diversity due to signal splitting and heterogene-
ity in proton affinity and polarity[75]. Lastly, the co-eluting matrix of the sample
can also affect the ionization efficiencies of the compounds. To address these issues,
metabolites can be chemically derivatized, increasing the separation resolution,
and the sensitivity and detection by the LC/MS system, while simultaneously
decreasing signal splitting in some cases[75, 74]. Chemical derivatization works by
tagging specific functional groups of metabolites to adjust their physical properties
like charge or hydrophobicity to increase ionization efficiency[74]. Various tagging
agents can be used to target different functional groups like amines, hydroxyls,
phosphates, carboxylic acids, etc. Aniline, one such derivatization agent, targets
multiple functional groups at once, and adds a hydrophobic component into hy-
drophilic molecules, increasing their separation resolution and signal[191]. To
address the co-eluting matrix ion suppression effect, Yang and coworkers devel-
oped a technique based on Group Specific Internal Standard Technology (GSIST)
labeling where standards are tagged with 13C aniline isotopes and mixed with
the sample[191, 77]. The metabolite and corresponding internal standard have the
same ionization efficiency since they co-elute, and their intensity ratio can be used
to quantify the concentration in the experimental sample.
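The intensity-ratio calculation described above can be sketched as a one-line function. The function name and the peak intensities are hypothetical. The sketch assumes the sample and standard are mixed at equal volumes, so that both are diluted by the same factor and the ratio relates the pre-mix sample concentration directly to the pre-mix standard concentration.

```python
def quantify(sample_intensity, standard_intensity, standard_conc):
    """Estimate sample concentration as (A_x / A_std) * C_std."""
    if standard_intensity <= 0:
        raise ValueError("internal standard intensity must be positive")
    return sample_intensity / standard_intensity * standard_conc


# Hypothetical peak intensities for one metabolite and its 13C-tagged standard.
conc = quantify(sample_intensity=2.4e5, standard_intensity=1.2e5,
                standard_conc=40.0)  # standard at 40 µM
print(f"{conc:.1f} µM")
```

Because the labeled standard and the analyte co-elute, any matrix effect on ionization is expected to scale both intensities equally and therefore cancel in the ratio.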
In this study, we developed a protocol to detect and quantify 40 compounds
involved in glycolysis, the pentose phosphate pathway, the tricarboxylic acid cycle,
energy metabolism and cofactor regeneration in cell-free protein synthesis reactions.
The method is based on the GSIST approach, where we used 12C-aniline and 13C-
aniline to tag, detect, and quantify metabolites using reversed-phase LC/MS. The
linear range of all compounds spanned 2-3 orders of magnitude with an average
correlation coefficient of 0.988. In conjunction, we used a commercially available
method by Waters to tag, detect and separate all 20 amino acids in the cell-free
extract. This method had a linear range for 2 orders of magnitude and an average
correlation coefficient of 0.999. Thus, the method is a robust and accurate approach
to interrogate cell free metabolism, and possibly whole-cell extracts.
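[Figure 4.1 schematic. The de-proteinized CFPS sample is labeled with 12C-aniline and the standards are labeled with 13C-aniline; equal volumes of the 12C sample and 13C standards are combined and analyzed by LC-MS (intensity versus m/z and intensity versus time). Quantification: C_x = (A_x / A_std) C_std.]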
Figure 4.1: Schematic of workflow for aniline tagging. The cell-free protein syn-
thesis reaction is de-proteinized and tagged with 12C-aniline, while a standard
stock mixture is tagged with 13C-aniline. Both mixtures are then mixed at a 1:1
volumetric ratio and analyzed by LC/MS.
4.3 Results
4.3.1 Aniline tagged metabolites
As a proof-of-concept, we used the protocol to quantify metabolites in myTXTL,
a commercially available E. coli based CFPS system (Arbor Biosciences) express-
ing green fluorescent protein (GFP). The CFPS reaction (14µL) was quenched
and de-proteinized with ethanol. The CFPS sample was then tagged with 12C-
aniline, while standards were tagged with 13C-aniline. The tagged sample and
standards were then combined and injected into the LC/MS (Fig. 4.1). The protocol
detected and quantified 40 metabolites involved in central carbon and energy
metabolism. Most were quantified using internal standards, while the 5 metabolites
that were not tagged with aniline were quantified with a standard curve (Fig. 4.2 and Table
4.1). These metabolites spanned several chemical classes: phosphorylated sugars,
phosphocarboxylic acids, carboxylic acids, nucleotides, and
cofactors. The derivatization with aniline introduced a hydrophobic moiety into
hydrophilic molecules which facilitated more effective separation using reversed-
phase chromatography[191]. In addition, the method enabled the separation of
structural isomer pairs such as glucose 6-phosphate and fructose 6-phosphate in a
single LC/MS run. Each compound’s mass over charge (m/z) ratio and retention
time were identified prior to the experiment by injecting 1mM of one compound at
a time and comparing the mass spectrum to the blank (Table 4.2).
[Figure 4.2 plot: normalized intensity (A.U., 0.00 to 1.00) versus time (3.75 to 10.50 min). Peaks are numbered 1 to 40, matching the peak numbers in Table 4.1.]
Figure 4.2: Mass chromatogram from a single LC/MS run of a 40µM standard
mixture of 40 metabolites. Peaks were identified by their retention time and m/z
values for each compound. Complete compound names and their abbreviations
are listed in Table 4.1.
The limit of detection and range of linearity for all compounds were estimated
by producing a standard curve that ranged from 0.10 µM to 400 µM (Table 4.1).
The average correlation coefficient (R²) for all compounds was 0.988 and most
compounds had a linear range of three orders of magnitude. Three compounds had
notable saturation effects, especially alpha-ketoglutarate, which had a linear range
from 0.1 µM to 25 µM. Isocitrate and citrate also had saturation effects above 100
µM.
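A standard curve of this kind can be fit by ordinary least squares. The sketch below is illustrative only: the function names are hypothetical, the calibration points are synthetic (an exactly linear response, not measured data), and the original analysis software is not specified in the text.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept; also returns R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


def to_concentration(signal, slope, intercept):
    """Invert the standard curve to estimate concentration from a signal."""
    return (signal - intercept) / slope


# Synthetic calibration points (µM, arbitrary intensity units).
conc = [0.1, 1.0, 10.0, 100.0, 400.0]
signal = [2.0 * c + 1.0 for c in conc]

slope, intercept, r2 = fit_line(conc, signal)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, R^2 = {r2:.3f}")
print(f"estimated concentration = {to_concentration(201.0, slope, intercept):.1f} µM")
```

For a compound that saturates, such as alpha-ketoglutarate above 25 µM, only the calibration points within the linear range would be passed to the fit.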
4.3.2 Amino Acid Analysis
As a proof-of-concept, we applied a commercially available protocol (Waters Corp.)
to quantify amino acids in myTXTL, a commercially available E. coli based CFPS
system (Arbor Biosciences) expressing green fluorescent protein (GFP). The CFPS
reaction (14µL) was quenched and de-proteinized with ethanol. The de-proteinized
sample was then tagged with AccQ-Tag Ultra Derivatization Kit (Waters Corp),
separated by reverse-phase liquid chromatography and detected with a TUV at
260nm (Fig. 4.3). The amino acid hydrolysate standard contained 17 of the 20 amino
acids. The stock mixture was supplemented with the three
missing amino acids: L-glutamine, L-asparagine, and L-tryptophan at the same
concentration as the other amino acids. The linear range extended from the limit of
detection (0.781 µM) to 50 µM with an average correlation
coefficient of 0.999 (Table 4.3). The only exception was L-cysteine which had a
linear range of 0.391 to 25 µM with a correlation coefficient of 0.999. L-cysteine
Table 4.1: Each compound’s corresponding limit of detection, range of linearity
and correlation coefficient identified from standard curves.
Peak Metabolite Abbreviation KEGG ID Limit of Detection (µM) Limit of Linear Range (µM) R²
1 Glycerol 3-phosphate Gly3P C00093 0.1 400 0.995
2 Nicotinamide adenine dinucleotide NAD C00003 0.39 400 0.993
3 Glucose GLC C00031 0.1 400 0.997
4 Sedoheptulose 7-phosphate S7P C05382 0.16 400 0.988
5 Fructose 6-phosphate F6P C00085 0.1 400 0.986
6 Guanosine monophosphate GMP C00144 0.39 100 0.992
7 Ribulose 5-phosphate RL5P C00199 0.39 400 0.996
8 Cytidine monophosphate CMP C00055 0.1 100 0.992
9 Lactate LAC C00186 0.1 400 0.988
10 Adenosine monophosphate AMP C00020 0.1 100 0.992
11 Uridine monophosphate UMP C00105 0.1 100 0.997
12 Nicotinamide adenine dinucleotide phosphate NADP C00006 0.34 400 0.950
13 3-Phosphoglyceric acid 3PG C00197 0.1 100 0.996
14 Cytidine diphosphate CDP C00112 0.39 400 0.997
15 Guanosine diphosphate GDP C00035 1.5625 400 0.984
16 Adenosine diphosphate ADP C00008 0.39 400 0.995
17 Uridine diphosphate UDP C00015 0.39 400 0.991
18 Flavin adenine dinucleotide FAD C00016 0.1 400 0.958
19 Fructose 1,6-bisphosphate F16P C05378 0.39 400 0.989
20 Gluconate 6-phosphate 6PG C00345 0.39 400 0.989
21 Nicotinamide adenine dinucleotide reduced NADH C00004 0.39 100 0.972
22 Glucose 6-phosphate G6P C00668 0.1 400 0.984
23 Ribose 5-phosphate R5P C00117 0.39 100 0.999
24 Erythrose 4-phosphate E4P C00279 0.39 400 0.979
25 Cytidine triphosphate CTP C00063 6.25 100 0.998
26 Guanosine triphosphate GTP C00044 6.25 100 0.993
27 Oxaloacetate OAA C00036 0.56 400 0.997
28 Alpha-ketoglutarate aKG C00026 0.1 25 0.979
29 Uridine triphosphate UTP C00075 1.5625 400 0.998
30 Adenosine triphosphate ATP C00002 1.5625 400 0.991
31 Fumarate FUM C00122 1.5625 100 0.999
32 Pyruvate PYR C00022 0.39 400 0.993
33 Malate MAL C00149 0.1 400 0.991
34 D-glyceraldehyde 3-phosphate GAP C00118 0.1 100 0.974
35 Acetyl-coenzyme A ACA C00024 0.1 100 0.991
36 Nicotinamide adenine dinucleotide phosphate reduced NADPH C00005 0.14 100 0.990
37 Phosphoenolpyruvate PEP C00074 0.1 100 0.962
38 Succinate SUCC C00042 0.1 320 0.999
39 Isocitrate ICIT C00311 0.39 100 0.998
40 Citrate CIT C00158 0.1 100 0.981
Table 4.2: Each compound’s corresponding peak number, retention time, m/z
value for 12C, 13C, and unlabeled, cone voltage, and MS species.
Peak Metabolite KEGG ID Retention Time (min) 12C m/z 13C m/z Unlabeled m/z CV MS Species
1 Gly3P C00093 3.85 - - 153 10 M – H2O – H
2 NAD C00003 3.96 - - 698 10 M + Cl – H
3 GLC C00031 4.06 289.9 296 - 15 M + A + Cl – H
4 S7P C05382 5.41 364 370 - 10 M + A – H
5 F6P C00085 5.48 334 340 - 10 M + A – H
6 GMP C00144 5.57 437.05 443 - 10 M + A – H
7 RL5P C00199 5.58 304 310 - 10 M + A – H
8 CMP C00055 5.59 397.09 403 - 10 M + A – H
9 LAC C00186 5.77 164.05 170 - 10 M + A – H
10 AMP C00020 5.85 421.1 427.1 - 10 M + A – H
11 UMP C00105 5.88 398.07 404 - 10 M + A – H
12 NADP C00006 6.39 - - 724 10 M – H2O – H
13 3PG C00197 6.63 242 248.06 - 15 M + A – H2O – H
14 CDP C00112 6.72 477 483 - 10 M + A – H
15 GDP C00035 6.87 517 523 - 10 M + A – H
16 ADP C00008 6.94 501 507 - 10 M + A – H
17 UDP C00015 6.97 478 484 - 10 M + A – H
18 FAD C00016 7.03 - - 784.15 15 M – H
19 F16P C05378 7.1 395.95 402.1 - 10 M + A – H2O – H
20 6PG C00345 7.11 425.1 437 - 10 M + 2A – H
21 NADH C00004 7.23 633.13 639.08 - 10 M + A + H2O – nicotinamide – H
22 G6P C00668 7.32 409.1 421.1 - 10 M + 2A – H
23 R5P C00117 7.54 379.1 391.1 - 15 M + 2A – H
24 E4P C00279 7.71 348.9 361 - 10 M + 2A – H
25 CTP C00063 7.84 557 563 - 5 M + A – H
26 GTP C00044 7.93 597 603 - 5 M + A – H
27 OAA C00036 7.94 281 293 - 25 M + 2A – H
28 aKG C00026 7.95 295 307.1 - 15 M + 2A – H
29 UTP C00075 7.97 558 564 - 10 M + A – H
30 ATP C00002 8.03 581 587 - 15 M + A – H
31 FUM C00122 8.09 265 277.1 - 10 M + 2A – H
32 PYR C00022 8.09 162 168 - 25 M + A – H
33 MAL C00149 8.09 283.06 295.15 - 10 M + 2A – H
34 GAP C00118 8.09 319 331.1 - 5 M + 2A – H
35 ACA C00024 8.16 - - 790 10 M – H2O – H
36 NADPH C00005 8.23 694.92 700.82 - 10 M + A – nicotinamide – H
37 PEP C00074 8.28 317 329.1 - 20 M + 2A – H
38 SUCC C00042 8.64 267.07 279.1 - 15 M + 2A – H
39 ICIT C00311 10.13 398 416 - 10 M + 3A – H2O – H
40 CIT C00158 10.46 416.1 434.06 - 20 M + 3A – H
A: represents aniline group under MS Species
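The m/z values in Table 4.2 are consistent with the number of aniline groups listed under MS Species: the 13C m/z exceeds the 12C m/z by about 6 units per aniline group. The sketch below uses that observation to infer the tag count from the mass shift. It is an illustration only; the function name is hypothetical, and the shift of 6 m/z units per tag is an assumption (it corresponds to six labeled carbons per aniline and singly charged ions), which this section does not state explicitly.

```python
LABEL_SHIFT = 6.0  # assumed m/z difference per aniline tag (13C versus 12C)


def tag_count(mz_12c, mz_13c, shift_per_tag=LABEL_SHIFT):
    """Infer the number of aniline tags from the 12C/13C m/z difference."""
    return round((mz_13c - mz_12c) / shift_per_tag)


# (12C m/z, 13C m/z) pairs copied from Table 4.2.
ROWS = {
    "GLC": (289.9, 296.0),     # listed as M + A + Cl - H
    "NADH": (633.13, 639.08),  # listed as M + A + H2O - nicotinamide - H
    "G6P": (409.1, 421.1),     # listed as M + 2A - H
    "CIT": (416.1, 434.06),    # listed as M + 3A - H
}

for name, (mz_12c, mz_13c) in ROWS.items():
    print(name, tag_count(mz_12c, mz_13c))
```

A check of this kind can help confirm that the 12C and 13C peaks assigned to one metabolite carry the same number of tags.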
[Figure 4.3 plot: normalized intensity (A.U., 0.00 to 1.00) versus time (2.00 to 8.00 min).]
Figure 4.3: Amino acid chromatogram tagged and separated by reverse-phase
liquid chromatography and detected with a TUV at 260nm. Peaks were identified
by their retention time.
had a lower limit of linear range since its concentration was half that of the other
amino acids in the amino acid hydrolysate standard mixture. Amino acids in the
sample were identified by comparing their retention times to those of the standards and
quantified by the standard curve method.
[Figure 4.3 peak labels: NH3, His, Asn, Ser, Gln, Arg, Gly, Asp, Glu, Thr, Ala, Pro, Derivatization, Cys, Lys, Tyr, Val, Met, Ile, Leu, Phe, Trp.]
Table 4.3: Each amino acid’s retention time separated by reverse-phase liquid
chromatography and detected by TUV at 260nm with the corresponding limit of
detection, linear range, and correlation coefficient.
Amino Acid Abbreviation KEGG ID Retention Time (min) Limit of Detection (µM) Limit of Linear Range (µM) R²
L-histidine His C00135 2.565 0.781 50 0.999
L-asparagine Asn C00152 2.893 0.781 50 0.999
L-serine Ser C00065 3.694 0.781 50 0.999
L-glutamine Gln C00064 3.788 0.781 50 0.999
L-arginine Arg C00062 3.92 0.781 50 0.999
Glycine Gly C00037 4.082 0.781 50 0.999
L-aspartate Asp C00049 4.500 0.781 50 0.999
L-glutamate Glu C00025 5.009 0.781 50 0.999
L-threonine Thr C00188 5.363 0.781 50 0.999
L-alanine Ala C00041 5.834 0.781 50 0.999
L-proline Pro C00148 6.419 0.781 50 0.999
L-cysteine Cys C00097 7.192 0.391 25 0.999
L-lysine Lys C00047 7.250 0.781 50 0.999
L-tyrosine Tyr C00082 7.501 0.781 50 0.999
L-methionine Met C00073 7.611 0.781 50 0.999
L-valine Val C00183 7.680 0.781 50 0.999
L-isoleucine Ile C00407 8.340 0.781 50 0.999
L-leucine Leu C00123 8.438 0.781 50 0.999
L-phenylalanine Phe C00079 8.573 0.781 50 0.999
L-tryptophan Trp C00078 8.629 0.781 50 0.999
Table 4.4: Each compound’s retention time and mass over charge ratio with the
corresponding limit of detection, linear range, and correlation coefficient.
Nucleotide Sugar Abbreviation Retention Time (min) m/z Limit of Detection (µM) Limit of Linear Range (µM) R²
CMP-Sialic Acid CMP-Neu5AC 1.562 613.10 0.2 20 0.999
GDP-D-Mannose GDP-D-Man 1.656 604.01 0.2 20 0.999
UDP-a-D-Galactose UDP-a-D-Gal 1.670 564.96 0.2 20 0.999
UDP-N-acetyl-D-glucosamine/galactosamine UDP-Hex 1.671 606.00 0.2 20 0.996
4.3.3 Nucleotide charged sugars
We developed a protocol for the detection and quantification of five nucleotide
charged sugars (Fig. 4.4). Nucleotide charged sugars are important precursors
for glycoproteins which are products of interest to be produced in CFPS [78]. The
retention time and mass over charge ratio for each compound were determined
individually from standards. The range from 0.2 to 20 µM had a correlation coefficient
of 0.999 for all compounds except UDP-Hex, which had a correlation coefficient of
0.996 (Table 4.4). Three of the five nucleotide sugars (CMP-Sialic Acid, GDP-D-Mannose,
and UDP-a-D-Galactose) had unique mass over charge ratios that allowed
for their detection and quantification. In contrast, UDP-N-acetyl-D-glucosamine and
UDP-N-acetyl-D-galactosamine had the same retention time of 1.671 minutes and the same m/z
of 606.0, thus they were not distinguishable for individual quantification. Due
to this, the compounds were mixed at a 1:1 ratio to be used for quantification in
biological samples. This protocol has been used to determine the corresponding
concentrations of the nucleotide sugars in mammalian cell lines (intracellular
levels) and from E. coli lysate (data not shown).
[Figure 4.4 plot: normalized intensity (A.U., 0.00 to 1.00) versus time (0.00 to 3.60 minutes). Labeled peaks: CMP-Neu5Ac, GDP-D-Man, UDP-a-D-Gal, UDP-Hex.]
Figure 4.4: Nucleotide charged sugars chromatogram separated by reverse-phase
liquid chromatography and detected by mass spectrometry according to each
compound's mass over charge ratio. Peaks were identified by their retention time
and selective ion recording.
4.4 Discussion
Cell-free systems have no cell wall, thus there is direct access to metabolites and
the biosynthetic machinery without the need for complex sample preparation.
Despite this, very little work has been done to develop thorough and
robust protocols to quantitatively interrogate cell-free reaction systems. In this
study, we developed a fast, robust method to quantify metabolites in cell-free
reaction mixtures and potentially in whole-cell extracts. Individual quantification
of metabolites in complex mixtures, such as those found in cell-free reactions,
or whole-cell extracts, is challenging for several reasons. Central amongst these
reasons is chemical diversity. The array of functional groups simultaneously
present in these mixtures, such as carboxylic acids, amines, phosphates, hydroxyls,
etc., greatly increases the analytical complexity. To circumvent this, we used
an aniline derivatization method in combination with 13C internal standards to
introduce hydrophobic components to the metabolite mixtures. Using this method,
we robustly detected and quantified 40 metabolites in a cell-free reaction in a single
LC/MS run. While we demonstrated this technique in a cell-free reaction mixture,
it could also likely be applied to whole-cell extracts, thus potentially allowing
the absolute quantification of intracellular metabolite concentrations. The latter
application has relevance to a variety of important questions in biotechnology and
human health.
The method presented here was based on a previous technique (GSIST) that
was applied to whole-cell extracts of the yeast S. cerevisiae[191, 77]. In this study,
we expanded upon which compounds could be detected and quantified to include
all 12 nucleotides (xMP, xDP, xTP, where x is A, C, G and U). Addition of these
compounds could have important biological implications. For example, these
nucleotides are heavily involved in transcription and translation, which
are among the central processes of interest in CFPS applications, and more generally
the compounds are important in a variety of physiological functions. In addition,
we were able to detect acetic acid which is an important metabolite when examining
overflow metabolism. However, we did not include it in the study because there
was a significant reduction of signal in multiple compounds, especially NADH
and NADPH, when acetic acid was added to the standard mixture. Acetic acid
had a high limit of detection of 612 µM; thus, at these high levels it had a negative
effect on the other metabolites’ signals. Despite this, acetic acid can still be detected
and quantified in samples by creating a standard curve with just acetic acid in the
vial. Acetic acid had an m/z value of 134.0, a retention time of 5.78 minutes, and a
linear range from 612 µM to 5000 µM (R² = 0.986) when tagged with 12C-aniline.
The remaining metabolites did not alter each other’s ion signal and represent a
comprehensive mixture to characterize CFPS metabolism.
Taken together, we developed a fast, robust protocol for the characterization
and absolute quantification of 40 compounds involved in glycolysis, the pentose
phosphate pathway, the tricarboxylic acid cycle, energy metabolism and cofactor
regeneration in CFPS reactions. The method relied on internal standards tagged
with 13C-aniline, while the sample was tagged with 12C-aniline. The internal
standards and sample compounds co-eluted, which eliminated ion-suppression effects
and enabled accurate quantification of individual metabolites in complex
metabolite mixtures. We identified a total of 40 compounds (41, if including acetic
acid) that can be detected and quantified in a cell-free reaction mixture; however,
the list of metabolites could be further expanded and adjusted towards the par-
ticular biochemical process of interest. Thus, the method provides a robust and
accurate approach to characterize cell-free metabolism, which is potentially critical
to improving the yield, productivity and energy efficiency of cell-free processes.
4.5 Materials and Methods
4.5.1 Aniline derivatization
Materials and Reagents: All metabolite standards, aniline,
N-(3-dimethylaminopropyl)-N’-ethylcarbodiimide hydrochloride (EDC), tributylamine (TBA), triethylamine
(TEA), HPLC grade acetonitrile, and HPLC grade water were purchased from
Sigma-Aldrich (St. Louis, MO). Sedoheptulose 7-Phosphate was purchased from
Carbosynth (Compton, UK). All materials and equipment are listed in Table A.1 in
the appendix.
LC-MS: The UPLC-ESI-MS system consisted of a UPLC system (Acquity H-
Class, Waters) and an electrospray ionization (ESI) source mass spectrometer (QDA
detector, Waters). The system was controlled by Empower 3 software (Waters).
The autosampler was set at 10 °C. Separations were performed on an Acquity BEH
C18 Column (1.7 µm, 2.1 mm x 150 mm, Waters). The elution started from 95%
mobile phase A (5 mM TBA aqueous solution, adjusted to pH 4.75 with acetic acid)
and 5% mobile phase B (5 mM TBA in Acetonitrile), raised to 70% B in 10 minutes,
further raised to 100% B in 2 minutes, and then held at 100% B for 3 minutes and
returned to initial conditions over 1 minute and held for 9 minutes to re-equilibrate
the column. The flow rate was set at 0.3 mL/min with injection volume as 5 µL.
The column was preconditioned by pumping the starting mobile phase mixture for
10 minutes, followed by the gradient protocol specified above 3 times prior to any
injections. LC-ESI-MS chromatograms were acquired in negative ion mode under
the following conditions: capillary voltage of 10 V, dry temperature at 520 °C, and
an acquisition range of m/z 100-800. Selected ion recordings were specified for
each metabolite and are listed in Table 4.2.
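The gradient program described above can be written down as a small lookup table, which makes it easy to check the total run time and the mobile-phase composition at any point in the run. The following is a minimal sketch and not part of the original method: the names `GRADIENT`, `percent_b`, `run_time_min` and `solvent_per_run_ml` are hypothetical, and the ramps are assumed to be linear between the stated breakpoints.

```python
# Gradient breakpoints as (time in minutes, % mobile phase B), transcribed
# from the text. Linear ramps between breakpoints are an assumption.
GRADIENT = [
    (0.0, 5.0),     # start: 95% A / 5% B
    (10.0, 70.0),   # raised to 70% B in 10 minutes
    (12.0, 100.0),  # raised to 100% B in 2 minutes
    (15.0, 100.0),  # held at 100% B for 3 minutes
    (16.0, 5.0),    # returned to initial conditions over 1 minute
    (25.0, 5.0),    # held for 9 minutes to re-equilibrate the column
]


def percent_b(t_min, gradient=GRADIENT):
    """Return % mobile phase B at time t_min by linear interpolation."""
    if t_min < gradient[0][0] or t_min > gradient[-1][0]:
        raise ValueError("time is outside the gradient program")
    for (t0, b0), (t1, b1) in zip(gradient, gradient[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise ValueError("gradient breakpoints are not ordered")


def run_time_min(gradient=GRADIENT):
    """Total duration of one injection cycle in minutes."""
    return gradient[-1][0] - gradient[0][0]


def solvent_per_run_ml(flow_ml_per_min=0.3, gradient=GRADIENT):
    """Mobile phase consumed per injection at the stated flow rate."""
    return flow_ml_per_min * run_time_min(gradient)
```

Under this reading, one injection cycle lasts 25 minutes and consumes about 7.5 mL of mobile phase at 0.3 mL/min.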
Labeling protocol: A solution of 6.0 M 12C-aniline was prepared by combining
550 µL of aniline with 337.5 µL of water and 112.5 µL of 12 M hydrochloric acid
and vortexing. A solution of 6.0 M 13C-aniline was prepared by combining
250 mg of 13C-aniline with 132 µL of water and 44 µL of 12 M hydrochloric acid and
vortexing. Aniline solutions can be stored at 4 °C for up to 2 months. EDC at 200.0 mg/mL
was prepared freshly in HPLC grade water. A 50 µL standard solution with 35
standards was prepared in water at 40 µM. 5 µL of 13C-aniline was added to
the standard solution followed by 5 µL of 200 mg/mL EDC. The CFPS sample
was de-proteinized by the addition of 100% ice-cold ethanol at a 1:1 volumetric
ratio and centrifuged at 12,000 x g for 15 minutes at 4 °C. The supernatant was
transferred into a new centrifuge tube and 6 µL was used for aniline tagging. The
volume was brought up to 50 µL with water, and 5 µL of 12C-aniline and 5 µL of
200 mg/mL EDC were added to the reaction. Both sample and standard mixtures
were vortexed with gentle shaking at ambient temperature (approximately 22 °C) for 2 h. The
labeling reaction was stopped by the addition of 1.5 µL of triethylamine. The
mixture was centrifuged at 13,500 x g for 3 minutes. The supernatants of the sample
and the standard were combined at a 1:1 volumetric ratio in an autosampler vial
for injection into the LC-MS. The solution mixture was injected at 5 µL and the
12C-aniline-tagged m/z values were recorded. The sample was injected again at
the same volume and the 13C-aniline m/z values were recorded (Table 4.2). The
QDa detector is unable to record both the 12C and 13C m/z values at the same
time since it cannot handle that amount of data acquisition. Thus, the sample is
injected twice to record the sample intensities followed by the standard intensities.
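As a quick consistency check on the 12C-aniline recipe, the nominal molarity can be estimated from the pipetted volumes. This is a rough sketch under stated assumptions that do not come from the text: a molar mass of 93.13 g/mol and a density of about 1.022 g/mL for unlabeled aniline, and additive volumes on mixing. The function name `aniline_molarity` is hypothetical.

```python
ANILINE_MW_G_PER_MOL = 93.13      # assumed molar mass of unlabeled aniline
ANILINE_DENSITY_G_PER_ML = 1.022  # assumed density near room temperature


def aniline_molarity(aniline_ul, water_ul, hcl_ul):
    """Estimate aniline molarity (mol/L), assuming volumes are additive."""
    total_l = (aniline_ul + water_ul + hcl_ul) * 1e-6
    if total_l <= 0:
        raise ValueError("total volume must be positive")
    mass_g = (aniline_ul / 1000.0) * ANILINE_DENSITY_G_PER_ML
    moles = mass_g / ANILINE_MW_G_PER_MOL
    return moles / total_l


# Recipe from the text: 550 uL aniline + 337.5 uL water + 112.5 uL 12 M HCl.
molarity_12c = aniline_molarity(550.0, 337.5, 112.5)
```

With these assumed constants the recipe comes out at roughly 6.0 M, consistent with the stated concentration.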
Standard curve preparation: Prepare a series of dilutions in water of the un-
tagged metabolites (NAD, NADP, FAD, acetyl-CoA and glycerol 3-phosphate)
ranging from 0.4 to 400 µM with a volume of 50 µL. Add 5 µL of 12C-aniline and
5 µL of 200 mg/mL EDC and vortex at room temperature for 2 hours. Add 1.5 µL
of triethylamine and centrifuge at 13,500 x g for 3 minutes. Transfer the standard
into an autosampler vial and inject into the LC-MS. The untagged metabolites
follow the same procedure as the sample to replicate the sample matrix in order to
maintain a similar ionization efficiency.
Quantification of metabolites: The mass-chromatogram peak for each metabo-
lite is integrated and the area is used to quantify the amount in the sample by the
following equation:
$$C_{x,i} = \frac{A_{x,i}}{A_{std,i}} \, C_{std,i} \, D \qquad (4.1)$$
where $C_{x,i}$ is the concentration of the unknown sample for metabolite $i$, $A_{x,i}$ is
the integrated area of the unknown metabolite $i$, $A_{std,i}$ is the integrated area of the
internal standard of metabolite $i$, $C_{std,i}$ is the concentration of the internal standard
of metabolite $i$, and $D$ is the dilution factor.
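The internal-standard calculation of Eq. 4.1 can be sketched in a few lines. This is an illustrative helper, not the authors' code: the function name `quantify` and the example numbers are hypothetical, and the dilution factor is assumed to multiply the area ratio.

```python
def quantify(area_sample, area_standard, conc_standard_um, dilution_factor=1.0):
    """Concentration of a metabolite from its 12C/13C peak-area ratio.

    area_sample      -- integrated area of the 12C-tagged (sample) peak
    area_standard    -- integrated area of the 13C-tagged (standard) peak
    conc_standard_um -- known concentration of the internal standard (uM)
    dilution_factor  -- assumed to scale the result multiplicatively
    """
    if area_standard <= 0:
        raise ValueError("internal standard area must be positive")
    return area_sample / area_standard * conc_standard_um * dilution_factor


# Hypothetical numbers: the sample peak is half the standard peak, the
# standard is at 40 uM, and the sample was diluted two-fold.
example_um = quantify(1500.0, 3000.0, 40.0, dilution_factor=2.0)
```

Because the 12C and 13C forms co-elute, the area ratio is taken within a single retention window, which is what makes the ratio insensitive to matrix effects.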
Untagged metabolites are quantified by the standard curve method where
the integrated area of a standard is associated with the known concentration. A
standard curve is developed for the series of different concentrations and is used
to quantify the unknown amounts in the sample.
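The standard curve method can be sketched as an ordinary least-squares line through (concentration, area) pairs, which is then inverted for unknown samples. This is a minimal illustration under the assumption of a linear detector response; the function names and the synthetic data are hypothetical and no measured values are implied.

```python
def fit_standard_curve(concs, areas):
    """Least-squares fit of area = slope * conc + intercept; returns (slope, intercept, r2)."""
    n = len(concs)
    if n < 2 or n != len(areas):
        raise ValueError("need at least two matched (conc, area) pairs")
    mean_c = sum(concs) / n
    mean_a = sum(areas) / n
    sxx = sum((c - mean_c) ** 2 for c in concs)
    if sxx == 0:
        raise ValueError("concentrations must not all be equal")
    sxy = sum((c - mean_c) * (a - mean_a) for c, a in zip(concs, areas))
    slope = sxy / sxx
    intercept = mean_a - slope * mean_c
    ss_res = sum((a - (slope * c + intercept)) ** 2 for c, a in zip(concs, areas))
    ss_tot = sum((a - mean_a) ** 2 for a in areas)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return slope, intercept, r2


def concentration_from_area(area, slope, intercept):
    """Invert the fitted line to estimate an unknown concentration."""
    if slope == 0:
        raise ValueError("slope must be non-zero")
    return (area - intercept) / slope


# Synthetic, perfectly linear data spanning the 0.4 to 400 uM dilution range.
concs_um = [0.4, 4.0, 40.0, 400.0]
areas = [2.0 * c + 1.0 for c in concs_um]
slope, intercept, r2 = fit_standard_curve(concs_um, areas)
unknown_um = concentration_from_area(81.0, slope, intercept)
```

In practice the fit would be restricted to the linear range of each metabolite, and unknowns falling outside that range would be diluted and re-run rather than extrapolated.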
4.5.2 Amino acid derivatization
Amino Acid labeling protocol: A solution containing a mixture of 20 amino acids
is derivatized with a Waters AccQ-Tag Ultra amino acid analysis kit (Waters). The
sample is prepared by taking 10 µL of a mixture of 20 amino acids and adding
70 µL of a buffer solution (Waters) followed by 20 µL of a reagent (Waters). The
solution is then kept in a water bath at 55 °C for 10 minutes. The solution is then
separated by reverse-phase liquid chromatography with an Acquity Amide C18
Column (2.1 mm x 150 mm, Waters) and analyzed with a TUV detector at 260
nm. The gradient protocol is available from Waters Corporation. Amino acids are
detected and quantified based on known retention times (Fig. 4.3).
4.5.3 Nucleotide charge sugar detection
Nucleotide charge sugar protocol: Nucleotide charged sugars were purchased from
CarboSynth (Newbury, UK). Standards were dissolved in water individually and
injected into an UPLC-ESI-MS (Waters) to determine their corresponding retention
times and mass over charge ratios (m/z). The UPLC-ESI-MS system consisted of
a UPLC system (Acquity H-Class, Waters) and an electrospray ionization (ESI)
source mass spectrometer (QDA detector, Waters). The system was controlled by
Empower 3 software (Waters). The autosampler was set at 10 °C. Separations were
performed on an Acquity BEH C18 Column (1.7 µm, 2.1 mm x 150 mm, Waters).
Separations were performed on an Acquity BEH C18 Column (1.7 µm, 2.1 mm
x 50 mm, Waters). The elution started from 95% mobile phase A (5 mM TBA
aqueous solution, adjusted to pH 4.75 with acetic acid) and 5% mobile phase
B (5 mM TBA in Acetonitrile), raised to 57% B in 2 minutes, further raised to
100% B in 0.5 minutes, and then held at 100% B for 2 minutes and returned to
initial conditions over 0.1 minute and held for 4 minutes to re-equilibrate the
column. The flow rate was set at 0.6 mL/min with injection volume as 5 µL. The
column was preconditioned by pumping the starting mobile phase mixture for 10
minutes, followed by the gradient protocol specified above 2 times prior to any
injections. LC-ESI-MS chromatograms were acquired in negative ion mode under
the following conditions: capillary voltage of 10 V, dry temperature at 520 °C, and
an acquisition range of m/z 100-800. Selected ion recordings were specified for
each metabolite and are listed in Table 4.4.
4.6 Acknowledgments
The work described was supported by the Center on the Physics of Cancer
Metabolism through Award Number 1U54CA210184-01 from the National Cancer
Institute (https://www.cancer.gov/). The content is solely the responsibility of
the authors and does not necessarily represent the official views of the National
Cancer Institute or the National Institutes of Health. The funders had no role in
study design, data collection and analysis, decision to publish, or preparation of
the manuscript.
CHAPTER 5
AN INTEGRATED KINETIC CONSTRAINT-BASED MODEL OF E. COLI
CELL-FREE PROTEIN SYNTHESIS
5.1 Abstract
Cell-free protein expression has become a widely used research tool in systems and
synthetic biology, and a promising technology for biomanufacturing of proteins.
Cell-free protein synthesis relies on transcription and translation machinery to
produce a protein of interest. However, fueling this process requires
enzymes and biochemical reactions that are involved in complex metabolic pathways. Here
we use isotope labeling to measure absolute metabolite concentrations in an E. coli
based cell-free system for a batch reaction. We then integrate this information with
kinetic parameters, enzyme levels, enzyme activity assays, and kinetic descrip-
tions of transcription and translation in a constraint-based mathematical modeling
framework. The modeling framework predicts the production of mRNA and protein
along with the metabolic response to two oxidative phosphorylation inhibitors.
Flux estimations and experimental data reveal that the cell-free reaction has active
central carbon metabolism with glutamate powering the TCA cycle to provide
reduced ubiquinone for oxidative phosphorylation that sustains the batch reaction
for 16 hours.
5.2 Introduction
Cell-free protein synthesis (CFPS) is a widely used research tool in systems and
synthetic biology and a promising platform for manufacturing of proteins and
chemicals [196, 110, 94, 70, 78]. In the past decades, CFPS has been used to better
understand biochemical processes. For example, E. coli cell-free extracts were
used in the 1960s to decipher the sequencing of the genetic code [118, 128]. Today,
CFPS is gaining wide interest in metabolic engineering to circumvent significant
barriers in traditional in vivo systems [58]. Cell-free systems offer many advantages
for the study, manipulation and modeling of metabolism compared to in vivo
processes. However, both approaches still require mathematical modeling to help
better understand the metabolism that is occurring in these systems. Ultimately,
mathematical models can identify unintuitive strategies for the rational design of
strains and circuits to improve product yield and system efficiency [12, 182, 27].
Constraint-based approaches such as flux balance analysis (FBA), which use sto-
ichiometric reconstructions of microbial metabolism, have become standard tools
in systems biology and metabolic engineering [108]. Stoichiometric reconstruc-
tions have been expanded to include the integration of metabolism with detailed
descriptions of gene expression (ME-Model) [4, 107, 129] and protein structures
(GEM-PRO) [197, 30]. Constraint-based approaches model metabolism using the
biochemical stoichiometry and other constraints such as thermodynamic feasibility
[64, 62] under pseudo steady state conditions. Given these constraints, these
models have used linear programming [34] to predict productivity [176, 146, 181],
yield [176], mutant behavior [43], and growth phenotypes [129] for biochemi-
cal networks of varying complexity, including genome scale networks, using a
limited number of adjustable parameters. Currently, there are only a few math-
ematical models of CFPS that integrate metabolic pathways with transcription
and translation processes [71, 181]. In addition, experimental measurements of
absolute metabolite levels are required to build dynamic mathematical models of
metabolism that can describe and predict experimental data.
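For reference, the constraint-based formulation mentioned above is commonly written as a linear program. The notation below is the conventional one and is introduced here only for illustration: $S$ is the stoichiometric matrix, $v$ the vector of reaction fluxes, $c$ a vector of objective weights, and $v^{\min}_j$, $v^{\max}_j$ the lower and upper bounds on flux $j$. None of these symbols is taken from the original text.

```latex
\begin{align}
\max_{v} \quad & c^{\mathsf{T}} v \\
\text{subject to} \quad & S v = 0, \\
& v^{\min}_{j} \le v_{j} \le v^{\max}_{j} \quad \text{for all reactions } j .
\end{align}
```

The equality constraint expresses the pseudo steady state assumption (no net accumulation of balanced species), while the bounds are where additional information such as thermodynamic feasibility or enzyme capacity can enter.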
Cell-free systems allow for direct access to metabolites and the biosynthetic
machinery without the interference of a cell wall or the complications associated
with cell growth. However, comprehensive measurements of cell-free metabolism
have not been reported in literature with the exception of amino acids and a few
organic acids [83, 82, 81, 181]. A variety of methods exist for measuring metabolite
concentrations [23], most commonly liquid chromatography-mass spectrometry
(LC/MS). A complication of LC/MS systems is maintaining the same ionization
efficiency for samples and standards to obtain reliable absolute concentrations.
Here, we overcome this limitation with isotopically labeled standards based on an
aniline tagging technique [191]. Through this approach and additional analytical
techniques, we quantified 61 metabolites involved in central carbon and energy
metabolism.
In this study, we expand on our previous sequence specific constraint based
model and integrate kinetic turnover rates, enzyme levels, kinetic descriptions of
transcription/translation, enzyme activity assays and absolute metabolite levels
[Figure 5.1 graphic. Workflow panels: Construct network; Integrate kinetic parameters; Integrate enzyme levels; Constrain to metabolite fluxes; Ensemble modeling; CFPS analysis. Inset scatter plot: enzyme level from activity assays (mM) versus enzyme level from literature (mM), log-scale axes; labeled points include gdh, hk, ldh and ppc.]
Figure 5.1: Modeling framework of cell-free protein synthesis. The metabolic net-
work was adapted from Vilkhovoy and coworkers where transcription/translation
was integrated with metabolism. Maximum flux bound rates were formulated to be
a function of the turnover rate and enzyme abundance found to be present in CFPS
extract. Enzyme levels were validated for a subset of 15 reactions with enzyme
activity assays. Four of the enzymes were not reported in Garenne and coworkers
(grey boxes), but were found to be active with their corresponding enzyme activity
assays. The flux at each time step was estimated while being constrained
to metabolic measurements where data was present (62 species). Finally, the flux
calculation was sampled across an ensemble of 100 sets given experimental noise
and literature parameters. hk: hexokinase, gdh: glutamate dehydrogenase, ppc:
phosphoenolpyruvate carboxylase, sdh: succinate dehydrogenase.
to describe CFPS metabolism. Flux estimations and experimental data reveal that
oxidative phosphorylation is active in myTXTL and is coupled with central carbon
metabolism to power transcription/translation for the production of GFP for 16
hours. The mathematical framework described the perturbations in metabolic behavior
from the control when biochemical inhibitors were introduced into the reaction.
Taken together, we provide a modeling framework that describes and predicts
CFPS metabolism and can be potentially used to identify strategies toward cell-free
metabolic engineering applications.
5.3 Results
5.3.1 Integration of kinetic parameters, enzyme levels, and
metabolite concentrations
We integrated kinetic parameters, enzyme levels and metabolite concentrations
along with mechanistic descriptions of transcription and translation to describe
the time course of CFPS metabolism (Fig. 5.1). To this end, we adapted an earlier
stoichiometric reconstruction of CFPS [181] with 200 reactions (not including exchange
reactions) and 157 species that described glycolysis, the pentose phosphate
pathway, the TCA cycle, amino acid biosynthesis, and chorismate, purine and pyrimidine
metabolism. The network also described sequence specific descriptions of tran-
scription/translation including tRNA charging of amino acids. Mechanistic kinetic
rates for transcription and translation were derived following mass-action kinetics.
This mathematical framework has been previously used to predict protein syn-
thesis for green fluorescent protein (GFP) and chloramphenicol acetyltransferase
(CAT) under different promoters in two different CFPS systems. However, there
was a high uncertainty in flux estimations. To address this, we constrained each flux
to be a function of its turnover rate and enzyme level. We identified turnover
rates for each reaction from BRENDA and/or from Adadi and coworkers [3].
Enzyme abundance levels were identified for 104 reactions in our network from
Garenne and coworkers [53] in the myTXTL lysate. We validated enzyme concen-
trations for a subset of 15 reactions using our enzyme activity assays (Fig. 5.1).
Four of the enzymes including hexokinase (hk), glutamate dehydrogenase (gdh),
phosphoenolpyruvate carboxylase (ppc), and succinate dehydrogenase (sdh) were
not reported in Garenne and coworkers, but were found to be active in myTXTL
with their corresponding activity assay. All remaining enzymes not reported in
Garenne and coworkers were set to a median value of 50 nM. In addition, we
constrained the upper bound of the corresponding reaction with the experimental
enzyme activity level. Finally, we integrated absolute measurements of 63 species
for the timecourse of the CFPS reaction which included central carbon metabo-
lites, energy species and amino acids. Taken together, the modeling framework
with integrated kinetic parameters, enzyme levels, and metabolite concentrations
provided an accurate timecourse flux distribution of CFPS metabolism (Fig. 5.2).
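The way turnover rates and enzyme levels bound each flux can be illustrated with a short sketch. The functional form used here, an upper bound equal to the product of the turnover number and the enzyme concentration, is an assumption made for illustration, as is the choice to let an assay-derived rate replace the kinetic estimate when one is available. All names are hypothetical; only the 50 nM default comes from the text.

```python
SECONDS_PER_HOUR = 3600.0
DEFAULT_ENZYME_MM = 50e-6  # 50 nM expressed in mM, the default stated in the text


def kinetic_upper_bound(kcat_per_s, enzyme_mm=None):
    """Upper flux bound in mM/h, assumed to be kcat * [E]."""
    if enzyme_mm is None:
        enzyme_mm = DEFAULT_ENZYME_MM  # enzyme level not reported
    if kcat_per_s < 0 or enzyme_mm < 0:
        raise ValueError("kcat and enzyme level must be non-negative")
    return kcat_per_s * enzyme_mm * SECONDS_PER_HOUR


def flux_upper_bound(kcat_per_s, enzyme_mm=None, assay_rate_mm_per_h=None):
    """Use the measured assay rate when given (an assumption), else kcat * [E]."""
    if assay_rate_mm_per_h is not None:
        return assay_rate_mm_per_h
    return kinetic_upper_bound(kcat_per_s, enzyme_mm)


# Hypothetical reaction with kcat = 100 1/s and no reported enzyme level.
default_bound = flux_upper_bound(100.0)
```

For the hypothetical reaction above the bound works out to 18 mM/h; in the linear program these values would enter as the upper limits on the corresponding fluxes.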
Metabolism of the myTXTL system has been reported to rely on maltodextrin
and 3-phosphoglycerate (3PG) to provide energy resources for transcription and
translation. However, metabolic constraints and enzyme activity assays for gluta-
mate dehydrogenase reveal glutamate powers the TCA cycle along with succinate
dehydrogenase to provide energy support for oxidative phosphorylation. Previ-
ously, it was inconclusive whether oxidative phosphorylation was active in the
myTXTL system. Flux distributions show oxidative phosphorylation was active
throughout the CFPS reaction with high flux at 2 hours (Fig. 5.2A) and moderate
flux at 8 hours (Fig. 5.2B). Maltodextrin and 3PG activated the glycolysis pathway
and led to an accumulation of organic acids such as pyruvate and acetate (Fig. 5.6).
The accumulation of acetate relies on substrate level phosphorylation which pro-
[Figure 5.2 graphic. Panels: (a) Control (2 h); (b) Control (8 h). Each panel is a flux map of the central carbon network; scale bar: Flux (A.U.), 0 to 100.]
Figure 5.2: Mean flux distribution across an ensemble (N=100) for control. Fluxes
were determined by integrating kinetic parameters with enzyme levels and con-
straining to measurements of metabolites and enzyme activity levels where data
was available. (a) Flux distribution at 2 hours of CFPS reaction. (b) Flux distribution
at 8 hours of CFPS reaction. Fluxes were normalized to maltodextrin consumption
at t=0 hours.
vides an inefficient energy pathway when compared to oxidative phosphorylation.
At 4 hours of the CFPS reaction, metabolism switched toward pyruvate consumption
for valine synthesis, and valine accumulated in the media (Fig.
5.4). In conclusion, the timecourse flux distribution of CFPS revealed the system
relied on a mixture of aerobic and anaerobic processes to provide the necessary
energy requirements for transcription and translation. Cell-free protein synthesis
is a mixture of cytoplasmic extract and does not contain the necessary enzymatic
regulation of in vivo systems to exploit the optimal pathway given the system’s
Figure 5.3: Prediction of mRNA and protein levels in CFPS for control (blue),
DNP (red) and TTA (grey). (a) The mRNA levels of GFP were predicted with the
given modeling framework. (b) The protein abundance of GFP was predicted
for all three cases. The solid line denotes the mean of the ensemble (N=100), the
shaded region denotes the 95% confidence interval of the ensemble, the points
denote experimental measurements, and error bars denote the standard deviation
of experimental measurements.
environment.
5.3.2 Transcription/Translation is oxygen dependent
Transcription and translation processes were oxygen dependent in the myTXTL
CFPS system. Specific inhibitors of oxidative phosphorylation showed that respiration
was active and powered transcription and translation. When a cell-free reaction
was incubated with either of two inhibitors, thenoyltrifluoroacetone
(TTA), an electron transport inhibitor of Complex II, or 2,4-dinitrophenol (DNP),
a membrane gradient uncoupler, protein accumulation was significantly less than
that of the control (Fig. 5.3B). In addition, mRNA levels were not sustained for
Figure 5.4: Time course of amino acid levels in CFPS for control (blue), DNP (red)
and TTA (grey). Experimental amino acid fluxes constrained the mathematical
model of CFPS. The solid line denotes the mean of the ensemble (N=100), the
shaded region denotes the 95% confidence interval of the ensemble, the points
denote experimental measurements, and error bars denote the standard deviation
of experimental measurements.
the duration of the CFPS reaction with the inhibitors (Fig. 5.3A). This can be seen
with the depletion of CTP and GTP at approximately 4 hours of the CFPS reaction
for TTA (Fig. 5.7). In the case with DNP in the CFPS extract, mRNA levels
degraded substantially more slowly than in the case of TTA. This shows that the DNP-treated
reaction relied on substrate level phosphorylation to fuel transcription and translation, resulting
in a slightly higher GFP titer of 10.4 ± 0.8 µM, whereas TTA resulted in a titer
of 8.0 ± 1.0 µM. For the reaction with TTA, transcription and translation
relied on the nucleotides that were available in the media. This can be seen with
Figure 5.5: Time course of upper central carbon metabolite levels in CFPS for
control (blue), DNP (red) and TTA (grey). DNP showed exhaustion of maltose,
indicating maltodextrin depletion and thus high carbon utilization. Experimental
fluxes constrained the mathematical model of CFPS. The solid line denotes the
mean of the ensemble (N=100), the shaded region denotes the 95% confidence
interval of the ensemble, the points denote experimental measurements, and error
bars denote the standard deviation of experimental measurements.
the depletion of CTP and GTP at approximately 4 hours of the CFPS reaction (Fig.
5.7). In addition, the DNP treatment had a significantly higher accumulation of
acetate of 53 mM at 16 hours compared to the control with 39 mM and TTA with 27
mM. The high accumulation of acetate in the control further supports that myTXTL
relied on aerobic and anaerobic processes. For the control, mRNA levels were
maintained at a steady-state level of approximately 570 nM and GFP resulted in a
Figure 5.6: Time course of lower central carbon metabolite levels in CFPS for
control (blue), DNP (red) and TTA (grey). DNP heavily relied on substrate level
phosphorylation with high accumulation of acetate, whereas TTA had a high
abundance of lactate. Experimental fluxes constrained the mathematical model
of CFPS. The solid line denotes the mean of the ensemble (N=100), the shaded
region denotes the 95% confidence interval of the ensemble, the points denote
experimental measurements, and error bars denote the standard deviation of
experimental measurements.
titer of 21.3 ± 1.6 µM. The higher accumulation of GFP for the control compared to
the oxidative phosphorylation inhibitors supports that oxidative phosphorylation
was active in myTXTL and was able to sustain transcription and translation with
an activated metabolism of central carbon pathways.
Figure 5.7: Time course of energy species levels in CFPS for control (blue), DNP
(red) and TTA (grey). Both DNP and TTA exhausted GTP within 4 hours of the
reaction which is required for translation. Experimental fluxes constrained the
mathematical model of CFPS. The solid line denotes the mean of the ensemble
(N=100), the shaded region denotes the 95% confidence interval of the ensemble,
the points denote experimental measurements, and error bars denote the standard
deviation of experimental measurements.
5.3.3 Kinetic descriptions with metabolic constraints predict
metabolic behavior of oxidative phosphorylation inhibitors
The integrated modeling framework of CFPS predicted the dynamic behavior of
mRNA and protein production for an aerobic reaction (control) and for reactions
incubated with DNP and TTA. In order to capture the mRNA timecourse behavior,
the transcription rate was formulated as a function of saturation kinetics of the
reactants involved, which included ATP, CTP, GTP, UTP and the concentration
of the plasmid for GFP. Given this formulation, the model captured the mRNA
level for the first 12 hours of the CFPS reaction for the control; however, it failed to
capture the mRNA abundance at the 16 hour time point. Since CTP, GTP, and UTP
essentially decline to 0 mM toward the end of the CFPS reaction, transcription is
halted and mRNA is degraded. However, the experimental system showed mRNA
maintaining its steady state value at 16 hours. Despite this, the ensemble captured
GFP production for the entire CFPS reaction.
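One way to picture a transcription rate built from saturation kinetics is as a maximum rate multiplied by a saturation term for each reactant. The sketch below is an assumed functional form for illustration only; the product of Michaelis-Menten-like terms, the parameter names and the numerical values are hypothetical and are not taken from the model described here.

```python
NUCLEOTIDES = ("ATP", "CTP", "GTP", "UTP")


def saturation(conc, k_sat):
    """Saturation term conc / (k_sat + conc), which lies between 0 and 1."""
    if conc < 0 or k_sat <= 0:
        raise ValueError("concentration must be >= 0 and k_sat must be > 0")
    return conc / (k_sat + conc)


def transcription_rate(v_max, plasmid, k_plasmid, ntp, k_ntp):
    """Rate = v_max * f(plasmid) * product over NTPs of f(NTP) (assumed form).

    ntp and k_ntp are dicts keyed by nucleotide name, in matching units.
    """
    rate = v_max * saturation(plasmid, k_plasmid)
    for name in NUCLEOTIDES:
        rate *= saturation(ntp[name], k_ntp[name])
    return rate


# Hypothetical values: every reactant sits exactly at its saturation constant.
k_ntp = {name: 1.0 for name in NUCLEOTIDES}
ntp_half = {name: 1.0 for name in NUCLEOTIDES}
rate_half = transcription_rate(32.0, 5.0, 5.0, ntp_half, k_ntp)

# Depleting a single nucleotide shuts transcription off in this form.
ntp_no_ctp = dict(ntp_half, CTP=0.0)
rate_no_ctp = transcription_rate(32.0, 5.0, 5.0, ntp_no_ctp, k_ntp)
```

A multiplicative form like this reproduces the qualitative behavior described in the text: once any one nucleotide approaches zero, the predicted transcription rate collapses regardless of the other reactants.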
For the DNP case, we added a reaction that leaked a charged hydrogen to an
uncharged hydrogen and set this reaction to be maximized. 2,4-Dinitrophenol
is a membrane uncoupler and acts as a chemical ionophore that leaks
protons. Thus, DNP does not allow a gradient to form for efficient oxidative
phosphorylation activity. The added reaction allowed the model to accurately
describe the effect of DNP on CFPS metabolism. Given these conditions, along
with the metabolic constraints and enzyme activity assays, the ensemble captured
the dynamic behavior of mRNA and protein production of GFP. The mathematical
model estimated a reduction of 94% in oxidative phosphorylation activity for DNP
when compared to the control. In the case of TTA, the model estimated a reduction
of 51% in oxidative phosphorylation activity with no additional modifications to the
model. Additionally, the model predicted no flux via succinate dehydrogenase for
the first 2 hours and very low flux for the remainder of the reaction. Thenoyltrifluoroacetone
directly blocks the respiratory chain at complex II, which is the
succinate dehydrogenase enzyme complex. Given the metabolic and kinetic constraints, the
modeling framework was able to accurately predict the effect of DNP and TTA on
CFPS metabolism as well as mRNA and protein production.
5.3.4 Analysis of CFPS metabolism with oxidative phosphoryla-
tion inhibitors
With the accurate prediction of the effect of DNP and TTA on CFPS metabolism
and validation of the ensemble capturing mRNA and protein production, we
analyzed the flux distribution at 2 and 8 hours and compared key reactions to the
control to gain insights into CFPS metabolism (Fig. 5.8-5.9). Together with absolute
metabolite measurements, kinetic parameters, enzyme levels and enzyme activity
assays, we determined the net flux distribution across an ensemble of 100 sets with
sampling on experimental noise and uncertainty in literature values. Substantial
differences were observed across all three cases as well as throughout the duration
of the CFPS reaction.
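The ensemble summary used throughout the figures (a mean with a 95% band over 100 sampled sets) can be sketched as follows. This is a generic illustration, not the authors' sampling procedure: Gaussian noise with a fixed relative standard deviation, the central 2.5 to 97.5 percentile range as the band, and all function names are assumptions.

```python
import random


def percentile(sorted_values, q):
    """Percentile (q in [0, 100]) by linear interpolation on sorted data."""
    if not sorted_values:
        raise ValueError("need at least one value")
    pos = (len(sorted_values) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] * (1.0 - frac) + sorted_values[upper] * frac


def sample_ensemble(nominal, rel_sd, n=100, seed=0):
    """Draw n non-negative samples around a nominal value (assumed Gaussian noise)."""
    rng = random.Random(seed)
    return [max(0.0, rng.gauss(nominal, rel_sd * nominal)) for _ in range(n)]


def ensemble_summary(samples):
    """Return (mean, lower, upper), with the band spanning the central 95%."""
    ordered = sorted(samples)
    mean = sum(ordered) / len(ordered)
    return mean, percentile(ordered, 2.5), percentile(ordered, 97.5)


# Hypothetical flux with a nominal value of 10 (A.U.) and 10% relative noise.
samples = sample_ensemble(10.0, 0.10)
mean_flux, lower_flux, upper_flux = ensemble_summary(samples)
```

In the full workflow each sampled parameter set would be passed through the flux calculation before summarizing, so the band reflects how measurement noise and literature uncertainty propagate to each flux.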
When the CFPS reaction was incubated with DNP, there was an increase in
metabolism and oxygen consumption; however, oxidative phosphorylation was
inactive. At 2 hours of the reaction, the majority of the carbon traveled through
glycolysis, with 74% via pgi and 24% through the pentose phosphate pathway via zwf.
The split toward the pentose phosphate pathway was notably higher for DNP than for
the control, where only 1% of the flux traveled through zwf at the 2 hour mark of
the reaction. The TCA cycle for DNP behaved very similarly to the control with
high activity via gdh and saw no significant differences across the ensemble. How-
ever, as the reaction progressed towards 8 hours, significant differences appeared
throughout the network. First, maltose was depleted and thus lower activity is
[Figure 5.8 graphic. Panels: (a) DNP (2 h); (b) DNP (8 h). Each panel is a flux map of the central carbon network; scale bar: Flux (A.U.), 0 to 100.]
Figure 5.8: Mean flux distribution across an ensemble (N=100) for DNP. Fluxes
were determined by integrating kinetic parameters with enzyme levels and con-
straining to measurements of metabolites and enzyme activity levels where data
was available. (a) Flux distribution at 2 hours of CFPS reaction. Flux difference
from control shown for key reactions at 2 hours of CFPS reaction. (b) Flux distri-
bution at 8 hours of CFPS reaction. Flux difference from control shown for key
reactions at 8 hours of CFPS reaction. Fluxes were normalized to maltodextrin
consumption at t=0 hours.
[Figure residue: axis labels of the two flux-difference panels, listing the same key reactions in each: pgm, hk, pgi, gpm, eno, pyk, pdh, zwf, rpe, rpi, tkt1, gltA, akgdh, sdh, fum, mdh, mae, ackA, ldh, atp.]
seen via pgm and hk. In addition, 100% of the carbon traveled via zwf and the
first step in glycolysis had a backward reaction to further supplement G6P for the
pentose phosphate pathway. Lower glycolysis showed much higher flux starting
from gapA to pdh and towards ackA. Compared to the control, DNP had a 740%
increase in flux via gapA and a 120% increase in flux via pdh. The high metabolic
rate with DNP incubation resulted in the accumulation of acetate, reflecting a reliance
on substrate level phosphorylation since oxidative phosphorylation was inhibited.
When the CFPS reacton was incubated with TTA, there was a decrease in overall
metabolism, however, oxidative phosphorylation remained active, but at a 51%
reduction when compared to the control. Upper glycolysis involving maltodextrin
consumption, glucose-1-phosphate, and glucose utilization was very similar when
compared to the control with a slight increase in pgm activity. However, just as in
the case with DNP, there was a higher split towards pentose phosphate pathway
with 15% at 2 hours of the reaction and 18% at 8 hours of the reaction. The most
notable differences were in the TCA cycle, which had very low activity throughout
the pathway, as can be seen from the high glutamate levels in the media. TTA is
an inhibitor of succinate dehydrogenase and uncoupled the TCA cycle from central
carbon metabolism. Despite active oxidative phosphorylation,
central carbon metabolism showed significantly less flux than that of the control and
DNP. In addition, there was a high accumulation of lactate, approximately 8
mM at the end of the 16 hour reaction. This accumulation is most likely due to the
surplus of NADH not utilized in oxidative phosphorylation.
Figure 5.9: Mean flux distribution across an ensemble (N=100) for TTA. Fluxes were
determined by integrating kinetic parameters with enzyme levels and constrain-
ing to measurements of metabolites and enzyme activity levels where data was
available. (a) Flux distribution at 2 hours of CFPS reaction. Flux difference from
control shown for key reactions at 2 hours of CFPS reaction. (b) Flux distribution
at 8 hours of CFPS reaction. Flux difference from control shown for key reactions
at 8 hours of CFPS reaction. Fluxes were normalized to maltodextrin consumption
at t=0 hours.
In summary, in the case of TTA, transcription and translation were powered
by the nucleotides and amino acids supplied with the CFPS
extract, and protein production lasted for roughly 4 hours. When
the reaction was incubated with DNP, central carbon metabolism remained active
and relied on substrate-level phosphorylation, but this was not sufficient to sustain
transcription and translation beyond 4-6 hours. In the control, by contrast, central
carbon metabolism was activated to fuel and sustain transcription and translation,
and protein production remained active for 16 hours with mRNA still at steady-state
levels. Given the differences in flux distributions between the three groups, we next
examined the performance metrics of the myTXTL system in terms of
energy efficiency and carbon yield.
Energy Efficiency
Energy efficiency was significantly higher for the control at 23 ± 2.8% for transcription
and translation, whereas the DNP and TTA groups were at 15 ± 2.7% and 15 ±
2.6%, respectively (Fig. 5.10). Energy efficiency was calculated over the
entire duration of the reaction as the ratio of nucleotide triphosphates utilized for
the corresponding category to ATP generation. Although the control had a higher energy
efficiency than the treatment groups, 37 ± 1% of its nucleotide triphosphates
were wasted toward degradation (Fig. 5.10a). Twenty-seven percent of energy was
utilized in glycolysis, with 11% toward amino acid biosynthesis and 2% toward
anaplerosis.
[Figure 5.10 pie charts, share of energy use. (a) Control: TXTL 23%, amino acids 11%, glycolysis 27%, anaplerosis 2%, degradation 37%. (b) DNP: TXTL 15%, amino acids 11%, glycolysis 23%, anaplerosis 1%, degradation 50%. (c) TTA: TXTL 15%, amino acids 8%, glycolysis 51%, anaplerosis 10%, degradation 16%.]
Figure 5.10: Mean energy efficiency across an ensemble (N=100) for control (a),
DNP (b), and TTA (c) throughout the metabolic network. TXTL denotes the energy
efficiency for transcription and translation processes.
In the case of DNP, 50% of the energy was wasted toward degradation.
This is due to the effect of DNP, which resulted in a higher rate of metabolism as well as
the inhibition of all energy-requiring processes [56]. For TTA, the majority of the
energy, 51%, was spent on glycolysis. The flux distribution for TTA showed
only an active upper glycolysis pathway, resulting in a higher than normal
energy utilization.
Carbon Yield
Carbon yield for GFP production was similar for all groups. The control group had
a carbon yield of 2±0.2%, whereas the DNP group had a carbon yield of 1.4±0.2%
and the TTA group had a carbon yield of 1±0.2% (Fig. 5.11). Carbon yield was calculated for
the duration of the reaction as a ratio of the concentration of the carbon produced
to the total concentration of the carbon consumed. The low carbon yield for all
[Figure 5.11 pie charts, share of consumed carbon. (a) Control: GFP 2%, CO2 12%, glycolysis 22%, PPP 36%, TCA cycle 3%, amino acids 1%, other 24%. (b) DNP: GFP 1%, CO2 25%, glycolysis 16%, PPP 17%, TCA cycle 12%, amino acids 1%, other 28%. (c) TTA: GFP 1%, CO2 9%, glycolysis 28%, PPP 39%, TCA cycle 3%, amino acids 3%, other 17%.]
Figure 5.11: Mean carbon yield across an ensemble (N=100) for control (a), DNP
(b), and TTA (c) for CFPS. PPP denotes the Pentose Phosphate Pathway. Other
includes purine, pyrimidine and chorismate metabolism.
groups showed that the myTXTL system was supplemented with more carbon
than needed. For instance, the extract is supplied with 20-40 mM of maltodextrin
and was measured to have a concentration of approximately 104 mM of glutamate.
Meanwhile, the amount of GFP produced was in the range of 10-30 µM.
Thus, we investigated where the remaining carbon went. The control
and TTA groups had a very similar distribution of carbon in the network.
Between both groups, 9-12% went towards carbon dioxide, 22-28% remained
in glycolysis, 3% remained in the TCA cycle, 36-39% remained in the pentose
phosphate pathway, 1-3% went towards amino acid biosynthesis and 17-24%
went towards purine, pyrimidine and chorismate metabolism (other category).
Meanwhile in the DNP group, the carbon yield had a more uniform distribution
throughout the network with a notable difference in carbon dioxide (25%). The
higher yield of carbon dioxide further supports the higher metabolism observed
with DNP incubation [56].
5.3.5 Enzyme activity assays reveal allosteric regulation in CFPS
The activities of enzymes throughout central carbon metabolism were measured in
the myTXTL system at 2 and 8 hours of a CFPS reaction for control, DNP and TTA
(Fig. 5.12). Substantial differences were observed between groups and between the
two time points for a number of enzymes. However, enzymes for which allosteric
regulation is not present, including eno, mdh, gdh, and akgdh, showed
the same activity between groups and time points. Thus, the
activity assays reveal that allosteric regulation is present in CFPS. For instance, the
enzyme icd is allosterically regulated, with phosphoenolpyruvate (PEP) inhibiting
its activity. In the control and TTA groups, icd activity increased from
approximately 180 to 250 mM/h and 125 to 290 mM/h, respectively, from 2 to 8
hours. Additionally, there was an overall decrease of PEP abundance for the control and
TTA groups in the CFPS extract from 2 to 8 hours, which resulted in the observed increase
in icd activity.
5.4 Discussion
Cell-free protein synthesis relies on transcription and translation machinery in
order to produce a protein of interest. However, the mechanisms and reactions
Figure 5.12: Enzyme activity measurements reveal allosteric regulation is present
in CFPS. Enzyme activity assays at 2 and 8 hours of the CFPS reaction throughout
the metabolic network for control (black), DNP (dark grey), and TTA (light grey).
involved in CFPS are not limited to transcription and translation. Jewett and
coworkers have shown that central carbon metabolism is activated with inverted
membrane vesicles carrying out oxidative phosphorylation in the Cytomim system,
an E. coli based extract [81]. Thus, CFPS systems are often more complex than previously
believed, with many interacting species that could potentially be exploited
to optimize for better performance. In this study we used a mathematical framework
to gain insights into the activity of metabolic pathways and biochemical
reactions of CFPS in the myTXTL system.
One of the main advantages of CFPS systems is the elimination of the cell mem-
brane, which allows for direct access to metabolites and potentially precise control
of biochemical reactions. However, a comprehensive absolute quantification of
metabolites in CFPS has not been reported, and existing measurements are often limited to amino acids
and a few organic acids [83, 81, 71]. Here, we present a robust time-course
quantification of 41 species with a single-quad LC-MS system along with 20 amino
acids with an LC-TUV system. In addition, we quantified absolute levels of mRNA
using quantitative real-time RT-PCR. This comprehensive dataset allowed us to
constrain our mathematical model of CFPS metabolism integrated with kinetic
parameters from BRENDA [79] and enzyme abundance levels estimated by Garenne
and coworkers [53]. The constraint-based linear programming framework
was validated with the prediction of mRNA and protein production along with
describing the effect of DNP and TTA. DNP incubation has been reported to result
in overconsumption of oxygen and a higher rate of metabolism, while inhibiting
energy-requiring processes [120, 56]. DNP has also been shown to disrupt oxidative
phosphorylation in the Cytomim system [81] and in zebrafish [15]. The model
predicted a high rate of metabolism leading to high levels of acetate, agreeing with
literature findings. Further, the high carbon yield of 25% in carbon dioxide suggests
higher metabolic activity and anaplerosis. TTA blocks the respiratory chain at
Complex II by binding to the quinone reduction site and inhibiting the transfer of
electrons from the Fe-S centre S-3 of succinate dehydrogenase to oxidized ubiquinone
[167, 140]. TTA was also used by Jewett and coworkers to assess whether oxidative
phosphorylation was active in the Cytomim system. In our study, TTA incubation
resulted in a lower protein titer of about 50% when compared to the control. The
model predicted reduced activity in succinate dehydrogenase and oxidative phos-
phorylation activity reduced by 51% when compared to the control. Previously,
the myTXTL system has been reported to rely on glycolysis and recycle inorganic
phosphates [28, 52]; however, our study suggests that the E. coli based extract also
has active oxidative phosphorylation with glutamate powering the TCA cycle.
Our modeling framework provided quantitative means to assess the perfor-
mance of CFPS in terms of energy efficiency and carbon yield. Previously, we
reported a theoretical optimal energy efficiency of approximately 80% for transcrip-
tion and translation [181]. However, the myTXTL system had an energy efficiency
of 23% for the control and 15% with the inhibitors. In addition, the carbon yield for
GFP was only 2%. Thus, CFPS is supplied with more than enough carbon and energy,
but these resources are not being used effectively. The flux distribution suggests that despite
having oxidative phosphorylation, anaerobic processes are still active in cell-free
extracts as seen with the high accumulation of acetate and flux in anaplerotic
reactions. Whereas in vivo systems are able to respond to different environmental
conditions and activate different metabolic pathways, cell-free extracts no longer
have the ability for enzymatic regulation. Thus, some of the enzymes that are
present in CFPS may lead to inefficiency and low carbon yield. For example, Bujara
and coworkers successfully increased the yield of dihydroxyacetone phosphate
(DHAP) from glucose in CFPS [22]. The source strain that was used for cell-free
extract preparation had a gene knockout of triosephosphate isomerase (tpiA)
which resulted in the higher accumulation of DHAP. Such strategies have been
used for decades in in vivo systems [12] and are only beginning to be used in CFPS
[196, 2, 11]. In addition, the majority of the energy is wasted toward nucleotide
triphosphate degradation. This suggests that the energy utilization could be optimized
by addressing the rate-limiting step of protein production, identified as
translation [109, 181]. Underwood and coworkers showed that increasing ribosome
abundance did not significantly increase protein yields or rates; however, adding
elongation factors increased protein synthesis rates by 27% [175], which would
require more energy to be spent towards translation instead of degradation. Subsequently,
the carbon substrates that power CFPS could be minimized in order to
increase the energy efficiency and carbon yield by lowering the total ATP produced,
since the majority is degraded.
Our analysis of CFPS performance was based on the quality and accuracy of the
flux estimation for each time step. The integrated kinetic constraint-based model
was constrained to 61 species with kinetic parameters and physiological enzyme
levels in the myTXTL system. Adadi and coworkers have used kinetic parameters
for the flux bounds with success in predicting microbial growth rates [3]. Thus,
we have high confidence in the reported flux distribution to be a representation
of CFPS metabolism for myTXTL. In addition, the flux bounds for a subset of 19
enzymes were also constrained to the actual enzyme activity levels. The assays
revealed that allosteric regulation is active in CFPS and should be incorporated into
the mathematical framework. Allosteric regulation was shown to be instrumental
in capturing experimental data with a kinetic ODE model for CFPS [71]. Since our
model was constrained to the enzyme activity assays, mathematical descriptions
of allosteric regulation on flux bounds were not needed. However, conducting
these assays experimentally is low throughput and expensive. Thus, it would be
advantageous to incorporate allosteric regulation into the modeling framework.
Taken together, we provide an integrated kinetic constraint-based mathematical
framework with absolute metabolite measurements that can be used to better understand
cell-free metabolism and its performance limitations. Flux estimations
revealed that central metabolism is activated along with glutamate powering
the TCA cycle to provide reduced ubiquinone for oxidative phosphorylation. Ox-
idative phosphorylation inhibitors provide biochemical evidence that myTXTL
relied on oxidative phosphorylation to provide energy for sustaining transcription
and translation for 16 hours in a batch reaction. Finally, enzyme activity assays
throughout central carbon metabolism revealed that allosteric regulation is present
in CFPS metabolism and should be incorporated into future mathematical models.
Cell-free protein synthesis extends beyond transcription and translation processes;
thus, we provide a comprehensive mathematical framework that predicted mRNA
and protein production and could potentially be used to identify strategies for the
improvement of CFPS productivity, yield and efficiency.
5.5 Materials & Methods
5.5.1 Cell-free protein synthesis and oxidative phosphorylation inhibitors
Cell-free protein synthesis reactions were carried out with the myTXTL system
(Arbor Biosciences) in 1.5 mL Eppendorf tubes at 29 °C. Plasmid P70a-GFP (Arbor
Biosciences) was used as the DNA template for green fluorescent protein (GFP)
expression. The template plasmid was amplified in E. coli KL740 cI857+ (E. coli
Genetic Stock Center, No. 14222). The plasmid was isolated and purified using a
Plasmid Mini Kit (Qiagen, Valencia CA). Each cell-free reaction was supplemented
with a final concentration of 5 nM P70a-GFP. Each cell-free reaction had a total
volume of 14 µL with 9 µL myTXTL master mix, 1.5 µL P70a-GFP, and 3.5 µL water
(control), 3.5 µL 2,4-dinitrophenol (DNP, 2.5 mM final concentration in CFPS), or
3.5 µL thenoyltrifluoroacetone (TTA, 1 mM final concentration in CFPS). DNP was
solubilized in water to prepare a 10 mM solution. TTA was solubilized in methanol
to prepare a 100 mM solution and diluted with water to 4 mM before adding to
the CFPS reaction. Negative controls performed with methanol demonstrated that
the solvent did not affect protein synthesis at the concentrations used in this study.
Separate CFPS samples were carried out in triplicate for each time point in order to
ensure constant volume throughout the duration of the reaction.
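The reaction composition above can be checked with a short dilution calculation. The following is a minimal sketch, not part of the original protocol; the function and variable names are illustrative, and the methanol estimate assumes that the 100 mM TTA stock was prepared in pure methanol.

```python
def final_concentration(stock_mM, added_uL, total_uL):
    """Concentration (mM) after diluting `added_uL` of stock into `total_uL`."""
    return stock_mM * added_uL / total_uL


# Volumes reported in the text (µL).
MASTER_MIX_UL = 9.0
PLASMID_UL = 1.5
ADDITIVE_UL = 3.5  # water, DNP stock, or TTA working solution
TOTAL_UL = MASTER_MIX_UL + PLASMID_UL + ADDITIVE_UL

dnp_final_mM = final_concentration(10.0, ADDITIVE_UL, TOTAL_UL)  # 10 mM DNP stock
tta_final_mM = final_concentration(4.0, ADDITIVE_UL, TOTAL_UL)   # 4 mM TTA working solution

# Assumption: the 100 mM TTA stock is in pure methanol, so the 4 mM working
# solution carries 4/100 of its volume as methanol.
methanol_fraction = (4.0 / 100.0) * ADDITIVE_UL / TOTAL_UL

print(f"total volume: {TOTAL_UL:.1f} uL")
print(f"DNP final: {dnp_final_mM:.2f} mM")
print(f"TTA final: {tta_final_mM:.2f} mM")
print(f"methanol (v/v) in TTA reaction: {methanol_fraction:.2%}")
```

Under these assumptions the stated stock concentrations and volumes reproduce the reported final concentrations of 2.5 mM DNP and 1 mM TTA in a 14 µL reaction.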
5.5.2 Absolute quantification of central carbon metabolites
To quantify central carbon metabolites and amino acids, reaction samples were
quenched with 100% ice-cold ethanol in a 1:1 volumetric ratio. Ethanol precipitated
samples were centrifuged at 12,000 g for 15 minutes at 4 ◦C. The supernatant
was collected and stored at -80 °C. Metabolites involved in glycolysis, the pentose
phosphate pathway, the TCA cycle, and energy metabolism were quantified by liquid
chromatography-tandem mass spectrometry (LCMS) using an isotope ratio based
approach. Samples were tagged with 12-C aniline, while internal standards
were tagged with 13-C aniline as described previously [? ]. Briefly, 6 µL of the
supernatant was added to 44 µL of water, followed by 5 µL of 200 mg/mL EDC
(N-(3-dimethylaminopropyl)-N-ethylcarbodiimide hydrochloride) and 5 µL of 6 M
12-C aniline (pH 4.5). EDC was solubilized in water. The aniline solution was prepared
by combining 550 µL of 10.9 M aniline with 337.5 µL water and 112.5 µL of
12 M hydrochloric acid in an Eppendorf tube and vortexed well. The mixture was
gently vortexed at room temperature for 2 hours. In order to stabilize the metabo-
lites, 1.5 µL of TEA (triethylamine) was added. The mixture was centrifuged at
13,500 g for 3 minutes and 25 µL of the supernatant was transferred to a LCMS vial.
The sample mixture was mixed with 25 µL of a standard stock solution containing
35 metabolites at 80 µM tagged with 13-C aniline. The standard stock solution
was tagged with aniline following the same procedure as the sample, except with
13-C aniline. The 35 standard metabolites are listed in the metabolite dataset and
exclude acetic acid, NAD, NADP, FAD, acetyl-CoA, glycerol 3-phosphate, and
maltose. Acetic acid was tagged with aniline and quantified with a standard curve
method. NAD, NADP, FAD, acetyl-CoA and glycerol 3-phosphate were not tagged
with aniline, and were quantified by a standard curve method. Samples and stan-
dards were injected at 5 µL onto a Waters Acquity BEH C18 (1.7 µm, 2.1 mm x
150 mm) column. The LCMS system consisted of a Waters Acquity Quaternary
system, an Acquity Sample Manager, and an Acquity QDa detector (Waters Corp,
Medford, MA). The system was controlled by Empower 3 software (Waters). The
autosampler was set at 10 ◦C. Separation was carried out at a flow rate of 0.3
mL/min. The elution started with 95 % mobile phase A (5 mM tributylamine (TBA)
in HPLC-grade water adjusted to pH 4.75 with glacial acetic acid) and 5 % mobile
phase B (5 mM TBA in acetonitrile), raised to 70 % B in 10 minutes, raised to 100 %
B in 2 minutes and held at 100 % B for 3 minutes. The gradient was then returned to initial conditions (95 %
A, 5 % B) over 1 minute and held for 9 minutes to re-equilibrate the column. The
column was pre-conditioned with the specified gradient protocol 3 times prior to
any injection onto the column. The MS chromatograms were acquired in negative
ion mode with a probe temperature of 520 ◦C, negative capillary voltage of -0.8 kV,
and positive capillary voltage of 0.8 kV.
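Two quantities in this protocol follow from simple arithmetic: the molarity of the aniline tagging solution and the total length of the gradient program. The sketch below recomputes both from the volumes and times stated above. It assumes ideal (additive) mixing volumes, and the names are illustrative rather than taken from the original work.

```python
def diluted_molarity(stock_M, stock_uL, total_uL):
    """Molarity after diluting `stock_uL` of stock to `total_uL` (additive volumes assumed)."""
    return stock_M * stock_uL / total_uL


# Aniline tagging solution: 10.9 M aniline, water, and 12 M hydrochloric acid (µL).
ANILINE_UL, WATER_UL, HCL_UL = 550.0, 337.5, 112.5
total_uL = ANILINE_UL + WATER_UL + HCL_UL
aniline_M = diluted_molarity(10.9, ANILINE_UL, total_uL)
hcl_M = diluted_molarity(12.0, HCL_UL, total_uL)

# Gradient program as (duration in minutes, segment description).
GRADIENT = [
    (10.0, "5% B to 70% B"),
    (2.0, "70% B to 100% B"),
    (3.0, "hold at 100% B"),
    (1.0, "return to 95% A / 5% B"),
    (9.0, "re-equilibration"),
]
run_time_min = sum(duration for duration, _ in GRADIENT)

print(f"aniline: {aniline_M:.3f} M in {total_uL:.0f} uL")
print(f"HCl: {hcl_M:.2f} M")
print(f"gradient run time: {run_time_min:.0f} min per injection")
```

With these inputs the aniline concentration comes out close to the nominal 6 M, and each injection occupies 25 minutes of instrument time including re-equilibration.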
5.5.3 Amino acid analysis
Amino acids were analyzed using a Waters AccQ-Tag Ultra amino acid analysis
kit (Waters). The ethanol-precipitated CFPS samples were derivatized and tagged
by combining 4 µL of the sample with 6 µL of water, 70 µL borate buffer solution
(Waters), and 10 µL reagent (Waters). The solution was then kept in a water bath
at 55 °C for 10 minutes. One microliter was then injected onto an Acquity BEH
C18 column (1.7 µm, 2.1 mm x 100 mm). The elution gradient and flow rate were
set according to the manufacturer’s recommendations. Amino acids were
detected by an Acquity TUV detector (Waters) at 260 nm. Amino acids were
identified by known retention times of standards. Concentrations were determined
by comparison with calibration standard curves with the exception of glutamate.
Glutamate was outside the linearity of the calibration standard curves and not
quantified with the Waters AccQ-Tag Ultra amino acid analysis kit.
5.5.4 Glutamate and maltose assays
Glutamate and maltose concentrations were determined using enzymatic colorimetric
assays purchased from Sigma-Aldrich (St. Louis, MO) according to the
manufacturer’s instructions. The readings were performed with a multimode plate
reader Varioskan Lux (ThermoFisher) using 96-well plates.
5.5.5 Protein quantification
GFP concentrations were determined by fluorescence measurements with comparison
to a standard curve. Two microliters of the CFPS reaction were diluted with 33
µL of phosphate-buffered saline and analyzed in triplicate on a black 384-well plate
with a multimode plate reader Varioskan Lux (ThermoFisher) at 488 nm excitation
and 535 nm emission.
5.5.6 Enzyme activity assays
Enzyme activity for 6-phosphogluconate dehydrogenase (gnd) and phosphoenolpyruvate
carboxylase (ppc) was determined using colorimetric assays
purchased from BioVision (Milpitas, CA) according to the manufacturer’s instructions.
All remaining enzyme activity levels were determined using colorimetric and
fluorescence-based assays purchased from Sigma-Aldrich (St. Louis, MO) according to
the manufacturer’s instructions. The readings were performed in kinetic mode with a
multimode plate reader Varioskan Lux (ThermoFisher) using clear 96-well plates
for colorimetric assays and black 96-well plates for fluorescence based assays.
5.5.7 Absolute quantification of mRNA
Absolute levels of messenger RNA (mRNA) were quantified using quantitative
real-time RT-PCR with comparison to a standard curve. A standard mRNA of GFP
was prepared by conducting a CFPS reaction with 5 nM P70a-GFP plasmid for 2
hours at 29 °C. The reaction was applied to a PureLink RNA Mini Kit with an on-column
PureLink DNase Treatment (ThermoFisher) according to the manufacturer’s
instructions. The total RNA was eluted with Invitrogen UltraPure DNase/RNase-free
water (ThermoFisher). The total RNA was then applied to MICROBExpress
Bacterial mRNA enrichment kit (ThermoFisher) followed by MEGAclear Transcription
clean-up kit (ThermoFisher) according to the manufacturer’s instructions.
The purified mRNA was eluted with UltraPure water. The mRNA concentration
was determined with a Qubit Fluorometer using a Qubit RNA HS Assay Kit
(ThermoFisher).
To quantify mRNA levels in CFPS samples from the experiment, 1 µL from
the CFPS reaction was applied to the PureLink RNA Mini Kit with an on-column
PureLink DNase Treatment according to the manufacturer’s instructions. The total
RNA was eluted with 50 µL of UltraPure water. The total RNA sample was diluted
100 times and 2 µL of the diluted sample was loaded for each RT-PCR reaction. The
quantitative real-time RT-PCR reaction was carried out on an Applied Biosystems
QuantStudio 3 with a Taqman RNA-to-Ct 1-Step Kit using GFP Taqman assay
(Mr04329676_mr) on a 96-well plate in triplicate according to the manufacturer’s
instructions (Applied Biosystems, Life Technologies Corporation, Foster City, CA).
Messenger RNA concentrations were determined by comparison to the calibration standard
curve. The standard curve was generated with the purified mRNA of GFP
ranging from 10^-4 to 1 ng. The standard curve had a linearity coefficient of 0.994
and efficiency of 104 %.
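The standard-curve quantification described above can be sketched as a log-linear fit of threshold cycle (Ct) against the logarithm of the template amount. The example below is an illustration only: the Ct values are hypothetical, the efficiency is computed with the commonly used definition 10^(-1/slope) - 1 (which is assumed, not confirmed, to be the definition used here), and the back-calculation to the CFPS reaction assumes complete RNA recovery during purification.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares fit y = slope * x + intercept; also returns r^2."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy ** 2 / (sxx * syy)


def amplification_efficiency(slope):
    """Efficiency from the standard-curve slope (1.0 means perfect doubling)."""
    return 10.0 ** (-1.0 / slope) - 1.0


def quantity_from_ct(ct, slope, intercept):
    """Invert the standard curve: template amount (ng) loaded into the well."""
    return 10.0 ** ((ct - intercept) / slope)


def reaction_ng_per_uL(loaded_ng, elution_uL=50.0, dilution=100.0,
                       loaded_uL=2.0, sample_uL=1.0):
    """Scale the loaded amount back to the CFPS reaction (full recovery assumed)."""
    return loaded_ng * dilution * elution_uL / loaded_uL / sample_uL


# Hypothetical standards spanning 1e-4 to 1 ng of purified GFP mRNA.
standards_ng = [1e-4, 1e-3, 1e-2, 1e-1, 1.0]
standard_ct = [31.9, 28.7, 25.6, 22.3, 19.1]  # illustrative values only

slope, intercept, r2 = fit_line([math.log10(q) for q in standards_ng], standard_ct)
print(f"slope={slope:.2f}, r^2={r2:.4f}, "
      f"efficiency={amplification_efficiency(slope):.0%}")
print(f"sample at Ct 24.0: {quantity_from_ct(24.0, slope, intercept):.3g} ng loaded")
```

Under this definition, an efficiency of 104 % corresponds to a standard-curve slope of roughly -3.23 cycles per decade.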
5.5.8 Formulation of model equations
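The dynamic sequence specific flux balance analysis problem was formulated as a
linear program:
\[
\begin{aligned}
\max_{v}\quad & Z = \theta^{T} v \\
\text{subject to}\quad & (S v - \dot{x}) \ge 0 \\
& \dot{x}_i = \sum_{j=1}^{R} \sigma_{ij}\, r_j(x, e, k), \quad i = 1, 2, \ldots, M \\
& 0 \le v_j \le r_j(x, e, k), \quad j = 1, 2, \ldots, R
\end{aligned}
\tag{5.1}
\]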
where S denotes the stoichiometric matrix (M×R) and σij denotes the stoichio-
metric coefficient for species i in reaction j, v denotes the unknown flux vector
(R × 1), θ denotes the objective vector (R × 1), and rj(x, e, k) denotes the rate
of reaction j. For all metabolic reactions except for the transcription/translation
processes and maltodextrin consumption, the rate of reaction j was modeled as the product
of the turnover rate $k_j$ and enzyme abundance $e_j$, also known as the maximum velocity
of the reaction $V_{max}$ (mM/h). The transcription/translation and maltodextrin
consumption reactions were modeled following saturation kinetics. The turnover
rate for each reaction was identified from BRENDA [79] or taken from Adadi
and coworkers [3]. The enzyme abundance was identified for 104 reactions from
Garenne and coworkers [53]. Garenne and coworkers reported the counts of the
enzymes identified in their LC-MS analysis, where we calibrated the counts of
sigma 70 and RNA polymerase to the concentration values [52] to create a calibra-
tion curve. The enzyme abundance was calculated using this calibration curve
and all remaining enzymes not identified in Garenne and coworkers were set to
the median value of 50 nM. The enzyme abundance was validated for a subset of
15 enzymes from our enzyme activity assays where we calculated the expected
enzyme abundance in the cell-free reaction by:
\[
\hat{e}_j = \frac{\hat{V}_{max,j}}{\hat{k}_j} \tag{5.2}
\]
where $\hat{V}_{max,j}$ is the maximum velocity for reaction j from the enzyme activity assay
and $\hat{k}_j$ is the corresponding turnover number for enzyme j. The transcription (TX)
and translation (TL) reaction stoichiometry was modeled based on previous work
[4, 181].
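The relationship between turnover number, enzyme abundance, and the flux bound can be illustrated with a short unit-conversion sketch. The turnover number used below is a hypothetical placeholder, not a value from BRENDA or from this study; only the 50 nM median abundance and the approximate icd activity of 180 mM/h are taken from the text.

```python
SECONDS_PER_HOUR = 3600.0
NM_PER_MM = 1.0e6


def flux_bound_mM_per_h(kcat_per_s, enzyme_nM):
    """Upper flux bound V_max = k_j * e_j, returned in mM/h."""
    return kcat_per_s * SECONDS_PER_HOUR * enzyme_nM / NM_PER_MM


def expected_enzyme_nM(vmax_mM_per_h, kcat_per_s):
    """Eq. 5.2: enzyme abundance implied by a measured V_max and a turnover number."""
    return vmax_mM_per_h / (kcat_per_s * SECONDS_PER_HOUR) * NM_PER_MM


HYPOTHETICAL_KCAT_PER_S = 100.0  # assumed for illustration only

# Flux bound for an enzyme set to the median abundance of 50 nM.
bound = flux_bound_mM_per_h(HYPOTHETICAL_KCAT_PER_S, 50.0)

# Abundance implied by an assay reading of about 180 mM/h (icd at 2 h, control).
implied = expected_enzyme_nM(180.0, HYPOTHETICAL_KCAT_PER_S)

print(f"flux bound at 50 nM: {bound:.1f} mM/h")
print(f"implied abundance at 180 mM/h: {implied:.0f} nM")
```

Comparing the implied abundance with the proteomics-derived estimate is one way to check whether the two sources of enzyme data are consistent for a given turnover number.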
The transcription initiation rate was modeled as:
\[
r_{TXinit} = V^{max}_{TX} \left( \frac{G}{\tau_{TX} K_{TX} + (\tau_{TX} + 1)\, G} \right) \tag{5.3}
\]
where $G$ denotes the concentration of the DNA plasmid in the cell-free reaction,
$K_{TX}$ denotes a transcription saturation coefficient, and $\tau_{TX}$ denotes the
transcription time constant. The maximum transcription rate $V^{max}_{TX}$ was formulated as:
\[
V^{max}_{TX} \equiv \left[ R_{TX} \left( \frac{\dot{n}_{TX}}{l_G} \right) u(\kappa) \right] \tag{5.4}
\]
where RTX denotes the RNA polymerase concentration, ṅTX denotes the RNA
polymerase elongation rate (nt/h), lG denotes the gene length (nt). The term u (κ)
(dimensionless, 0 ≤ u (κ) ≤ 1) is an effective model of promoter activity, where κ
denotes promoter specific parameters. In this study, the promoter model was taken
from Vilkhovoy and coworkers [181] for the P70a promoter. The transcription rate
was modeled as:
\[
r_{TX} = r_{TXinit} \prod_{s \in m_{TX}} \frac{x_s}{K_{TX_s} + x_s} \tag{5.5}
\]
where mTX denotes the set of reactants for transcription: ATP, CTP, GTP, and UTP,
and KTXs denotes the saturation constant for species s. The degradation of mRNA
was modeled as a first order rate:
\[
r_d = k_d \cdot x_{mRNA} \tag{5.6}
\]
where kd denotes the degradation rate constant. The translation initiation and
translation rate was modeled as:
\[
r_{TL} = V^{max}_{TL} \left( \frac{x_{mRNA}}{\tau_{TL} K_{TL} + (\tau_{TL} + 1)\, x_{mRNA}} \right) \tag{5.7}
\]
where xmRNA denotes the concentration of the mRNA, KTL denotes a translation
saturation coefficient, and τTL denotes the translation time constant. The maximum
translation rate $V^{max}_{TL}$ was formulated as:
\[
V^{max}_{TL} \equiv \left[ K_P R_{TL} \left( \frac{\dot{n}_{TL}}{l_P} \right) \right] \tag{5.8}
\]
where KP denotes the polysome amplification constant, RTL denotes the ribosome
concentration, ṅTL denotes the ribosome elongation rate (amino acids per hour),
and lP denotes the number of amino acids in the protein of interest. The abundance
of each species x was modeled as:
\[
x_{t+\Delta t} = x_t + S v\, \Delta t \tag{5.9}
\]
where t denotes the current time point and ∆t denotes the time step. Lastly,
we imposed a user configurable bound Bi on the maximum rate of change for
metabolite i where data was available:
\[
|\dot{x}_i| \le B_i, \quad i = 1, 2, \ldots, M \tag{5.10}
\]
The bound $B_i$ was determined by fitting the time-course concentration data with a
cubic regression spline using the SmoothingSplines package in Julia 1.1. The rate of
change at step t was determined by a forward difference approximation from t to
t + ∆t from the regression spline. Metabolic fluxes were estimated at each time step
using the GNU Linear Programming Kit (GLPK) v4.55 [1]. All parameters are listed
in Table 5.1. In addition, flux bounds were set to the experimental value where data
were available for the corresponding enzyme activity assays. The objective of the cell-free
flux balance calculation was to maximize the rate of maltodextrin consumption,
transcription initiation, transcription, mRNA degradation, translation initiation
and translation, unless specified.
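The kinetic expressions for transcription (Eqs. 5.3 to 5.6) and the explicit update of Eq. 5.9 can be illustrated with a small sketch. This is not the implementation used in this work, which was written in Julia and solved a linear program with GLPK at each step. Here the linear program is omitted and the fluxes are simply taken at their kinetic values, which is an assumption made for illustration. Parameter values are mid-range entries from Table 5.1, except where marked as assumed.

```python
def tx_vmax(rnap, elongation_nt_per_h, gene_length_nt, u):
    """Eq. 5.4: maximum transcription rate (units of `rnap` per hour)."""
    return rnap * (elongation_nt_per_h / gene_length_nt) * u


def saturation_rate(vmax, x, tau, k_sat):
    """Common form of Eqs. 5.3 and 5.7: vmax * x / (tau * K + (tau + 1) * x)."""
    return vmax * x / (tau * k_sat + (tau + 1.0) * x)


def tx_rate(r_init, ntp_mM, k_s_mM):
    """Eq. 5.5: scale the initiation rate by one saturation term per nucleotide."""
    rate = r_init
    for conc in ntp_mM.values():
        rate *= conc / (k_s_mM + conc)
    return rate


def euler_step(x, S, v, dt):
    """Eq. 5.9: x(t + dt) = x(t) + S v dt, with S stored as a list of rows."""
    return [xi + dt * sum(s_ij * v_j for s_ij, v_j in zip(row, v))
            for xi, row in zip(x, S)]


RNAP_NM = 67.5                       # mid-range of 60-75 nM
ELONGATION_NT_PER_H = 20.0 * 3600.0  # mid-range of 15-25 nt/s
GENE_LENGTH_NT = 717                 # assumed: sum of the NTP transcription coefficients
U_PROMOTER = 0.9                     # assumed promoter activity, 0 <= u <= 1
GENE_NM, K_TX_NM, TAU_TX = 5.0, 300.0, 0.035
NTP_MM = {"ATP": 1.0, "CTP": 1.0, "GTP": 1.0, "UTP": 1.0}  # assumed concentrations
K_S_MM, K_D_PER_H = 0.03, 2.38

vmax = tx_vmax(RNAP_NM, ELONGATION_NT_PER_H, GENE_LENGTH_NT, U_PROMOTER)
r_init = saturation_rate(vmax, GENE_NM, TAU_TX, K_TX_NM)
r_tx = tx_rate(r_init, NTP_MM, K_S_MM)

# One species (mRNA, nM) and two reactions (transcription, degradation).
S = [[1.0, -1.0]]
mrna = [0.0]
dt_h = 0.01                          # assumed step size
for _ in range(100):                 # integrate for one hour
    v = [r_tx, K_D_PER_H * mrna[0]]  # fluxes taken at their kinetic values
    mrna = euler_step(mrna, S, v, dt_h)

print(f"transcription rate: {r_tx:.0f} nM/h, mRNA after 1 h: {mrna[0]:.0f} nM")
```

In the full framework the vector v would instead be the solution of the linear program in Eq. 5.1, with these kinetic rates acting as upper bounds on the corresponding fluxes.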
5.5.9 Quantification of uncertainty
Experimental factors taken from literature, for example macromolecular concentra-
tions or elongation rates, are uncertain. To quantify the influence of this uncertainty
on model performance, we randomly sampled the expected physiological ranges
for these parameters as determined from literature. An ensemble of flux distri-
butions was calculated for the three different cases we considered: control, DNP,
and TTA. The flux ensemble was calculated by randomly sampling the rate of
change for metabolites where data was available, randomly sampling enzyme
abundance, and randomly sampling RNA polymerase levels, ribosome levels,
and elongation rates in a physiological range determined from literature. The
rate of change for metabolites was sampled between the calculated value from
the regression spline and twice its value. The enzyme abundance was randomly
sampled between the estimated value and 1.5 times its value. P70 RNA polymerase levels
were sampled between 60 and 75 nM, ribosome levels between 2.0 and 2.3 µM,
the RNA polymerase elongation rate between 15 and 25 nt/s, and the ribosome
elongation rate between 1.0 and 2.0 aa/s [175, 52]. We generated uniform random
samples between an upper (u) and lower (l) parameter bound of the form:
\[
p^{*} = l + (u - l) \times U(0, 1) \tag{5.11}
\]
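The sampling rule of Eq. 5.11 can be sketched in a few lines. The example below draws an ensemble of 100 parameter sets from the ranges stated above using Python's standard library. The dictionary keys, the helper names, and the fixed seed are illustrative choices and are not taken from the original implementation.

```python
import random

# Physiological ranges stated in the text: (lower, upper).
RANGES = {
    "rnap_nM": (60.0, 75.0),
    "ribosome_uM": (2.0, 2.3),
    "tx_elongation_nt_per_s": (15.0, 25.0),
    "tl_elongation_aa_per_s": (1.0, 2.0),
}


def sample_parameter(lower, upper, rng):
    """Eq. 5.11: p* = l + (u - l) * U(0, 1)."""
    return lower + (upper - lower) * rng.random()


def sample_scaled(value, max_factor, rng):
    """Sample between an estimate and `max_factor` times the estimate.

    Corresponds to the enzyme abundance (factor 1.5) and metabolite rate of
    change (factor 2) sampling described above.
    """
    return sample_parameter(value, max_factor * value, rng)


def sample_ensemble(ranges, n, seed=0):
    """Draw `n` independent parameter sets; the seed is fixed for repeatability."""
    rng = random.Random(seed)
    return [{name: sample_parameter(lo, hi, rng) for name, (lo, hi) in ranges.items()}
            for _ in range(n)]


ensemble = sample_ensemble(RANGES, n=100)
mean_rnap = sum(member["rnap_nM"] for member in ensemble) / len(ensemble)
print(f"ensemble size: {len(ensemble)}, mean RNAP: {mean_rnap:.1f} nM")
```

Each ensemble member would then be passed through the flux calculation, and the reported means and standard deviations computed across the resulting flux distributions.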
5.5.10 Calculation of energy efficiency
Energy efficiency (E ) was calculated as the ratio of transcription and translation
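(weighted by the appropriate energy species coefficients) to ATP generation:
\[
\mathcal{E} = \frac{\int_{T} \left( v_{TX} \cdot \alpha_{TX} + v_{TL} \cdot \alpha_{TL} \right)}{\int_{T} \sum_{j \in R_{ATP}} \sigma_{ATP_j} \bar{v}_j} \tag{5.12}
\]
\[
\alpha_{TX} = 2 \cdot (ATP_{TX} + CTP_{TX} + GTP_{TX} + UTP_{TX}) \tag{5.13}
\]
\[
\alpha_{TL} = 2 \cdot ATP_{TL} + GTP_{TL} \tag{5.14}
\]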
where αTX denotes the energy cost of transcription, αTL denotes the energy cost
of translation, RATP denotes the set of ATP-producing reactions, σATPj denotes the
ATP coefficient for reaction j, and T denotes the time of the experiment. ATPTX,
CTPTX, GTPTX, and UTPTX denote the stoichiometric coefficients of each energy
species for the transcription of the protein of interest, ATPTL and GTPTL denote
the stoichiometric coefficients of ATP and GTP for the translation of the protein
of interest. During transcription and tRNA charging, triphosphate molecules are
consumed with monophosphates as byproducts; this is the reason for the factors of
2 on ATPTX, CTPTX, GTPTX, UTPTX, and ATPTL.
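As a worked example of Eqs. 5.12 to 5.14, the energy cost coefficients can be evaluated from the stoichiometric coefficients in Table 5.1, and the efficiency computed by numerically integrating the flux time courses. The sketch below uses the trapezoidal rule, which is an assumption about the integration scheme, and the flux values in the example are hypothetical.

```python
# Stoichiometric coefficients from Table 5.1.
COEFF = {"ATP_TX": 208, "CTP_TX": 157, "GTP_TX": 195, "UTP_TX": 157,
         "ATP_TL": 239, "GTP_TL": 478}

# Eq. 5.13 and Eq. 5.14.
alpha_tx = 2 * (COEFF["ATP_TX"] + COEFF["CTP_TX"] + COEFF["GTP_TX"] + COEFF["UTP_TX"])
alpha_tl = 2 * COEFF["ATP_TL"] + COEFF["GTP_TL"]


def trapezoid(times, values):
    """Trapezoidal integral of `values` sampled at `times`."""
    return sum(0.5 * (values[i] + values[i + 1]) * (times[i + 1] - times[i])
               for i in range(len(times) - 1))


def energy_efficiency(times, v_tx, v_tl, atp_generation):
    """Eq. 5.12, with `atp_generation` holding the summed ATP-producing flux
    (sum over j of sigma_ATPj * v_j) at each time point."""
    spent = [a * alpha_tx + b * alpha_tl for a, b in zip(v_tx, v_tl)]
    return trapezoid(times, spent) / trapezoid(times, atp_generation)


# Hypothetical flux time courses (mM/h) on a coarse time grid (h).
times = [0.0, 4.0, 8.0, 16.0]
v_tx = [2e-3, 1e-3, 5e-4, 1e-4]
v_tl = [3e-3, 3e-3, 2e-3, 1e-3]
atp_generation = [40.0, 30.0, 20.0, 10.0]

print(f"alpha_TX = {alpha_tx}, alpha_TL = {alpha_tl}")
print(f"energy efficiency: {energy_efficiency(times, v_tx, v_tl, atp_generation):.1%}")
```

With the Table 5.1 coefficients, the cost of one transcript is 1434 and the cost of one protein is 956 nucleotide triphosphate equivalents.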
5.5.11 Calculation of carbon yield
The carbon yield (YC) was calculated as the ratio of the carbon incorporated into
the protein product to the carbon consumed as reactants:
\[ Y_{C} = \frac{x_{GFP,f} \cdot C_{GFP}}{\sum_{i \in m_s} (x_{i,o} - x_{i,f}) \cdot C_{m_i}} \tag{5.15} \]
where xGFP, f denotes the final concentration of GFP, CGFP denotes carbon number
of GFP, ms denotes the set of species that were consumed, xi,o denotes the initial
concentration of species i, xi, f denotes the final concentration of species i, and Cmi
denotes the carbon number of species i.
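Eq. 5.15 can be sketched in Python as follows. The carbon number of GFP is taken from Table 5.1; the substrate concentrations, their carbon numbers, and the final GFP concentration are hypothetical values chosen only to make the arithmetic easy to follow, and all names are illustrative.

```python
def carbon_yield(x_product_final, carbon_product, consumed):
    """Carbon yield following Eq. 5.15.

    consumed: iterable of (x_initial, x_final, carbon_number) tuples for the
    species that were consumed; concentrations must share the same units as
    x_product_final.
    """
    carbon_in = sum((x0 - xf) * c for x0, xf, c in consumed)
    if carbon_in <= 0:
        raise ValueError("no net carbon consumption")
    return x_product_final * carbon_product / carbon_in


C_GFP = 1208  # carbon number of GFP, Table 5.1

# Hypothetical example (mM): a 6-carbon substrate falling from 30 to 0 and a
# 5-carbon substrate falling from 20 to 10, with 0.01 mM GFP produced.
hypothetical_consumed = [(30.0, 0.0, 6), (20.0, 10.0, 5)]
yield_example = carbon_yield(0.01, C_GFP, hypothetical_consumed)
```

In this made-up case the reactants supply 230 mM of carbon and the product contains 12.08 mM of carbon, giving a yield of roughly 5%.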
Table 5.1: Parameters for sequence specific flux balance analysis
Description                                 Parameter   Value           Units              Reference
RNA polymerase concentration                RTL         60-75           nM                 [52]
Ribosome concentration                      RTX         2-2.3           µM                 [52]
Transcription elongation rate               ṅTX         15-25           nt/s               [52]
Translation elongation rate                 ṅTL         1-2             aa/s/ribosome      [52, 175]
Transcription time constant                 τTX         0.021-0.05      constant           calculated
Translation time constant                   τTL         0.063-0.126     constant           calculated
Transcription saturation coefficient        KTX         0.3             µM                 [?]
Translation saturation coefficient          KTL         600.0           µM                 estimated
Polysome number                             KP          10              ribosome number    estimated
mRNA degradation rate constant              kmRNA       2.38            h⁻¹                [52]
Maltodextrin saturation constant            Km          8.3             mM                 BRENDA
Transcription saturation constant           KTXs        0.03            mM                 estimated
Weight RNA polymerase binding alone P70a    K1          0.014           constant           estimated
Weight bound RNAP-σ70 P70a                  K2          10              constant           estimated
σ70 concentration                           σ70         35              nM                 [52]
σ70 dissociation constant                   KD          130             nM                 [119]
σ70 Hill coefficient                        n           1               constant           [119]
Gene concentration                          GP          5               nM                 experiment
ATP transcription coefficient               ATPTX       208             constant           calculated
CTP transcription coefficient               CTPTX       157             constant           calculated
GTP transcription coefficient               GTPTX       195             constant           calculated
UTP transcription coefficient               UTPTX       157             constant           calculated
ATP tRNA charging coefficient               ATPTL       239             constant           calculated
GTP translation coefficient                 GTPTL       478             constant           calculated
Carbon number of GFP                        CGFP        1208            constant           calculated
CHAPTER 6
TOWARD A GENOME SCALE SEQUENCE SPECIFIC DYNAMIC MODEL
OF CELL-FREE PROTEIN SYNTHESIS IN ESCHERICHIA COLI
6.1 Abstract
1 In this study, we developed a dynamic mathematical model of E. coli cell-free
protein synthesis (CFPS). Model parameters were estimated from a dataset consisting
of measurements of glucose, organic acids, energy species, amino acids, and the
protein product, chloramphenicol acetyltransferase (CAT). The model was successfully
trained to predict these measurements, especially those of central carbon
metabolism. We then used the trained model to evaluate the optimality of protein
production. CAT was produced with an energy efficiency of 12%, suggesting that
the process could be further optimized. Reaction group knockouts showed that
protein productivity was most sensitive to the oxidative phosphorylation and gly-
colysis/gluconeogenesis pathways. Amino acid biosynthesis was also important
for productivity, while overflow metabolism and TCA cycle affected the overall
system state. In addition, translation was more important to productivity than
transcription. Finally, CAT production was robust to the removal of allosteric control, as were
most of the predicted metabolite concentrations; the exceptions to this were the
concentrations of succinate and malate, and to a lesser extent pyruvate and acetate,
which varied from the measured values when allosteric control was removed. This
study is the first to use kinetic modeling to predict dynamic protein production in a
cell-free E. coli system, and could provide a foundation for genome scale, dynamic
modeling of cell-free E. coli protein synthesis.
1The following work has been submitted as: Horvath N, Vilkhovoy M, Wayman JA, Calhoun K,
Swartz J, and Varner JD, "Toward a Genome Scale Sequence Specific Dynamic Model of Cell-Free
Protein Synthesis in Escherichia coli," Metabolic Engineering Communications.
6.2 Introduction
Cell-free protein expression is a widely used tool in systems and synthetic biology,
and a promising technology for personalized point of use biotechnology [137].
Cell-free systems offer many advantages for the study, manipulation and modeling
of metabolism compared to in vivo processes. Central amongst these advantages is
direct access to metabolites and the biosynthetic machinery without the interfer-
ence of a cell wall, or the complications associated with cell growth. Thus, we can
interrogate (and potentially manipulate) the chemical microenvironment while the
biosynthetic machinery is operating, possibly at a fine time resolution. Cell-free
protein synthesis (CFPS) is arguably the most prominent example of a cell-free
system used today [81]. However, CFPS is not new; CFPS in crude E. coli extracts
has been used since the 1960s to explore fundamental biological mechanisms. For
example, Matthaei and Nirenberg used E. coli cell-free extracts in ground-breaking
experiments to decipher the sequencing of the genetic code [118, 128]. Spirin
and coworkers later improved protein production in cell-free extracts by contin-
uously exchanging reactants and products; however, while these extracts could
run for tens of hours, they could only synthesize a single product and were energy
limited [159]. More recently, energy and cofactor regeneration in CFPS has been
significantly improved; for example, ATP can be regenerated using substrate-level
phosphorylation [93] or even oxidative phosphorylation [81]. While it was once
debated whether oxidative phosphorylation occurred in cell-free systems, Jewett
and coworkers demonstrated its existence definitively in the Cytomim system by
inhibiting it using electron transport chain and F1FO-ATPase inhibitors, as well as
membrane gradient uncouplers, and observing a significantly lower protein yield
[81]. They hypothesized respiration to be occurring in inner membrane vesicles
created during cell lysis. Today, cell-free systems are used in a variety of appli-
cations ranging from therapeutic protein production [110] to synthetic biology
[70, 73, 137]. Moreover, there are also several CFPS technology platforms, such
as the PANOx-SP and Cytomim platforms developed by Swartz and coworkers
[82, 81], the TXTL platform of Noireaux [52] or the PURE system developed by
Shimizu et al. [152]. However, for point of use cell-free manufacturing to become a
mainstream technology, we must first understand the system performance, and
eventually optimize important metrics such as yield and productivity. A critical
tool towards this goal is mathematical modeling. We previously developed a
constraint-based model of CFPS which integrated the expression of the protein
product with the supply of metabolic precursors and energy [181].
Dynamic mathematical modeling has long contributed to our understanding
of metabolism [184]. Decades before the genomics revolution, mechanistically
structured metabolic models arose from the desire to predict microbial phenotypes
resulting from changes in intracellular or extracellular states [48]. The single cell
E. coli models of Shuler and coworkers pioneered the construction of large-scale,
dynamic metabolic models that incorporated multiple regulated catabolic and
anabolic pathways constrained by experimentally determined kinetic parameters
[37]. Shuler and coworkers generated many single cell kinetic models, including
single cell models of eukaryotes [160, 190], minimal cell architectures [29], and
DNA sequence based whole-cell models of E. coli [9]. More recent studies have
extended the approach, from integrating disparate models of cellular processes in
M. genitalium [88], to describing dozens of mutant strains in E. coli with a single
partially kinetic model [90], to identifying industrially useful target enzymes in
E. coli for improved 1,4-butanediol production [5]. Taken together, mathematical
modeling of metabolism has proven useful for applications across systems biology.
However, dynamic metabolic model development is often time consuming, and
model identification and validation require significant experimental information.
Parameter identification is a challenge to the development of predictive dy-
namic metabolic models. Sethna identified parameter sloppiness as a common
feature of systems biology models; the eigenvalues of the network sensitivity were
distributed across wide ranges, and were not generally aligned with single parameters
[21, 59]. As a result, individual parameter values can remain poorly constrained
even with comprehensive metabolite information. Furthermore, if direct parameter
measurements were attempted, they had to be precise and exhaustive to yield reliable model predictions.
Surprisingly, despite this, models often still accurately predict multiple phenotypes
via collective parameter fitting. Liao and coworkers constructed an ensemble of
models across a wide range of kinetic parameters that satisfied thermodynamic
constraints and steady state flux distributions, and selected from within the ensem-
ble those models that described enzyme overexpression datasets [174]. In this way,
specific parameter identification was bypassed, and multiple relevant phenotypes
could be described. Meanwhile, Hatzimanikatis and coworkers employed machine
learning to simplify the parameter estimation problem [6]. They segregated the
feasible-solution parameter space into N-dimensional boxes, via a binary decision
tree which determined the values of parameters. This subsequently allowed for
uniform, non-asymptotic sampling within the subregions; a convenient byproduct
of this approach was a simple estimation of the volume of the solution space.
Taken together, large-scale, descriptive models of prokaryotic metabolism can be
constructed and trained to predict diverse biological behaviors with uncertain
parameter information.
In this study, we developed an ensemble of kinetic cell-free protein synthesis
(CFPS) models using dynamic metabolite measurements from an early glucose
powered Cytomim E. coli cell-free extract. While cell-free technology has evolved
considerably since this data set was generated, developing a model using a previous
generation CFPS platform offers several unique opportunities. First and
foremost is the ability to directly compare the improvements established
by purely experimental means to those estimated using a dynamic mathematical
model. The CFPS model equations were formulated using the hybrid cell-free
modeling framework of Wayman and coworkers [183], which integrates traditional
kinetic modeling with a logical rule-based description of allosteric regulation.
Model parameters were estimated from measurements of glucose, organic acids,
energy species, amino acids, and the protein product, chloramphenicol acetyl-
transferase (CAT) over the course of a three hour protein synthesis reaction. A
constrained Markov Chain Monte Carlo (MCMC) approach was used to minimize
the squared difference between model simulations and experimental measure-
ments, where a plausible range for each kinetic parameter was established from
BioNumbers [122]. The ensemble of parameter sets described the training data
with a median cost more than two orders of magnitude smaller than that of a population
of random parameter sets constructed using the same literature parameter constraints.
We then used the ensemble of kinetic models to analyze the performance
of the CFPS system, and to estimate the pathways most important to protein production.
We calculated that CAT was produced with an energy efficiency of 12%,
suggesting that much of the energy available for protein synthesis was diverted
to non-productive pathways. By simulating the knockout of metabolic enzyme
groups (the knockouts were performed in silico, not experimentally), we showed that metabolism,
and protein production in particular, depended upon oxidative phosphorylation
and glycolysis/gluconeogenesis. In addition, translation was more important to
productivity than transcription. Lastly, CAT production was robust to the removal of allosteric
control, as was most of the network, with the exception of the organic acid trajectories
in central carbon metabolism. Taken together, this study provides a foundation
for sequence specific genome scale, dynamic modeling of cell-free E. coli protein
synthesis.
Figure 6.1: Schematic of the core portion of the cell-free E. coli metabolic network.
Metabolites of glycolysis, pentose phosphate pathway, Entner-Doudoroff pathway,
and TCA cycle are shown. Metabolites of oxidative phosphorylation, amino acid
biosynthesis and degradation, transcription/translation, chorismate metabolism,
and energy metabolism are not shown.
[Figure 6.1 panel labels: Glycolysis, Pentose Phosphate Pathway (PPP), TCA Cycle. Other modules in the model: oxidative phosphorylation, amino acid biosynthesis and degradation, transcription and translation processes, C1 metabolism, energy metabolism.]
6.3 Results
The cell-free E. coli metabolic network was constructed by removing growth-
associated reactions from the iAF1260 reconstruction of K-12 MG1655 E. coli
[44], and by adding reactions describing chloramphenicol acetyltransferase (CAT)
biosynthesis (Fig. 6.1). In addition, reactions that were knocked out in the host
strain used to prepare the extract were removed from the network (∆speA, ∆tnaA,
∆sdaA, ∆sdaB, ∆gshA, ∆tonA, ∆endA). Lastly, we added transcription and trans-
lation processes for the synthesis of CAT, which were based on template reactions
from earlier work done by Allen and Palsson [4] and more recently Vilkhovoy
et al. [181]. The metabolic network, which contained 148 metabolites and 204
reactions, is available in the supplemental materials. Model equations followed
the hybrid modeling framework of Wayman and coworkers [183], combining mul-
tiple saturation kinetics with a rule-based model of allostery. An ensemble of 100
model parameter sets was estimated from measurements of glucose, CAT, organic
acids, energy species, and 18 of the 20 proteinogenic amino acids [181] using a
constrained Markov Chain Monte Carlo (MCMC) approach. The organic acids
measured included pyruvate, lactate, acetate, succinate, and malate. The energy
species included three phosphorylation states each of the four ribonucleosides: ATP,
ADP, AMP, GTP, GDP, GMP, CTP, CDP, CMP, UTP, UDP, and UMP. Nicotinamide
adenine dinucleotide (NAD(H)) and nicotinamide adenine dinucleotide phosphate
(NADP(H)), while present in the model, were not measured in the dataset. The
model equations and parameter sets, as well as the experimental dataset, are
available under an MIT open source software license from the Varnerlab website
[179].
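The model structure described above (multiple saturation kinetics combined with rule-based allosteric control) can be illustrated with a minimal Python sketch. The functional forms here are assumptions made for illustration: a product of saturation terms, a first-order exponential decay of enzyme activity, Hill-like transfer functions parameterized by a gain and an order, and a min rule that integrates the regulatory inputs. The exact forms used in the framework of Wayman and coworkers [183] may differ, and all function and argument names are hypothetical.

```python
import math


def saturation_product(concentrations, saturation_constants):
    """Multiple saturation kinetics: product of x / (K + x) over substrates."""
    product = 1.0
    for x, k in zip(concentrations, saturation_constants):
        product *= x / (k + x)
    return product


def hill_transfer(x, gain, order):
    """Assumed Hill-like transfer function with values in [0, 1)."""
    z = gain * x ** order
    return z / (1.0 + z)


def control_variable(activators=(), inhibitors=()):
    """Assumed rule: take the minimum over all regulatory inputs.

    Each effector is an (x, gain, order) tuple. Activators contribute f(x),
    inhibitors contribute 1 - f(x); with no effectors the control is 1.
    """
    factors = [hill_transfer(x, g, n) for x, g, n in activators]
    factors += [1.0 - hill_transfer(x, g, n) for x, g, n in inhibitors]
    return min(factors) if factors else 1.0


def reaction_rate(vmax, decay_constant, time, substrates, saturation_constants,
                  activators=(), inhibitors=()):
    """Rate = Vmax * enzyme activity * saturation terms * allosteric control."""
    activity = math.exp(-decay_constant * time)  # assumed first-order decay
    kinetics = saturation_product(substrates, saturation_constants)
    return vmax * activity * kinetics * control_variable(activators, inhibitors)


# One substrate at its saturation constant, no regulation, at time zero.
example_rate = reaction_rate(11.6, 0.0045, 0.0, [1.0], [1.0])
```

Under these assumptions each reaction carries one maximum rate, one decay constant, one saturation constant per substrate, and a gain and order per regulatory input, which mirrors the parameter classes reported for the ensemble.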
The MCMC algorithm minimized the squared difference (residual) between
the training data and model simulations starting from an initial parameter set
assembled from literature and inspection. Bounds on permissible parameter values
were established using studies from the BioNumbers database [122]. For each
newly generated parameter set, the balance equations were re-solved and the
cost function re-calculated; all sets with a lower cost (and some with higher cost)
were accepted into the ensemble. Parameter sets were also required to meet strict
ordinary differential equation solver tolerances, to ensure numerical stability. Ap-
proximately 3,000 sets were accepted into an initial ensemble; each set contained
204 maximum reaction rates, 204 enzyme activity decay constants, 548 saturation
constants, and 34 control parameters, for a total of 815 parameters. 100 sets were
then selected from this initial ensemble based upon error to form the final param-
eter ensemble. The final ensemble had a mean Pearson correlation coefficient of
0.78; this suggested parameter sets were not over-sampled in the region of a local
minimum. The median maximum reaction rate (Vmax) across the ensemble was
11.6 mM/h, assuming a total cell-free enzyme concentration of approximately 170
nM. This Vmax, which corresponded to a median catalytic rate of 19 s⁻¹ across the
ensemble, was in relative agreement with the 13.7 s⁻¹ median catalytic rate found by
Milo and coworkers [13]. The median enzyme activity decay constant was 0.0045
h⁻¹, corresponding to an enzyme activity half-life of approximately 6 days. The
median saturation constant was 1.0 mM; this was within one order of magnitude of
the 130 µM reported by Milo and coworkers. Lastly, both the median control gain
and order parameters, which appeared in the allosteric control functions, were of
order 1. While the maximum reaction rates of the ensemble were distributed evenly
across the allowed range (Fig. 6.5A), the saturation constants were clustered around
the upper and lower bounds (Fig. 6.5B) of the parameter search. Taken together, the
constrained MCMC approach estimated a numerically stable ensemble of model
parameters that was in aggregate consistent with literature values. Next, we examined
the model fit to the experimental training data. The ensemble of kinetic CFPS
models captured the time evolution of protein biosynthesis, and the consumption
and production of organic acid, amino acid, and energy species. The time evolution
of central carbon metabolites (Fig. 6.2, top), amino acids (Fig. 6.3), and energy
species (Fig. 6.4) was captured by both the ensemble and the best-fit parameter set. The
constrained MCMC approach estimated parameter sets with a median error more
than two orders of magnitude less than random parameter sets generated within
the same parameter bounds established from literature (Fig. 6.6). For 29 of the 37
measurements in the training dataset, the mean Akaike information criterion (AIC)
of the predicted ensemble was lower than that of the random sets, signifying a
better fit of the data (Table 6.3). For the remaining eight measurements, the AIC
score of the random ensemble was lower than that of the predicted ensemble,
but the difference was within the standard deviation of the AIC score (with the
exception of isoleucine: \( \sigma^{Rand}_{AIC} = 4.8 \), \( \overline{AIC}^{Rand} - \overline{AIC}^{Ens} = -5.0 \)). Taken together, these
results suggested that the predicted ensemble modeled cell-free metabolism and
protein production significantly better than the random ensemble, not just overall
but for the majority of individual metabolite and protein measurements. Next, we
analyzed the important features of the cell-free protein synthesis timecourse.
Figure 6.2: Central carbon metabolism in the presence (top) and absence (bottom) of
allosteric control, including glucose (substrate), CAT (product), and intermediates,
as well as total concentration of energy species. Best-fit parameter set (orange line)
versus experimental data (points). 95% confidence interval (blue or gray shaded
region) over the ensemble of 100 sets.
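The constrained MCMC search described above (perturb a parameter set within literature bounds, re-evaluate the cost, accept all lower-cost sets and some higher-cost sets, then keep the lowest-error sets) can be sketched in Python as follows. This is a generic Metropolis-style random walk written under stated assumptions, not the implementation used in this study: the log-normal proposal, the clipping to bounds, the exponential acceptance rule, the temperature, and the toy quadratic cost are all illustrative, and the solver-tolerance check is omitted.

```python
import math
import random


def constrained_mcmc(cost, initial, bounds, n_steps,
                     step_scale=0.05, temperature=0.1, seed=0):
    """Random walk over positive parameters, clipped to (lower, upper) bounds.

    Returns a list of (cost, parameters) for every accepted set, including
    the initial set.
    """
    rng = random.Random(seed)
    current = list(initial)
    current_cost = cost(current)
    accepted = [(current_cost, list(current))]
    for _ in range(n_steps):
        candidate = []
        for value, (lower, upper) in zip(current, bounds):
            # Multiplicative (log-space) perturbation keeps parameters positive.
            proposal = value * math.exp(step_scale * rng.gauss(0.0, 1.0))
            candidate.append(min(max(proposal, lower), upper))
        candidate_cost = cost(candidate)
        delta = candidate_cost - current_cost
        # Always accept improvements; accept some worse sets with
        # probability exp(-delta / temperature).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, candidate_cost
            accepted.append((current_cost, list(current)))
    return accepted


def select_ensemble(accepted, n_keep):
    """Keep the n_keep accepted sets with the lowest cost."""
    return sorted(accepted, key=lambda item: item[0])[:n_keep]


# Toy stand-in for the squared residual between simulation and data.
TARGET = [2.0, 0.5]


def toy_cost(parameters):
    return sum((p - t) ** 2 for p, t in zip(parameters, TARGET))


toy_bounds = [(0.1, 10.0), (0.1, 10.0)]
toy_initial = [5.0, 5.0]
accepted_sets = constrained_mcmc(toy_cost, toy_initial, toy_bounds, n_steps=2000)
final_ensemble = select_ensemble(accepted_sets, n_keep=100)
```

In the actual workflow the cost would be evaluated by re-solving the model balance equations for each candidate set, which is far more expensive than the toy cost used here.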
Figure 6.3: Amino acids in the presence of allosteric control. Best-fit parameter
set (orange line) versus experimental data (points). 95% confidence interval (blue
shaded region) over the ensemble of 100 sets.
The predicted ensemble of models captured the biphasic time course of CAT
production. During the first hour, glucose powered protein production, and CAT
was produced at 8 µM/h; subsequently, pyruvate and lactate reserves were consumed
to power metabolism, and CAT was produced at 5 µM/h. Allosteric control
was important to central carbon metabolism, especially for pyruvate, acetate, and
succinate (Fig. 6.2, bottom). However, CAT production was robust to the removal
of allosteric control. The difference between the allosteric control and no-control
cases was mostly seen in the second (pyruvate-driven) phase of CAT production,
following glucose exhaustion. Specifically, pyruvate, succinate, and malate con-
sumption and acetate accumulation increased with the removal of allosteric control.
Figure 6.4: Energy species and energy totals by base in the presence of allosteric
control. Best-fit parameter set (orange line) versus experimental data (points). 95%
confidence interval (blue shaded region) over the ensemble of 100 sets.
The rate of acetate accumulation increased by 172%, while the rates of malate,
pyruvate, and lactate consumption increased by 146%, 82%, and 9%, respectively.
Succinate went from accumulating slightly in the second phase, in the presence
of allosteric control, to being fully consumed. While ATP generation varied when
allosteric control was removed, ATP expenditure toward CAT production did not.
Most of the fluxes that differed between the two cases involved PEP and pyru-
vate, which directly participated in many of the reactions modulated by allosteric
control. Taken together, the ensemble of kinetic models was consistent with time
series measurements of the cell-free production of a model protein. Although the
ensemble described the experimental data, it was unclear which kinetic parameters
and pathways most influenced metabolism and CAT production. To explore this
question, we performed reaction group knockout analysis.
Figure 6.5: Histograms of model parameters, across the ensemble of 100 sets. A.
Histogram of rate maxima (mM/h). B. Histogram of saturation constants (mM).
Vertical axes show relative frequency.
Figure 6.6: Log of cost function (residual between training data and model simulations)
across 37 datasets for data-trained ensemble (blue) and randomly generated
ensemble (red, gray background). Median (bars), interquartile range (boxes), range
excluding outliers (thin lines), and outliers (circles) for each dataset. Median across
all datasets (large bar overlaid). Horizontal axis: measured species.
Figure 6.7: Effect of group knockouts on system. A. Change in CAT productivity
when one (diagonal) or two (off-diagonal) reaction groups are turned off. B. Change
in system state (only species for which data exist) when one (diagonal) or two
(off-diagonal) reaction groups are turned off. Total-order effect for each group
calculated as the sum of first-order effect and all pairwise effects. Larger and
darker circles represent greater effects.
[Figure 6.7 axis labels, both panels: Glycolysis/Gluconeogenesis; Pentose Phosphate Pathway; Entner-Doudoroff; TCA cycle; Oxidative phosphorylation; Cofactors; Anaplerotic/Glyoxylate reactions; Overflow metabolism; Folate metabolism; Purine/Pyrimidine; ALA, ASP, ASN biosynthesis; GLU, GLN biosynthesis; ARG, PRO biosynthesis; GLY, SER biosynthesis; CYS, MET biosynthesis; LYS, THR biosynthesis; HIS biosynthesis; PHE, TRP, TYR biosynthesis; ILE, LEU, VAL biosynthesis; Total order coefficient. Color scale: Low to High.]
The importance of CFPS pathways was estimated using pathway group knockout
analysis (Fig. 6.7). The metabolic network was divided into 19 reaction groups,
spanning central carbon metabolism, energetics, and amino acid biosynthesis. The
response in the productivity (Fig. 6.7A) and overall system state (Fig. 6.7B) was
calculated for single and pairwise deletion of each of these reaction groups. Lastly,
the overall effect of the deletion of a pathway was estimated by summing the
single and pairwise effects (summation across the columns of the response array).
Glycolysis/gluconeogenesis and oxidative phosphorylation had the greatest effect
on both productivity and system state. This supports previous studies that have
suggested oxidative phosphorylation is occurring in a cell-free system [81]; Jewett
and coworkers observed a decrease in CAT yield, ranging from 1.5-fold to 4-fold,
when inhibiting oxidative phosphorylation reactions in the Cytomim cell-free plat-
form, using both pyruvate and glutamate as substrates. CAT productivity was also
affected by two sectors of amino acid biosynthesis: alanine/aspartate/asparagine,
and glutamate/glutamine biosynthesis. Aspartate, glutamate, and glutamine are
key reactants in the biosynthesis of many other amino acids, all of which are re-
quired for CAT synthesis. Meanwhile, the TCA cycle and overflow metabolism
(which included acetyl-coA/acetate reactions and the interconversion of pyruvate
and lactate) also had a significant effect on the system state. These reactions di-
rectly impacted key system species: succinate and malate in the TCA cycle, and
acetate, pyruvate, and lactate in the overflow metabolism. In addition, the relative
influence of transcription and translation parameters was interrogated by global
sensitivity analysis [153]. Productivity was sensitive to the maximum reaction rate
of transcription (coefficient of 0.43 ± 0.06), but was more sensitive to variations in
the maximum reaction rate of translation (0.66 ± 0.08). Thus, translation appeared
to be the limiting step of cell-free protein synthesis.
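The single and pairwise knockout procedure described above, including the summation across the columns of the response array, can be sketched in Python as follows. The response metric (relative change from the unperturbed output), the three group names, and the toy multiplicative model are assumptions chosen for illustration; in the study the response came from simulating the kinetic model with the reactions of each group turned off.

```python
from itertools import combinations


def knockout_response(simulate, groups):
    """Build a symmetric response array.

    response[i][i] holds the effect of removing group i alone, and
    response[i][j] the effect of removing groups i and j together. The effect
    is taken here (an assumption) as the relative change in the output.
    """
    baseline = simulate(frozenset())
    n = len(groups)
    response = [[0.0] * n for _ in range(n)]
    for i in range(n):
        single = simulate(frozenset([groups[i]]))
        response[i][i] = abs(single - baseline) / baseline
    for i, j in combinations(range(n), 2):
        pair = simulate(frozenset([groups[i], groups[j]]))
        effect = abs(pair - baseline) / baseline
        response[i][j] = response[j][i] = effect
    return response


def total_effect(response):
    """Sum each column: the single effect plus all pairwise effects."""
    n = len(response)
    return [sum(response[i][j] for i in range(n)) for j in range(n)]


# Hypothetical toy model: each removed group scales productivity down.
TOY_WEIGHTS = {"oxidative phosphorylation": 0.5, "glycolysis": 0.4, "overflow": 0.1}


def toy_productivity(knocked_out):
    output = 1.0
    for group in knocked_out:
        output *= 1.0 - TOY_WEIGHTS[group]
    return output


toy_groups = list(TOY_WEIGHTS)
toy_response = knockout_response(toy_productivity, toy_groups)
toy_totals = total_effect(toy_response)
```

For 19 reaction groups this scheme requires 19 single and 171 pairwise simulations in addition to the unperturbed baseline.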
The energy efficiency of CAT production, as well as the sources of energy
generation and consumption, were tracked for the best-fit set. Energy efficiency
was calculated as the ratio of transcription and translation rates (weighted by the
associated ATP costs of each step) to the amount of ATP generated by all sources.
During the first phase of protein production, with glucose as the substrate, CAT
was produced with a productivity of 8 µM/h and an energy efficiency of 10%. The
organic acids that accumulated in the first phase (with the exception of acetate)
were then utilized as substrates in the second phase, once glucose was depleted. We
assumed the second phase of CAT production was powered largely by pyruvate;
although malate was also consumed in the second phase, it accounted for only 11%
of substrate consumption. Lactate accounted for a significant amount of substrate
consumption, but was connected in the stoichiometry only to pyruvate. Thus,
we considered the second phase as pyruvate-driven production. Interestingly,
while this mode of protein production was slower (5 µM/h), it exhibited a higher
energy efficiency (14%). Of the ATP generated, about half was observed to come
from oxidative phosphorylation (R_atp) in each of the two phases of production
(Fig. 6.8A, Table 6.1). Another 30% was generated by glycolysis during the first
phase (R_pgk, R_pyk), which decreased to approximately 20% following glucose
exhaustion. However, glycolysis was also amongst the largest consumers of ATP
during the first phase of production (R_glk_atp, R_pfk) (Table 6.2). The TCA cycle
(R_sucCD) contributed 3% to the overall rate of ATP generation in the first phase
and 5% in the second. The hypothesis that pyruvate drives the second phase explains
this; stores of accumulated pyruvate can be converted to acetyl-CoA, as well
as OAA (via PEP), and thus power the TCA cycle just as when glucose was available.
Interestingly, ATP generation through acetate metabolism (R_ackA) increased
from 12% in the first phase to 28% in the second. The switch from glycolysis in the
first phase, to consumption of organic acid reserves and increased acetate accumu-
lation in the second phase, can also be seen in the reaction fluxes surrounding PEP
and pyruvate (Fig. 6.8B). Lastly, amino acid degradation contributed a negligible
amount to energy production. Taken together, while the efficiency of production
was higher for the pyruvate-driven phase, it was still relatively low, suggesting
that there is room for platform optimization. This underscores the importance
of glycolysis and oxidative phosphorylation, and presents a trade-off between
productivity and energy efficiency in CFPS.
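The per-phase energy efficiency described above is a ratio of two time integrals: the ATP-weighted transcription and translation rates over the total ATP generation rate. A minimal Python sketch of that calculation, using trapezoidal integration over a sampled timecourse, is given below. The time grid, rates, and energy cost coefficients are hypothetical round numbers chosen so the result can be checked by hand; they are not values from the trained model.

```python
def trapezoid(times, values):
    """Trapezoidal-rule integral of sampled values over the time grid."""
    return sum(
        0.5 * (values[k] + values[k + 1]) * (times[k + 1] - times[k])
        for k in range(len(times) - 1)
    )


def energy_efficiency(times, v_tx, v_tl, atp_generation, alpha_tx, alpha_tl):
    """Integrated energy demand of expression divided by integrated ATP supply."""
    demand = [alpha_tx * a + alpha_tl * b for a, b in zip(v_tx, v_tl)]
    return trapezoid(times, demand) / trapezoid(times, atp_generation)


# Hypothetical one-hour window sampled at three time points (h, mM/h).
times_h = [0.0, 0.5, 1.0]
v_tx_example = [1e-4, 1e-4, 1e-4]     # transcription rate
v_tl_example = [1e-3, 1e-3, 1e-3]     # translation rate
atp_supply_example = [10.0, 10.0, 10.0]  # total ATP generation rate

efficiency_example = energy_efficiency(
    times_h, v_tx_example, v_tl_example, atp_supply_example,
    alpha_tx=1000.0, alpha_tl=900.0,
)
```

With these made-up inputs the expression machinery consumes 1.0 mM of ATP equivalents while 10 mM is generated, so the sketch returns an efficiency of 10%. Applying the same function separately to the glucose-driven and pyruvate-driven windows would give the per-phase values.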
Figure 6.8: Key reaction fluxes of the network, in the first (gray boxes, top row) and
second (gray boxes, bottom row) phases of metabolism. A. Fluxes of ATP generation
and consumption, and GTP consumption toward protein synthesis. B. Fluxes
of glycolysis and lactate and acetate metabolism. Fluxes are normalized to the
first-phase glucose uptake rate. For PEP and pyruvate, accumulation (normalized
to glucose uptake) is also shown.
6.4 Discussion
In this study, an ensemble of kinetic cell-free protein synthesis (CFPS) models
was developed using dynamic metabolite measurements from an early glucose
powered Cytomim E. coli cell-free extract. The hybrid cell-free modeling approach
of Wayman and coworkers [183], which integrates traditional kinetic modeling
with a logic-based description of allosteric regulation, was employed to describe
the time evolution of the CFPS reaction. The ensemble captured dynamic metabo-
lite measurements over two orders of magnitude better than random parameter
sets generated in the same region of parameter space. The ensemble captured
the biphasic time course of CAT production, relying on glucose during the first
hour and pyruvate and lactate following glucose exhaustion. Allosteric control
was essential to the description of the organic acid trajectories; without allosteric
control, pyruvate, lactate, succinate, and malate were predicted to be consumed
more quickly following glucose exhaustion, to power CAT synthesis. However,
CAT production was robust to the removal of allosteric control because the amino
acids and energy species that are reactants for CAT synthesis were also not affected
by allosteric control. The ensemble of kinetic models was then used to analyze the
performance of the CFPS system, and to estimate the pathways most important to
protein production. CAT was produced with an approximate aggregate energy ef-
ficiency of 12%, suggesting that much of the energy resources for protein synthesis
were diverted to non-productive pathways. By simulating the knockout of metabolic
enzymes in groups, it was shown that metabolism, and protein production in particular,
depended upon oxidative phosphorylation and glycolysis/gluconeogenesis. Lastly,
global sensitivity analysis suggested that the translation rate was more important
to protein productivity than transcription. Taken together, this study provides
a foundation for sequence-specific genome scale, dynamic modeling of cell-free
E. coli protein synthesis that could be adapted to model the production of other
proteins and synthetic circuits.
The ensemble of models could serve as a surrogate to rationally design cell-
free production processes to optimize production rate and energy efficiency. In
analyzing the effect of reaction groups on CAT production and the system state, the
regions of metabolism associated with substrate utilization and energy generation
were the most important. Oxidative phosphorylation was vital, since it provided
most of the energetic needs of CFPS. While it is unknown how active oxidative
phosphorylation is compared to that of in vivo systems, this study suggested it was
critical to CFPS performance. However, the biphasic operation of CFPS highlights
the ability of the system to respond to an absence of glucose. During the first phase,
central carbon metabolites accumulated with the majority of flux going toward
acetate and some toward pyruvate, lactate, succinate and malate. While acetate
continued to accumulate as a byproduct, the other organic acids were consumed as
secondary substrates after glucose was no longer available. Glutamate also served
as a substrate throughout both phases, powering amino acid synthesis. These
results confirmed experimental findings that CAT production can be sustained
by other substrates in the absence of glucose, providing alternative strategies
to optimize CFPS performance. While CAT synthesis can be powered by other
substrates, the productivity was lower (5 µM/h, as opposed to 8 µM/h). This
is in accordance with literature, where pyruvate provided a relatively slow but
continuous supply of ATP [162]. Taken together, this shows CFPS can be designed
towards a specified application, either requiring a slow stable energy source or
faster production.
Presented herein is the first dynamic model of E. coli cell-free protein synthesis.
A hybrid modeling framework was applied to describe an experimental dataset for
production of a model protein [181] and identified system limitations and areas of
improvement for production efficiency. Having captured the system dynamics, ar-
eas of improvement for CFPS performance were investigated. The model predicted
CAT production with an energy efficiency of 10% under glucose consumption
and 14% under pyruvate consumption. The accumulation of glycolytic interme-
diates and byproducts such as acetate and carbon dioxide was responsible for
this sub-optimal performance. If fluxes could be balanced such that intermediates
were fully utilized, CAT production would increase. Theoretical estimations of the
energy efficiency of an in vivo system can be as high as 80%, as found by our group
[181] and others [116]. However, the corresponding experimental values are much
lower; 16% in the case of our experimentally-constrained sequence-specific model
[181]. Knocking out sections of network metabolism revealed that glycolysis/
gluconeogenesis and oxidative phosphorylation were the most important to CAT
production and the system as a whole. Productivity was also heavily dependent on
the synthesis reactions of alanine, aspartate, asparagine, glutamate, and glutamine,
while TCA cycle and overflow reactions affected the system state. These findings
represent the first dynamic model of E. coli cell-free protein synthesis, an important
step toward a functional genome scale description of cell-free systems. This work
could be extended through further experimentation to gain a deeper understanding
of system performance under a variety of conditions. Specifically, CAT produc-
tion performed in the absence of amino acids could inform the system’s ability
to synthesize them, while experimentation in the absence of glucose or oxygen
could shed light on the importance of those substrates. Another extension of this
study would be to apply its insights to other protein applications. CAT is only a
test protein used for model identification; the modeling framework, and to some
extent the parameter values, should be protein agnostic. However, it should be
noted that the fully kinetic approach resulted in a model that was computationally
expensive to solve, difficult to characterize, and arduous to interrogate. Future
applications may benefit from alternate modeling strategies. For example, our
group also employed a dynamic constraint-based approach to model CFPS [35].
This involved constraining the problem to hundreds of different combinations of
measurements, and solving the model for each. That approach also captured the
dynamics, and allowed the question of which measurements might best charac-
terize a system to be explored. Approaching that question using the fully kinetic
approach would have been untenable. However, constraint-based approaches
depend on the accuracy of the measurements to which they are constrained. A
kinetic approach can theoretically predict dynamics in the absence of data, if param-
eters are well identified. Taken together, the dynamics of multiphasic metabolism
and protein synthesis in CFPS were accurately captured, and the importance of
various pathways was interrogated toward improvement of production; however,
other modeling approaches have advantages that make them well suited for future
endeavors.
6.5 Materials and Methods
6.5.1 Cell-free protein synthesis and measurement.
The protein synthesis reaction was conducted using a modified version of the
PANOxSP protocol [82]. Briefly, the protein synthesis reaction was performed
using the S30 extract in 1.5-mL Eppendorf tubes (working volume of 15 µL) and
incubated in a humidified incubator at 37 °C. Plasmid pK7CAT was used as the
DNA template for chloramphenicol acetyltransferase (CAT) expression by placing
the cat gene between the T7 promoter and the T7 terminator [92]. The plasmid was
isolated and purified using a Plasmid Maxi Kit (Qiagen, Valencia CA). Cell-free
reaction samples were quenched at specific timepoints with equal volumes of
ice-cold 150 mM sulfuric acid to precipitate proteins. Protein synthesis of CAT was
determined from the total amount of 14C-leucine-labeled product by trichloroacetic
acid precipitation followed by scintillation counting as described previously [25].
Samples were centrifuged for 10 min at 12,000 × g and 4 °C. The supernatant was
collected for high performance liquid chromatography (HPLC) analysis. HPLC
analysis (Agilent 1100 HPLC, Palo Alto CA) was used to separate nucleotides
and organic acids, including glucose. Compounds were identified and quantified
by comparison to known standards for retention time and UV absorbance (260
nm for nucleotides and 210 nm for organic acids) as described previously [25].
The standard compounds quantified with a refractive index detector included
inorganic phosphate, glucose, and acetate. Pyruvate, malate, succinate, and lactate
were quantified with the UV detector. The stability of the amino acids in the
cell extract was determined using a Dionex Amino Acid Analysis (AAA) HPLC
System (Sunnyvale, CA) that separates amino acids by gradient anion exchange
(AminoPac PA10 column). Compounds were identified with pulsed amperometric
electrochemical detection and by comparison to known standards. More details
are available in the Materials and Methods section of Vilkhovoy et al. [181].
6.5.2 Formulation and solution of the model equations.
Cell-free protein synthesis was modeled using ordinary differential equations
(ODEs) to estimate the time evolution of metabolite ($x_i$), scaled enzyme activity
($\epsilon_i$), transcription ($m$) and translation ($P$) in an E. coli cell-free metabolic network:
\[ \frac{dx_i}{dt} = \sum_{j=1}^{\mathcal{R}} \sigma_{ij} r_j(\mathbf{x}, \boldsymbol{\epsilon}, \mathbf{k}) \qquad i = 1, 2, \ldots, \mathcal{M} \tag{6.1} \]
\[ \frac{d\epsilon_i}{dt} = -\lambda_i \epsilon_i \qquad i = 1, 2, \ldots, \mathcal{E} \tag{6.2} \]
\[ \frac{dm}{dt} = \bar{r}_T u - \bar{r}_d \tag{6.3} \]
\[ \frac{dP}{dt} = \bar{r}_X \tag{6.4} \]
The quantity $\mathcal{R}$ denotes the number of metabolic reactions, $\mathcal{M}$ denotes the number
of metabolites and $\mathcal{E}$ denotes the number of metabolic enzymes in the model. The
quantity $r_j(\mathbf{x}, \boldsymbol{\epsilon}, \mathbf{k})$ denotes the rate of reaction $j$. Typically, reaction $j$ is a
non-linear function of metabolite and enzyme abundance, as well as unknown kinetic
parameters $\mathbf{k}$ ($\mathcal{K} \times 1$). The quantity $\sigma_{ij}$ denotes the stoichiometric coefficient for
species $i$ in reaction $j$. If $\sigma_{ij} > 0$, metabolite $i$ is produced by reaction $j$. Conversely,
if $\sigma_{ij} < 0$, metabolite $i$ is consumed by reaction $j$, while $\sigma_{ij} = 0$ indicates metabolite
$i$ is not connected with reaction $j$. Lastly, $\lambda_i$ denotes the scaled enzyme activity
decay constant. The system material balances were subject to the initial conditions
$\mathbf{x}(t_o) = \mathbf{x}_o$ and $\boldsymbol{\epsilon}(t_o) = \mathbf{1}$ (initially we have 100% cell-free enzyme activity).
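As a rough illustration of Eqs. 6.1 and 6.2, the sketch below integrates the metabolite and enzyme-activity balances for a hypothetical one-reaction network (A → B) using explicit Euler steps. The network, the rate-law constants, the decay constant, and the step size are all assumptions chosen for illustration; the study itself solved the full model with CVODE.

```python
import math

def mass_balance_rhs(x, eps, S, rate_fn, lam):
    """Right-hand sides of Eq. 6.1 (metabolites) and Eq. 6.2 (enzyme activity)."""
    r = rate_fn(x, eps)
    dxdt = [sum(S[i][j] * r[j] for j in range(len(r))) for i in range(len(x))]
    depsdt = [-lam[i] * eps[i] for i in range(len(eps))]
    return dxdt, depsdt

def euler(x, eps, S, rate_fn, lam, dt, n_steps):
    """Fixed-step explicit Euler integration (illustrative only)."""
    for _ in range(n_steps):
        dxdt, depsdt = mass_balance_rhs(x, eps, S, rate_fn, lam)
        x = [xi + dt * d for xi, d in zip(x, dxdt)]
        eps = [ei + dt * d for ei, d in zip(eps, depsdt)]
    return x, eps

# Hypothetical network: one reaction A -> B catalyzed by one enzyme.
S = [[-1.0],   # A is consumed (sigma < 0)
     [1.0]]    # B is produced (sigma > 0)

def toy_rates(x, eps):
    # Assumed saturation rate law: Vmax = 10 mM/h, K = 0.5 mM
    return [10.0 * eps[0] * x[0] / (0.5 + x[0])]

# Initial conditions: 5 mM of A, no B, 100% enzyme activity; simulate 1 h.
x_end, eps_end = euler([5.0, 0.0], [1.0], S, toy_rates, [0.1], dt=1e-3, n_steps=1000)
```

Because the stoichiometric column sums to zero, the total A + B is conserved, and the enzyme activity decays approximately as exp(−λt).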
Metabolic reaction rates were written as the product of a kinetic term ($\bar{r}_j$) and a
control term ($v_j$), $r_j(\mathbf{x}, \boldsymbol{\epsilon}, \mathbf{k}) = \bar{r}_j v_j$. We used multiple saturation kinetics to model the
reaction term $\bar{r}_j$:
\[ \bar{r}_j = V_j^{max} \epsilon_i \prod_{s \in m_j^-} \frac{x_s}{K_{js} + x_s} \tag{6.5} \]
where $V_j^{max}$ denotes the maximum rate for reaction $j$, $\epsilon_i$ denotes the scaled enzyme
activity which catalyzes reaction $j$, $K_{js}$ denotes the saturation constant for species $s$
in reaction $j$, and $m_j^-$ denotes the set of reactants for reaction $j$.
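A minimal sketch of the multiple saturation rate law in Eq. 6.5 follows. The function name, the dictionary-based data layout, and the numerical values are hypothetical and serve only to show how the product over reactants behaves.

```python
def saturation_rate(v_max, eps, x, K):
    """Eq. 6.5: Vmax * eps * product over reactants of x_s / (K_js + x_s).

    x maps species name -> concentration; K maps reactant name -> saturation constant.
    """
    rate = v_max * eps
    for species, K_s in K.items():
        rate *= x[species] / (K_s + x[species])
    return rate

# Hypothetical two-reactant reaction with each reactant at its saturation constant:
# each factor is 1/2, so the rate is one quarter of Vmax.
x = {"GLC": 1.0, "ATP": 1.0}
K = {"GLC": 1.0, "ATP": 1.0}
rate_half_saturated = saturation_rate(100.0, 1.0, x, K)

# If any reactant is exhausted, the rate drops to zero.
rate_depleted = saturation_rate(100.0, 1.0, {"GLC": 0.0, "ATP": 1.0}, K)
```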
The control term $0 \le v_j \le 1$ depended upon the combination of factors which
influenced rate process $j$. For each rate, we used a rule-based approach to select
from competing control factors. If rate $j$ was influenced by $1, \ldots, m$ factors, we
modeled this relationship as $v_j = \mathcal{I}_j(f_{1j}(\cdot), \ldots, f_{mj}(\cdot))$ where $0 \le f_{ij}(\cdot) \le 1$
denotes a transfer function quantifying the influence of factor $i$ on rate $j$. The function
$\mathcal{I}_j(\cdot)$ is an integration rule which maps the output of regulatory transfer functions
to a control variable. We used Hill-like transfer functions and $\mathcal{I}_j \in \{mean\}$ in
this study [183]. We included 17 allosteric regulation terms, taken from literature,
in the CFPS model. PEP was modeled as an inhibitor for phosphofructokinase
[99, 24], PEP carboxykinase [99], PEP synthetase [99, 32], isocitrate dehydrogenase
[99, 130], and isocitrate lyase/malate synthase [99, 130, 114], and as an activator for
fructose-bisphosphatase [99, 39, 67, 68]. AKG was modeled as an inhibitor for citrate
synthase [99, 138, 142] and isocitrate lyase/malate synthase [99, 114]. 3PG was
modeled as an inhibitor for isocitrate lyase/malate synthase [99, 114]. FDP was
modeled as an activator for pyruvate kinase [99, 198] and PEP carboxylase [99, 189].
Pyruvate was modeled as an inhibitor for pyruvate dehydrogenase [99, 85, 8] and
as an activator for lactate dehydrogenase [132]. Acetyl-CoA was modeled as an
inhibitor for malate dehydrogenase [99].
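The control term described above can be sketched as follows. The exact Hill-like form is not given in this section, so the expression used here (gain · x^order / (1 + gain · x^order), with inhibition as its complement) is an assumption, as are the function names and the numerical values. Only the mean integration rule and the [0, 1] range come from the text.

```python
def hill_activation(x, gain, order):
    """Assumed Hill-like transfer function; returns a value in [0, 1)."""
    t = gain * x ** order
    return t / (1.0 + t)

def hill_inhibition(x, gain, order):
    """Assumed inhibitory counterpart: 1 at x = 0, falling toward 0."""
    return 1.0 - hill_activation(x, gain, order)

def control_term(factors):
    """Mean integration rule; an unregulated rate gets v_j = 1."""
    if not factors:
        return 1.0
    return sum(factors) / len(factors)

# Hypothetical example: a rate inhibited by three effectors (as isocitrate
# lyase/malate synthase is by PEP, AKG, and 3PG), at made-up effector levels.
v = control_term([hill_inhibition(x, 1.0, 2.0) for x in (0.0, 1.0, 3.0)])
```

With two parameters (gain and order) per transfer function, 17 regulation terms correspond to the 34 control parameters listed in Section 6.5.3.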
The symbol $\bar{r}_T$ denotes the transcription rate, $u$ denotes a promoter specific
activation model, and $\bar{r}_d$ denotes the transcript degradation rate. The transcription
rate was modeled as:
\[ \bar{r}_T = k_{cat}^{T} \cdot R_T \left( \frac{G_P}{K_G^T + G_P} \right) \prod_{s \in m_T^-} \frac{x_s}{K_s^T + x_s} \tag{6.6} \]
where $k_{cat}^T$ denotes the maximum transcription rate, $R_T$ denotes the RNA polymerase
concentration, $G_P$ denotes the gene concentration, $K_G^T$ denotes the gene
saturation constant, $K_s^T$ denotes the saturation constant for species $s$, and $m_T^-$
denotes the set of reactants for transcription: ATP, GTP, CTP, UTP, and water. In
this study, we considered only the T7 promoter; we have previously estimated
$u \simeq 0.95$ for T7 [181]. Transcription was modeled as saturating with respect to
gene concentration. However, transcription was not considered to result in any
depletion of gene. Transcript degradation was modeled as first-order in transcript:
\[ \bar{r}_d = k_d \cdot m \tag{6.7} \]
where $k_d$ denotes the transcript degradation rate constant.
The symbol $\bar{r}_X$ denotes the translation rate, which was modeled as:
\[ \bar{r}_X = k_{cat}^{X} \cdot R_X \left( \frac{m}{K_{mRNA}^X + m} \right) \prod_{s \in m_X^-} \frac{x_s}{K_s^X + x_s} \tag{6.8} \]
where $k_{cat}^X$ denotes the maximum translation rate, $R_X$ denotes the ribosome
concentration, $m$ denotes the transcript concentration, $K_{mRNA}^X$ denotes the transcript
saturation constant, $K_s^X$ denotes the saturation constant for species $s$, and $m_X^-$
denotes the set of reactants for translation: GTP, water, and the 20 species representing
tRNA charged with amino acids. Translation was modeled as saturating with
respect to transcript concentration. However, translation was not considered to
result in any depletion of transcript.
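To show how Eqs. 6.3, 6.4, and 6.6 to 6.8 fit together, the sketch below integrates the transcript and protein balances with explicit Euler steps. It assumes that nucleotide and charged-tRNA substrates are saturating (their product terms are set to 1) and uses a hypothetical gene concentration of 0.005 µM; the remaining numbers are illustrative values of the magnitude listed in Table 6.6, not the fitted parameters.

```python
def transcription_rate(k_cat_T, R_T, G, K_G, ntp_saturation=1.0):
    """Eq. 6.6 with the NTP product term lumped into one factor (assumed 1)."""
    return k_cat_T * R_T * (G / (K_G + G)) * ntp_saturation

def translation_rate(k_cat_X, R_X, m, K_mRNA, substrate_saturation=1.0):
    """Eq. 6.8 with the GTP/charged-tRNA product term lumped (assumed 1)."""
    return k_cat_X * R_X * (m / (K_mRNA + m)) * substrate_saturation

def simulate(t_end=3.0, dt=1e-3, u=0.95, k_d=5.2, G=0.005):
    """Euler integration of dm/dt = r_T*u - k_d*m and dP/dt = r_X (µM, h)."""
    m, P = 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        r_T = transcription_rate(123.0, 1.0, G, 0.1)   # µM/h
        r_X = translation_rate(247.0, 2.0, m, 45.0)    # µM/h
        m += dt * (r_T * u - k_d * m)
        P += dt * r_X
    return m, P

m_end, P_end = simulate()
# With constant transcription, mRNA approaches the steady state u * r_T / k_d.
m_ss = 0.95 * transcription_rate(123.0, 1.0, 0.005, 0.1) / 5.2
```

Neither the gene nor the transcript is depleted by its own use, which matches the saturating, non-consuming treatment described above.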
6.5.3 Estimation of kinetic model parameters.
We estimated an ensemble of kinetic parameter sets using a constrained Markov
Chain Monte Carlo (MCMC) random walk strategy. We have used this tech-
nique previously to estimate numerically stable low-error parameter sets for signal
transduction models [168, 169]. Starting from a small number of parameter sets
estimated by inspection and literature, we calculated the cost function, equal to the
sum-squared-error between experimental data and model predictions:
\[ cost = \sum_{i=1}^{\mathcal{D}} \left[ \frac{w_i}{Y_i^2} \sum_{j=1}^{\mathcal{T}_i} \left( y_{ij} - x_i|_{t(j)} \right)^2 \right] \tag{6.9} \]
where $\mathcal{D}$ denotes the number of datasets ($\mathcal{D} = 37$), $w_i$ denotes the weight of the
$i$th dataset, $\mathcal{T}_i$ denotes the number of timepoints in the $i$th dataset, $t(j)$ denotes
the $j$th timepoint, $y_{ij}$ denotes the measurement value of the $i$th dataset at the $j$th
timepoint, and $x_i|_{t(j)}$ denotes the simulated value of the metabolite corresponding
to the $i$th dataset, interpolated to the $j$th timepoint. Lastly, the cost function was
scaled by the maximum experimental value in the $i$th dataset, $Y_i = \max_j (y_{ij})$. We
then perturbed each model parameter between an upper and lower bound that
varied by parameter type:
\[ k_i^{new} = \min\left( \max\left( k_i \cdot \exp(a \cdot r_i), l_i \right), u_i \right) \qquad i = 1, 2, \ldots, \mathcal{P} \tag{6.10} \]
where $\mathcal{P}$ denotes the number of parameters ($\mathcal{P} = 815$), which includes 204 maximum
reaction rates ($V^{max}$), 204 enzyme activity decay constants, 548 saturation
constants ($K_{js}$), and 34 control parameters, $k_i^{new}$ denotes the new value of the $i$th
parameter, $k_i$ denotes the current value of the $i$th parameter, $a$ denotes a distribution
variance, $r_i$ denotes a random sample from the normal distribution, $l_i$ denotes the
lower bound for that parameter type, and $u_i$ denotes the upper bound for that
parameter type. Model parameters were constrained by literature collected using
the BioNumbers database [122]. Transcription, translation, and mRNA degradation
were bounded within a factor of two of their reference values. A characteristic
cell-free enzyme concentration of 170 nM was calculated by diluting the one-tenth
maximal concentration of lacZ (5 µM, BNID 100735) by a cell-free dilution factor
of 30. This enzyme level was then used to calculate rate maxima from turnover
numbers for various enzymes from BioNumbers (Table 6.4). Enzyme levels calcu-
lated from the rate maxima of select reaction fluxes in the best-fit set and catalytic
rates reported in the MOMENT study of Shlomi and coworkers [3] (Table 6.5) had
a median value of 202 nM, well in agreement with this characteristic value. Rate
maxima were bounded within one order of magnitude of the reference value where
available; all other rate maxima were bounded within two orders of magnitude
of the geometric mean of the available values. Enzyme activity decay constants
were bounded between 0 and 1 h⁻¹, corresponding to half-lives of infinity and 42
minutes, respectively. Saturation constants were bounded between 0.0001 and 10
mM. Control gain parameters were bounded between 0.05 and 10 (dimensionless),
while order parameters were bounded between 0.02 and 10 (dimensionless).
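The arithmetic behind the characteristic enzyme level, the reference rate maxima, and the decay-constant half-lives can be checked with a few lines. This is only a worked calculation using the numbers quoted above; the function names are hypothetical.

```python
import math

# Characteristic enzyme level: 5 µM (one-tenth maximal lacZ) diluted 30-fold.
enzyme_uM = 5.0 / 30.0            # about 0.167 µM, quoted as 170 nM
enzyme_mM = enzyme_uM * 1e-3

def vmax_from_kcat(kcat_per_min, enzyme_mM):
    """Rate maximum (mM/h) from a turnover number (1/min) and an enzyme level (mM)."""
    return kcat_per_min * 60.0 * enzyme_mM

def vmax_bounds(v_ref, decades=1):
    """Bounds within the given number of orders of magnitude of a reference value."""
    return v_ref / 10.0 ** decades, v_ref * 10.0 ** decades

def half_life_minutes(decay_per_h):
    """Half-life (min) for a first-order decay constant (1/h); infinite if zero."""
    return math.inf if decay_per_h == 0 else math.log(2) / decay_per_h * 60.0

# Serine dehydrase, kcat = 10400 1/min (Table 6.4) -> 104 mM/h
v_ser = vmax_from_kcat(10400, enzyme_mM)
lower, upper = vmax_bounds(v_ser)
```

The upper decay bound of 1 h⁻¹ gives ln(2) h, or about 42 minutes, and a decay constant of zero corresponds to an infinite half-life.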
For each newly generated parameter set, we re-solved the balance equations
and calculated the cost function. All sets with a lower cost were accepted into the
ensemble. Sets with a higher cost were also accepted into the ensemble, if they
satisfied the acceptance constraint:
\[ R^{uniform}_{0,1} < \exp\left( -\alpha \cdot \frac{cost_{new} - cost}{cost} \right) \tag{6.11} \]
where $R^{uniform}_{0,1}$ denotes a random number taken from a uniform distribution
between 0 and 1, $cost$ denotes the cost of the current parameter set, $cost_{new}$
denotes the cost of the new parameter set, and $\alpha$ denotes a tunable parameter to
control the tolerance to high-error sets. A total of 3,875 sets were accepted into the
initial ensemble, from which we selected N = 100 with minimal error for the final
ensemble.
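The perturbation and acceptance rules in Eqs. 6.10 and 6.11 can be sketched as a simple random walk. The two-parameter cost function, the bounds, the values of a and α, and the iteration count below are hypothetical stand-ins; in the study each cost evaluation required re-solving the full model.

```python
import math
import random

def perturb(k, lower, upper, a, rng):
    """Eq. 6.10: log-normal perturbation clipped to [l_i, u_i]."""
    return [min(max(ki * math.exp(a * rng.gauss(0.0, 1.0)), lo), hi)
            for ki, lo, hi in zip(k, lower, upper)]

def accept(cost_new, cost_cur, alpha, rng):
    """Lower-cost sets are always accepted; higher-cost sets per Eq. 6.11."""
    if cost_new < cost_cur:
        return True
    return rng.random() < math.exp(-alpha * (cost_new - cost_cur) / cost_cur)

def random_walk(cost_fn, k0, lower, upper, a=0.05, alpha=10.0, n_iter=2000, seed=0):
    rng = random.Random(seed)
    k, c = list(k0), cost_fn(k0)
    ensemble = [(c, list(k))]
    for _ in range(n_iter):
        k_new = perturb(k, lower, upper, a, rng)
        c_new = cost_fn(k_new)
        if accept(c_new, c, alpha, rng):
            k, c = k_new, c_new
            ensemble.append((c, list(k)))
    return ensemble

def toy_cost(k):
    # Hypothetical cost with a minimum near k = (2.0, 0.5); offset keeps it positive.
    return (k[0] - 2.0) ** 2 + (k[1] - 0.5) ** 2 + 1e-9

lower, upper = [0.1, 0.1], [10.0, 10.0]
ensemble = random_walk(toy_cost, [1.0, 1.0], lower, upper)
# Keep the N lowest-error sets, mirroring the selection of the final ensemble.
final = sorted(ensemble, key=lambda s: s[0])[:10]
```

Larger values of α make the walk less tolerant of high-error sets, while smaller values let it wander further from the current best fit.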
Lastly, a random ensemble of 100 parameter sets was generated within the
same parameter bounds as the trained ensemble. The randomized parameter sets
were generated using a Monte Carlo approach: each parameter was taken from a
uniform distribution constructed between its upper and lower bounds. The model
equations were then solved and the cost function and the Akaike information
criterion (AIC) were calculated for each of the 37 separate experimental datasets.
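For reference, the scaled cost of Eq. 6.9 that is evaluated for both the trained and the random ensembles can be sketched as below. The data layout (dictionaries keyed by measurement name), the use of linear interpolation, and the example numbers are assumptions; the AIC calculation is not shown because its exact form is not given here.

```python
def interpolate(t_sim, x_sim, t):
    """Linear interpolation of a simulated trajectory at time t (t within range)."""
    for k in range(1, len(t_sim)):
        if t <= t_sim[k]:
            t0, t1 = t_sim[k - 1], t_sim[k]
            w = (t - t0) / (t1 - t0)
            return x_sim[k - 1] + w * (x_sim[k] - x_sim[k - 1])
    return x_sim[-1]

def scaled_cost(datasets, t_sim, sim, weights=None):
    """Eq. 6.9: weighted squared error, scaled by each dataset's squared maximum."""
    total = 0.0
    for name, (t_meas, y_meas) in datasets.items():
        w = 1.0 if weights is None else weights[name]
        Y = max(y_meas)
        sse = sum((y - interpolate(t_sim, sim[name], t)) ** 2
                  for t, y in zip(t_meas, y_meas))
        total += w / Y ** 2 * sse
    return total

# Hypothetical single dataset: residuals are 0, 1, 2, so SSE = 5 and Y = 4.
datasets = {"GLC": ([0.0, 1.0, 2.0], [0.0, 2.0, 4.0])}
t_sim = [0.0, 1.0, 2.0]
example_cost = scaled_cost(datasets, t_sim, {"GLC": [0.0, 1.0, 2.0]})
perfect_cost = scaled_cost(datasets, t_sim, {"GLC": [0.0, 2.0, 4.0]})
```

Scaling by the squared maximum keeps high-concentration measurements such as glucose from dominating low-concentration ones.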
6.5.4 Reaction group knockouts.
The metabolic network was divided into 19 reaction groups: glycolysis/
gluconeogenesis, pentose phosphate, Entner-Doudoroff, TCA cycle, oxidative
phosphorylation, cofactor reactions, anaplerotic/glyoxylate reactions, overflow
metabolism, folate synthesis, purine/pyrimidine reactions, alanine/aspartate/
asparagine synthesis, glutamate/glutamine synthesis, arginine/proline synthesis,
glycine/serine synthesis, cysteine/methionine synthesis, threonine/lysine synthe-
sis, histidine synthesis, tyrosine/tryptophan/phenylalanine synthesis, and valine/
leucine/isoleucine synthesis. Each reaction group and pair of reaction groups were
removed and the model was re-solved; the CAT productivity was then calculated
and subtracted from that of the base case (no knockouts):
\[ P_{ii} = \left| \Delta CAT - \Delta CAT_{\Delta R_i} \right| \tag{6.12} \]
\[ P_{ij} = \left| \Delta CAT - \Delta CAT_{\Delta R_i \Delta R_j} \right| \tag{6.13} \]
\[ P_i^{total} = P_{ii} + \sum_j P_{ij} \tag{6.14} \]
where $P_{ii}$ denotes the first-order productivity knockout effect for reaction group $i$,
$P_{ij}$ denotes the pairwise productivity knockout effect for reaction groups $i$ and $j$,
$P_i^{total}$ denotes the total-order productivity knockout effect for reaction group $i$, $\Delta CAT$
denotes the base case CAT productivity, $\Delta CAT_{\Delta R_i}$ denotes the CAT productivity
when reaction group $i$ is knocked out, $\Delta CAT_{\Delta R_i \Delta R_j}$ denotes the CAT productivity
when reaction groups $i$ and $j$ are knocked out, and $|x|$ denotes the absolute value
of $x$. The system state, defined as the model predictions for all species for which
experimental data exists, was also recorded for each knockout and compared to
the base case:
\[ S_{ii} = \left\| \mathbf{x}^{data} - \mathbf{x}^{data}_{\Delta R_i} \right\|_2 \tag{6.15} \]
\[ S_{ij} = \left\| \mathbf{x}^{data} - \mathbf{x}^{data}_{\Delta R_i \Delta R_j} \right\|_2 \tag{6.16} \]
\[ S_i^{total} = S_{ii} + \sum_j S_{ij} \tag{6.17} \]
where $S_{ii}$ denotes the first-order system state knockout effect for reaction group $i$,
$S_{ij}$ denotes the pairwise system state knockout effect for reaction groups $i$ and $j$,
$S_i^{total}$ denotes the total-order system state knockout effect for reaction group $i$, $\mathbf{x}^{data}$
denotes the base-case system state, $\mathbf{x}^{data}_{\Delta R_i}$ denotes the system state when reaction
group $i$ is knocked out, $\mathbf{x}^{data}_{\Delta R_i \Delta R_j}$ denotes the system state when reaction groups $i$
and $j$ are knocked out, and $\|x\|_2$ denotes the $l_2$ norm of $x$. In order to not dominate
the colorbar, the total-order knockout effects were normalized to the same ranges
as the main arrays (first-order and pairwise effects).
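The first-order, pairwise, and total-order productivity effects of Eqs. 6.12 to 6.14 can be sketched as follows. The productivity function is a made-up stand-in for re-solving the model with reaction groups removed, and the sum in the total-order effect is taken over j ≠ i, which is an assumption about the intended index range.

```python
from itertools import combinations

def knockout_effects(productivity, groups):
    """First-order (Eq. 6.12), pairwise (Eq. 6.13), and total-order (Eq. 6.14) effects."""
    base = productivity(frozenset())
    first = {g: abs(base - productivity(frozenset([g]))) for g in groups}
    pair = {}
    for gi, gj in combinations(groups, 2):
        effect = abs(base - productivity(frozenset([gi, gj])))
        pair[(gi, gj)] = pair[(gj, gi)] = effect
    total = {g: first[g] + sum(pair[(g, h)] for h in groups if h != g)
             for g in groups}
    return first, pair, total

def toy_productivity(removed):
    # Hypothetical: base productivity of 8, with additive losses floored at zero.
    loss = {"oxphos": 6.0, "glycolysis": 5.0, "tca": 1.0}
    return max(8.0 - sum(loss[g] for g in removed), 0.0)

groups = ["oxphos", "glycolysis", "tca"]
first, pair, total = knockout_effects(toy_productivity, groups)
```

The system-state effects of Eqs. 6.15 to 6.17 follow the same pattern, with the absolute difference replaced by an l2 norm over the simulated species.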
6.5.5 Sensitivity of CAT productivity to transcription and translation.
The catalytic rates of transcription and translation were sampled within one order
of magnitude on each side from the best-fit values. The parameter bounds were set
as the base-10 logarithms of the upper and lower bound for each rate; then, 10 was
taken to the power of each parameter sample to obtain the catalytic rates:
\[ k_{cat}^{T,sample} \in \left[ \log_{10}\left( k_{cat}^{T,bf} / 10 \right), \log_{10}\left( k_{cat}^{T,bf} \cdot 10 \right) \right] \tag{6.18} \]
\[ k_{cat}^{X,sample} \in \left[ \log_{10}\left( k_{cat}^{X,bf} / 10 \right), \log_{10}\left( k_{cat}^{X,bf} \cdot 10 \right) \right] \tag{6.19} \]
\[ \Delta CAT = f\left( 10^{k_{cat}^{T,sample}}, 10^{k_{cat}^{X,sample}} \right) \tag{6.20} \]
where $k_{cat}^{T,sample}$ denotes the sample of the transcription catalytic rate, $k_{cat}^{X,sample}$
denotes the sample of the translation catalytic rate, $k_{cat}^{T,bf}$ denotes the best-fit value of
the transcription catalytic rate, and $k_{cat}^{X,bf}$ denotes the best-fit value of the translation
catalytic rate. The sampling was performed using the Sensitivity Analysis Library
in Python (NumPy) with 3,000 samples [65].
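The log-space sampling of Eqs. 6.18 to 6.20 can be illustrated with plain uniform draws. This sketch deliberately uses Python's standard random module rather than the Sensitivity Analysis Library, so it shows only the bounds and back-transformation, not the sampling scheme used in the study; the best-fit values below are placeholders.

```python
import math
import random

def sample_log10(best_fit, n, rng):
    """Uniform samples of log10(rate) within one decade on each side of best_fit."""
    lo = math.log10(best_fit / 10.0)
    hi = math.log10(best_fit * 10.0)
    return [rng.uniform(lo, hi) for _ in range(n)]

rng = random.Random(1)
k_T_bf, k_X_bf = 123.0, 247.0   # placeholder best-fit catalytic rates (1/h)
log_T = sample_log10(k_T_bf, 3000, rng)
log_X = sample_log10(k_X_bf, 3000, rng)

# Eq. 6.20: the model is evaluated at 10**sample for each parameter pair.
rate_pairs = [(10.0 ** t, 10.0 ** x) for t, x in zip(log_T, log_X)]
```

Sampling the exponents rather than the rates themselves spreads the samples evenly across both decades instead of concentrating them near the upper bound.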
6.5.6 Calculation of energy efficiency.
Energy efficiency was calculated as the ratio of transcription and translation
(weighted by the appropriate energy species coefficients) to ATP generation:
\[ \text{Efficiency} = \frac{\Delta_\tau mRNA \cdot \alpha_T + \Delta_\tau CAT \cdot \alpha_X}{\int_\tau \sum_{j \in \{R_{ATP}\}} \sigma_{ATP,j} \bar{r}_j \, dt} \tag{6.21} \]
\[ \alpha_T = 2 \cdot (ATP_T + CTP_T + GTP_T + UTP_T) \tag{6.22} \]
\[ \alpha_X = 2 \cdot ATP_X + GTP_X \tag{6.23} \]
where $\Delta_\tau mRNA$ denotes the net accumulation of mRNA in phase $\tau$ (first, second,
or overall), $\Delta_\tau CAT$ denotes the net accumulation of protein in phase $\tau$, $\alpha_T$ denotes
the energy cost of transcription, $\alpha_X$ denotes the energy cost of translation, $R_{ATP}$
denotes the set of ATP-producing reactions, and $\sigma_{ATP,j}$ denotes the ATP coefficient
for reaction $j$. $ATP_T$, $CTP_T$, $GTP_T$, $UTP_T$ denote the stoichiometric coefficients of
each energy species for transcription, and $ATP_X$, $GTP_X$ denote the stoichiometric
coefficients of ATP and GTP for translation. During transcription and tRNA charging,
triphosphate molecules are consumed with monophosphates as byproducts;
this is the reason for the factors of 2 on $ATP_T$, $CTP_T$, $GTP_T$, $UTP_T$, and $ATP_X$.
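As a worked check of Eqs. 6.22 and 6.23, the energy costs can be computed from the stoichiometric coefficients listed in Table 6.6. The efficiency function and its example inputs (20 µM of CAT and 100 mM of ATP generated) are hypothetical and only show how the ratio in Eq. 6.21 is assembled.

```python
# Stoichiometric coefficients from Table 6.6
ATP_T, CTP_T, GTP_T, UTP_T = 176, 144, 151, 189
ATP_X, GTP_X = 219, 438

# Eq. 6.22 and Eq. 6.23: ATP equivalents per transcript and per protein
alpha_T = 2 * (ATP_T + CTP_T + GTP_T + UTP_T)
alpha_X = 2 * ATP_X + GTP_X

def energy_efficiency(d_mrna, d_cat, atp_generated):
    """Eq. 6.21 with the integrated ATP generation supplied as a single number."""
    return (d_mrna * alpha_T + d_cat * alpha_X) / atp_generated

# Hypothetical phase: 0.02 mM CAT made, negligible net mRNA, 100 mM ATP generated.
example = energy_efficiency(0.0, 0.02, 100.0)
```

The four transcription coefficients sum to the 660-nt transcript length, and the GTP coefficient is twice the 219-residue protein length, consistent with the other entries in Table 6.6.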
6.5.7 Availability of model code.
The cell-free model equations and the parameter estimation procedure were imple-
mented in the Julia programming language [16]. The model equations were solved
using the CVODE solver of the SUNDIALS suite [66], with an absolute tolerance
and relative tolerance of 1e−9; any parameter sets exhibiting CVODE errors were
discarded. Thus, the numerical stability of all parameter sets in the ensemble was
ensured. The model code and parameter ensemble are freely available under an MIT
software license and can be downloaded from the Varnerlab website [179].
6.6 Acknowledgements
This study was supported by a National Science Foundation Graduate Research
Fellowship (DGE-1333468) to N.H. Research reported in this publication was also
supported by the Systems Biology Coagulopathy of Trauma Program with support
from the US Army Medical Research and Materiel Command under award number
W911NF-10-1-0376.
Table 6.1: Breakdown of ATP generation. Flux through ATP-generating pathways
in the first and second phases as percentages of total ATP generation in that phase.
Name         Index   Reaction                                    Phase 1   Phase 2
R_pgk        12      13DPG + ADP → 3PG + ATP                     14%       21%
R_pyk        18      ADP + PEP → ATP + PYR                       16%       <1%
R_sucCD      45      ADP + Pi + SUCCOA → ATP + COA + SUCC        3%        5%
R_atp        55      ADP + Pi + 4 H_e → ATP + 4 H + H2O          54%       46%
R_ackA       68      ACTP + ADP → AC + ATP                       12%       28%
R_asn_deg    102     ASN + AMP + PPi → NH3 + ASP + ATP           <1%       <1%
R_thr_deg3   109     THR + Pi + ADP → NH3 + FOR + ATP + PROP     <1%       <1%
Table 6.2: Breakdown of ATP consumption. Flux through ATP-consuming pathways
in the first and second phases as percentages of total ATP consumption in
that phase.
Name            Index     Reaction                                          Phase 1   Phase 2
R_glk_atp       1         ATP + GLC → ADP + G6P + H                         22%       <1%
R_pfk           4         ATP + F6P → ADP + FBP                             24%       <1%
R_pps           22        ATP + H2O + PYR → AMP + PEP + Pi                  1%        1%
R_acs           70        AC + ATP + COA → ACCOA + AMP + PPi                8%        19%
R_glnA          86        GLU + ATP + NH3 → GLN + ADP + Pi                  1%        2%
R_atp_amp       152       ATP + H2O → AMP + PPi                             6%        13%
R_udp_utp       160       UDP + ATP → UTP + ADP                             3%        6%
R_cdp_ctp       161       CDP + ATP → CTP + ADP                             4%        8%
R_gdp_gtp       162       GDP + ATP → GTP + ADP                             3%        4%
R_atp_ump       163       ATP + UMP → ADP + UDP                             1%        3%
R_atp_cmp       164       ATP + CMP → ADP + CDP                             2%        3%
R_adk_atp       166       AMP + ATP → 2 ADP                                 18%       35%
tRNA charging   185-204   AA + tRNA + ATP + H2O → AA·tRNA + AMP + PPi       2%        2%
Other                                                                       4%        4%
Table 6.3: Mean and standard deviation of Akaike information criterion (AIC), by
measurement, for the ensemble and random ensemble.
Measurement µ_AIC(Ens) σ_AIC(Ens) µ_AIC(Rand) σ_AIC(Rand) µ_AIC(Rand) − µ_AIC(Ens)
GLC 65.4 2.1 103.9 0.6 38.5
CAT -23.0 10.5 -5.2 <0.1 17.8
PYR 64.8 10.3 84.7 0.7 19.9
LAC 70.7 4.5 88.9 <0.1 18.2
AC 79.4 6.0 96 2.1 16.6
SUCC 59.6 3.4 55.5 4.1 -4.1
MAL 60.8 4.1 71.6 6.3 10.8
ATP 51.1 3.3 69.1 <0.1 18.0
ADP 39.8 3.7 53.2 4.7 13.4
AMP 32.9 1.5 75.1 5.7 42.2
GTP 53.4 1.6 68.2 <0.1 14.8
GDP 45.7 2.9 43.6 9.5 -2.1
GMP 46.5 4.2 46.1 12.5 -0.4
CTP 44.9 2.6 58.5 <0.1 13.7
CDP 38.8 1.6 50.7 8.2 11.8
CMP 32.1 4.0 51.9 9.1 19.8
UTP 55.6 5.2 53 <0.1 -2.7
UDP 28.2 4.6 51.9 11.5 23.6
UMP 35.3 3.3 72.3 7.3 36.9
ALA 66.4 4.4 100.5 1.1 34.1
ASN 53.7 1.5 67.6 3.8 13.8
ASP 65.9 2.5 79.5 <0.1 13.6
CYS 60.5 3.1 74 <0.1 13.5
GLN 54.3 5.6 84.7 <0.1 30.4
GLY 47.2 12.7 75.5 11.7 28.3
HIS 46.3 6.2 43.2 3.2 -3.2
ILE 53.3 3.8 48.4 4.8 -5.0
LEU 41.5 6.5 52.5 4.6 10.9
LYS 68.4 2.0 73.9 0.2 5.5
MET 55.9 1.0 57.4 4 1.5
PHE 43.4 5.9 57.7 8.3 14.3
PRO 54.4 2.8 47.9 6.7 -6.5
SER 65.9 4.1 81.4 <0.1 15.6
THR 28.2 5.5 63.2 14.9 35.0
TRP 31.2 5.7 79.9 1.4 48.6
TYR 39.3 2.0 36.7 5.4 -2.6
VAL 51.3 3.1 55.5 4.6 4.1
Table 6.4: Reference values for reaction rate maxima (Vmax) from BioNumbers.
Vmax values calculated from turnover numbers (kcat) from BioNumbers, and a
characteristic enzyme concentration of 170 nM. Characteristic rate maximum for
all other reactions calculated as geometric mean of calculated rate maxima.
Enzyme                     Reaction                 kcat (min⁻¹)   Vmax (mM/h)   BNID#
Serine dehydrase           R_ser_deg                10400          104           101119
Isocitrate dehydrogenase   R_icd                    11900          119           101152
Lactate dehydrogenase      R_ldh                    5800           58            101036
Aspartate transaminase     R_aspC, R_tyr, R_phe     25800          258           101108
Enolase                    R_eno                    13200          132           101028
Pyruvate kinase            R_pyk                    25000          250           101029, 101030
Malic enzyme               R_maeA, R_maeB           35400          354           101167
Phosphofructokinase        R_pfk                    554400         5544          104955
Malate dehydrogenase       R_mdh                    33000          330           101163
Citrate Synthase           R_gltA                   42000          420           101149
6PG dehydrogenase          R_zwf, R_pgl, R_gnd      3200           32            101048
Succinate dehydrogenase    R_sdh                    121            1.21          101162
Succinyl-coA synthetase    R_sucCD                  4700           47            101158
3PGA dehydrogenase         R_gpm                    1100           11            101135
PEP carboxylase            R_ppc                    35400          354           101139
3PGA kinase                R_pgk                    4300           43            101016
Characteristic Vmax                                                110
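The characteristic rate maximum in Table 6.4 can be reproduced as the geometric mean of the 16 tabulated Vmax values. The snippet below is only a worked check of that arithmetic.

```python
import math

# Vmax values (mM/h) from Table 6.4, one per enzyme row
vmax_values = [104, 119, 58, 258, 132, 250, 354, 5544,
               330, 420, 32, 1.21, 47, 11, 354, 43]

def geometric_mean(values):
    """Geometric mean computed in log space to avoid overflow."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

characteristic_vmax = geometric_mean(vmax_values)  # rounds to 110 mM/h at two significant figures
```

A geometric rather than arithmetic mean keeps the single large phosphofructokinase value from dominating the characteristic rate.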
Table 6.5: Enzyme levels for key reaction fluxes, calculated from enzyme turnover
numbers [3] and rate maxima from the best-fit set.
Enzyme                     Reaction   kcat (min⁻¹),   Vmax (mM/h),    Enzyme Level (nM),
                                      MOMENT          best-fit set    calculated
Isocitrate dehydrogenase   R_icd      1700            37              356
Lactate dehydrogenase      R_ldh      52500           35              11
Aspartate transaminase     R_aspC     4900            39              130
Pyruvate kinase            R_pyk      8100            610             1250
Malic enzyme               R_maeA     8100            46              96
Malic enzyme               R_maeB     4000            66              274
Phosphofructokinase        R_pfk      5000            15600           51800
Malate dehydrogenase       R_mdh      43700           33              13
Succinate dehydrogenase    R_sdh      10000           4.9             8.2
Succinyl-coA synthetase    R_sucCD    1500            250             2690
Median                                                                202
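The enzyme levels in Table 6.5 follow from dividing each best-fit rate maximum by the corresponding turnover number. The sketch below recomputes them and the median; small differences from the tabulated levels are expected because the table reports rounded Vmax values.

```python
import statistics

# (reaction, kcat in 1/min, best-fit Vmax in mM/h, tabulated enzyme level in nM)
rows = [
    ("R_icd", 1700, 37, 356), ("R_ldh", 52500, 35, 11), ("R_aspC", 4900, 39, 130),
    ("R_pyk", 8100, 610, 1250), ("R_maeA", 8100, 46, 96), ("R_maeB", 4000, 66, 274),
    ("R_pfk", 5000, 15600, 51800), ("R_mdh", 43700, 33, 13), ("R_sdh", 10000, 4.9, 8.2),
    ("R_sucCD", 1500, 250, 2690),
]

def enzyme_level_nM(kcat_per_min, vmax_mM_per_h):
    """Enzyme level (nM) = Vmax / kcat, converting 1/min to 1/h and mM to nM."""
    return vmax_mM_per_h / (kcat_per_min * 60.0) * 1e6

recomputed = {name: enzyme_level_nM(kcat, vmax) for name, kcat, vmax, _ in rows}
median_tabulated = statistics.median(level for *_, level in rows)
```

The median of the tabulated levels is the mean of the two middle entries, 130 and 274 nM, which gives the 202 nM compared against the 170 nM characteristic value.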
Table 6.6: Reference values for transcription, translation, and mRNA degradation
from literature. Transcription rate calculated from elongation rate, mRNA length,
and promoter activity level. Translation rate calculated from elongation rate,
protein length, and polysome amplification constant. mRNA degradation rate
calculated from mRNA degradation time.
Description                            Parameter                       Value   Units   Reference
T7 RNA polymerase concentration        R_T                             1.0     µM
Ribosome concentration                 R_X                             2       µM      [52]
Transcription saturation coefficient   K_T                             100     nM      estimated
Translation saturation coefficient     K_X                             45      µM      estimated
Transcription elongation rate          v̇_T                             25      nt/s    [52]
CAT mRNA length                        l_G                             660     nt      [92]
Promoter activity level                (u)                             0.9             estimated
Transcription rate                     k_cat^T = (v̇_T / l_G) · u       123     h⁻¹     calculated
Translation elongation rate            v̇_X                             1.5     aa/s    [52]
CAT protein length                     l_P                             219     aa      [92]
Polysome amplification constant        (K_P)                           10              estimated
Translation rate                       k_cat^X = (v̇_X / l_P) · K_P     247     h⁻¹     calculated
mRNA degradation time                  t_1/2                           8       min     BNID 106253
mRNA degradation rate                  k_deg = ln(2) / t_1/2           5.2     h⁻¹     calculated
ATP transcription coefficient          ATP_T                           176             calculated
CTP transcription coefficient          CTP_T                           144             calculated
GTP transcription coefficient          GTP_T                           151             calculated
UTP transcription coefficient          UTP_T                           189             calculated
ATP tRNA charging coefficient          ATP_X                           219             calculated
GTP translation coefficient            GTP_X                           438             calculated
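The three calculated rates in Table 6.6 can be reproduced from the tabulated inputs. The snippet is a worked check of the unit conversions only; the variable names are hypothetical.

```python
import math

SECONDS_PER_HOUR = 3600.0

# Transcription rate: elongation rate / mRNA length, scaled by promoter activity
v_T, l_G, u = 25.0, 660.0, 0.9            # nt/s, nt, dimensionless
k_cat_T = v_T * SECONDS_PER_HOUR / l_G * u

# Translation rate: elongation rate / protein length, scaled by polysome amplification
v_X, l_P, K_P = 1.5, 219.0, 10.0          # aa/s, aa, dimensionless
k_cat_X = v_X * SECONDS_PER_HOUR / l_P * K_P

# mRNA degradation rate constant from an 8 min half-life
t_half_h = 8.0 / 60.0
k_deg = math.log(2) / t_half_h
```

These evaluate to roughly 123 h⁻¹, 247 h⁻¹, and 5.2 h⁻¹, matching the tabulated values after rounding.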
CHAPTER 7
JUPOETS: A CONSTRAINED MULTIOBJECTIVE OPTIMIZATION
APPROACH TO ESTIMATE BIOCHEMICAL MODEL ENSEMBLES IN THE
JULIA PROGRAMMING LANGUAGE
7.1 Abstract
Ensemble modeling is a promising approach for obtaining robust predictions
and coarse grained population behavior in deterministic mathematical models.¹
Ensemble approaches address model uncertainty by using parameter or model
families instead of single best-fit parameters or fixed model structures. Parameter
ensembles can be selected based upon simulation error, along with other criteria
such as diversity or steady-state performance. Simulations using parameter ensembles
can estimate confidence intervals on model variables, and robustly constrain
model predictions, despite having many poorly constrained parameters. In this
software note, we present a multiobjective-based technique to estimate parameter
or model ensembles, the Pareto Optimal Ensemble Technique in the Julia
programming language (JuPOETs). JuPOETs integrates simulated annealing with
Pareto optimality to estimate ensembles on or near the optimal tradeoff surface
between competing training objectives. We demonstrate JuPOETs on a suite of
multiobjective problems, including test functions with parameter bounds and system
constraints as well as for the identification of a proof-of-concept biochemical
model with four conflicting training objectives. JuPOETs identified optimal or near
optimal solutions approximately six-fold faster than a corresponding implementation
in Octave for the suite of test functions. For the proof-of-concept biochemical
model, JuPOETs produced an ensemble of parameters that captured the mean of
the training data for conflicting data sets, while simultaneously estimating parameter
sets that performed well on each of the individual objective functions. JuPOETs
is a promising approach for the estimation of parameter and model ensembles
using multiobjective optimization. JuPOETs can be adapted to solve many problem
types, including mixed binary and continuous variable types, bilevel optimization
problems and constrained problems without altering the base algorithm. JuPOETs
is open source, available under an MIT license, and can be installed using the Julia
package manager from the JuPOETs GitHub repository.
¹ Adapted with permission from Bassen DM, Vilkhovoy M, Minot M, Butcher JT and Varner
JD, "JuPOETs: a constrained multiobjective optimization approach to estimate biochemical model
ensembles in the Julia programming language" (2017) BMC Systems Biology, 11(10).
7.2 Introduction
Ensemble modeling is a promising approach for obtaining robust predictions and
coarse grained population behavior in deterministic mathematical models. It is
often not possible to uniquely identify all the parameters in biochemical models,
even when given extensive training data [50]. Thus, despite significant advances
in standardizing biochemical model identification [54], the problem of estimat-
ing model parameters from experimental data remains challenging. Ensemble
approaches address parameter uncertainty in systems biology and other fields like
weather prediction [14, 100, 21, 135] by using parameter families instead of single
best-fit parameter sets. Parameter families can be selected based upon simulation
error, along with other criteria such as diversity or steady-state performance. Sim-
ulations using parameter ensembles can estimate confidence intervals on model
variables, and robustly constrain model predictions, despite having many poorly
constrained parameters [59, 158]. There are many techniques to generate parameter
ensembles. Battogtokh et al., Brown et al., and later Tasseff et al. generated experi-
mentally constrained parameter ensembles using a Metropolis-type random walk
[14, 21, 168, 169]. Liao and coworkers developed methods to generate ensembles
that all approach the same steady-state, for example one determined by fluxomics
measurements [174]. They have used this approach for model reduction [? ], strain
engineering [33, 165] and to study the robustness of non-native pathways and
network failure [105]. Maranas and coworkers have also applied this method to
develop a comprehensive kinetic model of bacterial central carbon metabolism,
including mutant data [91]. We and others have used ensemble approaches, gen-
erated using both sampling and optimization techniques, that have robustly sim-
ulated a wide variety of signal transduction processes [112, 158, 168, 169, 125],
neutrophil trafficking in sepsis [157], patient specific coagulation behavior [111],
uncertainty quantification in metabolic kinetic models [5] and to capture cell to cell
variation [106]. Further, ensemble approaches have been used in synthetic biology
to sample possible biocircuit configurations [134]. Thus, ensemble approaches are
widely used to robustly simulate a variety of biochemical systems.
Identification of biochemical models requires significant training data, often
taken from diverse sources. These real-world data sets often contain intrinsic con-
flicts resulting from, for example, the use of different cell lines, different measure-
ment technologies, different reagent vendors or lots, uncontrollable experimental
artifacts or general cross laboratory variability. Parameter ensembles that optimally
balance these inherent conflicts lead to more robust model performance. Multiob-
jective optimization is an ensemble generation technique that naturally balances
conflicts in noisy training data [63]. Multiobjective optimization has been used to
identify signal transduction models [106, 158], for the design of synthetic circuits
[134], to design the folding behaviors of novel RNAs [166], to design bioprocesses
[151], and to understand bacterial adaptation [7]. Thus, it is a widely used ap-
proach for a variety of biochemical applications. Previously, we developed the
Pareto Optimal Ensemble Technique (POETs) algorithm to address the challenge of
competing or conflicting training objectives. POETs, which integrates simulated
annealing (SA) and multiobjective optimization through the notion of Pareto rank,
estimates parameter ensembles which optimally trade-off between competing (and
potentially conflicting) experimental objectives [155]. However, the previous
implementation of POETs, written in the Octave programming language [41],
performed poorly and was not configurable. For example, Octave-POETs did not
accommodate user-definable objective functions, bounds and problem constraints,
cooling schedules, different variable types (e.g., a mixture of binary and continuous
design variables) or custom diversity generation routines. Octave-POETs was also
not well integrated into a package or source code management (SCM) system.
Thus, upgrades to the approach, whether new features or bug fixes, were not
centrally managed.
7.3 Implementation
In this software note, we present an open-source implementation of the Pareto op-
timal ensemble technique in the Julia programming language (JuPOETs). JuPOETs
takes advantage of the unique features of Julia to address many of the shortcom-
ings of the previous implementation. Julia is a cross-platform, high-performance
programming language for technical computing that has performance comparable
to C but with syntax similar to MATLAB/Octave and Python [16]. Julia also offers
a sophisticated compiler, distributed parallel execution, numerical accuracy, and
an extensive function library. Further, the architecture of JuPOETs takes advantage
of the first-class function type in Julia allowing user definable behavior for all key
aspects of the algorithm, including objective functions, custom diversity generation
logic, linear/non-linear parameter constraints (and parameter bounds constraints)
as well as custom cooling schedules. Julia’s ability to naturally call other languages
such as Python or C also allows JuPOETs to be used with models implemented in
a variety of languages across many platforms. Additionally, Julia offers a built-in
package manager which is directly integrated with GitHub, a popular web-based
Git repository hosting service offering distributed revision control and source code
management. Thus, JuPOETs can be adapted to many problem types, including
mixed binary and continuous variable types, bilevel problems and constrained
problems without altering the base algorithm, as was required in the previous
POETs implementation.
7.3.1 JuPOETs optimization problem formulation.
JuPOETs solves the K-dimensional constrained multiobjective optimization problem:
\[
\min_{p}
\begin{pmatrix}
O_{1}\left(x(t,p),p\right) \\
\vdots \\
O_{K}\left(x(t,p),p\right)
\end{pmatrix}
\tag{7.1}
\]
subject to the model equations and constraints:
\[
\begin{aligned}
f\left(t, x(t,p), \dot{x}(t,p), u(t), p\right) &= 0 \\
g_{1}\left(t, x(t,p), u(t), p\right) &\geq 0 \\
&\;\;\vdots \\
g_{C}\left(t, x(t,p), u(t), p\right) &\geq 0
\end{aligned}
\]
and parameter bound constraints:
\[
L \leq p \leq U
\]
The quantity Oj denotes the jth objective function (j = 1, 2, . . . , K), typically the
sum of squared errors for the jth data set for biochemical modeling applications.
The terms f(t, x(t, p), ẋ(t, p), u(t), p) denote the system of model equations (e.g.,
differential equations, differential algebraic equations or linear/non-linear alge-
braic equations) where p denotes the decision variable vector e.g., unknown model
parameters (D× 1). In typical biochemical modeling applications, the model equa-
tions f (·) are a system of continuous real-valued non-linear differential equations
that comprise a kinetic model, but other types of models e.g., stoichiometric models
are also common. The quantity t denotes time, x (t, p) denotes the model state
(with an initial state x0), and u(t) denotes an input vector. The decision variables
(e.g., kinetic parameters) can be subject to bounds constraints, where L and U
denote the lower and upper bounds, respectively as well as C problem specific
constraints gi (t, x(t, p), u(t), p) , i = 1, . . . , C. The decision variables p are typically
real-valued kinetic constants, or metabolic fluxes in the case of stoichiometric
models. However, other variable types, e.g., binary or categorical decision variables,
can also be accommodated.
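As a minimal sketch of the formulation above, the objective vector can be assembled from one sum-of-squared-errors term per data set. The example is written in Python rather than Julia, and the one-state decay model, the names `simulate` and `make_objective`, and the synthetic data are hypothetical illustrations; none of them are part of JuPOETs.

```python
import math

def simulate(p, times):
    """Hypothetical model: x(t) = x0 * exp(-k * t), with p = (x0, k)."""
    x0, k = p
    return [x0 * math.exp(-k * t) for t in times]

def make_objective(data_sets):
    """Return objective(p) that yields one sum of squared errors per data set (K values)."""
    def objective(p):
        errors = []
        for times, measured in data_sets:
            simulated = simulate(p, times)
            errors.append(sum((s - m) ** 2 for s, m in zip(simulated, measured)))
        return errors
    return objective

# Two synthetic data sets (K = 2) generated from different "true" parameters
times = [0.0, 1.0, 2.0, 4.0]
data_sets = [(times, simulate((1.0, 0.5), times)),
             (times, simulate((2.0, 0.5), times))]
objective = make_objective(data_sets)

print(objective((1.0, 0.5)))  # fits the first data set exactly, not the second
print(objective((1.5, 0.5)))  # a compromise between the two data sets
```

No single parameter vector drives both errors to zero here, which is the kind of conflict between objectives that the multiobjective formulation is meant to expose.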
Figure 7.1: Schematic of multiobjective parameter mapping. The performance of
any given parameter set is mapped into an objective space using a ranking function
which quantifies the quality of the parameters. The distance away from the optimal
tradeoff surface is quantified using the Pareto ranking scheme of Fonseca and
Fleming in JuPOETs. [Panels: parameter space (random walk) and objective function
space ("Pareto-optimal front"). Legend: k, parameter vector; E(k) = (E1(k), E2(k), . . . ,
EN(k)), multi-objective cost function vector; K, an archive of the current estimate of
the ensemble; rank(k|K), a Pareto-optimal rank based dominance measure.]
JuPOETs integrates simulated annealing (SA) [97] with Pareto ranking to estimate
decision variables on or near the optimal tradeoff surface between competing
objectives (Fig. 7.1 and Algorithm 1). A tradeoff surface defines the best possible
performance for every conflicting objective, such that the performance of one
objective cannot be increased without decreasing the performance of at least one
other objective. Pareto rank is a scalar measure of distance away from the optimal
trade-off surface (low rank is near the surface, while higher ranks are progressively
further away). Thus, the central idea underlying POETs is a mapping between the
value of the objective vector evaluated at pi+1 (decision variable guess at iteration
i + 1) and the scalar Pareto rank (Fig. 7.1). Traditional simulated annealing uses
a scalar performance value, e.g., simulation error, to make a probabilistic decision
to keep or reject a set of decision variables; decision variables with better
performance are always accepted, while those with worse performance are sometimes
accepted depending upon a parameter called the temperature. On the other hand,
JuPOETs makes this same decision using the Pareto rank instead of a single
performance objective. The problem of estimating biochemical model parameters
from experimental data is typically posed as an error minimization problem over
continuous real-valued decision variables (model parameters) subject to the model
equations. A parameter set pi+1 lies along the optimal tradeoff surface if no other
parameter guess leads to decreased error for every objective. JuPOETs calculates
the performance of a candidate parameter set pi+1 by calling the user defined
objective function; objective takes a parameter set as an input, evaluates the
model equations, and using this solution, returns the K× 1 objective vector. Can-
didate parameter sets are generated by the user supplied neighbor function; the
default implementation of neighbor is a random perturbation, however other per-
turbation logic can be implemented by the user. The error vector associated with
pi+1 is ranked using the builtin Pareto rank function, by comparing the error at
iteration i + 1 to the error archive Oi (all error vectors up to iteration i meeting a
ranking criterion). Parameter sets on or near the optimal trade-off surface between
the objectives have a rank equal to 0 (no other current parameter sets are better).
These rank zero parameter sets define the Pareto optimal group for the ensemble,
wherein Pareto optimality is defined as a parameter set not being dominated by
any other sets within the ensemble. Sets with increasing non-zero rank are pro-
gressively further away from the optimal trade-off surface. Thus, a parameter set
with a rank = 0 is better in a trade-off sense than rank > 0. We implemented the
Fonseca and Fleming ranking scheme in the builtin rank function [46]:
\[ \operatorname{rank}\left(O_{i+1}(p_{i+1}) \mid O_{i}\right) = r \tag{7.2} \]
where rank r is the number of parameter sets that dominate (are better than)
parameter set pi+1, and Oi+1 (pi+1) denotes the objective vector evaluated at pi+1.
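The ranking rule in Eq. (7.2) can be sketched in a few lines. This is a Python illustration under the assumption that all objectives are minimized; the function names `dominates` and `pareto_rank` are hypothetical and do not reflect the JuPOETs source.

```python
def dominates(a, b):
    """True if objective vector a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_rank(objective_vectors):
    """Fonseca and Fleming rank: for each vector, count the vectors that dominate it."""
    return [sum(dominates(other, candidate) for other in objective_vectors)
            for candidate in objective_vectors]

# Four parameter sets evaluated on two objectives (one objective vector per set)
archive = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0), (2.0, 3.0)]
print(pareto_rank(archive))
```

In this small archive the first three vectors trade off against each other and are non-dominated, while the last is dominated by the first two, so it receives a rank of two.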
We used the Pareto rank to inform the SA calculation. The parameter set pi+1
was accepted or rejected by the SA at each iteration, by calculating an acceptance
probability P (pi+1):
\[ P(p_{i+1}) \equiv \exp\left\{ -\operatorname{rank}\left(O_{i+1}(p_{i+1}) \mid O_{i}\right) / T \right\} \tag{7.3} \]
where T is the simulated annealing temperature; the temperature provides control
over how strictly decreasing Pareto rank is enforced. As rank (Oi+1 (pi+1) | Oi)→
0, the acceptance probability moves toward one, ensuring that we explore parame-
ter sets along the Pareto surface. Occasionally, (depending upon T) a parameter
set with a high Pareto rank is accepted by the SA allowing a more diverse search
of the parameter space. However, as T is reduced as a function of iteration count
(using the cooling function), the probability of accepting a high-rank set decreases.
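The interaction between rank, temperature and acceptance in Eq. (7.3) can be illustrated with a short Python sketch. The geometric cooling rule T ← αT is an assumption made for the example (the text only states that a cooling function reduces T), and the function names are hypothetical.

```python
import math

def acceptance_probability(rank, temperature):
    """Eq. (7.3): probability of accepting a parameter set with the given Pareto rank."""
    return math.exp(-rank / temperature)

def geometric_cooling(temperature, alpha=0.9):
    """Assumed cooling schedule: shrink the temperature by a constant factor."""
    return alpha * temperature

# Rank zero sets are always accepted; high-rank sets become unlikely as T falls
for temperature in (1.0, 0.1, 0.01):
    print(temperature, [acceptance_probability(r, temperature) for r in (0, 1, 4)])

# Number of cooling steps needed to go from T = 1 to below a minimum temperature
temperature, steps = 1.0, 0
while temperature > 1e-4:
    temperature = geometric_cooling(temperature)
    steps += 1
print(steps)
```

Under these assumptions the search is permissive early on, when T is close to one, and accepts almost exclusively rank zero sets near the end of the schedule.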
Parameter sets could also be accepted by the SA but not permanently archived in
Si, where Si is the solution archive. Only parameter sets with rank less than or
equal to a threshold (rank ≤4 by default) are included in Si, where the archive is
re-ranked and filtered after accepting every new parameter set. Parameter bounds
were implemented in the neighbor function as box constraints, while problem
specific constraints were implemented in objective using a penalty method:
\[ O_{i} + \lambda \sum_{j=1}^{C} \min\left\{0,\; g_{j}\left(t, x(t,p), u(t), p\right)\right\}, \qquad i = 1, \ldots, K \tag{7.4} \]
where λ denotes the penalty parameter (λ = 100 by default). However, because
both the neighbor and objective functions are user defined, different constraint
implementations are easily defined.
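A penalty-style constraint treatment inside a user objective function might look like the following Python sketch. It assumes the penalty is intended to worsen (increase) every objective when a constraint of the form g ≥ 0 is violated, so it adds λ times the magnitude of min{0, g}; this sign convention, like the name `penalized_objectives`, is an assumption of the example and not a statement about the JuPOETs source.

```python
def penalized_objectives(objectives, constraint_values, penalty=100.0):
    """Add a common penalty to every objective for each violated constraint g_j >= 0."""
    # min(0, g) is zero for satisfied constraints and negative for violated ones
    violation = sum(abs(min(0.0, g)) for g in constraint_values)
    return [o + penalty * violation for o in objectives]

# One satisfied constraint (g = 0.5) and one violated constraint (g = -0.25)
print(penalized_objectives([1.0, 2.0], [0.5, -0.25]))
# All constraints satisfied: the objectives are returned unchanged
print(penalized_objectives([1.0, 2.0], [0.5, 0.0]))
```

Because the same penalty is added to every objective, a violating parameter set is pushed away from the trade-off surface in all directions at once, which raises its Pareto rank.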
To use JuPOETs, the user specifies the neighbor, acceptance, cooling and
objective functions along with an initial decision variable guess. Default im-
plementations of the neighbor, acceptance and cooling functions can be used
directly, or they can be overridden by user defined logic. However, the user must
provide an implementation of the objective function and provide an initial deci-
sion variable guess. Lastly, if the user is operating JuPOETs in hybrid mode, then
a refinement function pointer must also be specified. Hybrid mode temporarily
switches the search from a multiobjective to a single objective problem, where the
sum of the objective functions can be used to update the best (or initial) param-
eter guess. The specific hybrid mode search logic is up to the user; by default
hybrid mode is off, and the default refinement implementation is simply a pass
through function. However, we have shown previously that POETs operated in
hybrid mode (where the single objective problem used a pattern search approach)
had better performance than POETs alone [155]. Thus, hybrid mode is generally
recommended for most applications. In addition, there are several user config-
urable parameters that can be adjusted to control the performance of JuPOETs:
maximum number of iterations controls the number of iterations per temperature
(default 20); rank cutoff controls the upper rank bound on the solution archive
(default 5); temperature min controls the minimum temperature after which JuPO-
ETs returns the error and solution archives (default 0.001); show trace controls the
level of output shown to the user (default true). After the completion of the run,
JuPOETs returns the parameter solution archive S, objective archive O and rank
archive R. The parameter solution archive S is a D × A array, where
A denotes the number of solutions in the archive when JuPOETs terminated. On
the other hand, the objective archive O is a K × A array containing the performance
values for each objective corresponding to the columns of S. Lastly, JuPOETs
returns the rank archive R, which is an A × 1 array of Pareto ranks corresponding
to the columns of S. One technical note: if JuPOETs is run from multiple starting
locations, and the archives from each of these runs are combined into a single
collective archive, the combined parameter rank archive may become invalid, because
each rank was computed relative to the solutions of its own run only. In these
cases, the parameter sets must be re-ranked using the built-in rank function to
produce a collective parameter ranking.
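The re-ranking step for combined archives can be sketched as follows. This Python illustration stores each archive as a list of solutions and a matching list of objective vectors (rather than the D × A and K × A arrays described above), and the helper names `pareto_rank` and `merge_archives` are hypothetical.

```python
def pareto_rank(objective_vectors):
    """Count, for each objective vector, how many other vectors dominate it (minimization)."""
    def dominates(a, b):
        return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))
    return [sum(dominates(o, c) for o in objective_vectors) for c in objective_vectors]

def merge_archives(runs):
    """Pool (solutions, objectives) pairs from several runs and re-rank the pooled set."""
    solutions = [s for run_solutions, _ in runs for s in run_solutions]
    objectives = [o for _, run_objectives in runs for o in run_objectives]
    return solutions, objectives, pareto_rank(objectives)

# Two runs whose per-run ranks were [0, 0] and [1, 0], respectively
run_a = ([[0.1], [0.9]], [(1.0, 3.0), (3.0, 1.0)])
run_b = ([[0.4], [0.5]], [(2.0, 4.0), (0.5, 0.5)])
solutions, objectives, ranks = merge_archives([run_a, run_b])
print(ranks)
```

In this example both rank zero sets from the first run are dominated by a set from the second run, so simply concatenating the per-run rank archives would overstate their quality.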
7.4 Availability of data and materials
JuPOETs is open source, available under an MIT software license. The JuPO-
ETs source code is freely available from the JuPOETs GitHub repository at
https://github.com/varnerlab/POETs.jl. All samples used in this study are in-
cluded in the sample/biochemical and sample/test functions subdirectories of
the JuPOETs GitHub repository.
Input: User specified objective function, and initial guess (D × 1). User can also specify custom neighbor,
acceptance, cooling and refinement functions or use the default functions provided.
Output: Rank archive R (A × 1), parameter solution archive S (D × A) and objective archive O (K × A), where A
denotes the number of accepted solutions
initialize: R, S and O using initial guess po;
initialize: T ← 1.0;
initialize: Tmin ← 1/10000;
initialize: Maximum number of steps per temperature I;
// Call to local refinement function (single objective problem)
po ← user-function::refinement(po);
// Start the search from the (refined) initial guess
p∗ ← po;
while T > Tmin do
    i ← 1;
    while i < I do
        // Generate a new parameter solution using user neighbor function
        pi+1 ← user-function::neighbor(p∗);
        // Evaluate pi+1 using user objective function
        oi+1 ← user-function::objective(pi+1);
        Add pi+1 to solution archive S;
        Add oi+1 to objective archive O;
        // Calculate Pareto rank of solutions in O using builtin rank function
        R ← builtin-function::rank(O);
        // Accept pi+1 into the archive with user defined probability
        P ← user-function::acceptance(R, T);
        if P > rand then
            // Update the best solution with pi+1
            p∗ ← pi+1;
            prune S, R and O of all solutions above a rank threshold;
        else
            Remove pi+1 from solution archive S;
            Remove oi+1 from objective archive O;
        end
        i ← i + 1;
    end
    // Update T using the user cooling function
    T ← user-function::cooling(T);
end
Algorithm 1: Pseudo-code for the JuPOETs run-loop. The user must specify the objective
function and an initial parameter guess. The user can optionally specify the neighbor,
acceptance, cooling and refinement functions (or use the default implementations). The rank
archive R, solution archive S and objective archive O are initialized from the initial guess. The
initial guess (potentially following a single objective local refinement step) is perturbed in the
neighbor function, which generates a new solution whose performance is evaluated using the
user supplied objective function. The new solution and objective values are then added to the
respective archives and ranked using the builtin rank function. If the new solution is accepted
(based upon a probability calculated with the user supplied acceptance function) it is retained
in the solution and objective archives. This solution is then perturbed during the next iteration
of the algorithm. However, if the solution is not accepted, it is removed from the archives and
discarded. The temperature is adjusted using the user supplied cooling function after each I
iterations. When JuPOETs terminates, the parameter solution archive S, objective archive O and
rank archive R are returned to the caller.
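The run-loop in Algorithm 1 can be condensed into a short, runnable Python sketch. It is not the JuPOETs implementation: the function and argument names are hypothetical, the refinement step is omitted (the default is described as a pass-through), geometric cooling is assumed, and the candidate is ranked before being archived rather than added and then removed, which yields the same archive with less bookkeeping.

```python
import math
import random

def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def run_search(objective, neighbor, initial_guess, steps_per_temperature=10,
               t_min=1e-4, alpha=0.9, rank_cutoff=4, seed=0):
    rng = random.Random(seed)
    current = list(initial_guess)
    solutions, objectives, ranks = [current], [objective(current)], [0]
    temperature = 1.0
    while temperature > t_min:
        for _ in range(steps_per_temperature):
            candidate = neighbor(current, rng)
            error = objective(candidate)
            # Pareto rank: number of archived error vectors that dominate the candidate
            rank = sum(dominates(o, error) for o in objectives)
            if math.exp(-rank / temperature) > rng.random():
                current = candidate
                # The candidate raises the rank of every archived set it dominates
                ranks = [r + dominates(error, o) for r, o in zip(ranks, objectives)]
                solutions.append(candidate)
                objectives.append(error)
                ranks.append(rank)
                # Prune everything above the rank threshold
                keep = [i for i, r in enumerate(ranks) if r <= rank_cutoff]
                solutions = [solutions[i] for i in keep]
                objectives = [objectives[i] for i in keep]
                ranks = [ranks[i] for i in keep]
        temperature *= alpha  # assumed geometric cooling
    return solutions, objectives, ranks

def schaffer(p):
    """Two-objective Schaffer test function of one decision variable."""
    x = p[0]
    return [x ** 2, (x - 2.0) ** 2]

def bounded_neighbor(p, rng, lower=-10.0, upper=10.0, scale=0.5):
    """Random perturbation clipped to box constraints."""
    return [min(upper, max(lower, x + rng.gauss(0.0, scale))) for x in p]

solutions, objectives, ranks = run_search(schaffer, bounded_neighbor, [8.0])
print(len(solutions), "archived sets,", ranks.count(0), "with rank zero")
```

Note that a candidate with a rank above the cutoff can still be accepted as the next starting point without being archived, consistent with the description in Section 7.3.1.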
7.5 Results and Discussion
JuPOETs identified optimal or nearly optimal solutions significantly faster than
Octave-POETs for a suite of multiobjective algebraic test problems (Table 7.1). The
algebraic test problems were constrained non-linear functions with bound con-
straints and additional non-linear constraints on the decision variables in one case.
The problems had up to three-dimensional continuous real-valued decision vectors,
and each case had two objective functions. The wall-clock time for JuPOETs and
Octave-POETs was measured for 10 independent trials for each of the test problems.
The same cooling, neighbor, acceptance, and objective logic was employed be-
tween the implementations, and all other parameters were held constant. For each
test function, the search domain was partitioned into 10 segments, where an initial
parameter guess was drawn from each partition. The number of search steps for
each temperature was I = 10 for all cases, and the cooling parameter was α = 0.9.
On average, JuPOETs identified optimal or near optimal solutions for the suite
of test problems six-fold faster (60s versus 400s) than Octave-POETs (Fig. 7.2).
JuPOETs produced the characteristic tradeoff curves for each test problem, given
both decision variable bound and problem constraints (Fig. 7.3). Thus, JuPOETs
estimated an ensemble of solutions to constrained multiobjective algebraic test
problems significantly faster than the current Octave implementation. Next, we
tested JuPOETs on a proof-of-concept biochemical model identification problem.
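For reference, the three algebraic test problems of Table 7.1 can be written out in Python in their standard textbook forms. The function names are hypothetical, and the Binh and Korn constraints are rearranged into the g ≥ 0 form used in Section 7.3.1, which is a choice made for this sketch.

```python
import math

def schaffer(x):
    """Schaffer function: one decision variable, -10 <= x <= 10."""
    return [x ** 2, (x - 2.0) ** 2]

def binh_korn(x, y):
    """Binh and Korn function: 0 <= x <= 5, 0 <= y <= 3."""
    return [4.0 * x ** 2 + 4.0 * y ** 2, (x - 5.0) ** 2 + (y - 5.0) ** 2]

def binh_korn_constraints(x, y):
    """Non-linear constraints written so that feasible points give values >= 0."""
    g1 = 25.0 - ((x - 5.0) ** 2 + y ** 2)          # (x - 5)^2 + y^2 <= 25
    g2 = ((x - 8.0) ** 2 + (y + 3.0) ** 2) - 7.7   # (x - 8)^2 + (y + 3)^2 >= 7.7
    return [g1, g2]

def fonseca_fleming(x):
    """Fonseca and Fleming function: N = len(x) decision variables, -4 <= x_i <= 4."""
    shift = 1.0 / math.sqrt(len(x))
    o1 = 1.0 - math.exp(-sum((xi - shift) ** 2 for xi in x))
    o2 = 1.0 - math.exp(-sum((xi + shift) ** 2 for xi in x))
    return [o1, o2]

print(schaffer(2.0))
print(binh_korn(0.0, 0.0), binh_korn_constraints(0.0, 0.0))
print(fonseca_fleming([0.0, 0.0, 0.0]))
```

Each function returns a two-element objective vector, so any of them can be passed to a ranking or search routine that expects one vector per candidate.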
Name                          Dimension  Function                                        Domain          Constraints
Schaffer function             1          O1(x) = x^2                                     −10 ≤ x ≤ 10
                                         O2(x) = (x − 2)^2
Binh and Korn function        2          O1(x, y) = 4x^2 + 4y^2                          0 ≤ x ≤ 5       g1(x, y) = (x − 5)^2 + y^2 ≤ 25
                                         O2(x, y) = (x − 5)^2 + (y − 5)^2                0 ≤ y ≤ 3       g2(x, y) = (x − 8)^2 + (y + 3)^2 ≥ 7.7
Fonseca and Fleming function  3          O1(x) = 1 − exp(−Σ_{i=1}^{N} (x_i − 1/√N)^2)    −4 ≤ x_i ≤ 4
                                         O2(x) = 1 − exp(−Σ_{i=1}^{N} (x_i + 1/√N)^2)
Table 7.1: Multi-objective optimization test problems. We tested the JuPOETs
implementation on three two-objective test problems, with one-, two- and three-
dimensional parameter vectors. Each problem had parameter bounds constraints;
however, only the Binh and Korn function had additional non-linear problem
constraints. For the Fonseca and Fleming problem, N = 3.
JuPOETs estimated an ensemble of biochemical model parameters that were
consistent with the mean of synthetic training data (Fig. 7.4). Four synthetic
training data sets were generated from a prototypical biochemical network consisting
of 6 metabolites and 7 reactions (Fig. 7.4, inset right). We considered a common
case in which the same extracellular measurements of Ae, Be, Ce and cellmass were
made on four hypothetical cell types, each having the same biological connectivity
but different performance. Network dynamics were modeled using the hybrid
cybernetic model with elementary modes (HCM) approach of Ramkrishna and
coworkers [95]. In the HCM approach, metabolic networks are first decomposed
into a set of elementary modes (EMs) (chemically balanced steady-state pathways,
see [150]). Dynamic combinations of elementary modes are then used to character-
ize network behavior. Each elementary mode is catalyzed by a pseudo enzyme;
thus, each mode has both kinetic and enzyme synthesis parameters. The proof of
concept network generated 6 EMs, resulting in 13 model parameters (continuous
Figure 7.2: The performance of JuPOETs on the multi-objective test suite. The
execution time (wall-clock) for JuPOETs and POETs implemented in Octave was
measured for 10 independent trials for the suite of test problems. The number
of steps per temperature I = 10, and the cooling parameter α = 0.9 for all cases.
The problem domain was partitioned into 10 equal segments, and an initial guess was
drawn from each segment. For each of the test functions, JuPOETs estimated
solutions on (rank zero solutions, black) or near (gray) the optimal tradeoff surface,
subject to bounds and problem constraints.
real-valued decision variables). The synthetic training data was generated by
randomly varying these parameters.
The general form of the biochemical test problem was given by:
\[ \min_{p} \left(O_{1}, \ldots, O_{K}\right) \tag{7.5} \]
subject to model and bounds constraints. We considered four training data sets
Figure 7.3: Representative JuPOETs solutions for problems in the multi-objective
test suite. The number of steps per temperature I = 10, and the cooling parameter
α = 0.9 for all cases. The problem domain was partitioned into 10 equal segments,
and an initial guess was drawn from each segment. For each of the test functions,
JuPOETs estimated solutions on (rank zero solutions, black) or near (gray) the
optimal tradeoff surface, subject to bounds and problem constraints.
(K = 4), each of which contained time-series measurements of Ae, Be, Ce and
cellmass. Each objective Oj, j = 1, . . . , K quantified the squared difference between
the simulated (xi) and measured extracellular species abundance (yi) in the jth data
set:
\[ O_{j} = \sum_{i}\sum_{\tau}\left(x_{i}(\tau) - y_{i}(\tau)\right)^{2}, \qquad j = 1, \ldots, K \tag{7.6} \]
where i denotes the species index and τ denotes the time index.
Figure 7.4: Proof of concept biochemical network study. Inset right: Prototypical
biochemical network with six metabolites and seven reactions modeled using
the hybrid cybernetic approach (HCM). Intracellular cellmass precursors A, B,
and C are balanced (no accumulation) while the extracellular metabolites Ae, Be,
and Ce are dynamic. The oval denotes the cell boundary, qj is the jth flux across
the boundary, and vk denotes the kth intracellular flux. Four data sets (each
with Ae, Be, Ce and cellmass measurements) were generated by varying the kinetic
constants for each biochemical mode. Each data set was a single objective in the
JuPOETs procedure. A: Ensemble simulation of extracellular substrate Ae and
cellmass versus time. B: Ensemble simulation of extracellular substrate Be and Ce
versus time. The gray region denotes the 95% confidence estimate of the mean
ensemble simulation. The data points denote mean synthetic measurements, while
the error bars denote the 95% confidence estimate of the measurement computed
over the four training data sets. C: Trade-off plots between the four training
objectives. The quantity Oj denotes the jth training objective. Each point represents
a member of the parameter ensemble, where gray denotes rank 0 sets, while black
denotes rank 1 sets. Ensembles were generated using POETs without employing
local refinement.
The abundance of extracellular species i (xi), the pseudo enzyme el (catalyzes flux
through mode l), and cellmass were governed by the model equations:
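\[
\begin{aligned}
\frac{dx_{i}}{dt} &= \sum_{j=1}^{R}\sum_{l=1}^{L} \sigma_{ij}\, z_{jl}\, q_{l}(e,p,x)\, c, & i &= 1,\ldots,M \\
\frac{de_{l}}{dt} &= \alpha_{l} + r_{E,l}(p,x)\, u_{l} - \left(\beta_{l} + r_{G}\right) e_{l}, & l &= 1,\ldots,L \\
\frac{dc}{dt} &= r_{G}\, c
\end{aligned}
\]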
where R and M denote the number of reactions and extracellular species in
the model and L denotes the number of elementary modes. The quantity σij
denotes the stoichiometric coefficient for species i in reaction j and zjl denotes
the normalized flux for reaction j in mode l. If σij > 0, species i is produced by
reaction j; if σij < 0, species i is consumed by reaction j; if σij = 0, species i is not
connected with reaction j. Extracellular species, cellmass and pseudo-enzyme were
subject to the initial conditions x (to) = xo, c(to) = co and el = 0.5, respectively.
The term ql (e, p, x) denotes the specific uptake/secretion rate for mode l where
e denotes the pseudo enzyme vector, p denotes the unknown kinetic parameter
vector (decision variables), x denotes the extracellular species vector, and c denotes
the cell mass; ql (e, p, x) is the product of a kinetic term (q̄l) and a control variable
governing enzyme activity. Flux through each mode was catalyzed by a pseudo
enzyme el, synthesized at the regulated specific rate rE,l (p, x), and constitutively
at the rate αl. The term ul denotes the cybernetic variable controlling the synthesis
of enzyme l. The term βl denotes the rate constant governing non-specific enzyme
degradation, and rG denotes the specific growth rate through all modes. The
specific uptake/secretion rates and the specific rate of enzyme synthesis were
modeled using saturation kinetics. The specific growth rate was given by:
\[ r_{G} = \sum_{l=1}^{L} z_{\mu l}\, q_{l}(e,p,x) \]
where zµl denotes the growth flux µ through mode l. The control variables ul and
vl , which control the synthesis and activity of each enzyme respectively, were given
by:
\[ u_{l} = \frac{z_{sl}\,\bar{q}_{l}}{\sum_{l=1}^{L} z_{sl}\,\bar{q}_{l}} \tag{7.7} \]
and
\[ v_{l} = \frac{z_{sl}\,\bar{q}_{l}}{\max_{l=1,\ldots,L} z_{sl}\,\bar{q}_{l}} \tag{7.8} \]
where zsl denotes the uptake flux of substrate s through mode l. Each unknown ki-
netic parameter was continuous and real-valued, and subject to bounds constraints:
L ≤ p ≤ U .
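The control laws in Eqs. (7.7) and (7.8) reduce to two normalizations of the same per-mode quantity, as the following Python sketch shows. The function name `cybernetic_variables` and the numerical values are hypothetical, and the sketch assumes every product of uptake flux and kinetic rate is positive.

```python
def cybernetic_variables(z_s, q_bar):
    """Return (u, v) for L modes given uptake fluxes z_s and kinetic rates q_bar."""
    returns = [z * q for z, q in zip(z_s, q_bar)]  # z_sl * q_bar_l for each mode l
    total = sum(returns)
    largest = max(returns)
    u = [r / total for r in returns]    # Eq. (7.7): controls enzyme synthesis
    v = [r / largest for r in returns]  # Eq. (7.8): controls enzyme activity
    return u, v

# Three modes with made-up uptake fluxes and kinetic rates
u, v = cybernetic_variables([1.0, 1.0, 2.0], [0.5, 1.0, 0.25])
print(u)  # fractions of the synthesis resource, summing to one
print(v)  # activities relative to the best-performing mode
```

The synthesis variables u always sum to one, while the activity variables v equal one for the mode with the largest return and are smaller for every other mode.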
JuPOETs produced an ensemble of approximately dim S ≃ 13,000 parameter
sets that captured the mean of the measured data sets for extracellular metabolites
and cellmass (Fig. 7.4A and B). JuPOETs minimized the difference between the sim-
ulated and measured values for extracellular metabolites Ae, Be, Ce and cellmass,
where the residual for each data set was treated as a single objective (leading to four
objectives). The 95% confidence estimate produced by the ensemble was consistent
with the mean of the measured data, despite having significant uncertainty in the
training data. JuPOETs produced a consensus estimate of the synthetic data by
calculating optimal trade-offs between the training data sets (Fig. 7.4C). Multiple
trade-off fronts were visible in the objective plots, for example between data set
3 (O3) and data set 2 (O2). Thus, without a multiobjective approach, it would be
challenging to capture these data sets as fitting one leads to decreased performance
on the other. However, the ensemble contained parameter sets that described
each data set independently (Fig. 7.5). Thus, JuPOETs produced an ensemble of
parameters that gave the mean of the training data for conflicting data sets, while
simultaneously estimating parameter sets that performed well on each individual
objective function.
Currently, JuPOETs does not consider parameter identifiability when construct-
ing parameter ensembles. Although JuPOETs produces parameter estimates that
give model performance similar to the training data, we do not have strict statisti-
cal confidence that the true parameter values are contained within the ensemble.
However, despite this, ensembles produced by POETs can be predictive [106, 158].
Thus, JuPOETs produces a collection of parameters that are constrained by the
performance of the model, and not by specific hypotheses regarding the individual
values of the raw model parameters. Of course, knowledge of specific parameter
values, or the relationship between parameter combinations, can be used to inform
the search through either bounds or problem specific constraints (for example, as
demonstrated in the first example problem.)
Figure 7.5: Experiment to experiment variation captured by the ensemble. Cellmass
measurements (points) versus time for experiments 2 and 3 were compared with
ensemble simulations. The full ensemble was sorted by simultaneously selecting
the top 25% of solutions for each objective with rank ≤ 1. The best fit solution
for each objective (line) ± 1-standard deviation (gray region) for experiments 2
and 3 brackets the training data despite significant differences in the training values
between the two data sets.
7.6 Conclusions
In this software note, we presented JuPOETs, a multiobjective technique to estimate
parameter ensembles in the Julia programming language. JuPOETs is open source,
and available for download under an MIT license from the JuPOETs GitHub repos-
itory at https://github.com/varnerlab/POETs.jl. We demonstrated JuPOETs on
a suite of algebraic test problems, and a proof-of-concept ODE based biochemical
model. While JuPOETs outperformed, and was significantly more flexible than,
the previous Octave implementation, there are several areas that could be
explored further. First, JuPOETs should be compared with other multiobjective
evolutionary algorithms (MOEAs) to determine its relative performance on test
and real world problems. Many evolutionary approaches e.g., the non-dominated
sorting genetic algorithm (NSGA) family of algorithms, have been adapted to solve
multiobjective problems [86, 76]. However, since there is a lack of open source Julia
implementations of these alternative approaches, we did not benchmark the rela-
tive performance of JuPOETs in this note. One advantage that JuPOETs may have
when compared to strictly evolutionary approaches is the inclusion of a local
refinement step (hybrid mode), which temporarily reduces the problem to a single
objective formulation. Previously, POETs run in hybrid mode led to better con-
vergence on a proof-of-concept signal transduction model compared to the same
approach without the hybrid refinement step [155]. Other hybrid multiobjective
methods have also been shown to be more efficient than evolutionary approaches
alone, for a variety of biochemical optimization problems [134, 151]. Thus, there
are several different algorithms that we can use to benchmark, and improve the
performance of JuPOETs, after we implement them in Julia. Another strategy to
improve the performance of JuPOETs is to reduce the number (or cost) of function
evaluations that are required to obtain optimal or near optimal solutions. For exam-
ple, in many real world parameter estimation problems, the bulk of the execution
time is spent evaluating the objective functions. One strategy to improve JuPOETs
performance could be to optimize surrogates [18], while another would be parallel
execution of the objective functions. Currently, JuPOETs serially evaluates the
objective function vector. However, parallel evaluation of the objective functions
e.g., using the parallel Julia macro or other techniques, could be implemented
without significantly changing the JuPOETs run loop. Taken together, JuPOETs
demonstrated improved flexibility, and performance over POETs in parameter
identification and ensemble generation for multiple objectives. JuPOETs has the
potential for widespread use due to the flexibility of the implementation, and the
high level syntax and distribution tools native to the Julia programming language.
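The parallel evaluation idea mentioned above can be sketched generically. The example below is in Python, not Julia, and uses a thread pool from the standard library; the function name `evaluate_objectives` and the toy objectives are hypothetical, and a real speed-up in Python would require objective functions that release the interpreter lock (for example, calls into compiled simulation code) or a process pool instead.

```python
from concurrent.futures import ThreadPoolExecutor

def evaluate_objectives(objective_functions, p, max_workers=None):
    """Evaluate each objective function at p concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(f, p) for f in objective_functions]
        return [future.result() for future in futures]

# Two toy objectives standing in for expensive model simulations
objective_functions = [lambda p: sum(x ** 2 for x in p),
                       lambda p: sum((x - 2.0) ** 2 for x in p)]
print(evaluate_objectives(objective_functions, [1.0, 2.0]))
```

Because the objective vector is assembled only after every evaluation returns, the surrounding ranking and acceptance logic does not need to change.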
7.7 Acknowledgements
This study was supported by an award from the National Science Foundation (NSF
CBET-0955172) and the National Institutes of Health (NIH HL110328) to J.B, and
by a National Science Foundation Graduate Research Fellowship (DGE-1144153)
to D.B. Lastly, J.V was supported by an award from the US Army and Systems
Biology of Trauma Induced Coagulopathy (W911NF-10-1-0376). We gratefully
acknowledge Ani Chakrabarti, Russell Gould and Kathy Rogers for their input and
suggestions regarding new features to include into JuPOETs. We also gratefully
acknowledge the suggestions from the anonymous reviewers to improve this
manuscript and JuPOETs.
CHAPTER 8
SUMMARY & CONCLUSION
Metabolism is the central process through which cells manage their resources to
survive, adapt and meet energetic demands. To implement these diverse functions,
cells have very complex and highly interconnected networks of chemical reactions
between genes, RNA, proteins and metabolites. Due to the complexity of cells,
systems modeling arose from the desire to better understand metabolism and
how metabolism can be altered for our benefit [48, 12]. A primary challenge
is the development of metabolic mathematical models that are able to describe
the effect genetic perturbations have on cellular behavior. In this study, we first
review metabolic modeling methods and go on to develop computational tools
for the analysis and engineering of microbial systems. My research work began
with cybernetic modeling and linear programming. Both techniques were able to
describe growth of microbial systems on substrates as well as byproduct formation
[176, 98, 95]. However, cybernetic modeling coupled with elementary modes was
only applicable to small networks, since the number of modes produced by decomposing
a network grows exponentially with its size. Thus, we eliminated this computational burden
by the use of flux balance solutions instead of elementary modes to describe aerobic
and anaerobic growth of E. coli. Following our work with cybernetic modeling,
my research focus shifted towards cell-free protein synthesis (CFPS) systems. Cybernetic
modeling uses matching laws to describe enzyme synthesis; however, CFPS systems
do not have the capacity for enzyme synthesis. Thus, we used alternative modeling
approaches to describe CFPS behavior to help us understand the performance
limitations of these systems. In addition, these mathematical models would help
identify strategies for the improvement of CFPS in terms of productivity, yield
and/or energy efficiency.
We first began by developing a kinetic model of CFPS for which an extensive
dataset was provided by the Swartz Lab. The kinetic model contained 148 metabo-
lites and 204 reactions with a total of 815 parameters. Model equations followed the
hybrid modeling framework of Wayman and coworkers [183], combining multiple
saturation kinetics with a rule-based model of allostery. Even though this model
described the metabolite levels of 38 species, its development took several years to
complete. In addition, the model was only applied to a specific CFPS system for
the production of CAT under a T7 promoter. We then applied a constraint-based
approach to minimize the number of adjustable parameters.
We developed a sequence-specific constraint based model of cell-free protein
synthesis by taking the same metabolic network with the addition of promoter
models from Moon and coworkers [123]. The resulting model structure contained
only six adjustable parameters, not including parameters taken from literature.
The modeling framework estimated the production of CAT under a T7 promoter
for the Glucose/NMP cell-free system and GFP production under a P70 promoter
in the myTXTL system. The model also estimated the titer of GFP as a function
of plasmid concentration. Global sensitivity analysis identified the translation
rate as the key metabolic process that controlled CFPS productivity and oxidative
phosphorylation as the key metabolic process for energy efficiency. Despite the
simulations being consistent with experimental measurements, there was a high
uncertainty in the flux distribution as shown by alternative optimal solutions.
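To make the notion of alternative optimal solutions concrete, the following toy sketch may help. It is not the CFPS model: it assumes a hypothetical network in which a substrate, taken up at a bounded rate, can reach the product through two parallel routes with fluxes v1 and v2, and the objective is total product formation v1 + v2. A brute-force grid enumeration stands in for the linear programs that would normally be solved; the function name optimal_flux_ranges and all numerical values are illustrative assumptions.

```python
def optimal_flux_ranges(uptake_bound=10, step=1):
    """Enumerate feasible (v1, v2) pairs on a grid and report the range each
    flux spans across all solutions that attain the best objective value."""
    n = uptake_bound // step  # number of grid steps the two routes can share
    # Steady state on the substrate: uptake = v1 + v2, with uptake <= bound.
    feasible = [(i, j) for i in range(n + 1) for j in range(n + 1) if i + j <= n]
    best = max(i + j for i, j in feasible)  # objective: product formation
    optimal = [(i, j) for i, j in feasible if i + j == best]
    v1_values = [i * step for i, _ in optimal]
    v2_values = [j * step for _, j in optimal]
    return {
        "objective": best * step,
        "n_optimal": len(optimal),
        "v1": (min(v1_values), max(v1_values)),
        "v2": (min(v2_values), max(v2_values)),
    }


result = optimal_flux_ranges()
print(result)
```

Every split of the uptake flux between the two routes gives the same objective value, so the optimum fixes the product formation rate but leaves the individual route fluxes undetermined. This is the kind of uncertainty that additional measurements are meant to reduce.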
In order to circumvent this uncertainty, we developed analytical techniques
to measure species involved in central carbon and energy metabolism as well as
amino acids, mRNA, and protein levels. Cell-free systems have no cell wall; thus,
we have direct access to metabolites and the biosynthetic machinery. We developed
a robust protocol to quantify 41 compounds involved in glycolysis, the pentose
phosphate pathway, the tricarboxylic acid cycle, energy metabolism and cofactor
regeneration in CFPS reactions. The method used internal standards tagged with
13C-aniline, while compounds in the sample were derivatized with 12C-aniline.
Because each internal standard co-eluted with its corresponding compound, the
effect of ion suppression was eliminated. We then applied an amino acid protocol from Waters (Milford,
MA) to quantify 19 amino acids and used a colorimetric assay to quantify glutamate
levels. Finally, we used real-time RT-qPCR to measure mRNA levels. In total we
quantified 63 species for a span of 16 hours for a batch reaction of CFPS.
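The arithmetic behind quantification with a co-eluting labeled internal standard can be sketched as follows. This is a simplified single-point calculation under stated assumptions, not the calibration procedure used in this work: the 12C-tagged analyte and the 13C-tagged standard are assumed to have identical response factors, and the function name and peak areas are hypothetical.

```python
def quantify_with_internal_standard(analyte_area, standard_area, standard_concentration):
    """Estimate analyte concentration from the ratio of the 12C-tagged analyte
    peak area to the 13C-tagged internal standard peak area."""
    if standard_area <= 0:
        raise ValueError("internal standard peak area must be positive")
    return analyte_area / standard_area * standard_concentration


# Hypothetical peak areas with a 2.0 mM internal standard.
concentration = quantify_with_internal_standard(1500.0, 3000.0, 2.0)

# If ion suppression halves both co-eluting signals, the ratio is unchanged.
suppressed = quantify_with_internal_standard(750.0, 1500.0, 2.0)
print(concentration, suppressed)
```

Because the analyte and its standard elute together, any suppression is assumed to scale both signals by the same factor, which cancels in the ratio.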
We expanded our sequence specific modeling framework by integrating these
experimental measurements along with kinetic parameters, enzyme levels, and
enzyme activity assays. The framework predicted the overall production of mRNA
and protein along with changes in metabolic behavior with two different oxidative
phosphorylation inhibitors. The integrated modeling framework revealed that
central metabolism is activated along with glutamate powering the TCA cycle to
provide reduced ubiquinone for oxidative phosphorylation. Oxidative phosphory-
lation inhibitors provide biochemical evidence that myTXTL relied on oxidative
phosphorylation to provide energy for sustaining transcription and translation for
16 hours in a batch reaction. Finally, enzyme activity assays throughout central car-
bon metabolism revealed that allosteric regulation is present in CFPS metabolism
and should be incorporated into future mathematical models. Cell-free protein
synthesis involves more than transcription and translation; thus, we provide a
comprehensive mathematical framework that predicted mRNA and protein production
along with metabolic perturbations. This framework could potentially be
used to identify strategies for the improvement of CFPS productivity, yield and
efficiency.
While this study was promising in predicting protein and mRNA production and
metabolic behavior, there are several opportunities to consider in future work.
First, a more detailed description of transcription and translation reactions has
been utilized in genome-scale ME models, e.g., O’Brien et al. [129]. These template
reactions could be adapted to a cell-free system. This would allow us to consider
important facets of protein production, such as the role of chaperones in protein
folding. Post-translational modifications such as glycosylation that are important
for the production of therapeutic proteins could also be included in the next
generation of models. In this work, we modeled the cell-free production of single
proteins coupled to cell-free metabolism, but sequence specific constraint based
modeling could be extended to multi-protein synthetic circuits, RNA circuits or
small molecule production.
APPENDIX A
APPENDIX
Table A.1: List of materials and equipment used to quantify cell-free protein
synthesis metabolites with aniline tagging and internal standards
Material/Equipment | Company | Catalog Number | Comments/Description
12C Aniline | Sigma-Aldrich | 242284 | Aniline 12C
13C labeled aniline | Sigma-Aldrich | 485797 | Aniline 13C6
3-Phosphoglyceric acid | Sigma-Aldrich | P8877 | 3PG
Acetic Acid | FisherScientific | AC222140010 | ACE
Acetonitrile, LCMS | JT BAKER | 9829-03 | ACN
Acetyl-coenzyme A | Sigma-Aldrich | A2056 | ACA
Acquity UPLC BEH C18 1.7 µm, 2.1 x 150 mm Column | Waters | 186002353 | Column
Adenosine diphosphate | Sigma-Aldrich | A2754 | ADP
Adenosine monophosphate | Sigma-Aldrich | A1752 | AMP
Adenosine triphosphate | Sigma-Aldrich | A2383 | ATP
Alpha-ketoglutarate | Sigma-Aldrich | K1128 | aKG
Citrate | Sigma-Aldrich | 251275 | CIT
Cytidine diphosphate | Sigma-Aldrich | C9755 | CDP
Cytidine monophosphate | Sigma-Aldrich | C1006 | CMP
Cytidine triphosphate | Sigma-Aldrich | C9274 | CTP
D-glyceraldehyde 3-phosphate | Sigma-Aldrich | 39705 | GAP
Erythrose 4-phosphate | Sigma-Aldrich | E0377 | E4P
Ethanol | Sigma-Aldrich | EX0276 | EtOH
Fisher Scientific accuSpin Micro 17 Centrifuge | FisherScientific | | Centrifuge
Flavin adenine dinucleotide | Sigma-Aldrich | F6625 | FAD
Fructose 1,6-bisphosphate | Sigma-Aldrich | F6803 | F16P
Fructose 6-phosphate | Sigma-Aldrich | F3627 | F6P
Fumarate | Sigma-Aldrich | F8509 | FUM
Gluconate 6-phosphate | Sigma-Aldrich | P7877 | 6PG
Glucose | Sigma-Aldrich | G8270 | GLC
Glucose 6-phosphate | Sigma-Aldrich | G7879 | G6P
Glycerol 3-phosphate | Sigma-Aldrich | G7886 | Gly3P
Guanosine diphosphate | Sigma-Aldrich | G7127 | GDP
Guanosine monophosphate | Sigma-Aldrich | G8377 | GMP
Guanosine triphosphate | Sigma-Aldrich | G8877 | GTP
Hydrochloric acid | Sigma-Aldrich | 258148 | HCl
Isocitrate | Sigma-Aldrich | I1252 | ICIT
Lactate | Sigma-Aldrich | L1750 | LAC
Malate | Sigma-Aldrich | 02288 | MAL
myTXTL - Sigma 70 Master Mix Kit | ArborBiosciences | 507024 | Cell-free protein synthesis
N-(3-dimethylaminopropyl)-N’-ethylcarbodiimide hydrochloride | Sigma-Aldrich | 03449 | EDC
Nicotinamide adenine dinucleotide | Sigma-Aldrich | 43410 | NAD
Nicotinamide adenine dinucleotide phosphate | Sigma-Aldrich | N5755 | NADP
Nicotinamide adenine dinucleotide phosphate reduced | Sigma-Aldrich | 481973 | NADPH
Nicotinamide adenine dinucleotide reduced | Sigma-Aldrich | N8129 | NADH
Oxalacetate | Sigma-Aldrich | O4126 | OAA
Phosphoenolpyruvate | Sigma-Aldrich | P0564 | PEP
Pyruvate | Sigma-Aldrich | P5280 | PYR
Ribose 5-phosphate | Sigma-Aldrich | R7750 | R5P
Ribulose 5-phosphate | CarboSynth | MR45852 | RL5P
Sedoheptulose 7-phosphate | CarboSynth | MS07457 | S7P
Succinate | Sigma-Aldrich | S3674 | SUCC
Tributylamine | Sigma-Aldrich | 90780 | TBA
Triethylamine | FisherScientific | O4884 | TEA
ultrapure water | FisherScientific | 10977-015 | water
Uridine diphosphate | Sigma-Aldrich | U4125 | UDP
Uridine monophosphate | Sigma-Aldrich | U6375 | UMP
Uridine triphosphate | Sigma-Aldrich | U6625 | UTP
VWR Heavy Duty Vortex | VWR | | Vortex
Water, LCMS | JT BAKER | 9831-03 | WATER
Waters Acquity H UPLC Class Quaternary Solvent Manager | Waters | | LCMS
Waters Acquity H UPLC Class Sample Manager FTN | Waters | | LCMS
Waters Acquity Qda detector | Waters | | LCMS
Waters Empower 3 | Waters | | Software
Waters LCMS Total Recovery Vial | Waters | 186000384c | LCMS Vial
BIBLIOGRAPHY
[1] GNU Linear Programming Kit, Version 4.52, March 2016.
[2] Jiro Adachi, Kazushige Katsura, Eiko Seki, Chie Takemoto, Mikako Shi-
rouzu, Takaho Terada, Takahito Mukai, Kensaku Sakamoto, and Shigeyuki
Yokoyama. Cell-free protein synthesis using S30 extracts from Escherichia
coli RFzero strains for efficient incorporation of non-natural amino acids
into proteins. International journal of molecular sciences, 20(3):492, Jan 2019.
30678326[pmid].
[3] Roi Adadi, Benjamin Volkmer, Ron Milo, Matthias Heinemann, and Tomer
Shlomi. Prediction of microbial growth rate versus biomass yield by a
metabolic network with kinetic parameters. PLOS Comput Biol, 8, 2012.
[4] Timothy E Allen and Bernhard Ø Palsson. Sequence-based analysis of
metabolic demands for protein synthesis in prokaryotes. J Theor Biol, 220(1):1–
18, Jan 2003.
[5] S. Andreozzi, A. Chakrabarti, K. C. Soh, A. Burgard, T. H. Yang, S. Van Dien,
L. Miskovic, and V. Hatzimanikatis. Identification of metabolic engineering
targets for the enhancement of 1,4-butanediol production in recombinant E.
coli using large-scale kinetic models. Metab. Eng., 35:148–159, May 2016.
[6] Stefano Andreozzi, Ljubisa Miskovic, and Vassily Hatzimanikatis. iSCHRUNK
– in silico approach to characterization and reduction of uncertainty
in the kinetic models of genome-scale metabolic networks. Metab Eng,
33:158–168, 2016.
[7] Claudio Angione and Pietro Liò. Predictive analytics of environmental
adaptability in multi-omic network models. Sci Rep, 5:15147, Oct 2015.
[8] P. Arjunan, N. Nemeria, A. Brunskill, K. Chandrasekhar, M. Sax, Y. Yan,
F. Jordan, J. R. Guest, and W. Furey. Structure of the pyruvate dehydrogenase
multienzyme complex E1 component from Escherichia coli at 1.85 Å
resolution. Biochemistry, 41(16):5213–21, Apr 2002.
[9] J C Atlas, E V Nikolaev, S T Browning, and M L Shuler. Incorporating
genome-wide DNA sequence information into a dynamic whole-cell model
of Escherichia coli: application to DNA replication. IET Syst Biol, 2(5):369–82,
Sep 2008.
[10] Shota Atsumi, Taizo Hanai, and James C. Liao. Non-fermentative path-
ways for synthesis of branched-chain higher alcohols as biofuels. Nature,
451(7174):86–89, 01 2008.
[11] Rochelle Aw and Karen M. Polizzi. Biosensor-assisted engineering of a high-yield
Pichia pastoris cell-free protein synthesis platform. Biotechnol Bioeng,
116(3):656–666, 2019.
[12] JE Bailey. Toward a science of metabolic engineering. Science, 252(5013):1668–
1675, 1991.
[13] Arren Bar-Even, Elad Noor, Yonatan Savir, Wolfram Liebermeister, Dan
Davidi, Dan S. Tawfik, and Ron Milo. The moderately efficient enzyme:
evolutionary and physicochemical trends shaping enzyme parameters. Bio-
chemistry, 50, 2011.
[14] D Battogtokh, D.K Asch, M.E Case, J Arnold, and H.B Schüttler. An ensemble
method for identifying regulatory circuits with special reference to the qa
gene cluster of Neurospora crassa. Proc Natl Acad Sci U S A, 99(26):16904–
16909, December 2002.
[15] Jennifer E. Bestman, Krista D. Stackley, Jennifer J. Rahn, Tucker J. Williamson,
and Sherine S. L. Chan. The cellular and molecular progression of mito-
chondrial dysfunction induced by 2,4-dinitrophenol in developing zebrafish
embryos. Differentiation; research in biological diversity, 89(3-4):51–69, 2015.
25771346[pmid].
[16] Jeff Bezanson, Stefan Karpinski, Viral Shah, and Alan Edelman. Julia: A fast
dynamic language for technical computing. In Lang.NEXT, April 2012.
[17] Lacramioara Bintu, Nicolas E Buchler, Hernan G Garcia, Ulrich Gerland,
Terrence Hwa, Jane Kondev, and Rob Phillips. Transcriptional regulation by
the numbers: models. Current Opinion in Genetics & Development, 15(2):116–24,
2005.
[18] A.J Booker, J.E Dennis, P.D Frank, D.B Serafini, V Torczon, and M.W Trosset.
A rigorous framework for optimization of expensive functions by surrogates.
Struct Optim, 17:1 – 13, 1999.
[19] Henry Borsook. Protein turnover and incorporation of labeled amino acids
into tissue proteins in vivo and in vitro. Physiological reviews, 30(2):206–219,
1950.
[20] Sabine Brantl and E. Gerhart H. Wagner. Antisense RNA-mediated transcrip-
tional attenuation: an in vitro study of plasmid pT181. Molecular Microbiology,
35(6):1469–1482, 2000.
[21] Kevin S Brown and James P Sethna. Statistical mechanical approaches to
models with many poorly known parameters. Phys Rev E Stat Nonlin Soft
Matter Phys, 68(2 Pt 1):021904, Aug 2003.
[22] Matthias Bujara, Michael Schümperli, Sonja Billerbeck, Matthias Heinemann,
and Sven Panke. Exploiting cell-free systems: Implementation and debug-
ging of a system of biotransformations. Biotechnol Bioeng, 106(3):376–389,
2010.
[23] Jörg Martin Büscher, Dominika Czernik, Jennifer Christina Ewald, Uwe
Sauer, and Nicola Zamboni. Cross-platform comparison of methods for
quantitative metabolomics of primary metabolism. Analytical Chemistry,
81(6):2135–2143, Mar 2009.
[24] R. Cabrera, M. Baez, H. M. Pereira, A. Caniuguir, R. C. Garratt, and J. Babul.
The crystal complex of phosphofructokinase-2 of Escherichia coli with
fructose-6-phosphate: kinetic and structural analysis of the allosteric ATP
inhibition. J. Biol. Chem., 286(7):5774–83, Feb 2011.
[25] Kara A. Calhoun and James R. Swartz. An Economical Method for Cell-Free
Protein Synthesis using Glucose and Nucleoside Monophosphates. Biotechnol
Prog, 21(4):1146–53, 2005.
[26] Erik D Carlson, Rui Gan, C Eric Hodgman, and Michael C Jewett. Cell-free
protein synthesis: applications come of age. Biotechnol Adv, 30(5):1185–94,
2012.
[27] Filippo Caschera, Mark A. Bedau, Andrew Buchanan, James Cawse, Davide
de Lucrezia, Gianluca Gazzola, Martin M. Hanczyc, and Norman H. Packard.
Coping with complexity: Machine learning optimization of cell-free protein
synthesis. Biotechnol Bioeng, 108(9):2218–2228, 2011.
[28] Filippo Caschera and Vincent Noireaux. Synthesis of 2.3 mg/ml of pro-
tein with an all Escherichia coli cell-free transcription–translation system.
Biochimie, 99:162 – 168, 2014.
[29] M Castellanos, D B Wilson, and M L Shuler. A modular minimal cell
model: purine and pyrimidine transport and metabolism. Proc Natl Acad Sci,
101(17):6681–6, Apr 2004.
[30] Roger L. Chang, Kathleen Andrews, Donghyuk Kim, Zhanwen Li, Adam
Godzik, and Bernhard O. Palsson. Structural systems biology evaluation of
metabolic thermotolerance in Escherichia coli. Science, 340(6137):1220–1223,
2013.
[31] James Chappell, Melissa K. Takahashi, and Julius B. Lucks. Creating small
transcription activating RNAs. Nature Chemical Biology, 11(3):214–220, March
2015.
[32] M. Chulavatnatol and D. E. Atkinson. Phosphoenolpyruvate synthetase
from Escherichia coli. Effects of adenylate energy charge and modifier con-
centrations. J. Biol. Chem., 248(8):2712–5, Apr 1973.
[33] Carolina A. Contador, Matthew L. Rizk, Juan A. Asenjo, and James C. Liao.
Ensemble modeling for strain development of l-lysine-producing Escherichia
coli. Metabolic Engineering, 11(4–5):221 – 233, 2009.
[34] Markus W Covert, Eric M Knight, Jennifer L Reed, Markus J Herrgard, and
Bernhard O Palsson. Integrating high-throughput and computational data
elucidates bacterial networks. Nature, 429(6987):92–6, May 2004.
[35] David Dai, Nicholas Horvath, and Jeffrey D Varner. Dynamic sequence
specific constraint-based modeling of cell-free protein synthesis. Processes,
6(8):132, Aug 2018.
[36] Katja Dettmer, Pavel A. Aronov, and Bruce D. Hammock. Mass spectrometry-
based metabolomics. Mass Spectrometry Reviews, 26(1):51–78, 2007.
[37] M M Domach, S K Leung, R E Cahn, G G Cocks, and M L Shuler. Computer
model for glucose-limited growth of a single cell of Escherichia coli B/r-A.
Biotechnol Bioeng, 26(3):203–16, Mar 1984.
[38] M. M. Domach, S. K. Leung, R. E. Cahn, G. G. Cocks, and M. L. Shuler.
Computer model for glucose-limited growth of a single cell of Escherichia
coli B/r-A. Biotechnol Bioeng, 67(6):827–840, 2000.
[39] J. L. Donahue, J. L. Bownas, W. G. Niehaus, and T. J. Larson. Purification
and characterization of glpX-encoded fructose 1,6-bisphosphatase, a new
enzyme of the glycerol 3-phosphate regulon of Escherichia coli. J. Bacteriol.,
182(19):5624–7, Oct 2000.
[40] Warwick B. Dunn, Alexander Erban, Ralf J M Weber, Darren J. Creek, Marie
Brown, Rainer Breitling, Thomas Hankemeier, Royston Goodacre, Steffen
Neumann, Joachim Kopka, and Mark R. Viant. Mass appeal: Metabo-
lite identification in mass spectrometry-focused untargeted metabolomics.
Metabolomics : Official journal of the Metabolomic Society, 9(1):44–66, 2013.
[41] John W. Eaton, David Bateman, and Soren Hauberg. GNU Octave version 3.0.1
manual: a high-level interactive language for numerical computations. CreateSpace
Independent Publishing Platform, North Charleston, SC, USA, 2009.
[42] J S Edwards and B Ø Palsson. The Escherichia coli MG1655 in silico metabolic
genotype: its definition, characteristics, and capabilities. Proc Natl Acad Sci,
97(10):5528–33, May 2000.
[43] Jeremy S. Edwards and Bernhard O. Palsson. Metabolic flux balance analysis
and the in silico analysis of Escherichia coli K-12 gene deletions. BMC
Bioinformatics, 1(1):1, 2000.
[44] Adam M Feist, Christopher S Henry, Jennifer L Reed, Markus Krummenacker,
Andrew R Joyce, Peter D Karp, Linda J Broadbelt, Vassily Hatzimanikatis,
and Bernhard Ø Palsson. A genome-scale metabolic reconstruction for Es-
cherichia coli K-12 MG1655 that accounts for 1260 ORFs and thermodynamic
information. Mol Syst Biol, 3:121, 2007.
[45] Adam M Feist, Markus J Herrgård, Ines Thiele, Jennie L Reed, and Bernhard Ø
Palsson. Reconstruction of biochemical networks in microorganisms. Nat
Rev Microbiol, 7(2):129–43, Feb 2009.
[46] C.M. Fonseca and P. J. Fleming. Genetic Algorithms for Multiobjective
Optimization: Formulation, Discussion and Generalization. In Proceedings of
the 5th International Conference on Genetic Algorithms, pages 416 – 423, 1993.
[47] Elena Fossati, Andrew Ekins, Lauren Narcross, Yun Zhu, Jean-Pierre
Falgueyret, Guillaume A. W. Beaudoin, Peter J Facchini, and Vincent J. J.
Martin. Reconstitution of a 10-gene pathway for synthesis of the plant alkaloid
dihydrosanguinarine in Saccharomyces cerevisiae. Nat Commun, 5, 02
2014.
[48] A G Fredrickson. Formulation of structured growth models. Biotechnol
Bioeng, 18(10):1481–6, Oct 1976.
[49] Kapil G Gadkar, Francis J Doyle, 3rd, Timothy J Crowley, and Jeffrey D
Varner. Cybernetic model predictive control of a continuous bioreactor with
cell recycle. Biotechnol Prog, 19(5):1487–97, 2003.
[50] Kapil G Gadkar, Rudiyanto Gunawan, and Francis J Doyle, 3rd. Iterative
approach to model identification of biological networks. BMC Bioinformatics,
6:155, 2005.
[51] Ernest F Gale, Joan P Folkes, et al. Effect of nucleic acids on protein synthesis
and amino-acid incorporation in disrupted staphylococcal cells. Nature,
173:1223–7, 1954.
[52] Jonathan Garamella, Ryan Marshall, Mark Rustad, and Vincent Noireaux.
The all E. coli TX-TL toolbox 2.0: A platform for cell-free synthetic biology. ACS
Synth Biol, 5(4):344–55, Apr 2016.
[53] David Garenne, Chase L. Beisel, and Vincent Noireaux. Characterization of
the all-E. coli transcription-translation system myTXTL by mass spectrometry.
Rapid Communications in Mass Spectrometry, 33(11):1036–1048, 2019.
[54] Peter Gennemark and Dag Wedelin. Benchmarks for identification of ordi-
nary differential equations from time series data. Bioinformatics, 25(6):780–6,
Mar 2009.
[55] Aaron R. Goerke and James R. Swartz. Development of cell-free protein
synthesis platforms for disulfide bonded proteins. Biotechnol Bioeng, 99(2):351–
367, 2008.
[56] Johann Grundlingh, Paul I. Dargan, Marwa El-Zanfaly, and David M. Wood.
2,4-dinitrophenol (DNP): a weight loss agent with significant acute toxicity
and risk of death. Journal of medical toxicology : official journal of the American
College of Medical Toxicology, 7(3):205–212, Sep 2011. 21739343[pmid].
[57] Cassandra Guarino and Matthew P DeLisa. A prokaryote-based cell-free
translation system that efficiently synthesizes glycoproteins. Glycobiology,
22(5):596–601, May 2012.
[58] Weihua Guo, Jiayuan Sheng, and Xueyang Feng. Mini-review: In vitro
metabolic engineering for biomanufacturing of high-value products. Compu-
tational and Structural Biotechnology Journal, 15:161 – 167, 2017.
[59] Ryan N Gutenkunst, Joshua J Waterfall, Fergal P Casey, Kevin S Brown,
Christopher R Myers, and James P Sethna. Universally sloppy parameter
sensitivities in systems biology models. PLoS Comput Biol, 3:e189, 2007.
[60] A. Gyorgy and R. M. Murray. Quantifying resource competition and its
effects in the TX-TL system. In 2016 IEEE 55th Conference on Decision and
Control (CDC), pages 3363–3368, December 2016.
[61] H Hajjaj, P.J Blanc, G Goma, and J François. Sampling techniques and
comparative extraction procedures for quantitative determination of intra-
and extracellular metabolites in filamentous fungi. FEMS Microbiology Letters,
164(1):195–200, 1998.
[62] Joshua J Hamilton, Vivek Dwivedi, and Jennifer L Reed. Quantitative Assess-
ment of Thermodynamic Constraints on the Solution Space of Genome-Scale
Metabolic Models. Biophys J, 105(2):512–522, Jul 2013.
[63] Julia Handl, Douglas B Kell, and Joshua Knowles. Multiobjective optimiza-
tion in bioinformatics and computational biology. IEEE/ACM Trans Comput
Biol Bioinform, 4(2):279–92, 2007.
[64] Christopher S Henry, Linda J Broadbelt, and Vassily Hatzimanikatis.
Thermodynamics-Based Metabolic Flux Analysis. Biophys. J, 92(5):1792–1805,
Mar 2007.
[65] Jon Herman and Will Usher. SALib: An open-source python library for
sensitivity analysis. The Journal of Open Source Software, 2(9), jan 2017.
[66] Alan C Hindmarsh, Peter N Brown, Keith E Grant, Steven L Lee, Radu
Serban, Dan E Shumaker, and Carol S Woodward. SUNDIALS: Suite of
nonlinear and differential/algebraic equation solvers. ACM T Math Software
(TOMS), 31(3):363–396, 2005.
[67] J. K. Hines, H. J. Fromm, and R. B. Honzatko. Novel allosteric activation site
in Escherichia coli fructose-1,6-bisphosphatase. J. Biol. Chem., 281(27):18386–
93, Jul 2006.
[68] J. K. Hines, H. J. Fromm, and R. B. Honzatko. Structures of activated fructose-
1,6-bisphosphatase from Escherichia coli. Coordinate regulation of bacterial
metabolism and the conservation of the R-state. J. Biol. Chem., 282(16):11696–
704, Apr 2007.
[69] Mahlon B Hoagland, Elizabeth B Keller, and Paul C Zamecnik. Enzymatic
carboxyl activation of amino acids. J Biol Chem, 218(1):345–358, 1956.
[70] C Eric Hodgman and Michael C Jewett. Cell-free synthetic biology: thinking
outside the cell. Metab Eng, 14(3):261–9, May 2012.
[71] Nicholas Horvath, Michael Vilkhovoy, Joseph A. Wayman, Kara Calhoun,
James Swartz, and Jeffrey D. Varner. Toward a genome scale sequence specific
dynamic model of cell-free protein synthesis in Escherichia coli. bioRxiv, 2017.
[72] Chelsea Y. Hu, Jeffrey D. Varner, and Julius B. Lucks. Generating effective
models and parameters for RNA genetic circuits. ACS Synthetic Biology,
4(8):914–926, August 2015.
[73] Chelsea Y Hu, Jeffrey D Varner, and Julius B Lucks. Generating effective
models and parameters for RNA genetic circuits. ACS Synth Biol, 4(8):914–26,
Aug 2015.
[74] Tianjiao Huang, Michael R. Armbruster, John B. Coulton, and James L. Ed-
wards. Chemical tagging in mass spectrometry for systems biology. Analytical
Chemistry, 91(1):109–125, Jan 2019.
[75] Tianjiao Huang, Maria Toro, Richard Lee, Dawn S. Hui, and James L. Ed-
wards. Multi-functional derivatization of amine, hydroxyl, and carboxylate
groups for metabolomic investigations of human tissue by electrospray ion-
ization mass spectrometry. Analyst, 143:3408–3414, 2018.
[76] S Huband, P Hingston, L Barone, and L While. A Review of Multiobjective
Test Problems and a Scalable Test Problem Toolkit. IEEE Trans. Evol. Comp.,
10:477 – 506, 2006.
[77] Amber Jannasch, Miroslav Sedlak, and Jiri Adamec. Quantification of pentose
phosphate pathway (PPP) metabolites by liquid chromatography-mass
spectrometry (LC-MS). Methods in molecular biology (Clifton, N.J.), 708:159–171,
2011.
[78] Thapakorn Jaroentomeechai, Jessica C Stark, Aravind Natarajan, Cameron J
Glasscock, Laura E Yates, Karen J Hsu, Milan Mrksich, Michael C Jewett,
and Matthew P DeLisa. Single-pot glycoprotein biosynthesis using a cell-
free transcription-translation system enriched with glycosylation machinery.
Nature communications, 9(1):2686, 2018.
[79] Lisa Jeske, Sandra Placzek, Ida Schomburg, Antje Chang, and Dietmar
Schomburg. BRENDA in 2019: a European ELIXIR core data resource. Nucleic
Acids Research, 47(D1):D542–D549, 11 2018.
[80] M.C. Jewett, A. Voloshin, and J. Swartz. Prokaryotic systems for in vitro expres-
sion, pages 391–411. Eaton Publishing, Westborough, MA, 2002.
[81] Michael C Jewett, Kara A Calhoun, Alexei Voloshin, Jessica J Wuu, and
James R Swartz. An integrated cell-free metabolic platform for protein
production and synthetic biology. Mol Syst Biol, 4:220, 2008.
[82] Michael C. Jewett and James R. Swartz. Mimicking the Escherichia coli
cytoplasmic environment activates long-lived and efficient cell-free protein
synthesis. Biotechnol Bioeng, 86(1):19–26, 2004.
[83] Michael C. Jewett and James R. Swartz. Substrate replenishment extends
protein synthesis with an in vitro translation system designed to mimic the
cytoplasm. Biotechnol Bioeng, 87(4):465–471, 2004.
[84] H Kacser and JA Burns. The control of flux. Symp Soc Exp Biol., 27(27):65–104,
1973.
[85] S. Kale, P. Arjunan, W. Furey, and F. Jordan. A dynamic loop at the active
center of the Escherichia coli pyruvate dehydrogenase complex E1 compo-
nent modulates substrate utilization and chemical communication with the
E2 component. J. Biol. Chem., 282(38):28106–16, Sep 2007.
[86] K Deb, A Pratap, S Agarwal, and T Meyarivan. A Fast and Elitist
Multiobjective Genetic Algorithm: NSGA-II. IEEE Trans. Evol. Comp., 6:182 –
197, 2002.
[87] A Kamp and S Schuster. Metatool 5.0: fast and flexible elementary modes
analysis. Bioinformatics, 22(15):1930–1931, 2006.
[88] J. R. Karr, J. C. Sanghvi, D. N. Macklin, M. V. Gutschow, J. M. Jacobs, B. Bolival,
N. Assad-Garcia, J. I. Glass, and M. W. Covert. A whole-cell computational
model predicts phenotype from genotype. Cell, 150(2):389–401, Jul 2012.
[89] Eyal Karzbrun, Jonghyeon Shin, Roy H. Bar-Ziv, and Vincent Noireaux.
Coarse-Grained Dynamics of Protein Synthesis in a Cell-Free System. Physical
Review Letters, 106(4), January 2011.
[90] A. Khodayari and C. D. Maranas. A genome-scale Escherichia coli kinetic
metabolic model k-ecoli457 satisfying flux data for multiple mutant strains.
Nat Commun, 7:13806, Dec 2016.
[91] Ali Khodayari, Ali R Zomorrodi, James C Liao, and Costas D Maranas. A
kinetic model of Escherichia coli core metabolism satisfying multiple sets of
mutant flux data. Metab Eng, 25:50–62, Sep 2014.
[92] Takanori Kigawa, Yutaka Muto, and Shigeyuki Yokoyama. Cell-free synthesis
and amino acid-selective stable isotope labeling of proteins for NMR analysis.
J Biomolec NMR, 6(2):129–134, 1995.
[93] Dong-Myung Kim and James R. Swartz. Regeneration of adenosine triphos-
phate from glycolytic intermediates for cell-free protein synthesis. Biotechnol
Bioeng, 74(4):309–316, 2001.
[94] Dong-Myung Kim and James R. Swartz. Efficient production of a bioactive,
multiple disulfide-bonded protein using modified extracts of Escherichia
coli. Biotechnol Bioeng, 85(2):122–129, 2004.
[95] JI Kim, JD Varner, and D Ramkrishna. A hybrid model of anaerobic E. coli
GJT001: Combination of elementary flux modes and cybernetic variables.
Biotechnol. Prog., 24(5):993–1006, 2008.
[96] Jin Il Kim, Hyun-Seob Song, Sunil R Sunkara, Arvind Lali, and Doraiswami
Ramkrishna. Exacting predictions by cybernetic model confirmed experimen-
tally: steady state multiplicity in the chemostat. Biotechnol Prog, 28(5):1160–6,
2012.
[97] S Kirkpatrick, C D Gelatt, Jr, and M P Vecchi. Optimization by simulated
annealing. Science, 220(4598):671–80, May 1983.
[98] Dhinakar S. Kompala, Doraiswami Ramkrishna, and George T. Tsao. Cyber-
netic modeling of microbial growth on multiple substrates. Biotechnol Bioeng,
26(11):1272–1281, 1984.
[99] O. Kotte, J. B. Zaugg, and M. Heinemann. Bacterial adaptation through
distributed sensing of metabolic fluxes. Mol. Syst. Biol., 6:355, 2010.
[100] Lars Kuepfer, Matthias Peter, Uwe Sauer, and Jörg Stelling. Ensemble model-
ing for analysis of cell signaling dynamics. Nat Biotechnol, 25(9):1001–6, Sep
2007.
[101] Muriel Lederman and Geoffrey Zubay. DNA-directed peptide synthesis I. A
comparison of T2 and Escherichia coli DNA-directed peptide synthesis in two
cell-free systems. Biochimica et Biophysica Acta (BBA)-Nucleic Acids and Protein
Synthesis, 149(1):253–258, 1967.
[102] L Lee, JD Varner, and K Ko. Parallel extreme pathway computation for
metabolic networks. Comput Syst Bioinformatics Conf, Int IEEE CS, 0:636–639,
2004.
[103] Sangbum Lee, Chan Phalakornkule, Michael M Domach, and Ignacio E
Grossmann. Recursive MILP model for finding all the alternate optima in LP
models for metabolic networks. Comput. Chem. Eng., 24(2):711 – 716, 2000.
[104] Sun Bok Lee and James E. Bailey. Genetically structured models for lac
promoter–operator function in the Escherichia coli chromosome and in mul-
ticopy plasmids: Lac operator function. Biotechnol Bioeng, 26(11):1372–1382,
1984.
[105] Yun Lee, Jimmy G Lafontaine Rivera, and James C Liao. Ensemble modeling
for robustness analysis in engineering non-native metabolic pathways. Metab
Eng, 25:63–71, Sep 2014.
[106] Joshua Lequieu, Anirikh Chakrabarti, Satyaprakash Nayak, and Jeffrey D
Varner. Computational modeling and analysis of insulin induced eukaryotic
translation initiation. PLoS Comput Biol, 7(11):e1002263, Nov 2011.
[107] Joshua A Lerman, Daniel R Hyduke, Haythem Latif, Vasiliy A Portnoy,
Nathan E Lewis, Jeffrey D Orth, Alexandra C Schrimpe-Rutledge, Richard D
Smith, Joshua N Adkins, Karsten Zengler, and Bernhard Ø Palsson. In silico
method for modelling metabolism and gene product expression at genome
scale. Nat Commun, 3:929, 2012.
[108] Nathan E Lewis, Harish Nagarajan, and Bernhard Ø Palsson. Constraining
the metabolic genotype-phenotype relationship using a phylogeny of in silico
methods. Nat Rev Microbiol, 10(4):291–305, Apr 2012.
[109] Jun Li, Liangcai Gu, John Aach, and George M. Church. Improved cell-free
RNA and protein synthesis system. PLoS ONE, 9(9):1–11, 09 2014.
[110] Yuan Lu, John P Welsh, and James R Swartz. Production and stabilization of
the trimeric influenza hemagglutinin stem domain for potentially broadly
protective influenza vaccines. Proc Natl Acad Sci, 111(1):125–30, Jan 2014.
[111] Deyan Luan, Fania Szlam, Kenichi A Tanaka, Philip S Barie, and Jeffrey D
Varner. Ensembles of uncertain mathematical models can identify network
response to therapeutic interventions. Mol Biosyst, 6(11):2272–86, Nov 2010.
[112] Deyan Luan, Michael Zai, and Jeffrey D Varner. Computationally derived
points of fragility of a human cascade are consistent with current therapeutic
strategies. PLoS Comput Biol, 3(7):e142, Jul 2007.
[113] Julius B. Lucks, Lei Qi, Vivek K. Mutalik, Denise Wang, and Adam P. Arkin.
Versatile RNA-sensing transcriptional regulators for engineering genetic
networks. Proc Natl Acad Sci, 108(21):8617–8622, May 2011.
[114] C. MacKintosh and H. G. Nimmo. Purification and regulatory properties of
isocitrate lyase from Escherichia coli ML308. Biochem. J., 250(1):25–31, Feb
1988.
[115] R. Mahadevan and C.H. Schilling. The effects of alternate optimal solutions
in constraint-based genome-scale metabolic models. Metab Eng, 5(4):264–276,
2003.
[116] Arijit Maitra and Ken A. Dill. Bacterial growth laws reflect the evolutionary
importance of energy efficiency. Proc Natl Acad Sci, 112:406–411, 2015.
[117] Rey W. Martin, Benjamin J. Des Soye, Yong-Chan Kwon, Jennifer Kay, Roder-
ick G. Davis, Paul M. Thomas, Natalia I. Majewska, Cindy X. Chen, Ryan D.
Marcum, Mary Grace Weiss, Ashleigh E. Stoddart, Miriam Amiram, Arnaz K.
Ranji Charna, Jaymin R. Patel, Farren J. Isaacs, Neil L. Kelleher, Seok Hoon
Hong, and Michael C. Jewett. Cell-free protein synthesis from genomically re-
coded bacteria enables multisite incorporation of noncanonical amino acids.
Nature Communications, 9(1):1203, 2018.
[118] J H Matthaei and M W Nirenberg. Characteristics and stabilization of
DNAase-sensitive protein synthesis in E. coli extracts. Proc Natl Acad Sci, 47:1580–8,
Oct 1961.
[119] Marco Mauri and Stefan Klumpp. A model for sigma factor competition in
bacterial cells. PLoS Comput Biol, 10(10):e1003845, Oct 2014.
[120] Stuart McLaughlin. The mechanism of action of DNP on phospholipid bilayer
membranes. The Journal of Membrane Biology, 9(1):361–372, Dec 1972.
[121] Nathalie Michel-Reydellet, Kara Calhoun, and James Swartz. Amino acid
stabilization for cell-free protein synthesis by modification of the Escherichia
coli genome. Metabolic Engineering, 6(3):197–203, 2004.
[122] Ron Milo, Paul Jorgensen, Uri Moran, Griffin Weber, and Michael Springer.
Bionumbers–the database of key numbers in molecular and cell biology.
Nucleic Acids Res, 38:750–3, 2009.
[123] Tae Seok Moon, Chunbo Lou, Alvin Tamsir, Brynne C Stanton, and Christo-
pher A Voigt. Genetic programs constructed from layered logic gates in
single cells. Nature, 491(7423):249–53, Nov 2012.
[124] Charles E Nakamura and Gregory M Whited. Metabolic engineering for the
microbial production of 1,3-propanediol. Current Opinion in Biotechnology,
14(5):454–459, 2003.
[125] S Nayak, J K Siddiqui, and J D Varner. Modelling and analysis of an ensemble
of eukaryotic translation initiation models. IET Syst Biol, 5(1):2, Jan 2011.
[126] Patrick P Ng, Ming Jia, Kedar G Patel, Joshua D Brody, James R Swartz,
Shoshana Levy, and Ronald Levy. A vaccine directed to B cells and produced
by cell-free protein synthesis generates potent antilymphoma immunity. Proc
Natl Acad Sci, 109(36):14526–14531, 2012.
[127] Alexander Nieß, Jurek Failmezger, Maike Kuschel, Martin Siemann-
Herzberg, and Ralf Takors. Experimentally Validated Model Enables Debot-
tlenecking of in Vitro Protein Synthesis and Identifies a Control Shift under
in Vivo Conditions. ACS Synthetic Biology, 6(10):1913–1921, October 2017.
[128] M W Nirenberg and J H Matthaei. The dependence of cell-free protein
synthesis in E. coli upon naturally occurring or synthetic polyribonucleotides.
Proc Natl Acad Sci, 47:1588–602, Oct 1961.
[129] Edward J O’Brien, Joshua A Lerman, Roger L Chang, Daniel R Hyduke,
and Bernhard Ø Palsson. Genome-scale models of metabolism and gene
expression extend and refine growth phenotype prediction. Mol. Sys. Biol.,
9(1):693, 2013.
[130] T. Ogawa, K. Murakami, H. Mori, N. Ishii, M. Tomita, and M. Yoshino. Role of
phosphoenolpyruvate in the NADP-isocitrate dehydrogenase and isocitrate
lyase reaction in Escherichia coli. J. Bacteriol., 189(3):1176–8, Feb 2007.
[131] You-Kwan Oh, Bernhard Ø Palsson, Sung M Park, Christophe H Schilling,
and Radhakrishnan Mahadevan. Genome-scale reconstruction of metabolic
network in Bacillus subtilis based on high-throughput phenotyping and gene
essentiality data. J Biol Chem, 282(39):28791–9, Sep 2007.
[132] S. Okino, M. Suda, K. Fujikura, M. Inui, and H. Yukawa. Production of
D-lactic acid by Corynebacterium glutamicum under oxygen deprivation.
Appl. Microbiol. Biotechnol., 78(3):449–54, Mar 2008.
[133] JD Orth, I Thiele, and BØ Palsson. What is flux balance analysis? Nat.
Biotechnol., 28(3):245–248, 2010.
[134] Irene Otero-Muras and Julio R Banga. Multicriteria global optimization for
biocircuit design. BMC Syst Biol, 8:113, Sep 2014.
[135] T.N Palmer, G.J Shutts, R Hagedorn, F.J Doblas-Reyes, T Jung, and M Leut-
becher. Representing model uncertainty in weather and climate prediction.
Ann Rev Earth and Planetary Sci, 33:163–193, 2005.
[136] BØ Palsson. Systems Biology: Properties of Reconstructed Networks. Cambridge
University Press, New York, NY, USA, 2006.
[137] Keith Pardee, Shimyn Slomovic, Peter Q Nguyen, Jeong Wook Lee, Nina
Donghia, Devin Burrill, Tom Ferrante, Fern R McSorley, Yoshikazu Furuta,
Andyna Vernet, Michael Lewandowski, Christopher N Boddy, Neel S Joshi,
and James J Collins. Portable, on-demand biomolecular manufacturing. Cell,
167(1):248–59.e12, Sep 2016.
[138] D. S. Pereira, L. J. Donald, D. J. Hosfield, and H. W. Duckworth. Active site
mutants of Escherichia coli citrate synthase. Effects of mutations on catalytic
and allosteric properties. J. Biol. Chem., 269(1):412–7, Jan 1994.
[139] Jessica G Perez, Jessica C Stark, and Michael C Jewett. Cell-free synthetic
biology: engineering beyond the cell. Cold Spring Harbor perspectives in biology,
8(12):a023853, 2016.
[140] R. R. Ramsay, B. A. Ackrell, C. J. Coles, T. P. Singer, G. A. White, and G. D.
Thorn. Reaction site of carboxanilides and of thenoyltrifluoroacetone in
complex II. Proc Natl Acad Sci, 78(2):825–828, Feb 1981.
[141] Dae-Kyun Ro, Eric M. Paradise, Mario Ouellet, Karl J. Fisher, Karyn L. New-
man, John M. Ndungu, Kimberly A. Ho, Rachel A. Eachus, Timothy S. Ham,
James Kirby, Michelle C. Y. Chang, Sydnor T. Withers, Yoichiro Shiba, Rich-
mond Sarpong, and Jay D. Keasling. Production of the antimalarial drug
precursor artemisinic acid in engineered yeast. Nature, 440(7086):940–943, 04
2006.
[142] M. S. Robinson, R. A. Easom, M. J. Danson, and P. D. Weitzman. Citrate
synthase of Escherichia coli. Characterisation of the enzyme from a plasmid-
cloned gene and amplification of the intracellular levels. FEBS Lett., 154(1):51–
4, Apr 1983.
[143] Gabriel Rosenblum and Barry S. Cooperman. Engine out of the chassis:
Cell-free protein synthesis and its uses. FEBS Letters, 588(2):261–268, 2014.
Protein Engineering.
[144] G.J.G. Ruijter and J. Visser. Determination of intermediary metabolites in
Aspergillus niger. Journal of Microbiological Methods, 25(3):295–302, 1996.
[145] A Saltelli, P Annoni, I Azzini, F Campolongo, M Ratto, and S Tarantola.
Variance based sensitivity analysis of model output. Design and estimator
for the total sensitivity index. Comp Phys Comm, 181:259–70, 2010.
[146] Claudia Sánchez, Juan Carlos Quintero, and Silvia Ochoa. Flux balance
analysis in the production of clavulanic acid by Streptomyces clavuligerus.
Biotechnol Prog, 31(5):1226–1236, 2015.
[147] Michael A Savageau, Eberhard O Voit, and Douglas H Irvine. Biochemical
systems theory and metabolic control theory: 1. fundamental similarities and
differences. Mathematical Biosciences, 86(2):127–145, 1987.
[148] C H Schilling, D Letscher, and B O Palsson. Theory for the systemic definition
of metabolic pathways and their use in interpreting metabolic function from
a pathway-oriented perspective. J Theor Biol, 203(3):229–48, Apr 2000.
[149] Robert Schuetz, Lars Kuepfer, and Uwe Sauer. Systematic evaluation of
objective functions for predicting intracellular fluxes in Escherichia coli. Mol
Syst Biol, 3:119, 2007.
[150] S Schuster, D A Fell, and T Dandekar. A general definition of metabolic path-
ways useful for systematic organization and analysis of complex metabolic
networks. Nat Biotechnol, 18(3):326–32, Mar 2000.
[151] J Sendín, I Otero-Muras, A A Alonso, and J Banga. Improved Optimization
Methods for the Multiobjective Design of Bioprocesses. Ind. Eng. Chem. Res.,
45:8594–8603, 2006.
[152] Y Shimizu, A Inoue, Y Tomari, T Suzuki, T Yokogawa, K Nishikawa, and
T Ueda. Cell-free translation reconstituted with purified components. Nat
Biotechnol, 19(8):751–5, Aug 2001.
[153] I.M Sobol. Global sensitivity indices for nonlinear mathematical models
and their Monte Carlo estimates. Mathematics and Computers in Simulation,
55:271–80, 2001.
[154] Hyohak Song and Sang Yup Lee. Production of succinic acid by bacterial
fermentation. Enzyme and Microbial Technology, 39(3):352–361, 2006. The
Asia-Pacific Biochemical Engineering Conference (APBioChEC 2005).
[155] Hyun-Seob Song and Doraiswami Ramkrishna. Prediction of metabolic
function from limited data: Lumped hybrid cybernetic modeling (L-HCM).
Biotechnol Bioeng, 106(2):271–84, Jun 2010.
[156] Hyun-Seob Song and Doraiswami Ramkrishna. Cybernetic models based
on lumped elementary modes accurately predict strain-specific metabolic
function. Biotechnol Bioeng, 108(1):127–40, Jan 2011.
[157] Hyun-Seob Song and Doraiswami Ramkrishna. Prediction of dynamic be-
havior of mutant strains from limited wild-type data. Metab Eng, 14(2):69–80,
Mar 2012.
[158] Sang Ok Song and Jeffrey Varner. Modeling and analysis of the molecular
basis of pain in sensory neurons. PLoS One, 4(9):e6758, 2009.
[159] AS Spirin, VI Baranov, LA Ryabova, SY Ovodov, and YB Alakhov. A contin-
uous cell-free translation system capable of producing polypeptides in high
yield. Science, 242(4882):1162–1164, 1988.
[160] D.E. Steinmeyer and M.L. Shuler. Structured model for Saccharomyces
cerevisiae. Chem. Eng. Sci., 44:2017–30, 1989.
[161] Tobias Stögbauer, Lukas Windhager, Ralf Zimmer, and Joachim O. Rädler.
Experiment and mathematical modeling of gene expression dynamics in a
cell-free system. Integrative Biology, 4(5):494–501, May 2012.
[162] James Swartz. A pure approach to constructive biology. Nature Biotechnology,
19:732–3, 2001.
[163] James R. Swartz. Transforming biochemical engineering with cell-free biol-
ogy. AIChE Journal, 58(1):5–13, 2012.
[164] Kazuyuki Takai, Tatsuya Sawasaki, and Yaeta Endo. Practical cell-free protein
synthesis system using purified wheat embryos. Nature Protocols, 5, 2001.
[165] Yikun Tan and James C Liao. Metabolic ensemble modeling for strain engi-
neers. Biotechnol J, 7(3):343–53, Mar 2012.
[166] Akito Taneda. Multi-objective optimization for RNA design with multiple
target secondary structures. BMC bioinformatics, 16(1):280, 2015.
[167] A.L. Tappel. Inhibition of electron transport by antimycin A, alkyl hydroxy
naphthoquinones and metal coordination compounds. Biochemical
Pharmacology, 3(4):289–296, 1960.
[168] Ryan Tasseff, Satyaprakash Nayak, Saniya Salim, Poorvi Kaushik, Noreen
Rizvi, and Jeffrey D Varner. Analysis of the molecular networks in androgen
dependent and independent prostate cancer revealed fragile and robust
subsystems. PLoS One, 5(1):e8864, 2010.
[169] Ryan Tasseff, Satyaprakash Nayak, Sang Ok Song, Andrew Yen, and Jeffrey D
Varner. Modeling and analysis of retinoic acid induced differentiation of
uncommitted precursor cells. Integr Biol (Camb), 3(5):578–91, May 2011.
[170] Uwe Theobald, Werner Mailinger, Michael Baltes, Manfred Rizzi, and
Matthias Reuss. In vivo analysis of metabolic dynamics in Saccharomyces
cerevisiae: I. Experimental observations. Biotechnol Bioeng, 55(2):305–316,
1997.
[171] Ines Thiele, Neema Jamshidi, Ronan M. T. Fleming, and Bernhard O. Palsson.
Genome-scale reconstruction of Escherichia coli’s transcriptional and transla-
tional machinery: A knowledge base, its mathematical formulation, and its
functional characterization. PLOS Computational Biology, 5(3):1–13, 03 2009.
[172] Andrea C Timm, Peter G Shankles, Carmen M Foster, Mitchel J Doktycz, and
Scott T Retterer. Toward microfluidic reactors for cell-free protein synthesis
at the point-of-care. Small, 12(6):810–817, 2016.
[173] M Tomita, K Hashimoto, K Takahashi, T S Shimizu, Y Matsuzaki, F Miyoshi,
K Saito, S Tanida, K Yugi, J C Venter, and C A Hutchison. E-cell: software
environment for whole-cell simulation. Bioinformatics, 15(1):72–84, 1999.
[174] Linh M Tran, Matthew L Rizk, and James C Liao. Ensemble modeling of
metabolic networks. Biophys J, 95(12):5606–17, Dec 2008.
[175] Kelly A. Underwood, James R. Swartz, and Joseph D. Puglisi. Quantitative
polysome analysis identifies limitations in bacterial cell-free protein synthesis.
Biotech. Bioeng., 91(4):425–35, 2005.
[176] A Varma and B O Palsson. Stoichiometric flux balance models quantitatively
predict growth and metabolic by-product secretion in wild-type Escherichia
coli W3110. Appl. Environ. Microbiol., 60(10):3724–3731, 1994.
[177] J. Varner and D. Ramkrishna. Metabolic engineering from a cybernetic
perspective. 1. theoretical preliminaries. Biotechnology Progress, 15(3):407–425,
1999.
[178] J Varner and D Ramkrishna. Metabolic engineering from a cybernetic per-
spective: aspartate family of amino acids. Metab Eng, 1(1):88–116, Jan 1999.
[179] Varnerlab. http://www.varnerlab.org/downloads/.
[180] M. Vilkhovoy, M. Minot, and J. D. Varner. Effective dynamic models of
metabolic networks. IEEE Life Sciences Letters, 2(4):51–54, Dec 2016.
[181] Michael Vilkhovoy, Nicholas Horvath, Che-Hsiao Shih, Joseph A. Wayman,
Kara Calhoun, James Swartz, and Jeffrey D. Varner. Sequence specific modeling
of E. coli cell-free protein synthesis. ACS Synthetic Biology, 7(8):1844–1857,
Aug 2018.
[182] Tobias von der Haar. Mathematical and computational modelling of ribo-
somal movement and protein synthesis: An overview. Computational and
Structural Biotechnology Journal, 1(1):e201204002, 2012.
[183] Joseph A. Wayman, Adithya Sagar, and Jeffrey D. Varner. Dynamic modeling
of cell-free biochemical networks using effective kinetic models. Processes,
3(1):138, 2015.
[184] Joseph A. Wayman and Jeffrey D. Varner. Biological systems modeling of
metabolic and signaling networks. Curr Opin Chem Eng, 2, 2013.
[185] Sharon J. Wiback, Iman Famili, Harvey J. Greenberg, and Bernhard Ø Palsson.
Monte Carlo sampling can be used to determine the size and shape of the
steady-state flux space. Journal of Theoretical Biology, 228(4):437–447, 2004.
[186] Sharon J Wiback, Radhakrishnan Mahadevan, and Bernhard Ø Palsson.
Reconstructing metabolic flux vectors from extreme pathways: defining the
alpha-spectrum. J Theor Biol, 224(3):313–24, Oct 2003.
[187] Wolfgang Wiechert. 13C Metabolic Flux Analysis. Metabol. Eng., 3(3):195–206,
2001.
[188] T Winnick. Incorporation of labeled amino acids into the protein of embry-
onic and tumor tissue homogenates. 9(1):247, 1950.
[189] R. C. Wohl and G. Markus. Phosphoenolpyruvate carboxylase of Escherichia
coli. Purification and some properties. J. Biol. Chem., 247(18):5785–92, Sep
1972.
[190] P Wu, N G Ray, and M L Shuler. A single-cell model for CHO cells. Ann N Y
Acad Sci, 665:152–87, Oct 1992.
[191] Wen-Chu Yang, Miroslav Sedlak, Fred E. Regnier, Nathan Mosier, Nancy
Ho, and Jiri Adamec. Simultaneous quantification of metabolites involved
in central carbon and energy metabolism using reversed-phase liquid
chromatography-mass spectrometry and in vitro 13C labeling. Analytical
Chemistry, 80(24):9508–9516, Dec 2008.
[192] Harry Yim, Robert Haselbeck, Wei Niu, Catherine Pujol-Baxley, Anthony
Burgard, Jeff Boldt, Julia Khandurina, John D Trawick, Robin E Osterhout,
Rosary Stephen, Jazell Estadilla, Sy Teisan, H Brett Schreyer, Stefan Andrae,
Tae Hoon Yang, Sang Yup Lee, Mark J Burk, and Stephen Van Dien. Metabolic
engineering of Escherichia coli for direct production of 1,4-butanediol. Nat
Chem Biol, 7(7):445–452, 07 2011.
[193] Jamey D. Young, Kristene L. Henne, John A. Morgan, Allan E. Konopka, and
Doraiswami Ramkrishna. Integrating cybernetic modeling with pathway
analysis provides a dynamic, systems-level description of metabolic control.
Biotechnol Bioeng, 100(3):542–559, 2008.
[194] Nicola Zamboni, Sarah-Maria Fendt, and Uwe Sauer. 13C-based metabolic
flux analysis. Nature Protocols, 4:878–92, May 2009.
[195] J. Zawada, B. Richter, E. Huang, E. Lodes, A. Shah, and J. R. Swartz. High-
Density, Defined Media Culture for the Production of Escherichia coli Cell Extracts,
chapter 9, pages 142–156.
[196] James F. Zawada, Gang Yin, Alexander R. Steiner, Junhao Yang, Alpana
Naresh, Sushmita M. Roy, Daniel S. Gold, Henry G. Heinsohn, and Christo-
pher J. Murray. Microscale to manufacturing scale-up of cell-free cytokine
production—a new approach for shortening protein production development
timelines. Biotechnol Bioeng, 108(7):1570–1578, 2011.
[197] Ying Zhang, Ines Thiele, Dana Weekes, Zhanwen Li, Lukasz Jaroszewski,
Krzysztof Ginalski, Ashley M. Deacon, John Wooley, Scott A. Lesley, Ian A.
Wilson, Bernhard Palsson, Andrei Osterman, and Adam Godzik. Three-dimensional
structural view of the central metabolic network of Thermotoga
maritima. Science, 325(5947):1544–1549, 2009.
[198] T. Zhu, M. F. Bailey, L. M. Angley, T. F. Cooper, and R. C. Dobson. The
quaternary structure of pyruvate kinase type 1 from Escherichia coli at low
nanomolar concentrations. Biochimie, 92(1):116–20, Jan 2010.