DYNAMIC MODELS OF METABOLIC NETWORKS AND ANALYSIS OF CELL-FREE PROTEIN SYNTHESIS

A Dissertation Presented to the Faculty of the Graduate School of Cornell University in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

by Michael Vilkhovoy
December 2019

© 2019 Michael Vilkhovoy
ALL RIGHTS RESERVED

BIOGRAPHICAL SKETCH

Michael Vilkhovoy was born in Holyoke, Massachusetts and attended the University of Massachusetts Amherst, graduating cum laude with a Bachelor of Science in Chemical Engineering with Departmental Honors in 2014. He enrolled in the Robert Frederick Smith School of Chemical and Biomolecular Engineering at Cornell University in August of 2014. During his time at Cornell he was interested in developing data-driven computational models of biological systems. Under the guidance of Professor Jeffrey Varner, Michael studied complex metabolic reaction networks and developed wet-lab and computational metabolic engineering tools. He received his Doctor of Philosophy in Chemical and Biomolecular Engineering in 2019.

This work is dedicated to my sister, Tanya.

ACKNOWLEDGEMENTS

I would first like to thank my wife, Inna, for all her support and always being there for me. And to my parents, for raising me and guiding me into the person I am today. I would like to thank Professor Jeffrey Varner for his guidance in sharpening my skillset to be a better scientist and communicator. Thank you to my friends and members of the Varner group for the continued support and making the process enjoyable throughout the years, including Nick Horvath, David Dai, Tyler Moeller, Adi Sagar, Mason Minot, Sandra Vadhin, and Abhinav Adhikari. And a final thank you to my family: Alex, Julie, Viktoria, Serge, David, Liz, Christina, and Mark for your love and support.

TABLE OF CONTENTS

Biographical Sketch
Dedication
Acknowledgements
Table of Contents
List of Tables
List of Figures

1 Introduction
  1.1 Metabolic modeling methods
  1.2 Constraint based modeling of metabolism
  1.3 Cell-free protein synthesis
    1.3.1 Mathematical models of cell-free protein synthesis

2 Effective dynamic models of metabolic networks
  2.1 Abstract
  2.2 Introduction
  2.3 Results
  2.4 Discussion
  2.5 Materials and Methods
    2.5.1 Elementary mode and flux balance analysis
    2.5.2 Global sensitivity analysis
    2.5.3 Estimation of model parameters

3 Sequence Specific Modeling of E. coli Cell-Free Protein Synthesis
  3.1 Abstract
  3.2 Introduction
  3.3 Results and discussion
    3.3.1 Model derivation and validation
    3.3.2 Metabolic flux distributions
    3.3.3 Analysis of CFPS performance
    3.3.4 Global sensitivity analysis
    3.3.5 Potential alternative metabolic optima
    3.3.6 Summary and conclusions
  3.4 Materials and Methods
    3.4.1 Glucose/NMP cell-free protein synthesis
    3.4.2 Protein product and metabolite measurements
    3.4.3 Formulation and solution of the model equations
    3.4.4 Calculation of energy efficiency
    3.4.5 Quantification of uncertainty
    3.4.6 Global sensitivity analysis
    3.4.7 Potential alternative optimal metabolic flux solutions
  3.5 Acknowledgements

4 Absolute quantification of cell-free protein synthesis metabolism by reversed-phase liquid chromatography-mass spectrometry
  4.1 Abstract
  4.2 Introduction
  4.3 Results
    4.3.1 Aniline tagged metabolites
    4.3.2 Amino Acid Analysis
    4.3.3 Nucleotide charged sugars
  4.4 Discussion
  4.5 Materials and Methods
    4.5.1 Aniline derivatization
    4.5.2 Amino acid derivatization
    4.5.3 Nucleotide charge sugar detection
  4.6 Acknowledgments

5 An integrated kinetic constraint-based model of E. coli cell-free protein synthesis
  5.1 Abstract
  5.2 Introduction
  5.3 Results
    5.3.1 Integration of kinetic parameters, enzyme levels, and metabolite concentrations
    5.3.2 Transcription/Translation is oxygen dependent
    5.3.3 Kinetic descriptions with metabolic constraints predict metabolic behavior of oxidative phosphorylation inhibitors
    5.3.4 Analysis of CFPS metabolism with oxidative phosphorylation inhibitors
    5.3.5 Enzyme activity assays reveal allosteric regulation in CFPS
  5.4 Discussion
  5.5 Materials & Methods
    5.5.1 Cell-free protein synthesis and oxidative phosphorylation inhibitors
    5.5.2 Absolute quantification of central carbon metabolites
    5.5.3 Amino acid analysis
    5.5.4 Glutamate and maltose assays
    5.5.5 Protein quantification
    5.5.6 Enzyme activity assays
    5.5.7 Absolute quantification of mRNA
    5.5.8 Formulation of model equations
    5.5.9 Quantification of uncertainty
    5.5.10 Calculation of energy efficiency
    5.5.11 Calculation of carbon yield

6 Toward a genome scale sequence specific dynamic model of cell-free protein synthesis in Escherichia coli
  6.1 Abstract
  6.2 Introduction
  6.3 Results
  6.4 Discussion
  6.5 Materials and Methods
    6.5.1 Cell-free protein synthesis and measurement
    6.5.2 Formulation and solution of the model equations
    6.5.3 Estimation of kinetic model parameters
    6.5.4 Reaction group knockouts
    6.5.5 Sensitivity of CAT productivity to transcription and translation
    6.5.6 Calculation of energy efficiency
    6.5.7 Availability of model code
  6.6 Acknowledgements

7 JuPOETs: A constrained multiobjective optimization approach to estimate biochemical model ensembles in the Julia programming language
  7.1 Abstract
  7.2 Introduction
  7.3 Implementation
    7.3.1 JuPOETs optimization problem formulation
  7.4 Availability of data and materials
  7.5 Results and Discussion
  7.6 Conclusions
  7.7 Acknowledgements

8 Summary & Conclusion

A Appendix

LIST OF TABLES

3.1 Transcription and translation template reactions for protein production. The symbol G_P denotes the gene encoding protein product P, R_T denotes the concentration of RNA polymerase, G*_P denotes the gene bound by the RNA polymerase (open complex), η_i and α_j denote the stoichiometric coefficients for nucleotide and amino acid, respectively, Pi denotes inorganic phosphate, R_X denotes the ribosome concentration, R*_X denotes bound ribosome, and AA_j denotes the jth amino acid.

3.2 Parameters for sequence specific flux balance analysis.

4.1 Each compound's corresponding limit of detection, range of linearity and correlation coefficient identified from standard curves.

4.2 Each compound's corresponding peak number, retention time, m/z value for 12C, 13C, and unlabeled, cone voltage, and MS species.

4.3 Each amino acid's retention time separated by reverse-phase liquid chromatography and detected by TUV at 260 nm with the corresponding limit of detection, linear range, and correlation coefficient.

4.4 Each compound's retention time and mass over charge ratio with the corresponding limit of detection, linear range, and correlation coefficient.

5.1 Parameters for sequence specific flux balance analysis.

6.1 Breakdown of ATP generation. Flux through ATP-generating pathways in the first and second phases as percentages of total ATP generation in that phase.

6.2 Breakdown of ATP consumption. Flux through ATP-consuming pathways in the first and second phases as percentages of total ATP consumption in that phase.

6.3 Mean and standard deviation of Akaike information criterion (AIC), by measurement, for the ensemble and random ensemble.

6.4 Reference values for reaction rate maxima (V_max) from BioNumbers. V_max values calculated from turnover numbers (k_cat) from BioNumbers, and a characteristic enzyme concentration of 170 nM. Characteristic rate maximum for all other reactions calculated as geometric mean of calculated rate maxima.

6.5 Enzyme levels for key reaction fluxes, calculated from enzyme turnover numbers [3] and rate maxima from the best-fit set.

6.6 Reference values for transcription, translation, and mRNA degradation from literature. Transcription rate calculated from elongation rate, mRNA length, and promoter activity level. Translation rate calculated from elongation rate, protein length, and polysome amplification constant. mRNA degradation rate calculated from mRNA degradation time.

7.1 Multi-objective optimization test problems. We tested the JuPOETs implementation on three two-dimensional test problems, with one-, two- and three-dimensional parameter vectors. Each problem had parameter bounds constraints; however, only the Binh and Korn function had additional non-linear problem constraints. For the Fonseca and Fleming problem, N = 3.

A.1 List of materials and equipment used to quantify cell-free protein synthesis metabolites with aniline tagging and internal standards.

LIST OF FIGURES

1.1 A schematic of transcription and translation processes integrated with metabolism. Transcription and translation processes demand macromolecular precursors (e.g. NTPs, amino acids and cofactors) from metabolism for gene expression. The target protein in turn can affect enzymatic flux (orange arrow) or the target protein is synthesized as a product (green arrow).
The integrated framework is represented as a stoichiometric matrix of metabolites participating in certain reactions, where the flux is estimated subject to constraints, a pseudo-steady state assumption and an objective function.

1.2 Cell-free protein synthesis. Cell extract is prepared by cell lysis, and cellular debris and chromosomal DNA are removed. An energy source along with necessary amino acids, nucleotides, and cofactors are added to the cell-free reaction. Template DNA of the target protein is added. The target protein is then easily purified from the cell-free system. Alternatively, cell-free extract can be freeze dried into pellets and paired with lyophilized DNA. Through the simple addition of water, proteins can be manufactured on site and on demand. Figure adapted from [26, 137].

2.1 HCM proof of concept metabolic study. A: HCMs distribute uptake and secretion fluxes amongst different pathways. For HCM, these pathways are elementary modes; for HCM-FBA these are flux balance analysis solutions. HCM combines all possible modes within a network, whereas HCM-FBA combines only steady-state paths estimated by flux balance analysis. B: Prototypical network with six metabolites and seven reactions. Intracellular cellmass precursors A, B, and C are balanced (no accumulation) while the extracellular metabolites (A_e, B_e, and C_e) are not balanced (can accumulate). The oval denotes the cell boundary, q_j is the jth flux across the boundary, and v_k denotes the kth intracellular flux. C: Simulation of extracellular metabolite trajectories using HCM-FBA (solid line) versus HCM (points) for the prototypical network.

2.2 HCM-FBA versus HCM performance for small and large metabolic networks. A: Batch anaerobic E. coli fermentation data versus HCM-FBA (solid) and HCM (dashed). The experimental data was reproduced from Kim et al. [95].
Error bars represent the 90% confidence interval. B: Batch aerobic E. coli fermentation data versus HCM-FBA (solid). Model performance is also shown when minor modes (dashed) and major modes (dotted) were removed from the HCM-FBA model. The experimental data was reproduced from Varma & Palsson [176]. Error bars denote a 10% coefficient of variation.

2.3 Global sensitivity analysis of the aerobic E. coli model. Total order variance based sensitivity coefficients were calculated for the biomass yield on glucose and acetate. Sensitivity coefficients were computed for kinetic parameters and enzyme initial conditions (N = 183,000). Error bars represent the 95% confidence intervals of the sensitivity coefficients.

3.1 Sequence specific flux balance analysis. A. Schematic of the core metabolic network coupled to sequence-specific transcription and translation processes of a protein of interest for cell-free protein synthesis.

3.2 Sequence specific flux balance analysis of deGFP under a P70a promoter in TXTL 2.0 E. coli extract. A. deGFP production for 8 h using maltose and 3PG as a carbon and energy source (R^2 = 0.84). Error bars denote a 10% deviation from the nominal value. B. Predicted versus measured deGFP concentration as a function of plasmid concentration in TXTL 2.0 (R^2 = 0.97). Error bars denote the standard deviation of experimental measurements. The blue region denotes the 95% CI over an ensemble of N = 100 sets, the black line denotes the mean of the ensemble, and dots denote experimental measurements.

3.3 Optimal metabolic flux distribution for CAT production. A. Optimal flux distribution in the presence of amino acid supplementation and de novo synthesis. B. Optimal flux distribution in the presence of amino acid supplementation without de novo synthesis. C. Optimal flux distribution with de novo amino acid synthesis in the absence of supplementation. Mean flux across the ensemble (N = 100), normalized to glucose uptake flux. Thick arrows indicate flux to or from amino acid biosynthesis pathways.

3.4 Experimentally constrained simulation of CAT production. CAT was produced under a T7 promoter in CFPS E. coli extract for 1 h using glucose as a carbon and energy source. Error bars denote the standard deviation of experimental measurements. The blue region denotes the 95% CI over an ensemble of N = 100 sets, the black line denotes the mean of the ensemble, and dots denote experimental measurements. A. Metabolic flux distribution for CAT production in the presence of experimental constraints for glucose, organic acid and amino acid consumption and production rates. Mean flux across the ensemble, normalized to glucose uptake flux. Thick arrows indicate flux to or from amino acids. B. Central carbon metabolite and CAT measurements versus simulations over a 1 hour time course. The blue region denotes the 95% CI over an ensemble of N = 100 sets, the black line denotes the mean of the ensemble, and dots denote experimental measurements.

3.5 The CFPS performance for eight model proteins with and without amino acid supplementation. A. Mean CFPS productivity for a panel of model proteins with and without amino acid supplementation. B. Mean CFPS productivity versus carbon number for a panel of model proteins with and without amino acid supplementation. Trendline (black dotted line) was calculated across all cases for a P70a promoter (R^2 = 0.99) and maximum productivity trendline assumed u(κ) = 1 (grey dotted line; R^2 = 0.99). C. Mean CFPS energy efficiency for a panel of model proteins with and without amino acid supplementation. D. Mean CFPS energy efficiency versus carbon number for a panel of model proteins with and without amino acid supplementation.
Trendline for cases with amino acids (black dotted line) and trendline for cases without amino acids (grey dotted line; R^2 = 0.81). Error bars: 95% CI calculated by sampling; asterisk: protein excluded from trendline; dagger: constrained by experimental measurements and excluded from trendline; triangles: first principle prediction and excluded from trendline.

3.6 Sensitivity analysis of the cell-free production of CAT. A. Total order sensitivity of the optimal CAT productivity with respect to metabolic and transcription/translation parameters. B. Total order sensitivity of the optimal CAT energy efficiency. Metabolic and transcription/translation parameters were varied for amino acid supplementation and synthesis (black), amino acid supplementation without synthesis (dark grey) and amino acid synthesis without supplementation (light grey). Error bars represent the 95% CI of the total order sensitivity index.

3.7 Optimal CAT energy efficiency versus oxidative phosphorylation flux calculated across an ensemble (N = 1000) of flux balance solutions (points). Energy efficiency versus oxidative phosphorylation flux for amino acid supplementation and de novo synthesis (black), amino acid supplementation without de novo synthesis (dark grey), and de novo amino acid synthesis without supplementation (light grey). The ensemble was generated by randomly varying the oxygen consumption rate from 0.1 to 10 mM/h and randomly sampling the transcription and translation parameters within 10% of their literature values. Each point represents one solution of the model equations.

3.8 Pairwise knockouts of reaction subgroups in the cell-free network. A. Difference in the CAT productivity in the presence of reaction knockouts compared with no knockouts for experimentally constrained CAT production. B.
Difference in the flux distribution in the presence of reaction knockouts compared with no knockouts for experimentally constrained CAT production. The difference between perturbed and wild-type productivity and flux distributions was quantified by the ℓ2 norm, and then normalized so the maximum change was 1.0. Red boxes indicate potential alternative optimal flux distributions with the same CAT productivity as the wild type, whereas no red box indicates no feasible solution and/or the optimal CAT productivity was not met.

3.9 Robust analysis of maltose and 3PG consumption for TXTL 2.0 E. coli extract with and without oxidative phosphorylation activity that meet the transcription and translation constraints. Each dot represents the mean of an ensemble of N = 20 ssFBA solutions; black dots are solutions without oxidative phosphorylation and grey dots are solutions with oxidative phosphorylation.

4.1 Schematic of workflow for aniline tagging. The cell-free protein synthesis reaction is de-proteinized and tagged with 12C-aniline, while a standard stock mixture is tagged with 13C-aniline. Both mixtures are then mixed at a 1:1 volumetric ratio and analyzed by LC/MS.

4.2 Mass chromatogram from a single LC/MS run of a 40 µM standard mixture of 40 metabolites. Peaks were identified by their retention time and m/z values for each compound. Complete compound names and their abbreviations are listed in Table 4.1.

4.3 Amino acid chromatogram tagged and separated by reverse-phase liquid chromatography and detected with a TUV at 260 nm. Peaks were identified by their retention time.

4.4 Nucleotide charged sugars chromatogram separated by reverse-phase liquid chromatography and detected by mass spectrometry according to each compound's mass over charge ratio.
Peaks were identified by their retention time and selective ion recording.

5.1 Modeling framework of cell-free protein synthesis. The metabolic network was adapted from Vilkhovoy and coworkers, where transcription/translation was integrated with metabolism. Maximum flux bounds were formulated as a function of the turnover rate and the enzyme abundance found to be present in CFPS extract. Enzyme levels were validated for a subset of 15 reactions with enzyme activity assays. Four of the enzymes were not reported in Garenne and coworkers (grey boxes), but were found to be active with their corresponding enzyme activity assays. The flux for each time step was estimated while being constrained to metabolic measurements where data was present (62 species). Finally, the flux calculation was sampled across an ensemble of 100 sets given experimental noise and literature parameters. hk: hexokinase, gdh: glutamate dehydrogenase, ppc: phosphoenolpyruvate carboxylase, sdh: succinate dehydrogenase.

5.2 Mean flux distribution across an ensemble (N=100) for control. Fluxes were determined by integrating kinetic parameters with enzyme levels and constraining to measurements of metabolites and enzyme activity levels where data was available. (a) Flux distribution at 2 hours of CFPS reaction. (b) Flux distribution at 8 hours of CFPS reaction. Fluxes were normalized to maltodextrin consumption at t=0 hours.

5.3 Prediction of mRNA and protein levels in CFPS for control (blue), DNP (red) and TTA (grey). (a) The mRNA levels of GFP were predicted with the given modeling framework. (b) The protein abundance of GFP was predicted for all three cases.
The solid line denotes the mean of the ensemble (N=100), the shaded region denotes the 95% confidence interval of the ensemble, the points denote experimental measurements, and error bars denote the standard deviation of experimental measurements.

5.4 Time course of amino acid levels in CFPS for control (blue), DNP (red) and TTA (grey). Experimental amino acid fluxes constrained the mathematical model of CFPS. The solid line denotes the mean of the ensemble (N=100), the shaded region denotes the 95% confidence interval of the ensemble, the points denote experimental measurements, and error bars denote the standard deviation of experimental measurements.

5.5 Time course of upper central carbon metabolite levels in CFPS for control (blue), DNP (red) and TTA (grey). DNP showed exhaustion of maltose, revealing maltodextrin depletion and thus high carbon utilization. Experimental fluxes constrained the mathematical model of CFPS. The solid line denotes the mean of the ensemble (N=100), the shaded region denotes the 95% confidence interval of the ensemble, the points denote experimental measurements, and error bars denote the standard deviation of experimental measurements.

5.6 Time course of lower central carbon metabolite levels in CFPS for control (blue), DNP (red) and TTA (grey). DNP heavily relied on substrate level phosphorylation with high accumulation of acetate, whereas TTA had a high abundance of lactate. Experimental fluxes constrained the mathematical model of CFPS. The solid line denotes the mean of the ensemble (N=100), the shaded region denotes the 95% confidence interval of the ensemble, the points denote experimental measurements, and error bars denote the standard deviation of experimental measurements.

5.7 Time course of energy species levels in CFPS for control (blue), DNP (red) and TTA (grey). Both DNP and TTA exhausted GTP, which is required for translation, within 4 hours of the reaction. Experimental fluxes constrained the mathematical model of CFPS. The solid line denotes the mean of the ensemble (N=100), the shaded region denotes the 95% confidence interval of the ensemble, the points denote experimental measurements, and error bars denote the standard deviation of experimental measurements.

5.8 Mean flux distribution across an ensemble (N=100) for DNP. Fluxes were determined by integrating kinetic parameters with enzyme levels and constraining to measurements of metabolites and enzyme activity levels where data was available. (a) Flux distribution at 2 hours of CFPS reaction. Flux difference from control shown for key reactions at 2 hours of CFPS reaction. (b) Flux distribution at 8 hours of CFPS reaction. Flux difference from control shown for key reactions at 8 hours of CFPS reaction. Fluxes were normalized to maltodextrin consumption at t=0 hours.

5.9 Mean flux distribution across an ensemble (N=100) for TTA. Fluxes were determined by integrating kinetic parameters with enzyme levels and constraining to measurements of metabolites and enzyme activity levels where data was available. (a) Flux distribution at 2 hours of CFPS reaction. Flux difference from control shown for key reactions at 2 hours of CFPS reaction. (b) Flux distribution at 8 hours of CFPS reaction. Flux difference from control shown for key reactions at 8 hours of CFPS reaction. Fluxes were normalized to maltodextrin consumption at t=0 hours.

5.10 Mean energy efficiency across an ensemble (N=100) for control (a), DNP (b), and TTA (c) throughout the metabolic network. TXTL denotes the energy efficiency for transcription and translation processes.

5.11 Mean carbon yield across an ensemble (N=100) for control (a), DNP (b), and TTA (c) for CFPS. PPP denotes the Pentose Phosphate Pathway. Other includes purine, pyrimidine and chorismate metabolism.

5.12 Enzyme activity measurements reveal allosteric regulation is present in CFPS. Enzyme activity assays at 2 and 8 hours of the CFPS reaction throughout the metabolic network for control (black), DNP (dark grey), and TTA (light grey).

6.1 Schematic of the core portion of the cell-free E. coli metabolic network. Metabolites of glycolysis, pentose phosphate pathway, Entner-Doudoroff pathway, and TCA cycle are shown. Metabolites of oxidative phosphorylation, amino acid biosynthesis and degradation, transcription/translation, chorismate metabolism, and energy metabolism are not shown.

6.2 Central carbon metabolism in the presence (top) and absence (bottom) of allosteric control, including glucose (substrate), CAT (product), and intermediates, as well as total concentration of energy species. Best-fit parameter set (orange line) versus experimental data (points). 95% confidence interval (blue or gray shaded region) over the ensemble of 100 sets.

6.3 Amino acids in the presence of allosteric control. Best-fit parameter set (orange line) versus experimental data (points). 95% confidence interval (blue shaded region) over the ensemble of 100 sets.

6.4 Energy species and energy totals by base in the presence of allosteric control. Best-fit parameter set (orange line) versus experimental data (points). 95% confidence interval (blue shaded region) over the ensemble of 100 sets.

6.5 Histograms of model parameters, across the ensemble of 100 sets. A. Histogram of rate maxima. B.
Histogram of saturation constants. 143 6.6 Log of cost function (residual between training data and model simulations) across 37 datasets for data-trained ensemble (blue) and randomly generated ensemble (red, gray background). Median (bars), interquartile range (boxes), range excluding outliers (thin lines), and outliers (circles) for each dataset. Median across all datasets (large bar overlaid). . . . . . . . . . . . . . . . . . . . . . . 144 xix 6.7 Effect of group knockouts on system. A. Change in CAT productiv- ity when one (diagonal) or two (off-diagonal) reaction groups are turned off. B. Change in system state (only species for which data exist) when one (diagonal) or two (off-diagonal) reaction groups are turned off. Total-order effect for each group calculated as the sum of first-order effect and all pairwise effects. Larger and darker circles represent greater effects. . . . . . . . . . . . . . . . . . . . . . 145 6.8 Key reaction fluxes of the network, in the first (gray boxes, top row) and second (gray boxes, bottom row) phases of metabolism. A. Fluxes of ATP generation and consumption, and GTP consumption toward protein synthesis. B. Fluxes of glycolysis and lactate and acetate metabolism. Fluxes are normalized to the first-phase glucose uptake rate. For PEP and pyruvate, accumulation (normalized to glucose uptake) is also shown. . . . . . . . . . . . . . . . . . . . . . 148 7.1 Schematic of multiobjective parameter mapping. The performance of any given parameter set is mapped into an objective space using a ranking function which quantifies the quality of the parameters. The distance away from the optimal tradeoff surface is quantified using the Pareto ranking scheme of Fonseca and Fleming in JuPOETs.179 7.2 The performance of JuPOETs on the multi-objective test suite. The execution time (wall-clock) for JuPOETs and POETs implemented in Octave was measured for 10 independent trials for the suite of test problems. 
The number of steps per temperature I = 10, and the cooling parameter α = 0.9 for all cases. The problem domain was partitioned into 10 equal segments, an initial guess was drawn from each segment. For each of the test functions, JuPOETs estimated solutions on (rank zero solutions, black) or near (gray) the optimal tradeoff surface, subject to bounds and problem constraints. . . . 187 7.3 Representative JuPOETs solutions for problems in the multi- objective test suite. The number of steps per temperature I = 10, and the cooling parameter α = 0.9 for all cases. The problem domain was partitioned into 10 equal segments, an initial guess was drawn from each segment. For each of the test functions, JuPOETs esti- mated solutions on (rank zero solutions, black) or near (gray) the optimal tradeoff surface, subject to bounds and problem constraints. 188 xx 7.4 Proof of concept biochemical network study. Inset right: Prototypi- cal biochemical network with six metabolites and seven reactions modeled using the hybrid cybernetic approach (HCM). Intracellu- lar cellmass precursors A, B, and C are balanced (no accumulation) while the extracellular metabolites Ae, Be, and Ce are dynamic. The oval denotes the cell boundary, qj is the jth flux across the boundary, and vk denotes the kth intracellular flux. Four data sets (each with Ae, Be,Ce and cellmass measurements) were generated by varying the kinetic constants for each biochemical mode. Each data set was a single objective in the JuPOETs procedure. A: Ensemble simu- lation of extracellular substrate Ae and cellmass versus time. B: Ensemble simulation of extracellular substrate Be and Ce versus time. The gray region denotes the 95% confidence estimate of the mean ensemble simulation. The data points denote mean synthetic measurements, while the error bars denote the 95% confidence es- timate of the measurement computed over the four training data sets. C: Trade-off plots between the four training objectives. 
The quantity Oj denotes the jth training objective. Each point represents a member of the parameter ensemble, where gray denotes rank 0 sets, while black denotes rank 1 sets. Ensembles were generated using POETs without employing local refinement.
7.5 Experiment to experiment variation captured by the ensemble. Cellmass measurements (points) versus time for experiment 2 and 3 were compared with ensemble simulations. The full ensemble was sorted by simultaneously selecting the top 25% of solutions for each objective with rank ≤ 1. The best fit solution for each objective (line) ± 1-standard deviation (gray region) for experiment 2 and 3 brackets the training data despite significant differences in the training values between the two data sets.

CHAPTER 1 INTRODUCTION

1.1 Metabolic modeling methods

Metabolism is the central process through which cells manage their resources to survive, adapt and meet energetic demands. To implement these diverse functions, cells have very complex and highly interconnected networks of chemical reactions between genes, RNA, proteins and metabolites. Systems modeling arose from the desire to better understand metabolism and how metabolism can be altered for our benefit [48, 12]. Metabolic control analysis (MCA), developed by Kacser and Burns, was among the first tools to define a quantitative approach towards metabolic control [84]. However, MCA required a priori experimental data on infinitesimal enzyme perturbations, which are difficult to acquire, and the technique lacked a predictive scope. Shortly after, biochemical systems theory (BST) was developed; it is based on ordinary differential equations (ODEs), where each biochemical process is represented by power law expansions [147]. However, like MCA, it depended on local sensitivity arguments, limiting its predictability.
Cybernetic models provided a systematic approach for describing metabolic regulation by directing the allocation of resources towards a nutritional objective in an optimal manner [177]. The advantage of cybernetic models over BST and other contemporary frameworks is that they can predict how network functionality adjusts to different perturbations, creating a dynamic model. Cybernetic models were first used to predict microbial growth on multiple substrates [98]; however, these models used only abstract representations of the network, which was favorable at the time since the underlying biological mechanism was unknown. Since then, cybernetic models have been integrated with metabolic pathway analysis to successfully predict metabolic shifts of E. coli with genetic deletions of pta-ackA [193]; however, this model consisted of only 12 biochemical reactions and did not scale well to genome-scale or even core metabolism models of microbes.

As biological understanding grew, metabolic models became more sophisticated, able to describe cellular processes such as RNA synthesis, chromosome synthesis, and regulated catabolic and macromolecular synthesis pathways using ordinary differential equations [173]. One of the first whole cell models was developed by Shuler and coworkers, which described the growth of E. coli on limited glucose [38]. Since then, models have been expanded with sufficient detail for a variety of cells. Karr et al. (2012) developed a whole cell model of Mycoplasma genitalium, accounting for all genes and their interactions in the cell [88]. The model is constructed from independent sub-models describing different components of the cell, and can describe the life cycle from the level of single molecules. Each sub-model was parameterized and tested independently; thus, it is possible that this whole cell model will not hold true under all conditions for the specified parameters. Even though some of these models have been successful, their formulation is complex and nonlinear, and requires a large set of parameters that are computationally expensive to estimate. To overcome such obstacles, constraint based methods [176] have been developed to help describe biochemical networks without the need for kinetic parameters of the cellular processes.

[Figure 1.1 panels: Metabolism; Transcription & Translation (NTPs, DNA, amino acids, RNAp, cofactors, mRNA, ribosome, degradation, target product protein); Mathematical Representation (stoichiometric matrix of metabolites x1, ..., xM by metabolic reactions v1, ..., vR, with flux constraints such as 0 ≤ v1 ≤ ∞ and -∞ ≤ v2 ≤ ∞).]
Figure 1.1: A schematic of the integration of transcription and translation processes with metabolism. Transcription and translation processes demand macromolecular precursors (e.g. NTPs, amino acids and cofactors) from metabolism for gene expression. The target protein in turn can affect enzymatic flux (orange arrow) or the target protein is synthesized as a product (green arrow). The integrated framework is represented as a stoichiometric matrix of metabolites participating in certain reactions, where the flux is estimated subject to constraints, a pseudo-steady state assumption and an objective function.

1.2 Constraint based modeling of metabolism

Stoichiometric reconstructions of microbial metabolism, popularized by constraint based approaches such as flux balance analysis (FBA), have become standard tools to interrogate metabolism [108]. FBA and metabolic flux analysis (MFA) [187], as well as convex network decomposition approaches such as elementary modes [150] and extreme pathways [148], model intracellular metabolism using the biochemical stoichiometry and other constraints such as thermodynamical feasibility [64, 62] under pseudo steady state conditions.
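The constraint based formulation described above can be written compactly as a linear program. The block below is a generic textbook-style sketch rather than a formulation taken from this work; the symbols S (stoichiometric matrix, metabolites by reactions), v (flux vector), c (objective weights) and the bound vectors are notation assumed here for illustration.

```latex
\begin{aligned}
\max_{v}\quad & c^{\top} v \\
\text{subject to}\quad & S\,v = 0 \\
& v^{\min}_{j} \le v_{j} \le v^{\max}_{j}, \qquad j = 1,\dots,R
\end{aligned}
```

Under these assumptions, the equality constraint expresses the pseudo steady state condition, an irreversible reaction takes a lower bound of zero, and a reversible reaction is left unbounded below, as in the flux constraints listed in Fig. 1.1.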
Constraint based approaches use linear programming [34] to predict productivity [176, 146], yield [176], mutant behavior [43], and growth phenotypes [129] for biochemical networks of varying complexity, including genome scale networks. Constraint based models have also been used to identify strategies for the overproduction of desired compounds. These strategies include genetic knockouts or the addition of heterologous enzyme pathways to an organism's metabolic network and have been used in developing useful bacterial strains for the production of biofuels [10], high-value chemicals [124, 154, 192] and pharmaceuticals [141, 47]. Stoichiometric reconstructions have been expanded to include the metabolic demands for protein synthesis based on the DNA and protein sequences (Fig. 1.1), where the transcription and translation processes have been integrated into metabolism [4]. Since then, these models have been expanded to the genome scale with detailed descriptions of gene expression (ME-Model) [171, 108, 129] and protein structures (GEM-PRO) [197, 30], and have successfully captured the regulatory effects these processes have on metabolism. These expansions have greatly increased the scope of questions these models can explore. Constraint based methods are powerful tools to estimate the performance of metabolic networks with very few, and sometimes no, parameters. In addition, they are able to provide unintuitive strategies for metabolic engineering applications aimed at increasing productivity, yield or titer. Constraint based methods have typically been used to model in vivo processes, and have not yet been applied to cell-free metabolism.

1.3 Cell-free protein synthesis

Cell-free biology is a powerful and flexible enabling technology that can engineer biological parts and be used for biomolecule production without using living cells. However, cell-free biology is not new; it has been practiced for decades.
The first examples of the use of cell-free protein synthesis were in 1950 by Borsook [19] and Winnick [188]. Using animal tissue homogenates, they looked at how amino acids were incorporated into proteins. A few years later, bacterial extracts (Staphylococcus aureus) were used to confirm amino acid incorporation [51]. In 1956, the role of ATP in protein production was discovered using rat liver extracts [69]. Soon after, Nirenberg and Matthaei [118, 128] deciphered the genetic code, work for which Nirenberg shared the Nobel Prize in 1968. It is thus evident that cell-free systems have had a significant impact on molecular biology.

Over the years, the cell extract preparation process has undergone significant developments. In 1967, Lederman and Zubay developed a coupled transcription-translation bacterial extract that allowed DNA to be used as a template [101]. Shortly after, Spirin and coworkers improved cell-free protein production with a continuous exchange of reactants and products, yielding systems that could run for tens of hours; however, these systems could only synthesize a single product and were energy limited [159]. More recently, the energy efficiency of E. coli CFPS was improved by generating ATP with substrate level phosphorylation [93] and oxidative phosphorylation [82, 83, 81]. Since oxidative phosphorylation is a membrane associated process, this reveals that membrane dependent energy metabolism can be activated in cell-free systems and shows that complex metabolism is still occurring. With the advent of genome sequencing, CFPS has shown remarkable utility as a protein synthesis technology, given the knowledge of reaction networks that can be understood, altered and controlled.

CFPS systems are derived from crude cell extracts, taking the cell's machinery to operate transcription and translation processes while discarding cellular debris and chromosomal DNA (Fig. 1.2). Cell-free extracts are commonly prepared from E. coli, S.
cerevisiae, rabbit reticulocytes, wheat germ, and insect cells [143]. CFPS reactions are activated by the addition of amino acids, nucleotides, template DNA, cofactors and an energy source. It is important to note that some cell-free platforms are better suited for a particular application than others. For example, insect and mammalian cell extracts are equipped with post-translational modification capabilities that might not be available in S. cerevisiae or E. coli extracts [139].

While earlier approaches focused on investigating biological phenomena, today CFPS is used to produce complex biological products. CFPS has been utilized in a wide range of applications, from the production of pharmaceutical proteins [110, 55, 126] to high-throughput production of protein libraries for protein evolution and structural genomics [164]. Point-of-care protein synthesis is also a promising technology with microfluidic reactors [172]. N-linked glycoproteins have also been produced in an E. coli-based CFPS by Guarino and DeLisa [57]. More recently, single-pot glycoprotein synthesis with glycosylation machinery has been achieved [78], allowing the development of important therapeutics.

Thus, CFPS is a promising platform for the manufacturing of proteins and chemicals, a task that has traditionally been carried out in living cells. If cell-free protein synthesis (CFPS) is to become a mainstream technology for advanced applications such as point of care manufacturing [137], we must first understand the performance limits and costs of these systems [81]. Cell-free systems offer many advantages for the study, manipulation and modeling of metabolism compared to in vivo processes. Central amongst these is direct access to metabolites and the biosynthetic machinery without the interference of a cell wall or the complications associated with cell growth.
This allows interrogation of the chemical environment while the biosynthetic machinery is operating, potentially at a fine time resolution. Despite the advantages and disadvantages of in vivo and CFPS processes, a fundamental challenge in metabolic engineering remains: the identification of genetic manipulations that accomplish the desired function most effectively [12]. Due to the complexity and immense interconnectivity of metabolic networks, even for simple prokaryotic organisms like E. coli, making the appropriate genetic manipulation for a desired function is not intuitive. Computational and mathematical models of metabolic networks offer powerful tools to aid our understanding of metabolism and the rational design of improved cell-free protein expression [184].

[Figure 1.2 panels: Extract preparation (cell lysis; remove cellular debris and chromosomal DNA; prepare and store extract); reaction (energy substrates and DNA plasmid added; target protein); freeze-dried route (freeze-dried pellets and freeze-dried DNA constructs; rehydrate molecular instructions; target protein).]
Figure 1.2: Cell-free protein synthesis. Cell extract is prepared by cell lysis, and cellular debris and chromosomal DNA are removed. An energy source along with necessary amino acids, nucleotides, and cofactors are added to the cell-free reaction. Template DNA of the target protein is added. The target protein is then easily purified from the cell-free system. Alternatively, cell-free extract can be freeze dried into pellets and paired with lyophilized DNA. Through the simple addition of water, proteins can be manufactured on site and on demand. Figure adapted from [26, 137].

1.3.1 Mathematical models of cell-free protein synthesis

There have been several mathematical models of cell-free protein synthesis; however, the majority of published models focus on the transcription and translation processes. These models are mostly systems of ODEs based on Michaelis-Menten kinetics.
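As a rough illustration of this class of model, the sketch below integrates a two-state transcription and translation system with saturation kinetics. It is not taken from any of the cited studies: the function names, the rate laws, and every parameter value are hypothetical and chosen only to show the structure of such ODE models.

```python
# Toy transcription/translation model with Michaelis-Menten (saturation) kinetics.
# All names and parameter values are hypothetical and chosen only for illustration.

def saturation(s, vmax, K, n=1.0):
    """Hill-type rate law; with n = 1 it reduces to the Michaelis-Menten form."""
    return vmax * s**n / (K**n + s**n)

def simulate_txtl(dna=5.0, t_end=60.0, dt=0.01):
    """Forward-Euler integration of d[mRNA]/dt and d[protein]/dt."""
    k_tx, K_tx = 1.0, 2.0     # maximum transcription rate, DNA saturation constant
    k_tl, K_tl = 2.0, 10.0    # maximum translation rate, mRNA saturation constant
    k_deg = 0.1               # first-order mRNA degradation rate constant
    mrna, protein = 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        d_mrna = saturation(dna, k_tx, K_tx) - k_deg * mrna   # synthesis minus decay
        d_protein = saturation(mrna, k_tl, K_tl)              # translation, no decay
        mrna += dt * d_mrna
        protein += dt * d_protein
    return mrna, protein

if __name__ == "__main__":
    mrna, protein = simulate_txtl()
    print(f"mRNA = {mrna:.3f}, protein = {protein:.3f}")
```

In this sketch the mRNA level relaxes toward the ratio of the transcription rate to the degradation constant, and the accumulated protein increases with the template DNA level until the transcription rate saturates.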
For example, Karzbrun and coworkers developed a coarse-grained model of transcription and translation for E. coli cell-free extract [89]. To simplify calculations, this model was based on four enzymes and ten parameters. Transcription and translation processes were assumed to follow Michaelis-Menten kinetics. The authors noted that the protein synthesis rate began to decay exponentially after 1 hour, so their study focused on the first hour of the cell-free experiment. This decay was attributed to resource depletion and waste accumulation.

Stögbauer and coworkers developed a model that accounts for resource consumption and degradation and identified the bottleneck of protein synthesis [161]. Variables representing transcription and translation resources were added to the model, but the exact identities and quantities of these resources were beyond the scope of the study. The authors attempted to use Hill functions to better predict saturation effects of mRNA and their protein of interest but found that the optimized Hill coefficients were close to one, resulting in Michaelis-Menten-like approximations. Protein yield was determined to be a function of template DNA concentration. Interestingly, this work found that nucleotide triphosphate depletion was not the source of protein synthesis rate decay. For the specific extract used, ribosome degradation was to blame for rate decay.

More recently, Neiß and coworkers published a more comprehensive, experimentally validated model that was used to identify limiting factors of cell-free protein synthesis [127]. An unusual characteristic of this model is what the authors described as a hybrid black box approach: transcription processes were simplified, while the model for translation was detailed. The entire model is a differential algebraic equation system of eight algebraic equations and over 400 ODEs.
Using sensitivity analysis, Neiß found that cell-free protein synthesis rates are limited by concentrations of tRNA and elongation factor Tu.

A model that captured resource competition in gene networks was published by Gyorgy and Murray [60]. For a two-protein expression system, simulations that considered both products agreed with experimental data for the same system. This model was also applied to predict possible product concentrations in multiple-protein expression systems and to compare different cell-free extracts. The authors concluded that resource competition is a key consideration in the design of synthetic gene circuits.

The cell-free protein synthesis models discussed thus far have been based on experiments in which DNA serves as the template. RNA genetic circuitry is another area where mathematical models can be developed. Transcription regulating RNAs are of interest because they bypass the need for regulatory proteins [113]. In the context of circuit design, these regulatory RNAs can be used to create various logic gates and cascades [20, 31]. The first experimentally validated model of a synthetic RNA circuit was published by Hu and coworkers [72]. The model contained 8 ODEs and 13 previously unknown parameters. These parameters were estimated based on results from sensitivity analysis guided experiments. This model was able to predict results for new networks it had not been trained on.

Taken together, these models of transcription, translation, resource competition, and gene regulatory circuits provide useful information for designing new systems; however, they each provide an incomplete representation of cell-free protein synthesis. CFPS does not rely on transcription and translation processes alone to fuel protein production, but also on central carbon metabolism to meet energy requirements. Thus, more sophisticated models are needed that integrate metabolic pathways with transcription and translation processes.
Ultimately, an integrated framework can provide insights into the limitations of CFPS and provide strategies for improving CFPS performance metrics such as carbon yield, energy efficiency and productivity.

CHAPTER 2 EFFECTIVE DYNAMIC MODELS OF METABOLIC NETWORKS

2.1 Abstract

Mathematical models of biochemical networks are useful tools to understand and ultimately predict how cells utilize nutrients to produce valuable products. Hybrid cybernetic models (HCM) in combination with elementary modes are tools to model cellular metabolism. However, HCM is limited to reduced metabolic networks because of the computational burden of calculating elementary modes. In this study, we developed the hybrid cybernetic modeling with flux balance analysis or HCM-FBA technique which uses flux balance solutions instead of elementary modes to dynamically model metabolism. We show HCM-FBA has comparable performance to HCM for a proof of concept metabolic network and for a reduced anaerobic E. coli network. Next, HCM-FBA was applied to a larger metabolic network of aerobic E. coli metabolism which was infeasible for HCM (29 FBA modes versus more than 153,000 elementary modes). Global sensitivity analysis further reduced the number of FBA modes required to describe the aerobic E. coli data, while maintaining model fit. Thus, HCM-FBA is a promising alternative to HCM for large networks where the generation of elementary modes is infeasible.

Footnote 1 (to the section title): Adapted with permission from Vilkhovoy M, Minot M, and Varner JD, "Effective dynamic models of metabolic networks" (2016) IEEE Life Sciences Letters, 2(4):51-54.

2.2 Introduction

Biotechnology harnesses the power of metabolism to produce products that benefit society. Constraint based models are important tools to understand and ultimately to predict how cells utilize nutrients to produce products.
Constraint based methods such as flux balance analysis (FBA) [133] and network decomposition approaches such as elementary modes (EMs) [150] or extreme pathways (EPs) [148] model intracellular metabolism using the biochemical stoichiometry and other constraints such as thermodynamical feasibility under pseudo-steady state conditions. FBA has been used to efficiently estimate the performance of metabolic networks of arbitrary complexity, including genome scale networks, using linear programming [34]. On the other hand, EMs (or EPs) catalog all possible metabolic behaviors such that any flux distribution predicted by FBA is a convex combination of the EMs (or EPs) [186]. However, the calculation of EMs (or EPs) is computationally expensive and currently infeasible for genome scale networks [102].

Cybernetic models are an alternative to the constraint based approach; they hypothesize that metabolic control is the output of an optimal decision. Cybernetic models have predicted mutant behavior [178, 157], steady-state multiplicity [96], and strain specific metabolism [156], and have been used in bioprocess control applications [49]. Hybrid cybernetic models (HCM) have addressed earlier shortcomings of the approach by integrating cybernetic optimality concepts with EMs. HCMs dynamically choose combinations of biochemical modes (each catalyzed by a pseudo enzyme whose expression is controlled by an optimal decision) to achieve a physiological objective (Fig. 2.1A). HCMs generate intracellular flux distributions consistent with other approaches such as metabolic flux analysis (MFA), and also describe dynamic extracellular measurements better than dynamic FBA (DFBA) [95]. However, HCMs are restricted to networks which can be decomposed into EMs (or EPs).

In this study, we developed the hybrid cybernetic modeling with flux balance analysis (HCM-FBA) technique.
HCM-FBA is a modification of the hybrid cybernetic approach of Ramkrishna and coworkers [95] which uses FBA solutions (instead of EMs) in conjunction with cybernetic control variables to dynamically simulate metabolism. Since HCM showed superior performance to DFBA, we compared the performance of HCM-FBA to HCM for a prototypical metabolic network, along with two real-world E. coli applications. HCM-FBA performed comparably to HCM for the prototypical network and a reduced anaerobic E. coli network, despite having fewer parameters in each case. Next, HCM-FBA was applied to an aerobic E. coli metabolic network that was infeasible for HCM. HCM-FBA described cellmass growth and the shift from glucose to acetate consumption with only a few modes. Global sensitivity analysis allowed us to further reduce the aerobic E. coli HCM-FBA model to the minimal model required to describe the data. Thus, HCM-FBA is a promising approach for the development of reduced order dynamic metabolic models and a viable alternative to HCM or DFBA, especially for large networks where the generation of EMs is infeasible.

[Figure 2.1 panels: A, network schematic comparing HCM-EM (all possible modes, each mode considered separately) with HCM-FBA (modes are combined by FBA); B, prototypical network showing extracellular and intracellular metabolites and the cellmass precursor; C, abundance (A.U.) and biomass versus time (hr).]
Figure 2.1: HCM proof of concept metabolic study. A: HCMs distribute uptake and secretion fluxes amongst different pathways. For HCM, these pathways are elementary modes; for HCM-FBA these are flux balance analysis solutions. HCM combines all possible modes within a network; whereas HCM-FBA combines only steady-state paths estimated by flux balance analysis. B: Prototypical network with six metabolites and seven reactions.
Intracellular cellmass precursors A, B, and C are balanced (no accumulation) while the extracellular metabolites (Ae, Be, and Ce) are not balanced (can accumulate). The oval denotes the cell boundary, qj is the jth flux across the boundary, and vk denotes the kth intracellular flux. C: Simulation of extracellular metabolite trajectories using HCM-FBA (solid line) versus HCM (points) for the prototypical network.

2.3 Results

HCM-FBA was equivalent to HCM for a prototypical metabolic network (Fig. 2.1). The proof of concept network, consisting of 6 metabolites and 7 reactions (Fig. 2.1B), generated 3 FBA modes and 6 EMs. Using the EMs and synthetic parameters, we generated test data from which we estimated the HCM-FBA model parameters. The best fit HCM-FBA model replicated the synthetic data (Fig. 2.1C). The HCM and HCM-FBA kinetic parameters were not quantitatively identical, but had similar orders of magnitude; the FBA approach had 3 fewer modes, thus identical parameter values were not expected. The HCM-FBA approach replicated synthetic data generated by HCM, despite having 3 fewer modes. Thus, we expect HCM-FBA will perform similarly to HCM, despite having fewer parameters. Next, we tested the ability of HCM-FBA to replicate real-world experimental data.

The performance of HCM-FBA was equivalent to HCM for anaerobic E. coli metabolism (Fig. 2.2A). We constructed an anaerobic E. coli network [95], consisting of 12 reactions and 19 metabolites, which generated 7 FBA modes and 9 EMs. HCM reproduced cellmass, glucose, and byproduct trajectories using the kinetic parameters reported by Kim et al. [95] (Fig. 2.2A, points versus dashed). HCM-FBA model parameters were estimated in this study from the Kim et al. data set using simulated annealing. Overall, HCM-FBA performed within 5% of HCM (on a residual standard error basis) for the anaerobic E. coli data (Fig.
2.2A, solid), despite having 2 fewer modes and 4 fewer parameters (17 versus 21 parameters). Thus, while both HCM and HCM-FBA described the experimental data, HCM-FBA did so with fewer modes and parameters.

[Figure 2.2 panels: A, anaerobic case: biomass (gDW/L), glucose, lactate, formate, acetate, ethanol and succinate abundance (mM) versus time (hr), with legend HCM FBA, HCM EM, Data; B, aerobic case: biomass (gDW/L), glucose (mM) and acetate (mM) versus time (hr), with legend HCM FBA, Minor modes removed, Major mode removed, Data.]
Figure 2.2: HCM-FBA versus HCM performance for small and large metabolic networks. A: Batch anaerobic E. coli fermentation data versus HCM-FBA (solid) and HCM (dashed). The experimental data was reproduced from Kim et al. [95]. Error bars represent the 90% confidence interval. B: Batch aerobic E. coli fermentation data versus HCM-FBA (solid). Model performance is also shown when minor modes (dashed) and major modes (dotted) were removed from the HCM-FBA model. The experimental data was reproduced from Varma & Palsson [176]. Error bars denote a 10% coefficient of variation.

HCM-FBA captured the shift from glucose to acetate consumption for a model of aerobic E. coli metabolism that was infeasible for HCM (Fig. 2.2B). An E. coli metabolic network (60 metabolites and 105 reactions) was constructed from literature [149, 136]. Elementary mode decomposition of this network (and thus HCM) was not feasible; 153,000 elementary modes were generated before the calculation became infeasible. Conversely, flux balance analysis generated only 29 modes for the same network. HCM-FBA model parameters were estimated from cellmass, glucose, and acetate measurements [176] using simulated annealing (Fig. 2.2B, solid).
HCM-FBA captured glucose consumption, cellmass formation, and the switch to acetate consumption following glucose exhaustion. HCM-FBA described the dynamics of a network that was infeasible for HCM, thereby demonstrating the power of the approach for large networks. Next, we demonstrated a systematic strategy to identify the critical subset of FBA modes required for model performance.

Global sensitivity analysis identified the FBA modes essential to model performance (Fig. 2.3). Total order sensitivity coefficients were calculated for all kinetic parameters and enzyme initial conditions in the aerobic E. coli model. Five of the 29 FBA modes were significant; removal of the most significant of these modes (encoding aerobic growth on glucose) destroyed model performance (Fig. 2.2B, dotted). Conversely, removing the remaining 24 modes simultaneously had a negligible effect upon model performance (Fig. 2.2B, dashed). The sensitivity analysis identified the minimal model structure required to explain the experimental data.

[Figure 2.3: bar chart of total order sensitivity indices (y-axis, 0.0 to 0.7) for rate constants (kmax), saturation constants (Ksat), and enzyme parameters (kE, α, enzyme initial conditions).]
Figure 2.3: Global sensitivity analysis of the aerobic E. coli model. Total order variance based sensitivity coefficients were calculated for the biomass yield on glucose and acetate. Sensitivity coefficients were computed for kinetic parameters and enzyme initial conditions (N = 183,000). Error bars represent the 95% confidence intervals of the sensitivity coefficients.

2.4 Discussion

In this study, we developed HCM-FBA, an effective modeling technique to simulate metabolic dynamics. HCM-FBA uses flux balance analysis solutions in conjunction with cybernetic control variables to dynamically simulate metabolism. We studied the performance of HCM-FBA on a prototypical metabolic network, along with two E. coli networks.
First, we showed that the performance of HCM-FBA and HCM were comparable for the prototypical network and a small model of anaerobic E. coli metabolism. For the anaerobic case, both approaches described the experimental data. However, HCM-FBA (which was within 5% of HCM and slightly better than HCM for lactate secretion) had fewer modes and parameters. Next, HCM-FBA was applied to an aerobic E. coli metabolic network that was not feasible for HCM. Elementary mode decomposition of the aerobic network generated over 153,000 elementary modes. Conversely, the HCM-FBA approach described cellmass growth and the shift from glucose to acetate consumption with only 29 FBA modes. Global sensitivity analysis further showed that only 5 of the 29 FBA modes were critical to model performance. Removal of these modes crippled the model, but removal of the remaining 24 modes had a negligible impact. These insignificant modes were associated with maintenance, thus they would likely not impact model predictions since the data represented a growing culture.

HCM-FBA is an alternative approach to HCM, especially for large networks where the generation of elementary modes is infeasible. Elementary modes expose the complexity of a cell by enumerating the many routes it can take, but computationally FBA has a clear advantage for large networks. HCM-FBA is a promising approach to model large metabolic networks where elementary mode calculations are infeasible, and where kinetic models of such systems have intractable identification problems. However, there are additional studies that should be performed. First, the intracellular flux distribution predicted by HCM-FBA should be compared to HCM and to flux measurements calculated using MFA or FBA/DFBA in combination with carbon labeling.
HCM predicted intracellular fluxes that were similar to MFA results [95]; however, the fluxes predicted by HCM-FBA have not yet been validated. Next, the performance of HCM-FBA should be compared to lumped hybrid cybernetic models (L-HCM). L-HCMs, which combine elementary modes into mode families based upon metabolic function [155, 156], have been applied to an E. coli network with 67 reactions and a Saccharomyces cerevisiae network with 70 reactions; both cases had satisfactory fits to extracellular experimental data. However, while L-HCM reduces the dimension of possible alternative modes that must be considered, it still requires the calculation of an initial set of modes. For metabolic networks of even moderate size, EM (or EP) decomposition may not be possible. On the other hand, the generation of flux balance solutions (convex combinations of the elementary modes or extreme pathways) is trivial, even for genome scale metabolic networks. Thus, HCM-FBA opens up the possibility for dynamic genome scale models of bacterial and perhaps even of mammalian metabolism.

2.5 Materials and Methods

The HCM-FBA approach is a modification of HCM, where elementary modes are replaced with flux balance analysis solutions. Thus, extracellular variables are dynamic while intracellular metabolites are at a pseudo steady state. The abundance of extracellular species $i$ ($x_i$), the pseudo enzyme $e_l$ (which catalyzes flux through mode $l$), and cellmass ($c$) are governed by:

\[
\frac{dx_i}{dt} = \sum_{j=1}^{R} \sum_{l=1}^{L} \sigma_{ij} z_{jl} q_l(e, k, x)\, c \qquad i = 1, \ldots, M
\]
\[
\frac{de_l}{dt} = \alpha_l + r_{E,l}(k, x)\, u_l - (\beta_l + r_G)\, e_l \qquad l = 1, \ldots, L
\]
\[
\frac{dc}{dt} = r_G\, c
\]

where $R$ and $M$ denote the number of reactions and extracellular species in the model and $L$ denotes the number of FBA modes. The quantity $\sigma_{ij}$ denotes the stoichiometric coefficient for species $i$ in reaction $j$ and $z_{jl}$ denotes the normalized flux for reaction $j$ in mode $l$.
If $\sigma_{ij} > 0$, species $i$ is produced by reaction $j$; if $\sigma_{ij} < 0$, species $i$ is consumed by reaction $j$; if $\sigma_{ij} = 0$, species $i$ is not connected with reaction $j$. Extracellular species balances were subject to the initial conditions $x(t_o) = x_o$ determined from experimental data. The term $q_l(e, k, x)$ denotes the specific uptake/secretion rate for mode $l$, where $e$ denotes the pseudo enzyme vector, $k$ denotes the unknown kinetic parameter vector, $x$ denotes the extracellular species vector, and $c$ denotes the cell mass; $q_l(e, k, x)$ is the product of a kinetic term ($\bar{q}_l$) and a control variable governing enzyme activity. Flux through each mode was catalyzed by a pseudo enzyme $e_l$, synthesized at the regulated specific rate $r_{E,l}(k, x)$, and constitutively at the rate $\alpha_l$. The term $u_l$ denotes the cybernetic variable controlling the synthesis of enzyme $l$. The term $\beta_l$ denotes the rate constant governing non-specific enzyme degradation, and $r_G$ denotes the specific growth rate through all modes. The specific uptake/secretion rates and the specific rate of enzyme synthesis were modeled using saturation kinetics. The specific growth rate was given by:

\[
r_G = \sum_{l=1}^{L} z_{\mu l}\, q_l(e, k, x)
\]

where $z_{\mu l}$ denotes the growth flux $\mu$ through mode $l$. The control variables $u_l$ and $v_l$, which control the synthesis and activity of each enzyme respectively, were given by:

\[
u_l = \frac{z_{sl}\, \bar{q}_l}{\sum_{l=1}^{L} z_{sl}\, \bar{q}_l}
\qquad
v_l = \frac{z_{sl}\, \bar{q}_l}{\max_{l=1,\ldots,L} z_{sl}\, \bar{q}_l}
\]

where $z_{sl}$ denotes the uptake flux of substrate $s$ through mode $l$. The model equations were implemented in Julia (v.0.4.2) [16] and solved using SUNDIALS [66]. The model code is available at http://www.varnerlab.org under an MIT license.

2.5.1 Elementary mode and flux balance analysis

Elementary modes were calculated using METATOOL 5.1 [87]. FBA modes were defined as the solution flux vector through the network connecting substrate uptake to cellmass and extracellular product formation.
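As a rough illustration of the cybernetic control variables, growth rate, and pseudo enzyme balance defined in the model equations above, the following Python sketch evaluates them for a hypothetical two-mode system. It is not the dissertation's Julia implementation: the function names and all numerical values are assumptions chosen for illustration, and the control variables are assumed to take the usual HCM form, in which $u_l$ is normalized by the sum and $v_l$ by the maximum of $z_{sl}\bar{q}_l$ over the modes.

```python
def cybernetic_variables(z_s, q_bar):
    """Return (u, v): synthesis and activity controls for each FBA mode."""
    # Return on investment of each mode: substrate uptake flux times kinetic rate.
    returns = [z * q for z, q in zip(z_s, q_bar)]
    total = sum(returns)
    best = max(returns)
    # u_l shares enzyme synthesis in proportion to each mode's return.
    u = [r / total if total > 0 else 0.0 for r in returns]
    # v_l scales enzyme activity relative to the best-performing mode.
    v = [r / best if best > 0 else 0.0 for r in returns]
    return u, v


def growth_rate(z_mu, q):
    """Specific growth rate r_G: growth flux through each mode times its rate q_l."""
    return sum(z * q_l for z, q_l in zip(z_mu, q))


def enzyme_balance(e, alpha, r_E, u, beta, r_G):
    """Right-hand side de_l/dt = alpha_l + r_E,l * u_l - (beta_l + r_G) * e_l."""
    return [a + r * u_l - (b + r_G) * e_l
            for e_l, a, r, u_l, b in zip(e, alpha, r_E, u, beta)]


# Hypothetical two-mode example (illustrative values, not fitted parameters).
u, v = cybernetic_variables(z_s=[1.0, 1.0], q_bar=[3.0, 1.0])
r_G = growth_rate(z_mu=[0.1, 0.05], q=[2.0, 1.0])
dedt = enzyme_balance(e=[1.0, 1.0], alpha=[0.01, 0.01], r_E=[0.5, 0.5],
                      u=u, beta=[0.05, 0.05], r_G=r_G)
print(u, v, r_G, dedt)
```

In this toy case the first mode has the larger return, so it receives the larger share of enzyme synthesis and full activity, while the enzyme for the second mode is diluted faster than it is made.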
The FBA problem was formulated as:

\[
\max_{w} \left( w_{obj} = \theta^{T} w \right)
\]
subject to:
\[
S w = 0
\]
\[
\alpha_i \le w_i \le \beta_i \qquad i = 1, 2, \ldots, R
\]

where $S$ denotes the stoichiometric matrix, $w$ denotes the unknown flux vector, $\theta$ denotes the objective selection vector, and $\alpha_i$ and $\beta_i$ denote the lower and upper bounds on flux $w_i$, respectively. The flux balance analysis problem was solved using the GNU Linear Programming Kit (v4.52) [1]. For each FBA mode, the objective $w_{obj}$ was to maximize either the specific growth rate or the specific rate of byproduct formation. Multiple FBA modes were calculated for each objective by allowing the oxygen and nitrate uptake rates to vary. For aerobic metabolism, the specific oxygen and nitrate uptake rates were constrained to allow a maximum flux of 10 mM/gDW·hr and 0.05 mM/gDW·hr, respectively. Each FBA mode was normalized by the specified objective flux.

2.5.2 Global sensitivity analysis

Variance based sensitivity analysis was used to estimate which FBA modes were critical to model performance. The performance function used in this study was the biomass yield on substrate. Candidate parameter sets (N = 182,000) were generated using Sobol sampling by perturbing the best fit parameter set ±50% [65]. Model performance, calculated for each of these parameter sets, was then used to estimate the total-order sensitivity coefficient for each model parameter.

2.5.3 Estimation of model parameters

Model parameters were estimated by minimizing the difference between simulations and experimental measurements (squared residual):

\[
\min_{k} \sum_{\tau=1}^{T} \sum_{j=1}^{S} \left( \frac{\hat{x}_j(\tau) - x_j(\tau, k)}{\omega_j(\tau)} \right)^{2}
\]

where $\hat{x}_j(\tau)$ denotes the measured value of species $j$ at time $\tau$, $x_j(\tau, k)$ denotes the simulated value for species $j$ at time $\tau$, and $\omega_j(\tau)$ denotes the experimental measurement variance for species $j$ at time $\tau$. The outer summation is with respect to time, while the inner summation is with respect to state.
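The squared residual described above can be sketched in a few lines of Python. This is a minimal illustration rather than the dissertation's Julia code: the function name, the nested-list data layout (outer index over time points, inner index over species), and the numbers are all assumptions.

```python
def weighted_squared_residual(measured, simulated, weights):
    """Sum over time points and species of ((x_hat - x) / omega)^2."""
    total = 0.0
    # Outer loop: time points; inner loop: measured species (states).
    for x_hat_t, x_t, w_t in zip(measured, simulated, weights):
        for x_hat, x, w in zip(x_hat_t, x_t, w_t):
            total += ((x_hat - x) / w) ** 2
    return total


# Hypothetical data: two time points, two species.
measured = [[1.0, 2.0], [3.0, 4.0]]
simulated = [[1.0, 1.0], [2.0, 4.0]]
weights = [[1.0, 0.5], [2.0, 1.0]]
print(weighted_squared_residual(measured, simulated, weights))
```

A search procedure such as simulated annealing would call a function like this repeatedly, each time with the simulated values produced by a new candidate parameter vector.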
The model residual was minimized using simulated annealing implemented in the Julia programming language.

CHAPTER 3
SEQUENCE SPECIFIC MODELING OF E. COLI CELL-FREE PROTEIN SYNTHESIS

3.1 Abstract¹

¹ Adapted with permission from Vilkhovoy M, Horvath N, Shih CH, Wayman JA, Calhoun K, Swartz J, and Varner JD, "Sequence specific modeling of E. coli cell-free protein synthesis" (2018) ACS Synthetic Biology, 7(8):1844-1857.

Cell-free protein synthesis (CFPS) is a widely used research tool in systems and synthetic biology. However, if CFPS is to become a mainstream technology for applications such as point of care manufacturing, we must understand the performance limits and costs of these systems. Toward this question, we used sequence specific constraint based modeling to evaluate the performance of E. coli cell-free protein synthesis. A core E. coli metabolic network, describing glycolysis, the pentose phosphate pathway, energy metabolism, amino acid biosynthesis and degradation was augmented with sequence specific descriptions of transcription and translation and effective models of promoter function. Model parameters were largely taken from literature, thus the constraint based approach coupled the transcription and translation of the protein product, and the regulation of gene expression, with the availability of metabolic resources using only six adjustable model parameters. We tested this approach by simulating the expression of two model proteins: chloramphenicol acetyltransferase and dual emission green fluorescent protein, for which we have datasets; we then expanded the simulations to a range of additional proteins. Protein expression simulations were consistent with measurements for a variety of cases. The constraint based simulations confirmed that oxidative phosphorylation was active in the CAT cell-free extract, as without it there was no feasible solution within the experimental constraints of the system.
We then compared the metabolism of theoretically optimal and experimentally constrained CFPS reactions, and developed parameter free correlations which could be used to estimate productivity as a function of carbon number and promoter type. Lastly, global sensitivity analysis identified the key metabolic processes that controlled CFPS productivity and energy efficiency. In summary, sequence specific constraint based modeling of CFPS offered a novel means to a priori estimate the performance of a cell-free system, using only a limited number of adjustable parameters. While we modeled the production of a single protein in this study, the approach could easily be extended to multi-protein synthetic circuits, RNA circuits or the cell-free production of small molecule products.

3.2 Introduction

Cell-free protein expression has become a widely used research tool in systems and synthetic biology, and a promising technology for personalized protein production. Cell-free systems offer many advantages for the study, manipulation and modeling of metabolism compared to in vivo processes. Central amongst these is direct access to metabolites and the biosynthetic machinery without the interference of a cell wall or the complications associated with cell growth. This allows interrogation of the chemical environment while the biosynthetic machinery is operating, potentially at a fine time resolution. Cell-free protein synthesis (CFPS) systems are arguably the most prominent examples of cell-free systems used today [81]. However, CFPS is not new; Matthaei and Nirenberg first used E. coli cell-free extracts in the 1960s to decipher the sequencing of the genetic code [118, 128]. Spirin and coworkers later improved the operational lifetime of cell-free protein production with a continuous exchange of reactants and products; however, these systems could only synthesize a single product and were energy limited [159].
More recently, CFPS was improved by generating ATP using both substrate level [93] and oxidative phosphorylation [82, 83]. Today, cell-free systems are used in a variety of applications ranging from therapeutic protein production [110, 94] to synthetic biology [70]. There are also several CFPS technology platforms, such as the PANOx-SP and Cytomin platforms developed by Swartz and coworkers [93, 82, 81], and the TX/TL platform of Noireaux [52]. However, if CFPS is to become a mainstream technology for advanced applications such as point of care manufacturing [137], we must first understand the performance limits and costs of these systems [81]. One tool to address these questions is constraint based modeling.

Constraint based approaches such as flux balance analysis (FBA), which use stoichiometric reconstructions of microbial metabolism, have become standard tools in systems biology and metabolic engineering [108]. FBA and metabolic flux analysis (MFA) [187], as well as convex network decomposition approaches such as elementary modes [150] and extreme pathways [148], model intracellular metabolism using the biochemical stoichiometry and other constraints such as thermodynamical feasibility [64, 62] under pseudo steady state conditions. Constraint based approaches have used linear programming [34] to predict productivity [176, 146], yield [176], mutant behavior [43], and growth phenotypes [129] for biochemical networks of varying complexity, including genome scale networks, using a limited number of adjustable parameters. Since the first genome scale stoichiometric model of E. coli [42], stoichiometric reconstructions of hundreds of organisms, including industrially important prokaryotes such as E. coli [44] and B. subtilis [131], are now available [45].
Stoichiometric reconstructions have been expanded to include the integration of metabolism with detailed descriptions of gene expression (ME-Model) [4, 107, 129] and protein structures (GEM-PRO) [197, 30]. These expansions have greatly increased the scope of questions that constraint based models can explore. Thus, constraint based methods are powerful tools to estimate the performance of metabolic networks. However, constraint based methods are typically used to model in vivo processes, and have not yet been applied to cell-free metabolism.

In this study, we used sequence specific constraint based modeling to evaluate the performance of E. coli cell-free protein synthesis. A core E. coli cell-free metabolic model describing glycolysis, pentose phosphate pathway, energy metabolism, amino acid biosynthesis and degradation was developed from literature [44]; this model was then augmented with sequence specific descriptions of promoter function, transcription and translation processes. Thus, the sequence specific constraint based approach explicitly coupled transcription and translation processes with the availability of metabolic resources in the CFPS reaction. We tested this approach by simulating the cell-free production of two model proteins, and then investigated the productivity and energy efficiency for eight additional proteins. Productivity was inversely proportional to carbon number, while energy efficiency was independent of carbon number. Based on these simulations, effective correlation models for optimal protein productivity and energy efficiency were developed. These correlations were then independently validated with maltose binding protein which was not in the original data set. Further, global sensitivity analysis identified the key metabolic processes that controlled CFPS performance; oxidative phosphorylation was vital to energy efficiency, while the translation rate was the most important factor controlling productivity.
Lastly, we compared theoretically optimal metabolic flux distributions with experimentally constrained flux distributions; the experimental CFPS system had an overconsumption of glucose and overproduction of ATP which negatively influenced energy efficiency. Taken together, sequence specific constraint based modeling of CFPS offered a novel means to a priori estimate the performance of a cell-free system, using only six adjustable parameters. While we considered only a single protein here, this approach could be extended to synthetic circuits, RNA circuits [73] or even cell-free small molecule production.

Figure 3.1: Sequence specific flux balance analysis. A. Schematic of the core metabolic network coupled to sequence-specific transcription and translation processes of a protein of interest for cell-free protein synthesis.

3.3 Results and discussion

3.3.1 Model derivation and validation

The cell-free stoichiometric network was constructed by removing growth associated reactions from the iAF1260 reconstruction of K-12 MG1655 E. coli [44], and adding deletions associated with the specific cell-free system (see Materials and Methods). The iAF1260 reconstruction describes 1260 ORFs, and thermodynamically derived metabolic flux directionality. We then added the transcription and translation template reactions of Allen and Palsson for the specific proteins of interest [4]. A schematic of the metabolic network, consisting of 264 reactions and 146 species, is shown in Fig. 3.1A. The network described the major carbon and energy pathways and amino acid biosynthesis and degradation pathways. Using this network in combination with effective promoter models taken from Moon et al.
[123] and literature values for cell-free culture parameters (Table 3.2), we simulated the sequence specific production of two model proteins: chloramphenicol acetyltransferase (CAT) and dual emission green fluorescent protein (deGFP, Fig. 3.2A). Dual emission GFP was produced under a P70a promoter in an E. coli extract for eight hours using maltose and 3-phosphoglycerate (3PG) as a carbon and energy source (R² = 0.84, Fig. 3.2A). Uncertainty in experimental factors such as the concentration of RNA polymerase, ribosomes, transcription and translation elongation rates, as well as the upper bounds on oxygen, maltose, and 3PG consumption rates, did not qualitatively alter the performance of the model (blue region, 95% confidence estimate of 100 sets). However, these simulations were only conducted at a single plasmid concentration of 5 nM. Thus, it was unclear if the model could capture cell-free protein synthesis for a range of plasmid concentrations. Simulations of the cell-free deGFP titer for a range of plasmid concentrations were consistent with experimental measurements (Fig. 3.2B). The titer at each plasmid concentration was calculated by multiplying the deGFP synthesis flux by the active time of production, approximately 8 hours. The mean of the ensemble (calculated by sampling the uncertainty in the model parameters) captured the saturation of deGFP production as a function of plasmid concentration (R² = 0.97). However, while the mean and 95% confidence estimate of the ensemble were consistent with measured deGFP levels, the model underpredicted the deGFP titer at the saturating plasmid concentration of 5 nM. These results showed that the sequence specific template reactions, metabolic network, and literature parameters were sufficient to predict protein production under different promoters.

Figure 3.2: Sequence specific flux balance analysis of deGFP under a P70a promoter in TXTL 2.0 E. coli extract. A. deGFP production for 8 h using maltose and 3PG as a carbon and energy source (R² = 0.84). Error bars denote a 10% deviation from the nominal value. B. Predicted versus measured deGFP concentration as a function of plasmid concentration in TXTL 2.0 (R² = 0.97). Error bars denote the standard deviation of experimental measurements. The blue region denotes the 95% CI over an ensemble of N = 100 sets, the black line denotes the mean of the ensemble, and dots denote experimental measurements. [Axes: deGFP concentration (μM) versus time (h) in panel A and versus plasmid concentration (nM) in panel B.]

We calculated the transcription rate using effective promoter models, and then maximized the rate of translation within biologically realistic bounds. Transcription and translation rates were subject to resource constraints encoded by the metabolic network, and transcription and translation model parameters were largely derived from literature (Table 3.2). In this study, we did not explicitly consider protein folding. However, the addition of chaperone or other protein maturation steps could easily be accommodated within the approach by updating the template reactions (see Palsson and coworkers [129]). The cell-free metabolic model code and parameters can be downloaded under an MIT software license from the Varnerlab website [179].

Cell-free simulations of the time evolution of CAT production were consistent with experimental measurements (Fig. 3.4). CAT was produced under a T7 promoter in a glucose/NMP cell-free system using glucose as a source of carbon and energy [25]. Metabolic fluxes were constrained by experimental measurements of glucose, nucleotides, amino and organic acid consumption and production rates (estimated from a total of 37 metabolite time series measurements) for the first hour of the reaction. In contrast, the rates of CAT transcription and translation were predicted by the model.
The model showed good agreement with the CAT measurement with a coefficient of determination of R² = 0.92. Next, we simulated the production of deGFP under a P70a promoter in TXTL 2.0 using maltose and 3-phosphoglycerate (3PG) as a carbon and energy source. The TXTL simulation was performed with transcription and translation constraints estimated from literature, but without metabolite constraints, since experimental metabolite measurements were not reported. TXTL 2.0 simulations showed good agreement between model estimates and both dynamic (R² = 0.84) and end-point (R² = 0.97) deGFP measurements (Supporting Information, Fig. 3.2), including the saturation of deGFP titer with plasmid concentration. In the cases of CAT and deGFP production, uncertainty in experimental factors such as the concentration of RNA polymerase, ribosomes, transcription and translation elongation rates, as well as the upper bounds on oxygen and carbon consumption rates (uniformly sampled around the parameter values shown in Table 3.2), did not qualitatively alter the performance of the model (blue region, 95% confidence estimated for N = 100 parameter sets). Together, these simulations suggested the description of transcription and translation, and its integration with metabolism encoded in the cell-free model, were consistent with experimental measurements. These simulations also showed that the sequence specific template reactions, metabolic network, and literature parameters were sufficient to predict protein production under different promoters.

3.3.2 Metabolic flux distributions

Theoretical optimal flux distribution

Recently, aerobic catabolism has been activated in CFPS which increases the usable energy from a carbon source such as glucose [81]. The discovery that such complex metabolism could be activated and controlled in CFPS led us to examine the flux distribution of CFPS.
While there is no cell growth, complex anabolic and catabolic processes still occur during cell-free protein synthesis [163]. The CAT translation rate was optimized without experimental constraints on substrate consumption or byproduct formation to estimate the theoretically optimal metabolic flux distribution. In all cases, the CFPS reaction was supplied with glucose; however, we considered different scenarios for amino acid (AA) supplementation. Amino acids are routinely supplied in CFPS reactions [121, 25, 52], but it has not yet been determined whether de novo amino acid biosynthesis occurs in CFPS reactions [121, 117].

Figure 3.3: Optimal metabolic flux distribution for CAT production. A. Optimal flux distribution in the presence of amino acid supplementation and de novo synthesis. B. Optimal flux distribution in the presence of amino acid supplementation without de novo synthesis. C. Optimal flux distribution with de novo amino acid synthesis in the absence of supplementation. Mean flux across the ensemble (N = 100), normalized to glucose uptake flux. Thick arrows indicate flux to or from amino acid biosynthesis pathways. [Panels: A, AA uptake and synthesis; B, AA uptake without synthesis; C, AA synthesis without uptake; each shows glycolysis, the pentose phosphate pathway, and the TCA cycle, with flux (A.U.) on a 0 to 100 color scale.]

Thus, we simulated three different scenarios: first, the CFPS reaction was supplied with glucose and amino acids, and was able to synthesize amino acids from glucose (AAs supplied and de novo synthesis).
In this case, the flux distribution showed an incomplete TCA cycle, where a combination of glucose and amino acids powered protein expression (Fig. 3.3A). Glucose was consumed to produce acetyl-coenzyme A, and associated byproducts, while glutamate was converted to alpha-ketoglutarate which traveled to oxaloacetic acid and pyruvate for additional amino acid biosynthesis. In order to validate this case experimentally, a separate CFPS reaction would have to be prepared where amino acids are not supplied during cell growth, before cell-free extract preparation. In the second scenario, the CFPS reaction was supplied with glucose and amino acids, but de novo amino acid biosynthesis was not allowed (AAs supplied w/o de novo synthesis). This scenario was potentially consistent with common cell-free extract preparation protocols which often involve amino acid supplementation during cell growth; in the presence of supplementation we expected the enzymes responsible for amino acid biosynthesis to be largely absent from the CFPS reaction. Our comprehensive dataset for CAT synthesis is likely representative of this case, thus we compared the optimal and experimentally constrained flux distribution for these cases subsequently. With supplementation and without de novo synthesis, the flux distribution showed no TCA cycle flux with all carbon flux traveling from glucose to acetate. In this case, ATP was produced by a combination of substrate level and oxidative phosphorylation, where ubiquinone was regenerated via either cyo or cyd activity, without relying on succinate dehydrogenase in the TCA cycle (Fig. 3.3B). These first two cases where amino acids were available had similar performance, and their respective metabolic flux distributions had a 99% correlation.
Lastly, when the CFPS reaction was supplied with glucose but not amino acids, the system was forced to synthesize amino acids de novo from glucose (de novo synthesis only); in this case, the flux distribution showed a largely complete TCA cycle, with diversion of metabolic flux into the Entner-Doudoroff pathway to produce NADPH (Fig. 3.3C). To validate this case experimentally, a CFPS extract would have to be prepared where amino acids are not supplied during cell growth and the CFPS reaction would have to be run without amino acid supplementation in the media. However, these simulations represent the theoretically optimal metabolic flux distribution, which may not be consistent with what is observed experimentally. Toward this question, we compared the optimal metabolic flux distribution of the second scenario (AA supplementation without de novo synthesis) with the experimentally constrained case (Fig. 3.4A).

Figure 3.4: Experimentally constrained simulation of CAT production. CAT was produced under a T7 promoter in CFPS E. coli extract for 1 h using glucose as a carbon and energy source. Error bars denote the standard deviation of experimental measurements. The blue region denotes the 95% CI over an ensemble of N = 100 sets, the black line denotes the mean of the ensemble, and dots denote experimental measurements. A. Metabolic flux distribution for CAT production in the presence of experimental constraints for glucose, organic acid and amino acid consumption and production rates. Mean flux across the ensemble, normalized to glucose uptake flux. Thick arrows indicate flux to or from amino acids. B. Central carbon metabolite and CAT measurements versus simulations over a 1 hour time course. [Time course axes: glucose, pyruvate, lactate, acetate, and succinate (mM) and CAT (μM) versus time (h); flux map: flux (A.U.) on a 0 to 100 color scale.]

Experimentally constrained flux distribution

The experimentally constrained metabolic flux distribution had a 54% correlation with the theoretically optimal flux distribution (AAs supplied w/o de novo synthesis; Fig. 3.3B). The low similarity suggested several differences between the experimentally constrained and optimal metabolic flux distribution. The largest discrepancies were in the pentose phosphate pathway, oxidative phosphorylation, and anaplerotic reactions, which had no correlation. The experimentally constrained simulation suggested a high flux through zwf, yielding NADPH which was interconverted to NADH via the pnt1 reaction. This NADH was consumed to convert pyruvate to lactate or to generate ATP via oxidative phosphorylation. In contrast, the optimal solution had neither zwf nor pnt1 activity. Oxidative phosphorylation had a negligible correlation, where the experimental system relied on cyd rather than cyo to produce ATP through oxidative phosphorylation. However, the experimentally constrained simulation suggested that oxidative phosphorylation must be active in the CFPS extract, as without it there was no feasible solution within the experimental constraints. On the other hand, overflow metabolism and glycolysis were highly correlated between the optimal and experimentally constrained solutions, with a 72% and 81% correlation, respectively. Surprisingly, folate, purine, and pyrimidine metabolism were active in the experimental system, but inactive in the optimal system.
Lastly, alanine, glutamine, pyruvate, lactate, acetate, malate, and succinate all accumulated in the experimental system, whereas the optimal solution produced only acetate; this accumulation contributed to the difference in the flux distributions. Next, we examined the productivity and energy efficiency for the cell-free synthesis of model proteins.

3.3.3 Analysis of CFPS performance

We analyzed the productivity and energy efficiency for the cell-free production of eight proteins with and without amino acid supplementation (Fig. 3.5). The expression of each protein was under a P70a promoter, with the exception of CAT which was expressed using a T7 promoter. In all cases, the CFPS reaction was supplied with glucose; however, we considered different scenarios for amino acid (AA) supplementation, similar to the cases considered in the flux distribution: AAs supplied and de novo synthesis, AAs supplied w/o de novo synthesis, and AA de novo synthesis only. Eight proteins, ranging in size, were selected to evaluate CFPS performance: bone morphogenetic protein 10 (BMP10), chloramphenicol acetyltransferase (CAT), caspase 9 (CASP9), dual emission green fluorescent protein (deGFP), prothrombin (FII), coagulation factor X (FX), fibroblast growth factor 21 (FGF21), and single chain variable fragment R4 (scFvR4). An additional case was considered for CAT, where central metabolic fluxes were constrained by experimental measurements of glucose, organic and amino acids. Using these model proteins, we developed effective correlation models that predicted the productivity and energy efficiency given the carbon number of the protein. Finally, we independently validated the correlations with a protein not in our original data set: maltose binding protein (MBP).
Productivity

The theoretical maximum productivity for proteins expressed using a P70a promoter (µM/h) was inversely proportional to the carbon number ($C_{POI}$) and varied between 1 and 12 µM/h for the proteins sampled (Fig. 3.5A-B). The theoretical maximum productivities, with and without amino acid supplementation, were within a standard deviation of one another for each protein, but varied significantly between proteins. Productivity varied non-linearly with carbon number of the protein; for instance, BMP10 (424 aa) had an optimal productivity of approximately 2.5 µM/h, whereas the optimal productivity of deGFP (229 aa) was approximately 8.4 µM/h. To examine the influence of protein size, we plotted the mean optimal productivity against the carbon number of each protein (Fig. 3.5B). The optimal productivity and carbon number were related by the power-law relationship $\alpha \times (C_{POI})^{\beta}$, where $\alpha = 6.02 \times 10^{6}$ µM/(h·carbon number) and $\beta = -1.93$ for a P70a promoter. Interestingly, CAT did not obey the P70a power-law relationship; the relatively high productivity of CAT was due to its T7 promoter. The higher transcription rate of the T7 promoter increased the steady state level of mRNA by 34%, resulting in a higher productivity. However, CAT expressed under a P70a promoter followed the P70a power-law correlation with a productivity of approximately 8.5 ± 2.3 µM/h (predicted to be 7.2 µM/h by the optimal productivity correlation). These simulations suggested a promoter specific relationship between the productivity and carbon number of the protein. However, it was unclear if the productivity correlation was predictive for proteins not considered in the original data set. We independently validated the productivity correlation by calculating the optimal productivity of MBP (which was not in the original dataset) using the full model and the effective correlation model (Fig. 3.5B).
The prediction error was less than 8% for an a priori prediction of CFPS productivity using the effective correlation. Thus, the effective productivity correlation could be used as a parameter-free method to estimate optimal productivity for cell-free protein production using a P70a promoter. For CFPS using other promoters, a similar correlation model could be developed. For example, maximal transcription occurs when the promoter model coefficient $u(\kappa) = 1$; the theoretical maximum productivity correlation for maximum promoter activity also followed a power-law relationship ($\alpha = 1.39 \times 10^{7}$ µM/(h·carbon number) and $\beta = -1.99$) (Fig. 3.5B, gray). The CAT value under a T7 promoter was similar to the maximal productivity as $u_{T7}(\kappa) \simeq 0.91$ given the T7 promoter model parameters used in this study (Table 3.2). Taken together, the maximum optimal productivity of a cell-free reaction was found to be inversely related to the carbon number of the protein, following a power-law relationship for proteins expressed under a P70a promoter.

Energy efficiency

The optimal energy efficiency of protein synthesis was independent of carbon number, with and without amino acid supplementation (Fig. 3.5C-D); it was approximately 84% for the model proteins sampled. The relationship was linear, but with negligible slopes: $m_Y \times (C_{POI}) + b_Y$, where $m_Y = -1.43 \times 10^{-4}$ energy efficiency (%)/carbon number for the case with supplementation, and $m_Y = 3.21 \times 10^{-3}$ energy efficiency (%)/carbon number for the case without supplementation. The energy efficiency (y-intercept) was calculated at $b_Y = 84.15$ (%) with supplementation, and $b_Y = 66.96$ (%) without supplementation. In the presence of amino acids, energy was utilized to power CFPS instead of synthesizing amino acids; thus, a constant energy efficiency was observed regardless of the carbon number of the protein. In the absence of supplementation, the energy efficiency decreased to between 68% and 76%.
In this case, glucose consumption more than doubled (64% increase for CAT) compared to cases supplemented with amino acids; meanwhile, the productivity was similar for each protein (Fig. 3.5B). Therefore, the energy burden required for synthesizing each amino acid and powering CFPS lowered the energy efficiency. Surprisingly, without amino acid supplementation, proteins with a higher carbon number had marginally higher energy efficiency (R2 = 0.82). This counterintuitive result was an artifact of the difference in productivity between small and large carbon number proteins. Smaller carbon number proteins had a higher productivity compared to larger carbon number proteins. Thus, smaller carbon number proteins had a higher energy demand to meet the increased productivity. When smaller carbon number proteins were constrained to have the same productivity as larger carbon number proteins, they had comparable energy efficiencies. There were also differences in the metabolic flux distribution between smaller and larger carbon number proteins. Larger carbon number proteins had a higher flux through glycolysis and oxidative phosphorylation. On the other hand, the smaller carbon number proteins had higher flux through zwf, e.g., approximately 84% of flux traveled through zwf for FGF21 compared to 67% for FII. This difference in pathway choice was also due to the higher productivity of the smaller carbon number proteins. Higher productivity increased the demand for NADPH (required for amino acid biosynthesis since amino acids were not available in the media), where NADPH was generated via zwf. Lastly, the optimal energy efficiency of MBP was well predicted by the linear efficiency model with and without amino acid supplementation. The estimated MBP energy efficiency had a maximum error of 6% compared to the correlation model prediction without supplementation, and an error of 1% in the presence of amino acids.
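The two effective correlation models described above can be written down in a few lines. The following is a minimal sketch, assuming only the fitted coefficients quoted in the text; the function names and the carbon numbers in the demonstration loop are hypothetical and are not taken from the thesis.

```python
# Fitted coefficients quoted in the text for the P70a promoter.
ALPHA_P70A, BETA_P70A = 6.02e6, -1.93
# Coefficients quoted for maximal promoter activity, u(kappa) = 1.
ALPHA_MAX, BETA_MAX = 1.39e7, -1.99


def optimal_productivity(carbon_number, alpha=ALPHA_P70A, beta=BETA_P70A):
    """Power-law estimate of optimal productivity (uM/h): alpha * C_POI**beta."""
    if carbon_number <= 0:
        raise ValueError("carbon number must be positive")
    return alpha * carbon_number ** beta


def optimal_energy_efficiency(carbon_number, supplemented=True):
    """Linear estimate of optimal energy efficiency (%): m_Y * C_POI + b_Y."""
    if supplemented:
        m_y, b_y = -1.43e-4, 84.15   # with amino acid supplementation
    else:
        m_y, b_y = 3.21e-3, 66.96    # de novo amino acid synthesis only
    return m_y * carbon_number + b_y


if __name__ == "__main__":
    # Hypothetical carbon numbers, chosen only to show the trend.
    for c_poi in (1000, 1500, 2000):
        print(c_poi,
              optimal_productivity(c_poi),
              optimal_energy_efficiency(c_poi),
              optimal_energy_efficiency(c_poi, supplemented=False))
```

Because the exponent is close to -2, doubling the carbon number reduces the estimated productivity roughly four-fold, while the efficiency estimate barely moves.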
Experimentally constrained CAT simulations showed suboptimal energy efficiency (Fig. 3.5D, dagger). CAT production was simulated using the constraint based model in combination with experimental measurements of glucose, organic and amino acid consumption and production rates (Fig. 3.1B). The experimentally constrained energy efficiency was 16.4 ± 5.6% compared to the theoretical maximum of approximately 84 ± 0.1%. Thus, while the energy efficiency correlation model was not effective in describing the experimental dataset as it assumes optimality, it was useful in showing the potential optimal energy efficiency CFPS systems could achieve. Given that the CAT productivity was similar between the simulated and measured systems, differences in the glucose consumption rate and the ATP yield per glucose were likely responsible for the difference between the optimal and experimental systems. The glucose consumption rate was approximately 30 - 40 mM/h in the experimental system (even in the presence of amino acids). On the other hand, the theoretical optimal simulation suggested the glucose consumption rate was significantly less than the observed rate, approximately 1 - 7 mM/h (depending upon amino acid supplementation). In the theoretical optimal simulation, the CFPS reaction produced only acetate as a byproduct, but in the experimental system acetate, lactate, pyruvate, succinate and malate all accumulated during the first hour of production. In the optimal system, the majority of the carbon flux traveled toward CAT synthesis, while the remaining flux traveled toward acetate. On the other hand, in the experimentally constrained system, there was relatively low carbon flux toward CAT synthesis compared to the glucose consumption rate, which led to the accumulation of various organic acids. This suggested that the experimentally constrained system consumed more glucose than was required for CAT synthesis.

[Figure 3.5, panels A-D. Proteins: BMP10, CASP9, CAT, deGFP, FII, FX, FGF21, scFvR4, MBP. Axes: Carbon Number in POI; Mean Productivity (μM/h); Productivity (μM/h); Mean Energy Efficiency (%); Energy Efficiency (%). Legend: AA Uptake & Synthesis; AA Uptake w/o Synthesis; AA Synthesis w/o Uptake; Experimental Measurements. Productivity trendlines: All Cases, f(x) = 6.02 × 10^6 x^−1.93 (R2 = 0.99); Max Productivity, f(x) = 1.39 × 10^7 x^−1.99 (R2 = 0.99). Energy efficiency trendlines: AA Uptake w/ & w/o Synthesis, f(x) = −1.43 × 10^−4 x + 84.15; AA Synthesis w/o Uptake, f(x) = 3.21 × 10^−3 x + 66.96.]

Figure 3.5: The CFPS performance for eight model proteins with and without amino acid supplementation. A. Mean CFPS productivity for a panel of model proteins with and without amino acid supplementation. B. Mean CFPS productivity versus carbon number for a panel of model proteins with and without amino acid supplementation. Trendline (black dotted line) was calculated across all cases for a P70a promoter (R2 = 0.99) and maximum productivity trendline assumed $u(\kappa) = 1$ (grey dotted line; R2 = 0.99). C. Mean CFPS energy efficiency for a panel of model proteins with and without amino acid supplementation. D. Mean CFPS energy efficiency versus carbon number for a panel of model proteins with and without amino acid supplementation. Trendline for cases with amino acids (black dotted line) and trendline for cases without amino acids (grey dotted line; R2 = 0.81). Error bars: 95% CI calculated by sampling; asterisk: protein excluded from trendline; dagger: constrained by experimental measurements and excluded from trendline; triangles: first principle prediction and excluded from trendline.
The energy produced per unit glucose was also different between the optimal and experimentally constrained cases. In the optimal simulation, 12 ATPs were produced per unit glucose (the theoretical maximum for this network was 21), while the experimentally constrained simulation produced only ~4 ATPs per glucose. Multiplying these yields by the respective glucose consumption rates, approximately 120 - 160 mM ATP/h was produced in the experimental case, in contrast to 12 - 84 mM ATP/h for the optimal case. Thus, the experimental system overproduced ATP. We know from measurements that ATP did not accumulate in the media, which suggested it was consumed by pathways that were not active in the optimal simulation. For example, in the experimentally constrained simulations, 36% of energy resources went toward nucleoside triphosphate (NTP) degradation. In contrast, the theoretical optimum had negligible NTP degradation. Taken together, comparison of the experimentally constrained and theoretically optimal flux distributions suggested the CFPS system over-consumed glucose, and counterintuitively overproduced ATP. A strategy to increase the energy efficiency of CFPS would be to feed less glucose, reducing the overproduction of ATP. Another potential strategy would be to increase the translation rate, which is potentially the bottleneck for protein production. This would allow for more energy to be consumed for protein production rather than NTP degradation.

3.3.4 Global sensitivity analysis

We performed global sensitivity analysis to understand which parameters controlled CFPS productivity and energy efficiency (Fig. 3.6). The translation elongation rate was the most important factor controlling productivity, while RNAP and ribosome abundance had only a modest effect irrespective of amino acid supplementation (Fig. 3.6A). This suggested that the translation elongation rate, and not transcriptional parameters, controlled productivity.
Underwood and coworkers showed that increasing ribosome abundance did not significantly increase protein yields or rates; however, adding elongation factors increased protein synthesis rates by 27% [175]. In addition, Li et al. increased the productivity of firefly luciferase by 5-fold in PURE CFPS by first improving translation, followed by transcription, by adjusting elongation factors, ribosome recycling factor, release factors, chaperones, BSA, and tRNAs [109]. In examining substrate utilization, glucose consumption was not important for productivity in the presence of amino acid supplementation. However, its importance increased significantly when amino acids were not available. On the other hand, productivity was sensitive to amino acid consumption only when de novo amino acid biosynthetic reactions were blocked, as the supplied amino acids were then the only source of amino acids for protein synthesis.

[Figure 3.6, panels A and B. y-axes: Productivity Sensitivity Index (A); Energy Efficiency Sensitivity Index (B). x-axis categories: Glucose Uptake, Oxygen Uptake, Amino Acid Uptake, RNAP Level, Transcription Rate, Ribosome Level, Translation Rate (the last four grouped as Transcription/Translation Parameters). Legend: AA Uptake & Synthesis; AA Uptake w/o Synthesis; AA Synthesis w/o Uptake.]

Figure 3.6: Sensitivity analysis of the cell-free production of CAT. A. Total order sensitivity of the optimal CAT productivity with respect to metabolic and transcription/translation parameters. B. Total order sensitivity of the optimal CAT energy efficiency. Metabolic and transcription/translation parameters were varied for amino acid supplementation and synthesis (black), amino acid supplementation without synthesis (dark grey) and amino acid synthesis without supplementation (light gray). Error bars represent the 95% CI of the total order sensitivity index.

The oxygen consumption rate was the most important factor controlling the energy efficiency of cell-free protein synthesis (Fig. 3.6B). Oxidative phosphorylation is the most efficient process for energy generation; however, it is unclear how active oxidative phosphorylation is in CFPS. In the model, we assumed that ATP could be produced by both substrate level and oxidative phosphorylation. Jewett and coworkers reported that oxidative phosphorylation still operated in cell-free systems, and that the protein titer decreased by 1.5-fold to 4-fold when oxidative phosphorylation reactions were inhibited in pyruvate-powered CFPS [81]. Furthermore, we showed that oxidative phosphorylation must be active to simultaneously meet the metabolic and protein production constraints. However, it is unknown how active oxidative phosphorylation is in a glucose-powered cell-free system and its quantitative effect on energy efficiency.

We calculated the optimal CAT energy efficiency as a function of the oxidative phosphorylation flux to investigate the connection between energy efficiency and oxidative phosphorylation (Fig. 3.7). We calculated energy efficiency across an ensemble of 1000 flux balance solutions by varying the oxygen uptake rate along with the transcription and translation parameters. Oxidative phosphorylation had a strong effect on the energy efficiency, both with and without amino acid supplementation. In the presence of amino acid supplementation, the energy efficiency ranged from 50% to approximately 84%, depending on the oxidative phosphorylation flux. However, without amino acid supplementation, the energy efficiency dropped to approximately 39%, and reached a maximum of 70%.
In the absence of supplementation, a lower energy efficiency was expected for the same oxidative phosphorylation flux, as glucose was utilized for both energy generation and amino acid biosynthesis. In all cases, whenever the energy efficiency was below its theoretical maximum, there was an accumulation of both acetate and lactate.

[Figure 3.7. x-axis: Oxidative Phosphorylation Flux (mM/hr); y-axis: CAT Energy Efficiency (%). Legend: AA Uptake & Synthesis; AA Uptake w/o Synthesis; AA Synthesis w/o Uptake.]

Figure 3.7: Optimal CAT energy efficiency versus oxidative phosphorylation flux calculated across an ensemble (N = 1000) of flux balance solutions (points). Energy efficiency versus oxidative phosphorylation flux for amino acid supplementation and de novo synthesis (black), amino acid supplementation without de novo synthesis (dark grey), and de novo amino acid synthesis without supplementation (light gray). The ensemble was generated by randomly varying the oxygen consumption rate from 0.1 to 10 mM/h and randomly sampling the transcription and translation parameters within 10% of their literature values. Each point represents one solution of the model equations.

The experimental dataset exhibited a mixture of acetate and lactate accumulation during CAT synthesis, which suggested the CFPS reaction was not operating with optimal oxidative phosphorylation activity. Oxidative phosphorylation is a membrane associated process, whereas CFPS does not rely on living cells, and at least in theory has no cell membrane in the extract. Jewett and coworkers, however, hypothesized that inverted membrane vesicles present in the CFPS reaction could carry out oxidative phosphorylation [81]. Toward this hypothesis, they enhanced the CAT titer by 33% when the reaction was augmented with 10 mM phosphate; they suggested the additional phosphate either enhanced oxidative phosphorylation activity or inhibited phosphatase reactions.
They also showed that protein titer was significantly reduced in the absence of oxygen. The model validated oxidative phosphorylation activity in this CFPS system; without oxidative phosphorylation there was no feasible solution satisfying the experimental constraints. However, the number, size, protein loading, and lifetime of these inverted vesicles remain an open area of study.

3.3.5 Potential alternative metabolic optima

Constraint based approaches are useful tools to calculate the optimal performance of a biological system; however, the metabolic flux distributions predicted by these methods are often not unique. Alternative optimal solutions have the same objective value, e.g., productivity, but different metabolic flux distributions. Techniques such as flux variability analysis (FVA) [115, 149], mixed-integer approaches [103] or Monte Carlo sampling of the constraint space [185] can estimate alternative optimal flux distributions. In this study, we were not interested in specific alternative optima; rather, we wanted to generate a global view of which pathways were absolutely necessary to meet optimal CAT production. Toward this question, we removed entire reaction groups, and simulated CAT production subject to experimental constraints, to determine their influence on system performance.
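The group knockout procedure can be illustrated on a toy problem. The following is a minimal sketch, assuming a hypothetical four-reaction network with two parallel routes and SciPy's `linprog` as the solver; the network, the group names and the helper functions are illustrative and are not the cell-free network or the code used in this study. A knockout is imposed by setting both bounds of every reaction in a group to zero and re-solving the linear program.

```python
from itertools import combinations_with_replacement

import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network: uptake -> A, two parallel routes A -> B, B -> product.
S = np.array([
    [1.0, -1.0, -1.0, 0.0],   # balance on metabolite A
    [0.0, 1.0, 1.0, -1.0],    # balance on metabolite B
])
BOUNDS = [(0.0, 10.0), (0.0, 100.0), (0.0, 100.0), (0.0, 100.0)]
OBJECTIVE = np.array([0.0, 0.0, 0.0, 1.0])   # maximize the product flux
GROUPS = {"uptake": [0], "route_1": [1], "route_2": [2]}  # illustrative groups


def solve(knocked_out=()):
    """Maximize product flux with the listed reaction indices forced to zero."""
    bounds = [(0.0, 0.0) if i in knocked_out else b for i, b in enumerate(BOUNDS)]
    res = linprog(-OBJECTIVE, A_eq=S, b_eq=np.zeros(S.shape[0]),
                  bounds=bounds, method="highs")
    if res.status != 0:          # infeasible or failed solve
        return None, None
    return -res.fun, res.x


def pairwise_group_knockouts():
    """Return wild-type productivity and per-pair (productivity, flux) differences."""
    wt_prod, wt_flux = solve()
    results = {}
    # Pairs (g, g) correspond to single-group knockouts.
    for g1, g2 in combinations_with_replacement(GROUPS, 2):
        prod, flux = solve(set(GROUPS[g1]) | set(GROUPS[g2]))
        if prod is None:
            results[(g1, g2)] = (None, None)
        else:
            results[(g1, g2)] = (abs(prod - wt_prod),
                                 float(np.linalg.norm(flux - wt_flux)))
    return wt_prod, results


if __name__ == "__main__":
    wild_type, table = pairwise_group_knockouts()
    print("wild-type productivity:", wild_type)
    for pair, (d_prod, d_flux) in table.items():
        print(pair, d_prod, d_flux)
```

In this toy case, removing either parallel route alone leaves the productivity unchanged while the flux distribution may differ, which is the signature of an alternative optimum; removing both routes, or the uptake reaction, eliminates production.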
[Figure 3.8, two heat maps. A: Productivity difference; B: Flux distribution difference; color scale 0.0 to 1.0. Rows and columns are the reaction groups: Glycolysis/Gluconeogenesis; Pentose Phosphate Pathway; Entner-Doudoroff; TCA cycle; Oxidative phosphorylation; Cofactors; Anaplerotic/Glyoxylate reactions; Overflow metabolism; Folate metabolism; Purine/Pyrimidine; ALA, ASP, ASN biosynthesis; GLU, GLN biosynthesis; ARG, PRO biosynthesis; GLY, SER biosynthesis; CYS, MET biosynthesis; LYS, THR biosynthesis; HIS biosynthesis; PHE, TRP, TYR biosynthesis; ILE, LEU, VAL biosynthesis.]

Figure 3.8: Pairwise knockouts of reaction subgroups in the cell-free network. A. Difference in the CAT productivity in the presence of reaction knockouts compared with no knockouts for experimentally constrained CAT production. B. Difference in the flux distribution in the presence of reaction knockouts compared with no knockouts for experimentally constrained CAT production. The difference between perturbed and wild-type productivity and flux distributions was quantified by the $\ell_2$ norm, and then normalized so the maximum change was 1.0. Red boxes indicate potential alternative optimal flux distributions with the same CAT productivity as the wild type, whereas no red box indicates no feasible solution and/or the optimal CAT productivity was not met.

We examined the productivity difference between the experimentally constrained system with no knockouts (wild-type) and with group knockouts (Fig. 3.8A). Solutions that met the experimental constraints and produced CAT are indicated by red boxes; solutions outside the red boxes did not meet the constraints and resulted in infeasible solutions. We then quantified the difference in metabolic flux distribution between the wild-type and group knockouts (Fig. 3.8B). Globally, the constraint based simulation reached the same CAT productivity for 40% of the pairwise knockouts, while 92% of these solutions had different flux distributions compared with the wild-type. Knockout analysis identified pathways required for CAT production; for example, deletion of glycolysis/gluconeogenesis or oxidative phosphorylation resulted in attenuated CAT production. Attenuation of CAT production in the absence of oxidative phosphorylation suggested that oxidative phosphorylation was present in the CFPS cell-extract.
However, this was not true for deGFP production in the TXTL 2.0 system. A robustness analysis with and without oxidative phosphorylation showed a feasible solution was reached for each simulation (Fig. 3.9). This suggested that oxidative phosphorylation may not be required for a TXTL 2.0 system producing deGFP; however, this result is currently under experimental investigation. There were also pathway knockouts that had no effect on productivity, such as the pentose phosphate pathway (PPP) or the biosynthesis reactions of amino acids that did not accumulate in the media (only alanine and glutamine accumulated). For example, one of the features of the predicted wild-type metabolic flux distribution was a high flux through the first step of PPP (zwf) and the Entner-Doudoroff (ED) pathway. Removal of PPP and the ED pathway had no effect on the CAT productivity compared to the wild-type (Fig. 3.8). Pairwise knockouts of the ED pathway and other subgroups (e.g., pentose phosphate pathway, cofactors, folate metabolism) also resulted in the same optimal CAT productivity. However, there was a difference in the flux distribution with these knockouts (Fig. 3.8B); thus, alternative optimal metabolic flux distributions exist for CAT production, despite experimental constraints. Lastly, amino acid biosynthetic knockouts had no effect on the productivity with the exception of alanine, aspartate, asparagine, glutamate and glutamine biosynthesis reactions, since amino acids were available in the medium and 13 amino acid biosynthesis reactions were already blocked (see Materials and Methods). We blocked certain biosynthesis reactions since the cells were grown in the presence of these amino acids during cell-free extract preparation. Alanine and glutamine accumulated in the medium; thus, when their biosynthesis reactions were removed, the simulation failed to meet the experimental constraints. This resulted in no feasible solution and no CAT production.
Ultimately, to determine the metabolic flux distribution occurring in CFPS, we need to add additional constraints to the flux estimation calculation. For example, thermodynamic feasibility constraints may result in a better depiction of the flux distribution [64, 62], and 13C labeling in CFPS could provide significant insight. However, while 13C labeling techniques are well established for in vivo processes [194], application of these techniques to CFPS remains an active area of research. Taken together, a more constrained solution would help determine if CFPS has de novo amino acid biosynthesis, and could also help to identify strategies to optimize CFPS energy efficiency.

[Figure 3.9. x-axis: Mean 3PG Uptake (mM/h); y-axis: Mean Maltose Uptake (mM/h). Legend: Without OxPhos Activity; With OxPhos Activity.]

Figure 3.9: Robustness analysis of maltose and 3PG consumption for TXTL 2.0 E. coli extract with and without oxidative phosphorylation activity that meet the transcription and translation constraints. Each dot represents the mean of an ensemble of N = 20 ssFBA solutions; black dots are solutions without oxidative phosphorylation and grey dots are solutions with oxidative phosphorylation.

3.3.6 Summary and conclusions

In this study, we developed a sequence specific constraint based modeling approach to predict the performance of cell-free protein synthesis reactions. First principle predictions of the cell-free production of CAT and deGFP were in agreement with experimental measurements for two different promoters. While we considered only the P70a and T7 promoters here, we are expanding our library of possible promoters. These promoter models, in combination with the cell-free constraint based approach, could enable the de novo design of circuits for optimal functionality and performance. We also developed effective correlation models for the productivity and energy efficiency as a function of carbon number that could be used to quickly prototype CFPS reactions.
The productivity correlation model described the experimental measurements of CAT and deGFP, whereas the energy efficiency correlation model represented the theoretical optimum that CFPS could attain. Further, global sensitivity analysis identified that the translation rate had the highest effect on productivity, while oxidative phosphorylation was crucial for energy efficiency. While this first study was promising in predicting protein production, there are several issues to consider in future work. First, a more detailed description of transcription and translation reactions has been utilized in genome scale ME models, e.g., O’Brien et al. [129]. These template reactions could be adapted to a cell-free system. This would allow us to consider important facets of protein production, such as the role of chaperones in protein folding. We would also like to include post-translational modifications such as glycosylation, which are important for the production of therapeutic proteins, in the next generation of models. In conclusion, we modeled the cell-free production of a single protein in this study, but sequence specific constraint based modeling could be extended to multi-protein synthetic circuits, RNA circuits or small molecule production.

3.4 Materials and Methods

3.4.1 Glucose/NMP cell-free protein synthesis.

The protein synthesis reaction was conducted using the PANOxSP protocol with slight modifications from that described previously [82]. The glucose/NMP cell-free protein synthesis reaction was performed using the S30 extract in 1.5-mL Eppendorf tubes (working volume of 15 µL) and incubated in a humidified incubator at 37 °C. The S30 extract was prepared from E. coli strain KC6 (A19 ∆tonA ∆tnaA ∆speA ∆endA ∆sdaA ∆sdaB ∆gshA met+). This K12-derivative has several gene deletions to stabilize amino acid concentrations during the cell-free reaction. The KC6 strain was grown to approximately 3.0 OD595 in a 10-L fermenter (B.
Braun, Allentown PA) on defined media with glucose as the carbon source and with the addition of 13 amino acids (alanine, arginine, cysteine, serine, aspartate, glutamate, and glutamine were excluded) [195]. Crude S30 extract was prepared as described previously [80]. Plasmid pK7CAT was used as the DNA template for chloramphenicol acetyltransferase (CAT) expression by placing the cat gene between the T7 promoter and the T7 terminator [92]. The plasmid was isolated and purified using a Plasmid Maxi Kit (Qiagen, Valencia CA). All reagents were purchased from Sigma (St. Louis, MO), unless otherwise noted. The initial mixture included 1.2 mM ATP; 0.85 mM each of GTP, UTP, and CTP; 30 mM phosphoenolpyruvate (Roche, Indianapolis IN); 130 mM potassium glutamate; 10 mM ammonium glutamate; 16 mM magnesium glutamate; 50 mM HEPES-KOH buffer (pH 7.5); 1.5 mM spermidine; 1.0 mM putrescine; 34 µg/mL folinic acid; 170.6 µg/mL E. coli tRNA mixture (Roche, Indianapolis IN); 13.3 µg/mL pK7CAT plasmid; 100 µg/mL T7 RNA polymerase; 20 unlabeled amino acids at 2-3 mM each; 5 µM L-[U-14C]-leucine (Amersham Pharmacia, Uppsala Sweden); 0.33 mM nicotinamide adenine dinucleotide (NAD); 0.26 mM coenzyme A (CoA); 2.7 mM sodium oxalate; and 0.24 volumes of E. coli S30 extract. This reaction was modified for the energy source used such that glucose reactions had 30-40 mM glucose in place of PEP. Sodium oxalate was not added since it has a detrimental effect on protein synthesis and ATP concentrations when using glucose or other early glycolytic intermediate energy sources [93]. The HEPES buffer (pKa ∼ 7.5) was replaced with Bis-Tris (pKa ∼ 6.5). In addition, the magnesium glutamate concentration was reduced to 8 mM for the glucose reaction since a lower magnesium optimum was found when using a nonphosphorylated energy source [82]. Finally, 10 mM phosphate was added in the form of potassium phosphate dibasic adjusted to pH 7.2 with acetic acid.
3.4.2 Protein product and metabolite measurements.

Cell-free reaction samples were quenched at specific timepoints with equal volumes of ice-cold 150 mM sulfuric acid to precipitate proteins. Protein synthesis of CAT was determined from the total amount of 14C-leucine-labeled product by trichloroacetic acid precipitation followed by scintillation counting as described previously [25]. Samples were centrifuged for 10 min at 12,000g and 4 °C. The supernatant was collected for high performance liquid chromatography (HPLC) analysis. HPLC analysis (Agilent 1100 HPLC, Palo Alto CA) was used to separate nucleotides and organic acids, including glucose. Compounds were identified and quantified by comparison to known standards for retention time and UV absorbance (260 nm for nucleotides and 210 nm for organic acids) as described previously [25]. The standard compounds quantified with a refractive index detector included inorganic phosphate, glucose, and acetate. Pyruvate, malate, succinate, and lactate were quantified with the UV detector. The stability of the amino acids in the cell extract was determined using a Dionex Amino Acid Analysis (AAA) HPLC System (Sunnyvale, CA) that separates amino acids by gradient anion exchange (AminoPac PA10 column). Compounds were identified with pulsed amperometric electrochemical detection and by comparison to known standards.

3.4.3 Formulation and solution of the model equations.

The sequence specific flux balance analysis problem was formulated as a linear program:

\begin{equation}
\begin{aligned}
\max_{w} \quad & \left( w_X = \theta^{T} w \right) \\
\text{Subject to:} \quad & S w = 0 \\
& L_i \leq w_i \leq U_i \qquad i = 1, 2, \ldots, R
\end{aligned}
\tag{3.1}
\end{equation}

where $S$ denotes the stoichiometric matrix ($M \times R$), $w$ denotes the unknown flux vector ($R \times 1$), $\theta$ denotes the objective vector ($R \times 1$), and $L_i$ and $U_i$ denote the lower and upper bounds on flux $w_i$, respectively (the bounds form $R \times 1$ column vectors). Unless otherwise specified, $L_i = 0$ and $U_i = 100$ mM/hr.
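The structure of the linear program in Eq. 3.1 can be illustrated with a very small example. The following is a minimal sketch, assuming a hypothetical two-metabolite, four-reaction network and SciPy's `linprog` in place of GLPK; the stoichiometry, the translation-rate cap and the function name `solve_fba` are illustrative assumptions, not the model used in this study. Because `linprog` minimizes, the objective vector is negated.

```python
import numpy as np
from scipy.optimize import linprog


def solve_fba(S, theta, lower, upper):
    """Solve max theta^T w subject to S w = 0 and lower <= w <= upper."""
    res = linprog(-theta,                          # linprog minimizes, so negate
                  A_eq=S, b_eq=np.zeros(S.shape[0]),
                  bounds=list(zip(lower, upper)),
                  method="highs")
    if res.status != 0:
        raise RuntimeError("no feasible flux distribution: " + res.message)
    return res.x


# Hypothetical network (columns): glucose uptake, catabolism (GLC -> 12 ATP),
# ATP sink, translation (4 ATP -> protein). Not the thesis network.
S = np.array([
    [1.0, -1.0, 0.0, 0.0],     # glucose balance
    [0.0, 12.0, -1.0, -4.0],   # ATP balance
])
theta = np.array([0.0, 0.0, 0.0, 1.0])   # objective vector selects w_X
lower = np.zeros(4)                       # default L_i = 0
upper = np.full(4, 100.0)                 # default U_i = 100 mM/h
upper[0] = 40.0                           # glucose uptake bounded by [0, 40] mM/h
upper[3] = 5.0                            # hypothetical translation-rate bound

w = solve_fba(S, theta, lower, upper)
w_X = float(theta @ w)

if __name__ == "__main__":
    print("flux vector:", w)
    print("translation rate w_X:", w_X)
```

In this toy problem the translation cap, not glucose availability, limits the objective, which mirrors how the transcription and translation bounds constrain the sequence specific problem.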
The transcription (T) and translation (X) stoichiometry was modeled using the template reactions of Allen and Palsson [4] (Table 3.1). The objective of the cell free flux balance calculation was to maximize the rate of protein translation, $w_X$. The total glucose uptake rate was bounded by [0,40 mM/h] according to experimental data, while the amino acid uptake rates were bounded by [0,30 mM/h], but did not reach the maximum flux. Gene and protein sequences were taken from literature [181]. The sequence specific flux balance linear program was solved using the GNU Linear Programming Kit (GLPK) v4.55 [1]. For all cases, amino acid degradation reactions were blocked as these enzymes were likely inactivated during the cell-free extract preparation [25, 52]. In the absence of de novo amino acid synthesis, all amino acid synthesis reactions were set to 0 mM/h. In the experimentally constrained simulations, the E. coli used to prepare the extract was grown in the presence of 13 amino acids (alanine, arginine, cysteine, serine, aspartate, glutamate, and glutamine were excluded) [195], thus the synthesis reactions responsible for those 13 amino acids were set to 0 mM/h. Lastly, reactions that were knocked out in the host strain used to prepare the extract were removed from the network (∆speA, ∆tnaA, ∆sdaA, ∆sdaB, ∆gshA, ∆tonA, ∆endA).

Table 3.1: Transcription and translation template reactions for protein production. The symbol $G_P$ denotes the gene encoding protein product $P$, $R_T$ denotes the concentration of RNA polymerase, $G^{*}_P$ denotes the gene bound by the RNA polymerase (open complex), $\eta_k$ and $\alpha_j$ denote the stoichiometric coefficients for nucleotide and amino acid, respectively, $P_i$ denotes inorganic phosphate, $R_X$ denotes the ribosome concentration, $R^{*}_X$ denotes bound ribosome, and $AA_j$ denotes the jth amino acid.

| Description | Template reaction |
| --- | --- |
| Transcription initiation | $G_P + R_T \longrightarrow G^{*}_P$ |
| Transcription ($w_T$) | $G^{*}_P + \sum_{k \in \{A,C,G,U\}} \eta_k \cdot (\{k\}TP + H_2O) \longrightarrow mRNA + G_P + R_T + \sum_{k \in \{A,C,G,U\}} \eta_k \cdot PP_i$ |
| mRNA degradation | $mRNA \longrightarrow \sum_{k \in \{A,C,G,U\}} \eta_k \cdot \{k\}MP$ |
| Translation initiation | $mRNA + R_X \longrightarrow R^{*}_X$ |
| tRNA charging | $\alpha_j \cdot (AA_j + tRNA + ATP + H_2O) \longrightarrow \alpha_j \cdot (AA_j\text{-}tRNA_j + AMP + PP_i), \quad j = 1, 2, \ldots, 20$ |
| Translation ($w_X$) | $R^{*}_X + \sum_{j \in \{AA\}} \alpha_j \cdot (AA_j\text{-}tRNA_j + 2\,GTP + 2\,H_2O) \longrightarrow P + R_X + mRNA + \sum_{j \in \{AA\}} \alpha_j \cdot (tRNA + 2\,GDP + 2\,P_i)$ |

The bounds on the transcription rate ($L_T = w_T = U_T$) were modeled as:

\begin{equation}
w_T = V^{max}_T \left( \frac{G_P}{K_T + G_P} \right)
\tag{3.2}
\end{equation}

where $G_P$ denotes the concentration of the gene encoding the protein of interest, and $K_T$ denotes a transcription saturation coefficient. The maximum transcription rate $V^{max}_T$ was formulated as:

\begin{equation}
V^{max}_T \equiv \left[ R_T \left( \frac{\dot{v}_T}{l_G} \right) u(\kappa) \right]
\tag{3.3}
\end{equation}

where $R_T$ denotes the RNA polymerase concentration (nM), $\dot{v}_T$ denotes the RNA polymerase elongation rate (nt/h), and $l_G$ denotes the gene length (nt). The term $u(\kappa)$ (dimensionless, $0 \leq u(\kappa) \leq 1$) is an effective model of promoter activity, where $\kappa$ denotes promoter specific parameters. The general form for the promoter models was taken from Moon et al. [123], which was based on earlier studies from Bintu and coworkers [17], and is similar to the genetically structured modeling approach of Lee and Bailey [104]. In this study, we considered two promoters: T7 and P70a. The promoter function for T7, $u_{T7}$, was given by:

\begin{equation}
u_{T7} = \frac{K_{T7}}{1 + K_{T7}}
\tag{3.4}
\end{equation}

where $K_{T7}$ denotes a T7 RNA polymerase binding constant.
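As a worked illustration of Eqs. 3.2 to 3.4, the following minimal sketch evaluates the T7 promoter function and the resulting transcription rate bound using the T7 entries of Table 3.2. Two points are assumptions rather than statements from the text: the CAT gene length is taken as the sum of the four nucleotide transcription coefficients in Table 3.2, and the elongation rate is converted from nt/s to nt/h. The function names are hypothetical.

```python
def t7_promoter_activity(k_t7):
    """Eq. 3.4: u_T7 = K_T7 / (1 + K_T7)."""
    return k_t7 / (1.0 + k_t7)


def max_transcription_rate(r_t, v_t, l_g, u):
    """Eq. 3.3: V_T^max = R_T * (v_T / l_G) * u(kappa)."""
    return r_t * (v_t / l_g) * u


def transcription_rate(v_max, g_p, k_t):
    """Eq. 3.2: w_T = V_T^max * G_P / (K_T + G_P)."""
    return v_max * g_p / (k_t + g_p)


# Values from Table 3.2 (T7 case).
K_T7 = 10.0              # T7 promoter weight (dimensionless)
R_T = 1000.0             # T7 RNA polymerase concentration, 1.0 uM expressed in nM
V_T = 25.0 * 3600.0      # elongation rate, 25 nt/s converted to nt/h
G_P = 5.0                # gene concentration (nM)
K_T = 116.0              # T7 transcription saturation coefficient (nM)

# Assumption: gene length taken as the sum of the ATP, CTP, GTP and UTP
# transcription coefficients listed for CAT in Table 3.2.
L_G_CAT = 176 + 144 + 151 + 189

u_t7 = t7_promoter_activity(K_T7)
w_t = transcription_rate(max_transcription_rate(R_T, V_T, L_G_CAT, u_t7), G_P, K_T)

if __name__ == "__main__":
    print("u_T7 =", round(u_t7, 2))          # the text quotes approximately 0.91
    print("w_T (nM/h) =", w_t)
```

The same three functions apply to the P70a case once $u(\kappa)$ is replaced by the P70a promoter function and the native RNA polymerase concentration and P70 saturation coefficient are used.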
The P70a promoter function $u_{P70a}$ (which was used for all other proteins) was formulated as:

$$u_{P70a} = \frac{K_1 + K_2 f_{\sigma_{70}}}{1 + K_1 + K_2 f_{\sigma_{70}}} \tag{3.5}$$

where $K_1$ denotes the weight of RNA polymerase binding alone, $K_2$ denotes the weight of RNAP-$\sigma_{70}$ bound to the promoter, and $f_{\sigma_{70}}$ denotes the fraction of the $\sigma_{70}$ transcription factor bound to RNAP, modeled as a Hill function:

$$f_{\sigma_{70}} = \frac{\sigma_{70}^{n}}{K_D^{n} + \sigma_{70}^{n}} \tag{3.6}$$

where $\sigma_{70}$ denotes the sigma-factor 70 concentration, $K_D$ denotes the dissociation constant, and $n$ denotes a cooperativity coefficient. The values for all promoter parameters are given in Table 3.2. The translation rate ($w_X$) was bounded by:

$$0 \leq w_X \leq V_X^{max}\left(\frac{mRNA}{K_X + mRNA}\right) \tag{3.7}$$

where mRNA∗ denotes the steady state mRNA abundance and $K_X$ denotes a translation saturation constant. The maximum translation rate $V_X^{max}$ was formulated as:

$$V_X^{max} \equiv \left[K_P R_X\left(\frac{\dot{v}_X}{l_P}\right)\right] \tag{3.8}$$

The term $K_P$ denotes the polysome amplification constant, $\dot{v}_X$ denotes the ribosome elongation rate (amino acids per hour), and $l_P$ denotes the number of amino acids in the protein of interest. The mRNA abundance was estimated as:

$$mRNA_{t+\Delta t} = mRNA_t + \left(w_T - \lambda\, mRNA_t\right)\Delta t \tag{3.9}$$

where $\lambda$ denotes the mRNA degradation rate (h−1).
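The rate expressions in Eqs. 3.2 to 3.9 can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used in this study: the function names are hypothetical, the parameter values are those listed in Table 3.2 for P70a-driven CAT expression, and the gene and protein lengths are assumptions inferred from the sums of the CAT stoichiometric coefficients in that table (660 nt and 219 amino acids). Concentrations are in nM and rates in nM/h.

```python
"""Illustrative sketch of the transcription and translation bounds (Eqs. 3.2-3.9)."""

SECONDS_PER_HOUR = 3600.0


def hill_fraction(sigma70, k_d, n):
    """Eq. 3.6: fraction of sigma-70 bound to RNA polymerase."""
    return sigma70**n / (k_d**n + sigma70**n)


def u_t7(k_t7):
    """Eq. 3.4: T7 promoter activity (dimensionless, between 0 and 1)."""
    return k_t7 / (1.0 + k_t7)


def u_p70a(k1, k2, f_sigma70):
    """Eq. 3.5: P70a promoter activity (dimensionless, between 0 and 1)."""
    return (k1 + k2 * f_sigma70) / (1.0 + k1 + k2 * f_sigma70)


def transcription_rate(r_t, v_t, l_g, u, g_p, k_t):
    """Eqs. 3.2-3.3: transcription rate w_T."""
    v_max = r_t * (v_t / l_g) * u
    return v_max * g_p / (k_t + g_p)


def translation_upper_bound(k_p, r_x, v_x, l_p, mrna, k_x):
    """Eqs. 3.7-3.8: upper bound on the translation rate w_X."""
    v_max = k_p * r_x * (v_x / l_p)
    return v_max * mrna / (k_x + mrna)


def step_mrna(mrna, w_t, lam, dt):
    """Eq. 3.9: one explicit update of the mRNA abundance."""
    return mrna + (w_t - lam * mrna) * dt


# Assumed lengths, inferred from the CAT coefficients in Table 3.2.
L_G = 176 + 144 + 151 + 189  # nt
L_P = 219                    # amino acids

f = hill_fraction(sigma70=35.0, k_d=130.0, n=1)
u = u_p70a(k1=0.014, k2=10.0, f_sigma70=f)
w_t = transcription_rate(r_t=75.0, v_t=25.0 * SECONDS_PER_HOUR, l_g=L_G,
                         u=u, g_p=5.0, k_t=3.5)

# Iterate Eq. 3.9 until the mRNA level settles (dt = 0.01 h, lambda = 5.2 1/h).
mrna = 0.0
for _ in range(2000):
    mrna = step_mrna(mrna, w_t, lam=5.2, dt=0.01)

w_x_max = translation_upper_bound(k_p=10.0, r_x=1600.0,
                                  v_x=2.0 * SECONDS_PER_HOUR, l_p=L_P,
                                  mrna=mrna, k_x=45000.0)
print(f"u_P70a = {u:.3f}, w_T = {w_t:.1f} nM/h, "
      f"mRNA = {mrna:.1f} nM, w_X upper bound = {w_x_max:.1f} nM/h")
```

In a sequence specific flux balance calculation, `w_t` would fix the transcription flux and `w_x_max` would cap the translation flux in the linear program; the linear program itself is not shown here.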
All translation parameters are given in Table 3.2.

Table 3.2: Parameters for sequence specific flux balance analysis

| Description | Parameter | Value | Units | Reference |
|---|---|---|---|---|
| T7 RNA polymerase concentration | $R_T$ | 1.0 | µM | specified |
| Native RNA polymerase concentration | $R_T$ | 75 | nM | [52] |
| Ribosome concentration | $R_X$ | 1.6 | µM | [52, 175] |
| Transcription elongation rate | $\dot{v}_T$ | 25 | nt/s | [52] |
| Translation elongation rate | $\dot{v}_X$ | 2 | aa/s/ribosome | [52, 175] |
| T7 transcription saturation coefficient | $K_{T7,T}$ | 116 | nM | estimated |
| P70 transcription saturation coefficient | $K_{P70,T}$ | 3.5 | nM | estimated |
| Translation saturation coefficient | $K_X$ | 45.0 | µM | estimated |
| Polysome number | $K_P$ | 10 | ribosome number | estimated |
| mRNA degradation rate constant | $\lambda$ | 5.2 | h−1 | [52] |
| T7 promoter weight | $K_{T7}$ | 10 | constant | estimated |
| Weight RNA polymerase binding alone, P70a | $K_1$ | 0.014 | constant | estimated |
| Weight bound RNAP-σ70, P70a | $K_2$ | 10 | constant | estimated |
| σ70 concentration | $\sigma_{70}$ | 35 | nM | [52] |
| σ70 dissociation constant | $K_D$ | 130 | nM | [119] |
| σ70 Hill coefficient | $n$ | 1 | constant | [119] |
| Gene concentration | $G_P$ | 5 | nM | [52] |
| ATP transcription coefficient (CAT) | $ATP_T$ | 176 | constant | calculated |
| CTP transcription coefficient (CAT) | $CTP_T$ | 144 | constant | calculated |
| GTP transcription coefficient (CAT) | $GTP_T$ | 151 | constant | calculated |
| UTP transcription coefficient (CAT) | $UTP_T$ | 189 | constant | calculated |
| ATP tRNA charging coefficient (CAT) | $ATP_X$ | 219 | constant | calculated |
| GTP translation coefficient (CAT) | $GTP_X$ | 438 | constant | calculated |

3.4.4 Calculation of energy efficiency.

Energy efficiency ($E$) was calculated as the ratio of transcription and translation (weighted by the appropriate energy species coefficients) to ATP generation:

$$E = \frac{w_T \cdot \alpha_T + w_X \cdot \alpha_X}{\sum_{j \in R_{ATP}} \sigma_j^{ATP}\, \bar{w}_j} \tag{3.10}$$

$$\alpha_T = 2 \cdot (ATP_T + CTP_T + GTP_T + UTP_T) \tag{3.11}$$

$$\alpha_X = 2 \cdot ATP_X + GTP_X \tag{3.12}$$

where $\alpha_T$ denotes the energy cost of transcription, $\alpha_X$ denotes the energy cost of translation, $R_{ATP}$ denotes the set of ATP-producing reactions, and $\sigma_j^{ATP}$ denotes the ATP coefficient for reaction $j$.
$ATP_T$, $CTP_T$, $GTP_T$, and $UTP_T$ denote the stoichiometric coefficients of each energy species for the transcription of the protein of interest, while $ATP_X$ and $GTP_X$ denote the stoichiometric coefficients of ATP and GTP for the translation of the protein of interest. During transcription and tRNA charging, triphosphate molecules are consumed with monophosphates as byproducts; this is the reason for the factors of 2 on $ATP_T$, $CTP_T$, $GTP_T$, $UTP_T$, and $ATP_X$.

3.4.5 Quantification of uncertainty.

Experimental factors taken from literature, for example macromolecular concentrations or elongation rates, are uncertain. To quantify the influence of this uncertainty on model performance, we randomly sampled the expected physiological ranges for these parameters as determined from literature. An ensemble of flux distributions was calculated for the three different cases we considered: control (with amino acid synthesis and uptake), amino acid uptake without synthesis, and amino acid synthesis without uptake. The flux ensemble was calculated by randomly sampling the maximum glucose consumption rate within a range of 0 to 30 mM/h (determined from experimental data) and randomly sampling RNA polymerase levels, ribosome levels, and elongation rates in a physiological range determined from literature. P70 RNA polymerase levels were sampled between 60 and 80 nM, T7 RNA polymerase levels between 990 and 1010 nM, ribosome levels between 1.2 and 1.8 µM, the RNA polymerase elongation rate between 20 and 30 nt/s, and the ribosome elongation rate between 1.5 and 3 aa/s [175, 52]. We generated uniform random samples between an upper ($u$) and lower ($l$) parameter bound of the form:

$$p^{*} = l + (u - l) \times U(0, 1) \tag{3.13}$$

3.4.6 Global sensitivity analysis.

We conducted a global sensitivity analysis using the variance-based method of Sobol to estimate which parameters controlled the performance of the cell-free protein synthesis reaction [153].
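As a brief aside on the uniform sampling of Eq. 3.13 used for the uncertainty ensemble, a minimal Python sketch is given below. The dictionary keys and function names are hypothetical; only the numerical ranges come from the text above. Each sampled parameter set would then be passed to the flux balance calculation, which is not shown.

```python
import random

# Sampling ranges quoted in the text, as (lower, upper) bounds.
# The key names are illustrative and are not identifiers from the original model.
RANGES = {
    "glucose_uptake_max_mM_per_h": (0.0, 30.0),
    "rnap_p70_nM": (60.0, 80.0),
    "rnap_t7_nM": (990.0, 1010.0),
    "ribosome_uM": (1.2, 1.8),
    "rnap_elongation_nt_per_s": (20.0, 30.0),
    "ribosome_elongation_aa_per_s": (1.5, 3.0),
}


def sample_parameters(ranges, rng):
    """Draw one parameter set using p* = l + (u - l) * U(0, 1) (Eq. 3.13)."""
    return {name: low + (high - low) * rng.random()
            for name, (low, high) in ranges.items()}


def sample_ensemble(ranges, n_samples, seed=0):
    """Draw n_samples independent parameter sets with a reproducible seed."""
    rng = random.Random(seed)
    return [sample_parameters(ranges, rng) for _ in range(n_samples)]


ensemble = sample_ensemble(RANGES, n_samples=100)
print(ensemble[0])
```

Seeding the generator keeps the ensemble reproducible, which is convenient when comparing the three amino acid cases on the same parameter draws.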
We computed the total sensitivity index of each parameter relative to two performance objectives: productivity of the protein of interest and energy efficiency. We established the sampling bounds for each parameter from literature. We used the sampling method of Saltelli et al. [145] to compute a family of $N(2d + 2)$ parameter sets which obeyed our parameter ranges, where $N$ was a parameter proportional to the desired number of model evaluations and $d$ was the number of parameters in the model. In our case, $N = 1000$ and $d = 7$, so the total sensitivity indices were computed from 16,000 model evaluations. The variance-based sensitivity analysis was conducted using the SALib module for the Python programming language [65].

3.4.7 Potential alternative optimal metabolic flux solutions.

We identified potential alternative optimal flux distributions by performing single and pairwise reaction group knockout simulations. Reaction group knockouts were simulated by setting the flux bounds for all the reactions involved in a group to zero and then maximizing the translation rate. We grouped reactions in the cell-free network into 19 subgroups [181]. We computed the difference ($\ell_2$-norm) for CAT productivity in the presence and absence of pairwise reaction knockouts. Simultaneously, we computed the difference in the flux distribution ($\ell_2$-norm) for each pairwise reaction knockout compared to the flux distribution with no knockouts. Those solutions with the same or similar productivity but large changes in the metabolic flux distribution represent alternative optimal solutions.

3.5 Acknowledgements

This study was supported by the National Science Foundation (MCB-1411715) and the National Science Foundation Graduate Research Fellowship (DGE-1333468) to N.H. This study was also supported by an award from the US Army and Systems Biology of Trauma Induced Coagulopathy (W911NF-10-1-0376) to J.V. for the support of M.V.
Lastly, this work was also supported by the Center on the Physics of Cancer Metabolism through Award Number 1U54CA210184-01 from the National Cancer Institute. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Cancer Institute or the National Institutes of Health.

CHAPTER 4

ABSOLUTE QUANTIFICATION OF CELL-FREE PROTEIN SYNTHESIS METABOLISM BY REVERSED-PHASE LIQUID CHROMATOGRAPHY-MASS SPECTROMETRY

4.1 Abstract¹

¹Adapted with permission from Vilkhovoy M, Dai D, Vadhin S, Abhinav A, and Varner JD, "Absolute quantification of cell-free protein synthesis metabolism by reversed-phase liquid chromatography-mass spectrometry" (2019) Journal of Visualized Experiments.

Cell-free protein synthesis (CFPS) is a widely used research tool in systems and synthetic biology; however, if CFPS is to become a mainstream technology for applications such as point-of-care manufacturing, we must understand the performance limits of these systems. Toward this question, we developed a robust protocol to quantify 40 compounds involved in glycolysis, the pentose phosphate pathway, the tricarboxylic acid cycle, energy metabolism and cofactor regeneration in CFPS reactions. The method uses internal standards tagged with 13C-aniline, while compounds in the sample are derivatized with 12C-aniline. The internal standards and sample were mixed and analyzed by reversed-phase liquid chromatography-mass spectrometry (LC/MS). The co-elution of compounds eliminated ion suppression, allowing the accurate quantification of metabolite concentrations over 2-3 orders of magnitude, where the average correlation coefficient was 0.988. Five of the forty compounds were not tagged with aniline; however, they were still detected in the CFPS sample and quantified with a standard curve method. The chromatographic run takes approximately 10 minutes to complete.
In summary, we developed a fast, robust method to separate and accurately quantify 40 compounds involved in CFPS in a single LC/MS run. Taken together, the method is a robust and accurate approach to characterize cell free metabolism, so that ultimately, we can understand and improve the yield, productivity and energy efficiency of cell free systems.

4.2 Introduction

Cell-free protein synthesis has become a widely used tool in systems and synthetic biology, and a promising technology for point-of-use manufacturing of biomolecules. Cell-free systems offer many advantages compared to in vivo processes, such as direct access to metabolites and the biosynthetic machinery without the interference of a cell wall or the complications associated with cell growth [70]. However, a fundamental understanding of the performance limits of cell free processes has been lacking. High-throughput methods for metabolite quantification are valuable because they help characterize metabolism, improve our understanding of these systems, and are critical to the construction of robust computational models of metabolism useful in process optimization [181, 180, 71]. Common methods used to determine metabolite concentrations include nuclear magnetic resonance (NMR), Fourier transform-infrared spectroscopy (FT-IR), enzyme-based assays, and mass spectrometry (MS) [61, 36, 144, 170]. However, these methods are often limited by sample size requirements and by their inability to efficiently measure multiple compounds at once. For example, enzyme-based assays can often only be used to quantify a single compound in a run, and are limited when the sample size is small, such as in cell-free protein synthesis reactions (typically run on a 10-15 µL scale). Meanwhile, NMR requires a high abundance of metabolites for detection and quantification [36].
Toward these shortcomings, chromatography methods in tandem with mass spectrometry (LC/MS) provide several advantages, including sensitivity and the capability of measuring multiple species simultaneously [40]; however, the analytical complexity increases considerably with the number and diversity of species being measured. It is important, therefore, to develop methods that fully realize the high-throughput potential of LC/MS systems. Compounds in a sample are separated by liquid chromatography and identified through mass spectrometry. The signal of the compound depends on its concentration and ionization efficiency, where the ionization can vary between compounds and may also depend on the sample matrix. Achieving the same ionization efficiency between the sample and standards is a challenge to using LC/MS to quantify analytes. Further, quantification becomes more challenging with metabolite diversity due to signal splitting and heterogeneity in proton affinity and polarity [75]. Lastly, the co-eluting matrix of the sample can also affect the ionization efficiencies of the compounds. To address these issues, metabolites can be chemically derivatized, increasing the separation resolution, and the sensitivity and detection by the LC/MS system, while simultaneously decreasing signal splitting in some cases [75, 74]. Chemical derivatization works by tagging specific functional groups of metabolites to adjust their physical properties, such as charge or hydrophobicity, to increase ionization efficiency [74]. Various tagging agents can be used to target different functional groups like amines, hydroxyls, phosphates, carboxylic acids, etc. Aniline, one such derivatization agent, targets multiple functional groups at once, and adds a hydrophobic component into hydrophilic molecules, increasing their separation resolution and signal [191].
To address the co-eluting matrix ion suppression effect, Yang and coworkers developed a technique based on Group Specific Internal Standard Technology (GSIST) labeling where standards are tagged with 13C aniline isotopes and mixed with the sample [191, 77]. The metabolite and corresponding internal standard have the same ionization efficiency since they co-elute, and their intensity ratio can be used to quantify the concentration in the experimental sample. In this study, we developed a protocol to detect and quantify 40 compounds involved in glycolysis, the pentose phosphate pathway, the tricarboxylic acid cycle, energy metabolism and cofactor regeneration in cell-free protein synthesis reactions. The method is based on the GSIST approach, where we used 12C-aniline and 13C-aniline to tag, detect, and quantify metabolites using reversed-phase LC/MS. The linear range of all compounds spanned 2-3 orders of magnitude with an average correlation coefficient of 0.988. In conjunction, we used a commercially available method by Waters to tag, detect and separate all 20 amino acids in the cell-free extract. This method had a linear range of 2 orders of magnitude and an average correlation coefficient of 0.999. Thus, the method is a robust and accurate approach to interrogate cell free metabolism, and possibly whole-cell extracts.

[Figure 4.1 (schematic): the CFPS sample is de-proteinized and labeled with 12C-aniline; the standards are labeled with 13C-aniline; equal volumes of 12C sample and 13C standards are combined and analyzed by LC-MS (intensity versus time and m/z); $C_x = (A_x / A_{std})\, C_{std}$.]

Figure 4.1: Schematic of workflow for aniline tagging. The cell-free protein synthesis reaction is de-proteinized and tagged with 12C-aniline, while a standard stock mixture is tagged with 13C-aniline. Both mixtures are then mixed at a 1:1 volumetric ratio and analyzed by LC/MS.

4.3 Results

4.3.1 Aniline tagged metabolites

As a proof-of-concept, we used the protocol to quantify metabolites in myTXTL, a commercially available E. coli based CFPS system (Arbor Biosciences) expressing green fluorescent protein (GFP). The CFPS reaction (14 µL) was quenched and de-proteinized with ethanol. The CFPS sample was then tagged with 12C-aniline, while standards were tagged with 13C-aniline. The tagged sample and standards were then combined and injected into the LC/MS (Fig. 4.1). The protocol detected and quantified 40 metabolites involved in central carbon and energy metabolism: 35 were quantified using internal standards, while the 5 metabolites that were not tagged with aniline were quantified using a standard curve (Fig. 4.2 and Table 4.1). The metabolites involved in these pathways were chemically diverse and included phosphorylated sugars, phosphocarboxylic acids, carboxylic acids, nucleotides, and cofactors. The derivatization with aniline introduced a hydrophobic moiety into hydrophilic molecules, which facilitated more effective separation using reversed-phase chromatography [191]. In addition, the method enabled the separation of structural isomer pairs such as glucose 6-phosphate and fructose 6-phosphate in a single LC/MS run. Each compound's mass over charge (m/z) ratio and retention time were identified prior to the experiment by injecting 1 mM of one compound at a time and comparing the mass spectrum to the blank (Table 4.2).

[Figure 4.2 (chromatogram): intensity (A.U.) versus time (min), 3.75 to 10.50 min; peaks are numbered 1-40 as in Tables 4.1 and 4.2.]

Figure 4.2: Mass chromatogram from a single LC/MS run of a 40 µM standard mixture of 40 metabolites. Peaks were identified by their retention time and m/z values for each compound.
Complete compound names and their abbreviations are listed in Table 4.1.

The limit of detection and range of linearity for all compounds were estimated by producing a standard curve that ranged from 0.10 µM to 400 µM (Table 4.1). The average correlation coefficient (R²) for all compounds was 0.988 and most compounds had a linear range of 3 orders of magnitude. Three compounds had notable saturation effects, especially alpha-ketoglutarate, which had a linear range from 0.1 µM to 25 µM. Isocitrate and citrate also had saturation effects above 100 µM.

Table 4.1: Each compound's corresponding limit of detection, range of linearity and correlation coefficient identified from standard curves.

| Peak | Metabolite | Abbreviation | KEGG ID | Limit of Detection (µM) | Limit of Linear Range (µM) | R² |
|---|---|---|---|---|---|---|
| 1 | Glycerol 3-phosphate | Gly3P | C00093 | 0.1 | 400 | 0.995 |
| 2 | Nicotinamide adenine dinucleotide | NAD | C00003 | 0.39 | 400 | 0.993 |
| 3 | Glucose | GLC | C00031 | 0.1 | 400 | 0.997 |
| 4 | Sedoheptulose 7-phosphate | S7P | C05382 | 0.16 | 400 | 0.988 |
| 5 | Fructose 6-phosphate | F6P | C00085 | 0.1 | 400 | 0.986 |
| 6 | Guanosine monophosphate | GMP | C00144 | 0.39 | 100 | 0.992 |
| 7 | Ribulose 5-phosphate | RL5P | C00199 | 0.39 | 400 | 0.996 |
| 8 | Cytidine monophosphate | CMP | C00055 | 0.1 | 100 | 0.992 |
| 9 | Lactate | LAC | C00186 | 0.1 | 400 | 0.988 |
| 10 | Adenosine monophosphate | AMP | C00020 | 0.1 | 100 | 0.992 |
| 11 | Uridine monophosphate | UMP | C00105 | 0.1 | 100 | 0.997 |
| 12 | Nicotinamide adenine dinucleotide phosphate | NADP | C00006 | 0.34 | 400 | 0.950 |
| 13 | 3-Phosphoglyceric acid | 3PG | C00197 | 0.1 | 100 | 0.996 |
| 14 | Cytidine diphosphate | CDP | C00112 | 0.39 | 400 | 0.997 |
| 15 | Guanosine diphosphate | GDP | C00035 | 1.5625 | 400 | 0.984 |
| 16 | Adenosine diphosphate | ADP | C00008 | 0.39 | 400 | 0.995 |
| 17 | Uridine diphosphate | UDP | C00015 | 0.39 | 400 | 0.991 |
| 18 | Flavin adenine dinucleotide | FAD | C00016 | 0.1 | 400 | 0.958 |
| 19 | Fructose 1,6-bisphosphate | F16P | C05378 | 0.39 | 400 | 0.989 |
| 20 | Gluconate 6-phosphate | 6PG | C00345 | 0.39 | 400 | 0.989 |
| 21 | Nicotinamide adenine dinucleotide reduced | NADH | C00004 | 0.39 | 100 | 0.972 |
| 22 | Glucose 6-phosphate | G6P | C00668 | 0.1 | 400 | 0.984 |
| 23 | Ribose 5-phosphate | R5P | C00117 | 0.39 | 100 | 0.999 |
| 24 | Erythrose 4-phosphate | E4P | C00279 | 0.39 | 400 | 0.979 |
| 25 | Cytidine triphosphate | CTP | C00063 | 6.25 | 100 | 0.998 |
| 26 | Guanosine triphosphate | GTP | C00044 | 6.25 | 100 | 0.993 |
| 27 | Oxalacetate | OAA | C00036 | 0.56 | 400 | 0.997 |
| 28 | Alpha-ketoglutarate | aKG | C00026 | 0.1 | 25 | 0.979 |
| 29 | Uridine triphosphate | UTP | C00075 | 1.5625 | 400 | 0.998 |
| 30 | Adenosine triphosphate | ATP | C00002 | 1.5625 | 400 | 0.991 |
| 31 | Fumarate | FUM | C00122 | 1.5625 | 100 | 0.999 |
| 32 | Pyruvate | PYR | C00022 | 0.39 | 400 | 0.993 |
| 33 | Malate | MAL | C00149 | 0.1 | 400 | 0.991 |
| 34 | D-glyceraldehyde 3-phosphate | GAP | C00118 | 0.1 | 100 | 0.974 |
| 35 | Acetyl-coenzyme A | ACA | C00024 | 0.1 | 100 | 0.991 |
| 36 | Nicotinamide adenine dinucleotide phosphate reduced | NADPH | C00005 | 0.14 | 100 | 0.990 |
| 37 | Phosphoenolpyruvate | PEP | C00074 | 0.1 | 100 | 0.962 |
| 38 | Succinate | SUCC | C00042 | 0.1 | 320 | 0.999 |
| 39 | Isocitrate | ICIT | C00311 | 0.39 | 100 | 0.998 |
| 40 | Citrate | CIT | C00158 | 0.1 | 100 | 0.981 |

Table 4.2: Each compound's corresponding peak number, retention time, m/z value for 12C, 13C, and unlabeled, cone voltage (CV), and MS species. A represents the aniline group under MS Species.

| Peak | Metabolite | KEGG ID | Retention Time (min) | 12C m/z | 13C m/z | Nonlabel m/z | CV | MS Species |
|---|---|---|---|---|---|---|---|---|
| 1 | Gly3P | C00093 | 3.85 | | | 153 | 10 | M - H2O - H |
| 2 | NAD | C00003 | 3.96 | | | 698 | 10 | M + Cl - H |
| 3 | GLC | C00031 | 4.06 | 289.9 | 296 | | 15 | M + A + Cl - H |
| 4 | S7P | C05382 | 5.41 | 364 | 370 | | 10 | M + A - H |
| 5 | F6P | C00085 | 5.48 | 334 | 340 | | 10 | M + A - H |
| 6 | GMP | C00144 | 5.57 | 437.05 | 443 | | 10 | M + A - H |
| 7 | RL5P | C00199 | 5.58 | 304 | 310 | | 10 | M + A - H |
| 8 | CMP | C00055 | 5.59 | 397.09 | 403 | | 10 | M + A - H |
| 9 | LAC | C00186 | 5.77 | 164.05 | 170 | | 10 | M + A - H |
| 10 | AMP | C00020 | 5.85 | 421.1 | 427.1 | | 10 | M + A - H |
| 11 | UMP | C00105 | 5.88 | 398.07 | 404 | | 10 | M + A - H |
| 12 | NADP | C00006 | 6.39 | | | 724 | 10 | M - H2O - H |
| 13 | 3PG | C00197 | 6.63 | 242 | 248.06 | | 15 | M + A - H2O - H |
| 14 | CDP | C00112 | 6.72 | 477 | 483 | | 10 | M + A - H |
| 15 | GDP | C00035 | 6.87 | 517 | 523 | | 10 | M + A - H |
| 16 | ADP | C00008 | 6.94 | 501 | 507 | | 10 | M + A - H |
| 17 | UDP | C00015 | 6.97 | 478 | 484 | | 10 | M + A - H |
| 18 | FAD | C00016 | 7.03 | | | 784.15 | 15 | M - H |
| 19 | F16P | C05378 | 7.1 | 395.95 | 402.1 | | 10 | M + A - H2O - H |
| 20 | 6PG | C00345 | 7.11 | 425.1 | 437 | | 10 | M + 2A - H |
| 21 | NADH | C00004 | 7.23 | 633.13 | 639.08 | | 10 | M + A + H2O - nicotinamide - H |
| 22 | G6P | C00668 | 7.32 | 409.1 | 421.1 | | 10 | M + 2A - H |
| 23 | R5P | C00117 | 7.54 | 379.1 | 391.1 | | 15 | M + 2A - H |
| 24 | E4P | C00279 | 7.71 | 348.9 | 361 | | 10 | M + 2A - H |
| 25 | CTP | C00063 | 7.84 | 557 | 563 | | 5 | M + A - H |
| 26 | GTP | C00044 | 7.93 | 597 | 603 | | 5 | M + A - H |
| 27 | OAA | C00036 | 7.94 | 281 | 293 | | 25 | M + 2A - H |
| 28 | aKG | C00026 | 7.95 | 295 | 307.1 | | 15 | M + 2A - H |
| 29 | UTP | C00075 | 7.97 | 558 | 564 | | 10 | M + A - H |
| 30 | ATP | C00002 | 8.03 | 581 | 587 | | 15 | M + A - H |
| 31 | FUM | C00122 | 8.09 | 265 | 277.1 | | 10 | M + 2A - H |
| 32 | PYR | C00022 | 8.09 | 162 | 168 | | 25 | M + A - H |
| 33 | MAL | C00149 | 8.09 | 283.06 | 295.15 | | 10 | M + 2A - H |
| 34 | GAP | C00118 | 8.09 | 319 | 331.1 | | 5 | M + 2A - H |
| 35 | ACA | C00024 | 8.16 | | | 790 | 10 | M - H2O - H |
| 36 | NADPH | C00005 | 8.23 | 694.92 | 700.82 | | 10 | M + A - nicotinamide - H |
| 37 | PEP | C00074 | 8.28 | 317 | 329.1 | | 20 | M + 2A - H |
| 38 | SUCC | C00042 | 8.64 | 267.07 | 279.1 | | 15 | M + 2A - H |
| 39 | ICIT | C00311 | 10.13 | 398 | 416 | | 10 | M + 3A - H2O - H |
| 40 | CIT | C00158 | 10.46 | 416.1 | 434.06 | | 20 | M + 3A - H |

4.3.2 Amino Acid Analysis

As a proof-of-concept, we applied a commercially available protocol (Waters Corp.) to quantify amino acids in myTXTL, a commercially available E. coli based CFPS system (Arbor Biosciences) expressing green fluorescent protein (GFP). The CFPS reaction (14 µL) was quenched and de-proteinized with ethanol. The de-proteinized sample was then tagged with the AccQ-Tag Ultra Derivatization Kit (Waters Corp.), separated by reversed-phase liquid chromatography and detected with a TUV detector at 260 nm (Fig. 4.3). The AccQ-Tag amino acid hydrolysate standard contained 17 of the 20 amino acids. The stock mixture was supplemented with the three missing amino acids (L-glutamine, L-asparagine, and L-tryptophan) at the same concentration as the other amino acids. The limit of detection and the limit of the linear range were 0.781 µM and 50 µM, respectively, with an average correlation coefficient of 0.999 (Table 4.3). The only exception was L-cysteine, which had a linear range of 0.391 to 25 µM with a correlation coefficient of 0.999. L-cysteine had a lower limit of linear range since its concentration was half that of the other amino acids in the amino acid hydrolysate standard mixture. Amino acids in the sample were identified by comparing their retention times to the standard and were quantified by the standard curve method.

[Figure 4.3 (chromatogram): intensity (A.U.) versus time (min), 2.00 to 8.00 min; labeled peaks include NH3, the derivatization peak, and the 20 amino acids.]

Figure 4.3: Amino acid chromatogram tagged and separated by reversed-phase liquid chromatography and detected with a TUV detector at 260 nm. Peaks were identified by their retention time.

Table 4.3: Each amino acid's retention time separated by reversed-phase liquid chromatography and detected by TUV at 260 nm with the corresponding limit of detection, linear range, and correlation coefficient.

| Amino Acid | Abbreviation | KEGG ID | Retention Time (min) | Limit of Detection (µM) | Limit of Linear Range (µM) | R² |
|---|---|---|---|---|---|---|
| L-histidine | His | C00135 | 2.565 | 0.781 | 50 | 0.999 |
| L-asparagine | Asn | C00152 | 2.893 | 0.781 | 50 | 0.999 |
| L-serine | Ser | C00065 | 3.694 | 0.781 | 50 | 0.999 |
| L-glutamine | Gln | C00064 | 3.788 | 0.781 | 50 | 0.999 |
| L-arginine | Arg | C00062 | 3.92 | 0.781 | 50 | 0.999 |
| L-glycine | Gly | C00037 | 4.082 | 0.781 | 50 | 0.999 |
| L-aspartate | Asp | C00049 | 4.500 | 0.781 | 50 | 0.999 |
| L-glutamate | Glu | C00025 | 5.009 | 0.781 | 50 | 0.999 |
| L-threonine | Thr | C00188 | 5.363 | 0.781 | 50 | 0.999 |
| L-alanine | Ala | C00041 | 5.834 | 0.781 | 50 | 0.999 |
| L-proline | Pro | C00148 | 6.419 | 0.781 | 50 | 0.999 |
| L-cysteine | Cys | C00097 | 7.192 | 0.391 | 25 | 0.999 |
| L-lysine | Lys | C00047 | 7.250 | 0.781 | 50 | 0.999 |
| L-tyrosine | Tyr | C00082 | 7.501 | 0.781 | 50 | 0.999 |
| L-methionine | Met | C00073 | 7.611 | 0.781 | 50 | 0.999 |
| L-valine | Val | C00183 | 7.680 | 0.781 | 50 | 0.999 |
| L-isoleucine | Ile | C00407 | 8.340 | 0.781 | 50 | 0.999 |
| L-leucine | Leu | C00123 | 8.438 | 0.781 | 50 | 0.999 |
| L-phenylalanine | Phe | C00079 | 8.573 | 0.781 | 50 | 0.999 |
| L-tryptophan | Trp | C00078 | 8.629 | 0.781 | 50 | 0.999 |

4.3.3 Nucleotide charged sugars

We developed a protocol for the detection and quantification of five nucleotide charged sugars (Fig. 4.4). Nucleotide charged sugars are important precursors for glycoproteins, which are products of interest for CFPS [78]. The retention time and mass over charge ratio for each compound were determined individually from standards. Over the range from 0.2 to 20 µM, the correlation coefficient was 0.999 for all compounds except UDP-Hex, which had a correlation coefficient of 0.996 (Table 4.4). Three of the five nucleotide sugars (CMP-Sialic Acid, GDP-D-Mannose, and UDP-a-D-Galactose) had unique mass over charge ratios that allowed for their detection and quantification. In contrast, UDP-N-acetyl-D-glucosamine and UDP-N-acetyl-D-galactosamine had the same retention time of 1.671 minutes and the same m/z of 606.0, thus they could not be distinguished for individual quantification. For this reason, the two compounds were mixed at a 1:1 ratio to be used for quantification in biological samples. This protocol has been used to determine the concentrations of the nucleotide sugars in mammalian cell lines (intracellular levels) and in E. coli lysate (data not shown).

Table 4.4: Each compound's retention time and mass over charge ratio with the corresponding limit of detection, linear range, and correlation coefficient.

| Nucleotide Sugar | Abbreviation | Retention Time (min) | m/z | Limit of Detection (µM) | Limit of Linear Range (µM) | R² |
|---|---|---|---|---|---|---|
| CMP-Sialic Acid | CMP-Neu5AC | 1.562 | 613.10 | 0.2 | 20 | 0.999 |
| GDP-D-Mannose | GDP-D-Man | 1.656 | 604.01 | 0.2 | 20 | 0.999 |
| UDP-a-D-Galactose | UDP-a-D-Gal | 1.670 | 564.96 | 0.2 | 20 | 0.999 |
| UDP-N-acetyl-D-glucosamine/galactosamine | UDP-Hex | 1.671 | 606.00 | 0.2 | 20 | 0.996 |

[Figure 4.4 (chromatogram): intensity (A.U.) versus time (minutes), 0.00 to 3.60 min; labeled peaks: CMP-Neu5Ac, GDP-D-Man, UDP-a-D-Gal, UDP-Hex.]

Figure 4.4: Nucleotide charged sugars chromatogram separated by reversed-phase liquid chromatography and detected by mass spectrometry according to each compound's mass over charge ratio.
Peaks were identified by their retention time and selective ion recording.

4.4 Discussion

Cell-free systems have no cell wall, thus there is direct access to metabolites and the biosynthetic machinery without the need for complex sample preparation. However, despite this, very little work has been done to develop thorough and robust protocols to quantitatively interrogate cell-free reaction systems. In this study, we developed a fast, robust method to quantify metabolites in cell-free reaction mixtures and potentially in whole-cell extracts. Individual quantification of metabolites in complex mixtures, such as those found in cell-free reactions or whole-cell extracts, is challenging for several reasons. Central amongst these reasons is chemical diversity. The array of functional groups simultaneously present in these mixtures, such as carboxylic acids, amines, phosphates, hydroxyls, etc., greatly increases the analytical complexity. To circumvent this, we used an aniline derivatization method in combination with 13C internal standards to introduce hydrophobic components to the metabolite mixtures. Using this method, we robustly detected and quantified 40 metabolites in a cell-free reaction in a single LC/MS run. While we demonstrated this technique in a cell-free reaction mixture, it could also likely be applied to whole-cell extracts, thus potentially allowing the absolute quantification of intracellular metabolite concentrations. The latter application has relevance to a variety of important questions in biotechnology and human health. The method presented here was based on a previous technique (GSIST) that was applied to whole-cell extracts of the yeast S. cerevisiae [191, 77]. In this study, we expanded the set of compounds that could be detected and quantified to include all 12 nucleotides (xMP, xDP, xTP, where x is A, C, G and U). Addition of these compounds could have important biological implications.
For example, these nucleotides are heavily involved in transcription and translation, which are among the central processes of interest in CFPS applications, and more generally the compounds are important in a variety of physiological functions. In addition, we were able to detect acetic acid, which is an important metabolite when examining overflow metabolism. However, we did not include it in the study because there was a significant reduction of signal in multiple compounds, especially NADH and NADPH, when acetic acid was added to the standard mixture. Acetic acid had a high limit of detection of 612 µM, thus at these high levels it had a negative effect on the other metabolites' signals. Despite this, acetic acid can still be detected and quantified in samples by creating a standard curve with just acetic acid in the vial. Acetic acid had an m/z value of 134.0, a retention time of 5.78 minutes, and a linear range from 612 µM to 5000 µM (R² = 0.986) when tagged with 12C-aniline. The remaining metabolites did not alter each other's ion signal and represent a comprehensive mixture to characterize CFPS metabolism.

Taken together, we developed a fast, robust protocol for the characterization and absolute quantification of 40 compounds involved in glycolysis, the pentose phosphate pathway, the tricarboxylic acid cycle, energy metabolism and cofactor regeneration in CFPS reactions. The method relied on internal standards tagged with 13C-aniline, while the sample was tagged with 12C-aniline. The internal standards and sample compounds co-eluted, which eliminated ion-suppression effects and enabled accurate quantification of individual metabolites in complex metabolite mixtures. We identified a total of 40 compounds (41, if including acetic acid) that can be detected and quantified in a cell-free reaction mixture; however, the list of metabolites could be further expanded and adjusted towards the particular biochemical process of interest.
Thus, the method provides a robust and accurate approach to characterize cell free metabolism, which is potentially critical to improving the yield, productivity and energy efficiency of cell free processes.

4.5 Materials and Methods

4.5.1 Aniline derivatization

Materials and Reagents: All metabolite standards, aniline, N-(3-dimethylaminopropyl)-N'-ethylcarbodiimide hydrochloride (EDC), tributylamine (TBA), triethylamine (TEA), HPLC grade acetonitrile, and HPLC grade water were purchased from Sigma-Aldrich (St. Louis, MO). Sedoheptulose 7-phosphate was purchased from Carbosynth (Compton, UK). All materials and equipment are listed in Table A.1 in the appendix.

LC-MS: The UPLC-ESI-MS system consisted of a UPLC system (Acquity H-Class, Waters) and an electrospray ionization (ESI) source mass spectrometer (QDa detector, Waters). The system was controlled by Empower 3 software (Waters). The autosampler was set at 10 °C. Separations were performed on an Acquity BEH C18 Column (1.7 µm, 2.1 mm x 150 mm, Waters). The elution started at 95% mobile phase A (5 mM TBA aqueous solution, adjusted to pH 4.75 with acetic acid) and 5% mobile phase B (5 mM TBA in acetonitrile), was raised to 70% B over 10 minutes, further raised to 100% B over 2 minutes, held at 100% B for 3 minutes, returned to initial conditions over 1 minute, and held for 9 minutes to re-equilibrate the column. The flow rate was set at 0.3 mL/min with an injection volume of 5 µL. The column was preconditioned by pumping the starting mobile phase mixture for 10 minutes, followed by running the gradient protocol specified above 3 times prior to any injections. LC-ESI-MS chromatograms were acquired in negative ion mode under the following conditions: capillary voltage of 10 V, dry temperature at 520 °C, and an acquisition range of m/z 100-800. Selected ion recordings were specified for each metabolite and are listed in Table 4.2.
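The gradient program described above can be written down as a small time table, which makes the full cycle time explicit. The sketch below is illustrative only: it assumes linear ramps between the stated set points (the curve shape is not specified in the text), and the names `GRADIENT` and `percent_b` are hypothetical.

```python
# Gradient set points as (time in minutes, percent mobile phase B),
# transcribed from the LC-MS description: 5% B at the start, 70% B at 10 min,
# 100% B at 12 min, hold to 15 min, back to 5% B at 16 min, hold to 25 min.
GRADIENT = [
    (0.0, 5.0),
    (10.0, 70.0),
    (12.0, 100.0),
    (15.0, 100.0),
    (16.0, 5.0),
    (25.0, 5.0),
]


def percent_b(t_min, program=GRADIENT):
    """Return %B at time t_min, assuming linear ramps between set points."""
    if t_min < program[0][0] or t_min > program[-1][0]:
        raise ValueError("time is outside the gradient program")
    for (t0, b0), (t1, b1) in zip(program, program[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise ValueError("gradient program has a gap")


total_cycle_min = GRADIENT[-1][0]
print(f"%B at 11 min: {percent_b(11.0):.1f}; full cycle: {total_cycle_min:.0f} min")
```

Under these assumptions, one injection occupies the instrument for 25 minutes including re-equilibration, even though the analytes elute within roughly the first 10.5 minutes.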
Labeling protocol: A solution of 6.0 M 12C-aniline was prepared by combining 550 µL of aniline with 337.5 µL of water and 112.5 µL of 12 M hydrochloric acid and vortexing. A solution of 6.0 M 13C-aniline was prepared by combining 250 mg of 13C-aniline with 132 µL of water and 44 µL of 12 M hydrochloric acid and vortexing. Aniline solutions can be stored at 4 °C for up to 2 months. EDC at 200.0 mg/mL was prepared freshly in HPLC grade water. A 50 µL solution of 35 standards was prepared in water at 40 µM. To this standard solution, 5 µL of 13C-aniline was added, followed by 5 µL of 200 mg/mL EDC. The CFPS sample was de-proteinized by the addition of 100% ice-cold ethanol at a 1:1 volumetric ratio and centrifuged at 12,000 x g for 15 minutes at 4 °C. The supernatant was transferred into a new centrifuge tube and 6 µL was used for aniline tagging. The volume was brought up to 50 µL with water, and 5 µL of 12C-aniline and 5 µL of 200 mg/mL EDC were added to the reaction. Both sample and standard mixtures were vortexed with gentle shaking at ambient temperature (approximately 22 °C) for 2 h. The labeling reaction was stopped by the addition of 1.5 µL of triethylamine. The mixture was centrifuged at 13,500 x g for 3 minutes. The supernatants of the sample and the standard were combined at a 1:1 volumetric ratio in an autosampler vial for injection into the LC-MS. The combined mixture was injected at 5 µL and the m/z values of the 12C-aniline tagged compounds were recorded. The mixture was injected again at the same volume and the m/z values of the 13C-aniline tagged compounds were recorded (Table 4.2). The QDa detector cannot record both the 12C and 13C m/z values in a single run because it cannot handle that amount of data acquisition. Thus, the mixture is injected twice, first to record the sample intensities and then to record the standard intensities.
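The internal-standard quantification used with this protocol (Eq. 4.1 in this section) amounts to scaling the ratio of the 12C and 13C peak areas by the standard concentration and a dilution factor. The following is a minimal sketch, not the analysis code used in this work. The function name, the peak areas, and the dilution factor are illustrative assumptions; in particular, the dilution factor is inferred here from the sample preparation steps (1:1 ethanol de-proteinization, then 6 µL brought up to 50 µL) and assumes the standard concentration refers to the 40 µM standard solution.

```python
def quantify_internal_standard(area_sample, area_standard, conc_standard_uM, dilution_factor):
    """Eq. 4.1: C_x = (A_x / A_std) * C_std * D, returned in uM."""
    if area_standard <= 0:
        raise ValueError("internal-standard peak area must be positive")
    return (area_sample / area_standard) * conc_standard_uM * dilution_factor


# Assumed dilution factor: ethanol de-proteinization (x2),
# then 6 uL of supernatant brought up to 50 uL (x 50/6).
dilution = 2.0 * (50.0 / 6.0)

# Hypothetical peak areas for one metabolite (illustrative, not measured data).
conc_uM = quantify_internal_standard(
    area_sample=1.2e5,       # 12C-aniline tagged sample peak
    area_standard=2.0e5,     # 13C-aniline tagged internal standard peak
    conc_standard_uM=40.0,   # standard concentration stated in the protocol
    dilution_factor=dilution,
)
print(f"{conc_uM:.0f} uM in the original CFPS reaction")
```

Because the sample and standard are tagged and diluted identically before being combined 1:1, only the pre-tagging dilution of the sample enters the assumed factor.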
Standard curve preparation: Prepare a series of dilutions in water of the untagged metabolites (NAD, NADP, FAD, acetyl-CoA and glycerol 3-phosphate) ranging from 0.4 to 400 µM with a volume of 50 µL. Add 5 µL of 12C-aniline and 5 µL of 200 mg/mL EDC and vortex at room temperature for 2 hours. Add 1.5 µL of triethylamine and centrifuge at 13,500 x g for 3 minutes. Transfer the standard into an autosampler vial and inject into the LC-MS. The untagged metabolites follow the same procedure as the sample to replicate the sample matrix and maintain a similar ionization efficiency.

Quantification of metabolites: The mass-chromatogram peak for each metabolite is integrated and the area is used to quantify the amount in the sample by the following equation:

$$C_{x,i} = \frac{A_{x,i}}{A_{\mathrm{std},i}}\, C_{\mathrm{std},i}\, D \qquad (4.1)$$

where C_{x,i} is the concentration of the unknown sample for metabolite i, A_{x,i} is the integrated area of the unknown metabolite i, A_{std,i} is the integrated area of the internal standard of metabolite i, C_{std,i} is the concentration of the internal standard of metabolite i, and D is the dilution factor.

Untagged metabolites are quantified by the standard curve method, where the integrated area of a standard is associated with its known concentration. A standard curve is developed from the series of concentrations and is used to quantify the unknown amounts in the sample.

4.5.2 Amino acid derivatization

Amino acid labeling protocol: A solution containing a mixture of 20 amino acids is derivatized with a Waters AccQ-Tag Ultra amino acid analysis kit (Waters). The sample is prepared by taking 10 µL of the amino acid mixture and adding 70 µL of a buffer solution (Waters) followed by 20 µL of a reagent (Waters). The solution is then kept in a water bath at 55 °C for 10 minutes. The solution is then separated by reverse-phase liquid chromatography with an Acquity Amide C18 column (2.1 mm x 150 mm, Waters) and analyzed with a TUV detector at 260 nm.
The gradient protocol is available from Waters Corporation. Amino acids are detected and quantified based on known retention times (Fig. 4.3).

4.5.3 Nucleotide charged sugar detection

Nucleotide charged sugar protocol: Nucleotide charged sugars were purchased from Carbosynth (Newbury, UK). Standards were dissolved in water individually and injected into a UPLC-ESI-MS (Waters) to determine their corresponding retention times and mass-to-charge ratios (m/z). The UPLC-ESI-MS system consisted of a UPLC system (Acquity H-Class, Waters) and an electrospray ionization (ESI) source mass spectrometer (QDa detector, Waters). The system was controlled by Empower 3 software (Waters). The autosampler was set at 10 °C. Separations were performed on an Acquity BEH C18 column (1.7 µm, 2.1 mm x 50 mm, Waters). The elution started from 95% mobile phase A (5 mM TBA aqueous solution, adjusted to pH 4.75 with acetic acid) and 5% mobile phase B (5 mM TBA in acetonitrile), was raised to 57% B in 2 minutes, further raised to 100% B in 0.5 minutes, held at 100% B for 2 minutes, returned to initial conditions over 0.1 minute, and held for 4 minutes to re-equilibrate the column. The flow rate was set at 0.6 mL/min with an injection volume of 5 µL. The column was preconditioned by pumping the starting mobile phase mixture for 10 minutes, followed by the gradient protocol specified above 2 times prior to any injections. LC-ESI-MS chromatograms were acquired in negative ion mode under the following conditions: capillary voltage of 10 V, dry temperature of 520 °C, and an acquisition range of m/z 100-800. Selected ion recordings were specified for each metabolite and are listed in Table 4.4.

4.6 Acknowledgments

The work described was supported by the Center on the Physics of Cancer Metabolism through Award Number 1U54CA210184-01 from the National Cancer Institute (https://www.cancer.gov/).
The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Cancer Institute or the National Institutes of Health. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

CHAPTER 5

AN INTEGRATED KINETIC CONSTRAINT-BASED MODEL OF E. COLI CELL-FREE PROTEIN SYNTHESIS

5.1 Abstract

Cell-free protein expression has become a widely used research tool in systems and synthetic biology, and a promising technology for biomanufacturing of proteins. Cell-free protein synthesis relies on transcription and translation machinery to produce a protein of interest. However, fueling this process requires biochemical enzymes and reactions that are involved in complex metabolic pathways. Here we use isotope labeling to measure absolute metabolite concentrations in an E. coli based cell-free system for a batch reaction. We then integrate this information with kinetic parameters, enzyme levels, enzyme activity assays, and kinetic descriptions of transcription and translation in a constraint-based mathematical modeling framework. The modeling framework predicts the production of mRNA and protein along with the metabolic behavior in the presence of two oxidative phosphorylation inhibitors. Flux estimations and experimental data reveal that the cell-free reaction has active central carbon metabolism, with glutamate powering the TCA cycle to provide reduced ubiquinone for oxidative phosphorylation that sustains the batch reaction for 16 hours.

5.2 Introduction

Cell-free protein synthesis (CFPS) is a widely used research tool in systems and synthetic biology and a promising platform for manufacturing of proteins and chemicals [196, 110, 94, 70, 78]. In the past decades, CFPS has been used to better understand biochemical processes. For example, E. coli cell-free extracts were used in the 1960s to decipher the sequencing of the genetic code [118, 128].
Today, CFPS is gaining wide interest in metabolic engineering to circumvent significant barriers in traditional in vivo systems [58]. Cell-free systems offer many advantages for the study, manipulation and modeling of metabolism compared to in vivo processes. However, both approaches still require mathematical modeling to help better understand the metabolism that is occurring in these systems. Ultimately, mathematical models can identify unintuitive strategies for the rational design of strains and circuits to improve product yield and system efficiency [12, 182, 27].

Constraint-based approaches such as flux balance analysis (FBA), which use stoichiometric reconstructions of microbial metabolism, have become standard tools in systems biology and metabolic engineering [108]. Stoichiometric reconstructions have been expanded to include the integration of metabolism with detailed descriptions of gene expression (ME-Model) [4, 107, 129] and protein structures (GEM-PRO) [197, 30]. Constraint-based approaches model metabolism using the biochemical stoichiometry and other constraints such as thermodynamic feasibility [64, 62] under pseudo-steady-state conditions. Given these constraints, these models have used linear programming [34] to predict productivity [176, 146, 181], yield [176], mutant behavior [43], and growth phenotypes [129] for biochemical networks of varying complexity, including genome-scale networks, using a limited number of adjustable parameters. Currently, there are only a few mathematical models of CFPS that integrate metabolic pathways with transcription and translation processes [71, 181]. In addition, experimental measurements of absolute metabolite levels are required to build dynamic mathematical models of metabolism that can describe and predict experimental data.
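For reference, the generic linear program behind flux balance analysis can be written as below. This is the textbook form under the pseudo-steady-state assumption, given as a sketch rather than the exact formulation used in this chapter. The symbols are conventional notation introduced here for illustration: S is the stoichiometric matrix, v the flux vector over R reactions, c the objective weights, and v^min and v^max the flux bounds.

$$
\begin{aligned}
\max_{v}\quad & c^{\top} v \\
\text{subject to}\quad & S\,v = 0, \\
& v^{\min}_{j} \le v_{j} \le v^{\max}_{j}, \qquad j = 1, \dots, R
\end{aligned}
$$

Thermodynamic feasibility typically enters through the sign of the lower bounds (irreversible reactions have a lower bound of zero), and additional measurements tighten the individual bounds.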
Cell-free systems allow for direct access to metabolites and the biosynthetic machinery without the interference of a cell wall or the complications associated with cell growth. However, comprehensive measurements of cell-free metabolism have not been reported in the literature, with the exception of amino acids and a few organic acids [83, 82, 81, 181]. A variety of methods exist for measuring metabolite concentrations [23], most commonly liquid chromatography-mass spectrometry (LC/MS). A complication of LC/MS systems is maintaining the same ionization efficiency for samples and standards to obtain reliable absolute concentrations. Here, we overcome this limitation with isotopically labeled standards based on an aniline tagging technique [191]. Through this approach and additional analytical techniques, we quantified 61 metabolites involved in central carbon and energy metabolism.

In this study, we expand on our previous sequence-specific constraint-based model and integrate kinetic turnover rates, enzyme levels, kinetic descriptions of transcription/translation, enzyme activity assays and absolute metabolite levels to describe CFPS metabolism. Flux estimations and experimental data reveal that oxidative phosphorylation is active in myTXTL and is coupled with central carbon metabolism to power transcription/translation for the production of GFP for 16 hours. The mathematical framework described the perturbations in metabolic behavior relative to the control when biochemical inhibitors were introduced into the reaction. Taken together, we provide a modeling framework that describes and predicts CFPS metabolism and can potentially be used to identify strategies for cell-free metabolic engineering applications.

[Figure 5.1 graphic: workflow panels (Construct network; Integrate kinetic parameters; Integrate enzyme levels; Constrain to metabolite fluxes; Ensemble modeling; CFPS analysis) and a scatter plot of enzyme level from literature (mM) against enzyme level from activity assays (mM).]

Figure 5.1: Modeling framework of cell-free protein synthesis. The metabolic network was adapted from Vilkhovoy and coworkers, where transcription/translation was integrated with metabolism. Maximum flux bounds were formulated as a function of the turnover rate and the enzyme abundance found to be present in the CFPS extract. Enzyme levels were validated for a subset of 15 reactions with enzyme activity assays. Four of the enzymes were not reported in Garenne and coworkers (grey boxes), but were found to be active with their corresponding enzyme activity assays. The flux at each time step was estimated while being constrained to metabolic measurements where data was present (62 species). Finally, the flux calculation was sampled across an ensemble of 100 sets given experimental noise and literature parameters. hk: hexokinase, gdh: glutamate dehydrogenase, ppc: phosphoenolpyruvate carboxylase, sdh: succinate dehydrogenase.

5.3 Results

5.3.1 Integration of kinetic parameters, enzyme levels, and metabolite concentrations

We integrated kinetic parameters, enzyme levels and metabolite concentrations along with mechanistic descriptions of transcription and translation to describe the time course of CFPS metabolism (Fig. 5.1). To this end, we adapted an earlier stoichiometric reconstruction of CFPS [181] with 200 reactions (not including exchange reactions) and 157 species that described glycolysis, the pentose phosphate pathway, the TCA cycle, amino acid biosynthesis, and chorismate, purine and pyrimidine metabolism. The network also included sequence-specific descriptions of transcription/translation, including tRNA charging of amino acids. Mechanistic kinetic rates for transcription and translation were derived following mass-action kinetics.
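The caption of Figure 5.1 describes the maximum flux bounds as a function of turnover rate and enzyme abundance. A minimal sketch of one such bound, v_max = k_cat x [E], is given below. The functional form, the function names, the turnover numbers, and the pgi enzyme level are illustrative assumptions; the 50 nM fallback mirrors the value this chapter assigns to enzymes without a reported abundance.

```python
def max_flux_mM_per_h(kcat_per_s, enzyme_nM):
    """Upper flux bound v_max = k_cat * [E], converted to mM/h."""
    enzyme_mM = enzyme_nM * 1e-6            # nM -> mM
    return kcat_per_s * 3600.0 * enzyme_mM  # 1/s -> 1/h


def flux_bounds(kcat_by_rxn, enzyme_by_rxn, default_enzyme_nM=50.0, reversible=()):
    """Return {reaction: (lower, upper)}; irreversible reactions get lower = 0."""
    bounds = {}
    for rxn, kcat in kcat_by_rxn.items():
        # Fall back to the default level when no abundance is reported.
        vmax = max_flux_mM_per_h(kcat, enzyme_by_rxn.get(rxn, default_enzyme_nM))
        lower = -vmax if rxn in reversible else 0.0
        bounds[rxn] = (lower, vmax)
    return bounds


# Hypothetical turnover numbers (1/s) and enzyme level (nM), for illustration only.
kcat = {"pgi": 100.0, "hk": 50.0}
enzyme = {"pgi": 200.0}  # hk has no reported level and uses the 50 nM default
bounds = flux_bounds(kcat, enzyme, reversible={"pgi"})
for rxn, (lower, upper) in bounds.items():
    print(f"{rxn}: {lower:.1f} to {upper:.1f} mM/h")
```

Where an activity assay is available, its measured rate could replace the computed upper bound for that reaction.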
This mathematical framework has been previously used to predict protein synthesis for green fluorescent protein (GFP) and chloramphenicol acetyltransferase (CAT) under different promoters in two different CFPS systems. However, there was high uncertainty in the flux estimations. To address this, we constrained each flux to be a function of its turnover rate and enzyme level. Turnover rates for each reaction were taken from BRENDA and/or from Adadi and coworkers [3]. Enzyme abundance levels in the myTXTL lysate were identified for 104 reactions in our network from Garenne and coworkers [53]. We validated enzyme concentrations for a subset of 15 reactions using our enzyme activity assays (Fig. 5.1). Four of the enzymes, hexokinase (hk), glutamate dehydrogenase (gdh), phosphoenolpyruvate carboxylase (ppc), and succinate dehydrogenase (sdh), were not reported in Garenne and coworkers, but were found to be active in myTXTL with their corresponding activity assays. All remaining enzymes not reported in Garenne and coworkers were set to a median value of 50 nM. In addition, we constrained the upper bound of the corresponding reaction with the experimental enzyme activity level. Finally, we integrated absolute measurements of 63 species over the time course of the CFPS reaction, which included central carbon metabolites, energy species and amino acids. Taken together, the modeling framework with integrated kinetic parameters, enzyme levels, and metabolite concentrations provided an accurate time-course flux distribution of CFPS metabolism (Fig. 5.2).

Metabolism of the myTXTL system has been reported to rely on maltodextrin and 3-phosphoglycerate (3PG) to provide energy resources for transcription and translation. However, metabolic constraints and enzyme activity assays for glutamate dehydrogenase reveal that glutamate powers the TCA cycle, along with succinate dehydrogenase, to provide energy support for oxidative phosphorylation.
Previously, it was inconclusive whether oxidative phosphorylation was active in the myTXTL system. Flux distributions show that oxidative phosphorylation was active throughout the CFPS reaction, with high flux at 2 hours (Fig. 5.2A) and moderate flux at 8 hours (Fig. 5.2B). Maltodextrin and 3PG activated the glycolysis pathway and led to an accumulation of organic acids such as pyruvate and acetate (Fig. 5.6). The accumulation of acetate relies on substrate-level phosphorylation, which provides an inefficient energy pathway when compared to oxidative phosphorylation. At 4 hours of the CFPS reaction, metabolism switched toward pyruvate consumption for valine synthesis, and valine accumulated in the media (Fig. 5.4).

[Figure 5.2 graphic: metabolic network flux maps; panels (a) Control (2 h) and (b) Control (8 h); color scale Flux (A.U.) from 0 to 100.]

Figure 5.2: Mean flux distribution across an ensemble (N=100) for control. Fluxes were determined by integrating kinetic parameters with enzyme levels and constraining to measurements of metabolites and enzyme activity levels where data was available. (a) Flux distribution at 2 hours of CFPS reaction. (b) Flux distribution at 8 hours of CFPS reaction. Fluxes were normalized to maltodextrin consumption at t=0 hours.
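The figures in this chapter report the mean over an ensemble of N=100 parameter sets. The sketch below illustrates only the sampling-and-summary idea for a single measured quantity, assuming Gaussian measurement noise and a percentile-based central 95% band; both are assumptions, as are the function name and the example values. In the actual framework, each ensemble member would correspond to a full flux estimation with perturbed constraints.

```python
import random
import statistics


def ensemble_summary(measured_mean, measured_sd, n_members=100, seed=0):
    """Sample a measurement with Gaussian noise; return (mean, lower, upper)."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    # Concentrations cannot be negative, so clip draws at zero.
    draws = sorted(
        max(0.0, rng.gauss(measured_mean, measured_sd)) for _ in range(n_members)
    )
    lower = draws[int(0.025 * (n_members - 1))]  # approx. 2.5th percentile
    upper = draws[int(0.975 * (n_members - 1))]  # approx. 97.5th percentile
    return statistics.mean(draws), lower, upper


# Hypothetical metabolite measurement: 10 mM with a 1 mM standard deviation.
mean, lower, upper = ensemble_summary(10.0, 1.0)
print(f"mean = {mean:.2f} mM, central 95% band = [{lower:.2f}, {upper:.2f}] mM")
```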
In conclusion, the time-course flux distribution of CFPS revealed that the system relied on a mixture of aerobic and anaerobic processes to provide the necessary energy requirements for transcription and translation. Cell-free protein synthesis is based on cytoplasmic extract and does not contain the enzymatic regulation that in vivo systems use to exploit the optimal pathway given the system's environment.

Figure 5.3: Prediction of mRNA and protein levels in CFPS for control (blue), DNP (red) and TTA (grey). (a) The mRNA levels of GFP were predicted with the given modeling framework. (b) The protein abundance of GFP was predicted for all three cases. The solid line denotes the mean of the ensemble (N=100), the shaded region denotes the 95% confidence interval of the ensemble, the points denote experimental measurements, and error bars denote the standard deviation of experimental measurements.

5.3.2 Transcription/Translation is oxygen dependent

Transcription and translation processes were oxygen dependent in the myTXTL CFPS system. Specific inhibitors of oxidative phosphorylation showed that respiration was active and powered transcription and translation. When a cell-free reaction was incubated with either of two inhibitors, thenoyltrifluoroacetone (TTA), an inhibitor of electron transport at Complex II, or 2,4-dinitrophenol (DNP), a membrane gradient uncoupler, protein accumulation was significantly less than that of the control (Fig. 5.3B).

Figure 5.4: Time course of amino acid levels in CFPS for control (blue), DNP (red) and TTA (grey). Experimental amino acid fluxes constrained the mathematical model of CFPS. The solid line denotes the mean of the ensemble (N=100), the shaded region denotes the 95% confidence interval of the ensemble, the points denote experimental measurements, and error bars denote the standard deviation of experimental measurements.

In addition, mRNA levels were not sustained for
the duration of the CFPS reaction with the inhibitors (Fig. 5.3A). This can be seen with the depletion of CTP and GTP at approximately 4 hours of the CFPS reaction for TTA (Fig. 5.7). With DNP in the CFPS extract, mRNA was degraded substantially more slowly than in the case of TTA. This shows that the DNP reaction relied on substrate-level phosphorylation to fuel transcription and translation, resulting in a slightly higher GFP titer of 10.4 ± 0.8 µM, whereas TTA resulted in a titer of 8.0 ± 1.0 µM. For the reaction with TTA, in contrast, transcription and translation relied on the nucleotides that were available in the media. This can be seen with the depletion of CTP and GTP at approximately 4 hours of the CFPS reaction (Fig. 5.7). In addition, the DNP treatment had a significantly higher accumulation of acetate, 53 mM at 16 hours, compared to the control with 39 mM and TTA with 27 mM. The high accumulation of acetate in the control further supports that myTXTL relied on aerobic and anaerobic processes. For the control, mRNA levels were maintained at a steady-state level of approximately 570 nM and GFP reached a titer of 21.3 ± 1.6 µM. The higher accumulation of GFP for the control compared to the reactions with oxidative phosphorylation inhibitors supports that oxidative phosphorylation was active in myTXTL and was able to sustain transcription and translation with an activated metabolism of central carbon pathways.

Figure 5.5: Time course of upper central carbon metabolite levels in CFPS for control (blue), DNP (red) and TTA (grey). DNP showed exhaustion of maltose, revealing maltodextrin depletion and thus high carbon utilization. Experimental fluxes constrained the mathematical model of CFPS. The solid line denotes the mean of the ensemble (N=100), the shaded region denotes the 95% confidence interval of the ensemble, the points denote experimental measurements, and error bars denote the standard deviation of experimental measurements.

Figure 5.6: Time course of lower central carbon metabolite levels in CFPS for control (blue), DNP (red) and TTA (grey). DNP heavily relied on substrate-level phosphorylation with high accumulation of acetate, whereas TTA had a high abundance of lactate. Experimental fluxes constrained the mathematical model of CFPS. The solid line denotes the mean of the ensemble (N=100), the shaded region denotes the 95% confidence interval of the ensemble, the points denote experimental measurements, and error bars denote the standard deviation of experimental measurements.

Figure 5.7: Time course of energy species levels in CFPS for control (blue), DNP (red) and TTA (grey). Both DNP and TTA exhausted GTP, which is required for translation, within 4 hours of the reaction. Experimental fluxes constrained the mathematical model of CFPS. The solid line denotes the mean of the ensemble (N=100), the shaded region denotes the 95% confidence interval of the ensemble, the points denote experimental measurements, and error bars denote the standard deviation of experimental measurements.

5.3.3 Kinetic descriptions with metabolic constraints predict metabolic behavior of oxidative phosphorylation inhibitors

The integrated modeling framework of CFPS predicted the dynamic behavior of mRNA and protein production for an aerobic reaction (control) and for reactions incubated with DNP and TTA. In order to capture the mRNA time-course behavior, the transcription rate was formulated as a function of saturation kinetics of the reactants involved, which included ATP, CTP, GTP, UTP and the concentration of the plasmid for GFP. Given this formulation, the model captured the mRNA level for the first 12 hours of the CFPS reaction for the control; however, it failed to
Since CTP, GTP, and UTP essentially decline to 0 mM toward the end of the CFPS reaction, transcription is halted and mRNA is degraded. However, the experimental system showed mRNA maintaining its steady state value at 16 hours. Despite this, the ensemble captured GFP production for the entire CFPS reaction. For the DNP case, we added a reaction that leaked a charged hydrogen to an uncharged hydrogen and set this reaction to be maximized. 2-4-dinitrophenol is a membrane uncoupler and acts as a chemical ionophore that leaks charged proton ions. Thus, DNP doesn’t allow for a gradient to form for efficient oxidative phosphorylation activity. The added reaction allowed the model to accurately describe the effect of DNP on CFPS metabolism. Given these conditions, along with the metabolic constraints and enzyme activity assays, the ensemble captured the dynamic behavior of mRNA and protein production of GFP. The mathematical model estimated a reduction of 94% in oxidative phosphorylation activity for DNP when compared to the control. In the case of TTA, the model estimated a reduction of 51% in oxidative phosporylation activity with no additional modifications to the model. Additionally, the model predicted no flux via succinate dehydrogenase for the first 2 hours and very low flux for the remainder of the reaction. Thenoyltriflu- oroacetone directly blocks the respiratory chain at complex II which is part of the succinate dehydrogenase enzyme. Given the metabolic and kinetic constraints, the modeling framework was able to accurately predict the effect of DNP and TTA on CFPS metabolism as well as mRNA and protein production. 
5.3.4 Analysis of CFPS metabolism with oxidative phosphorylation inhibitors

With the accurate prediction of the effect of DNP and TTA on CFPS metabolism and validation of the ensemble capturing mRNA and protein production, we analyzed the flux distribution at 2 and 8 hours and compared key reactions to the control to gain insights into CFPS metabolism (Fig. 5.8-5.9). Together with absolute metabolite measurements, kinetic parameters, enzyme levels and enzyme activity assays, we determined the net flux distribution across an ensemble of 100 sets with sampling on experimental noise and uncertainty in literature values. Substantial differences were observed across all three cases as well as throughout the duration of the CFPS reaction.

[Figure 5.8 graphic: metabolic network flux maps; panels (a) DNP (2 h) and (b) DNP (8 h); color scale Flux (A.U.) from 0 to 100; flux difference from control shown for pgm, hk, pgi, gpm, eno, pyk, pdh, zwf, rpe, rpi, tkt1, gltA, akgdh, sdh, fum, mdh, mae, ackA, ldh and atp.]

Figure 5.8: Mean flux distribution across an ensemble (N=100) for DNP. Fluxes were determined by integrating kinetic parameters with enzyme levels and constraining to measurements of metabolites and enzyme activity levels where data was available. (a) Flux distribution at 2 hours of CFPS reaction. Flux difference from control shown for key reactions at 2 hours of CFPS reaction. (b) Flux distribution at 8 hours of CFPS reaction. Flux difference from control shown for key reactions at 8 hours of CFPS reaction. Fluxes were normalized to maltodextrin consumption at t=0 hours.

When the CFPS reaction was incubated with DNP, there was an increase in metabolism and oxygen consumption; however, oxidative phosphorylation was inactive. At 2 hours of the reaction, the majority of the carbon traveled through glycolysis, with 74% via pgi and 24% through the pentose phosphate pathway via zwf. The split toward the pentose phosphate pathway was notably higher for DNP than for the control, where only 1% of the flux traveled through zwf at the 2-hour mark of the reaction. The TCA cycle for DNP behaved very similarly to the control, with high activity via gdh and no significant differences across the ensemble. However, as the reaction progressed toward 8 hours, significant differences appeared throughout the network. First, maltose was depleted and thus lower activity was seen via pgm and hk. In addition, 100% of the carbon traveled via zwf, and the first step in glycolysis ran in the backward direction to further supplement G6P for the pentose phosphate pathway. Lower glycolysis showed much higher flux from gapA to pdh and toward ackA. Compared to the control, DNP had a 740% increase in flux via gapA and a 120% increase in flux via pdh.
The high metabolic rate with DNP incubation resulted in the accumulation of acetate and thus a reliance on substrate-level phosphorylation, since oxidative phosphorylation was inhibited.

When the CFPS reaction was incubated with TTA, there was a decrease in overall metabolism; oxidative phosphorylation remained active, but at a 51% reduction when compared to the control. Upper glycolysis, involving maltodextrin consumption, glucose-1-phosphate, and glucose utilization, was very similar to the control, with a slight increase in pgm activity. However, just as in the case with DNP, there was a higher split toward the pentose phosphate pathway, with 15% at 2 hours of the reaction and 18% at 8 hours of the reaction. The most notable differences were in the TCA cycle, which had very low activity throughout the pathway; this can be seen with the high glutamate levels in the media. TTA is an inhibitor of succinate dehydrogenase and uncoupled the TCA cycle from central carbon metabolism. Despite having an active oxidative phosphorylation reaction, central carbon metabolism showed significantly less flux than that of the control and DNP. In addition, there was a high accumulation of lactate, with approximately 8 mM at the end of the 16-hour reaction. This accumulation is most likely due to the surplus of NADH not utilized in oxidative phosphorylation.

[Figure 5.9 graphic: metabolic network flux maps; panels (a) TTA (2 h) and (b) TTA (8 h); color scale Flux (A.U.) from 0 to 100; flux difference from control shown for pgm, hk, pgi, gpm, eno, pyk, pdh, zwf, rpe, rpi, tkt1, gltA, akgdh, sdh, fum, mdh, mae, ackA, ldh and atp.]

Figure 5.9: Mean flux distribution across an ensemble (N=100) for TTA. Fluxes were determined by integrating kinetic parameters with enzyme levels and constraining to measurements of metabolites and enzyme activity levels where data was available. (a) Flux distribution at 2 hours of CFPS reaction. Flux difference from control shown for key reactions at 2 hours of CFPS reaction. (b) Flux distribution at 8 hours of CFPS reaction. Flux difference from control shown for key reactions at 8 hours of CFPS reaction. Fluxes were normalized to maltodextrin consumption at t=0 hours.

In summary, in the case of TTA, transcription and translation were powered by the nucleotides and amino acids that were supplemented with the CFPS extract, and protein production lasted for roughly 4 hours. When the reaction was incubated with DNP, central carbon metabolism remained active and relied on substrate-level phosphorylation, but this was not sufficient to sustain transcription and translation beyond 4-6 hours.
In the control, by contrast, central carbon metabolism was activated to fuel and sustain transcription and translation, and showed active protein production for 16 hours with mRNA still at steady-state levels. With differences in flux distributions between all three groups, we next examined the performance metrics of the myTXTL system in terms of energy efficiency and carbon yield.

Energy Efficiency

Energy efficiency for transcription and translation was significantly higher for the control, at 23 ± 2.8%, whereas the DNP and TTA groups were at 15 ± 2.7% and 15 ± 2.6%, respectively (Fig. 5.10). Energy efficiency was calculated over the entire duration of the reaction as the ratio of the nucleotide triphosphates utilized by the corresponding category to ATP generation. Despite the control having a higher energy efficiency than the treatment groups, 37 ± 1% of its nucleotide triphosphates were wasted toward degradation (Fig. 5.10A). Twenty-seven percent of the energy was utilized in glycolysis, with 11% toward amino acid biosynthesis and 2% toward anaplerosis.

[Figure 5.10 graphic: pie charts of energy utilization by category (TXTL, Degradation, Glycolysis, Amino Acids, Anaplerosis); panels (a) Control, (b) DNP, (c) TTA.]

Figure 5.10: Mean energy efficiency across an ensemble (N=100) for control (a), DNP (b), and TTA (c) throughout the metabolic network. TXTL denotes the energy efficiency for transcription and translation processes.

In the case of DNP, 50% of the energy was wasted toward degradation. This is due to the effect of DNP, which resulted in a higher metabolism as well as the inhibition of all energy-requiring processes [56]. For TTA, the majority of the energy, 51%, was spent on glycolysis. The flux distribution for TTA showed only an active upper glycolysis pathway, which resulted in a higher than normal energy utilization.
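The energy-efficiency bookkeeping described above (nucleotide triphosphates used by a category divided by ATP generated) can be sketched as follows. The function name and the arbitrary-unit usage values are hypothetical; they are chosen only so that the resulting fractions reproduce the control split reported in the text, and they are not flux data from the model.

```python
def energy_efficiency(ntp_used_by_category, atp_generated):
    """Fraction of generated ATP (or equivalents) used by each category."""
    if atp_generated <= 0:
        raise ValueError("ATP generation must be positive")
    return {name: used / atp_generated for name, used in ntp_used_by_category.items()}


# Arbitrary units chosen to match the reported control percentages.
control_use = {
    "TXTL": 23.0,
    "degradation": 37.0,
    "glycolysis": 27.0,
    "amino acids": 11.0,
    "anaplerosis": 2.0,
}
control = energy_efficiency(control_use, atp_generated=100.0)
for name, fraction in control.items():
    print(f"{name}: {fraction:.0%}")
```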
Carbon Yield

Carbon yield for GFP production was similar for all groups. The control group had a carbon yield of 2 ± 0.2%, whereas the DNP group had a carbon yield of 1.4 ± 0.2% and the TTA group had a carbon yield of 1 ± 0.2% (Fig. 5.11). Carbon yield was calculated over the duration of the reaction as the ratio of the concentration of carbon produced to the total concentration of carbon consumed. The low carbon yield for all groups showed that the myTXTL system was supplemented with more carbon than needed. For instance, the extract is supplied with 20-40 mM of maltodextrin and was measured to have a concentration of approximately 104 mM of glutamate. Meanwhile, the amount of GFP produced was in the range of 10-30 µM. Thus, we investigated where the remaining carbon went.

Figure 5.11: Mean carbon yield across an ensemble (N=100) for control (a), DNP (b), and TTA (c) for CFPS. PPP denotes the Pentose Phosphate Pathway. Other includes purine, pyrimidine and chorismate metabolism.

The control and TTA groups had a very similar distribution of carbon in the network. Across both groups, 9-12% went toward carbon dioxide, 22-28% remained in glycolysis, 3% remained in the TCA cycle, 36-39% remained in the pentose phosphate pathway, 1-3% went toward amino acid biosynthesis, and 17-24% went toward purine, pyrimidine and chorismate metabolism (other category). In the DNP group, by comparison, the carbon was distributed more uniformly throughout the network, with a notable difference in carbon dioxide (25%). The higher yield of carbon dioxide further supports the higher rate of metabolism observed with DNP incubation [56].
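The carbon yield ratio described above (carbon recovered as GFP divided by carbon in the consumed species) can be sketched as follows. This is a minimal illustration and not the code used in this work: the function name and the example concentrations are hypothetical, and only the GFP carbon number (1208, Table 5.1) is taken from this chapter.

```python
# Sketch of the carbon yield ratio: carbon in GFP / carbon in consumed species.
# All concentrations are in mM. The example inputs below are hypothetical.

C_GFP = 1208  # carbon number of GFP (Table 5.1)


def carbon_yield(x_gfp_final, species, c_gfp=C_GFP):
    """Return the carbon yield as a fraction.

    species maps a name to (initial_conc, final_conc, carbon_number).
    Only species that were net consumed contribute to the denominator.
    """
    carbon_consumed = 0.0
    for x_initial, x_final, n_carbon in species.values():
        if x_initial > x_final:
            carbon_consumed += (x_initial - x_final) * n_carbon
    if carbon_consumed <= 0.0:
        raise ValueError("no net carbon consumption")
    return x_gfp_final * c_gfp / carbon_consumed


# Hypothetical inputs for illustration only (not measured values).
example_species = {
    "maltose": (30.0, 0.0, 12),
    "glutamate": (104.0, 0.0, 5),
    "acetate": (0.0, 40.0, 2),  # net produced, so it is ignored
}
example_yield = carbon_yield(0.02, example_species)
```

With these made-up inputs the yield comes out at a few percent, which illustrates why a micromolar protein titer against tens of millimolar substrate gives a low carbon yield.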
5.3.5 Enzyme activity assays reveal allosteric regulation in CFPS

The activity of enzymes throughout central carbon metabolism was measured in the myTXTL system at 2 and 8 hours of a CFPS reaction for control, DNP and TTA (Fig. 5.12). Substantial differences were observed between groups and between the two time points for a number of enzymes. However, for enzymes where allosteric regulation is not present, including eno, mdh, gdh, and akgdh, the assays showed the same activity across groups and time points. Thus, the activity assays reveal that allosteric regulation is present in CFPS. For instance, the enzyme icd is allosterically regulated, with phosphoenolpyruvate (PEP) inhibiting its activity. In the control and TTA groups, icd activity increased from approximately 180 to 250 mM/h and from 125 to 290 mM/h, respectively, between 2 and 8 hours. Over the same period, PEP abundance in the CFPS extract decreased for the control and TTA groups, which resulted in the observed increase in icd activity.

Figure 5.12: Enzyme activity measurements reveal allosteric regulation is present in CFPS. Enzyme activity assays at 2 and 8 hours of the CFPS reaction throughout the metabolic network for control (black), DNP (dark grey), and TTA (light grey).

5.4 Discussion

Cell-free protein synthesis relies on transcription and translation machinery in order to produce a protein of interest. However, the mechanisms and reactions involved in CFPS are not limited to transcription and translation.
Jewett and coworkers have shown that central carbon metabolism is activated, with inverted membrane vesicles carrying out oxidative phosphorylation, in the Cytomim system, an E. coli based extract [81]. Thus, CFPS systems are often more complex than previously believed, with many interacting species that could potentially be exploited to optimize performance. In this study we used a mathematical framework to gain insights into the activity of metabolic pathways and biochemical reactions of CFPS in the myTXTL system.

One of the main advantages of CFPS systems is the elimination of the cell membrane, which allows for direct access to metabolites and potentially precise control of biochemical reactions. However, a comprehensive absolute quantification of metabolites in CFPS has not been reported, and measurements are often limited to amino acids and a few organic acids [83, 81, 71]. Here, we present a robust time-course quantification of 41 species with a single-quad LC-MS system along with 20 amino acids with an LC-TUV system. In addition, we quantified absolute levels of mRNA using quantitative real-time RT-PCR. This comprehensive dataset allowed us to constrain our mathematical model of CFPS metabolism, which integrated kinetic parameters from BRENDA [79] and enzyme abundance levels estimated by Garenne and coworkers [53]. The constraint-based linear programming framework was validated by predicting mRNA and protein production and by describing the effects of DNP and TTA. DNP incubation has been reported to result in overconsumption of oxygen and a higher rate of metabolism, while inhibiting energy-requiring processes [120, 56]. DNP has also been shown to disrupt oxidative phosphorylation in the Cytomim system [81] and in zebrafish [15]. The model predicted a high rate of metabolism leading to high levels of acetate, in agreement with literature findings.
Further, the high carbon yield of 25% in carbon dioxide suggests higher metabolic activity and anaplerosis. TTA blocks the respiratory chain at Complex II by binding to the quinone reduction site and inhibiting the transfer of electrons from the Fe-S centre S-3 of succinate dehydrogenase to oxidized ubiquinone [167, 140]. TTA was also used by Jewett and coworkers to assess whether oxidative phosphorylation was active in the Cytomim system. In our study, TTA incubation lowered the protein titer by about 50% compared to the control. The model predicted reduced succinate dehydrogenase activity, with oxidative phosphorylation activity reduced by 51% compared to the control. Previously, the myTXTL system has been reported to rely on glycolysis and to recycle inorganic phosphates [28, 52]; however, our study suggests that the E. coli based extract also has active oxidative phosphorylation, with glutamate powering the TCA cycle.

Our modeling framework provided a quantitative means to assess the performance of CFPS in terms of energy efficiency and carbon yield. Previously, we reported a theoretical optimal energy efficiency of approximately 80% for transcription and translation [181]. However, the myTXTL system had an energy efficiency of 23% for the control and 15% with the inhibitors. In addition, the carbon yield for GFP was only 2%. Thus, the CFPS reaction is supplied with more than enough carbon and energy, but these resources are not used effectively. The flux distribution suggests that despite having oxidative phosphorylation, anaerobic processes are still active in cell-free extracts, as seen in the high accumulation of acetate and the flux through anaplerotic reactions. Whereas in vivo systems are able to respond to different environmental conditions and activate different metabolic pathways, cell-free extracts no longer have the ability for enzymatic regulation. Thus, some of the enzymes that are present in CFPS may lead to inefficiency and low carbon yield.
For example, Bujara and coworkers successfully increased the yield of dihydroxyacetone phosphate (DHAP) from glucose in CFPS [22]. The source strain that was used for cell-free extract preparation had a gene knockout of triosephosphate isomerase (tpiA), which resulted in the higher accumulation of DHAP. Such strategies have been used for decades in in vivo systems [12] and are only beginning to be used in CFPS [196, 2, 11]. In addition, the majority of the energy is wasted on nucleotide triphosphate degradation. This suggests that energy utilization could be optimized by addressing the rate-limiting step of protein production, identified as translation [109, 181]. Underwood and coworkers showed that increasing ribosome abundance did not significantly increase protein yields or rates; however, adding elongation factors increased protein synthesis rates by 27% [175], which would require more energy to be spent on translation instead of degradation. Subsequently, the carbon substrates that power CFPS could be minimized in order to increase the energy efficiency and carbon yield by lowering the total ATP produced, since the majority is degraded.

Our analysis of CFPS performance was based on the quality and accuracy of the flux estimation at each time step. The integrated kinetic constraint-based model was constrained to 61 species with kinetic parameters and physiological enzyme levels in the myTXTL system. Adadi and coworkers have used kinetic parameters for the flux bounds with success in predicting microbial growth rates [3]. Thus, we have high confidence that the reported flux distribution is representative of CFPS metabolism for myTXTL. In addition, the flux bounds for a subset of 19 enzymes were also constrained to the actual enzyme activity levels. The assays revealed that allosteric regulation is active in CFPS and should be incorporated into the mathematical framework.
Allosteric regulation was shown to be instrumental in capturing experimental data with a kinetic ODE model for CFPS [71]. Since our model was constrained to the enzyme activity assays, mathematical descriptions of allosteric regulation on flux bounds were not needed. However, conducting these assays experimentally is low throughput and expensive. Thus, it would be advantageous to incorporate allosteric regulation into the modeling framework.

Taken together, we provide an integrated kinetic constraint-based mathematical framework with absolute metabolite measurements to better understand cell-free metabolism and its performance limitations. Flux estimations revealed that central metabolism is activated, along with glutamate powering the TCA cycle to provide reduced ubiquinone for oxidative phosphorylation. Oxidative phosphorylation inhibitors provide biochemical evidence that myTXTL relied on oxidative phosphorylation to provide energy for sustaining transcription and translation for 16 hours in a batch reaction. Finally, enzyme activity assays throughout central carbon metabolism revealed that allosteric regulation is present in CFPS metabolism and should be incorporated into future mathematical models. Cell-free protein synthesis goes beyond transcription and translation processes; thus, we provide a comprehensive mathematical framework that predicted mRNA and protein production and could potentially be used to identify strategies for improving CFPS productivity, yield and efficiency.

5.5 Materials & Methods

5.5.1 Cell-free protein synthesis and oxidative phosphorylation inhibitors

Cell-free protein synthesis reactions were carried out with the myTXTL system (Arbor Biosciences) in 1.5 mL Eppendorf tubes at 29 °C. Plasmid P70a-GFP (Arbor Biosciences) was used as the DNA template for green fluorescent protein (GFP) expression. The template plasmid was amplified in E. coli KL740 cI857+ (E. coli Genetic Stock Center, No. 14222).
The plasmid was isolated and purified using a Plasmid Mini Kit (Qiagen, Valencia CA). Each cell-free reaction was supplemented with a final concentration of 5 nM P70a-GFP. Each cell-free reaction had a total volume of 14 µL with 9 µL myTXTL master mix, 1.5 µL P70a-GFP, and either 3.5 µL water (control), 3.5 µL 2,4-dinitrophenol (DNP, 2.5 mM final concentration in CFPS), or 3.5 µL thenoyltrifluoroacetone (TTA, 1 mM final concentration in CFPS). DNP was solubilized in water to prepare a 10 mM solution. TTA was solubilized in methanol to prepare a 100 mM solution and diluted with water to 4 mM before adding to the CFPS reaction. Negative controls performed with methanol demonstrated that the solvent did not affect protein synthesis at the concentrations used in this study. Separate CFPS samples were carried out in triplicate for each time point in order to ensure constant volume throughout the duration of the reaction.

5.5.2 Absolute quantification of central carbon metabolites

To quantify central carbon metabolites and amino acids, reaction samples were quenched with 100% ice-cold ethanol in a 1:1 volumetric ratio. Ethanol-precipitated samples were centrifuged at 12,000 g for 15 minutes at 4 °C. The supernatant was collected and stored at -80 °C. Metabolites involved in glycolysis, the pentose phosphate pathway, the TCA cycle, and energy metabolism were quantified by liquid chromatography-tandem mass spectrometry (LCMS) using an isotope ratio based approach. Samples were tagged with 12-C aniline, while internal standards were tagged with 13-C aniline as described previously [? ]. Briefly, 6 µL of the supernatant was added to 44 µL of water, followed by 5 µL of 200 mg/mL EDC (N-(3-dimethylaminopropyl)-N-ethylcarbodiimide hydrochloride) and 5 µL of 6 M 12-C aniline (pH 4.5). EDC was solubilized in water. The aniline solution was prepared by combining 550 µL of 10.9 M aniline with 337.5 µL water and 112.5 µL of 12 M hydrochloric acid in an Eppendorf tube and vortexing well.
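The concentrations stated in Sections 5.5.1 and 5.5.2 can be cross-checked with simple dilution arithmetic. The sketch below is only a consistency check on the numbers given in the text; the helper name is hypothetical, and it assumes that volumes are additive on mixing.

```python
# Dilution check for the stated reagent concentrations.
# Assumption: volumes are additive on mixing.


def final_concentration(stock_conc, added_volume, total_volume):
    """Concentration after diluting added_volume of stock into total_volume."""
    return stock_conc * added_volume / total_volume


# CFPS reaction: 9 uL master mix + 1.5 uL plasmid + 3.5 uL water or inhibitor.
reaction_volume_ul = 9.0 + 1.5 + 3.5
dnp_final_mM = final_concentration(10.0, 3.5, reaction_volume_ul)  # 10 mM stock
tta_final_mM = final_concentration(4.0, 3.5, reaction_volume_ul)   # 4 mM working stock

# Aniline tagging solution: 550 uL aniline + 337.5 uL water + 112.5 uL HCl.
aniline_volume_ul = 550.0 + 337.5 + 112.5
aniline_final_M = final_concentration(10.9, 550.0, aniline_volume_ul)
```

Under the additive-volume assumption these reproduce the stated 2.5 mM DNP and 1 mM TTA final concentrations and give approximately 6 M for the aniline solution.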
The mixture was gently vortexed at room temperature for 2 hours. In order to stabilize the metabolites, 1.5 µL of TEA (triethylamine) was added. The mixture was centrifuged at 13,500 g for 3 minutes and 25 µL of the supernatant was transferred to an LCMS vial. The sample mixture was mixed with 25 µL of a standard stock solution containing 35 metabolites at 80 µM tagged with 13-C aniline. The standard stock solution was tagged following the same procedure as the sample, except with 13-C aniline. The 35 standard metabolites are listed in the metabolite dataset and exclude acetic acid, NAD, NADP, FAD, acetyl-CoA, glycerol 3-phosphate, and maltose. Acetic acid was tagged with aniline and quantified with a standard curve method. NAD, NADP, FAD, acetyl-CoA and glycerol 3-phosphate were not tagged with aniline, and were quantified by a standard curve method. Samples and standards were injected at 5 µL onto a Waters Acquity BEH C18 (1.7 µm, 2.1 mm x 150 mm) column. The LCMS system consisted of a Waters Acquity Quaternary system, an Acquity Sample Manager, and an Acquity QDa detector (Waters Corp, Medford, MA). The system was controlled by Empower 3 software (Waters). The autosampler was set at 10 °C. Separation was carried out at a flow rate of 0.3 mL/min. The elution started with 95% mobile phase A (5 mM tributylamine (TBA) in HPLC-grade water adjusted to pH 4.75 with glacial acetic acid) and 5% mobile phase B (5 mM TBA in acetonitrile), was raised to 70% B in 10 minutes, raised to 100% B in 2 minutes, and held at 100% B for 3 minutes. The gradient was then returned to initial conditions (95% A, 5% B) over 1 minute and held for 9 minutes to re-equilibrate the column. The column was pre-conditioned with the specified gradient protocol 3 times prior to any injection onto the column. The MS chromatograms were acquired in negative ion mode with a probe temperature of 520 °C, negative capillary voltage of -0.8 kV, and positive capillary voltage of 0.8 kV.
5.5.3 Amino acid analysis

Amino acids were analyzed using a Waters AccQ-Tag Ultra amino acid analysis kit (Waters). The ethanol-precipitated CFPS samples were derivatized and tagged by combining 4 µL of the sample with 6 µL of water, 70 µL borate buffer solution (Waters), and 10 µL reagent (Waters). The solution was then kept in a water bath at 55 °C for 10 minutes. One microliter was then injected onto an Acquity BEH C18 column (1.7 µm, 2.1 mm x 100 mm). The elution gradient and flow rate were set according to the manufacturer's recommendations. Amino acids were detected by an Acquity TUV detector (Waters) at 260 nm and identified by the known retention times of standards. Concentrations were determined by comparison with calibration standard curves, with the exception of glutamate. Glutamate was outside the linear range of the calibration standard curves and was not quantified with the Waters AccQ-Tag Ultra amino acid analysis kit.

5.5.4 Glutamate and maltose assays

Glutamate and maltose concentrations were determined using enzymatic colorimetric assays purchased from Sigma-Aldrich (St. Louis, MO) according to the manufacturer's instructions. The readings were performed with a Varioskan Lux multimode plate reader (ThermoFisher) using 96-well plates.

5.5.5 Protein quantification

GFP concentrations were determined by fluorescence measurements with comparison to a standard curve. Two microliters of the CFPS reaction were diluted with 33 µL of phosphate-buffered saline and analyzed in triplicate on a black 384-well plate with a Varioskan Lux multimode plate reader (ThermoFisher) at 488 nm excitation and 535 nm emission.

5.5.6 Enzyme activity assays

Enzyme activities for 6-phosphogluconate dehydrogenase (gnd) and phosphoenolpyruvate carboxylase (ppc) were determined using colorimetric assays purchased from BioVision (Milpitas, CA) according to the manufacturer's instructions.
All remaining enzyme activity levels were determined using colorimetric and fluorescence based assays purchased from Sigma-Aldrich (St. Louis, MO) according to the manufacturer's instructions. The readings were performed in kinetic mode with a Varioskan Lux multimode plate reader (ThermoFisher) using clear 96-well plates for colorimetric assays and black 96-well plates for fluorescence based assays.

5.5.7 Absolute quantification of mRNA

Absolute levels of messenger RNA (mRNA) were quantified using quantitative real-time RT-PCR with comparison to a standard curve. A standard mRNA of GFP was prepared by conducting a CFPS reaction with 5 nM P70a-GFP plasmid for 2 hours at 29 °C. The reaction was applied to a PureLink RNA Mini Kit with an on-column PureLink DNase Treatment (ThermoFisher) according to the manufacturer's instructions. The total RNA was eluted with Invitrogen UltraPure DNase/RNase-free water (ThermoFisher). The total RNA was then applied to a MICROBExpress Bacterial mRNA enrichment kit (ThermoFisher) followed by a MEGAclear Transcription clean-up kit (ThermoFisher) according to the manufacturer's instructions. The purified mRNA was eluted with UltraPure water. The mRNA concentration was determined with a Qubit Fluorometer using a Qubit RNA HS Assay Kit (ThermoFisher). To quantify mRNA levels in CFPS samples from the experiment, 1 µL from the CFPS reaction was applied to the PureLink RNA Mini Kit with an on-column PureLink DNase Treatment according to the manufacturer's instructions. The total RNA was eluted with 50 µL of UltraPure water. The total RNA sample was diluted 100-fold and 2 µL of the diluted sample was loaded for each RT-PCR reaction. The quantitative real-time RT-PCR reaction was carried out on an Applied Biosystems QuantStudio 3 with a Taqman RNA-to-Ct 1-Step Kit using the GFP Taqman assay (Mr04329676_mr) on a 96-well plate in triplicate according to the manufacturer's instructions (Applied Biosystems, Life Technologies Corporation, Foster City, CA).
Messenger RNA concentrations were determined by comparison to the calibration standard curve. The standard curve was generated with purified GFP mRNA ranging from 10^-4 to 1 ng, and had a linearity coefficient of 0.994 and an efficiency of 104%.

5.5.8 Formulation of model equations

The dynamic sequence specific flux balance analysis problem was formulated as a linear program:

$$
\begin{aligned}
\max_{v}\quad & Z = \theta^{T} v \\
\text{subject to:}\quad & (S v - \dot{x}) \ge 0 \\
& \dot{x}_i = \sum_{j=1}^{R} \sigma_{ij}\, r_j(x, e, k), \qquad i = 1, 2, \ldots, M \\
& 0 \le v_j \le r_j(x, e, k), \qquad j = 1, 2, \ldots, R
\end{aligned}
\tag{5.1}
$$

where $S$ denotes the stoichiometric matrix ($M \times R$), $\sigma_{ij}$ denotes the stoichiometric coefficient for species $i$ in reaction $j$, $v$ denotes the unknown flux vector ($R \times 1$), $\theta$ denotes the objective vector ($R \times 1$), and $r_j(x, e, k)$ denotes the rate of reaction $j$. For all metabolic reactions except the transcription/translation processes and maltodextrin consumption, the rate of reaction $j$ was modeled as the product of the turnover rate $k_j$ and the enzyme abundance $e_j$, also known as the maximum velocity of the reaction $V_{max}$ (mM/h). The transcription/translation and maltodextrin consumption reactions were modeled following saturation kinetics. The turnover rate for each reaction was identified from BRENDA [79] or taken from Adadi and coworkers [3]. The enzyme abundance was identified for 104 reactions from Garenne and coworkers [53]. Garenne and coworkers reported counts for the enzymes identified in their LC-MS analysis; we calibrated the counts of sigma 70 and RNA polymerase against the concentration values [52] to create a calibration curve. The enzyme abundance was calculated using this calibration curve, and all remaining enzymes not identified by Garenne and coworkers were set to the median value of 50 nM.
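The kinetic flux bound described above, a turnover rate multiplied by an enzyme abundance, can be sketched as follows. This is an illustrative calculation rather than the implementation used in this work: the function names, reaction name, and the turnover number of 100 1/s are hypothetical, and the unit choices (turnover in 1/s, abundance in nM, bound in mM/h) are assumptions. Only the 50 nM median default comes from the text.

```python
# Upper flux bound for a metabolic reaction: V_max = k_cat * e.
# Assumed units: k_cat in 1/s, enzyme abundance in nM, V_max in mM/h.

SECONDS_PER_HOUR = 3600.0
NM_TO_MM = 1e-6
MEDIAN_ABUNDANCE_NM = 50.0  # default for enzymes without a measured abundance


def v_max_mM_per_h(kcat_per_s, enzyme_nM):
    """Maximum reaction velocity in mM/h."""
    return kcat_per_s * SECONDS_PER_HOUR * enzyme_nM * NM_TO_MM


def flux_bounds(kcat_by_reaction, abundance_by_reaction):
    """Map each reaction to (0, V_max), using the median abundance as fallback."""
    bounds = {}
    for name, kcat in kcat_by_reaction.items():
        enzyme_nM = abundance_by_reaction.get(name, MEDIAN_ABUNDANCE_NM)
        bounds[name] = (0.0, v_max_mM_per_h(kcat, enzyme_nM))
    return bounds


# Hypothetical turnover number; no abundance given, so 50 nM is used.
example_bounds = flux_bounds({"hypothetical_rxn": 100.0}, {})
```

With a turnover number of 100 1/s and the 50 nM default, the upper bound is about 18 mM/h, the same order of magnitude as the enzyme activities reported in mM/h elsewhere in this chapter.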
The enzyme abundance was validated for a subset of 15 enzymes from our enzyme activity assays, where we calculated the expected enzyme abundance in the cell-free reaction by:

$$
\hat{e}_j = \frac{\hat{V}_{max,j}}{\hat{k}_j} \tag{5.2}
$$

where $\hat{V}_{max,j}$ is the maximum velocity for reaction $j$ from the enzyme activity assay and $\hat{k}_j$ is the corresponding turnover number for enzyme $j$. The stoichiometry of the transcription (TX) and translation (TL) reactions was modeled based on previous work [4, 181]. The transcription initiation rate was modeled as:

$$
r_{TX}^{init} = V_{TX}^{max} \left( \frac{G}{\tau_{TX} K_{TX} + (\tau_{TX} + 1)\, G} \right) \tag{5.3}
$$

where $G$ denotes the concentration of the DNA plasmid in the cell-free reaction, $K_{TX}$ denotes a transcription saturation coefficient, and $\tau_{TX}$ denotes the transcription time constant. The maximum transcription rate $V_{TX}^{max}$ was formulated as:

$$
V_{TX}^{max} \equiv \left[ R_{TX} \left( \frac{\dot{n}_{TX}}{l_G} \right) u(\kappa) \right] \tag{5.4}
$$

where $R_{TX}$ denotes the RNA polymerase concentration, $\dot{n}_{TX}$ denotes the RNA polymerase elongation rate (nt/h), and $l_G$ denotes the gene length (nt). The term $u(\kappa)$ (dimensionless, $0 \le u(\kappa) \le 1$) is an effective model of promoter activity, where $\kappa$ denotes promoter specific parameters. In this study, the promoter model was taken from Vilkhovoy and coworkers [181] for the P70a promoter. The transcription rate was modeled as:

$$
r_{TX} = r_{TX}^{init} \prod_{s \in m_{TX}} \frac{x_s}{K_{TX,s} + x_s} \tag{5.5}
$$

where $m_{TX}$ denotes the set of reactants for transcription (ATP, CTP, GTP, and UTP), and $K_{TX,s}$ denotes the saturation constant for species $s$. The degradation of mRNA was modeled as a first order rate:

$$
r_d = k_d \cdot x_{mRNA} \tag{5.6}
$$

where $k_d$ denotes the degradation rate constant. The translation initiation and translation rate was modeled as:

$$
r_{TL} = V_{TL}^{max} \left( \frac{x_{mRNA}}{\tau_{TL} K_{TL} + (\tau_{TL} + 1)\, x_{mRNA}} \right) \tag{5.7}
$$

where $x_{mRNA}$ denotes the concentration of the mRNA, $K_{TL}$ denotes a translation saturation coefficient, and $\tau_{TL}$ denotes the translation time constant.
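The transcription kinetics in Eqs. 5.3 to 5.6 can be written out as a short numerical sketch. This is not the code used in this work. The parameter values are midpoints or entries from Table 5.1 where available; the promoter activity u = 0.9 and the 1 mM nucleotide concentrations are hypothetical, and the gene length of 717 nt is an assumption obtained by summing the four transcription coefficients in Table 5.1.

```python
# Sketch of the transcription rate laws (Eqs. 5.3-5.6).
# Concentrations of G, K_TX and R_TX are in uM; nucleotides and K_TX,s in mM.
# Each saturation term is a ratio, so units only need to match within a term.


def v_max_tx(r_tx, n_dot_tx, l_g, u):
    """Eq. 5.4: maximum transcription rate."""
    return r_tx * (n_dot_tx / l_g) * u


def r_tx_init(v_max, g, tau_tx, k_tx):
    """Eq. 5.3: transcription initiation rate."""
    return v_max * g / (tau_tx * k_tx + (tau_tx + 1.0) * g)


def r_tx(r_init, nucleotides, k_txs):
    """Eq. 5.5: initiation rate scaled by nucleotide saturation terms."""
    rate = r_init
    for x_s in nucleotides.values():
        rate *= x_s / (k_txs + x_s)
    return rate


def r_deg(k_d, x_mrna):
    """Eq. 5.6: first order mRNA degradation."""
    return k_d * x_mrna


R_TX = 0.0675            # uM, midpoint of 60-75 nM (Table 5.1)
N_DOT_TX = 20.0 * 3600   # nt/h, midpoint of 15-25 nt/s (Table 5.1)
L_G = 717                # nt, assumed from the sum of NTP coefficients
U = 0.9                  # hypothetical promoter activity
G = 0.005                # uM, 5 nM plasmid
TAU_TX = 0.035           # within the 0.021-0.05 range (Table 5.1)
K_TX = 0.3               # uM (Table 5.1)
K_TXS = 0.03             # mM (Table 5.1)
NTP = {"ATP": 1.0, "CTP": 1.0, "GTP": 1.0, "UTP": 1.0}  # mM, hypothetical

V_MAX = v_max_tx(R_TX, N_DOT_TX, L_G, U)
R_INIT = r_tx_init(V_MAX, G, TAU_TX, K_TX)
R_RATE = r_tx(R_INIT, NTP, K_TXS)
```

Because of the saturation form in Eq. 5.3, the initiation rate can never exceed V_max / (1 + tau_TX), however much plasmid is added; the nucleotide terms in Eq. 5.5 then scale the rate down further as nucleotides are depleted.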
The maximum translation rate $V_{TL}^{max}$ was formulated as:

$$
V_{TL}^{max} \equiv \left[ K_P R_{TL} \left( \frac{\dot{n}_{TL}}{l_P} \right) \right] \tag{5.8}
$$

where $K_P$ denotes the polysome amplification constant, $R_{TL}$ denotes the ribosome concentration, $\dot{n}_{TL}$ denotes the ribosome elongation rate (amino acids per hour), and $l_P$ denotes the number of amino acids in the protein of interest. The abundance of each species $x$ was modeled as:

$$
x_{t+\Delta t} = x_t + S v \Delta t \tag{5.9}
$$

where $t$ denotes the current time point and $\Delta t$ denotes the time step. Lastly, we imposed a user configurable bound $B_i$ on the maximum rate of change for metabolite $i$ where data was available:

$$
|\dot{x}_i| \le B_i, \qquad i = 1, 2, \ldots, M \tag{5.10}
$$

The bound $B_i$ was determined by fitting the time-course concentration data with a cubic regression spline using the SmoothingSplines package in Julia 1.1. The rate of change at step $t$ was determined by a forward difference approximation from $t$ to $t + \Delta t$ on the regression spline. Metabolic fluxes were estimated at each time step using the GNU Linear Programming Kit (GLPK) v4.55 [1]. All parameters are listed in Table 5.1. In addition, flux bounds were set to the experimental value where data was available from the corresponding enzyme activity assays. The objective of the cell-free flux balance calculation was to maximize the rate of maltodextrin consumption, transcription initiation, transcription, mRNA degradation, translation initiation and translation, unless otherwise specified.

5.5.9 Quantification of uncertainty

Experimental factors taken from literature, for example macromolecular concentrations or elongation rates, are uncertain. To quantify the influence of this uncertainty on model performance, we randomly sampled the expected physiological ranges for these parameters as determined from literature. An ensemble of flux distributions was calculated for the three different cases we considered: control, DNP, and TTA.
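The random sampling used to build such an ensemble can be sketched as below: each parameter is drawn uniformly between a lower and an upper bound, and one full parameter set is drawn per ensemble member. This is a minimal sketch rather than the code used in this work. The ranges are those listed in Table 5.1, while the dictionary keys, function names, and the fixed seed are hypothetical choices made for reproducibility of the example.

```python
import random

# Physiological ranges (lower, upper) from Table 5.1.
RANGES = {
    "R_TX_nM": (60.0, 75.0),              # RNA polymerase concentration
    "R_TL_uM": (2.0, 2.3),                # ribosome concentration
    "n_dot_TX_nt_per_s": (15.0, 25.0),    # transcription elongation rate
    "n_dot_TL_aa_per_s": (1.0, 2.0),      # translation elongation rate
}


def sample_uniform(lower, upper, rng):
    """Draw p = l + (u - l) * U(0, 1)."""
    return lower + (upper - lower) * rng.random()


def sample_ensemble(ranges, n_members, seed=0):
    """Return n_members parameter sets, each sampled independently."""
    rng = random.Random(seed)  # seeded so the example is repeatable
    return [
        {name: sample_uniform(lo, hi, rng) for name, (lo, hi) in ranges.items()}
        for _ in range(n_members)
    ]


ensemble = sample_ensemble(RANGES, 100)
```

Each entry of `ensemble` would then parameterize one flux calculation, so the spread across the 100 members reflects the uncertainty in the literature values.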
The flux ensemble was calculated by randomly sampling the rate of change for metabolites where data was available, randomly sampling enzyme abundance, and randomly sampling RNA polymerase levels, ribosome levels, and elongation rates in a physiological range determined from literature. The rate of change for metabolites was sampled between the value calculated from the regression spline and twice that value. The enzyme abundance was randomly sampled between the estimated value and 1.5 times that value. P70 RNA polymerase levels were sampled between 60 and 75 nM, ribosome levels between 2.0 and 2.3 µM, the RNA polymerase elongation rate between 15 and 25 nt/s, and the ribosome elongation rate between 1.0 and 2 aa/s [175, 52]. We generated uniform random samples between an upper ($u$) and lower ($l$) parameter bound of the form:

$$
p^{*} = l + (u - l) \times U(0, 1) \tag{5.11}
$$

5.5.10 Calculation of energy efficiency

Energy efficiency ($E$) was calculated as the ratio of transcription and translation (weighted by the appropriate energy species coefficients) to ATP generation:

$$
E = \frac{\int_{0}^{T} \left( v_{TX} \cdot \alpha_{TX} + v_{TL} \cdot \alpha_{TL} \right) dt}{\int_{0}^{T} \sum_{j \in R_{ATP}} \sigma_{ATP_j}\, \bar{v}_j \, dt} \tag{5.12}
$$

$$
\alpha_{TX} = 2 \cdot \left( ATP_{TX} + CTP_{TX} + GTP_{TX} + UTP_{TX} \right) \tag{5.13}
$$

$$
\alpha_{TL} = 2 \cdot ATP_{TL} + GTP_{TL} \tag{5.14}
$$

where $\alpha_{TX}$ denotes the energy cost of transcription, $\alpha_{TL}$ denotes the energy cost of translation, $R_{ATP}$ denotes the set of ATP-producing reactions, $\sigma_{ATP_j}$ denotes the ATP coefficient for reaction $j$, and $T$ denotes the time of the experiment. $ATP_{TX}$, $CTP_{TX}$, $GTP_{TX}$, and $UTP_{TX}$ denote the stoichiometric coefficients of each energy species for the transcription of the protein of interest, and $ATP_{TL}$ and $GTP_{TL}$ denote the stoichiometric coefficients of ATP and GTP for the translation of the protein of interest.
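As a worked illustration of Eqs. 5.12 to 5.14, the sketch below computes the energy costs from the stoichiometric coefficients in Table 5.1 and integrates flux time series with the trapezoidal rule. It is a sketch under stated assumptions, not the code used in this work: the trapezoidal integration, the function names, and the example flux values are hypothetical, and the ATP generation series is assumed to already contain the coefficient-weighted sum over ATP-producing reactions at each time point.

```python
# Energy efficiency (Eq. 5.12) with energy costs from Eqs. 5.13 and 5.14.
# Stoichiometric coefficients are taken from Table 5.1.

ATP_TX, CTP_TX, GTP_TX, UTP_TX = 208, 157, 195, 157
ATP_TL, GTP_TL = 239, 478

ALPHA_TX = 2 * (ATP_TX + CTP_TX + GTP_TX + UTP_TX)  # Eq. 5.13
ALPHA_TL = 2 * ATP_TL + GTP_TL                      # Eq. 5.14


def trapezoid(times, values):
    """Integrate values over times with the trapezoidal rule."""
    total = 0.0
    for i in range(1, len(times)):
        total += 0.5 * (values[i] + values[i - 1]) * (times[i] - times[i - 1])
    return total


def energy_efficiency(times, v_tx, v_tl, atp_generation):
    """Ratio of energy spent on TX/TL to ATP generated over the time course.

    atp_generation[i] is assumed to be the coefficient-weighted sum of the
    ATP-producing fluxes at times[i].
    """
    spent = [a * ALPHA_TX + b * ALPHA_TL for a, b in zip(v_tx, v_tl)]
    return trapezoid(times, spent) / trapezoid(times, atp_generation)


# Hypothetical constant fluxes (mM/h) over a 16 hour reaction.
times_h = [0.0, 8.0, 16.0]
example_efficiency = energy_efficiency(
    times_h, [1e-5] * 3, [1e-4] * 3, [1.0] * 3
)
```

With constant fluxes the time integration cancels, so the example reduces to (v_TX * alpha_TX + v_TL * alpha_TL) divided by the ATP generation rate; time-varying fluxes would weight each interval by its duration.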
During transcription and tRNA charging, triphosphate molecules are consumed with monophosphates as byproducts; this is the reason for the factors of 2 on $ATP_{TX}$, $CTP_{TX}$, $GTP_{TX}$, $UTP_{TX}$, and $ATP_{TL}$.

5.5.11 Calculation of carbon yield

The carbon yield ($Y_C$) was calculated as the ratio of carbon produced as the protein divided by the carbon consumed as reactants:

$$
Y_C = \frac{x_{GFP,f} \cdot C_{GFP}}{\sum_{i \in m_s} \left( x_{i,o} - x_{i,f} \right) \cdot C_{m_i}} \tag{5.15}
$$

where $x_{GFP,f}$ denotes the final concentration of GFP, $C_{GFP}$ denotes the carbon number of GFP, $m_s$ denotes the set of species that were consumed, $x_{i,o}$ denotes the initial concentration of species $i$, $x_{i,f}$ denotes the final concentration of species $i$, and $C_{m_i}$ denotes the carbon number of species $i$.

Table 5.1: Parameters for sequence specific flux balance analysis

| Description | Parameter | Value | Units | Reference |
|---|---|---|---|---|
| RNA polymerase concentration | $R_{TX}$ | 60-75 | nM | [52] |
| Ribosome concentration | $R_{TL}$ | 2-2.3 | µM | [52] |
| Transcription elongation rate | $\dot{n}_{TX}$ | 15-25 | nt/s | [52] |
| Translation elongation rate | $\dot{n}_{TL}$ | 1-2 | aa/s/ribosome | [52, 175] |
| Transcription time constant | $\tau_{TX}$ | 0.021-0.05 | constant | calculated |
| Translation time constant | $\tau_{TL}$ | 0.063-0.126 | constant | calculated |
| Transcription saturation coefficient | $K_{TX}$ | 0.3 | µM | [? ] |
| Translation saturation coefficient | $K_{TL}$ | 600.0 | µM | estimated |
| Polysome number | $K_P$ | 10 | ribosome number | estimated |
| mRNA degradation rate constant | $k_d$ | 2.38 | h^-1 | [52] |
| Maltodextrin saturation constant | $K_m$ | 8.3 | mM | BRENDA |
| Transcription saturation constant | $K_{TX,s}$ | 0.03 | mM | estimated |
| Weight RNA polymerase binding alone P70a | $K_1$ | 0.014 | constant | estimated |
| Weight bound RNAP-σ70 P70a | $K_2$ | 10 | constant | estimated |
| σ70 concentration | σ70 | 35 | nM | [52] |
| σ70 dissociation constant | $K_D$ | 130 | nM | [119] |
| σ70 Hill coefficient | $n$ | 1 | constant | [119] |
| Gene concentration | $G_P$ | 5 | nM | experiment |
| ATP transcription coefficient | $ATP_{TX}$ | 208 | constant | calculated |
| CTP transcription coefficient | $CTP_{TX}$ | 157 | constant | calculated |
| GTP transcription coefficient | $GTP_{TX}$ | 195 | constant | calculated |
| UTP transcription coefficient | $UTP_{TX}$ | 157 | constant | calculated |
| ATP tRNA charging coefficient | $ATP_{TL}$ | 239 | constant | calculated |
| GTP translation coefficient | $GTP_{TL}$ | 478 | constant | calculated |
| Carbon number of GFP | $C_{GFP}$ | 1208 | constant | calculated |

CHAPTER 6

TOWARD A GENOME SCALE SEQUENCE SPECIFIC DYNAMIC MODEL OF CELL-FREE PROTEIN SYNTHESIS IN ESCHERICHIA COLI

6.1 Abstract¹

In this study, we developed a dynamic mathematical model of E. coli cell-free protein synthesis (CFPS). Model parameters were estimated from a dataset consisting of measurements of glucose, organic acids, energy species, amino acids, and the protein product, chloramphenicol acetyltransferase (CAT). The model was successfully trained to predict these measurements, especially those of central carbon metabolism. We then used the trained model to evaluate the optimality of protein production. CAT was produced with an energy efficiency of 12%, suggesting that the process could be further optimized. Reaction group knockouts showed that protein productivity was most sensitive to the oxidative phosphorylation and glycolysis/gluconeogenesis pathways. Amino acid biosynthesis was also important for productivity, while overflow metabolism and the TCA cycle affected the overall system state.
In addition, translation was more important to productivity than transcription. Finally, CAT production was robust to allosteric control, as were most of the predicted metabolite concentrations; the exceptions were the concentrations of succinate and malate, and to a lesser extent pyruvate and acetate, which varied from the measured values when allosteric control was removed. This study is the first to use kinetic modeling to predict dynamic protein production in a cell-free E. coli system, and could provide a foundation for genome scale, dynamic modeling of cell-free E. coli protein synthesis.

¹The following work has been submitted as: Horvath N, Vilkhovoy M, Wayman JA, Calhoun K, Swartz J, and Varner JD, "Toward a Genome Scale Sequence Specific Dynamic Model of Cell-Free Protein Synthesis in Escherichia coli", Metabolic Engineering Communications.

6.2 Introduction

Cell-free protein expression is a widely used tool in systems and synthetic biology, and a promising technology for personalized point of use biotechnology [137]. Cell-free systems offer many advantages for the study, manipulation and modeling of metabolism compared to in vivo processes. Central amongst these advantages is direct access to metabolites and the biosynthetic machinery without the interference of a cell wall, or the complications associated with cell growth. Thus, we can interrogate (and potentially manipulate) the chemical microenvironment while the biosynthetic machinery is operating, possibly at a fine time resolution. Cell-free protein synthesis (CFPS) is arguably the most prominent example of a cell-free system used today [81]. However, CFPS is not new; CFPS in crude E. coli extracts has been used since the 1960s to explore fundamental biological mechanisms. For example, Matthaei and Nirenberg used E. coli cell-free extracts in ground-breaking experiments to decipher the sequencing of the genetic code [118, 128].
Spirin and coworkers later improved protein production in cell-free extracts by continuously exchanging reactants and products; however, while these extracts could run for tens of hours, they could only synthesize a single product and were energy limited [159]. More recently, energy and cofactor regeneration in CFPS has been significantly improved; for example, ATP can be regenerated using substrate-level phosphorylation [93] or even oxidative phosphorylation [81]. While it was once debated whether oxidative phosphorylation occurred in cell-free systems, Jewett and coworkers demonstrated its existence definitively in the Cytomim system by inhibiting it using electron transport chain and F1FO-ATPase inhibitors, as well as membrane gradient uncouplers, and observing a significantly lower protein yield [81]. They hypothesized respiration to be occurring in inner membrane vesicles created during cell lysis. Today, cell-free systems are used in a variety of applications ranging from therapeutic protein production [110] to synthetic biology [70, 73, 137]. Moreover, there are also several CFPS technology platforms, such as the PANOx-SP and Cytomim platforms developed by Swartz and coworkers [82, 81], the TXTL platform of Noireaux [52] or the PURE system developed by Shimizu et al. [152]. However, for point of use cell-free manufacturing to become a mainstream technology, we must first understand the system performance, and eventually optimize important metrics such as yield and productivity. A critical tool towards this goal is mathematical modeling. We previously developed a constraint-based model of CFPS which integrated the expression of the protein product with the supply of metabolic precursors and energy [181].

Dynamic mathematical modeling has long contributed to our understanding of metabolism [184].
Decades before the genomics revolution, mechanistically structured metabolic models arose from the desire to predict microbial phenotypes resulting from changes in intracellular or extracellular states [48]. The single cell E. coli models of Shuler and coworkers pioneered the construction of large-scale, dynamic metabolic models that incorporated multiple regulated catabolic and anabolic pathways constrained by experimentally determined kinetic parameters [37]. Shuler and coworkers generated many single cell kinetic models, including single cell models of eukaryotes [160, 190], minimal cell architectures [29], and DNA sequence based whole-cell models of E. coli [9]. More recent studies have extended the approach, from integrating disparate models of cellular processes in M. genitalium [88], to describing dozens of mutant strains in E. coli with a single partially kinetic model [90], to identifying industrially useful target enzymes in E. coli for improved 1,4-butanediol production [5]. Taken together, mathematical modeling of metabolism has proven useful for applications across systems biology. However, dynamic metabolic model development is often time consuming, and model identification and validation require significant experimental information.
Parameter identification is a challenge to the development of predictive dynamic metabolic models. Sethna identified parameter sloppiness as a common feature of systems biology models; the eigenvalues of the network sensitivity were distributed across wide ranges, and were not generally aligned with single parameters [21, 59]. As a result, individual parameter values can remain unknown despite comprehensive metabolite information. Furthermore, if direct parameter measurements were attempted, they had to be precise and exhaustive to yield reliable model predictions. Surprisingly, despite this, models often still accurately predict multiple phenotypes via collective parameter fitting.
Liao and coworkers constructed an ensemble of models across a wide range of kinetic parameters that satisfied thermodynamic constraints and steady state flux distributions, and selected from within the ensemble those models that described enzyme overexpression datasets [174]. In this way, specific parameter identification was bypassed, and multiple relevant phenotypes could be described. Meanwhile, Hatzimanikatis and coworkers employed machine learning to simplify the parameter estimation problem [6]. They segregated the feasible-solution parameter space into N-dimensional boxes, via a binary decision tree which determined the values of parameters. This subsequently allowed for uniform, non-asymptotic sampling within the subregions; a convenient byproduct of this approach was a simple estimation of the volume of the solution space. Taken together, large-scale, descriptive models of prokaryotic metabolism can be constructed and trained to predict diverse biological behaviors with uncertain parameter information.
In this study, we developed an ensemble of kinetic cell-free protein synthesis (CFPS) models using dynamic metabolite measurements from an early glucose powered Cytomim E. coli cell-free extract. While cell-free technology has evolved considerably since this data set was generated, developing a model using a previous generation CFPS platform offers several unique opportunities. First and foremost is the ability to directly compare the different improvements established by purely experimental means to those estimated using a dynamic mathematical model. The CFPS model equations were formulated using the hybrid cell-free modeling framework of Wayman and coworkers [183], which integrates traditional kinetic modeling with a logical rule-based description of allosteric regulation.
Model parameters were estimated from measurements of glucose, organic acids, energy species, amino acids, and the protein product, chloramphenicol acetyltransferase (CAT), over the course of a three hour protein synthesis reaction. A constrained Markov Chain Monte Carlo (MCMC) approach was used to minimize the squared difference between model simulations and experimental measurements, where a plausible range for each kinetic parameter was established from BioNumbers [122]. The ensemble of parameter sets described the training data with a median cost more than two orders of magnitude smaller than that of a population of random parameter sets constructed using the same literature parameter constraints. We then used the ensemble of kinetic models to analyze the performance of the CFPS system, and to estimate the pathways most important to protein production. We calculated that CAT was produced with an energy efficiency of 12%, suggesting that much of the energy resources for protein synthesis were diverted to non-productive pathways. By simulating the knockout of metabolic enzyme groups (these knockouts were performed computationally, not experimentally), we showed that metabolism and protein production in particular depended upon oxidative phosphorylation and glycolysis/gluconeogenesis. In addition, translation was more important to productivity than transcription. Lastly, CAT production was robust to allosteric control, as was most of the network, with the exception of the organic acid trajectories in central carbon metabolism. Taken together, this study provides a foundation for sequence specific genome scale, dynamic modeling of cell-free E. coli protein synthesis.
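The constrained MCMC search described above can be illustrated with a minimal Python sketch. This is not the code used in this study: the one-parameter decay model, the synthetic data, the function names, the step size, and the acceptance temperature are all assumptions chosen for illustration. The sketch shows only the general pattern of a random walk that is confined to literature-style parameter bounds, always keeps a lower-cost parameter set, and occasionally keeps a higher-cost one.

```python
import math
import random

# Synthetic "measurements" from a toy model x(t) = exp(-k * t) with k = 0.5.
# The model, time points, and true value are hypothetical stand-ins for the
# CFPS balance equations and the experimental training data.
TIMES = [0.0, 0.5, 1.0, 2.0, 3.0]
DATA = [math.exp(-0.5 * t) for t in TIMES]


def cost(params):
    """Sum of squared residuals between the toy simulation and the data."""
    k = params[0]
    return sum((math.exp(-k * t) - d) ** 2 for t, d in zip(TIMES, DATA))


def constrained_walk(cost_fn, start, lower, upper, n_steps=2000,
                     step=0.05, temperature=1e-3, seed=0):
    """Metropolis-style random walk confined to the box [lower, upper]."""
    rng = random.Random(seed)
    current = list(start)
    current_cost = cost_fn(current)
    accepted = [(current_cost, list(current))]
    for _ in range(n_steps):
        # Perturb each parameter multiplicatively, then clip to its bounds.
        proposal = [
            min(max(p * math.exp(step * rng.gauss(0.0, 1.0)), lo), hi)
            for p, lo, hi in zip(current, lower, upper)
        ]
        proposal_cost = cost_fn(proposal)
        delta = proposal_cost - current_cost
        # Keep every improvement; keep a worse set only occasionally.
        if delta <= 0.0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = proposal, proposal_cost
            accepted.append((current_cost, list(current)))
    return accepted


ensemble = constrained_walk(cost, start=[2.0], lower=[0.01], upper=[10.0])
best_cost, best_params = min(ensemble)
print(f"accepted sets: {len(ensemble)}, best k = {best_params[0]:.3f}")
```

In a full-scale application the cost function would re-solve the model balance equations for every proposal, and a final ensemble could be drawn from the accepted sets by ranking them on error.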
Figure 6.1: Schematic of the core portion of the cell-free E. coli metabolic network. Metabolites of glycolysis, pentose phosphate pathway, Entner-Doudoroff pathway, and TCA cycle are shown. Metabolites of oxidative phosphorylation, amino acid biosynthesis and degradation, transcription/translation, chorismate metabolism, and energy metabolism are not shown.
6.3 Results
The cell-free E. coli metabolic network was constructed by removing growth-associated reactions from the iAF1260 reconstruction of K-12 MG1655 E. coli [44], and by adding reactions describing chloramphenicol acetyltransferase (CAT) biosynthesis (Fig. 6.1). In addition, reactions that were knocked out in the host strain used to prepare the extract were removed from the network (∆speA, ∆tnaA, ∆sdaA, ∆sdaB, ∆gshA, ∆tonA, ∆endA). Lastly, we added transcription and translation processes for the synthesis of CAT, which were based on template reactions from earlier work done by Allen and Palsson [4] and more recently Vilkhovoy et al. [181]. The metabolic network, which contained 148 metabolites and 204 reactions, is available in the supplemental materials. Model equations followed the hybrid modeling framework of Wayman and coworkers [183], combining multiple saturation kinetics with a rule-based model of allostery. An ensemble of 100 model parameter sets was estimated from measurements of glucose, CAT, organic acids, energy species, and 18 of the 20 proteinogenic amino acids [181] using a constrained Markov Chain Monte Carlo (MCMC) approach. The organic acids measured included pyruvate, lactate, acetate, succinate, and malate.
The energy species included three phosphorylation states each of the four ribonucleosides: ATP, ADP, AMP, GTP, GDP, GMP, CTP, CDP, CMP, UTP, UDP, and UMP. Nicotinamide adenine dinucleotide (NAD(H)) and nicotinamide adenine dinucleotide phosphate (NADP(H)), while present in the model, were not measured in the dataset. The model equations and parameter sets, as well as the experimental dataset, are available under an MIT open source software license from the Varnerlab website [179]. The MCMC algorithm minimized the squared difference (residual) between the training data and model simulations starting from an initial parameter set assembled from literature and inspection. Bounds on permissible parameter values were established using studies from the BioNumbers database [122]. For each newly generated parameter set, the balance equations were re-solved and the cost function re-calculated; all sets with a lower cost (and some with higher cost) were accepted into the ensemble. Parameter sets were also required to meet strict ordinary differential equation solver tolerances, to ensure numerical stability. Approximately 3,000 sets were accepted into an initial ensemble; each set contained 204 maximum reaction rates, 204 enzyme activity decay constants, 548 saturation constants, and 34 control parameters, for a total of 815 parameters. 100 sets were then selected from this initial ensemble based upon error to form the final parameter ensemble. The final ensemble had a mean Pearson correlation coefficient of 0.78; this suggested parameter sets were not over-sampled in the region of a local minimum. The median maximum reaction rate (Vmax) across the ensemble was 11.6 mM/h, assuming a total cell-free enzyme concentration of approximately 170 nM. This Vmax, which corresponded to a median catalytic rate of 19 s⁻¹ across the ensemble, was in relative agreement with the 13.7 s⁻¹ median catalytic rate found by Milo and coworkers [13].
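The conversion from the median Vmax to a catalytic rate quoted above can be checked with a short calculation. The sketch below assumes only the relation kcat = Vmax / [E] together with the two values stated in the text; the function name is hypothetical.

```python
def kcat_per_second(vmax_mM_per_h, enzyme_nM):
    """Catalytic rate (1/s) from a maximum rate (mM/h) and enzyme level (nM)."""
    vmax_M_per_s = vmax_mM_per_h * 1e-3 / 3600.0  # mM/h -> M/s
    enzyme_M = enzyme_nM * 1e-9                   # nM -> M
    return vmax_M_per_s / enzyme_M


# Median Vmax and assumed total enzyme concentration from the text.
kcat = kcat_per_second(11.6, 170.0)
print(round(kcat))  # 19
```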
The median enzyme activity decay constant was 0.0045 h⁻¹, corresponding to an enzyme activity half life of approximately 6 days. The median saturation constant was 1.0 mM; this was within one order of magnitude of the 130 µM reported by Milo and coworkers. Lastly, both the median control gain and order parameters, which appeared in the allosteric control functions, were on order 1. While the maximum reaction rates of the ensemble were distributed evenly across the allowed range (Fig. 6.5A), the saturation constants were clustered around the upper and lower bounds (Fig. 6.5B) of the parameter search. Taken together, the constrained MCMC approach estimated a numerically stable ensemble of model parameters that was on aggregate consistent with literature values. Next, we examined the model fit to the experimental training data.
The ensemble of kinetic CFPS models captured the time evolution of protein biosynthesis, and the consumption and production of organic acid, amino acid and energy species. The time evolution of central carbon metabolites (Fig. 6.2, top), amino acids (Fig. 6.3), and energy species (Fig. 6.4) was captured by the ensemble and the best-fit parameter set. The constrained MCMC approach estimated parameter sets with a median error more than two orders of magnitude less than random parameter sets generated within the same parameter bounds established from literature (Fig. 6.6). For 29 of the 37 measurements in the training dataset, the mean Akaike information criterion (AIC) of the predicted ensemble was lower than that of the random sets, signifying a better fit of the data (Table 6.3). For the remaining eight measurements, the AIC score of the random ensemble was lower than that of the predicted ensemble, but the difference was within the standard deviation of the AIC score (with the exception of isoleucine: $\sigma^{\mathrm{Rand}}_{\mathrm{AIC}} = 4.8$, $\overline{\mathrm{AIC}}^{\mathrm{Rand}} - \overline{\mathrm{AIC}}^{\mathrm{Ens}} = -5.0$).
Taken together, these results suggested that the predicted ensemble modeled cell-free metabolism and protein production significantly better than the random ensemble, not just overall but for the majority of individual metabolite and protein measurements.
Figure 6.2: Central carbon metabolism in the presence (top) and absence (bottom) of allosteric control, including glucose (substrate), CAT (product), and intermediates, as well as total concentration of energy species. Best-fit parameter set (orange line) versus experimental data (points). 95% confidence interval (blue or gray shaded region) over the ensemble of 100 sets.
Next, we analyzed the important features of the cell-free protein synthesis timecourse. The predicted ensemble of models captured the biphasic time course of CAT production. During the first hour, glucose powered protein production, and CAT was produced at 8 µM/h; subsequently, pyruvate and lactate reserves were consumed to power metabolism, and CAT was produced at 5 µM/h. Allosteric control was important to central carbon metabolism, especially for pyruvate, acetate, and succinate (Fig. 6.2, bottom). However, CAT production was robust to the removal of allosteric control. The difference between the allosteric control and no-control cases was mostly seen in the second (pyruvate-driven) phase of CAT production, following glucose exhaustion. Specifically, pyruvate, succinate, and malate consumption and acetate accumulation increased with the removal of allosteric control. The rate of acetate accumulation increased by 172%, while the rates of malate, pyruvate, and lactate consumption increased by 146%, 82%, and 9%, respectively. Succinate went from accumulating slightly in the second phase, in the presence of allosteric control, to being fully consumed. While ATP generation varied when allosteric control was removed, ATP expenditure toward CAT production did not. Most of the fluxes that differed between the two cases involved PEP and pyruvate, which directly participated in many of the reactions modulated by allosteric control. Taken together, the ensemble of kinetic models was consistent with time series measurements of the cell-free production of a model protein. Although the ensemble described the experimental data, it was unclear which kinetic parameters and pathways most influenced metabolism and CAT production. To explore this question, we performed reaction group knockout analysis.
Figure 6.3: Amino acids in the presence of allosteric control. Best-fit parameter set (orange line) versus experimental data (points). 95% confidence interval (blue shaded region) over the ensemble of 100 sets.
Figure 6.4: Energy species and energy totals by base in the presence of allosteric control. Best-fit parameter set (orange line) versus experimental data (points). 95% confidence interval (blue shaded region) over the ensemble of 100 sets.
Figure 6.5: Histograms of model parameters, across the ensemble of 100 sets. A. Histogram of rate maxima (mM/h). B. Histogram of saturation constants (mM). Vertical axes show relative frequency.
Figure 6.6: Log of cost function (residual between training data and model simulations) across 37 datasets for data-trained ensemble (blue) and randomly generated ensemble (red, gray background). Median (bars), interquartile range (boxes), range excluding outliers (thin lines), and outliers (circles) for each dataset. Median across all datasets (large bar overlaid).
The importance of CFPS pathways was estimated using pathway group knockout analysis (Fig. 6.7). The metabolic network was divided into 19 reaction groups, spanning central carbon metabolism, energetics, and amino acid biosynthesis.
The response in the productivity (Fig. 6.7A) and overall system state (Fig. 6.7B) was calculated for single and pairwise deletion of each of these reaction groups. Lastly, the overall effect of the deletion of a pathway was estimated by summing the single and pairwise effects (summation across the columns of the response array). Glycolysis/gluconeogenesis and oxidative phosphorylation had the greatest effect on both productivity and system state. This supports previous studies that have suggested oxidative phosphorylation is occurring in a cell-free system [81]; Jewett and coworkers observed a decrease in CAT yield, ranging from 1.5-fold to 4-fold, when inhibiting oxidative phosphorylation reactions in the Cytomim cell-free platform, using both pyruvate and glutamate as substrates. CAT productivity was also affected by two sectors of amino acid biosynthesis: alanine/aspartate/asparagine, and glutamate/glutamine biosynthesis. Aspartate, glutamate, and glutamine are key reactants in the biosynthesis of many other amino acids, all of which are required for CAT synthesis. Meanwhile, the TCA cycle and overflow metabolism (which included acetyl-CoA/acetate reactions and the interconversion of pyruvate and lactate) also had a significant effect on the system state. These reactions directly impacted key system species: succinate and malate in the TCA cycle, and acetate, pyruvate, and lactate in the overflow metabolism. In addition, the relative influence of transcription and translation parameters was interrogated by global sensitivity analysis [153]. Productivity was sensitive to the maximum reaction rate of transcription (coefficient of 0.43 ± 0.06), but was more sensitive to variations in the maximum reaction rate of translation (0.66 ± 0.08). Thus, translation appeared to be the limiting step of cell-free protein synthesis.
Figure 6.7: Effect of group knockouts on system. A. Change in CAT productivity when one (diagonal) or two (off-diagonal) reaction groups are turned off. B. Change in system state (only species for which data exist) when one (diagonal) or two (off-diagonal) reaction groups are turned off. Total-order effect for each group calculated as the sum of first-order effect and all pairwise effects. Larger and darker circles represent greater effects (color scale from low to high). Axis labels in both panels are the 19 reaction groups, plus a total-order column: Glycolysis/Gluconeogenesis; Pentose Phosphate Pathway; Entner-Doudoroff; TCA cycle; Oxidative phosphorylation; Cofactors; Anaplerotic/Glyoxylate reactions; Overflow metabolism; Folate metabolism; Purine/Pyrimidine; ALA, ASP, ASN biosynthesis; GLU, GLN biosynthesis; ARG, PRO biosynthesis; GLY, SER biosynthesis; CYS, MET biosynthesis; LYS, THR biosynthesis; HIS biosynthesis; PHE, TRP, TYR biosynthesis; ILE, LEU, VAL biosynthesis.
The energy efficiency of CAT production, as well as the sources of energy generation and consumption, were tracked for the best-fit set.
Energy efficiency was calculated as the ratio of transcription and translation rates (weighted by the associated ATP costs of each step) to the amount of ATP generated by all sources. During the first phase of protein production, with glucose as the substrate, CAT was produced with a productivity of 8 µM/h and an energy efficiency of 10%. The organic acids that accumulated in the first phase (with the exception of acetate) were then utilized as substrates in the second phase, once glucose was depleted. We assumed the second phase of CAT production was powered largely by pyruvate; although malate was also consumed in the second phase, it accounted for only 11% of substrate consumption. Lactate accounted for a significant amount of substrate consumption, but was connected in the stoichiometry only to pyruvate. Thus, we considered the second phase as pyruvate-driven production. Interestingly, while this mode of protein production was slower (5 µM/h), it exhibited a higher energy efficiency (14%). Of the ATP generated, about half was observed to come from oxidative phosphorylation (R_atp) in each of the two phases of production (Fig. 6.8A, Table 6.1). Another 30% was generated by glycolysis during the first phase (R_pgk, R_pyk), which decreased to approximately 20% following glucose exhaustion. However, glycolysis was also amongst the largest consumers of ATP during the first phase of production (R_glk_atp, R_pfk) (Table 6.2). The TCA cycle (R_sucCD) contributed 3% to the overall rate of ATP generation in the first phase and 5% in the second. The hypothesis that pyruvate drives the second phase explains this; stores of accumulated pyruvate can be converted to acetyl-CoA, as well as OAA (via PEP), and thus power the TCA cycle just as when glucose was available. Interestingly, ATP generation through acetate metabolism (R_ackA) increased from 12% in the first phase to 28% in the second.
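The energy efficiency definition given above (ATP-weighted transcription and translation rates divided by total ATP generation) can be written as a short calculation. The sketch below is illustrative only: the function name, the ATP costs per step, and all flux values are hypothetical placeholders in arbitrary but consistent units, not values from the model.

```python
def energy_efficiency(tx_rate, tl_rate, atp_per_tx, atp_per_tl, atp_generation):
    """Fraction of generated ATP that is spent on transcription and translation."""
    generated = sum(atp_generation.values())
    if generated <= 0.0:
        raise ValueError("total ATP generation must be positive")
    spent = tx_rate * atp_per_tx + tl_rate * atp_per_tl
    return spent / generated


# Hypothetical ATP generation fluxes, grouped by source.
generation = {
    "oxidative_phosphorylation": 50.0,
    "glycolysis": 30.0,
    "other": 20.0,
}
efficiency = energy_efficiency(
    tx_rate=1.0, tl_rate=2.0,        # hypothetical process rates
    atp_per_tx=2.0, atp_per_tl=4.0,  # hypothetical ATP cost per unit rate
    atp_generation=generation,
)
print(f"{efficiency:.0%}")  # 10%
```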
The switch from glycolysis in the first phase, to consumption of organic acid reserves and increased acetate accumulation in the second phase, can also be seen in the reaction fluxes surrounding PEP and pyruvate (Fig. 6.8B). Lastly, amino acid degradation contributed a negligible amount to energy production. Taken together, while the efficiency of production was higher for the pyruvate-driven phase, it was still relatively low, suggesting that there is room for platform optimization. This underscores the importance of glycolysis and oxidative phosphorylation, and presents a trade-off between productivity and energy efficiency in CFPS.
Figure 6.8: Key reaction fluxes of the network, in the first (gray boxes, top row) and second (gray boxes, bottom row) phases of metabolism. A. Fluxes of ATP generation and consumption, and GTP consumption toward protein synthesis. B. Fluxes of glycolysis and lactate and acetate metabolism. Fluxes are normalized to the first-phase glucose uptake rate. For PEP and pyruvate, accumulation (normalized to glucose uptake) is also shown.
6.4 Discussion
In this study, an ensemble of kinetic cell-free protein synthesis (CFPS) models was developed using dynamic metabolite measurements from an early glucose powered Cytomim E. coli cell-free extract. The hybrid cell-free modeling approach of Wayman and coworkers [183], which integrates traditional kinetic modeling with a logic-based description of allosteric regulation, was employed to describe the time evolution of the CFPS reaction.
The ensemble captured dynamic metabolite measurements over two orders of magnitude better than random parameter sets generated in the same region of parameter space. The ensemble captured the biphasic time course of CAT production, relying on glucose during the first hour and pyruvate and lactate following glucose exhaustion. Allosteric control was essential to the description of the organic acid trajectories; without allosteric control, pyruvate, lactate, succinate, and malate were predicted to be consumed more quickly following glucose exhaustion, to power CAT synthesis. However, CAT production was robust to the removal of allosteric control because the amino acids and energy species that are reactants for CAT synthesis were also not affected by allosteric control. The ensemble of kinetic models was then used to analyze the performance of the CFPS system, and to estimate the pathways most important to protein production. CAT was produced with an approximate aggregate energy efficiency of 12%, suggesting that much of the energy resources for protein synthesis were diverted to non-productive pathways. By simulating the knockout of metabolic enzymes in groups, it was shown that metabolism and protein production in particular depended upon oxidative phosphorylation and glycolysis/gluconeogenesis. Lastly, global sensitivity analysis suggested that the translation rate was more important to protein productivity than transcription. Taken together, this study provides a foundation for sequence-specific genome scale, dynamic modeling of cell-free E. coli protein synthesis that could be adapted to model the production of other proteins and synthetic circuits.
The ensemble of models could serve as a surrogate to rationally design cell-free production processes to optimize production rate and energy efficiency.
In analyzing the effect of reaction groups on CAT production and the system state, the regions of metabolism associated with substrate utilization and energy generation were the most important. Oxidative phosphorylation was vital, since it provided most of the energetic needs of CFPS. While it is unknown how active oxidative phosphorylation is compared to that of in vivo systems, this study suggested it was critical to CFPS performance. However, the biphasic operation of CFPS highlights the ability of the system to respond to an absence of glucose. During the first phase, central carbon metabolites accumulated with the majority of flux going toward acetate and some toward pyruvate, lactate, succinate and malate. While acetate continued to accumulate as a byproduct, the other organic acids were consumed as secondary substrates after glucose was no longer available. Glutamate also served as a substrate throughout both phases, powering amino acid synthesis. These results confirmed experimental findings that CAT production can be sustained by other substrates in the absence of glucose, providing alternative strategies to optimize CFPS performance. While CAT synthesis can be powered by other substrates, the productivity was lower (5 µM/h, as opposed to 8 µM/h). This is in accordance with literature, where pyruvate provided a relatively slow but continuous supply of ATP [162]. Taken together, this shows CFPS can be designed towards a specified application, either requiring a slow stable energy source or faster production.
Presented herein is the first dynamic model of E. coli cell-free protein synthesis. A hybrid modeling framework was applied to describe an experimental dataset for production of a model protein [181] and identified system limitations and areas of improvement for production efficiency. Having captured the system dynamics, areas of improvement for CFPS performance were investigated.
The model predicted CAT production with an energy efficiency of 10% under glucose consumption and 14% under pyruvate consumption. The accumulation of glycolytic intermediates and byproducts such as acetate and carbon dioxide was responsible for this sub-optimal performance. If fluxes could be balanced such that intermediates were fully utilized, CAT production would increase. Theoretical estimations of the energy efficiency of an in vivo system can be as high as 80%, as found by our group [181] and others [116]. However, the corresponding experimental values are much lower; 16% in the case of our experimentally-constrained sequence-specific model [181]. Knocking out sections of network metabolism revealed that glycolysis/gluconeogenesis and oxidative phosphorylation were the most important to CAT production and the system as a whole. Productivity was also heavily dependent on the synthesis reactions of alanine, aspartate, asparagine, glutamate, and glutamine, while TCA cycle and overflow reactions affected the system state. These findings represent the first dynamic model of E. coli cell-free protein synthesis, an important step toward a functional genome scale description of cell-free systems.
This work could be extended through further experimentation to gain a deeper understanding of system performance under a variety of conditions. Specifically, CAT production performed in the absence of amino acids could inform the system’s ability to synthesize them, while experimentation in the absence of glucose or oxygen could shed light on the importance of those substrates. Another extension of this study would be to apply its insights to other protein applications. CAT is only a test protein used for model identification; the modeling framework, and to some extent the parameter values, should be protein agnostic.
However, it should be noted that the fully kinetic approach resulted in a model that was computationally expensive to solve, difficult to characterize, and arduous to interrogate. Future applications may benefit from alternate modeling strategies. For example, our group also employed a dynamic constraint-based approach to model CFPS [35]. This involved constraining the problem to hundreds of different combinations of measurements, and solving the model for each. That approach also captured the dynamics, and allowed the question of which measurements might best characterize a system to be explored. Approaching that question using the fully kinetic approach would have been untenable. However, constraint-based approaches depend on the accuracy of the measurements to which they are constrained. A kinetic approach can theoretically predict dynamics in the absence of data, if parameters are well identified. Taken together, the dynamics of multiphasic metabolism and protein synthesis in CFPS were accurately captured, and the importance of various pathways was interrogated toward improvement of production; however, other modeling approaches have advantages that make them well suited for future endeavors.
6.5 Materials and Methods
6.5.1 Cell-free protein synthesis and measurement.
The protein synthesis reaction was conducted using a modified version of the PANOx-SP protocol [82]. Briefly, the protein synthesis reaction was performed using the S30 extract in 1.5-mL Eppendorf tubes (working volume of 15 µL) and incubated in a humidified incubator at 37 °C. Plasmid pK7CAT, in which the cat gene is placed between the T7 promoter and the T7 terminator, was used as the DNA template for chloramphenicol acetyltransferase (CAT) expression [92]. The plasmid was isolated and purified using a Plasmid Maxi Kit (Qiagen, Valencia CA). Cell-free reaction samples were quenched at specific timepoints with equal volumes of ice-cold 150 mM sulfuric acid to precipitate proteins.
Protein synthesis of CAT was determined from the total amount of ¹⁴C-leucine-labeled product by trichloroacetic acid precipitation followed by scintillation counting as described previously [25]. Samples were centrifuged for 10 min at 12,000 × g and 4 °C. The supernatant was collected for high performance liquid chromatography (HPLC) analysis. HPLC analysis (Agilent 1100 HPLC, Palo Alto CA) was used to separate nucleotides and organic acids, including glucose. Compounds were identified and quantified by comparison to known standards for retention time and UV absorbance (260 nm for nucleotides and 210 nm for organic acids) as described previously [25]. The standard compounds quantified with a refractive index detector included inorganic phosphate, glucose, and acetate. Pyruvate, malate, succinate, and lactate were quantified with the UV detector. The stability of the amino acids in the cell extract was determined using a Dionex Amino Acid Analysis (AAA) HPLC System (Sunnyvale, CA) that separates amino acids by gradient anion exchange (AminoPac PA10 column). Compounds were identified with pulsed amperometric electrochemical detection and by comparison to known standards. More details are available in the Materials and Methods section of Vilkhovoy et al. [181].
6.5.2 Formulation and solution of the model equations.
Cell-free protein synthesis was modeled using ordinary differential equations (ODEs) to estimate the time evolution of metabolite ($x_i$), scaled enzyme activity ($\epsilon_i$), transcript ($m$) and protein ($P$) levels in an E. coli cell-free metabolic network:
$$\frac{dx_i}{dt} = \sum_{j=1}^{R} \sigma_{ij}\, r_j(\mathbf{x}, \boldsymbol{\epsilon}, \mathbf{k}) \qquad i = 1, 2, \ldots, M \tag{6.1}$$
$$\frac{d\epsilon_i}{dt} = -\lambda_i \epsilon_i \qquad i = 1, 2, \ldots, E \tag{6.2}$$
$$\frac{dm}{dt} = \bar{r}_T u - \bar{r}_d \tag{6.3}$$
$$\frac{dP}{dt} = \bar{r}_X \tag{6.4}$$
The quantity $R$ denotes the number of metabolic reactions, $M$ denotes the number of metabolites and $E$ denotes the number of metabolic enzymes in the model. The quantity $r_j(\mathbf{x}, \boldsymbol{\epsilon}, \mathbf{k})$ denotes the rate of reaction $j$.
Typically, reaction $j$ is a nonlinear function of metabolite and enzyme abundance, as well as unknown kinetic parameters $\mathbf{k}$ ($K \times 1$). The quantity $\sigma_{ij}$ denotes the stoichiometric coefficient for species $i$ in reaction $j$. If $\sigma_{ij} > 0$, metabolite $i$ is produced by reaction $j$. Conversely, if $\sigma_{ij} < 0$, metabolite $i$ is consumed by reaction $j$, while $\sigma_{ij} = 0$ indicates metabolite $i$ is not connected with reaction $j$. Lastly, $\lambda_i$ denotes the scaled enzyme activity decay constant. The system material balances were subject to the initial conditions $\mathbf{x}(t_o) = \mathbf{x}_o$ and $\boldsymbol{\epsilon}(t_o) = 1$ (initially we have 100% cell-free enzyme activity).

Metabolic reaction rates were written as the product of a kinetic term ($\bar{r}_j$) and a control term ($v_j$), $r_j(\mathbf{x}, \boldsymbol{\epsilon}, \mathbf{k}) = \bar{r}_j v_j$. We used multiple saturation kinetics to model the reaction term $\bar{r}_j$:

$$\bar{r}_j = V_j^{max} \epsilon_i \prod_{s \in m_j^-} \frac{x_s}{K_{js} + x_s} \tag{6.5}$$

where $V_j^{max}$ denotes the maximum rate for reaction $j$, $\epsilon_i$ denotes the scaled activity of the enzyme which catalyzes reaction $j$, $K_{js}$ denotes the saturation constant for species $s$ in reaction $j$, and $m_j^-$ denotes the set of reactants for reaction $j$.

The control term $0 \le v_j \le 1$ depended upon the combination of factors which influenced rate process $j$. For each rate, we used a rule-based approach to select from competing control factors. If rate $j$ was influenced by $1, \ldots, m$ factors, we modeled this relationship as $v_j = \mathcal{I}_j\left(f_{1j}(\cdot), \ldots, f_{mj}(\cdot)\right)$ where $0 \le f_{ij}(\cdot) \le 1$ denotes a transfer function quantifying the influence of factor $i$ on rate $j$. The function $\mathcal{I}_j(\cdot)$ is an integration rule which maps the output of regulatory transfer functions to a control variable. We used Hill-like transfer functions and $\mathcal{I}_j \in \{\text{mean}\}$ in this study [183]. We included 17 allosteric regulation terms, taken from literature, in the CFPS model.
PEP was modeled as an inhibitor for phosphofructokinase [99, 24], PEP carboxykinase [99], PEP synthetase [99, 32], isocitrate dehydrogenase [99, 130], and isocitrate lyase/malate synthase [99, 130, 114], and as an activator for fructose-bisphosphatase [99, 39, 67, 68]. AKG was modeled as an inhibitor for citrate synthase [99, 138, 142] and isocitrate lyase/malate synthase [99, 114]. 3PG was modeled as an inhibitor for isocitrate lyase/malate synthase [99, 114]. FDP was modeled as an activator for pyruvate kinase [99, 198] and PEP carboxylase [99, 189]. Pyruvate was modeled as an inhibitor for pyruvate dehydrogenase [99, 85, 8] and as an activator for lactate dehydrogenase [132]. Acetyl-CoA was modeled as an inhibitor for malate dehydrogenase [99].

The symbol $\bar{r}_T$ denotes the transcription rate, $u$ denotes a promoter specific activation model, and $\bar{r}_d$ denotes the transcript degradation rate. The transcription rate was modeled as:

$$\bar{r}_T = k_{cat}^{T} \cdot R_T \left( \frac{G_P}{K_G^T + G_P} \right) \prod_{s \in m_T^-} \frac{x_s}{K_s^T + x_s} \tag{6.6}$$

where $k_{cat}^T$ denotes the maximum transcription rate, $R_T$ denotes the RNA polymerase concentration, $G_P$ denotes the gene concentration, $K_G^T$ denotes the gene saturation constant, $K_s^T$ denotes the saturation constant for species $s$, and $m_T^-$ denotes the set of reactants for transcription: ATP, GTP, CTP, UTP, and water. In this study, we considered only the T7 promoter; we have previously estimated $u \simeq 0.95$ for T7 [181]. Transcription was modeled as saturating with respect to gene concentration. However, transcription was not considered to result in any depletion of gene. Transcript degradation was modeled as first-order in transcript:

$$\bar{r}_d = k_d \cdot m \tag{6.7}$$

where $k_d$ denotes the transcript degradation rate constant.
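
The rate laws above (the saturation term of Eq. 6.5, the control term $v_j$, and Eqs. 6.6 and 6.7) can be sketched in a few lines of Python. This is a minimal illustration and not the model code: the function names, the argument layout, and in particular the functional form of the Hill-like transfer function (written here with a gain and an order parameter) are assumptions, since the text states only that Hill-like functions and a mean integration rule were used.

```python
def saturation_rate(vmax, enzyme_activity, substrates, k_sat):
    """Kinetic term of Eq. 6.5: Vmax * eps * prod_s x_s / (K_js + x_s)."""
    rate = vmax * enzyme_activity
    for x_s, k_js in zip(substrates, k_sat):
        rate *= x_s / (k_js + x_s)
    return rate


def hill_transfer(x, gain, order, activator=True):
    """Assumed Hill-like transfer function, bounded on [0, 1]."""
    f = (gain * x) ** order / (1.0 + (gain * x) ** order)
    return f if activator else 1.0 - f


def control_term(transfer_values):
    """Mean integration rule; a rate with no regulators gets v_j = 1 (assumption)."""
    if not transfer_values:
        return 1.0
    return sum(transfer_values) / len(transfer_values)


def transcription_rate(k_cat, rnap, gene, k_gene, ntps, k_ntp):
    """Eq. 6.6: saturating in gene concentration and in each reactant."""
    gene_term = gene / (k_gene + gene)
    return saturation_rate(k_cat * rnap * gene_term, 1.0, ntps, k_ntp)


def degradation_rate(k_d, m):
    """Eq. 6.7: first-order transcript degradation."""
    return k_d * m


# Illustrative values only (not taken from the fitted model).
r_bar = saturation_rate(vmax=10.0, enzyme_activity=1.0,
                        substrates=[1.0, 2.0], k_sat=[1.0, 2.0])
v = control_term([hill_transfer(0.5, gain=2.0, order=2.0, activator=False)])
rate = r_bar * v  # r_j = r_bar_j * v_j
```

Keeping the kinetic and control terms as separate functions mirrors the product form $r_j = \bar{r}_j v_j$, so regulation can be switched off by passing an empty list of transfer values.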
The symbol $\bar{r}_X$ denotes the translation rate, which was modeled as:

$$\bar{r}_X = k_{cat}^{X} \cdot R_X \left( \frac{m}{K_{mRNA}^X + m} \right) \prod_{s \in m_X^-} \frac{x_s}{K_s^X + x_s} \tag{6.8}$$

where $k_{cat}^X$ denotes the maximum translation rate, $R_X$ denotes the ribosome concentration, $m$ denotes the transcript concentration, $K_{mRNA}^X$ denotes the transcript saturation constant, $K_s^X$ denotes the saturation constant for species $s$, and $m_X^-$ denotes the set of reactants for translation: GTP, water, and the 20 species representing tRNA charged with amino acids. Translation was modeled as saturating with respect to transcript concentration. However, translation was not considered to result in any depletion of transcript.

6.5.3 Estimation of kinetic model parameters.

We estimated an ensemble of kinetic parameter sets using a constrained Markov Chain Monte Carlo (MCMC) random walk strategy. We have used this technique previously to estimate numerically stable low-error parameter sets for signal transduction models [168, 169]. Starting from a small number of parameter sets estimated by inspection and literature, we calculated the cost function, equal to the sum-squared-error between experimental data and model predictions:

$$\text{cost} = \sum_{i=1}^{D} \left[ \frac{w_i}{Y_i^2} \sum_{j=1}^{T_i} \left( y_{ij} - x_i|_{t(j)} \right)^2 \right] \tag{6.9}$$

where $D$ denotes the number of datasets ($D = 37$), $w_i$ denotes the weight of the $i$th dataset, $T_i$ denotes the number of timepoints in the $i$th dataset, $t(j)$ denotes the $j$th timepoint, $y_{ij}$ denotes the measurement value of the $i$th dataset at the $j$th timepoint, and $x_i|_{t(j)}$ denotes the simulated value of the metabolite corresponding to the $i$th dataset, interpolated to the $j$th timepoint. Lastly, the cost function was scaled by the maximum experimental value in the $i$th dataset, $Y_i = \max_j \left( y_{ij} \right)$. We then perturbed each model parameter between an upper and lower bound that varied by parameter type:

$$k_i^{new} = \min\left( \max\left( k_i \cdot \exp(a \cdot r_i),\, l_i \right),\, u_i \right) \qquad i = 1, 2, \ldots, P \tag{6.10}$$

where $P$ denotes the number of parameters ($P = 815$), which includes 204 maximum reaction rates ($V^{max}$), 204 enzyme activity decay constants, 548 saturation constants ($K_{js}$), and 34 control parameters, $k_i^{new}$ denotes the new value of the $i$th parameter, $k_i$ denotes the current value of the $i$th parameter, $a$ denotes a distribution variance, $r_i$ denotes a random sample from the normal distribution, $l_i$ denotes the lower bound for that parameter type, and $u_i$ denotes the upper bound for that parameter type.

Model parameters were constrained by literature collected using the BioNumbers database [122]. Transcription, translation, and mRNA degradation were bounded within a factor of two of their reference values. A characteristic cell-free enzyme concentration of 170 nM was calculated by diluting the one-tenth maximal concentration of lacZ (5 µM, BNID 100735) by a cell-free dilution factor of 30. This enzyme level was then used to calculate rate maxima from turnover numbers for various enzymes from BioNumbers (Table 6.4). Enzyme levels calculated from the rate maxima of select reaction fluxes in the best-fit set and catalytic rates reported in the MOMENT study of Shlomi and coworkers [3] (Table 6.5) had a median value of 202 nM, in good agreement with this characteristic value. Rate maxima were bounded within one order of magnitude of the reference value where available; all other rate maxima were bounded within two orders of magnitude of the geometric mean of the available values. Enzyme activity decay constants were bounded between 0 and 1 h$^{-1}$, corresponding to half lives of infinity and 42 minutes, respectively. Saturation constants were bounded between 0.0001 and 10 mM. Control gain parameters were bounded between 0.05 and 10 (dimensionless), while order parameters were bounded between 0.02 and 10 (dimensionless).

For each newly generated parameter set, we re-solved the balance equations and calculated the cost function.
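
The cost function (Eq. 6.9) and the bounded multiplicative perturbation (Eq. 6.10) can be illustrated with the following Python sketch. It is not the Julia implementation used in this study: the function names, the data layout (each dataset as a pair of measured and simulated values already interpolated to the same timepoints), and the default step scale `a=0.05` are assumptions made for illustration.

```python
import math
import random


def scaled_sse(datasets, weights=None):
    """Eq. 6.9: weighted sum of squared errors, each dataset scaled by max(y)^2.

    datasets: list of (measured, simulated) pairs sampled at the same timepoints.
    """
    total = 0.0
    for i, (measured, simulated) in enumerate(datasets):
        w_i = 1.0 if weights is None else weights[i]
        y_max = max(measured)
        sse = sum((y - x) ** 2 for y, x in zip(measured, simulated))
        total += w_i / y_max ** 2 * sse
    return total


def perturb(params, lower, upper, a=0.05, rng=random):
    """Eq. 6.10: log-normal multiplicative step, clipped to [lower, upper]."""
    return [
        min(max(k * math.exp(a * rng.gauss(0.0, 1.0)), lo), hi)
        for k, lo, hi in zip(params, lower, upper)
    ]


# Illustrative use with made-up numbers.
rng = random.Random(0)
current = [1.0, 5.0]
candidate = perturb(current, lower=[0.5, 1.0], upper=[2.0, 5.0], a=0.5, rng=rng)
cost = scaled_sse([([1.0, 2.0], [1.0, 1.0])])
```

Because the step is multiplicative, a positive parameter stays positive before clipping, and the clipping alone enforces the type-specific bounds.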
All sets with a lower cost were accepted into the ensemble. Sets with a higher cost were also accepted into the ensemble, if they satisfied the acceptance constraint:

$$R_{0,1}^{uniform} < \exp\left( -\alpha \cdot \frac{\text{cost}_{new} - \text{cost}}{\text{cost}} \right) \tag{6.11}$$

where $R_{0,1}^{uniform}$ denotes a random number taken from a uniform distribution between 0 and 1, cost denotes the cost of the current parameter set, cost$_{new}$ denotes the cost of the new parameter set, and $\alpha$ denotes a tunable parameter to control the tolerance to high-error sets. A total of 3,875 sets were accepted into the initial ensemble, from which we selected $N = 100$ with minimal error for the final ensemble.

Lastly, a random ensemble of 100 parameter sets was generated within the same parameter bounds as the trained ensemble. The randomized parameter sets were generated using a Monte Carlo approach: each parameter was taken from a uniform distribution constructed between its upper and lower bounds. The model equations were then solved and the cost function and the Akaike information criterion (AIC) were calculated for each of the 37 separate experimental datasets.

6.5.4 Reaction group knockouts.

The metabolic network was divided into 19 reaction groups: glycolysis/gluconeogenesis, pentose phosphate, Entner-Doudoroff, TCA cycle, oxidative phosphorylation, cofactor reactions, anaplerotic/glyoxylate reactions, overflow metabolism, folate synthesis, purine/pyrimidine reactions, alanine/aspartate/asparagine synthesis, glutamate/glutamine synthesis, arginine/proline synthesis, glycine/serine synthesis, cysteine/methionine synthesis, threonine/lysine synthesis, histidine synthesis, tyrosine/tryptophan/phenylalanine synthesis, and valine/leucine/isoleucine synthesis.
Each reaction group and pair of reaction groups were removed and the model was re-solved; the CAT productivity was then calculated and subtracted from that of the base case (no knockouts):

$$P_{ii} = \left| \Delta CAT - \Delta CAT_{\Delta R_i} \right| \tag{6.12}$$

$$P_{ij} = \left| \Delta CAT - \Delta CAT_{\Delta R_i \Delta R_j} \right| \tag{6.13}$$

$$P_i^{total} = P_{ii} + \sum_j P_{ij} \tag{6.14}$$

where $P_{ii}$ denotes the first-order productivity knockout effect for reaction group $i$, $P_{ij}$ denotes the pairwise productivity knockout effect for reaction groups $i$ and $j$, $P_i^{total}$ denotes the total-order productivity knockout effect for reaction group $i$, $\Delta CAT$ denotes the base case CAT productivity, $\Delta CAT_{\Delta R_i}$ denotes the CAT productivity when reaction group $i$ is knocked out, $\Delta CAT_{\Delta R_i \Delta R_j}$ denotes the CAT productivity when reaction groups $i$ and $j$ are knocked out, and $|x|$ denotes the absolute value of $x$. The system state, defined as the model predictions for all species for which experimental data exists, was also recorded for each knockout and compared to the base case:

$$S_{ii} = \left\| x^{data} - x^{data}_{\Delta R_i} \right\|_2 \tag{6.15}$$

$$S_{ij} = \left\| x^{data} - x^{data}_{\Delta R_i \Delta R_j} \right\|_2 \tag{6.16}$$

$$S_i^{total} = S_{ii} + \sum_j S_{ij} \tag{6.17}$$

where $S_{ii}$ denotes the first-order system state knockout effect for reaction group $i$, $S_{ij}$ denotes the pairwise system state knockout effect for reaction groups $i$ and $j$, $S_i^{total}$ denotes the total-order system state knockout effect for reaction group $i$, $x^{data}$ denotes the base-case system state, $x^{data}_{\Delta R_i}$ denotes the system state when reaction group $i$ is knocked out, $x^{data}_{\Delta R_i \Delta R_j}$ denotes the system state when reaction groups $i$ and $j$ are knocked out, and $\|x\|_2$ denotes the $l_2$ norm of $x$. So that they would not dominate the colorbar, the total-order knockout effects were normalized to the same ranges as the main arrays (first-order and pairwise effects).

6.5.5 Sensitivity of CAT productivity to transcription and translation.

The catalytic rates of transcription and translation were sampled within one order of magnitude on each side from the best-fit values.
The parameter bounds were set as the base-10 logarithms of the upper and lower bound for each rate; then, 10 was taken to the power of each parameter sample to obtain the catalytic rates:

$$k_{cat}^{T,sample} \in \left[ \log_{10}\left( k_{cat}^{T,bf} / 10 \right),\ \log_{10}\left( k_{cat}^{T,bf} \cdot 10 \right) \right] \tag{6.18}$$

$$k_{cat}^{X,sample} \in \left[ \log_{10}\left( k_{cat}^{X,bf} / 10 \right),\ \log_{10}\left( k_{cat}^{X,bf} \cdot 10 \right) \right] \tag{6.19}$$

$$\Delta CAT = f\left( 10^{k_{cat}^{T,sample}},\ 10^{k_{cat}^{X,sample}} \right) \tag{6.20}$$

where $k_{cat}^{T,sample}$ denotes the sample of the transcription catalytic rate, $k_{cat}^{X,sample}$ denotes the sample of the translation catalytic rate, $k_{cat}^{T,bf}$ denotes the best-fit value of the transcription catalytic rate, and $k_{cat}^{X,bf}$ denotes the best-fit value of the translation catalytic rate. The sampling was performed using the Sensitivity Analysis Library in Python (SALib, which builds on NumPy) with 3,000 samples [65].

6.5.6 Calculation of energy efficiency.

Energy efficiency was calculated as the ratio of transcription and translation (weighted by the appropriate energy species coefficients) to ATP generation:

$$\text{Efficiency} = \frac{\Delta_\tau mRNA \cdot \alpha_T + \Delta_\tau CAT \cdot \alpha_X}{\displaystyle \int_{\tau} \sum_{j \in \{R_{ATP}\}} \sigma_{ATP_j} \bar{r}_j \, dt} \tag{6.21}$$

$$\alpha_T = 2 \cdot \left( ATP_T + CTP_T + GTP_T + UTP_T \right) \tag{6.22}$$

$$\alpha_X = 2 \cdot ATP_X + GTP_X \tag{6.23}$$

where $\Delta_\tau mRNA$ denotes the net accumulation of mRNA in phase $\tau$ (first, second, or overall), $\Delta_\tau CAT$ denotes the net accumulation of protein in phase $\tau$, $\alpha_T$ denotes the energy cost of transcription, $\alpha_X$ denotes the energy cost of translation, $R_{ATP}$ denotes the set of ATP-producing reactions, and $\sigma_{ATP_j}$ denotes the ATP coefficient for reaction $j$. $ATP_T$, $CTP_T$, $GTP_T$, $UTP_T$ denote the stoichiometric coefficients of each energy species for transcription, and $ATP_X$, $GTP_X$ denote the stoichiometric coefficients of ATP and GTP for translation. During transcription and tRNA charging, triphosphate molecules are consumed with monophosphates as byproducts; this is the reason for the factors of 2 on $ATP_T$, $CTP_T$, $GTP_T$, $UTP_T$, and $ATP_X$.

6.5.7 Availability of model code.
The cell-free model equations and the parameter estimation procedure were implemented in the Julia programming language [16]. The model equations were solved using the CVODE solver of the SUNDIALS suite [66], with an absolute tolerance and relative tolerance of $10^{-9}$; any parameter sets exhibiting CVODE errors were discarded. Thus, the numerical stability of all parameter sets in the ensemble was ensured. The model code and parameter ensemble are freely available under an MIT software license and can be downloaded from the Varnerlab website [179].

6.6 Acknowledgements

This study was supported by a National Science Foundation Graduate Research Fellowship (DGE-1333468) to N.H. Research reported in this publication was also supported by the Systems Biology Coagulopathy of Trauma Program with support from the US Army Medical Research and Materiel Command under award number W911NF-10-1-0376.

Table 6.1: Breakdown of ATP generation. Flux through ATP-generating pathways in the first and second phases as percentages of total ATP generation in that phase.

| Name | Index | Reaction | Phase 1 | Phase 2 |
|---|---|---|---|---|
| R_pgk | 12 | 13DPG + ADP → 3PG + ATP | 14% | 21% |
| R_pyk | 18 | ADP + PEP → ATP + PYR | 16% | <1% |
| R_sucCD | 45 | ADP + Pi + SUCCOA → ATP + COA + SUCC | 3% | 5% |
| R_atp | 55 | ADP + Pi + 4 H_e → ATP + 4 H + H2O | 54% | 46% |
| R_ackA | 68 | ACTP + ADP → AC + ATP | 12% | 28% |
| R_asn_deg | 102 | ASN + AMP + PPi → NH3 + ASP + ATP | <1% | <1% |
| R_thr_deg3 | 109 | THR + Pi + ADP → NH3 + FOR + ATP + PROP | <1% | <1% |

Table 6.2: Breakdown of ATP consumption. Flux through ATP-consuming pathways in the first and second phases as percentages of total ATP consumption in that phase.
| Name | Index | Reaction | Phase 1 | Phase 2 |
|---|---|---|---|---|
| R_glk_atp | 1 | ATP + GLC → ADP + G6P + H | 22% | <1% |
| R_pfk | 4 | ATP + F6P → ADP + FBP | 24% | <1% |
| R_pps | 22 | ATP + H2O + PYR → AMP + PEP + Pi | 1% | 1% |
| R_acs | 70 | AC + ATP + COA → ACCOA + AMP + PPi | 8% | 19% |
| R_glnA | 86 | GLU + ATP + NH3 → GLN + ADP + Pi | 1% | 2% |
| R_atp_amp | 152 | ATP + H2O → AMP + PPi | 6% | 13% |
| R_udp_utp | 160 | UDP + ATP → UTP + ADP | 3% | 6% |
| R_cdp_ctp | 161 | CDP + ATP → CTP + ADP | 4% | 8% |
| R_gdp_gtp | 162 | GDP + ATP → GTP + ADP | 3% | 4% |
| R_atp_ump | 163 | ATP + UMP → ADP + UDP | 1% | 3% |
| R_atp_cmp | 164 | ATP + CMP → ADP + CDP | 2% | 3% |
| R_adk_atp | 166 | AMP + ATP → 2 ADP | 18% | 35% |
| tRNA charging | 185-204 | AA + tRNA + ATP + H2O → AA·tRNA + AMP + PPi | 2% | 2% |
| Other | | | 4% | 4% |

Table 6.3: Mean and standard deviation of Akaike information criterion (AIC), by measurement, for the ensemble and random ensemble.

| Measurement | Mean AIC (ensemble) | Std. dev. AIC (ensemble) | Mean AIC (random) | Std. dev. AIC (random) | Mean AIC (random) − mean AIC (ensemble) |
|---|---|---|---|---|---|
| GLC | 65.4 | 2.1 | 103.9 | 0.6 | 38.5 |
| CAT | -23.0 | 10.5 | -5.2 | <0.1 | 17.8 |
| PYR | 64.8 | 10.3 | 84.7 | 0.7 | 19.9 |
| LAC | 70.7 | 4.5 | 88.9 | <0.1 | 18.2 |
| AC | 79.4 | 6.0 | 96 | 2.1 | 16.6 |
| SUCC | 59.6 | 3.4 | 55.5 | 4.1 | -4.1 |
| MAL | 60.8 | 4.1 | 71.6 | 6.3 | 10.8 |
| ATP | 51.1 | 3.3 | 69.1 | <0.1 | 18.0 |
| ADP | 39.8 | 3.7 | 53.2 | 4.7 | 13.4 |
| AMP | 32.9 | 1.5 | 75.1 | 5.7 | 42.2 |
| GTP | 53.4 | 1.6 | 68.2 | <0.1 | 14.8 |
| GDP | 45.7 | 2.9 | 43.6 | 9.5 | -2.1 |
| GMP | 46.5 | 4.2 | 46.1 | 12.5 | -0.4 |
| CTP | 44.9 | 2.6 | 58.5 | <0.1 | 13.7 |
| CDP | 38.8 | 1.6 | 50.7 | 8.2 | 11.8 |
| CMP | 32.1 | 4.0 | 51.9 | 9.1 | 19.8 |
| UTP | 55.6 | 5.2 | 53 | <0.1 | -2.7 |
| UDP | 28.2 | 4.6 | 51.9 | 11.5 | 23.6 |
| UMP | 35.3 | 3.3 | 72.3 | 7.3 | 36.9 |
| ALA | 66.4 | 4.4 | 100.5 | 1.1 | 34.1 |
| ASN | 53.7 | 1.5 | 67.6 | 3.8 | 13.8 |
| ASP | 65.9 | 2.5 | 79.5 | <0.1 | 13.6 |
| CYS | 60.5 | 3.1 | 74 | <0.1 | 13.5 |
| GLN | 54.3 | 5.6 | 84.7 | <0.1 | 30.4 |
| GLY | 47.2 | 12.7 | 75.5 | 11.7 | 28.3 |
| HIS | 46.3 | 6.2 | 43.2 | 3.2 | -3.2 |
| ILE | 53.3 | 3.8 | 48.4 | 4.8 | -5.0 |
| LEU | 41.5 | 6.5 | 52.5 | 4.6 | 10.9 |
| LYS | 68.4 | 2.0 | 73.9 | 0.2 | 5.5 |
| MET | 55.9 | 1.0 | 57.4 | 4 | 1.5 |
| PHE | 43.4 | 5.9 | 57.7 | 8.3 | 14.3 |
| PRO | 54.4 | 2.8 | 47.9 | 6.7 | -6.5 |
| SER | 65.9 | 4.1 | 81.4 | <0.1 | 15.6 |
| THR | 28.2 | 5.5 | 63.2 | 14.9 | 35.0 |
| TRP | 31.2 | 5.7 | 79.9 | 1.4 | 48.6 |
| TYR | 39.3 | 2.0 | 36.7 | 5.4 | -2.6 |
| VAL | 51.3 | 3.1 | 55.5 | 4.6 | 4.1 |

Table 6.4: Reference values for reaction rate maxima (Vmax) from BioNumbers.
Vmax values calculated from turnover numbers (kcat) from BioNumbers, and a characteristic enzyme concentration of 170 nM. Characteristic rate maximum for all other reactions calculated as geometric mean of calculated rate maxima.

| Enzyme | Reaction | kcat (min$^{-1}$) | Vmax (mM/h) | BNID# |
|---|---|---|---|---|
| Serine dehydrase | R_ser_deg | 10400 | 104 | 101119 |
| Isocitrate dehydrogenase | R_icd | 11900 | 119 | 101152 |
| Lactate dehydrogenase | R_ldh | 5800 | 58 | 101036 |
| Aspartate transaminase | R_aspC, R_tyr, R_phe | 25800 | 258 | 101108 |
| Enolase | R_eno | 13200 | 132 | 101028 |
| Pyruvate kinase | R_pyk | 25000 | 250 | 101029, 101030 |
| Malic enzyme | R_maeA, R_maeB | 35400 | 354 | 101167 |
| Phosphofructokinase | R_pfk | 554400 | 5544 | 104955 |
| Malate dehydrogenase | R_mdh | 33000 | 330 | 101163 |
| Citrate synthase | R_gltA | 42000 | 420 | 101149 |
| 6PG dehydrogenase | R_zwf, R_pgl, R_gnd | 3200 | 32 | 101048 |
| Succinate dehydrogenase | R_sdh | 121 | 1.21 | 101162 |
| Succinyl-CoA synthetase | R_sucCD | 4700 | 47 | 101158 |
| 3PGA dehydrogenase | R_gpm | 1100 | 11 | 101135 |
| PEP carboxylase | R_ppc | 35400 | 354 | 101139 |
| 3PGA kinase | R_pgk | 4300 | 43 | 101016 |
| Characteristic Vmax | | | 110 | |

Table 6.5: Enzyme levels for key reaction fluxes, calculated from enzyme turnover numbers [3] and rate maxima from the best-fit set.

| Enzyme | Reaction | kcat (min$^{-1}$), MOMENT | Vmax (mM/h), best-fit set | Enzyme level (nM), calculated |
|---|---|---|---|---|
| Isocitrate dehydrogenase | R_icd | 1700 | 37 | 356 |
| Lactate dehydrogenase | R_ldh | 52500 | 35 | 11 |
| Aspartate transaminase | R_aspC | 4900 | 39 | 130 |
| Pyruvate kinase | R_pyk | 8100 | 610 | 1250 |
| Malic enzyme | R_maeA | 8100 | 46 | 96 |
| Malic enzyme | R_maeB | 4000 | 66 | 274 |
| Phosphofructokinase | R_pfk | 5000 | 15600 | 51800 |
| Malate dehydrogenase | R_mdh | 43700 | 33 | 13 |
| Succinate dehydrogenase | R_sdh | 10000 | 4.9 | 8.2 |
| Succinyl-CoA synthetase | R_sucCD | 1500 | 250 | 2690 |
| Median | | | | 202 |

Table 6.6: Reference values for transcription, translation, and mRNA degradation from literature. Transcription rate calculated from elongation rate, mRNA length, and promoter activity level. Translation rate calculated from elongation rate, protein length, and polysome amplification constant.
mRNA degradation rate calculated from mRNA degradation time.

| Description | Parameter | Value | Units | Reference |
|---|---|---|---|---|
| T7 RNA polymerase concentration | $R_T$ | 1.0 | µM | |
| Ribosome concentration | $R_X$ | 2 | µM | [52] |
| Transcription saturation coefficient | $K_T$ | 100 | nM | estimated |
| Translation saturation coefficient | $K_X$ | 45 | µM | estimated |
| Transcription elongation rate | $\dot{v}_T$ | 25 | nt/s | [52] |
| CAT mRNA length | $l_G$ | 660 | nt | [92] |
| Promoter activity level | $u$ | 0.9 | | estimated |
| Transcription rate | $k_{cat}^T = \frac{\dot{v}_T}{l_G} u$ | 123 | h$^{-1}$ | calculated |
| Translation elongation rate | $\dot{v}_X$ | 1.5 | aa/s | [52] |
| CAT protein length | $l_P$ | 219 | aa | [92] |
| Polysome amplification constant | $K_P$ | 10 | | estimated |
| Translation rate | $k_{cat}^X = \frac{\dot{v}_X}{l_P} K_P$ | 247 | h$^{-1}$ | calculated |
| mRNA degradation time | $t_{1/2}$ | 8 | min | BNID 106253 |
| mRNA degradation rate | $k_{deg} = \frac{\ln(2)}{t_{1/2}}$ | 5.2 | h$^{-1}$ | calculated |
| ATP transcription coefficient | $ATP_T$ | 176 | | calculated |
| CTP transcription coefficient | $CTP_T$ | 144 | | calculated |
| GTP transcription coefficient | $GTP_T$ | 151 | | calculated |
| UTP transcription coefficient | $UTP_T$ | 189 | | calculated |
| ATP tRNA charging coefficient | $ATP_X$ | 219 | | calculated |
| GTP translation coefficient | $GTP_X$ | 438 | | calculated |

CHAPTER 7

JUPOETS: A CONSTRAINED MULTIOBJECTIVE OPTIMIZATION APPROACH TO ESTIMATE BIOCHEMICAL MODEL ENSEMBLES IN THE JULIA PROGRAMMING LANGUAGE

7.1 Abstract¹

Ensemble modeling is a promising approach for obtaining robust predictions and coarse grained population behavior in deterministic mathematical models. Ensemble approaches address model uncertainty by using parameter or model families instead of single best-fit parameters or fixed model structures. Parameter ensembles can be selected based upon simulation error, along with other criteria such as diversity or steady-state performance. Simulations using parameter ensembles can estimate confidence intervals on model variables, and robustly constrain model predictions, despite having many poorly constrained parameters.
In this software note, we present a multiobjective-based technique to estimate parameter or model ensembles, the Pareto Optimal Ensemble Technique in the Julia programming language (JuPOETs). JuPOETs integrates simulated annealing with Pareto optimality to estimate ensembles on or near the optimal tradeoff surface between competing training objectives. We demonstrate JuPOETs on a suite of multiobjective problems, including test functions with parameter bounds and system constraints as well as for the identification of a proof-of-concept biochemical model with four conflicting training objectives. JuPOETs identified optimal or near optimal solutions approximately six-fold faster than a corresponding implementation in Octave for the suite of test functions. For the proof-of-concept biochemical model, JuPOETs produced an ensemble of parameters that captured the mean of the training data for conflicting data sets, while simultaneously estimating parameter sets that performed well on each of the individual objective functions. JuPOETs is a promising approach for the estimation of parameter and model ensembles using multiobjective optimization. JuPOETs can be adapted to solve many problem types, including mixed binary and continuous variable types, bilevel optimization problems and constrained problems without altering the base algorithm. JuPOETs is open source, available under an MIT license, and can be installed using the Julia package manager from the JuPOETs GitHub repository.

¹Adapted with permission from Bassen DM, Vilkhovoy M, Minot M, Butcher JT and Varner JD, "JuPOETs: a constrained multiobjective optimization approach to estimate biochemical model ensembles in the Julia programming language" (2017) BMC Systems Biology, 11(10).

7.2 Introduction

Ensemble modeling is a promising approach for obtaining robust predictions and coarse grained population behavior in deterministic mathematical models.
It is often not possible to uniquely identify all the parameters in biochemical models, even when given extensive training data [50]. Thus, despite significant advances in standardizing biochemical model identification [54], the problem of estimating model parameters from experimental data remains challenging. Ensemble approaches address parameter uncertainty in systems biology and other fields like weather prediction [14, 100, 21, 135] by using parameter families instead of single best-fit parameter sets. Parameter families can be selected based upon simulation error, along with other criteria such as diversity or steady-state performance. Simulations using parameter ensembles can estimate confidence intervals on model variables, and robustly constrain model predictions, despite having many poorly constrained parameters [59, 158]. There are many techniques to generate parameter ensembles. Battogtokh et al., Brown et al., and later Tasseff et al. generated experimentally constrained parameter ensembles using a Metropolis-type random walk [14, 21, 168, 169]. Liao and coworkers developed methods to generate ensembles that all approach the same steady-state, for example one determined by fluxomics measurements [174]. They have used this approach for model reduction [?], strain engineering [33, 165] and to study the robustness of non-native pathways and network failure [105]. Maranas and coworkers have also applied this method to develop a comprehensive kinetic model of bacterial central carbon metabolism, including mutant data [91]. We and others have used ensemble approaches, generated using both sampling and optimization techniques, to robustly simulate a wide variety of signal transduction processes [112, 158, 168, 169, 125], neutrophil trafficking in sepsis [157], and patient specific coagulation behavior [111], to quantify uncertainty in metabolic kinetic models [5], and to capture cell-to-cell variation [106].
Further, ensemble approaches have been used in synthetic biology to sample possible biocircuit configurations [134]. Thus, ensemble approaches are widely used to robustly simulate a variety of biochemical systems.

Identification of biochemical models requires significant training data, perhaps taken from diverse sources. These real-world data sets often contain intrinsic conflicts resulting from, for example, the use of different cell lines, different measurement technologies, different reagent vendors or lots, uncontrollable experimental artifacts or general cross laboratory variability. Parameter ensembles that optimally balance these inherent conflicts lead to more robust model performance. Multiobjective optimization is an ensemble generation technique that naturally balances conflicts in noisy training data [63]. Multiobjective optimization has been used to identify signal transduction models [106, 158], for the design of synthetic circuits [134], to design the folding behaviors of novel RNAs [166], to design bioprocesses [151], and to understand bacterial adaptation [7]. Thus, it is a widely used approach for a variety of biochemical applications. Previously, we developed the Pareto Optimal Ensemble Technique (POETs) algorithm to address the challenge of competing or conflicting training objectives. POETs, which integrates simulated annealing (SA) and multiobjective optimization through the notion of Pareto rank, estimates parameter ensembles which optimally trade off between competing (and potentially conflicting) experimental objectives [155]. However, the previous implementation of POETs, in the Octave programming language [41], suffered from poor performance and was not configurable. For example, Octave-POETs does not accommodate user definable objective functions, bounds and problem constraints, cooling schedules, different variable types (e.g., a mixture of binary and continuous design variables) or custom diversity generation routines.
Octave-POETs was also not well integrated into a package or source code management (SCM) system. Thus, upgrades to the approach containing new features, or bug fixes, were not centrally managed.

7.3 Implementation

In this software note, we present an open-source implementation of the Pareto optimal ensemble technique in the Julia programming language (JuPOETs). JuPOETs takes advantage of the unique features of Julia to address many of the shortcomings of the previous implementation. Julia is a cross-platform, high-performance programming language for technical computing that has performance comparable to C but with syntax similar to MATLAB/Octave and Python [16]. Julia also offers a sophisticated compiler, distributed parallel execution, numerical accuracy, and an extensive function library. Further, the architecture of JuPOETs takes advantage of the first-class function type in Julia, allowing user definable behavior for all key aspects of the algorithm, including objective functions, custom diversity generation logic, linear/non-linear parameter constraints (and parameter bounds constraints) as well as custom cooling schedules. Julia's ability to naturally call other languages such as Python or C also allows JuPOETs to be used with models implemented in a variety of languages across many platforms. Additionally, Julia offers a built-in package manager which is directly integrated with GitHub, a popular web-based Git repository hosting service offering distributed revision control and source code management. Thus, JuPOETs can be adapted to many problem types, including mixed binary and continuous variable types, bilevel problems and constrained problems without altering the base algorithm, as was required in the previous POETs implementation.

7.3.1 JuPOETs optimization problem formulation.

JuPOETs solves the $K$-dimensional constrained multiobjective optimization problem:

$$\min_{\mathbf{p}} \begin{pmatrix} O_1\left( \mathbf{x}(t, \mathbf{p}), \mathbf{p} \right) \\ \vdots \\ O_K\left( \mathbf{x}(t, \mathbf{p}), \mathbf{p} \right) \end{pmatrix} \tag{7.1}$$

subject to the model equations and constraints:

$$\begin{aligned} \mathbf{f}\left( t, \mathbf{x}(t, \mathbf{p}), \dot{\mathbf{x}}(t, \mathbf{p}), \mathbf{u}(t), \mathbf{p} \right) &= \mathbf{0} \\ g_1\left( t, \mathbf{x}(t, \mathbf{p}), \mathbf{u}(t), \mathbf{p} \right) &\ge 0 \\ &\ \vdots \\ g_C\left( t, \mathbf{x}(t, \mathbf{p}), \mathbf{u}(t), \mathbf{p} \right) &\ge 0 \end{aligned}$$

and parameter bound constraints:

$$\mathcal{L} \le \mathbf{p} \le \mathcal{U}$$

The quantity $O_j$ denotes the $j$th objective function ($j = 1, 2, \ldots, K$), typically the sum of squared errors for the $j$th data set for biochemical modeling applications. The terms $\mathbf{f}(t, \mathbf{x}(t, \mathbf{p}), \dot{\mathbf{x}}(t, \mathbf{p}), \mathbf{u}(t), \mathbf{p})$ denote the system of model equations (e.g., differential equations, differential algebraic equations or linear/non-linear algebraic equations) where $\mathbf{p}$ denotes the decision variable vector, e.g., unknown model parameters ($D \times 1$). In typical biochemical modeling applications, the model equations $\mathbf{f}(\cdot)$ are a system of continuous real-valued non-linear differential equations that comprise a kinetic model, but other types of models, e.g., stoichiometric models, are also common. The quantity $t$ denotes time, $\mathbf{x}(t, \mathbf{p})$ denotes the model state (with an initial state $\mathbf{x}_0$), and $\mathbf{u}(t)$ denotes an input vector. The decision variables (e.g., kinetic parameters) can be subject to bounds constraints, where $\mathcal{L}$ and $\mathcal{U}$ denote the lower and upper bounds, respectively, as well as $C$ problem specific constraints $g_i(t, \mathbf{x}(t, \mathbf{p}), \mathbf{u}(t), \mathbf{p})$, $i = 1, \ldots, C$. The decision variables $\mathbf{p}$ are typically real-valued kinetic constants, or metabolic fluxes in the case of stoichiometric models. However, other variable types, e.g., binary or categorical decision variables, can also be accommodated.

JuPOETs integrates simulated annealing (SA) [97] with Pareto ranking to estimate decision variables on or near the optimal tradeoff surface between competing objectives (Fig. 7.1 and Algorithm 1). A tradeoff surface defines the best possible performance for every conflicting objective, such that the performance of one objective cannot be increased without decreasing the performance of at least one other objective.
Pareto rank is a scalar measure of distance away from the optimal trade-off surface (low rank is near the surface, while higher ranks are progressively further away). Thus, the central idea underlying POETs is a mapping between the value of the objective vector evaluated at $\mathbf{p}_{i+1}$ (decision variable guess at iteration $i + 1$) and the scalar Pareto rank (Fig. 7.1). Traditional simulated annealing uses a scalar performance value, e.g., simulation error, to make a probabilistic decision to keep or reject a set of decision variables; decision variables with better performance are always accepted, while those with worse performance are sometimes accepted depending upon a parameter called the temperature. On the other hand, JuPOETs makes this same decision using the Pareto rank instead of a single performance objective.

[Figure 7.1 panels: "Parameter space" (axes k1, k2, ..., km; random walk) mapped to "Objective function space" (axes obj1, obj2, ..., objm; "Pareto-optimal front"). Algorithm listing shown in the figure:]

    k: parameter vector
    E(k): multi-objective cost function vector, E(k) = (E1(k), E2(k), ..., EN(k))
    K: an archive of the current estimate of the ensemble
    rank(k|K): a Pareto-optimal rank based dominance measure

    k = k_init      % the starting point of parameters
    T = T0          % initial annealing temperature
    Repeat
        k_new = perturb(k_current)    % generate a new parameter guess (random walk or local search)
        Calculate E(k_new) and rank(k_new|K)
        P_accept(k_new, k_current) ≡ exp{-rank(k_new|K)/T}
        if P_accept(k_new, k_current) > rand(0,1)
            Move to k_new
            Update the archive K
        endif
        T = annealing(T)
    EndRepeat (until the termination condition is satisfied)

Figure 7.1: Schematic of multiobjective parameter mapping. The performance of any given parameter set is mapped into an objective space using a ranking function which quantifies the quality of the parameters. The distance away from the optimal tradeoff surface is quantified using the Pareto ranking scheme of Fonseca and Fleming in JuPOETs.
The problem of estimating biochemical model parameters from experimental data is typically posed as an error minimization problem over continuous real-valued decision variables (model parameters) subject to the model equations. A parameter set $\mathbf{p}_{i+1}$ lies along the optimal tradeoff surface if no other parameter guess leads to decreased error for every objective. JuPOETs calculates the performance of a candidate parameter set $\mathbf{p}_{i+1}$ by calling the user defined objective function; objective takes a parameter set as an input, evaluates the model equations, and using this solution, returns the $K \times 1$ objective vector. Candidate parameter sets are generated by the user supplied neighbor function; the default implementation of neighbor is a random perturbation, however other perturbation logic can be implemented by the user. The error vector associated with $\mathbf{p}_{i+1}$ is ranked using the built-in Pareto rank function, by comparing the error at iteration $i + 1$ to the error archive $\mathcal{O}_i$ (all error vectors up to iteration $i$ meeting a ranking criterion). Parameter sets on or near the optimal trade-off surface between the objectives have a rank equal to 0 (no other current parameter sets are better). These rank zero parameter sets define the Pareto optimal group for the ensemble, wherein Pareto optimality is defined as a parameter set not being dominated by any other sets within the ensemble. Sets with increasing non-zero rank are progressively further away from the optimal trade-off surface. Thus, a parameter set with a rank = 0 is better in a trade-off sense than rank > 0. We implemented the Fonseca and Fleming ranking scheme in the built-in rank function [46]:

$$\text{rank}\left( \mathbf{O}_{i+1}(\mathbf{p}_{i+1}) \mid \mathcal{O}_i \right) = r \tag{7.2}$$

where rank $r$ is the number of parameter sets that dominate (are better than) parameter set $\mathbf{p}_{i+1}$, and $\mathbf{O}_{i+1}(\mathbf{p}_{i+1})$ denotes the objective vector evaluated at $\mathbf{p}_{i+1}$. We used the Pareto rank to inform the SA calculation.
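
The dominance count of Eq. 7.2 can be illustrated with a short Python sketch. It is not the JuPOETs rank function: the names `dominates` and `pareto_rank` are hypothetical, and the sketch assumes the standard Pareto dominance definition for minimization (no worse in every objective and strictly better in at least one).

```python
def dominates(a, b):
    """True if error vector a is no worse than b in every objective
    and strictly better in at least one (minimization)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_rank(candidate, archive):
    """Eq. 7.2: number of archived error vectors that dominate the candidate."""
    return sum(dominates(other, candidate) for other in archive)


# Two-objective example: three mutually non-dominated archive members.
archive = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0)]
rank_inside = pareto_rank((3.0, 3.0), archive)   # dominated only by (2, 2) -> 1
rank_front = pareto_rank((1.5, 1.5), archive)    # dominated by none -> 0
rank_far = pareto_rank((5.0, 5.0), archive)      # dominated by all three -> 3
```

A candidate with rank 0 would join the current Pareto optimal group, while larger counts correspond to sets further from the trade-off surface.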
The parameter set $p_{i+1}$ was accepted or rejected by the SA at each iteration, by calculating an acceptance probability $P(p_{i+1})$:

$$ P(p_{i+1}) \equiv \exp\left\{ -\mathrm{rank}\left(O_{i+1}(p_{i+1}) \mid \mathcal{O}_i\right) / T \right\} \tag{7.3} $$

where $T$ is the simulated annealing temperature; the temperature provides control over how strictly decreasing Pareto rank is enforced. As $\mathrm{rank}\left(O_{i+1}(p_{i+1}) \mid \mathcal{O}_i\right) \rightarrow 0$, the acceptance probability moves toward one, ensuring that we explore parameter sets along the Pareto surface. Occasionally (depending upon $T$), a parameter set with a high Pareto rank is accepted by the SA, allowing a more diverse search of the parameter space. However, as $T$ is reduced as a function of iteration count (using the cooling function), the probability of accepting a high-rank set decreases. Parameter sets could also be accepted by the SA but not permanently archived in $\mathcal{S}_i$, where $\mathcal{S}_i$ is the solution archive. Only parameter sets with rank less than or equal to a threshold (rank ≤ 4 by default) are included in $\mathcal{S}_i$, where the archive is re-ranked and filtered after accepting every new parameter set. Parameter bounds were implemented in the neighbor function as box constraints, while problem specific constraints were implemented in objective using a penalty method:

$$ O_i + \lambda \sum_{j=1}^{C} \min\left\{ 0,\, g_j\left(t, x(t, p), u(t), p\right) \right\}, \qquad i = 1, \ldots, K \tag{7.4} $$

where $\lambda$ denotes the penalty parameter ($\lambda = 100$ by default). However, because both the neighbor and objective functions are user defined, different constraint implementations are easily defined.

To use JuPOETs, the user specifies the neighbor, acceptance, cooling and objective functions along with an initial decision variable guess. Default implementations of the neighbor, acceptance and cooling functions can be used directly, or they can be overridden by user defined logic. However, the user must provide an implementation of the objective function and provide an initial decision variable guess.
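The rank-based acceptance rule of Eq. (7.3) can be sketched in a few lines of Python. This is a hedged illustration rather than the JuPOETs code: the function names are hypothetical, and the geometric cooling schedule $T \leftarrow \alpha T$ is an assumption chosen for illustration (the text only states that the cooling function is user supplied and that a cooling parameter $\alpha$ is used).

```python
import math
import random


def acceptance_probability(rank, temperature):
    """Eq. (7.3): P = exp(-rank / T). A rank of zero always gives P = 1."""
    return math.exp(-rank / temperature)


def cooling(temperature, alpha=0.9):
    """Assumed geometric cooling schedule: T <- alpha * T."""
    return alpha * temperature


def accept(rank, temperature, rng=random):
    """Accept the candidate when P exceeds a uniform draw on [0, 1)."""
    return acceptance_probability(rank, temperature) > rng.random()


temperature = 1.0
p_rank0 = acceptance_probability(0, temperature)  # on the trade-off surface
p_hot = acceptance_probability(3, temperature)    # rank 3 at the start
for _ in range(20):                               # cool twenty times
    temperature = cooling(temperature)
p_cold = acceptance_probability(3, temperature)   # rank 3 after cooling
```

Under these assumptions a rank-zero candidate is always accepted, while the chance of accepting a rank-three candidate shrinks as the temperature falls, which matches the behavior described above.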
If the user is operating JuPOETs in hybrid mode, then a refinement function pointer must also be specified. Hybrid mode temporarily switches the search from a multiobjective to a single objective problem, where the sum of the objective functions can be used to update the best (or initial) parameter guess. The specific hybrid mode search logic is up to the user; by default hybrid mode is off, and the default refinement implementation is simply a pass through function. However, we have shown previously that POETs operated in hybrid mode (where the single objective problem used a pattern search approach) had better performance than POETs alone [155]. Thus, hybrid mode is generally recommended for most applications. In addition, there are several user configurable parameters that can be adjusted to control the performance of JuPOETs: maximum number of iterations controls the number of iterations per temperature (default 20); rank cutoff controls the upper rank bound on the solution archive (default 5); temperature min controls the minimum temperature after which JuPOETs returns the error and solution archives (default 0.001); show trace controls the level of output shown to the user (default true). After the completion of the run, JuPOETs returns the parameter solution archive $\mathcal{S}$, objective archive $\mathcal{O}$ and rank archive $\mathcal{R}$. The parameter solution archive $\mathcal{S}$ is a $D \times A$ array, where $A$ denotes the number of solutions in the archive when JuPOETs terminated. On the other hand, the objective archive $\mathcal{O}$ is a $K \times A$ array containing the performance values for each objective corresponding to the columns of $\mathcal{S}$. Lastly, JuPOETs returns the rank archive $\mathcal{R}$, which is an $A \times 1$ array of Pareto ranks corresponding to the columns of $\mathcal{S}$. One technical note: if JuPOETs is run from multiple starting locations, and the archives from each of these runs are combined into a single collective archive, the combined parameter rank archive may become invalid.
In these cases, it is required to re-rank the parameter sets using the built-in rank function to produce a collective parameter ranking.

7.4 Availability of data and materials

JuPOETs is open source, available under an MIT software license. The JuPOETs source code is freely available from the JuPOETs GitHub repository at https://github.com/varnerlab/POETs.jl. All samples used in this study are included in the sample/biochemical and sample/test functions subdirectories of the JuPOETs GitHub repository.

Input: User specified objective function, and initial guess ($D \times 1$). User can also specify custom neighbor, acceptance, cooling and refinement functions or use the default functions provided.
Output: Rank archive $\mathcal{R}$ ($A \times 1$), parameter solution archive $\mathcal{S}$ ($D \times A$) and objective archive $\mathcal{O}$ ($K \times A$), where $A$ denotes the number of accepted solutions.

    initialize: R, S and O using initial guess p_o
    initialize: T ← 1.0
    initialize: T_min ← 1/10000
    initialize: maximum number of steps per temperature I
    // Call to local refinement function (single objective problem)
    p_o ← user-function::refinement(p_o)
    while T > T_min do
        i ← 1
        while i < I do
            // Generate a new parameter solution using user neighbor function
            p_{i+1} ← user-function::neighbor(p*)
            // Evaluate p_{i+1} using user objective function
            o_{i+1} ← user-function::objective(p_{i+1})
            Add p_{i+1} to solution archive S
            Add o_{i+1} to objective archive O
            // Calculate Pareto rank of solutions in O using builtin rank function
            R ← builtin-function::rank(O)
            // Accept p_{i+1} into the archive with user defined probability
            P ← user-function::acceptance(R, T)
            if P > rand then
                // Update the best solution with p_{i+1}
                p* ← p_{i+1}
                prune S, R and O of all solutions above a rank threshold
            else
                Remove p_{i+1} from solution archive S
                Remove o_{i+1} from error archive O
            end
            i ← i + 1
        end
        // Update T using the user cooling function
        T ← user-function::cooling(T)
    end

Algorithm 1: Pseudo-code for the JuPOETs run-loop. The user must specify the objective function and an initial parameter guess. The user can optionally specify the neighbor, acceptance, cooling and refinement functions (or use the default implementations). The rank archive $\mathcal{R}$, solution archive $\mathcal{S}$ and objective archive $\mathcal{O}$ are initialized from the initial guess. The initial guess (potentially following a single objective local refinement step) is perturbed in the neighbor function, which generates a new solution whose performance is evaluated using the user supplied objective function. The new solution and objective values are then added to the respective archives and ranked using the builtin rank function. If the new solution is accepted (based upon a probability calculated with the user supplied acceptance function) it is added to the solution and objective archive. This solution is then perturbed during the next iteration of the algorithm. However, if the solution is not accepted, it is removed from the archive and discarded. The temperature is adjusted using the user supplied cooling function after each $I$ iterations. When JuPOETs terminates, the parameter solution archive $\mathcal{S}$, objective archive $\mathcal{O}$ and rank archive $\mathcal{R}$ are returned to the caller.

7.5 Results and Discussion

JuPOETs identified optimal or nearly optimal solutions significantly faster than Octave-POETs for a suite of multiobjective algebraic test problems (Table 7.1). The algebraic test problems were constrained non-linear functions with bound constraints and additional non-linear constraints on the decision variables in one case. The problems had up to three-dimensional continuous real-valued decision vectors, and each case had two objective functions. The wall-clock time for JuPOETs and Octave-POETs was measured for 10 independent trials for each of the test problems. The same cooling, neighbor, acceptance, and objective logic was employed between the implementations, and all other parameters were held constant.
For each test function, the search domain was partitioned into 10 segments, where an initial parameter guess was drawn from each partition. The number of search steps for each temperature was $I = 10$ for all cases, and the cooling parameter was $\alpha = 0.9$. On average, JuPOETs identified optimal or near optimal solutions for the suite of test problems six-fold faster (60 s versus 400 s) than Octave-POETs (Fig. 7.2). JuPOETs produced the characteristic tradeoff curves for each test problem, given both decision variable bound and problem constraints (Fig. 7.3). Thus, JuPOETs estimated an ensemble of solutions to constrained multiobjective algebraic test problems significantly faster than the current Octave implementation.

Next, we tested JuPOETs on a proof-of-concept biochemical model identification problem. JuPOETs estimated an ensemble of biochemical model parameters that were consistent with the mean of synthetic training data (Fig. 7.4). Four synthetic training data sets were generated from a prototypical biochemical network consisting of 6 metabolites and 7 reactions (Fig. 7.4, inset right).

| Name | Dimension | Functions | Domain | Constraints |
| Schaffer function | 1 | $O_1(x) = x^2$; $O_2(x) = (x-2)^2$ | $-10 \le x \le 10$ | none |
| Binh and Korn function | 2 | $O_1(x,y) = 4x^2 + 4y^2$; $O_2(x,y) = (x-5)^2 + (y-5)^2$ | $0 \le x \le 5$; $0 \le y \le 3$ | $g_1(x,y) = (x-5)^2 + y^2 \le 25$; $g_2(x,y) = (x-8)^2 + (y+3)^2 \ge 7.7$ |
| Fonseca and Fleming function | 3 | $O_1(x_i) = 1 - \exp\left(-\sum_{i=1}^{N}\left(x_i - \frac{1}{\sqrt{N}}\right)^2\right)$; $O_2(x_i) = 1 - \exp\left(-\sum_{i=1}^{N}\left(x_i + \frac{1}{\sqrt{N}}\right)^2\right)$ | $-4 \le x_i \le 4$ | none |

Table 7.1: Multi-objective optimization test problems. We tested the JuPOETs implementation on three two-objective test problems, with one-, two- and three-dimensional parameter vectors. Each problem had parameter bounds constraints; however, only the Binh and Korn function had additional non-linear problem constraints. For the Fonseca and Fleming problem, $N = 3$.
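For reference, the three test problems of Table 7.1 can be written as short Python functions. The sketch below is an illustration only: it uses the commonly published forms of the Schaffer, Binh and Korn, and Fonseca and Fleming problems (the assumption being that the thesis used these standard forms), and the function names are hypothetical rather than taken from the JuPOETs sample directories.

```python
import math


def schaffer(x):
    """Schaffer problem: one decision variable, -10 <= x <= 10."""
    return (x ** 2, (x - 2.0) ** 2)


def binh_korn(x, y):
    """Binh and Korn problem: 0 <= x <= 5, 0 <= y <= 3."""
    o1 = 4.0 * x ** 2 + 4.0 * y ** 2
    o2 = (x - 5.0) ** 2 + (y - 5.0) ** 2
    return (o1, o2)


def binh_korn_feasible(x, y):
    """Non-linear constraints g1 <= 25 and g2 >= 7.7 for Binh and Korn."""
    g1 = (x - 5.0) ** 2 + y ** 2
    g2 = (x - 8.0) ** 2 + (y + 3.0) ** 2
    return g1 <= 25.0 and g2 >= 7.7


def fonseca_fleming(x):
    """Fonseca and Fleming problem: N = len(x), -4 <= x_i <= 4."""
    shift = 1.0 / math.sqrt(len(x))
    o1 = 1.0 - math.exp(-sum((xi - shift) ** 2 for xi in x))
    o2 = 1.0 - math.exp(-sum((xi + shift) ** 2 for xi in x))
    return (o1, o2)


# Each objective has its own minimizer, which is what creates the trade-off.
schaffer_at_2 = schaffer(2.0)
binh_korn_at_origin = binh_korn(0.0, 0.0)
ff_at_shift = fonseca_fleming([1.0 / math.sqrt(3)] * 3)
```

Evaluating each problem at the minimizer of one objective shows the conflict directly: the first objective of the Fonseca and Fleming problem vanishes at $x_i = 1/\sqrt{N}$, while the second does not.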
We considered a common case in which the same extracellular measurements of $A_e$, $B_e$, $C_e$ and cellmass were made on four hypothetical cell types, each having the same biological connectivity but different performance. Network dynamics were modeled using the hybrid cybernetic model with elementary modes (HCM) approach of Ramkrishna and coworkers [95]. In the HCM approach, metabolic networks are first decomposed into a set of elementary modes (EMs) (chemically balanced steady-state pathways, see [150]). Dynamic combinations of elementary modes are then used to characterize network behavior. Each elementary mode is catalyzed by a pseudo enzyme; thus, each mode has both kinetic and enzyme synthesis parameters. The proof of concept network generated 6 EMs, resulting in 13 model parameters (continuous real-valued decision variables). The synthetic training data was generated by randomly varying these parameters.

Figure 7.2: The performance of JuPOETs on the multi-objective test suite. The execution time (wall-clock) for JuPOETs and POETs implemented in Octave was measured for 10 independent trials for the suite of test problems. The number of steps per temperature $I = 10$, and the cooling parameter $\alpha = 0.9$ for all cases. The problem domain was partitioned into 10 equal segments, and an initial guess was drawn from each segment. For each of the test functions, JuPOETs estimated solutions on (rank zero solutions, black) or near (gray) the optimal tradeoff surface, subject to bounds and problem constraints.

[Bar chart: average performance (N = 10), in seconds, for JuPOETs and Octave on the Schaffer N1, Binh and Korn, and Fonseca and Fleming problems.]

Figure 7.3: Representative JuPOETs solutions for problems in the multi-objective test suite. The number of steps per temperature $I = 10$, and the cooling parameter $\alpha = 0.9$ for all cases. The problem domain was partitioned into 10 equal segments, and an initial guess was drawn from each segment. For each of the test functions, JuPOETs estimated solutions on (rank zero solutions, black) or near (gray) the optimal tradeoff surface, subject to bounds and problem constraints.

The general form of the biochemical test problem was given by:

$$ \min_{p} \left( O_1, \ldots, O_K \right) \tag{7.5} $$

subject to model and bounds constraints. We considered four training data sets ($K = 4$), each of which contained time-series measurements of $A_e$, $B_e$, $C_e$ and cellmass. Each objective $O_j$, $j = 1, \ldots, K$ quantified the squared difference between the simulated ($x_i$) and measured extracellular species abundance ($y_i$) in the $j$th data set:

$$ O_j = \sum_{i} \sum_{\tau} \left( x_i(\tau) - y_i(\tau) \right)^2, \qquad j = 1, \ldots, K \tag{7.6} $$

where $i$ denotes the species index and $\tau$ denotes the time index.

Figure 7.4: Proof of concept biochemical network study. Inset right: Prototypical biochemical network with six metabolites and seven reactions modeled using the hybrid cybernetic approach (HCM). Intracellular cellmass precursors A, B, and C are balanced (no accumulation) while the extracellular metabolites $A_e$, $B_e$, and $C_e$ are dynamic. The oval denotes the cell boundary, $q_j$ is the $j$th flux across the boundary, and $v_k$ denotes the $k$th intracellular flux. Four data sets (each with $A_e$, $B_e$, $C_e$ and cellmass measurements) were generated by varying the kinetic constants for each biochemical mode. Each data set was a single objective in the JuPOETs procedure. A: Ensemble simulation of extracellular substrate $A_e$ and cellmass versus time. B: Ensemble simulation of extracellular substrate $B_e$ and $C_e$ versus time. The gray region denotes the 95% confidence estimate of the mean ensemble simulation. The data points denote mean synthetic measurements, while the error bars denote the 95% confidence estimate of the measurement computed over the four training data sets. C: Trade-off plots between the four training objectives. The quantity $O_j$ denotes the $j$th training objective. Each point represents a member of the parameter ensemble, where gray denotes rank 0 sets, while black denotes rank 1 sets. Ensembles were generated using POETs without employing local refinement.
[Figure panels A and B: concentration (AU) versus time (AU).]

The abundance of extracellular species $i$ ($x_i$), the pseudo enzyme $e_l$ (catalyzes flux through mode $l$), and cellmass were governed by the model equations:

$$ \frac{dx_i}{dt} = \sum_{j=1}^{R} \sum_{l=1}^{L} \sigma_{ij} z_{jl}\, q_l(e, p, x)\, c, \qquad i = 1, \ldots, M $$

$$ \frac{de_l}{dt} = \alpha_l + r_{E,l}(p, x)\, u_l - \left( \beta_l + r_G \right) e_l, \qquad l = 1, \ldots, L $$

$$ \frac{dc}{dt} = r_G\, c $$

where $R$ and $M$ denote the number of reactions and extracellular species in the model and $L$ denotes the number of elementary modes. The quantity $\sigma_{ij}$ denotes the stoichiometric coefficient for species $i$ in reaction $j$ and $z_{jl}$ denotes the normalized flux for reaction $j$ in mode $l$. If $\sigma_{ij} > 0$, species $i$ is produced by reaction $j$; if $\sigma_{ij} < 0$, species $i$ is consumed by reaction $j$; if $\sigma_{ij} = 0$, species $i$ is not connected with reaction $j$. Extracellular species, cellmass and pseudo-enzyme were subject to the initial conditions $x(t_o) = x_o$, $c(t_o) = c_o$ and $e_l = 0.5$, respectively. The term $q_l(e, p, x)$ denotes the specific uptake/secretion rate for mode $l$ where $e$ denotes the pseudo enzyme vector, $p$ denotes the unknown kinetic parameter vector (decision variables), $x$ denotes the extracellular species vector, and $c$ denotes the cell mass; $q_l(e, p, x)$ is the product of a kinetic term ($\bar{q}_l$) and a control variable governing enzyme activity. Flux through each mode was catalyzed by a pseudo enzyme $e_l$, synthesized at the regulated specific rate $r_{E,l}(p, x)$, and constitutively at the rate $\alpha_l$. The term $u_l$ denotes the cybernetic variable controlling the synthesis of enzyme $l$.
The term $\beta_l$ denotes the rate constant governing non-specific enzyme degradation, and $r_G$ denotes the specific growth rate through all modes. The specific uptake/secretion rates and the specific rate of enzyme synthesis were modeled using saturation kinetics. The specific growth rate was given by:

$$ r_G = \sum_{l=1}^{L} z_{\mu l}\, q_l(e, p, x) $$

where $z_{\mu l}$ denotes the growth flux $\mu$ through mode $l$. The control variables $u_l$ and $v_l$, which control the synthesis and activity of each enzyme respectively, were given by:

$$ u_l = \frac{z_{sl}\, \bar{q}_l}{\sum_{l=1}^{L} z_{sl}\, \bar{q}_l} \tag{7.7} $$

and

$$ v_l = \frac{z_{sl}\, \bar{q}_l}{\max_{l=1,\ldots,L} z_{sl}\, \bar{q}_l} \tag{7.8} $$

where $z_{sl}$ denotes the uptake flux of substrate $s$ through mode $l$. Each unknown kinetic parameter was continuous and real-valued, and subject to bounds constraints: $\mathcal{L} \le p \le \mathcal{U}$.

JuPOETs produced an ensemble of approximately $\dim \mathcal{S} \simeq 13{,}000$ parameter sets that captured the mean of the measured data sets for extracellular metabolites and cellmass (Fig. 7.4A and B). JuPOETs minimized the difference between the simulated and measured values for extracellular metabolites $A_e$, $B_e$, $C_e$ and cellmass, where the residual for each data set was treated as a single objective (leading to four objectives). The 95% confidence estimate produced by the ensemble was consistent with the mean of the measured data, despite having significant uncertainty in the training data. JuPOETs produced a consensus estimate of the synthetic data by calculating optimal trade-offs between the training data sets (Fig. 7.4C). Multiple trade-off fronts were visible in the objective plots, for example between data set 3 ($O_3$) and data set 2 ($O_2$). Thus, without a multiobjective approach, it would be challenging to capture these data sets as fitting one leads to decreased performance on the other. However, the ensemble contained parameter sets that described each data set independently (Fig. 7.5).
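As a side illustration of the control laws in Eqs. (7.7) and (7.8), the short Python sketch below computes the cybernetic variables for a hypothetical three-mode network. It is not the thesis model code: the function name and the numerical values are made up for illustration, and the inputs are assumed to be the per-mode uptake fluxes $z_{sl}$ and kinetic rates $\bar{q}_l$ with positive products.

```python
def cybernetic_variables(uptake_fluxes, kinetic_rates):
    """Return (u, v) for each mode.

    u_l (Eq. 7.7) normalizes z_sl * qbar_l by the sum over modes, so the
    u_l sum to one; v_l (Eq. 7.8) normalizes by the maximum over modes,
    so the most rewarding mode has v_l = 1.
    """
    returns = [z * q for z, q in zip(uptake_fluxes, kinetic_rates)]
    total = sum(returns)
    largest = max(returns)
    u = [r / total for r in returns]
    v = [r / largest for r in returns]
    return u, v


# Hypothetical values for three modes.
u, v = cybernetic_variables([1.0, 1.0, 2.0], [0.5, 1.0, 0.25])
print(u, v)
```

In this example the second mode has the largest return, so it receives half of the enzyme synthesis resource and runs at full activity, while the other two modes share the remainder equally.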
Thus, JuPOETs produced an ensemble of parameters that gave the mean of the training data for conflicting data sets, while simultaneously estimating parameter sets that performed well on each individual objective function.

Currently, JuPOETs does not consider parameter identifiability when constructing parameter ensembles. Although JuPOETs produces parameter estimates that give model performance similar to the training data, we do not have strict statistical confidence that the true parameter values are contained within the ensemble. However, despite this, ensembles produced by POETs can be predictive [106, 158]. Thus, JuPOETs produces a collection of parameters that are constrained by the performance of the model, and not by specific hypotheses regarding the individual values of the raw model parameters. Of course, knowledge of specific parameter values, or the relationship between parameter combinations, can be used to inform the search through either bounds or problem specific constraints (for example, as demonstrated in the first example problem).

Figure 7.5: Experiment to experiment variation captured by the ensemble. Cellmass measurements (points) versus time for experiments 2 and 3 were compared with ensemble simulations. The full ensemble was sorted by simultaneously selecting the top 25% of solutions for each objective with rank ≤ 1. The best fit solution for each objective (line) ± 1 standard deviation (gray region) for experiments 2 and 3 brackets the training data despite significant differences in the training values between the two data sets.
[Figure axes: cellmass concentration (AU) versus time (AU), with curves labeled Experiment 2 and Experiment 3.]

7.6 Conclusions

In this software note, we presented JuPOETs, a multiobjective technique to estimate parameter ensembles in the Julia programming language. JuPOETs is open source, and available for download under an MIT license from the JuPOETs GitHub repository at https://github.com/varnerlab/POETs.jl.
We demonstrated JuPOETs on a suite of algebraic test problems, and a proof-of-concept ODE based biochemical model. While JuPOETs outperformed (and was significantly more flexible than) the previous Octave implementation, there are several areas that could be explored further. First, JuPOETs should be compared with other multiobjective evolutionary algorithms (MOEAs) to determine its relative performance on test and real world problems. Many evolutionary approaches, e.g., the non-dominated sorting genetic algorithm (NSGA) family of algorithms, have been adapted to solve multiobjective problems [86, 76]. However, since there is a lack of open source Julia implementations of these alternative approaches, we did not benchmark the relative performance of JuPOETs in this note. One advantage that JuPOETs may have when compared to strictly evolutionary approaches is the inclusion of a local refinement step (hybrid mode), which temporarily reduces the problem to a single objective formulation. Previously, POETs run in hybrid mode led to better convergence on a proof-of-concept signal transduction model compared to the same approach without the hybrid refinement step [155]. Other hybrid multiobjective methods have also been shown to be more efficient than evolutionary approaches alone, for a variety of biochemical optimization problems [134, 151]. Thus, there are several different algorithms that we can use to benchmark, and improve the performance of JuPOETs, after we implement them in Julia. Another strategy to improve the performance of JuPOETs is to reduce the number (or cost) of function evaluations that are required to obtain optimal or near optimal solutions. For example, in many real world parameter estimation problems, the bulk of the execution time is spent evaluating the objective functions.
One strategy to improve JuPOETs performance could be to optimize surrogates [18], while another would be parallel execution of the objective functions. Currently, JuPOETs serially evaluates the objective function vector. However, parallel evaluation of the objective functions, e.g., using the parallel Julia macro or other techniques, could be implemented without significantly changing the JuPOETs run loop. Taken together, JuPOETs demonstrated improved flexibility and performance over POETs in parameter identification and ensemble generation for multiple objectives. JuPOETs has the potential for widespread use due to the flexibility of the implementation, and the high level syntax and distribution tools native to the Julia programming language.

7.7 Acknowledgements

This study was supported by an award from the National Science Foundation (NSF CBET-0955172) and the National Institutes of Health (NIH HL110328) to J.B, and by a National Science Foundation Graduate Research Fellowship (DGE-1144153) to D.B. Lastly, J.V was supported by an award from the US Army and Systems Biology of Trauma Induced Coagulopathy (W911NF-10-1-0376). We gratefully acknowledge Ani Chakrabarti, Russell Gould and Kathy Rogers for their input and suggestions regarding new features to include into JuPOETs. We also gratefully acknowledge the suggestions from the anonymous reviewers to improve this manuscript and JuPOETs.

CHAPTER 8
SUMMARY & CONCLUSION

Metabolism is the central process through which cells manage their resources to survive, adapt and meet energetic demands. To implement these diverse functions, cells have very complex and highly interconnected networks of chemical reactions between genes, RNA, proteins and metabolites. Due to the complexity of cells, systems modeling arose from the desire to better understand metabolism and how metabolism can be altered for our benefit [48, 12].
A primary challenge is the development of metabolic mathematical models that are able to describe the effect genetic perturbations have on cellular behavior. In this study, we first review metabolic modeling methods and go on to develop computational tools for the analysis and engineering of microbial systems.

My research work began with cybernetic modeling and linear programming. Both techniques were able to describe growth of microbial systems on substrates as well as byproduct formation [176, 98, 95]. However, cybernetic modeling coupled with elementary modes was only applicable to small networks, since the number of modes produced by decomposing a network grows exponentially with its size. Thus, we eliminated this computational burden by the use of flux balance solutions instead of elementary modes to describe aerobic and anaerobic growth of E. coli. Following our work with cybernetic modeling, my research focus shifted towards cell-free protein synthesis (CFPS) systems. Cybernetic modeling uses matching laws to describe enzyme synthesis; however, CFPS systems do not have the capacity for enzyme synthesis. Thus, we used alternative modeling approaches to describe CFPS behavior to help us understand the performance limitations of these systems. In addition, these mathematical models would help identify strategies for the improvement of CFPS in terms of productivity, yield and/or energy efficiency.

We first began by developing a kinetic model of CFPS for which an extensive dataset was provided by the Swartz Lab. The kinetic model contained 148 metabolites and 204 reactions with a total of 815 parameters. Model equations followed the hybrid modeling framework of Wayman and coworkers [183], combining multiple saturation kinetics with a rule-based model of allostery. Even though this model described the metabolite levels of 38 species, its development took several years to complete.
In addition, the model was only applied to a specific CFPS system for the production of CAT under a T7 promoter. We then applied a constraint-based approach to minimize the number of adjustable parameters. We developed a sequence-specific constraint based model of cell-free protein synthesis by taking the same metabolic network with the addition of promoter models from Moon and coworkers [123]. The resulting model structure contained only six adjustable parameters, not including parameters taken from literature. The modeling framework estimated the production of CAT under a T7 promoter for the Glucose/NMP cell-free system and GFP production under a P70 promoter in the myTXTL system. The model also estimated the titer of GFP as a function of plasmid concentration. Global sensitivity analysis identified the translation rate as the key metabolic process that controlled CFPS productivity and oxidative phosphorylation as the key metabolic process for energy efficiency. Despite the simulations being consistent with experimental measurements, there was a high uncertainty in the flux distribution as shown by alternative optimal solutions.

In order to circumvent this uncertainty, we developed analytical techniques to measure species involved in central carbon and energy metabolism as well as amino acids, mRNA, and protein levels. Cell-free systems have no cell wall; thus, we have direct access to metabolites and the biosynthetic machinery. We developed a robust protocol to quantify 41 compounds involved in glycolysis, the pentose phosphate pathway, the tricarboxylic acid cycle, energy metabolism and cofactor regeneration in CFPS reactions. The method used internal standards tagged with 13C-aniline, while compounds in the sample were derivatized with 12C-aniline. The internal standards allowed for the co-elution of compounds which eliminated ion suppression.
We then applied an amino acid protocol from Waters (Medford, MA) to quantify 19 amino acids and used a colorimetric assay to quantify glutamate levels. Finally, we used real-time RT-qPCR to measure mRNA levels. In total we quantified 63 species for a span of 16 hours for a batch reaction of CFPS.

We expanded our sequence specific modeling framework by integrating these experimental measurements along with kinetic parameters, enzyme levels, and enzyme activity assays. The framework predicted the overall production of mRNA and protein along with changes in metabolic behavior with two different oxidative phosphorylation inhibitors. The integrated modeling framework revealed that central metabolism is activated along with glutamate powering the TCA cycle to provide reduced ubiquinone for oxidative phosphorylation. Oxidative phosphorylation inhibitors provide biochemical evidence that myTXTL relied on oxidative phosphorylation to provide energy for sustaining transcription and translation for 16 hours in a batch reaction. Finally, enzyme activity assays throughout central carbon metabolism revealed that allosteric regulation is present in CFPS metabolism and should be incorporated into future mathematical models. Cell-free protein synthesis is beyond just transcription and translation processes; thus, we provide a comprehensive mathematical framework that predicted mRNA and protein production along with metabolic perturbations. This framework could potentially be used to identify strategies for the improvement of CFPS productivity, yield and efficiency.

While this study was promising in predicting protein, mRNA production, and metabolic behavior, there are several opportunities to consider in future work. First, a more detailed description of transcription and translation reactions has been utilized in genome scale ME models, e.g., O'Brien et al. [129]. These template reactions could be adapted to a cell-free system.
This would allow us to consider important facets of protein production, such as the role of chaperones in protein folding. Post-translation modifications such as glycosylation that are important for the production of therapeutic proteins could also be included in the next generation of models. In this work, we modeled the cell-free production of single proteins coupled to cell-free metabolism, but sequence specific constraint based modeling could be extended to multi-protein synthetic circuits, RNA circuits or small molecule production.

APPENDIX A
APPENDIX

Table A.1: List of materials and equipment used to quantify cell-free protein synthesis metabolites with aniline tagging and internal standards

| Material/Equipment | Company | Catalog Number | Comments/Description |
| 12C Aniline | Sigma-Aldrich | 242284 | Aniline 12C |
| 13C labeled aniline | Sigma-Aldrich | 485797 | Aniline 13C6 |
| 3-Phosphoglyceric acid | Sigma-Aldrich | P8877 | 3PG |
| Acetic Acid | FisherScientific | AC222140010 | ACE |
| Acetonitrile, LCMS | JT BAKER | 9829-03 | ACN |
| Acetyl-coenzyme A | Sigma-Aldrich | A2056 | ACA |
| Acquity UPLC BEH C18 1.7 µM, 2.1 x 150 mm Column | Waters | 186002353 | Column |
| Adenosine diphosphate | Sigma-Aldrich | A2754 | ADP |
| Adenosine monophosphate | Sigma-Aldrich | A1752 | AMP |
| Adenosine triphosphate | Sigma-Aldrich | A2383 | ATP |
| Alpha-ketoglutarate | Sigma-Aldrich | K1128 | aKG |
| Citrate | Sigma-Aldrich | 251275 | CIT |
| Cytidine diphosphate | Sigma-Aldrich | C9755 | CDP |
| Cytidine monophosphate | Sigma-Aldrich | C1006 | CMP |
| Cytidine triphosphate | Sigma-Aldrich | C9274 | CTP |
| D-glyceraldehyde 3-phosphate | Sigma-Aldrich | 39705 | GAP |
| Erythrose 4-phosphate | Sigma-Aldrich | E0377 | E4P |
| Ethanol | Sigma-Aldrich | EX0276 | EtOH |
| Fisher Scientific accuSpin Micro 17 Centrifuge | FisherScientific | | Centrifuge |
| Flavin adenine dinucleotide | Sigma-Aldrich | F6625 | FAD |
| Fructose 1,6-bisphosphate | Sigma-Aldrich | F6803 | F16P |
| Fructose 6-phosphate | Sigma-Aldrich | F3627 | F6P |
| Fumarate | Sigma-Aldrich | F8509 | FUM |
| Gluconate 6-phosphate | Sigma-Aldrich | P7877 | 6PG |
| Glucose | Sigma-Aldrich | G8270 | GLC |
| Glucose 6-phosphate | Sigma-Aldrich | G7879 | G6P |
| Glycerol 3-phosphate | Sigma-Aldrich | G7886 | Gly3P |
| Guanosine diphosphate | Sigma-Aldrich | G7127 | GDP |
| Guanosine monophosphate | Sigma-Aldrich | G8377 | GMP |
| Guanosine triphosphate | Sigma-Aldrich | G8877 | GTP |
| Hydrochloric acid | Sigma-Aldrich | 258148 | HCl |
| Isocitrate | Sigma-Aldrich | I1252 | ICIT |
| Lactate | Sigma-Aldrich | L1750 | LAC |
| Malate | Sigma-Aldrich | 02288 | MAL |
| myTXTL - Sigma 70 Master Mix Kit | ArborBiosciences | 507024 | Cell-free protein synthesis |
| N-(3-dimethylaminopropyl)-N'-ethylcarbodiimide hydrochloride | Sigma-Aldrich | 03449 | EDC |
| Nicotinamide adenine dinucleotide | Sigma-Aldrich | 43410 | NAD |
| Nicotinamide adenine dinucleotide phosphate | Sigma-Aldrich | N5755 | NADP |
| Nicotinamide adenine dinucleotide phosphate reduced | Sigma-Aldrich | 481973 | NADPH |
| Nicotinamide adenine dinucleotide reduced | Sigma-Aldrich | N8129 | NADH |
| Oxalacetate | Sigma-Aldrich | O4126 | OAA |
| Phosphoenolpyruvate | Sigma-Aldrich | P0564 | PEP |
| Pyruvate | Sigma-Aldrich | P5280 | PYR |
| Ribose 5-phosphate | Sigma-Aldrich | R7750 | R5P |
| Ribulose 5-phosphate | CarboSynth | MR45852 | RL5P |
| Sedoheptulose 7-phosphate | CarboSynth | MS07457 | S7P |
| Succinate | Sigma-Aldrich | S3674 | SUCC |
| Tributylamine | Sigma-Aldrich | 90780 | TBA |
| Triethylamine | FisherScientific | O4884 | TEA |
| ultrapure water | FisherScientific | 10977-015 | water |
| Uridine diphosphate | Sigma-Aldrich | U4125 | UDP |
| Uridine monophosphate | Sigma-Aldrich | U6375 | UMP |
| Uridine triphosphate | Sigma-Aldrich | U6625 | UTP |
| VWR Heavy Duty Vortex | VWR | | Vortex |
| Water, LCMS | JT BAKER | 9831-03 | WATER |
| Waters Acquity H UPLC Class Quaternary Solvent Manager | Waters | | LCMS |
| Waters Acquity H UPLC Class Sample Manager FTN | Waters | | LCMS |
| Waters Acquity Qda detector | Waters | | LCMS |
| Waters Empower 3 | Waters | | Software |
| Waters LCMS Total Recovery Vial | Waters | 186000384c | LCMS Vial |

BIBLIOGRAPHY

[1] GNU Linear Programming Kit, Version 4.52, March 2016.
[2] Jiro Adachi, Kazushige Katsura, Eiko Seki, Chie Takemoto, Mikako Shirouzu, Takaho Terada, Takahito Mukai, Kensaku Sakamoto, and Shigeyuki Yokoyama.
Cell-free protein synthesis using s30 extracts from Escherichia coli rfzero strains for efficient incorporation of non-natural amino acids into proteins. International journal of molecular sciences, 20(3):492, Jan 2019. 30678326[pmid]. [3] Roi Adadi, Benjamin Volkmer, Ron Milo, Matthias Heinemann, and Tomer Shlomi. Prediction of microbial growth rate versus biomass yield by a metabolic network with kinetic parameters. PLOS Comput Biol, 8, 2012. [4] Timothy E Allen and Bernhard Ø Palsson. Sequence-based analysis of metabolic demands for protein synthesis in prokaryotes. J Theor Biol, 220(1):1– 18, Jan 2003. [5] S. Andreozzi, A. Chakrabarti, K. C. Soh, A. Burgard, T. H. Yang, S. Van Dien, L. Miskovic, and V. Hatzimanikatis. Identification of metabolic engineering targets for the enhancement of 1,4-butanediol production in recombinant E. coli using large-scale kinetic models. Metab. Eng., 35:148–159, May 2016. [6] Stefano Andreozzi, Ljubisa Miskovic, and Vassily Hatzimanikatis. iS- CHRUNK – in silico approach to characterization and reduction of uncer- 202 tainty in the kinetic models of genome-scale metabolic networks. Metab Eng, 33:158–168, 2007. [7] Claudio Angione and Pietro Lió. Predictive analytics of environmental adaptability in multi-omic network models. Sci Rep, 5:15147, Oct 2015. [8] P. Arjunan, N. Nemeria, A. Brunskill, K. Chandrasekhar, M. Sax, Y. Yan, F. Jordan, J. R. Guest, and W. Furey. Structure of the pyruvate dehydroge- nase multienzyme complex E1 component from Escherichia coli at 1.85 A resolution. Biochemistry, 41(16):5213–21, Apr 2002. [9] J C Atlas, E V Nikolaev, S T Browning, and M L Shuler. Incorporating genome-wide dna sequence information into a dynamic whole-cell model of Escherichia coli: application to dna replication. IET Syst Biol, 2(5):369–82, Sep 2008. [10] Shota Atsumi, Taizo Hanai, and James C. Liao. Non-fermentative path- ways for synthesis of branched-chain higher alcohols as biofuels. Nature, 451(7174):86–89, 01 2008. 
[11] Rochelle Aw and Karen M. Polizzi. Biosensor-assisted engineering of a high- yield pichia pastoris cell-free protein synthesis platform. Biotechnol Bioeng, 116(3):656–666, 2019. [12] JE Bailey. Toward a science of metabolic engineering. Science, 252(5013):1668– 1675, 1991. 203 [13] Arren Bar-Even, Elad Noor, Yonatan Savir, Wolfram Liebermeister, Dan Davidi, Dan S. Tawfik, and Ron Milo. The moderately efficient enzyme: evolutionary and physicochemical trends shaping enzyme parameters. Bio- chemistry, 50, 2011. [14] D Battogtokh, D.K Asch, M.E Case, J Arnold, and H.B Shüttler. An ensemble method for identifying regulatory circuits with special reference to the qa gene cluster of Neurospora crassa. Proc Natl Acad Sci U S A, 99(26):16904– 16909, December 2002. [15] Jennifer E. Bestman, Krista D. Stackley, Jennifer J. Rahn, Tucker J. Williamson, and Sherine S. L. Chan. The cellular and molecular progression of mito- chondrial dysfunction induced by 2,4-dinitrophenol in developing zebrafish embryos. Differentiation; research in biological diversity, 89(3-4):51–69, 2015. 25771346[pmid]. [16] Jeff Bezanzon, Stefan Karpinski, Viral Shah, and Alan Edelman. Julia: A fast dynamic language for technical computing. In Lang.NEXT, April 2012. [17] Lacramioara Bintu, Nicolas E Buchler, Hernan G Garcia, Ulrich Gerland, Terrence Hwa, Jane Kondev, and Rob Phillips. Transcriptional regulation by the numbers: models. Current Opinion in Genetics & Development, 15(2):116–24, 2005. [18] A.J Booker, J.E Dennis, P.D Frank, D.B Serafini, V Torczon, and M.W Trosset. A rigorous framework for optimization of expensive functions by surrogates. Struct Optim, 17:1 – 13, 1999. 204 [19] Henry Borsook. Protein turnover and incorporation of labeled amino acids into tissue proteins in vivo and in vitro. Physiological reviews, 30(2):206–219, 1950. [20] Sabine Brantl and E. Gerhart H. Wagner. Antisense RNA-mediated transcrip- tional attenuation: an in vitro study of plasmid pT181. 
Molecular Microbiology, 35(6):1469–1482, 2000. [21] Kevin S Brown and James P Sethna. Statistical mechanical approaches to models with many poorly known parameters. Phys Rev E Stat Nonlin Soft Matter Phys, 68(2 Pt 1):021904, Aug 2003. [22] Matthias Bujara, Michael Schümperli, Sonja Billerbeck, Matthias Heinemann, and Sven Panke. Exploiting cell-free systems: Implementation and debug- ging of a system of biotransformations. Biotechnol Bioeng, 106(3):376–389, 2010. [23] Jörg Martin Büscher, Dominika Czernik, Jennifer Christina Ewald, Uwe Sauer, and Nicola Zamboni. Cross-platform comparison of methods for quantitative metabolomics of primary metabolism. Analytical Chemistry, 81(6):2135–2143, Mar 2009. [24] R. Cabrera, M. Baez, H. M. Pereira, A. Caniuguir, R. C. Garratt, and J. Babul. The crystal complex of phosphofructokinase-2 of Escherichia coli with fructose-6-phosphate: kinetic and structural analysis of the allosteric ATP inhibition. J. Biol. Chem., 286(7):5774–83, Feb 2011. 205 [25] Kara A. Calhoun and James R. Swartz. An Economical Method for Cell-Free Protein Synthesis using Glucose and Nucleoside Monophosphates. Biotechnol Prog, 21(4):1146–53, 2005. [26] Erik D Carlson, Rui Gan, C Eric Hodgman, and Michael C Jewett. Cell-free protein synthesis: applications come of age. Biotechnol Adv, 30(5):1185–94, 2012. [27] Filippo Caschera, Mark A. Bedau, Andrew Buchanan, James Cawse, Davide de Lucrezia, Gianluca Gazzola, Martin M. Hanczyc, and Norman H. Packard. Coping with complexity: Machine learning optimization of cell-free protein synthesis. Biotechnol Bioeng, 108(9):2218–2228, 2011. [28] Filippo Caschera and Vincent Noireaux. Synthesis of 2.3 mg/ml of pro- tein with an all Escherichia coli cell-free transcription–translation system. Biochimie, 99:162 – 168, 2014. [29] M Castellanos, D B Wilson, and M L Shuler. A modular minimal cell model: purine and pyrimidine transport and metabolism. Proc Natl Acad Sci, 101(17):6681–6, Apr 2004. [30] Roger L. 
Chang, Kathleen Andrews, Donghyuk Kim, Zhanwen Li, Adam Godzik, and Bernhard O. Palsson. Structural systems biology evaluation of metabolic thermotolerance in Escherichia coli. Science, 340(6137):1220–1223, 2013. [31] James Chappell, Melissa K. Takahashi, and Julius B. Lucks. Creating small 206 transcription activating RNAs. Nature Chemical Biology, 11(3):214–220, March 2015. [32] M. Chulavatnatol and D. E. Atkinson. Phosphoenolpyruvate synthetase from Escherichia coli. Effects of adenylate energy charge and modifier con- centrations. J. Biol. Chem., 248(8):2712–5, Apr 1973. [33] Carolina A. Contador, Matthew L. Rizk, Juan A. Asenjo, and James C. Liao. Ensemble modeling for strain development of l-lysine-producing Escherichia coli. Metabolic Engineering, 11(4–5):221 – 233, 2009. [34] Markus W Covert, Eric M Knight, Jennifer L Reed, Markus J Herrgard, and Bernhard O Palsson. Integrating high-throughput and computational data elucidates bacterial networks. Nature, 429(6987):92–6, May 2004. [35] David Dai, Nicholas Horvath, and Jeffrey D Varner. Dynamic sequence specific constraint-based modeling of cell-free protein synthesis. Processes, 6(8):132, Aug 2018. [36] Katja Dettmer, Pavel A. Aronov, and Bruce D. Hammock. Mass spectrometry- based metabolomics. Mass Spectrometry Reviews, 26(1):51–78, 2007. [37] M M Domach, S K Leung, R E Cahn, G G Cocks, and M L Shuler. Computer model for glucose-limited growth of a single cell of Escherichia coli b/r-a. Biotechnol Bioeng, 26(3):203–16, Mar 1984. [38] M. M. Domach, S. K. Leung, R. E. Cahn, G. G. Cocks, and M. L. Shuler. 207 Computer model for glucose-limited growth of a single cell of Escherichia coli b/r-a. Biotechnol Bioeng, 67(6):827–840, 2000. [39] J. L. Donahue, J. L. Bownas, W. G. Niehaus, and T. J. Larson. Purification and characterization of glpX-encoded fructose 1, 6-bisphosphatase, a new enzyme of the glycerol 3-phosphate regulon of Escherichia coli. J. Bacteriol., 182(19):5624–7, Oct 2000. [40] Warwick B. 
Dunn, Alexander Erban, Ralf J M Weber, Darren J. Creek, Marie Brown, Rainer Breitling, Thomas Hankemeier, Royston Goodacre, Steffen Neumann, Joachim Kopka, and Mark R. Viant. Mass appeal: Metabo- lite identification in mass spectrometry-focused untargeted metabolomics. Metabolomics : Official journal of the Metabolomic Society, 9(1):44–66, 2013. [41] John W. Eaton, David Bateman, and Soren Hauberg. GNU Octave version 3.0.1 manual: a high-level interactive language for numerical computations. CreateSpace Independent Publishing Platform, North Charleston, SC, USA, 2009. [42] J S Edwards and B Ø Palsson. The Escherichia coli mg1655 in silico metabolic genotype: its definition, characteristics, and capabilities. Proc Natl Acad Sci, 97(10):5528–33, May 2000. [43] Jeremy S. Edwards and Bernhard O. Palsson. Metabolic flux balance analysis and the in silico analysis of Escherichia coli k-12 gene deletions. BMC Bioinformatics, 1(1):1, 2000. [44] Adam M Feist, Christopher S Henry, Jennifer L Reed, Markus Krummenacker, 208 Andrew R Joyce, Peter D Karp, Linda J Broadbelt, Vassily Hatzimanikatis, and Bernhard Ø Palsson. A genome-scale metabolic reconstruction for Es- cherichia coli K-12 MG1655 that accounts for 1260 ORFs and thermodynamic information. Mol Syst Biol, 3:121, 2007. [45] Adam M Feist, Markus J Herrgrd, Ines Thiele, Jennie L Reed, and Bernhard Ø Palsson. Reconstruction of biochemical networks in microorganisms. Nat Rev Microbiol, 7(2):129–43, Feb 2009. [46] C.M. Fonseca and P. J. Fleming. Genetic Algorithms for Multiobjective Optimization: Formulation, Discussion and Generalization. In Proceedings of the 5th International Conference on Genetic Algorithms, pages 416 – 423, 1993. [47] Elena Fossati, Andrew Ekins, Lauren Narcross, Yun Zhu, Jean-Pierre Falgueyret, Guillaume A. W. Beaudoin, Peter J Facchini, and Vincent J. J. Martin. Reconstitution of a 10-gene pathway for synthesis of the plant alka- loid dihydrosanguinarine in saccharomyces cerevisiae. 
Nat Commun, 5, 02 2014. [48] A G Fredrickson. Formulation of structured growth models. Biotechnol Bioeng, 18(10):1481–6, Oct 1976. [49] Kapil G Gadkar, Francis J Doyle, 3rd, Timothy J Crowley, and Jeffrey D Varner. Cybernetic model predictive control of a continuous bioreactor with cell recycle. Biotechnol Prog, 19(5):1487–97, 2003. [50] Kapil G Gadkar, Rudiyanto Gunawan, and Francis J Doyle, 3rd. Iterative 209 approach to model identification of biological networks. BMC Bioinformatics, 6:155, 2005. [51] Ernest F Gale, Joan P Folkes, et al. Effect of nucleic acids on protein synthesis and amino-acid incorporation in disrupted staphylococcal cells. Nature, 173:1223–7, 1954. [52] Jonathan Garamella, Ryan Marshall, Mark Rustad, and Vincent Noireaux. The all e. coli tx-tl toolbox 2.0: A platform for cell-free synthetic biology. ACS Synth Biol, 5(4):344–55, Apr 2016. [53] David Garenne, Chase L. Beisel, and Vincent Noireaux. Characterization of the all-e. coli transcription-translation system mytxtl by mass spectrometry. Rapid Communications in Mass Spectrometry, 33(11):1036–1048, 2019. [54] Peter Gennemark and Dag Wedelin. Benchmarks for identification of ordi- nary differential equations from time series data. Bioinformatics, 25(6):780–6, Mar 2009. [55] Aaron R. Goerke and James R. Swartz. Development of cell-free protein synthesis platforms for disulfide bonded proteins. Biotechnol Bioeng, 99(2):351– 367, 2008. [56] Johann Grundlingh, Paul I. Dargan, Marwa El-Zanfaly, and David M. Wood. 2,4-dinitrophenol (dnp): a weight loss agent with significant acute toxicity and risk of death. Journal of medical toxicology : official journal of the American College of Medical Toxicology, 7(3):205–212, Sep 2011. 21739343[pmid]. 210 [57] Cassandra Guarino and Matthew P DeLisa. A prokaryote-based cell-free translation system that efficiently synthesizes glycoproteins. Glycobiology, 22(5):596–601, May 2012. [58] Weihua Guo, Jiayuan Sheng, and Xueyang Feng. 
Mini-review: In vitro metabolic engineering for biomanufacturing of high-value products. Compu- tational and Structural Biotechnology Journal, 15:161 – 167, 2017. [59] Ryan N Gutenkunst, Joshua J Waterfall, Fergal P Casey, Kevin S Brown, Christopher R Myers, and James P Sethna. Universally sloppy parameter sensitivities in systems biology models. PLoS Comput Biol, 3:e189, 2007. [60] A. Gyorgy and R. M. Murray. Quantifying resource competition and its effects in the TX-TL system. In 2016 IEEE 55th Conference on Decision and Control (CDC), pages 3363–3368, December 2016. [61] H Hajjaj, P.J Blanc, G Goma, and J François. Sampling techniques and comparative extraction procedures for quantitative determination of intra- and extracellular metabolites in filamentous fungi. FEMS Microbiology Letters, 164(1):195–200, 1998. [62] Joshua J Hamilton, Vivek Dwivedi, and Jennifer L Reed. Quantitative Assess- ment of Thermodynamic Constraints on the Solution Space of Genome-Scale Metabolic Models. Biophys J, 105(2):512–522, Jul 2013. [63] Julia Handl, Douglas B Kell, and Joshua Knowles. Multiobjective optimiza- 211 tion in bioinformatics and computational biology. IEEE/ACM Trans Comput Biol Bioinform, 4(2):279–92, 2007. [64] Christopher S Henry, Linda J Broadbelt, and Vassily Hatzimanikatis. Thermodynamics-Based Metabolic Flux Analysis. Biophys. J, 92(5):192–1805, Mar 2006. [65] Jon Herman and Will Usher. SALib: An open-source python library for sensitivity analysis. The Journal of Open Source Software, 2(9), jan 2017. [66] Alan C Hindmarsh, Peter N Brown, Keith E Grant, Steven L Lee, Radu Serban, Dan E Shumaker, and Carol S Woodward. SUNDIALS: Suite of nonlinear and differential/algebraic equation solvers. ACM T Math Software (TOMS), 31(3):363–396, 2005. [67] J. K. Hines, H. J. Fromm, and R. B. Honzatko. Novel allosteric activation site in Escherichia coli fructose-1,6-bisphosphatase. J. Biol. Chem., 281(27):18386– 93, Jul 2006. [68] J. K. Hines, H. J. Fromm, and R. B. Honzatko. 
Structures of activated fructose- 1,6-bisphosphatase from Escherichia coli. Coordinate regulation of bacterial metabolism and the conservation of the R-state. J. Biol. Chem., 282(16):11696– 704, Apr 2007. [69] Mahlon B Hoagland, Elizabeth B Keller, and Paul C Zamecnik. Enzymatic carboxyl activation of amino acids. J Biol Chem, 218(1):345–358, 1956. 212 [70] C Eric Hodgman and Michael C Jewett. Cell-free synthetic biology: thinking outside the cell. Metab Eng, 14(3):261–9, May 2012. [71] Nicholas Horvath, Michael Vilkhovoy, Joseph A. Wayman, Kara Calhoun, James Swartz, and Jeffrey D. Varner. Toward a genome scale sequence specific dynamic model of cell-free protein synthesis in Escherichia coli. bioRxiv, 2017. [72] Chelsea Y. Hu, Jeffrey D. Varner, and Julius B. Lucks. Generating effective models and parameters for RNA genetic circuits. ACS Synthetic Biology, 4(8):914–926, August 2015. [73] Chelsea Y Hu, Jeffrey D Varner, and Julius B Lucks. Generating effective models and parameters for rna genetic circuits. ACS Synth Biol, 4(8):914–26, Aug 2015. [74] Tianjiao Huang, Michael R. Armbruster, John B. Coulton, and James L. Ed- wards. Chemical tagging in mass spectrometry for systems biology. Analytical Chemistry, 91(1):109–125, Jan 2019. [75] Tianjiao Huang, Maria Toro, Richard Lee, Dawn S. Hui, and James L. Ed- wards. Multi-functional derivatization of amine, hydroxyl, and carboxylate groups for metabolomic investigations of human tissue by electrospray ion- ization mass spectrometry. Analyst, 143:3408–3414, 2018. [76] S Huband, P Hingston, L Barone, and L While. A Review of Multiobjective Test Problems and a Scalable Test Problem Toolkit. IEEE Trans. Evol. Comp., 10:477 – 506, 2006. 213 [77] Amber Jannasch, Miroslav Sedlak, and Jiri Adamec. Quantification of pen- tose phosphate pathway (ppp) metabolites by liquid chromatography-mass spectrometry (lc-ms). Methods in molecular biology (Clifton, N.J.), 708:159–171, 2011. 
[78] Thapakorn Jaroentomeechai, Jessica C Stark, Aravind Natarajan, Cameron J Glasscock, Laura E Yates, Karen J Hsu, Milan Mrksich, Michael C Jewett, and Matthew P DeLisa. Single-pot glycoprotein biosynthesis using a cell- free transcription-translation system enriched with glycosylation machinery. Nature communications, 9(1):2686, 2018. [79] Lisa Jeske, Sandra Placzek, Ida Schomburg, Antje Chang, and Dietmar Schomburg. BRENDA in 2019: a European ELIXIR core data resource. Nucleic Acids Research, 47(D1):D542–D549, 11 2018. [80] M.C. Jewett, A. Voloshin, and J. Swartz. Prokaryotic systems for in vitro expres- sion, pages 391–411. Eaton Publishing, Westborough, MA, 2002. [81] Michael C Jewett, Kara A Calhoun, Alexei Voloshin, Jessica J Wuu, and James R Swartz. An integrated cell-free metabolic platform for protein production and synthetic biology. Mol Syst Biol, 4:220, 2008. [82] Michael C. Jewett and James R. Swartz. Mimicking the Escherichia coli cytoplasmic environment activates long-lived and efficient cell-free protein synthesis. Biotechnol Bioeng, 86(1):19–26, 2004. [83] Michael C. Jewett and James R. Swartz. Substrate replenishment extends 214 protein synthesis with an in vitro translation system designed to mimic the cytoplasm. Biotechnol Bioeng, 87(4):465–471, 2004. [84] H Kacser and JA Burns. The control of flux. Symp Soc Exp Biol., 27(27):65–104, 1973. [85] S. Kale, P. Arjunan, W. Furey, and F. Jordan. A dynamic loop at the active center of the Escherichia coli pyruvate dehydrogenase complex E1 compo- nent modulates substrate utilization and chemical communication with the E2 component. J. Biol. Chem., 282(38):28106–16, Sep 2007. [86] D Kalyanmoy, A Pratap, S Agarwal, and T. Meyarivan. A Fast and Elitist Multiobjective Genetic Algorithm: NSGA-II. IEEE Trans. Evol. Comp., 6:182 – 197, 2002. [87] A Kamp and S Schuster. Metatool 5.0: fast and flexible elementary modes analysis. Bioinformatics, 22(15):1930–1931, 2006. [88] J. R. Karr, J. C. Sanghvi, D. N. 
Macklin, M. V. Gutschow, J. M. Jacobs, B. Bolival, N. Assad-Garcia, J. I. Glass, and M. W. Covert. A whole-cell computational model predicts phenotype from genotype. Cell, 150(2):389–401, Jul 2012. [89] Eyal Karzbrun, Jonghyeon Shin, Roy H. Bar-Ziv, and Vincent Noireaux. Coarse-Grained Dynamics of Protein Synthesis in a Cell-Free System. Physical Review Letters, 106(4), January 2011. [90] A. Khodayari and C. D. Maranas. A genome-scale Escherichia coli kinetic 215 metabolic model k-ecoli457 satisfying flux data for multiple mutant strains. Nat Commun, 7:13806, Dec 2016. [91] Ali Khodayari, Ali R Zomorrodi, James C Liao, and Costas D Maranas. A kinetic model of Escherichia coli core metabolism satisfying multiple sets of mutant flux data. Metab Eng, 25:50–62, Sep 2014. [92] Takanori Kigawa, Yutaka Muto, and Shigeyuki Yokoyama. Cell-free synthesis and amino acid-selective stable isotope labeling of proteins for NMR analysis. J Biomolec NMR, 6(2):129–134, 1995. [93] Dong-Myung Kim and James R. Swartz. Regeneration of adenosine triphos- phate from glycolytic intermediates for cell-free protein synthesis. Biotechnol Bioeng, 74(4):309–316, 2001. [94] Dong-Myung Kim and James R. Swartz. Efficient production of a bioactive, multiple disulfide-bonded protein using modified extracts of Escherichia coli. Biotechnol Bioeng, 85(2):122–129, 2004. [95] JI Kim, JD Varner, and D Ramkrishna. A hybrid model of anaerobic e. coli gjt001: Combination of elementary flux modes and cybernetic variables. Biotechnol. Prog., 24(5):993–1006, 2008. [96] Jin Il Kim, Hyun-Seob Song, Sunil R Sunkara, Arvind Lali, and Doraiswami Ramkrishna. Exacting predictions by cybernetic model confirmed experimen- tally: steady state multiplicity in the chemostat. Biotechnol Prog, 28(5):1160–6, 2012. 216 [97] S Kirkpatrick, C D Gelatt, Jr, and M P Vecchi. Optimization by simulated annealing. Science, 220(4598):671–80, May 1983. [98] Dhinakar S. Kompala, Doraiswami Ramkrishna, and George T. Tsao. 
Cyber- netic modeling of microbial growth on multiple substrates. Biotechnol Bioeng, 26(11):1272–1281, 1984. [99] O. Kotte, J. B. Zaugg, and M. Heinemann. Bacterial adaptation through distributed sensing of metabolic fluxes. Mol. Syst. Biol., 6:355, 2010. [100] Lars Kuepfer, Matthias Peter, Uwe Sauer, and Jörg Stelling. Ensemble model- ing for analysis of cell signaling dynamics. Nat Biotechnol, 25(9):1001–6, Sep 2007. [101] Muriel Lederman and Geoffrey Zubay. Dna-directed peptide synthesis i. a comparison of t2 and Escherichia coli dna-directed peptide synthesis in two cell-free systems. Biochimica et Biophysica Acta (BBA)-Nucleic Acids and Protein Synthesis, 149(1):253–258, 1967. [102] L Lee, JD Varner, and K Ko. Parallel extreme pathway computation for metabolic networks. Comput Syst Bioinformatics Conf, Int IEEE CS, 0:636–639, 2004. [103] Sangbum Lee, Chan Phalakornkule, Michael M Domach, and Ignacio E Grossmann. Recursive MILP model for finding all the alternate optima in LP models for metabolic networks. Comput. Chem. Eng., 24(2):711 – 716, 2000. 217 [104] Sun Bok Lee and James E. Bailey. Genetically structured models forlac promoter–operator function in the Escherichia coli chromosome and in mul- ticopy plasmids: Lac operator function. Biotechnol Bioeng, 26(11):1372–1382, 1984. [105] Yun Lee, Jimmy G Lafontaine Rivera, and James C Liao. Ensemble modeling for robustness analysis in engineering non-native metabolic pathways. Metab Eng, 25:63–71, Sep 2014. [106] Joshua Lequieu, Anirikh Chakrabarti, Satyaprakash Nayak, and Jeffrey D Varner. Computational modeling and analysis of insulin induced eukaryotic translation initiation. PLoS Comput Biol, 7(11):e1002263, Nov 2011. [107] Joshua A Lerman, Daniel R Hyduke, Haythem Latif, Vasiliy A Portnoy, Nathan E Lewis, Jeffrey D Orth, Alexandra C Schrimpe-Rutledge, Richard D Smith, Joshua N Adkins, Karsten Zengler, and Bernhard Ø Palsson. 
In silico method for modelling metabolism and gene product expression at genome scale. Nat Commun, 3:929, 2012. [108] Nathan E Lewis, Harish Nagarajan, and Bernhard Ø Palsson. Constraining the metabolic genotype-phenotype relationship using a phylogeny of in silico methods. Nat Rev Microbiol, 10(4):291–305, Apr 2012. [109] Jun Li, Liangcai Gu, John Aach, and George M. Church. Improved cell-free rna and protein synthesis system. PLoS ONE, 9(9):1–11, 09 2014. [110] Yuan Lu, John P Welsh, and James R Swartz. Production and stabilization of 218 the trimeric influenza hemagglutinin stem domain for potentially broadly protective influenza vaccines. Proc Natl Acad Sci, 111(1):125–30, Jan 2014. [111] Deyan Luan, Fania Szlam, Kenichi A Tanaka, Philip S Barie, and Jeffrey D Varner. Ensembles of uncertain mathematical models can identify network response to therapeutic interventions. Mol Biosyst, 6(11):2272–86, Nov 2010. [112] Deyan Luan, Michael Zai, and Jeffrey D Varner. Computationally derived points of fragility of a human cascade are consistent with current therapeutic strategies. PLoS Comput Biol, 3(7):e142, Jul 2007. [113] Julius B. Lucks, Lei Qi, Vivek K. Mutalik, Denise Wang, and Adam P. Arkin. Versatile RNA-sensing transcriptional regulators for engineering genetic networks. Proc Natl Acad Sci, 108(21):8617–8622, May 2011. [114] C. MacKintosh and H. G. Nimmo. Purification and regulatory properties of isocitrate lyase from Escherichia coli ML308. Biochem. J., 250(1):25–31, Feb 1988. [115] R. Mahadevan and C.H. Schilling. The effects of alternate optimal solutions in constraint-based genome-scale metabolic models. Metab Eng, 5(4):264 – 276, 2003. [116] Arijit Maitra and Ken A. Dill. Bacterial growth laws reflect the evolutionary importance of energy efficiency. Proc Natl Acad Sci, 112:406–411, 2015. [117] Rey W. Martin, Benjamin J. Des Soye, Yong-Chan Kwon, Jennifer Kay, Roder- ick G. Davis, Paul M. Thomas, Natalia I. Majewska, Cindy X. Chen, Ryan D. 
219 Marcum, Mary Grace Weiss, Ashleigh E. Stoddart, Miriam Amiram, Arnaz K. Ranji Charna, Jaymin R. Patel, Farren J. Isaacs, Neil L. Kelleher, Seok Hoon Hong, and Michael C. Jewett. Cell-free protein synthesis from genomically re- coded bacteria enables multisite incorporation of noncanonical amino acids. Nature Communications, 9(1):1203, 2018. [118] J H Matthaei and M W Nirenberg. Characteristics and stabilization of dnaase- sensitive protein synthesis in e. coli extracts. Proc Natl Acad Sci, 47:1580–8, Oct 1961. [119] Marco Mauri and Stefan Klumpp. A model for sigma factor competition in bacterial cells. PLoS Comput Biol, 10(10):e1003845, Oct 2014. [120] Stuart McLaughlin. The mechanism of action of dnp on phospholipid bilayer membranes. The Journal of Membrane Biology, 9(1):361–372, Dec 1972. [121] Nathalie Michel-Reydellet, Kara Calhoun, and James Swartz. Amino acid stabilization for cell-free protein synthesis by modification of the Escherichia coli genome. Metabolic Engineering, 6(3):197 – 203, 2004. [122] Ron Milo, Paul Jorgensen, Uri Moran, Griffin Weber, and Michael Springer. Bionumbers–the database of key numbers in molecular and cell biology. Nucleic Acids Res, 38:750–3, 2009. [123] Tae Seok Moon, Chunbo Lou, Alvin Tamsir, Brynne C Stanton, and Christo- pher A Voigt. Genetic programs constructed from layered logic gates in single cells. Nature, 491(7423):249–53, Nov 2012. 220 [124] Charles E Nakamura and Gregory M Whited. Metabolic engineering for the microbial production of 1,3-propanediol. Current Opinion in Biotechnology, 14(5):454 – 459, 2003. [125] S Nayak, J K Siddiqui, and J D Varner. Modelling and analysis of an ensemble of eukaryotic translation initiation models. IET Syst Biol, 5(1):2, Jan 2011. [126] Patrick P Ng, Ming Jia, Kedar G Patel, Joshua D Brody, James R Swartz, Shoshana Levy, and Ronald Levy. A vaccine directed to b cells and produced by cell-free protein synthesis generates potent antilymphoma immunity. 
Proc Natl Acad Sci, 109(36):14526–14531, 2012. [127] Alexander Nieß, Jurek Failmezger, Maike Kuschel, Martin Siemann- Herzberg, and Ralf Takors. Experimentally Validated Model Enables Debot- tlenecking of in Vitro Protein Synthesis and Identifies a Control Shift under in Vivo Conditions. ACS Synthetic Biology, 6(10):1913–1921, October 2017. [128] M W Nirenberg and J H Matthaei. The dependence of cell-free protein synthesis in e. coli upon naturally occurring or synthetic polyribonucleotides. Proc Natl Acad Sci, 47:1588–602, Oct 1961. [129] Edward J O’Brien, Joshua A Lerman, Roger L Chang, Daniel R Hyduke, and Bernhard Ø Palsson. Genome-scale models of metabolism and gene expression extend and refine growth phenotype prediction. Mol. Sys. Biol., 9(1):693, 2013. [130] T. Ogawa, K. Murakami, H. Mori, N. Ishii, M. Tomita, and M. Yoshin. Role of 221 phosphoenolpyruvate in the NADP-isocitrate dehydrogenase and isocitrate lyase reaction in Escherichia coli. J. Bacteriol., 189(3):1176–8, Feb 2007. [131] You-Kwan Oh, Bernhard Ø Palsson, Sung M Park, Christophe H Schilling, and Radhakrishnan Mahadevan. Genome-scale reconstruction of metabolic network in bacillus subtilis based on high-throughput phenotyping and gene essentiality data. J Biol Chem, 282(39):28791–9, Sep 2007. [132] S. Okino, M. Suda, K. Fujikura, M. Inui, and H. Yukawa. Production of D-lactic acid by Corynebacterium glutamicum under oxygen deprivation. Appl. Microbiol. Biotechnol., 78(3):449–54, Mar 2008. [133] JD Orth, I Thiele, and BØ Palsson. What is flux balance analysis? Nat. Biotechnol., 28(3):245–248, 2010. [134] Irene Otero-Muras and Julio R Banga. Multicriteria global optimization for biocircuit design. BMC Syst Biol, 8:113, Sep 2014. [135] T.N Palmer, G.J Shutts, R Hagedorn, F.J Doblas-Reyes, T Jung, and M Leut- becher. Representing model uncertainty in weather and climate prediction. Ann Rev Earth and Planetary Sci, 33:163–193, 2005. [136] BØ Palsson. 
Systems Biology: Properties of Reconstructed Networks. Cambridge University Press, New York, NY, USA, 2006. [137] Keith Pardee, Shimyn Slomovic, Peter Q Nguyen, Jeong Wook Lee, Nina Donghia, Devin Burrill, Tom Ferrante, Fern R McSorley, Yoshikazu Furuta, Andyna Vernet, Michael Lewandowski, Christopher N Boddy, Neel S Joshi, 222 and James J Collins. Portable, on-demand biomolecular manufacturing. Cell, 167(1):248–59.e12, Sep 2016. [138] D. S. Pereira, L. J. Donald, D. J. Hosfield, and H. W. Duckworth. Active site mutants of Escherichia coli citrate synthase. Effects of mutations on catalytic and allosteric properties. J. Biol. Chem., 269(1):412–7, Jan 1994. [139] Jessica G Perez, Jessica C Stark, and Michael C Jewett. Cell-free synthetic biology: engineering beyond the cell. Cold Spring Harbor perspectives in biology, 8(12):a023853, 2016. [140] R. R. Ramsay, B. A. Ackrell, C. J. Coles, T. P. Singer, G. A. White, and G. D. Thorn. Reaction site of carboxanilides and of thenoyltrifluoroacetone in complex ii. Proc Natl Acad Sci, 78(2):825–828, Feb 1981. 6940149[pmid]. [141] Dae-Kyun Ro, Eric M. Paradise, Mario Ouellet, Karl J. Fisher, Karyn L. New- man, John M. Ndungu, Kimberly A. Ho, Rachel A. Eachus, Timothy S. Ham, James Kirby, Michelle C. Y. Chang, Sydnor T. Withers, Yoichiro Shiba, Rich- mond Sarpong, and Jay D. Keasling. Production of the antimalarial drug precursor artemisinic acid in engineered yeast. Nature, 440(7086):940–943, 04 2006. [142] M. S. Robinson, R. A. Easom, M. J. Danson, and P. D. Weitzman. Citrate synthase of Escherichia coli. Characterisation of the enzyme from a plasmid- cloned gene and amplification of the intracellular levels. FEBS Lett., 154(1):51– 4, Apr 1983. 223 [143] Gabriel Rosenblum and Barry S. Cooperman. Engine out of the chassis: Cell-free protein synthesis and its uses. FEBS Letters, 588(2):261 – 268, 2014. Protein Engineering. [144] G.J.G. Ruijter and J. Visser. Determination of intermediary metabolites in aspergillus niger. 
Journal of Microbiological Methods, 25(3):295 – 302, 1996. [145] A Saltelli, P Annoni, I Azzini, F Campolongo, M Ratto, and S Tarantola. Variance based sensitivity analysis of model output. Design and estimator for the total sensitivity index. Comp Phys Comm, 181:259–70, 2010. [146] Claudia Sánchez, Juan Carlos Quintero, and Silvia Ochoa. Flux balance analysis in the production of clavulanic acid by Streptomyces clavuligerus. Biotechnol Prog, 31(5):1226–1236, 2015. [147] Michael A Savageau, Eberhard O Voit, and Douglas H Irvine. Biochemical systems theory and metabolic control theory: 1. fundamental similarities and differences. Mathematical Biosciences, 86(2):127–145, 1987. [148] C H Schilling, D Letscher, and B O Palsson. Theory for the systemic definition of metabolic pathways and their use in interpreting metabolic function from a pathway-oriented perspective. J Theor Biol, 203(3):229–48, Apr 2000. [149] Robert Schuetz, Lars Kuepfer, and Uwe Sauer. Systematic evaluation of objective functions for predicting intracellular fluxes in Escherichia coli. Mol Syst Biol, 3:119, 2007. 224 [150] S Schuster, D A Fell, and T Dandekar. A general definition of metabolic path- ways useful for systematic organization and analysis of complex metabolic networks. Nat Biotechnol, 18(3):326–32, Mar 2000. [151] J Sendın, I Otero-Muras, A A Alonso, and J Banga. Improved Optimization Methods for the Multiobjective Design of Bioprocesses. Ind. Eng. Chem. Res., 45:8594 – 8603, 2006. [152] Y Shimizu, A Inoue, Y Tomari, T Suzuki, T Yokogawa, K Nishikawa, and T Ueda. Cell-free translation reconstituted with purified components. Nat Biotechnol, 19(8):751–5, Aug 2001. [153] I.M Sobol. Global sensitivity indices for nonlinear mathematical models and their Monte Carlo estimates. Mathematics and Computers in Simulation, 55:271–80, 2001. [154] Hyohak Song and Sang Yup Lee. Production of succinic acid by bacterial fermentation. Enzyme and Microbial Technology, 39(3):352 – 361, 2006. 
The Asia-Pacific Biochemical Engineering Conference (APBioChEC 2005). [155] Hyun-Seob Song and Doraiswami Ramkrishna. Prediction of metabolic function from limited data: Lumped hybrid cybernetic modeling (l-hcm). Biotechnol Bioeng, 106(2):271–84, Jun 2010. [156] Hyun-Seob Song and Doraiswami Ramkrishna. Cybernetic models based on lumped elementary modes accurately predict strain-specific metabolic function. Biotechnol Bioeng, 108(1):127–40, Jan 2011. 225 [157] Hyun-Seob Song and Doraiswami Ramkrishna. Prediction of dynamic be- havior of mutant strains from limited wild-type data. Metab Eng, 14(2):69–80, Mar 2012. [158] Sang Ok Song and Jeffrey Varner. Modeling and analysis of the molecular basis of pain in sensory neurons. PLoS One, 4(9):e6758, 2009. [159] AS Spirin, VI Baranov, LA Ryabova, SY Ovodov, and YB Alakhov. A contin- uous cell-free translation system capable of producing polypeptides in high yield. Science, 242(4882):1162–1164, 1988. [160] D.E. Steinmeyer and M.L. Shuler. Structured model for Saccharomyces cerevisiae. Chem. Eng. Sci., 44:2017–30, 1989. [161] Tobias Stögbauer, Lukas Windhager, Ralf Zimmer, and Joachim O. Rädler. Experiment and mathematical modeling of gene expression dynamics in a cell-free system. Integrative Biology, 4(5):494–501, May 2012. [162] James Swartz. A pure approach to constructive biology. Nature Biotechnology, 19:732–3, 2001. [163] James R. Swartz. Transforming biochemical engineering with cell-free biol- ogy. AIChE Journal, 58(1):5–13, 2012. [164] Kazuyuki Takai, Tatsuya Sawasaki, and Yaeta Endo. Practical cell-free protein synthesis system using purified wheat embryos. Nature Protocols, 5, 2001. [165] Yikun Tan and James C Liao. Metabolic ensemble modeling for strain engi- neers. Biotechnol J, 7(3):343–53, Mar 2012. 226 [166] Akito Taneda. Multi-objective optimization for RNA design with multiple target secondary structures. BMC bioinformatics, 16(1):280, 2015. [167] A.L. Tappel. 
Inhibition of electron transport by antimycin a, alkyl hydroxy napthoquinones and metal coordination compounds. Biochemical Pharmacology, 3(4):289–296, 1960.
[168] Ryan Tasseff, Satyaprakash Nayak, Saniya Salim, Poorvi Kaushik, Noreen Rizvi, and Jeffrey D Varner. Analysis of the molecular networks in androgen dependent and independent prostate cancer revealed fragile and robust subsystems. PLoS One, 5(1):e8864, 2010.
[169] Ryan Tasseff, Satyaprakash Nayak, Sang Ok Song, Andrew Yen, and Jeffrey D Varner. Modeling and analysis of retinoic acid induced differentiation of uncommitted precursor cells. Integr Biol (Camb), 3(5):578–91, May 2011.
[170] Uwe Theobald, Werner Mailinger, Michael Baltes, Manfred Rizzi, and Matthias Reuss. In vivo analysis of metabolic dynamics in saccharomyces cerevisiae: I. experimental observations. Biotechnol Bioeng, 55(2):305–316, 1997.
[171] Ines Thiele, Neema Jamshidi, Ronan M. T. Fleming, and Bernhard O. Palsson. Genome-scale reconstruction of Escherichia coli’s transcriptional and translational machinery: A knowledge base, its mathematical formulation, and its functional characterization. PLOS Computational Biology, 5(3):1–13, 03 2009.
[172] Andrea C Timm, Peter G Shankles, Carmen M Foster, Mitchel J Doktycz, and Scott T Retterer. Toward microfluidic reactors for cell-free protein synthesis at the point-of-care. Small, 12(6):810–817, 2016.
[173] M Tomita, K Hashimoto, K Takahashi, T S Shimizu, Y Matsuzaki, F Miyoshi, K Saito, S Tanida, K Yugi, J C Venter, and C A Hutchison. E-cell: software environment for whole-cell simulation. Bioinformatics, 15(1):72–84, 1999.
[174] Linh M Tran, Matthew L Rizk, and James C Liao. Ensemble modeling of metabolic networks. Biophys J, 95(12):5606–17, Dec 2008.
[175] Kelly A. Underwood, James R. Swartz, and Joseph D. Puglisi. Quantitative polysome analysis identifies limitations in bacterial cell-free protein synthesis. Biotech. Bioeng., 91(4):425–35, 2005.
[176] A Varma and B O Palsson.
Stoichiometric flux balance models quantitatively predict growth and metabolic by-product secretion in wild-type Escherichia coli W3110. Appl. Environ. Microbiol., 60(10):3724–3731, 1994.
[177] J. Varner and D. Ramkrishna. Metabolic engineering from a cybernetic perspective. 1. theoretical preliminaries. Biotechnology Progress, 15(3):407–425, 1999.
[178] J Varner and D Ramkrishna. Metabolic engineering from a cybernetic perspective: aspartate family of amino acids. Metab Eng, 1(1):88–116, Jan 1999.
[179] Varnerlab. http://www.varnerlab.org/downloads/.
[180] M. Vilkhovoy, M. Minot, and J. D. Varner. Effective dynamic models of metabolic networks. IEEE Life Sciences Letters, 2(4):51–54, Dec 2016.
[181] Michael Vilkhovoy, Nicholas Horvath, Che-Hsiao Shih, Joseph A. Wayman, Kara Calhoun, James Swartz, and Jeffrey D. Varner. Sequence specific modeling of e. coli cell-free protein synthesis. ACS Synthetic Biology, 7(8):1844–1857, Aug 2018.
[182] Tobias von der Haar. Mathematical and computational modelling of ribosomal movement and protein synthesis: An overview. Computational and Structural Biotechnology Journal, 1(1):e201204002, 2012.
[183] Joseph A. Wayman, Adithya Sagar, and Jeffrey D. Varner. Dynamic modeling of cell-free biochemical networks using effective kinetic models. Processes, 3(1):138, 2015.
[184] Joseph A. Wayman and Jeffrey D. Varner. Biological systems modeling of metabolic and signaling networks. Curr Opin Chem Eng, 2, 2013.
[185] Sharon J. Wiback, Iman Famili, Harvey J. Greenberg, and Bernhard Ø Palsson. Monte carlo sampling can be used to determine the size and shape of the steady-state flux space. Journal of Theoretical Biology, 228(4):437–447, 2004.
[186] Sharon J Wiback, Radhakrishnan Mahadevan, and Bernhard Ø Palsson. Reconstructing metabolic flux vectors from extreme pathways: defining the alpha-spectrum. J Theor Biol, 224(3):313–24, Oct 2003.
[187] Wolfgang Wiechert. 13C Metabolic Flux Analysis. Metabol. Eng., 3(3):195–206, 2001.
[188] T Winnick. Incorporation of labeled amino acids into the protein of embryonic and tumor tissue homogenates. 9(1):247, 1950.
[189] R. C. Wohl and G. Markus. Phosphoenolpyruvate carboxylase of Escherichia coli. Purification and some properties. J. Biol. Chem., 247(18):5785–92, Sep 1972.
[190] P Wu, N G Ray, and M L Shuler. A single-cell model for cho cells. Ann N Y Acad Sci, 665:152–87, Oct 1992.
[191] Wen-Chu Yang, Miroslav Sedlak, Fred E. Regnier, Nathan Mosier, Nancy Ho, and Jiri Adamec. Simultaneous quantification of metabolites involved in central carbon and energy metabolism using reversed-phase liquid chromatography-mass spectrometry and in vitro 13c labeling. Analytical Chemistry, 80(24):9508–9516, Dec 2008.
[192] Harry Yim, Robert Haselbeck, Wei Niu, Catherine Pujol-Baxley, Anthony Burgard, Jeff Boldt, Julia Khandurina, John D Trawick, Robin E Osterhout, Rosary Stephen, Jazell Estadilla, Sy Teisan, H Brett Schreyer, Stefan Andrae, Tae Hoon Yang, Sang Yup Lee, Mark J Burk, and Stephen Van Dien. Metabolic engineering of Escherichia coli for direct production of 1,4-butanediol. Nat Chem Biol, 7(7):445–452, 07 2011.
[193] Jamey D. Young, Kristene L. Henne, John A. Morgan, Allan E. Konopka, and Doraiswami Ramkrishna. Integrating cybernetic modeling with pathway analysis provides a dynamic, systems-level description of metabolic control. Biotechnol Bioeng, 100(3):542–559, 2008.
[194] Nicola Zamboni, Sarah-Maria Fendt, and Uwe Sauer. 13c-based metabolic flux analysis. Nature Protocols, 4:878–92, May 2009.
[195] J. Zawada, B. Richter, E. Huang, E. Lodes, A. Shah, and J. R. Swartz. High-Density, Defined Media Culture for the Production of Escherichia coli Cell Extracts, chapter 9, pages 142–156.
[196] James F. Zawada, Gang Yin, Alexander R. Steiner, Junhao Yang, Alpana Naresh, Sushmita M. Roy, Daniel S. Gold, Henry G. Heinsohn, and Christopher J. Murray.
Microscale to manufacturing scale-up of cell-free cytokine production—a new approach for shortening protein production development timelines. Biotechnol Bioeng, 108(7):1570–1578, 2011.
[197] Ying Zhang, Ines Thiele, Dana Weekes, Zhanwen Li, Lukasz Jaroszewski, Krzysztof Ginalski, Ashley M. Deacon, John Wooley, Scott A. Lesley, Ian A. Wilson, Bernhard Palsson, Andrei Osterman, and Adam Godzik. Three-dimensional structural view of the central metabolic network of thermotoga maritima. Science, 325(5947):1544–1549, 2009.
[198] T. Zhu, M. F. Bailey, L. M. Angley, T. F. Cooper, and R. C. Dobson. The quaternary structure of pyruvate kinase type 1 from Escherichia coli at low nanomolar concentrations. Biochimie, 92(1):116–20, Jan 2010.