Weill Cornell Theses and Dissertations

Recent Submissions

  • Item
    High Throughput Genetics And Characterization Of An RNA Arbovirus, Sindbis Virus, Using Accurate Next-Generation Sequencing Of Viral Evolution And RNA Enrichment
    Ambrose, Pradeep Morris (2020-05-20)
    The goal of these studies was to investigate Sindbis virus adaptation to various infection bottlenecks and utilize the dynamics of minor variants to study viral genetics in a high-throughput manner. During infection an RNA virus exists as a tremendously diverse population, and this genetic diversity underlies its ability to rapidly adapt to new conditions and cause disease. Since viruses evolve as populations, our understanding of viral evolution has historically been limited by the inability to characterize populations. Whilst new sequencing technologies provide sufficient depth to sequence full viral populations, their intrinsic base-calling error rate combined with mutations introduced during sample processing makes viral mutations and sequencing errors indistinguishable. Utilizing rolling reverse transcription, the novel CirSeq technique virtually eliminates sequencing errors by bioinformatically parsing tandemly generated repeats, and for the first time allows a highly accurate mutational landscape profile of the whole viral population. In addition, a novel hybridization capture technique we developed allows us to maximize the sequencing coverage of desired viral RNA molecules. We used these technologies to map the mutational distribution of RNA virus populations and perform genetics at previously unseen scales. We sequenced serial passages of the well-characterized Sindbis virus to yield novel information on genetic features crucial for viral replication. We analyzed how the starting in vitro transcribed RNA population adapts to various bottlenecks encountered during electroporation and subsequent passaging, and during packaging and egress. Then we compared these data to previous studies of critical genome sites and expanded our study to new sites of interest. We posit that such unbiased high-throughput genetics pushes the envelope beyond the previous limits on discovery of viral functional elements.
These techniques can be used to further characterize clinically relevant RNA viruses that are agents of current and recent epidemics, such as SARS-CoV-2 coronavirus, chikungunya virus, Zika virus, eastern equine encephalitis virus, dengue virus, and West Nile virus.
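The error-suppression idea described in this abstract (reading the same template several times in tandem and keeping only the bases the copies agree on) can be illustrated with a minimal sketch. This is not the thesis's pipeline: the function name, the fixed `repeat_len` argument, and the simple majority rule are assumptions made for illustration. A real CirSeq workflow would also have to detect the repeat period and weigh base qualities.

```python
from collections import Counter


def consensus_from_tandem_repeats(read: str, repeat_len: int, min_copies: int = 3) -> str:
    """Majority-vote consensus over tandem copies of one template.

    Hypothetical helper: assumes the repeat length is already known and
    that the read starts at a repeat boundary.
    """
    # Cut the read into complete copies; a trailing partial copy is ignored.
    copies = [
        read[start:start + repeat_len]
        for start in range(0, len(read) - repeat_len + 1, repeat_len)
    ]
    if len(copies) < min_copies:
        raise ValueError(f"need at least {min_copies} complete copies, got {len(copies)}")

    consensus = []
    for column in zip(*copies):
        base, count = Counter(column).most_common(1)[0]
        # Keep a base only if a strict majority of copies agree; otherwise mask it.
        consensus.append(base if count > len(copies) / 2 else "N")
    return "".join(consensus)


# Three copies of a 6-nt template; the third copy carries one sequencing error.
corrected = consensus_from_tandem_repeats("ACGTAC" "ACGTAC" "ACTTAC", repeat_len=6)
```

An error that appears in only one copy is outvoted, whereas a true variant in the template is present in every copy and survives the vote.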
  • Item
    DNA Replication And Transcription In The C. Elegans Embryo
    Bellush, James (2019)
    Animal embryogenesis represents a cascade of genetically encoded events that transform a single-celled zygote, composed of cytoplasm and nuclei from two haploid gametes, into a fully formed organism. The “totipotent” zygote is the only cell with the ability to divide and produce cells of all differentiated types; however, the manner in which the embryonic genome is replicated, packaged, and transcribed during the regulatory transitions of animal development remains elusive. To further our understanding of early animal development, I developed integrative deep sequencing and bioinformatics approaches to profile the genomic landscape of embryogenesis in the nematode Caenorhabditis elegans. Comparison of DNA replication, nascent RNA transcription, and active histone modification profiles from synchronized populations of staged worm embryos revealed a striking model of genome organization, shaped by the combined influence of DNA replication origin specification and the developmental transcriptome. We find that the embryonic transcriptome and chromatin modification landscape are largely defined by DNA replication origins and the constitutive transcription of housekeeping genes arranged within operons. We observe a dramatic remodeling of gene expression relative to DNA replication origins during the shift from the proliferation to the differentiation phase of embryogenesis: analysis of embryonic chromatin states and nascent RNA transcription patterns reveals that the morphogenesis gene expression program in differentiated cells is coupled to the de novo remodeling of histone modifications and a switch to pervasive RNA polymerase elongation. Our data suggest a novel mechanism by which a small regulatory RNA pathway acting in the nucleus is involved in coordinating the switch between the germline/embryonic gene expression program and the somatic differentiation program. This pathway centers on an essential C. 
elegans Argonaute, CSR-1, and its associated 22-nt small RNAs that mediate CSR-1 interaction with the nascent embryonic transcriptome. CSR-1 is known to balance the level of maternal RNA transcripts that support early embryogenesis through selective slicing of endogenous mRNA transcripts during germline development. Our model proposes that CSR-1 activity from the adult germline is inherited in the early embryo and that its gradual titration from somatic cells during early embryonic cell divisions helps gradually transition the embryo from a program of rapid proliferation to one of tissue-specific morphogenesis. The work described herein reveals the influence of DNA replication and transcriptional regulation on the spatiotemporal dynamics of C. elegans embryogenesis and lends insight into the evolutionary forces which shape metazoan genome structure and function.
  • Item
    Decoding Chromatin Accessibility Programs In Cancer
    Chhangawala, Sagar (2019)
    Chromatin accessibility plays an important role in defining cell identity and phenotype. With the emergence of novel methods like ATAC-seq, a sequencing method that maps regions of open chromatin and enables the computational analysis of transcription factor (TF) binding at chromatin accessible sites, we can start to dissect the regulatory landscape in cancer. I present two vignettes that use ATAC-seq to analyze tumor phenotypes: 1. Pancreatic cancer is expected to become the 2nd deadliest cancer by 2020 in the US, and few therapeutic options are currently available. Additionally, 50% of pancreatic cancer patients recur within just one year. Previous genomic analyses of pancreatic tumors, including somatic mutation mapping and gene expression profiling, did not explain this difference in recurrence. We hypothesized that epigenetic heterogeneity underlies the previously described difference in recurrence. We sorted 54 fresh patient tumor samples based on EpCAM (an epithelial cell marker) to enrich for tumor cells and subjected them to ATAC-seq. Using supervised learning and generalized linear modeling, we were able to characterize the changes in RNA-seq and ATAC-seq between recurrent vs non-recurrent patients. We characterized TF motifs in accessible peaks across all samples and used ridge regression to identify differential TF activity enriched in recurrent patients. Two TF hits, ZSCAN1 and HNF1b, were experimentally validated to predict recurrence in our cohort and in an independent cohort. These results reveal a novel regulatory landscape in recurrent pancreatic cancer patients and support the development of individualized therapies. 2. Approximately 70% of breast cancers express estrogen receptor (ER) and are treated with ER-blocking endocrine therapy (e.g. fulvestrant). Despite the efficacy of such treatments, resistance to anti-hormonal therapy remains a clinical challenge. 
We performed an epigenome-wide CRISPR knockout screen on MCF7 ER-positive breast cancer cells, and identified ARID1A to be the top candidate whose loss limits the sensitivity to fulvestrant. To uncover how ARID1A loss confers fulvestrant resistance, we undertook a chromatin-based approach. Analysis from ATAC-seq and RNA-seq assays showed that loss of ARID1A leads to a widespread chromatin remodeling of the breast cancer epigenome to regulate the binding of a series of TFs that in concert alter gene expression profiles. This results in a switch from luminal cells to ER-independent basal-like cells, which carries an adverse prognosis for patients on hormone therapy.
  • Item
    Representation Learning On Sequential Medical Data
    Hyland, Stephanie (2019)
    The way we do medicine is undergoing a revolution driven by technology. As the modern drive to record, share, and analyse data sweeps across society, healthcare lies squarely in its path. Data generated by every-day clinical practice presents an invaluable view of health and disease at a scale previously unimaginable. However, to benefit from it, we need computational tools to extract meaning, clinical insight, and actionable predictions. This new digital era of medicine is an opportunity not only for healthcare providers, but also for machine learning researchers to develop new methods tailored to the unique demands of this complex domain. The work described here sits in this sphere. Firstly, we explore representation learning for medical language. With its long-tailed distribution of technical terms, medical language necessitates development of methods to mitigate data scarcity by exploiting prior information encoded in knowledge graphs. Obtaining semantically meaningful representations of medical concepts and their relationships is vital, and we describe a probabilistic model to learn such representations. Secondly, we address learning from and implicitly representing long time series using recurrent neural networks. These long sequences are commonplace in medicine, where one's health history is necessarily lengthy, but early events nonetheless provide crucial context. To address vanishing and exploding gradients in training these networks, we propose a novel parametrisation exploiting the correspondence between the Lie group of unitary matrices and its Lie algebra. Next, a method for generating synthetic ICU time series data is described in the framework of adversarial networks. A core challenge for researchers in healthcare is the scarcity of shareable datasets on which to benchmark. Realistic synthetic data is therefore key. 
Novel methods for evaluating the quality of this synthetic data are proposed, and the model's privacy and memorisation properties are analysed, both heuristically and in terms of differential privacy. Finally, an ensemble of gradient-boosted decision trees is employed to identify circulatory system deterioration in Swiss ICU patients. As this system has been developed for deployment, we carefully detail the data processing steps, task specification, and evaluation considerations necessary for a real-world, real-time early warning system driven by machine learning.
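The correspondence between unitary matrices and their Lie algebra mentioned in this abstract can be illustrated with a small sketch: any skew-Hermitian matrix A (an element of the Lie algebra) maps to a unitary matrix exp(A), so unconstrained real parameters can be trained while the resulting matrix stays unitary. The code below is only an illustration of that general fact, not the thesis's parametrisation. The helper names, the parameter layout, and the truncated Taylor series for the matrix exponential are all assumptions chosen to keep the example dependency-free.

```python
def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def conj_transpose(a):
    """Conjugate transpose of a square matrix."""
    n = len(a)
    return [[a[j][i].conjugate() for j in range(n)] for i in range(n)]


def skew_hermitian(params, n):
    """Build a skew-Hermitian matrix from n*n unconstrained real numbers.

    Assumed layout: n diagonal values first, then (real, imag) pairs for
    each entry above the diagonal, row by row.
    """
    if len(params) != n * n:
        raise ValueError(f"expected {n * n} parameters, got {len(params)}")
    values = iter(params)
    a = [[0j] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = 1j * next(values)      # purely imaginary diagonal
    for i in range(n):
        for j in range(i + 1, n):
            z = complex(next(values), next(values))
            a[i][j] = z
            a[j][i] = -z.conjugate()     # enforces A^H = -A
    return a


def expm(a, terms=30):
    """Matrix exponential by truncated Taylor series (fine for small norms)."""
    n = len(a)
    result = [[complex(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in matmul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result


def unitary_from_params(params, n):
    """Map unconstrained real parameters to a unitary n x n matrix."""
    return expm(skew_hermitian(params, n))


# Any choice of real parameters yields a (numerically) unitary matrix.
u = unitary_from_params([0.3, -1.2, 0.5, 0.7], n=2)
```

Because a unitary matrix preserves vector norms, repeated multiplication by it neither shrinks nor inflates a signal, which is why such matrices are attractive for recurrent networks over long sequences. In practice a numerical library's matrix exponential would replace the Taylor series used here.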
  • Item
    Statistical Models For The Function And Evolution Of Cis-Regulatory Elements In Mammals
    Dukler, Noah (2019)
    Precise gene regulation is essential for a wide variety of transient, developmental, and homeostatic processes. The majority of gene regulation is mediated by cis-regulatory elements, both distal (enhancers) and proximal (promoters & enhancers). Developments in biochemical assays, gene editing techniques, and sequencing technology have enabled genome-wide profiling of regulatory elements over a wide variety of in vivo conditions. In this tripartite work, I present separate statistical frameworks for analyzing how these repertoires of regulatory elements work at both physiological and evolutionary timescales. The first part describes the use of PRO-seq to characterize rapid changes in the transcriptional landscape of human cells in response to celastrol, a compound that has potent anti-inflammatory, tumor-inhibitory, and obesity-controlling effects. By exploiting the ability of PRO-seq to detect nascent RNAs, I characterize the transcriptional response at both genes and enhancers, and leverage statistical models to detect transcription factors that orchestrate it. I implicate several transcription factors in early transcriptional changes, including members of the E2F and RFX families. PRO-seq also allows us to detect an increase in transcription start site proximal pausing, suggesting that pause release may be a mechanism for inhibiting gene expression during the celastrol response. This work demonstrates that a thorough analysis of PRO-seq time-course data can provide novel insight into multiple aspects of a complex transcriptional response. The second part develops a statistical model for determining whether constituent enhancers of a “super-enhancer” exhibit synergy and thus addresses the question “Is a super-enhancer greater than the sum of its parts?” In this work I reconcile two works with seemingly opposing theses by finding that we cannot confidently reject synergy-free models for super-enhancers. 
Furthermore, I demonstrate that thoughtful consideration of null models for synergy in gene regulation is critical for furthering our understanding of ensembles of regulatory elements. In the final section, I develop evolutionary models for cis-regulatory function as quantified by genome-wide biochemical assays. I apply a noise-aware phylogenetic model to analyze the evolution of H3K27Ac and H3K4me3 histone marks as proxies of enhancer and promoter function. I estimate relative turnover rates for a variety of functional element categories and show that gene expression and sequence constraint correlate with turnover rate. I also propose that dosage sensitivity of target genes can explain the discrepancy between sequence and histone mark turnover rates of associated CREs. This work illustrates the important role statistical models play in understanding gene regulation at all levels and suggests a potential path towards unified models of gene regulation and evolution.
  • Item
    Consequences of Pericentromeric DNA Hypomethylation: Lessons From an Animal Model of ICF Syndrome
    Rajshekar, Srivarsha (2019)
    The modified base, 5-methylcytosine (5mC) is enriched at repetitive DNA sequences including satellite repeats that surround chromosome centromeres. These centromeric and pericentromeric satellite repeats are important for stable chromosome structure and proper chromosome segregation. Loss of 5mC at pericentromeric repeats is common in cancer and senescence. While the general importance of 5mC is well-established, the specific functions of 5mC at pericentromeres are less clear. 5mC loss at pericentromeric repeats is a molecular hallmark of the rare genetic disease Immunodeficiency, Centromere instability and Facial abnormalities (ICF) syndrome. To date, attempts to model specific loss of 5mC at pericentromeres in mouse through mutation of ICF associated genes have been unsuccessful. Here, I develop a zebrafish model for ICF syndrome by mutating the zebrafish ortholog of ZBTB24, a poorly characterized gene that is disrupted in ~30% of ICF patients. zbtb24 mutant zebrafish recapitulate key features of ICF syndrome including immunodeficiency, facial abnormalities, gastrointestinal defects, impaired growth and reduced lifespan. I also show that homozygous mutation of zbtb24 causes a progressive loss of 5mC at pericentromeric satellite repeats in zebrafish. This progressive loss of methylation allowed for elucidation of primary vs secondary consequences of hypomethylation at these sequences. Transcriptome analysis revealed that one of the earliest consequences of pericentromeric hypomethylation was activation of an interferon-based innate immune response. Mechanistically, I tie this response to derepression of pericentromeric satellite transcripts and I demonstrate that these aberrant transcripts are recognized through the MDA5-MAVS dsRNA-sensing machinery, which is normally associated with an innate immune response to viruses. 
Additional preliminary studies indicate increased incidence of DNA damage and tumor formation in zbtb24 mutants, suggesting that pericentromeric 5mC is likely important for genome stability. Taken together, this thesis describes the first viable animal model of ICF Syndrome, reveals a function for the ICF gene zbtb24 in the long-term maintenance of pericentromeric DNA methylation, and identifies roles for pericentromeric DNA methylation in preventing autoimmunity and maintaining genome integrity.
  • Item
    A Novel Optical Dynamic Clamp Platform Using iPSC-Derived Cardiomyocytes For Drug Screening
    Quach, Bonnie (2019)
    iPSC-derived cardiomyocytes (iPSC-CMs) are a potentially advantageous platform for drug screening because they provide a renewable source of human cardiomyocytes and can be patient specific. One obstacle to their implementation is their neonatal-like electrophysiology, which reduces relevance to adult arrhythmogenesis. One method to address this problem is to electrically mimic deficient currents in iPSC-CMs using a technique called dynamic clamp. Mimicking the missing inward rectifying potassium current, IK1, in iPSC-CMs via dynamic clamp pushes action potential characteristics to more closely resemble those of an adult cardiomyocyte. However, this method is technically challenging and low throughput, limiting its practical use for high-throughput applications such as large-scale drug screening. To address this, we aim to create an optically controlled version of dynamic clamp, which, because of its contactless nature, could be high-throughput and not limited to a single-cell format. The ideal platform would use optogenetics to supplement the deficient current and use a fluorescent voltage indicator to measure the membrane potential. Optogenetic tools are commonly used statically, either to stimulate or to cease electrical activity. This thesis presents a proof of principle of using optogenetic tools in lieu of an electrode by developing an optical dynamic clamp (ODC) platform that uses an LED to dynamically activate a hyperpolarizing opsin, ArchT, to generate an IK1-like current. This ODC platform was verified against the standard electrode-based dynamic clamp (EDC) and gave a similar output, demonstrating a proof-of-concept that optogenetics can mimic an electrode. The ODC platform was challenged with E4031, bayK 8664, terfenadine, and verapamil. The ODC platform was able to detect effects of the drugs on action potential characteristics similar to EDC, but the ODC platform did not consistently yield results identical to EDC. 
Possible reasons and limitations are discussed. With further development, the ODC platform can possibly be refined to be more precise, but maturation of iPSC-CMs may still be needed to make the platform more relevant to adult electrophysiology. The ODC platform has the potential to expand on the possibilities of dynamic clamp by enabling more relevant formats, such as monolayers, co-cultures or with other engineered platforms.
  • Item
    Characterizing Notch1 Signaling Regulation And Amyloid Beta Cleavage
    Schachter, David (2019)
    Gamma secretase cleaves numerous substrates that regulate a variety of cellular processes. The involvement of Gamma secretase in so many different signaling systems makes it a challenging enzyme to target without toxic side-effects. The two predominantly studied substrates are Notch and Amyloid precursor protein (APP). Dysregulated cleavage of the first can lead to cancer, while cleavage of the second results in Amyloid Beta (Aβ) production, which contributes to Alzheimer’s Disease (AD). Therefore, the development of substrate specific gamma secretase modulators is of primary importance. To find a modulator of Notch signaling we screened a 20-amino acid displaying bacteriophage library for a peptide sequence that recognizes Notch1. Through this approach we identified a sequence that specifically binds to Notch1 and none of the other isoforms of Notch. The phage displaying this sequence co-localized and co-immunoprecipitated with Notch1 and was able to reduce Notch1 signaling in a Notch reporter system. Therefore, we have identified a novel sequence that can specifically modulate Notch1 signaling without interfering with cleavage of similar proteins. The AD field has been primarily restricted to studying two Aβ species, Aβ-40 and Aβ-42, that are produced after gamma secretase cleavage of two sites on APP. While other cleavage sites have been reported, the ability to study and characterize them, especially in relation to Alzheimer’s Disease, has been hampered by the lack of tools specific to these alternate sites. To address this issue, we generated and characterized monoclonal antibodies to these cleavage sites. We confirmed that these antibodies are selective for the alternative species, including Aβ-43 and Aβ-45, and utilized Alpha-LISA technology to quantify binding strength. We were also able to utilize some of these antibodies in immunohistochemistry and aggregation binding studies to further investigate differences in the behavior of these amyloid species. 
The development of these antibodies will enable us to focus on alternative cleavage sites of Aβ and study their contribution to AD.
  • Item
    Applications Of Mass Spectrometry-Based Metabolomics To Address Biomedical Problems
    Schwartz, Benjamin (2019)
    The cadre of small molecule metabolites in a cell, aka the “metabolome,” is the final output of the genome and ultimate determinant of cell phenotype (DNA → RNA → protein → metabolite → phenotype). In addition to a genetic contribution, the metabolome is conditioned by diet, lifestyle and composition of the microbiome - ultimately determining health vs. disease status. Therefore, the profiling of small molecule levels using mass spectrometry-based metabolomic analyses can contribute significantly to our understanding of disease processes, enable confident molecular diagnoses, monitor the efficacy of therapies, and serve many other biomedical applications. Here, we discuss applications and considerations for the use of targeted and untargeted metabolite profiling using mass spectrometry. This topic is presented in the context of three independent research studies, described herein. First, we describe the use of targeted metabolomics to characterize the endocannabinoid system in a mouse model of alcohol “binge” drinking. We show that anandamide, one of the primary endogenous agonists of the cannabinoid receptor, along with several other endocannabinoids, increases in response to acute alcohol withdrawal. Further, we consider the biological significance of these observations. Second, we describe the use of mass spectrometry-based untargeted metabolite profiling and stable isotope tracing for defining metabolic perturbations that occur in the setting of sporadic amyotrophic lateral sclerosis (sALS), considering discoveries made with patient-derived skin fibroblasts. Using a multiomic approach, this study identifies and characterizes a distinct subgroup of sALS patients, possessing fibroblasts that are typified by enhanced transsulfuration pathway activity and glucose hypermetabolism. 
We speculate that this sALS patient subclass will prove to be selectively responsive to anti-oxidant therapies and that recognizing this subclass may therefore allow for the future establishment of personalized medicines. Finally, we discuss efforts to characterize solute carrier 25 (SLC25) family mitochondrial transporters using CRISPR knockout cell lines and a heterologous bacterial overexpression system-based protocol. These studies utilize mass spectrometry and stable isotope-labeled compounds in an attempt to recognize potential biological substrates for these transporters.
  • Item
    Ethanol (EtOH)-Mediated Differentiation Of Embryonic Stem Cells Via Retinoic Acid (RA)-Retinoic Acid Receptor-Gamma (RARγ) Signaling
    Serio, Ryan (2019)
    Ethanol (EtOH) is a teratogen, but the mechanisms by which EtOH exerts its teratogenic effects aren’t fully understood. Vitamin A (all-trans retinol/ROL) can be oxidized to all-trans-retinoic acid (RA), which plays a critical role in differentiation and development. Using an embryonic stem cell (ESC) model to analyze effects of EtOH on differentiation, we show that mRNAs associated with differentiation are increased by EtOH and its metabolite acetaldehyde, but not its acid metabolite acetate. EtOH also decreases pluripotency-related mRNA levels. Kinetics assays showed that ALDH2, and not ALDH1A2, is responsible for metabolizing most of the acetaldehyde in ESCs. Using reporter assays, chromatin immunoprecipitation assays, and RARγ-knockout ESC lines generated by CRISPR/Cas9 or homologous recombination, we demonstrate that EtOH signals via RARγ binding to RA response elements (RAREs) in differentiation-associated genes. We also demonstrate that EtOH-mediated increases in Hoxa1 and Cyp26a1 transcripts, used as examples of direct RA target genes, require expression of the RA-synthesizing enzyme ALDH1A2. This result suggests that EtOH-mediated induction of Hoxa1 and Cyp26a1 transcripts requires ROL from serum. The retinol dehydrogenase gene RDH10 and a functional RARE in the ROL transporter Stra6 gene are required for EtOH induction of Hoxa1 and Cyp26a1 mRNAs, as shown with CRISPR/Cas9 knockout lines. Thus, we identify a mechanism by which EtOH stimulates stem cell differentiation via increased influx and metabolism of ROL for downstream RARγ-dependent transcription. Our data suggest that in stem cells EtOH may shift cell fate decisions to alter developmental outcomes by increasing endogenous ROL/RA signaling via increased STRA6 expression and ROL oxidation. Furthermore, we suggest that stem cells, which generally cannot produce retinyl esters, may be particularly vulnerable to EtOH teratogenesis.