Cornell University Library

eCommons

From Arabidopsis to Zea: Learning conserved cis mechanisms of gene regulation

File(s)
Wrightsman_cornellgrad_0058F_14611.pdf (10.96 MB)
Permanent Link(s)
https://doi.org/10.7298/hncs-kw41
https://hdl.handle.net/1813/116622
Collections
Cornell Theses and Dissertations
Author
Wrightsman, Travis
Abstract

cis-regulatory elements (CREs) are critical functional components of the genome, controlling both the timing and magnitude of gene expression. Relative to protein-coding regions, CREs have proven more difficult and expensive to catalogue, even in humans, and CRE knowledge in other higher organisms trails far behind. To reduce the cost of locating CREs, many sequence-based deep learning models have been developed within species to predict CREs directly from DNA sequence. However, the vast majority of these models are validated within species, or against species included in the training set, and are never tested on a completely held-out set of species, which calls their generalizability into question. This dissertation explores three methods to locate CREs in held-out species at different phylogenetic scopes, from angiosperms to the Andropogoneae. The first method uses a recurrent convolutional neural network to classify 600-base-pair sequence windows as accessible or hypomethylated in leaf tissue, two epigenetic signals associated with CREs. Models trained across multiple species and tested on a held-out species perform competitively with, or better than, models trained and tested solely within a species. These multi-species models demonstrate the feasibility of predicting epigenetic signals of CREs in understudied species. The second method compares four published genomic deep learning model architectures on their ability to predict RNA abundance from 1,000 base pairs of promoter and UTR sequence. Models trained across the Andropogoneae and tested within maize showed moderate performance across all genes but poor performance within maize orthogroups. The dataset used to compare all architectures fairly has been publicly released as a community resource for consistently benchmarking future expression model architectures. The final method uses phylogenetic footprinting with hundreds of Andropogoneae genomes to filter motif matches in maize down to likely functional transcription factor binding sites. Aided by the high alignment depth, motifs within some transcription factor families cluster strongly into novel subfamily motifs that can be associated with changes in tissue-specific gene expression. These novel subfamilies are promising candidates for development- and stress-specific transcription factor family members. Together, these three methods demonstrate the utility of leveraging data from many related species to identify CREs, or functional loci within CREs, which can be useful targets for genome editing in crop improvement.
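The abstract's central evaluation idea, testing a model on a species that is completely absent from training, can be illustrated with a minimal sketch. It is not the dissertation's code and omits the recurrent convolutional network entirely. It assumes a hypothetical record layout of (species, sequence, label) tuples with a binary label (1 for an accessible or hypomethylated window, 0 otherwise), and the names `one_hot`, `leave_one_species_out`, and the `species_*` identifiers are all illustrative placeholders. Only the 600 bp window length comes from the abstract.

```python
from typing import Iterator, List, Tuple

# Hypothetical record layout (assumption, not from the dissertation):
# (species, sequence, label), label = 1 for an accessible/hypomethylated window, else 0.
Window = Tuple[str, str, int]

BASES = "ACGT"
WINDOW_BP = 600  # window length stated in the abstract


def one_hot(seq: str) -> List[List[int]]:
    """Encode DNA as one row per base with columns A, C, G, T.

    N and any other non-ACGT symbol becomes an all-zero row.
    """
    rows = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        j = BASES.find(base)
        if j >= 0:
            row[j] = 1
        rows.append(row)
    return rows


def leave_one_species_out(
    windows: List[Window],
) -> Iterator[Tuple[str, List[Window], List[Window]]]:
    """Yield (held_out, train, test) once per species.

    The test split holds only the held-out species and the training split holds
    every other species, so no window from the held-out genome is seen in training.
    """
    for held_out in sorted({species for species, _, _ in windows}):
        train = [w for w in windows if w[0] != held_out]
        test = [w for w in windows if w[0] == held_out]
        yield held_out, train, test


if __name__ == "__main__":
    # Toy 600 bp windows from three hypothetical species.
    toy = [
        ("species_A", "ACGT" * 150, 1),
        ("species_B", "TTAN" * 150, 0),
        ("species_C", "GGCC" * 150, 1),
    ]
    for held_out, train, test in leave_one_species_out(toy):
        assert all(len(seq) == WINDOW_BP for _, seq, _ in train + test)
        x_train = [one_hot(seq) for _, seq, _ in train]
        print(held_out, len(x_train), len(test))
```

Splitting by species rather than by random windows is the design choice that matters here: a random split would leak closely related sequence from the test genome into training, which is exactly the within-species validation the abstract argues is insufficient for judging generalizability.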

Description
83 pages
Date Issued
2024-08
Committee Chair
Buckler, Edward
Committee Member
Williams, Amy
Richards, Eric
Degree Discipline
Plant Breeding
Degree Name
Ph. D., Plant Breeding
Degree Level
Doctor of Philosophy
Type
dissertation or thesis
Link(s) to Catalog Record
https://newcatalog.library.cornell.edu/catalog/16611768
