Cornell University
Library

eCommons

The Population Genetic Variation of Interspersed and Tandem Repeats

File(s)
Said_cornellgrad_0058F_13807.pdf (2.47 MB)
SUPPLEMENTAL_FIGURES.pdf (1.58 MB)
SUPPLEMENTAL_FIGURE_LEGENDS.pdf (143.85 KB)
Supplemental_Table_1.csv (4.67 MB)
Permanent Link(s)
https://doi.org/10.7298/g9d8-1g70
https://hdl.handle.net/1813/114751
Collections
Cornell Theses and Dissertations
Author
Said, Iskander
Abstract

Despite making up large and essential portions of eukaryotic genomes, repeat DNA has largely eluded study due to limitations in the alignment and assembly of next-generation short-read sequencing data. Standard technologies and methods are largely inadequate for studying the abundance and variation of repeat sequences at the population scale. Therefore, I have developed and applied new methods and algorithms to study the population variation of interspersed and tandem repeats, specifically transposable elements and simple satellites. First, using hierarchical clustering and population genetic theory, I developed a method to infer transposable element clades and their age, circumventing the difficult problems of phasing and mapping. This method led to the discovery that host piRNA regulatory mechanisms are turning over to regulate newly emerging transposable element variants, shedding light on an evolutionary arms race. Second, I applied k-mer counting methods to human population genomics data to quantify the abundance and variation of simple satellites, discovering previously undescribed telomeric and centromeric satellites. I then applied a mixed modelling framework to find associations between centromeric ancestry and simple satellite abundances, which led to the discovery of an expansion in copy number of a centromeric satellite in a cluster of African and Latin American individuals with shared centromeric ancestry. Overall, this dissertation presents an analysis framework for developing computational strategies to mine population genomics data for repeat variation. The approaches I have developed and deployed have allowed me to make inferences about repeat abundance and variation in large population genomic datasets that were previously infeasible.

Description
172 pages
Supplemental file(s) description: None.
Date Issued
2023-08
Keywords
evolution • genomics • population genetics • repeats • satellites • transposable elements
Committee Chair
Barbash, Daniel
Committee Member
Clark, Andrew
Feschotte, Cedric
Mezey, Jason
Degree Discipline
Genetics, Genomics and Development
Degree Name
Ph. D., Genetics, Genomics and Development
Degree Level
Doctor of Philosophy
Rights
Attribution-NonCommercial-ShareAlike 4.0 International
Rights URI
https://creativecommons.org/licenses/by-nc-sa/4.0/
Type
dissertation or thesis
Link(s) to Catalog Record
https://newcatalog.library.cornell.edu/catalog/16219177