The Population Genetic Variation of Interspersed and Tandem Repeats

Despite making up large and essential portions of eukaryotic genomes, repeat DNA has largely eluded study because of the limitations of aligning and assembling next-generation short-read sequencing data. Standard technologies and methods are largely inadequate for studying the abundance and variation of repeat sequences at the population scale. Therefore, I have developed and applied new methods and algorithms to study the population variation of interspersed and tandem repeats, specifically transposable elements and simple satellites. First, using hierarchical clustering and population genetic theory, I developed a method to infer transposable element clades and their ages, circumventing the difficult problems of phasing and mapping. This method led to the discovery that host piRNA regulatory mechanisms are turning over to regulate newly emerging transposable element variants, shedding light on an evolutionary arms race. Second, I applied k-mer counting methods to human population genomics data to quantify the abundance and variation of simple satellites, discovering previously undescribed telomeric and centromeric satellites. I then applied a mixed-modeling framework to find associations between centromeric ancestry and simple satellite abundances. This led to the discovery of an expansion in the copy number of a centromeric satellite in a cluster of African and Latin American individuals with shared centromeric ancestry. Overall, this dissertation presents an analysis framework for developing computational strategies to mine population genomics data for repeat variation. The approaches I have developed and deployed have allowed me to make inferences about repeat abundance and variation in large population genomic datasets that were previously infeasible.

172 pages

Supplemental file(s) description: None.


evolution; genomics; population genetics; repeats; satellites; transposable elements


Committee Chair

Barbash, Daniel

Committee Member

Clark, Andrew
Feschotte, Cedric
Mezey, Jason

Degree Discipline

Genetics, Genomics and Development

Degree Name

Ph. D., Genetics, Genomics and Development

Degree Level

Doctor of Philosophy

Attribution-NonCommercial-ShareAlike 4.0 International


dissertation or thesis