Relatedness inference, pedigree reconstruction, and cancer genomics in large cohorts.

Author
Seidman, Daniel Noah
Abstract
As the cost of collecting genetic information decreases, and the size of available genetic datasets increases, tools that efficiently process these datasets become more important. Our goal is to develop multiple algorithms that can extract useful quantities from these large datasets to inform further analyses in medicine and population genetics. We first present IBIS, an identical-by-descent (IBD) detector we developed that locates long regions of allele sharing between unphased individuals. We compared this algorithm's performance to several contemporary alternatives, some that rely on phasing and some that do not. We determined that, in addition to being comparatively efficient, our algorithm's ability to infer IBD segments $\ge 7$ cM compares favorably to other methods. With these segments, we found that IBIS can classify first- through third-degree relatives in real data at rates meeting or exceeding other methods and identifies fourth- through sixth-degree pairs at rates close to the top methods. Next, we discuss our algorithm that leverages read data from multiple tumor samples to handle the complex problem of constructing phylogeny trees that include structural variants (SVs). The algorithm, Meltos, initially uses tumor phylogeny trees built on somatic single nucleotide variants (SNVs) to form a scaffold, and then attempts to insert high-confidence SVs to produce a comprehensive lineage tree. We found that using evolutionary constraints for variant allele frequencies, combined with a new probabilistic formula for calculating said frequencies, provides evidence that helps weed out false positive SVs and place true positive SVs into their phylogeny framework. Lastly, we return to the goal of working with IBD information. We present our algorithm, PELICAN, that takes the information from IBIS, together with statistical likelihoods for specific second-degree relationships provided by CREST (Qiao, Sannerud, et al. 2021), to quickly and accurately infer pedigrees from large genome datasets. We utilize a combination of likelihoods and biological constraints to perform a backtracking search that exhaustively checks the set of possible pedigrees for those with the highest likelihood, and PELICAN is able to do so at speeds comparable to other state-of-the-art algorithms.
Description
190 pages
Date Issued
2022-08
Subject
Algorithm; Identical by Descent; Pedigrees; Phylogenetic trees; Population Genetics
Committee Chair
Williams, Amy L.
Committee Member
Mezey, Jason G.; Clark, Andrew
Degree Discipline
Computational Biology
Degree Name
Ph. D., Computational Biology
Degree Level
Doctor of Philosophy
Type
dissertation or thesis