Relatedness inference, pedigree reconstruction, and cancer genomics in large cohorts.

dc.contributor.author Seidman, Daniel Noah
dc.contributor.chair Williams, Amy L.
dc.contributor.committeeMember Mezey, Jason G.
dc.contributor.committeeMember Clark, Andrew
dc.description 190 pages
dc.description.abstract As the cost of collecting genetic information decreases and the size of available genetic datasets increases, tools that process these datasets efficiently become more important. Our goal is to develop several algorithms that extract useful quantities from these large datasets to inform further analyses in medicine and population genetics. We first present IBIS, an identity-by-descent (IBD) detector we developed that locates long regions of allele sharing between unphased individuals. We compared its performance to several contemporary alternatives, some that rely on phasing and some that do not. We determined that, in addition to being comparatively efficient, IBIS's ability to infer IBD segments $\ge 7$ cM compares favorably to that of other methods. Using these segments, IBIS classifies first- through third-degree relatives in real data at rates meeting or exceeding those of other methods, and identifies fourth- through sixth-degree pairs at rates close to those of the top methods. Next, we discuss Meltos, an algorithm that leverages read data from multiple tumor samples to address the complex problem of constructing phylogenetic trees that include structural variants (SVs). Meltos first uses tumor phylogeny trees built on somatic single nucleotide variants (SNVs) as a scaffold, and then attempts to insert high-confidence SVs to produce a comprehensive lineage tree. We found that evolutionary constraints on variant allele frequencies, combined with a new probabilistic formula for calculating those frequencies, provide evidence that helps filter out false-positive SVs and place true-positive SVs into the phylogenetic framework. Lastly, we return to working with IBD information. We present PELICAN, an algorithm that combines the IBD segments from IBIS with statistical likelihoods for specific second-degree relationships, provided by CREST (Qiao, Sannerud, et al. 2021), to quickly and accurately infer pedigrees from large genome datasets. PELICAN uses a combination of likelihoods and biological constraints to perform a backtracking search that exhaustively checks the entire set of possible pedigrees for those with the highest likelihood, and it does so at speeds comparable to other state-of-the-art algorithms.
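The abstract describes classifying relatives by degree from detected IBD segments. As an illustration only, here is a minimal Python sketch of one common way to do this, assuming non-overlapping segments, each labeled IBD1 or IBD2, with genetic-map coordinates in cM. It computes the kinship coefficient φ = (L_IBD1/4 + L_IBD2/2) / L_genome and bins it with KING-style thresholds (degree d when 2^-(d+1.5) ≤ φ < 2^-(d+0.5), since a degree-d pair has expected φ = 2^-(d+1)). The `Segment` type, the function names, and the 7 cM default cutoff are hypothetical choices for this sketch, not IBIS's actual interface or classifier.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical IBD segment record (coordinates in centimorgans)."""
    start_cm: float
    end_cm: float
    ibd2: bool  # True if both haplotypes are shared (IBD2), else IBD1


def kinship_from_segments(segments, genome_length_cm, min_length_cm=7.0):
    """Estimate kinship: phi = (L_IBD1 / 4 + L_IBD2 / 2) / L_genome.

    Assumes segments do not overlap. Segments shorter than min_length_cm
    are ignored (the 7 cM default is an illustrative choice).
    """
    ibd1 = 0.0
    ibd2 = 0.0
    for seg in segments:
        length = seg.end_cm - seg.start_cm
        if length < min_length_cm:
            continue  # skip short, less reliable segments
        if seg.ibd2:
            ibd2 += length
        else:
            ibd1 += length
    return (ibd1 / 4 + ibd2 / 2) / genome_length_cm


def degree_from_kinship(phi, max_degree=6):
    """Map kinship to a relationship degree (0 = duplicate or MZ twin).

    Checks bins from closest to most distant; the first lower bound
    2^-(d+1.5) that phi meets gives the degree. Returns None when phi is
    below the bin for max_degree (treated as unrelated here).
    """
    for d in range(max_degree + 1):
        if phi >= 2 ** -(d + 1.5):
            return d
    return None
```

For a parent-child pair sharing one IBD1 segment across an assumed 3,500 cM genome, φ = 0.25 and the pair falls in the first-degree bin; half that IBD1 sharing gives φ = 0.125, a second-degree call. Full siblings reach the same φ = 0.25 through a mix of IBD1 and IBD2, so degree alone does not distinguish them from parent-child pairs; that finer distinction is the kind of question pedigree-level inference addresses.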
dc.subject Identical by Descent
dc.subject Phylogenetic trees
dc.subject Population Genetics
dc.title Relatedness inference, pedigree reconstruction, and cancer genomics in large cohorts.
dc.type dissertation or thesis
dcterms.license Biology University of Philosophy D., Computational Biology


Original bundle
3.26 MB
Adobe Portable Document Format