eCommons

 

Population Genetic Inference When Mutation Rates are Context-Dependent

dc.contributor.author: Hernandez, Ryan
dc.date.accessioned: 2007-12-20T14:48:40Z
dc.date.available: 2012-12-20T07:16:58Z
dc.date.issued: 2007-12-20T14:48:40Z
dc.description: Special Committee Chair: Carlos D. Bustamante. Special Committee Members: Andrew G. Clark, Richard T. Durrett [en_US]
dc.description.abstract: Population genetic studies often analyze patterns of single nucleotide polymorphisms (SNPs) to gain insight into the evolutionary history of a population. One summary statistic that has proved invaluable in these efforts is the frequency distribution of derived mutations (i.e., the site-frequency spectrum, or SFS). In order to generate the SFS, orthologous sequences from closely related outgroup species are frequently used to distinguish ancestral and derived alleles at each SNP (assuming the ancestral allele is the one that matches the outgroup). In a series of studies, I test the robustness of this parsimony assumption to a more realistic finite-sites model of context-dependent mutation biases inferred along the human lineage. I show (using both simulations and a theoretical model) that enough unobserved substitutions could have occurred since the divergence of human and chimpanzee to cause a shift in the SFS. The shifted SFS induced by misidentifying the ancestral states of some SNPs can lead to poorly fitting demographic models and cause many statistical tests to spuriously reject neutrality in favor of models with positive selection. By constructing a novel model of the context-dependent mutation process, polymorphism data can be corrected for the effect of ancestral misidentification. Using this correction, statistical tests return to their proper rejection rates, allowing for more accurate inference of both demographic events and the strength and abundance of natural selection. This correction is used to better understand the evolution of GC-content in the human genome, and to perform accurate demographic inference in two populations of the biomedically important rhesus macaque. Finally, I present a new forward simulation program, SFS_CODE, that can simulate several populations under a Wright-Fisher style island model. This program is highly flexible, allowing the user to simulate several loci (with or without linkage), where each locus can be annotated as either coding or non-coding, sex-linked or autosomal, selected or neutral. In addition to providing the source code for our program, we have also developed a web server that will allow the user to perform simulations using the high performance computing resources of the Computational Biology Service Unit at Cornell University (http://cbsuapps.tc.cornell.edu/sfscode.aspx). [en_US]
dc.format.extent: 4256851 bytes
dc.format.mimetype: application/pdf
dc.identifier.other: bibid: 6476482
dc.identifier.uri: https://hdl.handle.net/1813/9398
dc.language.iso: en_US [en_US]
dc.subject: population genetics [en_US]
dc.subject: statistical inference [en_US]
dc.subject: ancestral misidentification [en_US]
dc.subject: forward simulation [en_US]
dc.subject: single nucleotide polymorphism [en_US]
dc.title: Population Genetic Inference When Mutation Rates are Context-Dependent [en_US]
dc.type: dissertation or thesis [en_US]

Files

Original bundle
Name: ThesisMain.pdf
Size: 4.06 MB
Format: Adobe Portable Document Format