
Population Genetic Inference When Mutation Rates are Context-Dependent

Abstract

Population genetic studies often analyze patterns of single nucleotide polymorphisms (SNPs) to gain insight into the evolutionary history of a population. One summary statistic that has proved invaluable in these efforts is the frequency distribution of derived mutations (i.e., the site-frequency spectrum, or SFS). In order to generate the SFS, orthologous sequences from closely related outgroup species are frequently used to distinguish ancestral and derived alleles at each SNP (assuming the ancestral allele is the one that matches the outgroup).
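The outgroup-based polarization described above can be illustrated with a minimal Python sketch. It assumes hypothetical inputs: each SNP is a string of alleles (one character per sampled chromosome) paired with a single outgroup allele. The function name `unfolded_sfs` is illustrative and is not taken from any software described in this dissertation.

```python
from collections import Counter


def unfolded_sfs(snps, outgroup, n):
    """Return counts where counts[i] is the number of SNPs whose derived
    allele is carried by i of the n sampled chromosomes.

    Parsimony assumption: the allele matching the outgroup is ancestral.
    """
    counts = [0] * (n + 1)
    for alleles, ancestral in zip(snps, outgroup):
        if len(alleles) != n:
            raise ValueError("every SNP must list exactly n alleles")
        tally = Counter(alleles)
        # Skip sites that are not biallelic, or where the outgroup
        # matches neither allele and the SNP cannot be polarized.
        if len(tally) != 2 or ancestral not in tally:
            continue
        derived_count = n - tally[ancestral]
        counts[derived_count] += 1
    return counts


# Toy data (made up for illustration): 4 chromosomes, 4 SNPs.
snps = ["AAAG", "CCTT", "TTTC", "GGAA"]
outgroup = "ATCC"  # one outgroup allele per SNP
print(unfolded_sfs(snps, outgroup, n=4))  # [0, 1, 1, 1, 0]
```

The last SNP is dropped because its outgroup allele matches neither segregating allele, so parsimony cannot assign an ancestral state.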

In a series of studies, I test the robustness of this parsimony assumption to a more realistic finite-sites model of context-dependent mutation biases inferred along the human lineage. I show (using both simulations and a theoretical model) that enough unobserved substitutions could have occurred since the divergence of human and chimpanzee to cause a shift in the SFS. The shifted SFS induced by misidentifying the ancestral states of some SNPs can lead to poorly fitting demographic models and cause many statistical tests to spuriously reject neutrality in favor of models with positive selection.
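To see why misidentification shifts the SFS, consider a deliberately simplified sketch: if the ancestral state of a SNP is misidentified, a derived allele present on i of n chromosomes is recorded as present on n - i. The code below assumes a single uniform misidentification probability `p` for every SNP, which is a simplification for illustration only; the model in this dissertation is context-dependent, so the rate would vary among sites. The function name and the toy spectrum are hypothetical.

```python
def misidentified_sfs(sfs, p):
    """Expected observed SFS when each SNP's ancestral state is
    misidentified independently with probability p.

    sfs[i] is the (expected) number of SNPs with derived count i,
    for i = 0..n. A misidentified SNP moves from class i to n - i.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    n = len(sfs) - 1
    return [(1 - p) * sfs[i] + p * sfs[n - i] for i in range(n + 1)]


# Toy spectrum proportional to 1/i for n = 5 chromosomes (illustrative values).
true_sfs = [0, 12, 6, 4, 3, 0]
observed = misidentified_sfs(true_sfs, p=0.05)

# The rarest class loses SNPs and the highest-frequency class gains them.
print(observed[1] < true_sfs[1], observed[4] > true_sfs[4])  # True True
```

Because low-frequency classes hold the most SNPs, even a small `p` inflates the high-frequency derived classes, which is the kind of excess that tests for positive selection are sensitive to.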

I construct a novel model of the context-dependent mutation process that allows polymorphism data to be corrected for the effect of ancestral misidentification. With this correction, statistical tests return to their proper rejection rates, allowing for more accurate inference of both demographic events and the strength and abundance of natural selection. The correction is used to better understand the evolution of GC-content in the human genome, and to perform accurate demographic inference in two populations of the biomedically important rhesus macaque.

Finally, I present a new forward simulation program, SFS_CODE, that can simulate several populations under a Wright-Fisher style island model. This program is highly flexible, allowing the user to simulate several loci (with or without linkage), where each locus can be annotated as either coding or non-coding, sex or autosome, selected or neutral. In addition to providing the source code for our program, we have also developed a web server that will allow the user to perform simulations using the high performance computing resources of the Computational Biology Service Unit at Cornell University (http://cbsuapps.tc.cornell.edu/sfscode.aspx).
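For readers unfamiliar with forward simulation, the following is a minimal sketch of a Wright-Fisher island model for a single neutral biallelic locus. It is not the SFS_CODE algorithm or its interface: the function name, parameters, and the choice of a migrant pool averaged over all demes are assumptions made for illustration, and features such as linkage, selection, and locus annotation are omitted.

```python
import random


def wright_fisher_island(freqs, n_chrom, m, generations, rng):
    """Track the frequency of one allele in several demes forward in time.

    freqs       -- initial allele frequency in each deme
    n_chrom     -- number of chromosomes per deme (constant size)
    m           -- fraction of each deme drawn from the common migrant pool
    generations -- number of discrete generations to simulate
    rng         -- a random.Random instance, for reproducibility
    """
    freqs = list(freqs)
    for _ in range(generations):
        pool = sum(freqs) / len(freqs)  # migrant pool: average over demes
        # Migration: each deme mixes its own frequency with the pool.
        mixed = [(1 - m) * f + m * pool for f in freqs]
        # Drift: binomial sampling of n_chrom chromosomes in each deme.
        freqs = [
            sum(rng.random() < f for _ in range(n_chrom)) / n_chrom
            for f in mixed
        ]
    return freqs


rng = random.Random(2007)  # fixed seed so a run can be repeated
final = wright_fisher_island([0.2, 0.8], n_chrom=100, m=0.01,
                             generations=50, rng=rng)
print(final)
```

Each generation applies migration deterministically and then resamples every deme, so loss or fixation of the allele in all demes is an absorbing state.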

Description

Special Committee Chair: Carlos D. Bustamante

Special Committee Members: Andrew G. Clark, Richard T. Durrett

Date Issued

2007-12-20T14:48:40Z

Keywords

population genetics; statistical inference; ancestral misidentification; forward simulation; single nucleotide polymorphism

Types

dissertation or thesis
