dc.contributor.author: Hoffman, Gabriel
dc.identifier.other: bibid: 8267168
dc.description.abstract: Genome-wide association studies (GWAS) have become a widely adopted approach to identifying genetic variation that produces variation in complex phenotypes. Standard statistical methods are able to identify strong associations in these datasets, but more sophisticated statistical methods that model complex aspects of the biological data can identify weaker associations and further elucidate the underlying molecular biology. We develop and apply statistical methods that explicitly model two aspects of GWAS data using two complementary forms of regularized regression. First, we model the polygenic architecture of complex phenotypes using feature selection methods in a penalized regression framework. We propose novel algorithmic, computational and heuristic approaches in order to produce a method that scales to high-dimensional GWAS data and increases power to detect weak associations that are not detectable by standard tests. Second, we model the covariance between individuals due to kinship and population structure using a linear mixed model that regularizes the statistical contribution of a metric of ancestry. Linear mixed models have been widely adopted for analysis of GWAS data, but their theoretical properties have not been examined in this context. We formalize the statistical properties of the linear mixed model, develop a novel interpretation in relation to population genetics, and propose a novel low-rank linear mixed model that learns the dimensionality of the correction for kinship and population structure from the data. Finally, we combine these two complementary regularized regression models into a penalized linear mixed model. We develop a unified model incorporating a novel algorithm with novel approaches to tuning nonconvex penalties and determining the optimal stopping point in the regularization path. Leveraging recent work on assessing significance of selected features, we produce a well-principled and scalable statistical method applicable to feature selection, hypothesis testing and prediction in many contexts.
dc.subject: genome-wide association study
dc.subject: regularized regression
dc.title: Modeling Biological Processes In Genome-Wide Association Studies Using Regularized Regression
dc.type: dissertation or thesis
… University; Doctor of Philosophy; Ph. D., Genetics
dc.contributor.chair: Mezey, Jason G.
dc.contributor.committeeMember: Siepel, Adam Charles
dc.contributor.committeeMember: Clark, Andrew
