Improving GWAS phenotypes through Bayesian and machine learning approaches
Large-scale genome-wide association studies (GWAS) have enabled detection of numerous candidate genetic loci that may impact human diseases, expanded our knowledge about the underlying biological mechanisms and pathways for complex disease phenotypes, and provided tangible biological targets for treatment and drug development. Moreover, GWAS summary statistics have improved disease risk assessment for complex disease phenotypes, paving the way for early disease detection, intervention, and mitigation strategies, with the added potential of risk-based stratification for differential treatments. However, existing statistical models and inference procedures suffer from limitations in disease phenotypes, including (i) ambiguity in disease definition (where genetically distinct disease phenotypes are defined as the same disease or disease complex) and (ii) disease misdiagnosis or misclassification (where errors in the phenotype label a disease case as a control, or vice versa). Current methods addressing these issues perform reasonably well; however, they may be improved by exploring alternative methodologies and incorporating constraints appropriate for a typical GWAS dataset. In this dissertation, I propose a Bayesian hierarchical latent variable model, PheLEx (Phenotype Latent variable Extraction of disease misdiagnosis), and a bootstrapping approach, PheBEs (Phenotype Bootstrapping Estimation method), for the extraction of misclassification errors in GWAS phenotypes. The performance of both methods is evaluated using simulated GWAS datasets and real GWAS phenotypes from the UK Biobank dataset. The proposed methods outperform existing methods in identifying misclassified individuals in simulated GWAS phenotypes, indicating potential for improved GWAS statistical power and candidate loci discovery through their use.
Finally, I propose an alternative disease risk assessment method for computing disease risk scores and predicting phenotypes. This method showed comparable (and in some cases, improved) predictive performance for UK Biobank phenotypes and subtypes, indicating potential for improved disease risk assessment.
167 pages
Supplemental file(s) description: Additional File 3, Additional File 2, Additional File 1.
Bayesian; disease risk estimation; GWAS; MCMC; misclassification; phenotypes
Clark, Andrew; Williams, Amy
Ph. D., Computational Biology
Doctor of Philosophy
Attribution 4.0 International
dissertation or thesis