
dc.contributor.author: Shafquat, Afrah
dc.date.accessioned: 2020-08-10T20:24:38Z
dc.date.available: 2020-08-10T20:24:38Z
dc.date.issued: 2020-05
dc.identifier.other: Shafquat_cornellgrad_0058F_11885
dc.identifier.other: http://dissertations.umi.com/cornellgrad:11885
dc.identifier.uri: https://hdl.handle.net/1813/70464
dc.description: 167 pages
dc.description: Supplemental file(s) description: Additional File 3, Additional File 2, Additional File 1.
dc.description.abstract: Large-scale genome-wide association studies (GWAS) have enabled detection of numerous candidate genetic loci that may impact human diseases, expanded our knowledge of the underlying biological mechanisms and pathways for complex disease phenotypes, and provided tangible biological targets for treatment and drug development. Moreover, GWAS summary statistics have improved disease risk assessment for complex disease phenotypes, paving the way for early disease detection, intervention, and mitigation strategies, with the added potential of risk-based stratification for differential treatments. However, existing statistical models and inference suffer from limitations of disease phenotypes, including (i) ambiguity in disease definition (where genetically distinct disease phenotypes are defined as the same disease/disease complex) and (ii) disease misdiagnosis/misclassification (where errors in phenotype misclassify a disease case as a control and vice versa). Current methods addressing these issues show reasonable performance; however, they may be improved by exploring alternative methodologies and incorporating constraints for a typical GWAS dataset. In this dissertation, I propose a Bayesian hierarchical latent variable model, PheLEx (Phenotype Latent variable Extraction of disease misdiagnosis), and a bootstrapping approach, PheBEs (Phenotype Bootstrapping Estimation method), for the extraction of misclassification errors in GWAS phenotypes. The performance of both methods is evaluated using simulated GWAS datasets and real GWAS phenotypes from the UK Biobank dataset. The improved performance of the proposed methods over existing methods in identifying misclassified individuals in simulated GWAS phenotypes indicates potential for improved GWAS statistical power and candidate loci discovery through use of these methods. Finally, I propose an alternate disease risk assessment method for computation of disease risk scores and phenotype prediction. The proposed alternate disease risk assessment methodology showed comparable (and, in some cases, improved) performance in risk prediction for UK Biobank phenotypes and subtypes, indicating potential for improved disease risk assessment.
dc.rights: Attribution 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.subject: Bayesian
dc.subject: disease risk estimation
dc.subject: GWAS
dc.subject: MCMC
dc.subject: misclassification
dc.subject: phenotypes
dc.title: Improving GWAS phenotypes through Bayesian and machine learning approaches
dc.type: dissertation or thesis
thesis.degree.discipline: Computational Biology
thesis.degree.grantor: Cornell University
thesis.degree.level: Doctor of Philosophy
thesis.degree.name: Ph. D., Computational Biology
dc.contributor.chair: Mezey, Jason
dc.contributor.committeeMember: Clark, Andrew
dc.contributor.committeeMember: Williams, Amy
dcterms.license: https://hdl.handle.net/1813/59810
dc.identifier.doi: https://doi.org/10.7298/dsdz-m761



Except where otherwise noted, this item's license is described as Attribution 4.0 International
