Show simple item record

dc.contributor.author: Wan, Muting [en_US]
dc.date.accessioned: 2015-04-06T20:14:06Z
dc.date.available: 2020-01-27T07:01:39Z
dc.date.issued: 2015-01-26 [en_US]
dc.identifier.other: bibid: 9154485
dc.identifier.uri: https://hdl.handle.net/1813/39389
dc.description.abstract: In recent years, sparse classification problems have emerged in many fields of study. Finite mixture models have been developed to facilitate Bayesian inference where parameter sparsity is substantial. Shrinkage estimation allows borrowing of strength across features, exploiting the parallel nature of multiple hypothesis tests. Important examples that combine shrinkage estimation and finite mixture models for sparse classification include the hierarchical model in Smyth (2004) and the explicit mixture model in Bar et al. (2010) for Bayesian microarray analysis. Classification with finite mixture models is based on the posterior expectation of latent indicator variables. These quantities are typically estimated using the expectation-maximization (EM) algorithm in an empirical Bayes approach or Markov chain Monte Carlo (MCMC) in a fully Bayesian approach. MCMC is limited in applicability where high-dimensional data are involved because its sampling-based nature leads to slow computations and hard-to-monitor convergence. In a fully Bayesian framework, we investigate the feasibility and performance of variational Bayes (VB) approximation and apply the VB approach to fully Bayesian versions of several finite mixture models that have been proposed in bioinformatics. We find that it achieves desirable speed and accuracy in sparse classification with hierarchical mixture models for high-dimensional data. Another example of sparse classification in bioinformatics that can be solved via model-based approaches is expression quantitative trait loci (eQTL) detection, in which determining whether the association between a gene and a given single nucleotide polymorphism (SNP) is significant is treated as classifying genes as null or non-null with respect to that SNP. The high dimensionality of the data not only causes computational difficulties, but also makes the confounding impact of unwanted variation in the data impossible to ignore.
Model-based approaches that account for unwanted variation by incorporating a factor analysis term representing hidden factors and their effects have been adopted in applications such as differential analysis and eQTL detection. HEFT (Gao et al., 2014) is a fast approach for model-based eQTL identification that simultaneously learns hidden effects. We develop a hierarchical mixture model-based empirical Bayes approach for sparse classification that simultaneously accounts for unwanted variation, as well as a family of simplified model-based approaches aimed at computational efficiency. We investigate the feasibility and performance of these model-based approaches in comparison with HEFT using several real data examples in bioinformatics. [en_US]
dc.language.iso: en_US [en_US]
dc.subject: Bayesian inference [en_US]
dc.subject: Linear mixed models [en_US]
dc.subject: Bioinformatics [en_US]
dc.title: Model-Based Classification With Applications To High-Dimensional Data In Bioinformatics [en_US]
dc.type: dissertation or thesis [en_US]
thesis.degree.discipline: Statistics
thesis.degree.grantor: Cornell University [en_US]
thesis.degree.level: Doctor of Philosophy
thesis.degree.name: Ph. D., Statistics
dc.contributor.chair: Booth, James [en_US]
dc.contributor.committeeMember: Hooker, Giles J. [en_US]
dc.contributor.committeeMember: Wells, Martin Timothy [en_US]


This item appears in the following Collection(s)

Statistics