eCommons

 

Parallel Testing, And Variable Selection - A Mixture-Model Approach With Applications In Biostatistics

dc.contributor.author: Bar, Haim (en_US)
dc.contributor.chair: Booth, James (en_US)
dc.contributor.committeeMember: Wells, Martin Timothy (en_US)
dc.contributor.committeeMember: Strawderman, Robert Lee (en_US)
dc.date.accessioned: 2012-06-28T20:57:04Z
dc.date.available: 2017-06-01T06:00:34Z
dc.date.issued: 2012-01-31 (en_US)
dc.description.abstract: We develop efficient and powerful statistical methods for high-dimensional data, where the sample size is much smaller than the number of features (the so-called 'large p, small n' problem). We address three important problems. First, we develop a mixture-model approach for parallel testing for unequal variances in two-sample experiments. The treatment effect on the variance has received little attention in the statistical literature, which has so far focused mostly on the effect on the mean. The effect on the variance is increasingly recognized in recent biological literature, and we develop an empirical Bayes approach for testing differences in variance when the number of tests is large. We show that the model is useful in a wide range of applications, that our method is much more powerful than traditional tests for unequal variances, and that it is robust to departures from the normality assumption. Second, we extend these ideas and develop a novel bivariate normal model that tests for both differential expression and differential variation between the two groups. We show in simulations that this new method yields a substantial gain in power when differential variation is present. Through a three-step estimation approach, in which we apply the Laplace approximation and the EM algorithm, we obtain a computationally efficient method that is particularly well suited to 'large p, small n' situations. Third, we deal with the problem of variable selection where the number of putative variables is large, possibly much larger than the sample size. We develop a model-based, empirical Bayes approach. By treating the putative variables as random effects, we obtain shrinkage estimation, which results in increased power and significantly faster convergence compared with simulation-based methods.
Furthermore, we employ computational tricks that allow us to increase the speed of our algorithm, to handle a very large number of putative variables, and to control the multicollinearity in the model. The motivation for developing this approach is QTL analysis, but our method is applicable to a broad range of applications. We use two widely studied data sets and show that our model selection algorithm yields excellent results. (en_US)
dc.identifier.other: bibid: 7745183
dc.identifier.uri: https://hdl.handle.net/1813/29325
dc.language.iso: en_US (en_US)
dc.title: Parallel Testing, And Variable Selection - A Mixture-Model Approach With Applications In Biostatistics (en_US)
dc.type: dissertation or thesis (en_US)
thesis.degree.discipline: Statistics
thesis.degree.grantor: Cornell University (en_US)
thesis.degree.level: Doctor of Philosophy
thesis.degree.name: Ph. D., Statistics

Files

Original bundle
Name: hyb2thesisPDF.pdf
Size: 1.24 MB
Format: Adobe Portable Document Format