
## Selected Topics In Nonparametric Testing And Variable Selection For High Dimensional Data

#####
**Author**

Ji, Pengsheng

#####
**Abstract**

Part I: The Gaussian white noise model has been used as a general framework for nonparametric problems. The asymptotic equivalence of this model to density estimation and nonparametric regression was established by Nussbaum (1996) and Brown and Low (1996). In Chapter 1, we consider testing for the presence of a signal in Gaussian white noise with intensity $n^{-1/2}$, when the alternatives are given by smoothness ellipsoids with an $L_2$-ball of radius $\rho$ removed. It is known that, for a fixed Sobolev-type ellipsoid $\Sigma(\beta, M)$ of smoothness $\beta$ and size $M$, the radius rate $\rho \asymp n^{-4\beta/(4\beta+1)}$ is the critical separation rate, in the sense that the minimax error of the second kind over $\alpha$-tests stays asymptotically strictly between 0 and 1 (Ingster, 1982). In addition, Ermakov (1990) found the sharp asymptotics of the minimax error of the second kind at the separation rate. For adaptation over both $\beta$ and $M$ in that context, it is known that a log log-penalty over the separation rate for $\rho$ is necessary for a nonzero asymptotic power. Here, following an example in nonparametric estimation related to the Pinsker constant, we investigate the adaptation problem over the ellipsoid size $M$ only, for a fixed smoothness degree $\beta$. It is established that the Ermakov-type sharp asymptotics can be preserved in that adaptive setting if $\rho \to 0$ slower than the separation rate. The penalty for adaptation in that setting turns out to be a sequence tending to infinity arbitrarily slowly. In Chapter 2, motivated by the sharp asymptotics of nonparametric estimation for non-Gaussian regression (Golubev and Nussbaum, 1990), we extend Ermakov's sharp asymptotics for the minimax testing errors to the nonparametric regression model with nonnormal errors. The paper entitled "Sharp Asymptotics for Risk Bounds in Nonparametric Testing with Uncertainty in Error Distributions" is in preparation. This part is joint work with Michael Nussbaum.
Part II: Consider a linear model $Y = X\beta + z$, $z \sim N(0, I_n)$. Here, $X = X_{n,p}$, where both $p$ and $n$ are large but $p > n$. We model the rows of $X$ as iid samples from $N(0, \frac{1}{n}\Omega)$, where $\Omega$ is a $p \times p$ correlation matrix that is unknown to us but presumably sparse. The vector $\beta$ is also unknown but has relatively few nonzero coordinates, and we are interested in identifying these nonzeros. We propose Univariate Penalization Screening (UPS) for variable selection. This is a Screen and Clean method in which we screen with univariate thresholding and clean with penalized MLE. It has two important properties: Sure Screening and Separable After Screening. These properties enable us to reduce the original regression problem to many small-size regression problems that can be fitted separately. The UPS is effective both in theory and in computation. We measure the performance of a procedure by the Hamming distance, and use an asymptotic framework where $p \to \infty$ and other quantities (e.g., $n$, the sparsity level, and the strength of signals) are linked to $p$ by fixed parameters. We find that in many cases, the UPS achieves the optimal rate of convergence. Also, for many different $\Omega$, there is a common three-phase diagram in the two-dimensional phase space quantifying the signal sparsity and signal strength. In the first phase, it is possible to recover all signals. In the second phase, it is possible to recover most of the signals, but not all of them. In the third phase, successful variable selection is impossible. UPS partitions the phase space in the same way that the optimal procedures do, and recovers most of the signals as long as successful variable selection is possible. The lasso and subset selection are well-known approaches to variable selection. However, somewhat surprisingly, there are regions in the phase space where neither of them is rate optimal, even in very simple settings, such as when $\Omega$ is tridiagonal and the tuning parameter is ideally set. This part is joint work with Jiashun Jin, and has appeared in the Annals of Statistics.
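
To make the Screen and Clean idea concrete, the following is a minimal Python sketch under stated assumptions, not the UPS procedure from the dissertation. The screening step thresholds the marginal statistics $x_j^\top Y$, as the abstract describes. The cleaning step here swaps in a plain least-squares refit followed by hard thresholding in place of the penalized MLE, and it does not split the survivors into separately fitted small problems. The function names `screen_and_clean` and `hamming_distance`, the two threshold parameters, and the simulation settings are all hypothetical choices made for illustration.

```python
import numpy as np


def screen_and_clean(X, y, screen_threshold, clean_threshold):
    """Simplified two-stage variable selection (illustrative only).

    Screen: keep coordinates j whose marginal statistic |x_j' y| exceeds
    screen_threshold. Clean: refit least squares on the survivors and set
    coefficients below clean_threshold (in absolute value) to zero.
    The least-squares refit is a stand-in for the penalized MLE step.
    """
    p = X.shape[1]
    beta_hat = np.zeros(p)

    # Screening by univariate thresholding of the marginal statistics.
    marginal = X.T @ y
    survivors = np.flatnonzero(np.abs(marginal) > screen_threshold)
    if survivors.size == 0:
        return beta_hat

    # Cleaning: refit on the surviving columns only, then hard-threshold.
    coef, *_ = np.linalg.lstsq(X[:, survivors], y, rcond=None)
    coef[np.abs(coef) < clean_threshold] = 0.0
    beta_hat[survivors] = coef
    return beta_hat


def hamming_distance(beta_hat, beta):
    """Number of coordinates where the estimated and true supports disagree."""
    return int(np.sum((beta_hat != 0) != (beta != 0)))


if __name__ == "__main__":
    # Hypothetical simulation: rows of X are iid N(0, I_p / n), so Omega = I_p.
    rng = np.random.default_rng(0)
    n, p = 200, 400
    X = rng.standard_normal((n, p)) / np.sqrt(n)
    beta = np.zeros(p)
    beta[:5] = 6.0
    y = X @ beta + rng.standard_normal(n)

    beta_hat = screen_and_clean(X, y, screen_threshold=3.0, clean_threshold=3.0)
    print("Hamming distance:", hamming_distance(beta_hat, beta))
```

The Hamming distance counts both missed signals and false positives, which matches the loss the abstract uses to compare procedures. The thresholds in the demo are arbitrary and are not the tuning prescribed by the theory.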

#####
**Date Issued**

2012-08-20

#####
**Subject**

minimax hypothesis testing; graph; phase diagram; screen and clean; Hamming distance; variable selection

#####
**Committee Chair**

Nussbaum, Michael

#####
**Committee Member**

Booth, James; Wells, Martin Timothy

#####
**Degree Discipline**

Statistics

#####
**Degree Name**

Ph.D. of Statistics

#####
**Degree Level**

Doctor of Philosophy

#####
**Type**

dissertation or thesis