ESTIMATION AND INFERENCE OF HIGH-DIMENSIONAL INDIVIDUALIZED THRESHOLD WITH BINARY RESPONSES

Author
Feng, Huijie
Abstract
High-dimensional data is ubiquitous nowadays in many areas. Over the last twenty to thirty years, methods for analyzing high-dimensional data, including parameter estimation and uncertainty quantification (hypothesis tests, confidence intervals, etc.), have attracted tremendous attention and been developed extensively. This work proposes the methodology, theory and computational algorithm for estimation and inference under a high-dimensional binary response setting. In this setting, an individualized linear threshold $\boldsymbol{\beta}^T\boldsymbol{Z}$, with $\boldsymbol{\beta}$ being a high-dimensional parameter of interest, minimizes the disagreement between $\operatorname{sign}(X-\boldsymbol{\beta}^T\boldsymbol{Z})$ and a binary response $Y$, where $X$ is a continuous variable and $\boldsymbol{Z}$ is a high-dimensional vector. While many popular models fit into this general framework, it is not well studied in the high-dimensional setting. This work consists of two main parts.
The first part discusses a general framework for estimating the unknown parameter $\boldsymbol{\beta}$. While the problem can be formulated in the M-estimation framework, minimizing the corresponding empirical risk function is computationally intractable due to the discontinuity of the sign function. Moreover, estimating $\boldsymbol{\beta}$ even in the fixed-dimensional setting is known to be a nonregular problem that leads to nonstandard asymptotic theory. To tackle the computational and theoretical challenges in the estimation of the high-dimensional parameter $\boldsymbol{\beta}$, we propose an empirical risk minimization approach based on a regularized smoothed non-convex loss function. The Fisher consistency of the proposed method is guaranteed as the bandwidth of the smoothed loss shrinks to 0. Statistically, we show that the finite sample error bound for estimating $\boldsymbol{\beta}$ in $\ell_2$ norm is $(s\log d/n)^{\ell/(2\ell+1)}$, where $d$ is the dimension of $\boldsymbol{\beta}$, $s$ is the sparsity level, $n$ is the sample size and $\ell$ is the smoothness of the conditional density of $X$ given the response $Y$ and the covariates $\boldsymbol{Z}$. The convergence rate is nonstandard and slower than that in the classical Lasso problems. Furthermore, we prove that the resulting estimator is minimax rate optimal up to a logarithmic factor. Lepski's method is developed to achieve adaptation to both the unknown sparsity $s$ and the unknown smoothness $\ell$. Computationally, an efficient path-following algorithm is proposed to compute the solution path. We show that this algorithm achieves a geometric rate of convergence for computing the whole path.
The second part of this work proposes a general procedure for conducting statistical tests for the low-dimensional components of the high-dimensional parameter $\boldsymbol{\beta}$. Based on the smoothed non-convex loss function, a smoothed decorrelated score test statistic is proposed. Unlike many classical settings where the test statistic is scaled by a factor of $n^{1/2}$, the proposed test statistic is scaled by $(n\delta)^{1/2}$, where $\delta\rightarrow 0$ is the bandwidth parameter. In addition, there is an extra approximation bias term that must be eliminated for valid inference. The proposed test statistic, based on the debiased smoothed decorrelated score function, is shown to be asymptotically normal, and the power of the test is studied under local alternatives. Moreover, as the bandwidth $\delta$ is a crucial quantity that affects the finite sample performance of the test statistic, we discuss how this parameter can be selected in a data-driven manner.
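As a rough illustration of the smoothing idea described in the abstract, the sketch below contrasts the discontinuous sign-disagreement risk with a smoothed, $\ell_1$-penalized surrogate. It is not the dissertation's implementation: the Gaussian-CDF smoother, the coding of $Y$ in $\{-1,+1\}$, the convention that a zero margin counts as agreement, and the names `sign_disagreement`, `smoothed_risk`, `delta` and `lam` are all assumptions made here for illustration.

```python
import math


def _margin(beta, x, z, y):
    # y * (x - beta^T z); negative when sign(x - beta^T z) disagrees with y.
    return y * (x - sum(b * zj for b, zj in zip(beta, z)))


def sign_disagreement(beta, X, Z, Y):
    """Empirical 0-1 risk: fraction of samples with a negative margin.

    Assumes Y is coded in {-1, +1}; a zero margin is counted as agreement.
    """
    errors = sum(1 for x, z, y in zip(X, Z, Y) if _margin(beta, x, z, y) < 0)
    return errors / len(Y)


def smoothed_risk(beta, X, Z, Y, delta, lam=0.0):
    """Smoothed surrogate risk plus an l1 penalty (illustrative choice).

    The indicator 1{margin < 0} is replaced by 1 - Phi(margin / delta),
    where Phi is the standard normal CDF and delta > 0 is the bandwidth.
    The Gaussian smoother is an assumption of this sketch.
    """
    total = 0.0
    for x, z, y in zip(X, Z, Y):
        m = _margin(beta, x, z, y)
        # 1 - Phi(t) = 0.5 * erfc(t / sqrt(2))
        total += 0.5 * math.erfc(m / (delta * math.sqrt(2.0)))
    penalty = lam * sum(abs(b) for b in beta)
    return total / len(Y) + penalty


# Tiny hypothetical data set, used only to exercise the two functions.
X = [1.0, -2.0, 0.5]
Z = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Y = [1, 1, -1]
beta = [0.5, -0.5]

exact = sign_disagreement(beta, X, Z, Y)
nearly_exact = smoothed_risk(beta, X, Z, Y, delta=1e-3)
smooth = smoothed_risk(beta, X, Z, Y, delta=1.0)
```

With a very small bandwidth the smoothed risk approaches the 0-1 risk when no margin is exactly zero, while a larger bandwidth gives a differentiable (though still non-convex) objective that gradient-based solvers can handle.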
Description
195 pages
Date Issued
2021-05
Subject
High-dimensional Statistics; Kernel Method; Nonstandard Asymptotics; Statistical Inference
Committee Chair
Ning, Yang
Committee Member
Chen, Yudong; Hooker, Giles J.
Degree Discipline
Statistics
Degree Name
Ph. D., Statistics
Degree Level
Doctor of Philosophy
Rights
Attribution 4.0 International
Rights URI
Type
dissertation or thesis