Cornell University
Library

eCommons

ESTIMATION AND INFERENCE OF HIGH-DIMENSIONAL INDIVIDUALIZED THRESHOLD WITH BINARY RESPONSES

File(s)
Feng_cornellgrad_0058F_12516.pdf (1.19 MB)
Permanent Link(s)
https://doi.org/10.7298/q520-3742
https://hdl.handle.net/1813/109738
Collections
Cornell Theses and Dissertations
Author
Feng, Huijie
Abstract

High-dimensional data are now ubiquitous in many areas. Over the last twenty to thirty years, approaches for analyzing high-dimensional data, including parameter estimation and uncertainty quantification (hypothesis tests, confidence intervals, etc.), have attracted tremendous attention and been extensively developed. This work proposes methodology, theory and computational algorithms for estimation and inference in a high-dimensional binary response setting. In this setting, an individualized linear threshold $\boldsymbol{\beta}^T\boldsymbol{Z}$, with $\boldsymbol{\beta}$ being a high-dimensional parameter of interest, minimizes the disagreement between $\operatorname{sign}(X-\boldsymbol{\beta}^T\boldsymbol{Z})$ and a binary response $Y$, where $X$ is a continuous variable and $\boldsymbol{Z}$ is a high-dimensional vector. While many popular models fit into this general framework, it has not been well studied in the high-dimensional setting. This work consists of two main parts.

The first part discusses a general framework for estimating the unknown parameter $\boldsymbol{\beta}$. While the problem can be formulated in the M-estimation framework, minimizing the corresponding empirical risk function is computationally intractable due to the discontinuity of the sign function. Moreover, estimating $\boldsymbol{\beta}$ even in the fixed-dimensional setting is known to be a nonregular problem that leads to nonstandard asymptotic theory. To tackle the computational and theoretical challenges in the estimation of the high-dimensional parameter $\boldsymbol{\beta}$, we propose an empirical risk minimization approach based on a regularized smoothed non-convex loss function. The Fisher consistency of the proposed method is guaranteed as the bandwidth of the smoothed loss shrinks to 0. Statistically, we show that the finite sample error bound for estimating $\boldsymbol{\beta}$ in the $\ell_2$ norm is $(s\log d/n)^{\ell/(2\ell+1)}$, where $d$ is the dimension of $\boldsymbol{\beta}$, $s$ is the sparsity level, $n$ is the sample size and $\ell$ is the smoothness of the conditional density of $X$ given the response $Y$ and the covariates $\boldsymbol{Z}$. The convergence rate is nonstandard and slower than that in classical Lasso problems. Furthermore, we prove that the resulting estimator is minimax rate optimal up to a logarithmic factor. Lepski's method is developed to adapt to both the unknown sparsity $s$ and the unknown smoothness $\ell$. Computationally, an efficient path-following algorithm is proposed to compute the solution path. We show that this algorithm achieves a geometric rate of convergence for computing the whole path.

The second part of this work proposes a general procedure for conducting statistical tests for the low-dimensional components of the high-dimensional parameter $\boldsymbol{\beta}$. Based on the smoothed non-convex loss function, a smoothed decorrelated score test statistic is proposed. Unlike many classical settings, where the test statistic is scaled by an $n^{1/2}$ factor, the proposed test statistic is scaled by $(n\delta)^{1/2}$, where $\delta\rightarrow 0$ is the bandwidth parameter. In addition, there is an extra approximation bias term that must be eliminated for valid inference. The proposed test statistic based on the debiased smoothed decorrelated score function is shown to be asymptotically normal, and the power of the test is studied under local alternatives. Moreover, as the bandwidth $\delta$ is a crucial quantity that affects the finite sample performance of the test statistic, we discuss how this parameter can be selected in a data-driven manner.
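
To make the smoothing idea in the abstract concrete, the following is a minimal sketch and not the dissertation's implementation. It assumes the response is coded as $Y \in \{-1, +1\}$, uses the standard normal CDF as the smooth replacement for the indicator, and uses an $\ell_1$ penalty as the regularizer; the abstract fixes none of these choices, and all function names are hypothetical.

```python
import math


def normal_cdf(u):
    """Standard normal CDF; an ASSUMED smooth stand-in for the step function 1{u > 0}."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def margin(beta, x, z, y):
    """y * (x - beta^T z); negative exactly when sign(x - beta^T z) disagrees with y."""
    return y * (x - sum(b * zj for b, zj in zip(beta, z)))


def zero_one_risk(beta, X, Z, Y):
    """Empirical disagreement rate between sign(X - beta^T Z) and Y (discontinuous in beta)."""
    return sum(1 for x, z, y in zip(X, Z, Y) if margin(beta, x, z, y) < 0) / len(Y)


def smoothed_risk(beta, X, Z, Y, delta):
    """Smoothed surrogate: the indicator 1{margin < 0} becomes 1 - Phi(margin / delta).

    As the bandwidth delta shrinks to 0, each term approaches the 0-1 loss
    (for nonzero margins), but the function stays differentiable in beta.
    """
    return sum(1.0 - normal_cdf(margin(beta, x, z, y) / delta)
               for x, z, y in zip(X, Z, Y)) / len(Y)


def penalized_smoothed_risk(beta, X, Z, Y, delta, lam):
    """Smoothed risk plus an ASSUMED l1 penalty lam * ||beta||_1 to encourage sparsity."""
    return smoothed_risk(beta, X, Z, Y, delta) + lam * sum(abs(b) for b in beta)


# Toy data (made up for illustration): four observations, two covariates.
X = [1.0, -0.5, 2.0, 0.3]
Z = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
Y = [1, -1, -1, 1]
beta = [0.5, 0.2]

print("0-1 risk:", zero_one_risk(beta, X, Z, Y))
for delta in (1.0, 0.1, 0.01):
    print("delta =", delta, "smoothed risk:", smoothed_risk(beta, X, Z, Y, delta))
```

On the toy data the smoothed risk moves toward the 0-1 risk as `delta` decreases, which mirrors the role of the shrinking bandwidth described above. The surrogate is still non-convex in `beta`, so a general-purpose optimizer applied to `penalized_smoothed_risk` would only be guaranteed to find a local solution.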

Description
195 pages
Date Issued
2021-05
Keywords
High-dimensional Statistics
Kernel Method
Nonstandard Asymptotics
Statistical Inference
Committee Chair
Ning, Yang
Committee Member
Chen, Yudong
Hooker, Giles J.
Degree Discipline
Statistics
Degree Name
Ph. D., Statistics
Degree Level
Doctor of Philosophy
Rights
Attribution 4.0 International
Rights URI
https://creativecommons.org/licenses/by/4.0/
Type
dissertation or thesis
Link(s) to Catalog Record
https://newcatalog.library.cornell.edu/catalog/15049552
