Cornell University
Library

eCommons


High Dimensional Data Analysis With Dependency and Under Limited Memory

File(s)
Sun_cornellgrad_0058F_11757.pdf (5.53 MB)
Permanent Link(s)
https://doi.org/10.7298/8mfr-nr52
https://hdl.handle.net/1813/70014
Collections
Cornell Theses and Dissertations
Author
Sun, Yiming
Abstract

This thesis proposes several methods for high-dimensional data analysis under data dependency and limited memory. The first part proposes a model-free method for building networks from time series data in which the observations are dependent, arising from a weakly stationary time series. We develop thresholding-based methods to estimate the multivariate spectral density of a high-dimensional time series under a weak-sparsity assumption. Our theoretical analysis ensures that consistent estimation of the spectral density matrix of a $p$-dimensional time series from $n$ samples is possible in the high-dimensional regime $\log p/n \rightarrow 0$, as long as the true spectral density is approximately sparse. A key technical component of our analysis is a new concentration inequality for the average periodogram around its expectation, which is of independent interest. Our estimation consistency results complement existing results for shrinkage-based estimators of the multivariate spectral density, which require no sparsity assumption but ensure consistent estimation only in the regime $p^2/n \rightarrow 0$. In addition, our proposed thresholding-based estimators perform consistent and automatic edge selection when learning coherence networks among the components of a multivariate time series. We demonstrate the advantages of our estimators through simulation studies and a real-data application to functional connectivity analysis with fMRI data. We further show that, with a simple modification of the classical estimator, we can build a rigorous theory for adaptive thresholding in estimating the multivariate spectral density of a Gaussian process. The adaptive estimator captures the heterogeneity across positions in the spectral density matrix and achieves a better convergence rate than the hard-thresholding estimator. The second part delves into compressing and analyzing high-dimensional data with limited memory.
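The hard-thresholding idea described above can be illustrated with a minimal sketch (not the thesis code): average the periodogram over neighboring Fourier frequencies to estimate the spectral density matrix at one frequency, then zero out small off-diagonal entries. The smoothing span `m` and threshold `tau` are illustrative tuning parameters, not values from the thesis.

```python
import numpy as np

def thresholded_spectral_density(X, j, m=5, tau=0.1):
    """Estimate the p x p spectral density matrix at Fourier frequency
    index j by averaging 2m+1 neighboring periodograms, then hard-
    thresholding small off-diagonal entries to zero.

    X : (n, p) array, one row per time point of a weakly stationary series.
    """
    n, p = X.shape
    # DFT of each component series, normalized as in the periodogram
    D = np.fft.fft(X, axis=0) / np.sqrt(2 * np.pi * n)
    # Average the rank-one periodograms d(w_k) d(w_k)^* over frequencies near j
    idx = np.arange(j - m, j + m + 1) % n
    F = np.zeros((p, p), dtype=complex)
    for k in idx:
        d = D[k]
        F += np.outer(d, d.conj())
    F /= len(idx)
    # Hard-threshold: keep the diagonal, zero small off-diagonal entries
    mask = (np.abs(F) >= tau) | np.eye(p, dtype=bool)
    return F * mask
```

Entries surviving the threshold indicate pairs of components with non-negligible coherence at that frequency, which is how the estimator performs automatic edge selection for coherence networks.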
We focus on developing a streaming algorithm for Tucker decomposition, a generalization of the singular value decomposition to tensors. The method applies a randomized linear map to the tensor to obtain a sketch that captures the important directions within each mode, as well as the interactions among the modes. The sketch can be extracted from streaming or distributed data, or with a single pass over the tensor, using storage proportional to the degrees of freedom in the output Tucker approximation. Although the algorithm can exploit another view of the data to compute a superior approximation, it does not require a second pass over the tensor. We provide a rigorous theoretical guarantee on the approximation error. Extensive numerical experiments show that the algorithm produces useful results and improves on the state of the art for streaming Tucker decomposition. Along with the development of the one-pass Tucker decomposition, we propose a memory-efficient random map which we call the tensor random projection, and study its theoretical properties in applications such as random projection, sketching, and fast computation for tensor regression.
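The memory saving behind a tensor random projection can be sketched as follows (a hypothetical illustration, not the thesis implementation): instead of storing a dense $k \times (d_1 d_2)$ Gaussian map, store one small factor per mode and take row $i$ of the implicit map to be the Kronecker product of row $i$ of each factor, so storage is $k(d_1 + d_2)$ rather than $k\,d_1 d_2$. The function names and scaling below are assumptions for the sketch.

```python
import numpy as np

def trp_factors(k, dims, rng):
    """Draw one small Gaussian factor per mode; the full k x prod(dims)
    map is never formed. Scaling keeps entries of the implicit map O(1)."""
    return [rng.standard_normal((k, d)) / np.sqrt(d) for d in dims]

def trp_apply(factors, x):
    """Apply the implicit map to a vector x of length prod(dims):
    reshape x into a tensor and contract it with row i of every factor,
    which equals multiplying by the Kronecker product of those rows."""
    dims = [A.shape[1] for A in factors]
    T = x.reshape(dims)
    k = factors[0].shape[0]
    out = np.empty(k)
    for i in range(k):
        v = T
        for A in factors:
            # contract the leading mode of v with row i of this factor
            v = np.tensordot(A[i], v, axes=([0], [0]))
        out[i] = v
    return out / np.sqrt(k)
```

Because only the small factors are stored, the same device lets a sketching algorithm compress each mode of a streamed tensor without ever materializing a map as large as the data.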

Description
213 pages
Date Issued
2019-12
Committee Chair
Basu, Sumanta
Udell, Madeleine Richards
Committee Member
Chen, Yudong
Nussbaum, Michael
Degree Discipline
Statistics
Degree Name
Ph. D., Statistics
Degree Level
Doctor of Philosophy
Type
dissertation or thesis
Link(s) to Catalog Record
https://newcatalog.library.cornell.edu/catalog/13119701
