High Dimensional Data Analysis With Dependency and Under Limited Memory
Several methods for high dimensional analysis are proposed in this thesis under the condition that there are data dependency and limited memory. The first part of the work proposes a model free method for building networks for time series data when data are dependent from weakly stationary time series. We develop a thresholding based on methods to estimate multivariate spectral density under weakly sparsity assumption for high dimensional time series. Our theoretical analysis ensures that consistent estimations of spectral density matrix of a p-dimensional time series using n samplesare possible under high-dimensional regime $\log p/n \rightarrow 0$ as long as the true spectral density is approximately sparse. A key technical component of our analysis is a new concentration inequality of average periodogram around its expectation, which is of independent interest. Our estimation consistency results complement existing results for shrinkage based estimators of multivariate spectral density, which require no assumption on sparsity but only ensure consistent estimation in a regime p^2/n --> 0. In addition, our proposed thresholding based estimators perform consistent and automatic edge selection when coherence networks among the components of a multivariate time series are learned. We demonstrate the advantages of our estimators using simulation studies and a real data application on functional connectivity analysis with fMRI data. We further show that with a simple modification in the classic estimator, we can build a rigorous theory for adaptive thresholding in estimating multivariate spectral density for Gaussian process. This adaptive estimator can capture the heterogeneity across different positions in spectral density matrix at a better convergence rate in comparison to the hard thresholding estimator. The second part delves into compressing/analyzing high dimensional data with limited memory. We fixate on developing a streaming algorithm for Tucker Decomposition, generalization of singular value decomposition. The method applies a randomized linear map to the tensor to obtain a sketch that captures the important directions within each mode as well as the interactions among the modes. The sketch can be extracted from streaming or distributed data or with a single pass over the tensor which uses storage proportional to the degrees of freedom in the output Tucker approximation. Although the algorithm can exploit another view to compute a superior approximation, it does not require a second pass over the tensor. In conclusion, the paper provides a rigorous theoretical guarantee on elimination of the approximation error. Extensive numerical experiments show that the algorithm produces useful results that improve the state of the art for streaming Tucker decomposition. Along the development of one-pass Tucker decomposition, we propose a memory efficient random mapping which we call Tensor random projection. We further study its theoretical property in application to several areas like random projection, sketching algorithms for fast computation for tensor regression.
Basu, SumantaUdell, Madeleine Richards
Chen, Yudong; Nussbaum, Michael
Ph. D., Statistics
Doctor of Philosophy
dissertation or thesis