Methods For High Dimensional Matrix Computation And Diagnostics Of Distributed System
Big data creates opportunities but also poses new challenges for modern scientific computing. In this thesis, we conduct sparse principal component analysis (SPCA) on high-dimensional matrices. We propose a modified curvilinear algorithm to solve eigenvalue optimization problems with orthogonality constraints, and combine it with an augmented Lagrangian method to improve its computational efficiency. We compare our algorithm against standard PCA on two tasks: the recovery of low-rank tensors and a mean-reverting statistical arbitrage strategy. The explosion of big data has also influenced the development of distributed computing systems. For debugging purposes, we are interested in predicting server run-time from system data available early in the process. We study discriminative models in functional data analysis, and introduce generative models that capture server regime-change behaviors. We also design computational methods, including a blocked Gibbs sampler, to improve the accuracy and efficiency of model estimation.