LATENT GAUSSIAN COPULA MODEL FOR HIGH DIMENSIONAL MIXED DATA, AND ITS APPLICATIONS
Due to the advent of “big data” technologies, mixed data consisting of both categorical and continuous variables are encountered in many application areas. We present a framework to estimate the correlation among variables of mixed types via a rank-based approach under a latent Gaussian copula model, and we establish theoretical properties of the resulting correlation matrix estimator. With the correlation matrix estimate Σ, we extend the framework to other problems, such as graphical models, regression, and classification. In particular, we propose a family of methods for prediction with high dimensional mixed data that involves a shrunken estimate of the inverse of Σ. By maximizing the log-likelihood of the data subject to a penalty on the elements of the inverse of Σ, we demonstrate that higher prediction accuracy can be achieved than with other popular existing methods. We also show that several existing methods are special cases of this family. In addition, we consider the classification problem via a covariance-based approach analogous to linear discriminant analysis.
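For readers unfamiliar with rank-based estimation under a Gaussian copula, the following is a minimal sketch of the idea in its simplest setting. It assumes every variable is continuous. In that case the standard bridge between Kendall's tau and the latent Gaussian correlation is Σ_jk = sin(π τ_jk / 2). The bridge functions needed for categorical or mixed categorical-continuous pairs, which are the focus of this work, are not reproduced here. The function names `kendall_tau_a` and `latent_correlation` are hypothetical and used only for illustration.

```python
import math
from itertools import combinations
from typing import List, Sequence


def kendall_tau_a(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-a: (concordant - discordant pairs) / (n choose 2)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length >= 2")
    s = 0
    for i, j in combinations(range(n), 2):
        # +1 for a concordant pair, -1 for a discordant pair, 0 for a tie
        prod = (x[i] - x[j]) * (y[i] - y[j])
        s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)


def latent_correlation(columns: List[Sequence[float]]) -> List[List[float]]:
    """Rank-based latent correlation matrix, assuming all columns are continuous.

    Uses the continuous-continuous bridge sin(pi/2 * tau). The result is
    invariant to strictly increasing transformations of each column, but it
    is not guaranteed to be positive semidefinite in finite samples.
    """
    p = len(columns)
    sigma = [[1.0] * p for _ in range(p)]
    for a, b in combinations(range(p), 2):
        tau = kendall_tau_a(columns[a], columns[b])
        sigma[a][b] = sigma[b][a] = math.sin(math.pi / 2 * tau)
    return sigma


if __name__ == "__main__":
    x = [0.1, 0.5, 1.2, 2.0, 3.3]
    y = [math.exp(v) for v in x]   # increasing transform of x
    z = [-(v ** 3) for v in x]     # decreasing transform of x
    for row in latent_correlation([x, y, z]):
        print([round(r, 3) for r in row])
```

Because only ranks enter the estimate, the monotone transforms in the usage example recover latent correlations of +1 and -1 regardless of the marginal distributions. This invariance is what makes the copula model attractive for data whose margins are far from Gaussian.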
Wells, Martin; Ning, Yang
Ph. D., Statistics
Doctor of Philosophy
dissertation or thesis