Joint-stochastic Spectral Inference for Robust Co-occurrence Modeling and Latent Topic Analysis
Co-occurrence information is powerful statistics that can model various discrete objects by their joint instances with other objects. Transforming unsupervised problems of learning low-dimensional geometry into provable decompositions of co-occurrence information, spectral inference provides fast algorithms and optimality guarantees for non-linear dimensionality reduction or latent topic analysis. Spectral approaches reduce the dependence on the original training examples and produce substantial gain in efficiency, but at costs: a) The algorithms perform poorly on real data that does not necessarily follow underlying models; b) Users can no longer infer information about individual examples, which is often important for real-world applications; c) Model complexity rapidly grows as the number of objects increases, requiring a careful curation of the vocabulary. The first issue is called model-data mismatch, which is a fundamental problem common in every spectral inference method for latent variable models. As real data never follows any particular computational model, this issue must be ad- dressed for practicality of the spectral inference beyond synthetic settings. For the second issue, users could revisit probabilistic inference to infer information about individual examples, but this brings back all the drawbacks of traditional approaches. One method is recently developed for spectral inference, but it works only on tiny models, quickly losing its performance for the datasets whose underlying structures exhibit realistic correlations. While probabilistic inference also suffers from the third issue, the problem is more serious for spectral inferences because co-occurrence information easily exceeds storable capacity as the size of vocabulary becomes larger. We cast the learning problem in the framework of Joint Stochastic Matrix Factorization (JSMF), showing that existing methods violate the theoretical conditions necessary for a good solution to exist. Proposing novel rectification paradigms for handling the model-data mismatch, the Rectified Anchor Word Algorithm (RAWA) is able to learn quality latent structures and their interactions even on small noisy data. We also propose the Prior Aware Dual Decomposition (PADD) that is capable of considering the learned interactions as well as the learned latent structures to robustly infer example- specific information. Beyond the theoretical guarantees, our experimental results show that RAWA recovers quality low-dimensional geometry on various textual/non-textual datasets comparable to probabilistic Gibbs sampling, and PADD substantially outperforms the recently developed method for learning low-dimensional representations of individual examples. Although this thesis does not address the complexity issue for large vocabulary, we have developed new methods that can drastically compress co-occurrence information and learn only with the compressed statistics without losing much precision. Providing rich capability to operate on millions of objects and billions of examples, we complete all the necessary tools to make spectral inference robust and scalable competitor to probabilistic inference for unsupervised latent structure learning. We hope our research serves an initial basis for a new perspective that combines the benefits of both spectral and probabilistic worlds.
Statistics; Applied mathematics; Anchor Word Algorithm; Co-occurrence Modeling; Joint-Stochastic Matrix Factorization; Rectification; Spectral Inference; Topic Modeling; Artificial intelligence
Frazier, Peter; Bindel, David S.
Ph. D., Computer Science
Doctor of Philosophy
dissertation or thesis