Discovering and Exploiting Structure for Gaussian Processes
Gaussian processes have emerged as a powerful tool for modeling complex and noisy functions. They have found wide applicability in personalized medicine, time series analysis, prediction tasks in the physical sciences, and, recently, blackbox optimization. Their success is in large part due to two fundamental advantages they enjoy over many other models. First, they provide convenient and often well-calibrated uncertainty estimates: machine learning models make mistakes, and by offering users fully probabilistic predictions, Gaussian processes allow for more informed decision making in the face of uncertainty. Second, they allow users to encode and incorporate prior knowledge about their modeling task through flexible and composable covariance functions, or kernels. This aspect of Gaussian processes is particularly critical when faced with functions that are expensive or burdensome to evaluate, as it allows users to develop models that generalize even with very limited data, in some cases before the first training example is collected.

Despite these two clear advantages, many of the most popular applications of Gaussian processes exploit the first while making little use of the second. In Bayesian optimization, for example, off-the-shelf implementations often use the most generic and flexible covariance functions available. While this choice ensures the generality of Bayesian optimization a priori, it can significantly increase the number of function evaluations required to perform optimization. In this thesis, we demonstrate by way of application that the second advantage can be just as critical as the first. By leveraging expert medical knowledge, we develop a GP model that exploits basic facts about human hearing to dramatically improve both the quality and speed of modern audiometric testing.
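The composability of covariance functions mentioned above can be made concrete with a minimal numpy sketch (the function names here are illustrative, not from any particular GP library): sums and products of valid kernels are themselves valid kernels, so prior knowledge such as an additive decomposition over independent groups of inputs can be encoded directly in the covariance.

```python
import numpy as np

def rbf(X, Z, lengthscale=1.0):
    # Squared-exponential (RBF) kernel matrix between the rows of X and Z.
    sq_dists = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq_dists / lengthscale ** 2)

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 2))

# A generic RBF kernel over both input dimensions jointly:
K_full = rbf(X, X)

# An additive kernel encoding the prior belief that the function
# decomposes as f(x) = f1(x_1) + f2(x_2), i.e., the two inputs
# contribute independently:
K_additive = rbf(X[:, :1], X[:, :1]) + rbf(X[:, 1:], X[:, 1:])
```

Both `K_full` and `K_additive` are symmetric positive semi-definite, so either can serve as a GP prior covariance; the additive version simply bakes in much stronger structural assumptions.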
By automatically discovering independence structure in an objective function, we can leverage recent work on additive structure in Bayesian optimization to achieve exponentially lower sample complexity. Finally, by exploiting the product structure inherent to the RBF kernel, arguably the most commonly used covariance function, we develop approximations to the GP marginal log likelihood that yield an exponential improvement in the time complexity of Gaussian process inference over existing inducing point methods, resulting in the fastest asymptotic training time complexity we are aware of.
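The product structure referred to above is the algebraic identity that a multivariate RBF kernel factors exactly into a product of one-dimensional RBF kernels, one per input dimension. A minimal numpy sketch verifying this identity (the helper name `rbf_1d` is illustrative):

```python
import numpy as np

def rbf_1d(x, z, lengthscale=1.0):
    # One-dimensional RBF kernel matrix between vectors x and z.
    return np.exp(-0.5 * (x[:, None] - z[None, :]) ** 2 / lengthscale ** 2)

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))

# Full multivariate RBF kernel over all 3 dimensions at once.
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K_full = np.exp(-0.5 * sq_dists)

# The same kernel assembled as an elementwise product of per-dimension
# 1-D RBF kernels: exp(-0.5 * sum_d d_d^2) = prod_d exp(-0.5 * d_d^2).
K_prod = np.ones((4, 4))
for d in range(X.shape[1]):
    K_prod *= rbf_1d(X[:, d], X[:, d])
```

It is this per-dimension factorization that structured approximation schemes can exploit, since each one-dimensional factor is far cheaper to approximate than the full multivariate kernel matrix.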
Weinberger, Kilian Quirin
Bala, Kavita; Sridharan, Karthik
Ph. D., Computer Science
Doctor of Philosophy
dissertation or thesis