Scalable kernel methods and their use in black-box optimization
This dissertation uses structured linear algebra to scale kernel regression methods based on Gaussian processes (GPs) and radial basis function (RBF) interpolation to large, high-dimensional datasets. While kernel methods provide a general, principled framework for approximating functions from scattered data, they are often seen as impractical for large data sets as the standard approach to model fitting scales cubically with the number of data points. We introduce RBFs in Section 1.3 and GPs in Section 1.4. Chapter 2 develops novel O(n) approaches for GP regression with n points using fast approximate matrix vector multiplications (MVMs). Kernel learning with GPs require solving linear systems and computing the log determinant of an n x n kernel matrix. We use iterative methods relying on the fast MVMs to solve the linear systems and leverage stochastic approximations based on Chebyshev and Lanczos to approximate the log determinant. We find that Lanczos is generally highly efficient and accurate and superior to Chebyshev for kernel learning. We consider a large variety of experiments to demonstrate the generality of this approach. Chapter 3 extends the ideas from Chapter 3 to fitting a GP to both function values and derivatives. This requires linear solves and log determinants with an n(d+1) x n(d+1) kernel matrix in d dimensions, leading to O(n^3 d^3) computations for standard methods. We extend the previous methods and introduce a pivoted Cholesky preconditioner that cuts the iterations to convergence by several orders of magnitude. Our approaches, together with dimensionality reduction, lets us scale Bayesian optimization with derivatives to high-dimensional problems and large evaluation budgets. We introduce surrogate optimization in Section 1.5. Surrogate optimization is a key application of GPs and RBFs, where they are used to model a computationally-expensive black-box function based on previous evaluations. Chapter 4 introduces a global optimization algorithm for computationally expensive black-box function based on RBFs. Given an upper bound on the semi-norm of the objective function in a reproducing kernel Hilbert space associated with the RBF, we prove that our algorithm is globally convergent even though it may not sample densely. We discuss expected convergence rates and illustrate the performance of the method via experiments on a set of test problems. Chapter 5 describes Plumbing for Optimization with Asynchronous Parallelism (POAP) and the Python Surrogate Optimization Toolbox (pySOT). POAP is an event-driven framework for building and combining asynchronous optimization strategies, designed for global optimization of computationally expensive black-box functions where concurrent function evaluations are appealing. pySOT is a collection of synchronous and asynchronous surrogate optimization strategies, implemented in the POAP framework. The pySOT framework includes a variety of surrogate models, experimental designs, optimization strategies, test problems, and serves as a useful platform to compare methods. We use pySOT, to make an extensive comparison between synchronous and asynchronous parallel surrogate optimization methods, and find that asynchrony is never worse than synchrony on several challenging multimodal test problems.
Gaussian Process; Applied mathematics; Global Optimization; Radial Basis Function; Scalable Machine Learning; Surrogate Optimization; Mathematics; Computer science; Bayesian optimization
Bindel, David S.
Van Loan, Charles Francis; Shoemaker, Christine Ann; Townsend, Alex John
Ph. D., Applied Mathematics
Doctor of Philosophy
dissertation or thesis