eCommons
Scalable kernel methods and their use in black-box optimization

dc.contributor.author: Eriksson, David
dc.contributor.chair: Bindel, David S.
dc.contributor.committeeMember: Van Loan, Charles Francis
dc.contributor.committeeMember: Shoemaker, Christine Ann
dc.contributor.committeeMember: Townsend, Alex John
dc.date.accessioned: 2019-04-02T13:59:56Z
dc.date.available: 2019-04-02T13:59:56Z
dc.date.issued: 2018-12-30
dc.description.abstract: This dissertation uses structured linear algebra to scale kernel regression methods based on Gaussian processes (GPs) and radial basis function (RBF) interpolation to large, high-dimensional datasets. While kernel methods provide a general, principled framework for approximating functions from scattered data, they are often seen as impractical for large datasets because the standard approach to model fitting scales cubically with the number of data points. We introduce RBFs in Section 1.3 and GPs in Section 1.4. Chapter 2 develops novel O(n) approaches for GP regression with n points using fast approximate matrix-vector multiplications (MVMs). Kernel learning with GPs requires solving linear systems and computing the log determinant of an n x n kernel matrix. We use iterative methods relying on the fast MVMs to solve the linear systems, and we leverage stochastic approximations based on Chebyshev and Lanczos to approximate the log determinant. We find that Lanczos is generally highly efficient and accurate, and superior to Chebyshev for kernel learning. We consider a large variety of experiments to demonstrate the generality of this approach. Chapter 3 extends the ideas from Chapter 2 to fitting a GP to both function values and derivatives. This requires linear solves and log determinants with an n(d+1) x n(d+1) kernel matrix in d dimensions, leading to O(n^3 d^3) computations for standard methods. We extend the previous methods and introduce a pivoted Cholesky preconditioner that cuts the number of iterations to convergence by several orders of magnitude. Our approaches, together with dimensionality reduction, let us scale Bayesian optimization with derivatives to high-dimensional problems and large evaluation budgets. We introduce surrogate optimization in Section 1.5. Surrogate optimization is a key application of GPs and RBFs, where they are used to model a computationally expensive black-box function based on previous evaluations. Chapter 4 introduces a global optimization algorithm for computationally expensive black-box functions based on RBFs. Given an upper bound on the semi-norm of the objective function in a reproducing kernel Hilbert space associated with the RBF, we prove that our algorithm is globally convergent even though it may not sample densely. We discuss expected convergence rates and illustrate the performance of the method via experiments on a set of test problems. Chapter 5 describes Plumbing for Optimization with Asynchronous Parallelism (POAP) and the Python Surrogate Optimization Toolbox (pySOT). POAP is an event-driven framework for building and combining asynchronous optimization strategies, designed for global optimization of computationally expensive black-box functions where concurrent function evaluations are appealing. pySOT is a collection of synchronous and asynchronous surrogate optimization strategies implemented in the POAP framework. The pySOT framework includes a variety of surrogate models, experimental designs, optimization strategies, and test problems, and serves as a useful platform for comparing methods. We use pySOT to make an extensive comparison between synchronous and asynchronous parallel surrogate optimization methods, and find that asynchrony is never worse than synchrony on several challenging multimodal test problems.
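
The abstract's Chapter 2 summary refers to stochastic Lanczos-based approximations of the kernel-matrix log determinant computed from fast MVMs. The following is a minimal, self-contained Python sketch of stochastic Lanczos quadrature with Rademacher probe vectors, included only to illustrate the general technique; it is not the dissertation's implementation (which pairs the estimator with fast approximate MVMs), and the function names and parameters (lanczos_tridiag, slq_logdet, num_probes, num_steps) are illustrative choices.

import numpy as np

def lanczos_tridiag(matvec, v, num_steps):
    # Lanczos with full re-orthogonalization starting from v; returns the
    # diagonal (alpha) and off-diagonal (beta) of the tridiagonal matrix T.
    n = v.shape[0]
    Q = np.zeros((n, num_steps))
    alpha = np.zeros(num_steps)
    beta = np.zeros(num_steps - 1)
    Q[:, 0] = v / np.linalg.norm(v)
    for j in range(num_steps):
        w = matvec(Q[:, j])
        if j > 0:
            w = w - beta[j - 1] * Q[:, j - 1]
        alpha[j] = Q[:, j] @ w
        w = w - alpha[j] * Q[:, j]
        w = w - Q[:, : j + 1] @ (Q[:, : j + 1].T @ w)  # re-orthogonalize
        if j < num_steps - 1:
            beta[j] = np.linalg.norm(w)
            if beta[j] < 1e-10:  # Krylov subspace exhausted; stop early
                return alpha[: j + 1], beta[:j]
            Q[:, j + 1] = w / beta[j]
    return alpha, beta

def slq_logdet(matvec, n, num_probes=20, num_steps=30, seed=0):
    # Estimate log det(K) = tr(log K) for an SPD matrix K accessed only
    # through matrix-vector products, via Hutchinson probes + Lanczos quadrature.
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(num_probes):
        z = rng.choice([-1.0, 1.0], size=n)  # Rademacher probe, ||z||^2 = n
        alpha, beta = lanczos_tridiag(matvec, z, num_steps)
        T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
        evals, evecs = np.linalg.eigh(T)
        # Gauss quadrature: nodes are the Ritz values, weights are the squared
        # first components of the corresponding eigenvectors of T.
        total += n * np.sum(evecs[0, :] ** 2 * np.log(evals))
    return total / num_probes

if __name__ == "__main__":
    # Sanity check on a small RBF kernel matrix where the exact answer is cheap.
    rng = np.random.default_rng(1)
    X = rng.standard_normal((500, 3))
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-0.5 * sq_dists) + 1e-2 * np.eye(500)  # RBF kernel + jitter
    print("SLQ estimate:", slq_logdet(K.dot, 500))
    print("Exact       :", np.linalg.slogdet(K)[1])

In practice the dense kernel matrix above would be replaced by a fast approximate MVM so that each probe costs roughly O(n) rather than O(n^2).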
dc.identifier.doi: https://doi.org/10.7298/nmmb-0r60
dc.identifier.other: Eriksson_cornellgrad_0058F_11191
dc.identifier.other: http://dissertations.umi.com/cornellgrad:11191
dc.identifier.other: bibid: 10757986
dc.identifier.uri: https://hdl.handle.net/1813/64846
dc.language.iso: en_US
dc.subject: Gaussian Process
dc.subject: Applied mathematics
dc.subject: Global Optimization
dc.subject: Radial Basis Function
dc.subject: Scalable Machine Learning
dc.subject: Surrogate Optimization
dc.subject: Mathematics
dc.subject: Computer science
dc.subject: Bayesian optimization
dc.title: Scalable kernel methods and their use in black-box optimization
dc.type: dissertation or thesis
dcterms.license: https://hdl.handle.net/1813/59810
thesis.degree.discipline: Applied Mathematics
thesis.degree.grantor: Cornell University
thesis.degree.level: Doctor of Philosophy
thesis.degree.name: Ph. D., Applied Mathematics

Files

Original bundle
Name: Eriksson_cornellgrad_0058F_11191.pdf
Size: 8.4 MB
Format: Adobe Portable Document Format