eCommons
Scalable kernel methods and their use in black-box optimization

dc.contributor.author: Eriksson, David
dc.contributor.chair: Bindel, David S.
dc.contributor.committeeMember: Van Loan, Charles Francis
dc.contributor.committeeMember: Shoemaker, Christine Ann
dc.contributor.committeeMember: Townsend, Alex John
dc.date.accessioned: 2019-04-02T13:59:56Z
dc.date.available: 2019-04-02T13:59:56Z
dc.date.issued: 2018-12-30
dc.description.abstract: This dissertation uses structured linear algebra to scale kernel regression methods based on Gaussian processes (GPs) and radial basis function (RBF) interpolation to large, high-dimensional datasets. While kernel methods provide a general, principled framework for approximating functions from scattered data, they are often seen as impractical for large datasets because the standard approach to model fitting scales cubically with the number of data points. We introduce RBFs in Section 1.3 and GPs in Section 1.4. Chapter 2 develops novel O(n) approaches for GP regression with n points using fast approximate matrix-vector multiplications (MVMs). Kernel learning with GPs requires solving linear systems and computing the log determinant of an n x n kernel matrix. We use iterative methods relying on the fast MVMs to solve the linear systems, and we leverage stochastic approximations based on Chebyshev and Lanczos to approximate the log determinant. We find that Lanczos is generally highly efficient and accurate, and superior to Chebyshev for kernel learning. We consider a large variety of experiments to demonstrate the generality of this approach. Chapter 3 extends the ideas from Chapter 2 to fitting a GP to both function values and derivatives. This requires linear solves and log determinants with an n(d+1) x n(d+1) kernel matrix in d dimensions, leading to O(n^3 d^3) computations for standard methods. We extend the previous methods and introduce a pivoted Cholesky preconditioner that cuts the number of iterations to convergence by several orders of magnitude. Our approaches, together with dimensionality reduction, let us scale Bayesian optimization with derivatives to high-dimensional problems and large evaluation budgets. We introduce surrogate optimization in Section 1.5. Surrogate optimization is a key application of GPs and RBFs, where they are used to model a computationally expensive black-box function based on previous evaluations. Chapter 4 introduces a global optimization algorithm for computationally expensive black-box functions based on RBFs. Given an upper bound on the semi-norm of the objective function in a reproducing kernel Hilbert space associated with the RBF, we prove that our algorithm is globally convergent even though it may not sample densely. We discuss expected convergence rates and illustrate the performance of the method via experiments on a set of test problems. Chapter 5 describes Plumbing for Optimization with Asynchronous Parallelism (POAP) and the Python Surrogate Optimization Toolbox (pySOT). POAP is an event-driven framework for building and combining asynchronous optimization strategies, designed for global optimization of computationally expensive black-box functions where concurrent function evaluations are appealing. pySOT is a collection of synchronous and asynchronous surrogate optimization strategies implemented in the POAP framework. The pySOT framework includes a variety of surrogate models, experimental designs, optimization strategies, and test problems, and serves as a useful platform for comparing methods. We use pySOT to make an extensive comparison between synchronous and asynchronous parallel surrogate optimization methods, and find that asynchrony is never worse than synchrony on several challenging multimodal test problems.
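
The abstract's Chapter 2 summary refers to stochastic Lanczos-based approximations of the kernel-matrix log determinant computed from fast MVMs. The following is a minimal, self-contained Python sketch of stochastic Lanczos quadrature with Rademacher probe vectors, included only to illustrate the general technique; it is not the dissertation's implementation (which pairs the estimator with fast approximate MVMs), and the function names and parameters (lanczos_tridiag, slq_logdet, num_probes, num_steps) are illustrative choices.

import numpy as np

def lanczos_tridiag(matvec, v, num_steps):
    # Lanczos with full re-orthogonalization starting from v; returns the
    # diagonal (alpha) and off-diagonal (beta) of the tridiagonal matrix T.
    n = v.shape[0]
    Q = np.zeros((n, num_steps))
    alpha = np.zeros(num_steps)
    beta = np.zeros(num_steps - 1)
    Q[:, 0] = v / np.linalg.norm(v)
    for j in range(num_steps):
        w = matvec(Q[:, j])
        if j > 0:
            w = w - beta[j - 1] * Q[:, j - 1]
        alpha[j] = Q[:, j] @ w
        w = w - alpha[j] * Q[:, j]
        w = w - Q[:, : j + 1] @ (Q[:, : j + 1].T @ w)  # re-orthogonalize
        if j < num_steps - 1:
            beta[j] = np.linalg.norm(w)
            if beta[j] < 1e-10:  # Krylov subspace exhausted; stop early
                return alpha[: j + 1], beta[:j]
            Q[:, j + 1] = w / beta[j]
    return alpha, beta

def slq_logdet(matvec, n, num_probes=20, num_steps=30, seed=0):
    # Estimate log det(K) = tr(log K) for an SPD matrix K accessed only
    # through matrix-vector products, via Hutchinson probes + Lanczos quadrature.
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(num_probes):
        z = rng.choice([-1.0, 1.0], size=n)  # Rademacher probe, ||z||^2 = n
        alpha, beta = lanczos_tridiag(matvec, z, num_steps)
        T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
        evals, evecs = np.linalg.eigh(T)
        # Gauss quadrature: nodes are the Ritz values, weights are the squared
        # first components of the corresponding eigenvectors of T.
        total += n * np.sum(evecs[0, :] ** 2 * np.log(evals))
    return total / num_probes

if __name__ == "__main__":
    # Sanity check on a small RBF kernel matrix where the exact answer is cheap.
    rng = np.random.default_rng(1)
    X = rng.standard_normal((500, 3))
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-0.5 * sq_dists) + 1e-2 * np.eye(500)  # RBF kernel + jitter
    print("SLQ estimate:", slq_logdet(K.dot, 500))
    print("Exact       :", np.linalg.slogdet(K)[1])

In practice the dense kernel matrix above would be replaced by a fast approximate MVM so that each probe costs roughly O(n) rather than O(n^2).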
dc.identifier.doi: https://doi.org/10.7298/nmmb-0r60
dc.identifier.other: Eriksson_cornellgrad_0058F_11191
dc.identifier.other: http://dissertations.umi.com/cornellgrad:11191
dc.identifier.other: bibid: 10757986
dc.identifier.uri: https://hdl.handle.net/1813/64846
dc.language.iso: en_US
dc.subject: Gaussian Process
dc.subject: Applied mathematics
dc.subject: Global Optimization
dc.subject: Radial Basis Function
dc.subject: Scalable Machine Learning
dc.subject: Surrogate Optimization
dc.subject: Mathematics
dc.subject: Computer science
dc.subject: Bayesian optimization
dc.title: Scalable kernel methods and their use in black-box optimization
dc.type: dissertation or thesis
dcterms.license: https://hdl.handle.net/1813/59810
thesis.degree.discipline: Applied Mathematics
thesis.degree.grantor: Cornell University
thesis.degree.level: Doctor of Philosophy
thesis.degree.name: Ph. D., Applied Mathematics

Files

Original bundle
Name: Eriksson_cornellgrad_0058F_11191.pdf
Size: 8.4 MB
Format: Adobe Portable Document Format