Adaptive Bayes-Optimal Methods for Stochastic Search with Applications to Preference Learning
Pallone, Stephen Nicholas
Many fundamental problems in mathematics can be considered search problems, where one can make sequential queries to find a point with desired properties. This includes convex optimization, in which queries are points within the feasible region, and the corresponding subgradient implies a separating hyperplane that allows non-optimal points to be discarded. Search problems can be solved from a Bayes-optimal viewpoint, where a prior distribution is maintained over the search space quantifying the likelihood of the desired point's location. In this manner, queries and their responses allow this prior to be updated with new information, and the objective is to ask the best query to learn the location of the desired point as efficiently as possible. This framework can be adapted to solving variants of one dimensional deterministic search, which includes work such as developing query strategies associated with parallel function evaluations, or analyzing a bisection-like method that sorts the roots of functions that must be evaluated together. Most significantly, this idea is used in multidimensional search in the application of learning the preferences of a single user. In this case, the queries are comparative questions, where multiple alternatives are presented to the user, and they choose the most preferred option. The preferences of the user are encoded in a linear classifier, where the user's utility for an alternative roughly corresponds to the dot product between this classifier and a feature vector. The response model enables answers to be contaminated by probabilistic noise, allowing the analysis of responses that break mathematical principles such as transitivity. The objective is to minimize the expected posterior loss of this classifier by adaptively and sequentially selecting the best subset of alternatives to present to the user. We consider multiple forms of loss, most notably including posterior entropy, and provide results characterizing these best subsets. We develop theory yielding tight bounds for measures of loss under the optimal policy, show how computationally tractable algorithms can be implemented, and analyze these policies in two unique computational scenarios.
preferences; Operations research; active learning; bayesian; bisection search; information theory
Frazier, PeterHenderson, Shane G.
Lewis, Adrian S.; Joachims, Thorsten
Ph. D., Operations Research
Doctor of Philosophy
Attribution 4.0 International
dissertation or thesis
Except where otherwise noted, this item's license is described as Attribution 4.0 International