Adaptive Preference Learning With Bandit Feedback: Information Filtering, Dueling Bandits and Incentivizing Exploration

Abstract
In this thesis, we study adaptive preference learning, in which a machine learning system learns users' preferences from feedback while simultaneously using these learned preferences to help them find preferred items. We study three types of user feedback in three application settings: cardinal feedback with application to information filtering systems, ordinal feedback with application to personalized content recommender systems, and attribute feedback with application to review aggregators. We connect these settings, respectively, to existing work on classical multi-armed bandits, dueling bandits, and incentivizing exploration. For each type of feedback and application setting, we provide an algorithm and a theoretical analysis bounding its regret. We demonstrate through numerical experiments that our algorithms outperform existing benchmarks.
Date Issued
2017-12-30
Keywords
Statistics; Operations research; Computer science; adaptive preference learning; bandit feedback; dueling bandits; incentivizing exploration; information filtering; multi-armed bandits
Committee Chair
Frazier, Peter
Committee Member
Topaloglu, Huseyin
Joachims, Thorsten
Degree Discipline
Operations Research
Degree Name
Ph. D., Operations Research
Degree Level
Doctor of Philosophy
Types
dissertation or thesis