Off-policy Evaluation and Learning for Interactive Systems

Author
Su, Yi
Abstract
Recent advances in reinforcement learning (RL) offer exciting potential for making agents learn, plan, and act effectively in uncertain environments. Most existing RL algorithms rely on a known environment or the existence of a good simulator, where it is cheap to explore and collect training data. However, this is not the case for human-centered interactive systems, in which online sampling or experimentation is costly, dangerous, or even illegal. This dissertation advocates an alternative data-driven approach that aims to evaluate and improve the performance of intelligent systems using only the logged data from prior versions of the system (a.k.a. off-policy evaluation and learning). While such data is collected in large quantities as a byproduct of system operation, reasoning about it is difficult, since the data is biased and partial in nature. We present our key contributions in off-policy evaluation and learning for the contextual bandit setting, a stateless form of RL that is highly relevant to many real-world applications. These include the discovery of a general family of counterfactual estimators for off-policy evaluation, which subsumes most estimators proposed to date; a principled optimization-based framework for automatically designing estimators, instead of constructing them manually; a data-driven model-selection technique for off-policy policy evaluation; and various approaches for handling support-deficient data in the off-policy learning setting.
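The off-policy evaluation problem described in the abstract can be illustrated with a minimal sketch of the classic inverse propensity scoring (IPS) estimator — the standard baseline that the family of counterfactual estimators mentioned above generalizes. All data here is synthetic and purely illustrative; the two-action setup, reward values, and uniform logging policy are assumptions for the example, not details from the dissertation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical logged bandit data from a prior system (the logging policy).
# Contexts are omitted for brevity; each record holds the action taken,
# the observed reward, and the logging policy's propensity for that action.
n = 10_000
actions = rng.integers(0, 2, size=n)        # logging policy: uniform over 2 actions
propensities = np.full(n, 0.5)              # P(action) under the logging policy
rewards = np.where(actions == 1, 0.8, 0.2)  # stylized reward signal

# Target policy we want to evaluate offline: always pick action 1.
def target_prob(a):
    return np.where(a == 1, 1.0, 0.0)

# IPS: reweight each logged reward by the ratio of target to logging
# propensities. The estimate is unbiased when the logging policy has
# full support over the actions the target policy can take.
ips_estimate = np.mean(target_prob(actions) / propensities * rewards)
print(ips_estimate)  # close to 0.8, the target policy's true value
```

Support deficiency — a theme of the later chapters — shows up here when the logging policy assigns zero propensity to an action the target policy takes, making the importance weight undefined and the plain IPS estimate biased.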
Description
206 pages
Date Issued
2021-08
Committee Chair
Joachims, Thorsten
Committee Member
Sridharan, Karthik; Kallus, Nathan
Degree Discipline
Statistics
Degree Name
Ph. D., Statistics
Degree Level
Doctor of Philosophy
Rights
Attribution 4.0 International
Rights URI
Type
dissertation or thesis