eCommons
Off-policy Evaluation and Learning for Interactive Systems

dc.contributor.author: Su, Yi
dc.contributor.chair: Joachims, Thorsten
dc.contributor.committeeMember: Sridharan, Karthik
dc.contributor.committeeMember: Kallus, Nathan
dc.date.accessioned: 2021-12-20T20:48:53Z
dc.date.available: 2021-12-20T20:48:53Z
dc.date.issued: 2021-08
dc.description: 206 pages
dc.description.abstract: Recent advances in reinforcement learning (RL) offer exciting potential for agents that learn, plan, and act effectively in uncertain environments. Most existing RL algorithms rely on a known environment or a good simulator, in which exploration and data collection are cheap. This is not the case for human-centered interactive systems, where online sampling or experimentation is costly, dangerous, or even illegal. This dissertation advocates an alternative data-driven approach that evaluates and improves the performance of intelligent systems using only the logged data from prior versions of the system (a.k.a. off-policy evaluation and learning). While such data is collected in large quantities as a byproduct of system operation, reasoning about it is difficult because the data is biased and partial in nature. We present our key contributions to off-policy evaluation and learning in the contextual bandit setting, a stateless form of RL that is highly relevant to many real-world applications. These include the discovery of a general family of counterfactual estimators for off-policy evaluation that subsumes most estimators proposed to date; a principled optimization-based framework for automatically designing estimators instead of constructing them manually; a data-driven model selection technique for off-policy policy evaluation; and various approaches for handling support-deficient data in the off-policy learning setting.
dc.identifier.doi: https://doi.org/10.7298/8wee-5d36
dc.identifier.other: Su_cornellgrad_0058F_12733
dc.identifier.other: http://dissertations.umi.com/cornellgrad:12733
dc.identifier.uri: https://hdl.handle.net/1813/110655
dc.language.iso: en
dc.rights: Attribution 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.title: Off-policy Evaluation and Learning for Interactive Systems
dc.type: dissertation or thesis
dcterms.license: https://hdl.handle.net/1813/59810
thesis.degree.discipline: Statistics
thesis.degree.grantor: Cornell University
thesis.degree.level: Doctor of Philosophy
thesis.degree.name: Ph.D., Statistics
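The counterfactual estimators the abstract refers to build on inverse propensity scoring (IPS), the standard starting point for off-policy evaluation from logged bandit feedback. The sketch below is purely illustrative and not taken from the dissertation; the function name and the toy data are assumptions. It reweights each logged reward by the ratio of the target policy's action probability to the logging policy's action probability, yielding an unbiased value estimate when the logging propensities are correct and have full support.

```python
import numpy as np

def ips_estimate(rewards, logging_propensities, target_propensities):
    """IPS estimate of a target policy's value from logged bandit data.

    V_hat = (1/n) * sum_i [ pi(a_i|x_i) / mu(a_i|x_i) ] * r_i,
    where mu is the logging policy and pi is the target policy.
    """
    rewards = np.asarray(rewards, dtype=float)
    mu = np.asarray(logging_propensities, dtype=float)
    pi = np.asarray(target_propensities, dtype=float)
    # Importance weights correct for the mismatch between pi and mu.
    weights = pi / mu
    return float(np.mean(weights * rewards))

# Hypothetical logged data: observed rewards, the logging policy's
# propensity for each logged action, and the target policy's
# propensity for that same action.
rewards = [1.0, 0.0, 1.0, 1.0]
mu = [0.5, 0.5, 0.25, 0.25]
pi = [0.8, 0.2, 0.1, 0.9]
print(ips_estimate(rewards, mu, pi))  # → 1.4
```

The variance of this estimator grows with the importance weights, which is why the dissertation's support-deficient setting (actions the logging policy never takes) and its optimization-based estimator design are needed in practice.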

Files

Original bundle
Name: Su_cornellgrad_0058F_12733.pdf
Size: 4.48 MB
Format: Adobe Portable Document Format