eCommons

 

Counterfactual Evaluation And Learning From Logged User Feedback

dc.contributor.author: Swaminathan, Adith
dc.contributor.chair: Joachims, Thorsten
dc.contributor.committeeMember: Gehrke, Johannes E
dc.contributor.committeeMember: Tardos, Eva
dc.contributor.committeeMember: Kleinberg, Robert D
dc.date.accessioned: 2017-07-07T12:48:28Z
dc.date.available: 2017-07-07T12:48:28Z
dc.date.issued: 2017-05-30
dc.description.abstract: Interactive systems that interact with and learn from user behavior are ubiquitous today. Machine learning algorithms are core components of such systems. In this thesis, we will study how we can re-use logged user behavior data to evaluate interactive systems and train their machine-learned components in a principled way. The core message of the thesis is threefold: (1) Using simple techniques from causal inference, we can improve popular machine learning algorithms so that they interact reliably. (2) These improvements are effective and scalable, and complement current algorithmic and modeling advances in machine learning. (3) They open further avenues for research in Counterfactual Evaluation and Learning to ensure machine-learned components interact reliably with users and with each other. This thesis explores two fundamental tasks: evaluation and training of interactive systems. Solving evaluation and training tasks using logged data is an exercise in counterfactual reasoning. So we will first review concepts from causal inference for counterfactual reasoning, assignment mechanisms, statistical estimation and learning theory. The thesis then contains two parts. In the first part, we will study scenarios where unknown assignment mechanisms underlie the logged data we collect. These scenarios often arise in learning-to-rank and learning-to-recommend applications. We will view these applications through the lens of causal inference and modularize the problem of building a good ranking engine or recommender system into two components: first, infer a plausible assignment mechanism, and second, reliably learn to rank or recommend assuming this mechanism was active when collecting data. The second part of the thesis focuses on scenarios where we collect logged data from past interventions. We will formalize these scenarios as batch learning from logged contextual bandit feedback. We will first develop better off-policy estimators for evaluating online user-centric metrics in information retrieval applications. In subsequent chapters, we will study the bias-variance trade-off when learning from logged interventions. This study will yield new learning principles, algorithms and insights into the design of statistical estimators for counterfactual learning. The thesis outlines a few principles, tools, datasets and software that hopefully prove to be useful to you as you build your interactive learning system.
dc.identifier.doi: https://doi.org/10.7298/X4FJ2DW6
dc.identifier.other: Swaminathan_cornellgrad_0058F_10211
dc.identifier.other: http://dissertations.umi.com/cornellgrad:10211
dc.identifier.other: bibid: 9948780
dc.identifier.uri: https://hdl.handle.net/1813/51557
dc.language.iso: en_US
dc.rights: Attribution 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.subject: Statistics
dc.subject: Computer science
dc.subject: machine learning
dc.subject: Causality
dc.subject: Off-Policy
dc.title: Counterfactual Evaluation And Learning From Logged User Feedback
dc.type: dissertation or thesis
dcterms.license: https://hdl.handle.net/1813/59810
thesis.degree.discipline: Computer Science
thesis.degree.grantor: Cornell University
thesis.degree.level: Doctor of Philosophy
thesis.degree.name: Ph. D., Computer Science

Files

Original bundle
Name: Swaminathan_cornellgrad_0058F_10211.pdf
Size: 2.8 MB
Format: Adobe Portable Document Format