Unbiased Learning-to-Rank from Logged Implicit Feedback
Learning-to-rank (LTR) search results in large-scale industrial information retrieval settings, such as personal email and e-commerce, directly from logged implicit user feedback such as clicks is highly attractive: such feedback is ubiquitous, routinely collected, user-focused, and time-sensitive, unlike manual relevance annotations or slow, disruptive A/B testing protocols. However, LTR from such feedback is challenging, since clicks are partial and biased signals of relevance. In particular, position bias must be addressed: higher ranks are more likely to be examined and clicked, so naively interpreting clicks as relevance labels leads to undesirable feedback loops and sub-optimal ranking quality. Towards this end, we develop a theoretical framework based on counterfactual reasoning that systematically deals with the various forms of position bias inherent in user behavior, and demonstrate its effectiveness in several real-world settings, including Gmail and arXiv search. While the framework can be adapted to any form of implicit feedback, we primarily focus on click data, since clicks are routinely logged and are reliable indicators of user intent. We present our key contributions within this framework, especially Intervention Harvesting, the first method for consistent position-bias estimation from the logs of multiple rankers, without requiring additional online interventions or relevance modeling. We also describe the general unbiased LTR framework in detail, and address position-dependent trust bias (in addition to examination bias) in relevance evaluation.
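The counterfactual correction the abstract describes is commonly realized via inverse propensity scoring (IPS): the loss of each clicked document is reweighted by the inverse of its examination probability at the logged rank. The following is only a minimal sketch of that idea; the power-law propensity curve, the `eta` value, and the function names are illustrative assumptions, not the thesis's actual estimator (which learns propensities from logged interventions rather than assuming a curve).

```python
import numpy as np

# Assumed position-bias model for illustration: probability that a user
# examines rank r decays as (1/r)**eta. The thesis instead *estimates*
# these propensities from logs of multiple rankers (Intervention
# Harvesting); eta = 1.0 here is a made-up example value.
eta = 1.0

def propensity(rank):
    """Examination probability at a given rank under the assumed curve."""
    return (1.0 / rank) ** eta

def ips_weighted_loss(ranks_of_clicks, losses_of_clicks):
    """Inverse-propensity-scored empirical risk over logged clicks.

    Each clicked document's loss is multiplied by 1 / P(examined at its
    logged rank), so documents clicked despite being shown low in the
    ranking count more. In expectation this de-biases the click signal.
    """
    weights = 1.0 / np.array([propensity(r) for r in ranks_of_clicks])
    return float(np.sum(weights * np.array(losses_of_clicks)))
```

For example, a click logged at rank 2 receives twice the weight of a click at rank 1 under this curve, compensating for the lower chance that rank 2 was examined at all.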
Wilson, Andrew; Sridharan, Karthik
Ph.D., Computer Science
Doctor of Philosophy
dissertation or thesis