Cornell University
Library
Cornell UniversityLibrary

eCommons

Help
Log In(current)
  1. Home
  2. Cornell University Graduate School
  3. Cornell Theses and Dissertations
  4. Expressive Models In Online Learning

Expressive Models In Online Learning

File(s)
Sharma, Yogeshwer.pdf (1.16 MB)
Permanent Link(s)
https://hdl.handle.net/1813/17704
Collections
Cornell Theses and Dissertations
Author
Sharma, Yogeshwer
Abstract

We study the online learning model: a widely applicable model for making repeated choices in an interactive environment. In standard online learning model (or an online learning problem), the decision-maker is provided with a set of alternatives, and selects one alternative in each of the T sequential trials, deriving a reward for each selection. After T trials, the total reward of the decision-maker is compared with the best "single-arm" strategy which has the maximum reward in hindsight. The difference between the reward of the best single-arm strategy and that of the algorithm is called the regret , and one seeks decision-making algorithms whose regret is sublinear in T and running time is polynomial in the problem size. In this thesis, we extend the basic online learning model in two important ways. In the first extension, we model sponsored search auctions as a multi-armed bandit problem (a type of online learning problem), and allow the alternatives (or advertisers in this case) to be strategic which can report possibly wrong rewards (in order to make personal gains). We seek to provide incentives to advertisers so as to get good solutions (socially efficient solutions). We prove that any socially efficient solution that provides right incentives to advertisers (being dominant strategy truthful) must suffer much higher regret than the regret suffered by algorithms for multi-armed problem without incentive issues. In the second extension, also motivated by sponsored search and resource selection in distributed systems, we allow the set of available alternatives to vary over time, provide a natural way to define the regret, and give policies for the decision-maker that suffer low regret. We also prove that the regret suffered by our policies is information-theoretically the lowest possible.

Date Issued
2010-10-20
Type
dissertation or thesis

Site Statistics | Help

About eCommons | Policies | Terms of use | Contact Us

copyright © 2002-2026 Cornell University Library | Privacy | Web Accessibility Assistance