Cornell University Library
eCommons

Improving Data Efficiency and Availability for Alignment

File(s)
Tucker_cornellgrad_0058F_14777.pdf (2.23 MB)
Permanent Link(s)
http://doi.org/10.7298/pvzv-wh28
https://hdl.handle.net/1813/117238
Collections
Cornell Theses and Dissertations
Author
Tucker, Aaron
Abstract

AI systems have become increasingly powerful by using ever-larger amounts of compute to perform increasingly complex tasks. For example, writing a rules-based search engine that explicitly computes which links a specific user would be interested in for a given query is prohibitively difficult, but machine learning can predict which links a user is most likely to click on. Reinforcement learning from human feedback takes ML further by learning a reward model from preference comparisons to improve an LLM's responses, rather than trying to specify good behavior directly. In both settings, human feedback is critical for aligning ML systems, both for personalization and for tasks that are difficult to specify directly. While compute keeps getting cheaper, human attention stays expensive. How can human feedback be more efficient? How can it be more available? This thesis presents several projects that improve data efficiency and data availability for aligning AI systems using human feedback. First, it analyzes the exploration-exploitation tradeoff induced by introducing an explicit cost for observing rewards in a bandit setting. Second, it provides methods for allocating a fixed interaction budget to improve the data efficiency of off-policy learning and evaluation in realistic settings appropriate for search and recommendation systems. Third, it shows how to improve LLM-based assistants using implicit feedback gathered from user interactions. Finally, it outlines preliminary results and promising directions for future work on using human feedback.
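The abstract's mention of "learning a reward model from preference comparisons" is commonly formalized with the Bradley-Terry model, in which the probability that one response is preferred over another is a sigmoid of the difference in their scalar rewards. As a minimal sketch (not the thesis's own code; the function name and toy inputs are illustrative), the per-comparison training loss looks like this:

```python
import math

def bradley_terry_nll(reward_chosen: float, reward_rejected: float) -> float:
    """Negative log-likelihood of one preference comparison under the
    Bradley-Terry model: P(chosen > rejected) = sigmoid(r_chosen - r_rejected).
    Minimizing this pushes the reward model to score preferred responses higher."""
    p_chosen = 1.0 / (1.0 + math.exp(-(reward_chosen - reward_rejected)))
    return -math.log(p_chosen)

# When the model scores both responses equally, the loss is log(2):
equal_loss = bradley_terry_nll(1.0, 1.0)

# The loss is lower when the reward model already ranks the chosen response higher:
good_ranking = bradley_terry_nll(2.0, 0.0)
bad_ranking = bradley_terry_nll(0.0, 2.0)
```

Summing this loss over a dataset of human preference comparisons and minimizing it with gradient descent yields the reward model that RLHF then optimizes the LLM against.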

Description
199 pages
Date Issued
2024-12
Keywords
Active Learning • Alignment • Bandits • Off-policy Learning • RLHF
Committee Chair
Joachims, Thorsten
Committee Member
Sun, Wen
Weinberger, Kilian
Degree Discipline
Computer Science
Degree Name
Ph.D., Computer Science
Degree Level
Doctor of Philosophy
Rights
Attribution 4.0 International
Rights URI
https://creativecommons.org/licenses/by/4.0/
Type
dissertation or thesis
Link(s) to Catalog Record
https://newcatalog.library.cornell.edu/catalog/16922020
