Cornell University Library

eCommons


Reinforcement Learning with Latent Low Rank Structure

File(s)
Sam_cornellgrad_0058F_15218.pdf (1.17 MB)
Permanent Link(s)
https://doi.org/10.7298/pt05-8h73
https://hdl.handle.net/1813/120914
Collections
Cornell Theses and Dissertations
Author
Sam, Tyler
Abstract

The practicality of reinforcement learning algorithms has been limited by poor scaling with respect to the problem size: without structural assumptions, algorithms incur a sample complexity or regret bound that scales with the product of the sizes of the state space $|S|$ and action space $|A|$ of a finite horizon MDP, e.g., $|S||A|$ or $\sqrt{|S||A|}$. In this work, we consider a class of MDPs that have latent low rank structure. We first investigate the minimal low rank assumptions that admit efficient reinforcement learning. We show that without imposing further assumptions beyond low rank of $Q^*$, if one is constrained to estimate the $Q$ function using only observations from a subset of entries, there is a worst case instance in which one must incur a sample complexity exponential in the horizon $H$ to learn a near optimal policy. We then show that under stronger low rank structural assumptions, given access to a generative model, Low Rank Monte Carlo Policy Iteration (LR-MCPI) and Low Rank Empirical Value Iteration (LR-EVI) achieve the desired sample complexity of $\tilde{O}\left((|S|+|A|)\mathrm{poly}(d,H)/\epsilon^2\right)$ for a rank $d$ setting, which is minimax optimal with respect to the scaling of $|S|$, $|A|$, and $\epsilon$. In contrast to the literature on linear and low-rank MDPs, we do not require a known feature mapping, our algorithm is computationally simple, and our results hold for long time horizons. We next consider the problem of transferring a latent low rank representation when the source and target MDPs have transition kernels with Tucker rank $(|S|, d, |A|)$, $(|S|, |S|, d)$, $(d, |S|, |A|)$, or $(d, d, d)$. In each setting, we introduce the transferability coefficient $\alpha$ that measures the difficulty of representational transfer.
Our algorithm learns latent representations in each source MDP and then exploits the linear structure to remove the dependence on $|S| , |A| $, or $|S| |A| $ in the target MDP regret bound. We complement our positive results with information theoretic lower bounds that show our algorithms (excluding the ($d, d, d$) setting) are minimax-optimal with respect to $\alpha$.
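The key intuition behind the $\tilde{O}((|S|+|A|)\,\mathrm{poly}(d,H)/\epsilon^2)$ rate is that a rank-$d$ matrix over $S \times A$ is determined by far fewer than $|S||A|$ entries. The sketch below (not the thesis's algorithm, just a minimal noiseless illustration with made-up sizes) shows a CUR-style reconstruction: observing $d$ anchor rows and $d$ anchor columns, i.e. $O(|S|+|A|)$ entries, suffices to recover an exactly rank-$d$ $Q$ matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, d = 50, 40, 2  # illustrative state count, action count, latent rank

# Hypothetical rank-d ground-truth Q matrix, Q = U V^T
U = rng.normal(size=(S, d))
V = rng.normal(size=(A, d))
Q = U @ V.T

# Observe only d anchor columns and d anchor rows: O(|S| + |A|) entries,
# not the full |S| x |A| table.
rows, cols = [0, 1], [0, 1]
C = Q[:, cols]             # |S| x d observed columns
R = Q[rows, :]             # d x |A| observed rows
W = Q[np.ix_(rows, cols)]  # d x d intersection block

# CUR-style reconstruction: exact whenever rank(Q) = d and W is invertible
Q_hat = C @ np.linalg.solve(W, R)

rel_err = np.linalg.norm(Q_hat - Q) / np.linalg.norm(Q)
print(f"relative reconstruction error: {rel_err:.2e}")
```

In the noisy, generative-model setting the thesis studies, one instead estimates these entries from samples and the analysis controls the resulting error propagation; the sketch only conveys why the sample count can scale with $|S|+|A|$ rather than $|S||A|$.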

Description
244 pages
Date Issued
2025-08
Keywords
Reinforcement Learning
Committee Chair
Yu, Christina
Committee Member
Chen, Yudong
Sun, Wen
Frazier, Peter
Degree Discipline
Operations Research and Information Engineering
Degree Name
Ph.D., Operations Research and Information Engineering
Degree Level
Doctor of Philosophy
Rights
Attribution 4.0 International
Rights URI
https://creativecommons.org/licenses/by/4.0/
Type
dissertation or thesis
