Non-convex and Interactive Learning via Stochastic Optimization

Advances in machine learning have led to many empirical breakthroughs in computer science, from state-of-the-art image classification models to defeating the world champion at the game of Go. Across statistical learning (e.g. image recognition) and interactive learning (e.g. the game of Go), this success is primarily driven by a shift towards over-parameterized non-convex models, such as deep neural networks. In practice, these high-dimensional models are trained by formulating the learning task as a stochastic optimization problem and solving it with simple first-order algorithms like Stochastic Gradient Descent (SGD). In the first part of the thesis, I will discuss my work on understanding why SGD succeeds in solving the high-dimensional stochastic non-convex optimization problems arising in practice, and its implications for non-convex learning. It is widely believed that SGD works so well because it has an implicit bias that guides the algorithm towards good solutions that generalize well. I will start by discussing the limitations of this approach to understanding SGD, and then present a new framework based on Lyapunov potentials that overcomes these limitations. The framework exploits a deep connection between the rate at which gradient flow converges on the test loss, the geometric properties of the test loss, and an associated Lyapunov potential. In the second part of the thesis, I will discuss my work on understanding interactive learning through the lens of stochastic optimization. Much of the recent research in interactive learning, e.g. in MDPs with large state-action spaces, has been limited to the stochastic setting, where the underlying MDP is fixed throughout the interaction and the learner has a value function class that contains the optimal value function for that MDP.
I will consider two interactive learning settings in which these assumptions do not hold: agnostic RL, and learning in an adversarially changing environment. For agnostic RL, I will discuss a statistically optimal learning algorithm for low-rank MDPs that uses auto-regressions both to estimate the value of a policy and to find the best policy; the coefficients of the auto-regressions are estimated by solving a stochastic non-convex optimization problem. For learning in adversarially changing environments, I will present a general approach that reduces adversarial decision-making problems to full-information online learning via a per-step minimax optimization problem.
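The training recipe the abstract describes, posing learning as stochastic optimization and running SGD with noisy gradient estimates, can be illustrated with a minimal sketch. The objective, step size, and noise model below are illustrative assumptions, not taken from the thesis: a one-dimensional non-convex function with two global minima, optimized from a stochastic gradient oracle.

```python
import random

def sgd(grad_sample, x0, steps=5000, lr=0.01):
    """Minimal SGD: x <- x - lr * (stochastic gradient at x)."""
    x = x0
    for _ in range(steps):
        x -= lr * grad_sample(x)
    return x

# Toy non-convex objective f(x) = x**4 - 2*x**2, with global minima
# at x = +1 and x = -1. The oracle returns the exact gradient plus
# Gaussian noise, mimicking a stochastic gradient estimate.
random.seed(0)

def grad_sample(x):
    return 4 * x**3 - 4 * x + random.gauss(0.0, 0.1)

x_star = sgd(grad_sample, x0=0.5)
```

Started at x0 = 0.5, the iterate drifts into the nearby basin and settles close to the minimizer at x = 1, up to fluctuations driven by the gradient noise; which basin SGD selects, and why the selected solutions generalize, is the kind of question the first part of the thesis studies.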

512 pages


Deep learning theory; Generalization; Interactive Learning; Machine learning theory; Stochastic optimization


Committee Chair

Sridharan, Karthik

Committee Member

Kleinberg, Robert
Tardos, Eva
Sun, Wen

Degree Discipline

Computer Science

Degree Name

Ph.D., Computer Science

Degree Level

Doctor of Philosophy

Attribution-NonCommercial 4.0 International


dissertation or thesis
