Non-convex and Interactive Learning via Stochastic Optimization

dc.contributor.authorSekhari, Ayush
dc.contributor.chairSridharan, Karthiken_US
dc.contributor.committeeMemberKleinberg, Roberten_US
dc.contributor.committeeMemberTardos, Evaen_US
dc.contributor.committeeMemberSun, Wenen_US
dc.description512 pagesen_US
dc.description.abstractAdvances in machine learning have led to many empirical breakthroughs in computer science, from developing state-of-the-art image classification models to defeating the world champion in the game of Go. From statistical learning (e.g. image recognition) to interactive learning (e.g. the game of Go), this success is primarily driven by a shift towards using over-parameterized non-convex models, like deep neural networks, for learning. In practice, these high-dimensional models are trained by formulating the learning task as a stochastic optimization problem and using simple first-order algorithms like Stochastic Gradient Descent (SGD) to solve them. In the first part of the thesis, I will discuss my work on understanding why SGD succeeds in solving high-dimensional stochastic non-convex optimization problems arising in practice, and its implications for non-convex learning. It is widely believed that SGD works so well because it has an implicit bias that guides the algorithm towards good solutions that generalize well. I will start by discussing the limitations of this approach for understanding SGD, and then present a new framework based on Lyapunov potential that can overcome these limitations. This framework is based on exploiting a deep connection between the rates at which gradient flow converges on the test loss, the geometrical properties of the test loss, and an associated Lyapunov potential. In the second part of the thesis, I will discuss my work on understanding interactive learning through the lens of stochastic optimization. Much of the recent research in interactive learning, e.g.~in MDPs with large state-action spaces, has been limited to the stochastic setting where the underlying MDP is fixed throughout the interaction, and the learner has a value function class that contains the optimal value function for that MDP. I will consider two interactive learning settings: agnostic RL, and learning with an adversarially changing environment, where these assumptions do not hold. For agnostic RL, I will discuss a statistically optimal learning algorithm for low-rank MDPs that is based on using auto-regressions to estimate the value of a policy, and for finding the best policy. The coefficients for the auto-regressions are estimated by solving a stochastic non-convex optimization problem. For learning with adversarially changing environments, we provide a general approach to solving adversarial decision-making problems by reducing them to full information online learning using a per-step minimax optimization problem.en_US
dc.rightsAttribution-NonCommercial 4.0 International*
dc.subjectDeep learning theoryen_US
dc.subjectInteractive Learningen_US
dc.subjectMachine learning theoryen_US
dc.subjectStochastic optimizationen_US
dc.titleNon-convex and Interactive Learning via Stochastic Optimizationen_US
dc.typedissertation or thesisen_US
dcterms.license Science University of Philosophy D., Computer Science


Original bundle
Now showing 1 - 1 of 1
Thumbnail Image
3.66 MB
Adobe Portable Document Format