
Efficient Learning in Complex Systems: Reinforcement Learning with Graphon Re-Sampling and Financial Hedging with Optimization-Based Neural Architectures

Access Restricted

Access to this document is restricted. Some items have been embargoed at the request of the author, but will be made publicly available after the "No Access Until" date.

During the embargo period, you may request access to the item by clicking the link to the restricted file(s) and completing the request form. If we have contact information for a Cornell author, we will contact the author and request permission to provide access. If we do not have contact information for a Cornell author, or the author denies or does not respond to our inquiry, we will not be able to provide access. For more information, review our policies for restricted content.

File(s)
Huo_cornellgrad_0058F_15253.pdf (1.04 MB)
No Access Until
2027-09-09
Permanent Link(s)
https://doi.org/10.7298/ahrq-a633
https://hdl.handle.net/1813/120806
Collections
Cornell Theses and Dissertations
Author
Huo, Peihan
Abstract

This thesis develops unified, scalable methods for data-driven decision-making. First, we propose the Re-Sampled Graphon Game (R-SGG), an $N$-agent reinforcement learning model designed to approximately learn the Nash Equilibrium (NE) of Multi-Population Mean Field Games (MP-MFGs). To obtain stationary NE policies, we introduce a re-sampling scheme in which agents interact on a network that is re-sampled at each time step from a piecewise constant graphon. Agents do not observe their network connections or the resulting empirical neighbor impact, while state transitions and rewards depend on this impact in an unknown way. At the population level, we prove that R-SGG inherits the state reachability and mixing properties of the MP-MFG without additional assumptions. At the agent level, we examine two information schemes for Conditional Temporal Difference (CTD) learning: (i) a locally centralized setting, where policies are synchronized within each population, and (ii) a fully decentralized setting, where agents update their policies independently. We establish convergence guarantees along a single-path trajectory with sample complexities of $\tilde{\mathcal{O}}(\epsilon^{-2})$ and $\tilde{\mathcal{O}}(\epsilon^{-2-c})$ for the locally centralized and fully decentralized settings, respectively.

Next, we formulate a Multi-Population Mean-Field Trading Game (MP-MFTG) in which each trader's risk aversion evolves under personal trading outcomes and network-aggregated sentiment from other populations. By grouping agents into $K$ homogeneous populations linked via an influence matrix, the model remains tractable while preserving key inter-population heterogeneity. We characterize the Nash equilibrium of a regularized MP-MFTG and analyze an iterative Policy Mirror Ascent scheme, first proving convergence under full model knowledge. We then introduce a simulator-based RL algorithm that relies only on sampled state-transition and reward trajectories, and we prove its polynomial sample complexity and linear convergence to the equilibrium policy profile. Numerical experiments demonstrate both the efficiency of the algorithm and the potential of MP-MFTG for analyzing large systems of trading agents.

Finally, at the individual level, we address the opacity of existing deep-hedging models by embedding a differentiable Markowitz optimization layer, implemented via cvxpylayers, within a neural hedging network. Our Mean-Variance Deep Hedging framework treats risk-aversion and transaction-cost coefficients as trainable parameters and enforces realistic trading constraints (e.g., concentration limits, turnover caps). We show that this convex-optimization layer enriches representational power relative to standard MLP-based architectures, complies with disciplined parametrized programming (DPP) rules for efficient differentiation, and yields transparent, post-hoc insights into learned hedge ratios. Empirical results on multi-instrument European-option hedging confirm that our framework matches the performance of existing Deep Hedging methods while providing clear interpretability of the hedging strategy.
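The per-step network re-sampling from a piecewise constant graphon described in the abstract can be illustrated with a minimal numpy sketch. Everything concrete here (two populations of five agents, the block probability matrix `W`, the function name) is a hypothetical illustration, not taken from the thesis: a piecewise constant graphon over $K$ populations reduces to a $K \times K$ matrix of inter-block edge probabilities, and the network is an independent Bernoulli draw at every time step.

```python
import numpy as np

def sample_block_graphon_network(blocks, W, rng):
    """Sample an undirected, simple network from a piecewise-constant
    (stochastic block) graphon.

    blocks : (N,) integer block label for each agent
    W      : (K, K) symmetric matrix of inter-block edge probabilities
    """
    probs = W[np.ix_(blocks, blocks)]        # edge probability for each pair
    draws = rng.random(probs.shape) < probs  # independent Bernoulli draws
    A = np.triu(draws, k=1)                  # keep strict upper triangle
    return (A + A.T).astype(int)             # symmetrize; no self-loops

rng = np.random.default_rng(0)
blocks = np.repeat(np.arange(2), 5)          # 2 populations, 5 agents each
W = np.array([[0.8, 0.2],
              [0.2, 0.6]])                   # illustrative edge probabilities
A_t = sample_block_graphon_network(blocks, W, rng)  # re-drawn at every step t
impact = A_t.sum(axis=1) / max(len(blocks) - 1, 1)  # normalized degree in [0, 1]
```

Calling the sampler once per time step gives the "re-sampled at each time step" dynamic; in the R-SGG setting agents would not observe `A_t` or `impact` directly, only their own states and rewards.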
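The inner problem of the differentiable Markowitz layer can be previewed with a closed-form sketch. This is an assumption-laden illustration, not the thesis's implementation: the actual layer solves a constrained, parameterized problem via cvxpylayers, whereas the unconstrained mean-variance objective $\max_w \; \mu^\top w - \tfrac{\gamma}{2} w^\top \Sigma w$ admits the analytic optimum $w^* = \Sigma^{-1}\mu/\gamma$. The numbers below are made up for the example.

```python
import numpy as np

def markowitz_weights(mu, Sigma, gamma):
    """Closed-form maximizer of mu^T w - (gamma/2) w^T Sigma w."""
    return np.linalg.solve(Sigma, mu) / gamma

mu = np.array([0.08, 0.12])                  # expected excess returns
Sigma = np.array([[0.04, 0.01],
                  [0.01, 0.09]])             # return covariance
w = markowitz_weights(mu, Sigma, gamma=2.0)  # unconstrained optimum
w_budget = w / w.sum()                       # rescaled to a full-investment budget
```

Once constraints such as concentration limits or turnover caps are added, no closed form exists, which is why a differentiable convex-optimization layer is needed: it solves the constrained problem in the forward pass and differentiates through it (under DPP rules) so that $\gamma$ and the cost coefficients can be trained end-to-end.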

Description
133 pages
Date Issued
2025-08
Keywords
Game Theory • Machine Learning • Network Theory • Quantitative Finance • Reinforcement Learning
Committee Chair
Minca, Andreea
Committee Member
Jarrow, Robert
Sosoe, Philippe
Degree Discipline
Applied Mathematics
Degree Name
Ph.D., Applied Mathematics
Degree Level
Doctor of Philosophy
Rights
Attribution 4.0 International
Rights URI
https://creativecommons.org/licenses/by/4.0/
Type
dissertation or thesis
