Efficient Learning in Complex Systems: Reinforcement Learning with Graphon Re-Sampling and Financial Hedging with Optimization-Based Neural Architectures
This thesis develops unified, scalable methods for data-driven decision-making. First, we propose the Re-Sampled Graphon Game (R-SGG), an $N$-agent reinforcement learning model designed to approximately learn the Nash Equilibrium (NE) of Multi-Population Mean Field Games (MP-MFGs). To obtain stationary NE policies, we introduce a re-sampling scheme in which agents interact on a network that is re-sampled at each time step from a piecewise-constant graphon. Agents observe neither their network connections nor the resulting empirical neighbor impact, while state transitions and rewards depend on this impact in an unknown way. At the population level, we prove that the R-SGG inherits the state-reachability and mixing properties of the MP-MFG without additional assumptions. At the agent level, we examine two information schemes for Conditional Temporal Difference (CTD) learning: (i) a locally centralized setting, where policies are synchronized within each population, and (ii) a fully decentralized setting, where agents update their policies independently. We establish convergence guarantees along a single-path trajectory with sample complexities of $\tilde{\mathcal{O}}(\epsilon^{-2})$ and $\tilde{\mathcal{O}}(\epsilon^{-2-c})$ for the locally centralized and fully decentralized settings, respectively.

Next, we formulate a Multi-Population Mean-Field Trading Game (MP-MFTG) in which each trader’s risk aversion evolves with the trader’s own trading outcomes and with network-aggregated sentiment from other populations. By grouping agents into $K$ homogeneous populations linked via an influence matrix, the model remains tractable while preserving key inter-population heterogeneity. We characterize the Nash equilibrium of a regularized MP-MFTG and analyze an iterative Policy Mirror Ascent scheme, first proving convergence under full model knowledge. We then introduce a simulator-based RL algorithm that relies only on sampled state-transition and reward trajectories, and we prove its polynomial sample complexity and linear convergence to the equilibrium policy profile. Numerical experiments demonstrate both the efficiency of the algorithm and the potential of the MP-MFTG for analyzing large systems of trading agents.

Finally, at the individual level, we address the opacity of existing deep-hedging models by embedding a differentiable Markowitz optimization layer, implemented via cvxpylayers, within a neural hedging network. Our Mean-Variance Deep Hedging framework treats risk-aversion and transaction-cost coefficients as trainable parameters and enforces realistic trading constraints (e.g., concentration limits and turnover caps). We show that this convex-optimization layer enriches representational power relative to standard MLP-based architectures, complies with the disciplined parametrized programming (DPP) rules required for efficient differentiation, and yields transparent, post-hoc insights into learned hedge ratios. Empirical results on multi-instrument European-option hedging confirm that our framework matches the performance of existing Deep Hedging methods while providing clear interpretability of the hedging strategy.
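For concreteness, the re-sampling scheme of the first part can be sketched as follows: at every time step a fresh adjacency matrix is drawn from a piecewise-constant graphon whose blocks correspond to the $K$ populations, and each agent's neighbor impact is the empirical mean of its neighbors' states. This is a minimal illustration, not the thesis's implementation; the names `resample_graphon_network`, `block_probs`, and `neighbor_impact` are hypothetical.

```python
import numpy as np

def resample_graphon_network(block_probs, pop_of, rng):
    """Draw a fresh adjacency matrix from a piecewise-constant graphon.

    block_probs : (K, K) array of inter-population edge probabilities.
    pop_of      : (N,) array giving each agent's population index.
    """
    probs = block_probs[np.ix_(pop_of, pop_of)]        # N x N edge probabilities
    adj = (rng.random(probs.shape) < probs).astype(float)
    adj = np.triu(adj, 1)                              # one draw per unordered pair
    return adj + adj.T                                 # symmetric, zero diagonal

def neighbor_impact(adj, states):
    """Empirical neighbor impact: mean state over each agent's neighbors."""
    deg = adj.sum(axis=1)
    return adj @ states / np.maximum(deg, 1.0)

# Toy run: N = 6 agents split into K = 2 populations.
rng = np.random.default_rng(0)
block_probs = np.array([[0.8, 0.2],
                        [0.2, 0.6]])
pop_of = np.array([0, 0, 0, 1, 1, 1])
states = rng.standard_normal(6)
adj = resample_graphon_network(block_probs, pop_of, rng)  # re-drawn every step
print(neighbor_impact(adj, states))
```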
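The Policy Mirror Ascent scheme of the second part admits a well-known closed-form update on finite state-action spaces: with entropy regularization, one step interpolates between the current policy and a softmax of the action values. The sketch below shows this generic update only; the thesis's exact formulation for the regularized MP-MFTG may differ in its regularizer and step-size schedule.

```python
import numpy as np

def policy_mirror_ascent_step(pi, Q, eta, tau):
    """One entropy-regularized mirror-ascent update on a finite MDP.

    pi  : (S, A) current policy; rows strictly positive, summing to 1.
    Q   : (S, A) action values under pi (exact, or simulator estimates).
    eta : step size; tau : entropy-regularization strength.

    Closed form of the KL-proximal step:
        pi'(a|s)  ∝  pi(a|s)^(1 - eta*tau) * exp(eta * Q(s, a)),
    which interpolates between pi and a softmax of Q.
    """
    log_pi = (1.0 - eta * tau) * np.log(pi) + eta * Q
    log_pi -= log_pi.max(axis=1, keepdims=True)        # numerical stability
    new_pi = np.exp(log_pi)
    return new_pi / new_pi.sum(axis=1, keepdims=True)

# Toy example: 2 states, 3 actions, uniform initial policy, fixed Q.
pi = np.full((2, 3), 1.0 / 3.0)
Q = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.5, 1.0]])
for _ in range(50):
    pi = policy_mirror_ascent_step(pi, Q, eta=0.5, tau=0.1)
print(pi.round(3))   # for fixed Q, iterates converge to softmax(Q / tau)
```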
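The differentiable Markowitz layer of the third part can be prototyped directly with cvxpy and cvxpylayers. The sketch below is an assumed, self-contained toy version, not the thesis's architecture: products of parameters (e.g., risk aversion times the covariance factor) are formed in torch so the declared problem stays DPP-compliant, and gradients flow through the solver to the trainable coefficients. Dimensions, constraint bounds, and variable names are illustrative.

```python
import cvxpy as cp
import torch
from cvxpylayers.torch import CvxpyLayer

n = 5                                     # number of hedging instruments

# Declare the Markowitz QP once; trainable pieces enter as cp.Parameters.
w      = cp.Variable(n)                   # hedge weights to solve for
trade  = cp.Variable(n)                   # turnover, kept as its own variable
mu     = cp.Parameter(n)                  # expected P&L per instrument
R      = cp.Parameter((n, n))             # sqrt(gamma) * Sigma^{1/2}, formed in torch
c      = cp.Parameter(n, nonneg=True)     # transaction-cost coefficients
w_prev = cp.Parameter(n)                  # previous position

objective = cp.Maximize(mu @ w
                        - cp.sum_squares(R @ w)        # variance penalty
                        - c @ cp.abs(trade))           # proportional trading cost
constraints = [trade == w - w_prev,
               cp.sum(w) == 1,                         # budget
               cp.abs(w) <= 0.25]                      # concentration limit
layer = CvxpyLayer(cp.Problem(objective, constraints),
                   parameters=[mu, R, c, w_prev], variables=[w])

# Trainable coefficients live in torch; the product of risk aversion and the
# covariance factor is computed here, keeping the declared problem DPP-compliant.
log_gamma = torch.zeros(1, requires_grad=True)         # risk aversion, log-space
log_cost  = torch.zeros(n, requires_grad=True)         # cost coefficients, log-space

Sigma_sqrt = torch.eye(n)                              # stand-in covariance factor
mu_t       = torch.randn(n, requires_grad=True)        # would come from the network
w_prev_t   = torch.full((n,), 1.0 / n)

R_t = torch.exp(0.5 * log_gamma) * Sigma_sqrt
(w_star,) = layer(mu_t, R_t, torch.exp(log_cost), w_prev_t)
w_star.sum().backward()                 # gradients reach log_gamma and log_cost
```

Keeping the turnover as an explicit variable (`trade == w - w_prev`) is what makes the cost term a parameter times a parameter-free expression, which the DPP rules allow; writing `c @ cp.abs(w - w_prev)` directly would mix two parameters in one product and fail the DPP check.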