Theory and Practice of Large-Scale Logistics: Offline Contextual Bandits and Decomposition Methods
This dissertation addresses two problems in large-scale logistics, both motivated by first-hand experience in industry and military settings. The first part focuses on offline contextual bandits and draws on work at Uber on driver incentive programs. I introduce Empirical Soft Regret (ESR), a novel loss function for value-based learning that addresses the limitations of accuracy-based approaches in misspecified settings. Unlike accuracy-based methods, which can fail when the reward model is misspecified, ESR provably yields policies that asymptotically achieve optimal performance while remaining compatible with gradient-based optimization. I demonstrate the approach on health datasets, news recommendation, and computational materials science. The second part addresses the simultaneous routing and scheduling of commodity deliveries across intermodal networks. In collaboration with the United States Marine Corps and Navy, I develop a mixed-integer programming formulation for expeditionary warfare logistics that captures the physical constraints placed on the network. To overcome the computational limits of solving this formulation directly, I propose an efficient solution method based on dual decomposition, which uses Lagrangian duality to split the problem into smaller, computationally tractable subproblems. Together, the two parts connect operations research theory and practice, showing how theoretical foundations can be translated into practical solutions for complex real-world logistics challenges.
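For readers unfamiliar with dual decomposition, the following is a generic textbook sketch of the idea, not the formulation developed in the dissertation. All symbols here are illustrative assumptions: $K$ blocks of variables $x_k$ with local feasible sets $X_k$ and costs $f_k$, coupled only through shared constraints $\sum_k A_k x_k \le b$, with multipliers $\lambda$ and step sizes $\alpha_t$.

```latex
% Coupled problem (assumed generic form):
\[
\min_{x_1,\dots,x_K} \; \sum_{k=1}^{K} f_k(x_k)
\quad \text{s.t.} \quad \sum_{k=1}^{K} A_k x_k \le b, \qquad x_k \in X_k .
\]
% Relaxing the coupling constraints with multipliers \lambda \ge 0
% makes the Lagrangian separate across blocks:
\[
L(\lambda) \;=\; -\lambda^{\top} b \;+\; \sum_{k=1}^{K}
\min_{x_k \in X_k} \Bigl[ f_k(x_k) + \lambda^{\top} A_k x_k \Bigr],
\qquad \lambda \ge 0 .
\]
% Each inner minimization is an independent subproblem. The dual
% \max_{\lambda \ge 0} L(\lambda) can be approached by projected
% subgradient steps, where x_k^{t} solves subproblem k at \lambda^{t}:
\[
\lambda^{t+1} \;=\; \Bigl[ \lambda^{t} + \alpha_t \Bigl( \sum_{k=1}^{K} A_k x_k^{t} - b \Bigr) \Bigr]_{+} .
\]
```

For any $\lambda \ge 0$, $L(\lambda)$ is a lower bound on the optimal value of the coupled problem. With integer variables the bound need not be tight, so a feasible solution generally has to be recovered by a separate step.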