THREE ESSAYS ON DISCRETE CHOICE LATENT CLASS MODELS, ONLINE REUSABLE RESOURCE ALLOCATION, AND TRANSFORMER MODELS FOR OFFLINE REINFORCEMENT LEARNING
My dissertation comprises three parts on decision models and strategies in different contexts.

The first part studies class assignment strategies for discrete choice latent class models and analyzes the statistical behavior of six competing strategies. A Monte Carlo study and two empirical case studies show that assigning individuals to classes by maximum multinomial logit probability performs better than randomly drawn class assignments for market share prediction. While randomly drawn classes predict class shares more accurately, class assignment based on individual-level conditional estimates that account for the sampling distribution of the assignment parameters performs best when the number of choice occasions per individual is large.

The second part considers the widely studied reusable-resource allocation problem: a principal allocates up to $C$ identical resources (or servers) in an online manner to arriving agents, each of whom occupies a server for $D$ periods and offers a random reward sampled from a known underlying process. In contrast to past work, we demonstrate policies that converge exponentially fast to the hindsight-optimal value in high-dimensional regimes where $C$ and $D$ are large, under discrete-valued arrival processes (including time-varying arrivals). Our results rest on a novel combination of receding-horizon control and recent sample-path coupling techniques; this approach may prove useful for establishing similar exponential-convergence and state-space-collapse results in more complex control settings.

The third part proposes the Re-evaluated Transformer model for model-free offline reinforcement learning, designed both to learn state values better and to address data coverage issues. The model incorporates a novel and concise Re-evaluation mechanism into the sequence model, which leverages the trained model to predict and update state values. Extensive experiments on multiple offline RL benchmarks demonstrate state-of-the-art performance.