Computational and Statistical Properties of Optimal Transport-Based Distances
Over the past two decades, the statistical and computational properties of optimal transport (OT) have been systematically studied, driven in part by OT's broad applicability across data science, statistics, economics, physics, and other fields. This body of work has identified the curse of dimensionality inherent in statistical estimation with OT distances and has led the development of computationally and statistically efficient regularizations of OT. The first contribution of this thesis is a general framework for deriving limit distributions and other asymptotic properties of empirical regularized OT distances. While OT distances enable a natural comparison between distributions on a common space, comparing datasets of different types -- such as text and images -- requires specifying an ad hoc cost function, which may fail to capture a meaningful correspondence between data points. To address this issue, Gromov-Wasserstein (GW) distances have been proposed, enabling a comparison of metric measure spaces based on their intrinsic structure. As a result, GW distances have found widespread use in applications involving heterogeneous data. The second contribution of this thesis is to establish the first limit laws for empirical GW distances and to derive consistent resampling schemes. Given the broad applicability of GW distances, their computation is of a particular interest. However, existing algorithms for computing regularized GW distances lack comprehensive convergence rate guarantees. To address this gap, the final contribution of this thesis is to introduce the first algorithms for computing regularized GW distances subject to formal non-asymptotic convergence rate guarantees.