Algorithmic techniques for optimal collective communication
Improving the performance of collective communication---algorithms for coordinated data movement in distributed computing---becomes increasingly important as large-scale machine learning clusters continue to grow. In this work, we introduce techniques for synthesizing collective communication algorithms that are optimal for any given host and network topology. We present a novel multicommodity flow-based approach that finds provably optimal algorithms with respect to both latency and bandwidth utilization. Our techniques rest on two key innovations: a method for converting fluid (continuous-flow) solutions into discrete step-based schedules, and Mirrored Dantzig-Wolfe, which exploits the high degree of symmetry often present in communication networks. We complement our theoretical contributions with experimental results demonstrating the practical efficacy of the synthesized algorithms.