Mapping Neural Network Inference Onto Heterogeneous Hardware Platforms
Abstract
Datacenters are evolving towards heterogeneity, incorporating specialized hardware for tasks such as networking, video processing, and, in particular, deep learning. To effectively harness the compute capabilities of modern heterogeneous datacenters, this thesis proposes an approach for compiler-level partitioning of deep neural networks (DNNs) across interconnected hardware devices. We present a comprehensive framework for heterogeneous DNN compilation that performs automatic partitioning and device mapping. Our scheduler integrates an exact solver, based on a mixed-integer linear programming (MILP) formulation, with a modularity-based heuristic for scalability. Additionally, we introduce a theoretical lower-bound formula for assessing the quality of heuristic solutions, enabling us to gauge how close they come to the optimum. We evaluate the proposed scheduler by optimizing both traditional DNNs and randomly wired neural networks under latency and throughput constraints. Our experiments are conducted on a heterogeneous system consisting of a CPU and two distinct GPUs. Compared to simply running DNNs on the fastest GPU, our framework achieves latency reductions of over 3
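
To make the MILP component concrete, the following is a minimal sketch of one way a device-mapping problem of this kind can be formulated, assuming a linear operator chain so that end-to-end latency is the sum of per-operator compute times plus a transfer cost wherever consecutive operators land on different devices. The operator names, device names, and all cost values are hypothetical placeholders, not the thesis's actual model or data, and the solver used is the open-source CBC backend bundled with PuLP.

    # Minimal MILP sketch for mapping a chain of DNN operators onto devices.
    # All costs below are illustrative placeholders, not measured values.
    import pulp

    ops = ["conv1", "conv2", "fc"]          # operator chain, in execution order
    devices = ["cpu", "gpu0", "gpu1"]

    # compute_time[op][dev]: per-op latency on each device (hypothetical)
    compute_time = {
        "conv1": {"cpu": 9.0, "gpu0": 2.0, "gpu1": 3.0},
        "conv2": {"cpu": 8.0, "gpu0": 2.5, "gpu1": 2.0},
        "fc":    {"cpu": 1.0, "gpu0": 1.5, "gpu1": 1.5},
    }
    # transfer cost paid when consecutive ops run on different devices
    transfer_cost = {("conv1", "conv2"): 4.0, ("conv2", "fc"): 0.5}

    prob = pulp.LpProblem("dnn_partitioning", pulp.LpMinimize)

    # x[op][dev] = 1 iff `op` is placed on `dev`
    x = {o: {d: pulp.LpVariable(f"x_{o}_{d}", cat="Binary") for d in devices}
         for o in ops}
    # cut[(u, v)] = 1 iff consecutive ops u and v run on different devices
    cut = {e: pulp.LpVariable(f"cut_{e[0]}_{e[1]}", cat="Binary")
           for e in transfer_cost}

    # each op is assigned to exactly one device
    for o in ops:
        prob += pulp.lpSum(x[o][d] for d in devices) == 1
    # standard linearization forcing cut = 1 when assignments differ
    for (u, v) in transfer_cost:
        for d in devices:
            prob += cut[(u, v)] >= x[u][d] - x[v][d]
            prob += cut[(u, v)] >= x[v][d] - x[u][d]

    # objective: total compute time plus transfer cost at device boundaries
    prob += (pulp.lpSum(compute_time[o][d] * x[o][d]
                        for o in ops for d in devices)
             + pulp.lpSum(transfer_cost[e] * cut[e] for e in transfer_cost))

    prob.solve(pulp.PULP_CBC_CMD(msg=0))
    for o in ops:
        placed = next(d for d in devices if pulp.value(x[o][d]) > 0.5)
        print(f"{o} -> {placed}")
    print("latency:", pulp.value(prob.objective))

Note that this sketch minimizes a sum of costs, which equals end-to-end latency only for a sequential chain; a general DNN graph, as handled by the framework described above, would require additional precedence and makespan constraints.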