Mapping Neural Network Inference Onto Heterogeneous Hardware Platforms

Datacenters are becoming increasingly heterogeneous, incorporating specialized hardware for networking, video processing, and especially deep learning. To harness the compute capabilities of modern heterogeneous datacenters, this thesis proposes compiler-level partitioning of deep neural networks (DNNs) across interconnected hardware devices. We present a framework for heterogeneous DNN compilation that performs automatic partitioning and device mapping. Our scheduler combines an exact solver, based on a mixed integer linear programming (MILP) formulation, with a modularity-based heuristic for scalability. We also introduce a theoretical lower bound that lets us assess how close heuristic solutions are to optimal. We evaluate the scheduler by optimizing both traditional DNNs and randomly-wired neural networks under latency and throughput constraints. Our experiments run on a heterogeneous system consisting of a CPU and two distinct GPUs. Compared to running DNNs on the fastest GPU alone, our framework achieves latency reductions of over 3× and throughput improvements of up to 2.9× by automatically leveraging both data and model parallelism. Furthermore, our modularity-based "splitting" heuristic reduces the time to find a solution by up to 395× without compromising solution quality relative to the exact MILP approach, and it outperforms alternative heuristic baselines by 30-60% in solution quality. Lastly, we present two case studies that demonstrate the capabilities of our scheduler. The first investigates performance in memory-constrained environments; the second extends our framework to schedule large language models across multiple heterogeneous servers by exploiting symmetry in the hardware setup.
Overall, this research contributes to the efficient deployment of DNNs in heterogeneous datacenters through compiler-level partitioning, showcasing improved latency, throughput, and solution scalability.
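
To make the device-mapping problem concrete, the following is a minimal illustrative sketch and not the thesis's MILP formulation or heuristic. It assumes a simple chain of layers, a hypothetical per-layer, per-device compute-time table (`compute_ms`), and a single fixed transfer cost (`transfer_ms`) paid whenever consecutive layers sit on different devices. It enumerates every assignment exhaustively, which is only feasible for tiny networks and is one reason an exact solver or heuristic is needed at scale.

```python
from itertools import product


def best_mapping(compute_ms, transfer_ms):
    """Exhaustively search layer-to-device assignments for a layer chain.

    compute_ms[i][d] is the (assumed) time of layer i on device d.
    transfer_ms is an (assumed) fixed cost paid each time two
    consecutive layers are placed on different devices.
    Returns (latency, assignment) with the lowest end-to-end latency.
    """
    n_layers = len(compute_ms)
    n_devices = len(compute_ms[0])
    best = None
    for assignment in product(range(n_devices), repeat=n_layers):
        # Sum of compute time on the chosen device for each layer.
        latency = sum(compute_ms[i][d] for i, d in enumerate(assignment))
        # Add one transfer penalty per device switch along the chain.
        latency += sum(
            transfer_ms
            for a, b in zip(assignment, assignment[1:])
            if a != b
        )
        if best is None or latency < best[0]:
            best = (latency, assignment)
    return best


# Hypothetical timings: 3 layers, 2 devices (values are made up).
toy_compute_ms = [[4.0, 1.0], [1.0, 3.0], [5.0, 1.0]]
toy_result = best_mapping(toy_compute_ms, transfer_ms=0.5)
print(toy_result)
```

In this toy instance, splitting the chain across both devices beats placing every layer on either single device, which is the kind of trade-off between compute speed and transfer cost that a partitioning scheduler has to weigh.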

77 pages


Deep learning; Heterogeneous compilation; Scheduling problem


Committee Chair

Abdelfattah, Mohamed

Committee Member

Zhang, Zhiru

Degree Discipline

Electrical and Computer Engineering

Degree Name

M.S., Electrical and Computer Engineering

Degree Level

Master of Science

dissertation or thesis
