Cornell University Library

eCommons


Mapping Neural Network Inference Onto Heterogeneous Hardware Platforms

File(s)
Ghannane_cornell_0058O_11853.pdf (1.73 MB)
Permanent Link(s)
https://doi.org/10.7298/brpz-5c38
https://hdl.handle.net/1813/114448
Collections
Cornell Theses and Dissertations
Author
Ghannane, Yassine
Abstract

Datacenters are evolving towards heterogeneity, incorporating specialized hardware for tasks such as networking, video processing, and particularly deep learning. To effectively harness the compute capabilities of modern heterogeneous datacenters, this thesis proposes an approach for compiler-level partitioning of deep neural networks (DNNs) across interconnected hardware devices. We present a comprehensive framework for heterogeneous DNN compilation, offering automatic partitioning and device mapping. Our scheduler integrates an exact solver, based on a mixed integer linear programming (MILP) formulation, with a modularity-based heuristic for scalability. Additionally, we introduce a theoretical lower-bound formula that assesses the quality of heuristic solutions by bounding how far they can lie from the optimum. We evaluate the proposed scheduler by optimizing both traditional DNNs and randomly-wired neural networks under latency and throughput constraints. Our experiments are conducted on a heterogeneous system consisting of a CPU and two distinct GPUs. Compared to simply running DNNs on the fastest GPU, our framework achieves latency reductions of over 3× and throughput improvements of up to 2.9× by automatically leveraging both data and model parallelism. Furthermore, our modularity-based "splitting" heuristic reduces solver runtime by up to 395× without compromising solution quality relative to the exact MILP approach, and it outperforms alternative heuristic baselines by 30-60% in solution quality. Lastly, we present two case studies to demonstrate the capabilities of our scheduler: the first investigates performance in memory-constrained environments, while the second extends our framework to scheduling large language models across multiple heterogeneous servers by leveraging symmetry in the hardware setup.
Overall, this research contributes to the efficient deployment of DNNs in heterogeneous datacenters through compiler-level partitioning, showcasing improved latency, throughput, and solution scalability.
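The core optimization the abstract describes — choosing a device for each DNN layer so that compute and inter-device transfer costs are minimized — can be illustrated with a toy sketch. This is not the thesis's MILP formulation; it is a minimal stand-in that solves a tiny linear-chain instance by exhaustive search, with entirely hypothetical per-layer compute times and a flat transfer penalty:

```python
from itertools import product

# Hypothetical per-layer compute times (ms): rows are layers of a
# linear-chain DNN, columns are devices (CPU, GPU0, GPU1).
# All numbers here are illustrative, not measurements from the thesis.
compute = [
    [4.0, 1.0, 2.0],
    [6.0, 2.0, 1.5],
    [3.0, 1.2, 1.0],
    [5.0, 1.8, 2.2],
]
TRANSFER = 0.5  # assumed ms penalty when consecutive layers change device

def latency(assignment):
    """End-to-end latency of the chain under a layer -> device assignment."""
    total = sum(compute[i][d] for i, d in enumerate(assignment))
    # Pay a transfer cost at every boundary where the device changes.
    total += sum(TRANSFER for a, b in zip(assignment, assignment[1:]) if a != b)
    return total

def best_mapping(n_devices=3):
    """Brute-force the optimal mapping; an MILP solver does this at scale."""
    return min(product(range(n_devices), repeat=len(compute)), key=latency)

mapping = best_mapping()
print(mapping, latency(mapping))
```

For this toy instance the transfer penalty outweighs the small per-layer gains from mixing devices, so the optimum keeps every layer on one GPU; with larger transfer-free sub-chains or throughput (pipeline) objectives, split mappings win, which is the regime the thesis's scheduler targets.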

Description
77 pages
Date Issued
2023-08
Keywords
Deep learning
Heterogeneous compilation
Scheduling problem
Committee Chair
Abdelfattah, Mohamed
Committee Member
Zhang, Zhiru
Degree Discipline
Electrical and Computer Engineering
Degree Name
M.S., Electrical and Computer Engineering
Degree Level
Master of Science
Type
dissertation or thesis
Link(s) to Catalog Record
https://newcatalog.library.cornell.edu/catalog/16219219
