Cornell University Library

eCommons

Algorithm-Accelerator Co-Design for High-performance and Secure Deep Learning

File(s)
Hua_cornellgrad_0058_13376.pdf (4.87 MB)
Permanent Link(s)
https://doi.org/10.7298/187f-ce12
https://hdl.handle.net/1813/112932
Collections
Cornell Theses and Dissertations
Author
Hua, Weizhe
Abstract

Deep learning has emerged as a new engine for many of today's artificial intelligence and machine learning systems, leading to several recent breakthroughs in vision and natural language processing tasks. However, as we move into the era of deep learning models with billions and even trillions of parameters, meeting the computational and memory requirements to train and serve state-of-the-art models has become extremely challenging. Reducing the computational cost and memory footprint of deep learning models to improve system performance is critical to the widespread deployment of deep learning. Moreover, a massive amount of sensitive and private user data is exposed to the deep learning system during training and serving. It is therefore essential to investigate potential vulnerabilities in existing deep learning hardware and to design secure deep learning systems that provide strong privacy guarantees for both user data and the models that learn from it. In this dissertation, we propose co-designing deep learning algorithms and hardware architecture techniques to improve both the performance and the security/privacy of deep learning systems. For high-performance deep learning, we first introduce the channel gating neural network (CGNet), which exploits input-dependent dynamic sparsity to reduce the computation of convolutional neural networks. We also co-develop an ASIC accelerator for CGNet that turns the theoretical FLOP reduction into wall-clock speedup. Second, we present Fast Linear Attention with a Single Head (FLASH), a state-of-the-art language model designed specifically for Google's TPU that achieves Transformer-level quality with complexity linear in the sequence length.
In our empirical studies on masked language modeling, auto-regressive language modeling, and fine-tuning for question answering, FLASH achieves quality comparable to, if not better than, the augmented Transformer while being significantly faster (up to 12 times). On the security side, we study the side-channel vulnerabilities of existing deep learning accelerators. We then introduce GuardNN, a secure accelerator architecture for privacy-preserving deep learning. GuardNN provides a trusted execution environment (TEE) with specialized protection for deep learning and achieves both a small trusted computing base and low protection overhead. The FPGA prototype of GuardNN incurs a maximum performance overhead of 2.4% across four modern DNN models for ImageNet.
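To make the channel gating idea behind CGNet more concrete, here is a minimal Python sketch. It is not taken from the dissertation and does not reproduce CGNet's actual gate (normalization, learned thresholds, channel grouping, convolution). It assumes a fully connected layer standing in for a convolution, a single fixed threshold, and a fixed split of the input channels into a "base" group and a "conditional" group; the function name channel_gated_layer and its parameters are hypothetical. For each output, a partial sum over the base channels is always computed, and the remaining channels are accumulated only when that partial sum exceeds the threshold.

```python
from typing import List, Tuple


def channel_gated_layer(
    x: List[float],
    weights: List[List[float]],
    num_base: int,
    threshold: float,
) -> Tuple[List[float], int]:
    """Hypothetical channel-gated fully connected layer (illustration only).

    Each output first accumulates over the first `num_base` input channels
    (the base path). Only if that partial sum exceeds `threshold` are the
    remaining input channels (the conditional path) accumulated as well.
    Returns the ReLU outputs and the number of multiply-accumulates done.
    """
    outputs: List[float] = []
    macs = 0
    for row in weights:
        # Base path: always computed for every output.
        partial = sum(w * a for w, a in zip(row[:num_base], x[:num_base]))
        macs += num_base
        if partial > threshold:
            # Gate open: finish the accumulation over the remaining channels.
            rest = sum(w * a for w, a in zip(row[num_base:], x[num_base:]))
            macs += len(x) - num_base
            total = partial + rest
        else:
            # Gate closed: skip the conditional channels, keep the partial sum.
            total = partial
        outputs.append(max(total, 0.0))  # ReLU activation
    return outputs, macs


x = [1.0, 2.0, 3.0, 4.0]
weights = [[1.0, 1.0, 1.0, 1.0], [-1.0, -1.0, 1.0, 1.0]]
outputs, macs = channel_gated_layer(x, weights, num_base=2, threshold=0.0)
print(outputs, macs)  # [10.0, 0.0] 6
```

In this toy input the second output's gate stays closed, so the layer performs 6 of the 8 multiply-accumulates a dense layer would need. The gated output for that neuron (0.0) differs from the dense result (4.0), so gating trades some accuracy for reduced computation. The skipped work is also irregular and input dependent, which is why dedicated hardware support, such as the co-designed ASIC accelerator described above, is needed to turn the FLOP reduction into actual speedup.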

Description
176 pages
Date Issued
2022-12
Committee Chair
Suh, Edward
Committee Member
Zhang, Zhiru
De Sa, Christopher
Degree Discipline
Electrical and Computer Engineering
Degree Name
Ph. D., Electrical and Computer Engineering
Degree Level
Doctor of Philosophy
Rights
Attribution 4.0 International
Rights URI
https://creativecommons.org/licenses/by/4.0/
Type
dissertation or thesis
Link(s) to Catalog Record
https://newcatalog.library.cornell.edu/catalog/15644063