Trainable Fixed-Point Quantization for Deep Learning Acceleration on FPGAs

File(s)
Dai_cornell_0058O_11992.pdf (613.82 KB)
Permanent Link(s)
https://doi.org/10.7298/09q4-1e04
https://hdl.handle.net/1813/115806
Collections
Cornell Theses and Dissertations
Author
Dai, Dingyi
Abstract

This thesis introduces QFX, a novel fixed-point quantization-aware training tool designed to bridge the gap between trained and deployed deep learning models on resource-constrained devices such as embedded FPGAs. Conventional quantization techniques primarily quantize only the matrix multiplications in deep learning models during training, necessitating extensive fine-tuning to bring the remaining layers to fixed-point precision for FPGA deployment. QFX addresses this problem by enabling the training of “hardware-ready” models: it emulates the fixed-point casting function and basic arithmetic operations, and dynamically learns the binary-point position during training. At deployment time, the model's fixed-point operations transition seamlessly to their synthesizable counterparts supported by HLS, eliminating numerical issues. Performance is evaluated by classification accuracy on practical image datasets. Moreover, the thesis introduces K-hot, a multiplier-free quantization strategy within QFX designed to minimize DSP usage; its effectiveness is demonstrated by integrating it with a state-of-the-art binarized neural network accelerator, yielding improved hardware performance on an embedded FPGA. In summary, this thesis presents a tool that simplifies the quantization process, enabling the training of high-quality fixed-point quantized models with reduced quantization overhead. The tool is intended for open-source release and can broaden the audience for efficient machine learning by reducing the quantization expertise required. Applying multiplier-free quantization to binarized neural networks also points to future possibilities for deploying extremely compressed deep learning models on resource-constrained devices.
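The abstract's two core mechanisms can be illustrated with short sketches. First, a minimal PyTorch sketch of fixed-point emulation with a learnable binary-point position, using a straight-through estimator (STE) through the rounding step; the class name, parameters, and LSQ-style gradient path are illustrative assumptions, not QFX's actual API:

```python
import torch

class FixedPointEmulator(torch.nn.Module):
    """Emulate fixed-point casting with a learnable binary-point position.

    Illustrative sketch only; names and defaults are assumptions."""

    def __init__(self, total_bits=8, init_frac_bits=4.0):
        super().__init__()
        self.total_bits = total_bits
        # Continuous fraction-bit count, learned by gradient descent and
        # rounded to an integer at use time (STE keeps it trainable).
        self.frac_bits = torch.nn.Parameter(torch.tensor(init_frac_bits))

    def forward(self, x):
        fb = self.frac_bits + (self.frac_bits.round() - self.frac_bits).detach()
        step = 2.0 ** (-fb)  # quantization step implied by the binary point
        qmax = 2.0 ** (self.total_bits - 1) - 1
        qmin = -(2.0 ** (self.total_bits - 1))
        v = torch.clamp(x / step, qmin, qmax)
        v_q = v + (torch.round(v) - v).detach()  # STE through rounding
        return v_q * step  # gradients reach frac_bits through `step`
```

Second, a hedged sketch of the K-hot idea: approximating each weight as a sum of K signed powers of two, so every multiply becomes K shift-and-add operations and no DSP multipliers are needed. The greedy residual-fitting scheme below is one plausible realization under that assumption, not necessarily the thesis's algorithm:

```python
import torch

def k_hot_quantize(w, k=2, frac_bits=6):
    """Greedily fit each weight with k signed powers of two (assumed scheme)."""
    with torch.no_grad():
        residual = w.clone()
        approx = torch.zeros_like(w)
        for _ in range(k):
            # Exponent of the power of two nearest (in log space) to the residual.
            mag = residual.abs().clamp(min=2.0 ** (-frac_bits))
            exp = torch.round(torch.log2(mag)).clamp(min=-frac_bits)
            term = torch.sign(residual) * 2.0 ** exp  # zero residual -> zero term
            approx = approx + term
            residual = residual - term
    # Straight-through estimator so training can backpropagate past the quantizer.
    return w + (approx - w).detach()
```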

Description
48 pages
Date Issued
2024-05
Keywords
Acceleration • Deep Learning • Fixed-point • FPGA • Quantization
Committee Chair
Zhang, Zhiru
Committee Member
Abdelfattah, Mohamed
Degree Discipline
Electrical and Computer Engineering
Degree Name
M.S., Electrical and Computer Engineering
Degree Level
Master of Science
Rights
Attribution-NoDerivatives 4.0 International
Rights URI
https://creativecommons.org/licenses/by-nd/4.0/
Type
dissertation or thesis
Link(s) to Catalog Record
https://newcatalog.library.cornell.edu/catalog/16575437
