Architectural Frameworks for Automated Design and Optimization of Hardware Accelerators
Abstract
As technology scaling slows down and provides only diminishing improvements in general-purpose processor performance, computing systems are increasingly relying on customized accelerators to meet the performance and energy efficiency requirements of emerging applications. For example, today's mobile SoCs rely on accelerators to perform compute-intensive tasks, and datacenters are starting to deploy accelerators for applications such as web search and machine learning. This trend is expected to continue, and future systems will contain more specialized accelerators. However, the traditional hardware-oriented accelerator design methodology is costly and inefficient because it requires significant manual effort in the design process. This development model is unsustainable in a future where a wide variety of accelerators will need to be designed for a large number of applications. To solve this problem, the development cost of accelerators must be drastically reduced, which calls for more productive design methodologies that can create high-quality accelerators with low manual effort.

This thesis addresses the above challenge with architectural frameworks that combine novel accelerator architectures with automated design and optimization frameworks, enabling the design of high-performance and energy-efficient accelerators with minimal manual effort.

The first part of the thesis proposes a framework for automatically generating accelerators that can effectively tolerate long, variable memory latencies, which improves performance and reduces design effort by removing the need to manually create data preloading logic. The framework leverages architectural mechanisms such as memory prefetching and access/execute decoupling, together with automated compiler analysis, to generate accelerators that intelligently preload data needed in the future from main memory.

The second part of the thesis proposes a framework for building parallel accelerators that leverages concepts from task-based parallel programming, enabling software programmers to quickly create high-performance accelerators using familiar parallel programming paradigms, without requiring low-level hardware design expertise. The framework uses a computation model that supports dynamic parallelism in addition to static parallelism, and includes a flexible architecture with dynamic scheduling, making it possible to map a wide range of parallel applications to hardware accelerators and achieve good performance. In addition, we designed a unified language that can be mapped to both software and hardware, enabling programmers to create parallel software and parallel accelerators in a single framework.

The third part of the thesis proposes a framework that enables accelerators to perform intelligent dynamic voltage and frequency scaling (DVFS) to achieve good energy efficiency for interactive and real-time applications. The framework combines program analysis and machine learning to train predictors that accurately estimate the computation time needed for each job and adjust DVFS levels accordingly to reduce energy consumption.
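To illustrate the access/execute decoupling idea mentioned in the first part, the following is a minimal software analogy, not the thesis framework itself: an "access" unit (here a thread) streams operands into a bounded decoupling queue ahead of an "execute" unit, so memory latency is overlapped with computation. The queue depth, data, and structure are all hypothetical.

```cpp
// Software analogy of access/execute decoupling (illustrative sketch only).
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

int main() {
    std::vector<int> memory(1 << 16, 1);   // stand-in for off-chip data
    std::queue<int> q;                     // decoupling queue between the units
    std::mutex m;
    std::condition_variable cv;
    const std::size_t kDepth = 64;         // queue depth bounds how far access runs ahead
    bool done = false;

    std::thread access([&] {               // access unit: issues "loads" early
        for (int v : memory) {
            std::unique_lock<std::mutex> lk(m);
            cv.wait(lk, [&] { return q.size() < kDepth; });
            q.push(v);
            cv.notify_all();
        }
        std::lock_guard<std::mutex> lk(m);
        done = true;
        cv.notify_all();
    });

    long long sum = 0;                     // execute unit: consumes operands as they arrive
    while (true) {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [&] { return !q.empty() || done; });
        if (q.empty() && done) break;
        sum += q.front();
        q.pop();
        cv.notify_all();
    }
    access.join();
    std::printf("sum = %lld\n", sum);
    return 0;
}
```

In a generated accelerator the two sides would be hardware units connected by a FIFO rather than threads, but the latency-hiding effect of running the access side ahead is the same.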
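The dynamic parallelism supported by the second part's computation model can be pictured with a small sketch in which tasks are spawned at runtime, so the amount of parallelism depends on the input rather than being fixed statically. The use of std::async and the recursive-sum example below are illustrative assumptions, not the thesis's task API.

```cpp
// Hypothetical sketch of dynamic task spawning (not the thesis's language or API).
#include <cstdio>
#include <future>
#include <numeric>
#include <vector>

// Recursively split the range, spawning a child task for one half at runtime.
long long parallel_sum(const std::vector<int>& v, std::size_t lo, std::size_t hi) {
    if (hi - lo <= 4096)                        // leaf task: compute directly
        return std::accumulate(v.begin() + lo, v.begin() + hi, 0LL);
    std::size_t mid = lo + (hi - lo) / 2;
    auto left = std::async(std::launch::async, parallel_sum,
                           std::cref(v), lo, mid);   // dynamically spawned child task
    long long right = parallel_sum(v, mid, hi);      // continue in the current task
    return left.get() + right;                       // synchronize with the child
}

int main() {
    std::vector<int> data(1 << 16, 1);
    std::printf("sum = %lld\n", parallel_sum(data, 0, data.size()));
    return 0;
}
```

A hardware realization would replace the thread-backed futures with a dynamic scheduler that dispatches spawned tasks to processing elements, which is what lets data-dependent parallelism map onto the accelerator.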
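The third part's prediction-driven DVFS can be summarized as: predict how long the next job will take, then pick the lowest voltage/frequency level that still meets its deadline. The sketch below uses a placeholder linear model with made-up coefficients and frequency levels purely to show that selection logic; the thesis's actual predictor and feature set are not reproduced here.

```cpp
// Illustrative deadline-aware DVFS selection with an assumed, offline-trained linear model.
#include <array>
#include <cstdio>

struct Job { double features[2]; double deadline_ms; };

// Placeholder model: predicted_cycles = w . features + b (coefficients are assumptions)
double predict_cycles(const Job& j) {
    const double w[2] = {5.0e4, 2.0e3};
    const double b = 1.0e6;
    return w[0] * j.features[0] + w[1] * j.features[1] + b;
}

// Pick the lowest frequency (MHz) whose predicted latency still meets the deadline.
double choose_frequency(const Job& j) {
    const std::array<double, 4> freqs_mhz = {400, 800, 1200, 1600};
    double cycles = predict_cycles(j);
    for (double f : freqs_mhz) {
        double latency_ms = cycles / (f * 1e3);   // cycles divided by cycles-per-ms
        if (latency_ms <= j.deadline_ms) return f;
    }
    return freqs_mhz.back();                      // fall back to the maximum frequency
}

int main() {
    Job j{{10.0, 200.0}, 2.0};
    std::printf("chosen frequency: %.0f MHz\n", choose_frequency(j));
    return 0;
}
```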
Committee Member
Zhang, Zhiru