Decoupling Algorithm from Hardware Customizations for Software-Defined Reconfigurable Computing

Author
Lai, Yi-Hsiang
Abstract
In the pursuit of higher compute performance under strict power constraints, there is an increasing need to deploy applications on heterogeneous hardware architectures with spatial accelerators such as FPGAs. However, although these heterogeneous computing platforms are becoming widely available, they remain very difficult to program, especially when they include FPGAs. As a result, the use of such platforms has been limited to a small subset of programmers with specialized hardware knowledge. In this dissertation, we first provide a taxonomy of the essential techniques for building a high-performance FPGA accelerator, which requires customizations of the compute engines, memory hierarchy, and data representations. We also summarize a rich spectrum of work on programming abstractions and optimizing compilers that provide different trade-offs between performance and productivity. Next, we present SuSy, a programming framework composed of a domain-specific language (DSL) and a compilation flow that enables programmers to productively build high-performance systolic arrays on FPGAs. With SuSy, programmers express the design functionality in the form of uniform recurrence equations (UREs). The URE description in SuSy is followed by a set of decoupled spatial mapping primitives that specify how to map the equations to a spatial architecture. More concretely, programmers can apply space-time transformations and several other memory and I/O optimizations to build a highly efficient systolic architecture. After that, we present HeteroCL, an open-source programming infrastructure composed of a Python-based domain-specific language and an FPGA-targeted compilation flow. Similar to SuSy, HeteroCL cleanly decouples algorithm specifications from three important types of hardware customization: compute, data types, and memory architectures.
In addition, HeteroCL produces highly efficient hardware implementations for a variety of popular workloads by targeting spatial architecture templates such as systolic arrays and stencils with dataflow architectures. Finally, we introduce DrTrace, a trace-based online profiling technique that enables automated validation and recommendation for application-specific data reuse. Unlike existing work that relies on static analysis, our proposed technique can infer stencil operations from programs with data-dependent memory accesses. Moreover, by integrating the proposed profiling technique with HeteroCL, the stencil operations can be mapped to efficient hardware such as dataflow pipelines with line buffers.
Description
145 pages
Date Issued
2022-05
Committee Chair
Zhang, Zhiru
Committee Member
Suh, Edward; Sampson, Adrian
Degree Discipline
Electrical and Computer Engineering
Degree Name
Ph. D., Electrical and Computer Engineering
Degree Level
Doctor of Philosophy
Type
dissertation or thesis