Reconfigurable Architectures For Chip Multiprocessors
Prior research in chip-level reconfigurable computing has involved augmenting a single processor core with reconfigurable logic. Despite significant performance gains for some applications, the area and power costs can easily outweigh the benefits, especially given the breadth of applications run on a general-purpose processor, whose gains from reconfigurable logic range from orders of magnitude to none at all. Moreover, this prior work focused almost exclusively on uniprocessor systems and did not address the unique requirements of parallel applications. This dissertation proposes novel reconfigurable architectures for chip multiprocessors (CMPs). In our approach, the reconfigurable fabric is shared among multiple threads from both sequential and parallel applications to amortize the area and power costs and increase fabric utilization. To further reduce the overhead, we propose a heterogeneous CMP in which different regions are optimized for different tasks: some regions contain shared reconfigurable fabrics, while others contain only conventional cores. Within a reconfigurable region, the architecture dynamically manages the use of the shared fabric and includes mechanisms that accelerate parallel applications and enable parallelization of otherwise sequential applications. We first identify a number of features from previous proposals that enable efficient sharing of reconfigurable logic. With these features in mind, we design Specialized Programmable Logic (SPL), a reconfigurable fabric tailored for sharing among multiple cores, and evaluate and optimize it across a range of single- and multi-threaded applications. As with other shared structures, shared SPL must be intelligently controlled to achieve optimal performance.
We propose a number of sharing schemes and find that, with proper management, shared SPL achieves performance similar to that of giving each core its own large, private fabric, while substantially reducing area and peak power costs. When multiple single- and multi-threaded applications are running on multiple SPL clusters, the assignment of threads to clusters and the dynamic partitioning of the fabric significantly impact performance. To address these issues, we propose a number of management algorithms that control both thread scheduling and SPL sharing. Finally, the shared nature of the SPL makes it well suited to communication among the attached cores. We propose modifications to the baseline SPL design that allow it to provide fine-grained inter-thread and barrier communication among the cores sharing the fabric. Performing communication through the SPL has the additional benefit of allowing computation to be performed on the data while it is in flight to the recipient. When incorporated into a heterogeneous CMP, the combined computation and communication abilities of the SPL provide significant benefits over a CMP with only traditional cores.