Methodologies, Architectures, and Prototypes for Scaling On- and Off-Chip Interconnects
The slowdown of Moore’s Law and the end of Dennard scaling have driven modern computing systems to embrace parallelism, both within single chips and across multiple compute devices, to meet growing computational demands. Efficient data movement, both on-chip and off-chip, has thus become increasingly critical. However, scaling on-chip and off-chip interconnects presents distinct challenges in both methodology and architecture. For on-chip interconnects, the challenges are: (1) the methodology challenge of developing a robust framework to model, test, and evaluate on-chip network (OCN) designs across a vast design space, and (2) the architecture challenge of bridging the gap between theoretical advances and the practical implementation of scalable, low-diameter OCN topologies. For off-chip interconnects, the challenges are: (1) the methodology challenge of accurately modeling large-scale distributed systems, and (2) the architecture challenge of breaking the capacity, latency, and bandwidth trade-offs inherent in current off-chip interconnect technologies. This thesis addresses these challenges by developing new methodologies, proposing architectural solutions, and validating their feasibility through silicon prototypes.
The first part of this thesis focuses on OCNs for manycore architectures. I first present PyOCN, a unified Python-based framework for modeling, testing, and evaluating on-chip networks, which vertically integrates multiple research methodologies and enables productive design-space exploration of OCNs. Next, I propose practical low-diameter OCN topologies that can be effectively implemented with a tiled physical design methodology, bridging the gap between principle and practice. Finally, the CIFER chip tape-out demonstrates the feasibility and effectiveness of both PyOCN and the tiled physical design approach.
The second part of the thesis addresses the challenges of scaling off-chip interconnects, particularly for machine learning workloads. I first present LLMCompass-E2E, a comprehensive framework for modeling the performance of large-scale distributed LLM training. I then explore the potential of emerging co-packaged silicon photonic interconnects by proposing an optically connected multi-stack HBM module that can effectively break the trade-off between memory bandwidth and capacity. Lastly, the PIPES chip tape-out demonstrates a practical implementation of such co-packaged silicon photonic interconnects, highlighting their potential as scalable, high-performance interconnect solutions for large-scale distributed systems.