Ephemeral Vector Engines
With the end of Dennard scaling and the slowdown of Moore’s law, computer architects have turned to specialization to sustain the regular gains in performance and efficiency that process advancements conventionally provided. Although traditional fixed-function accelerators achieve high performance and efficiency through specialization, they suffer from low programmability and limited flexibility. Previous work has shown that next-generation vector abstractions can balance programmability and specialization. The recent rise in popularity of next-generation vector architectures, however, highlights the inherent tension between their two traditional micro-architectures: the integrated vector unit and the decoupled vector engine. An integrated vector unit achieves modest performance with low area overhead, whereas a decoupled vector engine achieves higher performance at the expense of higher area overhead. This thesis leverages recent advancements in in-situ compute-in-memory to address this tension, culminating in ephemeral vector engines (EVE). EVE is a novel next-generation vector micro-architecture that uses SRAM-based compute-in-memory (S-CIM) circuits to reconfigure private L2 caches on the fly to support next-generation vector execution. While previous work on S-CIM has explored bit-serial execution, this thesis further explores bit-parallel execution and concludes that bit-serial execution achieves high throughput but suffers from high latency, whereas bit-parallel execution greatly reduces latency at the expense of lower throughput. This thesis therefore adopts a bit-hybrid approach that balances throughput and latency. To evaluate the area and cycle time of EVE, this thesis presents a detailed circuit template that enables bit-hybrid S-CIM with a varying parallelization factor.
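The throughput/latency trade-off between bit-serial, bit-parallel, and bit-hybrid execution can be illustrated with a toy analytic model. This is a hypothetical sketch, not the thesis's circuit template: it assumes each vector element occupies `p` adjacent SRAM columns (the parallelization factor), that an n-bit operation needs ceil(n/p) array cycles, and that the array cycle time grows linearly with `p` because of carry propagation across the `p` bits. The function name `bit_hybrid_model` and the timing constants `base_cycle_ns` and `carry_ns_per_bit` are illustrative placeholders.

```python
import math


def bit_hybrid_model(n_bits: int, p: int, columns: int,
                     base_cycle_ns: float = 1.0,
                     carry_ns_per_bit: float = 0.25) -> tuple[float, float]:
    """Toy latency/throughput model for an n-bit operation in an S-CIM array.

    Hypothetical assumptions (not taken from the thesis):
    - each element uses p adjacent bitline columns (p = 1 is bit-serial,
      p = n_bits is bit-parallel, anything in between is bit-hybrid);
    - an n-bit operation takes ceil(n_bits / p) array cycles;
    - the cycle time grows linearly with p due to carry ripple across p bits.

    Returns (latency in ns per operation, element operations completed per ns).
    """
    if not 1 <= p <= n_bits:
        raise ValueError("p must be between 1 and n_bits")
    passes = math.ceil(n_bits / p)                      # array cycles per operation
    cycle_ns = base_cycle_ns + carry_ns_per_bit * (p - 1)
    latency_ns = passes * cycle_ns
    lanes = columns // p                                 # elements processed in lockstep
    throughput = lanes / latency_ns
    return latency_ns, throughput


if __name__ == "__main__":
    # 32-bit operands in a hypothetical 256-column array
    for p in (1, 4, 32):
        lat, thr = bit_hybrid_model(32, p, 256)
        print(f"p={p:2d}: latency={lat:6.2f} ns, throughput={thr:5.2f} ops/ns")
```

Under these assumptions, with 256 columns and 32-bit operands, p = 1 yields the highest throughput and the longest latency, p = 32 the shortest latency and the lowest throughput, and p = 4 lands in between, mirroring the qualitative trend described above.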
To evaluate the performance of EVE, this thesis uses high-fidelity cycle-approximate models of an integrated vector unit, a decoupled vector engine, and EVE. By leveraging S-CIM, EVE improves performance by 4.59x over an integrated vector unit, matching the performance of a decoupled vector engine while incurring one tenth of its area overhead. In summary, EVE leverages S-CIM to achieve performance comparable to that of a decoupled vector engine while incurring an area overhead equivalent to that of an integrated vector unit.