Software-Oriented Hardware Prefetching and Vector Execution
The hardware-software abstraction enables programmers to write high-level algorithms without delving into low-level microarchitectural details. Compilers, positioned at the interface between hardware and software, perform numerous optimizations to improve performance. Nonetheless, what they can do is bounded by the ISA contract defined by hardware designers. Rethinking this abstraction can unlock powerful optimizations at the compiler stage. For instance, emerging scalable vector ISAs expose the hardware vector length to software as a programmable constant, which, with compiler support, can improve vectorization opportunities as well as code portability. Similarly, hardware prefetchers come with software prefetching knobs that let programmers apply their knowledge of the program for performance gains. However, this control is limited: software cannot influence the dynamic prefetching decisions made by the hardware, a limitation that has been shown to cause performance regressions in datacenter settings. This thesis aims to enhance compiler-guided optimizations for auto-vectorization and hardware prefetching. The auto-vectorization evaluation identifies compiler shortcomings with scalable vector ISAs and proposes ScaleIR, a prototype that improves mask representations in the LLVM IR. ProP uses profile-guided hints to better steer hardware prefetching decisions. Together, these projects enable compilers to effectively leverage and redefine the hardware-software abstraction, boosting performance and efficiency.