Ganusov, Ilya2007-07-202012-07-202007-07-20bibid: 6476362https://hdl.handle.net/1813/7925Scaling the performance of applications with little thread-level parallelism is one of the most serious impediments to the success of multi-core architectures. At the same time, the long latency of memory accesses represents one of the largest performance bottlenecks for individual program threads. As a result, a typical microprocessor spends a significant amount of time waiting for data to be delivered from memory instead of performing useful computation. Fortunately, it is often possible to guess which memory data will be needed by a program thread in the near future. Various hardware and software prefetching techniques have been developed to fetch critical data before they are requested by the processor. This way prefetching can eliminate processor stalls otherwise induced by the slow response from the memory system. The main contribution of this dissertation is the development of two techniques that utilize extra cores of a chip multiprocessor (CMP) as prefetching engines to increase the performance of single program threads. The proposed approaches effectively leverage the execution capabilities of chip multiprocessors to compute data addresses that are likely to miss in the cache and prefetch them ahead of program thread load requests. I demonstrate the effectiveness of the proposed approaches by performing cycle-accurate simulations of a chip multiprocessor consisting of two four-way superscalar cores running the single-threaded SPEC CPU2000 benchmark suite. The proposed mechanisms provide significant performance improvements over a baseline that already includes an aggressive hardware stream prefetcher. A comparison with other multi-core prefetching mechanisms from the literature shows that the techniques proposed in this dissertation provide competitive performance, incur less energy overhead, and require considerably simpler hardware support.907568 bytesapplication/pdfen-UScomputer engineeringcomputer architecturechip multiprocessorprefetchingUsing General-Purpose Processor Cores as Prefetching Engines in Chip Multiprocessor Architectures