High Performance Sequential Execution In Fine-Grain Multicore Processors Via Core Aggregation
This dissertation presents core fusion, a reconfigurable chip multiprocessor (CMP) architecture in which groups of fundamentally independent cores can dynamically morph into a larger CPU or be used as distinct processing elements, as applications require at run time. Core fusion improves sequential-code performance and thus gracefully accommodates software diversity in future highly parallel CMPs. It provides a single execution model across all configurations, requires no additional programming effort or specialized compiler support, maintains ISA compatibility, and leverages mature microarchitecture technology. We first present an effective approach to dynamically fuse multiple narrow-issue out-of-order cores into a more powerful out-of-order execution engine. The use of out-of-order base cores provides the design with valuable opportunities for latency hiding. Next, we present a second set of mechanisms to dynamically fuse multiple in-order cores into a more powerful out-of-order execution engine. In-order cores are simple and extremely power-efficient, and they help maximize core count, which is ideal for exploiting thread-level parallelism (TLP). However, their sequential-code performance is significantly lower. Enabling core fusion on such substrates proves very effective in boosting performance, with only relatively small hardware overhead.
dissertation or thesis