Thread Scheduling For Chip Multiprocessors
Abstract
Large, high-frequency single-core chip designs are increasingly being replaced with larger chip multiprocessor (CMP) designs that trade frequency for greater numbers of cores. Power has become a first-order design constraint, leading to designs optimized for computing efficiency, defined as the product of energy and delay (ED). Based on current technology trends, this shift will continue because of physical thermal and power constraints. To leverage these new and future generations of hardware efficiently, software designers write multithreaded programs. However, creating multithreaded software is non-trivial and has been an active area of research for several decades. During product design, software developers make assumptions about the underlying substrate that may not hold, which can severely affect application performance. For example, software and the operating system assume that off-core resources increase in tandem with the number of cores, which is often not the case. The result is poor performance, because scheduling decisions fail to account properly for this non-uniform substrate. We investigate how to schedule applications for current and future systems when their performance can be limited by frequency heterogeneity among cores or by the sharing of unified resources. We examine application scaling on both homogeneous and frequency-heterogeneous CMPs, and find that multithreaded applications often do not scale with increasing numbers of cores because of off-chip memory limitations and increased contention among threads. Increasing static and off-chip system power consumption can also mask which concurrency level (number of threads) is the most energy efficient for multithreaded programs. Operating system (OS) schedulers are unaware of these issues and cannot create optimal schedules. We demonstrate how the OS and user-space programs can be made aware of these issues at run time using hardware performance counters that already exist on chip.
We evaluate user-space, run-time, energy-aware co-scheduling heuristics that run poorly scaling programs at efficient concurrency levels to improve overall performance and energy consumption. Our online scheduling algorithms automate both the choice of programs to co-schedule and the choice of thread counts. Our software schedulers ensure fairness among co-scheduled applications and improve computing efficiency for the entire multiprogrammed workload. Because the schedulers are implemented in software, they require no hardware modifications, which makes our work portable to different platforms. We extend this work by investigating thread scheduling and application co-scheduling for frequency-heterogeneous processors. By making our schedulers aware of processor heterogeneity and mapping application threads to the processors that run them most efficiently, we achieve better computing efficiency than processor-oblivious scheduling.
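
As an illustration of the energy-delay (ED) metric used above, the following minimal Python sketch picks the concurrency level with the lowest ED from a set of measurements. It is not the schedulers' actual algorithm: the function names and the power and run-time figures are hypothetical, chosen only to show that the fastest thread count is not necessarily the most efficient one.

```python
def energy_delay_product(avg_power_watts, runtime_seconds):
    """Return ED = energy * delay, where energy = average power * run time."""
    energy_joules = avg_power_watts * runtime_seconds
    return energy_joules * runtime_seconds


def best_concurrency(measurements):
    """Return the thread count whose measurement has the lowest ED.

    measurements maps thread count -> (average power in W, run time in s).
    """
    return min(
        measurements,
        key=lambda threads: energy_delay_product(*measurements[threads]),
    )


# Hypothetical measurements for a program that stops scaling past 4 threads.
samples = {
    1: (40.0, 10.0),
    2: (55.0, 6.0),
    4: (80.0, 5.0),
    8: (120.0, 5.0),
}

if __name__ == "__main__":
    for threads, (power, runtime) in sorted(samples.items()):
        print(threads, energy_delay_product(power, runtime))
    print("most efficient thread count:", best_concurrency(samples))
```

With these made-up numbers, four threads finish sooner than two, but the extra power raises the ED, so two threads is the more efficient concurrency level.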