Improving Resource Efficiency for Colocation of Multiple Latency-Critical Applications in Datacenters
Modern datacenters host many interactive, latency-critical (LC) services, such as web search, social networking, and online maps. These services have strict quality-of-service (QoS) requirements, whereby most requests must be fulfilled within a given latency constraint to guarantee a good user experience. At the same time, datacenters usually adopt multi-tenancy, i.e., scheduling jobs from multiple users on the same physical host, to increase server utilization and cost efficiency. Unfortunately, multi-tenancy often comes at a performance penalty, as co-scheduled applications contend for shared resources, leading to interference and performance unpredictability. Interference is particularly destructive for LC applications, which must meet strict QoS guarantees. This dissertation tackles the challenge of hardware resource management for interactive, latency-critical services in datacenters, improving resource efficiency while meeting strict latency/QoS constraints under multi-tenancy. We start with a comprehensive analysis of the impact and importance of resource management for LC applications. We then present three runtime resource managers for LC applications, each addressing the challenge of resource management from a different angle.
First, this dissertation presents PARTIES, a QoS-aware resource manager that enables an arbitrary number of latency-critical applications to share a physical node without QoS violations. Whereas prior work limits modern datacenters to a single LC application per node, PARTIES removes this restriction, allowing greater efficiency gains from multi-tenancy. PARTIES leverages a set of hardware and software resource partitioning mechanisms to adjust allocations dynamically at runtime, in a way that meets the QoS requirements of each co-scheduled application and maximizes the aggregate throughput of the entire machine.
Second, we tackle the same challenge of QoS-aware resource management under colocation, but on a different system platform. Motivated by the emerging hardware trend of adopting processing-in-memory (PIM) to combat the slowdown of Moore's Law, we study the implications of PIM for LC applications and present PIMCloud, a QoS-aware resource manager designed for PIM-enabled cloud systems. Like PARTIES, PIMCloud allows colocation of multiple LC and best-effort (BE) applications while meeting the QoS targets of all LC applications. However, PIMCloud is designed for PIM-enabled servers, and specifically manages PIM-introduced resources, including heterogeneous cores and memory. Finally, during the development of PARTIES (which focuses on multi-resource interactions), we found that power management alone can be further optimized by leveraging request- and application-level information. The last component of this dissertation is therefore ReTail, a QoS-aware power manager for interactive latency-critical applications. It provides a general and systematic framework for predicting per-request latency, which is then used to optimize power consumption for each application. ReTail can be applied on top of PARTIES to save power under multi-tenancy.
cloud; datacenter; power management; processing in memory; resource management
Martínez, José F.
van Renesse, Robbert; Delimitrou, Christina
Electrical and Computer Engineering
Ph. D., Electrical and Computer Engineering
Doctor of Philosophy
dissertation or thesis