Hardware-Software Techniques for Improving Resource Efficiency in Datacenters
Abstract
Cloud multi-tenancy, a major contributor to cost efficiency, leads to unpredictable performance due to interference in shared resources, especially for interactive services. As a result, multi-tenancy is either disallowed altogether, degrading utilization, or, at best, interactive services are co-scheduled with low-priority, best-effort workloads whose performance can be sacrificed when deemed necessary. In this dissertation, we propose to improve server utilization by co-scheduling latency-critical applications with batch applications, and to mitigate interference in shared resources using various hardware and software techniques. We specifically explore leveraging approximation, resource partitioning (viz. core relocation, LLC and memory capacity partitioning), and reconfigurable cores.

Approximate computing applications offer the opportunity to enable tighter colocation among multiple applications whose performance is important. We present Pliant, a lightweight cloud runtime that leverages the ability of approximate computing applications to tolerate some loss in output quality to boost the utilization of shared servers. During periods of high resource contention, Pliant employs incremental, interference-aware approximation to reduce contention in shared resources and prevent QoS violations for co-scheduled interactive, latency-critical services.

Reconfigurable cores allow fine-grained power and performance adjustments in cores and open up more opportunities for colocation, as they can adapt to the dynamic needs of a specific mix of co-scheduled applications. Additionally, reconfigurable cores are an attractive solution for managing power among the applications executing on a server node that share the node-wide power budget.
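The incremental, interference-aware approximation that Pliant performs can be pictured as a feedback loop that dials approximation up while the interactive service is violating its QoS target and dials it back down when there is slack. The following is only an illustrative sketch, not Pliant's actual mechanism or API; the knob names, the QoS target, and the `measure_latency`/`apply_knob`/`revert_knob` callbacks are all hypothetical.

```python
# Illustrative sketch of an incremental, interference-aware approximation loop.
# All names here (QOS_TARGET_MS, APPROX_KNOBS, the callbacks) are hypothetical.

QOS_TARGET_MS = 10.0          # latency QoS target of the interactive service
APPROX_KNOBS = ["fp32->fp16", "loop_perforation_10%", "loop_perforation_25%"]

def control_loop(measure_latency, apply_knob, revert_knob):
    """Incrementally raise approximation under contention, lower it otherwise."""
    level = 0  # number of approximation knobs currently applied
    while True:
        latency = measure_latency()
        if latency > QOS_TARGET_MS and level < len(APPROX_KNOBS):
            apply_knob(APPROX_KNOBS[level])   # shed pressure on shared resources
            level += 1
        elif latency < 0.8 * QOS_TARGET_MS and level > 0:
            level -= 1
            revert_knob(APPROX_KNOBS[level])  # restore output quality under slack
        yield level
```

Stepping one knob at a time, rather than jumping straight to the most aggressive setting, keeps the output-quality loss proportional to the contention actually observed.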
We propose CuttleSys, an online resource manager that combines scalable machine learning with fast design space exploration to determine the performance and power of an application across all possible core and cache configurations, and to effectively navigate the large design space to a high-performing solution while operating under a power budget. CuttleSys couples performance and power inference based on Stochastic Gradient Descent with a highly parallelized design space exploration algorithm geared towards high-dimensional searches. Together, these two techniques efficiently identify a per-core configuration and cache partition that meet QoS for interactive services and maximize throughput for co-scheduled batch workloads, all within the power budget.
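The inference step described above must predict performance and power for configurations that were never profiled, from a handful of sampled ones. One generic way to do this kind of sparse inference with Stochastic Gradient Descent is low-rank matrix factorization over an (application x configuration) matrix; the sketch below illustrates that generic technique only, and its rank, learning rate, and other hyperparameters are illustrative assumptions, not the dissertation's actual values.

```python
# Generic SGD matrix-factorization sketch: fill in an (application x
# configuration) performance matrix from a sparse set of profiled samples.
import random

def sgd_complete(samples, n_apps, n_cfgs, rank=4, lr=0.05, reg=0.01, epochs=300):
    """samples: list of (app, cfg, value) observed entries; returns full matrix."""
    random.seed(0)  # deterministic initialization for reproducibility
    U = [[random.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_apps)]
    V = [[random.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_cfgs)]
    for _ in range(epochs):
        for a, c, y in samples:
            pred = sum(U[a][k] * V[c][k] for k in range(rank))
            err = y - pred
            for k in range(rank):
                u, v = U[a][k], V[c][k]
                U[a][k] += lr * (err * v - reg * u)   # gradient step on app factors
                V[c][k] += lr * (err * u - reg * v)   # gradient step on cfg factors
    # Predicted value for every application on every configuration.
    return [[sum(U[a][k] * V[c][k] for k in range(rank)) for c in range(n_cfgs)]
            for a in range(n_apps)]
```

Once the full matrix is predicted, a search procedure can score candidate per-core configurations and cache partitions against the power budget without profiling each one.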
Committee Chair
Delimitrou, Christina