Hardware-Software Techniques for Improving Resource Efficiency in Datacenters

Abstract

Cloud multi-tenancy, a major contributor to cost efficiency, leads to unpredictable performance due to interference in shared resources, especially for interactive services. As a result, multi-tenancy is either disallowed altogether, degrading utilization, or, at best, interactive services are co-scheduled only with low-priority, best-effort workloads whose performance can be sacrificed when deemed necessary. In this dissertation, we propose to improve server utilization by co-scheduling latency-critical applications with batch applications, and to mitigate interference in shared resources using a range of hardware and software techniques. Specifically, we explore approximation, resource partitioning (core relocation, LLC and memory capacity partitioning), and reconfigurable cores.

Approximate computing applications offer the opportunity for tighter colocation among multiple applications whose performance matters. We present Pliant, a lightweight cloud runtime that leverages the ability of approximate computing applications to tolerate some loss in output quality in order to boost the utilization of shared servers. During periods of high resource contention, Pliant employs incremental, interference-aware approximation to reduce contention in shared resources and prevent QoS violations for co-scheduled interactive, latency-critical services.

Reconfigurable cores allow fine-grained power and performance adjustments and open up further opportunities for colocation, as they can adapt to the dynamic needs of a specific mix of co-scheduled applications. They are also an attractive way to manage power among the applications executing on a server node, which share the node-wide power budget. We propose CuttleSys, an online resource manager that combines scalable machine learning and fast design space exploration to determine the performance and power of an application across all possible core and cache configurations, and to navigate this large design space to a high-performing solution while operating under a power budget. CuttleSys couples performance and power inference based on Stochastic Gradient Descent with a highly parallelized design space exploration algorithm geared towards high-dimensional searches. Together, these two techniques efficiently identify per-core configurations and cache partitions that meet QoS for interactive services and maximize throughput for co-scheduled batch workloads, all while operating under a power budget.
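For illustration only, the Python sketch below conveys the flavor of the two ingredients the abstract attributes to CuttleSys: SGD-based inference of per-configuration performance and power from a few profiled samples, followed by a search for per-application configurations under a node-wide power budget. It is not taken from the dissertation; all function names, shapes, and hyperparameters are assumptions, and the greedy search is a toy stand-in for the parallelized design space exploration described above.

import random
import numpy as np

def sgd_factorize(observed, n_factors=8, lr=0.01, reg=0.1, epochs=200, seed=0):
    """Fill in a partially observed (applications x configurations) matrix of
    performance or power samples via low-rank factorization trained with SGD.
    Entries that were not profiled are np.nan."""
    n_apps, n_cfgs = observed.shape
    rng = np.random.default_rng(seed)
    U = rng.normal(scale=0.1, size=(n_apps, n_factors))   # per-application factors
    V = rng.normal(scale=0.1, size=(n_cfgs, n_factors))   # per-configuration factors
    samples = [(i, j, observed[i, j])
               for i in range(n_apps) for j in range(n_cfgs)
               if not np.isnan(observed[i, j])]
    random.seed(seed)
    for _ in range(epochs):
        random.shuffle(samples)
        for i, j, y in samples:
            ui, vj = U[i].copy(), V[j].copy()
            err = y - ui @ vj
            # Standard SGD step with L2 regularization on both factor matrices.
            U[i] += lr * (err * vj - reg * ui)
            V[j] += lr * (err * ui - reg * vj)
    return U @ V.T                                          # dense predictions

def pick_configs(perf_pred, power_pred, power_budget):
    """Toy stand-in for design space exploration: greedily give each application
    its best predicted configuration that still fits the remaining power budget."""
    choices, used = [], 0.0
    for app in range(perf_pred.shape[0]):
        for cfg in np.argsort(-perf_pred[app]):             # best predicted perf first
            if used + power_pred[app, cfg] <= power_budget:
                choices.append(int(cfg))
                used += power_pred[app, cfg]
                break
        else:                                                # nothing fits: fall back to
            cfg = int(np.argmin(power_pred[app]))            # the lowest-power option
            choices.append(cfg)
            used += power_pred[app, cfg]
    return choices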

Description

163 pages

Date Issued

2020-05

Keywords

approximation; datacenter; latency-critical applications; power management; reconfigurable; resource efficiency

Committee Chair

Albonesi, David
Delimitrou, Christina

Committee Member

Martinez, Jose

Degree Discipline

Electrical and Computer Engineering

Degree Name

Ph.D., Electrical and Computer Engineering

Degree Level

Doctor of Philosophy

Types

dissertation or thesis
