Optimizing Foundational System Building Blocks of Datacenter Applications

Abstract
Cloud computing has become the prevailing computing infrastructure for the majority of the world's computation. Computing platforms for cloud computing and large internet services are hosted in datacenters, and optimizing the performance of datacenter applications can result in significant cost savings. Given the diversity of datacenter workloads, optimizing a single application may not yield substantial improvements in total system efficiency, as costs are spread across numerous independent workloads. In contrast, optimizing the foundational system building blocks of datacenter applications, ranging from high-level system infrastructures to low-level system software libraries, can significantly improve the productivity of the datacenter fleet, since entire classes of datacenter applications benefit from such optimizations. This dissertation proposes a series of optimizations to the foundational system building blocks of datacenter applications.

Applications running in datacenters are often built as collections of loosely coupled services that are deployed and executed through high-level system building blocks such as serverless workflow engines and microservice frameworks. First, we focus on optimizing one such building block at the top of the computing stack: the serverless computing framework. Despite its benefits of ease of programming, fast elasticity, and fine-grained billing, serverless computing suffers from resource inefficiency. We designed Aquatope, a QoS- and uncertainty-aware resource scheduler for end-to-end serverless workflows that accounts for the inherent uncertainty in FaaS platforms and improves performance predictability and resource efficiency. Aquatope uses a set of scalable and validated Bayesian models to create prewarmed containers ahead of function invocations, and to allocate appropriate resources at function granularity to meet a complex workflow's end-to-end QoS while minimizing resource cost. Aquatope demonstrates that a joint, uncertainty-aware solution to cold starts and resource management can effectively improve the resource efficiency of serverless applications.

However, serverless workflows still suffer from significant control-plane and inter-function communication overheads, which make them unsuitable for latency-critical applications. We also designed Meteion, a fast and efficient serverless workflow engine for latency-critical interactive applications. Meteion decouples the control plane from workflow execution and leverages lightweight per-function engines to enable decentralized workflow orchestration and direct inter-function communication. Meteion's DAG scheduler uses the workflow's latency distribution and graph structure to provision containers promptly, ensuring that functions can execute seamlessly on worker servers without falling back to the control plane.
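Both Aquatope and Meteion hinge on provisioning containers before functions are invoked. As a purely illustrative aid (not code from either system, and with all names hypothetical), the Python sketch below shows one way an uncertainty-aware scheduler could pick a prewarm count: model per-window invocation counts with a Gamma-Poisson conjugate pair and prewarm enough containers to cover a chosen quantile of the posterior predictive demand.

```python
# Minimal, hypothetical sketch of uncertainty-aware container prewarming.
# Not Aquatope's or Meteion's actual code; all names are illustrative.
import numpy as np

def prewarm_count(observed_counts, coverage=0.95,
                  prior_shape=1.0, prior_rate=1.0, n_samples=20000):
    """Return how many containers to prewarm for the next scheduling window.

    observed_counts: invocation counts seen in recent windows.
    coverage: probability that prewarmed containers cover the next window's demand.
    """
    rng = np.random.default_rng(0)
    # Gamma-Poisson conjugate update: posterior over the invocation rate.
    post_shape = prior_shape + sum(observed_counts)
    post_rate = prior_rate + len(observed_counts)
    # Posterior predictive demand via Monte Carlo: draw a rate, then a count.
    rates = rng.gamma(post_shape, 1.0 / post_rate, size=n_samples)
    demand = rng.poisson(rates)
    # Prewarm enough containers to cover the chosen quantile of predicted demand.
    return int(np.quantile(demand, coverage))

# Example: recent windows saw 3, 5, 4, and 6 invocations; cover 95% of demand.
print(prewarm_count([3, 5, 4, 6], coverage=0.95))
```

Raising the coverage target trades additional idle containers for a lower cold-start probability, which is the kind of QoS-versus-cost trade-off these schedulers navigate.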
Second, we delve into a foundational system library: the memory allocator. Datacenter applications typically share the use of certain low-level software libraries, and memory allocation constitutes a substantial component of datacenter computation, so optimizing the memory allocator can improve application performance across the fleet and lead to significant cost savings. We present the first comprehensive characterization of TCMalloc at warehouse scale. Our characterization reveals profound diversity in memory allocation patterns, allocated object sizes, and object lifetimes across large-scale datacenter workloads, as well as in their performance on heterogeneous hardware platforms. Based on these insights, we optimize TCMalloc for warehouse-scale environments. Specifically, we propose optimizations for each level of its cache hierarchy, including usage-based dynamic sizing of allocator caches, hardware-topology-aware designs that mitigate inter-core communication overhead, and improved allocation packing algorithms driven by statistical data. Evaluation results show that these optimizations significantly improve the productivity of the datacenter fleet.
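To illustrate the flavor of usage-based dynamic cache sizing (a simplified, hypothetical sketch, not TCMalloc's implementation), the snippet below grows caches that miss frequently, shrinks caches that sit mostly idle, and enforces a global capacity budget:

```python
# Simplified, hypothetical sketch of usage-based dynamic cache sizing.
# Not TCMalloc code; it only illustrates the observe-and-resize feedback loop.
from dataclasses import dataclass

@dataclass
class CacheStats:
    capacity: int      # current maximum number of cached objects
    used: int          # objects currently held in the cache
    misses: int        # misses observed since the last resize round

def resize_caches(caches, total_budget, grow_step=8, shrink_step=8):
    """Adjust per-cache capacities based on observed usage (one resize round)."""
    for c in caches:
        if c.misses > 0:
            c.capacity += grow_step                              # busy: grow
        elif c.used < c.capacity // 2:
            c.capacity = max(c.used, c.capacity - shrink_step)   # idle: reclaim
        c.misses = 0                                             # reset counters
    # Enforce the global budget by trimming the largest caches first.
    while sum(c.capacity for c in caches) > total_budget:
        biggest = max(caches, key=lambda c: c.capacity)
        new_cap = max(biggest.used, biggest.capacity - shrink_step)
        if new_cap == biggest.capacity:
            break                      # cannot shrink further without evicting
        biggest.capacity = new_cap

caches = [CacheStats(64, 60, 12), CacheStats(64, 10, 0), CacheStats(64, 64, 30)]
resize_caches(caches, total_budget=180)
print([c.capacity for c in caches])
```

In a real allocator the same idea operates on per-CPU or per-thread free lists and must be far cheaper than this, but the feedback loop of observing usage and redistributing limited cache capacity is the same.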
Committee Member
Weatherspoon, Hakim