Building Evolvable Distributed Systems for Dynamic Data Center Environments
Abstract
Distributed systems built to run in data centers must sustain the expected level of performance and scale with growing workloads, while also handling evolving infrastructure and tolerating failures. To meet these performance and scalability demands, systems need to incorporate techniques such as sharding, replication, and batching. They must also support online configuration changes as hardware is updated or a new version of the system is deployed. This process is sometimes termed the "organic growth" of a distributed system. While there has been much work on how to build large-scale distributed systems as services that run in dynamic data center environments, there is little or no support for evolving them organically and for understanding how this evolution changes the system. Moreover, most state-of-the-art distributed systems that undergo evolution and growth become more complex and less manageable over time, which makes maintaining them increasingly difficult. This thesis introduces Ovid, a framework for building large-scale distributed systems that need to evolve quickly as a result of changes in their functionality or in the assumptions made for their initial deployment. To counter the complexity that organic growth introduces, Ovid supports transformations: automated refinements that allow distributed systems to be developed from simple components. Ovid models a distributed system as a collection of agents: self-contained state machines that communicate by exchanging messages. A transformation applied to a system replaces agents with one or more new agents, in effect creating a new specification for the system. Transformations can be applied recursively, resulting in a tree of transformations. Examples of transformations include replication, batching, sharding, and encryption.
Ovid can automatically replicate for fault tolerance, shard for scalable capacity, batch for higher throughput, and encrypt for better security. Refinement mappings prove that transformed systems implement the original specification, as shown by the full refinement of a storage system replicated with the Chain Replication protocol to a centralized storage system. The result is a software-defined distributed system, in which a logically centralized controller specifies the components, their interactions, and their transformations. Such systems can be updated on the fly, changing assumptions or providing new guarantees while keeping the original implementation of the application logic unchanged. This thesis also presents the implementation of Ovid, which includes an interactive, visual tool for specifying and transforming distributed systems and a run-time environment that deploys and runs the agents in a data center. The interactive designer, which runs in any web browser, makes it relatively easy, even for novice users, to construct systems that are scalable and reliable. The run-time environment evolves systems deployed in a data center and manages all execution and communication fully automatically. Finally, the evaluation of a key-value store built with Ovid shows the benefits of building a system with the Ovid framework. The performance evaluation shows that systems that can evolve and adjust to their environment offer various performance benefits.
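The agent and transformation model described in the abstract can be illustrated with a small sketch. This is not Ovid's actual API: the names KVAgent, RouterAgent, shard, and run are hypothetical, and the message format, the CRC32-based key routing, and the FIFO delivery loop are all assumptions made for illustration. The sketch models agents as state machines that turn one incoming message into a list of outgoing messages, and a sharding transformation that replaces one agent with a router agent plus several shard agents.

```python
from collections import deque
import zlib


class KVAgent:
    """Hypothetical agent: a self-contained state machine holding a key-value map."""

    def __init__(self):
        self.store = {}

    def handle(self, msg):
        # msg = (op, key, value, reply_to); returns a list of (destination, message).
        op, key, value, reply_to = msg
        if op == "put":
            self.store[key] = value
            return [(reply_to, ("ok", key, None))]
        return [(reply_to, ("value", key, self.store.get(key)))]


class RouterAgent:
    """Hypothetical ingress agent introduced by the sharding transformation."""

    def __init__(self, shard_names):
        self.shard_names = shard_names

    def handle(self, msg):
        # Route by a deterministic hash of the key (an assumed policy).
        key = msg[1]
        index = zlib.crc32(key.encode()) % len(self.shard_names)
        return [(self.shard_names[index], msg)]


def shard(agents, name, n, factory):
    """Transformation sketch: replace agent `name` with a router plus n shards.

    Assumes the replaced agent has no state yet; the new shards start empty.
    """
    new_agents = {k: v for k, v in agents.items() if k != name}
    shard_names = [f"{name}/shard{i}" for i in range(n)]
    for shard_name in shard_names:
        new_agents[shard_name] = factory()
    # The router keeps the original name, so clients still address `name`.
    new_agents[name] = RouterAgent(shard_names)
    return new_agents


def run(agents, requests, entry="kv"):
    """Deliver messages in FIFO order until none remain; collect client replies."""
    queue = deque((entry, request) for request in requests)
    replies = []
    while queue:
        dest, msg = queue.popleft()
        if dest == "client":
            replies.append(msg)
        else:
            queue.extend(agents[dest].handle(msg))
    return replies


requests = [
    ("put", "a", 1, "client"),
    ("put", "b", 2, "client"),
    ("get", "a", None, "client"),
    ("get", "c", None, "client"),
]
original = {"kv": KVAgent()}
transformed = shard({"kv": KVAgent()}, "kv", 3, KVAgent)

# The client sees the same replies from both specifications.
same = run(original, requests) == run(transformed, requests)
print(same)
```

Comparing the replies of the original and the transformed system on one request sequence is only an informal check. The thesis relies on refinement mappings to prove that a transformed system implements the original specification, which a test like this does not establish.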
Committee Member
Sirer, Emin G.
Kleinberg, Robert David