Cornell University
Library
Cornell UniversityLibrary

eCommons

Help
Log In(current)
  1. Home
  2. Cornell University Graduate School
  3. Cornell Theses and Dissertations
  4. Building a Scalable Shared Log

Building a Scalable Shared Log

File(s)
Ding_cornellgrad_0058F_12317.pdf (411.71 KB)
Permanent Link(s)
https://doi.org/10.7298/adyw-1e71
https://hdl.handle.net/1813/103351
Collections
Cornell Theses and Dissertations
Author
Ding, Cong
Abstract

The shared log paradigm is at the heart of modern distributed applications in the growing cloud computing industry. Often, application logs must be stored durably for analytics, regulations, or failure recovery, and their smooth operation depends closely on how the log is implemented. An ideal implementation of the shared log abstraction should be capable of growing elastically in response to the needs of its client applications, without compromising availability; recover quickly from failures; adopt the data layout that best matches the performance requirement of its clients; scale write throughput without giving up on total order; and offer a low latency that satisfies applications' requirements. Unfortunately, no single shared log today can offer this combination of features. In particular, no shared log provides both total order and seamless reconfiguration, i.e., the capability to reconfigure the service without compromising its global availability. Additionally, no shared log provides both total order and low latency. In this dissertation, we analyze the challenges of achieving both total order and seamless reconfiguration, and of attaining both total order and low latency. Based on this analysis, we have built and evaluated two systems, Scalog and Ziplog, to demonstrate how to address these challenges. Scalog is a new implementation of the shared log abstraction that offers an unprecedented combination of features for continuous smooth delivery of service: Scalog allows applications to customize data placement, supports reconfiguration with no loss in availability, and recovers quickly from failures. At the same time, Scalog provides high throughput and total order. At its core is a novel ordering protocol that (1) efficiently merges the order in which records are stored at each shard to produce a single global order across shards, and (2) collates and processes records in the same batch that may originate from any client and be stored at any shard. This novel design enables Scalog to scale to thousands of shards while providing seamless reconfiguration, flexible data placement, and quick failure recovery. Ziplog is a new implementation of a totally ordered shared log that achieves latency and throughput comparable to what today can only be delivered by systems that optimize only one of these metrics at the expense of the other. Ziplog achieves these results through a new API that, instead of adding new records to the log through a linearizable Append operation, relies on a linearizable InsertAfter operation that specifies the log position past which the new record should be inserted. This new API allows Ziplog to totally order records across shards without needing cross-shard coordination and with an average latency of fewer than three message delays.

Description
115 pages
Date Issued
2020-12
Committee Chair
Alvisi, Lorenzo
Van Renesse, Robbert
Committee Member
Agarwal, Rachit
Chen, Li
Degree Discipline
Computer Science
Degree Name
Ph. D., Computer Science
Degree Level
Doctor of Philosophy
Rights
Attribution 4.0 International
Rights URI
https://creativecommons.org/licenses/by/4.0/
Type
dissertation or thesis
Link(s) to Catalog Record
https://newcatalog.library.cornell.edu/catalog/13312058

Site Statistics | Help

About eCommons | Policies | Terms of use | Contact Us

copyright © 2002-2026 Cornell University Library | Privacy | Web Accessibility Assistance