Cornell University
Library
Cornell UniversityLibrary

eCommons

Help
Log In(current)
  1. Home
  2. Cornell University Graduate School
  3. Cornell Theses and Dissertations
  4. RDMA-Accelerated State Machine for Cloud Services

RDMA-Accelerated State Machine for Cloud Services

File(s)
Jha_cornellgrad_0058_13330.pdf (4.12 MB)
Permanent Link(s)
https://doi.org/10.7298/1mxw-cv03
https://hdl.handle.net/1813/112934
Collections
Cornell Theses and Dissertations
Author
Jha, Sagar
Abstract

With the advent of RDMA, networks that provide upward of 100Gbps data-transfer speeds and sub-microsecond latencies have become widely available. These high speeds and low delays are a disruptive departure from prior capabilities, and force a redesign of the systems and applications running above them. In this dissertation, we tackle a core question: optimal support for systems that employ data or state machine replication for strong consistency. Such systems are common and vital in the cloud, and state machine replication is a building block for stronger higher level guarantees such as the ones commonly sought in database applications. Advances thus have the potential for broad impact. This dissertation focuses on the design and implementation of Derecho, a user-space library written in C++ that offers fault-tolerant state machine replication to application subgroups. We motivate Derecho's unusual design, and show that the system's in-memory replication performs close to the network capacity for a variety of application scenarios and cluster settings. We demonstrate that the system is reliable, both through proofs and experiments: replicas maintain the same state through crash failures and network partitions, and the system will only restart correctly, after first repairing any damaged durable state on persistent storage. Derecho reexpresses Paxos protocols in a novel manner that allows replicas to agree on state updates asynchronously, resulting in continuous, non-blocking data paths. Derecho's in-memory replication outperforms traditional Paxos systems on TCP by a factor of 80X and a modern Paxos system on RDMA by a factor of 25X. Theoretical study reveals that Derecho's protocols are optimal under a model put forward by the theory community more than a decade ago, but never previously viewed as a practical goal that might be of value in deployed systems. Although the focus of the author has been quite practical throughout his work (a focus reflected in the choices of material included in this dissertation), this optimality confirms our intuition about the style of knowledge-based programming best matched to the shared state table (SST) abstraction proposed by this author, around which Derecho was built. Specifically, Derecho expresses atomic multicast and Paxos protocols using a form of coding best understood as epistemic (temporal) logic programming, which in turn is expressed over a monotonic table encoding system state that is continuously extended in a lock-free, asynchronous, uncoordinated manner. For a variety of actions important in atomic multicast or Paxos, our protocols are designed to deduce the point at which those actions can be safely taken. We think of the shared system state as a monotonic knowledge representation in which new information extends but never invalidates past information. Efficiency, for a protocol of this kind, centers on making such deductions as soon as it is safe to do so, and then ensuring that the protocols themselves impose what might be termed {\em knowledge-minimal} requirements: they should await the weakest condition adequate for safe progress. The SST design was carefully optimized to run protocols expressed this way as efficiently as possible, hence effort invested to tune and optimize protocols expressed over the SST leads to the same place that an effort to derive an optimal formulation would have arrived at. This fortuitous insight is simply that the shift in style compelled by RDMA, because of its specific latency and throughput properties, happens to favor the network model within which optimality becomes the most natural coding style! Our goals in creating Derecho were primarily practical, and it is in some ways surprising to realize that simply by setting out to build a fast data replication tool for extremely fast network hardware, we arrived at a design point in which the most optimal protocol designs from a performance perspective also would turn out to be optimal in a more formal sense. We had not anticipated this when we started the project. It is perhaps surprising that this alignment was not previously noticed by the theory community: the state machine replication as a model can be traced back to papers written 40 years ago, and there have been hundreds (if not thousands) of research papers and systems implementing the model. However, the same properties that make RDMA so disruptive for systems builders also are disruptive relative to classical ways of modeling and abstracting networks: RDMA forces us to think about shared memory models rather than purely in terms of message passing. Surely this explains why even the most efficient of the previous generation of Paxos solutions turns out to be one or more orders of magnitude slower than our Derecho protocols. What we encountered in our work should be of relevance in many other kinds of systems, protocols and distributed algorithms. We hope that future researchers will pick up the thread and carry it even further: what we achieved for atomic multicast and state machine replication would surely be possible in many other settings.

Description
305 pages
Date Issued
2022-12
Keywords
Derecho
•
Distributed Systems
•
Paxos
•
Performance
•
RDMA
•
Replication
Committee Chair
Birman, Kenneth
Committee Member
Van Renesse, Robbert
Meszaros, Karola
Myers, Andrew
Degree Discipline
Computer Science
Degree Name
Ph. D., Computer Science
Degree Level
Doctor of Philosophy
Type
dissertation or thesis
Link(s) to Catalog Record
https://newcatalog.library.cornell.edu/catalog/15644176

Site Statistics | Help

About eCommons | Policies | Terms of use | Contact Us

copyright © 2002-2026 Cornell University Library | Privacy | Web Accessibility Assistance