Scaling the Infrastructure of Practical Blockchain Systems

Other Titles
The infrastructure of a blockchain system consists of a replication service that tolerates limited adversarial behavior among participants. It requires both a Byzantine fault tolerant (BFT) replication protocol to defend against the adversaries and an underlying storage system to preserve states. This dissertation explores two designs for BFT replication and one design for the persistent storage. We first present HotStuff, a leader-based BFT replication protocol for the partially synchronous system model. Once network communication becomes synchronous, HotStuff enables a correct leader to drive the protocol to consensus at the pace of actual (vs. maximum) network delay—a property called responsiveness—and with communication complexity that is linear in the number of replicas. To the best of our knowledge, HotStuff is the first partially synchronous BFT replication protocol exhibiting these combined properties. HotStuff is built around a novel framework that forms a bridge between classical BFT foundations and blockchains. It allows the expression of other known protocols (DLS, PBFT, Tendermint, Casper), and ours, in a common framework. Our deployment of HotStuff over a network with over 100 replicas achieves throughput and latency comparable to that of BFT-SMaRt, while enjoying a linear communication footprint during leader failover (vs. cubic with BFT-SMaRt). Then, we introduce a family of leaderless BFT protocols, exploiting metastable properties of network subsampling. These protocols provide a strong probabilistic safety guarantee in the presence of Byzantine adversaries while their concurrent and leaderless nature enables them to achieve high throughput and scalability. Unlike blockchains that rely on Proof-of-Work, blockchains built on our protocols are quiescent and green. Unlike traditional consensus protocols where typically one or more nodes must process a linear number of bits in the number of total nodes per decision, no node processes more than a logarithmic number of bits. It does not require accurate knowledge of all participants and exposes new possible tradeoffs and improvements in safety and liveness for building consensus protocols. We describe the Snow protocol family, and how it can be used to construct the core of an internet-scale electronic payment system, Avalanche, which is evaluated in a large scale deployment. Experiments demonstrate that the system can achieve high throughput, provide low confirmation latency and scale well compared to existing systems that deliver similar functionality. For our implementation and setup, the bottleneck of the system is in transaction verification. Finally we propose a new in-memory index that is also storage-friendly. A “lazy-trie” is a variant of the hash-trie data structure that achieves near-optimal height, has practical storage overhead, and can be maintained on-disk with standard write-ahead logging. We present CedrusDB, a persistent key-value store based on a lazy-trie. The lazy-trie is kept on disk while made available in memory using standard memory-mapping. The lazy-trie organization in virtual memory allows CedrusDB to better leverage concurrent processing than other on-disk index schemes (LSMs, B+-trees). CedrusDB achieves comparable or superior performance to recent log-based in-memory key-value stores in mixed workloads while being able to recover quickly from failures.
Journal / Series
Volume & Issue
196 pages
Date Issued
Blockchain; Computer Systems; Distributed Consensus; Distributed Systems; Persistent Storage
Effective Date
Expiration Date
Union Local
Number of Workers
Committee Chair
Renesse, Robbert Van
Committee Co-Chair
Committee Member
Sampson, Adrian
Kleinberg, Robert David
Degree Discipline
Computer Science
Degree Name
Ph. D., Computer Science
Degree Level
Doctor of Philosophy
Related Version
Related DOI
Related To
Related Part
Based on Related Item
Has Other Format(s)
Part of Related Item
Related To
Related Publication(s)
Link(s) to Related Publication(s)
Link(s) to Reference(s)
Previously Published As
Government Document
Other Identifiers
Attribution-ShareAlike 4.0 International
dissertation or thesis
Accessibility Feature
Accessibility Hazard
Accessibility Summary
Link(s) to Catalog Record