Dependable Systems for Managing Valuable Data
Distributed computing systems are increasingly relied upon to manage valuable data that is used to make important decisions, and as a result, they face the competing demands of dependability and performance. In this dissertation, we propose and evaluate several systems that provide desirable dependability features while sustaining enough data throughput to be used in deployments with tight time constraints. First, we present a system for collecting data from a network of "smart" devices (such as smart power meters) that allows each device to keep its contributions anonymous while still providing accurate and timely answers to queries about the network's observed state. Next, we introduce the Derecho platform, a library for building replicated datacenter services that tolerates faults and keeps updates strongly consistent, yet achieves very low latency and high bandwidth through an innovative use of RDMA networking. We then explore how to guarantee data durability in replicated services, describing and implementing an algorithm for recovering replicated state machines (such as those built with Derecho) after a shutdown that leaves only their disk-persisted state. Finally, we describe a data storage service built with the Derecho library that generates cryptographically tamper-proof logs of every update to the data it stores, adding another layer of dependability for high-value data.