Supporting Distributed Systems of Distributed Systems
The prevalence, scale, and complexity of cloud-based applications are growing rapidly. These distributed applications adopt multi-tiered designs in which each tier is composed of many distributed components. Their design is dynamic to match changing workloads and requirements: a component can be added, removed, or replaced by a group of distributed components. While there is much work on improving aspects such as the performance and fault tolerance of individual parts of a cloud system, there has been little focus on how these parts should be composed. This dissertation presents a systematic approach to building large-scale cloud systems and outlines the middleware that supports it. We identify the challenges in building and maintaining "distributed systems of distributed systems," which are increasingly relevant in cloud settings, and we provide a solution that allows these systems to be derived and reconfigured over time in a modular and methodical way. To support a variety of applications, we devise a solution that is both general and backward compatible. At the core of our approach is a novel message bus that hides the low-level specification and implementation details of unrelated parts of distributed systems from one another. We demonstrate and evaluate how various distributed systems of distributed systems can be built using our approach.