Towards Efficient And Reliable Publish-Subscribe For Geo-Distributed Datacenters
Topic-based publish-subscribe systems have become an increasingly critical part of the infrastructure that supports today's cloud-based services. Such systems collect, store, and disseminate log records across many datacenters around the globe, each containing thousands of inexpensive, fault-prone machines. Reliability, scalability, and high performance are all desirable, and achieving all of them at the same time is a significant research challenge. Scaling out data collection and dissemination naively can cause unusually high bandwidth over-subscription and network congestion. Storing data for reliability costs storage capacity and can slow the system down. This thesis seeks to meet the challenge by providing novel building blocks for topic-based publish-subscribe systems. We introduce the Sprinkler reliable broadcast facility, which scales out data dissemination over geo-distributed datacenters. We propose a storage framework that supports the concept of rediversification to scale up the storage. We present a structure called funnelling trees to scale out data collection. We show the design and implementation of a novel form of garbage collection, a technique that can be incorporated into all three tasks to reduce the stress brought by high workload. Under typical web caching workloads, the benefit of garbage collection is significant. Together, these components give a complete picture that includes frameworks, protocols, and implementations. We address all three elements of a topic-based publish-subscribe service: data collection, storage, and dissemination. Sprinkler achieves both reliability and scalability across geo-distributed datacenters under sustained high workload.
Orman, Levent V.; Foster, John N.
Ph.D. of Computer Science
Doctor of Philosophy
dissertation or thesis