Cost-Aware Resource Management for Decentralized Internet Services
Decentralized network services, such as naming systems, content distribution networks, and publish-subscribe systems, play an increasingly critical role. They are required to provide high-performance, low-latency service, remain highly available in the presence of network and node failures, and handle a large volume of users. Judicious use of expensive system resources, such as memory, network bandwidth, and machines, is fundamental to achieving these properties. Yet current network services typically rely on poorly informed, heuristic-based techniques to manage scarce resources, and often fall short of expectations. This thesis presents a principled approach for building high-performance, robust, and scalable network services. Its key contribution is to show that resolving the fundamental cost-benefit tradeoff between resource consumption and performance through mathematical optimization is practical in large-scale distributed systems, and enables decentralized network services to meet system-wide performance goals efficiently. The approach manages resources in three stages: analytically model the cost-benefit tradeoff as a constrained optimization problem, determine a near-optimal resource allocation strategy on the fly, and enforce the derived strategy through lightweight, decentralized mechanisms. It builds on self-organizing structured overlays, which provide failure resilience and scalability, and complements them with stronger performance guarantees and robustness under sudden changes in workload. This work enables applications to meet system-wide performance targets, such as low average response times, high cache hit rates, and short update dissemination times, with low resource consumption. Alternatively, applications can make maximum use of available resources, such as storage and bandwidth, and derive large gains in performance.
I have implemented an extensible framework called Honeycomb that performs cost-aware resource management on structured overlays based on the above approach, and have built three critical network services using it: CoDoNS, a new name system for the Internet that distributes data associated with domain names; CobWeb, an open-access content distribution network that caches web content for faster access by users; and Corona, an online information monitoring system that notifies users about changes to web pages. Simulations and performance measurements from a planetary-scale deployment show that these services provide unprecedented performance improvements over the current state of the art.
Computer Science; Distributed Systems
dissertation or thesis