Cornell University
Library
Cornell UniversityLibrary

eCommons

Help
Log In(current)
  1. Home
  2. Cornell University Graduate School
  3. Cornell Theses and Dissertations
  4. Automatic Scaling Iterative Computations

Automatic Scaling Iterative Computations

File(s)
gw222.pdf (1.69 MB)
Permanent Link(s)
https://hdl.handle.net/1813/34246
Collections
Cornell Theses and Dissertations
Author
Wang, Guozhang
Abstract

In this thesis, we address the problem of efficiently and automatically scaling iterative computational applications through parallel programming frameworks. While there has been much progress in designing and developing parallel platforms with high level programming paradigms for batch-oriented applications, these platforms are ill-fitted for iterative computations due to their ignorance of resident data and enforcement of "embarrassingly parallel" batch-style processing of data sets within every computational operators. To address these challenges we propose a set of methods that leverage certain properties of iterative computations to enhance the performance of the resulting parallel programs for these large-scale iterative applications. More specifically, we (1) leverage data locality to reduce communication overhead within individual iterations due to data transfer, and (2) leverage sparse data dependency to further minimize inter-process synchronization overhead and enable asynchronous executions by relaxing the consistency requirements of iterative computations. To illustrate (1) we propose a large-scale programming framework for behavioral simulations. Our framework allows developers to script their simulation agent behavior logic using an object-oriented Java-like programming language and parallelize the resulting simulation systems with millions of scripted agents by compiling the per-agent behavior logic as iterative spatial joins and distributing this query plan into a cluster of machines. We use various query optimization techniques such as query rewrite and indexing to boost the singlemachine performance of the program. More importantly, we leverage the spatial locality properties of the scripted agent behavior logic to reduce the intermachine communication overhead. To illustrate (2), we present a parallel platform for iterative graph processing applications. Our platform distinguishes itself from previous parallel graph processing systems in that it combines the easy programmability of a synchronous processing model with the high performance of asynchronous executions. This combination is achieved by separating the application's computational logic from the underlying execution policies in our platform: developers only need to code their applications once with a synchronous programming model based on message passing between vertices, where the sparse data dependency is completely captured by the messages. Developers can then customize methods of handling message reception and selection to effectively choose different synchronous or asynchronous execution policies via relaxing of the consistency requirements of the application encoded on the messages.

Date Issued
2013-08-19
Keywords
Iterative
•
Large Scale
•
Programming Frameworks
Committee Chair
Gehrke, Johannes E.
Committee Member
Li, Ping
Bindel, David S.
Joachims, Thorsten
Degree Discipline
Computer Science
Degree Name
Ph. D., Computer Science
Degree Level
Doctor of Philosophy
Type
dissertation or thesis

Site Statistics | Help

About eCommons | Policies | Terms of use | Contact Us

copyright © 2002-2026 Cornell University Library | Privacy | Web Accessibility Assistance