Automatic Scaling Iterative Computations

Wang, Guozhang

dc.contributor.author	Wang, Guozhang	en_US
dc.contributor.chair	Gehrke, Johannes E.	en_US
dc.contributor.committeeMember	Li, Ping	en_US
dc.contributor.committeeMember	Bindel, David S.	en_US
dc.contributor.committeeMember	Joachims, Thorsten	en_US
dc.date.accessioned	2013-09-16T16:42:43Z
dc.date.available	2018-08-20T06:01:05Z
dc.date.issued	2013-08-19	en_US
dc.description.abstract	In this thesis, we address the problem of efficiently and automatically scaling iterative computational applications through parallel programming frameworks. While there has been much progress in designing and developing parallel platforms with high-level programming paradigms for batch-oriented applications, these platforms are ill-suited to iterative computations because they ignore resident data and enforce "embarrassingly parallel" batch-style processing of data sets within every computational operator. To address these challenges, we propose a set of methods that leverage certain properties of iterative computations to enhance the performance of the resulting parallel programs for these large-scale iterative applications. More specifically, we (1) leverage data locality to reduce communication overhead within individual iterations due to data transfer, and (2) leverage sparse data dependency to further minimize inter-process synchronization overhead and enable asynchronous executions by relaxing the consistency requirements of iterative computations. To illustrate (1), we propose a large-scale programming framework for behavioral simulations. Our framework allows developers to script their simulation agent behavior logic using an object-oriented Java-like programming language and parallelize the resulting simulation systems with millions of scripted agents by compiling the per-agent behavior logic as iterative spatial joins and distributing this query plan across a cluster of machines. We use various query optimization techniques such as query rewrite and indexing to boost the single-machine performance of the program. More importantly, we leverage the spatial locality properties of the scripted agent behavior logic to reduce the inter-machine communication overhead. To illustrate (2), we present a parallel platform for iterative graph processing applications.
Our platform distinguishes itself from previous parallel graph processing systems in that it combines the easy programmability of a synchronous processing model with the high performance of asynchronous executions. This combination is achieved by separating the application's computational logic from the underlying execution policies in our platform: developers need to code their applications only once, using a synchronous programming model based on message passing between vertices, where the sparse data dependency is completely captured by the messages. Developers can then customize the methods that handle message reception and selection to choose among different synchronous or asynchronous execution policies by relaxing the application's consistency requirements encoded in the messages.	en_US
dc.identifier.other	bibid: 8267174
dc.identifier.uri	https://hdl.handle.net/1813/34246
dc.language.iso	en_US	en_US
dc.subject	Iterative	en_US
dc.subject	Large Scale	en_US
dc.subject	Programming Frameworks	en_US
dc.title	Automatic Scaling Iterative Computations	en_US
dc.type	dissertation or thesis	en_US
thesis.degree.discipline	Computer Science
thesis.degree.grantor	Cornell University	en_US
thesis.degree.level	Doctor of Philosophy
thesis.degree.name	Ph. D., Computer Science

Files

Original bundle

Name: gw222.pdf
Size: 1.69 MB
Format: Adobe Portable Document Format

Collections

Cornell Theses and Dissertations