eCommons

 

Automatic Scaling Iterative Computations

dc.contributor.authorWang, Guozhangen_US
dc.contributor.chairGehrke, Johannes E.en_US
dc.contributor.committeeMemberLi, Pingen_US
dc.contributor.committeeMemberBindel, David S.en_US
dc.contributor.committeeMemberJoachims, Thorstenen_US
dc.date.accessioned2013-09-16T16:42:43Z
dc.date.available2018-08-20T06:01:05Z
dc.date.issued2013-08-19en_US
dc.description.abstractIn this thesis, we address the problem of efficiently and automatically scaling iterative computational applications through parallel programming frameworks. While there has been much progress in designing and developing parallel platforms with high level programming paradigms for batch-oriented applications, these platforms are ill-fitted for iterative computations due to their ignorance of resident data and enforcement of "embarrassingly parallel" batch-style processing of data sets within every computational operators. To address these challenges we propose a set of methods that leverage certain properties of iterative computations to enhance the performance of the resulting parallel programs for these large-scale iterative applications. More specifically, we (1) leverage data locality to reduce communication overhead within individual iterations due to data transfer, and (2) leverage sparse data dependency to further minimize inter-process synchronization overhead and enable asynchronous executions by relaxing the consistency requirements of iterative computations. To illustrate (1) we propose a large-scale programming framework for behavioral simulations. Our framework allows developers to script their simulation agent behavior logic using an object-oriented Java-like programming language and parallelize the resulting simulation systems with millions of scripted agents by compiling the per-agent behavior logic as iterative spatial joins and distributing this query plan into a cluster of machines. We use various query optimization techniques such as query rewrite and indexing to boost the singlemachine performance of the program. More importantly, we leverage the spatial locality properties of the scripted agent behavior logic to reduce the intermachine communication overhead. To illustrate (2), we present a parallel platform for iterative graph processing applications. Our platform distinguishes itself from previous parallel graph processing systems in that it combines the easy programmability of a synchronous processing model with the high performance of asynchronous executions. This combination is achieved by separating the application's computational logic from the underlying execution policies in our platform: developers only need to code their applications once with a synchronous programming model based on message passing between vertices, where the sparse data dependency is completely captured by the messages. Developers can then customize methods of handling message reception and selection to effectively choose different synchronous or asynchronous execution policies via relaxing of the consistency requirements of the application encoded on the messages.en_US
dc.identifier.otherbibid: 8267174
dc.identifier.urihttps://hdl.handle.net/1813/34246
dc.language.isoen_USen_US
dc.subjectIterativeen_US
dc.subjectLarge Scaleen_US
dc.subjectProgramming Frameworksen_US
dc.titleAutomatic Scaling Iterative Computationsen_US
dc.typedissertation or thesisen_US
thesis.degree.disciplineComputer Science
thesis.degree.grantorCornell Universityen_US
thesis.degree.levelDoctor of Philosophy
thesis.degree.namePh. D., Computer Science

Files

Original bundle
Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
gw222.pdf
Size:
1.69 MB
Format:
Adobe Portable Document Format