eCommons

Optimizing Response Time For Distributed Applications In Public Clouds

Abstract

An increasing number of distributed data-driven applications are moving into public clouds. By sharing resources and operating at large scale, public clouds promise higher utilization and lower costs than private clusters. Also, flexible resource allocation and billing methods offered by public clouds enable tenants to control response time or time-to-solution of their applications. To achieve high utilization, however, cloud providers inevitably place virtual machine instances non-contiguously, i.e., instances of a given application may end up in physically distant machines in the cloud. This allocation strategy leads to significant heterogeneity in average network latency between instances. Also, virtualization and the shared use of network resources between tenants result in network latency jitter. We observe that network latency heterogeneity and jitter in the cloud can greatly increase the time required for communication in these distributed data-driven applications, which leads to significantly worse response time. To improve response time under latency jitter, we propose a general parallel framework that exposes a high-level, data-centric programming model. We design a jitter-tolerant runtime that exploits this programming model to absorb latency spikes transparently by (1) carefully scheduling computation and (2) replicating data and computation. To improve response time with heterogeneous mean latency, we present ClouDiA, a general deployment advisor that selects application node deployments minimizing either (1) the largest latency between application nodes, or (2) the longest critical path among all application nodes. We also describe how to effectively control response time for interactive data analytics in public clouds. We introduce Smart, the first elastic cloud resource manager for in-memory interactive data analytics. Smart enables control of the speed of queries by letting users specify the number of compute units per GB of data processed, and quickly reacts to speed changes by adjusting the amount of resources allocated to the user. We then describe SmartShare, an extension of Smart that can serve multiple data scientists simultaneously to obtain additional cost savings without sacrificing query performance guarantees. Taking advantage of the workload characteristics of interactive data analysis, such as think time and overlap between datasets, we are able to further improve resource utilization and reduce cost.

Date Issued

2015-01-26

Keywords

Public Cloud; Response Time; Network Latency

Committee Chair

Gehrke, Johannes E.

Committee Member

Kozen, Dexter Campbell
Bindel, David S.
Demers, Alan J.

Degree Discipline

Computer Science

Degree Name

Ph. D., Computer Science

Degree Level

Doctor of Philosophy

Types

dissertation or thesis
