Developing, Optimizing and Hosting Data-Driven Web Applications

Yang, Fan

Files

Thesis.pdf (1.41 MB)

Permanent Link(s)

https://hdl.handle.net/1813/11146

Collections

Cornell Theses and Dissertations

Author(s)

Yang, Fan

Abstract

Building web applications with current systems is not easy. We face the following challenges: (1) It is difficult to program web applications on top of the standard three-tier architecture. (2) Performance optimization and tuning are mostly done manually, which is tedious, error-prone and suboptimal. (3) It is hard for non-technical users to construct web applications for their own needs. (4) Current platforms do not scale to host a large number of applications in a cost-effective, manageable and/or flexible manner. In this thesis, we propose technologies to address these challenges in developing, optimizing and hosting data-driven web applications.

Data-driven web applications are usually structured following the standard three-tier architecture, with different programming models used at different tiers. This division not only creates an impedance mismatch problem for developers but also forces them to manually partition application logic across tiers, which results in complex logic, suboptimal system design, and expensive re-partitioning of applications as systems evolve. We propose a unified development platform based on HILDA, a high-level language for developing data-driven web applications. The primary benefits of HILDA over existing development platforms are: (a) it uses a unified data and programming model for all layers of the application, (b) it is declarative, (c) it enables conflict detection for concurrent updates, (d) it supports structured programming for web sites, and (e) it separates application logic from presentation. Instead of using different languages for different layers, developers build the whole application in HILDA. HILDA code is translated into executables that run on top of the three-tier architecture. The runtime system automatically partitions the application logic between tiers based on runtime properties of the application, optimizing system performance while obeying memory constraints at the clients. We evaluate our methodology with traces from a real Course Management System used at Cornell University as well as an online bookstore from the TPC-W benchmark. The results show that automatic partitioning outperforms manual partitioning without the associated development overhead.
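The abstract does not describe how the runtime chooses a partition, so the following is only an illustrative sketch of the general idea of placing logic at the client under a memory budget. The greedy rule, the function name `partition`, the component names, and the cost figures are all assumptions made for this example and are not taken from the thesis.

```python
def partition(components, client_memory_kb):
    """Greedily pick components to run at the client within a memory budget.

    components maps a component name to (memory_kb, saved_round_trips).
    Memory figures are assumed to be positive. This is a hypothetical
    heuristic, not the partitioning algorithm used by the HILDA runtime.
    """
    # Prefer components that save the most round trips per KB of client memory.
    ranked = sorted(
        components.items(),
        key=lambda item: item[1][1] / item[1][0],
        reverse=True,
    )
    client, used_kb = [], 0
    for name, (memory_kb, _saved) in ranked:
        if used_kb + memory_kb <= client_memory_kb:
            client.append(name)
            used_kb += memory_kb
    # Everything that did not fit stays on the server tier.
    server = [name for name in components if name not in client]
    return client, server


# Hypothetical components of a course management application.
components = {
    "course_list": (200, 50),
    "grade_editor": (500, 60),
    "file_upload": (400, 10),
}
client, server = partition(components, client_memory_kb=800)
```

With these made-up numbers, the two components that save the most round trips per kilobyte fit within the 800 KB budget and the third stays on the server. A real runtime would presumably measure such properties while the application runs rather than take them as constants.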

There are many cases where non-technical users want to build data-driven web applications to fit their own needs. An emerging trend in social networking sites and Web portals is the opening up of APIs to external application developers. For example, the Facebook Platform, Google Gadgets and Yahoo! Widgets allow users to design their own applications, which can then be integrated with the platform and shared with others. However, current APIs are targeted towards developers with programming expertise and database knowledge; they are not accessible to a large class of users who do not have a programming/database background but would nevertheless like to create new applications. To address this need, we have developed the AppForge system, which provides a WYSIWYG application development platform. Users can graphically specify the components of webpages inside a Web browser, and the corresponding database schema and application logic are automatically generated on the fly by the system. The WYSIWYG interface gives instantaneous feedback on what users have just created and allows them to run, test and continuously refine their applications, which greatly lowers the bar for building such applications.
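To make the idea of deriving a database schema from a graphical page specification concrete, here is a minimal sketch. It assumes a form is described as a list of (label, widget) pairs; the widget-to-type mapping, the function `schema_for_form`, and the example form are hypothetical and do not reflect AppForge's actual design.

```python
import sqlite3

# Hypothetical mapping from form widget kinds to SQL column types.
WIDGET_TO_SQL = {
    "text": "TEXT",
    "number": "INTEGER",
    "checkbox": "INTEGER",
    "date": "TEXT",
}


def schema_for_form(form_name, fields):
    """Return a CREATE TABLE statement for a form given as (label, widget) pairs.

    Labels are turned into column names without any sanitization, which is
    acceptable for a sketch but not for untrusted input.
    """
    columns = ["id INTEGER PRIMARY KEY"]
    for label, widget in fields:
        column = label.strip().lower().replace(" ", "_")
        columns.append(f"{column} {WIDGET_TO_SQL[widget]}")
    return f"CREATE TABLE {form_name} ({', '.join(columns)})"


# A made-up form that a user might draw in the browser.
ddl = schema_for_form(
    "book_review",
    [("Title", "text"), ("Rating", "number"), ("Recommend", "checkbox")],
)

# Create the table in an in-memory database and store one submission.
conn = sqlite3.connect(":memory:")
conn.execute(ddl)
conn.execute(
    "INSERT INTO book_review (title, rating, recommend) VALUES (?, ?, ?)",
    ("TPC-W notes", 4, 1),
)
rows = conn.execute("SELECT title, rating FROM book_review").fetchall()
```

The point of the sketch is only that each widget the user places can imply a column, so the user never writes SQL by hand.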

While each user-generated application by itself is quite small (in terms of size and throughput requirements), there are many such applications, and existing data management solutions are not designed to handle this form of scalability in a cost-effective, manageable and/or flexible manner. For instance, large installations of commercial database systems such as Oracle, DB2 and SQL Server are usually very expensive and difficult to manage. At the other extreme, low-cost data hosting solutions such as Amazon's SimpleDB do not support sophisticated data manipulation primitives such as joins that are necessary for developing most Web applications. To address this issue, we explore a new point in the design space whereby we use commodity hardware and free software (MySQL) to scale to a large number of applications while still supporting full SQL functionality, transactional guarantees, high availability and Service Level Agreements (SLAs). We do so by exploiting the key property that each application is "small" and can fit on a single machine (which can possibly be shared with other applications). Using this property, we design replication strategies, data migration techniques and load balancing operations that automate the tasks that would otherwise contribute to the operational and management complexity of dealing with a large number of applications. We have conducted extensive experiments, based on the TPC-W benchmark data sets and workloads, to study the performance aspects of our system. Our experiments demonstrate that our system can host a very large number of Web applications and provide them with rich functionality, strong consistency, high performance, high availability and data protection in an inexpensive manner by using commodity hardware and software components.
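The property that each application fits on a single, possibly shared, machine can be illustrated with a simple placement sketch. The first-fit rule below, the function `place_replicas`, the load units, and the example applications are assumptions for illustration only; the abstract does not specify the replication or load balancing strategies actually used.

```python
def place_replicas(app_loads, machine_capacity, replicas=2):
    """Assign each application's replicas to distinct machines, first-fit.

    app_loads maps an application name to its load in arbitrary units.
    Returns (placement, machine_count), where placement maps each
    application to the indexes of the machines holding its replicas.
    This is a hypothetical heuristic, not the strategy from the thesis.
    """
    free = []       # remaining capacity of each machine opened so far
    placement = {}
    # Place the heaviest applications first so they are not squeezed out.
    for app, load in sorted(app_loads.items(), key=lambda item: -item[1]):
        if load > machine_capacity:
            raise ValueError(f"{app} does not fit on a single machine")
        chosen = []
        for index, remaining in enumerate(free):
            if remaining >= load:
                chosen.append(index)
                if len(chosen) == replicas:
                    break
        # Open new machines when existing ones cannot hold every replica.
        while len(chosen) < replicas:
            free.append(machine_capacity)
            chosen.append(len(free) - 1)
        for index in chosen:
            free[index] -= load
        placement[app] = chosen
    return placement, len(free)


# Made-up applications and loads on machines of capacity 100.
loads = {"blog": 40, "shop": 50, "wiki": 30, "forum": 20}
placement, machine_count = place_replicas(loads, machine_capacity=100)
```

Because each replica of an application goes to a different machine, losing one machine leaves another copy available, and small applications share machines instead of each needing dedicated hardware.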

Sponsorship

This work was supported by the National Science Foundation under Grant No. 534404.

Date Issued

2008-07-24T20:41:05Z

Keywords

Database; Data-Driven Web Application; Performance; Scalability; WYSIWYG; Optimization

Types

dissertation or thesis
