Developing, Optimizing and Hosting Data-Driven Web Applications
Building web applications with current systems is not an easy task, and we face the following challenges: (1) It is difficult to program web applications on top of the standard three-tier architecture. (2) Performance optimization and tuning are mostly done manually, which is tedious, error-prone and suboptimal. (3) It is hard for non-technical users to construct web applications for their own needs. (4) Current platforms do not scale to host a large number of applications in a cost-effective, manageable and/or flexible manner. In this thesis, we propose technologies to address these challenges in developing, optimizing and hosting data-driven web applications.
Data-driven web applications are usually structured following the standard three-tier architecture, with different programming models used at different tiers. This division not only creates an impedance mismatch problem for developers but also forces them to manually partition application logic across tiers, which results in complex logic, suboptimal system design, and expensive re-partitioning of applications as systems evolve. We propose a unified development platform based on HILDA, a high-level language for developing data-driven web applications. The primary benefits of HILDA over existing development platforms are: (a) it uses a unified data and programming model for all layers of the application, (b) it is declarative, (c) it enables conflict detection for concurrent updates, (d) it supports structured programming for web sites, and (e) it separates application logic from presentation. Instead of using different languages for different layers, developers build the whole application in HILDA. HILDA code is translated into executables that run on top of the three-tier architecture. The runtime system automatically partitions the application logic between tiers based on runtime properties of the application, to optimize system performance while obeying memory constraints at the clients. We evaluate our methodology with traces from a real Course Management System used at Cornell University as well as an online bookstore from the TPC-W benchmark. The results show that automatic partitioning outperforms manual partitioning without the associated development overhead.
There are many cases where non-technical users want to build data-driven web applications to fit their own needs. An emerging trend in social networking sites and web portals is the opening up of APIs to external application developers. For example, the Facebook Platform, Google Gadgets and Yahoo! Widgets allow users to design their own applications, which can then be integrated with the platform and shared with others. However, current APIs are targeted towards developers with programming expertise and database knowledge; they are not accessible to a large class of users who do not have a programming or database background but would nevertheless like to create new applications. To address this need, we have developed the AppForge system, which provides a WYSIWYG application development platform. Users can graphically specify the components of web pages inside a web browser, and the corresponding database schema and application logic are automatically generated on the fly by the system. The WYSIWYG interface gives instantaneous feedback on what users have just created and allows them to run, test and continuously refine their applications, which greatly lowers the bar for building such applications.
While each user-generated application by itself is quite small (in terms of size and throughput requirements), there are many such applications, and existing data management solutions are not designed to handle this form of scalability in a cost-effective, manageable and/or flexible manner. For instance, large installations of commercial database systems such as Oracle, DB2 and SQL Server are usually very expensive and difficult to manage. At the other extreme, low-cost data hosting solutions such as Amazon's SimpleDB do not support sophisticated data manipulation primitives such as joins that are necessary for developing most web applications. To address this issue, we explore a new point in the design space whereby we use commodity hardware and free software (MySQL) to scale to a large number of applications while still supporting full SQL functionality, transactional guarantees, high availability and Service Level Agreements (SLAs). We do so by exploiting the key property that each application is "small" and can fit on a single machine (which can possibly be shared with other applications). Using this property, we design replication strategies, data migration techniques and load balancing operations that automate the tasks that would otherwise contribute to the operational and management complexity of dealing with a large number of applications. We have conducted extensive experiments, based on the TPC-W benchmark data sets and workloads, to study the performance of our system. Our experiments demonstrate that our system can host a very large number of web applications and provide them with rich functionality, strong consistency, high performance, high availability and data protection in an inexpensive manner by using commodity hardware and software components.
This work was supported by the National Science Foundation under Grant No. 534404.
Database; Data-Driven Web Application; Performance; Scalability; WYSIWYG; Optimization
dissertation or thesis