Data Management Techniques For Fast Function Approximation
Computing and information technology are fundamentally changing the face of modern science. Traditional methods of performing scientific studies are now making way for the next generation of methods that use computing technology. However, the kinds of calculations that scientists wish to perform, and the ways in which they want to collect, archive, and analyze information, have posed several new challenges in data management and algorithm design. Achieving the goal of using computing technology effectively in scientific applications has become an important area of research in computer science, called eScience or data-driven science. Most data-driven scientific applications are aimed at studying and understanding some real-world physical phenomenon. The general methodology followed by a scientist is to first model the physical phenomenon, either directly from the mathematical equations governing it or from a large dataset of observations about it. Recent advances in data management, data mining, and machine learning have addressed numerous challenges that arise in this first stage of model building. However, state-of-the-art methods are inadequate for addressing challenges that arise in the second stage of a data-driven scientific study, where the scientist uses the model she has built to help her understand the physical phenomenon, using tools such as computer simulation and visualization. This thesis identifies and addresses data management challenges that arise when a complex model built for a real-world phenomenon is analyzed by a scientist to gain insights about the phenomenon. The first part of the thesis concentrates on high-dimensional function approximation (HFA), a problem relevant to virtually all applications that use computer simulation as the methodology for understanding complex models.
We explore various aspects of HFA in depth, identify key data management problems, and propose solutions that significantly speed up long-running scientific simulations. Besides computer simulation, visualizing low-dimensional summaries of a complex model is another method commonly used by scientists to understand models. Most real-world models are complex and involve thousands of attributes. To understand a model thoroughly, a scientist generates a very large number of low-dimensional summaries of it. Generating large sets of summaries for a complex model presents a challenging data management task, and the second part of the thesis develops scalable algorithms for solving it.