Robust Querying for Data Analysis and Processing
Traditionally, users have analyzed data through formal languages such as SQL. However, these interfaces are not easily accessible to lay users without an IT background, which has led to the emergence of novel interfaces such as visual and natural language query interfaces. While these interfaces democratize data access, they can introduce ambiguity in the frontend, where the system may misinterpret the user's query intent. Executing the resulting queries efficiently in the backend is also not straightforward. Traditional systems optimize query execution by selecting a plan based on analytical cost models, but these models can lead to suboptimal choices when the underlying statistics are inaccurate. This thesis develops a robust data analysis platform that addresses both issues by considering multiple candidate queries and plans instead of committing to a single one. I introduce three systems that facilitate robust data analysis and processing. The first system, MUVE, supports natural language queries through typed or voice input. It presents users with alternative query interpretations and optimizes the visual output to minimize the time required to identify the correct result. The second system, SkinnerMT, parallelizes adaptive query processing to improve efficiency and robustness. It uses different parallelization methods, allocating threads either to searching for plans or to executing plans on data partitions. The third system, ROME, strategically selects complementary plans for concurrent execution, increasing the likelihood that an optimal plan is among them. Together, these systems contribute to the robustness of interactive data analysis by optimizing the selection of queries and plans in both the frontend and the backend.