MayBMS: A System for Managing Large Amounts of Uncertain Data
This dissertation presents the foundations for building a scalable database management system for uncertain data, as it arises in data management scenarios such as data integration, data cleaning, scientific data management, and web data management. The result of this work is MayBMS, a scalable open-source database management system for large amounts of uncertain data. MayBMS represents uncertainty using U-relational databases. U-relational databases store uncertainty and correlations in a purely relational way and are a complete representation system for finite world sets. Further benefits of this representation model include compact storage and efficient query evaluation. Our experimental evaluation shows that query evaluation in MayBMS scales to large data sizes and uncertainty ratios, and that MayBMS consistently outperforms other current systems for managing uncertain data. The dissertation also discusses the optimization of queries on vertically partitioned data, efficient confidence computation algorithms, and the challenges and solutions in designing an application programming interface for uncertain databases.
dissertation or thesis