JavaScript is disabled for your browser. Some features of this site may not work without it.
On the Completeness of Full-Text Search Languages for XML

Author
Botev, Chavdar; Amer-Yahia, Sihem; Shanmugasundaram, Jayavel
Abstract
We study formal properties of full-text search languages for XML.
Our main contribution is the development of a formal model for full-text search based on the positions of tokens in XML nodes. Building on this model, we define a full-text calculus based on first-order logic, and a full-text algebra based on the relational algebra. We show that the full-text calculus and algebra are equivalent even in the presence of arbitrary position-based predicates, such as distance predicates and phrase matching. This suggests a notion of completeness for full-text languages. None of the full-text search languages that we are aware of are complete under the above characterization. We propose a new full-text language that is complete and naturally generalizes existing full-text languages. Our formalization in terms of the relational model can also serve as the basis for (a) joint optimization of structured and full-text search queries, and (b) ranking full-text search query results by leveraging existing work on the probabilistic relational model.
Date Issued
2003-12-14Publisher
Cornell University
Subject
computer science; technical report
Previously Published As
http://techreports.library.cornell.edu:8081/Dienst/UI/1.0/Display/cul.cis/TR2003-1917
Type
technical report