On the Completeness of Full-Text Search Languages for XML
MetadataShow full item record
Botev, Chavdar; Amer-Yahia, Sihem; Shanmugasundaram, Jayavel
We study formal properties of full-text search languages for XML. Our main contribution is the development of a formal model for full-text search based on the positions of tokens in XML nodes. Building on this model, we define a full-text calculus based on first-order logic, and a full-text algebra based on the relational algebra. We show that the full-text calculus and algebra are equivalent even in the presence of arbitrary position-based predicates, such as distance predicates and phrase matching. This suggests a notion of completeness for full-text languages. None of the full-text search languages that we are aware of are complete under the above characterization. We propose a new full-text language that is complete and naturally generalizes existing full-text languages. Our formalization in terms of the relational model can also serve as the basis for (a) joint optimization of structured and full-text search queries, and (b) ranking full-text search query results by leveraging existing work on the probabilistic relational model.
computer science; technical report
Previously Published As