Cornell University
Library
Cornell UniversityLibrary

eCommons

Help
Log In(current)
  1. Home
  2. Cornell Computing and Information Science
  3. Computing and Information Science
  4. Computing and Information Science Technical Reports
  5. On the Completeness of Full-Text Search Languages for XML

On the Completeness of Full-Text Search Languages for XML

File(s)
TR2003-1917.pdf (149.4 KB)
Permanent Link(s)
https://hdl.handle.net/1813/5629
Collections
Computing and Information Science Technical Reports
Author
Botev, Chavdar
Amer-Yahia, Sihem
Shanmugasundaram, Jayavel
Abstract

We study formal properties of full-text search languages for XML. Our main contribution is the development of a formal model for full-text search based on the positions of tokens in XML nodes. Building on this model, we define a full-text calculus based on first-order logic, and a full-text algebra based on the relational algebra. We show that the full-text calculus and algebra are equivalent even in the presence of arbitrary position-based predicates, such as distance predicates and phrase matching. This suggests a notion of completeness for full-text languages. None of the full-text search languages that we are aware of are complete under the above characterization. We propose a new full-text language that is complete and naturally generalizes existing full-text languages. Our formalization in terms of the relational model can also serve as the basis for (a) joint optimization of structured and full-text search queries, and (b) ranking full-text search query results by leveraging existing work on the probabilistic relational model.

Date Issued
2003-12-14
Publisher
Cornell University
Keywords
computer science
•
technical report
Previously Published as
http://techreports.library.cornell.edu:8081/Dienst/UI/1.0/Display/cul.cis/TR2003-1917
Type
technical report

Site Statistics | Help

About eCommons | Policies | Terms of use | Contact Us

copyright © 2002-2026 Cornell University Library | Privacy | Web Accessibility Assistance