On the Completeness of Full-Text Search Languages for XML Chavdar Botev Cornell University cbotev@cs.cornell.edu Sihem Amer-Yahia AT&T Labs–Research sihem@research.att.com Jayavel Shanmugasundaram Cornell University jai@cs.cornell.edu Abstract We study formal properties of full-text search languages for XML. Our main contribution is the development of a formal model for full-text search based on the positions of tokens in XML nodes. Building on this model, we define a full-text calculus based on first-order logic, and a full-text algebra based on the relational algebra. We show that the full-text calculus and algebra are equivalent even in the presence of arbitrary position-based predicates, such as distance predicates and phrase matching. This suggests a notion of completeness for full-text languages. None of the full-text search languages that we are aware of are complete under the above characterization. We propose a new full-text language that is complete and naturally generalizes existing full-text languages. Our formalization in terms of the relational model can also serve as the basis for (a) the joint optimization of structured and full-text search queries, and (b) ranking full-text search query results by leveraging existing work on probabilistic relational models. 1 Introduction The emergence of XML has resulted in the rapid growth of semi-structured XML repositories. Examples of such publicly available repositories are the Library of Congress (LoC) documents [15], the IEEE INEX data collection [14], Shakespeare’s plays, DBLP and SIGMOD Record. A common characteristic of these repositories is that they have a mix of structured and unstructured data. For example, the LoC XML documents describe congressional bills and contain structured information such as bill date and sponsors and unstructured text such as bill description. Querying the content of such documents requires powerful full-text search primitives, in addition to structured query operations. While XML query languages such as XPath and XQuery [20] support sophisticated structured query operators such as selections, joins and aggregations, they can only support very rudimentary full-text search. For instance, full-text search in XQuery is expressed using the function: contains($e, tokens) which returns true iff the XML node bound to the variable $e contains all the tokens in tokens. While this function is sufficient for simple sub-string matching, it is woefully inadequate for more complex searches. As an illustration, consider the following example from the XQuery Full-Text Use Cases Document [21]. Example 1 (Use Case 10.4): Consider an XML document that contains book and article elements. Find the book elements that contain the keywords “efficient” and the phrase “task completion” in that order with at most 10 intervening tokens. The above query includes phrase matching, order specifications, and distance predicates. The contains function in XQuery cannot express and compose all these full-text primitives. XML full-text languages that we are aware of are also unable to express the above query as well as other full-text operations specified in the XQuery Full-Text Use Cases Document, such as general Boolean conditions and paragraph scoping. We believe that one reason for the current state of affairs is that there is no adequate formal model for full-text 1 search in XML. This makes it hard to understand the full scope of full-text search in XML and develop appropriate languages that can be tightly integrated with structured XML query languages. One of the main contributions of this paper is the development of a formal model for full-text search in XML. Our model is based on the notion of positions of tokens within an XML node. Explicitly modeling the positions in an XML node, instead of simply treating nodes as a “bag of words” as in traditional information retrieval (IR), allows us to formally capture position-based predicates such as distance and ordered predicates, in addition to traditional Boolean operators. Building on the above formal model, we define a full-text calculus based on first-order logic, and a full-text algebra based on the relational algebra. The calculus and algebra are powerful enough to model arbitrary position-based predicates. We also show that for any given set of position-based predicates, the full-text calculus and algebra have equivalent expressive power. This suggests a notion of completeness for full-text search languages. Given the above notion of completeness, we show the incompleteness of existing full-text search languages that we are aware of. We also describe a new full-text search language based on the full-text calculus, which is complete and naturally generalizes existing Boolean full-text search languages. This new language is the formal basis for the TeXQuery full-text language proposal [1], which we have submitted to the W3C Full-Text Task Force, whose charter is to extend XQuery with full-text search capabilities. Although not discussed in detail in this paper, our full-text algebra based on the relational model can serve (a) as the foundation for the joint optimization of structured and full-text search queries, and (b) as the basis for scoring and ranking full-text search results based on the probabilistic relational model developed for information retrieval [11]. Finally, although our formalism is used to model full-text search in XML documents, our results also apply to “flat” documents, where we simply consider each flat document to be an XML node with no internal structure. However, one of the main benefits of formalizing full-text search for XML is that it provides the formal foundation for tightly integrating full-text search with structured queries. 2 Modeling Full-Text Search in XML At its core, a full-text search specification for XML has two components: the search context, which specifies the set of XML nodes (i.e., the context nodes) over which the full-text search is to be performed and, the full-text condition, which specifies the condition that should be evaluated on each context node. Only the context nodes that satisfy the condition qualify as answers. In Example 1 in the introduction, the search context is the set of book elements (and not article elements) that satisfy the full-text condition: contains the keyword “efficient” and the phrase “task completion” in that order with at most 10 intervening tokens. In order to specify a full-text search query over XML documents, we need (1) a language to specify the context nodes and, (2) a language to specify the full-text search condition (the full-text search language). For (1), we can use a structured XML query language such as XQuery, which is relationally complete and has a well-defined formal semantics [20]. We thus focus on the full-text search language in this paper.. One important issue in full-text search is scoring and ranking the context nodes based on how well they satisfy the full-text search condition. While we do not explicitly consider specific scoring schemes in our formalism, in Section 3.3, we shall briefly describe how a scoring scheme based on the probabilistic relational model naturally fits into our full-text algebra formalism. 2.1 Modeling Context Nodes To reason about full-text search languages, we need a formal model for the context nodes. A context node can be any XML node (e.g., element and attribute nodes). Existing models for context nodes are insufficient for expressive full-text search. For instance, the XQuery data model for the book element in Figure 1 2 Elina(5) Rose(6) Usability(10) of(11) a(12) software(13) measures(14) how(15) well(16) the(17) software(18) provides(19) support(20) for(21) quickly(22) achieving(23) specified(24) goals(25). Figure 1: Positions Example treats all the text under an element as a single text node (ignore the numbers in parentheses for now). This model is enough to identify sub-strings in the text and evaluate queries such as find author nodes containing ’Elina’. However, it is insufficient to answer queries such as find books that contain the tokens ’usability’ and ’testing’ with at most three intervening tokens. Traditional IR models solve part (but not all) of this problem by tokenizing the entire document (or equivalently, context node in our setting), and representing each token separa ¢t¡¤e£¥l£§y.¦©¨In¨the§§ex¨!a#m"%p$&le£¢'¢in¨!(0F)1ig324u%re¨!561,£§7¢th8§e¨9te9x9A@ t in the context node would be modeled as the “bag of tokens” . However, this model still cannot capture the distance between tokens (we note that some IR languages do, however, support restricted forms of distance predicates; see Section 4.2 for a more detailed comparison with such languages). In this paper, we explicitly model the position of a token in a context node. We argue that this model, although simple, is powerful enough to capture the semantics of existing full-text search languages. Further, it serves as the formal basis for defining position-based predicates such as proximity distance and order predicates. In Figure 1, we have used a simple numeric position (within parenthesis) for each token. It is important to note that our proposed model does not dictate any specific implementation of positions (such as numeric positions, as shown above). More expressive positions that capture the notions of lines, sentences and paragraphs can be used, and this will enable more sophisticated predicates on positions than simple distance (such as whether two search tokens appear in the same line or sentence). Our language formalisms are extensible with respect to the set of predicates, and more complex position identifiers will enable the use of more complex predicates. 2.2 Model Definition We now define our formal model. B is the set o£§f73c$Fon£¢t2Ge7Ixt H nodes, C is the set of positions, and D is the (possibly infinite) set of tokens. The functio£§n¦V8E 2WH BQPSR1T maps a context node to the set of positions in the context node. The function U CXPYD maps e2ach positi£§o7n3$Fto£¢2Gth7ae`12ctobekedfn  #s¥to¨ re¨d9g9ga9g¨t tha@ t po£§s¦Vit8io2in`p.qIbrndsth¡¤e£¥e£§x¦ amp£§le¦V8in2iF` igbrudtre1 , if the context node is denoted by , then E R R§h , U , U R , and so on. 3 Full-Text Calculus and Algebra We define a calculus for full-text search based on first-order logic that captures the key notions of search context, positions, and position-based predicates, which were introduced in the previous section. We then define a relation-based full-text algebra that is equivalent to the full-text calculus. We also outline the potential benefits of our formalism. 3.1 Full-Text Calculus The full-text calculus defines the following predicates to model basic full-text primitives. 3  ¢¡ ¤£ ¦¥ ¨§ ©8#' & 6£¢2 $8 %$`124£¥a8¢b 24£¥a8 is true iff B (recall that B is the set of context nodes).     ©&7 £§7a`124£¥a8§¨ ©£§7¥b ©£§7 £§73$F£¢2G7a`124£¥a8¢b E is true iff E . This predicate explicitly captures the notion    of positions in an XML node. &7 £§¦V82i` ©£§7 ¨$p£§¦b $p£§¦ d £§¦V82i` ©£§7¥b U is true iff U . This predicate captures the relationship between tokens and the positions in which they occur. ' 8a7 A full-text language may also wish to specify an additional set of position-based predicates, E , ¥depending on user needs. T#h' e c£§a7 lculus is general enough to suppo6rt£¢a2Gr7b$i7trary position-based predicates.     £  £"! # $© £  £"!%©Specifically, given a set E %o'f 8p©o` siti¨o9gn9g9g¨ va0ria¨ bqle¨s9g,9g9g¨an¢db a set ¨9g9g9 of cons#ta' nts£§,7 the calc¨u9gl9gu9g¨ s can ¥ &£    '   ¤  's6u£¢p2Gpo7r$t7 any predicate of the form: ' 8a7 df q F7$p#2 8#` ©, £§w7 h¨e©re£§7 ¢¨! F7$!b ¨!£¢'§a8' 8©E ` ©£§7 aqn¨ d©£§7 qb ¨ )( 0  ¤  ' 2131)  ¤  ' &£  4  '7q 8 ©#.'§Fo` ©r £§ex7 am¨ ©p£§l7 eq,b w¨!e cou©ld£§7ad` e©fi£§7ne¨ E©£§7 qb!@  F7$p#2 8#` ©£§7 ¨ ©£§7 ¥¨! F7$!b  5  '  ¤  '  4 F7$ ©£§7 . H©er£§e7 , £¢'§a8' 8©` ©£§7 ¨ ©£§7 b return©s£§t7 rue iff there are  ' )( 0  ¤  '  4  6' 2131)  ¤  '©at£§m7 os7qt 8 ©i#n'§ter`v©e£§n7ing¨ ©t£§o7 keb ns between©£§7 and ; ©£§7 is  true©i£§ff7a` ©£§7 o¨ ©cc£§u7 rsb before  ¤  '©£§;7 ©£§7 is true iff is in the same paragraph as ; is true iff and are different positions. 3.1.1 Full-Text Calculus Queries 57 ¡ ¤£ ¦¥ ¨§ 8@9 6A B§4 24£¥a8 9 6A B§4 9 6A B§4A full-text calculu24s£¥qau8 ery is of the form: 8#' & 6£¢2 $8 %$`124£¥a8¢b " 8' ( %'`124£¥a8¢b!@ " 8' ( %'`124£¥a8¢b " 8' .I( ntu%it'iv`124el£¥ya, 8¢b the query returns s that are in the search context, and that satisfy . , hereafter24c£¥aal8led the query expression, is a first-order logic expression that specifies the full-text search con- dition. is the only free variable in the query expression. The structure of the query expression is recursively defined as follows.    DC  ¤C©&7 £§7a`124£¥a8§¨ ©£§7 b 24£¥a8 ©£§7 E is a query expression where is the free variable and #' £§7 E.    DC  ECF© G©H¥&7 £§¦V82i` ©£§7 F¨$p£§¦b ©£§7 U is a query expression, where #' £§7 $p£§¦ 6£¢2G7$7 E and .     ¤  D £  £"!  I©  EC©%' 8©` ©£§7 ¨9g9g9g¨ ©£§7 ¨ ¨9g9g9g¨ ¢b %' 8 £QPB©H¥ 6£¢2G7$7 is a query expression, where ' 8a7 ©£§7 E, #' £§7 E and .   6 D' 4 6F8 D' 6T D'8 8 88 R R SR R R R RIf and are query expressions, , 88 , and    EC  DC D8  DC  DC XW8 ©£§7 p`A&7 £§7a`124£¥a8§¨ ©£§7 b R U  EC© R VIf8¢b is a query expression, then ©£§7 E #' £§7 R are query expressions, where E. 8 are query expressions. 8¢b ©£§7 p`A&7 £§7a`124£¥a8§¨ ©£§7 3b , and E A full-text calculus query has the conventional semantics of first-order logic. The form of the quantifi- cation in the last bullet guarantees that the query expression in the full-text calculus can be evaluated using only the positions and tokens in the context node, without having to look at other positions. This notion is similar to the notion of safety for the relational calculus. We now provide some examples of full-text calculus queries. The following query returns the context 57 ¡ ¤£ ¦¥ ¨§ #8  ¤  ¤ &8  ¤ Y 2Y Q8node s24t£¥haa8t co8n#ta' in& t6he£¢2 to$k8 e%n$s`124’t£¥eas8¢t’b and ©’u£§s7 ab`Ai&lity7 ’. £§7a`124£¥a8§¨ ©£§7 b &7 £§¦V82i` ©£§7 ¨ ¤$8q7$ b  '  ' &8 U  ' Y 2A5Y©£§7 ¥`A&7 £§7a`124£¥a8§¨ ©£§7 qb &7 £§¦V82i` ©£§E 7 ¢¨ " 7q¡ )13$ bbb!@ U UE U In the subsequent examples, we only show the query expression since the rest of the query is the same. The following query returns the context nodes that contain the token ’test’ and the token ’usability’ with at most  ¤  ¤ &8  ¤ Y 2Y 38  '  ' Q85 inte©rv£§e7 ni`An&g7tok£§en7as`124. £¥a8§¨ ©£§7 b &7 £§¦V82i` ©£§7 ¨ $8q7$ b U UE U ©£§7 ¥`A&7 £§7a`124£¥a8§¨ ©£§7 qb E 4  ' Y 2A5Y &8 &£  ¤  '&7 £§¦V82i` ©£§7 ¢¨ " 7q¡ )13$ b  F7$p#2 8#` ©£§7 ¨ ©£§7 ¢¨¡  bbb U The following query returns the context nodes that contain two occurrences of the token ’test’ and do not  ¤  ¤ Q8  ¤ Y 2Y Q8  '  ' Q8  ' Y 2Yconta©in£§7the`A&tok7 en£§’7au`1s24a£¥bail8§it¨ y©’£§. 7 b e&7 £§¦V82i` ©£§7 ¨ ¤$8q7$ b U8 2131)  ¤  ' 38   W U  Y 2A5Y  ©£§7a` E©£§7 ¨ ©£§7 b ©£§7£¢§`A&U 7 £§7a`124£¥a8§¨ ©£§7£¢qb V SE ©£§7 ¥`A&7 £§7a`124£¥a8§¨ ©£§7 qb e&7 £§¦V82i` ©£§7 ¥¨ ¤$8q7$ b &7 £§¦VE 82i` ©£§7£¢¢¨ " 7q¡ )13$ bbU bb U 3.2 Full-Text Algebra We now define our full-text relations an¨d a"lg!¤e¨b9g9gr9ga¨o pe#%r$ators. The underlying data model for our algebra is a full-text relation of 'th& e form ¤¦¥ §©¨ where the domain of §©¨ is B (context nodes), and the domain of is C (positions). ¤ satisfies the following properties.   ¤ has always at least the attribute §©¨ . This captures the context node for full-text search. The remaining attributes in ¤ capture the essence of full-text search, which is to manipulate positions.     B© ©£§7  ©£§7 ¥Ea£§c7h3$Ftu£¢p2Gle7a`1$¤9i)n a( f£¥uall8¢-tb ext relation should satisfy the condition that for all the positions in , E . The intuition is that a full-text search query can only manipulate positions within a single context node. A full-text algebra expression is based on the following full-text relations that characterize the search context nodes, their positions, and the tokens at these positions.   ©10  243£5 § 76  8  ` §©¨ b 24£¥a8 : This relation contains a tuple ( 24£¥a8 ) for each B.   8 © @9 A%B AV` ¨ 4!!b 24£¥a8 © £§7 §©¨£§ 73$F£¢2G7a:`124T£¥hais8¢brelation contains a tuple for each ( ©£§7 , ) pair that satisfies: 24£¥a8 © BE . Intuitively, this relation relates context nodes to their positions.   ©` ¨ 4! b $p£§¦V82 5QPSRUTWVFX  B© 8  B© 48 ¤C¡DFE¡GIH §©¨24£¥a 8 ©£§7 : This is a family 24of£¥are8 lations, o©n£§e7 for e£§ac7h3$F£¢2G7a`124£¥aD8¢b . $p£§¦V82 cd onta£§in¦Vs82ia` t©u£§p7¥leb for each ( 5YPSRU,TWVFX ) pair that satisfies: B $p£§¦V82 E U. Intuitively, contains positions that contain , and is similar to an inverted list in IR. We note that while each ¤`C¡DFE¡GIH relation is finite, there number of such relations will be infinite if D is infinite. However, this does not lead to a problem in defining the algebra because each algebra expression is finite, and can only refer to a finite set of such relations. Also, physically instantiating the potentially infinite set of ¤C¡DFE¡GIH relations is not a problem because only a finite sub-set of these relations will be non-empty (because the search context is finite), so only this finite set of relations will have to be explicitly stored. This is in fact what happens in current implementations of inverted lists. ' 8a7 In addition, as in the calculus, we have a set of position-based predicates E . 3.2.1 Full-Text Algebra Operators and Queries The full-text algebra operators are similar to the relational operators, but with two important differences. First, full-text algebra operators only operate on full-text relations (as defined above), and not on arbitrary relations, due to the nature of full-text search. Second, full-text algebra operators implicitly enforce that each operation only manipulates positions within a single node, and not across nodes. These two properties ensure that the full-text algebra is equivalent to the full-text calculus in characterizing full-text search. A full-text algebra expression is defined recursively as follows.  10  243£5 § 76  8  is an algebra expression that returns the tuples in the full-text relation 0  243£5 § 76  8  . 5  @9 A%B  A is an algebra expression that returns the tuples in the full-text relation 9 A%B  A .   ©$p£§¦V82 ¤C¡DFE¡GIH is an algebra expression that returns the tuples in the ¤'C¡DFE¡GIH relation, where D.   B§4 ¤ B§4 ¤ B§4 )( If %' is an algebra expression,!  ¢¡¤£¡D¦¥ G¨§ ©UC C¤§    § ©UC C ` ( %' b ( is an algebra expression. If %' et hv¡¤ea£¡lD¦ru¥eaG¨st§u©UelC stCftu§o llt§ -©UhtC eeCx ft` u¤ rle!ll-b at,etwixothneraererlae tiroeinsnat¤hmee,tdrtahtdoeithfiaouvnllea-tlceroxentlsareteicoluanttaiiovlneprcoojerrc&et’sisop.noNnoodptieenrgtahtaototr.  the new expression is: The attribute names of always has to include §©¨ in the full-text algebra - this enforces the property that full-text search is always scoped within a single context node.   B§4 ¤ B§4 ' B§4 ) B§4 '( %' ( %' ` ( %'  ( %' b B§4 ¤ B§4 '  'I( f %' an(d %' are algebr5a expres5sions, then is an algebra expression, If the and new expression evalu!at!e is: ¤ to ¦ ¡¤£¡D¦¥ G" an# d ¡¤£¡D¦¥ G)r¤%ep$ ,ecwtihveerley,&th¦e n¡¤£¡D¦th¥ Ge" fu# l ¡¤l-£¡tD¦e¥ xG t relation corresponding to is the traditional relational equi-join operation on the §©¨ attribute. The duplicate §©¨ attribute is p ro& jected out in the result full-text relation, and the position attributes are renamed to be consecutive ’s. Note again how the full-text algebra does not allow operations across nodes because the only predicate that is permitted in the join is equality between the attributes §©¨ of the two relations.   B§4 ¤ ! B§4 ¤(  © B§4 4 If %' %' 8i s an alg' e8bar7 a exp(ress%io' n, then ')( V¦021 ©U5C C¦§    § ©UC C432§ 5¤4§    § 546¦7 ` ( %' ¤b is an algebra expression, where !expression is: ' E ( V¦021 ©UC . C¦§ If    § ©UC C43¨§ 5¤¦§    § e54v687 a` l¤ u!abt,eswhtoere ' , the full-text relation corresponding to the new is the traditional relational selection operator.   B§4 ¤ B§4 ' B§4 ) B§4 ' B§4 ¤ B§4 ' B§4 ¤( %' ( %' ` ( %' !9 ( %' b ( %' A@ ( %' ( %' CB B§4 'I( f %' and are algebra expres9 sio@ n, thenB , , and are algebra expressions. These , and operators have the same semantics as in traditional relational algebra. A full-text algebra query is a full-text algebra expression that produces a full-text relation with a single attribute (this attribute has to be §©¨ by definition). The set of nodes in the result full-text relation defines the result of a full-text algebra query. We now provide some examples of full-text algebra queries that correspond to the calculus example in S e¡¤£¡cD¦ti¥ oG n` ¤3C .G1DF.C 1. T¤FhE2eD¦©¦fG o&¦lH¡l&oCwI bing query returns the context nodes that contain the token ’test’ and ’usability’: CThe within a fdoislltoawnciengofq5u:er ¢y¡¤£¡rD¦e¥ tGu`r'ns0 QPthPQR eXTS cVo1 (TnU t§e(WxV t§ X7n` o¤dC eGsDFC th at¤FcE2oD¦©¦nG t&¦aH¡i&nCI thbbe token ’test’ and the token ’usability’ CThe following query d o¡¤£¡nD¦¥ oG t` ¤FcoE2D¦n©¦tGa&¦iH¡n& CtIhbebb token returns the ’usability’: context  Y¡¤£¡D¦¥ G `n` ' o0dQe`as` ( tR hP a1bRt P PcU o§ RnP tP aV i7 n` ¤tC wGDFoC occu¤rCreGDFnC cbebcs of `th0 e t24ok3£e5 n § ’te st’ 76 8 dan9 d 3.3 Equivalence of Calculus and Algebra and Its Applications ' 8a7 Theorem 1 Given a set of position-based predicates E , the full-text calculus and the full-text algebra are equivalent in expressive power. The proof is in Appendix A, and is similar to the equivalence proof for the relational calculus and algebra. The equivalence of the full-text calculus and algebra suggests a notion of completeness for full-text search languages. This provides a formal basis for comparing the expressive power of different query languages, as we shall do in the next section. To the best of our knowledge, this is the first attempt to formalize the expressive power of full-text search languages, either for flat documents or for XML documents. Developing 6 a full-text algebra in terms of relations also provides a foundation for tightly integrating, optimizing and evaluating structured (relational or XML) queries with full-text search. The full-text algebra also enables us to rank query results by leveraging existing work on the probabalistic relational model developed in the context of IR [11, 23]. Specifically, the probabilistic relational model includes a probability attribute for each tuple that specifies its relevance to the result relation. A tuple with a high probability is very relevant to the result relation, while a tuple with low probability is not. In addition, the model defines how these probabilities are propagated through traditional relational operators. In our context, we simply need to add a new probability attribute to our full-text relations. We can then rely on these techniques to propagate this attribute through the algebra operators, and produce ranked results. 4 Incompleteness of Existing Languages and a Complete Language In this section, we show the incompleteness of existing full-text languages with respect to the algebra and calculus. We then define a complete full-text language based on the full-text calculus that naturally generalizes existing languages. 4.1 Incompleteness of Boolean Full-Text Search Languages Boolean full-text search languages are commonly used in IR, and have also been proposed for XML full- text search [10, 19]. A typical syntax for such languages, which we shall call BOOL, is given below. The simplest query is a search token, which can either be a string literal (such as ’test’) or the keyword ANY, which matches any token in a node. In addition, the query can be composed with Boolean operators.      Query := Token NOT Query Query AND Query Query OR Query  Token := StringLiteral ANY   8  Y 3YWe can recursively define the semantics of BOOL in terms of ou`A&r c7 alc£§u7alu`1s2e.¨ Ifb tIhe&qu7 ery£§¦Vis82ia` GSt¨ r¤i$pn£§g¦VL8i2 tebrab l  U ’token’, it is equivalent to the calculus query express`Ai&on7 £§7a`12e¨ E bb U . B§4 B§4 UIf the query is ANY, it is equ( iva%le' nt to the( ex%p'ression E . If the query is of the form NOT S B§4 8 B§4 B§4 B§4Query, it is equivalent to , where is( the%c'Valculu( s e%x' pression fo( r Q%u'Ve ry. I(f th%e' query is of the form Query1 AND Query2, it is equivalent to R , where and R are calculus   & D8expressions for Query1 and Query2 respectively. OR is defined similarly. As an eqx`Aa&m7ple£§, 7ath`12ee ¨ qu¤eb ry & Y 2Y &8 ¦' ¦' 38 ¦' Y 2A5Y U’&t7es£§t¦V’82iA` ND¨ $N8qO7$Tbb ’usa` bil&it7 y’£§7ais`12eeq¨ uqivb alen&tt7 o th£§e¦V8c2ial`cu¢lu¨ s" q7que¡ ry)13e$ xpbrb ession: E S UU EU . ¢¡Obviously, BOOL cannot express position-'b8aase7 dd predicates. However, we now show that even if we disallow such predicates in the calculus (i.e., E ), BOOL is still incomplete if D is infinite. ¢¡Th' e8oar7 emd 2 If D is infinite, there exists a full-text query that can be expressed in the full-text calculus with E , but which cannot be expressed by BOOL.   ¨8   Pro`Ao&fS7 ke£§tc7ah`1:2eW¨ eb shal&ls7how£§¦Vt8h2iat` Gn¨o$ qbub ery in BOOL can express the following calculus query: $ U @© S$ E U (i.e., find context nodes that contain at least one token that is not ), £ ¥ where D . The proof is by contradiction. Assume that there exists a query in BOOL that can)e( xpress ¥ ' ¥  ¥¤  ¥ ' £ " Q' © t)he( cal)cu( lus query. Let D be the se$ t of)to( kens that appear in $ . We construct two con$ text no9 de` s @  $ an@¥db Q' ¦¤. con$ tains only the token . contains the token and one other token D D ¥  ¥ ' £()su( ch a token always exists because D is infin)it( e and is finite). By the construction, we can see that ¥  ¥ ' £)( doe)s( not satisfy the calculus query, while does. We will now show that either returns both or or neither of them; since this contradicts our assumption, this will prove the theorem. ¥¨§ ¥§ ¥§ £ ¥ G ¥ ' ¥©§Let be the calculus expression equivalent to . We show by induction on th)e( structure)o( f that every sub-expression of (and hence ) returns the same Boolean value for and . If the 7   F8  ¥ `A&7 £§7a`12e¨ b &7 £§¦V82i` G¨$p£§¦V82cbb )( ¥ '  U ¡  " ¥ % ¥ 's)u(b-expr$pe£§s¦Vs8io2Ind is $ of the form $p£§¦V82 E d $ U )( , i)t r( eturns true for b$p£§o¦Vth82 and   ¥  ¥ 'if , and false if (by`A&co7 nst£§ru7ac`12etio¨ nbob f and - recall )th(at )a(ppears £ B§4 U ¥  ¥ 'in ). If the sub-expression is of th( e f%o'rm E , it returns true for both )( and ).( If B§4 Sthe sub-e( xpr%e'ssion is of the form , then it returns the same Boolean value for both and because returns the same Boolean value (by induction). A similar argument can also be made for the 8 T ¢ ¢¡and Boolean operators. ' 8a7 d If we limit D to be finite, however, we can prove that BOOL is complete with E . ¢¡' 8a7 d Theorem 3 If D is finite, every query that can be expressed in the full-text calculus with E can be expressed in BOOL.   8  "The proo`Af&is7 pre£§s7ae`1n2et¨edb in Ap&pe7 ndi£§x¦V8A2i. ` TGh¨$e mbb ain intuition is that, if D is finite, we can express queries  U Ss$ uch as: E U in BOOL by explicitly listing all the tokens that are not 6. Although BOOL is complete under this assumption, it is not always practical bec$ ause even for simple queries such as the one above, we need to explicitly list all possible tokens other than in the query. 4.2 Incompleteness of Existing Predicate-Based Full-Text Search Languages We now consider full-text languages that have position-based predicates in addition to Boolean operators [2, 4]. A typical syntax for such a language, which we shall call DIST, is given below.        Query := Token NOT Query Query AND Query Query OR Query dist(Token,Token,Integer)  Token := StringLiteral ANY &£The semantics of DIST is the same as B OF7O$pL#2, ex8 cept for the addition of dist(Token,Token,Integer). This construct is the equivalent of the predicate introduced in the calculus (Section 3.1), and spec- " 0' " 0'ifies that the num$ be$ r o f intervening token$ s shou$ ld be less than the s pecified integer. More formally, the & & 68 &  68 ¦' ¦' 68 ¦' 0' 68 &£ & ¦'semq`Aa&nti7cs o£§f7ad`12eis¨ t( ¤b , &, )7 for£§¦Vso8m2i`e to¨k$ enb s a§n`Ad&7 an£§d7a`1s2eo¨meqb int&ege7 r £§i¦Vs8g2iiv` e¥n¨b$ yb the Fc7a$pl#c2ulu8#s` exq¨pre¢s¨!sibobnb : U  0' U$ $ E U E U &7 £§¦V82 . If or is ANY instead of a string literal, then the corresponding U predicate is omitted in the semantics. We now show that DIST is incomplete with respect to the calculus so long as D is not trivially small. We can also prove similar incompleteness results for other position-based predicates. 7 &£ 7 £ & ¦'Th' e8oar7 emdf q4 FIf7$p#D2 8#` R ¨ , t¥h¨!erb!e@ exists a full-text query that can be expressed in the full-text calculus with E , but which cannot be expressed by DIST. & & 8 ¦' ¦' 8 &  8 ¦' 0' 8 &£ & ¦'Proqo`A&f Sk7 etc£§h7a:`12eW¨ e ¤sbhall s§h`Ao&w7tha£§t 7an`1o2eq¨ uerb yii&nD7 IS£§T¦V8c2ian` ex¨p$ ¤reb sis&th7e fo£§l¦Vlo8w2ii` ng¥¨c$ albculusFq7u$p#er2 y:8#` ¨ ¢¨!#bbb U © 0' © U ¤  0' ¨ S 0'$ E $ $ Ed $ U U $$ , where D , D and (i.e., find context nodes where the tokens and do not appear next ¥  ¥ ' £ ¥ to each other at least once). The proof is by contradiction. Assume that )th( ere exist)s ( a query in D)IS( T  Q' " ¥ '  Q'that can express the$ calculus query.$ We now constr$ uct)tw( o context nodes a$ nd as fol$lows.  Q' ¥ Gcon$ tains the tokens$ followed by followed by . )co( ntains the tokens followed by followed ¥ 'b)y( followed by . By the construction, we can see that does not satisfy the calculus query, while ¥ % ¥ ' £ £does. Using i)nd( uction)o( n the structure of similar to the proof of Theorem 2, we can show that ¢either returns both or or neither of them. This is a contradiction. 4.3 A Complete Full-Text Query Language We now present a new language COMP based on the full-text calculus that is complete even in the presence of arbitrary position-based predicates. COMP shares the same syntax as BOOL for simple Boolean queries, 8 but naturally generalizes BOOL with position variables to achieve completeness. Thus, simple queries retain the same conventional syntax, while new constructs are only required for more complex queries.            Query := Token NOT Query Query AND Query Query OR Query SOME Var Query EVERY Var Query Preds      Token := StringLiteral ANY Var HAS StringLiteral Var HAS ANY    Preds := distance(Var,Var,Integer) diffpos(Var,Var) ... The main additions to BOOL are the HAS construct in Token, and the SOME, EVERY and Preds constructs in Query (the semantics of the other constructs remain unchanged from BOOL). The HAS   )explicitly bind position variab$pl£§e¦s (Var) to positions whe&re7 tok£§e¦Vn8s2io` cc#u' r.¨T$ph£§e¦sb emantics fcoorn’s  tr#u' ct allow$ps£§u¦ s HAS#' ’ to in terms 4ANY’ of is: t&he7 cE al£§c7au`1l2eus¨ ,  w#' heb re is . While a StringLiteral is: the HAS construct U allows us to   explicitly . The semantics for ’  bind position variables to HAS token ¤ U 4 5 ¤ 8 B§4 V 4 B§4 ¤ GW B§4 B§4pf’SoorOsiQMtiEoune  sry#,.' thTeQhSueeOsrMyemE’ iasann¡tdi  c#sE' VoqEf`A&R’EY7VEcEo£§Rn7aYs`1tr2e u¨#c  t' s#' allb ow(s u%s '#tob , Query’ is ¢  quantif(y w#h' e¢r`Ae&7 o%v'er these positions. The semantics of E £§7ai`1s2et¨h  e#c' albculus (exp%re'#sb sion sema( nti%c's , where is the calculus expression semantics for  ¤ £  £  £  £%ar' b8i©tr`a  r#y' po¨s9i9t9qio¨ n  -#b' a( s¨ed¨p9r9e9qd¨ica£ tb e.s. The Qseumearyn.ticFsinoafllay,prthedeicParteeds’pcroedn(s tr#u' ct ,.a.l.l,o  w#' s( for the , ,... £ definition of )’, is simply: As an illustration of the power of COMP, the following two queries express the calculus queries used to & & prove the incompleteness of$ BOOL and DIST in Theorems 2 and 4, respectively. & ¦' &  ¦' 0' # ¦'SOME (NOT HAS ) $ $ SOME SOME ( HAS AND HAS AND NOT distance( , ,0)) We can prove that COMP is complete (the proof is in the appendix). ' 8a7 Theorem 5 Every query th'a8tac7 an be expressed in the full-text calculus using predicates E can be expressed by COMP using E . 5 Related Work Most of IR research [2][18] has focused on methods for relevance estimation and efficient evaluation of keyword queries. In this context, full-text languages have been developed to implement specific primitives, but their formal properties such as expressive power and completeness have not been studied. This observation also applies to XML full-text search languages such as XQuery/IR [3], XIRQL [10], XSEarch [8], XRANK [12], XXL [19] and Niagara [22]. Existing work on probabilistic relational databases [11, 23] could be used to extend our algebra to support relevance scoring of query results. Several other works have used relational systems to store inverted lists and translate keyword queries to SQL [5, 9, 13, 16, 17, 22]. Such implementations do not provide a formal basis for completeness and could benefit from our formalization. Finally, our study of completeness for full-text languages is similar to the that for the relational algebra and calculus [7], and goal is to provide a similar formal basis for full-text querying so that it can be tightly integrated with structured queries. 6 Conclusion This paper presents a simple, yet powerful formalization of full-text search for XML. While this paper makes an initial step in characterizing full-text query languages for XML, there are multiple directions we can explore to build on this work. First, we wish to explore how the formal characterization of fulltext search in terms of the relational model can enable the seamless integration of full-text languages with structured query languages, in terms of both query optimization and query evaluation. Second, we wish 9 to incorporate primitives such as stemming, thesaurus and stop-words in our framework. We believe that quantification over tokens adds additional expressive power, and would enable these additional features. References [1] S. Amer-Yahia, C. Botev, J. Robie, J. Shanmugasundaram. TeXQuery: A Full-Text Search Extension to XQuery. http:/www.cs.cornell.edu/database/TeXQuery/. [2] R. Baeza-Yates, B. Ribiero-Neto. Modern Information Retrieval. Addison-Wesley, 1999. [3] J. M. Bremer, M. Gertz. XQuery/IR: Integrating XML Document and Data Retrieval. WebDB 2002. [4] E. W. Brown. Fast Evaluation of Structured Queries for Information Retrieval. SIGIR 1995. [5] T. T. Chinenyanga, N. Kushmerick. Expressive and Efficient Ranked Querying of XML Data. WebDB 2001. [6] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. Commun. ACM 13(6): 377-387 (1970). [7] E.F. Codd. Relational Completeness of Database Sublanguages. In R. Rustin (ed.), Database Systems, Prentice- Hall, 1972. [8] S. Cohen et al. XSEarch: A Semantic Search Engine for XML. VLDB 2003. [9] D. Florescu, D. Kossmann, I. Manolescu. Integrating Keyword Search into XML Query Processing. WWW 2000. [10] N. Fuhr, K. Grossjohann. XIRQL: An Extension of XQL for Information Retrieval. SIGIR 2000. [11] N. Fuhr, T. Ro¨lleke. A Probabilistic Relational Algebra for the Integration of Information Retrieval and Database Systems. ACM TOIS 15(1), 1997. [12] L. Guo, F. Shao, C. Botev, J. Shanmugasundaram. XRANK: Ranked Keyword Search over XML Documents. SIGMOD 2003. [13] V. Hristidis, L. Gravano, Y. Papakonstantinou. Efficient IR-Style Keyword Search over Relational Databases. VLDB 2003. [14] Initiative for the Evaluation of XML Retrieval. http://www.is.informatik.uni-duisburg.de/projects/inex03/. [15] Library of Congress. http://lcweb.loc.gov/crsinfo/xml/. [16] J. Melton, A. Eisenberg. SQL Multimedia and Application Packages (SQL/MM). SIGMOD Record 30(4), 2001. [17] A. Salminen. A Relational Model for Unstructured Documents. SIGIR 1987. [18] G. Salton, M. J. McGill. Introduction to Modern Information Retrieval. McGraw-Hill, 1983. [19] A. Theobald, G. Weikum. The Index-Based XXL Search Engine for Querying XML Data with Relevance Ranking. EDBT 2002. [20] The World Wide Web Consortium. XQuery 1.0: An XML Query Language. W3C Working Draft. http://www.w3.org/TR/xquery/. [21] The World Wide Web Consortium. XQuery and XPath Full-Text Use Cases. W3C Working Draft. http://www.w3.org/TR/xmlquery-full-text-use-cases/. [22] C. Zhang, J. Naughton, D. DeWitt, Q. Luo, G. Lohman. On Supporting Containment Queries in Relational Database Management Systems. SIGMOD 2001. [23] E. Zimanyi. Query Evaluations in Probabilistic Relational Databases. Theoretical Computer Science, 1997. 10 A Proofs of Theorems Theorem 1: Equivalence of the Full-Text Calculus and Algebra Le' m8am7 a 1. For every full-text algebra expression that only uses position-based predicates from the set E , there ex' i8sats7 an equivalent full-text calculus expression that only uses position-based predicates from B§4the same set E .   )¢¡a( %' ¥  0' £ ¥ £ B§4  P5 r`o)of( S£¥kae8§tc¨!h#:$F$ W¨!#e$Fw$ ¢i¨ll9g9g9gp¨!r#o$Fv$UeT b th¦at for every algebra expression , which6eav) al( uat%e's `1t2eo¨ ar¨e9gl9ga9g¨t'ioT nb F  #  7 ¡ ¤£ ¦¥ ¨§ #8 & #8 ¤8 2e¨ ¨9g9g9g¨ ',T#@ , ther ae`12eex¨ ists¨9ga9g9g¨ c'aT lcb ulus8q#u' er& y6e£¢x2 p$r8e%ss$i`1o2cnb &7 £§7a`12e¨ ¤b 9g9g9  &8 ¥ £ B§4 & w&it7 h fr£§e7ae`12eva¨ 'riT¥abbles6a) ( %'`12e¨ ,s¨u9g9gc9gh¨ 'tThb!a@6t d 5 E B§4E   . )¢¡a( %' The proof is by induction on the structure of .   B§4 ¥ £ B§4   8  T  ¥S U   Q8  ¥ £ B§4 U 7 ¡ ¤£ ¦¥ ¨§ Q8If` )¢¡a( &7 %'Xd 0 £§7a`12e¨ b 243£5 e&7 §  7£§6 7a`12e8 ¨ 6a) ,bbthe6na) ( ( %'`12cbId %'`12cb ` &7 £§7a`12e¨ b &7 £§7a`12e¨ bb E  2 8#' E & 6£¢2 $8 %$`12cb £ B§4 ¡ ¤£ ¦¥ ¨§6a) ( %'E `12cb!@ E . 8#' is& a6lw£¢a2 y$s8 %tr$ue; therefore, is equal to the full-text relation by its definition.   B§4 ¥ £ B§4 # & & 7 ¡ ¤£ ¦¥ ¨§ Q8  )¢¡a( %' &I&f7 £§7a`12e¨ d9 ¤b!@ A%B  A 6a) , then ( %'`12e¨ ¤beds&7 £§7a`12e¨ 7 E £§7 b  a`12e¨ , i.e. b 8#' & 6£¢2 $8 %$`12cb E is equal to the full-text relation £ E by its definition.   B§4 ¥ £ B§4  Y 3Y & 7 ¡ ¤£ ¦¥ ¨§ Q8  )¢¡a( %' d 6a) ( %'`12cbeds&7 £§¦V82i` G¨ ¤$p£§¦V82 b  a`12e¨ b 8#' & 6£¢2 $8 %$`12cb & &8 & Y 3YI&f7 £§7a`12e¨ ¤b ¤C¡DF&E¡GIH 7 , th£§e¦Vn82i` ¨ $p£§¦V82 b!@ U . 5QPSRUTWVFX EU is equal to the full-text relation by its definition.   B§4 5Y B§4 ¤Y  )¢¡a( ¥ £ B§4 )Y &  B§4 6YIf %' d  ¡¤£¡D¦¥ G¨§ ©UC C¤§    § ©UC C `¥¤§¦©¨ 2 gb   )¢¡a( %' 86 a) ,( wh%e'reA`12e¨ ¨9g9g9g¨ isb a full-  te)¢x¡at( alg%e' bra expression whose Y ¥  0' 2 ( £ ¥ £ B§4 # C C # C # #8e5 qAu` i)va( le£¥nat8§c¨!a#lc$F$ulu¨!#s$Fq$ u¥e¨r9gy9g9g¨!e#x$Fp$ resb sion sis  6a) ( %'`12e¨ ¨9g9ga9g¨ ndFb d  &eva7 lua£§te7as`12eto¨  ¤b U   &8 ¥ £ B§4 6Y &  & C 7 ¡ ¤U £ ¦¥ ¨§ Q89g9g9 &7 £§7a`12e¨ b 6a) ( , %' `12e¨, the¨n9g9g9g¨ b  a`12e¨ ¨9g9g9g¨ Fb 8#' & 6£¢2 $8 %E $`12cb P  C  P Q8X¥ £ B§4 & C &  7 ¡ ¤£ ¦¥ ¨§ Q8 P    P &8 ¥ £ B§4 6Y &  Y " " §§ §§ &E 7 £§7a`12e¨ ¢b &7E £§7a`12e¨ qb E 6a) ( 6a) ( %'`12e¨ %' A`12e¨ ¨9g9g9g¨ Fb!@ ¨9g9g9g¨ . 6=b! @6 d  R¤0 V   § (TU R¤§ 0  V § ("! § (TU  a`12e¨ §    § ("! ` ¨9g9g9g¨ 5b . b 8#' & 6£¢2 $8 %$`12cb   B§4 B§4 4 B§4 '  )¢¡a( C ¥  2If %' d#¤§¦©¨ 8 24! 5 !$` )¤§( ¦©¨£¥ a88§ ¨!2#$$F,$   w¨9gh9ge9g¨!r#e$F$ )¢¡a( b %'   )¢¡a( ed an¥d¨ %' are full-text algebra ex- ¥ £ B§4 C &  ¥ £ B§4 #  ¦pressions that eva6luaa) te( to%' `12e¨ ¥ £ B§4 &  Q8X¥ £ B§4  #  ¦ &  ¦ 7 ¡ ¤£ ¦¥ ¨§ Q8e6xap)re(ssio%n's`12ear¨ e ¨9g9g9g¨ P   ¦  P 8 ¥ £ B§4 &  8 ¥ £ B§4  #  ¦  '5 " W§  5   § U  V &7 E £§7a`1U 2eb ¨ 6a) ¢b ¨9g9g9g¨ b ( %'`12e!¨ 6a) ( for %'`12eU  ¨  d ! ¥fo¨ r ¨9g9g9g¨ ¨9g9g9g¨ R U  U R6, aa)nd( ,tb hVebn.6 aa`12e) ¨ ( th¨%9ge%'9gi9g'r`1¨ 2e`1e2e¨q¨uU i vU a¨ l9gVe9g9gn¨¨t9g9g9g9gc9g9g8¨¨a#lc' uUUl&u s6q£¢VV u2@b e$rdd8 y%$`12cb .   B§4 5Y B§4 ¤Y  )¢¡a( BY ¥ £ B§4 5Y & If %'Id '§%'& G¤¥ 1 ©UC 5 C§    § ©UC C43§ 5¤¦§    § 546 7 `( 8 2 b( , where %' is a full-tex6t aa) lg( ebr%a' ex`12epr¨ ess¨io9g9gn9g¨ 'thT ab t ¥ £ B§4 &  ¥ £ B§4 6Y &  Q8  2 £  £ &  7evalu6ataes) t( o th%e'`1r2eel¨aton¨9g9g9g¨ ¡ ¤£ ¦¥ ¨§ Q8 P   P Q8X¥ £ B§4 6Y &  Q8  2 £  £th8e#n' & ! Y' ( V¦021bR P 6£¢2 $8 %$`12cb  P U §    § R P P) § S U §    § S10 7 5 " 'aT nbrdd th6e ae)qu( iva%le' nAt`12ec¨alcu¨l9gu9g9gs¨ 'qTubery%'e8x©pr` e#s$Fs$ io¨n9g9g9gi¨!s#$F$ §    § T &7 E £§7a`12e¨ ¢b 6a) ( %' `12e¨ ¨9g9g9g¨ 'T b %' . ¨ 8©` #¨$F$9g9g9g¨¨ 9g9g£ 9g¨!b .#$F a$ `12e¨¨ ¨9g9g,9g¨ 'T b ¨9g9g9g¨ £ b!@6d   B§4 B§4 4 B§4 '  )¢¡a( CLet @%' d2¤§¦©¨ 24! 3¤§¦©¨ 2   )¢¡a( $8 5  d 8¥ ¨ , where %'   )¢¡a( and %' are full-text algebra ex- ¥ £ B§4 DC &  ¥ £ B§4 #  ¥ £ B§4 ¤ &  Tp6reas) si( ons%'tha`12et e¨ val¨u9ga9g9gt¨e'tT ob fo r d ¥¨ R and th6eair) e( qui%va'l`1e2en¨t ca¨lc9g9gu9 'luT sb qd uer6y ae)xp( res%si' o¢n`1s2ea¨ re¨9g9g9 'T b ¥ £ B§4 ' &  &  7 ¡ ¤£ ¦¥ ¨§ 38 P   P Q86a) ( %' `12e¨ ¨9g9g9 'T b  afo`12er ¨ ¨9g9g9g¨ 'R T, b then8#' & 6£¢2 $8 %$`12cb ¥ £ B§4 ¤ &  QT ¥ £ B§4 ' &   '` 6a) ( %' ¢`12e¨ ¨9g9g9 'T .b 6a) ( %' §`12e¨ ¨9g9g9 'T bb!@6d 5 @ 5  " §    § T &7 E £§7a`12e¨ ¢b .   B§4 B§4 4 B§4 '  )¢¡a( CLet B%' d2¤§¦©¨ 24! 3¤§¦©¨ 2   )¢¡a( $8 5  d 8¥ ¨ , where %'   )¢¡a( and %' are full-text algebra ex- pressions that evaluate to for R and their equivalent calculus query expressions are 11 ¥ £ B§4 DC &  ¥ £ B§4 #  ¥ £ B§4 ¤ &  86a) ( %' `12e¨ ¨9g9g9g¨ 'T b  d ¥¨ 6a) ( %'`12e¨ ¨9g9g9 'T b d 6a) ( %' ¢`12e¨ ¨9g9g9 'T b ¥ £ B§4 ' &  &  7 ¡ ¤£ ¦¥ ¨§ &8 P   P6a) ( %' `12e¨ ¨9g9g9 'T b  afo`12er ¨ ¨9g9g9g¨ 'R T, b then8#' & 6£¢2 $8 %$`12cb  8 ¥ £ B§4 ¤ &  38 ¥ £ B§4 ' &   '` 6a) ( %' ¢`12e¨ ¨9g9g9 '.T b 6a) ( %' §`12e¨ ¨9g9g9 'T§bb!@ d 5 AB 5 " §    § T &7 E £§7a`12e¨ qb .   B§4 B§4 4 B§4 '  )¢¡a( CLet %' d 9¤§¦©¨ 24! ¤§¦©¨ 2   )¢¡a( $8 5 W8 d ,¥¨where %'   )¢¡a( and %' are full-text algebra ¥ £ B§4 DC &  ¥ £ B§4 #  ¥ £ B§4 ¤ &  8e6xap)re(ssio%n' s`1t2eh¨at e¨v9ga9g9gl¨ u'aT teb to  d for¥¨ R 6aan) d( the%i'r `1e2eq¨ uiv¨a9gl9ge9 'nTt b cad lcu6luas) q( uer%y' ¢ex`12ep¨ress¨i9go9gn9 'sT bare S ¥ £ B§4 ' &  &  7 ¡ ¤£ ¦¥ ¨§ 38 P   P Q86a) ( ¥ £ B§4 ¤ &  &8 ¥ £ B§4 ' &   '` 6a) ( %' §`12e¨ ¨9g9g9 'T§b fo ar`12e¨ ¨9g9g9gR ¨ ', T thb en 8#' & 6£¢2 $8 %$`12cb  %' ¢`12e¨ ¨9g9g9 'T b . 6a) ( %' §`12e¨ ¨9g9g9 'T§bb!@ d 5 !9 5 " §    § T &7 E £§7a`12e¨ qb S. ¥ ¥ £ B§4This completes th)e ( str£¥uac8 tural induction. The requirement that fu6ll-at) ex( t a%lg' ebra queries evaluate to a 7 ¡ ¤£ ¦¥ ¨§ 38 ¥ £ B§4relation with a sing2le attr i2bute8e#n' su& r6es£¢t2 h$a8 t%th$e`12ccb orre6spao) n(din%g'`12cb!@ expression will have only ¢one free variable - . Therefore, is a valid calculus query. Le' m8am7 a 2. For every full-text calculus expression that only uses position-based predicates from the set E , there ex' i8sats7 an equivalent full-text algebra expression that only uses position-based predicates from ¥ £ B§4 X the same set E . 6a) ( %'`12e¨ ¨9g9g9g¨ 'T b #  £ B§4Proof Ske tc2eh¨ : W¨9ge9g9g¨w'iT#ll@ p¦rove  that for every query calculus expres  si)¢o¡an( %' with free ¥  0' #  7 ¡ ¤£ ¦¥ ¨§ ¤8 P   P ¤8v5 a`r)iab( le£¥sa8§¨!#$F$ ¥ £ B§4 & 6a) ( %'`12e¨ ¨!#$F$ ¢¨9g9g,9g¨!#$F$UT b , there exis ats`12ean¨ alg¨9ge9gb9g¨ r'aT eb xpre8ss#i' on& ¨9g9g9g¨ 'T b!@6d 5 , such that 6£¢2 $8 %$`1,2cwb hic h e" va§ l u§ T a&tes7 to E £§a7ar`1e2ela¨ tioqbn ¥ £ B§4. 6a) ( %'   ¥ £ B§4   B§4The proof is by induction on the structure of . 6a) ( If %'`12e¨ b d &7 £§7a`12e¨ b   )¢¡a( E , then %' d 9 A%B  A . The proof of the equivalence is the   ¥ £ B§4   Y 3Y B§4same as the analogous case from Lemma 1. 6a) ( %'`12e¨ beds&7 £§¦V82i` G¨ ¤$p£§¦V82 b   )¢¡a( %'0d If U , then ¤C¡DFE¡GIH . The proof of the equivalence is the same as the analogous case from Lemma 1.   ¥ £ B§4 &   &  £  £ B§46a) ¡ ¤£ ¦¥ ¨§ 38 C  C &8 &  £  £   7I` f9 A%B 8#'  & ( %'`12e¨ ¨9g9g9g¨ A 6£¢2 9g9g9% 9 A%B $8 %$`12cb  'T brd Ab %' 8©` ¨9g9g9g¨ 'Ta¨ q¨9g9g9g¨ £ b ,¦ then   )¢¡a( %'05fd df'§ a%'`1& 2eG¤¥ ¨  " , w§  h  e§ T re&th7 eE n£§u7am`12eb¨er 3ob f jo%in' s8©is` .¨O9g9g9gb¨ v'iTao¨usly¨,9g9g9g¨ £ b!@ . 1 % §¨ 9g9g§ 9g% ¨ ¡ '§ 5¤T ¦b§    § 54687   ¥ £ B§4 &  Y Y Y Y Y Y ¥ £ B§4 ¤ &  Y Y Q86a) ( %'`12e¨ ¨9g9g9g¨ £¢¨ ¨9g9g9g¨ ¨ ¨9g9g9g¨ S brd 6a) ( %' q`12e¨ ¨9g9g9g¨ £¢3¨ ¨9g9g9g¨ b ¥ £ B§4 ' &  YR Y RY Y R R ( £ ¥ £ B§4 4 R ¥ R £ B§4 'I6f a) ( %' `12e¨ ¨9g9g9g¨ £¢3¨ ¨9g9g9g¨ S b ¦sd )¥¤ ¦¤ 6a) ( %' 6a) ( %' R R B§4  B§4 ', where ,   )¢¡a( %' , an  d)¢¡a( %' are cal-  ¥  2Y 2Y ' ¥  2Y Y 2Y Yculus qu5 er¢y` )ex( pr£¥eas8§si¨!o#n$Fs$ w¨9gi9gt9gh¨!#e$Fq$UuTai¨!v#a$Fle$ n¨t 9ga9g9gl¨!g#e$Fb$ ra b expres5 sio`n)s ( £¥a8§¨!#$F$ ¨a9gn9g9gd¨!#$F$§¢3¨!#$F$ ¨9g9g,9g¨!w#h$F$ iSchb eval- B§4u  a)¢t¡ae( to%' d `¥¤§¦©¨ 24!  &  Y Y Y Y Y Y 7 &  Y Y ©  8 &  Y Y Y Y © ' a`12e¨ ¨9g9g9g¨ £¢¨ ¨9g9g9g8¨  ¨ ¨  9g¡¤9g9g£¡¨ D¦¥ S G¨b§ ¨© ©© § `1 2e § ¨ ¨ © © ¤§¦©¨ ¨9g9g9g¨ 2 8£ ¢3¨ a$ nb d¨B 9g9g9g` ¨   ¡¤£¡D¦b ¥ G¨§ ¨5© §    §¨ 3© ¤§¦©¨ `12e¨ &24! $¤§¦©¨ 8¨ 9g9g9g¨ £¢3¨ ¨9g9g9g8¨  2 S b$ b , 5 thed n . 5 ¢@ &  R Y R Y R Y Y R Y Y 7¤¥ £ B§4 ¤ R & R  Y Y Q8 R R a`12e¨ ¨9g9g9g¨ £¢¨ ¨9g9g9g¨ ¨ ¨9g9g9g¨ S b 6a) ( %' ¢`12e¨ ¨9g9g9g¨ £¢3¨ ¨9g9g9g¨ b ¥ £ B§4 ' &R   R Y YR Y YR R R=6a) ( %' `12e¨ ¨9g9g9g¨ £¢3¨ ¨9g9g9g¨ S b!@ R R , which is exactly what we wnated to show.   ¥ £ B§4 &  Y Y Y Y Y Y ¥ £ B§4 ¤ &  Y Y QT6a) ( %'`12e¨ ¨9g9g9g¨ £¢¨ ¨9g9g9g¨ ¨ ¨9g9g9g¨ S b 6a) ( %' ¢`12e¨ ¨9g9g9g¨ £¢¨ ¨9g9g9g¨ b ¥ £ B§4 ' &  YR Y RY Y R R ( £ ¥ £ B§4 4 R ¥ R £ B§4 'I6f a) ( %' `12e¨ ¨9g9g9g¨ £¢3¨ ¨9g9g9g¨ S b ¦sd = )¥¤ ¦¤ 6a) ( %' 6a) ( %' R R B§4  B§4 ', where ,   )¢¡a( %' , an  d)¢¡a( %' are cal-  ¥  2Y 2Y ' ¥  2Y Y 2Y Yculus qu5 er¢y` )ex( pr£¥eas8§si¨!o#n$Fs$ w¨9gi9gt9gh¨!#e$Fq$UuTai¨!v#a$Fle$ n¨t 9ga9g9gl¨!g#e$Fb$ ra b expres5 sio`n)s ( £¥a8§¨!#$F$ ¨a9gn9g9gd¨!#$F$§¢3¨!#$F$ ¨9g9g,9g¨!w#h$F$ iSchb eval- B§4u  a)¢t¡ae( to%' d `¥¤§¦©¨ 24!  &  Y Y Y Y Y Y 7 &  Y Y ©  T &  Y Y Y Y © ' a`12e¨ ¨9g9g9g¨ £¢¨ ¨9g9g9g8¨  ¨ ¨  9g¡¤9g9g£¡¨ D¦¥ S G¨b§ ¨© ©© § `1 2e § ¨ ¨ © © ¤§¦©¨ ¨9g9g9g¨ 2 8£ ¢3¨ a$ nb d¨@ 9g9g9g` ¨   ¡¤£¡D¦b ¥ G¨§ ¨5© §    §¨ 3© ¤§¦©¨ `12e¨ &24! $¤§¦©¨ 8¨ 9g9g9g¨ £¢3¨ ¨9g9g9g8¨  2 S b$ b , 5 thed n . 5 ¢@ &  R Y R Y R Y Y R Y Y 7¤¥ £ B§4 ¤ R & R  Y Y QT R R a`12e¨ ¨9g9g9g¨ £¢¨ ¨9g9g9g¨ ¨ ¨9g9g9g¨ S b 6a) ( %' ¢`12e¨ ¨9g9g9g¨ £¢3¨ ¨9g9g9g¨ b ¥ £ B§4 ' &R   R Y YR Y YR R R=6a) ( %' `12e¨ ¨9g9g9g¨ £¢3¨ ¨9g9g9g¨ S b!@ R R , which is what we wanted to show. 12   ¥ £ B§4 F  ¥ £ B§4 6Y &  ¥ £ B§4 ¤Y6a) ( %'`12e¨ ¨9g9g9g¨ 'T b d 6a) ( %' A`12e¨ ¨9g9g9g¨ 'T b 6a) ( %' S B§4 YLet us consirder the case   )¢¡a( %,' where is Y &  ¡  F B§4  7 ¡ ¤£ ¦¥ ¨§ 8 C  B§4 ¤Y C 8to a5 cAa`1l2ecu¨ lus¨9gq9gu9g¨ e'rTyb expre¦ssion . If ¦ 5 t hat is eq  u)¢iv¡aa( len%t'ftod d , t ah`1e2en¨ ¨9g9g9g¨ 'T b the8` #9a'lgA%e& Bb6raA£¢2e x$p8 9gr%9ge9%$s`1s2cio9b nA%B  ¥ £ B§4 6Y & th6e an)um( be%r' oA`1f2ejo¨ ins¨9gi9gs9g¨ 'T. b!@  A¥b " 9  §§T ,)¢w¡a(hich%' evaluates &7 £§7a`12e, ¨wh3bere E S B§4 Y 7 ¡ ¤£ ¦¥ ¨§ 8¦sd    )¢¡a( ¡ ¤£ ¦¥ ¨§ 38 ¥ £ B§4 6YIf8#' & 6,£¢2th$e8 n%$`12cb , %' d 6a) which is what 0 (  243£5 %' A`12c§ b!7@ 6  wew9 an¤§t¦©ed¨ 8 to show. 2 8 and 5 d  2 8#' & 6£¢2 $8 %$`12cb   ¥ £ B§4 & S  #  # Q8X¥ £ B§4 6Y &  # ¥ £ B§4 ¤Y6a) ( %'`12e¨ ¨9g9g9g¨ 'T brd 'T  &7£( £¥a8#`12e¨ 'T  b 6a) ( %' A`12e¨ ¨9g9g9g¨ 'T  b 6a) ( %' U B§4 YIf   )¢¡a( %' , where Y B§4 Y &  7 ¡ ¤£ ¦¥ ¨§ 8is a calcu5 lus query   U  # &  # © Y &  7 ¡ ¤£ ¦¥ ¨§ Q8  # ¥ £ B§4 6Y &  #ua'tTes to`12e¨ ,¨th9g9ge9gn¨ 'T  e)¢¡ax(pre%ss' iond b 5@ t ah a`1¡¤2et£¡¨D¦i¥sG¨e§ %¨q¤9gu9g§ 9gi¨ v'§ %aT¡ l¤eb ntanto8d#t'5he& dalge ab`12era¨ ex¨p9gr9ge9g¨ s'siT ob n 6£¢2 $8 %$`12cb 'T  8#' & 6,£¢w2 $h8 ic%h$`1e2cvb al6a) ( %' A`12e¨ ¨9g9g9g¨ 'T  b!@ U= .   ¥ £ B§4 &   #  # W ¥ £ B§4 6Y &  #6a) ( %'`12e¨ ¨9g9g9g¨ 'T b d 'T  &7£( £¥a8#`12e¨ 'T  b 6a) ( %' A`12e¨ ¨9g9g9g¨ 'T  b ¥ £ B§4 6Y V B§4 YL6eat) ( %'   )¢¡a,( wh%e' re ¥ £ B§4 #   # ¥ £ B§4 6Y &  #is a calc6ulau)s ( que%r'y`12eex¨ pre¨s9gs9gi9go¨ 'nT tbrhdat is 'eqT u ivale6nat ) to( th%e' al`1g2ee¨ bra¨9ge9gx9g¨ p'rTessibon . SXU SWe use the equation and apply the previous case.. ¥2 For every calculus query, its query expression has only one fre)e ( va£¥riaa8 ble, , therefore the equivalent ¢algebra query evaluates to a relation that contains a single column, query. . Therfore, it is a valid algebra The above two Leammas prove the equivalence of the full-text calculus and algebra. ¢ Theorem 3: Completeness of BOOL when is finite £ 7 ¡ ¤£ ¦¥ ¨§ F8d  2 8#' & 6£¢2 $8 %$`12cb `12cb!@ Proof Sketch: Let ( E be a calculus query expression. We will prove 6 ¦' that there exists an equivalent Query expression in BOOL. Without loss of generality,¨ w¥e¨9ga9g9gs¨ sume that £every quantified variable `1i2cnb has a unique name. Let these position variable names be . We first normalize E using the sequence of equivalence transformations presented below. #C C&7 £§7a`12e¨ Ab &7 £§¦V82i` p¨$!b 1. (Sink Negations) Move all negations down to the predicates   &8    E &7 £§7aa`1n2ed¨ b U   .   W SXS   W SXU   38Replace &any7 re£§p7ae`1t2eiti¨ veb negations   w&ith7 £§. 7aIn`12eve¨ rtb quantifiers:   E &7 £§7a`12ei¨ s rb eplac  ed V S S&V U Swith E and E is replaced with E . 3C C&7 £§¦V82i` F¨$!b &7 £§¦V82i` ¨$!b  C S2. (Group) Move every expression of the form U and U &7 £§¦V8o2 ut of the scope of any quantifier over a variable different from . This is possible because U is applied on &P ¥¤§¦©¨  ¦¤  P © ¤ © 68 "T ¦  P £only one position   variab  le.¨ F@ormally  , th¨ ie @transformation is a repeated application of P 8 £ T £ U V FCwhere , , and has no free variable . Use the co&mm7 ut£§a7ati`1v2eit¨ y 3ob f C C C C C ©and to group the above predicate expressions next to each oth er`1a2en¨d r3ib ght after E   ¨ @ . £ £ U VWe get a propositional formula with propositions of the form where . FC C W&7 £§7a`12e¨ 3b  C C 8  V3. (Remove unive&rsa7 l qu£§a7an`12etifi¨ c3ab tion) Remove any universal quantifers by replacing E C SXU C 38 ¦ C C Sw&ith7 £§7a`12e¨ 3bE `12e¨ b . We get a propositional formula over propositions of the form UE . ¦ C C`12e¨ 3b 4. (Local DNF) Convert each to DNF.   8   T   Y Y 8  Y T&7 £§7a`12e¨ b ` `12e¨ b `12e¨ bb ` &7 £§7a`12e¨ b `12e¨ bb Y Y UY Y 8 Y Y C C 8 ¦ C U C ¦ C C5. (` Splirt)&R7epl£§a7ace`12e¨ b E `12e¨ bb &7 £§7a`12e¨ 3b wit`1h2e¨ 3b E `12e¨ 3b U  UE to¨9g9g9g¨ T E for every disjunct in . P P 38 ¥P R P R ¥PLet the ne&w7pos£§it7aio`12en¨ vaqrb iable#s`1b2ee¨ qb . We get a propositional formula over propositions of the U R R Rform E where is a conjunction. 13 ¥P P £ U R P R P Q86. (G#l`1o2eb¨ alqDb NF) Consider to be a propositional formula over propositions of the form R . Convert it to a DNF. &7 £§7a`12e¨ qb E 96( ` b £ 7 £We ¡ ¤£ ¦¥ ¨§ 8  XC8 P ¥ £C P ¡ ¤£ ¦¥ U¨§R F8 C  P X8 C P¥ R C P 78#W' e& define for a calculus query exprd ess i2 on o6b£¢s2er$v8 e%$th`12catb after ¢ &7 £§7a`12e¨ b t§ ¥h@e, normalization w`12eh¨ erb e each ¢ § is either 8a#s' th&e 6eq£¢u2 iv$8a%le$n`1t2cqb uer`¡y  in &7 £§7a`12e¨ B£Ob ¢ OL§ ¢.b!@ d¥¤ `12e¨ b of the form E §¦  2 or of the SXU 9 R 9   R £ C 7 ¡ ¤£ ¦¥ ¨§ F8 C £form E , as in ste¨ pd Gl o2 balDN8#F' f&ro6m£¢2 th$e8 %n$o`1r2cmb aliz¢ ati¨¥o@ n. Therefore, can be 9  ! 9  !decompo6se( d` £ £ C P £ £ £see that inbto=th` 6e (ca`lcul§ ubs expressions AND ... AND 6( ` § U bb ` 6( OR ... OR ` P§ b AND ... aAnNdDitQiEs (noPt§ d© ibffib cult to . Thus, C P £we ! !  ! £  U    Q8 !  !  SXU   Q8 can focus only A§s`1s2ee¨enb above, on converting each § . each§`12e¨ § bis of&th7e fo£§r¦Vm82i` &7 G¨$!b E £§7a`12e¨ &7 b  §`12e¨ £§¦V82i£` G¨$!b b or of the form &7 £§7a`12e¨ b E !  S£ , where £ ¥`12eis¨ b U or U . In either case, we can consider there £ C P U   38 are no duplicates among Let us first consider £ the case w; hoethreerw§ised , we c&an7 sim£§7apl`1y2e¨ E elib min atP e £ thP e`12em¨ . b .   4 ' !    !   0''    0' C PI$ f thd er$ e exists ' and such that £ U `12e¨ bids&7 U £§¦V82i` G¨$ b ,£ 9 £ C P " " £empty , the6n (the` set. co§ ¢nb d=itiAonNY”onAeNtDokeNnOTpe(r$ p6os5 iti9go9g9 n6”5 (S$ S ection 3.1) )), where V is D `12e¨ bid &7 £§¦V82i` G¨$ b U , and vid ola t$ed¨.9g9gT9g¨h$ eS r@ efore, § is the is the set of all tokens. Intuitively, this query returns the empty set because it requires the result nodes to contain a token that is not in D , which is impossible.   4 ' !   !  ' C P 9 SIf there exists ' and such that £ U `12e¨ b  £ £ C Pthen th$ is6i5 s 9ga9gn9 6o5 b$vS ious contradiction and d § &7 £§¦V82i` G¨$!b U and £ V `12e¨ bd 6( is the empty set. We define ` &7 £§¦V82i` G¨$!b § ¢b U = ANY AND NOT( )) as above.   4 6' !   ! ' '  !   0YLe&tt7her£§e¦Ve8x2iis` tGs¨$!b and there does not exist such§`1t2eha¨ t b£ U `12e¨ beds&7 U £§¦V82i` G¨$!b &7 £§¦V82ia`nGd¨$£ b V `12e¨ bed S  2Y    38  £ C P S 7 ¡ ¤£ ¦¥ ¨§ 8U $ 9 C Ptoke&n7 U £E d$ £§7a`12e.¨ . Then we can ignore any £ Tb he&lat7 ter£§a¦Vr8e2it` rGiv¨$!iab!l@ ly satisfied. In which contains this case, § =  2 6( U , which is exactly the semantics for U ` 8#' & 6£¢2 $8fo%r$`1s2cobme § ¥b $ =.   8 £  C P 2C 7 ¡ ¤£ ¦¥ ¨§ 8 U   @8 S  2C @8T9g9gh9 e las&tc7 as£§e¦Vi8s2i` G§¨$ d ¢b  2 8#' & 6£¢2 $8 %$`12cb &7 £§7a`12e¨ b E &7 U2 £§¦V82i` G¨$ Ub S 2C 2C P #T ¤T P P  P 2C £ C P 2C 7 ¡ ¤£ ¦¥ ¨§ &8  P P  &8ken9 ! ¢7 ¡ ¤£ ¦¥ ¨§ &8 U   &8D`A&7  P 9 C P P U P&7 £U fU  r£§o$ £§¦Vm¦VU 8¨82it9g2iUh9g` 9gGe`¨G$¨c¨$  o$ @ mU.b!b@pDleume9g9ge9ton.tth&T eh$ 7ifisU U ¨ne9g£§i9gxt¦V9gep¨8n$r2ieess` sGs@ i¨ooo$ nff#D c ba$,b!n@ U ¨b9ge9g9g¨i$nt e@ rpreted as the §d ¤  2  2 with8#r' eg& a6rd£¢s2 8#' & 6£¢2 = 6( ` . The latter is trivially equivalent to the query condition t$o8 %th$e`12csbet $8 %$`12cb § ¢b $ = U thatH D OR & $ c7 oU n¨t£§9ga9g7a9gi¨n`1$2es¨ a@ &7 E £§7a`12e¨ E$ ... OR tod b b  . £ C P SXU   8  9 £ C P 9 S £ C P S £ C PIn case § d &7 £§7a`12e¨ b E  P P `12e¨ b 6( ` £ , then § ¢b 6( = NOT ` § ¢b , where § is transformed as in the previous case. Theorem 5: Completeness of COMP ¥ E(39 6A6£ ¥ £ B§4   Proof Sketch: We will prove that every calculus query can6bea) re( pre%se'n`1t2ee¨d by¨9ga9g9g¨ C'OT bMP query " 8' . We use induction on the structure of the query expression .   ¥ £ B§4   ¥ E(39 6A  ¥ £ B§46a) ( %'`12e¨ beds&7 £§7a`12e¨ b 6£ If E , then " 8' 6a) ( %' = HAS ANY. This is equivalent to by definition.   ¥ £ B§4   Y 3Y ¥ E(39 6A 6a) ( %'`12e¨ brd &7 £§¦V82i` G¨ $p£§¦V82 b 6£ ¥ £ B§4If 6a) ( %' U , then " 8' $p£§¦V82 = HAS ’ ’. This is equiv- alent to by definition. 14   ¥ £ B§4 &   &  £  £ ¥ E(39 6A6a) ( ¥ £ B§4If %'`12e¨ ¨9g9g9g¨ 'T brd %' 8©` 6a) ( %' ¨9g9g9g¨ 'Ta¨ ¨9g9g9g¨ £ b , then 6£ " 8' d  2  `  !¨9g9g9g¨ E ¨W3 !¨9g9g9g¨W3 ¨ b . This is equivalent to by definition.   ¥ £ B§4 &  Y Y Y Y Y Y ¥ £ B§4 ¤ &  Y Y Q86a) ( %'`12e¨ ¨9g9g9g¨ £¢¨ ¨9g9g9g¨ ¨ ¨9g9g9g¨ S b 6a) ( %' ¢`12e¨ ¨9g9g9g¨ £¢¨ ¨9g9g9g¨ b ¥ £ B§4 ' &  YR Y RY Y R R ( £ ¥ £ B§4 4 R ¥ R £ B§4 'I6f a) ( %' `12e¨ ¨9g9g9g¨ £¢3¨ ¨9g9g9g¨ S b ¦sd = )¥¤ ¦¤ 6a) ( %' 6a) ( R R ¥ E(39 6A¦ ¥ E(39 6A4', where 6, £ " 8' , and 6£ ¥ E(39 6A ¥ E(39 6A  ¥ E(39 6A4' ¥ £ B§4c6u£lus que" r8y' expr6es£ sions w" 8it' h equivalen6t£ COMP" 8q' ueries be 6anad) ( %' %' " 8' are cal- , then = AND . This is equivalent to by definition.   ¥ £ B§4 &  Y Y Y Y Y Y ¥ £ B§4 ¤ &  Y Y QT6a) ( %'`12e¨ ¨9g9g9g¨ £¢¨ ¨9g9g9g¨ ¨ ¨9g9g9g¨ S b 6a) ( %' ¢`12e¨ ¨9g9g9g¨ £¢¨ ¨9g9g9g¨ b ¥ £ B§4 ' &  YR Y RY Y R R ( £ ¥ £ B§4 4 R ¥ R £ B§4 'I6f a) ( %' `12e¨ ¨9g9g9g¨ £¢3¨ ¨9g9g9g¨ S b ¦sd = )¥¤ ¦¤ 6a) ( %' 6a) ( R R ¥ E(39 6A¦ ¥ E(39 6A4', where 6, £ " 8' , and 6£ ¥ E(39 6A ¥ E(39 6A  ¥ E(39 6A4' ¥ £ B§4c6u£lus que" r8y' expr6es£ sions w" 8it' h equival6en£ t COM" P8' queries be 6aan) d( %' %' " 8' are cal- , then = OR . This is equivalent to by definition.   ¥ £ B§4 &  ¥ £ B§4 6Y &  ¥ £ B§4 ¤Y6a) ( %'`12e¨ ¨9g9g9g¨ 'T b d 6a) ( %' `12e¨ ¨9g9g9g¨ 'T b 6a) ( S ¥ E(39 6A Y ¥ E(39 6A ¥ E(39 6A YIf 6£ " 8' , where 6£ ¥ £ B§4pression that is equiva6lean) t t( o th%e' COMP query , then %' " 8' is a calcul6us£ query" e8x' - = NOT . This is equivalent to by definition.   ¥ £ B§4 &   #  # Q8X¥ £ B§4 6Y &  # ¥ £ B§4 ¤Y6a) ( %'`12e¨ ¨9g9g9g¨ 'T b 'T  &7£( £¥a8#`12e¨ 'T  b 6a) ( %' A`12e¨ U ¥ E(39 6A¦Y ¥ E(39 6AIf = 6£  # ¥ E(39 6A5Y ¥ £ B§4is a calcul'uTs query e6xp£ ressio" n8t'hat is equivalent to the COM6P aqu) e( ry %' ¨9g9g9g¨ 'T  b 6a) ( " 8' , w6he£ re " 8' , then %' = SOME ( ). This is equivalent to by definition.   ¥ £ B§4 &   #  # W ¥ £ B§4 6Y &  #6a) ( %'`12e¨ ¨9g9g9g¨ 'T b d 'T  &7£( £¥a8#`12e¨ 'T  b 6a) ( %' `12e¨ ¨9g9g9g¨ 'T  b ¥ £ B§4 6Y V ¥ E(39 6A¦YI6fa) ( %' 6£ " 8' , where ¥ E(39 6A  # ¥ E(39 6A5Y ¥ £ B§46£ " 8' is a calculus q"uT e ry exp6re£ssion t"ha8t' is equivalent to the COMP qu6eary) ( %' , then ¢ = EVERY ( ) . This is equivalent to by definition. 15