Extended Boolean Information Retrieval
Salton, Gerard; Fox, Edward A.; Wu, Harry
In conventional information retrieval Boolean combinations of index terms are used to formulate the users' information requests. While any document is in principle retrievable by a Boolean query, the amount of output obtainable by Boolean processing is difficult to control, and the retrieved items are not ranked in any presumed order of importance to the user population. In the vector processing model of retrieval, the retrieved items are easily ranked in decreasing order of the query-record similarity, but the queries themselves are unstructured and expressed as simple sets of weighted index terms. A new, extended Boolean information retrieval system is introduced which is intermediate between the Boolean system of query processing and the vector processing model. The query structure inherent in the Boolean system is preserved, while at the same time weighted terms may be incorporated into both queries and stored documents; the retrieved output can also be ranked in strict similarity order with the user queries. A conventional retrieval system can be modified to make use of the extended system. Laboratory tests indicate that the extended system produces better retrieval output than either the Boolean or the vector processing systems.
computer science; technical report
Previously Published As