Salton, Gerard; Buckley, Chris
Very large text databases now exist in machine-readable form, covering arbitrary subject matter in unrestricted discourse areas. The conventional text retrieval approaches are not easily used in such circumstances, because the knowledge needed to understand unrestricted subject matter is not readily available for practical use. A new approach is outlined for text structuring and retrieval, based on flexible text matching methods using different context granularities. When global as well as local similarities exist between distinct texts, the presumption is that the texts cover semantically similar subject areas. This leads to the automatic introduction of links between related texts, and to the retrieval of text excerpts in response to available user queries. Evaluation results are given to demonstrate the effectiveness of the text matching approach.
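The combined global and local matching idea can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes raw term counts in place of weighted term vectors, cosine similarity as the matching function, sentences as the only local granularity, and arbitrary illustrative thresholds. The names `term_vector`, `cosine`, `best_local_match`, `should_link`, `global_min`, and `local_min` are all hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lowercased word counts; a stand-in for a weighted term vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def split_sentences(text):
    """Naive sentence splitter on ., ! and ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def best_local_match(text_a, text_b):
    """Return (score, sentence_a, sentence_b) for the most similar sentence pair."""
    best = (0.0, None, None)
    for sa in split_sentences(text_a):
        va = term_vector(sa)
        for sb in split_sentences(text_b):
            score = cosine(va, term_vector(sb))
            if score > best[0]:
                best = (score, sa, sb)
    return best


def should_link(text_a, text_b, global_min=0.2, local_min=0.4):
    """Link two texts only if both the whole-text and the best
    sentence-level similarity clear their (assumed) thresholds."""
    global_score = cosine(term_vector(text_a), term_vector(text_b))
    local_score, _, _ = best_local_match(text_a, text_b)
    return global_score >= global_min and local_score >= local_min
```

In this sketch, the sentence pair returned by `best_local_match` doubles as a candidate excerpt to show in response to a query, which loosely mirrors the excerpt retrieval described in the abstract.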
computer science; technical report