A Theory of Indexing

Author
Salton, Gerard
Abstract
The content analysis, or indexing problem, is fundamental in information storage and retrieval. Several automatic procedures are examined for the assignment of significance values to the terms, or keywords, identifying the documents of a collection. Good and bad index terms are characterized by objective measures, leading to the conclusion that the best index terms are those with medium document frequency and skewed frequency distributions. A discrimination value model is introduced which makes it possible to construct effective indexing vocabularies by using phrase and thesaurus transformations to modify poor discriminators - those whose document frequency is too high, or too low - into better discriminators, and hence more useful index terms. Test results are included which illustrate the effectiveness of the theory.
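The abstract does not state how a discrimination value is computed. As a rough illustration only, the sketch below assumes a centroid-based formulation: the density of a collection is the average cosine similarity between each document vector and the collection centroid, and a term's discrimination value is the density with that term removed minus the density with it present. The function names, the choice of cosine similarity, and the toy collection are all hypothetical and are not taken from the report.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length term-frequency vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def density(docs):
    """Average similarity of each document to the collection centroid."""
    n = len(docs)
    centroid = [sum(column) / n for column in zip(*docs)]
    return sum(cosine(doc, centroid) for doc in docs) / n


def discrimination_values(docs):
    """Density without term k minus density with it, for every term k.

    Under this assumed definition, a positive value means the term
    spreads documents apart (a useful discriminator); a negative value
    means it pulls them together (a poor one).
    """
    base = density(docs)
    values = []
    for k in range(len(docs[0])):
        reduced = [doc[:k] + doc[k + 1:] for doc in docs]
        values.append(density(reduced) - base)
    return values


# Hypothetical toy collection: rows are documents, columns are terms.
# Term 0 occurs in every document; terms 1 and 2 each occur in half.
toy_docs = [
    [1, 2, 0],
    [1, 2, 0],
    [1, 0, 2],
    [1, 0, 2],
]

values = discrimination_values(toy_docs)
for term, value in enumerate(values):
    print(f"term {term}: discrimination value {value:+.4f}")
```

In this toy collection the term that appears in every document receives a negative value, while the two terms that each appear in half of the documents receive positive values, in line with the abstract's claim that high-frequency terms are poor discriminators. The example does not cover the low-frequency case or the phrase and thesaurus transformations.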
Date Issued
1974-03
Publisher
Cornell University
Subject
computer science; technical report
Previously Published As
http://techreports.library.cornell.edu:8081/Dienst/UI/1.0/Display/cul.cs/TR74-203
Type
technical report