
dc.contributor.author: Salton, Gerard [en_US]
dc.contributor.author: Yang, C. S. [en_US]
dc.contributor.author: Yu, C. T. [en_US]
dc.description.abstract: An attempt is made to characterize the usefulness of terms occurring in stored documents and user queries as a function of their frequency characteristics across the documents of a collection. It is found that the best terms are those having medium frequency in the collection and skewed frequency distributions. Correspondingly, terms exhibiting either very high or very low document frequency are not as useful. To improve the indexing vocabulary, it becomes necessary to group low frequency terms into classes, and to break up high frequency terms by forming phrases. An indexing theory is described based on term frequency considerations, and a new phrase generation method is introduced. The resulting improvements in the indexing vocabulary are evaluated. [en_US]
dc.format.extent: 1024673 bytes
dc.format.extent: 408116 bytes
dc.publisher: Cornell University [en_US]
dc.subject: computer science [en_US]
dc.subject: technical report [en_US]
dc.title: Contribution to the Theory of Indexing [en_US]
dc.type: technical report [en_US]
