A Theory of Term Importance in Automatic Text Analysis
Author
Salton, Gerard
Yang, C. S.
Yu, C. T.
Abstract
Most existing automatic content analysis and indexing techniques are based on word frequency characteristics applied largely in an ad hoc manner. Contradictory requirements arise in this connection, in that terms exhibiting high occurrence frequencies in individual documents are often useful for high recall performance (to retrieve many relevant items), whereas terms with low frequency in the whole collection are useful for high precision (to reject nonrelevant items).
Date Issued
1974-07
Publisher
Cornell University
Previously Published as
http://techreports.library.cornell.edu:8081/Dienst/UI/1.0/Display/cul.cs/TR74-208
Type
technical report