A Note on Inverse Document Frequency Weighting Scheme
Wong, S. K. M.; Yao, Y. Y.
Based on Shannon's information theory, a measure of term value is introduced. This study attempts to provide a theoretical justification for the inverse document frequency (IDF) weighting scheme. The argument presented here differs somewhat from those suggested earlier. It is shown that IDF weights can be derived from the proposed approach by assuming that each index term is evenly distributed within a subset of documents. A critical comment on the signal-noise ratio (S/N) weighting method is also included.
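The link between an even distribution and IDF can be illustrated numerically. The sketch below is one possible reading of the claim, not the report's own formulation: it assumes a term spread uniformly over n of N documents, takes the term's value to be the maximum entropy log2(N) minus the entropy of that distribution, and checks that the result equals log2(N/n). The function names and the unnormalized form of the measure are assumptions made for illustration.

```python
import math


def uniform_term_entropy(n_docs_with_term: int) -> float:
    """Shannon entropy (in bits) of a term spread evenly over n documents."""
    p = 1.0 / n_docs_with_term
    return -sum(p * math.log2(p) for _ in range(n_docs_with_term))


def entropy_based_value(total_docs: int, n_docs_with_term: int) -> float:
    """Hypothetical term value: maximum entropy minus the term's entropy."""
    return math.log2(total_docs) - uniform_term_entropy(n_docs_with_term)


def idf(total_docs: int, n_docs_with_term: int) -> float:
    """Classic IDF weight, log2(N / n)."""
    return math.log2(total_docs / n_docs_with_term)


if __name__ == "__main__":
    N, n = 1024, 8
    print(entropy_based_value(N, n), idf(N, n))
```

Under the even-distribution assumption the two quantities coincide for any n between 1 and N, since the entropy of a uniform distribution over n documents is log2(n).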
computer science; technical report