JavaScript is disabled for your browser. Some features of this site may not work without it.
Document Length Normalization

Author
Singhal, Amit; Salton, Gerard; Mitra, Mandar; Buckley, Chris
Abstract
In the TREC collection -- a large full-text experimental text collection with widely varying document lengths -- we observe that the likelihood of a document being judged relevant by a user increases with the document length. We show that a retrieval strategy, such as the vector-space cosine match, that retrieves documents of different lengths with roughly equal probability, will not optimally retrieve useful documents from such a collection. We present a modified technique that attempts to match the likelihood of retrieving a document of a certain length to the likelihood of documents of that length being judged relevant, and show that this technique yields significant improvements in retrieval effectiveness.
Date Issued
1995-07Publisher
Cornell University
Subject
computer science; technical report
Previously Published As
http://techreports.library.cornell.edu:8081/Dienst/UI/1.0/Display/cul.cs/TR95-1529
Type
technical report