Title: Document Length Normalization
Authors: Singhal, Amit; Salton, Gerard; Mitra, Mandar; Buckley, Chris
Date accessioned: 2007-04-23
Date available: 2007-04-23
Date issued: 1995-07
Identifier: http://techreports.library.cornell.edu:8081/Dienst/UI/1.0/Display/cul.cs/TR95-1529
URI: https://hdl.handle.net/1813/7186
Abstract: In the TREC collection -- a large full-text experimental text collection with widely varying document lengths -- we observe that the likelihood of a document being judged relevant by a user increases with the document length. We show that a retrieval strategy, such as the vector-space cosine match, that retrieves documents of different lengths with roughly equal probability, will not optimally retrieve useful documents from such a collection. We present a modified technique that attempts to match the likelihood of retrieving a document of a certain length to the likelihood of documents of that length being judged relevant, and show that this technique yields significant improvements in retrieval effectiveness.
Extent: 574646 bytes; 610626 bytes
Format: application/pdf; application/postscript
Language: en-US
Subject: computer science; technical report
Type: technical report
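
The abstract names the vector-space cosine match but does not give the formula for the modified technique. The sketch below is only an illustration of the general idea: it computes a plain cosine score, then a variant whose document normalization factor is tilted around a pivot so that longer documents are penalized less and shorter ones more. The function `tilted_norm`, its `pivot` and `slope` parameters, and the toy term weights are all assumptions made for this example; they are not taken from the report.

```python
import math


def euclidean_norm(weights):
    """Euclidean length of a sparse term-weight vector (dict of term -> weight)."""
    return math.sqrt(sum(w * w for w in weights.values()))


def dot_product(query, doc):
    """Inner product over the terms of the query."""
    return sum(q * doc.get(term, 0.0) for term, q in query.items())


def cosine_score(query, doc):
    """Standard vector-space cosine match."""
    denom = euclidean_norm(query) * euclidean_norm(doc)
    return dot_product(query, doc) / denom if denom else 0.0


def tilted_norm(old_norm, pivot, slope):
    """Hypothetical corrected normalization factor (an assumption, not from the abstract).

    With 0 < slope < 1, norms above the pivot shrink and norms below it grow,
    which raises scores of long documents and lowers scores of short ones.
    """
    return (1.0 - slope) * pivot + slope * old_norm


def tilted_score(query, doc, pivot, slope):
    """Cosine-style score using the tilted document normalization factor."""
    denom = euclidean_norm(query) * tilted_norm(euclidean_norm(doc), pivot, slope)
    return dot_product(query, doc) / denom if denom else 0.0


# Toy data, invented for illustration only.
query = {"length": 1.0}
short_doc = {"length": 1.0}                   # Euclidean norm 1
long_doc = {"length": 3.0, "document": 4.0}   # Euclidean norm 5

for name, doc in (("short", short_doc), ("long", long_doc)):
    print(name, cosine_score(query, doc), tilted_score(query, doc, pivot=3.0, slope=0.5))
```

In this toy setting the long document's score rises relative to its cosine score and the short document's score falls, which is the direction of correction the abstract describes. In practice the pivot and slope would have to be chosen from relevance data, which this sketch does not attempt.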