Show simple item record

dc.contributor.author: Singhal, Amit [en_US]
dc.contributor.author: Salton, Gerard [en_US]
dc.contributor.author: Mitra, Mandar [en_US]
dc.contributor.author: Buckley, Chris [en_US]
dc.date.accessioned: 2007-04-23T18:03:17Z
dc.date.available: 2007-04-23T18:03:17Z
dc.date.issued: 1995-07 [en_US]
dc.identifier.citation: http://techreports.library.cornell.edu:8081/Dienst/UI/1.0/Display/cul.cs/TR95-1529 [en_US]
dc.identifier.uri: https://hdl.handle.net/1813/7186
dc.description.abstract: In the TREC collection -- a large full-text experimental text collection with widely varying document lengths -- we observe that the likelihood of a document being judged relevant by a user increases with the document length. We show that a retrieval strategy, such as the vector-space cosine match, that retrieves documents of different lengths with roughly equal probability, will not optimally retrieve useful documents from such a collection. We present a modified technique that attempts to match the likelihood of retrieving a document of a certain length to the likelihood of documents of that length being judged relevant, and show that this technique yields significant improvements in retrieval effectiveness. [en_US]
dc.format.extent: 574646 bytes
dc.format.extent: 610626 bytes
dc.format.mimetype: application/pdf
dc.format.mimetype: application/postscript
dc.language.iso: en_US [en_US]
dc.publisher: Cornell University [en_US]
dc.subject: computer science [en_US]
dc.subject: technical report [en_US]
dc.title: Document Length Normalization [en_US]
dc.type: technical report [en_US]



