eCommons

 

Pivoted Document Length Normalization

dc.contributor.author: Singhal, Amit (en_US)
dc.contributor.author: Buckley, Chris (en_US)
dc.contributor.author: Mitra, Mandar (en_US)
dc.contributor.author: Salton, Gerard (en_US)
dc.date.accessioned: 2007-04-23T18:05:18Z
dc.date.available: 2007-04-23T18:05:18Z
dc.date.issued: 1995-11 (en_US)
dc.description.abstract: Document length normalization is an important aspect of term weight assignment in an automatic information retrieval system. In this study, we observe that a normalization scheme that retrieves documents of all lengths with chances similar to their likelihood of relevance will outperform a scheme that retrieves documents with chances very different from their likelihood of relevance. We show that the retrieval probabilities for a particular normalization method deviate systematically from the relevance probabilities across different collections. We present pivoted normalization, a technique that can be used to reduce the gap between the relevance and the retrieval probabilities. Training pivoted normalization on one collection, we can successfully use it on other (new) text collections, yielding a robust, collection-independent normalization technique. We use the idea of pivoting with the well-known cosine normalization scheme. We point out some shortcomings of the cosine normalization function and present two new normalization functions: pivoted unique normalization and pivoted byte size normalization. (en_US)
dc.format.extent: 588248 bytes
dc.format.extent: 560380 bytes
dc.format.mimetype: application/pdf
dc.format.mimetype: application/postscript
dc.identifier.citation: http://techreports.library.cornell.edu:8081/Dienst/UI/1.0/Display/cul.cs/TR95-1560 (en_US)
dc.identifier.uri: https://hdl.handle.net/1813/7217
dc.language.iso: en_US (en_US)
dc.publisher: Cornell University (en_US)
dc.subject: computer science (en_US)
dc.subject: technical report (en_US)
dc.title: Pivoted Document Length Normalization (en_US)
dc.type: technical report (en_US)
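
The abstract names pivoted normalization but does not give a formula. As a rough illustration only, the sketch below uses the form in which pivoting is commonly stated: the old normalization factor is tilted around a pivot point, (1 - slope) * pivot + slope * old_norm. The function names, the pivot value, and the slope value are assumptions made for this example and are not taken from this record.

```python
import math


def cosine_norm(weights):
    """Cosine (Euclidean) length of a document's term-weight vector."""
    return math.sqrt(sum(w * w for w in weights))


def unique_term_count(tokens):
    """Number of distinct terms in a tokenized document.

    Assumed here as the 'old' factor for pivoted unique normalization.
    """
    return len(set(tokens))


def pivoted_norm(old_norm, pivot, slope):
    """Tilt an existing normalization factor around `pivot`.

    With slope < 1, factors above the pivot shrink and factors below it
    grow; a factor equal to the pivot is unchanged. With slope == 1 the
    old normalization is returned as is.
    """
    return (1.0 - slope) * pivot + slope * old_norm


if __name__ == "__main__":
    # Illustrative values only; in practice pivot and slope would be tuned
    # on a training collection.
    pivot, slope = 10.0, 0.25
    for old in (4.0, 10.0, 20.0):
        print(old, pivoted_norm(old, pivot, slope))
```

A term weight would then be divided by the pivoted factor instead of the original one. Passing `cosine_norm(...)` as `old_norm` corresponds to pivoting the cosine scheme, and passing `unique_term_count(...)` is one plausible reading of pivoted unique normalization.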

Files

Original bundle
Name: 95-1560.pdf
Size: 574.46 KB
Format: Adobe Portable Document Format

Name: 95-1560.ps
Size: 547.25 KB
Format: PostScript