Pivoted Document Length Normalization

File(s)
95-1560.ps (547.25 KB)
95-1560.pdf (574.46 KB)
Permanent Link(s)
https://hdl.handle.net/1813/7217
Collections
Computer Science Technical Reports
Author
Singhal, Amit
Buckley, Chris
Mitra, Mandar
Salton, Gerard
Abstract

Document length normalization is an important aspect of term weight assignment in an automatic information retrieval system. In this study, we observe that a normalization scheme that retrieves documents of all lengths with chances similar to their likelihood of relevance will outperform a scheme that retrieves documents with chances very different from their likelihood of relevance. We show that the retrieval probabilities for a particular normalization method deviate systematically from the relevance probabilities across different collections. We present pivoted normalization, a technique that can be used to reduce the gap between the relevance and the retrieval probabilities. Having trained pivoted normalization on one collection, we can successfully use it on other (new) text collections, yielding a robust, collection-independent normalization technique. We use the idea of pivoting with the well-known cosine normalization scheme. We point out some shortcomings of the cosine normalization function and present two new normalization functions: pivoted unique normalization and pivoted byte size normalization.
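
The abstract does not state the formula, so the following is only a minimal sketch under an assumption: that pivoting is the linear form usually associated with this technique, in which an existing normalization factor is "tilted" around a pivot value by a slope. The function names, parameter names, and example values are hypothetical and are not taken from the report.

```python
def pivoted_norm(old_norm: float, pivot: float, slope: float) -> float:
    """Tilt an existing normalization factor around a pivot.

    Assumed form: (1 - slope) * pivot + slope * old_norm.
    At old_norm == pivot the factor is unchanged; with slope < 1,
    factors below the pivot are raised and factors above it are lowered.
    """
    return (1.0 - slope) * pivot + slope * old_norm


def pivoted_unique_norm(num_unique_terms: int, pivot: float, slope: float) -> float:
    """Same tilt, with the number of unique terms as the old factor (assumption)."""
    return pivoted_norm(float(num_unique_terms), pivot, slope)


if __name__ == "__main__":
    # Hypothetical values: pivot set to an average factor of 100, slope of 0.25.
    for old in (50.0, 100.0, 200.0):
        print(old, pivoted_norm(old, pivot=100.0, slope=0.25))
```

Under this assumed form, a document whose old factor is below the pivot is divided by a larger number than before, and one above the pivot by a smaller number, which is one way to shift retrieval chances toward longer documents. A byte-size variant would pass the document's size in bytes as `old_norm`.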

Date Issued
1995-11
Publisher
Cornell University
Keywords
computer science
technical report
Previously Published as
http://techreports.library.cornell.edu:8081/Dienst/UI/1.0/Display/cul.cs/TR95-1560
Type
technical report
