Implementing Agglomerative Hierarchic Clustering Algorithms for Use in Document Retrieval

File(s)
86-765.ps (446.29 KB)
86-765.pdf (1.54 MB)
Permanent Link(s)
https://hdl.handle.net/1813/6605
Collections
Computer Science Technical Reports
Author
Voorhees, Ellen M.
Abstract

Searching hierarchically clustered document collections can be effective, but building the cluster hierarchies is expensive because collections contain both many documents and many terms. The document-term matrix is sparse, however: each document is usually indexed by relatively few terms. This paper describes implementations of three agglomerative hierarchic clustering algorithms that exploit this sparsity, making it possible to cluster collections much larger than the algorithms' worst-case running times would suggest. The implementations have been used to cluster a collection of 12,000 documents.
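
The sparsity argument in the abstract can be illustrated with a minimal sketch. It is not the report's implementation, and the abstract does not name the three algorithms; single-link merging is used here only as one familiar agglomerative method. The function names (`sparse_cosine_pairs`, `single_link_merges`), the dictionary representation of documents, and the cosine measure are all assumptions made for illustration. The point is that an inverted index produces similarities only for document pairs sharing at least one term, so most of the full pairwise matrix is never computed.

```python
from collections import defaultdict
from math import sqrt


def sparse_cosine_pairs(docs):
    """Cosine similarity for document pairs that share at least one term.

    docs is a list of dicts mapping term -> positive weight (hypothetical
    representation). Pairs with no shared term are absent from the result.
    """
    # Inverted index: term -> list of (doc_id, weight), in doc_id order.
    index = defaultdict(list)
    for doc_id, vec in enumerate(docs):
        for term, weight in vec.items():
            index[term].append((doc_id, weight))

    norms = [sqrt(sum(w * w for w in vec.values())) for vec in docs]

    # Accumulate dot products one posting list at a time.
    dots = defaultdict(float)
    for postings in index.values():
        for a in range(len(postings)):
            i, wi = postings[a]
            for b in range(a + 1, len(postings)):
                j, wj = postings[b]
                dots[(i, j)] += wi * wj

    return {(i, j): dot / (norms[i] * norms[j]) for (i, j), dot in dots.items()}


def single_link_merges(docs):
    """Return single-link merges as (doc_i, doc_j, similarity), best first.

    Processes the nonzero similarities in decreasing order and joins the
    clusters of each pair with a union-find structure. Documents that share
    no term with any other document are left unmerged.
    """
    sims = sparse_cosine_pairs(docs)
    parent = list(range(len(docs)))

    def find(x):
        # Follow parent links to the cluster representative, halving paths.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    merges = []
    for (i, j), sim in sorted(sims.items(), key=lambda kv: -kv[1]):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_j] = root_i
            merges.append((i, j, sim))
    return merges


if __name__ == "__main__":
    example = [
        {"cluster": 1.0, "retrieval": 1.0},
        {"cluster": 1.0, "retrieval": 1.0},
        {"sparse": 1.0},
        {"sparse": 1.0, "cluster": 1.0},
    ]
    for i, j, sim in single_link_merges(example):
        print(i, j, round(sim, 3))
```

In this sketch the work done grows with the lengths of the posting lists rather than with the square of the collection size, which is one way a sparse document-term matrix can keep practical cost below the worst-case bound.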

Date Issued
1986-07
Publisher
Cornell University
Keywords
computer science
technical report
Previously Published as
http://techreports.library.cornell.edu:8081/Dienst/UI/1.0/Display/cul.cs/TR86-765
Type
technical report
