Clustered File Generation and Its Application to Computer Science Taxonomies
Bergmark, D.; Salton, Gerard
A clustered file organization is one in which related or similar records are grouped into classes, or clusters, in such a way that all items within a cluster are jointly retrievable. Such a file organization is advantageous for interactive searching, where tentative query formulations may be used and the records may be specified incompletely or approximately. An inexpensive file clustering method applicable to large files is given, together with an appropriate file search method. The method is used to cluster a file of research articles in computer science based on citation similarities between the papers; this leads to the identification of groups of active computer science research topics and of productive computer scientists.
computer science; technical report