Generation and Search of Clustered Files
Bergmark, D.; Salton, Gerard; Wong, A.
A classified, or clustered, file is one in which related or similar records are grouped into classes, or clusters, of items in such a way that all items within a cluster are jointly retrievable. Clustered files exhibit substantial advantages in many retrieval environments over the more conventional inverted list or multilist technologies. An inexpensive file clustering method applicable to large files is given together with appropriate file search methods. An abstract model is used to predict the retrieval effectiveness of various search methods in a clustered file environment, and experimental evidence is introduced to confirm the usefulness of the model. As an example, a collection of research papers in computer science is clustered automatically, and the resulting research clusters are compared with existing, manually constructed taxonomies for the computer field.
computer science; technical report