Characterization of Two New Experimental Collections in Computer and Information Science Containing Textual and Bibliographic Concepts
Fox, Edward A.
Two new collections are described which are particularly useful for investigating the interaction between textual and bibliographic data in the automatic indexing and retrieval of documents. An extension to the vector space model has been proposed whereby various types of concepts are included in the representation of such documents. Experiments using an enhanced version of the SMART system have shown such an extended model to perform better than simpler schemes. The CACM and ISI collections developed for this research should be of value for future related studies. The ISI collection has author, title/abstract, and co-citation data for the 1460 most highly cited articles and manuscripts in information science in the 1969-1977 period. The CACM collection contains 7 types of concepts for the 3204 articles published in the Communications of the ACM up through 1979. The ISI and CACM collections have 76 and 52 queries, respectively, along with relevance judgments.
computer science; technical report