Some Considerations for Implementing the SMART Information Retrieval System Under UNIX
Fox, Edward A.
Since the early 1960's, the SMART project has tested new ideas in information science aimed at fully automatic document retrieval. Beginning in 1980, development of an enhanced and generalized version of SMART has progressed at Cornell. The current implementation is in the C language and runs under the UNIX operating system on a VAX 11/780 computer. The history of SMART is outlined. Considerations that led to the current design are described. Since SMART now allows multiple concept types to be manipulated in connection with an extended vector representation, storage and processing issues are discussed, including use of INGRES relations. Clustering algorithms are presented, and run parameters are given for document clustering and subsequent clustered searching. SMART experiments (e.g., with p-norm queries or probabilistic methods) can be compared using the evaluation package. The S statistical package can be applied to perform other special analyses and descriptive tasks. Finally, to illustrate the usefulness of these facilities, an outline is given of current SMART activities and of future plans.
computer science; technical report