An Error Analysis for Functions of Qualitative Attributes with Application to Information Retrieval
The use of overlapping, non-hierarchical classifications in information retrieval is considered. It is assumed that the population of objects to be classified is such that only a subset of the classes satisfying the classificatory criterion may be found. The effect of this assumption on the measurement of classification stability is considered. As a step towards the determination of stability, a general technique is presented for deriving the expectation of a statistical function of the similarities between the objects of the population. It is assumed that the objects are describable in terms of two-state attributes which are susceptible independently and equiprobably to error with an assignable probability. Two commonly encountered similarity functions are treated in detail. The techniques disclosed are applicable, in principle, to classification algorithms, whether hierarchical or non-hierarchical, which utilize a similarity matrix giving the similarities between pairs of objects described by two-state independent attributes.
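As an illustration of the kind of expectation described above, the following minimal sketch assumes the simple matching coefficient as the similarity function (the abstract does not name the two functions treated, so this choice is an assumption). Each two-state attribute of each object is taken to be flipped independently with probability p. An attribute pair then changes its agreement status exactly when one of its two values is flipped, which happens with probability q = 2p(1 - p), so a true similarity s has expected observed value q + s(1 - 2q). The function names are hypothetical, and the brute-force enumeration is included only as a check on the closed form for small attribute counts.

```python
from itertools import product


def simple_matching(x, y):
    """Fraction of attributes on which two binary descriptions agree."""
    return sum(a == b for a, b in zip(x, y)) / len(x)


def expected_simple_matching(x, y, p):
    """Closed-form expectation of the observed coefficient.

    Assumes every attribute of both objects is flipped independently
    with the same probability p (an illustrative error model).
    """
    s = simple_matching(x, y)
    q = 2 * p * (1 - p)  # probability that exactly one value of a pair is flipped
    return q + s * (1 - 2 * q)


def enumerated_expectation(x, y, p):
    """Brute-force expectation over every error pattern (small n only)."""
    n = len(x)
    total = 0.0
    for flips in product((0, 1), repeat=2 * n):
        k = sum(flips)  # number of flipped values in this pattern
        weight = p ** k * (1 - p) ** (2 * n - k)
        x_obs = [a ^ f for a, f in zip(x, flips[:n])]
        y_obs = [b ^ f for b, f in zip(y, flips[n:])]
        total += weight * simple_matching(x_obs, y_obs)
    return total


if __name__ == "__main__":
    x = [1, 0, 1, 1]
    y = [1, 0, 0, 1]
    print(expected_simple_matching(x, y, 0.1))
    print(enumerated_expectation(x, y, 0.1))
```

In this example the true similarity is 0.75 and p = 0.1, so the closed form gives 0.18 + 0.75 × 0.64 = 0.66. The observed similarity is pulled towards 0.5 as p grows, and at p = 0.5 it carries no information about the true value.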
computer science; technical report