A Generalized Term Dependence Model in Information Retrieval
Yu, C. T.; Buckley, Chris; Lam, K.; Salton, Gerard
The tree dependence model has been used successfully to incorporate dependencies between certain term pairs into the information retrieval process, while the Bahadur-Lazarsfeld Expansion (BLE), which specifies dependencies between all subsets of terms, has been used to identify productive clusters of items in a clustered data base environment. The successes of these models are unlikely to be accidental; it is therefore of interest to examine the similarities between the two models. The disadvantage of the BLE model is the exponential number of terms appearing in the full expression, while a truncated BLE system may produce negative probability values. The disadvantage of the tree dependence model is its restriction to dependencies between certain term pairs only, excluding higher-order dependencies. A generalized term dependence model is introduced in this study which does not carry the disadvantages of either the tree dependence or the BLE models. Sample evaluation results are included to demonstrate the usefulness of the generalized system.
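The remark that a truncated BLE may produce negative probability values can be illustrated with a small numerical sketch. The code below assumes the standard second-order form of the expansion, P(x) ≈ P1(x) · [1 + Σ r_ij z_i z_j], where P1 is the independence estimate, z_i is the standardized term indicator, and r_ij is the pairwise correlation; the report's own notation may differ. The four-term mixture distribution, the constant `MIX`, and all function names are hypothetical and chosen only for illustration; they are not taken from the report.

```python
from itertools import combinations, product
from math import prod, sqrt

N_TERMS = 4   # hypothetical number of binary index terms
MIX = 0.6     # hypothetical weight of the "all terms agree" component

# Every possible binary term vector x = (x_1, ..., x_4).
STATES = list(product((0, 1), repeat=N_TERMS))


def true_prob(x):
    """Toy joint distribution: with weight MIX all terms share one fair bit,
    otherwise the terms are independent fair bits."""
    uniform = 1.0 / 2 ** N_TERMS
    tied = 0.5 if len(set(x)) == 1 else 0.0
    return MIX * tied + (1.0 - MIX) * uniform


def marginals():
    """p_i = P(x_i = 1) under the toy distribution."""
    return [sum(true_prob(x) for x in STATES if x[i] == 1)
            for i in range(N_TERMS)]


def standardize(x, p):
    """z_i = (x_i - p_i) / sqrt(p_i * (1 - p_i))."""
    return [(xi - pi) / sqrt(pi * (1 - pi)) for xi, pi in zip(x, p)]


def pair_correlations(p):
    """r_ij = E[z_i * z_j] under the toy distribution, for all i < j."""
    r = {}
    for i, j in combinations(range(N_TERMS), 2):
        total = 0.0
        for x in STATES:
            z = standardize(x, p)
            total += true_prob(x) * z[i] * z[j]
        r[(i, j)] = total
    return r


def truncated_ble(x, p, r):
    """Second-order truncation: independence estimate times
    (1 + sum of pairwise correction terms). Higher orders are dropped."""
    independent = prod(pi if xi else 1 - pi for xi, pi in zip(x, p))
    z = standardize(x, p)
    correction = 1.0 + sum(rij * z[i] * z[j] for (i, j), rij in r.items())
    return independent * correction


p = marginals()
r = pair_correlations(p)
x = (1, 1, 0, 0)
print("true probability:     ", true_prob(x))
print("truncated BLE estimate:", truncated_ble(x, p, r))  # negative for this x
```

In this toy setting every marginal is 0.5 and every pairwise correlation is 0.6, so for the vector (1, 1, 0, 0) the pairwise corrections sum to -1.2 and the bracket becomes negative, even though the true probability is positive. The truncated values still sum to one over all vectors, which is why the negative entries are easy to overlook.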
computer science; technical report