MetadataShow full item record
Caruana, Rich; Elhawary, Mohamed; Nguyen, Nam; Smith, Casey
Clustering is ill-defined. Unlike supervised learning where labels lead to crisp performance criteria such as accuracy and squared error, clustering quality depends on how the clusters will be used. Devising clustering criteria that capture what users need is difficult. Most clustering algorithms search for one optimal clustering based on a pre-specified clustering criterion. Once that clustering has been determined, no further clusterings are examined. Our approach differs in that we search for many alternate reasonable clusterings of the data, and then allow users to select the clustering(s) that best fit their needs. Any reasonable partitioning of the data is potentially useful for some purpose, regardless of whether or not it is optimal according to a specific clustering criterion. Our approach first finds a variety of reasonable clusterings. It then clusters this diverse set of clusterings so that users must only examine a small number of qualitatively different clusterings. In this paper, we present methods for automatically generating a diverse set of alternate clusterings, as well as methods for grouping clusterings into meta clusters. We evaluate meta clustering on four test problems, and then apply meta clustering to two case studies. Surprisingly, clusterings that would be of most interest to users often are not very compact clusterings.
computer science; technical report
Previously Published As