Cornell University
Library
eCommons

Supervised k-Means Clustering

File(s)
paper.pdf (244.33 KB)
Permanent Link(s)
https://hdl.handle.net/1813/11621
Collections
Computing and Information Science Technical Reports
Author
Finley, Thomas
Joachims, Thorsten
Abstract

The k-means clustering algorithm is one of the most widely used, effective, and best understood clustering methods. However, successful use of k-means requires a carefully chosen distance measure that reflects the properties of the clustering task. Since designing this distance measure by hand is often difficult, we provide methods for training k-means using supervised data. Given training data in the form of sets of items with their desired partitioning, we provide a structural SVM method that learns a distance measure so that k-means produces the desired clusterings. We propose two variants of the method, one based on a spectral relaxation and one based on the traditional k-means algorithm, both of which are computationally efficient. For each variant, we provide a theoretical characterization of its accuracy in solving the training problem. We also provide an empirical analysis of the clustering quality and runtime of these learning methods on varied high-dimensional datasets.
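To make the role of the distance measure concrete, the following is a minimal sketch and not the report's structural SVM method. It assumes, purely for illustration, that the distance is a per-feature weighted squared Euclidean distance with a given weight vector `w`; how `w` would be learned is not shown. The names `weighted_kmeans` and `same_partition` and the toy data are hypothetical.

```python
import numpy as np

def weighted_kmeans(X, k, w, n_iter=50, seed=0):
    """Lloyd's k-means under d(x, y) = sum_j w[j] * (x[j] - y[j])**2.

    Assumption: the distance is a per-feature weighting; `w` is supplied
    by the caller rather than learned here.
    """
    rng = np.random.default_rng(seed)
    # Scaling feature j by sqrt(w[j]) turns the weighted distance into
    # plain squared Euclidean distance, so standard k-means applies.
    Z = X * np.sqrt(w)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        dists = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: recompute means; an empty cluster keeps its old center.
        new_centers = np.array([
            Z[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def same_partition(a, b):
    """True if two label vectors induce the same grouping of items."""
    a, b = np.asarray(a), np.asarray(b)
    return np.array_equal(a[:, None] == a[None, :], b[:, None] == b[None, :])

# Toy data: feature 0 carries the desired grouping, feature 1 is
# large-scale noise unrelated to it.
X = np.array([
    [-0.10,  50.0], [0.00, -50.0], [0.05,  50.0], [0.10, -50.0],
    [ 9.90,  50.0], [10.0, -50.0], [10.05, 50.0], [10.1, -50.0],
])
desired = [0, 0, 0, 0, 1, 1, 1, 1]

# A weighting that ignores the noisy feature recovers the desired partition.
labels = weighted_kmeans(X, k=2, w=np.array([1.0, 0.0]))
print(same_partition(labels, desired))
```

With uniform weights the noisy second feature can dominate the distance on this toy data, which is the kind of mismatch a supervised choice of distance measure is meant to avoid.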

Sponsorship
This work was supported under NSF Award IIS-0713483 "Learning Structure to Structure Mapping," and through a gift from Yahoo! Inc.
Date Issued
2008-11-18T05:14:17Z
Keywords
machine learning • k-means • clustering • computer science
Type
technical report
