Supervised k-Means Clustering

Finley, Thomas; Joachims, Thorsten

Files

paper.pdf (244.33 KB)

Permanent Link(s)

https://hdl.handle.net/1813/11621

Collections

Computing and Information Science Technical Reports

Author(s)

Finley, Thomas

Joachims, Thorsten

Abstract

The k-means clustering algorithm is one of the most widely used, effective, and best understood clustering methods. However, successful use of k-means requires a carefully chosen distance measure that reflects the properties of the clustering task. Since designing this distance measure by hand is often difficult, we provide methods for training k-means using supervised data. Given training data in the form of sets of items with their desired partitioning, we provide a structural SVM method that learns a distance measure so that k-means produces the desired clusterings. We propose two variants of the method -- one based on a spectral relaxation and one based on the traditional k-means algorithm -- that are both computationally efficient. For each variant, we provide a theoretical characterization of its accuracy in solving the training problem. We also provide an empirical clustering quality and runtime analysis of these learning methods on varied high-dimensional datasets.

Sponsorship

This work was supported under NSF Award IIS-0713483 "Learning Structure to Structure Mapping," and through a gift from Yahoo! Inc.

Date Issued

2008-11-18T05:14:17Z

Keywords

machine learning; k-means; clustering; computer science

Types

technical report
