ALGORITHMIC CONSIDERATIONS FOR SPECTRAL CLUSTERING, AND LEARNING GRAPHS FROM LABELED DATA

dc.contributor.author: Singh, Abhay
dc.contributor.chair: Damle, Anil
dc.contributor.committeeMember: Kleinberg, Robert
dc.date.accessioned: 2024-01-31T21:12:45Z
dc.date.available: 2024-01-31T21:12:45Z
dc.date.issued: 2023-05
dc.description.abstract: Spectral clustering is an extensively used algorithm for identifying well-connected sets of nodes, or clusters, in graphs. Generally, the approach involves computing a suitable number of eigenvectors of the (normalized) graph Laplacian and then applying a downstream clustering algorithm, such as k-means, to the resulting spectral embeddings. Despite the popularity of this general approach, a number of fundamental challenges remain when applying spectral clustering. In this thesis, we propose new algorithmic techniques that address several of them: clustering uncertainty, graph embedding quality, noisy data, direct cluster assignment, vanishing eigengaps, and fair spectral embeddings.

We first propose an algorithmic framework that uses the eigenvectors of the normalized graph Laplacian to output a set of distributions, each supported on the nodes of the graph. We propose and evaluate clustering methods with a metric that measures the localization, or variance, of a distribution supported on the nodes. The output distributions are empirically well localized and achieve low variance when directly optimized for this objective. Values from these distributions can be read off directly to provide cluster assignments, or used to initialize other clustering algorithms such as k-means. Because clusters are interpreted as distributions, it is straightforward to identify ‘noisy’ nodes as those carrying little probability mass. Further, insights from perturbation theory inspire a simple yet effective strategy for vanishing eigengaps: compute additional eigenvectors and subsequently reduce their dimensionality. The framework applies a semi-orthogonal matrix to the right of the eigenvectors and simultaneously achieves both localization and dimensionality reduction. The resulting algorithms are efficient due to a parameterization of the rotation matrices SO(k) via the exponential map of skew-symmetric matrices, a fundamental result from Lie theory.

In later sections, we describe a robust method for vanishing eigengaps that can be incorporated into a traditional spectral clustering setup. The idea is to replace the typical hard threshold used when selecting the number of eigenvectors with a smoothed version. We design an iterative algorithm that yields a smooth threshold with a desired number of degrees of freedom, defined as the trace of the threshold function applied to the Laplacian. We also describe a degree equilibration technique for spectral clustering in problems of fairness and stereotyping. Finally, we conclude with a method for fitting a graph to labeled data, based on the insight that a graph representative of a dataset can be used to improve downstream prediction.
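For orientation, the general pipeline the abstract builds on can be sketched in a few lines. This is a generic illustration rather than any of the thesis's algorithms; the dense adjacency matrix A, the cluster count k, and the use of numpy/scikit-learn are assumptions of the sketch.

```python
# Generic spectral clustering pipeline (illustration only, not the
# thesis's method): normalized Laplacian -> k eigenvectors -> k-means.
import numpy as np
from sklearn.cluster import KMeans

def spectral_clustering(A, k):
    """A: symmetric adjacency matrix (n x n); k: number of clusters."""
    d = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))  # guard isolated nodes
    # Normalized Laplacian: L = I - D^{-1/2} A D^{-1/2}
    L = np.eye(A.shape[0]) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    # Eigenvectors for the k smallest eigenvalues (eigh sorts ascending)
    _, V = np.linalg.eigh(L)
    X = V[:, :k]  # spectral embeddings, one row per node
    return KMeans(n_clusters=k, n_init=10).fit_predict(X)
```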
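The abstract's efficiency claim rests on a standard Lie-theoretic fact: the matrix exponential maps skew-symmetric matrices to rotation matrices in SO(k). A minimal sketch of that parameterization follows; the parameter vector theta is a hypothetical free variable, and the thesis's optimization over these parameters is not reproduced here.

```python
# Parameterize SO(k) as expm of a skew-symmetric matrix (standard fact;
# only the parameterization is shown, not any optimization over it).
import numpy as np
from scipy.linalg import expm

def rotation_from_params(theta, k):
    """theta: vector of k*(k-1)/2 free parameters -> Q in SO(k)."""
    S = np.zeros((k, k))
    S[np.triu_indices(k, 1)] = theta  # fill strict upper triangle
    S = S - S.T                       # skew-symmetric: S^T = -S
    return expm(S)                    # orthogonal with det = +1

Q = rotation_from_params(np.random.randn(3), 3)
assert np.allclose(Q @ Q.T, np.eye(3)) and np.isclose(np.linalg.det(Q), 1.0)
```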
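The smoothed threshold can likewise be made concrete. The abstract defines the degrees of freedom as the trace of the threshold function applied to the Laplacian, which, via the eigendecomposition, equals the sum of the threshold applied to each eigenvalue. The sigmoid form and the bisection search below are illustrative assumptions, not the thesis's iterative algorithm.

```python
# Smooth spectral threshold: degrees of freedom = trace(h(L)) = sum_i h(lam_i).
# The sigmoid h and the bisection on tau are illustrative choices only.
import numpy as np

def smooth_threshold_weights(eigvals, target_df, width=0.05, iters=60):
    """Return per-eigenvalue weights h(lam) whose sum is ~target_df."""
    h = lambda tau: 1.0 / (1.0 + np.exp(np.clip((eigvals - tau) / width, -50, 50)))
    lo, hi = eigvals.min() - 1.0, eigvals.max() + 1.0
    for _ in range(iters):  # bisection: df(tau) is increasing in tau
        tau = 0.5 * (lo + hi)
        lo, hi = (tau, hi) if h(tau).sum() < target_df else (lo, tau)
    return h(0.5 * (lo + hi))
```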
dc.identifier.doi: https://doi.org/10.7298/cpj6-aj21
dc.identifier.other: Singh_cornell_0058O_11736
dc.identifier.other: http://dissertations.umi.com/cornell:11736
dc.identifier.uri: https://hdl.handle.net/1813/113948
dc.language.iso: en
dc.title: ALGORITHMIC CONSIDERATIONS FOR SPECTRAL CLUSTERING, AND LEARNING GRAPHS FROM LABELED DATA
dc.type: dissertation or thesis
dcterms.license: https://hdl.handle.net/1813/59810.2
thesis.degree.discipline: Computer Science
thesis.degree.grantor: Cornell University
thesis.degree.level: Master of Science
thesis.degree.name: M.S., Computer Science

Files

Original bundle
Name: Singh_cornell_0058O_11736.pdf
Size: 3.69 MB
Format: Adobe Portable Document Format