ALGORITHMIC CONSIDERATIONS FOR SPECTRAL CLUSTERING, AND LEARNING GRAPHS FROM LABELED DATA
dc.contributor.author | Singh, Abhay | |
dc.contributor.chair | Damle, Anil | en_US |
dc.contributor.committeeMember | Kleinberg, Robert | en_US |
dc.date.accessioned | 2024-01-31T21:12:45Z | |
dc.date.available | 2024-01-31T21:12:45Z | |
dc.date.issued | 2023-05 | |
dc.description.abstract | Spectral clustering is an extensively used algorithm for identifying well-connected sets of nodes, or clusters, in graphs. Generally, the approach involves computing a suitable number of eigenvectors of the (normalized) graph Laplacian and then applying a downstream clustering algorithm, such as k-means, to the resulting spectral embeddings. Despite the popularity of this general approach, a number of fundamental challenges remain when applying spectral clustering. In this thesis, we propose new algorithmic techniques that address several of these challenges: clustering uncertainty, graph embedding quality, noisy data, direct cluster assignment, vanishing eigengaps, and fair spectral embeddings. We first propose an algorithmic framework that uses the eigenvectors of the normalized graph Laplacian to output a set of distributions, each supported on the nodes of the graph. We propose and evaluate clustering methods with a metric that measures the localization, or variance, of a distribution supported on the nodes. The output distributions are empirically well localized and achieve low variance when directly optimized for this objective. Values from these distributions can be read off directly to provide cluster assignments, or used to initialize other clustering algorithms such as k-means. Because clusters are interpreted as distributions, it is straightforward to identify ‘noisy’ nodes as those carrying little probability mass. Further, insights from perturbation theory inspire a simple yet effective strategy for dealing with vanishing eigengaps: compute additional eigenvectors and subsequently reduce their dimensionality. The framework applies a semi-orthogonal matrix to the right of the eigenvectors, achieving localization and dimensionality reduction simultaneously. The resulting algorithms are efficient due to a parameterization of the rotation group SO(k) via the exponential map of skew-symmetric matrices, a fundamental result from Lie theory. In later sections, we describe a robust method for handling vanishing eigengaps that can be incorporated into a traditional spectral clustering setup. The idea is to replace the typical hard threshold used when selecting the number of eigenvectors with a smoothed version. We design an iterative algorithm that produces a smooth threshold with a desired degrees of freedom, defined as the trace of the threshold function applied to the Laplacian. We also describe a degree-equilibration technique for spectral clustering in problems involving fairness and stereotyping. Finally, we conclude with a method for fitting a graph to labeled data, based on the insight that a graph representative of a dataset can be used to improve downstream prediction. | en_US |
dc.identifier.doi | https://doi.org/10.7298/cpj6-aj21 | |
dc.identifier.other | Singh_cornell_0058O_11736 | |
dc.identifier.other | http://dissertations.umi.com/cornell:11736 | |
dc.identifier.uri | https://hdl.handle.net/1813/113948 | |
dc.language.iso | en | |
dc.title | ALGORITHMIC CONSIDERATIONS FOR SPECTRAL CLUSTERING, AND LEARNING GRAPHS FROM LABELED DATA | en_US |
dc.type | dissertation or thesis | en_US |
dcterms.license | https://hdl.handle.net/1813/59810.2 | |
thesis.degree.discipline | Computer Science | |
thesis.degree.grantor | Cornell University | |
thesis.degree.level | Master of Science | |
thesis.degree.name | M.S., Computer Science |
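
Illustrative sketches

The abstract describes a spectral clustering pipeline that computes extra eigenvectors of the normalized graph Laplacian to sidestep a vanishing eigengap, then right-multiplies them by a semi-orthogonal matrix parameterized through the exponential map of skew-symmetric matrices. The following is a minimal sketch of that pipeline, not the thesis's implementation: the toy graph, function names, and the random parameter vector `theta` are all illustrative assumptions (in the thesis framework, the rotation parameters would be optimized for a localization objective rather than drawn at random). It assumes numpy, scipy, and scikit-learn.

```python
import numpy as np
from scipy.linalg import expm
from scipy.sparse.csgraph import laplacian
from sklearn.cluster import KMeans

def spectral_embedding(W, p):
    # Eigenvectors of the normalized Laplacian L = I - D^{-1/2} W D^{-1/2},
    # taking p > k of them to sidestep a small eigengap at k.
    L = laplacian(W, normed=True)
    _, vecs = np.linalg.eigh(L)        # eigenvalues in ascending order
    return vecs[:, :p]

def semi_orthogonal(theta, p, k):
    # Parameterize SO(p) as the exponential map of skew-symmetric matrices
    # (the Lie-theory fact cited in the abstract), then keep the first k
    # columns to obtain a p-by-k semi-orthogonal matrix.
    A = np.zeros((p, p))
    A[np.triu_indices(p, 1)] = theta   # p(p-1)/2 free parameters
    Q = expm(A - A.T)                  # exp of a skew-symmetric matrix is a rotation
    return Q[:, :k]

# Toy graph: two dense blocks joined by one weak edge (illustrative only).
W = np.zeros((8, 8))
W[:4, :4] = 1.0
W[4:, 4:] = 1.0
np.fill_diagonal(W, 0.0)
W[3, 4] = W[4, 3] = 0.1

p, k = 4, 2
V = spectral_embedding(W, p)           # extra eigenvectors (p > k)
theta = 0.1 * np.random.default_rng(0).standard_normal(p * (p - 1) // 2)
X = V @ semi_orthogonal(theta, p, k)   # rotate and reduce to k dimensions
labels = KMeans(n_clusters=k, n_init=10).fit_predict(X)
```

The abstract also mentions replacing the hard threshold on the number of eigenvectors with a smooth one whose degrees of freedom equal the trace of the threshold function applied to the Laplacian. One plausible reading is sketched below with an assumed logistic filter and a simple bisection; the thesis's iterative algorithm may differ. Since the trace is a monotone function of the cutoff `tau`, bisection suffices to hit a target degrees-of-freedom value.

```python
import numpy as np

def smooth_threshold(lam, tau, s=0.05):
    # Logistic surrogate for the hard indicator 1[lambda < tau];
    # s controls the sharpness of the smoothing.
    return 1.0 / (1.0 + np.exp((lam - tau) / s))

def fit_tau(lam, target_df, s=0.05, iters=60):
    # Degrees of freedom: df(tau) = trace h_tau(L) = sum_i h_tau(lam_i),
    # which increases monotonically in tau, so bisection applies.
    lo, hi = lam.min() - 1.0, lam.max() + 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if smooth_threshold(lam, mid, s).sum() < target_df:
            lo = mid    # too few effective eigenvectors: raise the cutoff
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Example: weight embedding columns by the smooth filter instead of
# truncating at a hard k (lam stands in for the Laplacian's eigenvalues).
lam = np.array([0.0, 0.02, 0.05, 0.9, 1.1, 1.3])
tau = fit_tau(lam, target_df=3.0)
weights = np.sqrt(smooth_threshold(lam, tau))
```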
Files
Original bundle
- Name: Singh_cornell_0058O_11736.pdf
- Size: 3.69 MB
- Format: Adobe Portable Document Format