Using Graphs For Topic Discovery

Author
Jo, Yookyung
Abstract
As large-scale digital text collections become abundant, there is a clear need to summarize text data automatically by discovering topics and their evolution, and research interest in the task has surged. We use graphs for topic discovery and topic evolution discovery by mining the statistical properties of graphs associated with the text data. Because an increasing number of text collections have some kind of network associated with the data (text in social network services, research paper collections, digital text with user browsing history), there is great potential in using graphs for text mining. Our work on topic and topic evolution discovery shows qualitatively different results from existing approaches: the discovered topics are concrete and vary in size and time dynamics, and the rich topology of topic evolution is captured in the result. We discover topics by mining the correlation between topic terms and the citation graph. This is done by developing a statistical measure, associated with terms, for the connectivity of a document graph. In topic evolution discovery, we capture the inherent topology of topic evolution in a corpus by discovering quantized units of evolutionary change in content and connecting them by summarizing the underlying document network. We note that topic words and non-topic words differ in their distributional properties and use this observation to discover topics by constructing a document network. We use the same observation to enhance the quality of topics obtained by Latent Dirichlet Allocation.
Date Issued
2011-08-31
Subject
topic; evolution; network
Committee Chair
Hopcroft, John E
Committee Member
Joachims, Thorsten; Shmoys, David B
Degree Discipline
Computer Science
Degree Name
Ph.D., Computer Science
Degree Level
Doctor of Philosophy
Type
dissertation or thesis