Using Graphs For Topic Discovery
As large-scale digital text collections become abundant, there is a clear need to summarize text data automatically by discovering topics and their evolution, and there is a surge of research interest in the task. We use graphs for topic discovery and topic evolution discovery by mining the statistical properties of graphs associated with the text data. Because an increasing number of text collections have some kind of network associated with the data (text in social network services, research paper collections, digital text with user browsing history), there is great potential in using graphs for text mining. Our work on topic and topic evolution discovery yields results that differ qualitatively from those of existing approaches: the discovered topics are concrete and vary in size and time dynamics, and the rich topology of topic evolution is captured in the result. We discover topics by mining the correlation between topic terms and the citation graph. This is done by developing a statistical measure, associated with terms, for the connectivity of a document graph. In topic evolution discovery, we capture the inherent topology of topic evolution in a corpus by discovering quantized units of evolutionary change in content and connecting them by summarizing the underlying document network. We note that topic words and nontopic words differ in their distributional properties, and we use this observation to discover topics by constructing a document network. We use the same observation to enhance the quality of topics obtained by Latent Dirichlet Allocation.
topic; evolution; network
Hopcroft, John E
Joachims, Thorsten; Shmoys, David B
Ph.D. of Computer Science
Doctor of Philosophy
dissertation or thesis