Extracting Opinions And Events From Text: Joint Inference Approaches
With the rapid growth of text data on the Web and on personal devices, there is an increasing need to automatically process text and unlock different types of information from it. Opinions and events are two important types of information that appear ubiquitously in text. One represents subjective information, concerning a person's attitudes, beliefs, sentiment, judgements and evaluations, and the other represents factual information concerning what happens in the real world. The ability to extract and interpret opinions and events is essential for many natural language processing (NLP) applications such as news summarization, open-domain question answering, social media analysis, and government document management. While NLP has made great progress on information extraction tasks such as named entity recognition (entities like persons, organizations and locations) and named entity resolution (determining references of entities), much less progress has been made on the extraction of complex information such as opinions and events. Existing methods mostly extract individual components and attributes of opinions and events without accounting for their dependencies. Moreover, they often make phrase- or sentence-level predictions without considering the larger discourse context, such as a document or a conversation. This dissertation presents models that address these two shortcomings. To capture the interdependencies among different information elements, we propose models that can perform joint inference across different but related extraction subtasks, including joint opinion entity extraction and relation extraction, and joint opinion segmentation and attribute classification. Extensive experiments show that joint inference yields significant improvements when compared to standard approaches that combine the subtasks in a pipeline, and achieves state-of-the-art performance on the extraction subtasks.
To facilitate global discourse understanding, we explore machine learning techniques that allow the integration of linguistic evidence at multiple levels of context - at the word, sentence, and document level - into coherent probabilistic models. Specifically, we develop a structured learning approach that can leverage intra- and inter-sentential cues in fine-grained sentiment analysis, and a Bayesian clustering model for event coreference resolution within a document and across documents. In both applications, we demonstrate the advantages of learning from multiple levels of contextual evidence.
Opinion extraction; Event extraction; Joint inference
Frazier, Peter; Gehrke, Johannes E.
Ph.D., Computer Science
Doctor of Philosophy
dissertation or thesis