EMPIRICAL METHODS FOR FINE-GRAINED OPINION EXTRACTION FROM TEXT
Opinions are everywhere. The op-ed pages of newspapers, political blogs, and consumer websites like epinions.com are just some examples of the textual opinions available to readers. Many consumers are interested in following these opinions: intelligence analysts who track the opinions of foreign countries, public relations firms that want to ensure positive opinions of their clients, pollsters who want to know the public's opinions about politicians, and companies that want to know customers' opinions about their products. The problem faced by all of these consumers of opinion is that there is too much text to read in full. Central to processing the opinions in these texts is solving two specific problems: identifying expressions of opinion, and identifying their hierarchical structure. We demonstrate solutions to both using empirical natural language processing techniques. Although empirical, data-driven methods such as these have become the norm in natural language processing, little work has been done to analyze their impact on the reproducibility, efficiency, and effectiveness of research. We address two specific problems in this area. First, we introduce a lightweight computational workflow system to improve the reproducibility and efficiency of machine learning and natural language processing experiments. Second, we investigate the process of feature generation, setting out desiderata for an ideal process and exploring the effectiveness of several alternatives. Both are investigated in the context of the opinion extraction tasks set out above.
computer science; natural language processing; machine learning; subjectivity; sentiment; opinion; computational workflow; computational linguistics
dissertation or thesis