Computational Linguistic Models Of Deceptive Opinion Spam

Ott, Myle
2013-09-16
2018-08-20
2013-08-19
bibid: 8267361
https://hdl.handle.net/1813/34309

Consumers increasingly rely on user-generated online reviews when making purchase decisions. However, the ease of posting reviews online, potentially anonymously, raises questions about whether unscrupulous businesses may be posting deceptive opinion spam, that is, fraudulent or fictitious reviews that have been deliberately written to sound authentic in order to deceive the reader. Unfortunately, as this thesis demonstrates, people are largely unable to identify deceptive opinion spam. Accordingly, it is challenging to obtain deceptive reviews for study, and, moreover, very little is known about the prevalence of deception among online reviews. This thesis presents the first thorough investigation of deceptive opinion spam in online review communities. First, we present a novel approach for obtaining deceptive opinion spam, based on crowdsourcing, which we apply to obtain 1,280 known (gold standard) deceptive reviews of hotels and restaurants. After confirming that people are poor judges of deceptive reviews, we then present results showing that supervised machine learning text classifiers can be trained to detect deceptive opinion spam with nearly 90% accuracy in some settings, far surpassing human detection performance. Next, we explore linguistic features associated with deceptive reviews, and compare these features across three contextual dimensions: the sentiment of the review (positive vs. negative), the domain of the review (hotel vs. restaurant), and the domain expertise of the reviewer (crowdsourced workers vs. hotel employees). Finally, we present a Bayesian framework for estimating the prevalence of deception among online reviews, based on the predictions made by our machine learning text classifiers.
Applying this framework to six online hotel review communities, we present the first empirical estimates of the rates of deception among online hotel reviews, and additionally evaluate the efficacy of increasing review posting costs to reduce the prevalence of deceptive opinion spam.

en-US
dissertation or thesis