Computational Linguistic Models Of Deceptive Opinion Spam

Consumers increasingly rely on user-generated online reviews when making purchase decisions. However, the ease of posting reviews online, potentially anonymously, raises questions about whether unscrupulous businesses may be posting deceptive opinion spam: fraudulent or fictitious reviews that have been deliberately written to sound authentic in order to deceive the reader. Unfortunately, as this thesis demonstrates, people are largely unable to identify deceptive opinion spam. Accordingly, it is challenging to obtain deceptive reviews for study, and, moreover, very little is known about the prevalence of deception among online reviews. This thesis presents the first thorough investigation of deceptive opinion spam in online review communities. First, we present a novel crowdsourcing-based approach for obtaining deceptive opinion spam, which we apply to obtain 1,280 known (gold-standard) deceptive reviews of hotels and restaurants. After confirming that people are poor judges of deceptive reviews, we present results showing that supervised machine learning text classifiers can be trained to detect deceptive opinion spam with nearly 90% accuracy in some settings, far surpassing human detection performance. Next, we explore linguistic features associated with deceptive reviews and compare these features across three contextual dimensions: the sentiment of the review (positive vs. negative), the domain of the review (hotel vs. restaurant), and the domain expertise of the reviewer (crowdsourced workers vs. hotel employees). Finally, we present a Bayesian framework for estimating the prevalence of deception among online reviews, based on the predictions made by our machine learning text classifiers. Applying this framework to six online hotel review communities, we present the first empirical estimates of the rates of deception among online hotel reviews, and additionally evaluate the efficacy of increasing review posting costs to reduce the prevalence of deceptive opinion spam.
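
As a rough illustration of the prevalence-estimation idea mentioned in the abstract, the sketch below shows one simple way to turn a classifier's count of flagged reviews into a posterior over the true deception rate. It is not the thesis's framework. It assumes a uniform prior, a grid approximation, and a classifier whose sensitivity and specificity are known exactly. The function names and every number in the usage lines are hypothetical.

```python
import math


def prevalence_posterior(n_flagged, n_total, sensitivity, specificity, grid_size=1001):
    """Grid posterior over the true deception rate, under a uniform prior.

    Assumption (not from the thesis): the classifier's sensitivity and
    specificity are treated as known constants.
    """
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    log_post = []
    for rate in grid:
        # Chance that a randomly drawn review is flagged as deceptive:
        # true positives plus false positives.
        p_flag = rate * sensitivity + (1.0 - rate) * (1.0 - specificity)
        # Clamp away from 0 and 1 so the logarithms stay finite.
        p_flag = min(max(p_flag, 1e-12), 1.0 - 1e-12)
        log_post.append(
            n_flagged * math.log(p_flag)
            + (n_total - n_flagged) * math.log(1.0 - p_flag)
        )
    # Subtract the peak before exponentiating to avoid underflow.
    peak = max(log_post)
    weights = [math.exp(lp - peak) for lp in log_post]
    total = sum(weights)
    return grid, [w / total for w in weights]


def posterior_mean(grid, probs):
    """Mean of a discrete posterior defined on the grid."""
    return sum(rate * p for rate, p in zip(grid, probs))


def corrected_rate(n_flagged, n_total, sensitivity, specificity):
    """Point estimate that inverts the flagging equation, clipped to [0, 1]."""
    denom = sensitivity + specificity - 1.0
    if denom == 0.0:
        raise ValueError("classifier is uninformative: sensitivity + specificity == 1")
    observed = n_flagged / n_total
    return min(max((observed + specificity - 1.0) / denom, 0.0), 1.0)


if __name__ == "__main__":
    # Hypothetical inputs: 150 of 1,000 reviews flagged by a classifier
    # with 90% sensitivity and 90% specificity.
    grid, probs = prevalence_posterior(150, 1000, 0.9, 0.9)
    print("posterior mean:", posterior_mean(grid, probs))
    print("point estimate:", corrected_rate(150, 1000, 0.9, 0.9))
```

The point of the correction is that the raw fraction of flagged reviews overstates or understates the true rate whenever the classifier makes errors, so the estimate has to account for both false positives and false negatives.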

Committee Chair

Cardie, Claire T

Committee Member

Hancock, Jeffrey T.
Hopcroft, John E

Degree Discipline

Computer Science

Degree Name

Ph. D., Computer Science

Degree Level

Doctor of Philosophy

dissertation or thesis
