dc.contributor.author | Ott, Myle | en_US
dc.date.accessioned | 2013-09-16T16:43:01Z
dc.date.issued | 2013-08-19 | en_US
dc.identifier.uri | http://hdl.handle.net/1813/34309
dc.description.abstract | Consumers increasingly rely on user-generated online reviews when making purchase decisions. However, the ease of posting reviews online, potentially anonymously, raises questions about whether unscrupulous businesses may be posting deceptive opinion spam: fraudulent or fictitious reviews that have been deliberately written to sound authentic, in order to deceive the reader. Unfortunately, as this thesis demonstrates, people are largely unable to identify deceptive opinion spam. Accordingly, it is challenging to obtain deceptive reviews for study, and, moreover, very little is known about the prevalence of deception among online reviews. This thesis presents the first thorough investigation of deceptive opinion spam in online review communities. First, we present a novel approach for obtaining deceptive opinion spam, based on crowdsourcing, which we apply to obtain 1,280 known (gold standard) deceptive reviews of hotels and restaurants. After confirming that people are poor judges of deceptive reviews, we then present results showing that supervised machine learning text classifiers can be trained to detect deceptive opinion spam with nearly 90% accuracy in some settings, far surpassing human detection performance. Next, we explore linguistic features associated with deceptive reviews, and compare these features across three contextual dimensions: the sentiment of the review (positive vs. negative), the domain of the review (hotel vs. restaurant), and the domain expertise of the reviewer (crowdsourced workers vs. hotel employees). Finally, we present a Bayesian framework for estimating the prevalence of deception among online reviews, based on the predictions made by our machine learning text classifiers. Applying this framework to six online hotel review communities, we present the first empirical estimates of the rates of deception among online hotel reviews, and additionally evaluate the efficacy of increasing review posting costs to reduce the prevalence of deceptive opinion spam. | en_US
dc.language.iso | en_US | en_US
dc.title | Computational Linguistic Models Of Deceptive Opinion Spam | en_US
dc.type | dissertation or thesis | en_US
dc.description.embargo | 2018-08-20
thesis.degree.discipline | Computer Science | en_US
thesis.degree.grantor | Cornell University | en_US
thesis.degree.level | Doctor of Philosophy | en_US
thesis.degree.name | Ph.D. of Computer Science | en_US
dc.contributor.chair | Cardie, Claire T | en_US
dc.contributor.committeeMember | Hancock, Jeffrey T. | en_US
dc.contributor.committeeMember | Hopcroft, John E | en_US

