Plagiarism Detection in arXiv

Author
Sorokina, Daria; Gehrke, Johannes; Warner, Simeon; Ginsparg, Paul
Abstract
We describe a large-scale application of methods for finding plagiarism and self-plagiarism in research document collections. The methods are applied to a collection of 284,834 documents collected by arXiv.org over a 14-year period, covering a few different research disciplines. The methodology efficiently detects a variety of problematic author behaviors, and heuristics are developed to reduce the number of false positives. The methods are also efficient enough to implement as a real-time submission screen for a collection many times larger.
Date Issued
2006-09-28
Publisher
Cornell University
Subject
computer science; technical report
Previously Published As
http://techreports.library.cornell.edu:8081/Dienst/UI/1.0/Display/cul.cis/TR2006-2046
Type
technical report