Plagiarism Detection in arXiv

Abstract
We describe a large-scale application of methods for finding plagiarism and self-plagiarism in research document collections. The methods are applied to 284,834 documents collected by arXiv.org over a 14-year period, covering several research disciplines. The methodology efficiently detects a variety of problematic author behaviors, and heuristics are developed to reduce the number of false positives. The methods are also efficient enough to implement as a real-time submission screen for a collection many times larger.
Date Issued
2006-09-28
Publisher
Cornell University
Keywords
computer science; technical report
Previously Published As
http://techreports.library.cornell.edu:8081/Dienst/UI/1.0/Display/cul.cis/TR2006-2046
Types
technical report