3 datasets found

PAN-PC-10
webis.de
anthology.aicmu.ac.cn
3250123
Updated 2010
Cite
Martin Potthast; Benno Stein; Andreas Eiselt (2010). PAN-PC-10 [Dataset]. http://doi.org/10.5281/zenodo.3250123
Unique identifier
https://doi.org/10.5281/zenodo.3250123
Dataset updated
2010
Dataset provided by
University of Kassel, hessian.AI, and ScaDS.AI
The Web Technology & Information Systems Network
Bauhaus-Universität Weimar
Authors
Martin Potthast; Benno Stein; Andreas Eiselt
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This corpus is outdated. Please use its successor PAN-PC-11.
PAN Plagiarism Corpus 2010 (PAN-PC-10)
data.niaid.nih.gov
zenodo.org
Updated Jan 24, 2020
Cite
Eiselt, Andreas (2020). PAN Plagiarism Corpus 2010 (PAN-PC-10) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3250122
Dataset updated
Jan 24, 2020
Dataset provided by
Eiselt, Andreas
Potthast, Martin
Rosso, Paolo
Barrón-Cedeño, Alberto
Stein, Benno
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This corpus is outdated. Please use its successor PAN-PC-11: https://doi.org/10.5281/zenodo.3250095

The PAN plagiarism corpus 2010 (PAN-PC-10) is a corpus for the evaluation of automatic plagiarism detection algorithms. For research purposes the corpus can be used free of charge.

The PAN-PC-10 contains documents in which artificial plagiarism has been inserted automatically as well as documents in which simulated plagiarism has been inserted manually. The former have been constructed using a so-called random plagiarist, a computer program which constructs plagiarism according to a number of parameters, while the latter have been obtained with crowdsourcing via Amazon's Mechanical Turk.
Webis-CPC-11
anthology.aicmu.ac.cn
webis.de
3251771
Updated 2011
Cite
Martin Potthast; Benno Stein (2011). Webis-CPC-11 [Dataset]. http://doi.org/10.5281/zenodo.3251771
Unique identifier
https://doi.org/10.5281/zenodo.3251771
Dataset updated
2011
Dataset provided by
Leipzig University
The Web Technology & Information Systems Network
Bauhaus-Universität Weimar
Authors
Martin Potthast; Benno Stein
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Webis Crowd Paraphrase Corpus 2011 (Webis-CPC-11) contains 7,859 candidate paraphrases obtained through crowdsourcing on Amazon's Mechanical Turk. The corpus is made up of 4,067 accepted paraphrases, 3,792 rejected non-paraphrases, and the original texts. These samples formed part of the PAN 2010 international plagiarism detection competition but were not previously available separately from the rest of the competition data.