1 dataset found
  1. Webis Text Reuse Corpus 2012

    • zenodo.org
    xz
    Updated Jan 24, 2020
    Cite
    Martin Potthast; Matthias Hagen; Michael Völske; Jakob Gomoll; Benno Stein (2020). Webis Text Reuse Corpus 2012 [Dataset]. http://doi.org/10.5281/zenodo.1341602
    Available download formats
    xz
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Martin Potthast; Matthias Hagen; Michael Völske; Jakob Gomoll; Benno Stein
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    The Webis Text Reuse Corpus 2012 (Webis-TRC-12) compiles manually written documents obtained from a completely controlled, yet representative environment that emulates the web. Each document in the corpus is about one of the 150 topics used at the TREC Web Tracks 2009–2011, which ties the corpus to existing evaluation efforts. Writers, hired on the crowdsourcing platform oDesk, had to retrieve sources for a given topic and reuse text from what they found. The corpus also includes detailed interaction logs that consistently cover both the search for sources and the creation of documents. These logs allow for in-depth analyses of how text is composed when a writer is at liberty to reuse text from third parties.


3 scholarly articles cite this dataset.
