2 datasets found
  1. PAN Wikipedia Quality Flaw Corpus 2012 (PAN-WQF-12)

    • zenodo.org
    • explore.openaire.eu
    application/gzip
    Updated Jan 24, 2020
    Cite
    Maik Anderka; Benno Stein; Michael Völske (2020). PAN Wikipedia Quality Flaw Corpus 2012 (PAN-WQF-12) [Dataset]. http://doi.org/10.5281/zenodo.3250135
    Available download formats
    application/gzip
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Maik Anderka; Benno Stein; Michael Völske
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The PAN Wikipedia Quality Flaw Corpus 2012, PAN-WQF-12, provides human-labeled English Wikipedia articles that contain specific quality flaws.

    The corpus comprises 1,592,226 articles extracted from the English Wikipedia snapshot of January 4, 2012. A subset of 208,228 articles is labeled with ten specific quality flaws, which are listed in the following table. The labeling is based on human-defined cleanup tags. In addition, the corpus comprises 1,383,998 articles that have not been tagged with any cleanup tag.

  2. PAN-WQF-12

    • anthology.aicmu.ac.cn
    • webis.de
    3250135
    Updated 2012
    Cite
    Maik Anderka; Benno Stein (2012). PAN-WQF-12 [Dataset]. http://doi.org/10.5281/zenodo.3250135
    Available download formats
    3250135
    Dataset updated
    2012
    Dataset provided by
    The Web Technology & Information Systems Network
    Diebold Nixdorf
    Bauhaus-Universität Weimar
    Authors
    Maik Anderka; Benno Stein
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The PAN Wikipedia Quality Flaw Corpus 2012, PAN-WQF-12, provides human-labeled English Wikipedia articles that contain specific quality flaws.


PAN Wikipedia Quality Flaw Corpus 2012 (PAN-WQF-12)

3 scholarly articles cite this dataset (per Google Scholar)