17 results found
  1. Webis-Sentences-17

    • webis.de
    • temir.org
    Published Feb 27, 2017
  2. Webis-Simple-Sentences-17 Corpus

    • zenodo.org
    • search.datacite.org
    Published Feb 27, 2017
  3. Webis-QSpell-17

    • webis.de
    • temir.org
    Published 2017
  4. Webis-Mnemonics-17

    • webis.de
    • temir.org
    • +1 more
    Published 2017
  5. Webis Query Spelling Corpus 2017 (Webis-QSpell-17)

    • zenodo.org
    Published Aug 11, 2017
  6. Positive and Negative Sentences

    • www.kaggle.com
    Updated Jan 19, 2018
  7. SNAP Memetracker

    • www.kaggle.com
    Updated Nov 21, 2016
  8. Nepali ASR

    • ai.google
  9. Federal Justice Statistics Program Data, 1978-1994: [United States]

    • datasearch.gesis.org
    Published Aug 5, 2015
  10. hfradar_1b42_be47_cf37

    • erddap.osupytheas.fr
    Created Aug 23, 2018
  11. A Canadian French Emotional Speech Dataset

    • zenodo.org
    Published Apr 17, 2018
  12. Data from: Improving Prison Classification Procedures in Vermont: Applying...

    • datasearch.gesis.org
    Published Aug 5, 2015
  13. Prison in India

    • data.world
    Updated Sep 11, 2018
  14. Pre-trained Word Vectors for Spanish

    • www.kaggle.com
    Updated Aug 9, 2017
  15. Archival Version

    • datasearch.gesis.org
    Published Aug 5, 2015
  16. Data from: Evaluation of Boot Camps for Juvenile Offenders in Cleveland,...

    • www.da-ra.de
    Published Nov 2, 1999
  17. Effects of Local Sanctions on Serious Criminal Offending in Cities with...

    • datasearch.gesis.org
    Published Aug 5, 2015

Webis-Sentences-17

  • Dataset published Feb 27, 2017
Dataset provided by
Bauhaus University, Weimar (http://www.uni-weimar.de/)
The Web Technology & Information Systems Network
Authors
Stein, Benno; Kiesel, Johannes; Lucks, Stefan
Description

The Webis-Sentences-17 corpus is a collection of 3,369,618,811 sentences extracted from the ClueWeb12 web crawl. It is designed to support statistical analyses of human-written sentences. More details on the sentence extraction can be found in the associated publication. The Webis-Simple-Sentences-17 corpus contains 471,085,690 English sentences from the Webis-Sentences-17 corpus. The sentences were sampled to achieve a level of sentence complexity similar to that of sentences that people make up as memory aids for remembering passwords. Sentence complexity was measured as syllables per word. Both corpora are split into a training set and a test set, as used in the associated publication. The test set is extracted from part 00 of ClueWeb12, while the training set is extracted from the other parts.
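As a rough illustration of the syllables-per-word measure mentioned in the description, the sketch below estimates it for a single sentence. It is not the procedure from the associated publication: the vowel-run heuristic, the tokenization, and the function names `count_syllables` and `syllables_per_word` are all assumptions made for this example.

```python
import re

# Assumption: a syllable is approximated as a maximal run of vowel letters.
# This is a crude heuristic, not the syllable counter used to build the corpus.
_VOWEL_RUN = re.compile(r"[aeiouy]+")
# Assumption: a word is a maximal run of ASCII letters.
_WORD = re.compile(r"[A-Za-z]+")


def count_syllables(word: str) -> int:
    """Estimate syllables by counting vowel runs, with a minimum of one."""
    return max(1, len(_VOWEL_RUN.findall(word.lower())))


def syllables_per_word(sentence: str) -> float:
    """Mean estimated syllables per word; 0.0 if the sentence has no words."""
    words = _WORD.findall(sentence)
    if not words:
        return 0.0
    return sum(count_syllables(w) for w in words) / len(words)
```

Under this heuristic, a sentence of short words such as "The cat sat" scores lower than one built from longer words such as "Statistical analyses", which is the kind of ordering a complexity-based sample would rely on.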
