Cite
Janek Bevendorff; Khalid Al-Khatib; Martin Potthast; Benno Stein (2019). Webis-Gmane-19 [Dataset]. http://doi.org/10.5281/zenodo.3766984

Webis-Gmane-19

Dataset updated
2019
Dataset provided by
Kassel University, hessian.AI, and ScaDS.AI
The Web Technology & Information Systems Network
Bauhaus-Universität Weimar
University of Groningen
Authors
Janek Bevendorff; Khalid Al-Khatib; Martin Potthast; Benno Stein
License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

A large-scale corpus of over 153 million fully segmented emails from 14,635 public mailing lists.

The Webis Gmane Email Corpus 2019 is a dataset of more than 153 million parsed and segmented emails, crawled from gmane.io between February and May 2019 and covering more than 20 years of public mailing lists. The dataset was published as a resource at ACL 2020.
