100+ datasets found
  1. Webis-Dataset-Reviews-21

    • webis.de
    • anthology.aicmu.ac.cn
    • +1more
    4491927
    Updated 2021
    Cite
    Nikolay Kolyada; Martin Potthast; Benno Stein (2021). Webis-Dataset-Reviews-21 [Dataset]. http://doi.org/10.5281/zenodo.4491927
    Explore at:
    4491927Available download formats
    Dataset updated
    2021
    Dataset provided by
    Bauhaus-Universität Weimar and Leipzig University
    University of Kassel, hessian.AI, and ScaDS.AI
    The Web Technology & Information Systems Network
    Bauhaus-Universität Weimar
    Authors
    Nikolay Kolyada; Martin Potthast; Benno Stein
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Webis-Dataset-Reviews-21 corpus comprises a curated list of 13,372 NLP-related datasets and their 539,411 mentions, extracted from all publications available in the ACL Anthology corpus.

  2. Webis-Revenue-10

    • webis.de
    • live.european-language-grid.eu
    • +2more
    3257461
    Updated 2010
    Cite
    Henning Wachsmuth; Peter Prettenhofer; Benno Stein (2010). Webis-Revenue-10 [Dataset]. http://doi.org/10.5281/zenodo.3257461
    Explore at:
    3257461Available download formats
    Dataset updated
    2010
    Dataset provided by
    DataRobot, Inc.
    The Web Technology & Information Systems Network
    Leibniz Universität Hannover
    Bauhaus-Universität Weimar
    Authors
    Henning Wachsmuth; Peter Prettenhofer; Benno Stein
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The corpus consists of 1,128 German news articles from the years 2003 to 2009, collected from 29 general and business news websites. In each article, statements on the revenue of companies or markets were manually annotated, i.e., sentences and entities that refer to a statement are tagged and linked to each other.

  3. Webis-CLS-10

    • anthology.aicmu.ac.cn
    • webis.de
    3251672
    Updated 2010
    Cite
    Peter Prettenhofer; Benno Stein (2010). Webis-CLS-10 [Dataset]. http://doi.org/10.5281/zenodo.3251672
    Explore at:
    3251672Available download formats
    Dataset updated
    2010
    Dataset provided by
    The Web Technology & Information Systems Network
    Bauhaus-Universität Weimar
    DataRobot, Inc.
    Authors
    Peter Prettenhofer; Benno Stein
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Cross-Lingual Sentiment (CLS) dataset comprises about 800,000 Amazon product reviews in four languages: English, German, French, and Japanese.

  4. Webis-SameSide-19

    • anthology.aicmu.ac.cn
    • webis.de
    • +1more
    4382353
    Updated 2020
    Cite
    - (2020). Webis-SameSide-19 [Dataset]. http://doi.org/10.5281/zenodo.4382353
    Explore at:
    4382353Available download formats
    Dataset updated
    2020
    Dataset provided by
    The Web Technology & Information Systems Network
    Authors
    -
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset contains argument pairs sampled from the args.me dataset that cover two topics: abortion and gay marriage. It is used in the Same Side Stance Classification challenge, which consists of two experiments (cross-topic and within-topic).

  5. Webis-CMV-20

    • anthology.aicmu.ac.cn
    • webis.de
    3778297
    Updated 2020
    + more versions
    Cite
    Nikolay Kolyada; Benno Stein (2020). Webis-CMV-20 [Dataset]. http://doi.org/10.5281/zenodo.3778297
    Explore at:
    3778297Available download formats
    Dataset updated
    2020
    Dataset provided by
    The Web Technology & Information Systems Network
    Bauhaus-Universität Weimar
    Authors
    Nikolay Kolyada; Benno Stein
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Webis-CMV-20 dataset comprises all available posts and comments in the ChangeMyView subreddit from the foundation of the subreddit in 2005 until September 2017. From these, we have derived two sub-datasets for the tasks of persuasiveness prediction and opinion malleability prediction. In addition, the corpus comprises historical posts by CMV authors and derived personal characteristics.

  6. Webis-Gmane-19

    • webis.de
    • anthology.aicmu.ac.cn
    3766984
    Updated 2019
    Cite
    Janek Bevendorff; Khalid Al-Khatib; Martin Potthast; Benno Stein (2019). Webis-Gmane-19 [Dataset]. http://doi.org/10.5281/zenodo.3766984
    Explore at:
    3766984Available download formats
    Dataset updated
    2019
    Dataset provided by
    University of Groningen
    University of Kassel, hessian.AI, and ScaDS.AI
    The Web Technology & Information Systems Network
    Bauhaus-Universität Weimar and Leipzig University
    Bauhaus-Universität Weimar
    Authors
    Janek Bevendorff; Khalid Al-Khatib; Martin Potthast; Benno Stein
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A large-scale corpus of over 153 million fully segmented emails from 14,635 public mailing lists.

    The Webis Gmane Email Corpus 2019 is a dataset of more than 153 million parsed and segmented emails crawled between February and May 2019 from gmane.io covering more than 20 years of public mailing lists. The dataset has been published as a resource at ACL 2020.

  7. Webis-SameSentiment-21

    • anthology.aicmu.ac.cn
    • webis.de
    5495793
    Updated 2021
    Cite
    Martin Potthast (2021). Webis-SameSentiment-21 [Dataset]. http://doi.org/10.5281/zenodo.5495793
    Explore at:
    5495793Available download formats
    Dataset updated
    2021
    Dataset provided by
    The Web Technology & Information Systems Network
    Leipzig University
    Authors
    Martin Potthast
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Webis-SameSentiment-21 dataset is a collection of sentiment review pairs for Same Sentiment Classification. The dataset only contains the pair ids (business and review id) to allow recreation of the dataset. The actual review text has to be downloaded from Yelp.

  8. Webis-PC-08

    • anthology.aicmu.ac.cn
    • webis.de
    3254618
    Updated 2008
    Cite
    Benno Stein; Sven Meyer zu Eißen (2008). Webis-PC-08 [Dataset]. http://doi.org/10.5281/zenodo.3254618
    Explore at:
    3254618Available download formats
    Dataset updated
    2008
    Dataset provided by
    The Web Technology & Information Systems Network
    Bauhaus-Universität Weimar
    Authors
    Benno Stein; Sven Meyer zu Eißen
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This corpus is outdated. Please use its successor PAN-PC-11.

  9. Data from: Webis-Web-Archive-17

    • webis.de
    • anthology.aicmu.ac.cn
    • +2more
    1002203
    Updated 2017
    + more versions
    Cite
    Johannes Kiesel; Martin Potthast; Matthias Hagen; Benno Stein; Florian Kneist (2017). Webis-Web-Archive-17 [Dataset]. http://doi.org/10.5281/zenodo.1002203
    Explore at:
    1002203Available download formats
    Dataset updated
    2017
    Dataset provided by
    University of Kassel, hessian.AI, and ScaDS.AI
    The Web Technology & Information Systems Network
    Friedrich Schiller University Jena
    GESIS - Leibniz Institute for the Social Sciences
    Bauhaus-Universität Weimar
    Authors
    Johannes Kiesel; Martin Potthast; Matthias Hagen; Benno Stein; Florian Kneist
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Webis-Web-Archive-17 comprises a total of 10,000 web page archives from mid-2017 that were carefully sampled from the Common Crawl to involve a mixture of high-ranking and low-ranking web pages. The dataset contains the web archive files, HTML DOM, and screenshots of each web page, as well as per-page annotations of visual web archive quality.

  10. Webis-SameSide-21

    • webis.de
    • anthology.aicmu.ac.cn
    5380989
    Updated 2021
    Cite
    Erik Körner; Gerhard Heyer; Martin Potthast (2021). Webis-SameSide-21 [Dataset]. http://doi.org/10.5281/zenodo.5380989
    Explore at:
    5380989Available download formats
    Dataset updated
    2021
    Dataset provided by
    Sächsische Akademie der Wissenschaften zu Leipzig
    University of Kassel, hessian.AI, and ScaDS.AI
    The Web Technology & Information Systems Network
    Authors
    Erik Körner; Gerhard Heyer; Martin Potthast
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Webis-SameSide-21 dataset is a resampled dataset based on the Same Side Stance Classification shared task dataset.

  11. Webis-Web-Archive-Quality-22

    • anthology.aicmu.ac.cn
    • webis.de
    6881334
    Updated 2022
    Cite
    Martin Potthast; Johannes Kiesel; Benno Stein (2022). Webis-Web-Archive-Quality-22 [Dataset]. http://doi.org/10.5281/zenodo.6881334
    Explore at:
    6881334Available download formats
    Dataset updated
    2022
    Dataset provided by
    The Web Technology & Information Systems Network
    Leipzig University
    Bauhaus-Universität Weimar
    Authors
    Martin Potthast; Johannes Kiesel; Benno Stein
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Webis-Web-Archive-Quality-22 comprises a total of 6,500 pairs of screenshots of web pages as they were archived and as they were reproduced from that archive, along with archive quality annotations and information on the DOM elements in each screenshot.

  12. Webis-Snippet-20

    • webis.de
    3653834
    Updated 2020
    + more versions
    Cite
    Wei-Fan Chen; Shahbaz Syed; Benno Stein; Matthias Hagen; Martin Potthast (2020). Webis-Snippet-20 [Dataset]. http://doi.org/10.5281/zenodo.3653834
    Explore at:
    3653834Available download formats
    Dataset updated
    2020
    Dataset provided by
    University of Kassel, hessian.AI, and ScaDS.AI
    University of Bonn
    Friedrich Schiller University Jena
    NEC Laboratories Europe
    The Web Technology & Information Systems Network
    Bauhaus-Universität Weimar
    Authors
    Wei-Fan Chen; Shahbaz Syed; Benno Stein; Matthias Hagen; Martin Potthast
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Webis Abstractive Snippet Corpus 2020 (Webis-Snippet-20) comprises four abstractive snippet datasets from ClueWeb09, ClueWeb12, and DMOZ descriptions. More than 10 million

  13. Webis-NIL-21

    • anthology.aicmu.ac.cn
    • webis.de
    5092851
    Updated 2021
    Cite
    Martin Potthast; Benno Stein; Matthias Hagen (2021). Webis-NIL-21 [Dataset]. http://doi.org/10.5281/zenodo.5092851
    Explore at:
    5092851Available download formats
    Dataset updated
    2021
    Dataset provided by
    The Web Technology & Information Systems Network
    Leipzig University
    Friedrich Schiller University Jena
    Bauhaus-Universität Weimar
    Authors
    Martin Potthast; Benno Stein; Matthias Hagen
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Webis Netspeak Instant Search Log 2021 (Webis-NIL-21) is an excerpt of the log of the Netspeak search engine. The dataset contains about 37,000 log entries, which correspond to keystroke interactions that Netspeak users made with its search interface while entering their queries. This enables the study of instant search logs in general, and of identifying keystroke interactions that belong to the same query in particular. The latter is annotated in the log.

  14. Webis-ConcluGen-21

    • webis.de
    • anthology.aicmu.ac.cn
    4818133
    Updated 2021
    Cite
    Shahbaz Syed; Milad Alshomary; Henning Wachsmuth; Martin Potthast (2021). Webis-ConcluGen-21 [Dataset]. http://doi.org/10.5281/zenodo.4818133
    Explore at:
    4818133Available download formats
    Dataset updated
    2021
    Dataset provided by
    University of Kassel, hessian.AI, and ScaDS.AI
    The Web Technology & Information Systems Network
    NEC Laboratories Europe
    Leibniz Universität Hannover
    University of Groningen
    Authors
    Shahbaz Syed; Milad Alshomary; Henning Wachsmuth; Martin Potthast
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The corpus contains 136,996 (argumentative text, conclusion) pairs for the task of informative conclusion generation. For each argument in the corpus, argumentative knowledge such as the discussion topic, conclusion targets, and argument aspects is provided.

  15. Webis-Clickbait-16

    • webis.de
    • anthology.aicmu.ac.cn
    • +3more
    3251557
    Updated 2016
    Cite
    Martin Potthast; Benno Stein; Matthias Hagen; Sebastian Köpsel (2016). Webis-Clickbait-16 [Dataset]. http://doi.org/10.5281/zenodo.3251557
    Explore at:
    3251557Available download formats
    Dataset updated
    2016
    Dataset provided by
    University of Kassel, hessian.AI, and ScaDS.AI
    The Web Technology & Information Systems Network
    Friedrich Schiller University Jena
    Bauhaus-Universität Weimar
    Authors
    Martin Potthast; Benno Stein; Matthias Hagen; Sebastian Köpsel
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Webis Clickbait Corpus 2016 (Webis-Clickbait-16) comprises 2,992 tweets sampled from the top 20 news publishers by retweets in 2014. Three independent annotators manually judged whether each tweet can be considered clickbait; a total of 767 tweets are considered clickbait by the majority of annotators. The annotators' majority vote can be used as ground truth to build clickbait detection technology. This corpus is the first of its kind and gives rise to the development of technology to tackle clickbait.

  16. Webis-Editorials-16

    • webis.de
    • anthology.aicmu.ac.cn
    • +3more
    3254405
    Updated 2016
    Cite
    Steve Göring; Henning Wachsmuth; Johannes Kiesel; Matthias Hagen; Benno Stein (2016). Webis-Editorials-16 [Dataset]. http://doi.org/10.5281/zenodo.3254405
    Explore at:
    3254405Available download formats
    Dataset updated
    2016
    Dataset provided by
    GESIS - Leibniz Institute for the Social Sciences
    Leibniz Universität Hannover
    The Web Technology & Information Systems Network
    Friedrich Schiller University Jena
    Bauhaus-Universität Weimar
    University of Groningen
    Authors
    Steve Göring; Henning Wachsmuth; Johannes Kiesel; Matthias Hagen; Benno Stein
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Webis-Editorials-16 corpus is a novel corpus with 300 news editorials evenly selected from three diverse online news portals: Al Jazeera, Fox News, and The Guardian. The aim of the corpus is to study (1) the mining and classification of fine-grained types of argumentative discourse units and (2) the analysis of argumentation strategies pursued in editorials to achieve persuasion. To this end, each editorial contains manual type annotations of all units that capture the role that a unit plays in the argumentative discourse, such as assumption or statistics. The corpus consists of 14,313 units of six different types, each annotated by three professional annotators from the crowdsourcing platform upwork.com.

  17. VA FOIA Website

    • catalog.data.gov
    • data.va.gov
    • +4more
    Updated May 1, 2021
    Cite
    Department of Veterans Affairs (2021). VA FOIA Website [Dataset]. https://catalog.data.gov/dataset/va-foia-website
    Explore at:
    Dataset updated
    May 1, 2021
    Dataset provided by
    United States Department of Veterans Affairs http://va.gov/
    Description

    U.S. Department of Veterans Affairs Freedom of Information Act Service Webpage with many links to associated information.

  18. webis-comparative-web-search-questions-20

    • webis.de
    Updated 2020
    Cite
    *last name, first name (2020). webis-comparative-web-search-questions-20 [Dataset]. https://webis.de/data/webis-comparative-web-search-questions-20.html
    Explore at:
    Dataset updated
    2020
    Dataset provided by
    *Bauhaus-Universität Weimar
    The Web Technology & Information Systems Network
    Authors
    *last name, first name
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Webis-comparative-web-search-questions-20 comprises 15,000 web questions collected from public datasets. The questions are manually annotated as comparative or not, and the comparative ones are annotated with more fine-grained subclasses.

  19. Webis-WVC-07

    • webis.de
    3341473
    Updated 2007
    Cite
    Martin Potthast; Benno Stein (2007). Webis-WVC-07 [Dataset]. http://doi.org/10.5281/zenodo.3341473
    Explore at:
    3341473Available download formats
    Dataset updated
    2007
    Dataset provided by
    University of Kassel, hessian.AI, and ScaDS.AI
    The Web Technology & Information Systems Network
    Bauhaus-Universität Weimar
    Authors
    Martin Potthast; Benno Stein
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This corpus is outdated. Please use its successors PAN-WVC-10 and PAN-WVC-11.

  20. 1950 Census: Official 1950 Census Website

    • catalog.data.gov
    • datasets.ai
    Updated Mar 11, 2023
    + more versions
    Cite
    Office of Innovation (2023). 1950 Census: Official 1950 Census Website [Dataset]. https://catalog.data.gov/dataset/1950-census-official-1950-census-website
    Explore at:
    Dataset updated
    Mar 11, 2023
    Dataset provided by
    Office of Innovation
    Description

    Website allows the public full access to the 1950 Census images, census maps and descriptions.

Cite
Nikolay Kolyada; Martin Potthast; Benno Stein (2021). Webis-Dataset-Reviews-21 [Dataset]. http://doi.org/10.5281/zenodo.4491927

Webis-Dataset-Reviews-21

Explore at:
2 scholarly articles cite this dataset (View in Google Scholar)