100+ datasets found

Webis-Dataset-Reviews-21
webis.de
anthology.aicmu.ac.cn
+1more
4491927
Updated 2021
Cite
Nikolay Kolyada; Martin Potthast; Benno Stein (2021). Webis-Dataset-Reviews-21 [Dataset]. http://doi.org/10.5281/zenodo.4491927
Explore at:
4491927Available download formats
Unique identifier
https://doi.org/10.5281/zenodo.4491927
Dataset updated
2021
Dataset provided by
Bauhaus-Universität Weimar and Leipzig University
University of Kassel, hessian.AI, and ScaDS.AI
The Web Technology & Information Systems Network
Bauhaus-Universität Weimar
Authors
Nikolay Kolyada; Martin Potthast; Benno Stein
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Webis-Dataset-Reviews-21 corpus comprises a curated list of 13,372 NLP-related datasets and their 539,411 mentions extracted from all publications available in the ACL Anthology corpus.
Webis-Revenue-10
webis.de
live.european-language-grid.eu
+2more
3257461
Updated 2010
Cite
Henning Wachsmuth; Peter Prettenhofer; Benno Stein (2010). Webis-Revenue-10 [Dataset]. http://doi.org/10.5281/zenodo.3257461
Explore at:
3257461Available download formats
Unique identifier
https://doi.org/10.5281/zenodo.3257461
Dataset updated
2010
Dataset provided by
DataRobot, Inc.
The Web Technology & Information Systems Network
Leibniz Universität Hannover
Bauhaus-Universität Weimar
Authors
Henning Wachsmuth; Peter Prettenhofer; Benno Stein
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The corpus consists of 1,128 German news articles from the years 2003 to 2009, collected from 29 general and business news websites. In each article, statements on the revenue of companies or markets were manually annotated, i.e., sentences and entities that refer to a statement are tagged and linked to each other.
Webis-CLS-10
anthology.aicmu.ac.cn
webis.de
3251672
Updated 2010
Cite
Peter Prettenhofer; Benno Stein (2010). Webis-CLS-10 [Dataset]. http://doi.org/10.5281/zenodo.3251672
Explore at:
3251672Available download formats
Unique identifier
https://doi.org/10.5281/zenodo.3251672
Dataset updated
2010
Dataset provided by
The Web Technology & Information Systems Network
Bauhaus-Universität Weimar
DataRobot, Inc.
Authors
Peter Prettenhofer; Benno Stein
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Cross-Lingual Sentiment (CLS) dataset comprises about 800,000 Amazon product reviews in four languages: English, German, French, and Japanese.
Webis-SameSide-19
anthology.aicmu.ac.cn
webis.de
+1more
4382353
Updated 2020
Cite
- (2020). Webis-SameSide-19 [Dataset]. http://doi.org/10.5281/zenodo.4382353
Explore at:
4382353Available download formats
Unique identifier
https://doi.org/10.5281/zenodo.4382353
Dataset updated
2020
Dataset provided by
The Web Technology & Information Systems Network
Authors
-
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset contains argument pairs that are sampled from the args.me dataset and cover two topics: abortion and gay marriage. The dataset is used in the Same Side Stance Classification challenge, which consists of two experiments (cross-topic and within-topic).
Webis-CMV-20
anthology.aicmu.ac.cn
webis.de
3778297
Updated 2020
+ more versions
Cite
Nikolay Kolyada; Benno Stein (2020). Webis-CMV-20 [Dataset]. http://doi.org/10.5281/zenodo.3778297
Explore at:
3778297Available download formats
Unique identifier
https://doi.org/10.5281/zenodo.3778297
Dataset updated
2020
Dataset provided by
The Web Technology & Information Systems Network
Bauhaus-Universität Weimar
Authors
Nikolay Kolyada; Benno Stein
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Webis-CMV-20 dataset comprises all available posts and comments in the ChangeMyView subreddit from the foundation of the subreddit in 2005 until September 2017. From these, we have derived two sub-datasets for the tasks of persuasiveness prediction and opinion malleability prediction. In addition, the corpus comprises historical posts by CMV authors and derived personal characteristics.
Webis-Gmane-19
webis.de
anthology.aicmu.ac.cn
3766984
Updated 2019
Cite
Janek Bevendorff; Khalid Al-Khatib; Martin Potthast; Benno Stein (2019). Webis-Gmane-19 [Dataset]. http://doi.org/10.5281/zenodo.3766984
Explore at:
3766984Available download formats
Unique identifier
https://doi.org/10.5281/zenodo.3766984
Dataset updated
2019
Dataset provided by
University of Groningen
University of Kassel, hessian.AI, and ScaDS.AI
The Web Technology & Information Systems Network
Bauhaus-Universität Weimar and Leipzig University
Bauhaus-Universität Weimar
Authors
Janek Bevendorff; Khalid Al-Khatib; Martin Potthast; Benno Stein
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
A large-scale corpus of over 153 million fully segmented emails from 14,635 public mailing lists.
The Webis Gmane Email Corpus 2019 is a dataset of more than 153 million parsed and segmented emails crawled between February and May 2019 from gmane.io covering more than 20 years of public mailing lists. The dataset has been published as a resource at ACL 2020.
Webis-SameSentiment-21
anthology.aicmu.ac.cn
webis.de
5495793
Updated 2021
Cite
Martin Potthast (2021). Webis-SameSentiment-21 [Dataset]. http://doi.org/10.5281/zenodo.5495793
Explore at:
5495793Available download formats
Unique identifier
https://doi.org/10.5281/zenodo.5495793
Dataset updated
2021
Dataset provided by
The Web Technology & Information Systems Network
Leipzig University
Authors
Martin Potthast
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Webis-SameSentiment-21 dataset is a collection of sentiment review pairs for Same Sentiment Classification. The dataset only contains the pair ids (business and review id) to allow recreation of the dataset. The actual review text has to be downloaded from Yelp.
Webis-PC-08
anthology.aicmu.ac.cn
webis.de
3254618
Updated 2008
Cite
Benno Stein; Sven Meyer zu Eißen (2008). Webis-PC-08 [Dataset]. http://doi.org/10.5281/zenodo.3254618
Explore at:
3254618Available download formats
Unique identifier
https://doi.org/10.5281/zenodo.3254618
Dataset updated
2008
Dataset provided by
The Web Technology & Information Systems Network
Bauhaus-Universität Weimar
Authors
Benno Stein; Sven Meyer zu Eißen
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This corpus is outdated. Please use its successor PAN-PC-11.
Data from: Webis-Web-Archive-17
webis.de
anthology.aicmu.ac.cn
+2more
1002203
Updated 2017
+ more versions
Cite
Johannes Kiesel; Martin Potthast; Matthias Hagen; Benno Stein; Florian Kneist (2017). Webis-Web-Archive-17 [Dataset]. http://doi.org/10.5281/zenodo.1002203
Explore at:
1002203Available download formats
Unique identifier
https://doi.org/10.5281/zenodo.1002203
Dataset updated
2017
Dataset provided by
University of Kassel, hessian.AI, and ScaDS.AI
The Web Technology & Information Systems Network
Friedrich Schiller University Jena
GESIS - Leibniz Institute for the Social Sciences
Bauhaus-Universität Weimar
Authors
Johannes Kiesel; Martin Potthast; Matthias Hagen; Benno Stein; Florian Kneist
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Webis-Web-Archive-17 comprises a total of 10,000 web page archives from mid-2017 that were carefully sampled from the Common Crawl to involve a mixture of high-ranking and low-ranking web pages. The dataset contains the web archive files, HTML DOM, and screenshots of each web page, as well as per-page annotations of visual web archive quality.
Webis-SameSide-21
webis.de
anthology.aicmu.ac.cn
5380989
Updated 2021
Cite
Erik Körner; Gerhard Heyer; Martin Potthast (2021). Webis-SameSide-21 [Dataset]. http://doi.org/10.5281/zenodo.5380989
Explore at:
5380989Available download formats
Unique identifier
https://doi.org/10.5281/zenodo.5380989
Dataset updated
2021
Dataset provided by
Sächsische Akademie der Wissenschaften zu Leipzig
University of Kassel, hessian.AI, and ScaDS.AI
The Web Technology & Information Systems Network
Authors
Erik Körner; Gerhard Heyer; Martin Potthast
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Webis-SameSide-21 dataset is a resampled dataset based on the Same Side Stance Classification shared task dataset.
Webis-Web-Archive-Quality-22
anthology.aicmu.ac.cn
webis.de
6881334
Updated 2022
Cite
Martin Potthast; Johannes Kiesel; Benno Stein (2022). Webis-Web-Archive-Quality-22 [Dataset]. http://doi.org/10.5281/zenodo.6881334
Explore at:
6881334Available download formats
Unique identifier
https://doi.org/10.5281/zenodo.6881334
Dataset updated
2022
Dataset provided by
The Web Technology & Information Systems Network
Leipzig University
Bauhaus-Universität Weimar
Authors
Martin Potthast; Johannes Kiesel; Benno Stein
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Webis-Web-Archive-Quality-22 comprises a total of 6,500 pairs of screenshots from web pages as they were archived and as they were reproduced from that archive, along with archive quality annotations and information of DOM elements on the screenshot.
Webis-Snippet-20
webis.de
3653834
Updated 2020
+ more versions
Cite
Wei-Fan Chen; Shahbaz Syed; Benno Stein; Matthias Hagen; Martin Potthast (2020). Webis-Snippet-20 [Dataset]. http://doi.org/10.5281/zenodo.3653834
Explore at:
3653834Available download formats
Unique identifier
https://doi.org/10.5281/zenodo.3653834
Dataset updated
2020
Dataset provided by
University of Kassel, hessian.AI, and ScaDS.AI
University of Bonn
Friedrich Schiller University Jena
NEC Laboratories Europe
The Web Technology & Information Systems Network
Bauhaus-Universität Weimar
Authors
Wei-Fan Chen; Shahbaz Syed; Benno Stein; Matthias Hagen; Martin Potthast
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Webis Abstractive Snippet Corpus 2020 (Webis-Snippet-20) comprises four abstractive snippet datasets from ClueWeb09, ClueWeb12, and DMOZ descriptions. More than 10 million
Webis-NIL-21
anthology.aicmu.ac.cn
webis.de
5092851
Updated 2021
Cite
Martin Potthast; Benno Stein; Matthias Hagen (2021). Webis-NIL-21 [Dataset]. http://doi.org/10.5281/zenodo.5092851
Explore at:
5092851Available download formats
Unique identifier
https://doi.org/10.5281/zenodo.5092851
Dataset updated
2021
Dataset provided by
The Web Technology & Information Systems Network
Leipzig University
Friedrich Schiller University Jena
Bauhaus-Universität Weimar
Authors
Martin Potthast; Benno Stein; Matthias Hagen
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Webis Netspeak Instant Search Log 2021 (Webis-NIL-21) is an excerpt of the log of the Netspeak search engine. The dataset contains about 37,000 log entries, which correspond to keystroke interactions that users of Netspeak made with its search interface while entering their queries. This enables the study of instant search logs in general, and of identifying keystroke interactions belonging to the same query in particular. The latter is annotated in the log.
Webis-ConcluGen-21
webis.de
anthology.aicmu.ac.cn
4818133
Updated 2021
Cite
Shahbaz Syed; Milad Alshomary; Henning Wachsmuth; Martin Potthast (2021). Webis-ConcluGen-21 [Dataset]. http://doi.org/10.5281/zenodo.4818133
Explore at:
4818133Available download formats
Unique identifier
https://doi.org/10.5281/zenodo.4818133
Dataset updated
2021
Dataset provided by
University of Kassel, hessian.AI, and ScaDS.AI
The Web Technology & Information Systems Network
NEC Laboratories Europe
Leibniz Universität Hannover
University of Groningen
Authors
Shahbaz Syed; Milad Alshomary; Henning Wachsmuth; Martin Potthast
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The corpus contains 136,996 (argumentative text, conclusion) pairs for the task of informative conclusion generation. For each argument in the corpus, argumentative knowledge such as the discussion topic, conclusion targets, and argument aspects is provided.
Webis-Clickbait-16
webis.de
anthology.aicmu.ac.cn
+3more
3251557
Updated 2016
Cite
Martin Potthast; Benno Stein; Matthias Hagen; Sebastian Köpsel (2016). Webis-Clickbait-16 [Dataset]. http://doi.org/10.5281/zenodo.3251557
Explore at:
3251557Available download formats
Unique identifier
https://doi.org/10.5281/zenodo.3251557
Dataset updated
2016
Dataset provided by
University of Kassel, hessian.AI, and ScaDS.AI
The Web Technology & Information Systems Network
Friedrich Schiller University Jena
Bauhaus-Universität Weimar
Authors
Martin Potthast; Benno Stein; Matthias Hagen; Sebastian Köpsel
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Webis Clickbait Corpus 2016 (Webis-Clickbait-16) comprises 2,992 tweets sampled from the top 20 news publishers by retweets in 2014. The tweets have been manually annotated by three independent annotators with regard to whether they can be considered clickbait. A total of 767 tweets are considered clickbait by the majority of annotators. The majority vote of the annotators can be used as ground truth to build clickbait detection technology. This corpus is the first of its kind and gives rise to the development of technology to tackle clickbait.
Webis-Editorials-16
webis.de
anthology.aicmu.ac.cn
+3more
3254405
Updated 2016
Cite
Steve Göring; Henning Wachsmuth; Johannes Kiesel; Matthias Hagen; Benno Stein (2016). Webis-Editorials-16 [Dataset]. http://doi.org/10.5281/zenodo.3254405
Explore at:
3254405Available download formats
Unique identifier
https://doi.org/10.5281/zenodo.3254405
Dataset updated
2016
Dataset provided by
GESIS - Leibniz Institute for the Social Sciences
Leibniz Universität Hannover
The Web Technology & Information Systems Network
Friedrich Schiller University Jena
Bauhaus-Universität Weimar
University of Groningen
Authors
Steve Göring; Henning Wachsmuth; Johannes Kiesel; Matthias Hagen; Benno Stein
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Webis-Editorials-16 corpus is a novel corpus with 300 news editorials evenly selected from three diverse online news portals: Al Jazeera, Fox News, and The Guardian. The aim of the corpus is to study (1) the mining and classification of fine-grained types of argumentative discourse units and (2) the analysis of argumentation strategies pursued in editorials to achieve persuasion. To this end, each editorial contains manual type annotations of all units that capture the role that a unit plays in the argumentative discourse, such as assumption or statistics. The corpus consists of 14,313 units of six different types, each annotated by three professional annotators from the crowdsourcing platform upwork.com.
VA FOIA Website
catalog.data.gov
data.va.gov
+4more
Updated May 1, 2021
Cite
Department of Veterans Affairs (2021). VA FOIA Website [Dataset]. https://catalog.data.gov/dataset/va-foia-website
Explore at:
Dataset updated
May 1, 2021
Dataset provided by
United States Department of Veterans Affairshttp://va.gov/
Description
U.S. Department of Veterans Affairs Freedom of Information Act Service Webpage with many links to associated information.
webis-comparative-web-search-questions-20
webis.de
Updated 2020
Cite
*last name, first name (2020). webis-comparative-web-search-questions-20 [Dataset]. https://webis.de/data/webis-comparative-web-search-questions-20.html
Explore at:
Dataset updated
2020
Dataset provided by
Bauhaus-Universität Weimar
The Web Technology & Information Systems Network
Authors
*last name, first name
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Webis-comparative-web-search-questions-20 comprises 15,000 web questions collected from public datasets. The questions are manually annotated as comparative or not. The comparative ones are annotated with more fine-grained subclasses.
Webis-WVC-07
webis.de
3341473
Updated 2007
Cite
Martin Potthast; Benno Stein (2007). Webis-WVC-07 [Dataset]. http://doi.org/10.5281/zenodo.3341473
Explore at:
3341473Available download formats
Unique identifier
https://doi.org/10.5281/zenodo.3341473
Dataset updated
2007
Dataset provided by
University of Kassel, hessian.AI, and ScaDS.AI
The Web Technology & Information Systems Network
Bauhaus-Universität Weimar
Authors
Martin Potthast; Benno Stein
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This corpus is outdated. Please use its successors PAN-WVC-10 and PAN-WVC-11.
1950 Census: Official 1950 Census Website
catalog.data.gov
datasets.ai
Updated Mar 11, 2023
+ more versions
Cite
Office of Innovation (2023). 1950 Census: Official 1950 Census Website [Dataset]. https://catalog.data.gov/dataset/1950-census-official-1950-census-website
Explore at:
Dataset updated
Mar 11, 2023
Dataset provided by
Office of Innovation
Description
Website allows the public full access to the 1950 Census images, census maps and descriptions.
