Webis Clickbait Spoiling Corpus 2022
Cite
Matthias Hagen; Maik Fröbe; Artur Jurk; Martin Potthast (2023). Webis Clickbait Spoiling Corpus 2022 [Dataset]. http://doi.org/10.5281/zenodo.8136637
Explore at: zenodo.org, explore.openaire.eu
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.8136637
Dataset updated
Jul 12, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Matthias Hagen; Maik Fröbe; Artur Jurk; Martin Potthast
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Webis Clickbait Spoiling Corpus 2022

The Webis Clickbait Spoiling Corpus 2022 (Webis-Clickbait-22) contains 5,000 spoiled clickbait posts crawled from Facebook, Reddit, and Twitter.
This corpus supports the task of clickbait spoiling, which deals with generating a short text that satisfies the curiosity induced by a clickbait post.

This dataset contains the clickbait posts, manually cleaned versions of the linked documents, and extracted spoilers for each clickbait post.
Additionally, the spoilers are categorized into three types: short phrase spoilers, longer passage spoilers, and multiple non-consecutive pieces of text. The test set of this dataset was used for the SemEval-2023 clickbait spoiling task. You can re-execute and adapt the software submissions made for this SemEval task; please see the instructions and overview of approaches in TIRA.
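
To get a feel for how the three spoiler types are distributed, one could tally the type labels per record. The following is a minimal sketch only: the description above does not name the JSON fields, so the field name "tags" and the label strings "phrase", "passage", and "multi" are assumptions, and the sample records are made up for illustration.

```python
import json
from collections import Counter


def count_spoiler_types(lines, tag_field="tags"):
    """Tally spoiler-type labels over an iterable of JSON Lines strings.

    `tag_field` is an assumed field name; adjust it to the real schema.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        labels = record.get(tag_field, [])
        if isinstance(labels, str):
            labels = [labels]  # accept a single label as well as a list
        counts.update(labels)
    return counts


# Hypothetical records, not taken from the corpus.
sample = [
    json.dumps({"post": "example post 1", "tags": ["phrase"]}),
    json.dumps({"post": "example post 2", "tags": ["passage"]}),
    json.dumps({"post": "example post 3", "tags": ["multi"]}),
    json.dumps({"post": "example post 4", "tags": ["phrase"]}),
]

type_counts = count_spoiler_types(sample)
print(dict(type_counts))
```

The same function could be fed an open file handle for one of the corpus files, since iterating a text file yields one line per record.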

Overview

The dataset comes with predefined train/validation/test splits:

training.jsonl: 3,200 posts for training

validation.jsonl: 800 posts for validation

test.jsonl: 1,000 posts for testing
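
The split sizes listed above add up to the 5,000 posts of the corpus, which makes a quick sanity check after download straightforward. The sketch below assumes the zip archive has been extracted so that the three .jsonl files sit together in one directory; the helper names `read_jsonl` and `check_splits` are hypothetical and not part of any corpus tooling.

```python
import json
from pathlib import Path

# Split sizes as stated in the dataset description.
EXPECTED_SIZES = {
    "training.jsonl": 3200,
    "validation.jsonl": 800,
    "test.jsonl": 1000,
}


def read_jsonl(path):
    """Read a JSON Lines file into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


def check_splits(data_dir, expected=EXPECTED_SIZES):
    """Return {file name: (actual, expected)} for every split whose size differs.

    A missing file is reported with an actual count of 0.
    """
    mismatches = {}
    for name, size in expected.items():
        path = Path(data_dir) / name
        actual = len(read_jsonl(path)) if path.exists() else 0
        if actual != size:
            mismatches[name] = (actual, size)
    return mismatches
```

Calling `check_splits("path/to/extracted/corpus")` (a placeholder path) would return an empty dict when all three files have the documented number of records.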

The test set was used for the SemEval-2023 clickbait spoiling task. This shared task was organized with TIRA.io and participants submitted Docker software during the task. Please see the instructions in TIRA to re-execute or modify the approaches.

9 scholarly articles cite this dataset.
