100+ datasets found

poem_sentiment
huggingface.co
opendatalab.com
Updated Dec 9, 2020
Cite
The HF Datasets community (2020). poem_sentiment [Dataset]. https://huggingface.co/datasets/poem_sentiment
Explore at:
Dataset updated
Dec 9, 2020
Dataset authored and provided by
The HF Datasets community
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Poem Sentiment is a sentiment dataset of poem verses from Project Gutenberg. This dataset can be used for tasks such as sentiment classification or style transfer for poems.
Arabic-Poetry-Dataset
kaggle.com
Updated Jan 30, 2023
Cite
(2023). Arabic-Poetry-Dataset [Dataset]. https://www.kaggle.com/datasets/mdanok/arabic-poetry-dataset
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Jan 30, 2023
License
http://opendatacommons.org/licenses/dbcl/1.0/
Description
Arabic Poetry Dataset

This dataset contains over 70,000 poems from more than 750 poets from the 6th to the 20th century. The dataset includes the following fields for each poem:

poet_name: The name of the poet who wrote the poem.

poet_era: The historical era during which the poet lived and wrote.

poem_tags: Themes or keywords associated with the poem.

poem_title: The title of the poem.

poem_text: The full text of the poem.

poem_count: The number of poems written by the poet in the dataset.
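
As a minimal sketch, one record with the six fields listed above might be represented in Python as follows. The class name PoemRecord, the helper poems_per_poet, the field types, and the sample values are all assumptions made for illustration; they are not the dataset's actual loader or contents.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class PoemRecord:
    """One row, mirroring the six listed fields (types are assumed)."""
    poet_name: str
    poet_era: str
    poem_tags: str
    poem_title: str
    poem_text: str
    poem_count: int


def poems_per_poet(records):
    """Count how many records each poet has in the given sample."""
    return Counter(r.poet_name for r in records)


# Placeholder rows, not real entries from the dataset.
sample = [
    PoemRecord("Poet A", "Era 1", "tag1", "Title 1", "text one", 2),
    PoemRecord("Poet A", "Era 1", "tag2", "Title 2", "text two", 2),
    PoemRecord("Poet B", "Era 2", "tag1", "Title 3", "text three", 1),
]
counts = poems_per_poet(sample)
```

Counting rows per poet this way gives a quick cross-check against the poem_count field, assuming that field is meant to match the number of rows for each poet.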

The dataset provides a rich source of information for those interested in Arabic poetry, its evolution and development over time, and the diverse themes and styles of different poets. Whether you are a researcher, student, or enthusiast, this dataset is an excellent resource for exploring and discovering the beauty of Arabic poetry.

Acknowledgments

The data was entirely scraped from aldiwan.net.
Gutenberg Poem Dataset Dataset
paperswithcode.com
Updated Jan 27, 2021
Cite
Emily Sheng; David Uthus (2021). Gutenberg Poem Dataset Dataset [Dataset]. https://paperswithcode.com/dataset/gutenberg-poem-dataset
Explore at:
Dataset updated
Jan 27, 2021
Authors
Emily Sheng; David Uthus
Description
Gutenberg Poem Dataset is used for the next verse prediction component.
gutenberg-poetry-corpus
huggingface.co
Updated Oct 20, 2022
Cite
BigLAM: BigScience Libraries, Archives and Museums (2022). gutenberg-poetry-corpus [Dataset]. https://huggingface.co/datasets/biglam/gutenberg-poetry-corpus
Explore at:
Dataset updated
Oct 20, 2022
Dataset authored and provided by
BigLAM: BigScience Libraries, Archives and Museums
License
https://choosealicense.com/licenses/cc0-1.0/
Description
Allison Parrish's Gutenberg Poetry Corpus

This corpus was originally published under the CC0 license by Allison Parrish. Please visit Allison's fantastic accompanying GitHub repository for usage inspiration as well as more information on how the data was mined, how to create your own version of the corpus, and examples of projects using it. This dataset contains 3,085,117 lines of poetry from hundreds of Project Gutenberg books. Each line has a corresponding gutenberg_id (1191… See the full description on the dataset page: https://huggingface.co/datasets/biglam/gutenberg-poetry-corpus.
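
Since each line of poetry carries a gutenberg_id, lines can be regrouped by source book. The sketch below assumes each row is a dictionary with a "line" key holding the text (an assumption; the description above only names gutenberg_id), and the helper name group_lines_by_book and the sample rows are hypothetical.

```python
from collections import defaultdict


def group_lines_by_book(rows):
    """Collect poetry lines under the Project Gutenberg book they came from."""
    books = defaultdict(list)
    for row in rows:
        books[row["gutenberg_id"]].append(row["line"])
    return dict(books)


# Placeholder rows; the "line" key and all values are assumptions.
rows = [
    {"line": "first placeholder line", "gutenberg_id": 1},
    {"line": "second placeholder line", "gutenberg_id": 1},
    {"line": "third placeholder line", "gutenberg_id": 2},
]
books = group_lines_by_book(rows)
```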
BLP Dataset
paperswithcode.com
Updated Oct 31, 2021
Cite
Aditeya Baral; Himanshu Jain; Deeksha D; Dr. Mamatha H R (2021). BLP Dataset [Dataset]. https://paperswithcode.com/dataset/blp16
Explore at:
Dataset updated
Oct 31, 2021
Authors
Aditeya Baral; Himanshu Jain; Deeksha D; Dr. Mamatha H R
Description
A blackout poetry dataset constructed from publicly available short stories and large poems. The dataset consists of two variants: 8K and 16K examples of passages along with a poem generated from the passage and the indices of the words in the passage from which words in the poem have been selected. The dataset also contains perplexity scores for each of the poems indicating the language quality of the poems.

The dataset was constructed synthetically, and hence contains multiple poor poems and frequent grammatical errors. However, it is a great starting point for the task of applying machine learning to blackout poetry generation.
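
The relationship between a passage, its word indices, and the resulting poem can be sketched as below. This assumes whitespace tokenization, zero-based indices, and indices in increasing order; none of these is confirmed by the description, and the function names and sample passage are hypothetical.

```python
def blackout_poem(passage, indices):
    """Rebuild a poem by keeping only the passage words at the given indices."""
    words = passage.split()
    return " ".join(words[i] for i in indices)


def is_valid_blackout(passage, indices):
    """Assumed rule: indices are in range and strictly increasing."""
    n = len(passage.split())
    in_range = all(0 <= i < n for i in indices)
    increasing = all(a < b for a, b in zip(indices, indices[1:]))
    return in_range and increasing


# Made-up passage and indices, not taken from the dataset.
passage = "the old river moves slowly under a pale winter moon"
indices = [2, 3, 5, 9]
poem = blackout_poem(passage, indices)  # "river moves under moon"
```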
Shereno--A-Dataset-of-Persian-Modernist-Poetry
kaggle.com
Updated Dec 11, 2021
Cite
(2021). Shereno--A-Dataset-of-Persian-Modernist-Poetry [Dataset]. https://www.kaggle.com/datasets/elhamaghakhani/persian-poems
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Dec 11, 2021
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Collection of Persian Modernist Poetry from Iranian contemporary poets
Spanish Poetry Dataset
kaggle.com
Updated Oct 10, 2020
Cite
Andrea Morales Garzón (2020). Spanish Poetry Dataset [Dataset]. https://www.kaggle.com/andreamorgar/spanish-poetry-dataset/tasks
Explore at:
Dataset updated
Oct 10, 2020
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Andrea Morales Garzón
License
http://www.gnu.org/licenses/lgpl-3.0.html
Description
Context

There are not many poetry datasets, and for Spanish the situation is even worse! With this dataset, we want to give access to quality Spanish data for NLP tasks.

Content

Data was acquired in July 2020 from the poetry webpage www.poemas-del-alma.com, which provides a large collection of poems in Spanish. Data was scraped using the Python library BeautifulSoup. For each poem on www.poemas-del-alma.com, we collected the name of the poet, the poem, and the poem title. The scraping script is available at https://github.com/andreamorgar/poesIA/blob/master/poetry-scrapper.py.

Acknowledgements

We wouldn't be here without www.poemas-del-alma.com, which provides the poetry collection in this dataset.

Inspiration

A very simple dataset, but one with a lot of potential. I'm itching to discover new literary structures within Spanish literature data, to carry out a wider analysis, and so on!
Free-Bengali-Poetry
kaggle.com
Updated Aug 7, 2021
Cite
(2021). Free-Bengali-Poetry [Dataset]. https://www.kaggle.com/datasets/truthr/free-bengali-poetry
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Aug 7, 2021
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
A Copyright Free Collection of 2,686 Bengali Poems from Prominent Poets
CCPM Dataset
paperswithcode.com
Updated Dec 26, 2021
+ more versions
Cite
Wenhao Li; Fanchao Qi; Maosong Sun; Xiaoyuan Yi; Jiarui Zhang (2021). CCPM Dataset [Dataset]. https://paperswithcode.com/dataset/ccpm
Explore at:
Dataset updated
Dec 26, 2021
Authors
Wenhao Li; Fanchao Qi; Maosong Sun; Xiaoyuan Yi; Jiarui Zhang
Description
Introduction

CCPM is a large Chinese classical poetry matching dataset that can be used for poetry matching, understanding and translation.

The main task of this dataset is: given a description in modern Chinese, the model is supposed to select, from four candidates, the one line of Chinese classical poetry that best matches the description semantically.

Size

It contains 27,218 instances in total, which are split into training (21,778), validation (2,720) and test (2,720) sets.

Format

Each instance is composed of translation (the description in modern Chinese, a string), choice (four candidate lines of Chinese classical poetry, a list) and answer (the index of the correct line, an integer between 0 and 3).
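
The instance format and the matching task can be sketched as follows. The field names translation, choice, and answer come from the description above; the function names, the placeholder strings, and the predictions are assumptions for illustration only.

```python
def is_valid_instance(inst):
    """Check the three described fields: translation, choice, answer."""
    return (
        isinstance(inst.get("translation"), str)
        and isinstance(inst.get("choice"), list)
        and len(inst["choice"]) == 4
        and isinstance(inst.get("answer"), int)
        and 0 <= inst["answer"] <= 3
    )


def accuracy(instances, predictions):
    """Fraction of instances whose predicted index equals the answer index."""
    correct = sum(1 for inst, pred in zip(instances, predictions)
                  if inst["answer"] == pred)
    return correct / len(instances)


# Placeholder instances; the strings stand in for real Chinese text.
instances = [
    {"translation": "description A", "choice": ["a1", "a2", "a3", "a4"], "answer": 2},
    {"translation": "description B", "choice": ["b1", "b2", "b3", "b4"], "answer": 0},
]
predictions = [2, 1]  # hypothetical model outputs
score = accuracy(instances, predictions)

# The split sizes quoted above should add up to the stated total.
splits = {"training": 21778, "validation": 2720, "test": 2720}
total = sum(splits.values())
```

Because the answer is an index into the choice list, a prediction is scored by comparing integers rather than strings.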
20C Poetry
search.dataone.org
dataverse.harvard.edu
Updated Nov 22, 2023
Cite
Andrew Piper (2023). 20C Poetry [Dataset]. https://search.dataone.org/view/sha256%3Adbe4d8962b7aa0e76c3fca3cb8d92146b3bf7eb7b61bfdf6985b7b6f4a9df491
Explore at:
Dataset updated
Nov 22, 2023
Dataset provided by
Harvard Dataverse
Authors
Andrew Piper
Description
This is a table of word counts for a collection of 75,297 English-language poems.
Gutenberg Poetry Dataset
kaggle.com
Updated Jun 26, 2020
Cite
Kaustubh Pathak (2020). Gutenberg Poetry Dataset [Dataset]. https://www.kaggle.com/terminate9298/gutenberg-poetry-dataset/notebooks
Explore at:
Dataset updated
Jun 26, 2020
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Kaustubh Pathak
Description
Context

Poetry from Project Gutenberg, containing 2,703,086 rows of sentences.

Acknowledgements

Note: this dataset belongs to Allison Parrish. Link to the corpus: https://github.com/aparrish/gutenberg-poetry-corpus
Poem Emotion Recognition Corpus (PERC) - Dataset - B2FIND
b2find.dkrz.de
Updated May 2, 2023
+ more versions
Cite
(2023). Poem Emotion Recognition Corpus (PERC) - Dataset - B2FIND [Dataset]. https://b2find.dkrz.de/dataset/1e1c0a4b-ea50-588f-a20b-32196a0df539
Explore at:
Dataset updated
May 2, 2023
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Although emotion lexicons exist for emotion analysis, identifying emotion in poems has had to rely on a limited set of them. Those lexicons were not created for poems and do not focus on poetic features. This paper presents a text corpus, PERC (Poem Emotion Recognition Corpus), comprising a set of poems and features for emotion recognition from poems. Emotion classification is based on 'Navarasa,' described in the 'Natyasastra.' Navarasa consists of nine primary emotions: Love, Sad, Anger, Hate, Fear, Surprise, Courage, Joy, and Peace. Although there are many text corpora for emotion recognition, we do not know of a text corpus for poems based on these nine emotions. The corpus is drawn from an exhaustive collection of poems by Indian poets from the period 1850-2016. The novelty of this work is the creation of a corpus using poems mined from the web and evaluated by human experts.
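
For classification experiments, the nine emotion names given in the description can be mapped to integer labels. The sketch below is one possible encoding; the ordering of the labels, the variable names, and the encode helper are assumptions, not the corpus's own label scheme.

```python
# The nine emotion names from the description; the order is an assumption.
EMOTIONS = ["Love", "Sad", "Anger", "Hate", "Fear",
            "Surprise", "Courage", "Joy", "Peace"]

LABEL_TO_ID = {name: i for i, name in enumerate(EMOTIONS)}
ID_TO_LABEL = {i: name for name, i in LABEL_TO_ID.items()}


def encode(labels):
    """Turn emotion names into integer class ids."""
    return [LABEL_TO_ID[label] for label in labels]


encoded = encode(["Joy", "Fear"])
```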
public-domain-poetry
huggingface.co
Updated Oct 3, 2023
Cite
Kamil (2023). public-domain-poetry [Dataset]. https://huggingface.co/datasets/DanFosing/public-domain-poetry
Explore at:
Dataset updated
Oct 3, 2023
Authors
Kamil
License
https://choosealicense.com/licenses/cc0-1.0/
Description
Overview

This dataset is a collection of approximately 38,500 poems from https://www.public-domain-poetry.com/.

Language

The language of this dataset is English.

License

All data in this dataset is public domain, which means you should be able to use it for anything you want, as long as you aren't breaking any law in the process of doing so.
Data from: Hindi Poem Dataset
kaggle.com
Updated Aug 14, 2021
Cite
Tushar Singh (2021). Hindi Poem Dataset [Dataset]. https://www.kaggle.com/tusharsingh1999/hindi-poem-dataset/code
Explore at:
Dataset updated
Aug 14, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Tushar Singh
License
Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
Description
Context

Since I could not find a good dataset online for Hindi poems, I decided to scrape public sites to find some beautiful poems. This dataset is the result of that scraping process, carried out using the scrapy module in Python.

Content

The dataset can be loaded as a Python list of dictionaries by reading the JSON file line by line and converting each line using the json module.

Example:

    import json

    # Each line of the file holds one JSON object.
    data = []
    with open("scraped_all.json", "r") as f:
        for line in f:
            data.append(json.loads(line))

Acknowledgements

Dataset is scraped from: https://www.amarujala.com/kavya/kavita.
Turkish-Poetry-Dataset
kaggle.com
Updated Nov 19, 2020
Cite
(2020). Turkish-Poetry-Dataset [Dataset]. https://www.kaggle.com/datasets/redrussianarmy/turkish-poetry-dataset
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Nov 19, 2020
License
http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html
Description
Turkish poetry dataset including 7 different poetry books
Arabic-Poetry-Dataset--6th---21st-century-
kaggle.com
Cite
Arabic-Poetry-Dataset--6th---21st-century- [Dataset]. https://www.kaggle.com/datasets/fahd09/arabic-poetry-dataset-478-2017
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Description
Explore the Arabic literature from the 6th to the 21st century.
A Small Corpus of Anglo-American Poetry
dataverse.harvard.edu
txt
Updated Jan 17, 2022
Cite
Harvard Dataverse (2022). A Small Corpus of Anglo-American Poetry [Dataset]. http://doi.org/10.7910/DVN/F9XM79
Explore at:
Available download formats: txt(76142), txt(107436), txt(101831), txt(83157), txt(87051), txt(93859), txt(41476), txt(35762), txt(76317), txt(102072), txt(56130), txt(66496), txt(180950), txt(53815), txt(87140), txt(123636), txt(62057)
Unique identifier
https://doi.org/10.7910/DVN/F9XM79
Dataset updated
Jan 17, 2022
Dataset provided by
Harvard Dataverse
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Time period covered
1907 - 1923
Description
This is a small corpus of Anglo-American poetry. With one exception, each of the titles was taken from the Gutenberg site:

1. James Joyce, Chamber Music (London: Elkin Mathews, 1907) [eBook #2817]
2. Ezra Pound, Personae of Ezra Pound (London: Elkin Mathews, Vigo Street, 1914) [eBook #41162]
3. Ezra Pound, Lustra of Ezra Pound (London: Elkin Mathews, 1916) [eBook #55564]
4. T. S. Eliot, Prufrock and Other Observations [eBook #1459]
5. D. H. Lawrence, Amores (New York: B. W. Huebsch, 1916) [eBook #22531]
6. D. H. Lawrence, New Poems (London: Martin Secker, 1918) [eBook #22726]
7. D. H. Lawrence, Birds, Beasts and Flowers (London: Martin Secker Ltd, 1923) [eBook #60337]
8. Hilda Doolittle, Sea Garden (London: Constable and Company Ltd, 1916) [eBook #28665]
9. Hilda Doolittle, Hymen (New York: Henry Holt and Company, 1921) [eBook #28666]
10. Hilda Doolittle, Heliodora and Other Poems (Boston and New York: Houghton Mifflin Company, 1924) [eBook #62456]
11. John Gould Fletcher, Goblins and Pagodas (Boston and New York: The Riverside Press Cambridge, 1916) [eBook #38856]
12. William Butler Yeats, Responsibilities and Other Poems (New York: The Macmillan Company, 1916) [eBook #36865]
13. William Butler Yeats, The Wild Swans at Coole (New York: The Macmillan Company, 1919) [eBook #32491]
14. Edith Sitwell, The Wooden Pegasus (Oxford: Basil Blackwell, 1920) [eBook #62560]
15. Osbert Sitwell, Argonaut and Juggernaut (London: Chatto & Windus, 1919) [eBook #61368]
16. Osbert Sitwell, Out of the Flame (London: Grant Richards Ltd, 1923) [eBook #61369]

To this collection, I have added: 17. Sacheverell Sitwell, The People’s Palace (Oxford: Blackwell), the text of which I took from the Internet Archive and formatted for text processing.
PANGAEA - Data from Physical Oceanography of the Eastern Mediterranean...
b2find.dkrz.de
Updated May 4, 2023
+ more versions
Cite
(2023). PANGAEA - Data from Physical Oceanography of the Eastern Mediterranean (POEM) - Dataset - B2FIND [Dataset]. https://b2find.dkrz.de/dataset/73d284ca-13a9-5b5e-83f9-0729611c9c45
Explore at:
Dataset updated
May 4, 2023
Area covered
Mediterranean Sea
Description
The ultimate goal of the Program for the Exploration of the Eastern Mediterranean (POEM) is to reach a comprehensive knowledge of the physical, chemical, and biological oceanography of the Eastern Mediterranean. Such knowledge is an essential basis for environmental management, resource exploration, and marine operations. The overall scientific objectives are to: (1) describe the physical phenomena and quantify their kinematics; (2) define basic dynamical processes; and (3) construct physical models suitable for general ocean scientific studies and applications.
Poetry Analysis Data
kaggle.com
Updated Jul 8, 2017
Cite
JPSS (2017). Poetry Analysis Data [Dataset]. https://www.kaggle.com/datasets/jatindersehdev/poetry-analysis-data
Explore at:
Dataset updated
Jul 8, 2017
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
JPSS
Description
Context

This dataset contains a set of poems for use in poem analysis.

Content

A selection of poems from poetryfoundation.com

Acknowledgements

Poems from Poetryfoundation.com

Inspiration

It would be interesting to use this data, solely for research purposes, to explore the capability of AI and ML to classify poems.
final-vietnamese-poem
huggingface.co
Updated Feb 8, 2024
Cite
Nguyễn Công Đạt (2024). final-vietnamese-poem [Dataset]. https://huggingface.co/datasets/Libosa2707/final-vietnamese-poem
Explore at:
Dataset updated
Feb 8, 2024
Authors
Nguyễn Công Đạt
Description
Libosa2707/final-vietnamese-poem dataset hosted on Hugging Face and contributed by the HF Datasets community

poem_sentiment: 27 scholarly articles cite this dataset (View in Google Scholar)