Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Poem Sentiment is a sentiment dataset of poem verses from Project Gutenberg. This dataset can be used for tasks such as sentiment classification or style transfer for poems.
http://opendatacommons.org/licenses/dbcl/1.0/
Arabic Poetry Dataset
This dataset contains over 70,000 poems from more than 750 poets from the 6th to the 20th century. The dataset includes the following fields for each poem:
The dataset provides a rich source of information for those interested in Arabic poetry, its evolution and development over time, and the diverse themes and styles of different poets. Whether you are a researcher, student, or enthusiast, this dataset is an excellent resource for exploring and discovering the beauty of Arabic poetry.
Acknowledgments
The data was entirely scraped from aldiwan.net.
Gutenberg Poem Dataset is used for the next verse prediction component.
https://choosealicense.com/licenses/cc0-1.0/
Allison Parrish's Gutenberg Poetry Corpus
This corpus was originally published under the CC0 license by Allison Parrish. Please visit Allison's fantastic accompanying GitHub repository for usage inspiration as well as more information on how the data was mined, how to create your own version of the corpus, and examples of projects using it. This dataset contains 3,085,117 lines of poetry from hundreds of Project Gutenberg books. Each line has a corresponding gutenberg_id (1191… See the full description on the dataset page: https://huggingface.co/datasets/biglam/gutenberg-poetry-corpus.
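Because each line carries a gutenberg_id, lines can be regrouped by source book. The following is a minimal sketch only: the "line" key and the sample records are assumptions made for illustration, and only gutenberg_id is named in the description above.

```python
from collections import defaultdict

# Hypothetical records: the "line" key and these sample values are assumptions;
# only "gutenberg_id" is named in the corpus description.
records = [
    {"line": "First sample line of verse", "gutenberg_id": 1},
    {"line": "Second sample line of verse", "gutenberg_id": 1},
    {"line": "A line from another book", "gutenberg_id": 2},
]


def group_by_book(rows):
    """Collect lines of poetry under the Project Gutenberg book they came from."""
    books = defaultdict(list)
    for row in rows:
        books[row["gutenberg_id"]].append(row["line"])
    return dict(books)


books = group_by_book(records)
```

Grouping this way keeps the lines of one book in their original order, which may help when consecutive lines are needed, for example for next-verse prediction.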
A blackout poetry dataset constructed from publicly available short stories and large poems. The dataset consists of two variants: 8K and 16K examples of passages along with a poem generated from the passage and the indices of the words in the passage from which words in the poem have been selected. The dataset also contains perplexity scores for each of the poems indicating the language quality of the poems.
The dataset was constructed synthetically, and hence contains multiple poor poems and frequent grammatical errors. However, it is a great starting point for the task of applying machine learning to blackout poetry generation.
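The passage-plus-indices layout described above can be illustrated with a short sketch. The field names ("passage", "indices"), the sample text, and the use of 0-based word positions are all assumptions for illustration, not the dataset's documented schema.

```python
# Hypothetical example record: field names and text are assumptions.
example = {
    "passage": "the quiet river carries every fallen leaf toward the sea",
    "indices": [1, 2, 3, 9],
}


def rebuild_poem(passage, indices):
    """Select the words at the given 0-based positions from the passage."""
    words = passage.split()
    return " ".join(words[i] for i in indices)


poem = rebuild_poem(example["passage"], example["indices"])
```

Rebuilding the poem from the indices is one way to check that a record is internally consistent before using it for training.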
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Collection of Persian Modernist Poetry from Iranian contemporary poets
http://www.gnu.org/licenses/lgpl-3.0.html
There are not many poetry datasets, and for the Spanish language the situation is even worse! With this dataset, we want to give access to quality Spanish data for NLP tasks.
Data was acquired in July 2020 from the poetry website www.poemas-del-alma.com, which hosts a large collection of poems in Spanish. Data was scraped using the Python library BeautifulSoup. For each poem on www.poemas-del-alma.com, we collected the name of the poet, the poem, and the poem title. The scraping script is available at https://github.com/andreamorgar/poesIA/blob/master/poetry-scrapper.py.
We wouldn't be here without www.poemas-del-alma.com, which provides the poetry collection in this dataset.
A very simple dataset, but one with a lot of potential. I'm itching to discover new literary structures within Spanish literature data, carry out a wider analysis, and so on!
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
A Copyright Free Collection of 2,686 Bengali Poems from Prominent Poets
Introduction
CCPM is a large Chinese classical poetry matching dataset that can be used for poetry matching, understanding and translation.
The main task of this dataset is: given a description in modern Chinese, the model is supposed to select, from four candidates, the line of Chinese classical poetry that best matches the description semantically.
Size
It contains 27,218 instances in total, which are split into training (21,778), validation (2,720) and test (2,720) sets.
Format
Each instance is composed of translation (the description in modern Chinese, a string), choice (four candidate lines of Chinese classical poetry, a list) and answer (the index of the correct line, an integer between 0 and 3).
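The instance format and split sizes above can be sketched as follows. The placeholder strings stand in for real Chinese text, and the instances, predictions, and the accuracy helper are hypothetical illustrations rather than part of the dataset's tooling.

```python
# Hypothetical instances following the three described fields; placeholder
# strings stand in for real modern-Chinese descriptions and classical lines.
instances = [
    {"translation": "description A",
     "choice": ["line 0", "line 1", "line 2", "line 3"],
     "answer": 2},
    {"translation": "description B",
     "choice": ["line 0", "line 1", "line 2", "line 3"],
     "answer": 0},
]


def accuracy(items, predictions):
    """Fraction of instances whose predicted index equals the answer index."""
    correct = sum(1 for item, pred in zip(items, predictions)
                  if item["answer"] == pred)
    return correct / len(items)


# Made-up predictions: the first is right, the second is wrong.
score = accuracy(instances, [2, 1])

# Split sizes as stated in the Size section; they sum to the reported total.
splits = {"train": 21778, "validation": 2720, "test": 2720}
total = sum(splits.values())
```

Since every instance has four candidates, a random guesser would be expected to score about 0.25, which gives a baseline for reading the accuracy.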
This is a table of word counts for a collection of 75,297 English-language poems.
Poetry from Project Gutenberg, containing 2,703,086 rows of sentences.
Note: this dataset belongs to Allison Parrish. Link to the corpus: https://github.com/aparrish/gutenberg-poetry-corpus
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Although lexicons are available for emotion analysis, identifying emotion in poems has had to rely on a limited set of emotion lexicons, and those lexicons were neither created for poems nor focused on poetic features. This paper presents a text corpus, PERC (Poem Emotion Recognition Corpus), comprising a set of poems and features for emotion recognition from poems. Emotion classification is based on 'Navarasa,' described in 'Natyasastra.' Navarasa consists of nine primary emotions: Love, Sad, Anger, Hate, Fear, Surprise, Courage, Joy, and Peace. Although there are many text corpora for emotion recognition, we do not know of a text corpus for poems based on nine emotions. The corpus is drawn from an exhaustive collection of poems by Indian poets for the period 1850-2016. The novelty of this work is the creation of a corpus using poems mined from the web and evaluated by human experts.
https://choosealicense.com/licenses/cc0-1.0/
Overview
This dataset is a collection of approximately 38,500 poems from https://www.public-domain-poetry.com/.
Language
The language of this dataset is English.
License
All data in this dataset is public domain, which means you should be able to use it for anything you want, as long as you aren't breaking any law in the process of doing so.
Open Database License (ODbL) v1.0 https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
Since I could not find a good dataset online for Hindi poems, I decided to scrape public sites to find some beautiful poems. This dataset is the result of that scraping process, carried out using the scrapy module in Python.
The dataset can be loaded as a Python list of dictionaries by reading the JSON file line by line and converting each line using the json module.
Example:
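import json

data = []
# Each line of the file holds one JSON object; UTF-8 is specified for the Hindi text.
with open("scraped_all.json", "r", encoding="utf-8") as f:
    for line in f:
        data.append(json.loads(line))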
Dataset is scraped from: https://www.amarujala.com/kavya/kavita.
http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html
Turkish poetry dataset including 7 different poetry books
Explore the Arabic literature from the 6th to the 21st century.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This is a small corpus of Anglo-American poetry. With one exception, each of the titles was taken from the Gutenberg site: 1. James Joyce, Chamber Music (London: Elkin Mathews, 1907) [eBook #2817]; 2. Ezra Pound, Personae of Ezra Pound (London: Elkin Mathews, Vigo Street, 1914) [eBook #41162], and 3. Lustra of Ezra Pound (London: Elkin Mathews, 1916) [eBook #55564]; 4. T. S. Eliot, Prufrock and Other Observations [eBook #1459]; 5. D. H. Lawrence, Amores (New York: B. W. Huebsch, 1916) [eBook #22531], 6. New Poems (London: Martin Secker, 1918) [eBook #22726], and 7. Birds, Beasts and Flowers (London: Martin Secker Ltd, 1923) [eBook #60337]; 8. Hilda Doolittle, Sea Garden (London: Constable and Company Ltd, 1916) [eBook #28665], 9. Hymen (New York: Henry Holt and Company, 1921) [eBook #28666], and 10. Heliodora and Other Poems (Boston and New York: Houghton Mifflin Company, 1924) [eBook #62456]; 11. John Gould Fletcher, Goblins and Pagodas (Boston and New York: The Riverside Press Cambridge, 1916) [eBook #38856]; 12. William Butler Yeats, Responsibilities and Other Poems (New York: The Macmillan Company, 1916) [eBook #36865] and 13. The Wild Swans at Coole (New York: The Macmillan Company, 1919) [eBook #32491]; 14. Edith Sitwell, The Wooden Pegasus (Oxford: Basil Blackwell, 1920) [eBook #62560]; 15. Osbert Sitwell, Argonaut and Juggernaut (London: Chatto & Windus, 1919) [eBook #61368], and 16. Out of the Flame (London: Grant Richards Ltd, 1923) [eBook #61369]. To this collection, I have added: 17. Sacheverell Sitwell, The People’s Palace (Oxford: Blackwell), the text of which I took from the Internet Archive and formatted for text processing.
The ultimate goal of the Program for the Exploration of the Eastern Mediterranean (POEM) is to reach a comprehensive knowledge of the physical, chemical, and biological oceanography of the Eastern Mediterranean. Such knowledge is an essential basis for environmental management, resource exploration, and marine operations. The overall scientific objectives are to: (1) describe the physical phenomena and quantify their kinematics; (2) define basic dynamical processes; and (3) construct physical models suitable for general ocean scientific studies and applications.
This dataset contains a set of poems for use in poem analysis.
A selection of poems from poetryfoundation.com
Poems from Poetryfoundation.com
This data is intended only for pure research on the capability of AI and ML to classify poems.
Libosa2707/final-vietnamese-poem dataset hosted on Hugging Face and contributed by the HF Datasets community