100+ datasets found
  1. poem_sentiment

    • huggingface.co
    • opendatalab.com
    Updated Dec 9, 2020
    Cite
    The HF Datasets community (2020). poem_sentiment [Dataset]. https://huggingface.co/datasets/poem_sentiment
    Explore at:
    Dataset updated
    Dec 9, 2020
    Dataset authored and provided by
    The HF Datasets community
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Poem Sentiment is a sentiment dataset of poem verses from Project Gutenberg. This dataset can be used for tasks such as sentiment classification or style transfer for poems.

  2. Arabic-Poetry-Dataset

    • kaggle.com
    Updated Jan 30, 2023
    Cite
    (2023). Arabic-Poetry-Dataset [Dataset]. https://www.kaggle.com/datasets/mdanok/arabic-poetry-dataset
    Explore at:
    Dataset updated
    Jan 30, 2023
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    Arabic Poetry Dataset

    This dataset contains over 70,000 poems from more than 750 poets from the 6th to the 20th century. The dataset includes the following fields for each poem:

    • poet_name: The name of the poet who wrote the poem.
    • poet_era: The historical era during which the poet lived and wrote.
    • poem_tags: Themes or keywords associated with the poem.
    • poem_title: The title of the poem.
    • poem_text: The full text of the poem.
    • poem_count: The number of poems written by the poet in the dataset.
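
    As a minimal sketch of how the fields listed above could be handled in code, the record type below mirrors the six field names. The field types (for example, poem_tags as a plain string and poem_count as an integer), the helper poems_per_era, and the sample rows are assumptions for illustration and are not taken from the dataset.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class PoemRecord:
    """One row of the dataset, using the field names listed above.

    The types are assumptions; the dataset page does not state them.
    """
    poet_name: str
    poet_era: str
    poem_tags: str
    poem_title: str
    poem_text: str
    poem_count: int


def poems_per_era(records):
    """Count how many poems fall under each poet_era value (hypothetical helper)."""
    return Counter(record.poet_era for record in records)


# Placeholder rows for illustration only; they are not taken from the dataset.
sample = [
    PoemRecord("poet A", "era 1", "tag", "title 1", "text 1", 2),
    PoemRecord("poet A", "era 1", "tag", "title 2", "text 2", 2),
    PoemRecord("poet B", "era 2", "tag", "title 3", "text 3", 1),
]
era_counts = poems_per_era(sample)
print(era_counts)
```

    Grouping by poet_era is one simple way to look at how the collection is distributed over time; the same pattern would work for poet_name.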

    The dataset provides a rich source of information for those interested in Arabic poetry, its evolution and development over time, and the diverse themes and styles of different poets. Whether you are a researcher, student, or enthusiast, this dataset is an excellent resource for exploring and discovering the beauty of Arabic poetry.

    Acknowledgments

    The data was entirely scraped from aldiwan.net.

  3. Gutenberg Poem Dataset

    • paperswithcode.com
    Updated Jan 27, 2021
    Cite
    Emily Sheng; David Uthus (2021). Gutenberg Poem Dataset [Dataset]. https://paperswithcode.com/dataset/gutenberg-poem-dataset
    Explore at:
    Dataset updated
    Jan 27, 2021
    Authors
    Emily Sheng; David Uthus
    Description

    Gutenberg Poem Dataset is used for the next verse prediction component.

  4. gutenberg-poetry-corpus

    • huggingface.co
    Updated Oct 20, 2022
    Cite
    BigLAM: BigScience Libraries, Archives and Museums (2022). gutenberg-poetry-corpus [Dataset]. https://huggingface.co/datasets/biglam/gutenberg-poetry-corpus
    Explore at:
    Dataset updated
    Oct 20, 2022
    Dataset authored and provided by
    BigLAM: BigScience Libraries, Archives and Museums
    License

    https://choosealicense.com/licenses/cc0-1.0/

    Description

    Allison Parrish's Gutenberg Poetry Corpus

    This corpus was originally published under the CC0 license by Allison Parrish. Please visit Allison's fantastic accompanying GitHub repository for usage inspiration as well as more information on how the data was mined, how to create your own version of the corpus, and examples of projects using it. This dataset contains 3,085,117 lines of poetry from hundreds of Project Gutenberg books. Each line has a corresponding gutenberg_id (1191… See the full description on the dataset page: https://huggingface.co/datasets/biglam/gutenberg-poetry-corpus.
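
    Since each line of poetry carries a gutenberg_id, one natural first step is to regroup lines by source book. The sketch below assumes each record is a dictionary with a gutenberg_id key and a text key called line; the name line, the integer IDs, and the sample records are assumptions for illustration, so check the dataset page for the actual schema.

```python
from collections import defaultdict


def group_lines_by_book(records):
    """Map each gutenberg_id to the list of poetry lines drawn from that book."""
    books = defaultdict(list)
    for record in records:
        # "line" is an assumed key name for the text of the poetry line.
        books[record["gutenberg_id"]].append(record["line"])
    return dict(books)


# Placeholder records for illustration only; not taken from the corpus.
sample = [
    {"line": "first placeholder line", "gutenberg_id": 1},
    {"line": "second placeholder line", "gutenberg_id": 1},
    {"line": "third placeholder line", "gutenberg_id": 2},
]
books = group_lines_by_book(sample)
print({book_id: len(lines) for book_id, lines in books.items()})
```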

  5. BLP Dataset

    • paperswithcode.com
    Updated Oct 31, 2021
    Cite
    Aditeya Baral; Himanshu Jain; Deeksha D; Dr. Mamatha H R (2021). BLP Dataset [Dataset]. https://paperswithcode.com/dataset/blp16
    Explore at:
    Dataset updated
    Oct 31, 2021
    Authors
    Aditeya Baral; Himanshu Jain; Deeksha D; Dr. Mamatha H R
    Description

    A blackout poetry dataset constructed from publicly available short stories and large poems. The dataset consists of two variants: 8K and 16K examples of passages along with a poem generated from the passage and the indices of the words in the passage from which words in the poem have been selected. The dataset also contains perplexity scores for each of the poems indicating the language quality of the poems.

    The dataset was constructed synthetically, and hence contains multiple poor poems and frequent grammatical errors. However, it is a great starting point for the task of applying machine learning to blackout poetry generation.
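
    To make the passage/indices relationship concrete, here is a small sketch of how a blackout poem could be rebuilt from a passage and its selected word indices. It assumes whitespace tokenization and zero-based indices, neither of which is stated in the description; the function name and the sample passage are hypothetical.

```python
def blackout_poem(passage, indices):
    """Rebuild a poem by keeping only the passage words at the given indices.

    Assumes words are separated by whitespace and indices start at 0.
    """
    words = passage.split()
    return " ".join(words[i] for i in indices)


# Placeholder passage and indices for illustration; not taken from the dataset.
passage = "the quiet river carries small boats toward the open sea"
indices = [1, 2, 8, 9]
poem = blackout_poem(passage, indices)
print(poem)
```

    If the dataset uses a different tokenizer or one-based indices, only the split and the index lookup would need to change.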

  6. Shereno--A-Dataset-of-Persian-Modernist-Poetry

    • kaggle.com
    Updated Dec 11, 2021
    Cite
    (2021). Shereno--A-Dataset-of-Persian-Modernist-Poetry [Dataset]. https://www.kaggle.com/datasets/elhamaghakhani/persian-poems
    Explore at:
    Dataset updated
    Dec 11, 2021
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Collection of Persian Modernist Poetry from Iranian contemporary poets

  7. Spanish Poetry Dataset

    • kaggle.com
    Updated Oct 10, 2020
    Cite
    Andrea Morales Garzón (2020). Spanish Poetry Dataset [Dataset]. https://www.kaggle.com/andreamorgar/spanish-poetry-dataset/tasks
    Explore at:
    Dataset updated
    Oct 10, 2020
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    Andrea Morales Garzón
    License

    http://www.gnu.org/licenses/lgpl-3.0.html

    Description

    Context

    There are not many poetry datasets, and for the Spanish language the situation is even worse! With this dataset, we want to give access to quality Spanish data for NLP tasks.

    Content

    Data was acquired in July 2020 from the poetry webpage www.poemas-del-alma.com, which provides a large amount of poetry in Spanish. Data was scraped using the Python library BeautifulSoup. For each poem on www.poemas-del-alma.com, we collected the name of the poet, the poem, and the poem title. The scraping code is available at https://github.com/andreamorgar/poesIA/blob/master/poetry-scrapper.py.

    Acknowledgements

    We wouldn't be here without www.poemas-del-alma.com, which provides the poetry collection in this dataset.

    Inspiration

    A very simple dataset, but with a lot of potential. I'm itching to discover new literary structures within Spanish literature data, carry out a wider analysis, and so on!

  8. Free-Bengali-Poetry

    • kaggle.com
    Updated Aug 7, 2021
    Cite
    (2021). Free-Bengali-Poetry [Dataset]. https://www.kaggle.com/datasets/truthr/free-bengali-poetry
    Explore at:
    Dataset updated
    Aug 7, 2021
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    A Copyright Free Collection of 2,686 Bengali Poems from Prominent Poets

  9. CCPM Dataset

    • paperswithcode.com
    Updated Dec 26, 2021
    Cite
    Wenhao Li; Fanchao Qi; Maosong Sun; Xiaoyuan Yi; Jiarui Zhang (2021). CCPM Dataset [Dataset]. https://paperswithcode.com/dataset/ccpm
    Explore at:
    Dataset updated
    Dec 26, 2021
    Authors
    Wenhao Li; Fanchao Qi; Maosong Sun; Xiaoyuan Yi; Jiarui Zhang
    Description

    Introduction

    CCPM is a large Chinese classical poetry matching dataset that can be used for poetry matching, understanding and translation.

    The main task of this dataset is as follows: given a description in modern Chinese, the model must select, from four candidate lines of Chinese classical poetry, the one that best matches the description semantically.

    Size

    It contains 27,218 instances in total, which are split into training (21,778), validation (2,720) and test (2,720) sets.

    Format

    Each instance is composed of translation (the description in modern Chinese, a string), choice (four candidate lines of Chinese classical poetry, a list) and answer (the index of the correct line, an integer between 0 and 3).
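
    The instance format and split sizes above can be sketched in code as follows. The dictionary representation, the helper names is_valid_instance and accuracy, the split key names, and the sample instances are assumptions for illustration; only the field names translation, choice, and answer and the split sizes come from the description.

```python
def is_valid_instance(instance):
    """Check that an instance has the three described fields in the described shapes."""
    return (
        isinstance(instance.get("translation"), str)
        and isinstance(instance.get("choice"), list)
        and len(instance["choice"]) == 4
        and isinstance(instance.get("answer"), int)
        and 0 <= instance["answer"] <= 3
    )


def accuracy(instances, predictions):
    """Fraction of instances whose predicted candidate index equals the answer."""
    correct = sum(
        1 for inst, pred in zip(instances, predictions) if inst["answer"] == pred
    )
    return correct / len(instances)


# Split sizes as stated in the description; the key names are assumptions.
splits = {"train": 21778, "validation": 2720, "test": 2720}
total = sum(splits.values())

# Placeholder instances for illustration only; not taken from the dataset.
sample = [
    {"translation": "description 1", "choice": ["a", "b", "c", "d"], "answer": 2},
    {"translation": "description 2", "choice": ["e", "f", "g", "h"], "answer": 0},
]
all_valid = all(is_valid_instance(inst) for inst in sample)
score = accuracy(sample, [2, 1])
print(total, all_valid, score)
```

    Because the task is a four-way choice, plain accuracy over the predicted indices is a natural way to score a model, and the split sizes add up to the stated total of 27,218.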

  10. 20C Poetry

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 22, 2023
    Cite
    Andrew Piper (2023). 20C Poetry [Dataset]. https://search.dataone.org/view/sha256%3Adbe4d8962b7aa0e76c3fca3cb8d92146b3bf7eb7b61bfdf6985b7b6f4a9df491
    Explore at:
    Dataset updated
    Nov 22, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Andrew Piper
    Description

    This is a table of word counts for a collection of 75,297 English-language poems.

  11. Gutenberg Poetry Dataset

    • kaggle.com
    Updated Jun 26, 2020
    Cite
    Kaustubh Pathak (2020). Gutenberg Poetry Dataset [Dataset]. https://www.kaggle.com/terminate9298/gutenberg-poetry-dataset/notebooks
    Explore at:
    Dataset updated
    Jun 26, 2020
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    Kaustubh Pathak
    Description

    Context

    Poetry from Project Gutenberg, containing 2,703,086 rows of sentences.

    Acknowledgements

    Note: this dataset belongs to Allison Parrish. Link to the corpus: https://github.com/aparrish/gutenberg-poetry-corpus

  12. Poem Emotion Recognition Corpus (PERC) - Dataset - B2FIND

    • b2find.dkrz.de
    Updated May 2, 2023
    Cite
    (2023). Poem Emotion Recognition Corpus (PERC) - Dataset - B2FIND [Dataset]. https://b2find.dkrz.de/dataset/1e1c0a4b-ea50-588f-a20b-32196a0df539
    Explore at:
    Dataset updated
    May 2, 2023
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Although lexicons exist for emotion analysis, identifying emotion in poems has had to rely on limited emotion lexicons that were not created for poems and do not focus on poetic features. This paper presents PERC (Poem Emotion Recognition Corpus), a text corpus comprising a set of poems and features for emotion recognition from poems. Emotion classification is based on 'Navarasa,' described in 'Natyasastra.' Navarasa consists of nine primary emotions: Love, Sad, Anger, Hate, Fear, Surprise, Courage, Joy, and Peace. Although there are many text corpora for emotion recognition, we do not know of a text corpus for poems based on nine emotions. The corpus is drawn from an exhaustive collection of poems by Indian poets for the period 1850-2016. The novelty of this work is the creation of a corpus using poems mined from the web and evaluated by human experts.

  13. public-domain-poetry

    • huggingface.co
    Updated Oct 3, 2023
    Cite
    Kamil (2023). public-domain-poetry [Dataset]. https://huggingface.co/datasets/DanFosing/public-domain-poetry
    Explore at:
    Dataset updated
    Oct 3, 2023
    Authors
    Kamil
    License

    https://choosealicense.com/licenses/cc0-1.0/

    Description

    Overview

    This dataset is a collection of approximately 38,500 poems from https://www.public-domain-poetry.com/.

    Language

    The language of this dataset is English.

    License

    All data in this dataset is public domain, which means you should be able to use it for anything you want, as long as you aren't breaking any law in the process of doing so.

  14. Data from: Hindi Poem Dataset

    • kaggle.com
    Updated Aug 14, 2021
    Cite
    Tushar Singh (2021). Hindi Poem Dataset [Dataset]. https://www.kaggle.com/tusharsingh1999/hindi-poem-dataset/code
    Explore at:
    Dataset updated
    Aug 14, 2021
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    Tushar Singh
    License

    Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    Context

    Since I could not find a good dataset online for Hindi poems, I decided to scrape public sites to find some beautiful poems. This dataset is the result of that scraping process, carried out using the scrapy module in Python.

    Content

    The dataset can be loaded as a Python list of dictionaries by reading the JSON file line by line and converting each line using the json module.

    Example:
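    import json

    data = []
    # Each line of the file holds one JSON object, so parse it line by line.
    with open("scraped_all.json", "r", encoding="utf-8") as f:
        for line in f:
            data.append(json.loads(line))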
    

    Acknowledgements

    Dataset is scraped from: https://www.amarujala.com/kavya/kavita.

  15. Turkish-Poetry-Dataset

    • kaggle.com
    Updated Nov 19, 2020
    Cite
    (2020). Turkish-Poetry-Dataset [Dataset]. https://www.kaggle.com/datasets/redrussianarmy/turkish-poetry-dataset
    Explore at:
    Dataset updated
    Nov 19, 2020
    License

    http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html

    Description

    Turkish poetry dataset including 7 different poetry books

  16. Arabic-Poetry-Dataset--6th---21st-century-

    • kaggle.com
    Cite
    Arabic-Poetry-Dataset--6th---21st-century- [Dataset]. https://www.kaggle.com/datasets/fahd09/arabic-poetry-dataset-478-2017
    Explore at:
    Description

    Explore the Arabic literature from the 6th to the 21st century.

  17. A Small Corpus of Anglo-American Poetry

    • dataverse.harvard.edu
    txt
    Updated Jan 17, 2022
    Cite
    Harvard Dataverse (2022). A Small Corpus of Anglo-American Poetry [Dataset]. http://doi.org/10.7910/DVN/F9XM79
    Explore at:
    Available download formats: txt(76142), txt(107436), txt(101831), txt(83157), txt(87051), txt(93859), txt(41476), txt(35762), txt(76317), txt(102072), txt(56130), txt(66496), txt(180950), txt(53815), txt(87140), txt(123636), txt(62057)
    Dataset updated
    Jan 17, 2022
    Dataset provided by
    Harvard Dataverse
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Time period covered
    1907 - 1923
    Description

    This is a small corpus of Anglo-American poetry. With one exception, each of the titles was taken from the Gutenberg site: 1. James Joyce, Chamber Music (London: Elkin Mathews, 1907) [eBook #2817]; 2. Ezra Pound, Personae of Ezra Pound (London: Elkin Mathews, Vigo Street, 1914) [eBook #41162], and 3. Lustra of Ezra Pound (London: Elkin Mathews, 1916) [eBook #55564]; 4. T. S. Eliot, Prufrock and Other Observations [eBook #1459]; 5. D. H. Lawrence, Amores (New York: B. W. Huebsch, 1916) [eBook #22531], 6. New Poems (London: Martin Secker, 1918) [eBook #22726], and 7. Birds, Beasts and Flowers (London: Martin Secker Ltd, 1923) [eBook #60337]; 8. Hilda Doolittle, Sea Garden (London: Constable and Company Ltd, 1916) [eBook #28665], 9. Hymen (New York: Henry Holt and Company, 1921) [eBook #28666], and 10. Heliodora and Other Poems (Boston and New York: Houghton Mifflin Company, 1924) [eBook #62456]; 11. John Gould Fletcher, Goblins and Pagodas (Boston and New York: The Riverside Press Cambridge, 1916) [eBook #38856]; 12. William Butler Yeats, Responsibilities and Other Poems (New York: The Macmillan Company, 1916) [eBook #36865] and 13. The Wild Swans at Coole (New York: The Macmillan Company, 1919) [eBook #32491]; 14. Edith Sitwell, The Wooden Pegasus (Oxford: Basil Blackwell, 1920) [eBook #62560]; 15. Osbert Sitwell, Argonaut and Juggernaut (London: Chatto & Windus, 1919) [eBook #61368], and 16. Out of the Flame (London: Grant Richards Ltd, 1923) [eBook #61369]. To this collection, I have added: 17. Sacheverell Sitwell, The People’s Palace (Oxford: Blackwell), the text of which I took from the Internet Archive and formatted for text processing.

  18. PANGAEA - Data from Physical Oceanography of the Eastern Mediterranean...

    • b2find.dkrz.de
    Updated May 4, 2023
    Cite
    (2023). PANGAEA - Data from Physical Oceanography of the Eastern Mediterranean (POEM) - Dataset - B2FIND [Dataset]. https://b2find.dkrz.de/dataset/73d284ca-13a9-5b5e-83f9-0729611c9c45
    Explore at:
    Dataset updated
    May 4, 2023
    Area covered
    Mediterranean Sea
    Description

    The ultimate goal of the Program for the Exploration of the Eastern Mediterranean (POEM) is to reach a comprehensive knowledge of the physical, chemical, and biological oceanography of the Eastern Mediterranean. Such knowledge is an essential basis for environmental management, resource exploration, and marine operations. The overall scientific objectives are to: (1) describe the physical phenomena and quantify their kinematics; (2) define basic dynamical processes; and (3) construct physical models suitable for general ocean scientific studies and applications.

  19. Poetry Analysis Data

    • kaggle.com
    Updated Jul 8, 2017
    Cite
    JPSS (2017). Poetry Analysis Data [Dataset]. https://www.kaggle.com/datasets/jatindersehdev/poetry-analysis-data
    Explore at:
    Dataset updated
    Jul 8, 2017
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    JPSS
    Description

    Context

    This dataset contains a set of poems intended for poem analysis.

    Content

    A selection of poems from poetryfoundation.com.

    Acknowledgements

    Poems from Poetryfoundation.com

    Inspiration

    It is interesting to use this data, only for the purpose of pure research on the capability of AI & ML to classify poems.

  20. final-vietnamese-poem

    • huggingface.co
    Updated Feb 8, 2024
    Cite
    Nguyễn Công Đạt (2024). final-vietnamese-poem [Dataset]. https://huggingface.co/datasets/Libosa2707/final-vietnamese-poem
    Explore at:
    Dataset updated
    Feb 8, 2024
    Authors
    Nguyễn Công Đạt
    Description

    Libosa2707/final-vietnamese-poem dataset hosted on Hugging Face and contributed by the HF Datasets community

Cite
The HF Datasets community (2020). poem_sentiment [Dataset]. https://huggingface.co/datasets/poem_sentiment

poem_sentiment

Gutenberg Poem Dataset

Explore at:
27 scholarly articles cite this dataset (View in Google Scholar)
