Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
The MusicCaps dataset contains 5,521 music examples, each labeled with an English aspect list and a free-text caption written by musicians.
The text focuses solely on describing how the music sounds, not on metadata such as the artist name.
The labeled examples are 10-second music clips from the AudioSet dataset (2,858 from the eval split and 2,663 from the train split).
Please cite the corresponding paper when using this dataset: http://arxiv.org/abs/2301.11325 (DOI: 10.48550/arXiv.2301.11325)
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides a list of lyrics from 1950 to 2019, together with music metadata such as sadness, danceability, loudness, and acousticness. We also provide the lyrics themselves, which can be used for natural language processing.
The audio data was scraped using the Echo Nest® API engine integrated with the spotipy Python package. The spotipy API lets the user search for specific genres, artists, songs, release dates, etc. To obtain the lyrics, we used the Lyrics Genius® API as the base URL for requesting data based on the song title and artist name.
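As a rough illustration of the scraping flow just described, the sketch below only builds the kinds of query strings involved; the helper names are hypothetical, real use would require spotipy and Lyrics Genius credentials, and this is not the authors' actual code.

```python
# Illustrative sketch of the query-building side of the scraping flow.
# Not the authors' code; real scraping needs API credentials and HTTP calls.
import urllib.parse


def genius_search_url(title, artist, base_url="https://api.genius.com"):
    """Build a Genius API search request URL from a song title and artist."""
    query = urllib.parse.urlencode({"q": f"{title} {artist}"})
    return f"{base_url}/search?{query}"


def spotify_query(genre=None, artist=None, year=None):
    """Compose a Spotify-style search query string of the kind spotipy accepts."""
    parts = []
    if genre:
        parts.append(f'genre:"{genre}"')
    if artist:
        parts.append(f'artist:"{artist}"')
    if year:
        parts.append(f"year:{year}")
    return " ".join(parts)
```

For example, `spotify_query(genre="rock", year=1975)` yields `genre:"rock" year:1975`, which can be passed as the `q` argument of spotipy's track search.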
Michaël Defferrard, Kirell Benzi, Pierre Vandergheynst, Xavier Bresson, EPFL LTS2.
Source: http://archive.ics.uci.edu/ml/datasets/FMA%3A+A+Dataset+For+Music+Analysis
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
MuMu is a Multimodal Music dataset with multi-label genre annotations that combines information from the Amazon Reviews dataset and the Million Song Dataset (MSD). The former contains millions of album customer reviews and album metadata gathered from Amazon.com. The latter is a collection of metadata and precomputed audio features for a million songs.
To map the information from both datasets, we use MusicBrainz. This process yields a final set of 147,295 songs belonging to 31,471 albums. For this mapped set of albums, there are 447,583 customer reviews from the Amazon dataset. The dataset has been used for multi-label music genre classification experiments in the related publication. In addition to genre annotations, the dataset provides further information about each album, such as average rating, selling rank, similar products, and cover image URL. For every text review, it also provides the review's helpfulness score, average rating, and summary.
The mapping between the three datasets (Amazon, MusicBrainz, and MSD), genre annotations, metadata, data splits, text reviews, and links to images are available here. Images and audio files cannot be released due to copyright issues.
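The MusicBrainz-based mapping step can be sketched as a join on a shared identifier; the function and field names below are hypothetical illustrations, not the actual MuMu pipeline.

```python
def map_albums(amazon_by_mbid, msd_by_mbid):
    """Join Amazon album metadata with MSD entries on a shared MusicBrainz ID.

    Illustrative only: field names ("amazon", "msd") and the dict-of-dicts
    layout are assumptions, not the MuMu release format. Albums present in
    only one source are dropped, mirroring how a mapped subset is produced.
    """
    return {
        mbid: {"amazon": amazon_by_mbid[mbid], "msd": msd_by_mbid[mbid]}
        for mbid in amazon_by_mbid.keys() & msd_by_mbid.keys()
    }
```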
These data can be used together with the Tartarus deep learning library https://github.com/sergiooramas/tartarus.
Scientific References
Please cite the following paper when using the MuMu dataset or the Tartarus library.
Oramas S., Nieto O., Barbieri F., & Serra X. (2017). Multi-label Music Genre Classification from audio, text and images using Deep Features. In Proceedings of the 18th International Society for Music Information Retrieval Conference (ISMIR 2017). https://arxiv.org/abs/1707.04916
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was studied in the paper "Temporal Analysis and Visualisation of Music", available at the following link:
https://sol.sbc.org.br/index.php/eniac/article/view/12155
This dataset provides a list of lyrics from 1950 to 2019, together with music metadata such as sadness, danceability, loudness, and acousticness. We also provide the lyrics themselves, which can be used for natural language processing.
The audio data was scraped using the Echo Nest® API engine integrated with the spotipy Python package. The spotipy API lets the user search for specific genres, artists, songs, release dates, etc. To obtain the lyrics, we used the Lyrics Genius® API as the base URL for requesting data based on the song title and artist name.
MusicCaps is a dataset composed of 5.5k music-text pairs, with rich text descriptions provided by human experts. For each 10-second music clip, MusicCaps provides:
1) A free-text caption consisting of four sentences on average, describing the music and
2) A list of music aspects, describing genre, mood, tempo, singer voices, instrumentation, dissonances, rhythm, etc.
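Since each MusicCaps example pairs a caption with a list of aspects, a minimal sketch of splitting a comma-separated aspect string into individual aspects might look like this (the comma-separated field format is an assumption based on the description above):

```python
def parse_aspect_list(aspect_list: str) -> list[str]:
    """Split a MusicCaps-style aspect list string into individual aspects.

    Assumes aspects are comma-separated, e.g.
    "pop, tinny wide hi hats, soft female vocal".
    """
    return [a.strip() for a in aspect_list.split(",") if a.strip()]
```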
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Stimulus materials, design matrix, and mean ratings for a music and emotion study using optimal design in the factorial manipulation of musical features (Eerola, Friberg & Bresin, 2013).
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The GTZAN Music/Speech dataset is a collection of audio recordings of music and speech. It consists of 120 tracks, each containing 30 seconds of audio, stored as 22,050 Hz mono 16-bit .wav files. The dataset is split into a training set and a test set of 60 tracks each: the training set is used to train machine learning models to classify music versus speech, and the test set is used to evaluate the performance of these models.
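As an illustration of the kind of low-level feature such music/speech classifiers often use, the sketch below computes the zero-crossing rate; this is a classic discriminator in the literature, not something the dataset itself prescribes.

```python
def zero_crossing_rate(samples):
    """Fraction of consecutive sample pairs that change sign.

    Speech tends to alternate between voiced (low ZCR) and unvoiced
    (high ZCR) segments, which is why ZCR is a classic music/speech cue.
    """
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / max(len(samples) - 1, 1)
```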
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
The raw dataset comprises approximately 1,700 musical pieces in .mp3 format, sourced from NetEase Music. The pieces range from 270 to 300 seconds in length, and all are sampled at 48,000 Hz. Because the website providing the audio includes style labels for the downloaded music, no specific annotators were involved; validation is achieved concurrently with the downloading process. The pieces are categorized into a total of 16 genres.
For the pre-processed version, the audio is cut into 11.4-second segments, resulting in 36,375 files, which are then transformed into Mel, CQT, and Chroma representations. Each data entry has six columns: the first three contain the Mel, CQT, and Chroma spectrogram slices in .jpg format, while the last three contain the labels for the three levels. The first level comprises two categories, the second level nine categories, and the third level 16 categories. The entire dataset is shuffled and split into training, validation, and test sets in an 8:1:1 ratio. This dataset can be used for genre classification.
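A minimal sketch of the segmentation step, assuming non-overlapping segments (the description does not specify a hop size):

```python
SAMPLE_RATE = 48_000      # Hz, as stated for the raw dataset
SEGMENT_SECONDS = 11.4    # length of each pre-processed segment


def segment_audio(samples, sr=SAMPLE_RATE, seg_s=SEGMENT_SECONDS):
    """Cut a 1-D sample sequence into fixed-length segments.

    Non-overlapping slicing and dropping the incomplete tail are
    assumptions; at 48 kHz each 11.4 s segment is 547,200 samples.
    """
    seg_len = int(sr * seg_s)
    n = len(samples) // seg_len
    return [samples[i * seg_len:(i + 1) * seg_len] for i in range(n)]
```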
OpenRAIL: https://choosealicense.com/licenses/openrail/
The Eddycrack864/Music-Dataset is hosted on Hugging Face and was contributed by the HF Datasets community.
"Festejo" is a meticulously curated AI-generated music dataset that captures the essence of Festejo, encapsulating the spirited percussions, infectious rhythms, and celebratory melodies that define this genre.
With an array of expertly crafted samples, this dataset serves as a captivating playground for machine learning applications, offering a unique opportunity to explore and infuse your compositions with the dynamic essence of Afro-Peruvian heritage.
Each meticulously generated sample allows for engagement with the intricate tapestry of Festejo's rhythms and melodies, inspiring users to create compositions that honor its cultural roots while embracing the limitless possibilities of AI-generated music.
This exceptional AI Music Dataset encompasses an array of vital data categories, contributing to its excellence. It encompasses Machine Learning (ML) Data, serving as the foundation for training intricate algorithms that generate musical pieces. Music Data, offering a rich collection of melodies, harmonies, and rhythms that fuel the AI's creative process. AI & ML Training Data continuously hone the dataset's capabilities through iterative learning. Copyright Data ensures the dataset's compliance with legal standards, while Intellectual Property Data safeguards the innovative techniques embedded within, fostering a harmonious blend of technological advancement and artistic innovation.
This dataset can also be useful as Advertising Data to generate music tailored to resonate with specific target audiences, enhancing the effectiveness of advertisements by evoking emotions and capturing attention. It can be a valuable source of Social Media Data as well. Users can post, share, and interact with the music, leading to increased user engagement and virality. The music's novelty and uniqueness can spark discussions, debates, and trends across social media communities, amplifying its reach and impact.
No license specified: https://academictorrents.com/nolicensespecified
We introduce the Free Music Archive (FMA), an open and easily accessible dataset suitable for evaluating several tasks in MIR, a field concerned with browsing, searching, and organizing large music collections. The community's growing interest in feature and end-to-end learning is, however, restrained by the limited availability of large audio datasets. The FMA aims to overcome this hurdle by providing 917 GiB and 343 days of Creative Commons-licensed audio from 106,574 tracks by 16,341 artists across 14,854 albums, arranged in a hierarchical taxonomy of 161 genres. It provides full-length, high-quality audio and pre-computed features, together with track- and user-level metadata, tags, and free-form text such as biographies. We here describe the dataset and how it was created, propose a train/validation/test split and three subsets, discuss some suitable MIR tasks, and evaluate some baselines for genre recognition. Code, data, and usage examples are available online.
Affective correspondence between 3,812 songs and over 85,000 images. To facilitate the study of crossmodal emotion analysis, we constructed a large-scale database, which we call the Image-Music Affective Correspondence (IMAC) database. It consists of more than 85,000 images and 3,812 songs (approximately 270 hours of audio). Each data sample is labeled with one of three emotions: positive, neutral, or negative. The IMAC database was constructed by combining an existing image emotion database (You et al., 2016) with a new music emotion database curated by us (Verma et al., 2019).
"Country Music" is an AI music dataset meticulously crafted to revolutionize the field of music generation. This extensive collection encapsulates a myriad of acoustic guitar strumming patterns, expressive vocal styles, twangy banjo melodies, soulful fiddle tunes, and the iconic sound of the pedal steel guitar, all performed with the genuine essence of country music.
The comprehensive metadata associated with each sample provides an immersive context, encompassing details about the instruments utilized, strumming techniques, chord progressions, tempos, and dynamic levels.
"Country Music" is a transformative resource that empowers machine learning applications to delve into the heart of country music, enabling the generation of highly authentic and emotive compositions that accurately reflect the intricacies of this genre.
Physical albums accounted for less than ten percent of music consumption in the United States in 2021. While fans of jazz or rock music were most likely to listen to artists via physical formats, the vast majority of listeners across all genres enjoyed their favorite tracks through on-demand streaming services.
Stream it, just stream it
A closer look at the distribution of streamed music consumption reveals that R&B and hip-hop is the most streamed music genre in the United States. Roughly 30 percent of all streams came from this genre in 2021, which can at least partially be explained by its popularity among young and digitally savvy listeners. The overall demand for streamed audio content is growing rapidly each year, and in 2021, the number of paid music streaming subscribers in the U.S. surpassed a record 82 million. That same year, nearly one trillion music streams were amassed in the United States alone, signaling a bright future for services like Spotify or Amazon Music. So which artists were audiences listening to?
Most popular songs
In 2021, "Levitating" by Dua Lipa was the most popular song in the U.S. based on audio streams. The track from her second studio album "Future Nostalgia" was streamed a total of 804 million times while also ranking among the top three best-selling digital music singles worldwide. Looking at the artists with the most streams overall that year, Olivia Rodrigo topped the charts with a combined 1.47 billion streams for her songs "drivers license" and "good 4 u".
According to a study on music consumption worldwide in 2022, younger generations tended to find new songs via music apps and social media, while older generations also used the radio as a format to discover new audio content.
Neural Audio Fingerprint Dataset
(c) 2021 by Sungkyun Chang, https://github.com/mimbres/neural-audio-fp. This dataset includes all music sources, background noise, and impulse response (IR) samples that were used in the work "Neural Audio Fingerprint for High-specific Audio Retrieval based on Contrastive Learning" (https://arxiv.org/abs/2010.11910).
Format:
16-bit PCM Mono WAV, Sampling rate 8000 Hz
Description:
/… See the full description on the dataset page: https://huggingface.co/datasets/arch-raven/music-fingerprint-dataset.
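Files in the stated format (16-bit PCM mono WAV at 8,000 Hz) can be read with Python's standard library alone; the sketch below is an illustrative reader, not part of the dataset's tooling.

```python
import struct
import wave


def read_pcm16_mono(path):
    """Read a 16-bit PCM mono WAV file and return (sample_rate, samples).

    Samples come back as signed Python ints in [-32768, 32767].
    """
    with wave.open(path, "rb") as wf:
        assert wf.getnchannels() == 1, "expected mono audio"
        assert wf.getsampwidth() == 2, "expected 16-bit samples"
        sr = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
        samples = list(struct.unpack(f"<{len(raw) // 2}h", raw))
    return sr, samples
```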
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides a list of lyrics from 1950 to 2019, together with music metadata such as sadness, danceability, loudness, and acousticness. We also provide the lyrics themselves, which can be used for natural language processing.
The audio data was scraped using the Echo Nest® API engine integrated with the spotipy Python package. The spotipy API lets the user search for specific genres, artists, songs, release dates, etc. To obtain the lyrics, we used the Lyrics Genius® API as the base URL for requesting data based on the song title and artist name.
Music Information Retrieval (MIR) is the task of extracting higher-level information such as genre, artist, or instrumentation from music [1]. Music genre classification is an important and rapidly evolving area of MIR. To date, little research has been done on automatic genre classification of Nigerian songs. This study therefore presents a new music dataset named ORIN, which mainly covers traditional Nigerian songs of four genres (Fuji, Juju, Highlife, and Apala). The ORIN dataset consists of 208 Nigerian songs downloaded from the internet. Timbral texture features were extracted from one or two 30-second segments of each song using the Librosa [2] Python library, directly from the digital song files. Each song was sampled as 22.5 kHz, 16-bit mono audio. The signal was divided into frames of 1,024 samples with 50% overlap between successive frames, and a Hamming window was applied to each frame without pre-emphasis. Then, 29 averaged spectral energies are obtained from a bank of 29 Mel triangular filters, followed by a DCT, yielding 20 Mel-Frequency Cepstral Coefficients (MFCCs). The mean and standard deviation of the values across frames serve as the representative final feature fed to the model for each of the spectral features. These features consist of the time (FFT) and frequency (MFCC) feature sets of the dataset domains.
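The framing and windowing step described above (1,024-sample frames, 50% overlap, Hamming window, no pre-emphasis) can be sketched in plain Python; this is an illustrative sketch, not the study's actual code.

```python
import math

FRAME_LEN = 1024        # samples per frame, as in the description
HOP = FRAME_LEN // 2    # 50% overlap between successive frames


def hamming(n=FRAME_LEN):
    """Hamming window coefficients: 0.54 - 0.46*cos(2*pi*i/(n-1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def frame_signal(samples, frame_len=FRAME_LEN, hop=HOP):
    """Split a signal into overlapping frames and apply a Hamming window.

    Mirrors the framing step described above (no pre-emphasis); only
    complete frames are kept.
    """
    win = hamming(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        frames.append([s * w for s, w in zip(frame, win)])
    return frames
```

From here, each windowed frame would pass through an FFT, the 29-filter Mel bank, and a DCT to yield the 20 MFCCs per frame.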
"Folk Music" is an exceptional AI music dataset meticulously curated to preserve and celebrate the timeless beauty of folk music. This comprehensive collection focuses exclusively on folk music, capturing its essence through a rich assortment of melodies, rhythms, and cultural expressions.
By leveraging the diversity and quality of the folk music samples provided, "Folk Music" empowers machine learning applications to generate authentic and evocative folk compositions.
The detailed metadata associated with each sample in the dataset provides a wealth of contextual information, including instrument types, playing techniques, specific melodies, tempos, and dynamics. This allows for a deeper understanding and exploration of the intricate dynamics that make folk music unique.