
    Webis-YouTube8MA-18

    • webis.de
    3724806
    Updated 2018
    Cite
    Anny Marleen Hißbach; Tim Gollub; Martin Potthast (2018). Webis-YouTube8MA-18 [Dataset]. http://doi.org/10.5281/zenodo.3724806
    Explore at:
    3724806
    Dataset updated
    2018
    Dataset provided by
    University of Kassel, hessian.AI, and ScaDS.AI
    Bauhaus-Universität Weimar
    The Web Technology & Information Systems Network
    Authors
    Anny Marleen Hißbach; Tim Gollub; Martin Potthast
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We used the YouTube Data API to augment the YouTube 8M corpus by crawling a variety of metadata for its videos.

    The first point of interest was the "video resource," which comprises data about the video, such as its title, description, uploader name, tags, view count, and more. The metadata also indicates whether comments have been left for the video. If so, we downloaded them as well, including information about their authors, likes, dislikes, and responses.
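As a rough illustration of the "video resource" lookup described above, the following Python sketch builds a `videos.list` request URL for the YouTube Data API v3 and flattens a few fields of a returned resource. The helper names, the chosen `part` values, the mapping of "uploader name" to `channelTitle`, and the sample response are assumptions made for illustration; the description does not say how the original crawler was implemented.

```python
from urllib.parse import urlencode

# Public endpoint of the YouTube Data API v3 `videos.list` method.
VIDEOS_ENDPOINT = "https://www.googleapis.com/youtube/v3/videos"


def build_video_request(video_id, api_key):
    """Return the URL that requests the snippet and statistics of one video."""
    params = {"part": "snippet,statistics", "id": video_id, "key": api_key}
    return VIDEOS_ENDPOINT + "?" + urlencode(params)


def summarize_video(resource):
    """Flatten the fields named in the description from one video resource."""
    snippet = resource.get("snippet", {})
    stats = resource.get("statistics", {})
    return {
        "id": resource.get("id"),
        "title": snippet.get("title"),
        "description": snippet.get("description"),
        "uploader": snippet.get("channelTitle"),  # assumed mapping
        "tags": snippet.get("tags", []),
        # The API reports counts as strings, so convert them to integers.
        "view_count": int(stats.get("viewCount", 0)),
        # Treat a missing comment count as "no comments" (an assumption).
        "has_comments": int(stats.get("commentCount", 0)) > 0,
    }


# Hypothetical response item, shaped like one entry of the API's "items" list.
sample = {
    "id": "abc123",
    "snippet": {
        "title": "Example title",
        "description": "Example description",
        "channelTitle": "Example channel",
        "tags": ["demo"],
    },
    "statistics": {"viewCount": "42", "commentCount": "3"},
}

if __name__ == "__main__":
    print(build_video_request("abc123", "YOUR_API_KEY"))
    print(summarize_video(sample))
```

Only videos for which `has_comments` is true would need a follow-up request for their comment threads.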

    No property specifies a video's language, since this information is not mandatory when uploading a video. In addition, the API provides only information about which captions are available, not the captions themselves. Because only the uploader of a video is given access to its captions via the API, we extracted them using youtube-dl instead. For each video, we downloaded all manually created captions, as well as the auto-generated captions in the "default" language and in English. The "default" auto-generated caption is perhaps the only hint at a video's original language.
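The caption download described above could be approximated with two youtube-dl invocations per video. The sketch below only assembles the command lines; the function name and the way the "default" language is passed in are assumptions, and the description does not state which options the authors actually used or how they determined the default language.

```python
def caption_commands(video_id, auto_langs=("en",)):
    """Build youtube-dl command lines that fetch captions but no video data.

    auto_langs lists the languages of the auto-generated captions to fetch;
    a caller would add the video's "default" language here once it is known.
    """
    url = "https://www.youtube.com/watch?v=" + video_id
    base = ["youtube-dl", "--skip-download"]
    # All manually created captions, in every available language.
    manual = base + ["--write-sub", "--all-subs", url]
    # Auto-generated captions, restricted to the requested languages.
    auto = base + ["--write-auto-sub", "--sub-lang", ",".join(auto_langs), url]
    return manual, auto


if __name__ == "__main__":
    for command in caption_commands("abc123", auto_langs=("en", "de")):
        print(" ".join(command))
```

Each list can be passed to `subprocess.run` on a machine where youtube-dl is installed.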

    Finally, we downloaded all thumbnails used to advertise a video, which are not available via the API, but only via a canonical URL. Our corpus makes it possible to reconstruct how a video is presented on YouTube (metadata and thumbnail), what its actual content is ((sub)titles and descriptions), and how its viewers reacted (comments).
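The description does not spell out the canonical thumbnail URL. As a sketch only, the snippet below assumes the commonly observed `i.ytimg.com/vi/<video id>/<name>.jpg` pattern and a handful of widely seen file names; both the pattern and the list of names are assumptions, not details taken from the dataset documentation.

```python
# Commonly observed thumbnail file names (assumed, not from the dataset docs).
THUMBNAIL_NAMES = ("default", "mqdefault", "hqdefault", "sddefault", "maxresdefault")


def thumbnail_urls(video_id):
    """Map each assumed thumbnail name to its URL for the given video ID."""
    return {
        name: "https://i.ytimg.com/vi/{}/{}.jpg".format(video_id, name)
        for name in THUMBNAIL_NAMES
    }


if __name__ == "__main__":
    for name, url in thumbnail_urls("abc123").items():
        print(name, url)
```

Not every size exists for every video, so a downloader built on this would need to tolerate missing files.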
