2 datasets found
  1. Webis-CLS-10

    • anthology.aicmu.ac.cn
    • webis.de
    Updated 2010
    Cite
    Peter Prettenhofer; Benno Stein (2010). Webis-CLS-10 [Dataset]. http://doi.org/10.5281/zenodo.3251672
    Dataset provided by
    Bauhaus-Universität Weimar
    The Web Technology & Information Systems Network
    DataRobot, Inc.
    Authors
    Peter Prettenhofer; Benno Stein
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Cross-Lingual Sentiment (CLS) dataset comprises about 800,000 Amazon product reviews in four languages: English, German, French, and Japanese.

  2. Webis Cross-Lingual Sentiment Dataset 2010 (Webis-CLS-10)

    • zenodo.org
    Updated Apr 14, 2023
    Cite
    Peter Prettenhofer; Benno Stein (2023). Webis Cross-Lingual Sentiment Dataset 2010 (Webis-CLS-10) [Dataset]. http://doi.org/10.5281/zenodo.3251672
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Peter Prettenhofer; Benno Stein
    Description

    The Cross-Lingual Sentiment (CLS) dataset comprises about 800,000 Amazon product reviews in four languages: English, German, French, and Japanese.

    For more information on the construction of the dataset, see Prettenhofer and Stein (2010) or the enclosed readme files. If you have a question after reading the paper and the readme files, please contact Peter Prettenhofer.

    We provide the dataset in two formats: 1) a processed format, which corresponds to the preprocessing (tokenization, etc.) in Prettenhofer and Stein (2010); 2) an unprocessed format, which contains the full text of the reviews (e.g., for machine translation or feature engineering).

    The dataset was first used by Prettenhofer and Stein (2010). It consists of Amazon product reviews for three product categories (books, DVDs, and music) written in four languages: English, German, French, and Japanese. The German, French, and Japanese reviews were crawled from Amazon in November 2009. The English reviews were sampled from the Multi-Domain Sentiment Dataset (Blitzer et al., 2007). For each language-category pair there are three sets of documents: training, test, and unlabeled. The training and test sets comprise 2,000 documents each, whereas the number of unlabeled documents varies from 9,000 to 170,000.
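
As a rough illustration of the layout described above, the following Python sketch enumerates the language-category pairs and counts the labeled documents. The language codes and category identifiers are hypothetical placeholders, not names taken from the dataset files; only the counts (four languages, three categories, 2,000 training and 2,000 test documents per pair) come from the description.

```python
from itertools import product

# Hypothetical identifiers chosen for illustration; the actual
# directory or file names in the dataset may differ.
LANGUAGES = ["en", "de", "fr", "jp"]
CATEGORIES = ["books", "dvd", "music"]

# Per the description: 2,000 training and 2,000 test documents
# for each language-category pair.
DOCS_PER_LABELED_SPLIT = 2000
LABELED_SPLITS = ["train", "test"]


def language_category_pairs():
    """Return every (language, category) combination."""
    return list(product(LANGUAGES, CATEGORIES))


def labeled_document_count():
    """Total number of labeled (training + test) documents across all pairs."""
    return len(language_category_pairs()) * len(LABELED_SPLITS) * DOCS_PER_LABELED_SPLIT


if __name__ == "__main__":
    print(len(language_category_pairs()))  # number of language-category pairs
    print(labeled_document_count())        # labeled documents overall
```

Under these figures, the 12 pairs yield 48,000 labeled documents, so most of the roughly 800,000 reviews belong to the unlabeled sets.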


Webis-CLS-10

57 scholarly articles cite this dataset.