    Webis-Ambient-15

    • webis.de
    • zenodo.org
    3250669
    Updated 2015
    Cite
    Matthias Hagen; Tim Gollub; Matthias Busse (2015). Webis-Ambient-15 [Dataset]. http://doi.org/10.5281/zenodo.3250669
    Dataset provided by
    The Web Technology & Information Systems Network
    Friedrich Schiller University Jena
    Bauhaus-Universität Weimar
    Authors
    Matthias Hagen; Tim Gollub; Matthias Busse
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This corpus extends the Ambient data set created by Carpineto and Romano. For each subtopic, the web pages at the given URLs were downloaded where accessible. These documents are named after the corresponding original documents, for example, 1/1.4/1.3.html. Each subtopic was then manually filled up to ten documents with web pages retrieved via Google (for example, 1/1.1/g00.html, where 'g' stands for Google and 00 for the first Google result). Subtopics that could not be sufficiently enriched were discarded, as were subtopics that were duplicates or not interpretable.
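The naming scheme in the description can be illustrated with a small parser. This is a minimal sketch, not part of the corpus tooling: the three-level layout (topic/subtopic/document.html) is inferred only from the two example paths above, and the names `DocRef` and `parse_doc_path` are hypothetical. It treats file names of the form g followed by digits as Google-retrieved documents (with 00 as the first result) and everything else as original Ambient documents.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout, inferred only from the two example paths in the description:
#   <topic>/<subtopic>/<document>.html
PATH_RE = re.compile(r"^(?P<topic>[^/]+)/(?P<subtopic>[^/]+)/(?P<name>[^/]+)\.html$")
# Assumption: Google-retrieved documents are named 'g' followed by digits only.
GOOGLE_RE = re.compile(r"^g(?P<rank>\d+)$")


class DocRef(NamedTuple):
    """Hypothetical record describing one document path in the corpus."""
    topic: str
    subtopic: str
    name: str
    source: str                  # "original" or "google"
    google_rank: Optional[int]   # 0-based rank; None for original documents


def parse_doc_path(path: str) -> DocRef:
    """Split a corpus-relative path and classify the document's origin."""
    match = PATH_RE.match(path)
    if match is None:
        raise ValueError(f"unexpected path layout: {path!r}")
    name = match["name"]
    google = GOOGLE_RE.match(name)
    if google is not None:
        # 'g00' is the first Google result, so the rank is kept 0-based.
        return DocRef(match["topic"], match["subtopic"], name,
                      "google", int(google["rank"]))
    return DocRef(match["topic"], match["subtopic"], name, "original", None)


if __name__ == "__main__":
    for example in ("1/1.4/1.3.html", "1/1.1/g00.html"):
        print(parse_doc_path(example))
```

Paths that do not fit the assumed layout raise a `ValueError` rather than being guessed at, since the actual archive may contain files the description does not mention.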


