PAN18 Author Identification: Attribution

    • zenodo.org
    zip
    Updated Nov 25, 2023
    Cite: Mike Kestemont; Michael Tschuggnall; Efstathios Stamatatos; Walter Daelemans; Günther Specht; Benno Stein; Martin Potthast (2023). PAN18 Author Identification: Attribution [Dataset]. http://doi.org/10.5281/zenodo.3737849
    Available download formats
    zip
    Dataset updated
    Nov 25, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Mike Kestemont; Michael Tschuggnall; Efstathios Stamatatos; Walter Daelemans; Günther Specht; Benno Stein; Martin Potthast
    Description

    We provide a corpus comprising a set of cross-domain authorship attribution problems in each of the following five languages: English, French, Italian, Polish, and Spanish. Note that we specifically avoid the term 'training corpus' because the sets of candidate authors in the development and evaluation corpora do not overlap. Therefore, your approach should not be designed specifically to handle the candidate authors of the development corpus.

    Each problem consists of a set of known fanfics by each candidate author and a set of unknown fanfics, located in separate folders. The file problem-info.json, found in the main folder of each problem, gives the name of the folder of unknown documents and the list of names of the candidate author folders.
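
As a rough illustration of the layout just described, the sketch below reads problem-info.json and gathers the known and unknown texts of one problem. The JSON key names ("unknown-folder", "candidate-authors", "author-name"), the .txt file extension, and the function name load_problem are assumptions made for this example; they are not stated in the description, so check them against the downloaded files.

```python
import json
from pathlib import Path


def load_problem(problem_dir, encoding="utf-8"):
    """Collect the known and unknown texts of one attribution problem.

    Assumptions (not given in the dataset description): the JSON keys
    "unknown-folder", "candidate-authors" and "author-name", and that
    every document is a plain-text *.txt file.
    """
    problem_dir = Path(problem_dir)
    with open(problem_dir / "problem-info.json", encoding="utf-8") as fh:
        info = json.load(fh)

    # Known fanfics, grouped by candidate author folder name.
    known = {}
    for entry in info["candidate-authors"]:
        author = entry["author-name"]
        files = sorted((problem_dir / author).glob("*.txt"))
        known[author] = [f.read_text(encoding=encoding) for f in files]

    # Unknown fanfics, keyed by file name.
    unknown_dir = problem_dir / info["unknown-folder"]
    unknown = {
        f.name: f.read_text(encoding=encoding)
        for f in sorted(unknown_dir.glob("*.txt"))
    }
    return known, unknown
```

Returning two dictionaries keeps the candidate authors and the documents to be attributed clearly apart, which mirrors the separate folders on disk.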

    The true author of each unknown document can be seen in the file ground-truth.json, also found in the main folder of each problem.
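
A minimal sketch of how ground-truth.json might be used to check predictions follows. The key names ("ground_truth", "unknown-text", "true-author") and both function names are assumptions for illustration only. Plain accuracy is used here because it is simple to verify; it is not claimed to be the measure used to rank submissions.

```python
import json
from pathlib import Path


def load_ground_truth(problem_dir):
    """Map each unknown document name to its true author.

    Assumed keys (not given in the dataset description):
    "ground_truth", "unknown-text", "true-author".
    """
    path = Path(problem_dir) / "ground-truth.json"
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    return {item["unknown-text"]: item["true-author"]
            for item in data["ground_truth"]}


def accuracy(predictions, truth):
    """Fraction of unknown documents attributed to the correct author.

    Documents missing from `predictions` count as wrong.
    """
    if not truth:
        return 0.0
    correct = sum(predictions.get(doc) == author
                  for doc, author in truth.items())
    return correct / len(truth)
```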

    In addition, to handle a collection of such problems, the file collection-info.json includes all relevant information. In more detail, for each problem it lists its main folder, the language (either "en", "fr", "it", "pl", or "sp") and the encoding (always UTF-8) of its documents.
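
The collection file could be walked along these lines. This is a sketch under assumptions: that collection-info.json holds a JSON list, and that its entries use the keys "problem-name", "language" and "encoding". Neither the structure nor the key names are confirmed by the description, and iter_problems is a hypothetical helper name.

```python
import json
from pathlib import Path


def iter_problems(collection_dir):
    """Yield (problem folder, language, encoding) for every problem.

    Assumptions (not given in the dataset description): the file is a
    JSON list whose entries have the keys "problem-name", "language"
    and "encoding".
    """
    collection_dir = Path(collection_dir)
    with open(collection_dir / "collection-info.json", encoding="utf-8") as fh:
        problems = json.load(fh)
    for entry in problems:
        folder = collection_dir / entry["problem-name"]
        yield folder, entry["language"], entry["encoding"]
```

A caller might group the yielded folders by language code, for example to apply language-specific preprocessing before attribution.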

