Esperanto Dataset
Mostly English
Vortlisto
https://github.com/paulmakepeace/vortlisto
License
The original word list was created in the 90's for a now-defunct exam (see README.md for more details). It's unclear what the copyright status of that text is. Bill Walker with others' help provided translations. Those translations came from various sources. (Who owns the translation of a word?) Bill Walker has kindly given permission for his gcselist.htm… See the full description on the dataset page: https://huggingface.co/datasets/Infinitestarcode/esperanto.
https://catalog.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf
https://catalog.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf
The Arbobanko (Esperanto Treebank) is a 52,000-token dependency treebank of Esperanto with texts from the MONATO news magazine, consisting of random excerpts from the period 2000-2010. All words were annotated for lemma, part of speech, inflection, compounding and affixing, syntactic function, dependency links, NER types, semantic types of nouns and adjectives, and verb frame categories. Morphosyntactic and dependency annotation was performed with the EspGram parser and manually revised. Semantic categories were added in a second round of annotation and were also manually revised and disambiguated. The treebank is available as native Constraint Grammar SGML with token-based tag lines, as XML with feature-attribute pairs, or in CoNLL tab format.
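For readers who want to work with the CoNLL tab format mentioned above, the following is a minimal sketch of reading such a file in Python. The column layout (`id`, `form`, `lemma`, `pos`, `head`, `deprel`), the sample sentences, and the tag names are assumptions made for illustration only; the actual Arbobanko columns and tag set may differ and should be checked against the treebank's own documentation.

```python
from typing import Dict, List

# Hypothetical column layout; the real treebank may use more or different columns.
COLUMNS = ["id", "form", "lemma", "pos", "head", "deprel"]


def parse_conll(text: str) -> List[List[Dict[str, str]]]:
    """Split CoNLL-style text into sentences, each a list of token dicts."""
    sentences: List[List[Dict[str, str]]] = []
    current: List[Dict[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            # Skip comment lines.
            continue
        fields = line.split("\t")
        current.append(dict(zip(COLUMNS, fields)))
    if current:
        sentences.append(current)
    return sentences


# Invented sample data (not taken from the treebank); tags are placeholders.
SAMPLE = (
    "1\tLa\tla\tART\t2\tDN\n"
    "2\thundo\thundo\tN\t3\tSUBJ\n"
    "3\tkuras\tkuri\tV\t0\tROOT\n"
    "\n"
    "1\tSaluton\tsaluton\tINTJ\t0\tROOT\n"
)

sentences = parse_conll(SAMPLE)
for sentence in sentences:
    print([(tok["form"], tok["lemma"], tok["head"]) for tok in sentence])
```

Keeping each token as a dictionary keyed by column name makes it easy to adjust `COLUMNS` once the real layout is known, without touching the parsing loop.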
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Explore Esperanto badge through unique data from multiple sources: key facts, real-time news, interactive charts, detailed maps & open datasets
This is the LMF version of the Apertium bilingual dictionary for the Esperanto and Spanish languages. Bilingual LMF dictionaries were generated from Apertium bilingual dix files. For each Apertium bilingual correspondence, the corresponding source and target monolingual entries (LexicalEntry) were generated in addition to the bilingual correspondence (SenseAxis) element. Apertium is a free/open-source machine translation platform, initially aimed at related-language pairs but recently expanded to deal with more divergent language pairs (such as Esperanto-Spanish). The platform provides: a language-independent machine translation engine; tools to manage the linguistic data necessary to build a machine translation system for a given language pair; and linguistic data for a growing number of language pairs.
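As a rough illustration of how the LexicalEntry and SenseAxis elements described above relate to each other, here is a small Python sketch that recovers translation pairs from an LMF-like document. Only the element names `LexicalEntry` and `SenseAxis` come from the description; the sample XML, the `Lemma`, `Sense`, `Lexicon`, and `SenseAxes` elements, and all attribute names (`writtenForm`, `senses`, `id`, `language`) are hypothetical and may not match the published files.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# Invented sample; the real LMF files may be structured differently.
SAMPLE_LMF = """<LexicalResource>
  <Lexicon language="epo">
    <LexicalEntry id="epo-hundo-n">
      <Lemma writtenForm="hundo"/>
      <Sense id="epo-hundo-n-s1"/>
    </LexicalEntry>
  </Lexicon>
  <Lexicon language="spa">
    <LexicalEntry id="spa-perro-n">
      <Lemma writtenForm="perro"/>
      <Sense id="spa-perro-n-s1"/>
    </LexicalEntry>
  </Lexicon>
  <SenseAxes>
    <SenseAxis id="axis-1" senses="epo-hundo-n-s1 spa-perro-n-s1"/>
  </SenseAxes>
</LexicalResource>"""


def translation_pairs(xml_text: str) -> List[Tuple[str, str]]:
    """Return (source form, target form) pairs linked by a SenseAxis."""
    root = ET.fromstring(xml_text)

    # Map every sense id to the written form of its lexical entry.
    sense_to_form: Dict[str, str] = {}
    for entry in root.iter("LexicalEntry"):
        lemma = entry.find("Lemma")
        if lemma is None:
            continue
        form = lemma.get("writtenForm", "")
        for sense in entry.iter("Sense"):
            sense_id = sense.get("id")
            if sense_id is not None:
                sense_to_form[sense_id] = form

    # Each SenseAxis is assumed to list exactly two sense ids.
    pairs: List[Tuple[str, str]] = []
    for axis in root.iter("SenseAxis"):
        ids = axis.get("senses", "").split()
        if len(ids) == 2 and all(i in sense_to_form for i in ids):
            pairs.append((sense_to_form[ids[0]], sense_to_form[ids[1]]))
    return pairs


pairs = translation_pairs(SAMPLE_LMF)
print(pairs)
```

The two-pass design (index senses first, then resolve axes) mirrors the description: monolingual entries exist on their own, and the bilingual correspondence only points at them.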
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Explore Secondary school Esperanto through unique data from multiple sources: key facts, real-time news, interactive charts, detailed maps & open datasets
https://deepfo.com/documentacion.php?idioma=en
Magazines in Esperanto. name, image, categories, date Closed, date first issue, date founded, Frequency, city Headquarters, administrative division Headquarters, country Headquarters, continent Headquarters, Country, continent, ISSN, Website
https://deepfo.com/documentacion.php?idioma=en
Radios in Esperanto. name, image, date Commenced operations, date founded, Frequency, city Headquarters, administrative division Headquarters, country Headquarters, continent Headquarters, Country, continent, coverage, Language, Prefix, date dissolved, Website, Owner
https://dataverse.csuc.cat/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.34810/data316
This is the LMF version of the Apertium bilingual dictionary for the Esperanto and Catalan languages. Bilingual LMF dictionaries were generated from Apertium bilingual dix files. For each Apertium bilingual correspondence, the corresponding source and target monolingual entries (LexicalEntry) were generated in addition to the bilingual correspondence (SenseAxis) element. Apertium is a free/open-source machine translation platform, initially aimed at related-language pairs but recently expanded to deal with more divergent language pairs (such as Esperanto-Catalan). The platform provides: a language-independent machine translation engine; tools to manage the linguistic data necessary to build a machine translation system for a given language pair; and linguistic data for a growing number of language pairs.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Explore The accepted Esperanto dictionary = Leksikono de oficialaj vortoj through unique data from multiple sources: key facts, real-time news, interactive charts, detailed maps & open datasets
Esperanto is a language from the Constructed family, spoken in Eurasia. The UKC Lexicon of Esperanto is represented as a lexico-semantic network. It consists of words, word senses, synsets, as well as sense-level and synset-level relationships.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Credit report of International Ready Esperanto contains unique and detailed export-import market intelligence with its phone, email, LinkedIn and details of each import and export shipment like product, quantity, price, buyer, supplier names, country and date of shipment.
The work analyzed here, Lenguaje y utopía. El movimiento esperantista en España, 1890-1936, was written by a sociologist: Roberto Garvía Soto. We examined its content under three premises. First, it is the first essay on the Esperanto movement written in Spanish, so alongside the contributions it makes we must also keep in mind the lines of research it does not address. Second, to analyze the core questions of the Spanish Esperanto movement, the author had to draw on sources written in Esperanto, which required a command of that language. Third, as the author notes in the introduction, this is not a work that covers the Spanish Esperanto movement in its entirety (...).
This dataset was created by Anton Popov
It contains the following files:
International language Newton's list comment: Esperanto. Letter in box dated 12.8.1966 from L. Wye of Brisbane gives some details of tape (reproduced from talk given by Capell) and comments on Esperanto. [Compiler's Note: Capell was a dedicated member of the NSW Esperanto Association.] Archival tape notes: 00.00 AIATSIS announcement. Tape Misc. 3, side 1. Talk by Arthur Capell about Esperanto. 36.30 End of Tape Misc. 3, side 1 and end of archive tape. Side 2 of Misc 3 is blank. AIATSIS Identifier: A16674 (no side b). Language as given: Esperanto
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Explore our visualization depicting Books per Publication date where book publisher is Jubilea Brita Esperanto Kongreso, based on data extracted from The British Library and available for download as a PNG.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Credit report of Esperanto Jeans contains unique and detailed export-import market intelligence with its phone, email, LinkedIn and details of each import and export shipment like product, quantity, price, buyer, supplier names, country and date of shipment.
https://www.gnu.org/licenses/old-licenses/gpl-2.0-standalone.html
OntoLex-lemon and TSV conversion of Apertium Bidix. For more details, see https://www.aclweb.org/anthology/2020.lrec-1.401/
Authors of the original data:
(c) 2005--2007, Universitat d'Alacant (Transducens group) (c) 2007, Universitat Pompeu Fabra (c) 2009--, Hèctor Alòs i Font
https://deepfo.com/documentacion.php?idioma=es
Radios in Esperanto. name, image, date Commenced operations, date founded, Frequency, city Headquarters, administrative division Headquarters, country Headquarters, continent Headquarters, Country, continent, coverage, Language, Prefix, date dissolved, Website, Owner
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Text2TCS automatically extracts terminological concept systems from natural language text. Terms are domain-specific natural language expressions that describe domain-specific concepts. It extracts terms, concepts and concept relations and represents them in a terminological concept system, building on a prespecified relation typology: generic, partitive, activity, associative, causal, spatial, instrumental, origination, and property relations. Synonyms are detected and finally grouped in the output format (text and TBX/XML).
The system has been trained on English and German but builds on a pre-trained multilingual neural model (XLM-R) that allows Text2TCS to transfer its functionality to the following languages: Afrikaans, Albanian, Amharic, Arabic, Armenian, Assamese, Azerbaijani, Basque, Belarusian, Bengali, Bengali Romanized, Bosnian, Breton, Bulgarian, Burmese, Catalan, Chinese (Simplified), Chinese (Traditional), Croatian, Czech, Danish, Dutch, English, Esperanto, Estonian, Filipino, Finnish, French, Galician, Georgian, German, Greek, Gujarati, Hausa, Hebrew, Hindi, Hindi Romanized, Hungarian, Icelandic, Indonesian, Irish, Italian, Japanese, Javanese, Kannada, Kazakh, Khmer, Korean, Kurdish (Kurmanji), Kyrgyz, Lao, Latin, Latvian, Lithuanian, Macedonian, Malagasy, Malay, Malayalam, Marathi, Mongolian, Nepali, Norwegian, Oriya, Oromo, Pashto, Persian, Polish, Portuguese, Punjabi, Romanian, Russian, Sanskrit, Scottish Gaelic, Serbian, Sindhi, Sinhala, Slovak, Slovenian, Somali, Spanish, Sundanese, Swahili, Swedish, Tamil, Tamil Romanized, Telugu, Telugu Romanized, Thai, Turkish, Ukrainian, Urdu, Urdu Romanized, Uyghur, Uzbek, Vietnamese, Welsh, Western Frisian, Xhosa, Yiddish.
The list of input and output languages below is more restrictive since we utilize an automated language recognition tool and a sentence tokenizer. The indicated languages represent the languages officially supported by those two tools and XLM-R, even though our application might be able to also process other languages from the list above.