Esperanto Dataset
Mostly English
Vortlisto
https://github.com/paulmakepeace/vortlisto
License
The original word list was created in the 1990s for a now-defunct exam (see README.md for more details). The copyright status of that text is unclear. Bill Walker, with others' help, provided translations, which came from various sources. (Who owns the translation of a word?) Bill Walker has kindly given permission for his gcselist.htm… See the full description on the dataset page: https://huggingface.co/datasets/Infinitestarcode/esperanto.
https://catalog.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf
https://catalog.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf
The Arbobanko (Esperanto Treebank) is a 52,000-token dependency treebank of Esperanto built from texts in the MONATO news magazine, consisting of random excerpts from the period 2000-2010. All words were annotated for lemma, part of speech, inflection, compounding and affixation, syntactic function, dependency links, NER types, semantic types of nouns and adjectives, and verb frame categories. Morphosyntactic and dependency annotation was performed with the EspGram parser and manually revised. Semantic categories were added in a second round of annotation and were also manually revised and disambiguated. The data is available in native Constraint Grammar SGML (token-based tag lines), as XML with feature-attribute pairs, or in CoNLL tab format.
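As a minimal sketch of working with the CoNLL tab format, the Python below reads sentences and follows the dependency links. It assumes the 10-column CoNLL-X layout (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL), which may not match the Arbobanko export exactly. The sample sentence ("La hundo bojas .", "The dog barks.") and its tags and labels are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Token:
    idx: int      # 1-based position in the sentence
    form: str     # surface word
    lemma: str
    pos: str      # coarse part-of-speech tag (column 4)
    head: int     # index of the governing token, 0 for the root
    deprel: str   # syntactic function label


def parse_conll(text: str) -> list[list[Token]]:
    """Split CoNLL-X style text into sentences; blank lines separate sentences."""
    sentences: list[list[Token]] = []
    current: list[Token] = []
    for line in text.splitlines():
        if not line.strip():
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):  # skip comment lines, if any
            continue
        cols = line.split("\t")
        current.append(Token(int(cols[0]), cols[1], cols[2], cols[3], int(cols[6]), cols[7]))
    if current:
        sentences.append(current)
    return sentences


# Hypothetical one-sentence sample; tags and labels are illustrative only.
sample = "\n".join([
    "1\tLa\tla\tART\tART\t_\t2\tdet\t_\t_",
    "2\thundo\thundo\tN\tN\t_\t3\tsubj\t_\t_",
    "3\tbojas\tboji\tV\tV\t_\t0\troot\t_\t_",
    "4\t.\t.\tPU\tPU\t_\t3\tpunct\t_\t_",
])
sent = parse_conll(sample)[0]
for tok in sent:
    # Token IDs are contiguous from 1, so head N is at list index N - 1.
    head = "ROOT" if tok.head == 0 else sent[tok.head - 1].form
    print(f"{tok.form} -> {head} ({tok.deprel})")
```

The loop prints one dependency arc per token, e.g. "hundo -> bojas (subj)".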
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Explore Esperanto badge through unique data from multiple sources: key facts, real-time news, interactive charts, detailed maps and open datasets.
https://deepfo.com/documentacion.php?idioma=en
Radio stations in Esperanto. Fields: name, image, date operations commenced, date founded, frequency, headquarters city, headquarters administrative division, headquarters country, headquarters continent, country, continent, coverage, language, prefix, date dissolved, website, owner.
This is the LMF version of the Apertium bilingual dictionary for Esperanto and English languages. Bilingual LMF dictionaries were generated from Apertium bilingual dix files. For each Apertium bilingual correspondence, the corresponding source and target monolingual entries (LexicalEntry) were generated in addition to the bilingual correspondence (SenseAxis) element. Apertium is a free/open-source machine translation platform, initially aimed at related-language pairs but recently expanded to deal with more divergent language pairs (such as Esperanto-English). The platform provides: a language-independent machine translation engine; tools to manage the linguistic data necessary to build a machine translation system for a given language pair and linguistic data for a growing number of language pairs.
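The LexicalEntry/SenseAxis structure described above can be read with Python's standard xml.etree.ElementTree. This is a minimal sketch that assumes a simplified LMF layout: each LexicalEntry carries a Lemma with a feat whose att is "writtenForm", each Sense has an id, and each SenseAxis names the two linked sense ids in a space-separated senses attribute. These names follow the general LMF model, but the exact shape of the Apertium-derived files is an assumption, and the sample XML is invented.

```python
import xml.etree.ElementTree as ET


def written_form(entry):
    """Return the lemma's writtenForm value, or None if absent."""
    feat = entry.find("Lemma/feat[@att='writtenForm']")
    return feat.get("val") if feat is not None else None


def load_pairs(xml_text):
    """Return (source form, target form) pairs linked by SenseAxis elements."""
    root = ET.fromstring(xml_text)
    # Map every Sense id to the written form of its LexicalEntry.
    sense_to_form = {}
    for entry in root.iter("LexicalEntry"):
        form = written_form(entry)
        for sense in entry.findall("Sense"):
            sense_to_form[sense.get("id")] = form
    # Each SenseAxis is assumed to link exactly one source and one target sense.
    pairs = []
    for axis in root.iter("SenseAxis"):
        ids = axis.get("senses", "").split()
        if len(ids) == 2:
            pairs.append((sense_to_form.get(ids[0]), sense_to_form.get(ids[1])))
    return pairs


SAMPLE = """<LexicalResource>
  <Lexicon>
    <LexicalEntry id="eo_hundo">
      <Lemma><feat att="writtenForm" val="hundo"/></Lemma>
      <Sense id="eo_hundo_1"/>
    </LexicalEntry>
  </Lexicon>
  <Lexicon>
    <LexicalEntry id="en_dog">
      <Lemma><feat att="writtenForm" val="dog"/></Lemma>
      <Sense id="en_dog_1"/>
    </LexicalEntry>
  </Lexicon>
  <SenseAxis id="ax1" senses="eo_hundo_1 en_dog_1"/>
</LexicalResource>"""

print(load_pairs(SAMPLE))  # [('hundo', 'dog')]
```

The same reader would apply to the Esperanto-Catalan dictionary if it shares this layout.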
https://dataverse.csuc.cat/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.34810/data316
This is the LMF version of the Apertium bilingual dictionary for Esperanto and Catalan. Bilingual LMF dictionaries were generated from Apertium bilingual dix files. For each Apertium bilingual correspondence, the corresponding source and target monolingual entries (LexicalEntry) were generated in addition to the bilingual correspondence (SenseAxis) element. Apertium is a free/open-source machine translation platform, initially aimed at related-language pairs but recently expanded to deal with more divergent language pairs (such as Esperanto-Catalan). The platform provides: a language-independent machine translation engine; tools to manage the linguistic data necessary to build a machine translation system for a given language pair and linguistic data for a growing number of language pairs.
Esperanto is a language from the Constructed family, spoken in Eurasia. The UKC Lexicon of Esperanto is represented as a lexico-semantic network. It consists of words, word senses, synsets, as well as sense-level and synset-level relationships.
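To make the network structure concrete, here is a hypothetical in-memory model: each sense ties a word to a synset, and relations are stored as (source, relation, target) triples at either the sense or the synset level. The class names, the "hypernym" relation label and the example words are illustrative assumptions, not the UKC's actual schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Synset:
    id: str
    gloss: str = ""


@dataclass
class Sense:
    id: str
    word: str
    synset: str  # id of the synset this sense belongs to


@dataclass
class Network:
    synsets: dict[str, Synset] = field(default_factory=dict)
    senses: dict[str, Sense] = field(default_factory=dict)
    # (source id, relation name, target id); relation names are hypothetical
    sense_relations: list[tuple[str, str, str]] = field(default_factory=list)
    synset_relations: list[tuple[str, str, str]] = field(default_factory=list)

    def add_sense(self, sense: Sense) -> None:
        self.senses[sense.id] = sense

    def words_in(self, synset_id: str) -> list[str]:
        """All words with a sense in the given synset, i.e. its synonyms."""
        return sorted(s.word for s in self.senses.values() if s.synset == synset_id)

    def related_synsets(self, synset_id: str, relation: str) -> list[str]:
        return [t for s, r, t in self.synset_relations if s == synset_id and r == relation]


net = Network()
net.synsets["s1"] = Synset("s1", "car")
net.synsets["s2"] = Synset("s2", "vehicle")
net.add_sense(Sense("e1", "aŭto", "s1"))
net.add_sense(Sense("e2", "aŭtomobilo", "s1"))
net.add_sense(Sense("e3", "veturilo", "s2"))
net.synset_relations.append(("s1", "hypernym", "s2"))

print(net.words_in("s1"))                     # ['aŭto', 'aŭtomobilo']
print(net.related_synsets("s1", "hypernym"))  # ['s2']
```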
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Credit report of Esperanto Jeans contains unique and detailed export-import market intelligence with its phone, email, LinkedIn and details of each import and export shipment, such as product, quantity, price, buyer and supplier names, country and date of shipment.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Credit report of International Ready Esperanto contains unique and detailed export-import market intelligence with its phone, email, LinkedIn and details of each import and export shipment, such as product, quantity, price, buyer and supplier names, country and date of shipment.
https://deepfo.com/documentacion.php?idioma=en
Magazines in Esperanto. Fields: name, image, categories, date closed, date of first issue, date founded, frequency, headquarters city, headquarters administrative division, headquarters country, headquarters continent, country, continent, ISSN, website.
https://www.gnu.org/licenses/old-licenses/gpl-2.0-standalone.html
OntoLex-lemon and TSV conversion of Apertium Bidix. For more details, see https://www.aclweb.org/anthology/2020.lrec-1.401/
Authors of the original data:
(c) 2008-2009, Jacob Nordfalk
(c) 2009, Hèctor Alòs i Font
(c) 2005-2007, Universitat d'Alacant (Transducens group): English data
(c) 2005-2007, Universitat Pompeu Fabra: English data
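A minimal sketch of loading the TSV conversion into a lookup table, assuming one translation pair per row with the source-language lemma in one column and the target-language lemma in another. The column positions (src_col, tgt_col) are assumptions to check against the actual file; a header row, if present, needs to be skipped. The sample rows are invented.

```python
import csv
import io
from collections import defaultdict


def load_bidix_tsv(handle, src_col=0, tgt_col=1):
    """Build a source -> [targets] lookup from a tab-separated file.

    src_col and tgt_col are assumed column positions; adjust them to the
    real file layout.
    """
    lookup = defaultdict(list)
    # QUOTE_NONE: treat quote characters as ordinary text in lexical data.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip blank, too-short and comment rows.
        if len(row) <= max(src_col, tgt_col) or row[0].startswith("#"):
            continue
        src, tgt = row[src_col].strip(), row[tgt_col].strip()
        if tgt not in lookup[src]:  # keep first-seen order, drop duplicates
            lookup[src].append(tgt)
    return dict(lookup)


# Invented sample rows: Esperanto lemma, then English lemma.
sample = io.StringIO("hundo\tdog\nkato\tcat\nhundo\thound\nhundo\tdog\n")
bidix = load_bidix_tsv(sample)
print(bidix["hundo"])  # ['dog', 'hound']
```

Keeping a list per source lemma preserves one-to-many correspondences, which bilingual dictionaries commonly contain.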
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Explore Secondary school Esperanto through unique data from multiple sources: key facts, real-time news, interactive charts, detailed maps and open datasets.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Explore The accepted Esperanto dictionary (Leksikono de oficialaj vortoj) through unique data from multiple sources: key facts, real-time news, interactive charts, detailed maps and open datasets.
This dataset was created by Anton Popov
It contains the following files:
The associated repository contains the code and the corpora that were used in order to build a "learnable" system that generates open-domain textual summaries in Arabic and Esperanto given a set of Wikidata triples as input. The two corpora that have been used for the experiments are included in the repository: (i) Wikidata triples aligned with Wikipedia summaries in Arabic and (ii) Wikidata triples aligned with Wikipedia summaries in Esperanto.
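One common way to feed a set of Wikidata triples to a text generator is to flatten them into a single token sequence. The sketch below shows one such linearization; the <s>/<p>/<o> markers, the sorting, and the example triples (written with labels rather than Wikidata IDs) are assumptions for illustration and are not taken from the repository.

```python
def linearize(triples):
    """Flatten (subject, property, object) triples into one string.

    Triples are sorted so the output is deterministic; the subject is
    emitted once per run of triples that share it. The <s>, <p>, <o>
    markers are hypothetical.
    """
    parts = []
    last_subject = None
    for subj, prop, obj in sorted(triples):
        if subj != last_subject:
            parts += ["<s>", subj]
            last_subject = subj
        parts += ["<p>", prop, "<o>", obj]
    return " ".join(parts)


triples = {
    ("Zamenhof", "occupation", "ophthalmologist"),
    ("Zamenhof", "given name", "Ludwik"),
}
print(linearize(triples))
# <s> Zamenhof <p> given name <o> Ludwik <p> occupation <o> ophthalmologist
```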
"Apertium is a toolbox to build open-source shallow-transfer machine translation systems, especially suitable for related language pairs: it includes the engine, maintenance tools, and open linguistic data for several language pairs."
Language-pair data includes:
The above are the "released" language pairs; their data includes:
There is also a lot of data of the above kinds for unreleased language pairs, e.g. Icelandic → English and North Sámi → Lule Sámi, as well as tools to maintain such data.
COPYING file in language pair data archive contains a copy of the GPL.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Text2TCS automatically extracts terminological concept systems from natural-language text. Terms are domain-specific natural-language expressions that describe domain-specific concepts. The tool extracts terms, concepts and concept relations and represents them in a terminological concept system, building on a prespecified relation typology: generic, partitive, activity, associative, causal, spatial, instrumental, origination, and property relations. Synonyms are detected and grouped in the output (text and TBX/XML).
The system has been trained on English and German but builds on a pre-trained multilingual neural model (XLM-R) that allows Text2TCS to transfer its functionality to the following languages: Afrikaans, Albanian, Amharic, Arabic, Armenian, Assamese, Azerbaijani, Basque, Belarusian, Bengali, Bengali Romanized, Bosnian, Breton, Bulgarian, Burmese, Catalan, Chinese (Simplified), Chinese (Traditional), Croatian, Czech, Danish, Dutch, English, Esperanto, Estonian, Filipino, Finnish, French, Galician, Georgian, German, Greek, Gujarati, Hausa, Hebrew, Hindi, Hindi Romanized, Hungarian, Icelandic, Indonesian, Irish, Italian, Japanese, Javanese, Kannada, Kazakh, Khmer, Korean, Kurdish (Kurmanji), Kyrgyz, Lao, Latin, Latvian, Lithuanian, Macedonian, Malagasy, Malay, Malayalam, Marathi, Mongolian, Nepali, Norwegian, Oriya, Oromo, Pashto, Persian, Polish, Portuguese, Punjabi, Romanian, Russian, Sanskrit, Scottish Gaelic, Serbian, Sindhi, Sinhala, Slovak, Slovenian, Somali, Spanish, Sundanese, Swahili, Swedish, Tamil, Tamil Romanized, Telugu, Telugu Romanized, Thai, Turkish, Ukrainian, Urdu, Urdu Romanized, Uyghur, Uzbek, Vietnamese, Welsh, Western Frisian, Xhosa, Yiddish.
The list of input and output languages below is more restrictive since we utilize an automated language recognition tool and a sentence tokenizer. The indicated languages represent the languages officially supported by those two tools and XLM-R, even though our application might be able to also process other languages from the list above.
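The relation typology listed above can be modelled as a small enumeration over a concept system in which each concept groups its synonymous terms. This hypothetical sketch is not Text2TCS's own output schema (the description gives that as text and TBX/XML); the concept ids, example terms and relation direction are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Relation(Enum):
    """The relation types named in the Text2TCS description."""
    GENERIC = "generic"
    PARTITIVE = "partitive"
    ACTIVITY = "activity"
    ASSOCIATIVE = "associative"
    CAUSAL = "causal"
    SPATIAL = "spatial"
    INSTRUMENTAL = "instrumental"
    ORIGINATION = "origination"
    PROPERTY = "property"


@dataclass
class ConceptSystem:
    # concept id -> set of synonymous terms for that concept
    concepts: dict[str, set[str]] = field(default_factory=dict)
    # (source concept, relation, target concept) triples
    relations: list[tuple[str, Relation, str]] = field(default_factory=list)

    def add_term(self, concept_id: str, term: str) -> None:
        self.concepts.setdefault(concept_id, set()).add(term)

    def relate(self, source: str, relation: Relation, target: str) -> None:
        if source not in self.concepts or target not in self.concepts:
            raise KeyError("both concepts must exist before relating them")
        self.relations.append((source, relation, target))

    def targets(self, concept_id: str, relation: Relation) -> list[str]:
        return [t for s, r, t in self.relations if s == concept_id and r is relation]


cs = ConceptSystem()
cs.add_term("c1", "wheel")
cs.add_term("c2", "bicycle")
cs.add_term("c2", "bike")  # synonym grouped under the same concept
# Read here as "c1 is part of c2"; the direction convention is an assumption.
cs.relate("c1", Relation.PARTITIVE, "c2")
print(cs.targets("c1", Relation.PARTITIVE))  # ['c2']
```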
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Audio files, sampled at 48,000 Hz, of an American male speaker pronouncing the names of the Esperanto letters in three ways. Both retroflex r and trilled r are included.
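Assuming the recordings are distributed as WAV files (the format is not stated), Python's standard wave module can confirm the 48,000 Hz sampling rate and compute each clip's duration as frame count divided by rate. The demo writes a short silent file first so the sketch runs without the dataset; write_silence and describe_wav are hypothetical helper names.

```python
import os
import tempfile
import wave


def describe_wav(path: str) -> dict:
    """Read the sampling rate, channel count and duration from a WAV header."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        return {
            "rate_hz": rate,
            "channels": w.getnchannels(),
            "seconds": w.getnframes() / rate,  # duration = frames / rate
        }


def write_silence(path: str, seconds: float, rate: int = 48000) -> None:
    """Write mono 16-bit silence; used only to demonstrate describe_wav."""
    n_frames = int(seconds * rate)
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * n_frames)


demo = os.path.join(tempfile.mkdtemp(), "demo.wav")
write_silence(demo, 0.5)
info = describe_wav(demo)
print(info)  # {'rate_hz': 48000, 'channels': 1, 'seconds': 0.5}
```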
https://deepfo.com/documentacion.php?idioma=es
Radio stations in Esperanto (Spanish-language listing). Fields: name, image, date operations commenced, date founded, frequency, headquarters city, headquarters administrative division, headquarters country, headquarters continent, country, continent, coverage, language, prefix, date dissolved, website, owner.