100+ datasets found

French Wikipedia Dataset
paperswithcode.com
opendatalab.com
Louis Martin; Benjamin Muller; Pedro Javier Ortiz Suárez; Yoann Dupont; Laurent Romary; Éric Villemonte de la Clergerie; Djamé Seddah; Benoît Sagot, French Wikipedia Dataset [Dataset]. https://paperswithcode.com/dataset/french-wikipedia
Authors
Louis Martin; Benjamin Muller; Pedro Javier Ortiz Suárez; Yoann Dupont; Laurent Romary; Éric Villemonte de la Clergerie; Djamé Seddah; Benoît Sagot
Area covered
French
Description
French Wikipedia is a dataset used to pretrain the CamemBERT French language model. It is built from the official 2019 French Wikipedia dumps.
Accès des ménages français à Internet 2006-2023
fr.statista.com
Statista, Accès des ménages français à Internet 2006-2023 [Dataset]. https://fr.statista.com/statistiques/509227/menage-francais-acces-internet/
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
2006 - 2023
Area covered
France, French
Description
This statistic shows the share of households with Internet access in France from 2006 to 2023. Internet penetration among French households exceeded 80% in 2012, and in 2023, 93% of French households had Internet access. Internet use varies with age: in 2016, 92% of 18-24-year-olds reported being Internet users, compared with only 56% of people aged 70 and over.
Manifestations (Français)
data.gouv.fr
data.europa.eu
csv, json, shp, xls
Updated Feb 11, 2015
(2015). Manifestations (Français) [Dataset]. https://www.data.gouv.fr/en/datasets/manifestations-francais-pdct/
Available download formats: csv, json, xls, shp
Dataset updated
Feb 11, 2015
License
Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
Area covered
French
Description
Events (manifestations) in the Pas-de-Calais. This dataset is in French.
Perception and production of plosives: Data from Norwegian learners of...
dataone.org
dataverse.no
Updated Jan 5, 2024
Andreassen, Helene N. (2024). Perception and production of plosives: Data from Norwegian learners of French L3 [Dataset]. http://doi.org/10.18710/MIFRWN
Unique identifier
https://doi.org/10.18710/MIFRWN
Dataset updated
Jan 5, 2024
Dataset provided by
DataverseNO
Authors
Andreassen, Helene N.
Area covered
French
Description
This dataset contains various measures of plosives produced by 16 Norwegian learners of French as a third language during a reading task and a repetition task. The data are extracted from two corpora collected within the framework of the IPFC project (Interphonologie du français contemporain): the Tromsø corpus, recorded with high school students, and the Oslo corpus, recorded with university students enrolled in a first-year course on French phonetics and phonology. The dataset contains four files: a readme file, the word list used in the reading and repetition tasks, a data file containing all measures, and a text file presenting average values and VOT (voice onset time) ranges for the individual informants.
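The text file of per-informant averages and VOT ranges summarises the measures file. As a sketch of that kind of summary, assuming the measures can be read as rows with a hypothetical `informant` column and a hypothetical `vot_ms` column (the real file layout is not described here), per-informant mean, minimum, and maximum VOT could be computed as follows:

```python
from collections import defaultdict
from statistics import mean


def vot_summary(rows):
    """Return {informant: (mean_vot, min_vot, max_vot)}.

    `rows` is an iterable of dicts with the hypothetical keys
    'informant' and 'vot_ms' (voice onset time in milliseconds).
    """
    by_informant = defaultdict(list)
    for row in rows:
        # Values read from a CSV arrive as strings, so convert to float.
        by_informant[row["informant"]].append(float(row["vot_ms"]))
    return {
        who: (mean(vots), min(vots), max(vots))
        for who, vots in by_informant.items()
    }


# Illustrative rows only, not measurements from the dataset.
sample = [
    {"informant": "T01", "vot_ms": "20"},
    {"informant": "T01", "vot_ms": "30"},
    {"informant": "O02", "vot_ms": "15"},
]
print(vot_summary(sample))
```

With the real data file, `rows` could come from `csv.DictReader` once the actual column names and delimiter are known.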
Fon-French Dataset Dataset
paperswithcode.com
Updated Jun 13, 2020
Bonaventure F. P. Dossou; Chris C. Emezue (2020). Fon-French Dataset Dataset [Dataset]. https://paperswithcode.com/dataset/fon-french-dataset
Dataset updated
Jun 13, 2020
Authors
Bonaventure F. P. Dossou; Chris C. Emezue
Area covered
French
Description
FFR Dataset is an ongoing project to collect, clean, and store corpora of Fon and French sentences for Fon-French machine translation. Fon (also called Fongbe) is an indigenous African language spoken mostly in Benin by about 1.7 million people. Because training data is crucial to the performance of machine learning models, the project aims to compile the largest set of training corpora for the research and design of translation and NLP models involving Fon. It currently contains 117,029 parallel Fon-French sentences.
French Shopping List OCR Image Dataset
futurebeeai.com
wav
Updated Aug 1, 2022
FutureBee AI (2022). French Shopping List OCR Image Dataset [Dataset]. https://www.futurebeeai.com/dataset/ocr-dataset/french-shopping-list-ocr-image-dataset
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/data-license-agreement
Area covered
French
Dataset funded by
FutureBeeAI
Description
What’s Included
Introducing the French Shopping List Image Dataset - a diverse and comprehensive collection of handwritten text images carefully curated to propel the advancement of text recognition and optical character recognition (OCR) models designed specifically for the French language.
Dataset Contents & Diversity:
Containing more than 2,000 images, this French OCR dataset offers a wide range of shopping-list images. The handwritten text includes sentences, individual item names, quantities, comments, and similar content. The images showcase distinct handwriting styles, fonts, font sizes, and writing variations.
To ensure diversity and robustness when training your OCR model, we allow fewer than three images (at most two) in any single handwriting, so the dataset covers a wide variety of handwriting. Stringent measures have been taken to exclude any personally identifiable information (PII) and to ensure that at least 80% of the area of each image contains visible French text.
The images have been captured under varying lighting conditions, including day and night, as well as different capture angles and backgrounds. This diversity helps build a balanced OCR dataset, featuring images in both portrait and landscape modes.
All of these shopping lists were written, and the images captured, by native French speakers to ensure text quality, prevent toxic content, and exclude PII. We used the latest iOS and Android mobile devices with cameras of more than 5 MP to maintain image quality. Images in this training dataset are available in both JPEG and HEIC formats.
Metadata:
In addition to the image data, you will receive structured metadata in CSV format. For each image, the metadata includes image orientation, country, language, and device details. Each image file is named to match its metadata entry.
This metadata serves as a valuable resource for understanding and characterizing the data, aiding informed decision-making in the development of French text recognition models.
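Since each image file is named to match its metadata row, pairing images with metadata amounts to a join on the file name. A minimal sketch, assuming a hypothetical `image_name` column in the CSV (the actual column names are not listed here) and matching on the file stem so that JPEG and HEIC copies of the same image share one row:

```python
import csv
import io
from pathlib import Path


def pair_images_with_metadata(image_paths, metadata_csv_text):
    """Match image files to metadata rows by file stem.

    Assumes a hypothetical 'image_name' column. Returns (pairs, missing),
    where `missing` lists images that have no metadata row.
    """
    reader = csv.DictReader(io.StringIO(metadata_csv_text))
    # Index rows by stem so "list_001.jpg" and "list_001.heic" both match.
    by_stem = {Path(row["image_name"]).stem: row for row in reader}
    pairs, missing = [], []
    for path in image_paths:
        row = by_stem.get(Path(path).stem)
        if row is None:
            missing.append(path)
        else:
            pairs.append((path, row))
    return pairs, missing


# Toy metadata for illustration; real column names may differ.
meta = "image_name,orientation,device\nlist_001.jpg,portrait,iOS\n"
pairs, missing = pair_images_with_metadata(["list_001.heic", "list_002.jpg"], meta)
print(len(pairs), missing)
```

Reporting unmatched images separately makes naming mismatches visible before training rather than silently dropping files.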
Update & Custom Collection:
We are committed to continually expanding this dataset by adding more images with the help of our native French crowd community.
If you require a customized OCR dataset containing shopping list images tailored to your specific guidelines or device distribution, please don't hesitate to contact us. We have the capability to curate specialized data to meet your unique requirements.
Additionally, we can annotate or label the images with bounding boxes or transcribe the text in the images to align with your project's specific needs using our crowd community.
License:
This image dataset, created by FutureBeeAI, is now available for commercial use.
Conclusion:
Leverage this shopping list image OCR dataset to enhance the training and performance of text recognition, text detection, and optical character recognition models for the French language. Your journey to improved language understanding and processing begins here.
Wake Word French Dataset | Shaip
ha.shaip.com
sm.shaip.com
Updated Dec 24, 2024
Shaip (2024). Wake Word French Dataset | Shaip [Dataset]. https://ha.shaip.com/offerings/speech-data-catalog/wake-word-french-dataset/
Dataset updated
Dec 24, 2024
Dataset authored and provided by
Shaip
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
French
Description
This dataset provides a comprehensive set of audio recordings for wake word detection in French. It covers a range of accents, speech patterns, and environmental conditions to support reliable and accurate speech recognition systems, and is intended for developers and researchers working on French language technology.
The French English Discourse Study – Canada (FrEnDS-CAN)
search.dataone.org
borealisdata.ca
Updated Oct 2, 2024
Cleave, Patricia L.; Chen, Xi; Cormier, Pierre; Kay-Raining Bird, Elizabeth; MacLeod, Andrea; Rezzonica, Stefeno; Slavkov, Nikolay; Sutton, Ann; Trudeau, Natacha (2024). The French English Discourse Study – Canada (FrEnDS-CAN) [Dataset]. http://doi.org/10.5683/SP3/6Z8KIO
Unique identifier
https://doi.org/10.5683/SP3/6Z8KIO
Dataset updated
Oct 2, 2024
Dataset provided by
Borealis
Authors
Cleave, Patricia L.; Chen, Xi; Cormier, Pierre; Kay-Raining Bird, Elizabeth; MacLeod, Andrea; Rezzonica, Stefeno; Slavkov, Nikolay; Sutton, Ann; Trudeau, Natacha
Area covered
French, Canada
Description
The French English Discourse Study – Canada (FrEnDS-CAN) is a multisite research project led by a consortium of researchers from several Canadian universities. The project examined the development of discourse skills in monolingual and bilingual children aged 7 to 12. Discourse samples were collected from monolingual French-speaking, monolingual English-speaking, and bilingual French-English-speaking children, as well as from children who spoke Arabic as their home language. Three discourse contexts were included: conversational, narrative (storytelling), and expository (describing a favorite game or sport). The study had two main objectives: 1) to increase our understanding of monolingual and bilingual development by analyzing the effects of language status (mono-/bilingual) and discourse type on language development at the word, sentence, and discourse-structure levels; and 2) to improve the identification of language disabilities in monolingual and bilingual children by developing normative information and creating databases of language samples in the three discourse contexts.
BDSONS Base de données des sons du français
catalogue.elra.info
live.european-language-grid.eu
Updated Mar 31, 2005
ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency) (2005). BDSONS Base de données des sons du français [Dataset]. https://catalogue.elra.info/en-us/repository/browse/ELRA-S0005/
Dataset updated
Mar 31, 2005
Dataset provided by
ELRA (European Language Resources Association)
ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency)
License
https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf
https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf
Area covered
French
Description
The BDSONS Database is a French speech database with two subsets: evaluation and acoustic modelling. The corpora contain 32 speakers (16 male and 16 female) on 7 CD-ROMs totalling approximately 3.5 GB; phonetic labelling for part of the data is available on additional floppy disks. "Evaluation" subset (32 speakers): adjustment: 5 sentences and 54 bisyllabic logatomes (nonsense words), numbers, digits, letters, and names (spelled in isolation and in connected speech). "Acoustic" subset (12 speakers): words (600 CVCV items combining 20 consonants and semi-consonants with the vowels /a/, /i/, /u/; 200 consonant clusters; rhyme tests for consonants and vowels in pairs and triplets) and sentences (52 phonetically balanced sentences, 44 nasal sentences, and 192 sentences containing real French words with 16 consonants and 12 vowels).
Replication Data for: Les expressions spatiales en français médiéval:...
dataverse.no
dataverse.azure.uit.no
txt
Updated Sep 28, 2023
Thomas Rainsford (2023). Replication Data for: Les expressions spatiales en français médiéval: particules et formes préfixées en de- [Dataset]. http://doi.org/10.18710/X5ZFXZ
Available download formats: txt(18451), txt(10248), txt(6659), txt(38092), txt(191770), txt(82867), txt(145338), txt(130473), txt(55781), txt(1710), txt(111842), txt(394452), txt(73757), txt(4428), txt(3238)
Unique identifier
https://doi.org/10.18710/X5ZFXZ
Dataset updated
Sep 28, 2023
Dataset provided by
DataverseNO
Authors
Thomas Rainsford
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
French
Description
This dataset contains the raw data and the R scripts needed to replicate all tables and figures in the cited publication. The raw data consist of manually annotated plain-text concordances containing instances of five pairs of Old French spatial prepositions (ens/dedans, hors/dehors, avant/devant, arriere/derriere, sus/dessus). The concordances were initially extracted from the "Base de Français Médiéval" corpus (http://txm.bfm-corpus.org/).

Publication abstract: This paper compares the syntactic distribution of two separate series of spatial preposition-adverbs in medieval French: "base" forms descended directly from Latin adverbs and forms prefixed with de-. As both types of form may occur with a similar meaning either as prepositions or as adverbs, many grammars of Old French consider them to be free variants. However, on the basis of a detailed quantitative analysis of five pairs of forms across 1.4 million words of medieval French drawn from the Base de français médiéval corpus, I argue that the base forms are particles, being favoured in motion expressions and showing limited prepositional uses, while the de-prefixed forms, favoured in static contexts or as locative adjuncts, are best analysed as locative adverbs with secondary prepositional uses.
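The argument rests on how often each form occurs in motion versus static contexts. The dataset's own analysis is in R; the following is an independent Python illustration, not the author's scripts, assuming each annotated token can be reduced to a (form, context) pair with hypothetical context labels such as `"motion"` and `"static"`:

```python
from collections import Counter


def motion_share(tokens):
    """Return, for each form, the proportion of its tokens labelled 'motion'.

    `tokens` is an iterable of (form, context) pairs; the context labels
    used here are hypothetical, not the dataset's annotation scheme.
    """
    tokens = list(tokens)  # materialise so generators can be iterated twice
    totals = Counter(form for form, _ in tokens)
    motion = Counter(form for form, ctx in tokens if ctx == "motion")
    return {form: motion[form] / n for form, n in totals.items()}


# Toy tokens for illustration only, not counts from the corpus.
toy = [("hors", "motion"), ("hors", "motion"), ("hors", "static"),
       ("dehors", "static"), ("dehors", "motion")]
print(motion_share(toy))
```

Comparing these shares within each base/de- pair is the kind of contrast the abstract describes; the published figures come from the dataset's R scripts, not from this sketch.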
Books called On parle français
workwithdata.com
Updated Jul 19, 2024
Work With Data (2024). Books called On parle français [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=book&fop0=%3D&fval0=On+parle+fran%C3%A7ais
Dataset updated
Jul 19, 2024
Dataset authored and provided by
Work With Data
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
French
Description
This dataset is about books and is filtered where the book is On parle français, featuring 7 columns including author, BNB id, book, book publisher, and ISBN. The preview is ordered by publication date (descending).
Flash Eurobarometer 346 (Les Français et l´Union européenne)
datacatalogue.cessda.eu
search.gesis.org
Updated Mar 14, 2023
European Commission (2023). Flash Eurobarometer 346 (Les Français et l´Union européenne) [Dataset]. http://doi.org/10.4232/1.11630
Unique identifier
https://doi.org/10.4232/1.11630
Dataset updated
Mar 14, 2023
Dataset provided by
Brussels DG Communication COMM A1 ´Research and Speechwriting´
Authors
European Commission
Time period covered
Feb 27, 2012 - Feb 28, 2012
Area covered
France
Measurement technique
Telephone interview: Computer-assisted (CATI)
Description
Attitudes towards the European Union.
Topics: attitude towards the following statements on European integration: guarantees peace on the continent, makes France stronger against the rest of the world, contributes to France’s prosperity; preferred decision level for measures against the economic crisis: national or EU level; management of the economic crisis to date as joint action of the European countries or following national interests; attitude towards selected propositions: increased monitoring of national budgets by the EU, increased regulation of financial markets, stricter monitoring of rating agencies, introduction of a financial transaction tax, harmonization of the taxation systems of the member states, programme to stimulate economic growth, ban on imports of products from certain countries, principle of reciprocity in international exchange, decisions on EU level on the basis of a qualified majority and not unanimously; respondent feels well informed about political life in France and in the EU, citizens need more information on the EU given by French politicians, citizens need more information on the EU given by the media; satisfaction with the current personal situation and expected development for the next three years; left-right self-placement.

Demography: age; sex; age at end of education; occupation; professional position; nationality; region; type of community; own a mobile phone and fixed (landline) phone; household composition and household size.

Additionally coded: type of phone line; weighting factor.
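Because the file includes a weighting factor, population-level shares should be computed with weights rather than raw counts. A minimal sketch of a weighted proportion, assuming each record has been reduced to a (response, weight) pair; the response codes below are hypothetical and do not reproduce the GESIS variable coding:

```python
def weighted_share(records, target):
    """Weighted proportion of respondents whose response equals `target`.

    `records` is an iterable of (response, weight) pairs.
    """
    total = 0.0
    hits = 0.0
    for response, weight in records:
        total += weight
        if response == target:
            hits += weight
    if total == 0:
        raise ValueError("weights sum to zero")
    return hits / total


# Toy records for illustration only, not survey data.
sample = [("agree", 1.2), ("disagree", 0.8), ("agree", 0.5), ("disagree", 1.5)]
print(round(weighted_share(sample, "agree"), 3))
```

On this toy sample the unweighted share of "agree" would be 0.5, while the weighted share is 0.425, which shows why the weighting factor matters when reporting results.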
‘Niveau de vie des Français par commune’ analyzed by Analyst-2
analyst-2.ai
Updated Oct 30, 2017
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2017). ‘Niveau de vie des Français par commune’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/data-europa-eu-niveau-de-vie-des-francais-par-commune-44c4/latest
Dataset updated
Oct 30, 2017
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
French
Description
Analysis of ‘Niveau de vie des Français par commune’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from http://data.europa.eu/88u/dataset/59f89adf88ee381016f69c0c on 17 January 2022.

--- Dataset description provided by original source is as follows ---

INSEE published household living standards by municipality for 2014. The analysis framework, called Filosofi, makes it possible to identify in detail where areas of poverty are located in France.

--- Original source retains full ownership of the source dataset ---
Groupes sanguins des Français, selon le système ABO
fr.statista.com
Updated Mar 5, 2025
Statista (2025). Groupes sanguins des Français, selon le système ABO [Dataset]. https://fr.statista.com/statistiques/656008/groupes-sanguins-repartition-abo-france/
Dataset updated
Mar 5, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
2023
Area covered
France
Description
This statistic shows the distribution of blood groups in the French population according to the ABO system. Less than 5% of French people have blood group AB. For more information, see our infographic on blood group compatibility.
General evolution of new immigrant’s understanding of written French...
statista.com
Updated Jul 4, 2024
Statista (2024). General evolution of new immigrant’s understanding of written French 2019-2022 [Dataset]. https://www.statista.com/statistics/1454636/evolution-of-immigrants-understanding-of-french/
Dataset updated
Jul 4, 2024
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
France
Description
In France in 2022, half of the new immigrants who took a test of written comprehension of French achieved a success rate of 80 percent or more. This share has risen since 2019, when only 40 percent of new immigrants reached that level; in 2020 the figure was 44 percent.
Phytochorologie des départements français
gbif.org
Updated Sep 7, 2016
GBIF (2016). Phytochorologie des départements français [Dataset]. http://doi.org/10.15468/x3te8g
Unique identifier
https://doi.org/10.15468/x3te8g
Dataset updated
Sep 7, 2016
Dataset provided by
Tela Botanica
GBIF
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
This project aims to compile lists of the plants present in each French département.

Taxon nomenclature is harmonized on the basis of the synonymy index produced by Benoît BOCK as part of Tela Botanica's "Index synonymique" project.

The lists will be published online as they are completed.

You can also follow the project's progress through a web interface: http://www.tela-botanica.org/chorologie
Liste des musées français
userclub.opendatasoft.com
csv, excel, json
Updated Nov 29, 2024
(2024). Liste des musées français [Dataset]. https://userclub.opendatasoft.com/explore/dataset/liste-des-musees-francais/
Available download formats: csv, excel, json
Dataset updated
Nov 29, 2024
License
Licence Ouverte / Open Licence 1.0: https://www.etalab.gouv.fr/wp-content/uploads/2014/05/Open_Licence.pdf
License information was derived automatically
Area covered
French
Description
There is no description for this dataset.
Observatoire à long-terme du programme Ornitho-Eco (Institut Polaire...
gbif.org
Updated Dec 16, 2022
Inventaire National du Patrimoine Naturel (2022). Observatoire à long-terme du programme Ornitho-Eco (Institut Polaire Français - IPEV) sur les campagnes en mer dans les Terres Australes et Antarctiques Françaises - Observations de mammifères marins issues des campagnes dans les Terres Australes et Antarctiques Françaises (programme 109 Institut Polaire Français - IPEV) [Dataset]. http://doi.org/10.15468/nutvhu
Unique identifier
https://doi.org/10.15468/nutvhu
Dataset updated
Dec 16, 2022
Dataset provided by
Global Biodiversity Information Facility (https://www.gbif.org/)
UMS PatriNat (OFB-CNRS-MNHN), Paris
Authors
Inventaire National du Patrimoine Naturel
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Apr 10, 1979 - Jan 30, 2014
Description
This dataset comes from the database called Pelagis-Observations (developed and administered by UMS 3468 BBEES). It brings together observations of marine mammals collected on board the vessels Astrolabe and Marion Dufresne I and II during campaigns in the French Southern and Antarctic Lands (programme 109, Institut Polaire Français - IPEV) between 1982 and 2015. These are logistical, scientific, and OISO oceanographic campaigns.
‘2014-24 - Visas délivrés aux conjoints de Français’ analyzed by Analyst-2
analyst-2.ai
Updated Nov 16, 2021
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘2014-24 - Visas délivrés aux conjoints de Français’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/data-europa-eu-2014-24-visas-delivres-aux-conjoints-de-francais-5b13/32ed7b1a/?iid=000-416&v=presentation
Dataset updated
Nov 16, 2021
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
French
Description
Analysis of ‘2014-24 - Visas délivrés aux conjoints de Français’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from http://data.europa.eu/88u/dataset/543bd41488ee3805963c163f on 19 January 2022.

--- Dataset description provided by original source is as follows ---

Analysis of trends in the issuance of the main visa categories.

--- Original source retains full ownership of the source dataset ---
Replication Data for: Le rôle de la variation dans le développement...
search.dataone.org
dataverse.no
Updated Jan 5, 2024
Andreassen, Helene N.; Lyche, Chantal (2024). Replication Data for: Le rôle de la variation dans le développement phonologique: Acquisition du schwa illustrée par deux corpus d'apprenants norvégiens [Dataset]. https://search.dataone.org/view/sha256%3Acc111488cf676e1f40a9c30de8e78449ead7532266110dd938e19a7d7bc355a9
Dataset updated
Jan 5, 2024
Dataset provided by
DataverseNO
Authors
Andreassen, Helene N.; Lyche, Chantal
Description
Article abstract: Phonological variation forms an integrated part of language acquisition, and one important challenge for learners of French as a post-L1 language concerns schwa alternation, in perception as well as production. This paper presents a first analysis of the behavior of schwa in conversational speech in two Norwegian learner corpora, and tests the hypothesis that the acquisition of the phonological variable depends on phonotactic structure and frequency. The examination of the learners' productions indicates that even at an advanced level, they are far from mastering the target system, which encourages a more explicit exposition in the classroom to the factors conditioning schwa alternation.

About the dataset: The dataset consists of 2 data files and a readme file explaining the content of the data files. The 2 data files contain information about schwa behavior in monosyllables and the initial syllable of polysyllables, in conversational speech, in two learner corpora (Norwegian learners of French as a post-L1 language). The data are coded using the schwa pilot coding system developed within the IPFC project (Interphonologie du français contemporain). For access to the sound files, contact the authors.
