100+ datasets found
  1. WikipediaUpdated

    • huggingface.co
    Updated May 4, 2023
    + more versions
  2. wikipedia

    • tensorflow.org
    • huggingface.co
    Updated Aug 9, 2019
  3. wikipedia-summary-dataset

    • huggingface.co
    Updated Feb 15, 2023
  4. Plaintext Wikipedia dump 2018

    • lindat.mff.cuni.cz
    • live.european-language-grid.eu
    Updated Feb 25, 2018
  5. Extended Wikipedia Multimodal Dataset

    • kaggle.com
    zip
    Updated Apr 4, 2020
  6. Wiki-en Dataset

    • paperswithcode.com
    • opendatalab.com
    Updated Jul 25, 2019
  7. simple-wikipedia

    • huggingface.co
    Updated Aug 17, 2023
  8. Data from: Wikipedia Citations: A comprehensive dataset of citations with...

    • zenodo.org
    zip
    Updated Nov 12, 2020
  9. Wizard of Wikipedia Dataset

    • paperswithcode.com
    Updated Aug 4, 2022
  10. Data from: Wiki-Reliability: A Large Scale Dataset for Content Reliability...

    • figshare.com
    txt
    Updated Mar 14, 2021
  11. Wikipedia Article Topics for All Languages (based on article outlinks)

    • figshare.com
    bz2
    Updated Jul 20, 2021
  12. Wikipedia Article Networks

    • kaggle.com
    zip
    Updated Nov 12, 2019
  13. Plain Text Wikipedia 2020-11

    • kaggle.com
    zip
    Updated Nov 27, 2020
  14. Wikipedia Person and Animal Dataset

    • paperswithcode.com
    Updated Nov 27, 2021
    + more versions
  15. wikipedia-22-12-simple-embeddings

    • huggingface.co
    • opendatalab.com
    Updated Mar 29, 2023
    + more versions
  16. Arabic Wiki data Dump 2018

    • kaggle.com
    zip
    Updated Feb 6, 2018
  17. English Wikipedia Quality Asssessment Dataset

    • figshare.com
    application/bzip2
    Updated May 31, 2023
  18. Wikipedia Talk Labels: Personal Attacks

    • figshare.com
    txt
    Updated Feb 22, 2017
    + more versions
  19. Bangla Wikipedia dataset

    • b2find.dkrz.de
    Updated Oct 31, 2023
    + more versions
  20. Wikipedia Talk Corpus

    • figshare.com
    application/x-gzip
    Updated Jan 23, 2017
Cite
jojo jenkins (2023). WikipediaUpdated [Dataset]. https://huggingface.co/datasets/luciferxf/WikipediaUpdated

WikipediaUpdated (luciferxf/WikipediaUpdated)
Dataset updated: May 4, 2023
Authors: jojo jenkins
Description

Wikipedia dataset containing cleaned articles of all languages. The datasets are built from the Wikipedia dump (https://dumps.wikimedia.org/) with one split per language. Each example contains the content of one full Wikipedia article with cleaning to strip markdown and unwanted sections (references, etc.).
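
The kind of cleaning described above can be sketched roughly as follows. This is only an illustration, not the pipeline actually used to build the dataset: the function name clean_article, the list of unwanted section titles, and the regular expressions are all assumptions, and real wikitext (templates, tables, nested links) needs a proper parser.

```python
import re

# Hypothetical list of section titles to drop; the dataset description
# only says "references, etc." and does not give the real list.
UNWANTED_SECTIONS = {"references", "external links", "see also"}

# Matches wikitext headings such as "== History ==" or "=== Notes ===".
HEADING = re.compile(r"^(={2,})\s*(.*?)\s*\1\s*$")


def clean_article(wikitext: str) -> str:
    """Drop unwanted sections and strip a few common wikitext constructs."""
    kept = []
    skip_level = None  # heading depth of the unwanted section being skipped
    for line in wikitext.splitlines():
        match = HEADING.match(line)
        if match:
            level = len(match.group(1))
            title = match.group(2)
            if skip_level is not None and level > skip_level:
                continue  # subsection of an unwanted section
            if title.lower() in UNWANTED_SECTIONS:
                skip_level = level
                continue
            skip_level = None
            kept.append(title)  # keep the heading text without the "=" markers
            continue
        if skip_level is None:
            kept.append(line)
    text = "\n".join(kept)
    # Remove <ref>...</ref> footnotes and self-closing <ref ... /> tags.
    text = re.sub(r"<ref[^>]*?/>|<ref[^>]*>.*?</ref>", "", text, flags=re.DOTALL)
    # [[target|label]] -> label, [[target]] -> target
    text = re.sub(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]", r"\1", text)
    # '''bold''' and ''italic'' markers -> plain text
    text = re.sub(r"'{2,}", "", text)
    return text.strip()


sample = (
    "'''Paris''' is the capital of [[France]].<ref>Some source</ref>\n"
    "== History ==\n"
    "Founded by the [[Parisii|Parisii tribe]].\n"
    "== References ==\n"
    "* A book\n"
    "=== Notes ===\n"
    "* A note\n"
)
print(clean_article(sample))
```

On the sample, the References section and its Notes subsection are dropped, and the link, bold, and footnote markup are removed from the remaining text.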
