100+ datasets found
  1. mnist

    • tensorflow.org
    • universe.roboflow.com
    • +5 more
    Updated Jun 1, 2024
    Cite
    (2024). mnist [Dataset]. https://www.tensorflow.org/datasets/catalog/mnist
    Explore at:
    Dataset updated
    Jun 1, 2024
    Description

    The MNIST database of handwritten digits.

    To use this dataset:

    import tensorflow_datasets as tfds

    ds = tfds.load('mnist', split='train')
    for ex in ds.take(4):
        print(ex)
    

    See the guide for more information on tensorflow_datasets.

    Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/mnist-3.0.1.png

  2. MNIST Dataset

    • paperswithcode.com
    Updated Nov 16, 2021
    Cite
    Y. LeCun; L. Bottou; Y. Bengio; P. Haffner (2021). MNIST Dataset [Dataset]. https://paperswithcode.com/dataset/mnist
    Explore at:
    Dataset updated
    Nov 16, 2021
    Authors
    Y. LeCun; L. Bottou; Y. Bengio; P. Haffner
    Description

    The MNIST database (Modified National Institute of Standards and Technology database) is a large collection of handwritten digits. It has a training set of 60,000 examples and a test set of 10,000 examples. It is a subset of the larger NIST Special Database 3 (digits written by employees of the United States Census Bureau) and Special Database 1 (digits written by high school students), which contain monochrome images of handwritten digits. The digits have been size-normalized and centered in a fixed-size image. The original black and white (bilevel) images from NIST were size-normalized to fit in a 20x20 pixel box while preserving their aspect ratio. The resulting images contain grey levels as a result of the anti-aliasing technique used by the normalization algorithm. The images were centered in a 28x28 image by computing the center of mass of the pixels and translating the image so as to position this point at the center of the 28x28 field.

  3. mnist_784

    • openml.org
    Updated Sep 29, 2014
    Cite
    Yann LeCun; Corinna Cortes; Christopher J.C. Burges (2014). mnist_784 [Dataset]. https://www.openml.org/d/554
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Sep 29, 2014
    Authors
    Yann LeCun; Corinna Cortes; Christopher J.C. Burges
    Description

    Author: Yann LeCun, Corinna Cortes, Christopher J.C. Burges
    Source: MNIST Website - Date unknown
    Please cite:

    The MNIST database of handwritten digits with 784 features; the raw data is available at http://yann.lecun.com/exdb/mnist/. It can be split into a training set of the first 60,000 examples and a test set of 10,000 examples.
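
    The split described above can be sketched with plain array slicing. This is a minimal sketch, assuming the table holds 70,000 rows in the original order and that the test set is simply the rows after the first 60,000; the function name and the placeholder arrays are hypothetical, and real data would be fetched from OpenML instead.

```python
import numpy as np


def split_mnist_784(X, y, n_train=60_000):
    """Split row-ordered data into the first n_train rows and the remaining rows."""
    return (X[:n_train], y[:n_train]), (X[n_train:], y[n_train:])


# Placeholder arrays with the assumed mnist_784 shape (70,000 rows x 784 features).
X = np.zeros((70_000, 784), dtype=np.uint8)
y = np.zeros(70_000, dtype=np.int64)

(X_train, y_train), (X_test, y_test) = split_mnist_784(X, y)
print(X_train.shape, X_test.shape)
```

    Because the split depends on row order, shuffling the rows beforehand would mix the original training and test examples.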

    It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image. It is a good database for people who want to try learning techniques and pattern recognition methods on real-world data while spending minimal effort on preprocessing and formatting. The original black and white (bilevel) images from NIST were size-normalized to fit in a 20x20 pixel box while preserving their aspect ratio. The resulting images contain grey levels as a result of the anti-aliasing technique used by the normalization algorithm. The images were centered in a 28x28 image by computing the center of mass of the pixels and translating the image so as to position this point at the center of the 28x28 field.
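
    The centering step described above can be illustrated as follows. This is a rough sketch, not the original preprocessing code: the function name is hypothetical, the translation is rounded to whole pixels, and the offset is clamped so the digit stays inside the field, all of which are assumptions made for the example.

```python
import numpy as np


def center_by_mass(digit, size=28):
    """Paste `digit` into a size x size field with its center of mass near the field center."""
    digit = np.asarray(digit, dtype=float)
    h, w = digit.shape
    total = digit.sum()
    if total == 0:
        raise ValueError("an empty image has no center of mass")
    rows, cols = np.indices(digit.shape)
    cy = (rows * digit).sum() / total  # center of mass, row coordinate
    cx = (cols * digit).sum() / total  # center of mass, column coordinate
    center = (size - 1) / 2
    # Top-left offset that moves the center of mass to the field center,
    # rounded to whole pixels and clamped so the digit fits in the field.
    top = min(max(int(round(center - cy)), 0), size - h)
    left = min(max(int(round(center - cx)), 0), size - w)
    field = np.zeros((size, size), dtype=float)
    field[top:top + h, left:left + w] = digit
    return field


# A 20x20 stand-in whose mass sits in the lower rows of the box.
digit = np.zeros((20, 20))
digit[4:20, :] = 1.0
field = center_by_mass(digit)
print(field.shape)
```

    Centering by bounding box would instead place the 20x20 box at a fixed offset of 4 pixels on each side, regardless of where the ink lies.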

    With some classification methods (particularly template-based methods, such as SVM and K-nearest neighbors), the error rate improves when the digits are centered by bounding box rather than center of mass. If you do this kind of pre-processing, you should report it in your publications. The MNIST database was constructed from NIST's Special Database 3 and Special Database 1, which contain binary images of handwritten digits. NIST originally designated SD-3 as their training set and SD-1 as their test set. However, SD-3 is much cleaner and easier to recognize than SD-1. The reason for this is that SD-3 was collected among Census Bureau employees, while SD-1 was collected among high-school students. Drawing sensible conclusions from learning experiments requires that the result be independent of the choice of training set and test set among the complete set of samples. Therefore it was necessary to build a new database by mixing NIST's datasets.

    The MNIST training set is composed of 30,000 patterns from SD-3 and 30,000 patterns from SD-1. Our test set was composed of 5,000 patterns from SD-3 and 5,000 patterns from SD-1. The 60,000 pattern training set contained examples from approximately 250 writers. We made sure that the sets of writers of the training set and test set were disjoint. SD-1 contains 58,527 digit images written by 500 different writers. In contrast to SD-3, where blocks of data from each writer appeared in sequence, the data in SD-1 is scrambled. Writer identities for SD-1 are available and we used this information to unscramble the writers. We then split SD-1 in two: characters written by the first 250 writers went into our new training set. The remaining 250 writers were placed in our test set. Thus we had two sets with nearly 30,000 examples each. The new training set was completed with enough examples from SD-3, starting at pattern # 0, to make a full set of 60,000 training patterns. Similarly, the new test set was completed with SD-3 examples starting at pattern # 35,000 to make a full set with 60,000 test patterns. Only a subset of 10,000 test images (5,000 from SD-1 and 5,000 from SD-3) is available on this site. The full 60,000 sample training set is available.

  4. MNIST Database

    • academictorrents.com
    bittorrent
    Updated Feb 1, 2001
    + more versions
    Cite
    Christopher J.C. Burges and Yann LeCun and Corinna Cortes (2001). MNIST Database [Dataset]. https://academictorrents.com/details/ce990b28668abf16480b8b906640a6cd7e3b8b21
    Explore at:
    Available download formats: bittorrent
    Dataset updated
    Feb 1, 2001
    Dataset authored and provided by
    Christopher J.C. Burges and Yann LeCun and Corinna Cortes
    License

    No license specified (https://academictorrents.com/nolicensespecified)

    Description

    The MNIST database of handwritten digits, available from this page, has a training set of 60,000 examples and a test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image. It is a good database for people who want to try learning techniques and pattern recognition methods on real-world data while spending minimal effort on preprocessing and formatting. The original black and white (bilevel) images from NIST were size-normalized to fit in a 20x20 pixel box while preserving their aspect ratio. The resulting images contain grey levels as a result of the anti-aliasing technique used by the normalization algorithm. The images were centered in a 28x28 image by computing the center of mass of the pixels and translating the image so as to position this point at the center of the 28x28 field. With some classification methods (particularly template-based methods, such as SVM and K-nearest neighbors), …

  5. not-MNIST

    • datasets.activeloop.ai
    • opendatalab.com
    • +3 more
    deeplake
    Updated Mar 11, 2022
    + more versions
    Cite
    Yaroslav Bulatov (2022). not-MNIST [Dataset]. https://datasets.activeloop.ai/docs/ml/datasets/not-mnist-dataset/
    Explore at:
    Available download formats: deeplake
    Dataset updated
    Mar 11, 2022
    Authors
    Yaroslav Bulatov
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    The not-MNIST dataset is a dataset of letter glyphs (A through J) rendered in a wide variety of fonts, intended as a harder, MNIST-like alternative to handwritten digits. It is a challenging dataset that can be used for machine learning and artificial intelligence research. According to this listing, the dataset consists of 100,000 images, divided into a training set of 60,000 images and a test set of 40,000 images. The images are drawn from a variety of fonts and styles, making them more challenging than the MNIST dataset. The images are 28x28 pixels in size and are grayscale. The dataset is available under the Creative Commons Zero Public Domain Dedication license.

  6. Data from: Fashion-MNIST Dataset

    • paperswithcode.com
    Updated Jul 26, 2021
    Cite
    Han Xiao; Kashif Rasul; Roland Vollgraf (2021). Fashion-MNIST Dataset [Dataset]. https://paperswithcode.com/dataset/fashion-mnist
    Explore at:
    Dataset updated
    Jul 26, 2021
    Authors
    Han Xiao; Kashif Rasul; Roland Vollgraf
    Description

    Fashion-MNIST is a dataset comprising 28×28 grayscale images of 70,000 fashion products from 10 categories, with 7,000 images per category. The training set has 60,000 images and the test set has 10,000 images. Fashion-MNIST shares the same image size, data format, and structure of training and testing splits with the original MNIST.

  7. MNIST

    • huggingface.co
    Updated Mar 2, 2023
    Cite
    MNIST [Dataset]. https://huggingface.co/datasets/graphs-datasets/MNIST
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 2, 2023
    Dataset authored and provided by
    Graph Datasets
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Dataset Card for MNIST

      Dataset Summary
    

    The MNIST dataset consists of 55000 images in 10 classes, represented as graphs. It comes from a computer vision dataset.

      Supported Tasks and Leaderboards
    

    MNIST should be used for multiclass graph classification.

      External Use

      PyGeometric

    To load in PyGeometric, do the following:

    from datasets import load_dataset
    from torch_geometric.data import Data
    from torch_geometric.loader…

    See the full description on the dataset page: https://huggingface.co/datasets/graphs-datasets/MNIST.

  8. Mnist 42000 Images Dataset

    • universe.roboflow.com
    zip
    Updated Apr 25, 2023
    Cite
    Roboflow (2023). Mnist 42000 Images Dataset [Dataset]. https://universe.roboflow.com/roboflow-jvuqo/mnist-42000-images-u0qdg
    Explore at:
    Available download formats: zip
    Dataset updated
    Apr 25, 2023
    Dataset authored and provided by
    Roboflow
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Variables measured
    Numbers
    Description

    The MNIST database (Modified National Institute of Standards and Technology database) is a large database of handwritten digits that is commonly used for training various image processing systems. The database is also widely used for training and testing in the field of machine learning. It was created by "re-mixing" the samples from NIST's original datasets. The creators felt that since NIST's training dataset was taken from American Census Bureau employees, while the testing dataset was taken from American high school students, it was not well-suited for machine learning experiments. Furthermore, the black and white images from NIST were normalized to fit into a 28x28 pixel bounding box and anti-aliased, which introduced grayscale levels.

    Yann LeCun, Courant Institute, NYU; Corinna Cortes, Google Labs, New York; Christopher J.C. Burges, Microsoft Research, Redmond

  9. MNIST 2 Digit Classification Dataset

    • kaggle.com
    Updated Sep 19, 2023
    Cite
    Aman Kumar (2023). MNIST 2 Digit Classification Dataset [Dataset]. https://www.kaggle.com/datasets/amankumar234/mnist-2-digit-classification-dataset
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Sep 19, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Aman Kumar
    Description

    Objective :

    The goal of this dataset is to create a custom dataset for multi-digit recognition tasks by concatenating pairs of digits from the MNIST dataset into single 128x128 pixel images and assigning labels that represent two-digit numbers from '00' to '99'.

    Dataset Features :

    Image (128 x 128 pixel NumPy array): The dataset contains images of size 128 x 128 pixels. Each image is a composition of a pair of MNIST digits. Each digit occupies a 28 x 28 pixel space within the larger 128 x 128 pixel canvas. The digits are randomly placed within the canvas to simulate real-world scenarios.

    Label (Int): The labels represent two-digit numbers ranging from '00' to '99'. These labels are assigned based on the digits present in the image and their order. For example, an image with '7' and '2' as the first and second digits would be labeled as '72' ('7' * 10 + '2'). Leading zeros are added to ensure that all labels are two characters in length.

    Dataset Size:

    Training Data: 60,000 data points; Test Data: 10,000 data points

    Data Generation: To create this dataset, you would start with the MNIST dataset, which contains single-digit images of handwritten digits from '0' to '9'. For each data point in the new dataset, you would randomly select a pair of digits from MNIST and place them on a 128 x 128 canvas. The digits are placed at random positions, and their order can also be random. After creating the multi-digit image, you assign a label by concatenating the labels of the individual digits while ensuring they are two characters in length.
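
    The generation procedure described above could look roughly like the following. This is an illustrative sketch, not the dataset author's code: the function name is hypothetical, constant-valued arrays stand in for real MNIST digits, and, as a simplifying assumption, the first digit is confined to the left half of the canvas and the second to the right half so that the reading order is unambiguous and the digits never overlap.

```python
import numpy as np


def compose_two_digits(first, second, first_label, second_label, rng, canvas_size=128):
    """Place two digit images on a blank canvas and build the two-character label."""
    canvas = np.zeros((canvas_size, canvas_size), dtype=first.dtype)
    h, w = first.shape
    half = canvas_size // 2
    # Assumption: left half for the first digit, right half for the second.
    for digit, x_min in ((first, 0), (second, half)):
        top = rng.integers(0, canvas_size - h + 1)
        left = rng.integers(x_min, x_min + half - w + 1)
        canvas[top:top + h, left:left + w] = digit
    # Tens digit times 10 plus units digit, padded to two characters.
    label = f"{first_label * 10 + second_label:02d}"
    return canvas, label


rng = np.random.default_rng(0)
seven = np.full((28, 28), 7, dtype=np.uint8)  # stand-in for an MNIST '7'
two = np.full((28, 28), 2, dtype=np.uint8)    # stand-in for an MNIST '2'
image, label = compose_two_digits(seven, two, 7, 2, rng)
print(image.shape, label)
```

    Allowing fully random placement, as the description suggests, would additionally require a rule for overlaps and for deciding which digit counts as the first one.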

    Key Features of the 2-Digit Classification Dataset:

    Multi-Digit Images: This dataset consists of multi-digit images, each containing two handwritten digits. The inclusion of multiple digits in a single image presents a unique and challenging classification task.

    Labeling Complexity: Labels are represented as two-digit numbers, adding complexity to the classification problem. The labels range from '00' to '99,' encompassing a wide variety of possible combinations.

    Diverse Handwriting Styles: The dataset captures diverse handwriting styles, making it suitable for testing the robustness and generalization capabilities of machine learning models.

    128x128 Pixel Images: Images are provided in a high-resolution format of 128x128 pixels, allowing for fine-grained analysis and leveraging the increased image information.

    Large-Scale Training and Test Sets: With 60,000 training data points and 10,000 test data points, this dataset provides ample data for training and evaluating classification models.

    Potential Use Cases:

    Multi-Digit Recognition: The dataset is ideal for developing and evaluating machine learning models that can accurately classify multi-digit sequences, which find applications in reading house numbers, license plates, and more.

    OCR (Optical Character Recognition) Systems: Researchers and developers can use this dataset to train and benchmark OCR systems for recognizing handwritten multi-digit numbers.

    Real-World Document Processing: In scenarios where documents contain multiple handwritten numbers, such as invoices, receipts, and forms, this dataset can be valuable for automating data extraction.

    Address Parsing: It can be used to build systems capable of parsing handwritten addresses and extracting postal codes or other important information.

    Authentication and Security: Multi-digit classification models can contribute to security applications by recognizing handwritten PINs, passwords, or access codes.

    Education and Handwriting Analysis: Educational institutions can use this dataset to create handwriting analysis tools and assess the difficulty of recognizing different handwritten number combinations.

    Benchmarking Deep Learning Models: Data scientists and machine learning practitioners can use this dataset as a benchmark for testing and improving deep learning models' performance in multi-digit classification tasks.

    Data Augmentation: Researchers can employ data augmentation techniques to generate even more training data by introducing variations in digit placement and size.

    Model Explainability: Developing models for interpreting and explaining the reasoning behind classifying specific multi-digit combinations can have applications in AI ethics and accountability.

    Visualizations and Data Exploration: Researchers can use this dataset to explore visualizations and data analysis techniques to gain insights into the characteristics of handwritten multi-digit numbers.

    In summary, the 2-Digit Classification Dataset offers a unique opportunity to work on a challenging multi-digit recognition problem with real-world applications, making it a valuable resource for researchers, developers, and data scientists.

    Note: Creating this dataset would require a considerable amount of preprocessing and image manipulation. It's important to ensure that the labeling and placement of digits are done in a consistent and unbiased manner to create a reliable dataset for training and evaluation on Kaggle. Additionally, you may want to explore data augmentation techniques to increase the dataset's diversity and robustness.

    View the notebook as well in the link given below : https://www.kaggle.com/code/amankumar234/2-digit-classification-mnist-score-97

  10. fashion_mnist

    • tensorflow.org
    • opendatalab.com
    • +4 more
    Updated Jun 1, 2024
    Cite
    (2024). fashion_mnist [Dataset]. https://www.tensorflow.org/datasets/catalog/fashion_mnist
    Explore at:
    Dataset updated
    Jun 1, 2024
    Description

    Fashion-MNIST is a dataset of Zalando's article images consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes.

    To use this dataset:

    import tensorflow_datasets as tfds

    ds = tfds.load('fashion_mnist', split='train')
    for ex in ds.take(4):
        print(ex)
    

    See the guide for more information on tensorflow_datasets.

    Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/fashion_mnist-3.0.1.png

  11. MNIST_png

    • kaggle.com
    Updated Jun 14, 2022
    Cite
    Alexander Y. (2022). MNIST_png [Dataset]. https://www.kaggle.com/datasets/alexanderyyy/mnist-png
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jun 14, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Alexander Y.
    Description

    This is a copy of the MNIST dataset converted to PNG by MOHAN (https://www.kaggle.com/datasets/jidhumohan/mnist-png). The original contains a zip file inside a zip file and may not work properly. Images are sorted by category (0 to 9) into subfolders. The image size is 28x28 pixels. There are 60K training files and 10K test files.

  12. Reduced MNIST

    • gts.ai
    json
    Updated May 5, 2024
    Cite
    GTS (2024). Reduced MNIST [Dataset]. https://gts.ai/dataset-download/reduced-mnist/
    Explore at:
    Available download formats: json
    Dataset updated
    May 5, 2024
    Dataset provided by
    GLOBOSE TECHNOLOGY SOLUTIONS PRIVATE LIMITED
    Authors
    GTS
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Explore the Reduced MNIST dataset, featuring a streamlined version of the classic MNIST handwritten digits dataset.

  13. Data from: MNIST handwritten digits

    • kaggle.com
    Updated May 23, 2024
    Cite
    HICHAM ACHAHBOUN (2024). MNIST handwritten digits [Dataset]. https://www.kaggle.com/datasets/hichamachahboun/mnist-handwritten-digits
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    May 23, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    HICHAM ACHAHBOUN
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    MNIST Dataset

    Introduction

    The MNIST (Modified National Institute of Standards and Technology) dataset is a widely used dataset for training and testing image processing systems. It consists of 70,000 images of handwritten digits (0-9), split into a training set of 60,000 images and a test set of 10,000 images. Each image is 28x28 pixels in size.

    Description

    The MNIST dataset includes:

    • Training Images: 60,000 images of handwritten digits.
    • Training Labels: Corresponding labels for the training images.
    • Test Images: 10,000 images of handwritten digits.
    • Test Labels: Corresponding labels for the test images.

    Each image is grayscale and has been size-normalized and centered in a fixed-size image.

    How to Use

    First, load the dataset files:

    import numpy as np

    # Paths to the provided .npy files
    train_val_images_path = 'train_images.npy'  # training + validation images
    train_val_labels_path = 'train_labels.npy'  # training + validation labels
    test_images_path = 'test_images.npy'
    test_labels_path = 'test_labels.npy'

    train_val_images = np.load(train_val_images_path)
    train_val_labels = np.load(train_val_labels_path)

    Split the dataset into training, validation, and test sets:

    # 90% of the training data for training, 10% for validation
    split = int(train_val_images.shape[0] * 0.9)
    train_images = train_val_images[:split]
    train_labels = train_val_labels[:split]

    val_images = train_val_images[split:]
    val_labels = train_val_labels[split:]

    test_images = np.load(test_images_path)
    test_labels = np.load(test_labels_path)
    

    Now, you can use these splits for your machine learning model training and evaluation.
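
    The split above takes the first 90% of rows in file order. If the files happen to be ordered (for example by class), a shuffled split may be preferable. The following is a minimal sketch under that assumption; the function name is hypothetical and small synthetic arrays stand in for the real .npy files.

```python
import numpy as np


def shuffled_split(images, labels, train_fraction=0.9, seed=0):
    """Shuffle images and labels together, then split into training and validation parts."""
    if len(images) != len(labels):
        raise ValueError("images and labels must have the same length")
    rng = np.random.default_rng(seed)  # fixed seed keeps the split reproducible
    order = rng.permutation(len(images))
    split = int(len(images) * train_fraction)
    train_idx, val_idx = order[:split], order[split:]
    return (images[train_idx], labels[train_idx]), (images[val_idx], labels[val_idx])


# Synthetic stand-in: 100 "images" whose pixels all equal their label.
labels = np.arange(100)
images = labels.reshape(100, 1, 1) * np.ones((1, 28, 28), dtype=np.int64)

(train_x, train_y), (val_x, val_y) = shuffled_split(images, labels)
print(train_x.shape, val_x.shape)
```

    Indexing both arrays with the same permutation keeps every image paired with its label.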

    Acknowledgement

    The MNIST dataset was created by Yann LeCun, Corinna Cortes, and Christopher J.C. Burges. It is widely used for benchmarking image processing systems and is publicly available for academic and research purposes. Special thanks to the creators for making this dataset available to the research community.

  14. Moving MNIST Dataset

    • paperswithcode.com
    • tensorflow.org
    • +1 more
    Updated Feb 7, 2021
    Cite
    Nitish Srivastava; Elman Mansimov; Ruslan Salakhutdinov (2021). Moving MNIST Dataset [Dataset]. https://paperswithcode.com/dataset/moving-mnist
    Explore at:
    Dataset updated
    Feb 7, 2021
    Authors
    Nitish Srivastava; Elman Mansimov; Ruslan Salakhutdinov
    Description

    The Moving MNIST dataset contains 10,000 video sequences, each consisting of 20 frames. In each video sequence, two digits move independently around the frame, which has a spatial resolution of 64×64 pixels. The digits frequently intersect with each other and bounce off the edges of the frame.
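
    To make the motion model concrete, here is a rough sketch of how such a sequence could be rendered. It is not the authors' generator: the function name, the speed range, the pixelwise-maximum rule for overlapping digits, and the constant-valued stand-in digits are all assumptions made for illustration.

```python
import numpy as np


def bouncing_sequence(digits, n_frames=20, frame_size=64, seed=0):
    """Render digits moving at constant velocity and bouncing off the frame edges."""
    rng = np.random.default_rng(seed)
    size = digits[0].shape[0]
    limit = frame_size - size  # largest valid top-left coordinate
    pos = rng.uniform(0, limit, size=(len(digits), 2))
    vel = rng.uniform(-3, 3, size=(len(digits), 2))  # pixels per frame (arbitrary choice)
    frames = np.zeros((n_frames, frame_size, frame_size), dtype=np.float32)
    for t in range(n_frames):
        for i, digit in enumerate(digits):
            r, c = np.rint(pos[i]).astype(int)
            patch = frames[t, r:r + size, c:c + size]
            # Where digits intersect, keep the brighter pixel.
            np.maximum(patch, digit, out=patch)
        pos += vel
        # Reflect the position and flip the velocity wherever an edge was crossed.
        low, high = pos < 0, pos > limit
        pos[low] = -pos[low]
        pos[high] = 2 * limit - pos[high]
        vel[low | high] *= -1
    return frames


digits = [np.ones((28, 28), dtype=np.float32) for _ in range(2)]
frames = bouncing_sequence(digits)
print(frames.shape)
```

    Each digit keeps its own position and velocity, which is what makes the two trajectories independent.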

  15. MPI-MNIST Dataset

    • zenodo.org
    application/gzip, pdf
    Updated Jan 14, 2025
    Cite
    Meira Iske; Hannes Albers; Tobias Kluth; Tobias Knopp (2025). MPI-MNIST Dataset [Dataset]. http://doi.org/10.5281/zenodo.12799417
    Explore at:
    Available download formats: application/gzip, pdf
    Dataset updated
    Jan 14, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Meira Iske; Hannes Albers; Tobias Kluth; Tobias Knopp
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    A dataset for magnetic particle imaging based on the MNIST dataset.

    This dataset contains simulated MPI measurements along with ground truth phantoms selected from the MNIST database of handwritten digits (https://yann.lecun.com/exdb/mnist/). A state-of-the-art model-based system matrix is used to simulate the MPI measurements of the MNIST phantoms. These measurements are equipped with noise perturbations captured by the preclinical MPI system (Bruker, Ettlingen, Germany). The dataset can be utilized in its provided form, while additional data is included to offer flexibility for creating customized versions.

    MPI-MNIST features four different system matrices, each available in three spatial resolutions. The provided data is generated using a specified system matrix at highest spatial resolution. Reconstruction operations can be performed by using any of the provided system matrices at a lower resolution. This setup allows for simulating reconstructions from either an exact or an inexact forward operator. To cover further operator deviation setups, we provide additional noise data for the application of pixelwise noise to the reconstruction system matrix.

    For supporting the development of learning-based methods, a large amount of further noise samples, captured by the Bruker scanner, is provided.

    For a detailed description of the dataset, see arxiv.org/abs/2501.05583.

    The Python-based GitHub repository available at https://github.com/meiraiske/MPI-MNIST can be used for downloading the data from this website and preparing it for project use, which includes an integration with PyTorch or PyTorch Lightning modules.

    File Structure

    All data, except for the phantoms, is provided in the MDF file format. This format is specifically tailored to store MPI data and contains metadata corresponding to the experimental setup. The ground truth phantoms are provided as HDF5 files since they do not require any metadata.

    • SM: Contains twelve system matrices named SM_{physical model}_{resolution}.mdf. It covers four physical models given in three resolutions ('coarse', 'int' and 'fine'). The highest resolution ('fine') is used for data generation.
    • large_noise: Contains large_NoiseMeas.mdf with 390060 noise measurements. Each noise measurement has been averaged over ten empty scanner measurements. This can be used e.g. for learning-based methods.

    For dataset in ['train', 'test']:

    • {dataset}_noise: Contains four noise matrices, where each noise measurement has been averaged over ten empty scanner measurements:
      1. NoiseMeas_phantom_{dataset}.mdf : Additive measurement noise for simulated measurements.
      2. NoiseMeas_phantom_bg_{dataset}.mdf : Unused noise reserved for background correction of 1.
      3. NoiseMeas_SM_{dataset}.mdf : System Matrix noise, that can be applied to each pixel of the reconstruction system matrix.
      4. NoiseMeas_SM_bg_{dataset}.mdf : Unused noise reserved for background correction of 3.
    • {dataset}_gt: Contains {dataset}_gt.hdf5 with flattened and preprocessed ground truth MNIST phantoms given in coarse resolution (15x17=255 pixels) with pixel values in [0, 10].
    • {dataset}_obs: Contains {dataset}_obs.mdf with noise free simulated measurements (observations) of {dataset}_gt.hdf5 using the system matrix stored in SM_fluid_opt_fine.mdf.
    • {dataset}_obsnoisy: Contains {dataset}_obsnoisy.mdf with noisy simulated measurements, resulting from {dataset}_obs.mdf and {dataset}_phantom_noise.mdf.


    In line with MNIST, each MDF/HDF5 file in {dataset}_gt, {dataset}_obs, {dataset}_obsnoisy for dataset in ['train', 'test'] contains 60000 samples for 'train' and 10000 samples for 'test'. The data can be manually reproduced in the intermediate resolution (45x51=2295 pixels) from the files in this dataset by using the system matrices in intermediate ('int') resolution for reconstruction and upsampling the ground truth phantoms by a factor of 3 per dimension. This case is also implemented in the GitHub repository.
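
    The upsampling from the coarse to the intermediate grid can be sketched as below. This is only an illustration under stated assumptions: the function name is hypothetical, the flattened phantom is assumed to be stored row by row with shape (15, 17), and nearest-neighbour repetition is assumed as the upsampling method; the repository may handle these details differently.

```python
import numpy as np


def upsample_phantom(flat, shape=(15, 17), factor=3):
    """Reshape a flattened phantom and repeat each pixel `factor` times per dimension."""
    image = np.asarray(flat).reshape(shape)
    # The Kronecker product with a block of ones repeats every pixel factor x factor times.
    return np.kron(image, np.ones((factor, factor), dtype=image.dtype))


coarse = np.linspace(0, 10, 15 * 17)  # placeholder phantom with pixel values in [0, 10]
fine = upsample_phantom(coarse)
print(fine.shape, fine.size)
```

    With a factor of 3, the 15x17 grid becomes 45x51, matching the 2295 pixels quoted for the intermediate resolution.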

    The PDF file MPI-MNIST_Metadata.pdf contains a list of meta information for each of the MDF files of this dataset.

  16. mnist-text-default

    • huggingface.co
    Updated Feb 22, 2021
    + more versions
    Cite
    Fraser Greenlee (2021). mnist-text-default [Dataset]. https://huggingface.co/datasets/Fraser/mnist-text-default
    Explore at:
    Dataset updated
    Feb 22, 2021
    Authors
    Fraser Greenlee
    Description

    MNIST dataset adapted to a text-based representation.

    This allows testing interpolation quality for Transformer-VAEs.

    System is heavily inspired by Matthew Rayfield's work https://youtu.be/Z9K3cwSL6uM

    Works by quantising each MNIST pixel into one of 64 characters. Every sample has an up and a down version to encourage the model to learn rotation-invariant features.

    Use the .array_to_text( and .text_to_array( methods to test your generated data.

    Data format:

    - text (30 x 28 tokens, 840 tokens total): Textual representation of MNIST digit, for example:

      00 down ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !
      01 down ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !
      02 down ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !
      03 down ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !
      04 down ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !
      05 down ! ! ! ! ! ! ! ! ! ! ! ! ! % % % @ C L ' J a ^ @ ! ! ! !
      06 down ! ! ! ! ! ! ! ! ( * 8 G K ` ` ` ` ` Y L ` ] Q 1 ! ! ! !
      07 down ! ! ! ! ! ! ! - \ ` ` ` ` ` ` ` ` _ 8 5 5 / * ! ! ! ! !
      08 down ! ! ! ! ! ! ! % W ` ` ` ` ` R N ^ ] ! ! ! ! ! ! ! ! ! !
      09 down ! ! ! ! ! ! ! ! 5 H ; ` ` T # ! + G ! ! ! ! ! ! ! ! ! !
      10 down ! ! ! ! ! ! ! ! ! $ ! G ` 7 ! ! ! ! ! ! ! ! ! ! ! ! ! !
      11 down ! ! ! ! ! ! ! ! ! ! ! C ` P ! ! ! ! ! ! ! ! ! ! ! ! ! !
      12 down ! ! ! ! ! ! ! ! ! ! ! # P ` 2 ! ! ! ! ! ! ! ! ! ! ! ! !
      13 down ! ! ! ! ! ! ! ! ! ! ! ! ) ] Y I < ! ! ! ! ! ! ! ! ! ! !
      14 down ! ! ! ! ! ! ! ! ! ! ! ! ! 5 ] ` ` > ' ! ! ! ! ! ! ! ! !
      15 down ! ! ! ! ! ! ! ! ! ! ! ! ! ! , O ` ` F ' ! ! ! ! ! ! ! !
      16 down ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! % 8 ` ` O ! ! ! ! ! ! ! !
      17 down ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! _ ` _ 1 ! ! ! ! ! ! !
      18 down ! ! ! ! ! ! ! ! ! ! ! ! ! ! , A N ` ` T ! ! ! ! ! ! ! !
      19 down ! ! ! ! ! ! ! ! ! ! ! ! * F Z ` ` ` _ N ! ! ! ! ! ! ! !
      20 down ! ! ! ! ! ! ! ! ! ! ' = X ` ` ` ` S 4 ! ! ! ! ! ! ! ! !
      21 down ! ! ! ! ! ! ! ! & 1 V ` ` ` ` R 5 ! ! ! ! ! ! ! ! ! ! !
      22 down ! ! ! ! ! ! % K W ` ` ` ` Q 5 # ! ! ! ! ! ! ! ! ! ! ! !
      23 down ! ! ! ! . L Y ` ` ` ` ^ B # ! ! ! ! ! ! ! ! ! ! ! ! ! !
      24 down ! ! ! ! C ` ` ` V B B % ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !
      25 down ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !
      26 down ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !
      27 down ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !

    - label: Just a number with the texts matching label.
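
    A text encoding of this kind can be sketched as follows. This is not the dataset's own .array_to_text implementation: the function names are hypothetical, and the mapping is an assumption, namely that the 256 grey levels are binned evenly into the 64 printable characters from '!' (ASCII 33) to '`' (ASCII 96), which is consistent with the characters visible in the example.

```python
import numpy as np


def row_to_text(row_index, pixels, direction="down"):
    """Encode one row of 0..255 pixels as '<index> <direction> <one character per pixel>'."""
    # Assumed mapping: 4 grey levels per character, starting at '!' (ASCII 33).
    codes = np.asarray(pixels, dtype=np.uint8) // 4 + 33
    chars = " ".join(chr(c) for c in codes.tolist())
    return f"{row_index:02d} {direction} {chars}"


def image_to_text(image, direction="down"):
    """Encode a whole image row by row into a single space-separated string."""
    return " ".join(row_to_text(i, row, direction) for i, row in enumerate(image))


blank = np.zeros((28, 28), dtype=np.uint8)
text = image_to_text(blank)
print(len(text.split()))
```

    Each row contributes 2 prefix tokens plus 28 pixel tokens, so a 28-row image yields the 840 tokens stated in the data format.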

  17. MNIST Greek Letters

    • kaggle.com
    Updated Jun 29, 2024
    Sayan Gupta (2024). MNIST Greek Letters [Dataset]. https://www.kaggle.com/datasets/sayangupta001/mnist-greek-letters
    Dataset updated
    Jun 29, 2024
    Dataset provided by
    Kaggle
    Authors
    Sayan Gupta
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    This dataset consists of 600 28x28 images covering all 24 letters of the Greek alphabet (25 images per letter).

    All the letters were handwritten using a GUI built with the tkinter library.

    Suggestions regarding the dataset are welcome.

  18. MNIST 2 Digit dataset

    • kaggle.com
    Updated Dec 26, 2022
    Rashaad Meyer (2022). MNIST 2 Digit dataset [Dataset]. https://www.kaggle.com/datasets/rashaadmeyer/mnist-2-digit-dataset
    Dataset updated
    Dec 26, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Rashaad Meyer
    Description

    This dataset is based on the MNIST dataset of handwritten digits. Each image contains two digits instead of one. The training set has 55,000 examples and the test set has 5,000 examples. The goal is to classify both digits correctly.

  19. Permuted MNIST Dataset

    • paperswithcode.com
    Updated Sep 30, 2021
    Ian J. Goodfellow; Mehdi Mirza; Da Xiao; Aaron Courville; Yoshua Bengio (2021). Permuted MNIST Dataset [Dataset]. https://paperswithcode.com/dataset/permuted-mnist
    Dataset updated
    Sep 30, 2021
    Authors
    Ian J. Goodfellow; Mehdi Mirza; Da Xiao; Aaron Courville; Yoshua Bengio
    Description

    Permuted MNIST is an MNIST variant consisting of 70,000 images of handwritten digits from 0 to 9, of which 60,000 are used for training and 10,000 for testing. It differs from the original MNIST in that each of the ten tasks is a multi-class classification problem over a different random permutation of the input pixels.
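
The permutation step can be sketched in a few lines. This is a minimal illustration using only the standard library, not the authors' code: it assumes each image is a flat list of 784 pixel values, and the function names (`make_permutations`, `apply_permutation`) and the seed handling are hypothetical. Each task fixes one random permutation and applies it to every image in that task; labels are left unchanged in this sketch.

```python
import random

NUM_PIXELS = 28 * 28  # a flattened 28x28 MNIST image


def make_permutations(num_tasks=10, num_pixels=NUM_PIXELS, seed=0):
    """Return one fixed random pixel permutation per task."""
    rng = random.Random(seed)
    perms = []
    for _ in range(num_tasks):
        perm = list(range(num_pixels))
        rng.shuffle(perm)
        perms.append(perm)
    return perms


def apply_permutation(image, perm):
    """Reorder a flat image so that output[i] == image[perm[i]]."""
    if len(image) != len(perm):
        raise ValueError("image and permutation must have the same length")
    return [image[j] for j in perm]


# Usage: one stand-in image viewed under each of the ten task permutations.
perms = make_permutations()
image = list(range(NUM_PIXELS))  # placeholder pixel values, not real MNIST data
task_views = [apply_permutation(image, p) for p in perms]
```

Because a permutation only reorders pixels, every task contains exactly the same pixel values per image; only their positions differ.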

  20. N-MNIST Dataset

    • paperswithcode.com
    Updated Mar 31, 2023
    (2023). N-MNIST Dataset [Dataset]. https://paperswithcode.com/dataset/n-mnist
    Dataset updated
    Mar 31, 2023
    Description

    The Neuromorphic-MNIST (N-MNIST) dataset is a spiking version of the original frame-based MNIST dataset. It consists of the same 60,000 training and 10,000 testing samples as the original MNIST dataset and is captured at the same visual scale (28x28 pixels). The dataset was captured by mounting the ATIS sensor on a motorized pan-tilt unit and having the sensor move while it views MNIST examples on an LCD monitor. A full description of the dataset and how it was created can be found in the paper below. Please cite this paper if you make use of the dataset.

    Orchard, G.; Cohen, G.; Jayawant, A.; and Thakor, N. "Converting Static Image Datasets to Spiking Neuromorphic Datasets Using Saccades", Frontiers in Neuroscience, vol. 9, no. 437, Oct. 2015.

(2024). mnist [Dataset]. https://www.tensorflow.org/datasets/catalog/mnist

mnist

70 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
Jun 1, 2024
Description

The MNIST database of handwritten digits.

To use this dataset:

import tensorflow_datasets as tfds

# Load the training split and print the first four examples.
ds = tfds.load('mnist', split='train')
for ex in ds.take(4):
    print(ex)

See the guide for more information on tensorflow_datasets.

Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/mnist-3.0.1.png