Feedback
3 results found
  1. DDSM Mammography

    • www.kaggle.com
    Updated Jul 3, 2018
  2. T

    curated_breast_imaging_ddsm

    • www.tensorflow.org
  3. IRMA Version of DDSM LJPEG DataDigital Database for Screening Mammography...

    • search.datacite.org
    Published 2010
  4. Not seeing a result you expected?
    Learn how you can add new datasets to our index.

Share
Facebook
Twitter
Email
Click to copy link
Link copied

DDSM Mammography

tfrecords files of scans from the DDSM dataset

16 scholarly articles cite this dataset (View in Google Scholar)
  • Dataset updated Jul 3, 2018
Authors
Eric A. Scuccimarra
License
CC0: Public Domainhttps://creativecommons.org/publicdomain/zero/1.0/
Available download formats from providers
zip (6377193753 bytes), npy (61584 bytes)
Description

Summary

This dataset consists of images from the DDSM [1] and CBIS-DDSM [3] datasets. The images have been pre-processed and converted to 299x299 images by extracting the ROIs. The data is stored as tfrecords files for TensorFlow.

The dataset contains 55,890 training examples, of which 14% are positive and the remaining 86% negative, divided into 5 tfrecords files.

Note - The data has been separated into training and test as per the division in the CBIS-DDSM dataset. The test files have been divided equally into test and validation data. However the split between test and validation data was done incorrectly, resulted in the test numpy files containing only masses and the validation files containing only calcifications. These files should be combined in order to have balanced and complete test data.

Pre-processing

The dataset consists of negative images from the DDSM dataset and positive images from the CBIS-DDSM dataset. The data was pre-processed to convert it into 299x299 images.

The negative (DDSM) images were tiled into 598x598 tiles, which were then resized to 299x299.

The positive (CBIS-DDSM) images had their ROIs extracted using the masks with a small amount of padding to provide context. Each ROI was then randomly cropped three times into 598x598 images, with random flips and rotations, and then the images were resized down to 299x299.

The images are labeled with two labels:

  1. label_normal - 0 for negative and 1 for positive
  2. label - full multi-class labels, 0 is negative, 1 is benign calcification, 2 is benign mass, 3 is malignant calcification, 4 is malignant mass

The following Python code will decode the training examples:

   features = tf.parse_single_example(
        serialized_example,
        features={
            'label': tf.FixedLenFeature([], tf.int64),
            'label_normal': tf.FixedLenFeature([], tf.int64),
            'image': tf.FixedLenFeature([], tf.string)
        })

    # extract the data
    label = features['label_normal']
    image = tf.decode_raw(features['image'], tf.uint8)

    # reshape and scale the image
    image = tf.reshape(image, [299, 299, 1])

The training examples do include images which contain content other than breast tissue, such as black background and occasionally overlay text.

Inspiration

Previous work [5] has already dealt with classifying pre-identified lesions, this dataset was created with the intention of classifying raw scans as positive or negative by detecting abnormalities. The ability to automatically detect lesions could save many lives.

Acknowledgements

[1] The Digital Database for Screening Mammography, Michael Heath, Kevin Bowyer, Daniel Kopans, Richard Moore and W. Philip Kegelmeyer, in Proceedings of the Fifth International Workshop on Digital Mammography, M.J. Yaffe, ed., 212-218, Medical Physics Publishing, 2001. ISBN 1-930524-00-5.

[2] Current status of the Digital Database for Screening Mammography, Michael Heath, Kevin Bowyer, Daniel Kopans, W. Philip Kegelmeyer, Richard Moore, Kyong Chang, and S. Munish Kumaran, in Digital Mammography, 457-460, Kluwer Academic Publishers, 1998; Proceedings of the Fourth International Workshop on Digital Mammography.

[3] Rebecca Sawyer Lee, Francisco Gimenez, Assaf Hoogi , Daniel Rubin (2016). Curated Breast Imaging Subset of DDSM. The Cancer Imaging Archive.

[4] Clark K, Vendt B, Smith K, Freymann J, Kirby J, Koppel P, Moore S, Phillips S, Maffitt D, Pringle M, Tarbox L, Prior F. The Cancer Imaging Archive (TCIA): Maintaining and Operating a Public Information Repository, Journal of Digital Imaging, Volume 26, Number 6, December, 2013, pp 1045-1057.

[5] D. Levy, A. Jain, Breast Mass Classification from Mammograms using Deep Convolutional Neural Networks, arXiv:1612.00542v1, 2016

Search
Clear search
Close search
Google apps
Main menu