100+ results found
  1. Breast Cancer Wisconsin (Diagnostic) Data Set

    • www.kaggle.com
    Updated Sep 25, 2016
  2. Breast Cancer Wisconsin

    • data.world
    Updated Oct 9, 2019
  3. Breast Cancer

    • data.world
    Updated Sep 18, 2019
  4. Breast Cancer Prediction Dataset

    • www.kaggle.com
    Updated Sep 26, 2018
  5. Replication Data for: Wisconsin Breast Cancer Diagnostic

    • dataverse.harvard.edu
    Updated Apr 6, 2016
  6. Breast Cancer Proteomes

    • www.kaggle.com
    Updated Jul 3, 2016
  7. Gene Expression Profiles of Breast Cancer

    • data.mendeley.com
    • search.datacite.org
    Updated Dec 21, 2017
  8. Data from: BreCaHAD: A Dataset for Breast Cancer Histopathological...

    • figshare.com
    Updated Jan 28, 2019
  9. Breast Cancer WI (Diagnostic)

    • data.world
    Updated May 22, 2018
  10. NKI Breast Cancer Data

    • data.world
    Updated Oct 10, 2019
  11. Breast Cancer Dataset

    • omictools.com
    Updated Jul 1, 2016
  12. Five human breast cancer microarray gene expression datasets

    • researchdata.ands.org.au
    Published 2011
  13. Incidence of breast cancer(all)

    • data.gov.uk
    • data.wu.ac.at
    Updated Feb 9, 2010
  14. Replication Data for: Ljubljana Breast Cancer

    • dataverse.harvard.edu
    Updated Apr 6, 2016
  15. Breast cancer data from TCGA for machine learning exercise

    • figshare.com
    Updated Jan 19, 2016
  16. Data from: Progression free survival in Iraqi breast cancer patients treated...

    • dataverse.harvard.edu
    Updated Dec 28, 2018
  17. Mortality from breast cancer in females (CCGOIS 1.20)

    • data.gov.uk
    • data.wu.ac.at
    Updated Jul 2, 2019
  18. Breast Cancer Quality Performance Indicators

    • data.gov.uk
    • data.wu.ac.at
    Updated Apr 29, 2014
  19. Years of Life Lost (YLL): Breast cancer

    • data.gov.uk
    • data.wu.ac.at
    Updated Feb 9, 2010
  20. MAQC-II Project: human breast cancer (BR) data set

    • omictools.com
    • www.omicsdi.org
    Updated Aug 10, 2018
  21. Breast cancer in England

    • data.gov.uk
    • data.wu.ac.at
    Updated Sep 1, 2013
  22. Data from: Human breast cancer associated fibroblasts exhibit subtype...

    • omictools.com
    Updated Aug 13, 2018
  23. Data from: Characterization of individual foci of multicentric/multifocal...

    • ega-archive.org
  24. Breast cancer statistics

    • www.ons.gov.uk
    Published Nov 19, 2015
  25. CT images and radiotherapy treatment planning of patients with breast...

    • data.mendeley.com
    • mendeley.figshare.com
    Updated Jun 13, 2017
  26. A Physical Mechanism and Global Quantification of Breast Cancer

    • figshare.com
    Updated Sep 28, 2016
  27. BREAST CANCER ALTERNATIVES INC, fiscal year ending Dec. 2016

    • projects.propublica.org
  28. Incidence of and mortality from breast cancer in England 2004 to 2017

    • www.ons.gov.uk
    Published Dec 19, 2018
  29. Breast Cancer Wisconsin (Diagnostic)

    • data.world
    Updated Sep 22, 2019
  30. Breast cancer and cervical cancer screenings

    • data.europa.eu
    Updated Oct 11, 2019
  31. Primary breast cancer: ESMO Clinical Practice Guidelines for diagnosis,...

    • www.researchgate.net
    Published Oct 1, 2013
  32. Breast Cancer Screening, Borough

    • datahub.io
    • data.wu.ac.at
    Updated Sep 26, 2015
  33. Breast Cancer Subclassification

    • researchdata.ands.org.au
    Published Feb 15, 2013
  34. Plasma metabolic fingerprint for breast cancer (MS) - part I

    • www.metabolomicsworkbench.org
    • www.omicsdi.org
  35. Data from: Modifiable patient-related barriers and their association with...

    • datadryad.org
  36. Duke Breast Cancer Dataset

    • www.kaggle.com
    Updated Mar 25, 2018
  37. Breast Cancer Tissue Bank Sample and Image Collection

    • researchdata.ands.org.au
    Published Sep 24, 2010
  38. Data from: Genomic Interaction Profiles in Breast Cancer Reveal Altered...

    • omictools.com
    • datamed.org
    Updated Nov 14, 2017
  39. Epigenome analysis of tumor adjacent normal tissue from breast cancer...

    • omictools.com
    • www.omicsdi.org
    Updated Jul 25, 2018
  40. Breast cancer deaths, England and Wales, 1980 to 1989 registrations

    • www.ons.gov.uk
    Published Mar 10, 2017
  41. Data from: CYP2D6*4 polymorphisms and breast cancer risk

    • www.researchgate.net
    Published Jan 1, 2010
  42. Premature deaths and age-standardised mortality rates from breast cancer by...

    • www.ons.gov.uk
    Published Sep 6, 2018
  43. Breast cancer: Mortality rate

    • data.gov.uk
    • data.wu.ac.at
    Updated Feb 9, 2010
  44. Data from: Comparative Study on Breast Cancer

    • www.researchgate.net
    Published Dec 31, 2013
  45. RRST-Health Science Association between Antioxidant Enzymes and Breast...

    • www.researchgate.net
    Published Apr 17, 2014
  46. BIOGRID CURATED DATA FOR PUBLICATION: Apigenin inhibits...

    • thebiogrid.org
  47. Long-term Breast Cancer Survival - England and Wales

    • www.ons.gov.uk
    Published Oct 30, 2014
  48. Data from: Outcome of breast cancer in Moroccan young women correlated to...

    • datadryad.org
  49. Expression Data from transNOAH breast cancer trial

    • www.omicsdi.org
    • omictools.com
  50. Number of people diagnosed with Lobular Breast Cancer, England, 2006 to 2016...

    • www.ons.gov.uk
    Published Apr 2, 2019
  51. Breast Cancer Tissue Bank Blood Samples

    • researchdata.ands.org.au
    Published Sep 24, 2010
  52. Breast Cancer Wisconsin (Prognostic)

    • data.world
    Updated Sep 14, 2019
  53. Diabetes and Breast Cancer Subtypes

    • figshare.com
    Updated Feb 1, 2017
  54. Growth Factor Stimulation Induces a Distinct ERalpha Cistrome Underlying...

    • dataverse.harvard.edu
    Updated Oct 18, 2010
  55. Health Status: Breast Cancer Ratios, 1986 to 1995

    • open.canada.ca
    • data.amerigeoss.org
    Updated Jan 26, 2017
  56. A microarray meta-dataset of breast cancer

    • www.omicsdi.org
  57. Quality of life in Indonesian women suspected of breast cancer and the...

    • dataverse.nl
    Updated Jul 3, 2018
  58. Genetic mechanisms of resistance to chemotherapy in breast cancer

    • ega-archive.org
    • www.omicsdi.org
  59. Data from: The role of vitamin D in therapy of breast cancer

    • www.researchgate.net
    Published Oct 9, 2013
  60. Data_Sheet_1_EYA2 Correlates With Clinico-Pathological Features of Breast...

    • figshare.com
    Updated Jan 29, 2019
  61. Data from: Computer Model Challenges Breast Cancer Treatment Strategy

    • www.researchgate.net
    Published Jan 1, 1994
  62. FinHer Breast Cancer Study

    • ega-archive.org
    • www.omicsdi.org
  63. Her2/Neu breast cancer mouse model transcriptome

    • www.omicsdi.org
    • omictools.com
  64. Raw BRCA1/2 variants in breast cancer patients and healthy relatives...

    • zenodo.org
    • figshare.com
    • +1 more
    Published Dec 21, 2016
  65. Examining the Pathogenesis of Breast Cancer Using a Novel Agent-Based Model...

    • figshare.com
    Updated Dec 2, 2015
  66. BREAST CANCER RELIEF FOUNDATION, fiscal year ending Sept. 2011

    • projects.propublica.org
  67. Combining Gene Signatures Improves Prediction of Breast Cancer Survival

    • figshare.com
    Updated Jan 18, 2016
  68. Breast Cancer -Very young women with ER+ tumor

    • ega-archive.org
  69. Health Status: Breast Cancer Rates, 1986 to 1995

    • open.canada.ca
    • data.amerigeoss.org
    • +1 more
    Updated Jan 26, 2017
  70. Data from: MCF-7 breast cancer cells selected for tamoxifen resistance...

    • www.researchgate.net
    Published Aug 25, 2015
  71. BREAST CANCER PREVENTION INSTITUTE, fiscal year ending Dec. 2015

    • projects.propublica.org
  72. RNASeq data from primary breast cancer clinical study pre- and post- two...

    • data.mendeley.com
    Updated Oct 10, 2018
  73. Breast cancer sequential sampling study

    • ega-archive.org
    • www.omicsdi.org
  74. Supplementary Material for: Let-7d Inhibits Growth and Metastasis in Breast...

    • figshare.com
    Updated Jul 5, 2018
  75. Data from: Understanding the Community Interest of Breast Cancer in...

    • data.mendeley.com
    Updated Dec 3, 2018
  76. Data from: Prospective study of Outcomes in Sporadic versus Hereditary...

    • www.researchgate.net
    Published Jan 1, 2007
  77. Computational Prediction and Analysis of Breast Cancer Targets for...

    • figshare.com
    Updated Jan 15, 2016
  78. STROMAL EXPRESSION OF CD10 IN BREAST CANCER- A NEW PROGNOSTIC MARKER

    • data.mendeley.com
    Updated Nov 17, 2017
  79. Patient Data of Breast Cancer

    • data.mendeley.com
    • figshare.com
    Updated Jan 18, 2017
  80. Data from: High-throughput adaptive sampling for whole-slide histopathology...

    • datadryad.org
  81. Cancer screening coverage - breast cancer (% eligible women screened...

    • data.gov.uk
    • data.wu.ac.at
    Updated Mar 18, 2015
  82. BREAST CANCER ACTION, fiscal year ending June 2017

    • projects.propublica.org
  83. Data for: Adherence to adjuvant endocrine therapy in women with breast...

    • data.mendeley.com
    Updated Dec 30, 2017
  84. Breast cancer incidence rates for Cumbria, 2012 to 2014

    • www.ons.gov.uk
    Published Dec 20, 2016
  85. breast cancer dataset from breakhis

    • www.kaggle.com
    Updated May 10, 2019
  86. Gene expression profiles in breast cancer: breast cancer tissues vs. normal...

    • omictools.com
    • datamed.org
    Updated Oct 11, 2016
  87. BCTB Invasive Breast Cancer Samples and Images

    • researchdata.ands.org.au
    Published Sep 24, 2010
  88. Quality of Life Social Environment Indicator - Incidence of Breast Cancer...

    • open.canada.ca
    • data.wu.ac.at
    Updated Jan 26, 2017
  89. Genomic evolution of breast cancer metastasis and relapse - Yates et al.

    • data.mendeley.com
    • mendeley.figshare.com
    Updated Jun 28, 2017
  90. BIOGRID CURATED DATA FOR PUBLICATION: Proliferative role of TRAF4 in breast...

    • thebiogrid.org
  91. 2 3 Targets for the Action of Phytoestrogens in Breast Cancer—Focus on...

    • www.researchgate.net
    Published Mar 22, 2014
  92. Breast cancer endothelial interaction

    • omictools.com
    • www.omicsdi.org
    • +1 more
    Updated Mar 22, 2012
  93. Breast Cancer St Vincents VIC database, BioGrid Australia Ltd

    • researchdata.ands.org.au
    Published Jun 21, 2012
  94. BIOGRID CURATED DATA FOR PUBLICATION: The oncogenic STP axis promotes...

    • thebiogrid.org
  95. Tumor-associated stroma derived from primary clinical breast cancer samples...

    • omictools.com
    • www.omicsdi.org
    Updated Dec 6, 2012
  96. Vascular properties of breast cancer intrinsic subtypes

    • omictools.com
    • datamed.org
    Updated Jul 17, 2015
  97. Data from: Prediction of lymph node involvement in breast cancer from...

    • omictools.com
    • www.omicsdi.org
    Updated Oct 29, 2018
  98. Probing the metabolic phenotype of breast cancer cells by multiple tracer...

    • www.metabolomicsworkbench.org
    • www.omicsdi.org
  99. Chemokine expressions in breast cancer patients

    • omictools.com
    • www.omicsdi.org
    Updated Jun 30, 2017
  100. Number of Cancer Surgeries (Volume) Performed in California Hospitals

    • data.chhs.ca.gov
    Updated Oct 4, 2019

Breast Cancer Proteomes

Dividing breast cancer patients into separate sub-classes

51 scholarly articles cite this dataset (View in Google Scholar)
  • Dataset updated Jul 3, 2016
Authors
kajot
License
Unknown
Available download formats from providers
zip (12440701 bytes), csv (6674 bytes), csv (18637 bytes)
Description

Context: This data set contains published iTRAQ proteome profiling of 77 breast cancer samples generated by the Clinical Proteomic Tumor Analysis Consortium (NCI/NIH). It contains expression values for ~12,000 proteins for each sample, with missing values where a given protein could not be quantified in a given sample.

Content:

File: 77_cancer_proteomes_CPTAC_itraq.csv

  • RefSeq_accession_number: RefSeq protein ID (each protein has a unique ID in a RefSeq database)
  • gene_symbol: a symbol unique to each gene (every protein is encoded by some gene)
  • gene_name: the full name of that gene
  • Remaining columns: log2 iTRAQ ratios for each sample (the protein expression data, the most important part); the last three columns are from healthy individuals
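
A minimal sketch of reading this layout with Python's standard library, assuming the three annotation columns come first and that empty or `NA` cells mark missing values. The inline sample rows, the sample column names, and the `parse_proteome` helper are hypothetical stand-ins, not values from the real file.

```python
import csv
import io

# Hypothetical stand-in for 77_cancer_proteomes_CPTAC_itraq.csv (made-up values
# and made-up sample column names; only the three annotation names are from the
# dataset description).
SAMPLE_CSV = """RefSeq_accession_number,gene_symbol,gene_name,sample_A,sample_B,healthy_1
NP_000001,GENE1,example gene one,0.52,-1.10,0.03
NP_000002,GENE2,example gene two,,0.75,NA
"""

ANNOTATION_COLUMNS = ("RefSeq_accession_number", "gene_symbol", "gene_name")


def to_float(cell):
    """Return the log2 ratio as a float, or None when the value is missing."""
    cell = cell.strip()
    # ASSUMPTION: missing values appear as empty cells or NA/NaN markers.
    if cell == "" or cell.upper() in ("NA", "NAN"):
        return None
    return float(cell)


def parse_proteome(handle):
    """Map each RefSeq protein ID to a {sample_name: log2_ratio_or_None} dict."""
    reader = csv.DictReader(handle)
    sample_columns = [c for c in reader.fieldnames if c not in ANNOTATION_COLUMNS]
    proteins = {}
    for row in reader:
        proteins[row["RefSeq_accession_number"]] = {
            name: to_float(row[name]) for name in sample_columns
        }
    return sample_columns, proteins


samples, proteins = parse_proteome(io.StringIO(SAMPLE_CSV))
missing = sum(v is None for values in proteins.values() for v in values.values())
print(samples, missing)
```

Keeping missing values as `None` rather than dropping rows leaves the choice of imputation or filtering to the later analysis step.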

File: clinical_data_breast_cancer.csv

The first column, "Complete TCGA ID", is used to match the sample IDs in the main cancer proteomes file (see example script). All other columns have self-explanatory names and contain data about the cancer classification of a given sample using different methods. The 'PAM50 mRNA' classification is used in the example script.
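
The matching step could look roughly like the sketch below. The two files may not spell the IDs identically, so both normalization rules here (`clinical_key` and `sample_key`) are assumptions for illustration only, and all IDs and subtype values are made up; check the real column names before relying on either rule.

```python
import csv
import io

# Hypothetical stand-in for clinical_data_breast_cancer.csv (made-up rows; the
# two column names are the ones mentioned in the dataset description).
CLINICAL_CSV = """Complete TCGA ID,PAM50 mRNA
TCGA-XX-0001,Luminal A
TCGA-XX-0002,Basal-like
"""

# Hypothetical sample column names from the proteome file.
sample_columns = ["XX-0001.01TCGA", "XX-0002.01TCGA", "XX-9999.01TCGA"]


def clinical_key(tcga_id):
    # ASSUMPTION: the clinical ID carries a leading "TCGA-" that the proteome
    # column names do not.
    prefix = "TCGA-"
    return tcga_id[len(prefix):] if tcga_id.startswith(prefix) else tcga_id


def sample_key(column_name):
    # ASSUMPTION: the patient part of a sample column precedes the first dot.
    return column_name.split(".")[0]


subtype_by_key = {
    clinical_key(row["Complete TCGA ID"]): row["PAM50 mRNA"]
    for row in csv.DictReader(io.StringIO(CLINICAL_CSV))
}

subtype_by_sample = {}
unmatched = []
for column in sample_columns:
    subtype = subtype_by_key.get(sample_key(column))
    if subtype is None:
        unmatched.append(column)  # e.g. healthy controls or samples without clinical rows
    else:
        subtype_by_sample[column] = subtype

print(subtype_by_sample, unmatched)
```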

File: PAM50_proteins.csv

Contains the list of genes and proteins used by the PAM50 classification system. The column RefSeqProteinID contains the protein IDs that can be matched with the IDs in the main protein expression data set.
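
As a sketch of that matching, the proteome can be restricted to the PAM50 proteins with a set lookup on `RefSeqProteinID`. The `GeneSymbol` column name, the protein IDs, and the `proteome` dictionary below are hypothetical placeholders rather than real contents of either file.

```python
import csv
import io

# Hypothetical stand-in for PAM50_proteins.csv. Only RefSeqProteinID is named in
# the dataset description; GeneSymbol is an assumed column name.
PAM50_CSV = """GeneSymbol,RefSeqProteinID
GENE1,NP_000001
GENE3,NP_000003
"""

# Hypothetical proteome: RefSeq protein ID -> log2 ratios per sample.
proteome = {
    "NP_000001": [0.52, -1.10],
    "NP_000002": [None, 0.75],
}

pam50_ids = {row["RefSeqProteinID"] for row in csv.DictReader(io.StringIO(PAM50_CSV))}

# Keep only proteins that belong to the PAM50 panel.
pam50_subset = {pid: values for pid, values in proteome.items() if pid in pam50_ids}

# Panel proteins that were not quantified at all in the proteome file.
not_measured = sorted(pam50_ids - set(proteome))

print(pam50_subset, not_measured)
```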

Past Research: The original study: http://www.nature.com/nature/journal/v534/n7605/full/nature18003.html (paywall warning)

In brief: the data were used to assess how mutations in the DNA affect the protein expression landscape in breast cancer. Genes in our DNA are first transcribed into RNA molecules, which are then translated into proteins. Changing the information content of DNA affects the behavior of the proteome, which is the main functional unit of cells and takes care of cell division, DNA repair, enzymatic reactions, signaling, and so on. The authors performed K-means clustering on the protein data to divide the breast cancer patients into subtypes, each with a unique protein expression signature. They found that the best clustering was achieved with 3 clusters (the original PAM50 gene set yields four different subtypes using RNA data).
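
To make the clustering step concrete, here is a minimal pure-Python sketch of K-means (Lloyd's algorithm) with 3 clusters. It is not the code from the original study or the linked example script; the toy `patients` matrix is invented, assumed to contain no missing values, and the evenly spaced initial centroids are a simplification chosen for reproducibility.

```python
import math


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    # Deterministic, evenly spaced initial centroids (an assumption for
    # reproducibility; random restarts or k-means++ are common in practice).
    centroids = [list(points[i * len(points) // k]) for i in range(k)]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy data: 9 hypothetical patients x 2 hypothetical protein log2 ratios.
patients = [
    (0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
    (5.0, 5.0), (5.2, 4.9), (4.9, 5.1),
    (0.0, 5.0), (0.1, 5.2), (-0.1, 4.9),
]

labels, centroids = kmeans(patients, k=3)
print(labels)
```

With real data, each row would be one patient and each column one protein, and missing values would need to be imputed or the affected proteins dropped before computing distances.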

Inspiration:

This is an interesting study, and I wanted to use this breast cancer proteome data set for other types of machine learning analyses that I am performing as part of my PhD. However, I thought that the Kaggle community (or at least the part of it with biomedical interests) would enjoy playing with it. I added a simple K-means clustering example for the data with some comments, using the same approach as the original paper. One thing to note is that there is a panel of genes, the PAM50, which is used to classify breast cancers into subtypes. This panel was originally based on RNA expression data, which is (in my opinion) not as robust as measuring mRNA's final product, the protein. Perhaps, using this data set, someone could find a different set of proteins (they all have unique NP_/XP_ identifiers) that would divide the data set even more robustly? Perhaps into a higher number of clusters with very distinct protein expression signatures?

Example K-means analysis script: http://pastebin.com/A0Wj41DP
