100+ datasets found
  1. Data from: Breast Cancer Wisconsin

    • data.world
    csv, zip
    Updated Apr 11, 2024
    Cite
    Health (2024). Breast Cancer Wisconsin [Dataset]. https://data.world/health/breast-cancer-wisconsin
    Explore at:
    Available download formats: csv, zip
    Dataset updated
    Apr 11, 2024
    Authors
    Health
    Area covered
    Wisconsin
    Description

    Source:

    Creators:

    1. Dr. William H. Wolberg, General Surgery Dept.
      University of Wisconsin, Clinical Sciences Center
      Madison, WI 53792
      wolberg '@' eagle.surgery.wisc.edu

    2. W. Nick Street, Computer Sciences Dept.
      University of Wisconsin, 1210 West Dayton St., Madison, WI 53706
      street '@' cs.wisc.edu 608-262-6619

    3. Olvi L. Mangasarian, Computer Sciences Dept.
      University of Wisconsin, 1210 West Dayton St., Madison, WI 53706
      olvi '@' cs.wisc.edu

    Donor: Nick Street

    Source: UCI - Machine Learning Repository

  2. Wisconsin Breast Cancer Dataset

    • kaggle.com
    Updated Jun 18, 2018
    Cite
    Ana M. (2018). Wisconsin Breast Cancer Dataset [Dataset]. https://www.kaggle.com/datasets/anacoder1/wisc-bc-data
    Explore at:
    Dataset updated
    Jun 18, 2018
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Ana M.
    Area covered
    Wisconsin
    Description

    This data was donated by researchers at the University of Wisconsin and includes measurements from digitized images of fine-needle aspirates of breast masses.

    You can find the dataset at https://github.com/dataspelunking/MLwR/blob/master/Machine%20Learning%20with%20R%20(2nd%20Ed.)/Chapter%2003/wisc_bc_data.csv.

    The breast cancer data includes 569 examples of cancer biopsies, each with 32 features. One feature is an identification number, another is the cancer diagnosis and 30 are numeric-valued laboratory measurements. The diagnosis is coded as "M" to indicate malignant or "B" to indicate benign.

    The other 30 numeric measurements comprise the mean, standard error, and worst (i.e. largest) value for 10 different characteristics of the digitized cell nuclei, which are as follows:

    • Radius
    • Texture
    • Perimeter
    • Area
    • Smoothness
    • Compactness
    • Concavity
    • Concave Points
    • Symmetry
    • Fractal dimension
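The 569-sample, 30-feature diagnostic data described above also ships with scikit-learn, so its shape can be checked without downloading the CSV. A minimal sketch, assuming scikit-learn is installed (note that the bundled copy drops the id column and encodes the M/B diagnosis as 0/1):

```python
# Minimal sketch: load the diagnostic Wisconsin breast cancer data
# bundled with scikit-learn (same data as the CSV described above,
# minus the id column; diagnosis is encoded as 0 = malignant, 1 = benign).
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
X, y = data.data, data.target
print(X.shape)                   # (569, 30)
print(list(data.target_names))   # ['malignant', 'benign']
```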
  3. SEER Breast Cancer Data

    • ieee-dataport.org
    • zenodo.org
    • +1 more
    Updated Jan 18, 2019
    Cite
    JING TENG (2019). SEER Breast Cancer Data [Dataset]. http://doi.org/10.21227/a9qy-ph35
    Explore at:
    Dataset updated
    Jan 18, 2019
    Dataset provided by
    Institute of Electrical and Electronics Engineers (http://www.ieee.ro/)
    Authors
    JING TENG
    Description

    This dataset of breast cancer patients was obtained from the November 2017 update of the SEER Program of the NCI, which provides information on population-based cancer statistics. The dataset involved female patients with infiltrating duct and lobular carcinoma breast cancer (SEER primary sites, recode NOS histology codes 8522/3) diagnosed in 2006-2010. Patients with unknown tumor size, examined regional LNs, or regional positive LNs, and patients whose survival was less than 1 month, were excluded; thus, 4024 patients were ultimately included.
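The exclusion rules in the description (unknown tumor size or nodal fields, survival under one month) amount to a simple row filter. A sketch with hypothetical column names and illustrative rows, since the actual SEER export uses its own field labels:

```python
# Sketch of the exclusion criteria described above. Column names and rows
# are hypothetical; the real SEER extract uses its own field labels.
import pandas as pd

df = pd.DataFrame({
    "tumor_size": [22.0, None, 15.0],
    "regional_nodes_examined": [10.0, 4.0, None],
    "survival_months": [60, 12, 0.5],
})
# Drop rows with unknown tumor size or nodal counts, then require
# at least one month of follow-up.
kept = df.dropna(subset=["tumor_size", "regional_nodes_examined"])
kept = kept[kept["survival_months"] >= 1]
print(len(kept))  # 1 of the 3 illustrative rows survives the filters
```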

  4. Breast Cancer Wisconsin (Diagnostic)

    • data.world
    csv, zip
    Updated Apr 10, 2024
    + more versions
    Cite
    UCI (2024). Breast Cancer Wisconsin (Diagnostic) [Dataset]. https://data.world/uci/breast-cancer-wisconsin-diagnostic
    Explore at:
    Available download formats: csv, zip
    Dataset updated
    Apr 10, 2024
    Dataset provided by
    data.world, Inc.
    Authors
    UCI
    Area covered
    Wisconsin
    Description

    Source:

    Creators:
    1. Dr. William H. Wolberg, General Surgery Dept., University of Wisconsin, Clinical Sciences Center, Madison, WI 53792, wolberg '@' eagle.surgery.wisc.edu
    2. W. Nick Street, Computer Sciences Dept., University of Wisconsin, 1210 West Dayton St., Madison, WI 53706, street '@' cs.wisc.edu, 608-262-6619
    3. Olvi L. Mangasarian, Computer Sciences Dept., University of Wisconsin, 1210 West Dayton St., Madison, WI 53706, olvi '@' cs.wisc.edu

    Donor:
    Nick Street

    Data Set Information:

    Features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. They describe characteristics of the cell nuclei present in the image. A few of the images can be found at
    The separating plane described above was obtained using the Multisurface Method-Tree (MSM-T) [K. P. Bennett, "Decision Tree Construction Via Linear Programming." Proceedings of the 4th Midwest Artificial Intelligence and Cognitive Science Society, pp. 97-101, 1992], a classification method which uses linear programming to construct a decision tree. Relevant features were selected using an exhaustive search in the space of 1-4 features and 1-3 separating planes.
    The actual linear program used to obtain the separating plane in the 3-dimensional space is that described in: [K. P. Bennett and O. L. Mangasarian: "Robust Linear Programming Discrimination of Two Linearly Inseparable Sets", Optimization Methods and Software 1, 1992, 23-34].
    This database is also available through the UW CS ftp server: ftp ftp.cs.wisc.edu, cd math-prog/cpo-dataset/machine-learn/WDBC/

    Attribute Information:

    1) ID number
    2) Diagnosis (M = malignant, B = benign)
    3-32)

    Ten real-valued features are computed for each cell nucleus:
    a) radius (mean of distances from center to points on the perimeter)
    b) texture (standard deviation of gray-scale values)
    c) perimeter
    d) area
    e) smoothness (local variation in radius lengths)
    f) compactness (perimeter^2 / area - 1.0)
    g) concavity (severity of concave portions of the contour)
    h) concave points (number of concave portions of the contour)
    i) symmetry
    j) fractal dimension ("coastline approximation" - 1)
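Since the raw wdbc.data file has no header row, the attribute list above implies a naming scheme: the mean, standard error, and worst value of each of the ten features. A sketch generating illustrative column names (the names themselves are our own convention, not part of the file):

```python
# Sketch: build illustrative column names for the 32 attributes described
# above. The raw UCI file stores all ten means first, then the standard
# errors, then the "worst" values; the names here are an assumed convention.
features = ["radius", "texture", "perimeter", "area", "smoothness",
            "compactness", "concavity", "concave_points", "symmetry",
            "fractal_dimension"]
stats = ["mean", "se", "worst"]

columns = ["id", "diagnosis"] + [f"{s}_{f}" for s in stats for f in features]
print(len(columns))  # 32 = id + diagnosis + 30 measurements
```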

    Relevant Papers:

    First Usage:
    W.N. Street, W.H. Wolberg and O.L. Mangasarian. Nuclear feature extraction for breast tumor diagnosis. IS&T/SPIE 1993 International Symposium on Electronic Imaging: Science and Technology, volume 1905, pages 861-870, San Jose, CA, 1993.

    O.L. Mangasarian, W.N. Street and W.H. Wolberg. Breast cancer diagnosis and prognosis via linear programming. Operations Research, 43(4), pages 570-577, July-August 1995. Medical literature:

    W.H. Wolberg, W.N. Street, and O.L. Mangasarian. Machine learning techniques to diagnose breast cancer from fine-needle aspirates. Cancer Letters 77 (1994) 163-171.

    W.H. Wolberg, W.N. Street, and O.L. Mangasarian. Image analysis and machine learning applied to breast cancer diagnosis and prognosis. Analytical and Quantitative Cytology and Histology, Vol. 17 No. 2, pages 77-87, April 1995.

    W.H. Wolberg, W.N. Street, D.M. Heisey, and O.L. Mangasarian. Computerized breast cancer diagnosis and prognosis from fine needle aspirates. Archives of Surgery 1995;130:511-516.

    W.H. Wolberg, W.N. Street, D.M. Heisey, and O.L. Mangasarian. Computer-derived nuclear features distinguish malignant from benign breast cytology. Human Pathology, 26:792-796, 1995.

    Papers That Cite This Data Set [1]:

    • Gavin Brown. Diversity in Neural Network Ensembles. The University of Birmingham. 2004.
      • Hussein A. Abbass. An evolutionary artificial neural networks approach for breast cancer diagnosis. Artificial Intelligence in Medicine, 25. 2002.
      • Baback Moghaddam and Gregory Shakhnarovich. Boosted Dyadic Kernel Discriminants. NIPS. 2002.
      • Krzysztof Grabczewski and Włodzisław Duch. Heterogeneous Forests of Decision Trees. ICANN. 2002.
      • András Antos and Balázs Kégl and Tamás Linder and Gábor Lugosi. Data-dependent margin-based generalization bounds for classification. Journal of Machine Learning Research, 3. 2002.
      • Kristin P. Bennett and Ayhan Demiriz and Richard Maclin. Exploiting unlabeled data in ensemble methods. KDD. 2002.
      • Robert Burbidge and Matthew Trotter and Bernard F. Buxton and Sean B. Holden. STAR - Sparsity through Automated Rejection. IWANN (1). 2001.
      • Nikunj C. Oza and Stuart J. Russell. Experimental comparisons of online and batch versions of bagging and boosting. KDD. 2001.
      • Yuh-Jeng Lee. Smooth Support Vector Machines. Preliminary Thesis Proposal Computer Sciences Department University of Wisconsin. 2000.
      • Justin Bradley and Kristin P. Bennett and Bennett A. Demiriz. Constrained K-Means Clustering. Microsoft Research Dept. of Mathematical Sciences One Microsoft Way Dept. of Decision Sciences and Eng. Sys. 2000.
      • Lorne Mason and Peter L. Bartlett and Jonathan Baxter. Improved Generalization Through Explicit Optimization of Margins. Machine Learning, 38. 2000.
      • P. S. Bradley and K. P. Bennett and A. Demiriz. Constrained K-Means Clustering. Microsoft Research Dept. of Mathematical Sciences One Microsoft Way Dept. of Decision Sciences and Eng. Sys. 2000.
      • Endre Boros and Peter Hammer and Toshihide Ibaraki and Alexander Kogan and Eddy Mayoraz and Ilya B. Muchnik. An Implementation of Logical Analysis of Data. IEEE Trans. Knowl. Data Eng, 12. 2000.
      • Chun-Nan Hsu and Hilmar Schuschel and Ya-Ting Yang. The ANNIGMA-Wrapper Approach to Neural Nets Feature Selection for Knowledge Discovery and Data Mining. Institute of Information Science. 1999.
      • Lorne Mason and Peter L. Bartlett and Jonathan Baxter. Direct Optimization of Margins Improves Generalization in Combined Classifiers. NIPS. 1998.
      • W. Nick Street. A Neural Network Model for Prognostic Prediction. ICML. 1998.
      • Yk Huhtala and Juha Kärkkäinen and Pasi Porkka and Hannu Toivonen. Efficient Discovery of Functional and Approximate Dependencies Using Partitions. ICDE. 1998.
      • Huan Liu and Hiroshi Motoda and Manoranjan Dash. A Monotonic Measure for Optimal Feature Selection. ECML. 1998.
      • Kristin P. Bennett and Erin J. Bredensteiner. A Parametric Optimization Method for Machine Learning. INFORMS Journal on Computing, 9. 1997.
      • Rudy Setiono and Huan Liu. NeuroLinear: From neural networks to oblique decision rules. Neurocomputing, 17. 1997.
      • . Prototype Selection for Composite Nearest Neighbor Classifiers. Department of Computer Science University of Massachusetts. 1997.
      • Jennifer A. Blue and Kristin P. Bennett. Hybrid Extreme Point Tabu Search. Department of Mathematical Sciences Rensselaer Polytechnic Institute. 1996.
      • Erin J. Bredensteiner and Kristin P. Bennett. Feature Minimization within Decision Trees. National Science Foundation. 1996.
      • Ismail Taha and Joydeep Ghosh. Characterization of the Wisconsin Breast cancer Database Using a Hybrid Symbolic-Connectionist System. Proceedings of ANNIE. 1996.
      • Geoffrey I. Webb. OPUS: An Efficient Admissible Algorithm for Unordered Search. J. Artif. Intell. Res. (JAIR), 3. 1995.
      • Rudy Setiono. Extracting M-of-N Rules from Trained Neural Networks. School of Computing National University of Singapore.
      • Jarkko Salojarvi and Samuel Kaski and Janne Sinkkonen. Discriminative clustering in Fisher metrics. Neural Networks Research Centre Helsinki University of Technology.
      • Włodzisław Duch and Rafał Adamczak and Krzysztof Grabczewski and Grzegorz Zal. A hybrid method for extraction of logical rules from data. Department of Computer Methods, Nicholas Copernicus University.
      • Charles Campbell and Nello Cristianini. Simple Learning Algorithms for Training Support Vector Machines. Dept. of Engineering Mathematics.
      • Chotirat Ann and Dimitrios Gunopulos. Scaling up the Naive Bayesian Classifier: Using Decision Trees for Feature Selection. Computer Science Department University of California.
      • Włodzisław Duch and Rudy Setiono and Jacek M. Zurada. Computational intelligence methods for rule-based data understanding.
      • Rafael S. Parpinelli and Heitor S. Lopes and Alex Alves Freitas. An Ant Colony Based System for Data Mining: Applications to Medical Data. CEFET-PR, CPGEI Av. Sete de Setembro, 3165.
      • Włodzisław Duch and Rafał Adamczak. Statistical methods for construction of neural networks. Department of Computer Methods, Nicholas Copernicus University.
      • Rafael S. Parpinelli and Heitor S. Lopes and Alex Alves Freitas. PART FOUR: ANT COLONY OPTIMIZATION AND IMMUNE SYSTEMS Chapter X An Ant Colony Algorithm for Classification Rule Discovery. CEFET-PR, Curitiba.
      • Adam H. Cannon and Lenore J. Cowen and Carey E. Priebe. Approximate Distance Classification. Department of Mathematical Sciences The Johns Hopkins University.
      • Andrew I. Schein and Lyle H. Ungar. A-Optimality for Active Learning of Logistic Regression Classifiers. Department of Computer and Information Science Levine Hall.
      • Bart Baesens and Stijn Viaene and Tony Van Gestel and J. A. K Suykens and Guido Dedene and Bart De Moor and Jan Vanthienen and Katholieke Universiteit Leuven. An Empirical Assessment of Kernel Type Performance for Least Squares Support Vector Machine Classifiers. Dept. Applied Economic Sciences.
      • Adil M. Bagirov and Alex Rubinov and A. N. Soukhojak and John Yearwood. Unsupervised and supervised data classification via nonsmooth and global optimization. School of Information Technology and Mathematical Sciences, The University of Ballarat.
      • Rudy Setiono and Huan Liu. Neural-Network Feature Selector. Department of Information Systems and Computer Science National University of Singapore.
      • Huan Liu. A Family of Efficient Rule Generators. Department of Information Systems and Computer Science National University of Singapore.

    Citation Request:

    Please refer to the Machine Learning Repository's citation policy. [1] Papers were automatically harvested and associated with this data set, in collaboration with

  5. Breast Cancer

    • data.world
    csv, zip
    Updated Apr 15, 2024
    Cite
    UCI (2024). Breast Cancer [Dataset]. https://data.world/uci/breast-cancer
    Explore at:
    Available download formats: csv, zip
    Dataset updated
    Apr 15, 2024
    Dataset provided by
    data.world, Inc.
    Authors
    UCI
    Description

    Source:

    Creators: Matjaz Zwitter & Milan Soklic (physicians)
    Institute of Oncology University Medical Center
    Ljubljana, Yugoslavia

    Donors:
    Ming Tan and Jeff Schlimmer (Jeffrey.Schlimmer '@' a.gp.cs.cmu.edu)

    Data Set Information:

    This is one of three domains provided by the Oncology Institute that has repeatedly appeared in the machine learning literature. (See also lymphography and primary-tumor.)
    This data set includes 201 instances of one class and 85 instances of another class. The instances are described by 9 attributes, some of which are linear and some are nominal.

    Attribute Information:

    1. Class: no-recurrence-events, recurrence-events
    2. age: 10-19, 20-29, 30-39, 40-49, 50-59, 60-69, 70-79, 80-89, 90-99.
    3. menopause: lt40, ge40, premeno.
    4. tumor-size: 0-4, 5-9, 10-14, 15-19, 20-24, 25-29, 30-34, 35-39, 40-44, 45-49, 50-54, 55-59.
    5. inv-nodes: 0-2, 3-5, 6-8, 9-11, 12-14, 15-17, 18-20, 21-23, 24-26, 27-29, 30-32, 33-35, 36-39.
    6. node-caps: yes, no.
    7. deg-malig: 1, 2, 3.
    8. breast: left, right.
    9. breast-quad: left-up, left-low, right-up, right-low, central.
    10. irradiat: yes, no.
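The raw breast-cancer.data file is comma-separated with no header, so the attribute list above doubles as a column schema. A minimal parsing sketch with pandas; the column names and the two sample rows are illustrative, not taken from the file:

```python
# Sketch: parse the nominal attributes listed above with pandas.
# Column names follow the attribute list; the sample rows are illustrative.
import io
import pandas as pd

columns = ["class", "age", "menopause", "tumor-size", "inv-nodes",
           "node-caps", "deg-malig", "breast", "breast-quad", "irradiat"]

sample = io.StringIO(
    "no-recurrence-events,30-39,premeno,30-34,0-2,no,3,left,left-low,no\n"
    "recurrence-events,40-49,ge40,20-24,0-2,no,2,right,right-up,no\n"
)
df = pd.read_csv(sample, names=columns)
print(df["class"].value_counts())
```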

    Relevant Papers:

    Michalski,R.S., Mozetic,I., Hong,J., & Lavrac,N. (1986). The Multi-Purpose Incremental Learning System AQ15 and its Testing Application to Three Medical Domains. In Proceedings of the Fifth National Conference on Artificial Intelligence, 1041-1045, Philadelphia, PA: Morgan Kaufmann.
    Clark,P. & Niblett,T. (1987). Induction in Noisy Domains. In Progress in Machine Learning (from the Proceedings of the 2nd European Working Session on Learning), 11-30, Bled, Yugoslavia: Sigma Press.
    Tan, M., & Eshelman, L. (1988). Using weighted networks to represent classification knowledge in noisy domains. Proceedings of the Fifth International Conference on Machine Learning, 121-134, Ann Arbor, MI.
    Cestnik, G., Kononenko, I., & Bratko, I. (1987). Assistant-86: A Knowledge-Elicitation Tool for Sophisticated Users. In I. Bratko & N. Lavrac (Eds.) Progress in Machine Learning, 31-45, Sigma Press.

    Papers That Cite This Data Set [1]:

    • Igor Fischer and Jan Poland. Amplifying the Block Matrix Structure for Spectral Clustering. Telecommunications Lab. 2005.
      • Saher Esmeir and Shaul Markovitch. Lookahead-based algorithms for anytime induction of decision trees. ICML. 2004.
      • Gavin Brown. Diversity in Neural Network Ensembles. The University of Birmingham. 2004.
      • Kaizhu Huang and Haiqin Yang and Irwin King and Michael R. Lyu and Laiwan Chan. Biased Minimax Probability Machine for Medical Diagnosis. AMAI. 2004.
      • Qingping Tao Ph. D. MAKING EFFICIENT LEARNING ALGORITHMS WITH EXPONENTIALLY MANY FEATURES. Qingping Tao A DISSERTATION Faculty of The Graduate College University of Nebraska In Partial Fulfillment of Requirements. 2004.
      • Krzysztof Grabczewski and Włodzisław Duch. Heterogeneous Forests of Decision Trees. ICANN. 2002.
      • Hussein A. Abbass. An evolutionary artificial neural networks approach for breast cancer diagnosis. Artificial Intelligence in Medicine, 25. 2002.
      • Fei Sha and Lawrence K. Saul and Daniel D. Lee. Multiplicative Updates for Nonnegative Quadratic Programming in Support Vector Machines. NIPS. 2002.
      • Kristin P. Bennett and Ayhan Demiriz and Richard Maclin. Exploiting unlabeled data in ensemble methods. KDD. 2002.
      • Baback Moghaddam and Gregory Shakhnarovich. Boosted Dyadic Kernel Discriminants. NIPS. 2002.
      • András Antos and Balázs Kégl and Tamás Linder and Gábor Lugosi. Data-dependent margin-based generalization bounds for classification. Journal of Machine Learning Research, 3. 2002.
      • Michael G. Madden. Evaluation of the Performance of the Markov Blanket Bayesian Classifier Algorithm. CoRR, csLG/0211003. 2002.
      • Yongmei Wang and Ian H. Witten. Modeling for Optimal Probability Prediction. ICML. 2002.
      • Remco R. Bouckaert. Accuracy bounds for ensembles under 0-1 loss. Xtal Mountain Information Technology & Computer Science Department, University of Waikato. 2002.
      • Nikunj C. Oza and Stuart J. Russell. Experimental comparisons of online and batch versions of bagging and boosting. KDD. 2001.
      • Bernhard Pfahringer and Geoffrey Holmes and Richard Kirkby. Optimizing the Induction of Alternating Decision Trees. PAKDD. 2001.
      • Robert Burbidge and Matthew Trotter and Bernard F. Buxton and Sean B. Holden. STAR - Sparsity through Automated Rejection. IWANN (1). 2001.
      • Bernhard Pfahringer and Geoffrey Holmes and Gabi Schmidberger. Wrapping Boosters against Noise. Australian Joint Conference on Artificial Intelligence. 2001.
      • W. Nick Street and Yoo-Hyon Kim. A streaming ensemble algorithm (SEA) for large-scale classification. KDD. 2001.
      • Lorne Mason and Peter L. Bartlett and Jonathan Baxter. Improved Generalization Through Explicit Optimization of Margins. Machine Learning, 38. 2000.
      • Endre Boros and Peter Hammer and Toshihide Ibaraki and Alexander Kogan and Eddy Mayoraz and Ilya B. Muchnik. An Implementation of Logical Analysis of Data. IEEE Trans. Knowl. Data Eng, 12. 2000.
      • P. S. Bradley and K. P. Bennett and A. Demiriz. Constrained K-Means Clustering. Microsoft Research Dept. of Mathematical Sciences One Microsoft Way Dept. of Decision Sciences and Eng. Sys. 2000.
      • Sally A. Goldman and Yan Zhou. Enhancing Supervised Learning with Unlabeled Data. ICML. 2000.
      • Justin Bradley and Kristin P. Bennett and Bennett A. Demiriz. Constrained K-Means Clustering. Microsoft Research Dept. of Mathematical Sciences One Microsoft Way Dept. of Decision Sciences and Eng. Sys. 2000.
      • Yuh-Jeng Lee. Smooth Support Vector Machines. Preliminary Thesis Proposal Computer Sciences Department University of Wisconsin. 2000.
      • Petri Kontkanen and Petri Myllym and Tomi Silander and Henry Tirri and Peter Gr. On predictive distributions and Bayesian networks. Department of Computer Science, Stanford University. 2000.
      • Kristin P. Bennett and Ayhan Demiriz and John Shawe-Taylor. A Column Generation Algorithm For Boosting. ICML. 2000.
      • Matthew Mullin and Rahul Sukthankar. Complete Cross-Validation for Nearest Neighbor Classifiers. ICML. 2000.
      • David W. Opitz and Richard Maclin. Popular Ensemble Methods: An Empirical Study. J. Artif. Intell. Res. (JAIR), 11. 1999.
      • Chun-Nan Hsu and Hilmar Schuschel and Ya-Ting Yang. The ANNIGMA-Wrapper Approach to Neural Nets Feature Selection for Knowledge Discovery and Data Mining. Institute of Information Science. 1999.
      • David M J Tax and Robert P W Duin. Support vector domain description. Pattern Recognition Letters, 20. 1999.
      • Kai Ming Ting and Ian H. Witten. Issues in Stacked Generalization. J. Artif. Intell. Res. (JAIR), 10. 1999.
      • Ismail Taha and Joydeep Ghosh. Symbolic Interpretation of Artificial Neural Networks. IEEE Trans. Knowl. Data Eng, 11. 1999.
      • Lorne Mason and Jonathan Baxter and Peter L. Bartlett and Marcus Frean. Boosting Algorithms as Gradient Descent. NIPS. 1999.
      • Iñaki Inza and Pedro Larrañaga and Basilio Sierra and Ramon Etxeberria and Jose Antonio Lozano and Jos Manuel Peña. Representing the behaviour of supervised classification learning algorithms by Bayesian networks. Pattern Recognition Letters, 20. 1999.
      • Lorne Mason and Peter L. Bartlett and Jonathan Baxter. Direct Optimization of Margins Improves Generalization in Combined Classifiers. NIPS. 1998.
      • Richard Maclin. Boosting Classifiers Regionally. AAAI/IAAI. 1998.
      • Huan Liu and Hiroshi Motoda and Manoranjan Dash. A Monotonic Measure for Optimal Feature Selection. ECML. 1998.
      • Yk Huhtala and Juha Kärkkäinen and Pasi Porkka and Hannu Toivonen. Efficient Discovery of Functional and Approximate Dependencies Using Partitions. ICDE. 1998.
      • W. Nick Street. A Neural Network Model for Prognostic Prediction. ICML. 1998.
      • Kristin P. Bennett and Erin J. Bredensteiner. A Parametric Optimization Method for Machine Learning. INFORMS Journal on Computing, 9. 1997.
      • Pedro Domingos. Control-Sensitive Feature Selection for Lazy Learners. Artif. Intell. Rev, 11. 1997.
      • Rudy Setiono and Huan Liu. NeuroLinear: From neural networks to oblique decision rules. Neurocomputing, 17. 1997.
      • . Prototype Selection for Composite Nearest Neighbor Classifiers. Department of Computer Science University of Massachusetts. 1997.
      • Erin J. Bredensteiner and Kristin P. Bennett. Feature Minimization within Decision Trees. National Science Foundation. 1996.
      • Ismail Taha and Joydeep Ghosh. Characterization of the Wisconsin Breast cancer Database Using a Hybrid Symbolic-Connectionist System. Proceedings of ANNIE. 1996.
      • Kamal Ali and Michael J. Pazzani. Error Reduction through Learning Multiple Descriptions. Machine Learning, 24. 1996.
      • Jennifer A. Blue and Kristin P. Bennett. Hybrid Extreme Point Tabu Search. Department of Mathematical Sciences Rensselaer Polytechnic Institute. 1996.
      • Pedro Domingos. Unifying Instance-Based and Rule-Based Induction. Machine Learning, 24. 1996.
      • Geoffrey I. Webb. OPUS: An Efficient Admissible Algorithm for Unordered Search. J. Artif. Intell. Res. (JAIR), 3. 1995.
      • Christophe Giraud and Tony Martinez and Christophe G. Giraud-Carrier. University of Bristol Department of Computer Science ILA: Combining Inductive Learning with Prior Knowledge and Reasoning. 1995.
      • Ron Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. IJCAI. 1995.
      • M. A. Galway and Michael G. Madden. DEPARTMENT OF INFORMATION TECHNOLOGY technical report NUIG-IT-011002 Evaluation of the Performance of the Markov Blanket Bayesian Classifier Algorithm. Department of Information Technology National University of Ireland, Galway.
      • John G. Cleary and Leonard E. Trigg. Experiences with OB1, An Optimal Bayes Decision Tree Learner. Department of Computer Science University of Waikato.
      • Włodzisław Duch and Rafał Adamczak. Statistical methods for construction of neural networks. Department of Computer Methods, Nicholas Copernicus University.
      • Rong-En Fan and P. -H Chen
  6. Breast Cancer Wisconsin - benign or malignant

    • kaggle.com
    Updated Jul 20, 2020
    Cite
    Ninja Coding (2020). Breast Cancer Wisconsin - benign or malignant [Dataset]. https://www.kaggle.com/datasets/ninjacoding/breast-cancer-wisconsin-benign-or-malignant
    Explore at:
    Dataset updated
    Jul 20, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Ninja Coding
    Description

    Context

    ML-based applications increasingly draw on real-time patient data from healthcare systems in multiple countries, improving the efficacy of treatment options that were previously unavailable. This data set is about predicting whether cancer cells are benign or malignant.

    Content

    Information about attributes:

    There are 10 integer-valued attributes, plus the predicted class:

    • Sample code number: id number
    • Clump Thickness: 1 - 10
    • Uniformity of Cell Size: 1 - 10
    • Uniformity of Cell Shape: 1 - 10
    • Marginal Adhesion: 1 - 10
    • Single Epithelial Cell Size: 1 - 10
    • Bare Nuclei: 1 - 10
    • Bland Chromatin: 1 - 10
    • Normal Nucleoli: 1 - 10
    • Mitoses: 1 - 10
    • Predicted class: 2 for benign, 4 for malignant
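Recoding the 2/4 class labels into readable categories is a common first step. A minimal sketch on an illustrative column (the benign/malignant strings follow the class description above):

```python
# Sketch: map the 2 (benign) / 4 (malignant) class codes described above
# to readable labels. The small DataFrame is illustrative data.
import pandas as pd

df = pd.DataFrame({"predicted_class": [2, 4, 2, 2, 4]})
df["diagnosis"] = df["predicted_class"].map({2: "benign", 4: "malignant"})
print(df["diagnosis"].tolist())
```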

    Acknowledgements

    This data set (the Original Wisconsin Breast Cancer Database) is taken from the UCI Machine Learning Repository.

    Inspiration

    This is the first data set I am sharing on Kaggle. It would be a great pleasure if you find it useful for developing your own model. I hope this simple data set helps beginners build their own classification models and learn how to make them even better.

  7. Breast Cancer Dataset

    • kaggle.com
    zip
    Updated Jul 15, 2021
    + more versions
    Cite
    Rithik Kotha (2021). Breast Cancer Dataset [Dataset]. https://www.kaggle.com/rithikkotha/breast-cancer-dataset
    Explore at:
    Available download formats: zip (49094 bytes)
    Dataset updated
    Jul 15, 2021
    Authors
    Rithik Kotha
    Description

    Dataset

    This dataset was created by Rithik Kotha

    Contents

  8. Breast Cancer Wisconsin (Original)

    • data.world
    csv, zip
    Updated Apr 9, 2024
    Cite
    UCI (2024). Breast Cancer Wisconsin (Original) [Dataset]. https://data.world/uci/breast-cancer-wisconsin-original
    Explore at:
    Available download formats: zip, csv
    Dataset updated
    Apr 9, 2024
    Authors
    UCI
    Area covered
    Wisconsin
    Description

    Source:

    Creator:
    Dr. William H. Wolberg (physician)
    University of Wisconsin Hospitals
    Madison, Wisconsin, USA

    Donor:
    Olvi Mangasarian (mangasarian '@' cs.wisc.edu)
    Received by David W. Aha (aha '@' cs.jhu.edu)

    Data Set Information:

    Samples arrive periodically as Dr. Wolberg reports his clinical cases. The database therefore reflects this chronological grouping of the data. This grouping information appears immediately below, having been removed from the data itself:
    Group 1: 367 instances (January 1989)
    Group 2: 70 instances (October 1989)
    Group 3: 31 instances (February 1990)
    Group 4: 17 instances (April 1990)
    Group 5: 48 instances (August 1990)
    Group 6: 49 instances (Updated January 1991)
    Group 7: 31 instances (June 1991)

    Group 8: 86 instances (November 1991)

    Total: 699 points (as of the donated database on 15 July 1992)

    Note that the results summarized above in Past Usage refer to a dataset of size 369, while Group 1 has only 367 instances. This is because it originally contained 369 instances; 2 were removed. The following statements summarize changes to the original Group 1 set of data:

    Group 1 : 367 points: 200B 167M (January 1989)
    Revised Jan 10, 1991: Replaced zero bare nuclei in 1080185 & 1187805
    Revised Nov 22, 1991: Removed 765878,4,5,9,7,10,10,10,3,8,1 no record
    : Removed 484201,2,7,8,8,4,3,10,3,4,1 zero epithelial
    : Changed 0 to 1 in field 6 of sample 1219406
    : Changed 0 to 1 in field 8 of following sample:
    : 1182404,2,3,1,1,1,2,0,1,1,1

    Attribute Information:

    1. Sample code number: id number
    2. Clump Thickness: 1 - 10
    3. Uniformity of Cell Size: 1 - 10
    4. Uniformity of Cell Shape: 1 - 10
    5. Marginal Adhesion: 1 - 10
    6. Single Epithelial Cell Size: 1 - 10
    7. Bare Nuclei: 1 - 10
    8. Bland Chromatin: 1 - 10
    9. Normal Nucleoli: 1 - 10
    10. Mitoses: 1 - 10
    11. Class: (2 for benign, 4 for malignant)
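The raw breast-cancer-wisconsin.data file has no header row and encodes missing Bare Nuclei values as "?". A minimal reading sketch with pandas; the column names are our own convention and the two rows are illustrative:

```python
# Sketch: read the eleven attributes listed above, treating "?" as NaN.
# Column names are an illustrative convention; the file has no header.
import io
import pandas as pd

columns = ["id", "clump_thickness", "cell_size_uniformity",
           "cell_shape_uniformity", "marginal_adhesion",
           "epithelial_cell_size", "bare_nuclei", "bland_chromatin",
           "normal_nucleoli", "mitoses", "class"]

sample = io.StringIO(
    "1000025,5,1,1,1,2,1,3,1,1,2\n"
    "1057013,8,4,5,1,2,?,7,3,1,4\n"
)
df = pd.read_csv(sample, names=columns, na_values="?")
print(int(df["bare_nuclei"].isna().sum()))  # 1 missing value in this sample
```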

    Relevant Papers:

    Wolberg, W.H., & Mangasarian, O.L. (1990). Multisurface method of pattern separation for medical diagnosis applied to breast cytology. In Proceedings of the National Academy of Sciences, 87, 9193-9196.

    Zhang, J. (1992). Selecting typical instances in instance-based learning. In Proceedings of the Ninth International Machine Learning Conference (pp. 470-479). Aberdeen, Scotland: Morgan Kaufmann.

    Papers That Cite This Data Set:

    • Gavin Brown. Diversity in Neural Network Ensembles. The University of Birmingham. 2004.
    • Krzysztof Grabczewski and Włodzisław Duch. Heterogeneous Forests of Decision Trees. ICANN. 2002.
    • András Antos and Balázs Kégl and Tamás Linder and Gábor Lugosi. Data-dependent margin-based generalization bounds for classification. Journal of Machine Learning Research, 3. 2002.
    • Kristin P. Bennett and Ayhan Demiriz and Richard Maclin. Exploiting unlabeled data in ensemble methods. KDD. 2002.
    • Hussein A. Abbass. An evolutionary artificial neural networks approach for breast cancer diagnosis. Artificial Intelligence in Medicine, 25. 2002.
    • Baback Moghaddam and Gregory Shakhnarovich. Boosted Dyadic Kernel Discriminants. NIPS. 2002.
    • Robert Burbidge and Matthew Trotter and Bernard F. Buxton and Sean B. Holden. STAR - Sparsity through Automated Rejection. IWANN (1). 2001.
    • Nikunj C. Oza and Stuart J. Russell. Experimental comparisons of online and batch versions of bagging and boosting. KDD. 2001.
    • Yuh-Jeng Lee. Smooth Support Vector Machines. Preliminary Thesis Proposal Computer Sciences Department University of Wisconsin. 2000.
    • Justin Bradley and Kristin P. Bennett and Bennett A. Demiriz. Constrained K-Means Clustering. Microsoft Research Dept. of Mathematical Sciences One Microsoft Way Dept. of Decision Sciences and Eng. Sys. 2000.
    • Lorne Mason and Peter L. Bartlett and Jonathan Baxter. Improved Generalization Through Explicit Optimization of Margins. Machine Learning, 38. 2000.
    • P. S. Bradley and K. P. Bennett and A. Demiriz. Constrained K-Means Clustering. Microsoft Research Dept. of Mathematical Sciences One Microsoft Way Dept. of Decision Sciences and Eng. Sys. 2000.
    • Endre Boros and Peter Hammer and Toshihide Ibaraki and Alexander Kogan and Eddy Mayoraz and Ilya B. Muchnik. An Implementation of Logical Analysis of Data. IEEE Trans. Knowl. Data Eng, 12. 2000.
    • Chun-Nan Hsu and Hilmar Schuschel and Ya-Ting Yang. The ANNIGMA-Wrapper Approach to Neural Nets Feature Selection for Knowledge Discovery and Data Mining. Institute of Information Science. 1999.
    • Huan Liu and Hiroshi Motoda and Manoranjan Dash. A Monotonic Measure for Optimal Feature Selection. ECML. 1998.
    • Lorne Mason and Peter L. Bartlett and Jonathan Baxter. Direct Optimization of Margins Improves Generalization in Combined Classifiers. NIPS. 1998.
    • W. Nick Street. A Neural Network Model for Prognostic Prediction. ICML. 1998.
    • Yk Huhtala and Juha Kärkkäinen and Pasi Porkka and Hannu Toivonen. Efficient Discovery of Functional and Approximate Dependencies Using Partitions. ICDE. 1998.
    • Prototype Selection for Composite Nearest Neighbor Classifiers. Department of Computer Science University of Massachusetts. 1997.
    • Kristin P. Bennett and Erin J. Bredensteiner. A Parametric Optimization Method for Machine Learning. INFORMS Journal on Computing, 9. 1997.
    • Rudy Setiono and Huan Liu. NeuroLinear: From neural networks to oblique decision rules. Neurocomputing, 17. 1997.
    • Erin J. Bredensteiner and Kristin P. Bennett. Feature Minimization within Decision Trees. National Science Foundation. 1996.
    • Ismail Taha and Joydeep Ghosh. Characterization of the Wisconsin Breast cancer Database Using a Hybrid Symbolic-Connectionist System. Proceedings of ANNIE. 1996.
    • Jennifer A. Blue and Kristin P. Bennett. Hybrid Extreme Point Tabu Search. Department of Mathematical Sciences Rensselaer Polytechnic Institute. 1996.
    • Geoffrey I. Webb. OPUS: An Efficient Admissible Algorithm for Unordered Search. J. Artif. Intell. Res. (JAIR), 3. 1995.
    • Włodzisław Duch and Rudy Setiono and Jacek M. Zurada. Computational intelligence methods for rule-based data understanding.
    • Rafael S. Parpinelli and Heitor S. Lopes and Alex Alves Freitas. An Ant Colony Based System for Data Mining: Applications to Medical Data. CEFET-PR, CPGEI Av. Sete de Setembro, 3165.
    • Włodzisław Duch and Rafał Adamczak. Statistical methods for construction of neural networks. Department of Computer Methods, Nicholas Copernicus University.
    • Rafael S. Parpinelli and Heitor S. Lopes and Alex Alves Freitas. PART FOUR: ANT COLONY OPTIMIZATION AND IMMUNE SYSTEMS Chapter X An Ant Colony Algorithm for Classification Rule Discovery. CEFET-PR, Curitiba.
    • Adam H. Cannon and Lenore J. Cowen and Carey E. Priebe. Approximate Distance Classification. Department of Mathematical Sciences The Johns Hopkins University.
    • Andrew I. Schein and Lyle H. Ungar. A-Optimality for Active Learning of Logistic Regression Classifiers. Department of Computer and Information Science Levine Hall.
    • Bart Baesens and Stijn Viaene and Tony Van Gestel and J. A. K Suykens and Guido Dedene and Bart De Moor and Jan Vanthienen and Katholieke Universiteit Leuven. An Empirical Assessment of Kernel Type Performance for Least Squares Support Vector Machine Classifiers. Dept. Applied Economic Sciences.
    • Adil M. Bagirov and Alex Rubinov and A. N. Soukhojak and John Yearwood. Unsupervised and supervised data classification via nonsmooth and global optimization. School of Information Technology and Mathematical Sciences, The University of Ballarat.
    • Rudy Setiono and Huan Liu. Neural-Network Feature Selector. Department of Information Systems and Computer Science National University of Singapore.
    • Huan Liu. A Family of Efficient Rule Generators. Department of Information Systems and Computer Science National University of Singapore.
    • Rudy Setiono. Extracting M-of-N Rules from Trained Neural Networks. School of Computing National University of Singapore.
    • Jarkko Salojarvi and Samuel Kaski and Janne Sinkkonen. Discriminative clustering in Fisher metrics. Neural Networks Research Centre Helsinki University of Technology.
    • Włodzisław Duch and Rafał Adamczak and Krzysztof Grąbczewski and Grzegorz Żal. A hybrid method for extraction of logical rules from data. Department of Computer Methods, Nicholas Copernicus University.
    • Charles Campbell and Nello Cristianini. Simple Learning Algorithms for Training Support Vector Machines. Dept. of Engineering Mathematics.
    • Chotirat Ann Ratanamahatana and Dimitrios Gunopulos. Scaling up the Naive Bayesian Classifier: Using Decision Trees for Feature Selection. Computer Science Department University of California.

    Citation Request:

    This breast cancer database was obtained from the University of Wisconsin Hospitals, Madison, from Dr. William H. Wolberg. If you publish results when using this database, please include this information in your acknowledgements. Also, please cite one or more of:

    1. O. L. Mangasarian and W. H. Wolberg: "Cancer diagnosis via linear programming", SIAM News, Volume 23, Number 5, September 1990, pp 1 & 18.
    2. William H. Wolberg and O. L. Mangasarian: "Multisurface method of pattern separation for medical diagnosis applied to breast cytology", Proceedings of the National Academy of Sciences, U.S.A., Volume 87, December 1990, pp 9193-9196.
    3. O. L. Mangasarian, R. Setiono, and W. H. Wolberg: "Pattern recognition via linear programming: Theory and application to medical diagnosis", in: "Large-scale numerical optimization", Thomas F. Coleman and Yuying Li, editors, SIAM Publications, Philadelphia 1990, pp 22-30.
    4. K. P. Bennett & O. L. Mangasarian: "Robust linear programming discrimination of two linearly inseparable sets", Optimization Methods and Software 1, 1992, 23-34 (Gordon & Breach Science

  9. BreakHis Dataset

    • paperswithcode.com
    Updated Sep 2, 2022
    + more versions
    Cite
    (2022). BreakHis Dataset [Dataset]. https://paperswithcode.com/dataset/breakhis
    Description

    The Breast Cancer Histopathological Image Classification (BreakHis) is composed of 9,109 microscopic images of breast tumor tissue collected from 82 patients using different magnifying factors (40X, 100X, 200X, and 400X). It contains 2,480 benign and 5,429 malignant samples (700X460 pixels, 3-channel RGB, 8-bit depth in each channel, PNG format). This database has been built in collaboration with the P&D Laboratory - Pathological Anatomy and Cytopathology, Parana, Brazil.

    Paper: F. A. Spanhol, L. S. Oliveira, C. Petitjean and L. Heutte, "A Dataset for Breast Cancer Histopathological Image Classification," in IEEE Transactions on Biomedical Engineering, vol. 63, no. 7, pp. 1455-1462, July 2016, doi: 10.1109/TBME.2015.2496264

  10. UCI Wisconsin Breast Cancer

    • kaggle.com
    Updated Jun 5, 2020
    Cite
    Steve Coffie (2020). UCI Wisconsin Breast Cancer [Dataset]. https://www.kaggle.com/datasets/stevecoffie/uci-wisconsin-breast-cancer
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Steve Coffie
    Area covered
    Wisconsin
    Description

    Context

    Diagnostic Wisconsin Breast Cancer Database


    Acknowledgements

    Features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. They describe characteristics of the cell nuclei present in the image. The actual linear program used to obtain the separating plane in the 3-dimensional space is that described in: [K. P. Bennett and O. L. Mangasarian: "Robust Linear Programming Discrimination of Two Linearly Inseparable Sets", Optimization Methods and Software 1, 1992, 23-34].

    This database is also available through the UW CS ftp server: ftp ftp.cs.wisc.edu cd math-prog/cpo-dataset/machine-learn/WDBC/

    Also can be found on UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Diagnostic%29

    Attribute Information:

    1) ID number
    2) Diagnosis (M = malignant, B = benign)
    3-32) Ten real-valued features are computed for each cell nucleus:

    a) radius (mean of distances from center to points on the perimeter)
    b) texture (standard deviation of gray-scale values)
    c) perimeter
    d) area
    e) smoothness (local variation in radius lengths)
    f) compactness (perimeter^2 / area - 1.0)
    g) concavity (severity of concave portions of the contour)
    h) concave points (number of concave portions of the contour)
    i) symmetry
    j) fractal dimension ("coastline approximation" - 1)

    The mean, standard error and "worst" or largest (mean of the three largest values) of these features were computed for each image, resulting in 30 features. For instance, field 3 is Mean Radius, field 13 is Radius SE, field 23 is Worst Radius.

    All feature values are recorded with four significant digits.

    Missing attribute values: none

    Class distribution: 357 benign, 212 malignant
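    The feature definitions above can be sanity-checked with a short sketch. The circle geometry below is purely illustrative (not drawn from the dataset), and it uses the compactness formula exactly as stated, perimeter^2 / area - 1.0:

    ```python
    import math

    # Illustrative nucleus boundary: points sampled on a perfect circle of radius 2.
    r = 2.0
    points = [(r * math.cos(t), r * math.sin(t))
              for t in (2 * math.pi * k / 360 for k in range(360))]

    # a) radius: mean of distances from the center to points on the perimeter.
    radius = sum(math.hypot(x, y) for x, y in points) / len(points)

    # c), d) perimeter and area of the ideal circle.
    perimeter = 2 * math.pi * r
    area = math.pi * r ** 2

    # f) compactness as defined above. For a perfect circle this reduces to
    # 4*pi - 1, since (2*pi*r)**2 / (pi*r**2) = 4*pi regardless of r.
    compactness = perimeter ** 2 / area - 1.0

    print(radius, compactness)
    ```

    Because the circle minimizes perimeter for a given area, any irregular contour scores a higher compactness than this baseline, which is what makes the feature informative.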

    Inspiration

    Using machine learning and AI to contribute to cancer treatment

  11. NKI Breast Cancer Data

    • data.world
    csv, zip
    Updated Apr 17, 2024
    Cite
    Devi Ramanan (2024). NKI Breast Cancer Data [Dataset]. https://data.world/deviramanan2016/nki-breast-cancer-data
    Authors
    Devi Ramanan
    Description

    272 breast cancer patients (as rows), 1570 columns. The network was built using only gene expression. Metadata includes patient info, treatment, and survival.

    Each node is a group of patients similar to each other. Flares (left) represent sub-populations that are distinct from the larger population. (One differentiating factor between the two flares is estrogen expression (low = top flare, high = bottom flare)). Bottom flare is a group of patients with 100% survival. Top flare shows a range of survival – very poor towards the tip (red), and very good near the base (circled).

    The circled group of good survivors has genetic indicators of poor survival (i.e. low ESR1 levels, typically a prognostic indicator of poor outcomes in breast cancer); understanding this group could be critical for helping improve mortality rates for this disease. Why this group survived was quickly analysed by using the Outcome Column (here Event Death, which is binary: 0, 1) as a Data Lens (which we term Supervised vs Unsupervised analyses).

    Published in 2 papers - Nature and PNAS:

    http://www.nature.com/articles/srep01236

    https://d1bp1ynq8xms31.cloudfront.net/wp-content/uploads/2015/02/Topology_Based_Data_Analysis_Identifies_a_Subgroup_of_Breast_Cancer_with_a_unique_mutational_profile_and_excellent_survival.pdf

  12. Gene Expression Profiles of Breast Cancer

    • data.mendeley.com
    Updated Dec 21, 2017
    + more versions
    Cite
    Haozhe Xie (2017). Gene Expression Profiles of Breast Cancer [Dataset]. http://doi.org/10.17632/v3cc2p38hb.1
    Authors
    Haozhe Xie
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The published dataset consists of four separate datasets:

    • BC-TCGA consists of 17,814 genes and 590 samples (including 61 normal tissue samples and 529 breast cancer tissue samples).
    • GSE2034 includes 12,634 genes and 286 breast cancer samples (including 107 recurrence tumor samples and 179 no recurrence samples).
    • GSE25066 has 492 breast cancer samples available (including 100 pathologic complete response (PCR) samples and 392 residual disease (RD) samples) and 12,634 genes.
    • Simulation Data includes 100 positive samples and 100 negative samples with 10,000 features, and each feature in SData follows normal distributions: N(0, 0.1) and N(0 ± r, 0.1) for positive and negative samples, respectively, where r ∈ [−0.125, 0.125].

    All of the datasets are used in the experiments in the paper (Comparison among dimensionality reduction techniques based on Random Projection for cancer classification, Xie et al., 2016).
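    The simulated dataset described above can be generated with a minimal stdlib sketch. Interpreting N(0, 0.1) as mean 0 and standard deviation 0.1 is an assumption, since the description does not say whether 0.1 is a variance or a standard deviation, and the per-feature shift r is assumed to be drawn uniformly from [-0.125, 0.125]:

    ```python
    import random

    random.seed(42)
    N_FEATURES = 10_000
    N_PER_CLASS = 100

    # Assumed: each feature j gets its own shift r drawn from [-0.125, 0.125].
    shifts = [random.uniform(-0.125, 0.125) for _ in range(N_FEATURES)]

    # Positive samples: every feature ~ N(0, 0.1).
    positive = [[random.gauss(0.0, 0.1) for _ in range(N_FEATURES)]
                for _ in range(N_PER_CLASS)]

    # Negative samples: feature j ~ N(shifts[j], 0.1), i.e. N(0 +/- r, 0.1).
    negative = [[random.gauss(shifts[j], 0.1) for j in range(N_FEATURES)]
                for _ in range(N_PER_CLASS)]
    ```

    With shifts this small relative to the noise, no single feature separates the classes well, which is what makes the simulation useful for testing dimensionality-reduction methods.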

  13. UCTH Breast Cancer Dataset - Dataset - B2FIND

    • b2find.dkrz.de
    Updated Oct 23, 2023
    + more versions
    Cite
    (2023). UCTH Breast Cancer Dataset - Dataset - B2FIND [Dataset]. https://b2find.dkrz.de/dataset/c0924699-9119-542e-9a52-f9b9293c23bb
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Research Hypothesis: This study hypothesizes that there are significant associations between the diagnostic characteristics of patients (age, menopause status, tumor size, presence of invasive nodes, affected breast, metastasis status, breast quadrant, and history of breast conditions) and their breast cancer diagnosis result.

    Data Collection and Description: The dataset of 213 patient observations was obtained from the University of Calabar Teaching Hospital cancer registry over 24 months (January 2019 to August 2021). The data includes eleven features: year of diagnosis, age, menopause status, tumor size in cm, number of invasive nodes, breast affected (left or right), metastasis (yes or no), quadrant of the breast affected, history of breast disease, and diagnosis result (benign or malignant).

    Notable Findings: Upon preliminary examination, the data shows variations in diagnosis results across different patient features. A noticeable trend is the higher prevalence of malignant results among patients with larger tumor sizes and the presence of invasive nodes. Additionally, postmenopausal women seem to have a higher rate of malignant diagnoses.

    Interpretation and Usage: The data can be analyzed using statistical and machine learning techniques to determine the strength and significance of associations between patient characteristics and breast cancer diagnosis, which can contribute to predictive modeling for the early detection and diagnosis of breast cancer. Interpretation must consider potential limitations, such as missing data or bias in data collection. Furthermore, the data reflects patients from a single hospital, limiting the generalizability of the findings to wider populations. The data could be valuable for healthcare professionals, researchers, or policymakers interested in understanding breast cancer diagnosis factors and improving healthcare strategies for breast cancer. It could also be used in patient education about risk factors associated with breast cancer.
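    As a toy illustration of the kind of association analysis described above, the odds ratio on a 2x2 contingency table measures how strongly one characteristic is associated with a malignant diagnosis. The counts below are invented for illustration, not drawn from the UCTH data:

    ```python
    # Hypothetical counts cross-tabulating tumor size with diagnosis result:
    #                 malignant  benign
    # large tumor        30        10
    # small tumor        15        45
    a, b = 30, 10   # large tumor: malignant, benign
    c, d = 15, 45   # small tumor: malignant, benign

    # Odds ratio: ratio of the odds of malignancy for large vs small tumors.
    # OR = (a/b) / (c/d) = (a*d) / (b*c)
    odds_ratio = (a * d) / (b * c)
    print(odds_ratio)  # 9.0
    ```

    An odds ratio well above 1 (here 9.0) indicates a positive association; a significance test such as chi-squared or Fisher's exact test would then assess whether it could be due to chance.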

  14. breast cancer dataset

    • zenodo.org
    Updated Aug 1, 2021
    Cite
    Amanda Elgoraish; Amanda Elgoraish; Ahmed Alnory; Ahmed Alnory (2021). breast cancer dataset [Dataset]. http://doi.org/10.5281/zenodo.5150469
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Amanda Elgoraish; Ahmed Alnory
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is a dataset of breast cancer patients.

  15. Global Breast Cancer Diagnostics Market – Industry Trends and Forecast to...

    • databridgemarketresearch.com
    Updated Feb 2023
    + more versions
    Cite
    Data Bridge Market Research (2023). Global Breast Cancer Diagnostics Market – Industry Trends and Forecast to 2030 [Dataset]. https://www.databridgemarketresearch.com/reports/global-breast-cancer-diagnostics-market
    Dataset authored and provided by
    Data Bridge Market Research
    License

    https://www.databridgemarketresearch.com/privacy-policy

    Time period covered
    2023 - 2030
    Area covered
    Global
    Description

    Forecast Period: 2023 to 2030

    Base Year: 2022

    Historic Years: 2021 (Customizable to 2020-2016)

    Quantitative Units: Revenue in USD Million, Volumes in Units, Pricing in USD

    Segments Covered: By Test Type (Imaging, Biopsy, Genomic Test, Blood Test, and Others), Type (Ductal In Situ Carcinoma, Invasive Ductal Carcinoma, Inflammatory Breast Cancer, and Metastatic Breast Cancer), End User (Hospitals, Clinics, Research and Academic Institutes, Diagnostic Centers, and Others), Distribution Channel (Direct Tender, Retail Sales, and Others).

    Countries Covered: U.S., Canada, Mexico, Germany, France, U.K., Italy, Russia, Spain, Netherlands, Switzerland, Belgium, Turkey, Ireland and Rest of Europe, China, Japan, India, Australia, South Korea, Singapore, Malaysia, Thailand, Indonesia, Philippines and Rest of Asia-Pacific, South Africa, Saudi Arabia, UAE, Egypt, Israel and Rest of Middle East and Africa, Brazil, Argentina, and Rest of South America.

    Market Players Covered: The major companies dealing in the market are F-Hoffmann La Roche Ltd., Siemens Healthcare GmbH, General Electric, Koninklijke Philips N.V., FUJIFILM Corporation, Abbott, Hologic, Inc., OncoStem, Provista Diagnostics, Thermo Fisher Scientific Inc., Myriad Genetics, Inc., Illumina, Inc., Bio-Rad Laboratories, Inc., BD, NanoString, Cepheid, BIOMÉRIEUX, Exact Sciences Corporation, Biocept, Inc., and Abacus ALS, among others.

  16. ‘Breast cancer dataset’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Sep 30, 2021
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘Breast cancer dataset ’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-breast-cancer-dataset-6b64/8cb0cbd8/?iid=005-819&v=presentation
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Breast cancer dataset ’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/ankitbarai507/breast-cancer-dataset on 30 September 2021.

    --- Dataset description provided by original source is as follows ---

    Acknowledgements

    This dataset is taken from the UCI Machine Learning Repository.

    Inspiration

    Create a classifier that can predict the risk of having breast cancer with routine parameters for early detection.

    --- Original source retains full ownership of the source dataset ---

  17. Breast-Cancer-Gene-Expression-Profiles--METABRIC-

    • kaggle.com
    Updated May 27, 2020
    Cite
    (2020). Breast-Cancer-Gene-Expression-Profiles--METABRIC- [Dataset]. https://www.kaggle.com/datasets/raghadalharbi/breast-cancer-gene-expression-profiles-metabric
    Available in the Croissant format (a format for machine-learning datasets; see mlcommons.org/croissant).
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    Clinical attributes, mRNA expression z-scores, and gene mutations for 1,904 patients
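    The mRNA z-scores mentioned above are standard score transforms: each expression value is expressed as the number of standard deviations from the mean. A minimal stdlib sketch (the expression values below are invented, and METABRIC's choice of reference population is not reproduced here):

    ```python
    import statistics

    # Hypothetical expression values for one gene across three samples.
    values = [5.0, 7.0, 9.0]

    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population standard deviation

    # z-score: how many standard deviations each value sits from the mean.
    z_scores = [(v - mean) / sd for v in values]
    print(z_scores)
    ```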

  18. Breast Cancer Dataset

    • datamed.org
    Cite
    Breast Cancer Dataset [Dataset]. https://datamed.org/display-item.php?repository=0008&idName=ID&id=5914e0695152c67771b39c91
    Description

    Signatures of Oncogenic Pathway Deregulation in Human Cancers. The ability to define cancer subtypes, recurrence of disease, and response to specific therapies using DNA microarray-based gene expression signatures has been demonstrated in multiple studies. Such data is also of substantial importance to the analysis of cellular signaling pathways central to the oncogenic process. With this focus, we have developed a series of gene expression signatures that reliably reflect the activation status of several oncogenic pathways. When evaluated in several large collections of human cancers, these gene expression signatures identify patterns of pathway deregulation in tumors and clinically relevant associations with disease outcomes. Combining signature-based predictions across several pathways identifies coordinated patterns of pathway deregulation that distinguish between specific cancers and tumor sub-types. Clustering tumors based on pathway signatures further defines prognosis in respective patient subsets, demonstrating that patterns of oncogenic pathway deregulation underlie the development of the oncogenic phenotype and reflect the biology and outcome of specific cancers. Furthermore, predictions of pathway deregulation in cancer cell lines are shown to coincide with sensitivity to therapeutic agents that target components of the pathway, underscoring the potential for such pathway prediction to guide the use of targeted therapeutics.

    Keywords: other. Overall design: RNA was extracted from frozen tissue of primary breast tumors for gene array analysis.

  19. ‘Breast Cancer Wisconsin - benign or malignant’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Sep 30, 2021
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘Breast Cancer Wisconsin - benign or malignant’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-breast-cancer-wisconsin-benign-or-malignant-ca20/daff23d1/?iid=042-127&v=presentation
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Breast Cancer Wisconsin - benign or malignant’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/ninjacoding/breast-cancer-wisconsin-benign-or-malignant on 30 September 2021.

    --- Dataset description provided by original source is as follows ---

    Context

    It is quite common to find ML-based applications embedded with real-time patient data available from different healthcare systems in multiple countries, thereby increasing the efficacy of new treatment options which were unavailable before. This data set is all about predicting whether the cancer cells are benign or malignant.

    Content

    Information about attributes:

    There are ten integer-valued attributes plus a predicted class:

    Sample code number: id number
    Clump Thickness: 1 - 10
    Uniformity of Cell Size: 1 - 10
    Uniformity of Cell Shape: 1 - 10
    Marginal Adhesion: 1 - 10
    Single Epithelial Cell Size: 1 - 10
    Bare Nuclei: 1 - 10
    Bland Chromatin: 1 - 10
    Normal Nucleoli: 1 - 10
    Mitoses: 1 - 10
    Predicted class: 2 for benign and 4 for malignant
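    A record in this attribute layout can be decoded with a short sketch. The sample row below is illustrative of the comma-separated format (ID, nine cytology scores, then the 2/4 class code):

    ```python
    # Illustrative row: id, nine 1-10 features, then class (2 = benign, 4 = malignant).
    row = "1000025,5,1,1,1,2,1,3,1,1,2"

    fields = row.split(",")
    sample_id = fields[0]
    features = [int(v) for v in fields[1:10]]                 # nine cytology scores
    diagnosis = {"2": "benign", "4": "malignant"}[fields[10]]  # decode class code
    print(sample_id, diagnosis)
    ```

    In the original Wisconsin database a few Bare Nuclei entries are recorded as "?", so code handling the real file should allow for missing values rather than calling int() unconditionally.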

    Acknowledgements

    This data set(Original Wisconsin Breast Cancer Database) is taken from UCI Machine Learning Repository.

    Inspiration

    This is the first data set I am sharing on Kaggle. It would be a great pleasure if you find it useful for developing your own model. Hopefully this simple data set will help beginners develop their own classification models and learn how to make them even better.

    --- Original source retains full ownership of the source dataset ---

  20. Breast Cancer Statistics – By Demographics, Region, Type and Mortality Rate

    • enterpriseappstoday.com
    Updated Mar 13, 2023
    Cite
    EnterpriseAppsToday (2023). Breast Cancer Statistics – By Demographics, Region, Type and Mortality Rate [Dataset]. https://www.enterpriseappstoday.com/stats/breast-cancer-statistics.html
    Dataset authored and provided by
    EnterpriseAppsToday
    License

    https://www.enterpriseappstoday.com/privacy-policy

    Time period covered
    2022 - 2032
    Area covered
    Global
    Description

    Introduction

    Breast Cancer Statistics: Unfortunately, cancer incidence is increasing globally due to unhealthy lifestyles and habits. Thanks to technology, however, life expectancy, early-stage diagnosis, and survival rates have increased as well. According to studies, breast cancer is the most common cancer in women due to various factors. Breast Cancer Awareness Month is observed globally every October, when various communities come together as support groups. These breast cancer statistics represent data collected from a global perspective. As responsible citizens of each country, we must take proper steps to keep cancer patients motivated toward a cheerful life.

    Editor’s Choice

    • Currently, there are around 3.5 million survivors in the United States of America.
    • The risk associated with getting diagnosed with breast cancer is around 95% in women aged 40 years and above.
    • Every year, around 220,000 women in the United States of America are at risk of getting diagnosed with invasive breast cancer.
    • Breast cancer is 100 times more common in women than men as stated by Breast cancer statistics.
    • Around the world, breast cancer has become the second leading cause of death among women.
    • 1 out of 1000 men is at risk of getting diagnosed with breast cancer.
    • The mortality rate due to breast cancer has been reduced because of technological inventions.
    • Around the world, 18.1 million people are cancer survivors.
    • Breast cancer statistics say that by the year 2032, the number of breast cancer survivors will increase by 24% resulting in 22.5 million survivors.