An Empirical Comparison of Machine Learning Algorithms for Predicting Breast Cancer

According to recent statistics, breast cancer is one of the most prevalent cancers among women worldwide. It accounts for a large share of new cancer cases and cancer-related deaths among women. Early diagnosis is critical, because the disease can become fatal unless it is detected and treated at an early stage. Recent advances in artificial intelligence and machine learning (ML) offer great potential for diagnosing breast cancer from structured data. In this paper, we conduct an empirical comparison of 10 popular machine learning models for breast cancer prediction. We used the well-known Wisconsin Breast Cancer Dataset (WBCD) to train the models and employed advanced accuracy metrics for comparison. Experimental results show that all models achieve high accuracy, with Support Vector Machines (SVM) performing slightly better than the other methods. Logistic Regression, K-Nearest Neighbors, and Neural Networks also proved to be strong classifiers for predicting breast cancer.
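The abstract does not name the metrics used for the comparison. As a minimal sketch, assuming the usual confusion-matrix metrics for a binary benign/malignant task (accuracy, precision, recall, specificity, F1), they could be computed as follows. The function name `binary_metrics` and the label vectors are hypothetical and chosen for illustration; they are not data or results from the paper.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute confusion-matrix metrics for a binary classifier.

    Assumed encoding (not taken from the paper): 1 = malignant, 0 = benign.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(num: float, den: float) -> float:
        # Return 0.0 instead of dividing by zero when a class is absent.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # also called sensitivity
    return {
        "accuracy": ratio(tp + tn, len(y_true)),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),
    }


# Made-up labels for illustration only; these are not results from the paper.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)

for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

In a diagnostic setting, recall (sensitivity) is often examined alongside accuracy, since a missed malignant case is typically more costly than a false alarm.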

___

Bilge International Journal of Science and Technology Research
  • ISSN: 2651-401X
  • Publication frequency: 2 issues per year
  • Established: 2017
  • Publisher: Kutbilge Akademisyenler Derneği

An Empirical Comparison of Machine Learning Algorithms for Predicting Breast Cancer

Fatih BASCİFTCİ, Hamit Taner ÜNAL
