Breast Cancer Diagnosis with Ensemble Learning Methods

Breast cancer is one of the most common cancers among women worldwide and has become a major public health problem. Early diagnosis can significantly improve patients' chances of survival because it allows them to receive clinical treatment in time. Machine learning techniques can be used to develop diagnostic tools that support physicians in detecting breast cancer early. In this study, the Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Naive Bayes (NB), Decision Tree (DT) and Random Forest (RF) machine learning methods were compared by classification accuracy for breast cancer diagnosis. Bagging, boosting and voting ensemble methods were then applied to improve on the accuracy of the individual classifiers. For the voting ensemble, both majority-based (hard) voting and probability-based (soft) voting were used. For the boosting ensemble, the Adaptive Boosting (AdaBoost), Gradient Boosting and XGBoost algorithms were used. The experiments were conducted on the Wisconsin Diagnostic Breast Cancer (WDBC) dataset. Because features measured on different scales adversely affect modeling, the features were standardized as a preprocessing step. The ensemble methods were compared in terms of precision, recall, F-score and accuracy. The highest accuracy was obtained with the Soft Voting (weighted), Bagging (SVC), AdaBoost (SVC) and XGBoost ensemble methods.
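The preprocessing, voting, and evaluation steps named in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the function names (`standardize`, `hard_vote`, `soft_vote`, `binary_metrics`) are hypothetical, and the numbers in the usage lines are toy values chosen for illustration, not data or results from the study.

```python
from collections import Counter
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each feature column: subtract its mean, divide by its std."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # A constant column has zero deviation; fall back to 1.0 to avoid
    # dividing by zero (assumption made for this sketch).
    stds = [pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in rows
    ]


def hard_vote(predicted_labels):
    """Majority-based voting: the label chosen by most classifiers wins."""
    return Counter(predicted_labels).most_common(1)[0][0]


def soft_vote(probabilities, weights=None):
    """Probability-based voting: average the class probabilities
    (optionally weighted) and return the index of the largest average."""
    weights = weights or [1.0] * len(probabilities)
    total = sum(weights)
    n_classes = len(probabilities[0])
    averaged = [
        sum(w * p[c] for w, p in zip(weights, probabilities)) / total
        for c in range(n_classes)
    ]
    return max(range(n_classes), key=averaged.__getitem__)


def binary_metrics(y_true, y_pred, positive=1):
    """Precision, recall, F-score and accuracy for a two-class problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    return {"precision": precision, "recall": recall,
            "f_score": f_score, "accuracy": accuracy}


# Toy usage: three hypothetical classifiers scoring one sample,
# with class 0 = benign and class 1 = malignant.
probs = [[0.9, 0.1], [0.45, 0.55], [0.4, 0.6]]
labels = [max(range(2), key=p.__getitem__) for p in probs]
print(hard_vote(labels))                      # majority of the three labels
print(soft_vote(probs))                       # unweighted probability average
print(soft_vote(probs, weights=[1, 3, 3]))    # weighted probability average
```

In this toy case hard and unweighted soft voting disagree: two of the three classifiers lean slightly toward class 1, but the first is confident enough in class 0 to dominate the averaged probabilities. Weighting the second and third classifiers more heavily flips the soft vote back, which is the kind of effect a weighted soft-voting ensemble relies on.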
