An investigation of ensemble learning methods in classification problems and an application on non-small-cell lung cancer data

This study aims to classify the death status of patients with non-small-cell lung cancer (NSCLC) using patient records with 24 variables drawn from an open-source dataset published on a cancer data website. The machine learning classifiers SMO (Sequential Minimal Optimization), K-NN (K-Nearest Neighbor), random forest, and XGBoost (Extreme Gradient Boosting) were used as base classifiers, together with the ensemble learning methods voting, bagging, boosting, and stacking. Model performance was compared in terms of accuracy, specificity, sensitivity, precision, and the ROC curve. The random forest, SMO, K-NN, and XGBoost classifiers were evaluated on their own, within the bagging ensemble method, and within the boosting ensemble method. In addition, the performances of seven combined models were reported across these metrics: Model 1 (random forest + SMO), Model 2 (XGBoost + K-NN), Model 3 (random forest + K-NN), Model 4 (XGBoost + SMO), Model 5 (SMO + K-NN + random forest), Model 6 (SMO + K-NN + XGBoost), and Model 7 (SMO + K-NN + random forest + XGBoost). The highest classification performance was obtained by the boosting ensemble method with XGBoost, which achieved an accuracy of 0.982, a sensitivity of 0.971, a precision of 0.989, a specificity of 0.989, and a ROC area of 0.998. Ensemble learning methods are recommended for classification problems in patients with a high prevalence of cancer in order to achieve successful results.
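The threshold-based metrics and the ROC area reported above can all be computed from a model's predictions. The following is a minimal sketch in plain Python, not the authors' implementation (the abstract does not name the software used): it combines two models by soft voting, i.e. averaging their predicted probabilities of the positive class, and then scores the combined output. The abstract does not say whether its voting models used hard or soft voting; soft voting is assumed here because it avoids ties in two-member combinations such as Models 1 to 4. The function names, the label encoding (1 = deceased as the positive class), the 0.5 threshold, and the toy probabilities are all hypothetical.

```python
# Sketch: soft voting over per-model probabilities, then the metrics named in the abstract.
# Assumptions (not from the study): label 1 = deceased (positive class), a 0.5 threshold,
# and toy probabilities standing in for two fitted base classifiers.
from typing import Sequence


def _ratio(num: float, den: float) -> float:
    # Return NaN instead of raising when a denominator is zero (e.g. no predicted positives).
    return num / den if den else float("nan")


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) for binary labels, treating 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def threshold_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Accuracy, sensitivity, specificity and precision from hard predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": _ratio(tp, tp + fn),  # true-positive rate (recall)
        "specificity": _ratio(tn, tn + fp),  # true-negative rate
        "precision": _ratio(tp, tp + fp),    # positive predictive value
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via its rank (Mann-Whitney) form: the probability
    that a random positive scores higher than a random negative, ties counting 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return _ratio(wins, len(pos) * len(neg))


def soft_vote(prob_lists: Sequence[Sequence[float]],
              threshold: float = 0.5) -> tuple[list[float], list[int]]:
    """Average each sample's P(class 1) across models; return (mean scores, hard labels)."""
    n_models = len(prob_lists)
    mean = [sum(ps) / n_models for ps in zip(*prob_lists)]
    return mean, [1 if m >= threshold else 0 for m in mean]


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    p_rf = [0.9, 0.6, 0.4, 0.2, 0.1, 0.7, 0.3, 0.8]   # hypothetical random-forest probabilities
    p_knn = [0.8, 0.7, 0.3, 0.1, 0.4, 0.2, 0.2, 0.6]  # hypothetical K-NN probabilities
    scores, labels = soft_vote([p_rf, p_knn])  # a two-member ensemble, like Model 3
    print(threshold_metrics(y_true, labels))
    # {'accuracy': 0.875, 'sensitivity': 0.75, 'specificity': 1.0, 'precision': 1.0}
    print(roc_auc(y_true, scores))  # 0.9375
```

Bagging, boosting, and stacking differ in how the member models are trained and combined, but their final predictions can be scored with the same functions.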

Medicine Science
  • ISSN: 2147-0634
  • Publication frequency: 4 issues per year
  • First published: 2012
  • Publisher: Effect Publishing Agency (EPA)