Classification of stroke with gradient boosting tree using SMOTE-based oversampling method

This study aims to classify stroke using the gradient boosting tree classification method on an open-access dataset containing records of patients with and without stroke. It also aims to compare the results obtained before and after balancing the data with the Synthetic Minority Over-sampling Technique (SMOTE), an oversampling method for imbalanced data. The dataset was obtained from "https://www.kaggle.com/asaumya/healthcare-problem-prediction-stroke-patients". SMOTE was used as the data balancing method, and the gradient boosting tree method was used for modeling. Model performance was evaluated by specificity, sensitivity, accuracy, positive predictive value, and negative predictive value. With the gradient boosting tree model fitted to the original dataset, specificity, sensitivity, accuracy, positive predictive value, and negative predictive value were 0.0887, 0.9772, 0.9339, 0.9544, and 0.1679, respectively. With the gradient boosting tree model fitted to the SMOTE-applied dataset, the same measures were reported as 0.0887, 0.9772, 0.9339, 0.9544, and 0.1679, respectively. On examination, the modeling results obtained with the SMOTE-applied dataset were more consistent and realistic. It is therefore suggested that researchers apply dataset balancing methods to obtain more accurate results whenever they encounter an imbalanced dataset.
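
The two computational ideas in the abstract, SMOTE-style oversampling and the five performance measures, can be illustrated with a minimal sketch. This is not the study's implementation: the function names `smote_sketch` and `binary_metrics`, the toy points, and the confusion-matrix counts are all hypothetical, and the sampling scheme is a simplified variant of SMOTE in which the base point is drawn at random. The sketch also assumes that the "positive" class is whichever class the counts `tp` and `fn` refer to; the abstract does not state which class was treated as positive.

```python
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Simplified SMOTE: make n_new synthetic points by interpolating
    between a minority point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def binary_metrics(tp, fp, tn, fn):
    """Five measures from confusion-matrix counts (no zero-division guard)."""
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "ppv": tp / (tp + fp),  # positive predictive value
        "npv": tn / (tn + fn),  # negative predictive value
    }


# Hypothetical toy data: four minority points on the unit square.
minority_points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
synthetic_points = smote_sketch(minority_points, n_new=6, k=2)

# Hypothetical confusion-matrix counts, not taken from the study.
metrics = binary_metrics(tp=8, fp=2, tn=85, fn=5)
```

Because each synthetic point is a convex combination of two existing minority points, it always lies on the segment between them. This is why SMOTE adds new minority examples without simply duplicating rows. In practice a maintained library implementation would normally be preferred over a hand-written version like this one.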

Medicine Science
  • ISSN: 2147-0634
  • Publication frequency: 4
  • First published: 2012
  • Publisher: Effect Publishing Agency (EPA)