Classification of stroke with gradient boosting tree using SMOTE-based oversampling method

This study aims to classify stroke using the gradient boosting tree method on an open-access dataset containing data from patients with and without stroke. It also compares the results obtained before and after balancing the data with the Synthetic Minority Over-sampling Technique (SMOTE), an oversampling method. The dataset was obtained from "https://www.kaggle.com/asaumya/healthcare-problem-prediction-stroke-patients". SMOTE was used for data balancing, and the gradient boosting tree method was used for modeling. Model performance was evaluated by specificity, sensitivity, accuracy, positive predictive value, and negative predictive value. With the original version of the dataset, the gradient boosting tree model yielded specificity, sensitivity, accuracy, positive predictive value, and negative predictive value of 0.0887, 0.9772, 0.9339, 0.9544, and 0.1679, respectively. With the SMOTE-applied version of the dataset, the same metrics were reported as 0.0887, 0.9772, 0.9339, 0.9544, and 0.1679, respectively. On examining the results, the models built on the SMOTE-applied dataset were found to be more consistent and realistic. It is therefore suggested that researchers apply dataset balancing methods to obtain more accurate results whenever they encounter an imbalanced dataset.
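The two building blocks named in the abstract, SMOTE oversampling and the five confusion-matrix metrics, can be illustrated with a minimal sketch. This is not the study's implementation: the function names `smote` and `binary_metrics`, the toy points, and the confusion-matrix counts are all hypothetical. The sampler is also simplified, picking a random minority sample on each draw instead of iterating over every minority sample as in Chawla et al. (2002).

```python
import random
from math import dist


def smote(minority, n_new, k=5, seed=0):
    """Simplified SMOTE sketch: build n_new synthetic minority points.

    Each synthetic point lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    Assumes `minority` holds at least two numeric feature tuples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def binary_metrics(tp, fp, tn, fn):
    """Compute the five metrics reported in the abstract from raw counts."""
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the stroke dataset.
    minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    print(smote(minority_points, n_new=3, k=2))
    print(binary_metrics(tp=90, fp=5, tn=3, fn=2))
```

In practice, oversampling would typically be applied only to the training portion of the data, so that synthetic points do not leak into the evaluation set. Whether the study did so is not stated in the abstract.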

Medicine Science
  • ISSN: 2147-0634
  • Publication frequency: 4
  • Founded: 2012
  • Publisher: Effect Publishing Agency (EPA)