Onur SEVLİ

Diagnosis of diabetes using different classifiers

Diabetes is one of the most widespread health problems, and its prevalence is rising steadily worldwide. If it is not kept under control, this chronic disease can damage many organs, such as the eyes, heart, and kidneys, and can lead to death. Early diagnosis of diabetes is important for preventing complications and improving quality of life. Machine learning techniques, which are widely used in the medical field, act as intelligent decision support systems that help specialists diagnose a range of diseases. This study presents classification experiments for the early diagnosis of diabetes, carried out on the PIMA diabetes data set with 6 different machine learning techniques. The main goal of these experiments is to increase prediction accuracy. To improve the performance of the classifiers, 14 different resampling methods were applied to the data set. Each of the 6 models was run once without resampling and once with each of the 14 resampling methods, giving 90 classification runs in total (6 × 15 = 90). The performance of each run is reported with 5 different performance metrics. The best result, an accuracy of 96.296%, was obtained by the Random Forest model combined with the InstanceHardnessThreshold undersampling technique. Resampling was found to improve classifier performance in general, and to give better results when combined with ensemble learning methods. Compared with the most recent studies that apply various machine learning methods to the same data set, the accuracy achieved in this study is higher.

Keywords:

Diabetes diagnosis, machine learning, resampling
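The undersampling step behind the best result above can be illustrated with a small sketch. InstanceHardnessThreshold is the name of an undersampling class in the imbalanced-learn Python library. The pure-Python code below is neither that class nor the authors' code. It is a simplified version of the same idea: estimate each sample's instance hardness as one minus the out-of-fold probability that a classifier assigns to its true label, then drop the hardest samples of the larger class until the classes are balanced. Several choices here are assumptions made for illustration: the k-NN hardness estimator with k = 5, the 5 unstratified shuffled folds, the rule "keep the n easiest samples per class" (imbalanced-learn uses a percentile threshold on the probabilities instead), the hypothetical function names, and the toy data.

```python
import math
import random
from collections import Counter


def knn_label_share(train_X, train_y, x, label, k=5):
    """Share of the k nearest training points (Euclidean) whose label equals `label`."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    return sum(train_y[i] == label for i in nearest) / len(nearest)


def true_class_proba(X, y, k=5, n_folds=5, seed=0):
    """Out-of-fold k-NN probability of each sample's own label (1 - instance hardness)."""
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)  # assumption: plain shuffled folds, not stratified
    proba = [0.0] * len(X)
    for fold in range(n_folds):
        held_out = order[fold::n_folds]
        held_set = set(held_out)
        train = [i for i in order if i not in held_set]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        for i in held_out:
            proba[i] = knn_label_share(train_X, train_y, X[i], y[i], k)
    return proba


def instance_hardness_threshold(X, y, k=5, n_folds=5, seed=0):
    """Hypothetical helper: undersample every class to the minority size, keeping its easiest samples."""
    proba = true_class_proba(X, y, k, n_folds, seed)
    counts = Counter(y)
    n_keep = min(counts.values())
    kept = []
    for label in counts:
        members = [i for i, lab in enumerate(y) if lab == label]
        # Highest probability of the true label first, i.e. lowest instance hardness.
        members.sort(key=lambda i: proba[i], reverse=True)
        kept.extend(members[:n_keep])
    kept.sort()  # restore the original sample order
    return [X[i] for i in kept], [y[i] for i in kept]


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical toy data: 130 class-0 and 70 class-1 points drawn from two
    # overlapping 2-D Gaussian clouds, roughly PIMA's 65/35 class ratio.
    X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(130)]
    X += [(rng.gauss(1.5, 1), rng.gauss(1.5, 1)) for _ in range(70)]
    y = [0] * 130 + [1] * 70
    X_res, y_res = instance_hardness_threshold(X, y)
    print("before:", dict(Counter(y)), "after:", dict(Counter(y_res)))
```

Running the script prints the class counts before and after resampling: the larger class is cut down to the size of the smaller one, and every minority sample is kept. In a full pipeline this kind of resampling is usually applied only to the training folds, so that the test data keep their original class distribution. A classifier such as a random forest is then fitted on the resampled training data.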

