Early stage diabetes prediction using decision tree-based ensemble learning model

Diabetes is a lifelong disease that harms various organs, causing long-term damage, dysfunction, and eventually organ failure. It must be treated under a doctor's supervision. The disease is now common and is becoming more widespread because of modern living conditions. If a person with diabetes does not receive treatment at an early stage, serious complications can follow. In addition to the medical methods used for diagnosis, diabetes can be detected with artificial intelligence approaches. This research aims to identify the most influential of the many variables that contribute to diabetes and to design a predictive model, built with selected machine learning methods, that helps doctors analyze the disease. Decision Tree, Bagging with Decision Tree, Random Forest, and Extra Trees algorithms were used for the proposed model; the Extra Trees algorithm achieved the highest accuracy, 99.2%.
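The comparison described above can be sketched roughly as follows. This is a minimal illustration, not the study's actual pipeline: it assumes scikit-learn, uses a synthetic dataset from `make_classification` as a stand-in for the patient data, and picks the split ratio, the number of trees, and the random seeds arbitrarily. The accuracies it prints therefore say nothing about the reported 99.2%.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    BaggingClassifier,
    ExtraTreesClassifier,
    RandomForestClassifier,
)
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): 500 samples, 10 features, binary label.
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0
)

# Hold out 20% of the samples for testing (assumed split, stratified by class).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# The four tree-based models named in the abstract, with assumed settings.
models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Bagging with Decision Tree": BaggingClassifier(
        DecisionTreeClassifier(random_state=0), n_estimators=100, random_state=0
    ),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Extra Trees": ExtraTreesClassifier(n_estimators=100, random_state=0),
}

# Fit each model and record its accuracy on the held-out test set.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
    print(f"{name}: accuracy = {scores[name]:.3f}")

# One way to look for the most influential variable: the impurity-based
# feature importances of the fitted Extra Trees model.
importances = models["Extra Trees"].feature_importances_
top_feature = int(importances.argmax())
print(f"Most influential feature index: {top_feature}")
```

Impurity-based importances are only one possible ranking criterion; the abstract does not say how the most influential variable was determined, so this choice is an assumption of the sketch.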
