The Diagnosis of Diabetes Mellitus with Boosting Methods

In addition to damaging various organs, diabetes mellitus (DM) increases a person's risk of developing other serious health conditions, including heart disease, stroke, and nerve damage. DM is also a leading cause of blindness and kidney failure. However, with proper management and treatment, many of its complications can be prevented or delayed, so early detection and treatment are crucial. Advances in machine learning have opened new opportunities in medicine. Many disease detection studies rely on machine learning techniques, with a particular emphasis on boosting algorithms, which are used to improve the accuracy of predictions made by weak models such as decision trees. In this study, boosting algorithms are examined and compared on a diabetes dataset using knowledge discovery methods. Their performance is evaluated by generating ROC curves and comparing average accuracy values. In terms of precision, the Gradient Boosting, AdaBoost, CatBoost, LightGBM, and XGBoost algorithms achieved success rates of 85%, 83%, 88%, 86%, and 87%, respectively.
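The evaluation idea described above (train a boosting model on weak learners, then report accuracy and the area under the ROC curve) can be illustrated with a minimal sketch. This is not the study's code: it uses synthetic data rather than the diabetes dataset, it implements only a basic AdaBoost with decision stumps, and all function names (`make_data`, `fit_stump`, `adaboost`, `score`, `accuracy`, `roc_auc`) are hypothetical.

```python
import math
import random


def make_data(n=200, seed=0):
    """Synthetic two-feature data (an assumption, not the diabetes dataset)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        a, b = rng.gauss(0, 1), rng.gauss(0, 1)
        X.append((a, b))
        # Label is +1 when a noisy sum of the features is positive, else -1.
        y.append(1 if a + b + rng.gauss(0, 0.5) > 0 else -1)
    return X, y


def fit_stump(X, y, w):
    """Find the one-feature threshold rule with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            for s in (1, -1):
                err = sum(wi for x, yi, wi in zip(X, y, w)
                          if (s if x[j] > t else -s) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, s)
    return best


def adaboost(X, y, rounds=10):
    """Basic AdaBoost: reweight samples so later stumps focus on mistakes."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, j, t, s = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, t, s))
        # Increase weights of misclassified samples, decrease the rest.
        w = [wi * math.exp(-alpha * yi * (s if x[j] > t else -s))
             for x, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def score(model, x):
    """Weighted vote of all stumps; the sign gives the predicted class."""
    return sum(a * (s if x[j] > t else -s) for a, j, t, s in model)


def accuracy(model, X, y):
    hits = sum((1 if score(model, x) > 0 else -1) == yi for x, yi in zip(X, y))
    return hits / len(y)


def roc_auc(model, X, y):
    """AUC as the chance a positive outscores a negative (ties count half)."""
    pos = [score(model, x) for x, yi in zip(X, y) if yi == 1]
    neg = [score(model, x) for x, yi in zip(X, y) if yi == -1]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    X, y = make_data()
    X_train, y_train, X_test, y_test = X[:150], y[:150], X[150:], y[150:]
    model = adaboost(X_train, y_train)
    print("accuracy:", accuracy(model, X_test, y_test))
    print("ROC AUC:", roc_auc(model, X_test, y_test))
```

In practice, a comparison like the one reported here would more likely use library implementations of each algorithm and cross-validation; the sketch only shows how the two metrics relate to a boosted model's scores.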

El-Cezeri
  • ISSN: 2148-3736
  • Publication frequency: 3 issues per year
  • First published: 2013
  • Publisher: Tüm Bilim İnsanları ve Akademisyenler Derneği