Feature Selection Based Data Mining Approach for Coronary Artery Disease Diagnosis

Cardiovascular diseases are common and serious health problems that cause many deaths. According to the World Health Organization, 17.7 million people die of them each year. Coronary artery disease is the most important type of cardiovascular disease: it causes serious heart problems and impairs the heart's function. Knowing which attributes are important for this disease can help specialists analyze the routine laboratory test results of patients who present to internal medicine or to any unit other than cardiology. This study aims to determine the significance of attributes for coronary artery disease using the Stability Selection method. In the experiments, the attributes ‘Age’, ‘Atypical’, ‘Blood pressure’, ‘Current smoker’, ‘Diastolic murmur’, ‘Dyslipidemia’, ‘Diabetes mellitus’, ‘Ejection fraction’, ‘Erythrocyte sedimentation rate’, ‘Family history’, ‘Hypertension’, ‘Potassium’, ‘Nonanginal’, ‘Pulse rate’, ‘Q wave’, ‘Regional wall motion abnormality’, ‘Sex’, ‘St Depression’, ‘Triglyceride’, ‘Tinversion’, ‘Typical chest pain’ and ‘Valvular heart disease’ were found to be important in each sub-dataset. In addition, the performance of four traditional machine learning algorithms in detecting this disease was evaluated. The Logistic Regression algorithm outperformed the others, with an accuracy of 90.88%, a sensitivity of 95.18%, and a specificity of 81.34%.
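Stability Selection (Meinshausen and Bühlmann) repeatedly applies a base feature selector to random half-size subsamples of the data and keeps the attributes whose selection frequency reaches a threshold. The abstract does not state the base selector, the number of subsamples, or the threshold used in the study, so the Python sketch below is illustrative only. The base selector is a hypothetical top-k filter on absolute Pearson correlation (a Lasso-type estimator is the more usual choice), and `k`, `n_rounds`, and `threshold` are assumed values. The same block computes accuracy, sensitivity, and specificity from a 2 x 2 confusion table, the three measures reported for Logistic Regression, assuming CAD is coded as the positive class 1. The data are synthetic, not the study's dataset.

```python
import random
import statistics


def abs_correlation(xs, ys):
    """Absolute Pearson correlation between two columns; 0.0 if either is constant."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def top_k_selector(rows, labels, k):
    """Hypothetical base selector: the k attributes most correlated with the label."""
    n_features = len(rows[0])
    scores = [abs_correlation([r[j] for r in rows], labels) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return set(ranked[:k])


def stability_selection(rows, labels, k=2, n_rounds=100, threshold=0.6, seed=0):
    """Return per-attribute selection frequencies and the attributes at or above threshold."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    counts = [0] * n_features
    for _ in range(n_rounds):
        # Draw a random subsample of half the records, without replacement.
        idx = rng.sample(range(n), n // 2)
        chosen = top_k_selector([rows[i] for i in idx], [labels[i] for i in idx], k)
        for j in chosen:
            counts[j] += 1
    freqs = [c / n_rounds for c in counts]
    stable = [j for j, f in enumerate(freqs) if f >= threshold]
    return freqs, stable


def diagnostic_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity from a 2x2 table (assumed: 1 = CAD, 0 = normal)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return accuracy, sensitivity, specificity


# Synthetic data: five attributes, but only attributes 0 and 1 drive the label.
gen = random.Random(42)
rows, labels = [], []
for _ in range(200):
    x = [gen.gauss(0, 1) for _ in range(5)]
    labels.append(1 if x[0] + x[1] + 0.3 * gen.gauss(0, 1) > 0 else 0)
    rows.append(x)

freqs, stable = stability_selection(rows, labels)
print("selection frequencies:", freqs)
print("stable attributes:", stable)

acc, sens, spec = diagnostic_metrics([1, 1, 1, 0, 0], [1, 1, 0, 0, 1])
print(f"accuracy={acc:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
# accuracy=0.60 sensitivity=0.67 specificity=0.50
```

Because only attributes 0 and 1 influence the synthetic label, they should be chosen in nearly every subsample and pass the threshold, while the three noise attributes should rarely be chosen. Selecting on frequency across subsamples, rather than on a single fit, is what makes the procedure less sensitive to the particular sample drawn.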
