Feature Selection using FFS and PCA in Biomedical Data Classification with AdaBoost-SVM

Computer-aided diagnosis systems are increasingly being proposed for biomedical pattern recognition. In this study, a computer-aided diagnosis method aimed at higher classification accuracy was developed to classify biomedical datasets. The method combines two types of machine learning algorithms: feature selection and classification. First, features were extracted from the biomedical dataset; the extracted features were then classified by a hybrid AdaBoost-Support Vector Machine (SVM) classifier. Forward Feature Selection (FFS) and Principal Component Analysis (PCA) were used for feature selection, and the performance of each was tested with the AdaBoost-SVM classifier. The advantages and disadvantages of the two algorithms were then evaluated. The proposed hybrid structure was tested on the Wisconsin Breast Cancer (WBC), Pima Diabetes (PD), and Heart (Statlog) biomedical datasets from the UCI database and on electrocardiogram (ECG) signals from the PhysioNet ECG database. The results of the two hybrid structures were compared with those of other studies in the literature. The results show that the proposed hybrid structure achieves high classification accuracy for biomedical data classification.
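To make the feature selection step concrete, the following is a minimal sketch of greedy forward feature selection, written under stated assumptions and not taken from the paper. The abstract does not give the stopping rule, the validation scheme, or the scoring details, so everything here is illustrative: the function names `forward_feature_selection` and `nearest_centroid_accuracy`, the synthetic data, and in particular the nearest-centroid scorer, which stands in for the AdaBoost-SVM classifier only to keep the example dependency-free.

```python
import random
from statistics import mean


def nearest_centroid_accuracy(X_train, y_train, X_val, y_val, features):
    """Stand-in scorer (assumption): validation accuracy of a nearest-centroid
    classifier restricted to the given feature columns. In the paper's setting
    this role is played by the AdaBoost-SVM classifier."""
    centroids = {}
    for label in set(y_train):
        rows = [x for x, y in zip(X_train, y_train) if y == label]
        centroids[label] = [mean(r[f] for r in rows) for f in features]

    correct = 0
    for x, y in zip(X_val, y_val):
        # Squared Euclidean distance to each class centroid on the chosen columns.
        def dist(label):
            return sum((x[f] - c) ** 2 for f, c in zip(features, centroids[label]))

        if min(centroids, key=dist) == y:
            correct += 1
    return correct / len(y_val)


def forward_feature_selection(X_train, y_train, X_val, y_val, score_fn):
    """Greedy wrapper selection: start with no features, repeatedly add the
    single feature that most improves the score, and stop when no candidate
    gives a strict improvement (an assumed stopping rule)."""
    n_features = len(X_train[0])
    selected, best_score = [], float("-inf")
    while len(selected) < n_features:
        candidates = [f for f in range(n_features) if f not in selected]
        scored = [
            (score_fn(X_train, y_train, X_val, y_val, selected + [f]), f)
            for f in candidates
        ]
        score, feature = max(scored, key=lambda pair: pair[0])
        if score <= best_score:
            break
        selected.append(feature)
        best_score = score
    return selected, best_score


def make_samples(n_per_class, rng):
    """Synthetic data (assumption): column 0 separates the classes,
    columns 1 and 2 are label-independent noise."""
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            X.append([
                10 * label + rng.uniform(-1, 1),
                rng.uniform(-1, 1),
                rng.uniform(-1, 1),
            ])
            y.append(label)
    return X, y


rng = random.Random(0)
X_train, y_train = make_samples(100, rng)
X_val, y_val = make_samples(50, rng)

selected, score = forward_feature_selection(
    X_train, y_train, X_val, y_val, nearest_centroid_accuracy
)
print("selected features:", selected, "validation accuracy:", score)
```

Because the scorer is passed in as `score_fn`, the same loop could in principle be driven by a cross-validated AdaBoost-SVM accuracy instead of the stand-in used here. PCA differs in kind: it builds new features as linear combinations of the originals and does not keep a subset of them.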
