An improved tree model based on ensemble feature selection for classification

Researchers train and build specific models to classify the presence or absence of a disease, and the accuracy of such classification models is continuously improved. The process of building and training a model depends on the medical data utilized. Various machine learning techniques and tools are used to handle different data with respect to disease types and their clinical conditions. Classification is the most widely used technique to classify disease, and the accuracy of a classifier largely depends on its attributes: the choice of attributes strongly affects both the diagnosis and the performance of the classifier. As medical data grow in volume across diverse clinical conditions, methods for choosing the attributes and features relevant to a specific target disease are still lacking. This study uses ensemble-based feature selection with random trees and a wrapper method to improve classification. The proposed ensemble learning classification method derives a feature subset using the wrapper method, bagging, and random trees. It removes irrelevant features and selects the optimal features for classification through a probability-weighting criterion, giving the improved algorithm the ability to distinguish relevant from irrelevant features and thereby improve classification performance. The proposed feature selection method is evaluated using SVM, RF, and NB evaluators, and its performance is compared against the FSNBb, FSSVMb, GASVMb, GANBb, and GARFb methods. The proposed method achieves a mean classification accuracy of 92% and outperforms the other ensemble methods.
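The abstract does not include code, so the following is only an illustrative sketch of the vote-and-threshold idea behind ensemble feature selection with bootstrapped random feature subsets and probability weighting. All function names, the class-mean-separation scoring criterion, and the threshold value are assumptions for illustration, not the authors' actual algorithm:

```python
import random
from collections import defaultdict

def bootstrap_feature_votes(X, y, n_rounds=50, subset_frac=0.5, seed=0):
    """Accumulate feature votes over bootstrap rounds.

    In each round, draw a bootstrap sample and a random feature subset
    (mimicking the randomness of random trees / bagging), then vote for
    the subset feature whose class-mean separation on the sample is
    largest.  Vote counts are normalised into selection probabilities
    (a simple stand-in for the paper's probability-weighting criterion).
    Assumes binary labels y in {0, 1}.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    votes = defaultdict(int)
    for _ in range(n_rounds):
        idx = [rng.randrange(n) for _ in range(n)]            # bootstrap sample
        feats = rng.sample(range(d), max(1, int(subset_frac * d)))
        best, best_sep = None, -1.0
        for f in feats:
            pos = [X[i][f] for i in idx if y[i] == 1]
            neg = [X[i][f] for i in idx if y[i] == 0]
            if not pos or not neg:
                continue
            sep = abs(sum(pos) / len(pos) - sum(neg) / len(neg))
            if sep > best_sep:
                best, best_sep = f, sep
        if best is not None:
            votes[best] += 1
    total = sum(votes.values()) or 1
    return {f: votes[f] / total for f in range(d)}

def select_features(weights, threshold=0.3):
    """Keep features whose selection probability meets the threshold."""
    return sorted(f for f, w in weights.items() if w >= threshold)
```

In a full pipeline along the lines the abstract describes, the surviving subset would then be handed to a wrapper step that scores candidate subsets with the downstream classifier (SVM, RF, or NB) rather than with a filter criterion.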
