An improved tree model based on ensemble feature selection for classification
Researchers train and build specific models to classify the presence or absence of a disease, and the accuracy of such classification models is continuously improved. The process of building and training a model depends on the medical data utilized. Various machine learning techniques and tools are used to handle different data with respect to disease types and their clinical conditions. Classification is the most widely used technique for diagnosing disease, and the accuracy of a classifier depends largely on its attributes. The choice of attributes strongly affects both the diagnosis and the performance of the classifier. With medical data growing rapidly across different clinical conditions, methods for choosing relevant attributes and features from datasets that target specific diseases are still lacking. This study uses ensemble-based feature selection with random trees and a wrapper method to improve classification. The proposed ensemble learning classification method derives a feature subset using the wrapper method, bagging, and random trees. It removes irrelevant features and selects the optimal features for classification through a probability weighting criterion. The improved algorithm is able to distinguish relevant features from irrelevant ones and thereby improve classification performance. The proposed feature selection method is evaluated using SVM, RF, and NB evaluators, and its performance is compared against the FSNBb, FSSVMb, GASVMb, GANBb, and GARFb methods. The proposed method achieves a mean classification accuracy of 92% and outperforms the other ensemble methods.
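The approach described above can be sketched in a minimal form: bagged forests of random trees vote on feature relevance, the votes are normalised into a probability weighting, and a downstream classifier plays the wrapper role by checking the selected subset. This is an illustrative sketch under assumed settings (dataset, number of rounds, keep fraction, and a naive Bayes evaluator), not the authors' exact algorithm.

```python
# Sketch of ensemble feature selection: bagged random trees produce
# importance scores, averaged into a probability weighting over features;
# a downstream classifier (naive Bayes here) evaluates the selected subset.
# Dataset, round count, and keep fraction are illustrative assumptions.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

def ensemble_feature_weights(X, y, n_rounds=10, seed=0):
    """Average Gini importances over several bagged random-tree forests,
    normalised so the weights sum to 1 (a probability weighting)."""
    rng = np.random.RandomState(seed)
    weights = np.zeros(X.shape[1])
    for _ in range(n_rounds):
        forest = RandomForestClassifier(
            n_estimators=50, bootstrap=True,
            random_state=rng.randint(1 << 30))
        forest.fit(X, y)
        weights += forest.feature_importances_
    return weights / weights.sum()

def select_features(weights, keep_fraction=0.3):
    """Keep the highest-weighted fraction of features."""
    k = max(1, int(len(weights) * keep_fraction))
    return np.sort(np.argsort(weights)[::-1][:k])

X, y = load_breast_cancer(return_X_y=True)
w = ensemble_feature_weights(X, y)
idx = select_features(w)

# Wrapper-style check: compare the downstream classifier on the full
# feature set versus the selected subset via cross-validation.
full = cross_val_score(GaussianNB(), X, y, cv=5).mean()
subset = cross_val_score(GaussianNB(), X[:, idx], y, cv=5).mean()
print(f"features kept: {len(idx)}/{X.shape[1]}")
print(f"NB accuracy, all features: {full:.3f}; selected subset: {subset:.3f}")
```

Averaging importances over several independently seeded forests, rather than trusting a single forest, is what makes the weighting an ensemble criterion: features that score highly only by chance in one bootstrap tend to be averaged down.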