A robust ensemble feature selector based on rank aggregation for developing new VO2max prediction models using support vector machines

This paper proposes a new ensemble feature selector, called the majority voting feature selector (MVFS), for developing new maximal oxygen uptake (VO2max) prediction models using a support vector machine (SVM). The approach is based on rank aggregation, which exploits the correlation among the relevance ranks that three state-of-the-art feature selectors assign to the predictor variables: Relief-F, minimum redundancy maximum relevance (mRMR), and maximum likelihood feature selection (MLFS). Applying the SVM combined with MVFS to a self-created dataset containing maximal and submaximal exercise data from 185 college students yielded several new hybrid VO2max prediction models. To assess the performance of the proposed ensemble approach, SVM-based models combined individually with Relief-F, mRMR, and MLFS, as well as with alternative ensemble feature selectors from the literature, have also been developed. The results reveal that MVFS outperforms the individual and other ensemble feature selectors, increasing multiple correlation coefficients (Rs) by up to 8.76% and decreasing root mean square errors (RMSEs) by up to 11.15%. Furthermore, in addition to reconfirming the relevance of sex, age, and maximal heart rate in predicting VO2max, as previously reported in the literature, the results reveal that submaximal heart rate and exercise time at the 1.5-mile distance are two further discriminative predictors of VO2max. The results have also been compared with those obtained by a general regression neural network and a single decision tree, each combined with MVFS, and the SVM is shown to perform much better than these methods in predicting VO2max.
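The abstract does not spell out the aggregation rule behind MVFS, so the following is only a minimal sketch of one plausible majority-voting scheme over three feature rankings, not the authors' implementation. It assumes that a feature is kept when it appears in the top `top_k` positions of at least `min_votes` rankings. The function name `majority_vote_select`, both parameters, the feature names, and the rankings themselves are hypothetical and chosen purely for illustration.

```python
from collections import Counter


def majority_vote_select(rankings, top_k, min_votes=2):
    """Keep features ranked in the top_k by at least min_votes selectors.

    rankings: dict mapping a selector name to a list of feature names,
              ordered from most to least relevant.
    Returns the kept features sorted by vote count (descending); ties are
    broken by the mean position of the feature across all rankings.
    """
    # Each selector casts one vote for every feature in its top_k.
    votes = Counter()
    for ranked in rankings.values():
        votes.update(ranked[:top_k])

    def mean_position(feature):
        positions = [r.index(feature) for r in rankings.values() if feature in r]
        return sum(positions) / len(positions)

    kept = [f for f, v in votes.items() if v >= min_votes]
    return sorted(kept, key=lambda f: (-votes[f], mean_position(f)))


# Hypothetical rankings for illustration only; these are NOT results
# reported in the paper.
rankings = {
    "relief_f": ["sex", "age", "hr_max", "weight", "time_1_5_mile"],
    "mrmr": ["time_1_5_mile", "sex", "hr_submax", "age", "weight"],
    "mlfs": ["sex", "time_1_5_mile", "age", "hr_max", "hr_submax"],
}

selected = majority_vote_select(rankings, top_k=3)
print(selected)
```

In a pipeline like the one described above, the selected subset would then be passed to the regression model (an SVM in the paper), and the resulting R and RMSE compared against models built on each individual selector's subset.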
