A robust ensemble feature selector based on rank aggregation for developing new VO\textsubscript{2}max prediction models using support vector machines

Abstract. This paper proposes a new ensemble feature selector, called the majority voting feature selector (MVFS), for developing new maximal oxygen uptake (VO\textsubscript{2}max) prediction models using a support vector machine (SVM). The approach is based on rank aggregation, which exploits the correlation among the relevance ranks assigned to the predictor variables by three state-of-the-art feature selectors: Relief-F, minimum redundancy maximum relevance (mRMR), and maximum likelihood feature selection (MLFS). By applying the SVM combined with MVFS to a self-created dataset containing maximal and submaximal exercise data from 185 college students, several new hybrid VO\textsubscript{2}max prediction models have been created. To evaluate the proposed ensemble approach for VO\textsubscript{2}max prediction, SVM-based models combined individually with Relief-F, mRMR, and MLFS, as well as with alternative ensemble feature selectors from the literature, have also been developed. The results reveal that MVFS outperforms the other individual and ensemble feature selectors, yielding increases of up to 8.76% in multiple correlation coefficients (Rs) and decreases of up to 11.15% in root mean square errors (RMSEs). Furthermore, in addition to reconfirming the relevance of sex, age, and maximal heart rate in predicting VO\textsubscript{2}max, as previously reported in the literature, the results show that submaximal heart rate and exercise time over a 1.5-mile distance are two further discriminative predictors of VO\textsubscript{2}max. The results have also been compared with those obtained by a general regression neural network and a single decision tree, each combined with MVFS, and the SVM is shown to perform much better than these methods for VO\textsubscript{2}max prediction.
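
The abstract does not spell out the aggregation rule behind MVFS, so the following is only a minimal sketch of one plausible majority-voting rank aggregation, not the authors' method. It assumes each selector (Relief-F, mRMR, MLFS) returns a full ranking of the same features, keeps a feature when it falls in the top `k` of more than half of the selectors, and orders the kept features by their summed rank positions. The function name `majority_vote_select`, the cutoff `k`, the feature names, and the three rankings are all hypothetical and made up for illustration.

```python
from collections import Counter


def majority_vote_select(rankings, k):
    """Select features ranked in the top k by a majority of selectors.

    rankings: dict mapping a selector name to a list of feature names,
              most relevant first. All lists are assumed to contain the
              same features (an assumption of this sketch).
    k:        hypothetical top-k cutoff applied to every selector.
    """
    votes = Counter()      # number of selectors placing a feature in its top k
    rank_sum = Counter()   # summed 0-based rank positions across selectors
    for ranked in rankings.values():
        for position, feature in enumerate(ranked):
            rank_sum[feature] += position
            if position < k:
                votes[feature] += 1

    majority = len(rankings) // 2 + 1  # e.g. 2 out of 3 selectors
    chosen = [f for f, v in votes.items() if v >= majority]
    # Lower summed rank means more relevant; the name breaks ties.
    return sorted(chosen, key=lambda f: (rank_sum[f], f))


# Illustrative rankings only; these are NOT results from the paper.
rankings = {
    "relief_f": ["sex", "age", "max_hr", "submax_hr", "time_1_5_mile"],
    "mrmr":     ["age", "sex", "time_1_5_mile", "max_hr", "submax_hr"],
    "mlfs":     ["sex", "max_hr", "age", "time_1_5_mile", "submax_hr"],
}

selected = majority_vote_select(rankings, k=3)
print(selected)  # ['sex', 'age', 'max_hr']
```

In a pipeline of the kind the abstract describes, the selected subset would then be passed to the regression model (an SVM in the paper) and scored by R and RMSE; that step is omitted here because the paper's model settings are not given in this section.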