Correlation Coefficient Based Candidate Feature Selection Framework Using Graph Construction

Selecting strong features is a crucial problem in machine learning. It is also an inescapable step for reducing the number of variables in the primary feature space, which improves classification performance, decreases computational complexity, and minimizes memory utilization. In this work, a novel framework is presented that uses Symmetrical Uncertainty (SU) and the Correlation Coefficient (CCE) to construct a graph and select a candidate feature set. The nominated features are grouped into a limited number of clusters by evaluating their CCE against the feature with the highest SU score. Within each cluster, the feature with the highest SU score is retained while the remaining features in that cluster are discarded. The presented methodology was evaluated on ten (10) well-known data sets. Experimental results confirm that the presented method outperforms most traditional feature selection methods in accuracy. The framework is assessed using Lazy, Tree-Based, Naive Bayes, and Rule-Based learners.
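The selection procedure described above can be sketched as follows. This is a minimal illustrative implementation, not the authors' exact algorithm: it assumes discrete features, uses Pearson correlation as the CCE measure, and uses a hypothetical `corr_threshold` parameter to decide when two features fall into the same cluster.

```python
import numpy as np
from collections import Counter

def entropy(values):
    """Shannon entropy (bits) of a discrete sequence."""
    counts = np.array(list(Counter(values).values()), dtype=float)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * MI(X; Y) / (H(X) + H(Y)), in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    hxy = entropy(list(zip(x, y)))      # joint entropy H(X, Y)
    mi = hx + hy - hxy                  # mutual information
    return 2.0 * mi / (hx + hy) if (hx + hy) > 0 else 0.0

def select_features(X, y, corr_threshold=0.7):
    """Greedy cluster-and-pick sketch: features are visited in descending
    SU-with-class order; each unassigned feature becomes a cluster head,
    and any remaining feature whose |correlation| with that head meets the
    threshold joins the cluster and is discarded. Heads are returned."""
    n_features = X.shape[1]
    su = [symmetrical_uncertainty(X[:, j], y) for j in range(n_features)]
    order = np.argsort(su)[::-1]        # highest SU first
    selected, assigned = [], set()
    for j in order:
        if j in assigned:
            continue
        selected.append(j)              # j represents a new cluster
        assigned.add(j)
        for k in order:
            if k in assigned:
                continue
            r = np.corrcoef(X[:, j], X[:, k])[0, 1]
            if abs(r) >= corr_threshold:
                assigned.add(k)         # redundant with cluster head j
    return selected
```

For example, if two features are near-duplicates of each other, they land in the same cluster and only the one with the higher SU score survives, which is the redundancy-removal behaviour the abstract describes.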
