Correlation Coefficient Based Feature Selection Framework Using Graph Construction

In machine learning, selecting the best features for classification is a critical issue. Reducing the number of attributes/features in the initial feature space is necessary to achieve outstanding classification accuracy, minimize computing power, and reduce memory requirements. In the present research, a novel methodology based on the concepts of Symmetrical Uncertainty (SU) and the Correlation Coefficient (CCE) is proposed, which constructs a graph to select a reduced feature set. The features recommended by the proposed methodology are grouped into a finite number of clusters by measuring their CCE and considering the highest SU score of each feature. From each group, the feature with the maximum SU value is selected and the remaining features in that group are discarded. The proposed framework was evaluated on ten (10) real-world data sets available in the public domain. Experimental results show that the proposed method achieves better performance than most traditional filter-based feature selection methods. It outperformed traditional methods such as Information Gain and Chi-Square on 70% of the data sets, produced better results than the traditional Gain Ratio method on 80% of the data sets, and was competitive with the traditional ReliefF approach on 50% of the data sets. The methodology was assessed using Lazy, Tree-Based, Naive Bayes, and Rule-Based learners.
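The abstract describes the pipeline only at a high level, so the Python sketch below is one plausible reading of it: each feature is scored by SU against the class label, a graph is built whose edges connect features with a high pairwise correlation coefficient, the connected components of that graph act as the groups, and the highest-SU feature of each group is retained. The correlation threshold, the discretization of the input, and the use of connected components as the grouping rule are illustrative assumptions, not the authors' exact procedure.

```python
# Minimal sketch of an SU/correlation-based graph grouping idea.
# The threshold and grouping rule below are assumptions for illustration.
import numpy as np


def entropy(values):
    """Shannon entropy of a discrete 1-D array."""
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * (H(X) + H(Y) - H(X, Y)) / (H(X) + H(Y))."""
    hx, hy = entropy(x), entropy(y)
    joint = entropy([f"{a}_{b}" for a, b in zip(x, y)])
    mutual_info = hx + hy - joint
    return 2.0 * mutual_info / (hx + hy) if (hx + hy) > 0 else 0.0


def select_features(X, y, corr_threshold=0.7):
    """Group correlated features via a graph; keep the max-SU feature per group.

    X : 2-D array (n_samples, n_features) of discretized feature values
    y : 1-D array of class labels
    """
    n_features = X.shape[1]

    # 1. Relevance of each feature to the class label (SU score).
    su = np.array([symmetrical_uncertainty(X[:, j], y) for j in range(n_features)])

    # 2. Graph construction: connect features whose absolute Pearson
    #    correlation exceeds the (assumed) threshold.
    corr = np.corrcoef(X, rowvar=False)
    adjacency = np.abs(corr) >= corr_threshold

    # 3. Groups = connected components of the correlation graph.
    visited, groups = set(), []
    for start in range(n_features):
        if start in visited:
            continue
        stack, component = [start], []
        while stack:
            node = stack.pop()
            if node in visited:
                continue
            visited.add(node)
            component.append(node)
            stack.extend(np.flatnonzero(adjacency[node]))
        groups.append(component)

    # 4. From each group, keep only the feature with the highest SU score.
    return sorted(max(group, key=lambda j: su[j]) for group in groups)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.integers(0, 3, size=(100, 6))
    X[:, 3] = X[:, 0]                       # feature 3 is redundant with feature 0
    y = (X[:, 0] + X[:, 1] > 2).astype(int)
    print(select_features(X, y))            # redundant feature is dropped
```

In this reading, the SU score plays the relevance role and the correlation graph plays the redundancy role, so each retained feature is the most class-relevant representative of its correlated group.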
