An Unsupervised Approach for Selection of Candidate Feature Set Using Filter Based Techniques

High dimensionality is one of the important issues in the preprocessing stage of data mining. The initial feature space may contain irrelevant or redundant features, which degrade classifier performance and also require more memory and higher computing power. This issue can be addressed by selecting the best feature subset to improve classification performance. In this research, we propose an unsupervised approach using filter-based feature selection methods and the K-Means clustering technique to derive the candidate subset. The score of each feature is calculated using traditional filter-based methods. The Min-Max technique is then applied to normalize the dataset, and the K-Means algorithm is employed to form clusters of features. To decide the strongest subset, a Multi-Layer Perceptron (MLP) is applied to each cluster, and the best cluster is selected as the one with the minimum Root Mean Square (RMS) error reported by the MLP. This framework is compared with traditional methods over six well-known datasets containing between 34 and 90 features, using various classification algorithms. The proposed method recorded a 75% competitive rate against Information Gain (IG), a 71% success rate against the Gain Ratio Attribute Evaluator (GR) and the Chi-Square Attribute Evaluator (Chi), and an 83% competitive rate against ReliefF (Rel). The JRip classifier performed at 55%, J48 recorded 66%, Naive Bayes displayed 88%, and IBk (Instance-Based) displayed an 80% success rate over all the datasets.
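The pipeline above (filter-based scoring, Min-Max normalization, K-Means clustering of features, and MLP-based selection of the best cluster) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes mutual information as a stand-in for the Information Gain filter, assumes the features are clustered by their normalized scores, uses a synthetic dataset in place of the six UCI datasets, and picks all hyperparameter values (`k = 3`, MLP layer sizes) arbitrarily.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.preprocessing import MinMaxScaler
from sklearn.cluster import KMeans
from sklearn.feature_selection import mutual_info_classif
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a UCI dataset (the paper uses 34-90 features).
X, y = make_classification(n_samples=300, n_features=40,
                           n_informative=8, random_state=0)

# Step 1: score every feature with a filter method (mutual information
# here, standing in for IG; the paper also uses GR, Chi, and ReliefF).
scores = mutual_info_classif(X, y, random_state=0)

# Step 2: Min-Max normalization of the scores to [0, 1].
norm_scores = MinMaxScaler().fit_transform(scores.reshape(-1, 1))

# Step 3: K-Means groups the features into k clusters by score.
k = 3
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(norm_scores)

# Step 4: train an MLP on each cluster's feature subset and keep the
# cluster with the lowest RMS error on held-out data.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
best_cluster, best_rmse = None, np.inf
for c in range(k):
    cols = np.where(labels == c)[0]
    mlp = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
    mlp.fit(X_tr[:, cols], y_tr)
    rmse = float(np.sqrt(np.mean((mlp.predict(X_te[:, cols]) - y_te) ** 2)))
    if rmse < best_rmse:
        best_cluster, best_rmse = c, rmse

candidate_features = np.where(labels == best_cluster)[0]
print("candidate feature subset:", candidate_features.tolist())
```

The candidate subset would then be fed to the downstream classifiers (JRip, J48, Naive Bayes, IBk) for the comparison reported in the abstract.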
