An online approach for feature selection for classification in big data
Abstract: Feature selection (FS), also known as attribute selection, is the process of selecting a subset of relevant features for use in model construction. FS improves classification accuracy by removing irrelevant and noisy features. It can be implemented with either batch learning or online learning. Most existing FS methods operate in batch mode; however, they require long execution times and large storage space because the entire dataset must be processed at once. Owing to this lack of scalability, batch learning cannot be applied to large datasets. In the present study, a scalable and efficient Online Feature Selection (OFS) approach based on the Sparse Gradient (SGr) technique is proposed to select features from a dataset online. In this approach, the feature weights are proportionally decremented according to a threshold value, so that the weights of insignificant features are driven to zero. To demonstrate the efficiency of the approach, an extensive set of experiments was conducted on 13 real-world datasets ranging from small to large. The results show a 15% improvement in classification accuracy, which is significant compared with existing methods.
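The core idea described above, decrementing feature weights proportionally and zeroing those that fall below a threshold, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `sparse_gradient_ofs`, the perceptron-style update, and all parameter values (`lr`, `shrink`, `threshold`, `epochs`) are assumptions chosen for clarity.

```python
import numpy as np

def sparse_gradient_ofs(X, y, lr=0.2, shrink=0.01, threshold=0.05, epochs=3):
    """Sketch of online feature selection with a sparse-gradient step.

    After each online update, every weight is shrunk toward zero by
    `shrink`, and weights whose magnitude falls below `threshold` are
    truncated to exactly zero, so irrelevant features drop out of the model.
    Labels in `y` are assumed to be in {-1, +1}.
    """
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    for _ in range(epochs):
        for i in range(n_samples):
            xi, yi = X[i], y[i]
            # Perceptron-style update on a misclassified example
            # (an assumed stand-in for the learner's gradient step).
            if yi * np.dot(w, xi) <= 0:
                w += lr * yi * xi
            # Proportional decrement of every weight toward zero.
            w = np.sign(w) * np.maximum(np.abs(w) - shrink, 0.0)
            # Truncate insignificant weights to exactly zero.
            w[np.abs(w) < threshold] = 0.0
    selected = np.flatnonzero(w)  # indices of surviving (nonzero-weight) features
    return w, selected
```

Because each example is processed once per pass and then discarded, memory use is independent of the number of samples, which is the scalability property the abstract attributes to online FS.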