A multiseed-based SVM classification technique for training sample reduction

  A support vector machine (SVM) is not a popular method for classifying very large datasets because training and testing on such data are computationally expensive. Many researchers have tried to reduce SVM training time by applying sample reduction methods, and many of these reduce the training samples with a clustering technique. Several data reduction methods have been proposed in previous studies to lower this high computational complexity; however, such methods are not effective at extracting informative patterns. This paper presents a new supervised classification method, multiseed-based SVM (MSB-SVM), which is particularly intended to deal with very large datasets in multiclass classification. The main contributions of the paper are (i) an efficient multiseed technique for selecting seed points from circular/elongated class training samples, (ii) adjacent class pair selection from the set of multiseeds by using the minimum spanning tree, and (iii) extraction of support vectors from class-pair seed-equivalent regions to manage multiclass classification problems without being computationally expensive. Experimental results on a variety of datasets showed better performance than other sample-reducing methods in terms of training and testing time. The traditional support vector machine (SVM) solution suffers from O(n^2) time complexity, which makes it impractical for very large datasets. Here, the multiseed technique depends on the estimated density of each data point, which is computed in O(n log n); using the estimated density, the seed selection algorithm costs O(n). This is the only overhead incurred for reducing the sample, and the proposed algorithm reduces the sample in less time than the clustering methods. At the same time, the number of support vectors is sharply reduced, which takes less time to find the decision surface.
Apart from this, the classification accuracy of the proposed technique is significantly better than that of other existing sample reduction methods, especially for large datasets.
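
The two key steps sketched in the abstract — picking seed points from the estimated density of each sample, then pairing adjacent classes via a minimum spanning tree over the seeds — can be illustrated as follows. This is a minimal sketch of the general idea, not the paper's exact algorithm: the density estimate (inverse k-NN radius), the separation threshold, and all function names are illustrative assumptions.

```python
# Illustrative sketch: density-based seed selection and MST-based adjacent
# class-pair detection. Not the paper's exact algorithm; names and the
# k-NN density estimate are assumptions for demonstration only.
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial import cKDTree

def select_seeds(X, k=5, n_seeds=3):
    """Pick high-density points as seeds. Density is estimated from the
    distance to the k-th nearest neighbour (smaller radius = denser)."""
    tree = cKDTree(X)
    dist, _ = tree.query(X, k=k + 1)        # column 0 is the point itself
    density = 1.0 / (dist[:, -1] + 1e-12)   # inverse k-NN radius
    order = np.argsort(-density)            # densest points first
    chosen = []
    for i in order:                         # greedily keep well-separated seeds
        if all(np.linalg.norm(X[i] - X[j]) > np.median(dist[:, -1])
               for j in chosen):
            chosen.append(i)
        if len(chosen) == n_seeds:
            break
    return X[chosen]

def adjacent_class_pairs(seeds_by_class):
    """Build an MST over all seeds; two classes whose seeds share an MST
    edge are treated as an adjacent class pair."""
    pts, labels = [], []
    for c, seeds in seeds_by_class.items():
        for s in seeds:
            pts.append(s)
            labels.append(c)
    pts = np.asarray(pts)
    D = np.linalg.norm(pts[:, None] - pts[None, :], axis=2)
    mst = minimum_spanning_tree(D).tocoo()
    return {tuple(sorted((labels[i], labels[j])))
            for i, j in zip(mst.row, mst.col) if labels[i] != labels[j]}
```

In this sketch, only the adjacent pairs returned by `adjacent_class_pairs` would need their seed-region samples passed to a binary SVM, which is what keeps the multiclass training cost low.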
