A multiseed-based SVM classification technique for training sample reduction
A support vector machine (SVM) is not a popular choice for classifying
very large datasets because the training and testing times for such data
are computationally expensive. Many researchers have tried to reduce the
training time of SVMs by applying sample reduction methods, and several
such methods, most based on clustering, have been proposed in previous
studies. However, these methods are not effective at extracting
informative patterns. This paper demonstrates a new supervised
classification method, multiseed-based SVM (MSB-SVM), which is particularly
intended to deal with very large datasets for multiclass classification. The
main contributions of the paper are (i) an efficient multiseed technique for
selection of seed points from circular/elongated class training samples,
(ii) adjacent class pair selection from the set of multiseeds by using the
minimum spanning tree, and (iii) extraction of support vectors from class
pair seed equivalent regions to manage multiclass classification problems
without being computationally expensive. Experimental results on a variety
of datasets showed better performance compared to other sample-reducing
methods in terms of training and testing time. The traditional support
vector machine (SVM) solution suffers from O(n^2) time complexity, which
makes it impractical for very large datasets. Here, the multiseed point
technique depends on the estimated density of each data point, computed
in O(n log n) time; given these densities, the seed selection algorithm
runs in O(n). Density estimation is therefore the only significant cost
of reducing the sample, and the proposed algorithm still reduces the
sample in less time than clustering-based methods. At the same time, the
number of support vectors is sharply reduced, so the decision surface is
found more quickly. Moreover, the classification accuracy of the proposed
technique is significantly better than that of other existing sample
reduction methods, especially for large datasets.
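The density-based seed selection described above can be sketched as follows. The exact density estimator and selection rule of MSB-SVM are not given here, so this is a minimal sketch assuming a k-nearest-neighbour density estimate and a greedy selection of dense, well-separated points; the `k` and `min_sep` parameters are illustrative. Pairwise distances are computed naively in O(n^2) for brevity; a KD-tree query brings the density estimate down to the O(n log n) cost quoted in the text.

```python
import numpy as np

def select_seeds(X, k=5, min_sep=1.0):
    """Greedy multiseed selection from an estimated density.

    Density is taken as the inverse of the mean distance to the k
    nearest neighbours. Points are visited in order of decreasing
    density; a point becomes a seed if it lies at least `min_sep`
    away from every seed already chosen, so elongated classes can
    contribute several seeds.
    """
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    knn = np.sort(d, axis=1)[:, 1:k + 1]       # skip the self-distance
    density = 1.0 / (knn.mean(axis=1) + 1e-12)

    seeds = []
    for i in np.argsort(-density):             # densest points first
        if all(d[i, j] >= min_sep for j in seeds):
            seeds.append(i)
    return seeds                               # indices into X

rng = np.random.default_rng(0)
# two elongated blobs; each should contribute at least one seed
X = np.vstack([rng.normal([0, 0], [2.0, 0.3], (50, 2)),
               rng.normal([8, 4], [0.3, 2.0], (50, 2))])
seeds = select_seeds(X, k=5, min_sep=3.0)
print(len(seeds), "seeds selected")
```

Because the densest point is accepted first and every point of the far blob lies beyond `min_sep` from it, both blobs are guaranteed a seed under these settings.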
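The minimum-spanning-tree step for adjacent class pair selection can be illustrated with Kruskal's algorithm over the seed points: two classes joined by an MST edge are treated as adjacent, and only those pairs need a pairwise classifier. This is a sketch under that assumption, not the paper's exact procedure; the union-find and the example layout are illustrative.

```python
import math
from itertools import combinations

def adjacent_class_pairs(seeds):
    """Adjacent class pair selection via a minimum spanning tree.

    `seeds` is a list of (point, class_label) tuples, one per seed.
    Kruskal's algorithm with union-find builds the MST of the complete
    graph on the seed points; every MST edge whose endpoints carry
    different labels marks those two classes as adjacent.
    """
    parent = list(range(len(seeds)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]      # path halving
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(seeds[i][0], seeds[j][0]), i, j)
        for i, j in combinations(range(len(seeds)), 2)
    )
    pairs = set()
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                           # edge enters the MST
            parent[ri] = rj
            if seeds[i][1] != seeds[j][1]:     # spans two classes
                pairs.add(frozenset((seeds[i][1], seeds[j][1])))
    return pairs

# three classes laid out on a line: A -- B -- C
class_seeds = [((0.0, 0.0), "A"), ((1.0, 0.0), "A"),
               ((4.0, 0.0), "B"), ((5.0, 0.0), "B"),
               ((8.0, 0.0), "C"), ((9.0, 0.0), "C")]
pairs = adjacent_class_pairs(class_seeds)
print(pairs)
```

In this layout only A–B and B–C are adjacent; A–C never shares an MST edge, so that binary subproblem is skipped, which is how the method avoids training all class pairs.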