Comparison of Different Training Data Reduction Approaches for Fast Support Vector Machines Based on Principal Component Analysis and Distance Based Measurements

The support vector machine is a supervised learning algorithm recommended for classification and nonlinear function approximation. For large data sets, support vector machines require a remarkable amount of memory and a time-consuming training process, mainly because a constrained quadratic programming problem must be solved within the algorithm. In this paper, we propose three approaches for identifying non-critical points in the training set and removing them from the original training set in order to speed up the training of the support vector machine. For this purpose, we use principal component analysis, Mahalanobis distance, and Euclidean distance based measurements to eliminate non-critical training instances. We compare the proposed methods with each other and with the conventional support vector machine in terms of computational time and classification accuracy. Our experimental results show that the principal component analysis and Mahalanobis distance based methods improve computational time more than the Euclidean distance based method and the conventional support vector machine, without degrading the classification results.
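
To make the idea concrete, below is a minimal sketch of the Mahalanobis distance based variant, assuming a per-class distance to the class mean, a fixed `keep_ratio` reduction rate, and scikit-learn's `SVC` as the conventional SVM; these choices are illustrative and are not taken from the paper.

```python
# Minimal sketch (not the authors' exact procedure) of distance-based
# training set reduction before SVM training. keep_ratio, the per-class
# Mahalanobis distance to the class mean, and scikit-learn's SVC are
# assumptions made for illustration.
import numpy as np
from scipy.spatial.distance import mahalanobis
from sklearn.svm import SVC

def reduce_by_mahalanobis(X, y, keep_ratio=0.5):
    """Per class, keep the keep_ratio fraction of samples lying farthest
    from the class mean; points near the mean are treated as non-critical."""
    keep_idx = []
    for label in np.unique(y):
        idx = np.where(y == label)[0]
        Xc = X[idx]
        mean = Xc.mean(axis=0)
        # Inverse covariance of the class; pinv guards against singular matrices.
        VI = np.linalg.pinv(np.cov(Xc, rowvar=False))
        d = np.array([mahalanobis(x, mean, VI) for x in Xc])
        n_keep = max(1, int(keep_ratio * len(idx)))
        # Samples far from the class centre are more likely to become support vectors.
        keep_idx.extend(idx[np.argsort(d)[-n_keep:]])
    return np.sort(np.array(keep_idx))

# Usage: train the SVM on the reduced training set only.
# kept = reduce_by_mahalanobis(X_train, y_train, keep_ratio=0.5)
# clf = SVC(kernel="rbf").fit(X_train[kept], y_train[kept])
```

The rationale behind such a filter is that samples close to their class centre rarely become support vectors, so discarding them shrinks the quadratic programming problem while leaving the decision boundary largely intact.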
