A penalty function method for designing efficient robust classifiers with input space optimal separating surfaces

This paper considers robust classification as a constrained optimization problem in which the constraints are nonlinear inequalities defining separating surfaces whose half spaces include or exclude the data depending on their classes, and the cost is used for attaining robustness and providing the minimum-volume regions specified by the half spaces of the surfaces. The constraints are added to the cost via penalty functions, yielding an unconstrained problem that can be solved by gradient descent. The separating surfaces found in this way are optimal in the input data space, in contrast to conventional support vector machine (SVM) classifiers designed by the Lagrange multiplier method, which are optimal in the (transformed) feature space. This paper focuses on two types of surfaces, hyperellipsoidal surfaces and Gaussian-based surfaces created by radial basis functions (RBFs), due to their generality. Ellipsoidal classifiers are trained in two stages: a spherical surface is found in the first stage, and the centers and radii it yields then initialize the second stage, which finds the center and covariance matrix parameters of the ellipsoids. The penalty function approach to the design of robust classifiers also enables the handling of multiclass classification. Compared to SVMs, multiple-kernel SVMs, and RBF classifiers, the proposed classifiers are found to be more efficient in terms of the required training time, parameter setting time, testing time, memory usage, and generalization error, especially for medium to large datasets. RBF-based input space optimal classifiers are also introduced for problems that are far from ellipsoidal, e.g., the 2 Spirals problem.
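To make the penalty-function idea concrete, the sketch below fits the simplest such surface, a sphere, by gradient descent on a penalized objective. This is a minimal illustration under assumptions, not the paper's algorithm: the hinge-type penalty, the function name train_sphere, and the parameters lam, lr, and epochs are all introduced here for exposition. In the paper's two-stage scheme, a sphere found this way would serve as the initialization for fitting the full ellipsoid parameters.

```python
import numpy as np

def train_sphere(X, y, lam=10.0, lr=0.01, epochs=2000):
    """Fit a spherical separating surface (center c, squared radius r2) by
    (sub)gradient descent on an illustrative penalty-function objective:
        J(c, r2) = r2                                          # volume proxy
                 + lam * sum_{y_i=+1} max(0, ||x_i-c||^2 - r2) # target class left outside
                 + lam * sum_{y_j=-1} max(0, r2 - ||x_j-c||^2) # other class caught inside
    (Assumed form; the paper's exact cost may differ.)
    """
    c = X[y == 1].mean(axis=0)                          # start at the target-class mean
    r2 = float(np.mean(np.sum((X[y == 1] - c) ** 2, axis=1)))
    n = len(X)
    for _ in range(epochs):
        d2 = np.sum((X - c) ** 2, axis=1)               # squared distances to the center
        viol_in = (y == 1) & (d2 > r2)                  # inclusion constraints violated
        viol_out = (y == -1) & (d2 < r2)                # exclusion constraints violated
        # subgradients of J with respect to r2 and c
        g_r2 = 1.0 - lam * viol_in.sum() + lam * viol_out.sum()
        g_c = 2.0 * lam * ((X[viol_out] - c).sum(axis=0) - (X[viol_in] - c).sum(axis=0))
        c = c - lr * g_c / n
        r2 = max(r2 - lr * g_r2 / n, 1e-8)              # keep the squared radius positive
    return c, r2

# Toy usage: class +1 clustered near the origin, class -1 well outside it.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, (100, 2)), rng.normal(4.0, 0.5, (100, 2))])
y = np.hstack([np.ones(100), -np.ones(100)])
c, r2 = train_sphere(X, y)
```

The penalty weight lam trades the minimum-volume term against constraint violations, mirroring the robustness/volume trade-off described above.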
