A MapReduce-based distributed SVM algorithm for binary classification

Although the support vector machine (SVM) algorithm generalizes well to unseen examples after training and achieves a small loss value, it does not scale to many real-life classification and regression problems: training an SVM on a dataset with hundreds of thousands of examples is computationally impractical. Previous studies on distributed machine-learning algorithms trained SVMs in costly, preconfigured computing environments. In this research, we present a MapReduce-based distributed parallel SVM training algorithm for binary classification problems and show how the underlying optimization problem can be distributed over cloud computing systems with the MapReduce technique. In the second step of this work, we use statistical learning theory to find the predictive hypothesis that minimizes the empirical risk over the hypothesis spaces created by the Reduce function of MapReduce. The results of this research are important for training SVM classifiers on big datasets. We train the split dataset iteratively with the MapReduce technique; the accuracy of the resulting classifier converges to the accuracy of the globally optimal classifier in a finite number of iterations. The performance of the algorithm was measured on samples from the letter recognition and pen-based recognition of handwritten digits datasets.
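To make the scheme concrete, the following is a minimal, single-process sketch of the iterative training loop the abstract describes: each map task trains a sub-SVM on one split of the data merged with the current global support vectors, the reduce step collects the resulting support vectors into a new global set, and the loop repeats until that set stabilizes. The empirical risk being minimized has its standard statistical-learning-theory form, R_emp(h) = (1/n) Σ L(y_i, h(x_i)). The sketch assumes a linear kernel, shuffled binary-labeled data (so every split contains both classes), and scikit-learn's SVC; the helper names (map_train, reduce_merge, distributed_svm) and the convergence test are illustrative choices, not the paper's implementation.

```python
# Illustrative sketch (not the paper's implementation): iterative
# MapReduce-style SVM training simulated in a single process.
import numpy as np
from sklearn.svm import SVC


def map_train(X_part, y_part, X_sv, y_sv):
    # "Map": train a sub-SVM on one data split merged with the current
    # global support vectors, and emit this split's support vectors.
    X = np.vstack([X_part, X_sv]) if len(X_sv) else X_part
    y = np.concatenate([y_part, y_sv]) if len(y_sv) else y_part
    clf = SVC(kernel="linear", C=1.0).fit(X, y)
    return X[clf.support_], y[clf.support_]


def reduce_merge(sv_pairs):
    # "Reduce": union the support vectors emitted by all map tasks,
    # dropping duplicate rows (labels taken from the first occurrence).
    X_sv = np.vstack([X for X, _ in sv_pairs])
    y_sv = np.concatenate([y for _, y in sv_pairs])
    X_sv, idx = np.unique(X_sv, axis=0, return_index=True)
    return X_sv, y_sv[idx]


def distributed_svm(X, y, n_splits=4, max_iter=10):
    # Iterate map/reduce until the global support vector set stops changing.
    parts = list(zip(np.array_split(X, n_splits), np.array_split(y, n_splits)))
    X_sv, y_sv = np.empty((0, X.shape[1])), np.empty(0)
    for _ in range(max_iter):
        sv_pairs = [map_train(Xp, yp, X_sv, y_sv) for Xp, yp in parts]
        X_new, y_new = reduce_merge(sv_pairs)
        if {r.tobytes() for r in X_new} == {r.tobytes() for r in X_sv}:
            break  # support vector set has stabilized: fixed point reached
        X_sv, y_sv = X_new, y_new
    # Final global classifier, trained on the converged support vector set.
    return SVC(kernel="linear", C=1.0).fit(X_sv, y_sv)
```

In the distributed setting the map tasks would run as MapReduce workers over dataset splits rather than a Python loop, but the fixed point is the same: once the merged support vector set no longer changes between iterations, further iterations cannot change the classifier.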
