A single pairwise model for classification using online learning with kernels

Any binary or multi-class classification problem can be transformed into a pairwise prediction problem. This expands the data and brings the advantage of learning from a richer set of examples, at the expense of increased cost when the data are high-dimensional. This study therefore adapts an online support vector machine to work with pairs of examples. The modified algorithm is suitable for large data sets due to its online nature, and it can also exploit the sparsity structure present in the data. The performance of the pairwise setting and the direct setting is compared on two problems from different domains. Results indicate that the pairwise setting significantly outperforms the direct setting. Furthermore, a general framework is designed to use this pairwise approach in a multi-class classification task. Results indicate that this single pairwise model achieves competitive classification rates even on large-scale data sets with high dimensionality.
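
The approach described above rests on two components: recasting the training data as pairs of examples and learning on that pairwise data with an online kernel method. The sketch below is a minimal illustration only, not the actual algorithm studied here: it assumes a symmetrized product-of-RBF pairwise kernel and a plain online kernel perceptron trained on pair labels that indicate whether two examples share a class. Every concrete choice in it (`pairwise_kernel`, `OnlineKernelPerceptron`, the toy data, the `gamma` value) is hypothetical and made only for illustration.

```python
import numpy as np

def rbf(x, z, gamma=0.1):
    """Base RBF kernel between two feature vectors (illustrative choice)."""
    d = x - z
    return np.exp(-gamma * np.dot(d, d))

def pairwise_kernel(p, q, gamma=0.1):
    """Symmetrized product kernel over pairs:
    K((a, b), (c, d)) = k(a, c) k(b, d) + k(a, d) k(b, c)."""
    (a, b), (c, d) = p, q
    return rbf(a, c, gamma) * rbf(b, d, gamma) + rbf(a, d, gamma) * rbf(b, c, gamma)

class OnlineKernelPerceptron:
    """Minimal online kernel learner: pairs that cause mistakes are kept as support pairs."""

    def __init__(self, kernel):
        self.kernel = kernel
        self.support = []  # list of (pair, label) collected online

    def decision(self, pair):
        return sum(label * self.kernel(sp, pair) for sp, label in self.support)

    def partial_fit(self, pair, label):
        # Classic perceptron rule in kernel form: update only on a mistake.
        if label * self.decision(pair) <= 0:
            self.support.append((pair, label))

    def predict(self, pair):
        return 1 if self.decision(pair) >= 0 else -1

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy binary problem: two Gaussian blobs in 5 dimensions.
    X = np.vstack([rng.normal(0.0, 1.0, (30, 5)), rng.normal(3.0, 1.0, (30, 5))])
    y = np.array([0] * 30 + [1] * 30)

    # Pairwise reformulation: a pair gets label +1 if its two examples share a class.
    model = OnlineKernelPerceptron(pairwise_kernel)
    for _ in range(300):
        i, j = rng.choice(len(X), size=2, replace=False)
        model.partial_fit((X[i], X[j]), 1 if y[i] == y[j] else -1)

    # Rough sanity check on freshly sampled pairs.
    hits = 0
    for _ in range(200):
        i, j = rng.choice(len(X), size=2, replace=False)
        hits += model.predict((X[i], X[j])) == (1 if y[i] == y[j] else -1)
    print("accuracy on sampled pairs:", hits / 200)
```

Because this learner stores only the pairs on which it makes a mistake and each update touches a single streamed pair, the model stays sparse and the per-step cost is independent of the total number of possible pairs, which is the property that makes the online setting attractive for large pairwise data sets.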
