Berna ALTINEL, Muhammed Ali DOĞAN, Onur Can YÜCEDAĞ, Murat Can GANİZ, Erencan ERKAYA, Bilge ŞİPAL

Word sense disambiguation using semantic kernels with class-based term values

In this study, we propose several semantic kernels for word sense disambiguation (WSD). Our approaches adapt the intuition that class-based term values help in resolving the ambiguity of polysemous words in WSD. We evaluate our proposed approaches with experiments, utilizing training sets of various sizes from disambiguated corpora (SensEval1). With these experiments we try to answer the following questions: (1) Do our semantic kernel formulations yield higher classification performance than the traditional linear kernel? (2) Under which conditions does one kernel design perform better than the others? (3) Does the addition of class labels to the standard term-document matrix improve classification accuracy? (4) Is their combination superior to either type? (5) Does an ensemble of these kernels perform better than the baseline? (6) What is the effect of training set size? Our experiments demonstrate that our kernel-based WSD algorithms can outperform the baseline in terms of F-score.

