Word sense disambiguation using semantic kernels with class-based term values
In this study, we propose several semantic kernels for word sense disambiguation (WSD). Our approaches adapt the intuition that class-based term values help resolve the ambiguity of polysemous words in WSD. We evaluate the proposed approaches experimentally, using training sets of various sizes drawn from disambiguated corpora (SensEval1). With these experiments we try to answer the following questions: (1) Do our semantic kernel formulations yield higher classification performance than the traditional linear kernel? (2) Under which conditions does one kernel design perform better than the others? (3) Does adding class labels to the standard term-document matrix improve classification accuracy? (4) Is their combination superior to either type? (5) Does an ensemble of these kernels perform better than the baseline? (6) What is the effect of training set size? Our experiments demonstrate that our kernel-based WSD algorithms can outperform the baseline in terms of F-score.