EXPLORING THE EFFECT OF BAG-OF-WORDS AND BAG-OF-BIGRAM FEATURES ON TURKISH WORD SENSE DISAMBIGUATION

Feature selection in Word Sense Disambiguation (WSD) is as important as the choice of disambiguation algorithm. Bag-of-words (BoW) features capture the words that occur around the ambiguous target word without considering any relation between those words. In this study, we investigate the effect of BoW and bag-of-bigrams (BoB) features on Turkish WSD and compare the results with those of collocational features. The results suggest that BoW features yield better accuracy than BoB features in all cases. The comparison also shows that collocational features are more effective than both BoW and BoB features for disambiguating word senses.
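
To make the three feature types concrete, the following is a minimal sketch of how they could be extracted from a tokenized context. It is not the implementation used in the study: the function names, the window sizes, the choice to form bigrams separately on each side of the target, and the toy lemmatized sentence around the ambiguous word "yüz" are all assumptions made for illustration.

```python
from collections import Counter


def context_window(tokens, target_index, window):
    """Split the context into left and right neighbors, excluding the target."""
    left = tokens[max(0, target_index - window):target_index]
    right = tokens[target_index + 1:target_index + 1 + window]
    return left, right


def bag_of_words(tokens, target_index, window=2):
    """Unordered counts of the words around the target (BoW)."""
    left, right = context_window(tokens, target_index, window)
    return Counter(left + right)


def bag_of_bigrams(tokens, target_index, window=2):
    """Unordered counts of adjacent word pairs around the target (BoB).

    Assumption: bigrams are built within each side of the window,
    so no pair spans the target word itself.
    """
    left, right = context_window(tokens, target_index, window)
    pairs = list(zip(left, left[1:])) + list(zip(right, right[1:]))
    return Counter(pairs)


def collocational_features(tokens, target_index, window=2):
    """Neighbors keyed by their offset from the target (position is kept)."""
    features = {}
    for offset in range(-window, window + 1):
        i = target_index + offset
        if offset != 0 and 0 <= i < len(tokens):
            features[offset] = tokens[i]
    return features


# Hypothetical lemmatized context; "yüz" (index 4) is the ambiguous target.
tokens = ["adam", "soğuk", "su", "ile", "yüz", "yıka", "ve", "çık"]
target = 4

print(bag_of_words(tokens, target))
print(bag_of_bigrams(tokens, target))
print(collocational_features(tokens, target))
```

The difference between the feature types is visible in the data structures: the two bag representations discard where a word or pair occurred, while the collocational features keep the offset of each neighbor relative to the target.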
