Semi-Supervised Contextual Word Sense Disambiguation for Data Augmentation

Deep neural architectures have recently led to notable advances in Word Sense Disambiguation (WSD), one of the central problems of natural language processing. Supervised methods outperform their rivals, and their performance depends largely on the size of the training data. Large amounts of manually annotated data for the WSD task are available online for English; however, low-resource languages (LRLs) still lack suitable resources, and gathering and annotating a sufficient amount of training data is time-consuming and costly. To address this problem, this paper investigates whether a semi-supervised, context-based WSD approach can be used for data augmentation, with the augmented data later serving as training material for supervised learning. Since WSD evaluation datasets are hard to find for LRLs in particular, we use manually annotated English WSD data, available online, to build a proof of concept and to assess the approach's future applicability to LRLs. The proposed semi-supervised method relies on a seed set and context embeddings. We test nine different contextual language models (including ELMo, BERT, and RoBERTa) and report the impact of each on WSD. Relative to the initial baseline, accuracy improves by up to 28 percentage points (50.39% for the ELMo baseline versus 78.06% for the ELMo Sense Seed Based Average Similarity Model). These initial findings show that the proposed approach is promising for building WSD datasets, especially for LRLs. This paper is an extended version of our work in [18].
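The seed-set-plus-context-embeddings idea described above can be made concrete with a short sketch. The following is an illustration only, not the paper's actual implementation: it assumes bert-base-uncased from the Hugging Face transformers library as the context encoder (the paper's best-performing model used ELMo), and the helper names (target_embedding, sense_centroids, disambiguate) as well as the two-sense seed set for "bank" are hypothetical. Each sense is represented by the average contextual embedding of the target word across its seed occurrences, and a new occurrence is labeled with the sense whose centroid is most cosine-similar.

    # A minimal sketch of a sense-seed-based average-similarity model.
    # Assumptions (not from the paper): bert-base-uncased as encoder,
    # hypothetical seed sentences and sense labels for "bank".
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")
    model.eval()

    def target_embedding(sentence, target):
        """Contextual embedding of `target` in `sentence`: the mean of its
        subword vectors from the encoder's last hidden layer."""
        enc = tokenizer(sentence, return_tensors="pt")
        with torch.no_grad():
            hidden = model(**enc).last_hidden_state[0]  # (seq_len, hidden_dim)
        target_ids = tokenizer(target, add_special_tokens=False)["input_ids"]
        ids = enc["input_ids"][0].tolist()
        for i in range(len(ids) - len(target_ids) + 1):  # locate the subword span
            if ids[i:i + len(target_ids)] == target_ids:
                return hidden[i:i + len(target_ids)].mean(dim=0)
        raise ValueError(f"{target!r} not found in {sentence!r}")

    def sense_centroids(seed_set, target):
        """One centroid per sense: the average embedding of its seed occurrences."""
        return {sense: torch.stack([target_embedding(s, target) for s in sents]).mean(dim=0)
                for sense, sents in seed_set.items()}

    def disambiguate(sentence, target, centroids):
        """Assign the sense whose centroid is most similar to this occurrence."""
        emb = target_embedding(sentence, target)
        return max(centroids,
                   key=lambda s: torch.cosine_similarity(emb, centroids[s], dim=0).item())

    # Hypothetical two-sense seed set for the ambiguous word "bank".
    seeds = {
        "bank.financial": ["She deposited the money at the bank.",
                           "The bank approved our loan application."],
        "bank.river": ["They had a picnic on the bank of the river.",
                       "Erosion slowly wore away the muddy bank."],
    }
    centroids = sense_centroids(seeds, "bank")
    print(disambiguate("He withdrew cash from the bank.", "bank"))  # -> "bank.financial"

In a data-augmentation setting of the kind proposed above, occurrences labeled this way with sufficient confidence would then be added to the training pool of a supervised WSD system.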

References

[1] Lesk, M. Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone, in Proceedings of the 5th Annual International Conference on Systems Documentation. ACM, 1986, pp. 24–26.
[2] Agirre, E. and Soroa, A. Personalizing PageRank for word sense disambiguation, in Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, 2009, pp. 33–41.
[3] Yarowsky, D. Unsupervised word sense disambiguation rivaling supervised methods, in 33rd Annual Meeting of the Association for Computational Linguistics, 1995, pp. 189–196.
[4] Zhong, Z. and Ng, H. T. It Makes Sense: A wide-coverage word sense disambiguation system for free text, in Proceedings of the ACL 2010 System Demonstrations, 2010, pp. 78–83.
[5] Iacobacci, I., Pilehvar, M. T. and Navigli, R. Embeddings for word sense disambiguation: An evaluation study, in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), vol. 1, 2016, pp. 897–907.
[6] Raganato, A., Bovi, C. D. and Navigli, R. Neural sequence learning models for word sense disambiguation, in Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 2017, pp. 1156–1167.
[7] Luo, F., Liu, T., Xia, Q., Chang, B. and Sui, Z. Incorporating glosses into neural word sense disambiguation, arXiv preprint arXiv:1805.08028, 2018.
[8] Miller, G. A., Chodorow, M., Landes, S., Leacock, C. and Thomas, R. G. Using a semantic concordance for sense identification, in Proceedings of the Workshop on Human Language Technology. Association for Computational Linguistics, 1994, pp. 240–243.
[9] Taghipour, K. and Ng, H. T. One million sense-tagged instances for word sense disambiguation and induction, in Proceedings of the Nineteenth Conference on Computational Natural Language Learning, 2015, pp. 338–344.
[10] Jaitly, N. and Hinton, G. E. Vocal tract length perturbation (VTLP) improves speech recognition, in Proc. ICML Workshop on Deep Learning for Audio, Speech and Language, vol. 117, 2013.
[11] Ko, T., Peddinti, V., Povey, D. and Khudanpur, S. Audio augmentation for speech recognition, in Sixteenth Annual Conference of the International Speech Communication Association, 2015.
[12] Simard, P. Y., LeCun, Y. A., Denker, J. S. and Victorri, B. Transformation invariance in pattern recognition: tangent distance and tangent propagation, in Neural Networks: Tricks of the Trade. Springer, 1998, pp. 239–274.
[13] Krizhevsky, A., Sutskever, I. and Hinton, G. E. ImageNet classification with deep convolutional neural networks, in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[14] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V. and Rabinovich, A. Going deeper with convolutions, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1–9.
[15] Zhang, X., Zhao, J. and LeCun, Y. Character-level convolutional networks for text classification, in Advances in Neural Information Processing Systems, 2015, pp. 649–657.
[16] Miller, G. A. WordNet: a lexical database for English, Communications of the ACM, vol. 38, no. 11, pp. 39–41, 1995.
[17] Wang, W. Y. and Yang, D. That's so annoying!!!: A lexical and frame-semantic embedding based data augmentation approach to automatic categorization of annoying behaviors using #petpeeve tweets, in Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015, pp. 2557–2563.
[18] Torunoğlu-Selamet, D., İnceoğlu, A. and Eryiğit, G. Preliminary investigation on using semi-supervised contextual word sense disambiguation for data augmentation, in 2020 5th International Conference on Computer Science and Engineering (UBMK), Diyarbakır, Turkey, 2020, pp. 337–342, doi: 10.1109/UBMK50275.2020.9219389.
[19] Kim, Y. and Rush, A. M. Sequence-level knowledge distillation, arXiv preprint arXiv:1606.07947, 2016.
[20] Sennrich, R., Haddow, B. and Birch, A. Improving neural machine translation models with monolingual data, arXiv preprint arXiv:1511.06709, 2015.
[21] Xia, Y., Qin, T., Chen, W., Bian, J., Yu, N. and Liu, T.-Y. Dual supervised learning, in Proceedings of the 34th International Conference on Machine Learning, Volume 70. JMLR.org, 2017, pp. 3789–3798.
[22] Bergmanis, T., Kann, K., Schütze, H. and Goldwater, S. Training data augmentation for low-resource morphological inflection, in Proceedings of the CoNLL SIGMORPHON 2017 Shared Task: Universal Morphological Reinflection, 2017, pp. 31–39.
[23] Xu, W., Sun, H., Deng, C. and Tan, Y. Variational autoencoder for semi-supervised text classification, in Thirty-First AAAI Conference on Artificial Intelligence, 2017.
[24] Hu, Z., Yang, Z., Liang, X., Salakhutdinov, R. and Xing, E. P. Toward controlled generation of text, in Proceedings of the 34th International Conference on Machine Learning, Volume 70. JMLR.org, 2017, pp. 1587–1596.
[25] Jia, R. and Liang, P. Data recombination for neural semantic parsing, arXiv preprint arXiv:1606.03622, 2016.
[26] Fürstenau, H. and Lapata, M. Semi-supervised semantic role labeling, in Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, 2009, pp. 220–228.
[27] Kafle, K., Yousefhussien, M. and Kanan, C. Data augmentation for visual question answering, in Proceedings of the 10th International Conference on Natural Language Generation, 2017, pp. 198–202.
[28] Silfverberg, M., Wiemerslage, A., Liu, L. and Mao, L. J. Data augmentation for morphological reinflection, in Proceedings of the CoNLL SIGMORPHON 2017 Shared Task: Universal Morphological Reinflection, 2017, pp. 90–99.
[29] Melamud, O., Goldberger, J. and Dagan, I. context2vec: Learning generic context embedding with bidirectional LSTM, in Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning, 2016, pp. 51–61.
[30] Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K. and Zettlemoyer, L. Deep contextualized word representations, arXiv preprint arXiv:1802.05365, 2018.
[31] Kobayashi, S., Tian, R., Okazaki, N. and Inui, K. Dynamic entity representation with max-pooling improves machine reading, in Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2016, pp. 850–855.
[32] Kobayashi, S., Okazaki, N. and Inui, K. A neural language model for dynamically representing the meanings of unknown words and entities in a discourse, arXiv preprint arXiv:1709.01679, 2017.
[33] Kobayashi, S. Contextual augmentation: Data augmentation by words with paradigmatic relations, arXiv preprint arXiv:1805.06201, 2018.
[34] Kolomiyets, O., Bethard, S. and Moens, M.-F. Model-portability experiments for textual temporal analysis, in Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: Short Papers, Volume 2. Association for Computational Linguistics, 2011, pp. 271–276.
[35] Fadaee, M., Bisazza, A. and Monz, C. Data augmentation for low-resource neural machine translation, arXiv preprint arXiv:1705.00440, 2017.
[36] Lala, C., Madhyastha, P. S., Scarton, C. and Specia, L. Sheffield submissions for WMT18 multimodal translation shared task, in Proceedings of the Third Conference on Machine Translation: Shared Task Papers, 2018, pp. 624–631.
[37] Devlin, J., Chang, M.-W., Lee, K. and Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding, arXiv preprint arXiv:1810.04805, 2018.
[38] Radford, A., Narasimhan, K., Salimans, T. and Sutskever, I. Improving language understanding by generative pre-training, 2018. URL: https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf
[39] Radford, A., Wu, J., Child, R., Luan, D., Amodei, D. and Sutskever, I. Language models are unsupervised multitask learners, OpenAI Blog, vol. 1, no. 8, 2019.
[40] Dai, Z., Yang, Z., Yang, Y., Cohen, W. W., Carbonell, J., Le, Q. V. and Salakhutdinov, R. Transformer-XL: Attentive language models beyond a fixed-length context, arXiv preprint arXiv:1901.02860, 2019.
[41] Yang, Z., Dai, Z., Yang, Y., Carbonell, J., Salakhutdinov, R. and Le, Q. V. XLNet: Generalized autoregressive pretraining for language understanding, arXiv preprint arXiv:1906.08237, 2019.
[42] Lample, G. and Conneau, A. Cross-lingual language model pretraining, arXiv preprint arXiv:1901.07291, 2019.
[43] Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L. and Stoyanov, V. RoBERTa: A robustly optimized BERT pretraining approach, arXiv preprint arXiv:1907.11692, 2019.
[44] Sanh, V., Debut, L., Chaumond, J. and Wolf, T. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, arXiv preprint arXiv:1910.01108, 2019.
[45] Clark, K., Khandelwal, U., Levy, O. and Manning, C. D. What does BERT look at? An analysis of BERT's attention, arXiv preprint arXiv:1906.04341, 2019.