Translation Quality Regarding Low-Resource, Custom Machine Translations: A Fine-Grained Comparative Study on Turkish-to-English Statistical and Neural Machine Translation Systems

Corpus-based machine translation (MT) has been the main approach to developing and implementing MT systems in both academia and industry over the last three decades. The type and size of the corpus used for training MT engines have posed problems for the two dominant corpus-based approaches, statistical MT (SMT) and neural MT (NMT). Moreover, language pairs such as Turkish-English have been understudied within this framework. This article evaluates translation quality in Turkish-to-English custom MT systems trained on corpora of different sizes and types. Two NMT engines and two SMT engines were trained on the KantanMT platform using two types of training corpus: a domain-specific cardiology corpus alone, or that corpus plus a mixed-domain corpus. The study conducted automatic evaluations with metrics including BLEU, F-Measure and TER, and a comprehensive human evaluation with metrics including fluency, A/B testing, and adequacy. Lastly, a separate, subjective terminology evaluation was carried out to investigate how the different MT systems handle terminology, which is crucial for domain-specific text types such as cardiology. While the automatic evaluation results suggest that the SMT engines perform better than the NMT engines, all human evaluators rated the mixed-domain NMT engine as the highest-performing one. However, the terminology evaluation showed that SMT can still perform better and make fewer terminology errors, even as industry and academia shift toward NMT.

Keywords:

Machine translation evaluation, Turkish-to-English machine translation, medical translation, neural machine translation, statistical machine translation
