A tree-based approach for English-to-Turkish translation

  In this paper, we present an English-to-Turkish translation methodology that adopts a tree-based approach. Our approach relies on tree analysis and on structural modification rules that derive target-side (Turkish) trees from source-side (English) trees. We also use morphological analysis to obtain candidate root words and apply tree-based rules to produce the agglutinated target words. Compared with earlier English-to-Turkish work using phrase-based models, we obtain higher BLEU scores. Our syntactic subtree permutation strategy, combined with a word replacement algorithm, yields a 67% relative improvement over the baseline, raising BLEU from 12.8 to 21.4, with all scores averaged over 10-fold cross-validation. Future work should improve the selection of correct word senses and refine the structural rules.
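The abstract does not specify how the structural rules are represented. The sketch below assumes a hypothetical rule table keyed on a node's label and the labels of its children. It illustrates how syntactic subtree permutation followed by word replacement can move an English SVO sentence toward Turkish verb-final order. `RULES`, `LEXICON`, `Node`, `permute`, `replace_words`, and `translate` are illustrative assumptions, not the authors' implementation, and the toy lexicon ignores word sense selection entirely.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """A constituent in a Penn Treebank-style parse tree."""
    label: str
    children: list[Node] = field(default_factory=list)
    word: Optional[str] = None  # set only on leaf (part-of-speech) nodes

    def leaves(self) -> list[str]:
        """Return the words under this node, left to right."""
        if self.word is not None:
            return [self.word]
        words: list[str] = []
        for child in self.children:
            words.extend(child.leaves())
        return words


# Hypothetical reordering rules: (parent label, child labels) -> new child order.
# Moving the verb after its object approximates Turkish verb-final (SOV) order.
RULES: dict[tuple[str, tuple[str, ...]], tuple[int, ...]] = {
    ("VP", ("VBD", "NP")): (1, 0),
}

# Hypothetical toy lexicon mapping English words to Turkish root words.
# Turkish has no definite article, so "the" maps to an empty string.
LEXICON = {"I": "ben", "saw": "gör", "the": "", "cat": "kedi"}


def permute(node: Node) -> Node:
    """Reorder children bottom-up wherever a rule matches (mutates the tree)."""
    for child in node.children:
        permute(child)
    key = (node.label, tuple(child.label for child in node.children))
    order = RULES.get(key)
    if order is not None:
        node.children = [node.children[i] for i in order]
    return node


def replace_words(node: Node) -> Node:
    """Swap each leaf word for its lexicon entry, keeping unknown words as-is."""
    if node.word is not None:
        node.word = LEXICON.get(node.word, node.word)
    for child in node.children:
        replace_words(child)
    return node


def translate(tree: Node) -> str:
    """Permute subtrees, replace words, and read off the target root sequence."""
    replace_words(permute(tree))
    return " ".join(word for word in tree.leaves() if word)


def example_tree() -> Node:
    """Parse tree for 'I saw the cat'."""
    return Node("S", [
        Node("NP", [Node("PRP", word="I")]),
        Node("VP", [
            Node("VBD", word="saw"),
            Node("NP", [Node("DT", word="the"), Node("NN", word="cat")]),
        ]),
    ])


print(translate(example_tree()))  # ben kedi gör
```

The output is only a sequence of Turkish roots. The morphological step the abstract describes would still need to turn *kedi gör* into agglutinated forms such as *kediyi gördüm* ("I saw the cat"), which this sketch does not attempt.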
