TRMOR: a finite-state-based morphological analyzer for Turkish

Morphological analysis is an important component of natural language processing systems such as spelling-correction tools, parsers, machine translation systems, and dictionary tools. In this paper, we present TRMOR, a morphological analyzer for Turkish built with the SFST tool (Stuttgart Finite-State Transducer). TRMOR can be freely used for academic research (see http://www.cis.uni-muenchen.de/~schmid/tools/SFST/). It covers a large part of Turkish morphology, including inflection, derivation, and some compounding, and it combines morphotactic and morphophonological rules with a stem lexicon. We describe the morphological structure of Turkish, explain the phonological and morphological rules implemented in TRMOR, evaluate the system, and test it on special cases. For the evaluation, we randomly selected 1000 words from Wikipedia word lists and created gold-standard analyses for them; on these 1000 words, TRMOR achieves 94.12% precision. We prepared the gold-standard analyses ourselves because, to our knowledge, no gold-standard segmentation for evaluating Turkish morphological analyzers is available for noncommercial purposes.
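The interplay of a stem lexicon, morphotactic rules, and morphophonological rules can be illustrated with a small sketch. The Python code below is neither SFST syntax nor TRMOR's implementation: it is a hypothetical generate-and-compare analyzer over a toy lexicon of five noun stems, three suffixes (plural -lAr, locative -DA, ablative -DAn), and only two alternations (two-way vowel harmony for the archiphoneme A, and d/t assimilation for D after a voiceless consonant). A finite-state tool such as SFST would instead compile the lexicon and rules into a single transducer usable for both analysis and generation; here we simply enumerate every licensed stem and suffix sequence and keep those whose surface form matches the input. The names `STEMS`, `SUFFIXES`, `TAG_SEQUENCES`, `analyze`, and the `<N><Pl><Loc>` tag notation are illustrative assumptions, not TRMOR's actual output format.

```python
# Toy sketch (hypothetical, not TRMOR/SFST): analysis by generating candidate
# surface forms from a stem lexicon and suffix sequences, then matching.

BACK_VOWELS = set("aıou")
FRONT_VOWELS = set("eiöü")
VOWELS = BACK_VOWELS | FRONT_VOWELS
VOICELESS = set("fstkçşhp")  # voiceless consonants ("fıstıkçı şahap")

# Hypothetical toy stem lexicon (noun stems only).
STEMS = {"ev": "house", "kitap": "book", "okul": "school", "göz": "eye", "ağaç": "tree"}

# Underlying suffix forms: A = a/e (two-way vowel harmony), D = d/t (voicing assimilation).
SUFFIXES = {"Pl": "lAr", "Loc": "DA", "Abl": "DAn"}

# Morphotactics of the toy grammar: Noun (Pl)? (Loc | Abl)?
TAG_SEQUENCES = [[], ["Pl"], ["Loc"], ["Abl"], ["Pl", "Loc"], ["Pl", "Abl"]]


def last_vowel(s: str):
    """Return the last vowel of s, or None if s has no vowel."""
    for ch in reversed(s):
        if ch in VOWELS:
            return ch
    return None


def attach(form: str, underlying: str) -> str:
    """Append one suffix, resolving A and D against the material to its left."""
    for ch in underlying:
        if ch == "A":
            # Two-way harmony: back last vowel -> a, otherwise e.
            form += "a" if last_vowel(form) in BACK_VOWELS else "e"
        elif ch == "D":
            # Assimilation: t after a voiceless consonant, otherwise d.
            form += "t" if form[-1] in VOICELESS else "d"
        else:
            form += ch
    return form


def generate(stem: str, tags: list) -> str:
    """Map a stem plus a licensed tag sequence to its surface form."""
    form = stem
    for tag in tags:
        form = attach(form, SUFFIXES[tag])
    return form


def analyze(word: str) -> list:
    """Return every analysis whose generated surface form equals word."""
    analyses = []
    for stem in STEMS:
        for tags in TAG_SEQUENCES:
            if generate(stem, tags) == word:
                analyses.append(stem + "<N>" + "".join(f"<{t}>" for t in tags))
    return analyses


if __name__ == "__main__":
    for w in ["evlerde", "kitapta", "ağaçtan", "gözler"]:
        print(w, analyze(w))
```

For example, `analyze("evlerde")` returns `["ev<N><Pl><Loc>"]` ("in the houses"), and `analyze("kitapta")` returns `["kitap<N><Loc>"]`, where the locative surfaces as -ta because the stem ends in voiceless p. Returning a list reflects that an analyzer must report every analysis of an ambiguous word, although this toy lexicon happens to produce no ambiguity.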

___

Turkish Journal of Electrical Engineering and Computer Sciences
  • ISSN: 1300-0632
  • Publication frequency: 6 issues per year
  • Publisher: TÜBİTAK