GRAMMATICAL DISAMBIGUATION IN THE TATAR LANGUAGE CORPUS

This article addresses the corpus-oriented study of the most frequent types of grammatical homonymy in the Tatar language and the possibilities for automating disambiguation in the corpus. The authors assess whether the alternative parses generated during automatic morphological analysis reflect real linguistic ambiguity. The work presents a classification of frequent homoforms together with methods for their disambiguation, and estimates the potential impact on the corpus.
