Implementing universal dependency, morphology, and multiword expression annotation standards for Turkish language processing

Released only a year ago as the outputs of a research project (“Parsing Web 2.0 Sentences”, supported in part by TÜBİTAK 1001 grant No. 112E276 and conducted as part of the ICT COST Action PARSEME (IC1207)), IMST and IWT are currently the most comprehensive Turkish dependency treebanks in the literature. This article introduces the final states of our treebanks, together with a newly integrated hierarchical categorization of multiheaded dependencies and their organization in a dedicated deep dependency layer. It also presents the adaptation of recent work on standardizing multiword expression and named entity annotation schemes to Turkish, the integration of benchmark annotations into the dependency layers of our treebanks, and the mapping of the treebanks to the latest Universal Dependencies (v2.0) standard, ensuring further compliance with emerging universal annotation trends. In addition to significantly boosting the universal recognition of Turkish treebanks, our recent efforts have improved their syntactic parsing performance, reaching up to 77.8%/82.8% labeled attachment score (LAS) and 84.0%/87.9% unlabeled attachment score (UAS) for IMST/IWT, respectively. The final states of the treebanks are expected to be better suited to a range of natural language processing tasks, such as named entity recognition, multiword expression detection, transfer-based machine translation, semantic parsing, and semantic role labeling.
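For readers unfamiliar with the reported metrics, the following is a minimal sketch of how attachment scores are conventionally computed: UAS counts tokens whose predicted head is correct, and LAS additionally requires the dependency label to match. The function name `attachment_scores`, the `Arc` type, and the four-token example are hypothetical and purely illustrative; the article's exact evaluation protocol (for instance, whether punctuation is scored) is not stated here and is not reproduced.

```python
from typing import List, Tuple

# Assumption: each token is represented as (head index, dependency label).
Arc = Tuple[int, str]


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) as percentages over two aligned token sequences."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted sequences must be the same length")
    if not gold:
        raise ValueError("cannot score an empty sequence")
    # UAS: the predicted head matches the gold head.
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LAS: both the head and the label match.
    full_hits = sum(g == p for g, p in zip(gold, pred))
    total = len(gold)
    return 100.0 * head_hits / total, 100.0 * full_hits / total


# Hypothetical four-token sentence; the labels are illustrative only.
gold = [(2, "nsubj"), (0, "root"), (4, "amod"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (2, "amod"), (2, "obl")]

uas, las = attachment_scores(gold, pred)
print(f"UAS={uas:.1f} LAS={las:.1f}")  # prints "UAS=75.0 LAS=50.0"
```

In this toy example three of four heads are correct, but only two tokens have both the correct head and the correct label, which is why LAS can never exceed UAS.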

___

Turkish Journal of Electrical Engineering and Computer Sciences
  • ISSN: 1300-0632
  • Publication frequency: 6 issues per year
  • Publisher: TÜBİTAK

Implementing universal dependency, morphology, and multiword expression annotation standards for Turkish language processing

Umut SULUBACAK, Gülşen ERYİĞİT