A RULE BASED NOUN PHRASE CHUNKER FOR TURKISH

In this paper, we present a noun phrase chunker for Turkish, an agglutinative language. To find noun phrases in Turkish sentences, we propose a rule-based model consisting of a preprocessing component and a unit that applies local grammatical rules to the output of a dependency parser. To the best of our knowledge, our model gives the first noun phrase chunking results for Turkish that cover not only basic noun phrases but also complex noun phrases, including those containing relative clauses. In this sense, we believe our model can serve as a useful reference for future studies in this area. We tested our model both on manually annotated data and on the output of the dependency parser. On the annotated data, it achieves F1 scores of 66.15% for full match and 76.79% for partial match. Using the output of the dependency parser, the corresponding F1 scores are 47.91% and 60.75%.
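
The abstract names two stages (preprocessing, then local grammatical rules applied to dependency-parser output) and two evaluation regimes (full match and partial match). The following is a minimal sketch, under our own assumptions, of how noun-phrase spans can be read off a dependency tree and how full-match and partial-match F1 might be computed. The `Token` fields, the `NP_INTERNAL` label set, the functions `chunk_nps` and `evaluate`, and the overlap-based definition of a partial match are all hypothetical illustrations, not the authors' rules or scoring script. Because Turkish is head-final and relative clauses precede the noun they modify, keeping the whole subtree of a modifier pulls the relative clause into the noun phrase.

```python
from dataclasses import dataclass


@dataclass
class Token:
    idx: int     # 1-based position in the sentence
    form: str
    pos: str     # coarse POS tag, e.g. "Noun", "Verb", "Adj"
    head: int    # index of the head token; 0 marks the root
    deprel: str  # dependency label from the parser


# Hypothetical rule set: labels under which a dependent is taken to lie
# inside its head's noun phrase. Illustrative only, not the paper's rules.
NP_INTERNAL = {"MODIFIER", "POSSESSOR", "DETERMINER", "CLASSIFIER", "INTENSIFIER"}


def descendants(children, i):
    """Return i together with every token that transitively depends on it."""
    out, stack = set(), [i]
    while stack:
        j = stack.pop()
        out.add(j)
        stack.extend(children.get(j, []))
    return out


def chunk_nps(tokens):
    """Return maximal noun-phrase spans as inclusive (start, end) pairs."""
    children = {}
    for t in tokens:
        children.setdefault(t.head, []).append(t.idx)
    by_idx = {t.idx: t for t in tokens}
    spans = []
    for t in tokens:
        if t.pos != "Noun":
            continue
        members = {t.idx}
        for c in children.get(t.idx, []):
            if by_idx[c].deprel in NP_INTERNAL:
                # Keep the whole subtree, so a relative clause headed by a
                # participle stays inside the noun phrase it modifies.
                members |= descendants(children, c)
        spans.append((min(members), max(members)))
    # Keep only maximal spans: drop any span nested inside another one.
    return [s for s in spans
            if not any(o != s and o[0] <= s[0] and s[1] <= o[1] for o in spans)]


def f1(p, r):
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)


def evaluate(gold, pred):
    """Full-match and partial-match F1 over inclusive (start, end) spans.

    Assumption: a partial match is any token overlap between a predicted
    span and a gold span; the paper's exact definition may differ.
    """
    gold, pred = set(gold), set(pred)

    def overlaps(a, b):
        return a[0] <= b[1] and b[0] <= a[1]

    def score(pred_hits, gold_hits):
        p = pred_hits / len(pred) if pred else 0.0
        r = gold_hits / len(gold) if gold else 0.0
        return f1(p, r)

    exact = len(gold & pred)
    pred_hits = sum(any(overlaps(p, g) for g in gold) for p in pred)
    gold_hits = sum(any(overlaps(g, p) for p in pred) for g in gold)
    return {"full_f1": score(exact, exact), "partial_f1": score(pred_hits, gold_hits)}


if __name__ == "__main__":
    # "Dün okuduğum kitap çok güzeldi" ("The book I read yesterday was very
    # nice"), with a hand-written, hypothetical parse.
    sent = [
        Token(1, "Dün", "Adverb", 2, "MODIFIER"),
        Token(2, "okuduğum", "Verb", 3, "MODIFIER"),  # participle heading the relative clause
        Token(3, "kitap", "Noun", 5, "SUBJECT"),
        Token(4, "çok", "Adverb", 5, "INTENSIFIER"),
        Token(5, "güzeldi", "Adj", 0, "PREDICATE"),
    ]
    print(chunk_nps(sent))                # [(1, 3)]
    print(evaluate([(1, 3)], [(2, 3)]))   # {'full_f1': 0.0, 'partial_f1': 1.0}
```

In the usage example, the predicted span (2, 3) misses "Dün", so it counts as a partial match but not a full match; this mirrors why partial-match scores sit above full-match scores, although the real gap depends on the paper's own matching criteria.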
