PREDICTION OF FUNCTION TAGS OF THE SIMPLE TURKISH SENTENCES BY CONDITIONAL RANDOM FIELDS

The prediction of function tags is a key component of several natural language processing tasks. In this study, Conditional Random Fields are employed to predict function tags in simple Turkish sentences. The effects of training set size and of using the morphological features of the words are investigated. We achieved a success rate of 75% on our datasets of 2000 simple sentences.
