Automatic knowledge extraction for filling in biography forms from Turkish texts

This study presents a method for building an automatic knowledge extraction system that fills in biography forms from Turkish texts. Several biographies are analyzed to select the biography categories to be studied, and the fields of the biography form are defined from the same analysis. The system is implemented with information extraction techniques. A separate testing platform is designed to evaluate the accuracy of the extracted data. Its results indicate that the approach is promising and worth developing further, especially for creating forms in the Turkish language.